Author Archive

Google has announced the open source release of TensorFlow. This is their second-generation machine learning system, building on work done in the DistBelief project. TensorFlow is general, flexible, portable, easy-to-use, and completely open source. On some benchmarks, TensorFlow is twice as fast as DistBelief.

To understand what is possible, consider Google’s internal deep learning infrastructure, DistBelief. Developed in 2011, it has allowed Googlers to build ever larger neural networks and scale training to thousands of cores in Google’s datacenters. It has been used to demonstrate that concepts like “cat” can be learned from unlabeled YouTube images, to improve speech recognition in the Google app by 25%, and to build image search in Google Photos. DistBelief also trained the Inception model that won ImageNet’s Large Scale Visual Recognition Challenge in 2014, and drove Google’s experiments in automated image captioning as well as DeepDream.

While DistBelief was very successful, it had some limitations. It was narrowly targeted to neural networks, it was difficult to configure, and it was tightly coupled to Google’s internal infrastructure — making it nearly impossible to share research code externally.

TensorFlow exposes a Python API on top of a C++ core, and Python is used in a lot of Google infrastructure. You can download the package and use it within your own Python applications. Get started today!


Talend has started leveraging Apache Spark as part of its big data integration platform. The platform uses Spark’s speedy in-memory execution capability to accelerate data ingestion. Migrating to Apache Spark can provide performance improvements of 5 to 100 times.

Talend promises to make the migration as simple as the push of a button, with a new refactoring option that can automatically convert data pipelines written for MapReduce, the previous leader in high-performance data integration, to Spark. In theory, the conversion requires no changes to the high-level workflows that a user has defined for a cluster.

New projects also benefit from the upgrade, which brings some 100 pre-implemented data ingestion and integration functions that make it possible to pull data into Spark without having to do any programming. According to Talend, the result is an up to tenfold improvement in developer productivity.

There are a number of new Talend features, with the biggest addition being “masking”, also commonly known as tokenisation. This allows an organization to replace a sensitive file with a structurally similar placeholder that doesn’t reveal any specific details. That’s useful in scenarios where, say, an analyst at a hospital who doesn’t have permission to view patient treatment history wants to check how many medical records there are in a given dataset coming into Spark.
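
To make the masking idea concrete, here is a minimal Python sketch. It is not Talend’s implementation or API. The function name mask_value and the secret parameter are hypothetical, chosen only for illustration. Each letter or digit is swapped for a pseudo-random character of the same kind, so the placeholder keeps the length and punctuation of the original.

```python
import hashlib


def mask_value(value: str, secret: str = "demo-secret") -> str:
    """Return a placeholder with the same shape as `value`.

    Hypothetical helper for illustration only: digits stay digits,
    letters stay letters (case preserved), everything else is kept.
    The same input and secret always give the same placeholder.
    """
    digest = hashlib.sha256((secret + value).encode("utf-8")).digest()
    masked = []
    for position, char in enumerate(value):
        byte = digest[position % len(digest)]
        if char.isdigit():
            masked.append(str(byte % 10))
        elif char.isalpha():
            base = ord("A") if char.isupper() else ord("a")
            masked.append(chr(base + byte % 26))
        else:
            masked.append(char)  # keep separators such as "-" or " "
    return "".join(masked)


# Made-up records: the analyst can still count them after masking.
records = ["123-45-6789", "987-65-4321"]
masked_records = [mask_value(record) for record in records]
print(len(masked_records), masked_records)
```

Because the placeholders keep their structure, counts and format checks still work on the masked dataset, while the real values stay hidden. This toy scheme is not a secure tokenisation method.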


Here are some examples of reported correlations that make little sense.

Cat owners drive craft beer (in USA) (r=0.70)
Shorter commutes for people who work for men (r=0.71)
Obese kids like listening to “Purple Rain” by Prince (r=0.77)
Watch porn if you want to make more money (r=0.54)
Walmart shoppers own dogs and drink wine (r=0.76, r=0.79)
Norwegian oil and Train Accidents
USA Space spending and Suicides by Hanging/Strangulation/Suffocation
Age of Miss America and Murders by steam/hot vapours/hot objects
Eating Organic food causes Autism
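
For readers wondering what the r values mean, here is a minimal Python sketch of the Pearson correlation coefficient. The two series are made-up numbers, not data from any of the studies above. They show how two unrelated series can still produce a high r.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / (spread_x * spread_y)


# Made-up series for illustration only.
series_a = [1, 2, 3, 4, 5]
series_b = [2, 4, 5, 4, 5]
print(round(pearson_r(series_a, series_b), 2))  # prints 0.77
```

A high r only says the two series move together. It says nothing about one causing the other, which is why the examples above read as nonsense.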

See more, more, more and more ….


To read or process the Ashley Madison data is fairly straightforward. The dataset comes with a suite of files. These are:

am_am.dump.gz
aminno_member.dump.gz
aminno_member_email.dump.gz
member_details.dump.gz
member_login.dump.gz
CreditCardTransactions.7z
README

Each of these files comes with a PGP signature. You can use gunzip on a Mac (or any Unix platform) to extract the .gz files. The 7z files will require 7-Zip software on a Windows computer.
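
If you would rather peek inside a .gz file before unpacking it, Python’s standard gzip module can stream it. The sketch below is one way to do that. The helper names head_lines and count_lines are hypothetical, and UTF-8 text is an assumption about the dump contents.

```python
import gzip
from itertools import islice


def head_lines(path, limit=5):
    """Return the first `limit` lines of a gzip-compressed text file."""
    # errors="replace" keeps going if the file is not clean UTF-8 (assumed encoding).
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, limit)]


def count_lines(path):
    """Count lines by streaming, without writing the unpacked file to disk."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)
```

For example, head_lines("member_details.dump.gz", 10) would return the first ten lines of that dump, assuming the file is in the current directory.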

You will need MySQL software from Oracle to load this data. MySQL Community edition is free.

Top Cities by Users for Ashley Madison

Here are the top 100 cities. It’s interesting that Singapore doesn’t feature on the list at all. The city-state had banned the site in the interest of the family, and it looks like the ban worked. Sydney, New York and Toronto look like hotbeds of infidelity.

São Paulo 374542
New York 268171
Sydney 251813
Toronto 222982
Santiago 218125
Melbourne 213847
Houston 186795
Los Angeles 181918
London 179129
Chicago 162444
Rio de Janeiro 156572
Madrid 135294
Bogotá 123559
Brisbane 118857
Brooklyn 110859
Miami 109505
Calgary 107021
San Antonio 99157
Dallas 97736
Brasília 97096
San Diego 94953
Perth 88754
Las Vegas 87720
Atlanta 86897
Philadelphia 86018
Edmonton 84971
Lima 82279
Phoenix 81913
Belo Horizonte 77834
香港 77561
Austin 77432
Columbus 73377
Montreal 72304
Washington 71779
Jacksonville 70134
Denver 70043
Mississauga 69403
Curitiba 68916
Barcelona 68513
Dublin 65658
Ciudad de México 64516
Orlando 63549
San Francisco 62333
Minneapolis 61403
灣仔 60674
Portland 60672
Charlotte 59686
Ottawa 58463
Seattle 56935
Indianapolis 56741
Buenos Aires 56701
Adelaide 55490
Tampa 55321
Cleveland 55031
Vancouver 52651
Fort Lauderdale 52554
Cincinnati 52055
Springfield 51644
Arlington 51345
Salvador 51069
San Jose 51043
Fort Worth 50976
Medellín 50308
Beverly Hills 49437
Bronx 49067
Boston 47951
Pittsburgh 47815
Kansas City 47793
Louisville 47239
Winnipeg 47202
Porto Alegre 47018
Saint Louis 46547
Richmond 46546
Buffalo 46532
North York 46223
Roma 46000
Johannesburg 45831
Sacramento 45777
Rochester 45216
Columbia 44541
Tucson 43293
Central 41900
Oklahoma City 41809
Salt Lake City 41773
El Paso 40914
Milwaukee 40392
Hamilton 40096
Cali 38847
Colorado Springs 38696
New Delhi 38620
London 38561
Brampton 38446
Madison 37813
Paris 37641
Saint Paul 37412
Cape Town 37001
Fortaleza 36922
Scarborough 35952
Albuquerque 35802
תל אביב יפו 35602

Yes, we have a copy of it. No, we’re not selling it. However, we’ll be putting our data analytics and data quality glasses on to see what lies within.

Several Australian cities featured prominently on the list of AM users. Singapore, which had banned the website, was highly represented.

There is nothing special about Ashley Madison’s leak except that the brand attracts a lot of negative emotions. They probably stepped on the toes of a capable geek. The reality is that nothing is safe on the internet, and transparency is your only defence.

Should Ashley Madison have done more to protect their data? Yes! The simple step of basic data encryption should have been taken. But it wasn’t.

Bye bye Ashley Madison. You’re not the first, and I am sure you won’t be the last. There is no defence against cheating spouses except character, honesty, truth and love.


Oracle 12c was first released in mid-2013, with the 12.1.0.2 release following in July 2014. It packs a number of new features which bring it up to date with significant improvements in hardware, virtualisation and storage technology.

1. Pluggable Databases Through Database Consolidation

The cloud has driven Oracle to address the problem of multi-tenancy in Oracle 12c. The core database architecture has introduced Container Databases (CDB) and Pluggable Databases (PDB). Memory and processes are now owned by the Container Database. The container holds the metadata, whereas the PDBs hold the user data. You can create up to 253 PDBs, including the seed PDB.

In a larger Oracle setup, it is common to see 20 or 30 different instances running in a production environment. This can create a maintenance nightmare, as all these instances have to be separately:

Upgraded
Patched
Monitored
Tuned
RAC Enabled
Adjusted
Backed up
Data Guarded

The Pluggable Databases feature allows you to do all this in one single instance. This is a significant efficiency improvement for DBAs.

2. Redaction Policy

Data Redaction helps you to mask data. You can set up a Data Redaction policy so that, for example, the SSN field in an Employee table is masked. This is called redaction.

From SQL Developer you can do this by going to the table: Employee -> right-click on Security Policy -> click on New -> click on Redaction Policy -> enter SSN. When you run select * from employee, the SSN will be shown masked. The new data masking uses a package called DBMS_REDACT, which extends the FGAC and VPD features present in earlier versions. With the policy in place, users who need to view the data will be able to see it, whereas other users will not.

3. Adaptive Query Optimization and Online Stats Gathering:

This feature helps the optimizer make runtime adjustments to the execution plan and gather information that leads to better statistics. For statements like CTAS (Create Table As Select) and IAS (Insert As Select), the statistics are gathered online so that they are available immediately.

4. Restore a Table easily through RMAN:

Earlier, if you had to restore a particular table, you had to do all sorts of things like restoring a tablespace or doing an Export and Import. The new RECOVER TABLE command in RMAN simplifies this task.

5. Size Limit on Varchar2, NVarchar2, Raw Data Types increased:

The previous limit was 4,000 bytes for VARCHAR2 and NVARCHAR2 and 2,000 bytes for RAW. In 12c, it has been increased to 32,767 bytes when the MAX_STRING_SIZE parameter is set to EXTENDED. Up to 4K, the data is stored inline. I am sure everyone will be happy with this small and cute enhancement.

6. Inline PL/SQL Functions and Procedures:

The inline feature is extended in Oracle 12c. In addition to views, we can now define PL/SQL procedures and functions as inline constructs in the WITH clause of a query. The query can be written as if it were calling a real stored procedure, but the functions do not actually exist in the database, and you will not be able to find them in ALL_OBJECTS. I think this will be a very good feature for developers to explore, as there is no stored code that needs to be compiled.

7. Generated as Identity/Sequence Replacement:

You can now create a column with the ‘generated as identity’ clause. That’s it. Doing this is equivalent to creating a separate sequence and calling sequence.nextval for each row. This is another handy, neat feature that will help the developer community. It is also called a No Sequence Auto Increment Primary Key.

8. Multiple Indexes on a Single Column:

Prior to 12c, you could not create more than one index on the same column or set of columns. In 12c, you can create, for example, both a B-tree index and a bitmap index on the same column. But please note that only one of them can be visible, and so usable, at a given time.

9. Online Migration of Table Partition or Sub Partition:

You can very easily migrate a partition or sub-partition from one tablespace to another. Similar to how online migration was achieved for a non-partitioned table in prior releases, a table partition or sub-partition can be moved to another tablespace online or offline. When the ONLINE clause is specified, all DML operations can be performed without any interruption on the partition or sub-partition involved in the procedure. In contrast, no DML operations are allowed if the partition or sub-partition is moved offline.

10. In Database Archiving:

This feature enables archiving rows within a table by marking them as inactive. These inactive rows are in the database and can be optimized using compression but are not visible to the application. These records are skipped during FTS (Full Table Scan).

Other Features:

Other Oracle features are:

Advanced Replication and Streams are deprecated. Oracle GoldenGate (a separate product) can provide this functionality.
Invisible Columns. You can now have an invisible column in a table. When a column is defined as invisible, the column won’t appear in generic queries.


You’ve heard of bitcoin. It’s built using the blockchain. This is a distributed database that keeps track of every bitcoin transaction. Bitcoin’s popularity may have diminished, but the importance of the blockchain as a truly revolutionary technology remains.

The Internet’s key design objective was to survive nuclear attack. The need for resilience is what led to the distributed design pattern. But certain core features, like the IP address space and the Domain Name System (DNS), were never decentralised, as the solutions had yet to be defined.

Even early peer-to-peer products (like Skype and Napster) relied on managed super-nodes or centralised directories.

The blockchain provides a method of distributing trust and facilitating decentralised systems.
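
A toy example helps show where that trust comes from. The Python sketch below is not Bitcoin’s actual data format, and it leaves out proof-of-work, consensus and networking entirely. The function names and block fields are made up for illustration. It only shows the core trick: every block stores the hash of the block before it, so changing old data breaks the chain.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """SHA-256 over the block's contents, serialised in a stable key order."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Add a block that points at the hash of the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    block = {
        "index": index,
        "data": data,
        "prev_hash": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    prev_hash = GENESIS_PREV_HASH
    for position, block in enumerate(chain):
        if block["index"] != position or block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


ledger = []
append_block(ledger, "alice pays bob 5")
append_block(ledger, "bob pays carol 2")
print(is_valid(ledger))
```

Anyone holding a copy of the ledger can run the same check, so nobody has to take a central operator’s word for it. If someone edits an old block, is_valid returns False for everyone.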

This excellent IEEE Spectrum essay explores how the blockchain could work its way into improving core internet services, from payment transactions to online communities, video games, spam prevention and more.

Bimodal IT refers to having two modes of IT, each designed to develop and deliver information- and technology-intensive services in its own way. Mode 1 is traditional, emphasizing scalability, efficiency, safety and accuracy. Mode 2 is non-sequential, emphasising agility and speed.

After fads such as Agile Management, PRINCE2, ITIL, P3M3 and Waterfall, which all remain relevant to project execution, today’s enterprise recognises that different parts of the organisation operate under different parameters.

Agile remains highly suitable for smaller teams with high performance goals. The Agile Director is an expert in agile project management.


Linode opens its newest datacenter in Singapore! This is Linode’s seventh datacenter, and is purpose-built to serve the already huge and growing markets in Southeast Asia, India, Australia, and surrounding regions.

Linode’s Singapore network is powered by Cisco ASR 9000-series routers, and currently blends connectivity from Telstra/Pacnet and PCCW, along with direct peering into the Equinix Internet Exchange (EIE), providing you with access to hundreds of peering opportunities. Check out Linode’s Speedtest page to test latency and download speeds.

Singapore supports all of the standard Linode features available in all of Linode’s datacenters, like 40 Gbps redundant connectivity to each hypervisor host machine, the Linode Backup service, NodeBalancers and native IPv6, and has the same simple pricing as other Linode datacenters.


In January 2015, Microsoft announced a new browser for Windows 10. Code-named Project Spartan, it is designed to work on all devices across the Microsoft family, from laptops and desktops to phones and tablets. Internet Explorer will still be distributed to support legacy enterprise applications.
