Are you famous yet? In another case of “Schadenfreude”, the Panama Papers have placed a list of dignitaries in the public spotlight a year after the German newspaper Süddeutsche Zeitung received 2.6 terabytes of documents related to Mossack Fonseca from an anonymous source. This eclipses WikiLeaks’ Cablegate in 2010 (1.7 GB), Offshore Leaks in 2013 (260 GB), Lux Leaks in 2014 (4 GB), and Swiss Leaks in 2015 (3.3 GB).

The Panama Papers comprise e-mails, PDF files, photos, and excerpts of an internal Mossack Fonseca database. They cover a period spanning from the 1970s to the spring of 2016, with data on some 214,000 companies. There is a folder for each shell firm that contains e-mails, contracts, transcripts, and scanned documents. The leak comprises 4,804,618 emails, 3,047,306 database format files, 2,154,264 PDFs, 1,117,026 images, 320,166 text files, and 2,242 files in other formats.

Meet Nuix, the Australian company that has the technology to make sense of all this data.

Congratulations to Pratap Ranade and Ryan Rowe: Kimonolabs, the web-scraping-as-a-service company they co-founded, has been acquired by Palantir.

Kimonolabs started in Y Combinator’s Winter 2014 class. It raised USD 5M in 2014, but that hasn’t delayed the decision to shutter its doors for jobs at Palantir. Pratap explained that the startup had not been able to have the impact it wanted within two years of launch. So Kimonolabs falls by the wayside, where many other web-scraping tools have gone, leaving its 125K users in the lurch.

They have given users two weeks’ notice to migrate data and services from the platform. The last day of service is 29 Feb 2016; the absolute last day for API services is 31 March 2016. Your data will then be purged, and Palantir will not have access to it. If you depend on this service, you are probably scrambling for alternatives at this point. The next time you assess the risk of adopting a technology like Kimonolabs, be sure to consider the financial and resource stability of the company behind it.

Here is a list of alternative web scraping tools and technologies. We also recommend utilising established SaaS ETL services as viable alternatives.
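If no hosted service fits, rolling your own scraper with nothing but the Python standard library is a workable fallback. This is a minimal sketch (the page HTML is inlined for illustration; in practice you would fetch it with urllib.request.urlopen):

```python
from html.parser import HTMLParser

# DIY scraping sketch: extract all link targets from an HTML page.
class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

page = '<html><body><a href="/pricing">Pricing</a> <a href="/docs">Docs</a></body></html>'
parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # ['/pricing', '/docs']
```

The obvious trade-off versus a SaaS scraper is that you now own scheduling, retries, and site-structure changes yourself.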

 

Meet CAPSICUM Business Architects, a company founded by CEO Terry Roach (Australia) and focused on understanding your business. The approach is to turn the business into a digital model using semantics and a custom-built business modelling platform, “Jalapeno”, thus enabling, facilitating and quantifying change.

This is indeed revolutionary, and CAPSICUM leads where many have failed to map the evolving enterprise.

The Jalapeno tool is a semantic modelling tool leveraging tuples and RDF to describe an enterprise, and it stores this information in a database. It models the organisation top-down, from a business-centric perspective, or from existing standards. Of course, the tool is only as good as its modellers.
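To make the tuple/RDF idea concrete, here is a minimal sketch of triple-based modelling in plain Python. The vocabulary (isA, ownedBy, realisedBy) and the example entities are hypothetical; Jalapeno’s actual schema is not public:

```python
# Each fact about the enterprise is a (subject, predicate, object) triple,
# the same shape RDF uses.
triples = [
    ("OrderFulfilment", "isA", "BusinessCapability"),
    ("OrderFulfilment", "ownedBy", "Operations"),
    ("Operations", "isA", "BusinessUnit"),
    ("OrderFulfilment", "realisedBy", "WarehouseSystem"),
]

def query(subject=None, predicate=None, obj=None):
    """Return triples matching any combination of fixed terms."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# "What kinds of things exist in the model?"
print(query(predicate="isA"))
```

Because everything is a triple, the same query mechanism answers structural questions (what does Operations own?) and classification questions (what is a BusinessCapability?) without a fixed schema.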

Why do I say this is revolutionary? Because the bulk of enterprise architects continually struggle to map the existing enterprise from the ground up, and are often unable to plan the future state effectively.

Maybe the approach to enterprise architecture has been reactive for most of the time, and largely unable to meet the speed of changing business scenarios. Maybe Business Architecture has a better chance, but maybe Jalapeno has the design to be truly revolutionary.

It’s probably best to map an organisation at the strategic business level, where the benefits of change are being considered, and to trace that mapping through the organisation, measuring CAPEX, OPEX, actors, structure and the cost of change. Gap architecture, really.

This all sounds very familiar as an idealistic state of well governed enterprise architecture or business architecture practices, and I’ll definitely be happy to see the Jalapeno platform make further progress.

 

Data visualization is the art of presenting often complex datasets in a visually engaging way. 

David McCandless, in his TED 2010 talk, discussed how sight has by far the fastest and biggest bandwidth of any of the five senses: the eye takes in roughly 80% of our information. View his talk.

 

Periscope Data is a cloud-based business intelligence analytics and distribution platform. Periscope Data has taken the pain out of data loading by directly connecting to your data sources with no messy ETLs.

Periscope visualizes your data as charts, graphs and dashboards. All you need to do is write SQL queries in Periscope, and it returns charts, reports and dashboards that you can share or embed.
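The workflow is essentially “write an aggregate query, get a chart”. This sketch shows the kind of query involved, using an in-memory SQLite table as a stand-in for your warehouse (Periscope’s own connection layer is not shown, and the table and data are invented for illustration):

```python
import sqlite3

# A toy "signups" table standing in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signups (day TEXT, n INTEGER)")
conn.executemany("INSERT INTO signups VALUES (?, ?)",
                 [("2016-01-01", 12), ("2016-01-01", 3), ("2016-01-02", 7)])

# A typical dashboard query: signups per day. In Periscope, the result of a
# query like this becomes the series behind a chart.
rows = conn.execute(
    "SELECT day, SUM(n) FROM signups GROUP BY day ORDER BY day").fetchall()
print(rows)  # [('2016-01-01', 15), ('2016-01-02', 7)]
```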

Periscope is licensed by the number of data rows you share with it; users are unlimited. Your Periscope package includes unlimited charts, users, dashboards, embedding and white-labeling, and support.

Pricing starts at $1,000 a month for up to 1 billion rows of data and scales linearly from there. There is no annual commitment; you can pay month to month.
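Read literally, “linear from $1,000 per billion rows” implies a simple step function. This is my reading of the pricing, not Periscope’s published formula:

```python
import math

def monthly_price(rows, unit=1_000_000_000, price_per_unit=1000):
    # $1,000/month per (started) billion rows, linear scaling assumed.
    return price_per_unit * max(1, math.ceil(rows / unit))

print(monthly_price(800_000_000))    # 1000
print(monthly_price(2_500_000_000))  # 3000
```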

You can take advantage of the Periscope caching tool at no additional cost. Caching reduces load on your database, delivers faster performance, and gives you the ability to upload CSVs and do cross-database joins. Query speeds can run up to 150x faster with Periscope caching.
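The speedup comes from the usual caching principle: a repeated identical query is answered from the cache instead of hitting the database again. An illustrative sketch (this is the general technique, not Periscope’s implementation):

```python
from functools import lru_cache

calls = {"db": 0}  # counts round-trips to the "database"

@lru_cache(maxsize=128)
def run_query(sql):
    calls["db"] += 1  # stands in for an expensive database round-trip
    return f"results for: {sql}"

run_query("SELECT COUNT(*) FROM events")
run_query("SELECT COUNT(*) FROM events")  # identical query: served from cache
print(calls["db"])  # 1
```

A real BI cache also has to decide when cached results are stale; that invalidation policy is where most of the engineering lives.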

https://www.periscopedata.com/ http://wiki.glitchdata.com/index.php?title=Periscope_Data

GoodData is a company that has been on the data scene since 2007. Founded by Roman Stanek, former CEO of NetBeans and Systinet, GoodData seems to be in good hands. Stanek sold NetBeans to Sun Microsystems in 1999, and Systinet to HP in 2006.

GoodData has raised USD53.5M in venture funding from the likes of Andreessen Horowitz, Tim O’Reilly, AlphaTech Ventures, General Catalyst Partners, Windcrest Partners, Intel Capital and TOTVS. It employs 291 staff across 5 offices in Prague, Brno, San Francisco, Portland, and Boston.

GoodData has a joint venture with Chris Gartlan to grow an APAC presence. Based in Melbourne, Australia, GoodData APAC has a team of 10 staff focused on growing the business.

So what is the GoodData value proposition? It’s simply a fully managed, cloud-based business intelligence platform. GoodData does it end-to-end, taking on the capital costs of building data warehouses and data marts, and providing speed and agility in delivering results.

These results are actionable insights that, under traditional data integration, would cost anywhere from 7 to 15 times as much. So whether you run lean on OPEX or CAPEX, the solution can be tailored to your requirements.

Agility comes in the form of a managed solution: business units can now independently build data marts and visualise data. This is where cloud-based BI performs.

So what are GoodData’s strengths with such a broad focus across a very big data chain? Customer focus seems to be the key. Even with a fully out-of-the-box solution, GoodData is agile enough to custom-fit various parts of the data chain, from data integration and data storage systems to visualisation components.

 

Outsourced cloud-based BI is the new spin on the disk.


If you haven’t heard of Yellowfin BI, it is a passionate startup focused on making Business Intelligence easy. Established in 2003, Yellowfin has been developed to satisfy a range of BI needs, from small businesses to massive enterprise deployments and software vendors.

Yellowfin makes a Business Intelligence platform, built on top of Tomcat/Java, that processes and presents information in refreshing detail. It’s easy to assemble, and allows you to focus on rapidly building new business value. Yellowfin can be deployed on any server (cloud or on-premises).

Yellowfin is only the second Australian vendor ever to appear in the Gartner Magic Quadrant.

Growing organically, it can barely be called a startup these days, with more than 100 employees and offices in 4 countries. Yellowfin is running a series of presentations of its technology in December:

Melbourne – 1 Dec
Sydney – 2 Dec
Auckland – 3 Dec

Register for the event today!

Google has announced the open source release of TensorFlow. This is its second-generation machine learning system, building on work done in the DistBelief project. TensorFlow is general, flexible, portable, easy to use, and completely open source. It is also twice as fast as DistBelief.

To understand what is possible: Google’s internal deep learning infrastructure DistBelief, developed in 2011, has allowed Googlers to build ever larger neural networks and scale training to thousands of cores in Google’s datacenters. It has been used to demonstrate that concepts like “cat” can be learned from unlabeled YouTube images, to improve speech recognition in the Google app by 25%, and to build image search in Google Photos. DistBelief also trained the Inception model that won ImageNet’s Large Scale Visual Recognition Challenge in 2014, and drove Google’s experiments in automated image captioning as well as DeepDream.

While DistBelief was very successful, it had some limitations. It was narrowly targeted to neural networks, it was difficult to configure, and it was tightly coupled to Google’s internal infrastructure — making it nearly impossible to share research code externally.

TensorFlow is built on Python, as is a lot of Google infrastructure. You can download the libraries/package and run it within your own Python applications. Get started today!
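The core idea underneath TensorFlow is the dataflow graph: you describe a computation as nodes (operations) connected by edges (tensors), then evaluate it with concrete inputs. Here is a toy pure-Python evaluator illustrating that idea; it is deliberately not the TensorFlow API:

```python
# Toy dataflow graph: nodes are operations, placeholders are fed at eval time.
class Node:
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def eval(self, feed):
        if self.op == "placeholder":
            return feed[self]  # value supplied by the caller
        args = [n.eval(feed) for n in self.inputs]
        return {"add": lambda a, b: a + b,
                "mul": lambda a, b: a * b}[self.op](*args)

x = Node("placeholder")
y = Node("placeholder")
z = Node("add", Node("mul", x, x), y)  # the graph for z = x*x + y

print(z.eval({x: 3, y: 4}))  # 13
```

Separating graph construction from execution is what lets a system like TensorFlow optimise the graph and run it on CPUs, GPUs, or many machines without the model definition changing.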

Read more

Talend has started leveraging Apache Spark as part of its big data integration platform. Spark’s speedy in-memory execution accelerates data ingestion, and migrating to Apache Spark can provide performance improvements of 5 to 100 times.

Talend promises to make the migration literally as simple as the push of a button, with a new refactoring option that automatically converts data pipelines written for MapReduce, the previous leader in high-performance data integration, to Spark. That theoretically requires no changes to the high-level workflows a user has defined for a cluster.
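Such a conversion is plausible because the logical pipeline is the same in both worlds; only the execution strategy changes. A toy word count in plain Python shows the two shapes side by side (this is an illustration of the concept, not Talend’s converter):

```python
from collections import Counter

lines = ["spark beats mapreduce", "spark is fast"]

# MapReduce style: a map phase emits (word, 1) pairs, a reduce phase sums them.
def map_phase(line):
    return [(w, 1) for w in line.split()]

def reduce_phase(pairs):
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts

pairs = [kv for line in lines for kv in map_phase(line)]
print(reduce_phase(pairs)["spark"])  # 2

# In-memory chained style (the shape a Spark pipeline takes):
counts = Counter(w for line in lines for w in line.split())
print(counts["spark"])  # 2
```

Same workflow, same answer; the second form simply keeps intermediate data in memory instead of writing it between stages, which is where Spark’s speedup comes from.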

New projects also benefit from the upgrade, which brings some 100 pre-implemented data ingestion and integration functions that make it possible to pull data into Spark without having to do any programming. According to Talend, the result is an up to tenfold improvement in developer productivity.

There are a number of new Talend features, the biggest addition being “masking”, also commonly known as tokenisation. This allows an organization to replace a sensitive field with a structurally similar placeholder that doesn’t reveal any specific details. That’s useful in scenarios where, say, a hospital analyst who doesn’t have permission to view patient treatment history wants to check how many medical records there are in a given dataset coming into Spark.
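The key property of tokenisation is determinism: the same input always maps to the same token, so counts, joins and deduplication still work on the masked data. A minimal sketch of the general technique (the salt, token format, and record values are invented; this is not Talend’s implementation):

```python
import hashlib

def tokenise(value, salt="demo-salt"):
    # Deterministic, non-reversible placeholder: same input -> same token.
    digest = hashlib.sha256((salt + value).encode()).hexdigest()
    return "TOK-" + digest[:12]

records = ["patient-1234", "patient-5678", "patient-1234"]
masked = [tokenise(r) for r in records]

print(masked[0] == masked[2])  # True: duplicates remain countable
print(masked[0] == masked[1])  # False: distinct patients stay distinct
```

The salt should be kept secret; without it, an attacker who can guess likely identifiers could rebuild the mapping by hashing candidates.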

 

Here are some examples of correlations that make little sense.

Cat owners drive craft beer (in USA) (r=0.70)
Shorter commutes for people who work for men (r=0.71)
Obese kids like listening to “Purple Rain” by Prince (r=0.77)
Watch porn if you want to make more money (r=0.54)
Walmart shoppers own dogs and drink wine (r=0.76, r=0.79)
Norwegian oil and train accidents in the USA
Space spending and suicides by hanging/strangulation/suffocation
Age of Miss America and murders by steam/hot vapours/hot objects
Eating organic food causes autism
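The r values quoted above are Pearson correlation coefficients, and it is worth seeing how easily two causally unrelated series can score near 1 simply because both trend upward. Computed from scratch on invented toy data (not the actual figures behind the examples above):

```python
def pearson_r(xs, ys):
    # Pearson's r: covariance of the two series divided by the product
    # of their standard deviations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Two short, causally unrelated series that both happen to rise:
space_spending = [18.0, 18.6, 19.0, 19.5, 20.0]      # toy values
hanging_suicides = [5400, 5600, 5700, 5900, 6100]    # toy values
print(pearson_r(space_spending, hanging_suicides))
```

Any two monotonically increasing short series correlate strongly; with thousands of public time series to trawl through, “significant” pairings like these fall out for free. Correlation is not causation.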

See more, more, more and more ….