Archive for the ‘News’ Category

Are you famous yet? In another case of “Schadenfreude”, the Panama Papers have placed a list of dignitaries in the public spotlight, a year after the German newspaper Süddeutsche Zeitung received 2.6 terabytes of documents related to Mossack Fonseca from an anonymous source. This eclipses WikiLeaks Cablegate 2010 (1.7 GB), Offshore Leaks 2013 (260 GB), Lux Leaks 2014 (4 GB), and Swiss Leaks 2015 (3.3 GB).

The Panama Papers comprise emails, PDF files, photos, and excerpts of an internal Mossack Fonseca database. They cover a period spanning from the 1970s to the spring of 2016, with data on some 214,000 companies. There is a folder for each shell firm that contains emails, contracts, transcripts, and scanned documents. The leak comprises 4,804,618 emails, 3,047,306 database format files, 2,154,264 PDFs, 1,117,026 images, 320,166 text files, and 2,242 files in other formats.
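To put the per-format counts in perspective, here is a minimal Python sketch that adds them up and works out each format's share. The counts are the ones quoted above; the function names are only illustrative.

```python
# File counts by type, as quoted in the paragraph above.
FILE_COUNTS = {
    "emails": 4_804_618,
    "database format files": 3_047_306,
    "PDFs": 2_154_264,
    "images": 1_117_026,
    "text files": 320_166,
    "other formats": 2_242,
}


def total_files(counts):
    """Return the total number of files across all formats."""
    return sum(counts.values())


def shares(counts):
    """Return each format's share of the total, as a percentage."""
    total = total_files(counts)
    return {kind: 100 * n / total for kind, n in counts.items()}


if __name__ == "__main__":
    print(f"Total files: {total_files(FILE_COUNTS):,}")
    for kind, pct in shares(FILE_COUNTS).items():
        print(f"{kind:>22}: {pct:5.1f}%")
```

The total comes to 11,445,622 files, of which emails make up roughly 42%.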

Meet Nuix, the Australian company that has the technology to make sense of all this data.

Congratulations to Pratap Ranade and Ryan Rowe: Kimonolabs, the web-scraping-as-a-service company they co-founded, has been acquired by Palantir.

Kimonolabs started in the Winter 2014 Y Combinator class. It raised USD 5M in 2014, but this did not stop the founders from shuttering the company for jobs at Palantir. Pratap explained that the startup has not been able to have the impact it wanted within the two years from launch. So Kimonolabs falls by the wayside, where many other web-scraping tools have gone, leaving its 125K users in the lurch.

They have given users two weeks' notice to migrate data and services from the platform. The last day is 29 Feb 2016. The absolute last day for API services is 31 March 2016. Your data will be purged and Palantir will not have access to it. If you depend on this service, you will probably be scrambling for alternatives at this point. I am sure that when you assess the risk of utilising a technology like Kimonolabs, you will consider the financial and resource stability of the company.

Here is a list of alternative web scraping tools and technologies. We also recommend utilising established SaaS ETL services as viable alternatives.

 

Periscope Data is a cloud-based business intelligence analytics and distribution platform. Periscope Data has taken the pain out of data loading by directly connecting to your data sources with no messy ETLs.

Periscope visualizes your data as charts, graphs and dashboards. All you need to do is write SQL queries in Periscope, and it returns charts, reports and dashboards that you can share or embed.

Periscope is licensed by the number of data rows you share with Periscope. You can have unlimited users. Your Periscope package includes Unlimited Charts, Unlimited Users, Dashboards, Unlimited Embedding and white-labeling, and Unlimited Support.

Package pricing starts at $1,000 a month for up to 1 billion rows of data and scales linearly from there. There is no annual commitment; you can pay month to month.
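As a rough illustration of what linear scaling could mean for your bill, here is a small Python sketch. It assumes a flat base price up to 1 billion rows and a strictly proportional price above that; Periscope's actual price tiers may differ, and the function name is hypothetical.

```python
BASE_PRICE_USD = 1_000      # monthly price quoted above
BASE_ROWS = 1_000_000_000   # rows covered by the base package


def estimated_monthly_price(rows: int) -> float:
    """Estimate the monthly price in USD for a given row count.

    Assumption (not confirmed by Periscope): the base price applies up to
    1 billion rows, and the price is strictly proportional above that.
    """
    if rows < 0:
        raise ValueError("rows must be non-negative")
    return BASE_PRICE_USD * max(1.0, rows / BASE_ROWS)


if __name__ == "__main__":
    for rows in (500_000_000, 1_000_000_000, 2_500_000_000):
        print(f"{rows:>13,} rows: ${estimated_monthly_price(rows):,.0f} per month")
```

Under that assumption, 2.5 billion rows would cost about $2,500 a month. Treat this as a budgeting estimate only and confirm real pricing with the vendor.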

You can take advantage of the Periscope caching tool at no additional cost. Caching reduces load on your database, results in faster performance, and gives you the ability to upload CSVs and do cross-database joins. Your queries will run 150 times faster with Periscope caching.

https://www.periscopedata.com/ http://wiki.glitchdata.com/index.php?title=Periscope_Data

If you haven’t heard of Yellowfin BI, it is a passionate startup focused on making Business Intelligence easy. Established in 2003, Yellowfin has been developed to satisfy a range of BI needs, from small businesses, to massive enterprise deployments and software vendors.

Yellowfin makes a Business Intelligence platform built on top of Tomcat/Java that processes and presents information in refreshing detail. It's easy to assemble, and allows you to focus on building new business value rapidly. Yellowfin can be deployed on any server (cloud or on premise).

Yellowfin is the second Australian vendor ever to make it into the Gartner Magic Quadrant.

Growing organically, it can barely be called a startup these days, with more than 100 employees and offices in 4 different countries. Yellowfin is running a series of presentations of its technology in December. These are:

Melbourne – 1 Dec
Sydney – 2 Dec
Auckland – 3 Dec

Register for the event today!

Talend has started leveraging Apache Spark as part of its big data integration platform. Spark's fast in-memory execution accelerates data ingestion. Migrating to Apache Spark can improve performance by 5 to 100 times.

Talend promises to make the migration literally as simple as the push of a button, with a new refactoring option that can automatically convert data pipelines written for MapReduce, the previous leader in high-performance data integration, to Spark. In theory, that requires no changes to the high-level workflows that a user has defined for a cluster.

New projects also benefit from the upgrade, which brings some 100 pre-implemented data ingestion and integration functions that make it possible to pull data into Spark without having to do any programming. According to Talend, the result is an up to tenfold improvement in developer productivity.

There are a number of new Talend features, the biggest addition being “masking”, also commonly known as tokenisation. This allows an organization to replace a sensitive file with a structurally similar placeholder that doesn’t reveal any specific details. That’s useful in scenarios where, say, an analyst at a hospital who doesn’t have permission to view patient treatment history wants to check how many medical records there are in a given dataset coming into Spark.
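To make the idea of a structurally similar placeholder concrete, here is a minimal Python sketch of format-preserving masking. It is not Talend's implementation or API. The function names, field names and sample record are all hypothetical, and field values are assumed to be strings.

```python
import random
import string


def mask_value(value: str, rng: random.Random) -> str:
    """Replace every digit with a random digit and every letter with a
    random letter of the same case, leaving punctuation and spaces alone."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)
    return "".join(out)


def mask_records(records, sensitive_fields, seed=None):
    """Return copies of the records with the sensitive fields masked.

    The number of records and the shape of each field are preserved,
    so the data can still be counted and validated."""
    rng = random.Random(seed)
    return [
        {k: mask_value(v, rng) if k in sensitive_fields else v for k, v in rec.items()}
        for rec in records
    ]


if __name__ == "__main__":
    # Hypothetical sample record, for illustration only.
    records = [{"patient_id": "P-10442", "treatment": "Chemo 2015-03-01", "ward": "B"}]
    masked = mask_records(records, {"treatment"})
    print(len(masked), "record(s):", masked)
```

The analyst in the hospital example can still count the records and check their layout, but the treatment details no longer mean anything. Note that random masking like this is one-way; a tokenisation scheme that has to be reversible would also need a secure mapping back to the original values.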

 

Reading or processing the Ashley Madison data is fairly straightforward. The dataset comes with a suite of files. These are:

am_am.dump.gz
aminno_member.dump.gz
aminno_member_email.dump.gz
member_details.dump.gz
member_login.dump.gz
CreditCardTransactions.7z
README

Each of these files comes with a PGP signature. You can use gunzip on a Mac (or Unix platform) to extract the .gz files. The 7z files will require 7-Zip software on a Windows computer.

You will need MySQL software from Oracle to load this data. MySQL Community edition is free.

Top Cities by Users for Ashley Madison

Here are the top 100 cities. It’s interesting that Singapore doesn’t feature on the list at all. The city-state had banned the site in the interest of the family. It looks like the ban worked. Sydney, New York and Toronto look like hotbeds of infidelity.

São Paulo 374542
New York 268171
Sydney 251813
Toronto 222982
Santiago 218125
Melbourne 213847
Houston 186795
Los Angeles 181918
London 179129
Chicago 162444
Rio de Janeiro 156572
Madrid 135294
Bogotá 123559
Brisbane 118857
Brooklyn 110859
Miami 109505
Calgary 107021
San Antonio 99157
Dallas 97736
Brasília 97096
San Diego 94953
Perth 88754
Las Vegas 87720
Atlanta 86897
Philadelphia 86018
Edmonton 84971
Lima 82279
Phoenix 81913
Belo Horizonte 77834
香港 77561
Austin 77432
Columbus 73377
Montreal 72304
Washington 71779
Jacksonville 70134
Denver 70043
Mississauga 69403
Curitiba 68916
Barcelona 68513
Dublin 65658
Ciudad de México 64516
Orlando 63549
San Francisco 62333
Minneapolis 61403
灣仔 60674
Portland 60672
Charlotte 59686
Ottawa 58463
Seattle 56935
Indianapolis 56741
Buenos Aires 56701
Adelaide 55490
Tampa 55321
Cleveland 55031
Vancouver 52651
Fort Lauderdale 52554
Cincinnati 52055
Springfield 51644
Arlington 51345
Salvador 51069
San Jose 51043
Fort Worth 50976
Medellín 50308
Beverly Hills 49437
Bronx 49067
Boston 47951
Pittsburgh 47815
Kansas City 47793
Louisville 47239
Winnipeg 47202
Porto Alegre 47018
Saint Louis 46547
Richmond 46546
Buffalo 46532
North York 46223
Roma 46000
Johannesburg 45831
Sacramento 45777
Rochester 45216
Columbia 44541
Tucson 43293
Central 41900
Oklahoma City 41809
Salt Lake City 41773
El Paso 40914
Milwaukee 40392
Hamilton 40096
Cali 38847
Colorado Springs 38696
New Delhi 38620
London 38561
Brampton 38446
Madison 37813
Paris 37641
Saint Paul 37412
Cape Town 37001
Fortaleza 36922
Scarborough 35952
Albuquerque 35802
תל אביב יפו 35602

Yes, we have a copy of it. No, we’re not selling it. However, we’ll be putting our data analytics and data quality glasses on to see what lies within.

Several Australian cities featured prominently on the list of AM users. Singapore, which had banned the website, was highly represented.

There is nothing special about Ashley Madison’s leak except that the brand attracts a lot of negative emotions. They probably stepped on the toes of a capable geek. The reality is that nothing is safe on the internet, and transparency is your only defence.

Should Ashley Madison have done more to protect their data? Yes! The simple next step of basic data encryption should have been taken. But it wasn’t.

Bye bye Ashley Madison. You’re not the first, and I am sure you won’t be the last. There is no defence against cheating spouses except character, honesty, truth and love.

 

Bimodal IT refers to having two modes of IT, each designed to develop and deliver information- and technology-intensive services in its own way. Mode 1 is traditional, emphasising scalability, efficiency, safety and accuracy. Mode 2 is non-sequential, emphasising agility and speed.

After the fads of Agile Management, PRINCE2, ITIL, P3M3, Waterfall, etc., which all remain relevant to project execution, today’s enterprise recognises that different parts of the organisation operate under different parameters.

Agile remains highly suitable for smaller teams with high performance goals. The Agile Director is an expert in agile project management.

 

Microsoft announced a new browser for Windows 10 in Jan 2015. Code-named Project Spartan, it is designed to work on all devices across the Microsoft family: laptops, desktops, phones, tablets, etc. Internet Explorer will still be distributed to support legacy enterprise applications.


The tech portal Gigaom, which has been covering tech news since 2006, has closed, citing “financial difficulties”. It has literally run out of money, and its financiers will now own the assets of the portal.

Bye Gigaom!