Archive for the ‘Infrastructure’ Category

Bitcoin is a crypto-currency built on the blockchain. Like a conventional currency, Bitcoin is hailed as a revolutionary technology for storing value. Its popularity stems from distrust of real-world currencies and the unregulated printing of currencies like the US Dollar.

Bitcoin is governed by its developers and its infrastructure. The miners who process transactions within the Bitcoin ecosystem operate under a shared consensus that regulates and evolves the Bitcoin economy. Bitcoin has governed parameters, including a cap on total supply, the block size, and others.

Today, the Bitcoin community seeks to evolve Bitcoin through agreed changes. Segwit2x, also known as the New York Agreement, is an industry-wide compromise that Digital Currency Group founder and CEO Barry Silbert spearheaded in May to activate the Segregated Witness (Segwit) scaling upgrade for Bitcoin. Key mining pools and exchanges that agreed to the plan include Bitmain’s Antpool, BTC.top, Bixin, BTCC Pool, F2Pool, Huobi, OKCoin, ViaBTC, BW, 1Hash, Canoe, Batpool, and Bitkan.

The event and announcement closely follow Bitmain’s release of its hard-fork protection plan against UASF BIP148, which CEO Jihan Wu has described as an attack on Bitcoin. Speaking at the summit on June 14, he outlined BIP148’s weaknesses and how to prevent it from activating.

China hosts roughly 80% of Bitcoin’s mining infrastructure and therefore plays a dominant role in Bitcoin’s future.

Read more here.

In 2007, a team of Google engineers needed more accurate time for their servers. Accurate time is essential for synchronising data, especially transactional data. Technologies like Cassandra depend on closely synchronised clocks across database servers to reconstruct the order of events. The end goal is to be certain about the “State-of-Data”.

NTP (Network Time Protocol) is what Unix servers and most internet-connected machines use to synchronise time. Due to network and processing delays, a computer’s clock can easily drift out of sync with its peers. For most uses that margin of error is minor, but for a large computing company like Google, keeping thousands of systems accurate was essential to building “Spanner“.

Spanner is the globally distributed database Google built on top of this work; its TrueTime clock API keeps time using GPS receivers and atomic clocks in each datacenter. Fortunately the distances we have to cover are at most the span of the Earth. The folks at NASA and other space-faring agencies will have to consider time differences spanning far larger stretches of space.

This is all in the effort to maintain the “State-of-Data”.
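The core idea behind TrueTime can be sketched in a few lines: instead of a single timestamp, the clock returns an interval [earliest, latest] guaranteed to contain the true time, and two events can be safely ordered only when their intervals do not overlap. A minimal illustration in Python (the class and function names here are my own, not Google’s API):

```python
from dataclasses import dataclass

@dataclass
class TTInterval:
    """A TrueTime-style timestamp: the true time lies in [earliest, latest]."""
    earliest: float
    latest: float

    def definitely_before(self, other: "TTInterval") -> bool:
        # Safe ordering: this event happened first only if its latest
        # possible time is still earlier than the other's earliest.
        return self.latest < other.earliest

def now(wall_clock: float, uncertainty: float) -> TTInterval:
    """Model a clock read with a known error bound (e.g. a few ms)."""
    return TTInterval(wall_clock - uncertainty, wall_clock + uncertainty)

# Two commits 10 ms apart with 3 ms uncertainty can be ordered...
a = now(100.000, 0.003)
b = now(100.010, 0.003)
print(a.definitely_before(b))  # True

# ...but 2 ms apart they cannot: Spanner waits out the uncertainty first.
c = now(100.002, 0.003)
print(a.definitely_before(c))  # False
```

When intervals overlap, Spanner simply waits until the uncertainty has passed before committing, which is why tight clock bounds matter so much.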


Talend, a 10-year-old software company that specialises in open-source data management tools with a subscription-based premium model, has raised about $94.5 million in an initial public offering (IPO). The lead underwriters include Goldman Sachs, J.P. Morgan, Barclays, and Citigroup.

The company said it issued 5.25 million American Depositary Shares at a price of $18 per share, above the $15-$17 range it originally declared, thus raising $94.5 million.

In 2014, Talend CEO Mike Tuchen said that the company could go public “sometime in the next couple of years.” The company trades on NASDAQ under the symbol TLND.

Talend’s competitive edge lies in its data integration products, which cost a fraction of the price of comparable tools sold by Informatica, Tibco, and enterprise software vendors like IBM, Microsoft, Oracle, and SAP. Talend customers include AOL, Citi, GE Healthcare, Groupon, Lenovo, Orange, Sky, and Sony.

Last year, Talend generated total revenue of $76 million. Its subscription revenue grew 39 percent year over year, representing $62.7 million of the total. The company isn’t profitable: it reported a net loss of $22 million for 2015. In the first quarter of this year, Talend posted a $5.2 million loss on $22.7 million in revenue, up 33.5 percent year over year. For that quarter, 84 percent of revenue derived from subscriptions; the rest came from professional services.

Talend started in 2005 and is headquartered in Redwood City, California. The company had 566 employees as of March 31. Investors include Bpifrance, Iris Capital, Silver Lake Sumeru, Balderton Capital, and Idinvest Partners.

The company offers cloud and on-premises versions of its software, which supports the open-source Hadoop big data framework and builds on open-source projects such as Apache Camel.

Estonia is a small country bordering Russia and Latvia, with Finland just across the Gulf of Finland. It boasts an advanced information-management platform for government.

This platform is X-Road, an invisible but crucial backbone for data transactions between the various e-service databases in the public and private sectors. X-Road makes seamless interoperability possible.

Estonia’s data stores are decentralised, meaning:

- There is no single owner / controller
- Every government agency or business can choose the right products suitable for them
- Services are added one at a time, as they are ready

All Estonian services that use multiple data stores use X-Road as a central connection between these data stores. All outgoing data from X-Road is digitally signed and encrypted. All incoming data is authenticated and logged.
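That sign-then-verify-and-log pattern can be sketched in a few lines of Python. For brevity this uses a shared-secret MAC from the standard library rather than the public-key certificate signatures X-Road actually uses, and the field names are invented for illustration:

```python
import hashlib
import hmac
import json

SECRET = b"demo-shared-key"  # stand-in; X-Road uses PKI certificates

def sign(message: dict) -> dict:
    """Wrap an outgoing payload with an authentication tag."""
    payload = json.dumps(message, sort_keys=True).encode()
    tag = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return {"payload": payload.decode(), "tag": tag}

def verify_and_log(envelope: dict, log: list) -> dict:
    """Authenticate an incoming envelope, log it, and return the payload."""
    expected = hmac.new(SECRET, envelope["payload"].encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, envelope["tag"]):
        raise ValueError("message failed authentication")
    log.append(envelope["tag"])  # every accepted message is logged
    return json.loads(envelope["payload"])

audit_log = []
env = sign({"query": "driving_licence_status", "person": "EE123"})
print(verify_and_log(env, audit_log))
```

Any tampering with the payload in transit changes the expected tag, so the receiving side rejects the message before it ever reaches a data store.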

X-Road was built to facilitate queries across multiple data stores, but it has evolved to also support writes across multiple data stores and the transfer of large datasets. It was designed for growth and currently supports:

- 287 million queries (2013)
- Connections to 170 databases in Estonia
- 2,000 services in Estonia
- 900 connected organisations daily
- The >50% of Estonians who use the government portal Eesti.ee

Services provided via X-Road include:

- Electronic registration of residency
- Updating personal data (like address, exam results, health insurance, etc.)
- Declaring taxes electronically
- Checking driving licence validity
- Checking for registered vehicles
- Registering newborn children for health insurance

Estonia showcases its e-society here. To transform its society into a community of digital governance and tech-savvy individuals, Estonia teaches children as young as 7 the principles and basics of coding.

Estonians are driven, forward-thinking and entrepreneurial, and the same goes for the government. It takes only five minutes to register a company there and, according to The Economist, the country in 2013 held the world record for the number of startups per person. And it’s not quantity over quality: Many Estonian startups are now successful companies that you may recognize, such as Skype, Transferwise, Pipedrive, Cloutex, Click & Grow, GrabCAD, Erply, Fortumo, Lingvist and others.

If all this sounds enticing and you wish to become an entrepreneur there, you’re in luck; starting a business in Estonia is easy, and you can do it without packing your bags, thanks to its e-residency service, a transnational digital identity available to anyone. An e-resident can not only establish a company in Estonia through the Internet, but they can also have access to other online services that have been available to Estonians for over a decade. This includes e-banking and remote money transfers, declaring Estonian taxes online, digitally signing and verifying contracts and documents, and much more.

E-residents are issued a smart ID card, a legal equivalent to handwritten signatures and face-to-face identification in Estonia and worldwide. The cards themselves are protected by 2048-bit encryption, and the signature/ID functionality is provided by two security certificates stored on the card’s microchip.

But great innovations don’t stop there. Blockchain, the principle behind bitcoin that also secures the integrity of e-residency data, will be used to provide unparalleled safety to 1 million Estonian health records. The blockchain will be used to register any and all changes, illicit or otherwise, done to the health records, protecting their authenticity and effectively eliminating any abuse of the data therein.

There are many lessons we can learn from Estonia. To increase the efficiency and maturity of its services, a country needs to be willing to adapt and evolve its infrastructure to the needs of the new economy. These include transparency and the precise, equitable delivery of services to the community.

Congratulations to Pratap Ranade and Ryan Rowe: the web-scraping-as-a-service company they co-founded, Kimonolabs, has been acquired by Palantir.

Kimonolabs started in Y Combinator’s Winter 2014 class and raised USD5M that same year, but that didn’t delay the decision to shutter the company for jobs at Palantir. Pratap explained that the startup had not achieved the impact it wanted within two years of launch. So Kimonolabs falls to the wayside where many other web-scraping tools have gone, leaving its 125K users in the lurch.

Users have been given two weeks’ notice to migrate data and services off the platform. The last day is 29 Feb 2016; the absolute last day for API services is 31 March 2016. After that, your data will be purged, and Palantir will not have access to it. If you depend on this service, you are probably scrambling for alternatives at this point. When you assess the risk of adopting a technology like Kimonolabs, be sure to consider the financial and resource stability of the company behind it.

Here is a list of alternative web-scraping tools and technologies. We also recommend established SaaS ETL services as viable alternatives.
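For simple cases you may not need a hosted service at all: Python’s standard library can already extract structured data from HTML. A minimal sketch, where the page structure and class names are invented for illustration:

```python
from html.parser import HTMLParser

class PriceScraper(HTMLParser):
    """Collect the text of every element whose class is 'price'."""
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the element.
        if ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())

# In practice the HTML would come from urllib.request.urlopen(url).read()
html = '<ul><li class="price">$9.99</li><li class="price">$19.99</li></ul>'
scraper = PriceScraper()
scraper.feed(html)
print(scraper.prices)  # ['$9.99', '$19.99']
```

Hosted services add scheduling, proxies and change detection on top of this, which is the part that is genuinely hard to self-host.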

 

GoodData has been on the data scene since 2007. Founded by Roman Stanek, the former CEO of NetBeans and Systinet, the company seems to be in good hands: Stanek sold NetBeans to Sun Microsystems in 1999 and Systinet to HP in 2006.

GoodData has raised USD53.5M in venture funding from the likes of Andreessen Horowitz, Tim O’Reilly, AlphaTech Ventures, General Catalyst Partners, Windcrest Partners, Intel Capital and TOTVS. It employs 291 staff across 5 offices in Prague, Brno, San Francisco, Portland, and Boston.

GoodData has a joint venture with Chris Gartlan to grow its APAC presence. Based in Melbourne, Australia, GoodData APAC has a team of 10 staff focused on growing the business.

So what is the GoodData value proposition? Simply put, a fully managed cloud-based business intelligence platform. GoodData handles the chain end-to-end, taking on the capital costs of building data warehouses and data marts while providing speed and agility in delivering results.

These results are actionable insights that, under traditional data integration, would cost anywhere from 7x to 15x as much. So whether you run lean on OPEX or CAPEX, the solution can be tailored to your requirements.

Agility comes from the managed solution: business units can now independently build data marts and visualise data. This is where cloud-based BI performs.

So what are GoodData’s strengths, given such a broad focus across a very long data chain? Customer focus seems to be the key. Even with a fully out-of-the-box solution, GoodData is agile enough to custom-fit the various parts of the data chain, from data integration and data storage systems to the visualisation components.

 

Outsourced cloud-based BI is the new spin on the disk.


Google has announced the open-source release of TensorFlow, its second-generation machine learning system, which builds on work done in the DistBelief project. TensorFlow is general, flexible, portable, easy to use, and completely open source, and Google says it is twice as fast as DistBelief.

To understand what is possible: Google’s internal deep learning infrastructure, DistBelief, developed in 2011, has allowed Googlers to build ever larger neural networks and scale training to thousands of cores in Google’s datacenters. It has been used to demonstrate that concepts like “cat” can be learned from unlabeled YouTube images, to improve speech recognition in the Google app by 25%, and to build image search in Google Photos. DistBelief also trained the Inception model that won the ImageNet Large Scale Visual Recognition Challenge in 2014, and drove Google’s experiments in automated image captioning as well as DeepDream.

While DistBelief was very successful, it had some limitations. It was narrowly targeted to neural networks, it was difficult to configure, and it was tightly coupled to Google’s internal infrastructure — making it nearly impossible to share research code externally.

TensorFlow ships with Python bindings, and Python underpins a lot of Google infrastructure. You can download the package and use it within your own Python applications. Get started today!
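TensorFlow’s central idea is the dataflow graph: you first describe a computation as a graph of operations, then execute it. The toy classes below are my own plain-Python illustration of that idea, not TensorFlow’s actual API:

```python
class Node:
    """One operation in a dataflow graph; evaluated lazily."""
    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def run(self):
        # Evaluate inputs first, then apply this node's operation.
        return self.op(*(n.run() for n in self.inputs))

def constant(value):
    return Node(lambda: value)

def add(a, b):
    return Node(lambda x, y: x + y, a, b)

def mul(a, b):
    return Node(lambda x, y: x * y, a, b)

# Build the graph for (2 + 3) * 4 -- nothing is computed yet...
graph = mul(add(constant(2), constant(3)), constant(4))

# ...until we explicitly run it, much as TensorFlow executes a session.
print(graph.run())  # 20
```

Separating graph construction from execution is what lets a system like TensorFlow optimise the graph and distribute it across CPUs, GPUs, or machines before running it.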

Read more

Talend has started leveraging Apache Spark as part of its big data integration platform. Spark’s speedy in-memory execution accelerates data ingestion; migrating to Apache Spark can provide performance improvements of 5 to 100 times.

Talend promises to make the migration literally as simple as the push of a button: a new refactoring option can automatically convert data pipelines written for MapReduce, the previous leader in high-performance data integration, to Spark. In theory this requires no changes to the high-level workflows a user has defined for a cluster.

New projects also benefit from the upgrade, which brings some 100 pre-implemented data ingestion and integration functions that make it possible to pull data into Spark without having to do any programming. According to Talend, the result is an up to tenfold improvement in developer productivity.

There are a number of new Talend features, the biggest addition being “masking”, also commonly known as tokenisation. This lets an organisation replace a sensitive value with a structurally similar placeholder that doesn’t reveal any specific details. That’s useful in scenarios where, say, a hospital analyst who doesn’t have permission to view patient treatment history wants to check how many medical records there are in a given dataset coming into Spark.
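The idea behind masking can be sketched in a few lines of Python. The field names and token format below are invented for illustration, not Talend’s implementation:

```python
import hashlib

SENSITIVE_FIELDS = {"patient_name", "treatment_history"}  # hypothetical schema

def mask_record(record: dict) -> dict:
    """Replace sensitive values with structurally similar placeholders."""
    masked = {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS:
            # A deterministic token: the same input always yields the same
            # token, so counts, joins and distinct() still work, but the
            # underlying value is never revealed.
            token = hashlib.sha256(str(value).encode()).hexdigest()[:12]
            masked[field] = f"tok_{token}"
        else:
            masked[field] = value
    return masked

records = [
    {"patient_name": "Jane Doe", "ward": "B2", "treatment_history": "..."},
    {"patient_name": "John Roe", "ward": "B2", "treatment_history": "..."},
]
masked = [mask_record(r) for r in records]
print(len(masked))        # the analyst can still count records: 2
print(masked[0]["ward"])  # non-sensitive fields pass through: B2
```

Because the placeholders preserve structure, downstream Spark jobs can aggregate and join the data without ever touching the protected values.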

 

You’ve heard of bitcoin. It’s built using the blockchain, a distributed database that keeps track of every bitcoin transaction. Bitcoin’s popularity may have diminished, but the importance of the blockchain as a truly revolutionary technology remains.

The Internet’s key design objective was to survive a nuclear attack. That need for resilience is what led to its distributed design. But certain core features, like the IP address space and the Domain Name System (DNS), were never decentralised, because the solutions had yet to be defined.

Even early peer-to-peer products (like Skype and Napster) relied on managed super-nodes or centralised directories.

The blockchain provides a method of distributing trust and facilitating decentralised systems.
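At its core, that trust comes from a simple data structure: each block carries the hash of the previous block, so altering any past transaction invalidates everything after it. A minimal sketch in Python:

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's canonical JSON form."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def add_block(chain: list, transactions: list) -> None:
    """Append a block linked to the previous block by its hash."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "transactions": transactions})

def is_valid(chain: list) -> bool:
    """Re-derive every link; any tampering breaks the chain."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

chain = []
add_block(chain, ["alice pays bob 5"])
add_block(chain, ["bob pays carol 2"])
print(is_valid(chain))  # True

chain[0]["transactions"][0] = "alice pays bob 500"  # tamper with history
print(is_valid(chain))  # False
```

Real blockchains add proof-of-work and peer-to-peer replication on top of this, which is what removes the need for any central authority.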

This excellent IEEE Spectrum essay explores how the blockchain could work its way into core internet services, improving everything from payment transactions to online communities, video games, spam prevention and more.

Linode opens its newest datacenter in Singapore! This is Linode’s seventh datacenter, and is purpose-built to serve the already huge and growing markets in Southeast Asia, India, Australia, and surrounding regions.

Linode’s Singapore network is powered by Cisco ASR 9000-series routers, and currently blends connectivity from Telstra/Pacnet and PCCW, along with direct peering into the Equinix Internet Exchange (EIE) – providing you with access to hundreds of peering opportunities. Check out our Speedtest page to test latency and download speeds.

Singapore supports all of the standard Linode features available in all of our datacenters – like 40 Gbps redundant connectivity to each hypervisor host machine, the Linode Backup service, NodeBalancers, native IPv6, etc – at the same simple pricing as other Linode datacenters.