Clouds are gaining in popularity. The demand for data, analytics, and forecasting has grown significantly, and the future might belong to those who are able to predict it. However, predicting the future requires computing power, and lots of it. Cloud providers, hosting companies, startups, and big technology companies are all looking to provide it.

So what exactly is the cloud, and why will it provide the computing capabilities that have been dominated by super-computers over the last two decades? And why will cloud computing succeed where grid computing failed?

On May 17, 1999, SETI@Home was released, and it gave the public a glimpse of how inter-connected computers could be leveraged to perform very large tasks. Grid technology was encapsulated in products like SunGrid and xGrid, but largely failed to gain traction. The internet was only starting to go mainstream, and computers were still expensive items.

A decade and a bit later, in 2012, cloud computing is making headlines, and it seems that it may succeed where grid computing failed. So what has changed since 1999?

- Computers are cheaper
- The Internet is much faster
- VMware and virtualisation are making inroads into organisations
- Hosting and infrastructure companies are virtualising
- Accessing virtual services like email, social media, and SaaS is commonplace
- Increased awareness of online computing via Amazon Web Services, SalesCloud, and Azure

So will super-computing be replaced? Will there be reduced demand for running parallel jobs on multiple computing nodes? No. There is significantly increased demand for running compute-intensive and parallel jobs. However, the way a super-computer is implemented will change. Instead of proprietary platforms, super-computers will evolve towards open platforms and be built on the cloud. The proprietary bits of super-computing will be the charging mechanisms for the utility.

Will grid computing be replaced? Grid computing will fade away. Grid computing addresses the same type of distributed super-computing that cloud computing would replace. The traditional super-computer might still serve a purpose for tightly-coupled applications which are difficult to distribute to the cloud or grid.
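To make the distinction between distributable and tightly-coupled work concrete, here is a minimal Python sketch of a loosely coupled job. It assumes a task (counting primes) whose chunks need no communication with each other. The function names are hypothetical, and threads on one machine merely stand in for separate cloud or grid nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def count_primes(bounds):
    """Count primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        # n is prime if no d in [2, sqrt(n)] divides it
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def split_range(lo, hi, parts):
    """Split [lo, hi) into up to `parts` contiguous chunks, one per worker."""
    step = max(1, (hi - lo + parts - 1) // parts)
    return [(start, min(start + step, hi)) for start in range(lo, hi, step)]


def run_distributed(lo, hi, workers=4):
    """Farm the chunks out to workers and combine the partial results."""
    chunks = split_range(lo, hi, workers)
    # Assumption: each worker here is a thread; on a real cloud or grid
    # each chunk would be shipped to a separate node instead.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, chunks))
```

Because each chunk is independent, the only coordination is the final sum. A tightly-coupled application, where workers must exchange data at every step, would not split this cleanly.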

Consumers are not interested in a technology, but rather in what they can do with it. In cloud computing, this becomes more apparent with products like:

- Database processing
- Running an algorithm
- Getting an answer

 

 

OSDC11 was launched today in Canberra. With a small team of volunteers, the conference has managed to pull together some 250 participants, sponsors, and talented speakers for the 8th year running.

See photos here.

Gallery captions: OSDC Banner, DoD Intelligence & Security, Palantir Folks, Youngest Geek, Geek Grin, Ubuntu Swag, Google Folks, Pascal

Recently, I’ve had the experience of purchasing and attempting to get a NAS device working on my network. The Seagate Black Armor was recommended. My experience with the device unfortunately has not been rewarding.

The Seagate Black Armor has been noted to support up to 4x3TB drives. That is a total of 12TB.

Upon opening the box, setting up the device, and inserting the 4x 3TB Seagate hard drives, the box ran smoothly and the lights lit up in green, with “Black Armor” turning up on the LCD screen. The tiny instruction booklet said that the drives should take 8-9 hours to be prepared before the words “Black Armor” turned up on the LCD. They turned up in less than 10 minutes.

Maybe this was because I had been playing around with the hard disks beforehand. I had attempted to get them working on a Red Hat workstation and used fdisk/parted to manipulate the drives’ partitions.

After querying the LCD screen for the IP address the NAS device had adopted, I opened a web browser to view and configure the NAS device. Access to the NAS device via the browser worked.

A simple login and TADA, the drives were viewable. However, the NAS device had not properly accessed the 4x3TB drives. They were just sitting there, and creating access volumes was a challenge.

Seagate’s advice was to reset the Black Armor device. This is done by pressing the reset pinhole at the back of the device. The NAS device lights would turn amber, and the device would finally reboot.

Git has emerged as a strong contender for SCM. Previously, the commonly used SCM tools were:

- CVS
- SVN
- Mercurial
- Bazaar
- Perforce
- SourceSafe

Git was another brain-child of Linus Torvalds, and was created to support Linux development. Previously, Linux development used BitKeeper.

The features that make Git exceptional are:

- Branching
- Local Repository
- Distributed SCM
- Fast
- Small Footprint
- Staging Area
- Workflow: caters for various types of development workflow
- Easy to Learn
- Lots of Tools

Here are some updates on VMware. Virtualisation, sometimes also called “The Cloud”, is the current fad in IT infrastructure.

1) Virtualisation is going to happen whether we like it or not

This was driven initially by under-utilized servers, but ease of management and configuration have taken over as the leading reasons for virtualisation. Currently only 30% of organisational server infrastructure is in the virtualised environment. If an organisation doesn’t reach 80% virtualised, it doesn’t get the efficiency benefits of virtual infrastructure, but ends up with large overheads managing both virtual and traditional infrastructure. VMware hopes to push this to 50% in 2011-12. The issues with adoption are confidence levels in application-infrastructure interoperability, and security. VMware has notoriously low security, and is itself a gateway to accessing the entire virtualised infrastructure. (Search Google for “vmware hack”)

2) Overheads

Virtualisation comes with overheads. If installing Vista or Windows 7 was not enough, virtualisation can help by adding 10-20% overhead to CPU usage. VMs also generate a lot more network traffic.
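As a rough illustration of what a 10-20% CPU overhead means for capacity planning, here is a small Python sketch. The 16-core host and the 100-core workload are hypothetical numbers chosen only for the arithmetic, and the model assumes the overhead applies uniformly across the host.

```python
import math


def usable_capacity(physical_cores, overhead):
    """Cores' worth of CPU left for guest workloads after hypervisor overhead."""
    if not 0 <= overhead < 1:
        raise ValueError("overhead must be a fraction in [0, 1)")
    return physical_cores * (1 - overhead)


def hosts_needed(required_cores, cores_per_host, overhead):
    """Whole hosts needed to supply `required_cores` of usable CPU."""
    per_host = usable_capacity(cores_per_host, overhead)
    return math.ceil(required_cores / per_host)


# Hypothetical example: a workload needing 100 cores on 16-core hosts.
for overhead in (0.0, 0.10, 0.20):
    print(overhead, hosts_needed(100, 16, overhead))
```

Under these assumptions, a workload that fits on 7 bare-metal hosts needs 8 hosts once the virtualisation overhead is accounted for.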

3) Configuration

VM configuration is going to be crucial, as “the server” is now spread over a VM, SAN storage, a network “bus”, and actual physical locations. So when we have slow VMs, it could be the result of a lot of different factors. Cloning a VM for failover/failback scenarios can also generate a lot of network traffic. So virtualisation increases network overheads.

4) The Virtual Desktop

VMware hopes to bring back thin-client computing with virtualised downloadable profiles from VM infrastructure. Personally, I think this is a shot in the dark, as the PC era is gone, and computing is already transitioning to a fragmented plethora of thin clients (e.g. mobile devices, iPads, netbooks) with profiles stored in SaaS applications. The benefit of centralized profiles is supposedly data security. However, with SaaS and fragmenting applications and platforms, I doubt the virtual desktop will make it to the enterprise before iDesktops.

5) Capability

VMware ESX 5 now supports up to 32 cores and up to 1TB RAM per VM. These are the “Big VMs” (or “Monster VMs” if you are a VMware sales person) that VMware has now released. This may support the more computationally intensive applications, but only if the virtual infrastructure has been upgraded.

6) Visibility

From an application development point of view, the performance and capability of an application in the virtual infrastructure are less transparent, because the cause of a performance issue is harder to see. (E.g. is it a network or disk bottleneck, or over-utilisation of the CPU?) CPU utilisation reported within a Windows/Unix VM is not an accurate reflection of the actual processing capability available to your application. So VM infrastructure performance statistics need to be actively shared (in real time) with application teams. Using SPEC CPU benchmarking tools is another way to measure application-infrastructure performance. However, let’s hope for an open environment with open information sharing.

7) Super-Computing / Grid Computing

Although there has not been any noted implementation of supercomputing on VM infrastructure, there is no reason why this is not possible. Grid computing, and maybe some aspects of super-computing, are probably possible on VM infrastructure with the appropriate HPC software in place.

8) The Carbon Footprint

The Carbon Footprint is now the new driver for VM infrastructure. Running un-optimised / under-utilised servers kills the environment. If electricity prices go up by 30% in the next 2-5 years, what will organisations have to do to mitigate that?

Orange is a component-based machine learning library for Python developed at the Laboratory of Artificial Intelligence, Faculty of Computer and Information Science, University of Ljubljana, Slovenia.

We can compare Orange to the Trident Platform from Microsoft. The only difference is that it’s open source and works better.

Orange is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Orange is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

The Agile Director <a href="http://theagiledirector.com/content/4-things-twitter-can-give-business-intelligence" target="_blank">recently commented</a> on using social media feeds as a form of data to give organisations insight through Business Intelligence initiatives built on social media. This is very true. If companies realise that their businesses are built on their customers, all their internal systems should align accordingly. This is applicable to retail, property, media, communications, telcos, etc., and the end results are forward-thinking, pro-active, customer-centric organisations. <div>

The Data Chasm represents the gap between those who realise this paradigm and those who don’t. It’s as fundamental as the <a href="http://www.catb.org/~esr/writings/homesteading/" target="_blank">manifesto </a>of “<a href="http://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar" target="_blank">The Cathedral and the Bazaar</a>”.

Data: a large portion of the corporate future will be driven by those who have it, and those who don’t. Then it’s driven by those who know what to do with it, and those who don’t.

The gap between the haves and the have-nots is growing, and even governments and corporations fall among the have-nots.

Open data is the way forward to close the chasm. Supplying data alone is only the first step. As in economics, banking, media, supply chain, and logistics, there are eco-systems of data analysts that churn out information. But yes, the common denominator across all these diverse industries is digital media. That is the key to bridging the data chasm.

</div> </div> </div>

Born in 1781, <a href="http://en.wikipedia.org/wiki/Charles_Joseph_Minard" target="_blank">Charles Joseph Minard</a> is noted for his “inventions” in information visualisation. Some of his visualisations include: <ul> <li style="text-align: left;">The progress of Napoleon’s Army vs Distance vs Temperature in the Russian Campaign of 1812</li> <li style="text-align: left;">The Origin of Cattle destined for Paris</li> </ul> Charles was trained as a civil engineer. <a href="http://cartographia.wordpress.com/" target="_blank">Cartographia</a> has a good list of Minard’s work.

One of the biggest problems of delivering value in a business intelligence project is providing insight around a dataset. Delivering insight about any particular dataset is not just about successfully processing the data in question and analysing it. In today’s business intelligence (BI) world, the expectations are a lot higher. Valuable insight is derived from correlating a particular dataset with what is sometimes a very different, abstract perspective or dataset.

An Example

You have a dataset on radiation levels (thanks to fallout from nuclear power stations). A very quick and common question that demands immediate answers would be “What is the impact of increased radiation?”. That is a very broad question, and even after skilful narrowing of its scope, it still needs to be answered from several angles. Even the most basic remaining perspectives on the question may be:

- Effect on population?
- Effect within a radius of 100km?
- Effect on transportation within 100km?
- Effect on travel?
- Effect on tourism?
- Effect on agriculture?

All these questions will require the custodians of correlated datasets to make their data available. The negotiations to acquire the data would probably take time, followed by the data modelling, loading and analysis. The final outcomes would still be achieved, but under the strain of time and effort.

We can reduce some of this time by having open data, and configured data. Consider plug and play data. Consider being able to draw data from established datasets with minimal processing, and be able to derive results quickly. This is where Glitchdata would advocate data by convention.
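One way to picture data by convention is sketched below in Python. It assumes, purely for illustration, that every published dataset carries the same two fields, a `region` code and an ISO 8601 `date`. The field names, the `join_by_convention` helper, and all the values are invented placeholders, not real measurements.

```python
# Hypothetical datasets from two different custodians. Both follow the same
# assumed convention: each record has a "region" code and an ISO 8601 "date".
# All values below are made-up placeholders.
radiation = [
    {"region": "R1", "date": "2011-03-15", "microsieverts_per_hour": 0.8},
    {"region": "R2", "date": "2011-03-15", "microsieverts_per_hour": 0.1},
]
tourism = [
    {"region": "R1", "date": "2011-03-15", "visitors": 120},
    {"region": "R2", "date": "2011-03-15", "visitors": 950},
]

CONVENTION_KEYS = ("region", "date")


def join_by_convention(left, right, keys=CONVENTION_KEYS):
    """Inner-join two lists of records on the shared convention keys."""
    index = {tuple(row[k] for k in keys): row for row in right}
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:
            joined.append({**row, **match})
    return joined


combined = join_by_convention(radiation, tourism)
```

Because both datasets already agree on the key fields, no per-dataset mapping or remodelling step is needed before they can be correlated, which is where the time saving would come from.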

 

 

The OSI Model has been around for several decades now. It remains especially relevant when extending the concepts of n-tiered application design. The application layer of the OSI model can be expanded into:

- The App Presentation Layer
- The App Web Services Layer
- The App Business Logic Layer
- The App Database Layer
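The four layers listed above can be sketched in Python as a stack where each layer talks only to the one directly beneath it. This is a minimal illustration under assumed names: the classes, the customer example, and the in-memory dictionary standing in for a database are all hypothetical.

```python
class DatabaseLayer:
    """Stores and retrieves records; knows nothing about business rules."""

    def __init__(self):
        self._rows = {}  # in-memory stand-in for a real database

    def save(self, key, record):
        self._rows[key] = dict(record)

    def load(self, key):
        return self._rows.get(key)


class BusinessLogicLayer:
    """Applies validation and rules before touching storage."""

    def __init__(self, db):
        self._db = db

    def register_customer(self, customer_id, name):
        if not name.strip():
            raise ValueError("name must not be empty")
        self._db.save(customer_id, {"id": customer_id, "name": name.strip()})

    def get_customer(self, customer_id):
        return self._db.load(customer_id)


class WebServicesLayer:
    """Translates request dictionaries into business logic calls."""

    def __init__(self, logic):
        self._logic = logic

    def handle(self, request):
        if request["action"] == "register":
            self._logic.register_customer(request["id"], request["name"])
            return {"status": "ok"}
        if request["action"] == "get":
            customer = self._logic.get_customer(request["id"])
            if customer is None:
                return {"status": "not_found"}
            return {"status": "ok", "customer": customer}
        return {"status": "unknown_action"}


class PresentationLayer:
    """Formats service responses for display."""

    def __init__(self, services):
        self._services = services

    def show_customer(self, customer_id):
        response = self._services.handle({"action": "get", "id": customer_id})
        if response["status"] != "ok":
            return "Customer not found"
        return "Customer {id}: {name}".format(**response["customer"])


def build_app():
    """Wire the layers together from the bottom up."""
    services = WebServicesLayer(BusinessLogicLayer(DatabaseLayer()))
    return services, PresentationLayer(services)
```

In this sketch the validation lives in the business logic layer and the database layer only stores and retrieves, so the storage engine could be swapped without touching the layers above it.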

As database systems have evolved rapidly over the last decade, we see them providing features like foreign key enforcement, indexing, views, triggers, data transformation, full-text indexing, spatial capabilities, and more.

The problem here is that databases start getting bloated, and they no longer focus on the key value that they provide: data storage and retrieval.

So it stands to reason that Amazon Web Services offers SimpleDB as its key database offering for cloud services. Of course, it also offers relational database services.

So why does Amazon prefer SimpleDB? Scalability, and a lower cost per GB of data stored.