Archive for April 2011 | Monthly archive page

Born in 1781, <a href="http://en.wikipedia.org/wiki/Charles_Joseph_Minard" target="_blank">Charles Joseph Minard</a> is noted for his “inventions” in information visualisation. Some of his visualisations include: <ul> <li style="text-align: left;">The progress of Napoleon’s Army vs Distance vs Temperature in the Russian Campaign of 1812</li> <li style="text-align: left;">The Origin of Cattle destined for Paris</li> </ul> Minard was trained as a civil engineer. <a href="http://cartographia.wordpress.com/" target="_blank">Cartographia</a> has a good list of Minard’s work.

One of the biggest problems in delivering value in a business intelligence project is providing insight around a dataset. Delivering insight about a particular dataset is not just about successfully processing and analysing the data in question. In today’s business intelligence (BI) world, the expectations are a lot higher. Valuable insight is derived from correlating a particular dataset with another perspective or dataset, sometimes a very different and abstract one.

An Example

You have a dataset on radiation levels (thanks to fallout from nuclear power stations). A quick and common question that demands immediate answers would be “What is the impact of increased radiation?”. That is a very broad question, and even after skilfully narrowing its scope, it still needs to be answered from several perspectives. The basic ones that remain may be:

Effect on population?
Effect within a radius of 100km?
Effect on transportation within 100km?
Effect on travel?
Effect on tourism?
Effect on agriculture?

All these questions require the custodians of related datasets to make their data available. Negotiating access to the data would probably take time, and is followed by data modelling, loading and analysis. The final outcomes would still be achieved, but under the strain of time and effort.

We can reduce some of this time by having open data and configured data. Consider plug-and-play data: being able to draw data from established datasets with minimal processing, and to derive results quickly. This is where Glitchdata would advocate data by convention.
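As a minimal sketch of what data by convention could look like, assume every dataset identifies place and time with the same field names. The names `region_code` and `date`, the helper `join_by_convention`, and the sample records are all hypothetical and exist only for this illustration.

```python
# Hypothetical convention: every dataset identifies place and time with
# these field names, so no per-dataset mapping is needed before joining.
CONVENTION_KEYS = ("region_code", "date")


def join_by_convention(left, right, keys=CONVENTION_KEYS):
    """Inner-join two lists of dict records on the shared convention keys."""
    # Index the right-hand dataset by its convention key values.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    # Merge every left-hand row with each right-hand row sharing its key.
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**match, **row})
    return joined


# Illustrative records only; the values are made up for the example.
radiation = [
    {"region_code": "R1", "date": "2011-04-01", "microsieverts_per_hour": 0.8},
    {"region_code": "R2", "date": "2011-04-01", "microsieverts_per_hour": 0.1},
]
tourism = [
    {"region_code": "R1", "date": "2011-04-01", "visitors": 120},
    {"region_code": "R3", "date": "2011-04-01", "visitors": 900},
]

combined = join_by_convention(radiation, tourism)
print(combined)
```

Because both datasets follow the same convention, the join needs no negotiation over column names or formats; only the rows that share a region and date are combined.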

 

 

The OSI Model has been around for several decades now. It remains especially relevant when extending the concepts of n-tiered application design. The application layer of the OSI model can be expanded into:

The App Presentation Layer
The App Web Services Layer
The App Business Logic Layer
The App Database Layer
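One way to picture these four layers is as objects that each talk only to the layer directly beneath them. The sketch below is illustrative only: the class names, the `handle` request format, and the in-memory dictionary standing in for a database are assumptions, not part of any particular framework.

```python
class DatabaseLayer:
    """Storage and retrieval only; an in-memory dict stands in for a database."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)


class BusinessLogicLayer:
    """Validation and rules live here, not in the database."""

    def __init__(self, db):
        self._db = db

    def record_reading(self, station, level):
        if level < 0:
            raise ValueError("level must be non-negative")
        self._db.put(station, level)

    def reading_for(self, station):
        return self._db.get(station)


class WebServicesLayer:
    """Exposes business operations as request/response dictionaries."""

    def __init__(self, logic):
        self._logic = logic

    def handle(self, request):
        if request["action"] == "record":
            self._logic.record_reading(request["station"], request["level"])
            return {"status": "ok"}
        if request["action"] == "read":
            level = self._logic.reading_for(request["station"])
            return {"status": "ok", "level": level}
        return {"status": "error", "reason": "unknown action"}


class PresentationLayer:
    """Formats service responses for display."""

    def __init__(self, services):
        self._services = services

    def show(self, station):
        response = self._services.handle({"action": "read", "station": station})
        return f"{station}: {response['level']}"


services = WebServicesLayer(BusinessLogicLayer(DatabaseLayer()))
services.handle({"action": "record", "station": "station-a", "level": 0.8})
line = PresentationLayer(services).show("station-a")
print(line)
```

Note that the database layer here does nothing but store and retrieve; the validation sits in the business logic layer.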

As database systems have evolved rapidly over the last decade, we see them providing features like foreign key enforcement, indexing, views, triggers, data transformation, full-text indexing, spatial capabilities, and more.

The problem here is that databases start getting bloated, and they no longer focus on the key value that they provide: data storage and retrieval.

So it stands to reason that Amazon Web Services has offered SimpleDB as its key database offering for cloud services. Of course, it also offers relational database services.

So why does Amazon prefer SimpleDB? Scalability, and a lower cost per GB of data stored.

 

 

Data Warehousing (DW) is a common term used in business intelligence (BI) projects and systems. The data warehouse has traditionally been the overhead: a large storeroom which aggregates and stages data from multiple sources at a single point. Analytics can then be conducted on this store to provide valuable insights for management.

Now, the problem with the data warehouse is that it is huge and expensive. The processes to populate it consume large computing resources, and the outcomes after a lengthy project might be inaccurate or off-focus.

Within modern applications and data analytics, we should consider analytics as part of an application’s design, performing smaller analytics projects on smaller datasets before engaging in larger ones. We should also consider incremental processing of data, actively managing data state in much the same way that we manage application state.

This fits well with the Agile methodology.
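The incremental processing described above can be sketched as a small piece of state that remembers how far it has read, so that each run only touches new records. This is a minimal illustration under stated assumptions: `RunningStats` and `process_increment` are hypothetical names, the readings are made-up values, and a real system would persist the state between runs.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Data state carried between runs: how far we have read and what we have seen."""

    last_offset: int = 0  # number of records already processed
    count: int = 0
    total: float = 0.0

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


def process_increment(state, records):
    """Fold only the records beyond state.last_offset into the running totals."""
    for value in records[state.last_offset:]:
        state.count += 1
        state.total += value
    state.last_offset = len(records)
    return state


# First run sees two readings; the second run processes only the two new ones.
readings = [2.0, 4.0]
state = process_increment(RunningStats(), readings)
readings.extend([6.0, 8.0])
state = process_increment(state, readings)
print(state.count, state.mean)
```

The aggregate stays current without reloading the full dataset each time, which is the opposite of a periodic bulk load into a warehouse.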

So, just like the abandoned warehouses along the rivers and docks of modern cities, data warehouses will be abandoned in favour of JIT Analytics, Agile BI, and better application designs.