Data Convention over Configuration

Apr 11, 2011


In:Data, Storage

One of the biggest challenges in delivering value from a business intelligence (BI) project is providing insight around a dataset. Delivering that insight is not just about successfully processing and analysing the data in question. In today's BI world, expectations are a lot higher. Valuable insight comes from correlating a particular dataset with other datasets or perspectives, some of which may be very different or more abstract.

An Example

Suppose you have a dataset on radiation levels (thanks to fallout from nuclear power stations). A common question that demands an immediate answer would be “What is the impact of increased radiation?”. That is a very broad question, and even after skillfully narrowing its scope, it still needs to be answered. The key perspectives that remain may include:

  • Effect on population?
  • Effect within a radius of 100km?
  • Effect on transportation within 100km?
  • Effect on travel?
  • Effect on tourism?
  • Effect on agriculture?

All of these questions require the custodians of the correlated datasets to make their data available. Negotiating access to that data would probably take time, and would be followed by data modeling, loading and analysis. The final outcomes would still be achieved, but under the strain of time and effort.

We can reduce some of this time by having open data and configured data. Consider plug-and-play data: being able to draw on established datasets with minimal processing, and to derive results quickly. This is where Glitchdata would advocate data by convention.
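
As a rough illustration of the plug-and-play idea, here is a minimal Python sketch. It assumes a hypothetical convention in which every dataset identifies its rows by a shared "region_code" field, so two datasets can be combined without any per-dataset mapping. The field names, the join_by_convention helper and all of the values are made up for illustration; none of this is an actual Glitchdata API or real data.

```python
# Hypothetical convention: every dataset keys its rows by "region_code".
# Because the key name is agreed in advance, no mapping configuration
# is needed to combine datasets from different custodians.
CONVENTION_KEY = "region_code"


def join_by_convention(*datasets):
    """Merge rows from several datasets on the conventional key."""
    merged = {}
    for dataset in datasets:
        for row in dataset:
            key = row[CONVENTION_KEY]
            # Start a combined row for this key if needed, then add fields.
            merged.setdefault(key, {CONVENTION_KEY: key}).update(row)
    # Return the combined rows ordered by key for predictable output.
    return [merged[key] for key in sorted(merged)]


# Illustrative, made-up values only.
radiation = [
    {"region_code": "R1", "radiation_level": 0.30},
    {"region_code": "R2", "radiation_level": 0.05},
]
population = [
    {"region_code": "R1", "population": 120000},
    {"region_code": "R2", "population": 45000},
]

combined = join_by_convention(radiation, population)
```

Each row in combined now carries both the radiation level and the population for its region. A further dataset that follows the same convention, say on agriculture or tourism, could be passed as one more argument without changing the function.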