Data Will Work If It Flows

How does data actually do work?

Data does work in a similar manner to water.

Water left alone can do very little.

Cleansed water can rejuvenate a tired body.

Channeled water can turn wheels to create energy for transforming raw materials into usable materials.

Transforming water into steam can turn mighty turbines to create energy for a myriad of purposes, depending on the enterprise's goals.



Clean drinking water...not self-evident for ev... (Image via Wikipedia)

Water must be cleansed in order to be consumed. Impurities from the environment can contaminate the water, making it unsuitable for a particular use.

Just as water needs to be cleansed before it can be used for a particular purpose, so too must data be cleansed to ensure that it is fit for its intended use.

Data Cleansing is the act of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting this dirty data.

For all data there is a standard of use. Is the data, as it was entered into the data capture system, useful for the function that we are trying to perform? For example, do we need to look up any codes or convert timestamps from one timezone to another? If the data meets this standard, it does not need cleansing. If it does not meet this standard, it does. The standard of use is either explicit, with rules around the processing of the data, or implicit, where the data is simply accepted as it was entered.
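As a rough illustration of an explicit standard of use, here is a minimal Python sketch. The field names (`status`, `captured_at`), the code table, and the UTC-5 source offset are all made up for the example; a real capture system would have its own rules. Records that fail the standard are removed, and the rest get their codes looked up and their timestamps converted to UTC.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lookup table and source offset, invented for illustration.
STATUS_CODES = {"A": "active", "I": "inactive"}
SOURCE_TZ = timezone(timedelta(hours=-5))  # assume the capture system records UTC-5 local time
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M"


def meets_standard(record):
    """Return True if the record satisfies the explicit standard of use."""
    if record.get("status") not in STATUS_CODES:
        return False
    try:
        datetime.strptime(record.get("captured_at", ""), TIMESTAMP_FORMAT)
    except (TypeError, ValueError):
        return False
    return True


def cleanse(records):
    """Drop records that fail the standard; normalize the ones that pass."""
    clean = []
    for record in records:
        if not meets_standard(record):
            continue  # remove the dirty record
        local = datetime.strptime(record["captured_at"], TIMESTAMP_FORMAT)
        local = local.replace(tzinfo=SOURCE_TZ)
        clean.append({
            "status": STATUS_CODES[record["status"]],                     # code lookup
            "captured_at": local.astimezone(timezone.utc).isoformat(),    # timezone conversion
        })
    return clean


raw = [
    {"status": "A", "captured_at": "2015-03-01 09:30"},
    {"status": "X", "captured_at": "2015-03-01 10:00"},  # unknown code
    {"status": "I", "captured_at": "not a timestamp"},   # corrupt timestamp
]
cleaned = cleanse(raw)
```

Only the first record survives here. Whether a failing record should be removed, as in this sketch, or corrected instead depends on the function the data has to serve.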


Schematic diagram of an overshot water wheel. (Image via Wikipedia)

When a large body of water is channeled through a small channel, the force of the volume of water moving through the channel moves obstacles out of the way.

When you take Big Data and mine through it to find a particular use case, you are channeling the data to focus on a particular problem.


Ice sculpture, Natural History Museum, London SW7. The lighting brings the ice sculpture to life. (Photo credit: Wikipedia)

When water is heated it becomes steam. When frozen it becomes ice.
Ice structures can be amazingly beautiful when created by the right artist.

Today's enterprises, whether they be great or small, are but ships on the great ocean of information. Each piece of data we encounter is as a drop in that ocean.

How do you make data work for you?

I originally wrote this article before the concept of a Data Lake became popular. I found the draft while cleaning out my drafts as part of my challenge.

The concept of a Data Lake, just like a man-made reservoir, is to store the data until such time as you need to use it. Proper Data Management best practices should be applied to the Data Lake. When you need to answer a particular business question, use some of the tools at your disposal, hopefully ones you have built, to filter the "water" in your data lake.

By the way, cleansing, transforming, and channeling is just another way of saying: apply a Lambda function to all of your data to get the answers you need.
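To make that concrete, here is a small Python sketch. It reads "Lambda function" as a small anonymous function, which is an assumption; the same idea carries over if you mean a cloud function service instead. The sample readings and the threshold of 10 are invented for the example.

```python
# Hypothetical raw readings as they might arrive from a capture system.
readings = [" 12.5 ", "n/a", "7", "", "30.25"]

# Cleanse: keep only values that look like non-negative decimal numbers.
cleansed = filter(lambda s: s.strip().replace(".", "", 1).isdigit(), readings)

# Transform: turn the surviving strings into floats.
transformed = map(lambda s: float(s.strip()), cleansed)

# Channel: focus on one question (assumed here: which readings are 10 or more?).
channeled = list(filter(lambda x: x >= 10, transformed))
```

Each stage is lazy, so nothing flows until `list(...)` pulls the data through all three steps at once.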

And if you want to understand all of the "tributaries" coming into and leaving your Data Lake, might I suggest a Data Structure Graph? Keeping track of all of the data flows through an organization will become a tedious task if not done right from the beginning.
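One simple way to picture such a graph is as a directed graph of data flows. The sketch below is only an illustration under that assumption: the system names are hypothetical, and a plain dictionary stands in for whatever tool you would actually use to record the flows.

```python
from collections import deque

# Hypothetical flows: each key feeds the systems listed in its value.
FLOWS = {
    "crm": ["data_lake"],
    "web_logs": ["data_lake"],
    "data_lake": ["sales_report", "churn_model"],
    "churn_model": ["marketing_dashboard"],
}


def upstream(flows, node):
    """Return the systems that feed directly into node (its tributaries)."""
    return sorted(src for src, targets in flows.items() if node in targets)


def downstream(flows, start):
    """Return every system reachable from start, in breadth-first order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for target in flows.get(node, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


tributaries = upstream(FLOWS, "data_lake")
outflows = downstream(FLOWS, "data_lake")
```

Recording each flow as it is created keeps this kind of lookup cheap; reconstructing the graph after the fact is where the tedium comes in.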

It's not magic.

Just follow best practices.
