2016-01-23

Melvil Dewey - The first Data Architect

From left to right: R. R. Bowker, Mrs. Dewey and Melvil Dewey (Photo credit: Wikipedia)

What is a Data Architect?


Let us start with a basic definition of an application architect.

A Java architect, for example, knows the ins and outs of Java: the versions, the capabilities, the tools, the memory requirements, and so on.

A Data Architect, on the other hand, understands in detail the impact of how data is organized.

This includes all of the traditional normalization rules we owe to Ted Codd, not to mention the denormalized usage patterns that Ralph Kimball wrote about.

It also covers staging areas, ETL processes, the impact of the various RAID levels on performance, which data should be copied to which data center, and how to shard a relational system.

In short, Data Architects do not write the original software that produces the data, but they know how the data should be organized within that application, and what to do with the data once it leaves the original application repository for use cases the application developers did not envision.

Melvil Dewey was not an author.

However, he did more than any other person to give people access to books and to help them find books on the topics that interested them.

Today we have Data Lakes, Data Warehouses, Spark Clusters, Hadoop Clusters, and Data Marts, along with Data Scientists and Data Analysts, all trying to pull data together, organize it, channel it, and transform raw data into business value.

We should remember the simple approach that Dewey took.

Know where your data is located.

Organize the data to make it easy for others to both use and find.

Categorize your data in a manner that makes sense to the greatest number of people.
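Those three principles map naturally onto a data catalog. The following is a minimal sketch, not the author's method and not the API of any real catalog product: the `DataCatalog` and `DatasetEntry` names, the dataset locations, and the class codes are all hypothetical. Each entry records where a dataset lives, and it carries a Dewey-style code in which each leading digit narrows the topic, so browsing by a code prefix works much like walking a library shelf.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """One catalog card: what the dataset is, where it lives, how it is classified."""
    name: str
    location: str      # where the data physically lives (hypothetical paths below)
    class_code: str    # Dewey-style code (hypothetical); each leading digit narrows the topic
    description: str = ""


class DataCatalog:
    """A toy card catalog for datasets (illustrative only, not a real library)."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        # Know where your data is located: every dataset gets a card.
        self._entries[entry.name] = entry

    def locate(self, name: str) -> str:
        # Make it easy to find: look a dataset up by name and get its location.
        return self._entries[name].location

    def browse(self, code_prefix: str) -> list[DatasetEntry]:
        # Categorize: return every dataset whose class code starts with the
        # prefix, sorted by code, like the books on one stretch of shelf.
        matches = [e for e in self._entries.values()
                   if e.class_code.startswith(code_prefix)]
        return sorted(matches, key=lambda e: (e.class_code, e.name))


# Hypothetical class codes: 1xx = sales, 2xx = customers.
catalog = DataCatalog()
catalog.register(DatasetEntry("orders", "warehouse/sales/orders", "110", "Order headers"))
catalog.register(DatasetEntry("order_lines", "warehouse/sales/order_lines", "111", "Order detail rows"))
catalog.register(DatasetEntry("customers", "lake/raw/customers", "210", "Customer master"))

print(catalog.locate("customers"))
print([e.name for e in catalog.browse("11")])
```

Running it prints `lake/raw/customers` and then `['orders', 'order_lines']`. Keeping the codes hierarchical is what lets a single prefix query answer "what else do we have on this topic?", the same question a library shelf answers.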

His method is not perfect, and improvements have since been made in the way data is organized, cataloged, and searched. But his methods stood the test of time for many years, until later inventions introduced newer ways to find and access data.

Will your architecture stand for more than a hundred years?

Will it survive the next CIO who takes over?

How much thought goes into how data is organized within your company, and into how the various needs of different systems are met?

Do you use a Data Structure Graph to keep track of the importance of the various data feeds that are the life-blood of your organization?

How do you organize your data?
