Pages

2016-01-05

Data Warehouse Maturity

Not all Data Warehouse projects are created equal.


In my years of working in the data management profession, I have worked with a variety of organizations. In the last ten years or so I have worked on many projects that were all called data warehouses. Unfortunately, I can't say that any two have been alike.

Data Warehouse Overview
Data Warehouse Overview (Photo credit: Wikipedia)
Based on my experience, I can say that of all of the organizations I have worked with each fell into at least one of these categories for Data Warehouse maturity. Some organizations I have been able to take from one level to another, but it takes persistence and support from management. 



Level 0 – No formal business intelligence capabilities within the organization  

Reports are ad-hoc feeding data into local client tools managed by analysts.  Multiple people do the same work for both data extraction and report creation.  There is little cross department usage of data.  In addition, change control is not a huge priority since the majority of the analysis is ad-hoc.  The source code used to develop the reports and data transferred is not managed under a revision control system.  Queries are changed easily between runs making historical analysis difficult if not impossible. 

Level 1 – Using add-on tools supplied by vendors

This would include “built-in” data marts provided by vendors for data analytics of a particular application.  This could include, for example, SAP BW, which is a dedicated data mart solution for SAP transactions.  Even though SAP may be rather large, it is nevertheless a single data source with a predefined baseline data mart. 

Level 2 – Departmental data mart(s) with a front-end Business Intelligence tool or an Operational data store feeding common reports.  At this level, dedicated equipment for reporting and analysis is used. 

English: Datamart Architecture Pattern used Bu...
English: Datamart Architecture Pattern used Business Intelligence Reference Architecture used in conjuction with Operational Data Store and Datawarehouse. Supports Reporting , Analytics , Scorecards and Dashboard (Photo credit: Wikipedia)
Data from multiple systems integrated into a common back-end repository.  From a Kimball perspective, this is the foundation of the conformed dimension usage.  The Operational data store can be a copy of the data used by applications, but it can also be an extract of key data from the operational system to be able to provide KPI’s in a single location for multiple departments.

At this level of Data Warehouse Maturity, enough analytical and reporting queries are happening that the load needs to shift to dedicated equipment in order to reduce the load on the production systems.  Analysts will want to be looking at various scenarios that could negatively affect the native use of production applications so offloading this data and the analytical queries onto dedicated equipment becomes a priority for the organization.

This level of maturity requires, at a minimum, some dedicated technical resources to manage the infrastructure of the BI tool(s) and the job processing that controls the movement of data through the various stages that will ultimately make the data available to the BI tool.


This may not be the sole function of these personnel, but organic personnel should be managing this infrastructure.


Cross department usage of data begins here as analysts from different departments use data sourced from one or more systems into a common back-end repository.



Level 3 – ETL tool driven multiple data marts with a front-end business intelligence tool sharing sourced data with meta-data driven from ETL and data modeling tools

At this point in the growth of an organization, the volume of data moving between systems becomes large enough that sophisticated and dedicated tools becomes  a requirement to expedite the data transfer.

 Level 4 – Sophisticated “staging” area using industry best-practices (DW2.0)

A  DW2.0 integration sector called a “staging” area using Kimball terminology. The Presentation layer is generally what Kimball would call a data mart or basic dimensional model.


Level 5 –Multiple Data warehouse environments. (Large international organizations

This would be a really large organization, or one that had grown through the process of acquisitions. In the case of acquisitions, a Data Structure Graph should probably be used to keep analyze how data flows through an organization.


Why does it matter which level an organization is?  How can someone use these levels to affect their business bottom line?

Knowing how fare your organization is in this maturity process will help as you add other infrastructure like Big Data, or other systems. Also to go from one level to another is more of an incremental growth step. Each step takes a lot longer to go. From Level 0 to Level 1 takes a few weeks. From Level 1 to Level 2 takes a few months. Level 2 to 3 takes about 6 months. Level 3 to Level 4 may take up to a year to get everything complete. Level 5 is usually ongoing. Once your organization gets to Level 3 the data warehouse should be at critical mass where more and more people migrate to using it and other systems should be discontinued.


1 comment: