Pages

2010-11-23

Data never dies

Do you ever wish you could forget something?

Certainly there are traumatic events in some of our lives that we may wish that we could forget; more often than not most people wish they could remember something.

A taste, a name, the name of a restaurant, these are all things that some of us try very hard to remember at times.

For computer data management it is a different story. I have been in many discussions where we question how far back in time we are required to retain data.

By formal definition this is called the data retention policy of an organization. There are laws in place that require certain records to be retained for 7 years. Some data must be kept indefinitely.

This simple term: “Data Retention Policy” can have an impact on software purchases, hardware purchases, man-hours and processing time for an analytics application.

Why is my application taking so long?


As the volume of data within an application grows, the performance footprint of the application will change. Things that previously ran fast will begin to run slow. I once worked for an application vendor and handled more data management issues than software development issues. On one particular occasion shortly after I started there I received a call about the application not working. Upon review of the environment where “nothing had changed” I discovered the problem. The SQL Server database was severely underpowered. Simply manually executing the SQL directly through query analyzer showed dreadful performance.

We had to recommend an upgrade in hardware. Once the customer purchased new hardware I took a team to the customer site and we did a migration of the data from the old server to the new server. When I left the site, I heard numerous people talking about how the application had not worked that well since it had been freshly installed.

A simpler answer may have been to “archive” data, to clean it out so that the performance would have returned to a fresh state or even just delete it. The reason we could not do that is that this particular application was a time tracking system for recording time-in and time-outs of employees working at a refinery. Employee data is not something that should just be purged; especially data that directly impacts how contractors and employees are paid.

The data would be studied for some time to report on cost expenditures for the site where the work was recorded.

But simply upgrading the back end database server was really only a short term solution.
This is a case where we kept all of the historical data within the application itself for reporting and study.

Reporting systems can help


As a data warehouse engineer, now I would have suggested an alternative solution. I would have suggested that “warm” data should be moved to a reporting application for reporting and study.

A threshold should be established for what is useful within an application itself for data that is pertinent and needed on a daily and weekly basis. This is the “hot” fresh data that is currently being processed. The data that is important for business continuity and reporting to auditors, vendors other business units and executives does not necessarily need to be kept within the application itself. We should have spun off a reporting system that could be used to retain that data and allow study and reporting, but not bog down the application itself.

Building specific reporting systems is essential to maintain optimal application performance. By offloading this data into an Operational Data Store, Data Mart, or Data Warehouse you will keep your “hot” data hot and fresh and your “warm” data will be available for use in an environment that does not interfere in any way with the day to day work of your business.

How long do I keep data?


How long data is kept for each business unit within an organization is a question for each business owner. Law’s need to be examined for requirements, analysts need to make it clear how much data they need to see for trending analysis, and data management personnel need to contribute to the discussion by showing alternatives for data storage, retrieval and reporting.

Keep your corporate “memory” alive by having a current data retention policy that is reviewed every time a new application is brought online. Reviewing your data retention policy at least annually keeps this issue fresh in all of the stake-holders minds. Disaster recovery planning and performance optimization both are influenced by the data retention policy.

Since the data of your company is really all of the information about your customers, vendors, suppliers, employees and holdings data never dying is a good thing!

Enhanced by Zemanta