The-Data-Guy: 2011

2011-08-15

Steps to successful adoption of a new data warehouse

What is taking so long to get the data warehouse ready?

In a new deployment of a data warehouse there are many infrastructure components that have to be put in place. Modeling tools, ETL Servers, ETL processes, BI Servers, and Bi interfaces and finally reports and dashboards. Not to mention sessions for user interviews, business process review and metadata capture.

I say server(s) because there should be dev/test and prod platforms for each of these.

Figure 3-4: how data models deliver benefit

Image via WikipediaA recent article at Information-management.com talks about data modeling taking too much time if done correctly.

Add all of these things together and you have a significant period of time to wait before seeing a benefit to a Data Warehouse/Business Intelligence project.

Here are some suggestions to reassure the stakeholders early on during the project lifecycle.

Give them data early and often.

Put together a small and simple data model for the first pass. Load the small star schema with a subset of the data relevant to a group of business users, then create some reports or give some power users access to create their own reports.

This shows the concept of continuity. A Continuity test in electronics is the checking of an electrical circuit to see if current flows, or that it is a complete circuit.

Show the data quality issues

"A problem well stated is a problem half solved" Without seeing data quality issues, the people that enter data into the system of record can not fix it.

Get and give feedback often

As soon as people start using the "prototype", you will get feedback. Use this as an opportunity to explain why the process should take longer. It also identifies gaps in understanding among the team. Once people have a hands-on view of the presentation layer they will try a number of things.

They will use it to answer questions they already have answers to. Thus validating the transformation processes.

They will also start to try to answer questions they may not have asked before. This is the best opportunity for learning more about how the data is being used.

These steps lay the foundation for making data work for you and your business.

2011-08-01

3 Great Reasons to Build a Data Warehouse

Why should you build a Data Warehouse?

What problems do a Data Warehouse and Business Intelligence platform solve?

There are strong debates about the methods chosen for building a data warehouse, or choosing a business
intelligence tool.

Image via Wikipedia

Here are three great reasons for building a data warehouse.

Make more money

The initial cost of building a data warehouse can appear to be large. However, what is the cost in time for the people that are analyzing the data without a data warehouse. Ultimately each department, analyst or business unit is going through a similar process of getting data, putting it in a usable format, and storing it for reporting purposes(ETL). After going through this process they have to create reports, prepare presentations and perform analysis. The immediate time savings benefit comes to these folks who do not have to worry about finding the data once the data warehouse platform is built.

The following two points also allow you to make more money.

Make better decisions

In order to better know your customers, you must first better understand what they want from you.Once the people that spend most of their time analyzing the data do not have to spend so much time finding the data and focus their time on reviewing the data and making recommendations, the speed of decision making will increase. As better decisions are made, more decisions can be made faster. This increases agility, improves response time to the customer or environment, and intensifies decision making processes.

Once a decision making platform is built you can better see which type of customer is purchasing what type of product. This allows the marketing department to advertise to those types of customers. The merchandising department can ensure products are available when they are wanted. Purchasing can better anticipate getting raw materials so products are available. Inventory can best be managed when you are able to anticipate orders, shortages, and re-orders.

Make lasting impressions.

Customer service is improved when you better understand your customer. When you can recommend to your customers other products that they may like you become a partner to your customer. Amazon does an amazing job of this. Their recommendation engine is closely tied to their historical data, and pattern matching of which products are similar. Likewise, you may want to tell a customer that they may not want something that they want to purchase because a better solution is available. This makes a lasting impression on them that you are the one to help them in their decision making process.

Make data work

Building a data warehouse platform is one of the best ways to make data work for you, rather than you have to work for your data.

2011-07-25

Datagraphy or Datalogy?

What is the study of data management best practices?

Do data management professionals study Datagraphy, or Datalogy?

A few of the things that a data management professional studies and applies are

Tools

Data Modeling tools
ETL tools
Database Management tools

Procedures

Bus Matrix development
User session facilitation
Project feedback and tracking

Methodologies

Data Normalization
Dimensional Modeling
Data Architecture approaches

These, among many others, are applied to the needs of the business. Our application of these best practices make our enterprises more successful.

What should be the suffix of the word that sums up our body of knowledge?

Both "-graphy" and "logy" make sense, but let's look at these suffixes and their meaning.

-graphy

The wiki page for "-graphy" says: -graphy is the study, art, practice or occupation of...

The dictionary entry for "-graphy" says -"a process or form of drawing, writing, representing, recording, describing, etc., or an art or science concerned with such a process"

-logy

The wiki page for "-logy" says -logy is the study of ( a subject or body of knowledge).

The dictionary entry for "-logy" says: a combining form used in the names of sciences or bodies of knowledge.

Data

The key word that we all focus on is data.

In a previous blog entry, I wrote a review of the DAMA-DMBOK which is the Data Management Association Data Management Body Of Knowledge.

Data Management professionals study and contribute to this body of knowledge. As a data guy, I am inclined to study to works of those who have gone before. I want to both learn from their successes and avoid solutions that have been unsuccessful.

Some of the writings I study are by people like: Dan Linstedt, Len Silverston, Bill Inmon, Ralph Kimball, Karen Lopez, William Mcknight and many others.

I have seen first hand what happens to a project when expertise from the body of knowledge produced by these professionals has been discarded. It is not pretty.

Why do I study these particular authors? These folks share their experiences. When I face an intricate problem, I research some of their writings to see what they have done. Some tidbit of expertise they have written about has shed light on many problem I have faced, helping me to find the solution that much sooner.

When I follow their expertise my solutions may still be unique, but the solutions fit into patterns that have already been faced. I am standing on the shoulders of giants when I heed their advice.

When I am forced to ignore their advice, I struggle, fight and do battle with problems that either should not be solved or certainly not be solved in the manner in which I am forced to solve them.

Should the study of and contribution to the body of knowledge of data management be called data-graphy or data-logy?

Datagraphy

The term Datagraphy sums up the study of the data management body of knowledge succintly.

I refer back to the dictionary definition of the suffix "-graphy": "a process or form of drawing, writing, representing, recording, describing, etc., or an art or science concerned with such a process"

Data is recorded, described, written down,written about, represented (in many ways) and used as a source for many drawings and graphical representations.

What do you think? I will certainly be using Datagraphy.

2011-07-23

Data is killing us!

Are you drowning in Data?

You have a number of applications collecting various pieces of data in order to run your business. What do you have to do in order for an analyst to make an informed decision?

For the majority of your business operations, dashboards should show current activity. Thresholds can be established for when a particular event takes place and alerts sent automatically. Simulations can be run based on past performance to gauge or even predict the performance of what-if scenarios.

All of these things can be done, the question is: Are they being done?

Image via Wikipedia

Are there so many copies of your application databases, that the cost of servers, disk arrays and storage going through the roof?

Are multiple people required to keep track of which backups and restores are done on a nightly basis driving personnel costs up?

Are business analysts spending more time collecting data than understanding, interpreting and making recommendations, reducing efficiency?

There is a better way.

A person who studies the practices of data management and the applicability of the various data management tools, procedures or methodologies to the needs of the business can make a difference in the use of an organizations data.

This difference can be measured in many ways. It could be an increase in revenue because a relationship was found in the data that could not have been seen before a new business intelligence system was deployed. It could be cost savings of physical equipment.

More often it is the saving of personnel time associated with gathering data just to answer questions.

Some proponents of vendor solutions will suggest that they have all of the answers to your data needs. Perhaps some vendors do have solutions. However, bringing in a vendor solution will not relieve an organization of the responsibility of data management.

The best way to work with vendors is to get them to fully understand all of the pain points associated with your data. No single vendor can solve all problems. Smart people with a vested interest in making your company successful will help you management your data.

Proliferation of data makes an organization stronger. If data is killing you, then you need someone to tame the beast and make data work for you.

Make your data work for you, rather than you work for your data.

Who are the people that will make your data work for you? A database administrator is a good start, many I have spoken to have plenty of ideas for how to make things better.

A data architect is the best start. Data Architects are the people that have studied data management best practices. A great Data Architect can quickly come to an understanding of your pain points and make recommendations that can be done soon to make sure that data works for you.

2011-05-27

Analytical Skills?

One particular skill listed on many job descriptions I have seen gives me more questions than answers.

The skill they would like the candidate to have is listed as : Analytical Skills

How is this quantified? How do you know that someone has analytical skills?

Wiki defines Analysis as the ability to break down a complex topic or substance into smaller parts to gain a better understanding of it.

To what level of detail does this particular job need Analytical Skills? Will the candidate need to analyze the Enterprise accounting functions? Does it need to be the analysis of the physical server infrastructure? Does it need to be organizational layout? Will the candidate be analyzing forensic evidence?

Analytical skill is such a broad topic, and once a person does their analysis what happens to it? Is this a position that has the ability to not just analyze data, but also act on it? Is the analysis required in the position for recommendation purposes, educational purposes, or will the candidate be making decisions based on data provided by others?

Many people have analytical skills, but do they have good analytical skills? Do decision makers listen to and follow their recommendations after an analysis is done?

Data Management professionals are constantly analyzing data. The raw data can represent many diverse topics. Some of the key topics for analysis that come to mind are People, Processes and Things.

People

The analysis of people, their motivations, and their interactions is covered by subjects like anthropology, psychology, sociology and other behavioral sciences. Some people are naturally gifted people readers and can understand others with limited formal training. The analysis of people is useful to many groups within an organization, human resources, marketing, sales, even executive leadership.

Process

There are many types of processes in our lives, a process for getting a drivers license, getting married, fulfilling a product order, shipping a product, and many others. Understanding and recommending improvements to the nature of the processes that we interact with on a daily basis can be very valuable.

Things

Things can be companies, human languages, computer languages, web pages, corporate ledgers, computers, cars, religions, money, inventories, nature. Every "thing" can be studied and analyzed. The more we understand things the more data we generate about those things in order to contribute to human knowledge, self knowledge or our corporate enterprise.

Data

All things that are analyzed have one thing in common. Data. In all analysis data is what is collected, stored, manipulated, reported, recommended and decided upon.

There are best-practices for data management that can assist every type of analysis of every subject. "Pure" analytical skill is seldom used in a vacuum. Databases store data, operational systems collect data, a data warehouse helps in the correlation of data amongst multiple systems that are gathering data. Data Management by its very nature is an analytical skill.

When I see a position for a data management professional that requires analytical skills, I still find it humorous. Because the analytical skills that we can provide in both our own analysis and in guiding the analysis of others should go without saying.

2011-03-02

Data Management Industry?

How important is an industry classification for data management professionals?

I have been asked the question: What is your industry?

My reply, when given the option, is to say the Data Management Industry.

The particular vertical market that my company classifies itself according to Dun and Bradstreet Industry classification, Standard Industry Classification (SIC) or even North American Industry Classification System (NAICS) has a limited impact on my day to day duties.

Some industries have a more stringent requirement for data quality or data availability than others, but overall the manner in which data is managed between industries is consistently similar.

In every industry I have worked the same process is generally followed.

Data is captured

In Telecommunication, Energy and Supply Chain these systems are usually automated data capture via a field device such as a switch or a sensor, some are driven based on orders and some are driven based on consumer behavior.

In Retail and ECommerce the source data capture component is a customer facing system such as a web site or scanner for checking out at a grocery store.

Most companies have a human resources system that keep track of time for the customer facing employees tracking contextual information such as when did an employee arrive, what did they work on, when did they leave?

Data is integrated

Once we have the source data and as much contextual information about this data captured; that data is transferred to another system. This system could be a billing, payroll, time keeping or analytical system, such as a data warehouse or a data mart. The methods used to create this integration system can vary depending on the number of source systems involved and the requirements for cross referencing the data from one system with the data in other systems.

At times certain data points are transferred outside the organization. This data could be going to suppliers, vendors, customers or even external reporting analysts.

Internally each department within an organization needs to see certain data points. Marketing, Merchandising, Finance, Accounting, Legal, Human Resources, Billing, to name a few do not necessarily need to see all of the data captured. However the data they do require does need to get to them in a timely manner in order for these departments to support the overall organization.

Data is protected

During all of these data interchanges the same types of functions need to be performed.

Data must be secured (the users that need access have it, those that do not need access cannot see it), backed up, restores tested and verified, performance must be optimized, problems need to be addressed when they arise, quality must be maintained, and delivery must be verified.

Data Management Professionals

The Data Management profession consists of people striving to create, implement, maintain and evolve best practices for managing the data that runs our enterprise.

The challenges of data management, the problems we solve and the solutions we provide are more similar than they are different.

Is the industry classification of the enterprise we support all that important?

2011-02-17

Thought leadership

You have to be a thought leader in order to recognize one.

I hear the term thought leader bestowed upon people occasionally. I have even bestowed this term on some people that I consider to be extremely knowledgeable about building data warehouse systems.

The wealth of information for data management best practices continues to grow. Thought leaders can publish knowledge about solving a particular problem in a variety of forums now: blogs, books, articles, and even research papers. The sheer volume of information about the "best practices" is almost intimidating.

The ability to take in all of the information about best practices for a subject area, apply it to the situation at hand, consolidating the recommendations from multiple sources as well as ignoring those recommendations that are not applicable make you a thought leader. Google provides a way of finding a site that answers a particular question. If a person does not ask the correct question, Google does not provide a good answer. Once Google finds a particular answer to a keyword query you have to apply that answer to your particular situation.

Let us take a specific example.

The question should not be:

What is the best way to build A data warehouse?

The question should be:

What is the best way to build THIS data warehouse?

Even something as simple as learning a how to apply a new SQL trick that you learned to a specific problem you are working on shows the application of this knowledge. Best practices can be abstract, or even theoretical. When you can take recommendations from many sources and apply their expertise to your specific problem you have taken a big step.

This can apply to many other professional areas. SEO, Business Analysis, Business Process Re-engineering, ETL development,Resume writing, Financial Analysis, Online Marketing, etc...

If you can study multiple sources and apply their recommendations or findings to your own situation, you become a thought leader.

You become a recognized thought leader when you write about it.

2011-02-15

When is the Data Warehouse Done?

Is Data Warehouse development ever complete?

During the launch of a data warehouse project There are schedules and milestones published for everyone to mark on their calendar. A good portion of these milestones are met, the data model is reviewed, development is done, data is loaded, dashboards are created, reports generated and the users are happy right?

Image via Wikipedia
Well, one would hope.
Invariably there is always one more question. How hard would it be to add this metric?

Sometimes it is just a matter of spinning out a new report, new dashboard or even new report. Sometimes the question comes requiring data from an application that did not even exist when the data warehouse project was started. Now the architect has to go back and do integration work to incorporate the data source into the data warehouse, perhaps new modeling needs to be done, perhaps this requires some time for ETL development, sometimes it is just some front end business intelligence work that needs to be done.

Once that is deployed does the data warehouse answer all questions for the enterprise? Can the project then be said to be complete, done and over?

I think perhaps not.

Most data warehouse projects I have worked on have been released in phases. Once a phase is done and users are happy with it we move on to the next phase. Occasionally we have to go back and modify, for various reasons, things that we have already completed and put into production. Is it ever complete? Is it ever done?

I think a data warehouse requires no more modifications in only one case.

When the company no longer exists.

So long as the enterprise is vibrant and interacting with customers, suppliers, vendors and the like. So long as data comes in and goes out of the organization development of the data warehouse will need to continue. It may not be as intense as at the beginning of the original project, but development will need to be done.

So long as the enterprise lives, the data warehouse lives and changes.

2011-02-11

Dynamic Situational Awareness and Multi Agent Systems

I just watched the exultation of the people of Egypt after the announcement that Hosni Mubarak relinquished the Presidency of Egypt to the Military. Egypt's revolution represents many things to many people, and analysts, historians, economists and others will be analyzing the cause, and the events that took place throughout the revolution for some time. The story is just beginning.

One thing that this reminded me of is a model I studied some time ago.

Image via WikipediaThe concept of Multi Agent Systems, is a system composed of multiple interacting intelligent agents. Multi-agent systems can be used to solve problems which are difficult or impossible for an individual agent or monolithic system to solve. Borrowing from the definition is an overview:

Overview

The agents in a multi-agent system have several important characteristics:^[4]

Autonomy: the agents are at least partially autonomous
Local views: no agent has a full global view of the system, or the system is too complex for an agent to make practical use of such knowledge
Decentralization: there is no designated controlling agent (or the system is effectively reduced to a monolithic system)^[5]

Typically multi-agent systems research refers to software agents. However, the agents in a multi-agent system could equally well be robots,^[6] humans or human teams. A multi-agent system may contain combined human-agent teams.

What we as a community have seen is an example of a multi agent system that breaks one of these rules.

The rule of "Local views" did not apply to the crowd of people in Egypt. Thanks to Twitter, Facebook, Youtube and cell phones each person had access to the global view. Some examples of this is called: Situational Awareness

The new social media tools available to the communities of people in Egypt allowed them to have full situational awareness of their local environment as well as the environment in the country as a whole. Modeling this behavior will be something that I am sure many researchers will be working on for some time to come.

How will we see some of this translated into the business world? Dashboards, Mobile Business Intelligence based on clean, concise data systems that are fed from data warehouses or data marts can profoundly impact the capabilities of the people that use these systems. Quickly getting appropriate data to the appropriate person can completely change the ability of business analysts, financial planners, buyers, sellers, accountants and managers to do their daily jobs.

When faced with the questions of how can delivery of data quickly to intelligence systems impact the bottom line. The behavior of the people of Egypt shows how quick access to information can have an impact on an environment.

I hope that the people of Egypt build a stronger country and live in peace with their neighbors and the rest of the world. Peace loving people around the world congratulate you.

2011-02-04

What's your most profitable cupcake?

Intuit has a commercial about a small business owner wondering whether they should make more cookies and cream cupcakes.

Intuit's Cupcake commercial

In the commercial, a store owner says to a guy behind a computer, "I've been thinking about doing more cookies and cream cupcakes."

Close-up of cupcake with pink frosting and spr...

Image via Wikipedia
The go replies, "Oh well let's see what the data says."
A screen pops up with a chart, that shows growing numbers from left to right, and the computer guy looks around to see others in the computer room eating cookies and cream cupcakes.

The commercial then talks about how intuit small business software takes care of all of your data for you.

I have never used Intuit software, but I have built a number of data warehouse systems. A data warehouse is the foundation of a business intelligence solution that this commercial represents.

For small businesses, I am sure it is possible to look into a single application to get the answers to the implied question of the business owner in this commercial. The implied question, of course, is: What is my best selling cupcake?

But what if that question changed just a little bit? What if the question was: What is my most profitable cupcake?

To determine profitability takes a bit more data than just determining the best selling. If you sell 1000 types of cupcakes, but the profit margin is $.05 per cupcake, then you just made $50.00. If you have another type of cupcake that has a profit margin of $.25 per cupcake then you only have to sell 200 to make the same amount of money. In order to calculate this, you need to take into account the cost of the supplies and manpower required to make the cupcake.

Another question that comes to mind is: What is my second or even third best selling cupcakes? If someone buys the number 1 cupcake, what else do they buy? Can I give an incentive for customers to come in and buy my 2nd best selling cupcake, but once they are in the store upsell them to the other cupcakes available? This topic is really called market basket analysis, which is more than I intend to cover in this article.

Which supplier that provides the raw material for the cupcakes is the best? How do you determine the best supplier? Is it the price of the raw material? How many times has the delivery truck been late? Do you have alternative suppliers just in case there is a problem with your main supplier? What are the questions that are important to you in making these decisions?

Which employee makes the most cupcakes? Which employee makes the most profitable cupcake? Are they the same person? How do you know?

What happens when this store owner gets more than one store? If she is able to buy an existing store, will the new store have the same application? How will the data be integrated to be able to answer "simple" questions like the question posed above?

For some small business a single application may meet all of your needs. When things start to grow, as most small business owners want them to, having a data management strategy becomes a strategic priority. The rate at which data can grow for small business owners today can cause things to become complicated very quickly. The questions that are asked about your business will change over time, and data may need to come from more than one application to answer the questions of a growing business.

Do you know if you are asking the right questions for your business? Do you know if the data you are relying on to answer those questions is the coming from the correct application? What factors should be influencing your decisions that may not be represented in the "off the shelf" application?

How do you determine which cupcakes to make?

2011-01-30

Organic SEO Data Management

An overview of online marketing. A simple grap...

Image via WikipediaHow do you manage data outside your control?

Search Engines are great catalogers of meta data about your web site. Recently I was asked to do a bit of SEO for a friend. I worked on putting the HTML together, and researched the topic of SEO extensively. There is a lot of almost conflicting guidance for how to do SEO for a web site. So in order to understand how SEO works, I did what I am sure many other software developers have done.

I wrote my own search engine.

My search engine is anything but production grade, but the research project gave me some insight into how the big guys work. It has limitations, but rather than completely reinvent the wheel, I decided to leverage the capabilities of the tools I had.

I am sure that there are better ways to build a search engine, and this is not meant to be an authoritative methodology, but it is the way that I build my little utility to give me some insight on the process.

I used Fedora to build this, with a MySQL back end. The page crawler is all PHP code. Some of the PHP code I leveraged from the open source community. After all, the goal of this project is to understand the concepts of a search engine, not commercialize it. The web pages were broken down to the raw text, that text was then analyzed and restructured into a small data mart.

The steps my crawler performed were:
1. Identify the site and the individual pages.
2. Get Links
3. Get Meta tags for document.
4. Get the raw-text
5. Get any emails from the page.
6. Clean stopwords from the raw text.
7. Do word frequency analysis on the clean raw text
8. Stem the text
9 Do word frequency analysis on the stemmed text.
10. Store readability scores.

For each of these steps, I stored the results into a database. The database would ultimately turn into a star schema data mart for reporting and analysis.

Just as an overview I will touch on each of these steps, and why I did them. Each step has plenty of documentation about it across the internet, so this is not meant to be an exhaustive explanation.

Step 1. Identify the Site and the individual pages.

    Not only would you want the individual page that a search pattern shows up on, but you also want the site associated with that page. Simple Reg-ex work here.

Step 2. Get Links.

   Parse through the HTML in order to get all of the <a> tags and record the links this page makes reference to. The goal of this step is to record the number of inbound links to a particular page. By capturing the pages this particular page links to, when we scan those pages we can do a count to see the inbound links. The biggest problem with this method is I would either have to know which sites and pages link to this page, or scan the whole internet. I chose to simply record it for internal site page link counts.

Step 3. Process Meta Tags

   Record the Meta tags that the page author stored in the document. These are things like keywords, description, etc... These keywords are the keywords that I want google to use as the keywords for this site. What I found through this research project is that these keywords do NOT necessarily match what google determines is a keyword for your site.

Step 4. Get the Raw Text.

    This step is perhaps one of the most important. Stripping out the raw text is the human readable text that people will actually read. Humans do not process HTML tags, at least normal humans anyway. HTML tags are not cataloged in my process.

Step 5. Get any emails

     Another simple reg-ex that is run against the raw text to store the emails that were embedded in the text.

Step 6. Clean stopwords.

    Stopwords are defined here: Stopwords. Basically these words are common short function words such as the, is, and, or. These words are not very important for text scanning.

Step 7. Word frequency analysis.

    Word frequency analysis is the process of counting the word frequency of a body of text. For example in the text "The quick brown fox jumped over the slow red fox." The word "fox" would have a WFA count of two. Every other word would have a WFA "the", which would be considered a stopword in this example)

Step 8. Stem the raw text.

Stemming is the process for reducing inflected (or sometimes derived) words to their stem, base or root form – generally a written word form, from Stemming. Why is this important? Knowing the root word of a word gives you some flexibility when it comes to processing searches. If the text from the web page is: "Fishing for red worms", and the search is "Fished for worms", if you remove the stopwords and stem both phrases you will have "Fish red worm" and "Fish worm". By doing a Like search on the database table where the keyword phrases are stored the search "Fish worm" will find "Fish red worm" and you will have a hit.

Step 9. Word Frequency Analysis on Stemmed text.

   This is step 7, only using the stemmed text instead of the raw text.

Step 10. Store readability scores.

   A readability score like the Flesch Kincaid readability score give you the grade level of a body of text. Again, this is not an effort to explain the readability score, just a highlight that this particular score can be helpful in looking at a web site for analysis purposes.

Image via Wikipedia

Now that all of this was done, I produced some simple reports on the individual pages of the web site to see things like readability score, and the word frequency analysis of the pages. Funny enough, the high frequency words matched some of the keywords that google webmaster tools showed for the web-site I was working with.

One thing that google analytics and webmaster tools can do for an organization is to show which pages were clicked at what time. If you were using something like this in your data warehouse, it would be interesting to see the keywords used on a particular page that sold a product, or drove a customer to make contact with your organization.

After all, if the goal of the data warehouse is to provide a 360 degree view of the customer, then wouldn't incorporating key word analytics into the data warehouse provide better insight into how the consumer interacts with the web-site, and how that interaction directly is related to sales?

2011-01-23

Backup Strategies? How about a restore strategy?

LAS VEGAS - JANUARY 07: An ioSafe, Inc. Solo ...

Image by Getty Images via @daylife
Recently we had a catastrophic failure of the LUNS that were connected to our development database cluster. Upon starting a restore process, we discovered that there was an issue with our backup process.

Backups are just the first step in a backup strategy. How often are the backups tested? Some would say this is part of a "Disaster Recovery" process or program. I think there a degrees of disasters.

Most DR programs that I am familiar with are designed for a catastrophic failure of a site, or a data center. There are other failures that should be addressed in a restore strategy. Losing a development environment can still cost the company many man-hours of work. If you calculate the dollar value of each hour worked, or not worked as the case may be, you will see the financial impact of not periodically testing your restores.

There are different types of backups, full, incremental, snapshot, and full operating system backups. When you are restoring to a point in time, and you are doing incremental backups, you have to have all of the incremental backups available. You may need additional storage available to test out the restore of a database.

Restores generally take a bit longer to do than an actual backup. Especially if there are multiple steps to the backup. If you need to get the backup media from off-site storage, this can take even longer. How long does it take to get a specific tape or CD from your off-site vendor? How often is the time in this SLA tested?

A Dataflow Diagram of backup and recovery proc...

Image via Wikipedia

What is the business impact to losing a test or development environment? One may say, not much, but have you factored in the manpower cost to recovering that environment? How many people are involved in rebuilding the test or development environment that could be working on solving business problems with IT?

Does your backup or disaster recovery strategy include time and resources for periodically testing out restores of individual systems? If it does, what frequency is this done? If not, why not?

Related articles

2011-01-18

Data Management is a dirty job

Image via WikipediaOne of my family's favorite shows is Dirty Jobs with Mike Rowe.

The premise of the show is they profile jobs that are less than glamorous, and highlight the people that are not afraid to get dirty in their daily jobs.

As the show begins, Mike introduces the people he will be working with, then they describe the task at hand. Once Mike gets an overview of the job he dives in and tries his hand at the job. The purpose of this, of course, is entertainment. Mike is not a specialist in any of the jobs, yet he goes through a bit of training and then tries his hand. Mike has a great sense of humor and it shines through as he attempts to do the jobs in the show. Sometimes he is successful, sometimes his attempts are less than successful.

Either at the end of the segment, or throughout the segment, Mike attempts to understand and explain to the audience how doing such a job impacts the bottom-line of the business. For example, why is it important to this company to scrape out a refuse container? Well, if it gets too clogged it can cause damage to the machinery which will prevent the machines making the products the company sells.

As I watch these shows, I am reminded both of my time as a consultant and my roles as an architect in the various positions I have held. As a consultant one of the things that you have to do very quickly is to understand the business processes of what you are working on. Why is it important that the performance of this particular query be improved? If we don't get an answer to this question quickly it will delay our orders which ultimately impacts delivery of products to our customers.

The day to day work done as a data warehouse architect or business intelligence architect is quite a bit different from the activities that Mike Rowe does on his show, but in essence aren't they similar?

We look at the details of business processes, delve into the dirty details of how transactions are processed and data is accumulated, then present that data to as many people as we can through the front-end tools that we deploy. There are times when we spend many hours working on a particular problem of how to process and present a concept easily and in a repeatable manner.

Diving into the details of how a business runs can be a "dirty" job in that there are many details that must not be overlooked. In that sense, data management itself is a dirty job because so many people take for granted that when they look at a report, or a dashboard, that the data is clearly presented. We make it easy for data to be presented in a clean manner, but the details of how it got there is anything but clear to the majority of people within an organization.

Pages

2011-08-15

Give them data early and often.

Show the data quality issues

Get and give feedback often

2011-08-01

Make more money

Make better decisions

Make lasting impressions.

Make data work

2011-07-25

-graphy

-logy

Data

Datagraphy

Related articles

2011-07-23

Related articles

2011-05-27

People

Process

Things

Data

Related articles

2011-03-02

Data is captured

Data is integrated

Data is protected

Data Management Professionals

Related articles

2011-02-17

Related articles

2011-02-15

Related articles

2011-02-11

Overview

Related articles

2011-02-04

Related articles

2011-01-30

Related articles

2011-01-23

2011-01-18

Related articles