"Treat your servers like Cattle, not puppies" is  mantra I have heard repeatedly in a former engagement where I was actively involved in deploying a cloud based solution with RightScale across Rackspace, AWS, and Azure.

The idea is that if you have a problem with a particular server, you simply kill it, relaunch it with the appropriate rebuild scripts (Chef and some other automation), have it reconnect to the load balancer, and you are off to the races.

I think there is one significant flaw in this philosophy.

Database servers, or data servers, are neither cattle nor puppies. They are, however, like pack mules.

Some data servers, like Cassandra, have built-in rebalancing capabilities that allow you to kill one particular instance in a Cassandra ring, bring up a new one, and have it redistribute the loads of data you may have. Traditional database engines like SQL Server, Oracle, MySQL, etc. do not have this built-in capability.
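To make the ring idea concrete, here is a minimal consistent-hashing sketch in Python. It is a toy illustration only, not Cassandra's actual implementation (which adds virtual nodes, replication, and data streaming); the `Ring` class, the hash choice, and the node names are all hypothetical.

```python
import bisect
import hashlib


def _token(value: str) -> int:
    """Map a string to a position on the ring (illustrative hash choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy hash ring: a key belongs to the first node at or after its token."""

    def __init__(self, nodes):
        self._tokens = sorted((_token(n), n) for n in nodes)

    def owner(self, key: str) -> str:
        positions = [t for t, _ in self._tokens]
        # Wrap around to the first node when the key falls past the last token.
        i = bisect.bisect_left(positions, _token(key)) % len(self._tokens)
        return self._tokens[i][1]


keys = [f"row-{i}" for i in range(1000)]
before = Ring(["node-a", "node-b", "node-c"])
after = Ring(["node-a", "node-b", "node-d"])  # node-c killed, node-d launched

# Only keys that lived on the dead node, or that the new node takes over, move.
moved = [k for k in keys if before.owner(k) != after.owner(k)]
```

Nothing ever moves between the two surviving nodes, which is why the ring can absorb the swap on its own. A traditional relational server has no equivalent: its data has to be restored or replicated by someone who knows how.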

Backups still need to be done, and restores still need to be tested. Bringing up a new relational database server requires a bit of expertise to get it right. There is still plenty of room for automation, scripting, and other capabilities.
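As one small example of that automation, a restore test can be scripted. The sketch below uses Python's built-in `sqlite3` module as a stand-in for a real database engine; the `backup_and_verify` helper and the `orders` table are hypothetical, and a production check against SQL Server, Oracle, or MySQL would use that engine's own backup tooling.

```python
import sqlite3


def backup_and_verify(source: sqlite3.Connection) -> bool:
    """Copy the database, then confirm the copy is readable and complete."""
    restored = sqlite3.connect(":memory:")
    source.backup(restored)  # sqlite3.Connection.backup, Python 3.7+

    # Run the integrity check against the restored copy, not the source.
    status = restored.execute("PRAGMA integrity_check").fetchone()[0]

    # Compare row counts table by table.
    tables = [row[0] for row in source.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    counts_match = all(
        source.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()
        == restored.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()
        for t in tables
    )
    restored.close()
    return status == "ok" and counts_match


# Made-up sample data for the demonstration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
db.executemany("INSERT INTO orders (amount) VALUES (?)", [(10.0,), (25.5,)])
db.commit()
restore_ok = backup_and_verify(db)
```

The point is that a backup is only proven good once something has been restored from it and checked.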

That being said, database infrastructure needs to have a trained, competent, database administrator overseeing its care and maintenance.

We have all seen movies where pack mules carry the adventurers' supplies.

As you take a pack mule on your adventures or explorations with you, if it breaks its leg, you can get a new pack mule easily enough. However, you have to move all of the things that the pack mule was carrying to the new pack mule. If you don't have a new pack mule, or can't get one quickly, the adventurers themselves have to carry the supplies. The load is redistributed, priorities are set for what is truly needed to make it through the foreseeable next steps, and plans are made to find the local town to get a new pack mule.

Back in the present, database infrastructure is truly the lifeblood of all of our organizations. Trying to "limp" through, or simply "blowing away" our servers and rebuilding them, is an extreme philosophy. There are laws, regulations, and customer agreements regarding the treatment and protection of data that must be adhered to.

Who is taking care of your pack mules? Will your current pack mule make it over the next hill you have to climb?



Steps to successful adoption of a new data warehouse

What is taking so long to get the data warehouse ready?

In a new deployment of a data warehouse there are many infrastructure components that have to be put in place: modeling tools, ETL servers, ETL processes, BI servers, BI interfaces, and finally reports and dashboards, not to mention sessions for user interviews, business process review, and metadata capture.

I say servers, plural, because there should be dev/test and prod platforms for each of these.

A recent article talks about data modeling taking too much time if done correctly.

Add all of these things together and you have a significant period of time to wait before seeing a benefit to a Data Warehouse/Business Intelligence project.

Here are some suggestions to reassure the stakeholders early on during the project lifecycle.

Give them data early and often.

     Put together a small and simple data model for the first pass. Load the small star schema with a subset of the data relevant to a group of business users, then create some reports or give some power users access to create their own reports.

    This shows the concept of continuity. A Continuity test in electronics is the checking of an electrical circuit to see if current flows, or that it is a complete circuit.
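A first pass really can be this small. The sketch below builds a one-fact, one-dimension star schema in an in-memory SQLite database and runs a single report query against it. The table names, columns, and sample rows are invented for illustration; a real first pass would use the business's own subject area.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL,
        category     TEXT NOT NULL
    );
    CREATE TABLE fact_sales (
        product_key INTEGER NOT NULL REFERENCES dim_product (product_key),
        sale_date   TEXT NOT NULL,
        quantity    INTEGER NOT NULL,
        amount      REAL NOT NULL
    );
""")

# Made-up sample rows standing in for a subset of the real data.
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
    (1, "Widget", "Hardware"),
    (2, "Gadget", "Hardware"),
    (3, "Manual", "Books"),
])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (1, "2024-01-05", 2, 20.0),
    (2, "2024-01-06", 1, 15.0),
    (3, "2024-01-06", 4, 40.0),
    (1, "2024-01-07", 1, 10.0),
])

# The continuity test: does a number flow from the loaded data to a report?
report = conn.execute("""
    SELECT d.category, SUM(f.amount) AS total_amount
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_key = f.product_key
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
```

If that query returns sensible totals, the circuit is complete, and everything added afterward is refinement.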

Show the data quality issues

  "A problem well stated is a problem half solved" Without seeing data quality issues, the people that enter data into the system of record can not fix it.

Get and give feedback often

   As soon as people start using the "prototype", you will get feedback. Use this as an opportunity to explain why the process should take longer. It also identifies gaps in understanding among the team. Once people have a hands-on view of the presentation layer they will try a number of things.

They will use it to answer questions they already have answers to, thus validating the transformation processes.

They will also start to try to answer questions they may not have asked before. This is the best opportunity for learning more about how the data is being used.
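The first of those behaviors, checking the prototype against answers people already know, can also be done deliberately. Here is a minimal reconciliation sketch; the `reconcile` function, the measure names, and the figures are made up for illustration.

```python
import math


def reconcile(known_totals, warehouse_totals, tolerance=0.01):
    """Return the measures whose warehouse value differs from the known value."""
    mismatches = {}
    for measure, expected in known_totals.items():
        actual = warehouse_totals.get(measure)
        if actual is None or not math.isclose(expected, actual, abs_tol=tolerance):
            mismatches[measure] = (expected, actual)
    return mismatches


# Figures the business already trusts, versus what the prototype reports.
known = {"january_revenue": 85.0, "january_orders": 4}
from_warehouse = {"january_revenue": 85.0, "january_orders": 3}
mismatches = reconcile(known, from_warehouse)
```

Any entry in `mismatches` points at a transformation rule, or a source assumption, that deserves a conversation.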

These steps lay the foundation for making data work for you and your business.



3 Great Reasons to Build a Data Warehouse

Why should you build a Data Warehouse?

What problems do a Data Warehouse and Business Intelligence platform solve?

There are strong debates about the methods chosen for building a data warehouse, or choosing a business intelligence tool.

Here are three great reasons for building a data warehouse.

Make more money

The initial cost of building a data warehouse can appear to be large. However, what is the cost in time for the people who are analyzing the data without a data warehouse? Ultimately each department, analyst, or business unit is going through a similar process of getting data, putting it in a usable format, and storing it for reporting purposes (ETL). After going through this process they have to create reports, prepare presentations, and perform analysis. The immediate time savings benefit comes to these folks, who do not have to worry about finding the data once the data warehouse platform is built.

The following two points also allow you to make more money.

Make better decisions

In order to better know your customers, you must first better understand what they want from you. Once the people who spend most of their time analyzing the data no longer have to spend so much time finding it, and can focus on reviewing the data and making recommendations, the speed of decision making will increase. As better decisions are made, more decisions can be made faster. This increases agility, improves response time to the customer or environment, and intensifies decision making processes.

Once a decision making platform is built you can better see which type of customer is purchasing what type of product. This allows the marketing department to advertise to those types of customers. The merchandising department can ensure products are available when they are wanted. Purchasing can better anticipate getting raw materials so products are available. Inventory can best be managed when you are able to anticipate orders, shortages, and re-orders.

Make lasting impressions

Customer service is improved when you better understand your customer. When you can recommend to your customers other products that they may like you become a partner to your customer. Amazon does an amazing job of this. Their recommendation engine is closely tied to their historical data, and pattern matching of which products are similar. Likewise, you may want to tell a customer that they may not want something that they want to purchase because a better solution is available. This makes a lasting impression on them that you are the one to help them in their decision making process.
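To give a feel for how historical data can drive a recommendation, here is a deliberately simple "bought together" sketch. It is not Amazon's method, which is not described here; the functions, products, and orders are hypothetical, and the approach is plain co-occurrence counting over past orders.

```python
from collections import Counter
from itertools import combinations


def build_co_purchases(orders):
    """Count how often each pair of products appears in the same order."""
    pairs = Counter()
    for order in orders:
        for a, b in combinations(sorted(set(order)), 2):
            pairs[(a, b)] += 1
            pairs[(b, a)] += 1
    return pairs


def recommend(product, pairs, limit=3):
    """Products most often bought with `product`, most frequent first."""
    scores = Counter({b: n for (a, b), n in pairs.items() if a == product})
    return [item for item, _ in scores.most_common(limit)]


# Made-up order history.
orders = [
    ["tent", "sleeping bag", "lantern"],
    ["tent", "sleeping bag"],
    ["tent", "stove"],
    ["lantern", "batteries"],
]
pairs = build_co_purchases(orders)
suggestions = recommend("tent", pairs)
```

Even this crude version only works if the order history is collected in one place and can be trusted, which is exactly what the warehouse provides.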

Make data work

Building a data warehouse platform is one of the best ways to make data work for you, rather than you having to work for your data.



Datagraphy or Datalogy?

What is the study of data management best practices?

Do data management professionals study Datagraphy, or Datalogy?

A few of the things that a data management professional studies and applies are
  • Tools
    • Data Modeling tools
    • ETL tools
    • Database Management tools
  • Procedures 
    • Bus Matrix development
    • User session facilitation
    • Project feedback and tracking
  • Methodologies 
    • Data Normalization
    • Dimensional Modeling
    • Data Architecture approaches

These, among many others, are applied to the needs of the business. Our application of these best practices makes our enterprises more successful.

What should be the suffix of the word that sums up our body of knowledge?

Both "-graphy" and "-logy" make sense, but let's look at these suffixes and their meanings.


The wiki page for "-graphy" says: -graphy is the study, art, practice or occupation of...

The dictionary entry for "-graphy" says: "a process or form of drawing, writing, representing, recording, describing, etc., or an art or science concerned with such a process"


The wiki page for "-logy" says: -logy is the study of (a subject or body of knowledge).

The dictionary entry for "-logy" says: a combining form used in the names of sciences or bodies of knowledge.


The key word that we all focus on is data. 

In a previous blog entry, I wrote a review of the DAMA-DMBOK, which is the Data Management Association Data Management Body Of Knowledge.

Data Management professionals study and contribute to this body of knowledge. As a data guy, I am inclined to study the works of those who have gone before. I want to both learn from their successes and avoid solutions that have been unsuccessful.

Some of the writings I study are by people like Dan Linstedt, Len Silverston, Bill Inmon, Ralph Kimball, Karen Lopez, William McKnight, and many others.

I have seen first hand what happens to a project when expertise from the body of knowledge produced by these professionals has been discarded. It is not pretty. 

Why do I study these particular authors? These folks share their experiences. When I face an intricate problem, I research some of their writings to see what they have done. Some tidbit of expertise they have written about has shed light on many problems I have faced, helping me to find the solution that much sooner.

When I follow their expertise my solutions may still be unique, but the solutions fit into patterns that have already been faced. I am standing on the shoulders of giants when I heed their advice. 

When I am forced to ignore their advice, I struggle, fight and do battle with problems that either should not be solved or certainly not be solved in the manner in which I am forced to solve them. 

Should the study of and contribution to the body of knowledge of data management be called data-graphy or data-logy? 


The term Datagraphy sums up the study of the data management body of knowledge succinctly.

I refer back to the dictionary definition of the suffix "-graphy": "a process or form of drawing, writing, representing, recording, describing, etc., or an art or science concerned with such a process"

Data is recorded, described, written down, written about, represented (in many ways), and used as a source for many drawings and graphical representations.

What do you think? I will certainly be using Datagraphy.


Data is killing us!

Are you drowning in Data?

You have a number of applications collecting various pieces of data in order to run your business. What do you have to do in order for an analyst to make an informed decision?

For the majority of your business operations, dashboards should show current activity. Thresholds can be established for when a particular event takes place and alerts sent automatically. Simulations can be run based on past performance to gauge or even predict the performance of what-if scenarios.
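The threshold-and-alert part of that picture needs very little machinery. Below is a minimal sketch; the `check_thresholds` function, metric names, and limits are hypothetical, and a real system would send the messages somewhere rather than just collect them.

```python
def check_thresholds(metrics, thresholds):
    """Return an alert message for every metric outside its allowed range."""
    alerts = []
    for name, value in metrics.items():
        # Metrics with no configured limits are never flagged.
        low, high = thresholds.get(name, (None, None))
        if low is not None and value < low:
            alerts.append(f"{name} is {value}, below minimum {low}")
        elif high is not None and value > high:
            alerts.append(f"{name} is {value}, above maximum {high}")
    return alerts


# Made-up current readings and limits: (minimum, maximum), None means no limit.
current = {"orders_per_hour": 12, "disk_used_pct": 91, "open_tickets": 7}
limits = {"orders_per_hour": (20, None), "disk_used_pct": (None, 85)}
alerts = check_thresholds(current, limits)
```

The hard part is not the check itself but having current, trustworthy numbers to feed it.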

All of these things can be done, the question is: Are they being done?


Are there so many copies of your application databases that the cost of servers, disk arrays, and storage is going through the roof?

Are multiple people required to keep track of which backups and restores are done on a nightly basis, driving personnel costs up?

Are business analysts spending more time collecting data than understanding, interpreting and making recommendations, reducing efficiency?

There is a better way.

A person who studies the practices of data management, and the applicability of the various data management tools, procedures, or methodologies to the needs of the business, can make a difference in the use of an organization's data.

This difference can be measured in many ways. It could be an increase in revenue because a relationship was found in the data that could not have been seen before a new business intelligence system was deployed. It could be cost savings of physical equipment.

More often it is the saving of personnel time associated with gathering data just to answer questions.

Some proponents of vendor solutions will suggest that they have all of the answers to your data needs. Perhaps some vendors do have solutions. However, bringing in a vendor solution will not relieve an organization of the responsibility of data management.

The best way to work with vendors is to get them to fully understand all of the pain points associated with your data. No single vendor can solve all problems. Smart people with a vested interest in making your company successful will help you manage your data.

Proliferation of data makes an organization stronger. If data is killing you, then you need someone to tame the beast and make data work for you.

Make your data work for you, rather than you work for your data.

Who are the people that will make your data work for you? A database administrator is a good start; many I have spoken to have plenty of ideas for how to make things better.

A data architect is the best start. Data Architects are the people that have studied data management best practices. A great Data Architect can quickly come to an understanding of your pain points and make recommendations that can be done soon to make sure that data works for you.
