Comparing Data Science and Business Intelligence
Over the past few years as I have been supporting more and more "non-traditional" (i.e. not a Data Warehouse or Data Mart) analytical platforms, I have noticed a number of differences between Data Science approaches and Business Intelligence approaches.
This image sums up many of my observations and gives a touch point for comparing the differences as well as similarities between the two approaches.
Reproducible versus Repeatable
One of the goals of #DataOps is to keep data moving to the right location in a repeatable, automatized manner. Most of the data warehouse environments I have worked on, the person doing the analysis does not run the ETL jobs. Todays data flows into existing data marts, dashboards, dimensional models, and queries that drive it all. These are repeatable processes.
Performing a Reproducible process on the other hand shows the entire process soup to nuts. The analyst pulled this data from that system, used this transformation on these data elements, combined this data with that data, ran this regression, and produced this result. Therefore if we raise the price of this widget by $.05 we will have this lift in profit (Ceteris paribus).
Predictive versus Descriptive
As described above the Data Scientist attempts to make a prediction about something, whereas the Business Intelligence analyst is usually reporting on a condition that is considered a Key Performance Indicator of the company.
Explorative versus Comparative
In most Business Intelligence environments I have worked with, the questions are usually along these lines: "Is this product selling more than that product?"
The Data Scientist would want to look at what product has the highest margin, or the product that has the largest impact on the bottom line. If someone buys product X, do they also purchase product Y?
What else is impacting this particular store? Does weather have an impact on purchase patterns? What about twitter hashtags? What in our product line is most similar to a product that has a high purchase volume in the overall consumer community.
Attentive versus Advocating
Data Scientist: The data shows us that consumers that purchased X also purchased Y. I suggest we relocate Y for the stores in this geographic area by 1 meter away from X, and in this geographic area they should be 2 meters away. Then we will analyze the same visit purchases for those two items to determine if this should be done in all stores.
Business Intelligence: The latest data from our campaign is shown here. The response rate among 18-24 year old males is less than what we wanted but we expect to see more lift in the coming weeks.
Accepting versus Prescriptive
Data Scientist: Give me all data, I will analyze it as is, and determine what needs to be cleaned and what represents further opportunities. If there is a quality issue I will document it as part of the assumptions in my analysis.
Business Intelligence: The data has to be cleaned and high quality before it can be analyzed. No one should see the data before all of the quality checks, verifications, and cleansing processes are done.
Both of these approaches have business value.
I think Data Science will continue to get some press for quite a while, there will always be some amazing break through that someone used the algorithm of the day to solve a business problem. Then the performance of that algorithm will become a metric on a dashboard that is put into a data mart.
The guys and gals in #DataOps will make sure the data is current.
The Data is protected.
The Data is available.
The Data is shown on the right report to the people that are authorized to see it.
Your Data is safe.