
2016-03-03

Bootstrap analysis




There are times in the life of a Data Scientist when you have a bit of data and the broad directive: "See what this says."

I recently faced this situation and came up with a concept for handling it: doing a Bootstrap analysis.

Spend a very short period of time on it (short relative to your initial understanding of the data). Pull the data together and munge it into your tool of choice.
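As an illustration of the "pull together and munge" step, here is a minimal Python sketch using only the standard library. The CSV contents, the column names (`customer_id`, `region`, `spend`, `churned`) and the rule of dropping rows with a missing `spend` are all hypothetical assumptions made for this example, not part of the original approach.

```python
import csv
import io

# Hypothetical raw export; in practice this would come from a file or a query.
RAW = """customer_id,region,spend,churned
1,north,120.5,0
2,south,,1
3,north,80.0,1
4,east,200.0,0
"""

def load_and_munge(text):
    """Parse CSV text into a list of dicts with typed, normalised fields.

    Rows with an empty 'spend' value are dropped (an assumed cleaning
    rule for this sketch; imputing is another option).
    """
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        if not rec["spend"]:
            continue  # skip incomplete rows rather than impute
        rows.append({
            "customer_id": int(rec["customer_id"]),
            "region": rec["region"].strip().lower(),
            "spend": float(rec["spend"]),
            "churned": rec["churned"] == "1",
        })
    return rows

data = load_and_munge(RAW)
print(len(data))  # 3
```

The point is to get to a typed, reasonably clean table quickly, not to build a robust pipeline; that can wait until the quick analysis shows the data is worth it.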

Then come up with broad features that describe the data set at a high level. These features are an enrichment of the data set itself. In R terms, that means adding a column to a data frame; in SQL terms, it could mean creating a new table with a ranking column based on some grouping criteria.
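A minimal Python sketch of this kind of enrichment follows, assuming hypothetical rows with `region` and `spend` fields. It adds a threshold flag (`high_spender`, with an assumed cutoff of 100) and a rank within each region that mimics SQL's `RANK() OVER (PARTITION BY region ORDER BY spend DESC)`, where tied values share a rank.

```python
from collections import defaultdict

# Hypothetical, already-munged rows (field names are illustrative only).
rows = [
    {"id": 1, "region": "north", "spend": 120.5, "churned": False},
    {"id": 2, "region": "north", "spend": 80.0, "churned": True},
    {"id": 3, "region": "east", "spend": 200.0, "churned": False},
    {"id": 4, "region": "north", "spend": 120.5, "churned": True},
    {"id": 5, "region": "east", "spend": 35.0, "churned": True},
]

def enrich(rows, spend_threshold=100.0):
    """Return copies of rows with two broad, high-level features added.

    - 'high_spender': True when spend >= spend_threshold (assumed cutoff).
    - 'region_rank': rank of 'spend' within its region, highest first;
      ties share a rank, as with SQL's RANK().
    """
    # Group spend values by region (the "partition").
    by_region = defaultdict(list)
    for r in rows:
        by_region[r["region"]].append(r["spend"])

    enriched = []
    for r in rows:
        peers = by_region[r["region"]]
        # Rank = 1 + number of peers with strictly higher spend.
        rank = 1 + sum(1 for s in peers if s > r["spend"])
        enriched.append({**r,
                         "high_spender": r["spend"] >= spend_threshold,
                         "region_rank": rank})
    return enriched

for row in enrich(rows):
    print(row["id"], row["high_spender"], row["region_rank"])
```

The features are deliberately coarse: the goal is a few new columns that summarise the data at a high level, which can then be compared against an outcome.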

Once you have this enriched data set, perform some simple trend analysis of your features against an outcome variable in the original data set. If you see something interesting, that is the thread you can pull to find out more.
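One simple form of trend analysis is comparing an outcome rate across the values of an engineered feature. The Python sketch below assumes hypothetical rows carrying a boolean feature `high_spender` and a boolean outcome `churned`; the helper name `outcome_rate_by_feature` is also illustrative.

```python
from collections import defaultdict

# Hypothetical enriched rows: one engineered feature plus an outcome.
rows = [
    {"high_spender": True, "churned": False},
    {"high_spender": True, "churned": False},
    {"high_spender": True, "churned": True},
    {"high_spender": False, "churned": True},
    {"high_spender": False, "churned": True},
    {"high_spender": False, "churned": False},
]

def outcome_rate_by_feature(rows, feature, outcome):
    """Compare the mean of a boolean outcome across values of a feature.

    Returns {feature_value: (count, outcome_rate)}. A large gap between
    groups is a thread worth pulling, not a conclusion.
    """
    counts = defaultdict(int)
    hits = defaultdict(int)
    for r in rows:
        key = r[feature]
        counts[key] += 1
        hits[key] += int(r[outcome])
    return {k: (counts[k], hits[k] / counts[k]) for k in counts}

summary = outcome_rate_by_feature(rows, "high_spender", "churned")
for value, (n, rate) in sorted(summary.items()):
    print(f"high_spender={value}: n={n}, churn rate={rate:.2f}")
```

With such small toy counts, a difference like this is only a prompt for further digging, which is exactly the role the quick analysis is meant to play.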

Adding more context to the data will yield further insight, but this initial analysis will show you which additional contextual information would be useful.

This "limited" analysis should let you answer the initial question: how long will it take to get something useful out of this data?

This is a different concept from the statistical resampling technique known as "bootstrapping"; others have explained that technique in far more detail, and far better than I ever could. This is just a way to communicate to stakeholders that some time needs to be given to an initial analysis of a data set to get an idea of how deep the rabbit hole goes...





