Becoming a Data Scientist

Everyone's path to becoming a Data Scientist is different.

This makes it difficult to recommend to someone outside the academic world how to reinvent themselves in the image of a Data Scientist.

Joel Grus gives a succinct overview of what a Data Scientist should be able to do. I first saw this image on Twitter, and after searching around I found his presentation on SlideShare.

Joel makes some great points about learning Data Science.

Science is a tool for understanding the world around us.

"Science is a systematic enterprise that creates, builds and organizes knowledge in the form of testable explanations and predictions about the universe."

Some people may think of Data Science as one of the tools used by Scientists wearing white coats in pristine labs. But you can begin doing Science in your own kitchen. I am much more a follower of Bill Nye than of Brian Cox.

One of the earliest Data predictions I did was to answer the following question: How big should this database be?

I created a Data Model in ERwin, then did a volumetric estimation based on daily table growth.

I had a model (the Entity Relationship Diagram). I had some assumptions (we are adding this many records every day). I was able to do forecasting (another word for prediction).

Using this information, we knew how to size the database server for this application, and we used it as part of the infrastructure pricing estimates given to management.
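A minimal sketch of that kind of volumetric estimate might look like the following. The table names, row sizes, and growth rates are hypothetical placeholders, not figures from the original project, and the calculation assumes a constant average row size and linear daily growth.

```python
from dataclasses import dataclass


@dataclass
class TableEstimate:
    """Sizing assumptions for one table (all values are illustrative)."""
    name: str
    avg_row_bytes: int       # assumed average bytes per row, indexes included
    current_rows: int        # rows in the table today
    rows_added_per_day: int  # assumed constant daily growth


def projected_size_bytes(tables, days):
    """Project the total database size after `days` of linear growth."""
    total = 0
    for table in tables:
        rows = table.current_rows + table.rows_added_per_day * days
        total += rows * table.avg_row_bytes
    return total


# Hypothetical model with two tables.
tables = [
    TableEstimate("orders", 200, 1_000_000, 5_000),
    TableEstimate("order_lines", 120, 4_000_000, 20_000),
]

size_today = projected_size_bytes(tables, 0)
size_in_one_year = projected_size_bytes(tables, 365)
print(f"Today:       {size_today / 1024**3:.2f} GiB")
print(f"In one year: {size_in_one_year / 1024**3:.2f} GiB")
```

The model is the list of tables, the assumptions are the per-table growth rates, and the forecast is the projected size. Changing any assumption and re-running shows how sensitive the server sizing is to it.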

While this may not be considered "proper data science" by many people, it is a simple example of how to begin using Science in even the most mundane way.

Sure, there are some foundation components like Math, Data Management, Programming skills, Business acumen, and Data Visualization that need to be learned and understood. But the best way to start is to try to solve a problem in your own universe.

So the question before us is: How do I become a Data Scientist?

Here are some steps to follow:

  1. Pick a problem that interests you. (Ask an interesting question.)
  2. Learn all you can about the problem. (What is already known about the problem?) 
  3. Collect as much data as you can about the problem. (From as many sources as you can.)
  4. Make a prediction about the problem. (Don't worry if others have already done this.)
  5. Be wrong! (This is most important. Fail first, fail fast, and fail often!)
  6. Figure out what you did wrong, and correct it. (Then iterate. Go back to learn more (2), get more data (3), or update your prediction (4) based on new information.)
  7. Be right! (Finally! Now show why you were correct, and how you can apply this to other domains.)
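As a toy illustration of steps 4 through 7, here is a minimal sketch of making a prediction, measuring how wrong it is, and correcting it. The observations are made-up numbers chosen for illustration, and the two "models" (a flat guess and a straight line through the first and last points) are just assumptions for the example, not a recommended forecasting method.

```python
def mean_abs_error(predicted, actual):
    """Average absolute difference between predictions and observations."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical observations: rows added per day over one week.
observed = [100, 110, 120, 130, 140, 150, 160]

# Step 4: first prediction. Assume growth stays flat at the first day's value.
first_guess = [observed[0]] * len(observed)

# Step 5: be wrong, and measure by how much.
first_error = mean_abs_error(first_guess, observed)

# Step 6: correct it. The data climbs steadily, so estimate a daily slope
# from the first and last observations and predict along that line.
slope = (observed[-1] - observed[0]) / (len(observed) - 1)
second_guess = [observed[0] + slope * day for day in range(len(observed))]

# Step 7: check whether the corrected prediction is closer.
second_error = mean_abs_error(second_guess, observed)

print(f"Flat guess error:  {first_error}")
print(f"Trend guess error: {second_error}")
```

Real data will not fit this neatly, which is the point of iterating: each round of error tells you whether to learn more, collect more data, or change the prediction.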

If you are just starting on the journey to become a Data Scientist, do this a few times. The idea is to learn this process. The tools will change based on your environment, the specific problem you are trying to solve, and what tools your employer will allow you to use.

The point is, as Joel Grus, Bill Nye, and Nike say: Just do it!
