My introduction to R


Some time ago, I started down the path of learning statistics. I had never studied the topic in detail; I had picked up a bit of statistics over time while doing other things, but never formally.

I recognized I needed to learn more formally. 

In a number of the lectures I followed, several professors referenced the R language.

I had heard of SAS, and had even worked with people who needed data from the various repositories I managed imported into SAS.

So I called one of my friends who is a big SAS user, and asked him about R. 

His response: R is basically a clean-room version of SAS, but you have to write a lot of code to do the same things SAS does.

Now this is something I could get into. 

More formally, from Wikipedia:
"R is a programming language and software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. Polls, surveys of data miners, and studies of scholarly literature databases show that R's popularity has increased substantially in recent years." -- R programming language

I have been a DBA and a data architect for quite some time. Generally, the types of systems I had built up to that point were analytical frameworks. The need for these is apparent with the majority of the Business Intelligence tools that are out there.

As a general rule, the performance of a BI tool is almost entirely dependent on the (dimensional) data model that the tool reads from.

There are a few exceptions to this rule, but it has been a guiding rule for the majority of my career.

Now, with R there is not as much of a need to structure the data up front to support the analysis. R is a programming language; as such, you can do the necessary data munging within your code.
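As a rough sketch of what munging in code can look like, here is a small base R example. The `sales` data frame, its columns, and its values are all made up for illustration; they stand in for a messy extract from some source system.

```r
# Hypothetical raw data, standing in for an extract from a source system.
sales <- data.frame(
  region = c("east", "West", "EAST", "west", "east"),
  amount = c("100", "250", NA, "75", "300"),
  stringsAsFactors = FALSE
)

# Normalize inconsistent labels and convert text amounts to numbers.
sales$region <- tolower(sales$region)
sales$amount <- as.numeric(sales$amount)

# Drop rows with a missing amount, then total the amounts by region.
clean <- sales[!is.na(sales$amount), ]
totals <- aggregate(amount ~ region, data = clean, FUN = sum)
print(totals)
```

None of this needs a modeled staging table first; the cleanup lives right next to the analysis that uses it.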

And since R is a vector-based language, you can use set-based (vectorized) operations, which are typically far faster than for loops, cursors and the like.
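To make the difference concrete, here is a minimal sketch that computes the same thing both ways. The `prices` vector and the 8% `tax_rate` are invented for the example.

```r
# Hypothetical data: one million prices and a flat 8% tax rate.
set.seed(42)
prices <- runif(1e6, min = 1, max = 100)
tax_rate <- 0.08

# Loop version: handle one element at a time.
with_tax_loop <- numeric(length(prices))
for (i in seq_along(prices)) {
  with_tax_loop[i] <- prices[i] * (1 + tax_rate)
}

# Vectorized version: one expression over the whole vector.
with_tax_vec <- prices * (1 + tax_rate)

# Both approaches produce the same result.
stopifnot(isTRUE(all.equal(with_tax_loop, with_tax_vec)))
```

Wrapping each version in `system.time()` is an easy way to compare them on your own machine. The exact gap depends on your R version and hardware, but the vectorized form is usually both shorter and quicker.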

R has many packages for various types of analysis: machine learning, sentiment analysis, data mining, and various types of regression.

Only a few years ago, I would have needed a SAS license to attempt any of these.

Since R is open-source, I am able to download it and run with it with no "request" and "approval" process. I don't have to justify an expenditure to get a tool that helps me do my job.

If you are a data architect, DBA, or in another DataOps role, I encourage you to check out R. You will find a powerful new tool in your toolbox.

I will be writing a bit more about R over time as well.
