Kaggle?
Do you even Kaggle, Bro?
Kaggle logo (Photo credit: Wikipedia)
Basically, Kaggle is a site that periodically hosts competitions for machine learning and data mining practitioners.
Generally, the format of a competition is this:
Here is a training dataset.
Here is a test dataset.
Here is a sample submission file.
The training and test datasets contain the "key" features that the competition designers recognize as important, with one row per observation. The training dataset also includes a particular "target" or "outcome" variable, and the biggest difference between the two is that the test dataset does not have it.
This is what you need to predict accurately.
The sample submission file generally has two columns: an ID variable and the response variable.
The ID is a unique identifier for a row in the test dataset, and the response is what you and your algorithm predict for that row.
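The train/test/submission layout can be sketched in a few lines of Python. This is only an illustration: the toy data, the column names ("ID", "x", "target"), and the constant "predict the training mean" baseline are all hypothetical stand-ins for a real competition's files and a real model.

```python
import csv
import io

# Hypothetical toy data. Real competitions ship files like train.csv and
# test.csv, and the column names differ from contest to contest.
TRAIN_CSV = """ID,x,target
1,0.5,0
2,1.5,1
3,2.5,1
4,3.5,0
"""
TEST_CSV = """ID,x
10,0.7
11,2.9
"""

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def make_submission(train_rows, test_rows, target="target"):
    """Return submission CSV text with one (ID, prediction) row per test row.

    The "model" is deliberately trivial: every test row is predicted as the
    mean of the training target (a constant baseline).
    """
    mean_target = sum(float(r[target]) for r in train_rows) / len(train_rows)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ID", target])
    for row in test_rows:
        writer.writerow([row["ID"], mean_target])
    return out.getvalue()

train_rows = read_rows(TRAIN_CSV)
test_rows = read_rows(TEST_CSV)

# The test set has no outcome column: that is what we have to predict.
assert "target" not in test_rows[0]

submission = make_submission(train_rows, test_rows)
print(submission)
```

Swapping the constant baseline for an actual model changes only how each prediction is computed; the submission keeps the same two-column shape.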
There is an evaluation process behind the scenes. Once you submit your predictions, they are compared to a set of "known" values that the group hosting the competition holds back and considers accurate.
A score is calculated from your predictions versus those accepted values. Metrics such as the F-score, RMSE (root mean squared error), or area under the ROC curve (AUC) are used as the score.
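To make the scoring step concrete, here is a minimal sketch of two of those metrics written from scratch in Python. It is not Kaggle's actual scoring code; it just shows what RMSE and ROC AUC compute when your predictions are compared against the hidden values. The AUC here uses the pairwise (Mann-Whitney) formulation: the fraction of (positive, negative) pairs where the positive example gets the higher score, with ties counting as half.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2)). Lower is better."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison. Higher is better.

    y_true holds 0/1 labels; scores holds the predicted scores or
    probabilities. O(n_pos * n_neg), which is fine for a small example.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))

# Hypothetical hidden labels/values and submitted predictions.
print(rmse([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))   # every error is 1
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))       # 3 of 4 pairs ordered correctly
```

In practice you would reach for a library implementation (scikit-learn has both), but the hand-written versions show that the score is purely a function of your predictions and the held-back values.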
These competitions span a wide variety of industries, and some of them offer cash prizes.
Since there is money on the line, some groups take these competitions incredibly seriously.
They will dedicate significant resources and time to winning. As such, the leaderboard may not be an accurate representation of how well someone knows an algorithm or methodology; rather, it may reflect the amount of time and resources dedicated to winning the competition.
For example, in the recent Springleaf competition, the top score on the leaderboard is .80427 and second place is .80394, a difference of .00033.
The difference between my score and the leader's was .04759!
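To put those numbers side by side, here is the arithmetic in Python, using Decimal so the five-digit scores are not blurred by floating-point rounding. The implied score for my entry is derived, not quoted: it assumes both gaps are measured on the same leaderboard and that higher scores are better, as the ranking implies.

```python
from decimal import Decimal

# Leaderboard figures quoted in the post (Springleaf competition).
first = Decimal("0.80427")   # 1st place
second = Decimal("0.80394")  # 2nd place
my_gap = Decimal("0.04759")  # my distance from the leader

top_gap = first - second     # margin between 1st and 2nd place
my_score = first - my_gap    # implied score for my entry (derived)
ratio = my_gap / top_gap     # how many 1st-vs-2nd margins fit in my gap

print(top_gap)               # 0.00033
print(my_score)              # 0.75668
print(round(ratio, 1))       # 144.2
```

In other words, my gap to the leader was roughly 144 times the margin separating first and second place.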
These competitions can be fun and exciting, and they provide opportunities to try new tools, methods, and algorithms. However, unless you dedicate a considerable amount of time and energy to them, you are unlikely to come in first place.