How to get started in a Kaggle competition
![Page 1: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/1.jpg)
How to get started in Kaggle
Merja Kajava September 1, 2015
Helsinki Data Analytics and Science Meetup
![Page 2: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/2.jpg)
Kaggle is a competition platform for (aspiring) data scientists
![Page 3: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/3.jpg)
![Page 4: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/4.jpg)
Why participate in Kaggle
The data
Competition steps
Tips
![Page 5: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/5.jpg)
Why participate in a Kaggle competition?
![Page 6: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/6.jpg)
1 Learn from the best.
Forums
Scripts
Solutions from prize winners
![Page 7: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/7.jpg)
2 Work with cool datasets.
Flights in GE Flight Quest
Driver telematics analysis
Amazon employee access rights
![Page 8: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/8.jpg)
+1 You can also win money
![Page 9: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/9.jpg)
What kinds of competitions does Kaggle have?
![Page 10: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/10.jpg)
Public
In-class
Private
![Page 11: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/11.jpg)
What languages can you use?
Any open-source language (sometimes also the sponsor’s proprietary languages)
GNU Octave (no MATLAB)
![Page 12: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/12.jpg)
What is the data like?
![Page 13: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/13.jpg)
Data comes from companies and non-profit organizations
![Page 14: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/14.jpg)
Data sizes vary
Zip ~1 MB
Zip ~6 GB
![Page 15: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/15.jpg)
Data comes in all shapes
Customer data
Log files
Time series
HTML pages
Images
Documents
![Page 16: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/16.jpg)
How does a Kaggle competition work?
![Page 17: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/17.jpg)
Competition flow
Duration: typically 4 to 8 weeks
Max. 5 entries per day
![Page 18: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/18.jpg)
Build prediction model
train -> Model
test + Model -> Predict -> submission.csv
Calculate a local cross-validation (CV) score
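The flow on this slide can be sketched in a few lines. This is a minimal illustration, not the speaker's code: the toy data, the one-variable least-squares "model", the RMSE metric, and the `Id`/`Prediction` column names are all assumptions. Real competitions define their own file formats and metrics.

```python
import csv
import io
import statistics

# Hypothetical toy data: (feature, target) rows for training, (id, feature) rows for test.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1), (6.0, 12.2)]
test = [(101, 7.0), (102, 8.0)]


def fit(rows):
    """Fit y = a * x + b by ordinary least squares (a stand-in for any model)."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def predict(model, x):
    a, b = model
    return a * x + b


def rmse(model, rows):
    return (sum((predict(model, x) - y) ** 2 for x, y in rows) / len(rows)) ** 0.5


def cross_validate(rows, k=3):
    """Average hold-out RMSE over k folds (fold i holds out every k-th row)."""
    scores = []
    for i in range(k):
        held_out = rows[i::k]
        kept = [r for j, r in enumerate(rows) if j % k != i]
        scores.append(rmse(fit(kept), held_out))
    return statistics.fmean(scores)


cv_score = cross_validate(train)

# Refit on all training rows, then predict the test set.
model = fit(train)
buffer = io.StringIO()  # swap for open("submission.csv", "w", newline="") to write a file
writer = csv.writer(buffer)
writer.writerow(["Id", "Prediction"])  # assumed column names
for row_id, x in test:
    writer.writerow([row_id, round(predict(model, x), 3)])
submission_csv = buffer.getvalue()
```

The local CV score is the number to compare between experiments before spending one of the daily entries.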
![Page 19: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/19.jpg)
Evaluate submission
Typical evaluation metrics
Area under the ROC curve
Normalized Gini coefficient
RMSLE
…
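As a rough guide to what these metrics compute, here is a plain-Python sketch. The function names are made up for illustration. The Gini version below relies on the identity Gini = 2 × AUC − 1, which applies to binary labels only; competitions with continuous targets compute the normalized Gini differently, so always check the competition's own evaluation page.

```python
import math


def rmsle(actual, predicted):
    """Root mean squared logarithmic error; inputs must be non-negative."""
    squared = [(math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def roc_auc(labels, scores):
    """Area under the ROC curve: the fraction of (positive, negative) pairs
    ranked correctly, counting ties as half. O(n^2), fine for a sketch."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def normalized_gini_binary(labels, scores):
    """Binary labels only: normalized Gini taken as 2 * AUC - 1."""
    return 2.0 * roc_auc(labels, scores) - 1.0
```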
![Page 20: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/20.jpg)
Submit entry
submission.csv
Public leaderboard: scored on ~10-30% of test data
Private leaderboard: scored on the rest of the test data
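To see why the two leaderboards can disagree, the split can be simulated locally. This is only a sketch under assumptions: the 30% public fraction, the mean absolute error metric, the random data, and the function names are all illustrative and not how any particular competition is scored.

```python
import random


def mean_abs_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def leaderboard_scores(actual, predicted, public_fraction=0.3, seed=0):
    """Score one submission on a fixed public subset and on the private remainder."""
    indices = list(range(len(actual)))
    random.Random(seed).shuffle(indices)  # the split stays the same for every entry
    cut = int(len(indices) * public_fraction)
    public, private = indices[:cut], indices[cut:]

    def score(subset):
        return mean_abs_error([actual[i] for i in subset], [predicted[i] for i in subset])

    return score(public), score(private)


# Hypothetical hidden labels and one participant's noisy predictions.
rng = random.Random(42)
hidden = [rng.uniform(0, 10) for _ in range(200)]
preds = [y + rng.gauss(0, 1) for y in hidden]
public_score, private_score = leaderboard_scores(hidden, preds)
```

The two scores come from different rows, so a small gain on the public subset does not guarantee a gain on the private one.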
![Page 21: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/21.jpg)
Choose two entries for final scoring
A practical choice:
Best entry on the public leaderboard +
Entry with the best local CV score
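With a small log of entries, the practical choice above takes a few lines. The entry ids and scores here are invented, and the sketch assumes a higher-is-better metric such as AUC; for an error metric, use `min` instead of `max`.

```python
# Hypothetical submission log: scores use a higher-is-better metric.
entries = [
    {"id": "sub_01", "cv": 0.781, "public_lb": 0.774},
    {"id": "sub_02", "cv": 0.793, "public_lb": 0.779},
    {"id": "sub_03", "cv": 0.790, "public_lb": 0.785},
]

best_public = max(entries, key=lambda e: e["public_lb"])
# Pick the best CV entry among the rest so the two final choices differ.
best_cv = max((e for e in entries if e is not best_public), key=lambda e: e["cv"])
final_choices = [best_public["id"], best_cv["id"]]
```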
![Page 22: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/22.jpg)
Tips
![Page 23: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/23.jpg)
Look at data. Visualize it.
Sources:
https://www.kaggle.com/justfor/liberty-mutual-group-property-inspection-prediction/explore-data/notebook
https://www.kaggle.com/odiseo1982/liberty-mutual-group-property-inspection-prediction/compare-variables-between-train-and-test/files
![Page 24: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/24.jpg)
Focus on feature engineering
Feature selection
Feature construction
Dates
Locations
Categories
Segmentation
Statistics
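Two of these feature types, dates and categories, are easy to illustrate. The sketch below is an assumption-laden example: the raw row, its column names, and the helper functions are hypothetical and only show the general idea of turning one raw column into several numeric ones.

```python
from datetime import date


def date_features(d):
    """Expand one date into simple numeric features."""
    return {
        "year": d.year,
        "month": d.month,
        "weekday": d.weekday(),  # Monday is 0
        "is_weekend": int(d.weekday() >= 5),
    }


def one_hot(value, categories):
    """Encode a categorical value as 0/1 indicator columns."""
    return {f"is_{c}": int(value == c) for c in categories}


# Hypothetical raw row; the column names are made up for illustration.
row = {"created": date(2015, 9, 1), "city": "Helsinki"}
features = {**date_features(row["created"]), **one_hot(row["city"], ["Helsinki", "Espoo"])}
```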
![Page 25: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/25.jpg)
Build different models
3 target variables × 4 cities = 12 models to build
Source: https://www.kaggle.com/c/see-click-predict-fix/visualization/1390
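The 3 × 4 arithmetic maps directly onto a loop over every (target, city) pair. In this sketch the target and city names are placeholders, the rows are invented, and a per-city mean stands in for fitting a real model.

```python
from itertools import product
from statistics import fmean

# Placeholder names; substitute the competition's real targets and segments.
targets = ["target_1", "target_2", "target_3"]
cities = ["city_a", "city_b", "city_c", "city_d"]

# Hypothetical training rows: two per city, one value per target.
rows = [
    {"city": c, "target_1": i + 1.0, "target_2": i * 2.0, "target_3": 0.5}
    for i, c in enumerate(cities * 2)
]

models = {}
for target, city in product(targets, cities):
    subset = [r[target] for r in rows if r["city"] == city]
    models[(target, city)] = fmean(subset)  # stand-in for fitting a real model
```

Keying the dictionary by `(target, city)` makes it easy to look up the right model for each test row at prediction time.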
![Page 26: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/26.jpg)
Try different algorithms
Random forest
Vowpal Wabbit
GBM
XGBoost
![Page 27: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/27.jpg)
Build ensembles
Average of submissions
Weighted average of submissions
Ranked average of submissions
Stacked generalization
Blending
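The three averaging schemes can be sketched in plain Python as follows. The function names and the two toy submissions are assumptions for illustration; stacked generalization and blending need a second-level model trained on held-out predictions and are not shown here.

```python
def average(predictions):
    """Plain mean of several submissions, row by row."""
    return [sum(col) / len(col) for col in zip(*predictions)]


def weighted_average(predictions, weights):
    """Weighted mean, row by row; weights need not sum to 1."""
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, col)) / total for col in zip(*predictions)]


def ranks(values):
    """Rank each value from 0 (lowest) to n - 1 (highest); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def rank_average(predictions):
    """Average per-submission ranks scaled to [0, 1]; needs at least two rows."""
    n = len(predictions[0])
    ranked = [[r / (n - 1) for r in ranks(p)] for p in predictions]
    return average(ranked)


# Two hypothetical submissions for the same three test rows.
sub_a = [0.1, 0.9, 0.4]
sub_b = [0.3, 0.7, 0.8]
mean_blend = average([sub_a, sub_b])
weighted_blend = weighted_average([sub_a, sub_b], [3, 1])
rank_blend = rank_average([sub_a, sub_b])
```

Rank averaging discards the scale of each model's output, which can help when the metric depends only on ordering, as AUC does.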
![Page 28: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/28.jpg)
Keep track of your submissions
Submission id
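One lightweight way to keep track is a CSV log with one row per entry. The fields below are a suggestion, not something the slide prescribes: `Submission`, `write_log`, and the example scores are all hypothetical.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Submission:
    submission_id: str   # your own label for the uploaded file
    description: str     # model, features, parameters
    cv_score: float      # local cross-validation score
    public_score: float  # score shown on the public leaderboard


def write_log(submissions, handle):
    """Write one CSV row per submission, with a header row first."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(Submission)])
    writer.writeheader()
    for s in submissions:
        writer.writerow(asdict(s))


log = [
    Submission("sub_01", "random forest, raw features", 0.781, 0.774),
    Submission("sub_02", "GBM with date features", 0.793, 0.779),
]
buffer = io.StringIO()  # swap for open("submissions.csv", "w", newline="") to write a file
write_log(log, buffer)
log_csv = buffer.getvalue()
```

Recording the CV score next to the public score makes it possible to reproduce an entry later and to choose the final two entries with both numbers in view.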
![Page 29: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/29.jpg)
Next steps
![Page 30: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/30.jpg)
Start competing
Create a Kaggle account
Choose a competition
Go for it!
![Page 31: How to get started in Kaggle competition](https://reader036.fdocuments.in/reader036/viewer/2022062316/58f2a6341a28abe2378b459f/html5/thumbnails/31.jpg)
Useful links
Kaggle Blog
http://blog.kaggle.com
Kaggle Competitions: Where to begin
http://www.analyticsvidhya.com/blog/2015/06/start-journey-kaggle/
Kaggle Feature Engineering
http://machinelearningmastery.com/discover-feature-engineering-how-to-engineer-features-and-how-to-get-good-at-it/
Kaggle Ensembling Guide
http://mlwave.com/kaggle-ensembling-guide/