Kaggle - global Data Science community

Post on 22-Apr-2015

description

slides from the Lviv IT Arena talk

Kaggle – the global community of Data Science professionals

Anastasiia Kornilova

Who am I?

- MS in Applied Mathematics
- 3 years as a Data Scientist

What is Data Science?

Scientific Method

Math

Statistics

Data Engineering

Domain Expertise

Advanced Computing

Visualization

Hacker Mindset

What matters?

What is Kaggle?

2010 - founded in Melbourne, Australia by Anthony Goldbloom

What problem do they solve?

Data problems

Data solvers

In fact, a McKinsey Global Institute report estimates that by 2018, “the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.”

Between 2010 and 2020, the data scientist career path is projected to grow by 18.7 percent, beaten only by video game designers. The big data industry is expected to be a $53.4 billion industry by 2016.

Anyone with "data science" in his or her job title on a LinkedIn page is going to get "100 recruiter emails a day," said Josh Sullivan, who leads a 500-person data-science group at the consulting firm Booz Allen Hamilton Holding

Are you good enough?

First Competition: Forecast Eurovision Song Contest Voting

- $1,000 prize
- 22 teams

Outperformed prediction markets: 7 of the Top 10 countries predicted correctly, versus only 5 for prediction markets.

- 2011 - relocated to San Francisco
- November 2011 - raised $11M in funding
- July 2013 - 100,000 data scientists involved
- February 2014 - more than 140,000 data scientists

Short story of success

How can you use Kaggle?

Types of rewards

- Knowledge
- Money
- Job interview

Competitions for knowledge (always open)

- Digit Recognizer, CIFAR-10, First Steps with Julia
- Titanic: Machine Learning from Disaster
- Bike Sharing Demand
- Learning Social Circles in Networks

Competitions with prizes

Open:

- American Epilepsy Society Seizure Prediction Challenge: $25,000 prize
- Africa Soil Property Prediction Challenge: $8,000 prize
- Tradeshift Text Classification: $5,000 prize

Completed competitions (170+)

- Heritage Health Prize: $500,000
- GE Flight Quest: $250,000
- GE Hospital Quest: $100,000
- Higgs Boson ML Challenge: $13,000 + invitation to CERN
- Galaxy Zoo: $16,000
- KDD Author Paper Identification Challenge
- Job Recommendation Challenge

Job competitions (completed)

Facebook:

- recommend missing links in a social graph (who to follow)
- find the optimal graph path
- predict text tags

Yelp:

- estimate the number of useful votes a review will receive

Walmart:

- predict store sales

+ Job Board

How to win?

Dig into the data

Stay on track

Kaggle competition == Data science?

1. Understand

2. Collect

3. Data exploration

4. Clean and transform

5. Model

6. Validate

7. Communicate results

Deploy

?
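
The numbered steps above can be walked through on a toy problem. The sketch below is only an illustration, not something from the talk: the data is synthetic (a hypothetical relationship y ≈ 3x + 2 with noise and some missing values), all function names are made up for this example, and a one-feature least-squares line stands in for whatever model a real competition would need. It uses only the Python standard library.

```python
import random
import statistics


def collect(n=200, seed=0):
    """Step 2 (collect): synthetic, hypothetical data with y close to 3x + 2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 3.0 * x + 2.0 + rng.gauss(0, 1)
        if rng.random() < 0.1:
            x = None  # simulate a missing measurement
        rows.append((x, y))
    return rows


def explore(rows):
    """Step 3 (data exploration): basic counts before touching the data."""
    missing = sum(1 for x, _ in rows if x is None)
    return {"rows": len(rows), "missing_x": missing}


def clean(rows):
    """Step 4 (clean and transform): drop rows whose feature is missing."""
    return [(x, y) for x, y in rows if x is not None]


def fit_line(rows):
    """Step 5 (model): ordinary least squares with a single feature."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(rows, slope, intercept):
    """Step 6 (validate): root mean squared error on rows not used for fitting."""
    squared = [(y - (slope * x + intercept)) ** 2 for x, y in rows]
    return (sum(squared) / len(squared)) ** 0.5


def run_pipeline(seed=0):
    raw = collect(seed=seed)
    summary = explore(raw)
    rows = clean(raw)
    split = int(0.8 * len(rows))  # simple 80/20 holdout split
    train, holdout = rows[:split], rows[split:]
    slope, intercept = fit_line(train)
    return summary, slope, intercept, rmse(holdout, slope, intercept)


if __name__ == "__main__":
    summary, slope, intercept, error = run_pipeline()
    # Step 7 (communicate results): report what was found in plain terms.
    print(f"{summary['rows']} rows, {summary['missing_x']} with missing x")
    print(f"fitted y = {slope:.2f} * x + {intercept:.2f}, holdout RMSE = {error:.2f}")
```

Steps 1 (understand) and the final deploy step have no counterpart in the code. In a competition the problem statement and the data are handed to you, and the work typically ends at a scored submission, which is one way to read the question mark after "Deploy".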