Why We Need More Data by Lukas Biewald

Post on 12-Aug-2015

Why We Need More Data

Lots of Data

The Effect of Better Algorithms

[Chart: Classifier Error Rate (axis 0% to 25%) for Naïve Bayes, Maximum Entropy, and SVM]

Active Semi-Supervised Learning for Improving Word Alignment (Vamshi ACL ’10)

Real World Data

The Effect of Better Features

[Chart: Classifier Error Rate (axis 0% to 30%) for Unigrams, Bigrams, and Unigrams+Bigrams]

Real World Data

The Effect of More Data

[Chart: Classifier Error Rate (axis 0% to 14%) for 90% Accurate Data, 95% Accurate Data, and 100% Accurate Data]

Active Semi-Supervised Learning for Improving Word Alignment (Vamshi ACL ’10)

Real World Data

The Effect of Cleaner Data

[Chart: Classifier Error Rate (axis 0% to 14%) for training set sizes N, 2N, and 4N]

Where Do Data Scientists Spend Their Time?

Source: CrowdFlower Data Science Report 2015

The Power of Open Data

CrowdFlower Data Enrichment Platform

Color Data

Fleshmap

Drug Side Effects

Apple Watch

Data For Everyone

Collecting the Same Data Over and Over

Open Data

Make Your Data Public Setting

Data for Everyone

Data For Everyone Library

Data for Everyone

Data For Everyone

Categorize URLs

URL Categorization

Open Data API

Record Data

Extracting Names and Titles

Summarization

Is an Image Funny?

Classifying Medical Images

Attributes of People

396 Scripts

TWITTER.COM/CrowdFlower

INFO@CROWDFLOWER.COM

CROWDFLOWER.COM

Thank You

Lukas Biewald

lukas@crowdflower.com

@L2K