Why We Need More Data by Lukas Biewald
Uploaded by crowdflower · Category: Technology
Why We Need More Data
Lots of Data
The Effect of Better Algorithms
[Chart: classifier error rate (0% to 25%) for Naïve Bayes, Maximum Entropy, and SVM]
Source: Active Semi-Supervised Learning for Improving Word Alignment (Vamshi, ACL '10)
Real World Data
The Effect of Better Features
[Chart: classifier error rate (0% to 30%) for unigram, bigram, and unigram+bigram features]
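The chart compares three feature sets: unigrams, bigrams, and both combined. The slides do not describe the experiment's actual setup, so the following is only a minimal sketch of what those feature sets mean for a text classifier. The helper name `ngram_features` and the lowercase whitespace tokenizer are hypothetical, chosen to keep the example self-contained.

```python
from collections import Counter


def ngram_features(text: str, use_unigrams: bool = True, use_bigrams: bool = True) -> Counter:
    """Bag-of-n-grams counts for one document (hypothetical helper).

    Tokenization is a plain lowercase whitespace split, an assumption
    made only for this sketch.
    """
    tokens = text.lower().split()
    features = Counter()
    if use_unigrams:
        # Each single token is one feature, e.g. "good".
        features.update(tokens)
    if use_bigrams:
        # Each adjacent token pair is one feature, e.g. "not good".
        features.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return features


# Unigrams alone cannot tell "not good" from "good"; bigrams capture that context.
combined = ngram_features("not good not bad")
```

The combined setting simply unions the two count vectors, which is why it can help a classifier: it keeps the broad coverage of unigrams while adding the local word-order cues that bigrams carry.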
Real World Data
The Effect of More Data
[Chart: classifier error rate (0% to 14%) for 90% accurate, 95% accurate, and 100% accurate data]
Source: Active Semi-Supervised Learning for Improving Word Alignment (Vamshi, ACL '10)
Real World Data
The Effect of Cleaner Data
[Chart: classifier error rate (0% to 14%) for N, 2N, and 4N]
Where Do Data Scientists Spend Their Time?
Source: CrowdFlower Data Science Report 2015
The Power of Open Data
CrowdFlower Data Enrichment Platform
Color Data
Fleshmap
Drug Side Effects
Apple Watch
Data for Everyone
Collecting the Same Data Over and Over
Open Data
Make Your Data Public Setting
Data for Everyone
Data for Everyone Library
Categorize URLs
URL Categorization
Open Data API
Record Data
Extracting Names and Titles
Summarization
Is an Image Funny?
Classifying Medical Images
Attributes of People
396 Scripts