The Art of Predictive Analytics: More Data, Same Models...

The Art of Predictive Analytics: More Data, Same Models [STUDY SLIDES] Joseph Turian [email protected] @turian MetaOptimize 2012.02.02


Page 1: The Art of Predictive Analytics: More Data, Same Models ...files.meetup.com/1542972/20120202-more-data-same-models...2012/02/02  · The Art of Predictive Analytics: More Data, Same

The Art of Predictive Analytics: More Data, Same Models

[STUDY SLIDES]

Joseph Turian [email protected]

@turian

MetaOptimize

2012.02.02


NOTE: These are the STUDY slides from my talk at the predictive analytics meetup: http://bit.ly/xVLBuS

I have removed some graphics and added some text. Please email me any questions.


Who am I?

Engineer with 20 yrs coding exp

PhD 10 yrs exp: large-scale ML + NLP

Founded MetaOptimize


What is MetaOptimize?

Consultancy + community on:

Large-scale ML + NLP

Well engineered solutions


“Both NLP and ML have a lot of folk wisdom about what works and what doesn't. [This site] is crucial for sharing this collective knowledge.” - @aria42

http://metaoptimize.com/qa/


“A lot of expertise in machine learning is simply developing effective biases.”

- Dan Melamed (quoted from memory)


What's a good choice of learning rate for the second layer of this neural net on image patches?

[intuition]

(Yoshua Bengio)

0.02!


Occam's Razor is a great example of ML intuition


Without the aid of prejudice and custom I should not be able to find my way across the room.

- William Hazlitt


It's fun to be a geek


Be an artist


How to build the world's biggest langid (langcat) model?


+ Vowpal Wabbit = Win


How to build the world's biggest langid (langcat) model?

SOLVED.


The art of predictive analytics:

1) Know the data out there

2) Know the code out there

3) Intuition (bias)


A lot of data with one feature correlated with the label


Twitter sentiment analysis?


Awesome! RT @rupertgrintnet Harry Potter Marks Place in Film History http://bit.ly/Eusxi :)

“Distant supervision” (Go et al., 09)

(Use emoticons as labels)
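
A minimal sketch of using emoticons as noisy labels. The emoticon lists and the function name `distant_label` are illustrative assumptions, not the exact setup of Go et al. The emoticon is stripped from the text so a downstream classifier cannot simply memorize it.

```python
import re

# Illustrative emoticon sets (an assumption, not the lists used by Go et al.).
POSITIVE = (":)", ":-)")
NEGATIVE = (":(", ":-(")


def distant_label(tweet):
    """Return (text_without_emoticons, label), or None if the tweet is unusable."""
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    if has_pos == has_neg:
        # Either no emoticon at all, or mixed signals: skip the tweet.
        return None
    text = tweet
    for emoticon in POSITIVE + NEGATIVE:
        text = text.replace(emoticon, " ")
    text = re.sub(r"\s+", " ", text).strip()
    return text, ("pos" if has_pos else "neg")
```

Tweets with no emoticon, or with both kinds, are dropped: with a lot of data, it is cheap to throw away the ambiguous cases.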


Recipe: You know a lot about the problem

Smart Priors


You know a lot about the problem: Smart Priors

Yarowsky (1995), WSD

1) One sense per collocation.

2) One sense per discourse.
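
A rough sketch of how the "one sense per discourse" prior can be applied as a post-processing step: every occurrence of a target word in one document gets the document's majority sense. This is only the constraint expressed as a majority vote, not Yarowsky's full decision-list algorithm; the function name and the sense labels in the usage note are made up for illustration.

```python
from collections import Counter


def one_sense_per_discourse(doc_predictions):
    """Relabel all occurrences of one word in one document with the majority sense.

    doc_predictions: list of sense labels, with None for unlabeled occurrences.
    """
    votes = Counter(s for s in doc_predictions if s is not None)
    if not votes:
        # Nothing is labeled in this document, so there is nothing to propagate.
        return list(doc_predictions)
    majority, _count = votes.most_common(1)[0]
    return [majority] * len(doc_predictions)
```

For example, `one_sense_per_discourse(["factory", None, "factory", "flora"])` relabels all four occurrences as `"factory"`.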


Recipe: You know a lot about the problem

Create new features


You know a lot about the problem: Create new features

Error-analysis


What errors is your model making?

DO SOME EXPLORATORY DATA ANALYSIS (EDA)
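
One simple way to start looking at a model's errors is to count which (gold, predicted) pairs it confuses most often, then read the examples behind the top pairs. A minimal sketch; the function names and the language-code labels in the usage note are illustrative.

```python
from collections import Counter


def confusion_counts(gold, predicted):
    """Count each (gold, predicted) pair where the model was wrong."""
    return Counter((g, p) for g, p in zip(gold, predicted) if g != p)


def worst_confusions(gold, predicted, n=3):
    """Return the n most frequent error pairs with their counts."""
    return confusion_counts(gold, predicted).most_common(n)
```

With gold labels `["en", "en", "fr", "de", "de"]` and predictions `["en", "fr", "fr", "nl", "nl"]`, the top confusion is `("de", "nl")`, seen twice. That suggests looking for features that separate those two classes.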


Andrew Ng: “Advice for applying ML”

Where do the errors come from?


Recipe: You know a little about the problem

Semi-supervised learning


You know a little about the problem: Semi-supervised learning

JOINT semi-supervised learning

Ando and Zhang (2005)

Suzuki and Isozaki (2008)

Suzuki et al. (2009), etc.

=> effective but task-specific


You know a little about the problem: Semi-supervised learning

Unsupervised learning, followed by supervised learning


[Diagram labels: Sup data; Supervised training; Sup model]

How can Bob improve his model?


[Diagram labels: Sup data; Supervised training; Sup model; Semi-sup training?]


[Diagram labels: Sup data; Supervised training; Sup model; Semi-sup training?; More feats]


[Diagram labels, repeated for sup task 1 and sup task 2: Sup data; More feats; Sup model]

More features can be used on different tasks


[Diagram labels: Unsup data; Sup data; Semi-sup model]

Joint semi-sup (standard semi-sup setup)


[Diagram labels: Unsup data; unsup pretraining; Unsup model; Sup data; semi-sup fine-tuning; Semi-sup model]

Unsupervised, then supervised


[Diagram labels: Unsup data; unsup training; Unsup model; unsup feats]

Use unsupervised learning to create new features


[Diagram labels: Unsup data; unsup training; unsup feats; Sup data; Sup training; Semi-sup model]

These features can then be shared with other people


[Diagram labels: Unsup data; unsup training; unsup feats; sup task 1; sup task 2; sup task 3]


Recipe: You know almost nothing about the problem

Build cool generic features


Know almost nothing about problem: Build cool generic features

Word features (Turian et al., 2010)

http://metaoptimize.com/projects/wordreprs/


Brown clustering (Brown et al. 92)

(image from Terry Koo)

cluster(chairman) = '0010'

2-prefix(cluster(chairman)) = '00'
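
A small sketch of turning Brown cluster bit strings into features at several granularities, following the prefix idea above. The `clusters` dictionary, the feature-name format, and the default prefix lengths are assumptions for illustration; in practice the mapping would be loaded from a file of induced clusters.

```python
def brown_prefix_features(word, clusters, prefix_lengths=(2, 4, 6)):
    """Return one feature string per cluster-path prefix that the word's path covers.

    clusters: dict mapping a word to its Brown cluster bit string, e.g. "0010".
    """
    bits = clusters.get(word)
    if bits is None:
        # Out-of-vocabulary word: no cluster features.
        return []
    return [f"brown{n}={bits[:n]}" for n in prefix_lengths if n <= len(bits)]
```

With `clusters = {"chairman": "0010"}`, the word `chairman` gets the features `brown2=00` and `brown4=0010`. Short prefixes give coarse classes, long prefixes give fine ones.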


50-dim embeddings: Collobert + Weston (2008)

t-SNE vis by van der Maaten + Hinton (2008)


Know almost nothing about problem: Build cool generic features

Document features:

Document clustering

LSA/LDA

Deep model


Document features

Salakhutdinov + Hinton 06


Domain adaptation for sentiment analysis

(Glorot et al. 11)

Document features example


Recipe: You know a little about the problem

Make more REAL training examples


Make more real training examples

Cuz you have some time or a small budget

Amazon Mechanical Turk


Snow et al. 08: “Cheap and Fast – But is it Good?”

1K turk labels per dollar

Average over (5) Turks to reduce noise

=> http://crowdflower.com/
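
A minimal sketch of combining several noisy crowd labels per item. For categorical labels, a majority vote plays the role of the averaging mentioned above. The `(item_id, worker_id, label)` tuple layout is an assumption for illustration.

```python
from collections import Counter, defaultdict


def aggregate_labels(annotations):
    """Majority-vote the labels collected for each item.

    annotations: iterable of (item_id, worker_id, label) tuples.
    Returns a dict mapping item_id to its most frequent label.
    """
    by_item = defaultdict(list)
    for item_id, _worker_id, label in annotations:
        by_item[item_id].append(label)
    return {
        item_id: Counter(labels).most_common(1)[0][0]
        for item_id, labels in by_item.items()
    }
```

Using an odd number of annotators per item (such as 5) avoids ties for binary labels.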


Soylent (Bernstein et al. 10)

Find-Fix-Verify: Crowd control design pattern

Find a problem

Fix each problem

Verify quality of each fix


Make more real training examples

Active learning
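
A minimal sketch of one common active learning strategy, uncertainty sampling: ask a human to label the examples the current model is least sure about. This is a generic illustration, not how any specific tool works; `predict_proba` is a hypothetical callable returning the model's probability of the positive class.

```python
def most_uncertain(pool, predict_proba, k=1):
    """Return the k unlabeled examples whose P(positive) is closest to 0.5.

    pool: list of unlabeled examples.
    predict_proba: function mapping an example to a probability in [0, 1].
    """
    return sorted(pool, key=lambda x: abs(predict_proba(x) - 0.5))[:k]
```

After the selected examples are labeled, they are added to the training set, the model is retrained, and the selection repeats.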


Dualist (Settles 11)

http://code.google.com/p/dualist/


Dualist (Settles 11)

http://code.google.com/p/dualist/

Applications:

Document categorization

WSD

Information Extraction

Twitter sentiment analysis


You know a little about the problem: Make more training examples

FAKE training examples


NOISE


FAKE training examples

Denoising AA

RBM


MNIST distortions (LeCun et al. 98)
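
A small sketch of making fake training examples by distorting real ones. Plain pixel translation is used here as a stand-in for the richer distortions used on MNIST; images are lists of rows, and the function names are illustrative.

```python
def shift_image(image, dx, dy, fill=0):
    """Shift a 2-D image (list of rows) by dx columns and dy rows, padding with fill."""
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                out[ny][nx] = image[y][x]
    return out


def augment(image, label, max_shift=1):
    """Yield (shifted_image, label) for every shift within max_shift pixels.

    The zero shift is included, so the original example is yielded too.
    """
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            yield shift_image(image, dx, dy), label
```

The label is unchanged by the distortion, which is the prior knowledge being injected: a digit shifted by one pixel is still the same digit.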


No negative examples?


FAKE training examples

Multi-view / multi-modal


Multi-view / multi-modal

How do you evaluate an IR system, if you have no labels?

See how good the title is at retrieving the body text.
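
A rough sketch of this label-free evaluation: treat each title as a query, rank all bodies, and score how high the title's own body lands (mean reciprocal rank). The bag-of-words cosine scorer is only a placeholder for the IR system under test; all function names here are illustrative.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased whitespace-token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mean_reciprocal_rank(docs, score=cosine):
    """docs: list of (title, body). Each title should retrieve its own body."""
    bodies = [bow(body) for _, body in docs]
    total = 0.0
    for i, (title, _) in enumerate(docs):
        query = bow(title)
        ranked = sorted(range(len(docs)), key=lambda j: score(query, bodies[j]), reverse=True)
        total += 1.0 / (ranked.index(i) + 1)
    return total / len(docs)
```

To compare two retrieval systems, pass each one's scoring function as `score` and compare the resulting numbers on the same documents.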


2) KNOW THE DATA


Know the data

Labelled/structured data: ODP, Freebase, Wikipedia, DBpedia, etc.


Know the data

Unlabelled data: WaCKy, ClueWeb09, CommonCrawl, Ngram corpora


Ngrams

Google

Bing

Google Books

Roll your own: Common Crawl


Know the data

Do something stupid on a lot of data


Do something stupid on a lot of data: Ngrams

Spell-checking

Phrase segmentation

Word breaking

Synonyms

Language models

See “An Overview of Microsoft Web N-gram Corpus and Applications” (Wang et al 10)
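
As an illustration of doing something simple with n-gram counts, here is a sketch of word breaking with unigram counts only: pick the split that maximizes the product of word probabilities. The counts in the usage note are toy values, not from any real corpus, and the penalty for unseen strings is an ad hoc assumption.

```python
import math
from functools import lru_cache


def make_segmenter(counts, max_word_len=20):
    """Build a word breaker from a dict of unigram counts."""
    total = sum(counts.values())

    def log_prob(word):
        if word in counts:
            return math.log(counts[word] / total)
        # Crude penalty for unseen strings, harsher for longer ones (an assumption).
        return math.log(1.0 / (total * 10 ** len(word)))

    @lru_cache(maxsize=None)
    def best(text):
        """Return (log probability, words) for the best split of text."""
        if not text:
            return 0.0, ()
        candidates = []
        for i in range(1, min(len(text), max_word_len) + 1):
            head, tail = text[:i], text[i:]
            tail_score, tail_words = best(tail)
            candidates.append((log_prob(head) + tail_score, (head,) + tail_words))
        return max(candidates)

    return lambda text: list(best(text)[1])
```

With toy counts such as `{"more": 50, "data": 40, "same": 30, "models": 20, "a": 100}`, the segmenter splits `"moredata"` into `["more", "data"]`. With web-scale counts the same few lines handle far more vocabulary, which is the point of the recipe.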


Do something stupid on a lot of data

Web-scale k-means for NER (Lin and Wu 09)


Do something stupid on a lot of data

Web-scale clustering


Know the data

Multi-modal learning


Multi-modal learning: Images and captions

[Diagram: image features = features of the caption “facepalm”]


Multi-modal learning: Titles and article body

[Diagram: features of the article body = features of the title]


Multi-modal learning: Audio and tags

[Diagram: audio features = features of the tags “upbeat”, “hip hop”]


3) IT'S MODELS ALL THE WAY DOWN


Break down a pipeline

1-best (greedy), k-best, Finkel et al. 06
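
A toy sketch of the difference between 1-best (greedy) and k-best hand-off in a two-stage pipeline. Stage 1 proposes hypotheses with log scores; stage 2 scores each hypothesis it is given. This is an illustration of the k-best idea only, not the method of Finkel et al.; the hypothesis names and scores in the usage note are made up.

```python
def greedy_pipeline(stage1_kbest, stage2_score):
    """Commit to stage 1's single best hypothesis, then apply stage 2."""
    hypothesis, score1 = max(stage1_kbest, key=lambda pair: pair[1])
    return hypothesis, score1 + stage2_score(hypothesis)


def kbest_pipeline(stage1_kbest, stage2_score):
    """Let stage 2 see every stage 1 hypothesis and pick the best joint score."""
    joint = [(hyp, score1 + stage2_score(hyp)) for hyp, score1 in stage1_kbest]
    return max(joint, key=lambda pair: pair[1])
```

If stage 1 slightly prefers `"tagging A"` (-0.5 vs -0.7) but stage 2 strongly prefers `"tagging B"` (-1.0 vs -3.0), the greedy pipeline is stuck with A while the k-best pipeline recovers B.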


Good code to build on

Stanford NLP tools, clustering algorithms, Terry Koo's parser, etc.


Good code to build on

YOUR MODEL


Eat your own dogfood

Bootstrapping (Yarowsky 95)

Co-training (Blum+Mitchell 98)

EM (Nigam et al., 00)

Self-training (McClosky et al., 06)
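
A generic sketch of a self-training loop: train, label the unlabeled pool with the model itself, keep only confident predictions, and retrain. It is not the exact procedure of any paper cited above. `fit` and `predict_with_confidence` are hypothetical callables supplied by the caller, and the threshold and round count are arbitrary defaults.

```python
def self_train(train, unlabeled, fit, predict_with_confidence, threshold=0.9, rounds=5):
    """Grow the training set with the model's own confident predictions.

    train: list of (x, y) pairs.
    unlabeled: list of x.
    fit(train) -> model
    predict_with_confidence(model, x) -> (label, confidence in [0, 1])
    Returns (final model, final training set).
    """
    train, unlabeled = list(train), list(unlabeled)
    model = fit(train)
    for _ in range(rounds):
        confident, rest = [], []
        for x in unlabeled:
            label, confidence = predict_with_confidence(model, x)
            (confident if confidence >= threshold else rest).append((x, label))
        if not confident:
            break  # nothing new to learn from
        train.extend(confident)
        unlabeled = [x for x, _ in rest]
        model = fit(train)
    return model, train
```

A high threshold keeps precision high at the cost of recall, which is acceptable when the unlabeled pool is large.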


Dualist (Settles '11)

Active learning + semisup learning


Eat your own dogfood

Cheap bootstrapping: One step of EM

(Settles 11)

“Awesome! What a great movie!”


It's models all the way down

Use models to annotate

Low recall + high precision + lots of data = win


Use models to annotate

Face modeling


Pose-invariant face features


It's models all the way down

THE FUTURE?

Joins on large noisy data sets


Joins on large noisy data sets

ReVerb (Fader et al., 11)

http://reverb.cs.washington.edu

Extractions over entire ClueWeb09 (826 MB compressed)


ReVerb (Fader et al., 11)


Joins on noisy data sets (can clean up the data??)

???


The art of predictive analytics:

1) Know the data out there

2) Know the code out there

3) Intuition (bias)


Summary of recipes:

Know your problem

Throw in good features

Use other's good models in yr pipeline

Make more training examples

Use a lot of data


"It especially annoys me when racists are accused of 'discrimination.' The ability to discriminate is a precious faculty; by judging all members of one 'race' to be the same, the racist precisely shows himself incapable of discrimination."

- Christopher Hitchens (RIP)


Other cool research to look at:

* Frustratingly easy domain adaptation (Daume 07)

* The Unreasonable Effectiveness of Data (Halevy et al 09)

* Web-scale algorithms (search on http://metaoptimize.com/qa/)

* Self-taught learning (Raina et al 07)


Joseph Turian [email protected]

@turian

http://metaoptimize.com/qa/

2012.02.02

Please email me any questions