Transcript of Data Mining Techniques - CS 6220 - Section 2 - Spring 2017 - Lecture 10

Page 1: Data Mining Techniques
CS 6220 - Section 2 - Spring 2017
Lecture 10
Jan-Willem van de Meent (credit: Andrew Ng, Alex Smola, Yehuda Koren, Stanford CS246)

Page 2: Project

Page 3: Project Deadlines

• 3 Feb: Form teams of 2-4 people
• 17 Feb: Submit abstract (1 paragraph)
• 3 Mar: Submit proposals (2 pages)
• 13 Mar: Milestone 1 (exploratory analysis)
• 31 Mar: Milestone 2 (statistical analysis)
• 16 Apr (Sun): Submit reports (10 pages)
• 21 Apr (Fri): Submit peer reviews

Page 4: Project Reports

• ~10 pages (rough guideline)
• Guidelines for contents:
  • Introduction / Motivation
  • Exploratory analysis (if applicable)
  • Data mining analysis
  • Discussion of results

Page 5: Project Review

• 2 per person (randomly assigned)
• Reviews should discuss 4 aspects of the report:
  • Clarity (is the writing clear?)
  • Technical merit (are the methods valid?)
  • Reproducibility (is it clear how results were obtained?)
  • Discussion (are the results interpretable?)

Page 6: Final Exam

Page 7: Final Exam
• Emphasis on post-midterm topics (but some pre-midterm topics are included)
• Topic list: http://www.ccs.neu.edu/home/jwvdm/teaching/cs6220/spring2017/final-topics.html

Page 8: Recommender Systems

Pages 9-11: The Long Tail

(figures from: https://www.wired.com/2004/10/tail/)

Pages 12-15: Problem Setting

• Task: Predict user preferences for unseen items

Pages 16-18: Content-based Filtering

(figure: "Latent factor models" - movies such as The Princess Diaries, The Lion King, Braveheart, Lethal Weapon, Independence Day, Amadeus, The Color Purple, Dumb and Dumber, Ocean's 11, and Sense and Sensibility placed on two axes, "geared towards females" vs. "geared towards males" and "serious" vs. "escapist", along with example users Gus and Dave)

• Idea (page 17): Predict ratings using item features on a per-user basis
• Idea (page 18): Predict ratings using user features on a per-item basis

Page 19: Collaborative Filtering
Part I: Basic neighborhood methods

(figure: user Joe and items #1-#4 in a bipartite rating graph)

Idea: Predict ratings based on similarity to other users

Page 20: Problem Setting

• Task: Predict user preferences for unseen items
• Content-based filtering: model user/item features explicitly
• Collaborative filtering: exploit implicit similarity of users and items

Page 21: Recommender Systems

• Movie recommendation (Netflix)
• Related product recommendation (Amazon)
• Web page ranking (Google)
• Social recommendation (Facebook)
• Priority inbox & spam filtering (Google)
• Online dating (OK Cupid)
• Computational advertising (everyone)

Page 22: Challenges

• Scalability
  • Millions of objects
  • 100s of millions of users
• Cold start
  • Changing user base
  • Changing inventory
• Imbalanced dataset
  • User activity / item reviews are power-law distributed
  • Ratings are not missing at random

Page 23: Running Example: Netflix Data

Movie rating data

Training data (user, movie, date, score):
1   21   5/7/02     1
1   213  8/2/04     5
2   345  3/6/01     4
2   123  5/1/05     4
2   768  7/15/02    3
3   76   1/22/01    5
4   45   8/3/00     4
5   568  9/10/05    1
5   342  3/5/03     2
5   234  12/28/00   2
6   76   8/11/02    5
6   56   6/15/03    4

Test data (user, movie, date, score):
1   62   1/6/05     ?
1   96   9/13/04    ?
2   7    8/18/05    ?
2   3    11/22/05   ?
3   47   6/13/02    ?
3   15   8/12/01    ?
4   41   9/1/00     ?
4   28   8/27/05    ?
5   93   4/4/05     ?
5   74   7/16/03    ?
6   69   2/14/04    ?
6   83   10/3/03    ?

• Released as part of a $1M competition by Netflix in 2006
• Prize awarded to BellKor in 2009

Pages 24-25: Running Yardstick: RMSE

rmse(S) = sqrt( |S|⁻¹ Σ_{(i,u) ∈ S} (r̂_ui − r_ui)² )

(this metric doesn't tell you how to actually do recommendation)
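For concreteness, a minimal sketch of this yardstick, assuming predictions and held-out ratings are stored as dicts keyed by (item, user) pairs (the data layout is an assumption, not something the slide specifies):

```python
import math

def rmse(predicted, actual):
    """Root mean squared error over the set S of (item, user) pairs in `actual`.

    `predicted` and `actual` both map (item, user) -> rating.
    """
    errors = [(predicted[iu] - r) ** 2 for iu, r in actual.items()]
    return math.sqrt(sum(errors) / len(errors))

# Example with two test ratings:
print(rmse({(1, 1): 3.5, (2, 1): 4.0}, {(1, 1): 4, (2, 1): 5}))  # ≈ 0.79
```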

Page 26: Content-based Filtering

Pages 27-29: Item-based Features

Page 30: Per-user Regression

w_u = argmin_w ‖ r_u − X w ‖²

Learn a set of regression coefficients for each user
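A minimal sketch of this per-user fit, assuming X holds one feature row per item the user rated and r_u holds that user's ratings for those items (the variable names are illustrative):

```python
import numpy as np

def fit_user_weights(X, r_u):
    """Least-squares regression coefficients w_u = argmin_w ||r_u - X w||^2.

    X   : (n_items_rated_by_user, n_features) item-feature matrix
    r_u : (n_items_rated_by_user,) vector of this user's ratings
    """
    w_u, *_ = np.linalg.lstsq(X, r_u, rcond=None)
    return w_u

# Predict this user's rating for a new item with feature vector x_new:
#   r_hat = x_new @ w_u
```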

Page 31: User Bias and Item Popularity

Pages 32-36: Bias

(figure: a row of example ratings for Moonrise Kingdom across several users)

• Problem: Some movies are universally loved / hated, and some users are more picky than others
• Solution: Introduce a per-movie and per-user bias
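One common way to write this, consistent with the biases μ, b_u, b_i that reappear later in the Netflix-prize slides, is a baseline predictor b_ui = μ + b_u + b_i. A minimal sketch that estimates these biases with simple averages (an illustrative choice; later slides fold them into the learned model instead):

```python
from collections import defaultdict

def fit_biases(ratings):
    """Estimate a global mean mu, per-item bias b_i, and per-user bias b_u
    from (user, item, rating) triples, so that b_ui = mu + b_u[u] + b_i[i]."""
    mu = sum(r for _, _, r in ratings) / len(ratings)
    item_res, user_res = defaultdict(list), defaultdict(list)
    for _, i, r in ratings:
        item_res[i].append(r - mu)            # item offset relative to the global mean
    b_i = {i: sum(v) / len(v) for i, v in item_res.items()}
    for u, i, r in ratings:
        user_res[u].append(r - mu - b_i[i])   # user offset after removing the item bias
    b_u = {u: sum(v) / len(v) for u, v in user_res.items()}
    return mu, b_u, b_i
```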

Page 37: Collaborative Filtering

Page 38: Neighborhood Based Methods
Part I: Basic neighborhood methods

(figure: user Joe and items #1-#4 in a bipartite rating graph)

Users and items form a bipartite graph (edges are ratings)

Page 39: Neighborhood Based Methods

(user, user) similarity
• Predict a rating based on the average from the k-nearest users
• Good if the item base is small
• Good if the item base changes rapidly

(item, item) similarity
• Predict a rating based on the average from the k-nearest items
• Good if the user base is small
• Good if the user base changes rapidly

Page 40: Parzen-Window Style CF

• Define a similarity s_ij between items
• Find the set ε_k(i,u) of the k nearest neighbors of i that were rated by user u
• Predict the rating using a weighted average over this set (see the sketch below)
• How should we define s_ij?
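A minimal sketch of the weighted-average prediction step, assuming a 2-D similarity lookup sim[i][j] and a per-user dict of known ratings (these names and the data layout are illustrative assumptions):

```python
def predict_rating(i, u, ratings_by_user, sim, k=20):
    """Predict r_ui as a similarity-weighted average over the k items
    most similar to i among those user u has already rated."""
    rated = ratings_by_user[u]                        # {item j: rating r_uj}
    neighbors = sorted((j for j in rated if j != i),
                       key=lambda j: sim[i][j], reverse=True)[:k]
    num = sum(sim[i][j] * rated[j] for j in neighbors)
    den = sum(abs(sim[i][j]) for j in neighbors)
    return num / den if den > 0 else None
```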

Page 41: Pearson Correlation Coefficient

s_ij = Cov[r_ui, r_uj] / ( Std[r_ui] Std[r_uj] )

Estimating item-item similarities
• Common practice: rely on the Pearson correlation coefficient
• Challenge: non-uniform user support of item ratings; each item is rated by a distinct set of users

User ratings for item i:  1 ? ? 5 5 3 ? ? ? 4 2 ? ? ? ? 4 ? 5 4 1 ?
User ratings for item j:  ? ? 4 2 5 ? ? 1 2 5 ? ? 2 ? ? 3 ? ? ? 5 4

• Compute the correlation over the shared support (the users who rated both items), as sketched below
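A minimal sketch of this shared-support estimate, assuming each item's ratings are stored as a dict mapping user -> rating (missing ratings are simply absent):

```python
import math

def pearson_shared_support(ratings_i, ratings_j):
    """Pearson correlation between two items, computed only over users
    who rated both items (the shared support U(i, j))."""
    shared = set(ratings_i) & set(ratings_j)
    if len(shared) < 2:
        return 0.0
    xi = [ratings_i[u] for u in shared]
    xj = [ratings_j[u] for u in shared]
    mi, mj = sum(xi) / len(xi), sum(xj) / len(xj)
    cov = sum((a - mi) * (b - mj) for a, b in zip(xi, xj))
    var_i = sum((a - mi) ** 2 for a in xi)
    var_j = sum((b - mj) ** 2 for b in xj)
    if var_i == 0 or var_j == 0:
        return 0.0
    return cov / math.sqrt(var_i * var_j)
```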

Page 42: (item, item) similarity

Empirical estimate of the Pearson correlation coefficient:

ρ̂_ij = Σ_{u ∈ U(i,j)} (r_ui − b_ui)(r_uj − b_uj) / sqrt( Σ_{u ∈ U(i,j)} (r_ui − b_ui)² · Σ_{u ∈ U(i,j)} (r_uj − b_uj)² )

Shrunk similarity:

s_ij = ( |U(i,j)| − 1 ) / ( |U(i,j)| − 1 + λ ) · ρ̂_ij

• Regularize towards 0 for small support
• Regularize towards the baseline for small neighborhoods
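Continuing the sketch above, the shrinkage step might look like this (λ is a tuning constant; its value is not given on the slide):

```python
def shrunk_similarity(rho_hat, n_shared, lam=100.0):
    """Shrink the raw correlation towards 0 when the shared support is small.

    rho_hat  : empirical Pearson correlation for the item pair
    n_shared : |U(i, j)|, the number of users who rated both items
    """
    n = max(n_shared - 1, 0)
    return n / (n + lam) * rho_hat
```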

Page 43: Similarity for binary labels

Pearson correlation is not meaningful for binary labels (e.g. views, purchases, clicks)

m_i  - users acting on item i
m_ij - users acting on both i and j
m    - total number of users

Jaccard similarity:
s_ij = m_ij / (α + m_i + m_j − m_ij)

Observed / expected ratio:
s_ij = observed / expected ≈ m_ij / (α + m_i m_j / m)
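A minimal sketch of both similarity measures on binary interaction data, assuming each item's interactions are stored as a set of user ids and treating α as a smoothing constant (its value is not given on the slide):

```python
def jaccard_similarity(users_i, users_j, alpha=1.0):
    """Jaccard-style similarity between two items, from the sets of users
    that acted on each (views, purchases, clicks, ...)."""
    m_ij = len(users_i & users_j)
    return m_ij / (alpha + len(users_i) + len(users_j) - m_ij)

def observed_expected_ratio(users_i, users_j, m_total, alpha=1.0):
    """Ratio of observed co-occurrences to the count expected under independence."""
    m_ij = len(users_i & users_j)
    expected = len(users_i) * len(users_j) / m_total
    return m_ij / (alpha + expected)
```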

Page 44: Matrix Factorization Methods

Pages 45-46: Matrix Factorization

(figure: the Moonrise Kingdom ratings row from the bias example)

Idea: pose recommendation as a (biased) matrix factorization problem

Page 47: Matrix Factorization

Basic matrix factorization model

(figure: a users × items rating matrix approximated by the product of a users × 3 factor matrix and a 3 × items factor matrix - a rank-3 SVD approximation)

Pages 48-49: Prediction

Estimate unknown ratings as inner products of factors:

(figure: the same rank-3 factorization, with a missing entry "?" in the rating matrix filled in by the predicted value 2.4)
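A minimal sketch of the prediction step, assuming the factors are stored as NumPy arrays P (users × k) and Q (items × k); the numbers in the usage line are illustrative:

```python
import numpy as np

def predict(P, Q, u, i):
    """Estimate the unknown rating r_ui as the inner product of
    user factor p_u (row u of P) and item factor q_i (row i of Q)."""
    return float(P[u] @ Q[i])

# All predictions at once (users × items): R_hat = P @ Q.T
P = np.array([[0.2, -0.4, 0.1]]); Q = np.array([[1.3, -0.1, 1.2]])
print(predict(P, Q, 0, 0))  # 0.42
```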

Page 50: SVD with missing values

Matrix factorization model

(figure: the same rank-3 factorization of the rating matrix)

Properties:
• SVD isn't defined when entries are unknown ⇒ use specialized methods
• Very powerful model ⇒ can easily overfit, sensitive to regularization
• Probably the most popular model among Netflix contestants
  – 12/11/2006: Simon Funk describes an SVD-based method
  – 12/29/2006: Free implementation at timelydevelopment.com

• Pose as a regression problem
• Regularize using the Frobenius norm (the objective is written out below)
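Written out, a standard form of this regression objective with Frobenius-norm regularization looks like the following (the exact notation and the weight λ are assumptions; the slide only names the ingredients):

```latex
\min_{W, X} \; \sum_{(u,i)\ \text{observed}} \big( r_{ui} - w_u^\top x_i \big)^2
  \; + \; \lambda \big( \lVert W \rVert_F^2 + \lVert X \rVert_F^2 \big)
```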

Pages 51-53: Alternating Least Squares

Matrix factorization model

(figure: the same rank-3 factorization of the rating matrix)

• Alternate between two ridge-regression steps (sketched below):
  – regress w_u given X (update the user factors with the item factors held fixed)
  – regress x_i given W (update the item factors with the user factors held fixed)

Remember ridge regression?
• L2: closed-form solution
  w = (Xᵀ X + λI)⁻¹ Xᵀ y
• L1: no closed-form solution; use quadratic programming:
  minimize ‖ X w − y ‖²  s.t.  ‖ w ‖₁ ≤ s

(slide credit: Yijun Zhao, Linear Regression)
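A minimal sketch of the alternating ridge-regression updates, assuming ratings sit in a dense NumPy array R with np.nan marking missing entries (a real implementation would use sparse structures; k, lam and n_iters are illustrative):

```python
import numpy as np

def als(R, k=3, lam=0.1, n_iters=20):
    """Alternating least squares for R ≈ W @ X.T with L2 (ridge) regularization.

    R : (n_users, n_items) rating matrix, np.nan where unobserved
    W : (n_users, k) user factors;  X : (n_items, k) item factors
    """
    n_users, n_items = R.shape
    W = np.random.randn(n_users, k) * 0.1
    X = np.random.randn(n_items, k) * 0.1
    observed = ~np.isnan(R)
    for _ in range(n_iters):
        # Regress w_u given X: one ridge regression per user over that user's observed items
        for u in range(n_users):
            idx = observed[u]
            A = X[idx].T @ X[idx] + lam * np.eye(k)
            W[u] = np.linalg.solve(A, X[idx].T @ R[u, idx])
        # Regress x_i given W: one ridge regression per item over that item's observed users
        for i in range(n_items):
            idx = observed[:, i]
            A = W[idx].T @ W[idx] + lam * np.eye(k)
            X[i] = np.linalg.solve(A, W[idx].T @ R[idx, i])
    return W, X
```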

Page 54: Stochastic Gradient Descent

Matrix factorization model

(figure: the same rank-3 factorization of the rating matrix)

• No need for locking
• Multiple cores can update asynchronously
  (Recht, Re, Wright, 2012 - Hogwild)
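A minimal sketch of the per-rating SGD update for the same factorization, assuming W and X are NumPy factor arrays as in the ALS sketch (learning rate and regularization values are illustrative):

```python
def sgd_epoch(ratings, W, X, lr=0.01, lam=0.1):
    """One stochastic gradient descent pass over (user, item, rating) triples,
    updating user factors W and item factors X in place."""
    for u, i, r in ratings:
        err = r - W[u] @ X[i]                  # prediction error for this rating
        w_u = W[u].copy()                      # keep the old value for the item update
        W[u] += lr * (err * X[i] - lam * W[u])
        X[i] += lr * (err * w_u - lam * X[i])
```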

Page 55: Sampling Bias

Page 56: Ratings are not given at random!

(figure: distribution of ratings in Yahoo! survey answers, Yahoo! music ratings, and Netflix ratings)

B. Marlin et al., "Collaborative Filtering and the Missing at Random Assumption", UAI 2007

Page 57: Ratings are not given at random

• A powerful source of information: characterize users by which movies they rated, rather than by how they rated them
• ⇒ A dense binary representation of the data

(figure: the users × movies rating matrix alongside a binary users × movies matrix indicating which movies each user rated)

Page 58: Temporal Effects

Page 59: Changes in user behavior

(figure: Netflix ratings by date - something happened in early 2004)

• Netflix changed the rating labels in early 2004

Page 60: Are movies getting better with time?

Page 61: Temporal Effects

• Are movies getting better with time?
• Solution: Model temporal effects in the bias terms, not in the weights

Page 62: Netflix Prize

Page 63: Netflix Prize

Training data
• 100 million ratings, 480,000 users, 17,770 movies
• 6 years of data: 2000-2005

Test data
• Last few ratings of each user (2.8 million)
• Evaluation criterion: Root Mean Square Error (RMSE)

Competition
• 2,700+ teams
• Netflix's system RMSE: 0.9514
• $1 million prize for a 10% improvement over Netflix

Pages 64-67: Improvements

(figure: "Factor models: Error vs. #parameters" - RMSE (0.875-0.91) against millions of parameters (10 to 100,000) for NMF, BiasSVD, SVD++, SVD v.2, SVD v.3, and SVD v.4)

• Page 64 - Add biases: do SGD, but also learn the biases μ, b_u and b_i
• Page 65 - Account for the fact that ratings are not missing at random ("who rated what")
• Page 66 - Add temporal effects
• Page 67 - Still pretty far from the 0.8563 grand-prize target
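A minimal sketch of the biased SGD pass referenced for BiasSVD on page 64, extending the earlier SGD update with the bias terms μ, b_u and b_i (here b_u and b_i are NumPy arrays indexed by user/item id; step sizes and regularization are illustrative):

```python
def biased_sgd_epoch(ratings, mu, b_u, b_i, W, X, lr=0.005, lam=0.02):
    """One SGD pass over (user, item, rating) triples that also learns the biases."""
    for u, i, r in ratings:
        pred = mu + b_u[u] + b_i[i] + W[u] @ X[i]
        err = r - pred
        b_u[u] += lr * (err - lam * b_u[u])
        b_i[i] += lr * (err - lam * b_i[i])
        w_u = W[u].copy()
        W[u] += lr * (err * X[i] - lam * W[u])
        X[i] += lr * (err * w_u - lam * X[i])
```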

Page 68: Winning Solution from BellKor

Page 69: Last 30 days
• June 26th submission triggers the 30-day "last call"

Pages 70-71: BellKor fends off competitors by a hair

(slide credit: J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org)

Page 72: Ratings aren't everything

Netflix then vs. Netflix now
• Only the simpler submodels (SVD, RBMs) were implemented
• Ratings eventually proved to be only weakly informative