CSE 6240: Web Search and Text Mining, Spring 2020
Recommendation Systems: Part II
Prof. Srijan Kumar, Georgia Tech (http://cc.gatech.edu/~srijan)

Page 1

Page 2

Announcements
• Project:
  – Final report rubric: released
  – Final presentation: details forthcoming

Page 3

Recommendation Systems
• Content-based
• Collaborative filtering
• Latent factor models
• Case study: Netflix Challenge
• Deep recommender systems

Slide reference: Mining of Massive Datasets, http://mmds.org/

Page 4

Latent Factor Models
• These models learn latent factors to represent users and items from the rating matrix
  – Latent factors are not directly observable
  – They are derived from the data
• Recall: network embeddings
• Methods:
  – Singular Value Decomposition (SVD)
  – Principal Component Analysis (PCA)
  – Eigendecomposition

Page 5

Latent Factors: Example
• Embedding axes are a type of latent factor
• In a user-movie rating matrix, movie latent factors can represent axes such as:
  – Comedy vs. drama
  – Degree of action
  – Appropriateness for children
• User latent factors measure a user's affinity towards the corresponding movie factors

Page 6

Latent Factors: Example

[Figure: movies plotted along two latent axes, Factor 1 (geared towards females vs. geared towards males) and Factor 2 (serious vs. funny). Example movies: The Princess Diaries, The Lion King, Braveheart, Lethal Weapon, Independence Day, Amadeus, Dumb and Dumber, Ocean's 11, Sense and Sensibility.]

Page 7

SVD
• SVD decomposes an input matrix into a product of factor matrices:

  A = U S Vᵀ

  – A: m × n input data matrix
  – U: left singular vectors
  – S: diagonal matrix of singular values
  – V: right singular vectors
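As a sanity check, the decomposition can be reproduced with NumPy on a small made-up matrix (a sketch; the values are illustrative):

```python
import numpy as np

# A small hypothetical data matrix: 4 users x 3 items
A = np.array([[5.0, 4.0, 1.0],
              [4.0, 5.0, 1.0],
              [1.0, 1.0, 5.0],
              [1.0, 2.0, 4.0]])

# SVD: A = U @ diag(S) @ Vt, with singular values in S sorted in decreasing order
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Multiplying the factors back together recovers A up to floating-point error
A_rec = U @ np.diag(S) @ Vt
print(np.allclose(A, A_rec))  # True
```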

Page 8

SVD
• SVD gives the minimum reconstruction error (sum of squared errors, SSE):

  min_{U,Σ,V} Σ_{(i,j)∈A} (A_ij − [UΣVᵀ]_ij)²

• SSE and RMSE are monotonically related:
  – RMSE = √(SSE/c), where c is the number of entries ⇒ SVD is also minimizing RMSE
• Complication: the sum in the SVD error term is over all entries, but our rating matrix R has missing entries
  – Solution: a missing rating is interpreted as a zero rating

Page 9

SVD on Rating Matrix
• "SVD" on rating data: R ≈ Q · Pᵀ
• Each row of Q represents an item
• Each column of Pᵀ represents a user

[Figure: the items × users rating matrix R (entries 1-5, many missing) is approximated by the product of an items × factors matrix Q and a factors × users matrix Pᵀ.]
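A rank-k "SVD" of a toy rating matrix can be sketched in NumPy by truncating to the top k singular values (zeros stand in for missing ratings, as above; all values are illustrative):

```python
import numpy as np

# Hypothetical items x users rating matrix; 0 = missing, treated as zero-rating
R = np.array([[4.0, 5.0, 0.0, 1.0],
              [5.0, 4.0, 1.0, 0.0],
              [0.0, 1.0, 5.0, 4.0],
              [1.0, 0.0, 4.0, 5.0]])

U, S, Vt = np.linalg.svd(R, full_matrices=False)

k = 2                    # number of latent factors to keep
Q = U[:, :k] * S[:k]     # items x factors (singular values absorbed into Q)
Pt = Vt[:k, :]           # factors x users

R_hat = Q @ Pt           # rank-k approximation of R
print(R_hat.shape)       # (4, 4)
```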

Page 10

Ratings as Products of Factors
• How to estimate the missing rating of user x for item i?

  r̂_xi = q_i · p_x = Σ_f q_if · p_xf

  where q_i = row i of Q and p_x = column x of Pᵀ

[Figure: the missing entry "?" in the rating matrix R is estimated from the corresponding row of Q and column of Pᵀ.]


Page 12

Ratings as Products of Factors
• How to estimate the missing rating of user x for item i?

  r̂_xi = q_i · p_x = Σ_f q_if · p_xf

  where q_i = row i of Q and p_x = column x of Pᵀ

[Figure: the missing entry "?" in R is filled in with the estimate 2.4 = q_i · p_x.]
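The dot product above can be sketched directly (the factor vectors here are made-up numbers, not learned values):

```python
import numpy as np

# Hypothetical factor vectors for item i (a row of Q) and user x (a column of Pt)
q_i = np.array([1.2, -0.5, 0.3])   # item i's loading on each latent factor
p_x = np.array([2.0, 0.4, -1.0])   # user x's affinity for each factor

# Estimated rating: sum over factors of q_if * p_xf
r_hat = float(np.dot(q_i, p_x))
print(r_hat)  # 1.9
```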

Page 13

Latent Factor Models: Example

[Figure: the movies from the earlier example plotted along Factor 1 (geared towards females vs. geared towards males) and Factor 2 (serious vs. funny). Movies plotted in two dimensions; the dimensions have meaning.]

Page 14

Latent Factor Models

[Figure: users fall in the same two-factor space as the movies, showing their preferences.]

Page 15

SVD: Problems
• SVD minimizes SSE on the training data
  – We want a large k (number of factors) to capture all the signals
  – But error on test data begins to rise for k > 2
• This is a classic example of overfitting:
  – With too much freedom (too many free parameters), the model starts fitting noise
  – The model fits the training data too well and thus does not generalize to unseen test data

Page 16

Preventing Overfitting
• To combat overfitting we introduce regularization:
  – Allow a rich model where there is sufficient data
  – Shrink aggressively where data are scarce

  min_{P,Q}  Σ_{(x,i)∈training} (r_xi − q_i·p_x)²  +  λ₁ Σ_x ‖p_x‖² + λ₂ Σ_i ‖q_i‖²
                 ("error")                               ("length")

  λ₁, λ₂ … user-set regularization parameters

Note: we do not care about the "raw" value of the objective function; we care about the P, Q that achieve the minimum of the objective.
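The objective can be sketched as a function of P and Q (toy random matrices and hypothetical λ values):

```python
import numpy as np

def objective(ratings, P, Q, lam1, lam2):
    """Regularized SSE: "error" term over observed ratings plus "length" terms.
    ratings: dict mapping (user x, item i) -> observed rating r_xi.
    P: users x factors, Q: items x factors."""
    error = sum((r - Q[i] @ P[x]) ** 2 for (x, i), r in ratings.items())
    length = lam1 * np.sum(P ** 2) + lam2 * np.sum(Q ** 2)
    return error + length

rng = np.random.default_rng(0)
P = rng.normal(size=(3, 2))                     # 3 users, 2 factors (toy values)
Q = rng.normal(size=(4, 2))                     # 4 items, 2 factors
ratings = {(0, 1): 4.0, (1, 3): 2.0, (2, 0): 5.0}

# Larger lambdas penalize the "length" of the factor vectors more heavily
print(objective(ratings, P, Q, 0.1, 0.1) < objective(ratings, P, Q, 1.0, 1.0))  # True
```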

Page 17

The Effect of Regularization

[Figure: movies (The Princess Diaries, The Lion King, Braveheart, Lethal Weapon, Independence Day, Amadeus, The Color Purple, Dumb and Dumber, Ocean's 11, Sense and Sensibility) in the two-factor space (Factor 1: geared towards females vs. males; Factor 2: serious vs. funny), illustrating the effect of regularization.]

Page 18

Modeling Biases and Interactions

  r̂_xi = μ + b_x + b_i + q_i · p_x
  (overall mean + user bias + movie bias + user-movie interaction)

• μ = overall mean rating
• b_x = bias of user x
• b_i = bias of movie i

Baseline predictor (μ + b_x + b_i):
• Separates users and movies
• Benefits from insights into users' behavior
• Among the main practical contributions of the competition

User-movie interaction (q_i · p_x):
• Characterizes the matching between users and movies
• Attracts most research in the field
• Benefits from algorithmic and mathematical innovations

Page 19

Baseline Predictor
• We have expectations on the rating by user x of movie i, even without estimating x's attitude towards movies like i:
  – Rating scale of user x
  – Values of other ratings the user gave recently (day-specific mood, anchoring, multi-user accounts)
  – (Recent) popularity of movie i
  – Selection bias; related to the number of ratings the user gave on the same day ("frequency")

Page 20

Putting It All Together

  r̂_xi = μ + b_x + b_i + q_i · p_x
  (overall mean rating + bias for user x + bias for movie i + user-movie interaction)

• Example:
  – Mean rating: μ = 3.7
  – You are a critical reviewer: your ratings are 1 star lower than the mean: b_x = −1
  – Star Wars gets a mean rating 0.5 higher than the average movie: b_i = +0.5
  – Predicted rating for you on Star Wars: 3.7 − 1 + 0.5 = 3.2
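The baseline arithmetic can be sketched on a tiny made-up ratings table (the names and values below are hypothetical, not from the slide's example):

```python
# Baseline predictor r_hat = mu + b_x + b_i on a toy list of (user, movie, rating)
ratings = [("joe", "star_wars", 4), ("joe", "amadeus", 2),
           ("ann", "star_wars", 5), ("ann", "amadeus", 4)]

def mean(xs):
    return sum(xs) / len(xs)

mu = mean([r for _, _, r in ratings])                          # overall mean rating
b_x = mean([r for u, _, r in ratings if u == "joe"]) - mu      # joe's bias
b_i = mean([r for _, m, r in ratings if m == "star_wars"]) - mu  # star wars' bias

r_hat = mu + b_x + b_i   # predicted rating for joe on star wars
print(r_hat)  # 3.75
```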

Page 21

Fitting the New Model
• Solve:

  min_{Q,P}  Σ_{(x,i)∈R} (r_xi − (μ + b_x + b_i + q_i·p_x))²          (goodness of fit)
           + λ₁ Σ_i ‖q_i‖² + λ₂ Σ_x ‖p_x‖² + λ₃ Σ_x b_x² + λ₄ Σ_i b_i²   (regularization)

• Use stochastic gradient descent to find the parameters
  – Note: both the biases b_x, b_i and the interaction vectors q_i, p_x are treated as parameters (we estimate them)
• λ is selected via grid search on a validation set
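The stochastic gradient descent fit can be sketched in NumPy (a minimal sketch, not the BellKor implementation; the learning rate, λ, epoch count, and toy data are all illustrative):

```python
import numpy as np

def sgd_fit(ratings, n_users, n_items, k=2, lam=0.1, lr=0.01, epochs=200, seed=0):
    """Fit r_hat = mu + b_u[x] + b_i[i] + q_i . p_x by SGD with L2 regularization."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, k))   # user factors
    Q = rng.normal(scale=0.1, size=(n_items, k))   # item factors
    b_u = np.zeros(n_users)                        # user biases
    b_i = np.zeros(n_items)                        # item biases
    mu = np.mean([r for _, _, r in ratings])       # overall mean
    for _ in range(epochs):
        for x, i, r in ratings:
            e = r - (mu + b_u[x] + b_i[i] + Q[i] @ P[x])   # prediction error
            b_u[x] += lr * (e - lam * b_u[x])
            b_i[i] += lr * (e - lam * b_i[i])
            P[x], Q[i] = (P[x] + lr * (e * Q[i] - lam * P[x]),
                          Q[i] + lr * (e * P[x] - lam * Q[i]))
    return mu, b_u, b_i, P, Q

# Toy (user, item, rating) triples
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 1, 1), (2, 1, 5)]
mu, b_u, b_i, P, Q = sgd_fit(ratings, n_users=3, n_items=2)
pred = mu + b_u[0] + b_i[0] + Q[0] @ P[0]   # prediction for user 0, item 0
```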

Page 22

Recommendation Systems
• Content-based
• Collaborative filtering
• Latent factor models
• Case study: Netflix Challenge
• Deep recommender systems

Page 23

Case Study: Netflix Prize

Page 24

Case Study: The Netflix Prize

Competition:
• Task: reduce RMSE
• 2,700+ teams
• $1 million prize for a 10% improvement over Netflix's system

Leaderboard of RMSE scores:
• Global average: 1.1296
• User average: 1.0651
• Movie average: 1.0533
• Netflix: 0.9514
• Basic collaborative filtering: 0.94
• Collaborative filtering++: 0.91
• Latent factors: 0.90
• Latent factors + biases: 0.89
• Grand Prize: 0.8563

Page 25

Case Study: The Netflix Prize
• Training data
  – 100 million ratings, 480,000 users, 17,770 movies
  – 6 years of data: 2000-2005
• Test data
  – Last few ratings of each user (2.8 million)
  – Evaluation criterion: Root Mean Square Error

    RMSE = √( (1/|R|) Σ_{(i,x)∈R} (r̂_xi − r_xi)² )

  – Netflix's system RMSE: 0.9514
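The evaluation criterion can be sketched as:

```python
import math

def rmse(pred, actual):
    """Root-mean-square error over paired lists of predicted and true ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

print(rmse([3.0, 4.0, 5.0], [3.0, 4.0, 5.0]))  # 0.0 (perfect predictions)
print(rmse([2.0, 4.0], [4.0, 2.0]))            # 2.0
```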

Page 26

BellKor Recommender System
• The winner of the Netflix Challenge!
• Multi-scale modeling of the data: combine top-level, "regional" modeling of the data with a refined, local view:
  – Global: overall deviations of users/movies
  – Factorization: addressing "regional" effects
  – Collaborative filtering: extract local patterns

Page 27

Modeling Local & Global Effects
• Global:
  – Mean movie rating: 3.7 stars
  – The Sixth Sense is 0.5 stars above average
  – Joe rates 0.2 stars below average
  ⇒ Baseline estimate: Joe will rate The Sixth Sense 4 stars
• Local neighborhood (CF/NN):
  – Joe didn't like the related movie Signs
  ⇒ Final estimate: Joe will rate The Sixth Sense 3.8 stars

Page 28

Interpolation Weights
• So far:

  r̂_xi = b_xi + Σ_{j∈N(i;x)} w_ij (r_xj − b_xj)

  – Weights w_ij are derived based on their role; no use of an arbitrary similarity measure (w_ij ≠ s_ij)
  – Explicitly account for interrelationships among the neighboring movies
• Next: latent factor model
  – Extract "regional" correlations
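The interpolation-weight prediction can be sketched with made-up baselines and weights (the movie names and all numbers are hypothetical; real w_ij are learned, not hand-set):

```python
# r_hat_xi = b_xi + sum over neighbors j of w_ij * (r_xj - b_xj)

b_xi = 3.8                          # baseline estimate for user x on item i
neighbors = {                       # j -> (user x's rating r_xj, baseline b_xj)
    "signs": (2.0, 3.5),
    "unbreakable": (4.0, 3.6),
}
w = {"signs": 0.3, "unbreakable": 0.1}   # interpolation weights (illustrative)

# Each neighbor contributes its weighted deviation from its own baseline
r_hat = b_xi + sum(w[j] * (r - b) for j, (r, b) in neighbors.items())
print(round(r_hat, 2))  # 3.39
```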

Page 29

Temporal Biases of Users
• Sudden rise in the average movie rating (early 2004)
  – Improvements in Netflix
  – GUI improvements
  – Meaning of rating changed
• Movie age: possible explanations
  – Users prefer new movies without any particular reason
  – Older movies are just inherently better than newer ones

Y. Koren, Collaborative filtering with temporal dynamics, KDD '09

Page 30

Temporal Biases & Factors
• Original model: r̂_xi = μ + b_x + b_i + q_i · p_x
• Add time dependence to the biases: r̂_xi = μ + b_x(t) + b_i(t) + q_i · p_x
  – Make the parameters b_x and b_i depend on time
  – (1) Parameterize the time dependence by linear trends
    (2) Each bin corresponds to 10 consecutive weeks
• Add temporal dependence to the factors
  – p_x(t) = user preference vector on day t

Y. Koren, Collaborative filtering with temporal dynamics, KDD '09
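The binned time-dependent bias b_i(t) can be sketched as follows (the 10-week bin width comes from the slide; the bias values are illustrative):

```python
# Time-binned item bias: the timeline is split into 10-week bins and each bin
# gets its own learned offset on top of the static bias.

DAYS_PER_BIN = 70   # 10 consecutive weeks

def time_bin(day):
    return day // DAYS_PER_BIN

def item_bias(b_i_static, bin_offsets, day):
    """b_i(t) = b_i + b_{i,Bin(t)}; bins with no learned offset contribute 0."""
    return b_i_static + bin_offsets.get(time_bin(day), 0.0)

offsets = {0: -0.1, 1: 0.05}   # per-bin offsets for one hypothetical movie
print(item_bias(0.5, offsets, day=3))    # bin 0: 0.5 - 0.1
print(item_bias(0.5, offsets, day=75))   # bin 1: 0.5 + 0.05
print(item_bias(0.5, offsets, day=500))  # bin 7: no offset learned, 0.5
```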

Page 31

Adding Temporal Effects

[Figure: RMSE (y-axis, 0.875-0.92) vs. millions of parameters (x-axis, log scale, 1-10,000) for successive models: CF (no time bias), basic latent factors, CF (time bias), latent factors w/ biases, + linear time factors, + per-day user biases, + CF. RMSE decreases as temporal effects and parameters are added.]

Page 32

Case Study: The Netflix Prize

Leaderboard of RMSE scores:
• Global average: 1.1296
• User average: 1.0651
• Movie average: 1.0533
• Netflix: 0.9514
• Basic collaborative filtering: 0.94
• Collaborative filtering++: 0.91
• Latent factors: 0.90
• Latent factors + biases: 0.89
• Latent factors + biases + time: 0.876
• Grand Prize: 0.8563

• New update: added time
• But still no prize!
• What to do next?

Page 33

Page 34

Standing on June 26th, 2009
• The June 26th submission triggers a 30-day "last call"

Page 35

The Last 30 Days
• Ensemble team formed
  – A group of other teams on the leaderboard forms a new team
  – Relies on combining their models
  – Quickly also gets a qualifying score over 10%
• BellKor
  – Continues to get small improvements in their scores
  – Realizes they are in direct competition with Ensemble
• Strategy
  – Both teams carefully monitor the leaderboard
  – The only sure way to check for improvement is to submit a set of predictions
    • This alerts the other team of your latest score

Page 36

24 Hours from the Deadline
• Submissions limited to 1 a day
  – Only 1 final submission could be made in the last 24h
• 24 hours before the deadline…
  – A BellKor team member in Austria notices that Ensemble posts a score slightly better than BellKor's
• Frantic last 24 hours for both teams
  – Much computer time spent on final optimization
  – Carefully calibrated to end about an hour before the deadline
• Final submissions
  – BellKor submits a little early (on purpose), 40 mins before the deadline
  – Ensemble submits their final entry 20 mins later
  – …and everyone waits…

Page 37

Page 38

$1M Awarded Sept 21st 2009


Page 39

Recommendation Systems
• Content-based
• Collaborative filtering
• Latent factor models
• Case study: Netflix Challenge
• Deep recommender systems

Page 40

Deep Recommender Systems
• How can deep learning advance recommendation systems?
• A simple way for content-based models: use CNNs and LSTMs to generate image and text features of items

Page 41

Deep Recommender Systems
• But how can DL be used for tasks and methods at the core of recommendation systems?
  – For collaborative filtering?
  – For latent factor models?
  – For temporal dynamics?
  – Some new techniques?

Page 42

Why Deep Learning Techniques
Pros:
• Capture non-linearity well
• Non-manual representation learning
• Efficient sequence modeling
• Somewhat flexible and easy to retrain
Cons:
• Lack of interpretability
• Large data requirements
• Extensive hyper-parameter tuning

Page 43

Applicable DL Techniques
Deep learning methods:
• MLPs and autoencoders
• CNNs
• RNNs
• Adversarial networks
• Attention models
• Deep reinforcement learning

How can we use these methods to improve recommender systems?

Page 44

Several Methods
• Neural Collaborative Filtering
• Recurrent Recommender Systems
• LatentCross
• Dynamic User Model: JODIE

Page 45

Neural Collaborative Filtering
• A neural extension of traditional recommender systems
• Input: rating matrix, plus optional user profile and item features
  – If user/item features are unavailable, we can use one-hot vectors
• Output: user and item embeddings
• Traditional matrix factorization is a special case of NCF
• Reference: Neural Collaborative Filtering, He et al., WWW 2017

Page 46

NCF Setup
• User feature vector (one-hot if no profile features)
• Item feature vector
• User embedding matrix: U
• Item embedding matrix: I
• Neural network: f
• Neural network parameters: Θ
• Predicted rating: the output of f applied to the user and item embeddings

Page 47

NCF Model Architecture
• Multiple fully connected layers form the neural CF layers
• The output is a predicted rating score
• The real rating score is r_ui
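A forward pass of such an architecture can be sketched in NumPy (untrained, randomly initialized weights; the layer sizes are illustrative, not those of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d = 5, 7, 4   # toy sizes

# Embedding matrices U and I; with one-hot inputs, the embedding
# lookup reduces to selecting a row by index
U = rng.normal(scale=0.1, size=(n_users, d))
I = rng.normal(scale=0.1, size=(n_items, d))

# Hypothetical weights for the fully connected neural CF layers
W1 = rng.normal(scale=0.1, size=(2 * d, 8))
w2 = rng.normal(scale=0.1, size=8)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(u, i):
    """f(user embedding, item embedding; Theta): concat -> ReLU layer -> score."""
    h = np.concatenate([U[u], I[i]])   # input to the neural CF layers
    h = np.maximum(0.0, h @ W1)        # fully connected layer + ReLU
    return sigmoid(h @ w2)             # predicted rating score in (0, 1)

score = predict(2, 3)
```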

Page 48

NCF Model: Loss Function
• Train on the difference between the predicted rating and the real rating
• Use negative sampling to reduce the number of negative data points
• Loss: cross-entropy loss
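The loss with negative sampling can be sketched as follows (the sampling scheme and the scores are illustrative, not model outputs):

```python
import numpy as np

def bce(y_true, y_pred):
    """Binary cross-entropy between labels (1 = observed interaction,
    0 = sampled negative) and predicted scores in (0, 1)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(-np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))

rng = np.random.default_rng(0)
n_items = 5
observed = {(0, 1), (0, 3), (1, 2)}   # (user, item) pairs with interactions

# Negative sampling: for each positive pair, draw one unobserved item for that user
negatives = []
for u, _ in observed:
    while True:
        j = int(rng.integers(0, n_items))
        if (u, j) not in observed:
            negatives.append((u, j))
            break

labels = [1.0] * len(observed) + [0.0] * len(negatives)
scores = [0.9] * len(observed) + [0.2] * len(negatives)   # hypothetical predictions
loss = bce(labels, scores)
```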