Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Transcript
Page 1: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Big & Personal: the data and the models behind Netflix recommendations

Page 2: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Outline

1. The Netflix Prize & the Recommendation Problem

2. Anatomy of Netflix Personalization

3. Data & Models

4. More data or better Models?

Page 3: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain
Page 4: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

What we were interested in:
■ High quality recommendations

Proxy question:
■ Accuracy in predicted rating
■ Improve by 10% = $1 million!

● Top 2 algorithms still in production

Results

SVD

RBM
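The proxy question above can be made concrete with a small sketch. The slide does not name the metric; the Netflix Prize scored submissions by root mean squared error (RMSE), and the 10% target was a relative reduction in RMSE. The ratings below are made up for illustration.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(actual))

def relative_improvement(baseline_rmse, candidate_rmse):
    """Fraction by which the candidate lowers the baseline RMSE (0.10 means 10%)."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse

# Illustrative (made-up) ratings on a 1-5 scale.
actual = [4, 3, 5, 2]
baseline_predictions = [3, 3, 4, 4]
candidate_predictions = [4, 3, 4, 3]

baseline = rmse(baseline_predictions, actual)
candidate = rmse(candidate_predictions, actual)
improvement = relative_improvement(baseline, candidate)
```

A lower RMSE is better, so a positive `improvement` means the candidate predicts ratings more accurately than the baseline.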

Page 5: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

What about the final prize ensembles?

■ Our offline studies showed they were too computationally intensive to scale

■ Expected improvement not worth the engineering effort

■ Plus… focus had already shifted to other issues that had more impact than rating prediction.

Page 6: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Change of focus

2006 2013

Page 7: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Anatomy of Netflix Personalization

Everything is a Recommendation

Page 8: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Everything is personalized

Note: Recommendations are per household, not individual user

Ranking

Page 9: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Top 10

Personalization awareness

Diversity

[Figure: Top 10 row with titles labeled by the household member(s) they target: Dad, Mom, Son, Daughter, Dad & Mom, All]

Page 10: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Support for Recommendations

Social Support

Page 11: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Social Recommendations

Page 12: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Genre rows

■ Personalized genre rows focus on user interest
■ Also provide context and “evidence”
■ Important for member satisfaction: moving personalized rows to top on devices increased retention
■ How are they generated?
  ■ Implicit: based on user’s recent plays, ratings, & other interactions
  ■ Explicit taste preferences
  ■ Hybrid: combine the above
■ Also take into account:
  ■ Freshness: has this been shown before?
  ■ Diversity: avoid repeating tags and genres, limit number of TV genres, etc.

Page 13: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Genres - personalization

Page 14: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Similars

■ Displayed in many different contexts
■ In response to user actions/context (search, queue add…)
■ More like… rows

Page 15: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Data & Models

Page 16: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Big Data @Netflix

■ Almost 40M subscribers
■ Ratings: 4M/day
■ Searches: 3M/day
■ Plays: 30M/day
■ 2B hours streamed in Q4 2011
■ 1B hours in June 2012
■ > 4B hours in Q1 2013

Member Behavior

Geo-information

Time

Impressions

Device Info

Metadata

Social

Page 17: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Smart Models

■ Logistic/linear regression
■ Elastic nets
■ SVD and other MF models
■ Factorization Machines
■ Restricted Boltzmann Machines
■ Markov Chains
■ Different clustering approaches
■ LDA
■ Association Rules
■ Gradient Boosted Decision Trees/Random Forests
■ …

Page 18: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

SVD

$X_{[m \times n]} = U_{[m \times r]} \, S_{[r \times r]} \, (V_{[n \times r]})^{T}$

■ X: m x n matrix (e.g., m users, n videos)
■ U: m x r matrix (m users, r factors)
■ S: r x r diagonal matrix (strength of each ‘factor’) (r: rank of the matrix)
■ V: n x r matrix (n videos, r factors)
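The shapes in the decomposition above can be checked with a short NumPy sketch. The matrix is random and purely illustrative, and `r` is chosen smaller than the rank of `X`, so the product is a low-rank approximation rather than an exact reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 5, 2                                   # 6 users, 5 videos, keep 2 factors
X = rng.integers(1, 6, size=(m, n)).astype(float)   # made-up ratings in 1..5

# Thin SVD: U_full is m x k, s_full has k singular values, Vt_full is k x n, k = min(m, n).
U_full, s_full, Vt_full = np.linalg.svd(X, full_matrices=False)

U = U_full[:, :r]          # m x r: user factors
S = np.diag(s_full[:r])    # r x r: strength of each factor
V = Vt_full[:r, :].T       # n x r: video factors

X_approx = U @ S @ V.T     # m x n: rank-r approximation of X
```

Keeping all `min(m, n)` singular values reproduces `X` up to floating-point error; truncating to the top `r` keeps only the strongest factors.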

Page 19: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

SVD for Rating Prediction

■ User factor vectors and item factor vectors
■ Baseline (bias): user & item deviation from average
■ Predict rating as: (equation not preserved in this transcript)
■ SVD++ (Koren et al.): asymmetric variation with implicit feedback
  ■ Where (symbols not preserved in this transcript) are three item factor vectors
  ■ Users are not parametrized, but rather represented by:
    ■ R(u): items rated by user u
    ■ N(u): items for which the user has given implicit preference (e.g. rated vs. not rated)
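The formulas on this slide were images and are missing from the transcript. Assuming the slide follows the notation of Koren's matrix factorization papers, the biased prediction rule and the asymmetric variant are usually written as below. The symbols $\mu$, $b_u$, $b_i$, $p_u$, $q_i$, $x_j$, $y_j$ are taken from that literature, not from the transcript, and Koren's paper labels the second form Asymmetric-SVD.

```latex
% Baseline plus factor interaction (mu: global average, b_u / b_i: user and item deviations)
\hat{r}_{ui} = b_{ui} + p_u^{T} q_i, \qquad b_{ui} = \mu + b_u + b_i

% Asymmetric variant: the user vector p_u is replaced by sums over the items in R(u) and N(u);
% q_i, x_j, y_j are the three item factor vectors
\hat{r}_{ui} = b_{ui} + q_i^{T} \left(
    |R(u)|^{-1/2} \sum_{j \in R(u)} (r_{uj} - b_{uj}) \, x_j
  + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j
\right)
```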

Page 20: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Simon Funk’s SVD

■ One of the most interesting findings during the Netflix Prize came out of a blog post

■ Incremental, iterative, and approximate way to compute the SVD using gradient descent
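A minimal sketch of the gradient-descent idea, under stated assumptions: Funk's original post trained one factor at a time, whereas this version updates all factors together for brevity. The learning rate, regularization weight, and toy ratings are illustrative choices, not values from the talk.

```python
import random

def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def sse(ratings, P, Q):
    """Sum of squared errors over the observed (user, item, rating) triples."""
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)

def funk_sgd(ratings, n_users, n_items, n_factors=2, lr=0.01, reg=0.02,
             epochs=500, seed=0):
    """Fit factors with stochastic gradient descent on observed ratings only."""
    rnd = random.Random(seed)
    P = [[rnd.uniform(0.0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rnd.uniform(0.0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Step along the error gradient, with L2 shrinkage on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

# Toy data: (user, item, rating); missing cells are simply never visited.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P, Q = funk_sgd(ratings, n_users=3, n_items=3)
```

Because only observed ratings are visited, the method never needs a dense matrix, which is what made it practical on Prize-scale data.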

Page 21: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Restricted Boltzmann Machines

■ Restrict the connectivity in ANN to make learning easier.
■ Only one layer of hidden units.
  ■ Although multiple layers are possible
■ No connections between hidden units.
■ Hidden units are independent given the visible states.
■ RBMs can be stacked to form Deep Belief Networks (DBN), the 4th generation of ANNs

[Figure: RBM with a layer of hidden units (j) connected to a layer of visible units (i)]
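Because hidden units are independent given the visible states, all hidden activations can be computed in one vectorized step. The sketch below assumes binary visible and hidden units with a logistic activation; the RBM used for the Netflix Prize modeled ratings with softmax visible units, which is not shown here. The names `W` (weights) and `c` (hidden biases) are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_probabilities(v, W, c):
    """p(h_j = 1 | v) for every hidden unit j.

    v: (n_visible,) binary vector, W: (n_visible, n_hidden), c: (n_hidden,).
    """
    return sigmoid(c + v @ W)

def sample_hidden(v, W, c, rng):
    """Draw each hidden unit independently from its conditional probability."""
    p = hidden_probabilities(v, W, c)
    h = (rng.random(p.shape) < p).astype(int)
    return h, p

rng = np.random.default_rng(0)
v = np.array([1, 0, 1])
W = rng.normal(scale=0.1, size=(3, 2))   # illustrative random weights
c = np.zeros(2)
h, p = sample_hidden(v, W, c, rng)
```

No loop over hidden units is needed, since no hidden unit's probability depends on another's state.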

Page 22: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

RBM for the Netflix Prize

Page 23: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Ranking

Key algorithm, sorts titles in most contexts

Page 24: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Ranking

■ Ranking = Scoring + Sorting + Filtering bags of movies for presentation to a user
■ Goal: Find the best possible ordering of a set of videos for a user within a specific context in real-time
■ Objective: maximize consumption
■ Aspirations: Played & “enjoyed” titles have best score
■ Akin to CTR forecast for ads/search results
■ Factors
  ■ Accuracy
  ■ Novelty
  ■ Diversity
  ■ Freshness
  ■ Scalability
  ■ …
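The "Scoring + Sorting + Filtering" decomposition can be sketched as three steps over a bag of candidate titles. The field names (`predicted_play_prob`, `watched`) and the filter rule are hypothetical, chosen only to make the example concrete.

```python
def rank_titles(candidates, score_fn, keep_fn, top_n=10):
    """Score every candidate, sort by score (highest first), then filter."""
    scored = [(score_fn(title), title) for title in candidates]   # scoring
    scored.sort(key=lambda pair: pair[0], reverse=True)           # sorting
    kept = [title for _, title in scored if keep_fn(title)]       # filtering
    return kept[:top_n]

# Hypothetical catalog entries for one user and context.
catalog = [
    {"name": "A", "predicted_play_prob": 0.30, "watched": False},
    {"name": "B", "predicted_play_prob": 0.80, "watched": True},
    {"name": "C", "predicted_play_prob": 0.55, "watched": False},
]

ranked = rank_titles(
    catalog,
    score_fn=lambda t: t["predicted_play_prob"],
    keep_fn=lambda t: not t["watched"],
)
names = [t["name"] for t in ranked]
```

In a real-time setting the filter would often run before scoring to save work; the order here mirrors the slide's wording.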

Page 25: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Example: Two features, linear model

Page 26: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Example: Two features, linear model
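The transcript does not preserve which two features the slide plots. As an illustration only, the sketch below assumes the features are a title's popularity and the user's predicted rating, and scores each title with a weighted sum; the weights `w1`, `w2` and the bias `b` are what a learning procedure would fit.

```python
def linear_score(popularity, predicted_rating, w1, w2, b=0.0):
    """Two-feature linear ranking score."""
    return w1 * popularity + w2 * predicted_rating + b

def rank_by_linear_model(titles, w1, w2, b=0.0):
    """Return title names ordered from highest to lowest score."""
    return sorted(
        titles,
        key=lambda name: linear_score(titles[name][0], titles[name][1], w1, w2, b),
        reverse=True,
    )

# Hypothetical titles: name -> (popularity in 0..1, predicted rating in 1..5)
titles = {"A": (0.9, 2.5), "B": (0.4, 4.5), "C": (0.7, 3.5)}

by_popularity = rank_by_linear_model(titles, w1=1.0, w2=0.0)
by_rating = rank_by_linear_model(titles, w1=0.0, w2=1.0)
blended = rank_by_linear_model(titles, w1=2.0, w2=1.0)
```

Changing the weights changes the ordering, which is why choosing them from data becomes a learning problem rather than a hand-tuning exercise.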

Page 27: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Ranking

Page 28: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Ranking

Page 29: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Ranking

Novelty

Diversity

Freshness

Accuracy

Scalability

Page 30: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Learning to rank

■ Machine learning problem: goal is to construct ranking model from training data

■ Training data can have partial order or binary judgments (relevant/not relevant).

■ Resulting order of the items typically induced from a numerical score

■ Learning to rank is a key element for personalization

■ You can treat the problem as a standard supervised classification problem

Page 31: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Learning to Rank Approaches

1. Pointwise
■ Ranking function minimizes loss function defined on individual relevance judgment
■ Ranking score based on regression or classification
■ Ordinal regression, Logistic regression, SVM, GBDT, …

2. Pairwise
■ Loss function is defined on pair-wise preferences
■ Goal: minimize number of inversions in ranking
■ Ranking problem is then transformed into the binary classification problem
■ RankSVM, RankBoost, RankNet, FRank…
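The pairwise idea can be sketched in two small functions: one counts inversions (pairs that the scores order differently from the relevance labels), and one turns a labeled list into binary classification examples on feature differences. This is a generic illustration, not the training code of any of the named methods.

```python
from itertools import combinations

def count_inversions(scores, relevance):
    """Number of item pairs that the scores order opposite to the relevance labels."""
    inversions = 0
    for i, j in combinations(range(len(scores)), 2):
        if relevance[i] == relevance[j]:
            continue  # no preference between equally relevant items
        if (scores[i] - scores[j]) * (relevance[i] - relevance[j]) < 0:
            inversions += 1
    return inversions

def pairwise_examples(features, relevance):
    """Build (feature difference, label) pairs; label 1 means the first item is preferred."""
    examples = []
    for i, j in combinations(range(len(features)), 2):
        if relevance[i] == relevance[j]:
            continue
        diff = [a - b for a, b in zip(features[i], features[j])]
        label = 1 if relevance[i] > relevance[j] else 0
        examples.append((diff, label))
    return examples

scores = [0.9, 0.2, 0.5]
relevance = [1, 2, 0]
features = [[1, 0], [0, 1], [1, 1]]

n_inversions = count_inversions(scores, relevance)
examples = pairwise_examples(features, relevance)
```

Any binary classifier trained on `examples` yields a scoring function whose mistakes correspond to inversions in the ranking.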

Page 32: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Learning to rank - metrics

■ Quality of ranking measured using metrics such as
  ■ Normalized Discounted Cumulative Gain (NDCG)
  ■ Mean Reciprocal Rank (MRR)
  ■ Fraction of Concordant Pairs (FCP)
  ■ Others…

■ But, it is hard to optimize machine-learned models directly on these measures (they are not differentiable)

■ Recent research on models that directly optimize ranking measures
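Two of the metrics named above can be computed in a few lines. The sketch uses the exponential-gain form of DCG, which is one common definition among several; inputs are relevance labels listed in the order the model ranked the items.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with gain 2^rel - 1 and a log2 position discount."""
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending relevance) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def mean_reciprocal_rank(ranked_lists):
    """Average of 1 / (position of the first relevant item) over all lists."""
    total = 0.0
    for relevances in ranked_lists:
        for pos, rel in enumerate(relevances, start=1):
            if rel > 0:
                total += 1.0 / pos
                break
    return total / len(ranked_lists)

perfect = ndcg([3, 2, 1])
reversed_order = ndcg([1, 2, 3])
mrr = mean_reciprocal_rank([[0, 1, 0], [1, 0, 0]])
```

Both metrics depend on scores only through the resulting positions, so a small change in a score either leaves the metric unchanged or makes it jump. That is why they cannot be optimized directly with gradient methods.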

Page 33: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Learning to Rank Approaches

3. Listwise

a. Indirect Loss Function
■ RankCosine: similarity between ranking list and ground truth as loss function
■ ListNet: KL-divergence as loss function by defining a probability distribution
■ Problem: optimization of listwise loss function may not optimize IR metrics

b. Directly optimizing IR measures (difficult since they are not differentiable)
■ Directly optimize IR measures through Genetic Programming or Simulated Annealing
■ Gradient descent on smoothed version of objective function (e.g. CLiMF at RecSys 2012 or TFMAP at SIGIR 2012)
■ SVM-MAP relaxes the MAP metric by adding it to the SVM constraints
■ AdaRank uses boosting to optimize NDCG
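As an illustration of an indirect listwise loss, the sketch below follows the ListNet idea in its simplest "top-one" form: both the relevance labels and the model scores are mapped to probability distributions with a softmax, and the loss is the KL divergence between them. It is a simplified sketch, not the published training procedure.

```python
import numpy as np

def softmax(x):
    z = np.asarray(x, dtype=float)
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def listwise_kl_loss(scores, relevance):
    """KL(P_relevance || P_scores) over top-one probabilities for a single list."""
    p_true = softmax(relevance)
    p_model = softmax(scores)
    return float(np.sum(p_true * (np.log(p_true) - np.log(p_model))))

relevance = [3.0, 2.0, 1.0]
loss_matching = listwise_kl_loss([8.0, 7.0, 6.0], relevance)   # same order and gaps
loss_reversed = listwise_kl_loss([1.0, 2.0, 3.0], relevance)   # opposite order
```

Unlike NDCG or MRR, this loss is smooth in the scores, so it can be minimized by gradient descent. The trade-off, as the slide notes, is that lowering it does not guarantee a better IR metric.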

Page 34: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Other research questions we are interested in

● Row selection
  ○ How to select and rank lists of “related” items imposing inter-group diversity, avoiding duplicates...
● Diversity
  ○ Can we increase diversity while preserving relevance in a way that we optimize user response?
● Similarity
  ○ How to compute optimal and personalized similarity between items by using different data that can range from play histories to item metadata
● Context-aware recommendations
● Mood and session intent inference
● ...

Page 35: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

More data or better models?

Page 36: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

More data or better models?

Really?

Anand Rajaraman: Stanford & Senior VP at Walmart Global eCommerce (former Kosmix)

Page 37: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Sometimes, it’s not about more data

More data or better models?

Page 38: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

[Banko and Brill, 2001]

Norvig: “Google does not have better Algorithms, only more Data”

Many features/ low-bias models

More data or better models?

Page 39: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

More data or better models?

Sometimes, it’s not about more data

Page 40: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

X More data or better models?

Page 41: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Data without a sound approach = noise

Page 42: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Conclusions

Page 43: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

The Personalization Problem

■ The Netflix Prize simplified the recommendation problem to predicting ratings
■ But…
  ■ User ratings are only one of the many data inputs we have
  ■ Rating predictions are only part of our solution
  ■ Other algorithms such as ranking or similarity are very important
■ We can reformulate the recommendation problem
  ■ Function to optimize: probability a user chooses something and enjoys it enough to come back to the service

Page 44: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

More data + Better models +

More accurate metrics + Better approaches & architectures

Lots of room for improvement!

Page 45: Big & Personal: the data and the models behind Netflix recommendations by Xavier Amatriain

Thanks!

Xavier Amatriain (@xamat)
[email protected]

We’re hiring!