Nicholas Ruozzi University of Texas at Dallas

28
Collaborative Filtering Nicholas Ruozzi University of Texas at Dallas based on the slides of Alex Smola & Narges Razavian

Transcript of Nicholas Ruozzi University of Texas at Dallas

Page 1: Nicholas Ruozzi University of Texas at Dallas

Collaborative Filtering

Nicholas RuozziUniversity of Texas at Dallas

based on the slides of Alex Smola & Narges Razavian

Page 2: Nicholas Ruozzi University of Texas at Dallas

Collaborative Filtering

β€’ Combining information among collaborating entities to make recommendations and predictions

β€’ Can be viewed as a supervised learning problem (with some caveats)

β€’ Because of its many, many applications, it gets a special name

2

Page 3: Nicholas Ruozzi University of Texas at Dallas

Examples

β€’ Movie/TV recommendation (Netflix, Hulu, iTunes)

β€’ Product recommendation (Amazon)

β€’ Social recommendation (Facebook)

β€’ News content recommendation (Yahoo)

β€’ Priority inbox & spam filtering (Google)

β€’ Online dating (OK Cupid)

3

Page 4: Nicholas Ruozzi University of Texas at Dallas

Netflix Movie Recommendation

user movie rating

1 14 3

1 200 4

1 315 1

2 15 5

2 136 1

3 235 3

4 79 3

4

user movie rating

1 50 ?

1 28 ?

2 94 ?

2 32 ?

3 11 ?

4 99 ?

4 54 ?

Training Data Test Data

Page 5: Nicholas Ruozzi University of Texas at Dallas

Recommender Systems

β€’ Content-based recommendations

β€’ Recommendations based on a user profile (specific interests) or previously consumed content

β€’ Collaborative filtering

β€’ Recommendations based on the content preferences of similar users

β€’ Hybrid approaches

5

Page 6: Nicholas Ruozzi University of Texas at Dallas

Collaborative Filtering

β€’ Widely-used recommendation approaches:

β€’ π‘˜π‘˜-nearest neighbor methods

β€’ Matrix factorization based methods

β€’ Predict the utility of items for a user based on the items previously rated by other like-minded users

6

Page 7: Nicholas Ruozzi University of Texas at Dallas

Collaborative Filtering

β€’ Make recommendations based on user/item similarities

β€’ User similarity

β€’ Works well if number of items is much smaller than the number of users

β€’ Works well if the items change frequently

β€’ Item similarity (recommend new items that were also liked by the same users)

β€’ Works well if the number of users is small

7

Page 8: Nicholas Ruozzi University of Texas at Dallas

π‘˜π‘˜-Nearest Neighbor

β€’ Create a similarity matrix for pairs of users (or items)

β€’ Use π‘˜π‘˜-NN to find the π‘˜π‘˜ closest users to a target user

β€’ Use the ratings of the π‘˜π‘˜ nearest neighbors to make predictions

8

Page 9: Nicholas Ruozzi University of Texas at Dallas

User-User Similarity

β€’ Let π‘Ÿπ‘Ÿπ‘’π‘’,𝑖𝑖 be the rating of the 𝑖𝑖𝑑𝑑𝑑 item under user 𝑒𝑒, οΏ½π‘Ÿπ‘Ÿπ‘’π‘’ be the average rating of user 𝑒𝑒, and 𝑐𝑐𝑐𝑐𝑐𝑐(𝑒𝑒, 𝑣𝑣) be the set of items rated by both user 𝑒𝑒 and user 𝑣𝑣

β€’ One notion of similarity between user 𝑒𝑒 and user 𝑣𝑣 is given by Pearson’s correlation coefficient

𝑠𝑠𝑖𝑖𝑐𝑐 𝑒𝑒, 𝑣𝑣 =βˆ‘π‘–π‘–βˆˆπ‘π‘π‘π‘π‘π‘ 𝑒𝑒,𝑣𝑣 π‘Ÿπ‘Ÿπ‘’π‘’,𝑖𝑖 βˆ’ οΏ½π‘Ÿπ‘Ÿπ‘’π‘’ π‘Ÿπ‘Ÿπ‘£π‘£,𝑖𝑖 βˆ’ οΏ½π‘Ÿπ‘Ÿπ‘£π‘£

βˆ‘π‘–π‘–βˆˆπ‘π‘π‘π‘π‘π‘ 𝑒𝑒,𝑣𝑣 π‘Ÿπ‘Ÿπ‘’π‘’,𝑖𝑖 βˆ’ οΏ½π‘Ÿπ‘Ÿπ‘’π‘’2 βˆ‘π‘–π‘–βˆˆπ‘π‘π‘π‘π‘π‘ 𝑒𝑒,𝑣𝑣 π‘Ÿπ‘Ÿπ‘£π‘£,𝑖𝑖 βˆ’ οΏ½π‘Ÿπ‘Ÿπ‘£π‘£

2

9

Page 10: Nicholas Ruozzi University of Texas at Dallas

User-User Similarity

β€’ Let π‘Ÿπ‘Ÿπ‘’π‘’,𝑖𝑖 be the rating of the 𝑖𝑖𝑑𝑑𝑑 item under user 𝑒𝑒, οΏ½π‘Ÿπ‘Ÿπ‘’π‘’ be the average rating of user 𝑒𝑒, and 𝑐𝑐𝑐𝑐𝑐𝑐(𝑒𝑒, 𝑣𝑣) be the set of items rated by both user 𝑒𝑒 and user 𝑣𝑣

β€’ One notion of similarity between user 𝑒𝑒 and user 𝑣𝑣 is given by Pearson’s correlation coefficient

𝑠𝑠𝑖𝑖𝑐𝑐 𝑒𝑒, 𝑣𝑣 =βˆ‘π‘–π‘–βˆˆπ‘π‘π‘π‘π‘π‘ 𝑒𝑒,𝑣𝑣 π‘Ÿπ‘Ÿπ‘’π‘’,𝑖𝑖 βˆ’ οΏ½π‘Ÿπ‘Ÿπ‘’π‘’ π‘Ÿπ‘Ÿπ‘£π‘£,𝑖𝑖 βˆ’ οΏ½π‘Ÿπ‘Ÿπ‘£π‘£

βˆ‘π‘–π‘–βˆˆπ‘π‘π‘π‘π‘π‘ 𝑒𝑒,𝑣𝑣 π‘Ÿπ‘Ÿπ‘’π‘’,𝑖𝑖 βˆ’ οΏ½π‘Ÿπ‘Ÿπ‘’π‘’2 βˆ‘π‘–π‘–βˆˆπ‘π‘π‘π‘π‘π‘ 𝑒𝑒,𝑣𝑣 π‘Ÿπ‘Ÿπ‘£π‘£,𝑖𝑖 βˆ’ οΏ½π‘Ÿπ‘Ÿπ‘£π‘£

2

10

Empirical covariance of ratings

Page 11: Nicholas Ruozzi University of Texas at Dallas

User-User Similarity

β€’ Let π‘Ÿπ‘Ÿπ‘’π‘’,𝑖𝑖 be the rating of the 𝑖𝑖𝑑𝑑𝑑 item under user 𝑒𝑒, οΏ½π‘Ÿπ‘Ÿπ‘’π‘’ be the average rating of user 𝑒𝑒, and 𝑐𝑐𝑐𝑐𝑐𝑐(𝑒𝑒, 𝑣𝑣) be the set of items rated by both user 𝑒𝑒 and user 𝑣𝑣

β€’ One notion of similarity between user 𝑒𝑒 and user 𝑣𝑣 is given by Pearson’s correlation coefficient

𝑠𝑠𝑖𝑖𝑐𝑐 𝑒𝑒, 𝑣𝑣 =βˆ‘π‘–π‘–βˆˆπ‘π‘π‘π‘π‘π‘ 𝑒𝑒,𝑣𝑣 π‘Ÿπ‘Ÿπ‘’π‘’,𝑖𝑖 βˆ’ οΏ½π‘Ÿπ‘Ÿπ‘’π‘’ π‘Ÿπ‘Ÿπ‘£π‘£,𝑖𝑖 βˆ’ οΏ½π‘Ÿπ‘Ÿπ‘£π‘£

βˆ‘π‘–π‘–βˆˆπ‘π‘π‘π‘π‘π‘ 𝑒𝑒,𝑣𝑣 π‘Ÿπ‘Ÿπ‘’π‘’,𝑖𝑖 βˆ’ οΏ½π‘Ÿπ‘Ÿπ‘’π‘’2 βˆ‘π‘–π‘–βˆˆπ‘π‘π‘π‘π‘π‘ 𝑒𝑒,𝑣𝑣 π‘Ÿπ‘Ÿπ‘£π‘£,𝑖𝑖 βˆ’ οΏ½π‘Ÿπ‘Ÿπ‘£π‘£

2

11

Empirical standard deviation of user u’s ratings for common items

Page 12: Nicholas Ruozzi University of Texas at Dallas

User-User Similarity

β€’ Let 𝑛𝑛𝑛𝑛(𝑒𝑒) denote the set of π‘˜π‘˜-NN to 𝑒𝑒

β€’ 𝑝𝑝𝑒𝑒,𝑖𝑖, the predicted rating for the 𝑖𝑖𝑑𝑑𝑑 item of user 𝑒𝑒, is given by

𝑝𝑝𝑒𝑒,𝑖𝑖 = οΏ½π‘Ÿπ‘Ÿπ‘’π‘’ +βˆ‘π‘£π‘£βˆˆπ‘›π‘›π‘›π‘›(𝑒𝑒) 𝑠𝑠𝑖𝑖𝑐𝑐 𝑒𝑒, 𝑣𝑣 β‹… π‘Ÿπ‘Ÿπ‘£π‘£,𝑖𝑖 βˆ’ οΏ½π‘Ÿπ‘Ÿπ‘£π‘£

βˆ‘π‘£π‘£βˆˆπ‘›π‘›π‘›π‘›(𝑒𝑒) 𝑠𝑠𝑖𝑖𝑐𝑐(𝑒𝑒, 𝑣𝑣)

β€’ This is the average rating of user 𝑒𝑒 plus the weighted average of the ratings of 𝑒𝑒’s π‘˜π‘˜ nearest neighbors

12

Page 13: Nicholas Ruozzi University of Texas at Dallas

User-User Similarity

β€’ Issue: could be expensive to find the π‘˜π‘˜-NN if the number of users is very large

β€’ Possible solutions?

13

Page 14: Nicholas Ruozzi University of Texas at Dallas

Item-Item Similarity

β€’ Use Pearson’s correlation coefficient to compute the similarity between pairs of items

β€’ Let 𝑐𝑐𝑐𝑐𝑐𝑐(𝑖𝑖, 𝑗𝑗) be the set of users common to items 𝑖𝑖 and 𝑗𝑗

β€’ The similarity between items 𝑖𝑖 and 𝑗𝑗 is given by

𝑠𝑠𝑖𝑖𝑐𝑐 𝑖𝑖, 𝑗𝑗 =βˆ‘π‘’π‘’βˆˆπ‘π‘π‘π‘π‘π‘ 𝑖𝑖,𝑗𝑗 π‘Ÿπ‘Ÿπ‘’π‘’,𝑖𝑖 βˆ’ οΏ½π‘Ÿπ‘Ÿπ‘’π‘’ π‘Ÿπ‘Ÿπ‘’π‘’,𝑗𝑗 βˆ’ οΏ½π‘Ÿπ‘Ÿπ‘’π‘’

βˆ‘π‘’π‘’βˆˆπ‘π‘π‘π‘π‘π‘ 𝑖𝑖,𝑗𝑗 π‘Ÿπ‘Ÿπ‘’π‘’,𝑖𝑖 βˆ’ οΏ½π‘Ÿπ‘Ÿπ‘’π‘’2 βˆ‘π‘’π‘’βˆˆπ‘π‘π‘π‘π‘π‘ 𝑖𝑖,𝑗𝑗 π‘Ÿπ‘Ÿπ‘’π‘’,𝑗𝑗 βˆ’ οΏ½π‘Ÿπ‘Ÿπ‘’π‘’

2

14

Page 15: Nicholas Ruozzi University of Texas at Dallas

Item-Item Similarity

β€’ Let 𝑛𝑛𝑛𝑛(𝑖𝑖) denote the set of π‘˜π‘˜-NN to 𝑖𝑖

β€’ 𝑝𝑝𝑒𝑒,𝑖𝑖, the predicted rating for the 𝑖𝑖𝑑𝑑𝑑 item of user 𝑒𝑒, is given by

𝑝𝑝𝑒𝑒,𝑖𝑖 =βˆ‘π‘—π‘—βˆˆπ‘›π‘›π‘›π‘›(𝑖𝑖) 𝑠𝑠𝑖𝑖𝑐𝑐 𝑖𝑖, 𝑗𝑗 β‹… π‘Ÿπ‘Ÿπ‘’π‘’,𝑗𝑗

βˆ‘π‘—π‘—βˆˆπ‘›π‘›π‘›π‘›(𝑖𝑖) 𝑠𝑠𝑖𝑖𝑐𝑐(𝑖𝑖, 𝑗𝑗)

β€’ This is the weighted average of the ratings of 𝑖𝑖’s π‘˜π‘˜ nearest neighbors

15

Page 16: Nicholas Ruozzi University of Texas at Dallas

π‘˜π‘˜-Nearest Neighbor

β€’ Easy to train

β€’ Easily adapts to new users/items

β€’ Can be difficult to scale (finding closest pairs requires forming the similarity matrix)

β€’ Less of a problem for item-item assuming number of items is much smaller than the number of users

β€’ Not sure how to choose π‘˜π‘˜

β€’ Can lead to poor accuracy

16

Page 17: Nicholas Ruozzi University of Texas at Dallas

π‘˜π‘˜-Nearest Neighbor

β€’ Tough to use without any ratings information to start with

β€’ β€œCold Start”

β€’ New users should rate some initial items to have personalized recommendations

– Could also have new users describe tastes, etc.

β€’ New Item/Movie may require content analysis or a non-CF based approach

17

Page 18: Nicholas Ruozzi University of Texas at Dallas

Matrix Factorization

β€’ There could be a number of latent factors that affect the recommendation

β€’ Style of movie: serious vs. funny vs. escapist

β€’ Demographic: is it preferred more by men or women

β€’ Alternative approach: view CF as a matrix factorization problem

18

Page 19: Nicholas Ruozzi University of Texas at Dallas

Matrix Factorization

β€’ Express a matrix 𝑀𝑀 ∈ ℝ𝑐𝑐×𝑛𝑛 approximately as a product of factors 𝐴𝐴 ∈ ℝ𝑐𝑐×𝑝𝑝 and 𝐡𝐡 ∈ ℝ𝑝𝑝×𝑛𝑛

𝑀𝑀~𝐴𝐴 β‹… 𝐡𝐡

β€’ Approximate the user Γ— items matrix as a product of matrices in this way

β€’ Similar to SVD decompositions that we saw earlier (SVD can’t be used for a matrix with missing entries)

β€’ Think of the entries of 𝑀𝑀 as corresponding to an inner product of latent factors

19

Page 20: Nicholas Ruozzi University of Texas at Dallas

Matrix Factorization

20 [from slides of Alex Smola]

Page 21: Nicholas Ruozzi University of Texas at Dallas

Matrix Factorization

21 [from slides of Alex Smola]

Page 22: Nicholas Ruozzi University of Texas at Dallas

Matrix Factorization

22 [from slides of Alex Smola]

Page 23: Nicholas Ruozzi University of Texas at Dallas

Matrix Factorization

β€’ We can express finding the β€œclosest” matrix as an optimization problem

min𝐴𝐴,𝐡𝐡

�𝑒𝑒,𝑖𝑖 π‘π‘π‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘£π‘£π‘œπ‘œπ‘œπ‘œ

𝑀𝑀𝑒𝑒,𝑖𝑖 βˆ’ 𝐴𝐴𝑒𝑒,:,𝐡𝐡:,𝑖𝑖2 + πœ†πœ† 𝐴𝐴 𝐹𝐹

2 + 𝐡𝐡 𝐹𝐹2

23

Page 24: Nicholas Ruozzi University of Texas at Dallas

Matrix Factorization

β€’ We can express finding the β€œclosest” matrix as an optimization problem

min𝐴𝐴,𝐡𝐡

�𝑒𝑒,𝑖𝑖 π‘π‘π‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘£π‘£π‘œπ‘œπ‘œπ‘œ

𝑀𝑀𝑒𝑒,𝑖𝑖 βˆ’ 𝐴𝐴𝑒𝑒,:,𝐡𝐡:,𝑖𝑖2 + πœ†πœ† 𝐴𝐴 𝐹𝐹

2 + 𝐡𝐡 𝐹𝐹2

24

Computes the error in the approximation

of the observed matrix entries

Page 25: Nicholas Ruozzi University of Texas at Dallas

Matrix Factorization

β€’ We can express finding the β€œclosest” matrix as an optimization problem

min𝐴𝐴,𝐡𝐡

�𝑒𝑒,𝑖𝑖 π‘π‘π‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘£π‘£π‘œπ‘œπ‘œπ‘œ

𝑀𝑀𝑒𝑒,𝑖𝑖 βˆ’ 𝐴𝐴𝑒𝑒,:,𝐡𝐡:,𝑖𝑖2 + πœ†πœ† 𝐴𝐴 𝐹𝐹

2 + 𝐡𝐡 𝐹𝐹2

25

Regularization preferences matrices with small Frobenius

norm

Page 26: Nicholas Ruozzi University of Texas at Dallas

Matrix Factorization

β€’ We can express finding the β€œclosest” matrix as an optimization problem

min𝐴𝐴,𝐡𝐡

�𝑒𝑒,𝑖𝑖 π‘π‘π‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘£π‘£π‘œπ‘œπ‘œπ‘œ

𝑀𝑀𝑒𝑒,𝑖𝑖 βˆ’ 𝐴𝐴𝑒𝑒,:,𝐡𝐡:,𝑖𝑖2 + πœ†πœ† 𝐴𝐴 𝐹𝐹

2 + 𝐡𝐡 𝐹𝐹2

β€’ How to optimize this objective?

26

Page 27: Nicholas Ruozzi University of Texas at Dallas

Matrix Factorization

β€’ We can express finding the β€œclosest” matrix as an optimization problem

min𝐴𝐴,𝐡𝐡

�𝑒𝑒,𝑖𝑖 π‘π‘π‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘£π‘£π‘œπ‘œπ‘œπ‘œ

𝑀𝑀𝑒𝑒,𝑖𝑖 βˆ’ 𝐴𝐴𝑒𝑒,:,𝐡𝐡:,𝑖𝑖2 + πœ†πœ† 𝐴𝐴 𝐹𝐹

2 + 𝐡𝐡 𝐹𝐹2

β€’ How to optimize this objective?

β€’ (Stochastic) gradient descent!

27

Page 28: Nicholas Ruozzi University of Texas at Dallas

Extensions

β€’ The basic matrix factorization approach doesn’t take into account the observation that some people are tougher reviewers than others and that some movies are over-hyped

β€’ Can correct for this by introducing a bias term for each user and a global bias

min𝐴𝐴,𝐡𝐡,πœ‡πœ‡,π‘œπ‘œ

�𝑒𝑒,𝑖𝑖 π‘π‘π‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘£π‘£π‘œπ‘œπ‘œπ‘œ

𝑀𝑀𝑒𝑒,𝑖𝑖 βˆ’ πœ‡πœ‡ βˆ’ 𝑏𝑏𝑖𝑖 βˆ’ 𝑏𝑏𝑒𝑒 βˆ’ 𝐴𝐴𝑒𝑒,:,𝐡𝐡:,𝑖𝑖2

+πœ†πœ† 𝐴𝐴 𝐹𝐹2 + 𝐡𝐡 𝐹𝐹

2 + 𝜈𝜈 �𝑖𝑖

𝑏𝑏𝑖𝑖2 + �𝑒𝑒

𝑏𝑏𝑒𝑒2

28