Recsys matrix-factorizations
-
Upload
dmitriy-selivanov -
Category
Data & Analytics
-
view
194 -
download
0
Transcript of Recsys matrix-factorizations
Goals
I Personalized offersI recommended items for a customer given history of activities
(transactions, browsing history, favourites)
I Similar itemsI substitutionsI frequently bought togetherI . . .
I Exploration
Live demo
I http://94.204.253.34/reco-playlist/I http://94.204.253.34/reco-similar-artists/
Main approaches
I Content basedI good for cold startI not personalized
I Collaborative filteringI vanilla collaborative fitleringI matrix factorizationsI . . .
I Hybrid and context aware recommender systemsI best of both worlds
Collaborative filteringTrivial algorithm:
1. take cutomers who also bought item i02. check other items they’ve bought - i1, i2, ...3. calculate similarity with other items sim(i0, i1), sim(i0, i2), . . .
I just frequencyI similarity of the descriptionsI correlationI . . .
4. sort by similarity
Cons:
I recommendations are trivial - usually most popular itemsI not personalizedI cold start - how to recommend new items?I need to keep and work on whole matrix
User-based collaborative filtering1. for a user u0 calculate sim(u0,U) and take top K2. aggregate their opinions about items
I weighted sum of their ratings
Cons:
I cold startI nothing to recommend to new/untypical usersI need to keep and work on whole matrix
Item-based collaborative filtering1. for a item i0 calculate sim(i0, I) and take top K2. show most similar items
Cons:
I not personalizedI cold start
Latent methodsI Users can be described by small number of latent factors pukI Items can be described by small number of latent factors qki
Figure 5:
Explicit feedback - rating predictionI ~ 480k users, 18k movies, 100m ratingsI sparsity ~ 90%I goal is to reduce RMSE by 10% - from 0.9514 to 0.8563
RMSE 2 = 1D
∑u,i∈D
(rui − r̂ui)2
SVD
For any matrix X :
X = USV T
I X ∈ Rm∗n
I U,V - columns are orthonormal bases (dot product of any 2columns is zero, unit norm)
I S - matrix with singular values on diagonal
Truncated SVD
Take k largest singular values:
X ≈ UkSkV Tk
Truncated SVD is the best rank k approximation of the matrix X interms of Frobenius norm:
||X − UkSkV Tk ||F
P = Uk√
Sk
Q =√
SkV Tk
Issue with truncated SVD
I Optimal in terms of Frobenius norm - takes into accountzeros in ratings -
RMSE =√√√√ 1
users × items∑
u∈users,i∈items(rui − r̂ui)2
I Overfits data
Our goal is error only in “observed” ratings:
RMSE =√√√√ 1
Observed∑
u,i∈Observed(rui − r̂ui)2
SVD-like matrix factorization
J =∑
u,i∈Observed(rui − pu × qi)2 + λ(||Q||+ ||P||)
Non-convex - hard to optimize, but SGD and ALS works good inpractice
Alternating Least Squares
min∑
i∈Observed(ri − qi × P)2 + λ
u∑j=1
p2j
min∑
u∈Observed(ru − pu × Q)2 + λ
i∑j=1
q2j
Ridge regression: P = (QT Q + λI)−1QT y , Q = (PT P + λI)−1PT y
Types of feedback
Explicit
Ratings, likes/dislikes, purchases
I cleaner dataI smallerI hard to collect
Implicit
Browsing, clicks, purchases, . . .
I dirty dataI larger datasetsI generally gives better results
Implicit feedbackI missed entries in matrix are mix of negative preferences and
positive preferencesI consider them as negative with low confidence
I observed entries are positive preferencesI should have high confidence
Model - “Collaborative Filtering for Implicit Feedback Datasets”
I Preferences
Pij ={1 if Rij > 00 otherwise
I Confidence Cui = 1 + f (Rui)I Objective
J∑
u=user
∑i=item
Cui(Pui − XuYi) + λ(||X ||F + ||Y ||F )
Alternating Least Squares for implicit feedback
For fixed Y :
dL/dxu = −2∑
i=itemcui(pui − xT
u yi)yi + 2λxu =
−2∑
i=itemcui(pui − yT
i xu)yi + 2λxu =
−2Y T Cup(u) + 2Y T CuYxu + 2λxu
I Setting dL/dxu = 0 for optimal solution gives us(Y T CuY + λI)xu = Y T Cup(u)
I xu can be obtained by solving system of linear equations:
xu = solve(Y T CuY + λI,Y T Cup(u))
Alternating Least Squares for implicit feedback
Similarly for fixed X :
I dL/dyi = −2XT C ip(i) + 2XT C iYyi + 2λyiI yi = solve(XT C iX + λI,XT C ip(i))
Another optimization:
I XT C iX = XT X + XT (C i − I)XI Y T CuY = Y T Y + Y T (Cu − I)Y
XT X and Y T Y can be precomputed
EvaluationWe only care about how to produce small number of highlyrelevant items
RMSE in not the best measure!
MAP@K - Mean average precision
AveragePrecision =∑n
k=1(P(k)×rel(k))number of relevant documents
## index relevant precision_at_k## 1: 1 0 0.0000000## 2: 2 0 0.0000000## 3: 3 1 0.3333333## 4: 4 0 0.2500000## 5: 5 0 0.2000000
map@5 = 0.1566667
Evaluation
NDCG@K - Normalized Discounted Cumulative GainIntuition is the same as for MAP@K, but also takes into accountvalue of relevance.
DCGp =p∑
i=1
2reli − 1log2(i + 1)
nDCGp = DCGpIDCGp
IDCGp =|REL|∑i=1
2reli − 1log2(i + 1)
Questions?
I http://dsnotes.com/tags/recommender-systems/I http://94.204.253.34/reco-playlist/I http://94.204.253.34/reco-similar-artists/
Contacts:
I [email protected] https://github.com/dselivanov