Item Recommendation with Variational Autoencoders and ...
Transcript of Item Recommendation with Variational Autoencoders and ...
Item Recommendation with Variational Autoencoders and Heterogenous Priors
Giannis Karamanolakis Kevin Raji Cherian Ananth Ravi Narayan
Jie Yuan Da Tang Tony Jebara
Columbia Columbia Columbia
Columbia Columbia Columbia, Netflix
U × I
Item Recommendation - Collaborative Filtering (CF)
U × I
Item Recommendation - Collaborative Filtering (CF)
? ?
U × I
Item Recommendation - Collaborative Filtering (CF)
? ?
U × I
Item Recommendation - Collaborative Filtering (CF)
?
U × I (U × K) × (K × I)
Item Recommendation - Collaborative Filtering (CF)
Latent Factor Models
U × I (U × K) × (K × I)
Item Recommendation - Collaborative Filtering (CF)
Latent Factor Models
(-) linear limited modeling capacity
U × I (U × K) × (K × I)
Item Recommendation - Collaborative Filtering (CF)
Latent Factor Models
(-) linear limited modeling capacityNon-linear Features
U × I (U × K) × (K × I)
Item Recommendation - Collaborative Filtering (CF)
Latent Factor Models
(-) linear limited modeling capacityNon-linear Features
Neural Networks
U × I (U × K) × (K × I)
Item Recommendation - Collaborative Filtering (CF)
Latent Factor Models
(-) linear limited modeling capacityNon-linear Features
Neural Networks
Variational Autoencoders (VAEs)
“Auto-encoding Variational Bayes” D. P. Kingma, M. Welling, ICLR 2014“Variational Autoencoders for Collaborative Filtering” D. Liang, RG. Krishnan, MD. Hoffman, T. Jebara, WWW 2018
U × I (U × K) × (K × I)
Item Recommendation - Collaborative Filtering (CF)
Latent Factor Models
(-) linear limited modeling capacityNon-linear Features
Neural Networks
Variational Autoencoders (VAEs)
“Variational Autoencoders for Collaborative Filtering” D. Liang, RG. Krishnan, MD. Hoffman, T. Jebara, WWW 2018
generalize linear latent factor modelshave larger modeling capacity(+)
(+)
“Auto-encoding Variational Bayes” D. P. Kingma, M. Welling, ICLR 2014
VAEs for Collaborative Filtering
“Variational Autoencoders for Collaborative Filtering” D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, WWW 2018.
0.70
0.00
0.15
0.00
qϕ(zu |xu)
p(zu) ∼ 𝒩(0,𝕀K)-
-
-
0.0
1.0
0.0
1.0
VAEs for Collaborative Filtering
Multinomial Likelihood
“Variational Autoencoders for Collaborative Filtering” D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, WWW 2018.
0.70
0.00
0.15
0.00
qϕ(zu |xu)
p(zu) ∼ 𝒩(0,𝕀K)-
-
-
0.0
1.0
0.0
1.0
VAEs for Collaborative Filtering
0.70
0.00
0.15
0.00
Training Objective:
(Negative) reconstruction error
qϕ(zu |xu)
p(zu) ∼ 𝒩(0,𝕀K)
Regularization term (Kullback-Leibler Divergence)
“Variational Autoencoders for Collaborative Filtering” D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, WWW 2018.
-
-
-
0.0
1.0
0.0
1.0
VAEs for Collaborative Filtering
0.70
0.00
0.15
0.00
qϕ(zu |xu)
p(zu) ∼ 𝒩(0,𝕀K)
“Variational Autoencoders for Collaborative Filtering” D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, WWW 2018.
𝒩(0,𝕀K)
-
-
-
0.0
1.0
0.0
1.0
Training Objective:
(Negative) reconstruction error Regularization term (Kullback-Leibler Divergence)
VAEs for Collaborative Filtering
0.70
0.00
0.15
0.00
qϕ(zu |xu)
p(zu) ∼ 𝒩(0,𝕀K)
“Variational Autoencoders for Collaborative Filtering” D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, WWW 2018.
My variational posterior:
should be near my prior:
qϕ(zu1 |xu1)
p(zu1) ∼ 𝒩(0,𝕀K)
-
-
-
Training Objective:
(Negative) reconstruction error Regularization term (Kullback-Leibler Divergence)
VAEs for Collaborative Filtering
0.70
0.00
0.15
0.00
qϕ(zu |xu)
p(zu) ∼ 𝒩(0,𝕀K)
“Variational Autoencoders for Collaborative Filtering” D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, WWW 2018.
My variational posterior:
should be near my prior:
qϕ(zu1 |xu1)
p(zu1) ∼ 𝒩(0,𝕀K)
My variational posterior:
should be near my prior:
qϕ(zu2 |xu2)
p(zu2) ∼ 𝒩(0,𝕀K)-
- 0.70
0.00
0.15
0.00
Training Objective:
(Negative) reconstruction error Regularization term (Kullback-Leibler Divergence)
VAEs for Collaborative Filtering
0.70
0.00
0.15
0.00
qϕ(zu |xu)
p(zu) ∼ 𝒩(0,𝕀K)
“Variational Autoencoders for Collaborative Filtering” D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, WWW 2018.
My variational posterior:
should be near my prior:
qϕ(zu1 |xu1)
p(zu1) ∼ 𝒩(0,𝕀K)
My variational posterior:
should be near my prior:
0.70
0.00
0.15
0.00
-
-
-
qϕ(zu2 |xu2)
p(zu2) ∼ 𝒩(0,𝕀K)
My variational posterior:
should be near my prior:
qϕ(zu3 |xu3)
p(zu3) ∼ 𝒩(0,𝕀K)
Training Objective:
(Negative) reconstruction error Regularization term (Kullback-Leibler Divergence)
VAEs for Collaborative Filtering
0.70
0.00
0.15
0.00
qϕ(zu |xu)
p(zu) ∼ 𝒩(0,𝕀K)
“Variational Autoencoders for Collaborative Filtering” D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, WWW 2018.
My variational posterior:
should be near my prior:
qϕ(zu1 |xu1)
p(zu1) ∼ 𝒩(0,𝕀K)
My variational posterior:
should be near my prior:
qϕ(zu2 |xu2)
p(zu2) ∼ 𝒩(0,𝕀K)
My variational posterior:
should be near my prior:
qϕ(zu3 |xu3)
p(zu3) ∼ 𝒩(0,𝕀K)
-
My variational posterior:
should be near my prior:
qϕ(zu4 |xu4)
p(zu4) ∼ 𝒩(0,𝕀K)
Training Objective:
(Negative) reconstruction error Regularization term (Kullback-Leibler Divergence)
VAEs for Collaborative Filtering
qϕ(zu |xu)
p(zu) ∼ 𝒩(0,𝕀K)
“Variational Autoencoders for Collaborative Filtering” D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, WWW 2018.
My variational posterior:
should be near my prior:
qϕ(zu1 |xu1)
p(zu1) ∼ 𝒩(0,𝕀K)
My variational posterior:
should be near my prior:
qϕ(zu2 |xu2)
p(zu2) ∼ 𝒩(0,𝕀K)
My variational posterior:
should be near my prior:
qϕ(zu3 |xu3)
p(zu3) ∼ 𝒩(0,𝕀K)
-
My variational posterior:
should be near my prior:
qϕ(zu4 |xu4)
p(zu4) ∼ 𝒩(0,𝕀K)
0.70
0.00
0.15
0.00
Training Objective:
(Negative) reconstruction error Regularization term (Kullback-Leibler Divergence)
VAEs for Collaborative Filtering
qϕ(zu |xu)
p(zu) ∼ 𝒩(0,𝕀K)
“Variational Autoencoders for Collaborative Filtering” D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, WWW 2018.
My variational posterior:
should be near my prior:
qϕ(zu1 |xu1)
p(zu1) ∼ 𝒩(0,𝕀K)
My variational posterior:
should be near my prior:
qϕ(zu2 |xu2)
p(zu2) ∼ 𝒩(0,𝕀K)
My variational posterior:
should be near my prior:
qϕ(zu3 |xu3)
p(zu3) ∼ 𝒩(0,𝕀K)
-
My variational posterior:
should be near my prior:
qϕ(zu4 |xu4)
p(zu4) ∼ 𝒩(0,𝕀K)
0.70
0.00
0.15
0.00
Training Objective:
(Negative) reconstruction error Regularization term (Kullback-Leibler Divergence)
VAEs for Collaborative Filtering
qϕ(zu |xu)
p(zu) ∼ 𝒩(0,𝕀K)
“Variational Autoencoders for Collaborative Filtering” D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, WWW 2018.
My variational posterior:
should be near my prior:
qϕ(zu1 |xu1)
p(zu1) ∼ 𝒩(0,𝕀K)
My variational posterior:
should be near my prior:
qϕ(zu2 |xu2)
p(zu2) ∼ 𝒩(0,𝕀K)
My variational posterior:
should be near my prior:
qϕ(zu3 |xu3)
p(zu3) ∼ 𝒩(0,𝕀K)
-
My variational posterior:
should be near my prior:
qϕ(zu4 |xu4)
p(zu4) ∼ 𝒩(0,𝕀K)
0.70
0.00
0.15
0.00
?
Training Objective:
(Negative) reconstruction error Regularization term (Kullback-Leibler Divergence)
VAEs for Collaborative Filtering
qϕ(zu |xu)
p(zu) ∼ 𝒩(0,𝕀K)
“Variational Autoencoders for Collaborative Filtering” D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, WWW 2018.
My variational posterior:
should be near my prior:
qϕ(zu1 |xu1)
p(zu1) ∼ 𝒩(0,𝕀K)
My variational posterior:
should be near my prior:
qϕ(zu2 |xu2)
p(zu2) ∼ 𝒩(0,𝕀K)
My variational posterior:
should be near my prior:
qϕ(zu3 |xu3)
p(zu3) ∼ 𝒩(0,𝕀K)
-
My variational posterior:
should be near my prior:
qϕ(zu4 |xu4)
p(zu4) ∼ 𝒩(0,𝕀K)
0.70
0.00
0.15
0.00
“Posterior-Collapse” Issue
𝒩(0,𝕀K)qϕ(zui |xui)
collapses to
Hoffman et al. 2016 (“ELBO surgery”)
∀ user:
Bowman et al. 2016,
?
Training Objective:
(Negative) reconstruction error Regularization term (Kullback-Leibler Divergence)
VAEs for Collaborative Filtering
qϕ(zu |xu)
p(zu) ∼ 𝒩(0,𝕀K)
“Variational Autoencoders for Collaborative Filtering” D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, WWW 2018.
My variational posterior:
should be near my prior:
qϕ(zu1 |xu1)
p(zu1) ∼ 𝒩(0,𝕀K)
My variational posterior:
should be near my prior:
qϕ(zu2 |xu2)
p(zu2) ∼ 𝒩(0,𝕀K)
My variational posterior:
should be near my prior:
qϕ(zu3 |xu3)
p(zu3) ∼ 𝒩(0,𝕀K)
-
My variational posterior:
should be near my prior:
qϕ(zu4 |xu4)
p(zu4) ∼ 𝒩(0,𝕀K)
0.70
0.00
0.15
0.00
In this paper
Use heterogenous, user-dependent priors!
𝒩(tu4, 𝕊u4)p(zu4) ∼ 𝒩(0,𝕀K)
Training Objective:
(Negative) reconstruction error Regularization term (Kullback-Leibler Divergence)
This work: Item Recommendation with VAEs and Heterogenous Priors
• For each user , we replace by
- Prior parameters encode user preferences
u
- Explicitly encourage user diversity in latent VAE space
Using heterogenous, user-dependent priors
- Prior parameters encode user preferences- Explicitly encourage user diversity in latent VAE space
Using heterogenous, user-dependent priors
• For each user , we replace by
This work: Item Recommendation with VAEs and Heterogenous Priors
u
?How can you encode my preferences?
Have I revealed them?
?
- Prior parameters encode user preferences- Explicitly encourage user diversity in latent VAE space
Using heterogenous, user-dependent priors
• For each user , we replace by
This work: Item Recommendation with VAEs and Heterogenous Priors
u
How can you encode my preferences?
Have I revealed them?
Yes, you have!
By writing reviews…
Item Recommendation with VAEs and Heterogenous Priors
Users reveal their preferences in text reviews
Users reveal their preferences in text reviews
Item Recommendation with VAEs and Heterogenous Priors
Yelp Review
Item Recommendation with VAEs and Heterogenous Priors
Users reveal their preferences in text reviews
Yelp Review
IMDB Review
Item Recommendation with VAEs and Heterogenous Priors
Users reveal their preferences in text reviews
Yelp Review
IMDB Review
likes burgers
likes heist movies
cares aboutportion size
likes suspense
Item Recommendation with VAEs and Heterogenous Priors
Before After
Encoding user preferences from text (2 methods):
• , : functions of the user’s review text
Encoding user preferences from text (2 methods):
- Method 1: Word Embeddings (word2vec)- Method 2: Probabilistic Topic Models (Latent Dirichlet Allocation - LDA)
Item Recommendation with VAEs and Heterogenous Priors
Before
• , : functions of the user’s review text
After
Encoding user preferences from text:• Method 1: Word Embeddings (word2vec)
Item Recommendation with VAEs and Heterogenous Priors
-dim Word EmbeddingsK Mikolov et. al. 2013: “Efficient estimation of word representations in vector space”
Encoding user preferences from text:
1. Create review embeddings: avg of word embeddings
Item Recommendation with VAEs and Heterogenous Priors
Review Embeddings-dim Word EmbeddingsK
1.
• Method 1: Word Embeddings (word2vec)
Encoding user preferences from text:
1. Create review embeddings: avg of word embeddings
- : avg of review embeddings (written by )u- : diagonal covariance matrix
-dim Word EmbeddingsK
diagonal values : std of review embeddingss1, . . . , sK ∈ ℝ
2. Represent each user: Gaussian distribution
Item Recommendation with VAEs and Heterogenous Priors
Review Embeddings User Distributions
1. 2.
• Method 1: Word Embeddings (word2vec)
Encoding user preferences from text:• Method 2: Probabilistic Topic Models (Latent Dirichlet Allocation - LDA)
Item Recommendation with VAEs and Heterogenous Priors
David Blei, Andrew Y Ng, Michael I Jordan, 2003: “Latent Dirichlet Allocation”
Topic 1horrorblood
guncrime
Topic 2 Topic K…
loveromance
kisswedding
actionrobbery
killpolice
… … …
Encoding user preferences from text:
1. Train LDA to extract topicsK
Item Recommendation with VAEs and Heterogenous Priors
Topic 1horrorblood
guncrime
Topic 2 Topic K…
loveromance
kisswedding
actionrobbery
killpolice
… … …
• Method 2: Probabilistic Topic Models (Latent Dirichlet Allocation - LDA)
David Blei, Andrew Y Ng, Michael I Jordan, 2003: “Latent Dirichlet Allocation”
Encoding user preferences from text:
1. Train LDA to extract topics2. For each user :
Ku
Item Recommendation with VAEs and Heterogenous Priors
Topic 1horrorblood
guncrime
Topic 2 Topic K…
loveromance
kisswedding
actionrobbery
killpolice
… … …
• Method 2: Probabilistic Topic Models (Latent Dirichlet Allocation - LDA)
2a: concatenate all of the user’s reviews in one document
Encoding user preferences from text:
1. Train LDA to extract topics2. For each user :
K
2a: concatenate all of the user’s reviews in one document2b: represent as the distribution over the topics
u
u K
Topi
c 1
…
Topi
c 2
Topi
c K
Item Recommendation with VAEs and Heterogenous Priors
Topic 1horrorblood
guncrime
Topic 2 Topic K…
loveromance
kisswedding
actionrobbery
killpolice
… … …
• Method 2: Probabilistic Topic Models (Latent Dirichlet Allocation - LDA)
Encoding user preferences from text:
1. Train LDA to extract topics2. For each user :
K
2b: represent as the distribution over the topics
u
u
Topi
c 1
…
Topi
c 2
Topi
c K
Topic 1horrorblood
guncrime
Topic 2 Topic K…
loveromance
kisswedding
Item Recommendation with VAEs and Heterogenous Priors
K
actionrobbery
killpolice
… … …
• Method 2: Probabilistic Topic Models (Latent Dirichlet Allocation - LDA)
2a: concatenate all of the user’s reviews in one document
p(zu)
Item Recommendation with VAEs and Heterogenous PriorsItem Recommendation Pipeline
Online Reviews
p(zu)
Item Recommendation with VAEs and Heterogenous PriorsItem Recommendation Pipeline
Text
Online Reviews
p(zu)1. word2vec2. LDA
p(zu)
Item Recommendation with VAEs and Heterogenous PriorsItem Recommendation Pipeline
Text
Online Reviews
Ratings
p(zu)
p(zu)
Item Recommendation with VAEs and Heterogenous PriorsItem Recommendation Pipeline
Text
Online Reviews
Ratings KL
p(zu)
Text
Online Reviews
p(zu)
Item Recommendation with VAEs and Heterogenous PriorsItem Recommendation Pipeline
Ratings
p(zu)
Text
Online Reviews
p(zu)
Item Recommendation with VAEs and Heterogenous PriorsItem Recommendation Pipeline
Ratings
0.70
0.01
0.15
0.00
p(zu)
Text
Online Reviews
p(zu)
Item Recommendation with VAEs and Heterogenous PriorsItem Recommendation Pipeline
Ratings
0.70
0.01
0.15
0.00
Item Ranking
p(zu)
Text
Online Reviews
p(zu)
Item Recommendation with VAEs and Heterogenous PriorsItem Recommendation Pipeline
Ratings
0.70
0.01
0.15
0.00
Item Ranking
p(zu)
Evaluate on held-out ratings
Evaluation Metrics
• NDCG@100• Recall@20• Recall@50
Text
Online Reviews
p(zu)
Item Recommendation with VAEs and Heterogenous PriorsItem Recommendation Pipeline
Ratings
0.70
0.01
0.15
0.00
Item Ranking
p(zu)
Evaluation Metrics
• NDCG@100• Recall@20• Recall@50
EncoderI × 600 × 300
300 × 600 × I
MLP:DecoderMLP:
Evaluate on held-out ratings
Evaluation Datasets: Online Reviews (Rating & Text)
Preprocessing:
Item Recommendation with VAEs and Heterogenous Priors
% non-empty entries
•Yelp Challenge Dataset• IMDB Corpus of Movie Reviews
•Binarize ratings
•Reduce sparsity (cutoff)
-Yelp: 1-2 stars 0, 3-5 stars 1- IMDB: 1-4 stars 0, 5-10 stars 1
-Yelp: discard businesses < 30 reviews, users < 5 reviews- IMDB: discard movies < 5 reviews, users < 5 reviews
Item Recommendation with VAEs and Heterogenous Priors
Ranking the items in random order
Model Comparison
Matrix FactorizationText-only: Ranking items according to cos(tu, ti)
VAE (Liang et al. 2018)VAE with random user-dependent priors
VAE with Text Regularizationℒγ = ℒβ − γ ⋅ dist(zu, tu)
VAE with heterogenous user-dependent priors
~ 0.003~ 0.007
Evaluation Results
Item Recommendation with VAEs and Heterogenous Priors
Ranking the items in random order
std of scores
Matrix FactorizationText-only: Ranking items according to cos(tu, ti)
VAE (Liang et al. 2018)VAE with random user-dependent priors
VAE with Text Regularizationℒγ = ℒβ − γ ⋅ dist(zu, tu)
VAE with heterogenous user-dependent priors
~ 0.003~ 0.007
Model ComparisonEvaluation Results
Item Recommendation with VAEs and Heterogenous Priors
Ranking the items in random order
std of scores
Matrix FactorizationText-only: Ranking items according to cos(tu, ti)
VAE (Liang et al. 2018)VAE with random user-dependent priors
VAE with Text Regularizationℒγ = ℒβ − γ ⋅ dist(zu, tu)
VAE with heterogenous user-dependent priors
~ 0.003~ 0.007
Model ComparisonEvaluation Results
Item Recommendation with VAEs and Heterogenous Priors
Ranking the items in random order
std of scores
Matrix FactorizationText-only: Ranking items according to cos(tu, ti)
VAE (Liang et al. 2018)VAE with random user-dependent priors
VAE with Text Regularizationℒγ = ℒβ − γ ⋅ dist(zu, tu)
VAE with heterogenous user-dependent priors
~ 0.003~ 0.007
Model ComparisonEvaluation Results
Item Recommendation with VAEs and Heterogenous Priors
Ranking the items in random order
std of scores
Matrix FactorizationText-only: Ranking items according to cos(tu, ti)
VAE (Liang et al. 2018)VAE with random user-dependent priors
VAE with Text Regularizationℒγ = ℒβ − γ ⋅ dist(zu, tu)
VAE with heterogenous user-dependent priors
~ 0.003~ 0.007
Model ComparisonEvaluation Results
Item Recommendation with VAEs and Heterogenous Priors
Ranking the items in random order
std of scores
Matrix FactorizationText-only: Ranking items according to cos(tu, ti)
VAE (Liang et al. 2018)VAE with random user-dependent priors
VAE with Text Regularizationℒγ = ℒβ − γ ⋅ dist(zu, tu)
VAE with heterogenous user-dependent priors
~ 0.003~ 0.007
Relative Performance Improvement:Mult-VAE VAE-HPrior
+18.4% +29.4% +17.7% NDCG@100 Recall@20 Recall@50
+14.4% +18.7% +12.3% NDCG@100 Recall@20 Recall@50
YelpIMDB
Model ComparisonEvaluation Results
Item Recommendation with VAEs and Heterogenous Priors
Ranking the items in random order
std of scores
Matrix FactorizationText-only: Ranking items according to cos(tu, ti)
VAE (Liang et al. 2018)VAE with random user-dependent priors
VAE with Text Regularizationℒγ = ℒβ − γ ⋅ dist(zu, tu)
VAE with heterogenous user-dependent priors
~ 0.003~ 0.007
Model ComparisonEvaluation Results
Item Recommendation with VAEs and Heterogenous Priors
Ranking the items in random order
std of scores
Matrix FactorizationText-only: Ranking items according to cos(tu, ti)
VAE (Liang et al. 2018)VAE with random user-dependent priors
VAE with Text Regularizationℒγ = ℒβ − γ ⋅ dist(zu, tu)
VAE with heterogenous user-dependent priors
~ 0.003~ 0.007
Model ComparisonEvaluation Results
Item Recommendation with VAEs and Heterogenous Priors
Ranking the items in random orderMatrix Factorization
Text-only: Ranking items according to cos(tu, ti)VAE (Liang et al. 2018)
VAE with random user-dependent priorsVAE with Text Regularization
ℒγ = ℒβ − γ ⋅ dist(zu, tu)
VAE with heterogenous user-dependent priors
Denoising autoencoder (Liang et al. 2018)
~ 0.003~ 0.007std of scores
Model ComparisonEvaluation Results
Item Recommendation with VAEs and Heterogenous Priors
Conclusions•Extend VAEs to Collaborative Filtering with side information (ratings + text)
- User-agnostic user-dependent priors- Prior parameters as functions of the users’ review text
• Outperform the existing Mult-VAE model (up to 29.41% relative improvement in Recall@20)• Perform comparably to a denoising autoencoder (Mult-DAE).
- User representations in a multimodal latent space (encoding ratings + text)
Ongoing & Future work• Experiments: VAE-HPrior vs Mult-DAE on different levels of sparsity• Models: more effective aspect-based methods for extracting user preferences from text reviews• Data: extra side-information available (e.g., geolocation)