Item Recommendation with Variational Autoencoders and ...

Item Recommendation with Variational Autoencoders and Heterogenous Priors

Giannis Karamanolakis Kevin Raji Cherian Ananth Ravi Narayan

Jie Yuan Da Tang Tony Jebara

Columbia Columbia Columbia

Columbia Columbia Columbia, Netflix

U × I

Item Recommendation - Collaborative Filtering (CF)

U × I


? ?

U × I


?

U × I (U × K) × (K × I)


Latent Factor Models

U × I (U × K) × (K × I)



(-) linear limited modeling capacity

U × I (U × K) × (K × I)



(-) linear limited modeling capacityNon-linear Features

U × I (U × K) × (K × I)




Neural Networks

U × I (U × K) × (K × I)




Neural Networks

Variational Autoencoders (VAEs)

“Auto-encoding Variational Bayes” D. P. Kingma, M. Welling, ICLR 2014“Variational Autoencoders for Collaborative Filtering” D. Liang, RG. Krishnan, MD. Hoffman, T. Jebara, WWW 2018

U × I (U × K) × (K × I)




Neural Networks

Variational Autoencoders (VAEs)

“Variational Autoencoders for Collaborative Filtering” D. Liang, RG. Krishnan, MD. Hoffman, T. Jebara, WWW 2018

generalize linear latent factor modelshave larger modeling capacity(+)

(+)

“Auto-encoding Variational Bayes” D. P. Kingma, M. Welling, ICLR 2014

VAEs for Collaborative Filtering

“Variational Autoencoders for Collaborative Filtering” D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, WWW 2018.

0.70

0.00

0.15

0.00

qϕ(zu |xu)

p(zu) ∼ 𝒩(0,𝕀K)-

-

-

0.0

1.0

0.0

1.0


Multinomial Likelihood


0.70

0.00

0.15

0.00

qϕ(zu |xu)

p(zu) ∼ 𝒩(0,𝕀K)-

-

-

0.0

1.0

0.0

1.0


0.70

0.00

0.15

0.00

Training Objective:

(Negative) reconstruction error

qϕ(zu |xu)

p(zu) ∼ 𝒩(0,𝕀K)

Regularization term (Kullback-Leibler Divergence)


-

-

-

0.0

1.0

0.0

1.0


0.70

0.00

0.15

0.00

qϕ(zu |xu)



𝒩(0,𝕀K)

-

-

-

0.0

1.0

0.0

1.0

Training Objective:

(Negative) reconstruction error Regularization term (Kullback-Leibler Divergence)


0.70

0.00

0.15

0.00

qϕ(zu |xu)



My variational posterior:

should be near my prior:

qϕ(zu1 |xu1)

p(zu1) ∼ 𝒩(0,𝕀K)

-

-

-

Training Objective:



0.70

0.00

0.15

0.00

qϕ(zu |xu)





qϕ(zu1 |xu1)

p(zu1) ∼ 𝒩(0,𝕀K)



qϕ(zu2 |xu2)

p(zu2) ∼ 𝒩(0,𝕀K)-

- 0.70

0.00

0.15

0.00

Training Objective:



0.70

0.00

0.15

0.00

qϕ(zu |xu)





qϕ(zu1 |xu1)

p(zu1) ∼ 𝒩(0,𝕀K)



0.70

0.00

0.15

0.00

-

-

-

qϕ(zu2 |xu2)

p(zu2) ∼ 𝒩(0,𝕀K)



qϕ(zu3 |xu3)

p(zu3) ∼ 𝒩(0,𝕀K)

Training Objective:



qϕ(zu |xu)





qϕ(zu1 |xu1)

p(zu1) ∼ 𝒩(0,𝕀K)



qϕ(zu2 |xu2)

p(zu2) ∼ 𝒩(0,𝕀K)



qϕ(zu3 |xu3)

p(zu3) ∼ 𝒩(0,𝕀K)

-



qϕ(zu4 |xu4)

p(zu4) ∼ 𝒩(0,𝕀K)

0.70

0.00

0.15

0.00

“Posterior-Collapse” Issue

𝒩(0,𝕀K)qϕ(zui |xui)

collapses to

Hoffman et al. 2016 (“ELBO surgery”)

∀ user:

Bowman et al. 2016,

?

Training Objective:



qϕ(zu |xu)





qϕ(zu1 |xu1)

p(zu1) ∼ 𝒩(0,𝕀K)



qϕ(zu2 |xu2)

p(zu2) ∼ 𝒩(0,𝕀K)



qϕ(zu3 |xu3)

p(zu3) ∼ 𝒩(0,𝕀K)

-



qϕ(zu4 |xu4)

p(zu4) ∼ 𝒩(0,𝕀K)

0.70

0.00

0.15

0.00

In this paper

Use heterogenous, user-dependent priors!

𝒩(tu4, 𝕊u4)p(zu4) ∼ 𝒩(0,𝕀K)

Training Objective:


This work: Item Recommendation with VAEs and Heterogenous Priors

• For each user , we replace by

- Prior parameters encode user preferences

u

- Explicitly encourage user diversity in latent VAE space

Using heterogenous, user-dependent priors

- Prior parameters encode user preferences- Explicitly encourage user diversity in latent VAE space




u

?How can you encode my preferences?

Have I revealed them?

?

- Prior parameters encode user preferences- Explicitly encourage user diversity in latent VAE space




u

How can you encode my preferences?

Have I revealed them?

Yes, you have!

By writing reviews…

Item Recommendation with VAEs and Heterogenous Priors

Users reveal their preferences in text reviews



Yelp Review



Yelp Review

IMDB Review



Yelp Review

IMDB Review

likes burgers

likes heist movies

cares aboutportion size

likes suspense


Before After

Encoding user preferences from text (2 methods):

• , : functions of the user’s review text

Encoding user preferences from text (2 methods):

- Method 1: Word Embeddings (word2vec)- Method 2: Probabilistic Topic Models (Latent Dirichlet Allocation - LDA)


Before

• , : functions of the user’s review text

After

Encoding user preferences from text:• Method 1: Word Embeddings (word2vec)


-dim Word EmbeddingsK Mikolov et. al. 2013: “Efficient estimation of word representations in vector space”

Encoding user preferences from text:

1. Create review embeddings: avg of word embeddings


Review Embeddings-dim Word EmbeddingsK

1.

• Method 1: Word Embeddings (word2vec)


1. Create review embeddings: avg of word embeddings

- : avg of review embeddings (written by )u- : diagonal covariance matrix

-dim Word EmbeddingsK

diagonal values : std of review embeddingss1, . . . , sK ∈ ℝ

2. Represent each user: Gaussian distribution


Review Embeddings User Distributions

1. 2.

• Method 1: Word Embeddings (word2vec)

Encoding user preferences from text:• Method 2: Probabilistic Topic Models (Latent Dirichlet Allocation - LDA)


David Blei, Andrew Y Ng, Michael I Jordan, 2003: “Latent Dirichlet Allocation”

Topic 1horrorblood

guncrime

Topic 2 Topic K…

loveromance

kisswedding

actionrobbery

killpolice

… … …


1. Train LDA to extract topicsK


Topic 1horrorblood

guncrime

Topic 2 Topic K…

loveromance

kisswedding

actionrobbery

killpolice

… … …

• Method 2: Probabilistic Topic Models (Latent Dirichlet Allocation - LDA)

David Blei, Andrew Y Ng, Michael I Jordan, 2003: “Latent Dirichlet Allocation”


1. Train LDA to extract topics2. For each user :

Ku


Topic 1horrorblood

guncrime

Topic 2 Topic K…

loveromance

kisswedding

actionrobbery

killpolice

… … …


2a: concatenate all of the user’s reviews in one document



K

2a: concatenate all of the user’s reviews in one document2b: represent as the distribution over the topics

u

u K

Topi

c 1

…

Topi

c 2

Topi

c K


Topic 1horrorblood

guncrime

Topic 2 Topic K…

loveromance

kisswedding

actionrobbery

killpolice

… … …




K

2b: represent as the distribution over the topics

u

u

Topi

c 1

…

Topi

c 2

Topi

c K

Topic 1horrorblood

guncrime

Topic 2 Topic K…

loveromance

kisswedding


K

actionrobbery

killpolice

… … …


2a: concatenate all of the user’s reviews in one document

p(zu)

Item Recommendation with VAEs and Heterogenous PriorsItem Recommendation Pipeline

Online Reviews

p(zu)


Text

Online Reviews

p(zu)1. word2vec2. LDA

p(zu)


Text

Online Reviews

Ratings

p(zu)

p(zu)


Text

Online Reviews

Ratings KL

p(zu)

Text

Online Reviews

p(zu)


Ratings

p(zu)

Text

Online Reviews

p(zu)


Ratings

0.70

0.01

0.15

0.00

p(zu)

Text

Online Reviews

p(zu)


Ratings

0.70

0.01

0.15

0.00

Item Ranking

p(zu)

Text

Online Reviews

p(zu)


Ratings

0.70

0.01

0.15

0.00

Item Ranking

p(zu)

Evaluate on held-out ratings

Evaluation Metrics

• NDCG@100• Recall@20• Recall@50

Text

Online Reviews

p(zu)


Ratings

0.70

0.01

0.15

0.00

Item Ranking

p(zu)

Evaluation Metrics

• NDCG@100• Recall@20• Recall@50

EncoderI × 600 × 300

300 × 600 × I

MLP:DecoderMLP:

Evaluate on held-out ratings

Evaluation Datasets: Online Reviews (Rating & Text)

Preprocessing:


% non-empty entries

•Yelp Challenge Dataset• IMDB Corpus of Movie Reviews

•Binarize ratings

•Reduce sparsity (cutoff)

-Yelp: 1-2 stars 0, 3-5 stars 1- IMDB: 1-4 stars 0, 5-10 stars 1

-Yelp: discard businesses < 30 reviews, users < 5 reviews- IMDB: discard movies < 5 reviews, users < 5 reviews


Ranking the items in random order

Model Comparison

Matrix FactorizationText-only: Ranking items according to cos(tu, ti)

VAE (Liang et al. 2018)VAE with random user-dependent priors

VAE with Text Regularizationℒγ = ℒβ − γ ⋅ dist(zu, tu)

VAE with heterogenous user-dependent priors

~ 0.003~ 0.007

Evaluation Results



std of scores





~ 0.003~ 0.007

Model ComparisonEvaluation Results



std of scores





~ 0.003~ 0.007

Relative Performance Improvement:Mult-VAE VAE-HPrior

+18.4% +29.4% +17.7% NDCG@100 Recall@20 Recall@50

+14.4% +18.7% +12.3% NDCG@100 Recall@20 Recall@50

YelpIMDB




std of scores





~ 0.003~ 0.007



Ranking the items in random orderMatrix Factorization

Text-only: Ranking items according to cos(tu, ti)VAE (Liang et al. 2018)

VAE with random user-dependent priorsVAE with Text Regularization

ℒγ = ℒβ − γ ⋅ dist(zu, tu)


Denoising autoencoder (Liang et al. 2018)

~ 0.003~ 0.007std of scores



Conclusions•Extend VAEs to Collaborative Filtering with side information (ratings + text)

- User-agnostic user-dependent priors- Prior parameters as functions of the users’ review text

• Outperform the existing Mult-VAE model (up to 29.41% relative improvement in Recall@20)• Perform comparably to a denoising autoencoder (Mult-DAE).

- User representations in a multimodal latent space (encoding ratings + text)

Ongoing & Future work• Experiments: VAE-HPrior vs Mult-DAE on different levels of sparsity• Models: more effective aspect-based methods for extracting user preferences from text reviews• Data: extra side-information available (e.g., geolocation)

Thank you!

Giannis Karamanolakis [email protected]

Contact

mailto:[email protected]

Questions?

Giannis Karamanolakis [email protected]

Contact

mailto:[email protected]

Item Recommendation with Variational Autoencoders and ...

Documents

Transcript of Item Recommendation with Variational Autoencoders and ...