Additive Smoothing for Relevance-Based Language Modelling of Recommender Systems [CERI '16 Slides]

CERI 2016, Granada, Spain — Additive Smoothing for Relevance-Based Language Modelling of Recommender Systems. Daniel Valcarce, Javier Parapar, Álvaro Barreiro (@dvalcarce, @jparapar, @AlvaroBarreiroG). Information Retrieval Lab (@IRLab_UDC), University of A Coruña, Spain.


Outline

1. Recommender Systems

2. Pseudo-Relevance Feedback

3. Relevance-Based Language Modelling of Recommender Systems

4. IDF Effect and Additive Smoothing

5. Experiments

6. Conclusions and Future Directions


RECOMMENDER SYSTEMS

Recommender Systems

Recommender systems generate personalised suggestions for items that may be of interest to the users.

Top-N Recommendation: create a ranking of the N most relevant items for each user.

Collaborative filtering: exploit only user-item interactions (ratings, clicks, etc.).


PSEUDO-RELEVANCE FEEDBACK

Pseudo-Relevance Feedback (I)

In Information Retrieval, Pseudo-Relevance Feedback (PRF) is an automatic query expansion method.

The goal is to expand the original query with new terms to improve the quality of the search results.

These new terms are extracted automatically from a first retrieval using the original query.

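To make the loop concrete, here is a minimal sketch of a PRF-style expansion step in Python. It is not the method from these slides: the toy `search` function and the frequency-based term scoring are illustrative assumptions; real systems use stronger term-weighting models such as the Relevance Models discussed later.

```python
from collections import Counter

def pseudo_relevance_feedback(query, search, k=10, n_terms=5):
    """Expand `query` with terms from the top-k results of a first retrieval.

    `search(query)` is assumed to return a ranked list of documents,
    each represented as a list of terms. Candidate expansion terms are
    scored here by raw frequency in the feedback set (a deliberately
    simple heuristic for illustration).
    """
    feedback_docs = search(query)[:k]          # first retrieval
    counts = Counter()
    for doc in feedback_docs:
        counts.update(doc)
    original = set(query.split())
    expansion = [t for t, _ in counts.most_common() if t not in original]
    return query + " " + " ".join(expansion[:n_terms])

# Toy usage with a fake retrieval system:
docs = [["california", "most", "populated", "state"],
        ["texas", "second", "most", "populated", "state"]]
print(pseudo_relevance_feedback("most populated state", lambda q: docs))
# -> "most populated state california texas second"
```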

Pseudo-Relevance Feedback (II)

[Diagram: the information need is expressed as a query; the Retrieval System returns a first ranking; Query Expansion builds an expanded query from the top results; the expanded query is submitted to the Retrieval System again.]

RELEVANCE-BASED LANGUAGE MODELLING OF RECOMMENDER SYSTEMS

Pseudo-Relevance Feedback for Collaborative Filtering

The analogy between PRF and CF:

# User's query ↔ User's profile (e.g., mostˆ1, populatedˆ2, stateˆ2 ↔ Titanicˆ2, Avatarˆ3, Matrixˆ5)

# Documents ↔ Neighbours

# Terms ↔ Items

Relevance-Based Language Models (RM)

Relevance-Based Language Models or Relevance Models (RM) are a state-of-the-art PRF technique (Lavrenko & Croft, SIGIR 2001).

# Two models: RM1 and RM2.

# RM1 works better than RM2 in retrieval.

Relevance Models have recently been adapted to collaborative filtering (Parapar et al., IPM 2013).

# For recommendation, RM2 is the preferred method.

Relevance Models for Collaborative Filtering

$$\mathrm{RM2}\colon \quad p(i \mid R_u) \propto p(i) \prod_{j \in I_u} \sum_{v \in V_u} \frac{p(i \mid v)\, p(v)}{p(i)}\, p(j \mid v)$$

# I_u is the set of items rated by the user u.

# V_u is the neighbourhood of the user u. This is computed using a clustering algorithm.

# p(i) and p(v) are the item and user priors.

# p(i|u) is computed by smoothing the maximum likelihood estimate with the probability in the collection.
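A minimal sketch of this scoring formula in Python, assuming user profiles are dictionaries of item ratings, a precomputed neighbourhood V_u, and uniform priors p(i) and p(v); the smoothed estimate p(i|v) is left as a parameter so any of the smoothing methods on the next slide can be plugged in. All names are illustrative, not taken from the authors' code.

```python
import math

def rm2_scores(user, neighbours, all_items, p_i_given_u):
    """Rank unseen items by
    p(i|R_u) ∝ p(i) · Π_{j∈I_u} Σ_{v∈V_u} p(i|v) p(v)/p(i) p(j|v).

    `user` and each element of `neighbours` are dicts {item: rating}.
    `p_i_given_u(item, profile)` is a smoothed estimate of p(i|u).
    Uniform priors p(i) = 1/|I| and p(v) = 1/|V_u| are assumed.
    The product is accumulated in log space to avoid underflow.
    """
    p_i = 1.0 / len(all_items)
    p_v = 1.0 / len(neighbours)
    scores = {}
    for i in all_items:
        if i in user:                       # only rank unseen items
            continue
        log_score = math.log(p_i)
        for j in user:                      # j ∈ I_u
            s = sum(p_i_given_u(i, v) * p_v / p_i * p_i_given_u(j, v)
                    for v in neighbours)    # v ∈ V_u
            log_score += math.log(s) if s > 0 else float("-inf")
        scores[i] = log_score
    return sorted(scores, key=scores.get, reverse=True)
```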

Collection-based Smoothing Techniques (I)

Absolute Discounting (AD)

$$p_\delta(i \mid u) = \frac{\max(r_{u,i} - \delta,\, 0) + \delta\, |I_u|\, p(i \mid C)}{\sum_{j \in I_u} r_{u,j}}$$

Jelinek-Mercer (JM)

$$p_\lambda(i \mid u) = (1 - \lambda)\, \frac{r_{u,i}}{\sum_{j \in I_u} r_{u,j}} + \lambda\, p(i \mid C)$$

Dirichlet Priors (DP)

$$p_\mu(i \mid u) = \frac{r_{u,i} + \mu\, p(i \mid C)}{\mu + \sum_{j \in I_u} r_{u,j}}$$
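The three estimates translate directly into code. A sketch, assuming profiles are dicts {item: rating} and p(i|C) is the item's share of all rating mass in the collection; the default parameter values are arbitrary:

```python
def p_collection(i, collection):
    """p(i|C): fraction of all rating mass that item i receives.
    `collection` maps each user to a dict {item: rating}."""
    total = sum(r for u in collection.values() for r in u.values())
    return sum(u.get(i, 0) for u in collection.values()) / total

def absolute_discounting(i, u, p_c, delta=0.1):
    # (max(r_{u,i} - δ, 0) + δ|I_u| p(i|C)) / Σ_j r_{u,j}
    return (max(u.get(i, 0) - delta, 0)
            + delta * len(u) * p_c(i)) / sum(u.values())

def jelinek_mercer(i, u, p_c, lam=0.5):
    # (1 - λ) r_{u,i} / Σ_j r_{u,j} + λ p(i|C)
    return (1 - lam) * u.get(i, 0) / sum(u.values()) + lam * p_c(i)

def dirichlet_priors(i, u, p_c, mu=100):
    # (r_{u,i} + µ p(i|C)) / (µ + Σ_j r_{u,j})
    return (u.get(i, 0) + mu * p_c(i)) / (mu + sum(u.values()))
```

With `p_c = lambda i: p_collection(i, collection)`, any of these can be adapted to the `p_i_given_u` argument of the RM2 sketch above, e.g. `lambda i, v: dirichlet_priors(i, v, p_c)`.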

Collection-based Smoothing Techniques (II)

Absolute Discounting, Jelinek-Mercer and Dirichlet Priors have been studied in the context of:

# Text Retrieval (Zhai & Lafferty, ACM TOIS 2004)

◦ Absolute Discounting performs very poorly.
◦ Dirichlet Priors is the most popular approach.
◦ Jelinek-Mercer is a bit better for long queries.

# Collaborative Filtering (Valcarce et al., ECIR 2015)

◦ Absolute Discounting is the best smoothing method.

Can we do better?

IDF EFFECT AND ADDITIVE SMOOTHING

Axiomatic Analysis of the IDF Effect in IR

A recent work performed an axiomatic analysis of several PRF methods (Hazimeh & Zhai, ICTIR 2015).

# They found that RM1 with the Dirichlet Priors and Jelinek-Mercer smoothing methods demotes the IDF effect.

# The IDF effect is a desirable property that, intuitively, promotes documents with very specific terms.

Can we use this result in recommendation?

What is the IDF effect in recommendation? Is it a desirable property?

They studied RM1; what about RM2?

The IDF Effect in Recommendation (I)

This retrieval idea is related to novelty in recommendation.

Definition (IDF effect)

A recommender system supports the IDF effect if p(i1 | R_u) > p(i2 | R_u) when

# two items i1 and i2

# have the same ratings r(v, i1) = r(v, i2) for all v ∈ V_u

# and different popularity p(i1 | C) < p(i2 | C).

In simple words, if we have the same feedback for two items, we should recommend the less popular one.
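A toy numeric check of why collection-based smoothing works against this property, using the Dirichlet-smoothed estimate from the earlier slide; the ratings and p(i|C) values below are made up for illustration:

```python
def dirichlet(r_vi, profile_sum, p_i_C, mu=100):
    """Dirichlet-smoothed p(i|v) from a single rating r_vi."""
    return (r_vi + mu * p_i_C) / (mu + profile_sum)

# Same rating (4 stars) from the neighbour, different popularity:
p_pop, p_niche = 0.02, 0.001        # hypothetical p(i|C) values
rating, profile_sum = 4, 50

print(dirichlet(rating, profile_sum, p_pop))    # ≈ 0.0400
print(dirichlet(rating, profile_sum, p_niche))  # ≈ 0.0273
# The popular item gets a higher smoothed estimate despite identical
# feedback, so RM2 ranks it above the niche item: the IDF effect is
# demoted.
```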

The IDF Effect in Recommendation (II)

We performed an axiomatic analysis of RM2¹ using the following smoothing methods:

# Dirichlet Priors

# Jelinek-Mercer

# Absolute Discounting

Additive Smoothing

$$p_\gamma(i \mid u) = \frac{r(u, i) + \gamma}{\sum_{j \in I_u} r(u, j) + \gamma\, |I|}$$

¹ Math proofs in the paper!
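Additive (Laplace) smoothing in the same style as the earlier sketches; note that the estimate depends only on the user's own ratings and the catalogue size |I|, not on p(i|C), which is why it neither demotes nor promotes the IDF effect. The item names and γ value are illustrative:

```python
def additive(i, u, n_items, gamma=0.1):
    """p_γ(i|u) = (r(u,i) + γ) / (Σ_j r(u,j) + γ|I|)."""
    return (u.get(i, 0) + gamma) / (sum(u.values()) + gamma * n_items)

# Two equally rated items get identical estimates regardless of how
# popular they are in the collection:
v = {"Titanic": 2, "Avatar": 3, "Matrix": 5, "NicheFilm": 3}
print(additive("Avatar", v, n_items=1000))     # same value...
print(additive("NicheFilm", v, n_items=1000))  # ...for equal ratings
```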

EXPERIMENTS

Experimental settings

Datasets:

# MovieLens 100k

# MovieLens 1M

Metrics:

# Ranking accuracy: nDCG.

# Diversity: the complement of the Gini index.

# Novelty: mean self-information (MSI).

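Hedged sketches of the three metrics for a single top-N list, under common definitions (the exact variants used in the paper may differ in details such as normalisation): nDCG with exponential gain, the complement of the Gini index over how often each item is recommended, and MSI as the mean of −log₂ p(i|C) over the recommended items. The geometric mean of the three (shown on later slides) summarises the trade-off.

```python
import math

def ndcg_at_k(ranking, relevance, k=10):
    """nDCG@k with exponential gain (2^rel - 1) and log2 discount."""
    dcg = sum((2 ** relevance.get(i, 0) - 1) / math.log2(pos + 2)
              for pos, i in enumerate(ranking[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def gini_complement(rec_counts):
    """1 - Gini index of recommendation frequencies (higher = more
    diverse). `rec_counts` maps item -> #times recommended."""
    xs = sorted(rec_counts.values())
    n, total = len(xs), sum(xs)
    gini = sum((2 * (k + 1) - n - 1) * x for k, x in enumerate(xs))
    return 1 - gini / (n * total)

def msi_at_k(ranking, popularity, k=10):
    """Mean self-information of the top-k items: (1/k) Σ -log2 p(i|C)."""
    top = ranking[:k]
    return sum(-math.log2(popularity[i]) for i in top) / len(top)

def g_measure(*values):
    """Geometric mean used to combine nDCG@10, Gini@10 and MSI@10."""
    return math.prod(values) ** (1 / len(values))
```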

Ranking accuracy

[Line charts: nDCG@10 as a function of the smoothing parameters γ (Additive), δ (Absolute Discounting), λ (Jelinek-Mercer) and µ × 10³ (Dirichlet Priors).]

Figure: Values of nDCG@10 on MovieLens 100k (left) and 1M (right).

Diversity

[Line charts: Gini@10 as a function of the smoothing parameters γ (Additive), δ (Absolute Discounting), λ (Jelinek-Mercer) and µ × 10³ (Dirichlet Priors).]

Figure: Values of Gini@10 on MovieLens 100k (left) and 1M (right).

Novelty

[Line charts: MSI@10 as a function of the smoothing parameters γ (Additive), δ (Absolute Discounting), λ (Jelinek-Mercer) and µ × 10³ (Dirichlet Priors).]

Figure: Values of MSI@10 on MovieLens 100k (left) and 1M (right).

G-measure of nDCG, Gini and MSI

[Line charts: geometric mean G(Gini@10, MSI@10, nDCG@10) as a function of the smoothing parameters γ (Additive), δ (Absolute Discounting), λ (Jelinek-Mercer) and µ × 10³ (Dirichlet Priors).]

Figure: Values of the geometric mean among nDCG@10, Gini@10 and MSI@10 on MovieLens 100k (left) and 1M (right).

CONCLUSIONS AND FUTURE DIRECTIONS

Conclusions

The IDF effect from IR is related to the novelty of the recommendations.

The use of collection-based smoothing methods with RM2 demotes the IDF effect.

Additive smoothing is a simple method that does not demote (nor promote) the IDF effect.

Additive smoothing provides better accuracy, diversity and novelty figures than collection-based smoothing methods.


Future work

Envision new ways of enhancing the IDF effect in RM2:

# Design smoothing methods that actively promote the IDF effect.

# Use non-uniform prior estimates.

Study axiomatically other IR properties that can be useful in recommendation.


THANK YOU!

@dvalcarce
http://www.dc.fi.udc.es/~dvalcarce