Bayesian Nonparametric Models for Ranking Datacaron/slides/Caron_BNPSki2014.pdf · Bayesian...

23
Bayesian Nonparametric Models for Ranking Data Fran¸ cois Caron 1 , Yee Whye Teh 1 and Brendan Murphy 2 1 Dept of Statistics, University of Oxford, UK 2 School of Mathematical Sciences, University College Dublin, Ireland BNPSki, January 2014 Caron, Teh & Murphy 1 / 23

Transcript of Bayesian Nonparametric Models for Ranking Datacaron/slides/Caron_BNPSki2014.pdf · Bayesian...

Bayesian Nonparametric Models for Ranking Data

Francois Caron1, Yee Whye Teh1 and Brendan Murphy2

1Dept of Statistics, University of Oxford, UK2School of Mathematical Sciences, University College Dublin, Ireland

BNPSki, January 2014

Caron, Teh & Murphy 1 / 23

Ranked Data

“Rank these 5 movies by preference”

“Rank your 5 favourite movies among these 30”

“Rank your 5 favourite movies”

Aim

I Bayesian nonparametric model to model top-m partial rankings of apotentially infinite number of items.

I Each item is modelled using a positive rating parameter that isinferred from partial rankings.

I Develop efficient computational procedure for posterior simulation.

I Mixture generalizations.

I Application to preferences in college programmes

Caron, Teh & Murphy 2 / 23

Preferences for college degree programmes

I Application for college degree programmes in Ireland in 2000I 53 757 applicants provided top-10 rankingsI 533 degree programmes available for selectionI Heterogenity in the data

Table: Two samples from the preference data.

Rank CAO code College Degree Programme1 DN002 University College Dublin Medicine2 GY501 NUI - Galway Medicine3 CK701 University College Cork Medicine4 DN006 University College Dublin Physiotherapy5 TR053 Trinity College Dublin Physiotherapy6 DN004 University College Dublin Radiotherapy7 TR007 Trinity College Dublin Clinical Speech8 FT223 Dublin IT Human Nutrition9 TR084 Trinity College Dublin Social Work10 DN007 University College Dublin Social Science

Rank CAO code College Degree Programme1 MI005 Mary Immaculate Limerick Education - Primary Teaching2 CK301 University College Cork Law3 CK105 University College Cork European Studies4 CK107 University College Cork Language - French5 CK101 University College Cork Arts

Caron, Teh & Murphy 3 / 23

Parametric Plackett-Luce Model

I Population of M items X1, . . . , XM .

I To each item Xk assign a rating parameter wk > 0.

Stage-wise interpretation:

I Pick first item with probabilities proportional to wk’s.

I Remove first item.

I Pick second item with probabilities proportional to wk’s.

I Remove second item.

I . . .

The probability of a ranking (Xρ1, . . . , XρM ), with ρ = (ρ1, . . . , ρM)a permutation, is

P (ρ|w) =

M∏i=1

wρi∑Mk=1wk −

∑i−1j=1wρj

[Luce, 1959, Plackett, 1975], [Gormley and Murphy, 2008, Gormley and Murphy, 2009]Caron, Teh & Murphy 4 / 23

Nonparametric Plackett-Luce Model

I Population of items X1, X2, . . . of infinite size.

I Each item Xk assigned a rating parameter wk > 0.

Stage-wise interpretation:

I Pick first item with probabilities proportional to wk’s.

I Remove first item.

I Pick second item with probabilities proportional to wk’s.

I Remove second item.

I . . .

The probability of a (finite, partial) ranking Xρ1, . . . , Xρm is

P (ρ|w) =

m∏i=1

wρi∑∞k=1wk −

∑i−1j=1wρj

Caron, Teh & Murphy 5 / 23

Thurstonian Interpretation

I For each item Xk define an exponential random variate:

zk ∼ Exp(wk)

I Induces a partial permutation ρ such that zρ1 < · · · < zρm < · · ·

A race among items:

I zk is the time that item Xk finished the race.

I ρ is the order of champion, runner-up, second runner-up...

The probability of a partial permutation is:

P (ρ|w) =P (zρ1 < zρ2 < · · · < zρm < everything else)

=

m∏i=1

wρi∑∞k=1wk −

∑i−1j=1wρj

Caron, Teh & Murphy 6 / 23

Thurstonian Interpretation

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.80

5

10

15

20

25

30

ratings

items

0 0.1 0.2 0.3 0.4 0.5 0.6 0.70

5

10

15

20

25

30

time

items

ρ1

ρ2

ρ3

ρ4

ρ5

zkwk

P (ρ|w) =P (zρ1 < zρ2 < · · · < zρm < everything else)

=m∏i=1

wρi∑∞k=1wk −

∑i−1j=1wρj

Caron, Teh & Murphy 7 / 23

Data Augmentation

I Reparametrize in terms of inter-arrival durations:

Z1 = zρ1, Z2 = zρ2 − zρ1, . . . Zm = zρm − zρm−1

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.80

5

10

15

20

25

30

ratings

items

0 0.1 0.2 0.3 0.4 0.5 0.6 0.70

5

10

15

20

25

30

time

items

ρ1

ρ2

ρ3

ρ4

ρ5

zkwk

Z1 Z2 Z3 Z4 Z5

Caron, Teh & Murphy 8 / 23

Data Augmentation

I Augmented system with auxiliary variables Z1, . . . , Zm:

Zi|ρ,w,X ∼ Exp

∞∑k=1

wk −i−1∑j=1

wρj

Joint probability is:

P (ρ, Z|w,X) =

m∏i=1

wρi exp

− ∞∑k=1

wk −i−1∑j=1

wρj

Zi

What prior to use for w,X?

I Independent Gamma’s for wk’s do not work.

I Gamma process.

I Better: completely random measures.

Caron, Teh & Murphy 9 / 23

Completely Random Measures

I Levy intensity λ(w).I Base distribution H with density h(x).I Random atomic measure

G =

∞∑k=1

wkδXk

Construction: two-dimensional Poisson process N = {wk, Xk} withintensity λ(w)h(x):

[Kingman, 1967, Lijoi and Prunster, 2010]Caron, Teh & Murphy 10 / 23

Completely Random Measures

Conditions on Levy intensity:∫ ∞0

λ(w)dw =∞ ⇒ infinitely many items.∫ ∞0

(1− e−w)λ(w)dw <∞ ⇒ finite total∞∑k=1

wk.

Plackett-Luce sampling without replacement:

I Pick first item with probabilities proportional to wk’s; remove.

I Pick second item with probabilities proportional to wk’s; remove;

I . . .

⇒ Size-biased sampling of the atoms in G.

Caron, Teh & Murphy 11 / 23

Prior Draws

Generalized Gamma process with λ(w) = αΓ(1−σ)

w−σ−1e−τw, τ = 1.

5 10 15 20 25 30

2

4

6

8

10

12

14

16

18

20

(a) α = 0.1, σ = 0

5 10 15 20 25 30

2

4

6

8

10

12

14

16

18

20

(b) α = 1, σ = 0

5 10 15 20 25 30

2

4

6

8

10

12

14

16

18

20

(c) α = 3, σ = 0

5 10 15 20 25 30

2

4

6

8

10

12

14

16

18

20

(d) α = 1, σ = 0.1

5 10 15 20 25 30

2

4

6

8

10

12

14

16

18

20

(e) α = 1, σ = 0.5

5 10 15 20 25 30 35 40 45 50

2

4

6

8

10

12

14

16

18

20

(f) α = 1, σ = 0.9

Caron, Teh & Murphy 12 / 23

Posterior Characterization

I Observe L partial rankings Y` = (Y`1, . . . , Y`m`) for ` = 1, . . . , L.I X∗1 , . . . , X

∗K are the K unique items among observations

I Associated auxiliary variables Z`1, . . . , Z`m` .

TheoremThe posterior distribution given partial rankings Y and auxiliary variablesZ is a CRM with fixed atoms:

G|Y, Z = G∗ +K∑k=1

w∗kδX∗k

where G∗ and w∗1, . . . , w∗K are mutually independent. The law of G∗ is

still CRM with Levy intensity λ∗(w) = λ(w)e−w(∑

`i Z`i)

while the masses have distributions,P (w∗k|Y, Z) ∝ (w∗k)

nke−w∗k(

∑`i δ`ikZ`i)λ(w∗k).

I Characterization similar to that for normalized random measures.

[Prunster, 2002, James, 2002, James et al., 2009]Caron, Teh & Murphy 13 / 23

Bayesian Inference via Gibbs Sampling

I Rating parameters w∗k of observed items.

I Latent variables Z`i.

I CRM G∗ containing unobserved items.

I Marginalize out G∗, keeping only its total mass w∗∗.

Easy Gibbs sampler for generalized gamma process class of CRM

Z`i|rest ∼ Exponential

w∗k|rest ∼ Gamma

w∗∗|rest ∼ Exponentially tilted stable

Caron, Teh & Murphy 14 / 23

Nonparametric Plackett-Luce Mixture Model

Partial rankings reflecting the preferences of a heteroge-nous population.

π ∼ GEM(θ)

c`|π ∼ Discrete(π)

Y`|c`, Gc` ∼ PL(Gc`)

Gk|Ck ∼ Gamma(αH + Ck, τ + φ)

Ck|G0 ∼ Poisson(φG0)

G0 ∼ Gamma(αH, τ )

I Gk ∼ Gamma(αH, τ ) does not work.

π

Gk

c�

ρ�

G0

Ck

Caron, Teh & Murphy 15 / 23

Irish University Programme Applications

I 53757 Irish university applicants.

I Each applicant ranks their top-10 desired university programmes.

I Point estimate of the partition

c = argminc(i)∈{c(1),...,c(N)}

∑k

∑`

(δc(i)k c

(i)`

− ζk`)2

where the coclustering matrix ζ is obtained with

ζk` =1

N

N∑i=1

δc(i)k c

(i)`

c(i), i = 1, . . . , N are the Monte Carlo samples and δk` = 1 ifk = `, 0 otherwise.

Caron, Teh & Murphy 16 / 23

Irish University Programme Applications

Table: Description of the different clusters. The size of the clusters, the entropyand a cluster description are provided.

Cluster Size Entropy Description Cluster Size Entropy Description1 3325 0.72 Social Science/Tourism 14 1918 0.71 Engineering2 3214 0.71 Science 15 1835 0.48 Teaching/Arts3 3183 0.64 Business/Commerce 16 1835 0.68 Art/Music - Dublin4 2994 0.58 Arts 17 1740 0.71 Engineering - Dublin5 2910 0.63 Business/Marketing - Dublin 18 1701 0.55 Medicine6 2879 0.68 Construction 19 1675 0.70 Arts/Religion/Theology7 2803 0.66 CS - outside Dublin 20 1631 0.76 Arts/History - Dublin8 2225 0.67 CS - Dublin 21 1627 0.66 Galway9 2303 0.67 Arts/Social - outside Dublin 22 1392 0.70 Limerick10 2263 0.63 Business/Finance - Dublin 23 1273 0.65 Law11 2198 0.65 Arts/Psychology - Dublin 24 1269 0.72 Business - Dublin12 2086 0.63 Cork 25 1225 0.79 Arts/Bus. - Dublin13 2029 0.64 Comm./Journalism - Dublin 26 47 0.96 Mixed

Caron, Teh & Murphy 17 / 23

Irish University Programme Applications

Table: Cluster 7: Computer Science - outside Dublin

Rank Aver. Norm. Weight College Degree Programme1 0.081 Cork IT Computer Applications2 0.075 Limerick IT Software Development3 0.072 University of Limerick Computer Systems4 0.064 Waterford IT Applied Computing5 0.061 Cork IT Software Dev & Comp Net6 0.046 IT Carlow Computer Networking7 0.038 Athlone IT Computer and Software Engineering8 0.036 University College Cork Computer Science9 0.033 Dublin City University Computer Applications

10 0.033 University of Limerick Information Technology

Table: Cluster 8: Computer Science - Dublin

Rank Aver. Norm. Weight College Degree Programme1 0.141 Dublin City University Computer Applications2 0.054 University College Dublin Computer Science3 0.049 NUI - Maynooth Computer Science4 0.043 Dublin IT Computer Science5 0.040 National College of Ireland Software Systems6 0.038 Dublin IT Business Info. Systems Dev.7 0.036 Trinity College Dublin Computer Science8 0.035 Dublin IT Applied Sciences/Computing9 0.030 Trinity College Dublin Information & Comm. Tech.

10 0.029 University College Dublin B.A. (Computer Science)

Caron, Teh & Murphy 18 / 23

Irish University Programme Applications

Table: Cluster 12: Cork

Rank Aver. Norm. Weight College Degree Programme1 0.105 University College Cork Arts2 0.072 University College Cork Computer Science3 0.072 University College Cork Commerce4 0.067 University College Cork Business Information Systems5 0.057 Cork IT Computer Applications6 0.049 Cork IT Software Dev & Comp Net7 0.035 University College Cork Finance8 0.031 University College Cork Law9 0.031 University College Cork Accounting

10 0.026 University College Cork Biological and Chemical Sciences

Caron, Teh & Murphy 19 / 23

Irish University Programme Applications

Caron, Teh & Murphy 20 / 23

Summary

I A Bayesian nonparametric Plackett-Luce model for partial rankings.

I Completely random measures and posterior characterization.

I Easy Gibbs sampling for posterior simulation.

I Mixture generalization.I Future:

I Dependent general CRM models

Caron, Teh & Murphy 21 / 23

Bibliography I

Caron, F., Teh, Y. W., and Murphy, B. (2013).Bayesian nonparametric Plackett-Luce models for the analysis of preferences for collegedegree programmes.To appear in Annals of Applied Statistics.

Gormley, I. C. and Murphy, T. B. (2008).Exploring voting blocs with the Irish electorate: a mixture modeling approach.Journal of the American Statistical Association, 103(483):1014–1027.

Gormley, I. C. and Murphy, T. B. (2009).A grade of membership model for rank data.Bayesian Analysis, 4(2):265–296.

James, L., Lijoi, A., and Prunster, I. (2009).Posterior analysis for normalized random measures with independent increments.Scandinavian Journal of Statistics, 36(1):76–97.

James, L. F. (2002).Poisson process partition calculus with applications to exchangeable models and bayesiannonparametrics.arXiv preprint math/0205093.

Caron, Teh & Murphy 22 / 23

Bibliography II

Kingman, J. F. C. (1967).Completely random measures.Pacific Journal of Mathematics, 21(1):59–78.

Lijoi, A. and Prunster, I. (2010).Models beyond the Dirichlet process.In Hjort, N. L., C. Holmes, P. M., and Walker, S. G., editors, Bayesian Nonparametrics.Cambridge University Press.

Luce, R. D. (1959).Individual choice behavior: A theoretical analysis.Wiley.

Plackett, R. (1975).The analysis of permutations.Journal of the Royal Statistical Society: Series C (Applied Statistics), 24(2):193–202.

Prunster, I. (2002).Random probability measures derived from increasing additive processes and theirapplication to Bayesian statistics.PhD thesis, University of Pavia.

Caron, Teh & Murphy 23 / 23