Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian...
![Page 1: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/1.jpg)
Bayesian inference for Plackett-Luce ranking models
John Guiver, Edward Snelson
MSRC
![Page 2: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/2.jpg)
Distributions over orderings
• Many problems in ML/IR concern ranked lists of items
• Data in the form of multiple independent orderings of a set of K items
• How to characterize such a set of orderings?
• Need to learn a parameterized probability model over orderings
![Page 3: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/3.jpg)
Notation
• Items and rank positions are each indexed from $\{1, \dots, K\} \equiv \mathbb{Z}_K$
• A ranking $\rho: \mathbb{Z}_K \to \mathbb{Z}_K$ is a permutation which maps item indices to position indices. $\rho_i$ is the rank position of item $i$
• An ‘ordering’ $\omega: \mathbb{Z}_K \to \mathbb{Z}_K$ is a permutation which maps position indices to item indices. $\omega_k$ is the item whose rank position is $k$
• Any ranking has a corresponding ordering, and vice versa, so that: $\omega_{\rho_i} = i$ and $\rho_{\omega_k} = k$
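The ranking/ordering duality can be checked in a few lines. This is a minimal sketch that uses 0-based indices rather than the 1-based indices of the notation above; the function name `invert` and the example permutation are illustrative.

```python
def invert(perm):
    """Return the inverse of a permutation given as a list of 0-based indices."""
    inverse = [0] * len(perm)
    for source, target in enumerate(perm):
        inverse[target] = source
    return inverse

# rho[i] is the rank position of item i (0-based).
rho = [2, 0, 3, 1]
# omega[k] is the item that occupies rank position k.
omega = invert(rho)

# The two maps undo each other: omega[rho[i]] == i and rho[omega[k]] == k.
round_trip_items = [omega[rho[i]] for i in range(len(rho))]
round_trip_positions = [rho[omega[k]] for k in range(len(omega))]
```

Inverting twice returns the original ranking, which is the "and vice versa" part of the statement.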
![Page 4: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/4.jpg)
Distributions
• Ranking distributions are defined over the domain of all K! rankings (or orderings)
• A fully parameterised distribution would have a probability for each possible ranking, which sum to 1.
 – E.g. for three items: $P_{3!} = \{p_{123}, p_{132}, p_{213}, p_{231}, p_{312}, p_{321}\}$, $\sum p_{ijk} = 1$, $p_{ijk} > 0$
• A ranking distribution is a point in this simplex
• A model is a parameterised family within the simplex
![Page 5: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/5.jpg)
Plackett-Luce: vase interpretation
[Figure: vase; labels $v_b$, $v_r$, $v_g$]
Probability: $\frac{v_r}{v_b + v_g + v_r} \times \frac{v_g}{v_b + v_g} \times \frac{v_b}{v_b}$
![Page 6: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/6.jpg)
Plackett-Luce model
• PL likelihood for a single complete ordering:
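A minimal sketch of this likelihood, assuming the standard Plackett-Luce form $\prod_k v_{\omega_k} / \sum_{j \ge k} v_{\omega_j}$: the item at each position is chosen with probability proportional to its parameter among the items not yet placed. The function name, the dictionary representation and the parameter values are illustrative.

```python
from itertools import permutations

def pl_likelihood(ordering, v):
    """Probability of a complete ordering (best first) under Plackett-Luce.

    ordering: sequence of item labels, best first.
    v: dict mapping item label -> positive parameter.
    """
    remaining = sum(v[item] for item in ordering)
    prob = 1.0
    for item in ordering:
        prob *= v[item] / remaining   # choose `item` among those still unplaced
        remaining -= v[item]
    return prob

v = {"r": 3.0, "g": 2.0, "b": 1.0}
p_rgb = pl_likelihood(["r", "g", "b"], v)          # 3/6 * 2/3 * 1/1
total = sum(pl_likelihood(p, v) for p in permutations(v))
```

Summing over all K! orderings gives 1, so the K parameters pick out one point in the simplex of ranking distributions.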
![Page 7: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/7.jpg)
Plackett-Luce: vase interpretation
Partial orderings
• Top N: $\frac{v_g}{v_o + v_b + v_g + v_r} \times \frac{v_b}{v_o + v_b + v_r}$
• Bradley-Terry model for case of pairs: $\frac{v_g}{v_b + v_g} \times \frac{v_b}{v_b}$
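A sketch of both partial-ordering cases, assuming the top-N likelihood keeps one factor per ranked position while every unranked item stays in the denominator. The colour labels o, b, g, r follow the slide; the function name and parameter values are made up for illustration.

```python
def top_n_likelihood(top, all_items, v):
    """Probability that `top` (best first) are the first len(top) items chosen."""
    remaining = sum(v[item] for item in all_items)
    prob = 1.0
    for item in top:
        prob *= v[item] / remaining
        remaining -= v[item]
    return prob

v = {"o": 4.0, "b": 1.0, "g": 3.0, "r": 2.0}

# Top 2 of four items: g first, then b.
p_top2 = top_n_likelihood(["g", "b"], ["o", "b", "g", "r"], v)   # 3/10 * 1/7

# Only two items in play: this reduces to the Bradley-Terry probability that g beats b.
p_pair = top_n_likelihood(["g"], ["g", "b"], v)                  # 3 / (3 + 1)
```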
![Page 8: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/8.jpg)
Luce’s Choice Axiom
![Page 9: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/9.jpg)
Gumbel Thurstonian model: each item is represented by a score distribution on the real line.
Marginal matrix: probability of an item in a position
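For a small number of items the marginal matrix can be computed exactly by enumerating all orderings. This is a sketch under the assumption that orderings follow the standard Plackett-Luce likelihood; the function names and parameter values are illustrative, and enumeration is only practical for small K.

```python
from itertools import permutations

def pl_likelihood(ordering, v):
    """Plackett-Luce probability of a complete ordering (best first)."""
    remaining = sum(v[item] for item in ordering)
    prob = 1.0
    for item in ordering:
        prob *= v[item] / remaining
        remaining -= v[item]
    return prob

def marginal_matrix(v):
    """Return (items, M) where M[i][k] = P(item i is in position k)."""
    items = sorted(v)
    index = {item: i for i, item in enumerate(items)}
    size = len(items)
    matrix = [[0.0] * size for _ in range(size)]
    for ordering in permutations(items):
        p = pl_likelihood(ordering, v)
        for position, item in enumerate(ordering):
            matrix[index[item]][position] += p
    return items, matrix

items, M = marginal_matrix({"a": 1.0, "b": 2.0, "c": 3.0})
```

Each row and each column sums to 1, and the first column is just the normalised parameter vector.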
![Page 10: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/10.jpg)
Thurstonian Models, and Yellott’s Theorem
• Assume a Thurstonian Model with each score having identical distributions except for their means. Then:
 – The score distributions give rise to a Plackett-Luce model if and only if the scores are distributed according to a Gumbel distribution (Yellott)
• Result depends on some nice properties of the Gumbel distribution:
 CDF: $\mathcal{G}(x \mid \mu, \beta) = e^{-z}$ where $z(x) = e^{-(x-\mu)/\beta}$; PDF: $g(x \mid \mu, \beta) = \frac{z}{\beta} e^{-z}$
 $$\int_{-\infty}^{t} g(x \mid \mu, \beta)\, \mathcal{G}(x \mid \mu', \beta)\, dx = \frac{\mathcal{G}\left(t \mid \mu + \beta \ln\left(1 + a(\mu, \mu')\right), \beta\right)}{1 + a(\mu, \mu')}$$
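The integral property can be checked numerically in the limit $t \to \infty$, where it gives the probability that one Gumbel score exceeds another. The slide does not spell out $a(\mu, \mu')$; this sketch assumes $a = e^{(\mu' - \mu)/\beta}$, which is what direct integration gives, so the limit is $1/(1+a) = v/(v+v')$ with $v = e^{\mu/\beta}$. The integration bounds and step count are arbitrary choices.

```python
import math

def gumbel_pdf(x, mu, beta):
    z = math.exp(-(x - mu) / beta)
    return (z / beta) * math.exp(-z)

def gumbel_cdf(x, mu, beta):
    return math.exp(-math.exp(-(x - mu) / beta))

def prob_first_beats_second(mu1, mu2, beta, lo=-30.0, hi=30.0, steps=20000):
    """Midpoint-rule estimate of P(score1 > score2) = integral of pdf1(x) * cdf2(x)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += gumbel_pdf(x, mu1, beta) * gumbel_cdf(x, mu2, beta)
    return total * h

mu1, mu2, beta = 1.0, 0.3, 1.0
numeric = prob_first_beats_second(mu1, mu2, beta)

# Plackett-Luce (Bradley-Terry) prediction with v = exp(mu / beta).
v1, v2 = math.exp(mu1 / beta), math.exp(mu2 / beta)
closed_form = v1 / (v1 + v2)
```

The agreement between `numeric` and `closed_form` is the two-item case of the correspondence between Gumbel scores and Plackett-Luce choice probabilities.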
![Page 11: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/11.jpg)
Maximum likelihood estimation
• Hunter (2004) describes minorize/maximize (MM) algorithm to find MLE
• Can over-fit with sparse data (especially incomplete rankings)
• Strong assumption for convergence:
 – “in every possible partition of the items into two nonempty subsets, some item in the second set ranks higher than some item in the first set at least once in the data”
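A sketch of an MM iteration of the kind described by Hunter (2004), written from the general form of the update rather than copied from any reference implementation: each parameter is set to the number of times the item is ranked above last place, divided by a sum of reciprocal stage totals. The function name, the uniform initialisation and the fixed iteration count are assumptions.

```python
def mm_plackett_luce(orderings, iterations=100):
    """Estimate normalised Plackett-Luce parameters from orderings (best first)."""
    items = sorted({item for ordering in orderings for item in ordering})
    gamma = {item: 1.0 / len(items) for item in items}

    # wins[i]: number of orderings in which item i is ranked above last place.
    wins = {item: 0 for item in items}
    for ordering in orderings:
        for item in ordering[:-1]:
            wins[item] += 1

    for _ in range(iterations):
        denom = {item: 0.0 for item in items}
        for ordering in orderings:
            # Stage t chooses among ordering[t:]; the final stage is trivial.
            for t in range(len(ordering) - 1):
                stage_total = sum(gamma[item] for item in ordering[t:])
                for item in ordering[t:]:
                    denom[item] += 1.0 / stage_total
        gamma = {item: wins[item] / denom[item] for item in items}
        norm = sum(gamma.values())
        gamma = {item: g / norm for item, g in gamma.items()}
    return gamma

# A beats B three times, B beats A once.
balanced = mm_plackett_luce([["A", "B"]] * 3 + [["B", "A"]])

# B never beats A: the partition assumption fails and B's estimate collapses to zero.
degenerate = mm_plackett_luce([["A", "B"]] * 2)
```

The second call illustrates the over-fitting and convergence concern: with no evidence of B ever ranking above A, the estimate sits on the boundary of the parameter space.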
![Page 12: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/12.jpg)
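Bayesian inference: factor graph
[Figure: factor graph with variables $v_A$, $v_B$, $v_C$, $v_D$, $v_E$, Gamma priors, and orderings BAE and DE]
Factors for BAE: $\frac{v_B}{v_E + v_B + v_A}$, $\frac{v_A}{v_E + v_A}$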
![Page 13: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/13.jpg)
Fully factored approximation
• Posterior over P-L parameters, given N orderings:
• Approximate as fully factorised product of Gammas:
![Page 14: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/14.jpg)
Expectation Propagation [Minka 2001]
![Page 15: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/15.jpg)
Alpha-divergence
Kullback-Leibler (KL) divergence
Let p, q be two distributions (don’t need to be normalised)
Alpha-divergence (α is any real number)
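A minimal sketch for discrete distributions, assuming the definitions commonly used in Minka's work on divergence measures: $KL(p \| q) = \sum p \ln(p/q) + \sum (q - p)$ and $D_\alpha(p \| q) = \sum \left[\alpha p + (1-\alpha) q - p^\alpha q^{1-\alpha}\right] / \left(\alpha(1-\alpha)\right)$. The function names and example vectors are illustrative.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions that need not be normalised."""
    return sum(pi * math.log(pi / qi) + qi - pi for pi, qi in zip(p, q))

def alpha_divergence(p, q, alpha):
    """Alpha-divergence D_alpha(p || q); alpha = 1 and alpha = 0 are taken as limits."""
    if alpha == 1.0:
        return kl(p, q)
    if alpha == 0.0:
        return kl(q, p)
    total = sum(alpha * pi + (1.0 - alpha) * qi - pi ** alpha * qi ** (1.0 - alpha)
                for pi, qi in zip(p, q))
    return total / (alpha * (1.0 - alpha))

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

near_one = alpha_divergence(p, q, 1.0 - 1e-6)    # approaches KL(p || q)
near_zero = alpha_divergence(p, q, 1e-6)         # approaches KL(q || p)
half = alpha_divergence(p, q, 0.5)               # 2 * sum (sqrt(p) - sqrt(q))^2
minus_one = alpha_divergence(p, q, -1.0)         # 0.5 * sum (q - p)^2 / p
```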
![Page 16: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/16.jpg)
Alpha-divergence – special cases
Similarity measures between two distributions (p is the truth, and q an approximation)
[Figure: special cases along the α axis]
![Page 17: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/17.jpg)
Minimum alpha-divergence
q is Gaussian, minimizes $D_\alpha(p \| q)$
[Figure panels: α = -∞, α = 0, α = 0.5, α = 1, α = ∞]
![Page 18: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/18.jpg)
Structure of alpha space
[Figure: α axis marked at 0 (MF) and 1 (BP, EP); regions labelled zero forcing and inclusive (zero avoiding)]
![Page 19: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/19.jpg)
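Bayesian inference: factor graph
[Figure: factor graph with variables $v_A$, $v_B$, $v_C$, $v_D$, $v_E$, Gamma priors, and orderings BAE and DE]
Factor: $\frac{v_A}{v_E + v_A}$
Project the following:
$$\mathrm{Gam}(v_A) \int dv_E \left(\frac{v_A}{v_A + v_E}\right)^{-1} \mathrm{Gam}(v_E) = \mathrm{Gam}(v_A) \int dv_E \left(1 + v_A^{-1} v_E\right) \mathrm{Gam}(v_E)$$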
![Page 20: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/20.jpg)
Inferring known parameters
![Page 21: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/21.jpg)
Ranking NASCAR drivers
![Page 22: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/22.jpg)
Posterior rank distributions
[Figure panels: MLE, EP; axis: driver rank 1 ... 83]
![Page 23: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/23.jpg)
Conclusions and future work
• We have given an efficient Bayesian treatment for P-L models using Power EP
• Advantages of the Bayesian approach:
 – Avoids over-fitting on sparse data
 – Gives uncertainty information on the parameters
 – Gives an estimate of the model evidence
• Future work:
 – Mixture models
 – Feature-based ranking models
![Page 24: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/24.jpg)
Thank you
http://www.research.microsoft.com/infernet
![Page 25: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/25.jpg)
Ranking movie genres
![Page 26: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/26.jpg)
Incomplete orderings
• Internally consistent:
 – “the probability of a particular ordering does not depend on the subset from which the items are assumed to be drawn”
• Likelihood for an incomplete ordering (only a few items or top-S items are ranked) simple:
 – only include factors for those items that are actually ranked in datum n
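Internal consistency can be verified by brute force for a small item set: the probability of a sub-ordering obtained by summing over all complete orderings matches the likelihood built from the ranked items alone. This is a sketch assuming the standard Plackett-Luce likelihood; the item labels, parameter values and function names are illustrative.

```python
from itertools import permutations

def pl_likelihood(ordering, v):
    """Plackett-Luce probability of an ordering, using only the items it contains."""
    remaining = sum(v[item] for item in ordering)
    prob = 1.0
    for item in ordering:
        prob *= v[item] / remaining
        remaining -= v[item]
    return prob

def marginal_over_full(sub_ordering, v):
    """Sum the probabilities of all complete orderings that agree with sub_ordering."""
    wanted = list(sub_ordering)
    total = 0.0
    for full in permutations(v):
        if [item for item in full if item in wanted] == wanted:
            total += pl_likelihood(full, v)
    return total

v = {"A": 5.0, "B": 3.0, "C": 2.0, "D": 1.0}

pair_direct = pl_likelihood(["B", "D"], v)             # 3 / (3 + 1)
pair_marginal = marginal_over_full(["B", "D"], v)

triple_direct = pl_likelihood(["C", "A", "D"], v)      # 2/8 * 5/6 * 1/1
triple_marginal = marginal_over_full(["C", "A", "D"], v)
```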
![Page 27: Bayesian inference for Plackett-Luce ranking models John Guiver, Edward Snelson MSRC Bayesian inference for Packet-Lube ranking models.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f325503460f94c4e94e/html5/thumbnails/27.jpg)
Power EP for Plackett-Luce
• A choice of α = -1 leads to a particularly nice simplification for the P-L likelihood
• An example of the type of calculation in the EP updates, with a factor connecting two items A, E:
$$\mathrm{Gam}(v_A) \int dv_E \left(\frac{v_A}{v_A + v_E}\right)^{-1} \mathrm{Gam}(v_E) = \mathrm{Gam}(v_A) \int dv_E \left(1 + v_A^{-1} v_E\right) \mathrm{Gam}(v_E)$$
 – α = -1 power makes this tractable
• Sum of Gammas can be projected back onto single Gamma
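A sketch of the projection step, under stated assumptions. With the power α = -1 the factor becomes $1 + v_E / v_A$, so integrating out $v_E$ against its Gamma leaves $\mathrm{Gam}(v_A \mid a, b)\,(1 + \langle v_E \rangle / v_A)$, a two-component Gamma mixture because $v^{-1}\,\mathrm{Gam}(v \mid a, b) = \frac{b}{a-1}\,\mathrm{Gam}(v \mid a-1, b)$ for $a > 1$. The projection below matches mean and variance, which is a simplifying assumption: an exact KL projection onto the Gamma family would match $E[v]$ and $E[\ln v]$ instead. Shape/rate parameterisation and all names are illustrative.

```python
def tilted_mixture(shape, rate, mean_e):
    """Write Gam(v | shape, rate) * (1 + mean_e / v) as a Gamma mixture.

    Returns (weights, shapes, rates); weights are unnormalised. Requires shape > 1.
    """
    if shape <= 1.0:
        raise ValueError("shape must exceed 1 for v**-1 * Gam(v) to be normalisable")
    weights = [1.0, mean_e * rate / (shape - 1.0)]
    return weights, [shape, shape - 1.0], [rate, rate]

def project_gamma_mixture(weights, shapes, rates):
    """Moment-match a Gamma mixture to a single Gamma (mean and variance)."""
    total = sum(weights)
    ws = [w / total for w in weights]
    mean = sum(w * a / b for w, a, b in zip(ws, shapes, rates))
    second = sum(w * a * (a + 1.0) / (b * b) for w, a, b in zip(ws, shapes, rates))
    variance = second - mean * mean
    return mean * mean / variance, mean / variance   # (shape, rate)

weights, shapes, rates = tilted_mixture(shape=3.0, rate=2.0, mean_e=1.0)
new_shape, new_rate = project_gamma_mixture(weights, shapes, rates)

# A single component projects onto itself.
same_shape, same_rate = project_gamma_mixture([1.0], [3.0], [2.0])
```

In this example the two components get equal weight, and the projected Gamma has mean 1.25, lower than the original mean of 1.5 because the $v_A^{-1}$ term favours smaller values.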