
Learning Hierarchical Riffle Independent Groupings from Rankings
Jonathan Huang, Carlos Guestrin
Carnegie Mellon University
ICML 2010, Haifa, Israel

American Psychological Association Elections

Each ballot in election data is a ranked list of candidates [Diaconis, 89]

5738 full ballots (1980 election)

5 candidates: William Bevan, Ira Iscoe, Charles Kiesler, Max Siegle, Logan Wright

[Figure: empirical distribution over all 120 rankings (x-axis: rankings, y-axis: probability), shown alongside the first-order matrix (columns: candidates, rows: ranks)]

First-order matrix: Prob(candidate i was ranked j)

Candidate 3 has the most 1st-place votes, and also many last-place votes
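To make the first-order matrix concrete, here is a minimal Python sketch (not part of the original slides) of how it could be estimated from full ballots; the ballot format and function name are assumptions for illustration.

```python
import numpy as np

def first_order_matrix(ballots, n):
    """Estimate M[i, j] = Prob(candidate i was ranked j).

    Each ballot is assumed to be a sequence of 0-based candidate indices,
    listed from rank 1 (most preferred) to rank n (least preferred).
    """
    M = np.zeros((n, n))
    for ballot in ballots:
        for rank, candidate in enumerate(ballot):
            M[candidate, rank] += 1.0
    return M / len(ballots)

# Toy usage with 3 candidates (not the real APA ballots):
print(first_order_matrix([(2, 0, 1), (2, 1, 0), (0, 2, 1)], n=3))
```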


Factorial Possibilities

n items/candidates means n! rankings:

n    n!            Memory required to store n! doubles
5    120           < 4 kilobytes
15   1.31 × 10^12  ~10 terabytes (!!)

Possible learning biases for taming complexity:

Parametric? (e.g., Mallows, Plackett-Luce, …)

Sparsity? [Reid79, Jagabathula08]

Independence/Graphical models? [Huang09]

(Not to mention sample complexity issues…)
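The memory column above is just arithmetic; a quick back-of-the-envelope check in Python (assuming 8 bytes per double):

```python
import math

# One 8-byte double per ranking of n items.
for n in (5, 15):
    num_rankings = math.factorial(n)
    num_bytes = 8 * num_rankings
    print(f"n={n:2d}: {num_rankings:,} rankings -> "
          f"{num_bytes:,} bytes (~{num_bytes / 1e12:.2f} TB)")
```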


Full independence on rankings

Graphical model for the joint ranking of 6 items {A, B, C, D, E, F}; each node is the rank of one item (e.g., the rank of item C).

[Figure: mutual exclusivity of ranks leads to a fully connected graphical model over the six items]

{A,B,C}, {D,E,F} independent:
Ranks of {A,B,C} are a permutation of {1,2,3}
Ranks of {D,E,F} are a permutation of {4,5,6}

[Figure: under this independence assumption, the graphical model splits into two fully connected components, one over {A,B,C} and one over {D,E,F}]

First-order independence condition

[Figure: first-order matrices Prob(candidate i was ranked j) (columns: candidates, rows: ranks) for a non-independent vs. an independent distribution over {A,…,F}; under full independence the matrix has a block of zeros]

Sparsity: any ranking putting A, B, or C in ranks 4, 5, or 6 has zero probability!

But… such sparsity is unlikely to exist in real data

Drawing independent fruits/veggies

Veggies in ranks {1,2}, Fruits in ranks {3,4} (veggies always better than fruits)

Draw veggie rankings, fruit rankings independently:
Veggies: Artichoke > Broccoli;  Fruits: Cherry > Dates

Form the joint ranking of veggies and fruits:
[Figure: the drawn veggie ranking is placed in ranks {1,2} and the drawn fruit ranking in ranks {3,4} to form a joint ranking of all four items]

Full independence: fruit/veggie positions fixed ahead of time!
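A minimal Python sketch of this fully independent generative process; the uniform within-set rankings are an illustrative assumption (in general each set has its own ranking distribution):

```python
import random

VEGGIES = ["Artichoke", "Broccoli"]
FRUITS = ["Cherry", "Dates"]

def draw_fully_independent_ranking():
    veggie_order = random.sample(VEGGIES, len(VEGGIES))  # ranking of the veggies
    fruit_order = random.sample(FRUITS, len(FRUITS))     # ranking of the fruits
    # Under full independence the positions are fixed ahead of time:
    # veggies always occupy ranks {1,2}, fruits always occupy ranks {3,4}.
    return veggie_order + fruit_order

print(draw_fully_independent_ranking())  # e.g. ['Broccoli', 'Artichoke', 'Dates', 'Cherry']
```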


Riffled independence

Riffled independence model [Huang, Guestrin, 2009]:

Draw interleaving of veggie/fruit positions (according to a distribution)

[Figure: the independently drawn rankings (Veggies: Artichoke > Broccoli; Fruits: Cherry > Dates) are slotted into one of several possible Veggie/Fruit interleavings to form a joint ranking of all four items, e.g. Artichoke > Broccoli > Cherry > Dates]
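A matching Python sketch of the riffled independence generative process; the uniform interleaving below is again only an illustrative assumption (the model allows an arbitrary interleaving distribution):

```python
import random

VEGGIES = ["Artichoke", "Broccoli"]
FRUITS = ["Cherry", "Dates"]

def draw_riffle_independent_ranking():
    veggie_order = random.sample(VEGGIES, len(VEGGIES))
    fruit_order = random.sample(FRUITS, len(FRUITS))
    # Draw an interleaving: which joint ranks are veggie slots, which are fruit slots.
    slots = ["V"] * len(VEGGIES) + ["F"] * len(FRUITS)
    random.shuffle(slots)                    # uniform interleaving, for illustration
    v_it, f_it = iter(veggie_order), iter(fruit_order)
    return [next(v_it) if s == "V" else next(f_it) for s in slots]

print(draw_riffle_independent_ranking())  # e.g. ['Cherry', 'Artichoke', 'Broccoli', 'Dates']
```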


Riffle Shuffles

Riffle shuffle (dovetail shuffle): cut the deck into two piles, then interleave the piles.

Riffle shuffles correspond to distributions over interleavings.

[Figure: interleaving distribution]

American Psych. Assoc. Election (1980)

[Figure: empirical distribution over the 120 permutations (x-axis: permutations, y-axis: probability)]

Minimize: KL(empirical || riffle indep. approx.)

Best KL split: {1,2,3,4,5} → {1,3,4,5}, {2}

Candidate 3 fully independent vs. Candidate 2 riffle independent

Irish House of Parliament election 2002

64,081 votes, 14 candidates

Two main parties: Fianna Fail (FF), Fine Gael (FG)

Minor parties: Independents (I), Green Party (GP), Christian Solidarity (CS), Labour (L), Sinn Fein (SF)

[Gormley, Murphy, 2006]

Approximating the Irish Election

[Figure: “True” first order marginals vs. the riffle independent approximation, Prob(candidate i was ranked j), for the n = 14 candidates grouped by party (FF, FG, I, GP, CS, SF, L)]

Major parties riffle independent of minor parties?

Sinn Fein, Christian Solidarity marginals are not well captured by a single riffle independent split!

Back to Fruits and Vegetables

[Figure: splitting {banana, apple, orange, broccoli, carrot, lettuce, candy, cookies} directly into fruits vs. vegetables leaves {candy, cookies} with no natural side (??)]

Need to factor out {candy, cookies} first!
(candy and cookies play the same role as Sinn Fein and Christian Solidarity in the Irish data)

Hierarchical Decompositions

All food items: {banana, apple, orange, broccoli, carrot, lettuce, candy, cookies}
  Healthy food: {banana, apple, orange, broccoli, carrot, lettuce}
    Fruits: {banana, apple, orange}
    Vegetables: {broccoli, carrot, lettuce}
  Junk food: {candy, cookies}

Fruits, Vegetables marginally riffle independent

Generative process for hierarchy

Generative process (bottom-up): rank the fruits; rank the vegetables; interleave fruits with vegetables; rank the junk food; interleave healthy foods with junk food. (A code sketch of this process follows the list below.)

Hierarchical Riffle Independent Models:
- Encode intuitive independence constraints
- Can be learned with lower sample complexity
- Have interpretable parameters/structure
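A compact sketch of the hierarchical generative process just described: rank each leaf, then interleave sibling blocks up the tree. The uniform leaf rankings and uniform interleavings are illustrative assumptions; in the model each is governed by its own distribution.

```python
import random

# Leaves are item lists; internal nodes are (left_subtree, right_subtree) pairs.
HIERARCHY = ((["banana", "apple", "orange"],       # fruits
              ["broccoli", "carrot", "lettuce"]),  # vegetables
             ["candy", "cookies"])                 # junk food

def draw_ranking(node):
    if isinstance(node, list):                     # leaf: rank its items
        return random.sample(node, len(node))
    left, right = (draw_ranking(child) for child in node)
    slots = ["L"] * len(left) + ["R"] * len(right)
    random.shuffle(slots)                          # draw an interleaving of the two blocks
    l_it, r_it = iter(left), iter(right)
    return [next(l_it) if s == "L" else next(r_it) for s in slots]

print(draw_ranking(HIERARCHY))                     # a full ranking of all 8 foods
```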


Contributions

Structured Representation: Introduction of a hierarchical model based on riffled independence factorizations

Structure Learning Objectives: An objective function for learning model structure from ranking data

Structure Learning Algorithms: Efficient algorithms for optimizing the proposed objective


Learning a Hierarchy

Exponentially many hierarchies, each encoding a distinct set of independence assumptions

[Figure: several example hierarchies over the item set {1,2,3,4,5}, each with a different pattern of nested splits]

Problem statement: given i.i.d. ranked data, learn the hierarchy that generated the data

Learning a Hierarchy

Our approach: top-down partitioning of the item set X = {1, …, n}, using binary splits at each stage (see the sketch below)

Example: {1,2,3,4,5} → {1,2,3,4}, {5}; then {1,2,3,4} → {2,3,4}, {1}
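A sketch of the top-down search loop. The split-finding subroutine is left abstract here (it is a stand-in for whichever splitting criterion is used, such as the objectives discussed on the following slides):

```python
def learn_hierarchy(items, rankings, find_best_split, min_leaf_size=1):
    """Recursively build a hierarchy over `items` by binary splits.

    `find_best_split(items, rankings)` should return a pair of disjoint
    subsets (A, B) covering `items`.
    """
    if len(items) <= min_leaf_size:
        return list(items)                       # leaf
    A, B = find_best_split(items, rankings)
    if not A or not B:                           # no useful split found: stop
        return list(items)
    return (learn_hierarchy(A, rankings, find_best_split, min_leaf_size),
            learn_hierarchy(B, rankings, find_best_split, min_leaf_size))
```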


Hierarchical Decomposition

Minimize: KL(empirical || hierarchical model)

Best KL hierarchy: {1,2,3,4,5} → {1,3,4,5}, {2}; then {1,3,4,5} → {1,3}, {4,5}
({1,3}: research psychologists; {4,5}: clinical psychologists; {2}: community psychologists)

KL(true, best hierarchy) = 6.76e-02, TV(true, best hierarchy) = 1.44e-01

[Figure: “True” vs. learned first order marginals (columns: candidates, rows: ranks)]

KL-based objective function

KL Objective: Minimize KL(true distribution || riffle independent approx.)

Algorithm (sketched in code below): for each binary partitioning [A, B] of the item set:
- Estimate parameters (ranking probabilities for A and for B, and interleaving probabilities)
- Compute the log-likelihood of the data
Return the maximum-likelihood partition

Need to search over exponentially many subsets!

If hierarchical structure of A or B is unknown, might not have enough samples to learn parameters!
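A brute-force sketch of this search; `fit_and_score` is a hypothetical helper (not defined in the slides) that estimates the riffle independent model for a candidate split and returns the data log-likelihood. Enumerating every split is exactly the exponential blow-up noted above.

```python
from itertools import combinations

def best_split_brute_force(items, rankings, fit_and_score):
    """Score every binary split {A, B} of `items`; return the best one."""
    items = list(items)
    first, rest = items[0], items[1:]
    best_loglik, best_split = float("-inf"), None
    for k in range(len(rest)):                 # size of A beyond its first member
        for extra in combinations(rest, k):
            A = [first, *extra]                # fix items[0] in A to avoid duplicate splits
            B = [x for x in rest if x not in extra]
            loglik = fit_and_score(A, B, rankings)
            if loglik > best_loglik:
                best_loglik, best_split = loglik, (A, B)
    return best_split
```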


Finding fully independent sets by clustering

Approach: compute pairwise mutual informations and partition the resulting graph into sets A and B.

Why this won't work (for riffled independence):
Pairs of candidates on opposite sides of the split can be strongly correlated in general (if I vote up a Democrat, I am likely to vote down a Republican).

Pairwise measures are unlikely to be effective for detecting riffled independence!

Higher order measure of riffled independence

Key insight: riffled independence means absolute rankings in A are not informative about relative rankings in B

If i and (j, k) lie on opposite sides of the split, their mutual information = 0

Idea: measure mutual information between singleton rankings and pairwise (relative) rankings
(e.g., preference over Democrat i vs. relative preference over Republicans j & k)
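A sketch of estimating such a mutual information from data; the plug-in estimator below is an assumption for illustration (the paper's exact estimator may differ). Each ranking is taken to be a dict mapping item to rank.

```python
import math
from collections import Counter

def triplet_mutual_information(rankings, i, j, k):
    """Plug-in estimate of I( rank of item i ; whether j is ranked above k )."""
    n = len(rankings)
    joint = Counter((r[i], r[j] < r[k]) for r in rankings)
    marg_rank = Counter(r[i] for r in rankings)
    marg_rel = Counter(r[j] < r[k] for r in rankings)
    mi = 0.0
    for (rank_i, rel), count in joint.items():
        p_joint = count / n
        p_indep = (marg_rank[rank_i] / n) * (marg_rel[rel] / n)
        mi += p_joint * math.log(p_joint / p_indep)
    return mi
```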


Tripletwise objective function

Objective function (want to minimize):

Sum, over all triplets that cross the split (a singleton item on one side and a pair on the other), of the estimated mutual information between the singleton's absolute rank and the pair's relative order. Triplets lying entirely within A (or within B) play no role in the objective.

Tripletwise measure: no longer obviously a graph cut problem…
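Reusing the estimator sketched on the previous slide, the tripletwise objective for a candidate split could look like the following (a sketch consistent with the description above, not the paper's exact code):

```python
from itertools import combinations

def tripletwise_objective(rankings, A, B):
    """Sum of estimated mutual informations over all triplets crossing the split."""
    total = 0.0
    for x in A:                                   # singleton in A, pair in B
        for y, z in combinations(B, 2):
            total += triplet_mutual_information(rankings, x, y, z)
    for x in B:                                   # singleton in B, pair in A
        for y, z in combinations(A, 2):
            total += triplet_mutual_information(rankings, x, y, z)
    return total  # near zero when A and B are (approximately) riffle independent
```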


Efficient Splitting: Anchors heuristic

Given two elements of A, we can decide whether rest of elements are in A or B:

[Figure: with anchor elements a1, a2 ∈ A fixed, each remaining item x is scored by the mutual information between its rank and the anchors' relative order: a large score places x in A, a small score places x in B]

Efficient Splitting: Anchors heuristic

In practice, the anchor elements a1, a2 are unknown!

Theorem: Anchors heuristic recovers riffle independent split with high probability given polynomial samples (under certain strong connectivity assumptions)
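An illustrative sketch of the anchors idea built on the same triplet measure; the explicit threshold and the assumption that the anchors are given are simplifications (the paper's algorithm, including how anchors are chosen, and its guarantee are more careful):

```python
def split_with_anchors(rankings, items, a1, a2, threshold):
    """Assign each non-anchor item to A or B, given two anchors assumed to lie in A.

    An item whose rank is highly informative about the relative order of the
    anchors (large triplet mutual information) is placed in A; otherwise in B.
    """
    A, B = [a1, a2], []
    for x in items:
        if x in (a1, a2):
            continue
        score = triplet_mutual_information(rankings, x, a1, a2)
        (A if score > threshold else B).append(x)
    return A, B
```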

Structure learning with synthetic data (16 items, 4 items in each leaf)

[Figure: log-likelihood vs. log10(# samples) for three cases: true structure known, learned structure, and a random 1-chain with learned parameters]

Anchors on Irish election data

Irish Election hierarchy (first four splits):
{1,2,3,4,5,6,7,8,9,10,11,12,13,14} → {12} (Sinn Fein), {1,2,3,4,5,6,7,8,9,10,11,13,14}
{1,2,3,4,5,6,7,8,9,10,11,13,14} → {11} (Christian Solidarity), {1,2,3,4,5,6,7,8,9,10,13,14}
{1,2,3,4,5,6,7,8,9,10,13,14} → {1,4,13} (Fianna Fail), {2,3,5,6,7,8,9,10,14}
{2,3,5,6,7,8,9,10,14} → {2,5,6} (Fine Gael), {3,7,8,9,10,14} (Independents, Labour, Green)

Running time: brute force optimization: 70.2 s; Anchors method: 2.3 s

[Figure: “True” vs. learned first order marginals, Prob(candidate i was ranked j) (columns: candidates, rows: ranks), for the 14 candidates]

Irish log likelihood comparison

[Figure: log-likelihood vs. # samples (logarithmic scale, 50 to 2000) for the optimized 1-chain and the optimized hierarchy]

Bootstrapped substructures: Irish data

[Figure: success rate vs. # samples (log scale) for recovering the major party {FF, FG} leaves, the top partition, all leaves, and the full tree]

Riffled Independence…
- is a natural notion of independence for rankings
- can be exploited for efficient inference and low sample complexity
- approximately holds in many real datasets

Hierarchical riffled independence…
- captures more structure in data
- structure can be learned efficiently: related to clustering and to graphical model structure learning; efficient algorithm & polynomial sample complexity result

Acknowledgements: Brendan Murphy and Claire Gormley provided the Irish voting datasets. Discussions with Marina Meila provided important initial ideas upon which this work is based.


Sushi ranking

Dataset: 5000 preference rankings of 10 types of sushi

Types: 1. Ebi (shrimp), 2. Anago (sea eel), 3. Maguro (tuna), 4. Ika (squid), 5. Uni (sea urchin), 6. Sake (salmon roe), 7. Tamago (egg), 8. Toro (fatty tuna), 9. Tekka-maki (tuna roll), 10. Kappa-maki (cucumber roll)

[Figure: first-order matrix, Prob(sushi i was ranked j) (columns: sushi, rows: ranks)]

Fatty tuna (Toro) is a favorite!

No one likes cucumber roll!

Sushi hierarchy:
{1,2,3,4,5,6,7,8,9,10}
  {2} (sea eel)
  {1,3,4,5,6,7,8,9,10}
    {4} (squid)
    {1,3,5,6,7,8,9,10}
      {5,6} (sea urchin, salmon roe)
      {1,3,7,8,9,10}
        {1} (shrimp)
        {3,7,8,9,10}
          {3,8,9} (tuna, fatty tuna, tuna roll)
          {7,10} (egg, cucumber roll)