Rank Aggregation Methods for the Webklarson/teaching/F08-886/talks/WLuo.pdf · Rank Aggregation...

Post on 30-Aug-2020

5 views 0 download

Transcript of Rank Aggregation Methods for the Webklarson/teaching/F08-886/talks/WLuo.pdf · Rank Aggregation...

1

CS 886 Advanced Topics in Artificial Intelligence: Multiagent Systems

Rank Aggregation Methods for the WebCynthia Dwork Ravi Kumar Moni Naor D. Sivakumar

Presented by: Wanying Luo

2

Outline

What is rank aggregation problemMotivationChallengesPreliminariesFirst result: spam resistance in meta-searchSecond result: Markov chain methodsApplicationsExperimentsConclusion

3

What is rank aggregation problem

A

B

D

C

B

D

A

C

B

A

C

D

Based on different ranking techniques and criteria, we may get different results

4

What is rank aggregation problem

A

B

D

C

B

D

A

C

B

A

C

D

Need to obtain a ”consensus” ranking of all the individual rankings

B

A

D

C

5

Outline

What is rank aggregation problemMotivationChallengesPreliminariesFirst result: spam resistance in meta-searchSecond result: Markov chain methodsApplicationsExperimentsConclusion

6

Motivation

Provide robust meta searchExamples of meta search engines

ClustyDogpileMetacrawler

Spamhttp://searchenginewatch.com/showPage.html?page=3483601

Commercial interests, e.g., sponsored links

7

8

9

10

Motivation

Provide robust meta searchExamples of meta search engines

ClustyDogpileMetacrawler

Spamhttp://searchenginewatch.com/showPage.html?page=3483601

Commercial interests, e.g., sponsored links

11

Motivation

Provide robust meta searchExamples of meta search engines

ClustyDogpileMetacrawler

Spamhttp://searchenginewatch.com/showPage.html?page=3483601

Commercial interests, e.g., sponsored links

12

Motivation

User may provide a variety of searching criteria

13

Outline

What is rank aggregation problemMotivationChallengesPreliminariesFirst result: spam resistance in meta-searchSecond result: Markov chain methodsApplicationsExperimentsConclusion

14

Challenges

Unrealistic to rank the entire collection of pages on the web

29.7 billion pages on the World Wide Web as of February 2007 (http://www.boutell.com/)

Most search engines rank only the top few hundred entries

15

Challenges

Unrealistic to rank the entire collection of pages on the web

29.7 billion pages on the World Wide Web as of February 2007 (http://www.boutell.com/)

Most search engines rank only the top few hundred entries

16

Outline

What is rank aggregation problemMotivationChallengesPreliminariesFirst result: spam resistance in meta-searchSecond result: Markov chain methodsApplicationsExperimentsConclusion

17

Preliminaries

Ordered listGiven a universe U, an ordered list τ with respect to U is anordering(aka ranking) of a subset S ⊆ U, i.e.

τ=[χ1 ≥ χ2 ≥ …≥ χd ]

Full listτ contains all the elements in U

Partial list|τ| < |U|

18

Preliminaries

Ordered listGiven a universe U, an ordered list τ with respect to U is anordering(aka ranking) of a subset S ⊆ U, i.e.

τ=[χ1 ≥ χ2≥ …≥ χd ]

Full listτ contains all the elements in U

Partial list|τ| < |U|

19

Preliminaries

Ordered listGiven a universe U, an ordered list τ with respect to U is anordering(aka ranking) of a subset S ⊆ U, i.e.

τ=[χ1 ≥ χ2≥ …≥ χd ]

Full listτ contains all the elements in U

Partial list|τ| < |U|

20

Preliminaries

Rank aggregation approachGoal: minimize the total disagreement between several rankings Spearman footrule distance

Given two full lists σ and τ, F(σ,τ)=Σi=1|σ(i)-τ(i)|Kendall tau distance

Given two full lists σ and τ, K(σ,τ)=|{(i,j) | i<j, σ(i)<σ(j) but τ(i)>τ(j) }|These two measurements can be generalized to several listsCan also be generalized to partial lists

21

Preliminaries

Rank aggregation approachGoal: minimize the total disagreement between several rankings Spearman footrule distance

Given two full lists σ and τ, F(σ,τ)=Σi=1|σ(i)-τ(i)|Kendall tau distance

Given two full lists σ and τ, K(σ,τ)=|{(i,j) | i<j, σ(i)<σ(j) but τ(i)>τ(j) }|These two measurements can be generalized to several listsCan also be generalized to partial lists

22

Preliminaries

Rank aggregation approachGoal: minimize the total disagreement between several rankings Spearman footrule distance

Given two full lists σ and τ, F(σ,τ)=Σi=1|σ(i)-τ(i)|Kendall tau distance

Given two full lists σ and τ, K(σ,τ)=|{(i,j) | i<j, σ(i)<σ(j) but τ(i)>τ(j) }|These two measurements can be generalized to several listsCan also be generalized to partial lists

23

Preliminaries

Rank aggregation approachGoal: minimize the total disagreement between several rankings Spearman footrule distance

Given two full lists σ and τ, F(σ,τ)=Σi=1|σ(i)-τ(i)|Kendall tau distance

Given two full lists σ and τ, K(σ,τ)=|{(i,j) | i<j, σ(i)<σ(j) but τ(i)>τ(j) }|These two measurements can be generalized to several listsCan also be generalized to partial lists

24

Preliminaries

Rank aggregation approachGoal: minimize the total disagreement between several rankings Spearman footrule distance

Given two full lists σ and τ, F(σ,τ)=Σi=1|σ(i)-τ(i)|Kendall tau distance

Given two full lists σ and τ, K(σ,τ)=|{(i,j) | i<j, σ(i)<σ(j) but τ(i)>τ(j) }|These two measurements can be generalized to several listsCan also be generalized to partial lists

25

Outline

What is rank aggregation problemMotivationChallengesPreliminariesFirst result: spam resistance in meta-searchSecond result: Markov chain methodsApplicationsExperimentsConclusion

26

First result: spam resistance in meta-search

Extended Condorcet Criterion (ECC) If there is a partition (C, C’) of S such that for any x∈Cand y∈C’ the majority prefers x to y, then x must be ranked above y

ECC can be used to fight spam in meta-searchHow to achieve ECC efficiently

Local Kemenization method

27

First result: spam resistance in meta-search

Extended Condorcet Criterion (ECC) If there is a partition (C, C’) of S such that for any x∈Cand y∈C’ the majority prefers x to y, then x must be ranked above y

ECC can be used to fight spam in meta-searchHow to achieve ECC efficiently

Local Kemenization method

28

First result: spam resistance in meta-search

Extended Condorcet Criterion (ECC) If there is a partition (C, C’) of S such that for any x∈Cand y∈C’ the majority prefers x to y, then x must be ranked above y

ECC can be used to fight spam in meta-searchHow to achieve ECC efficiently

Local Kemenization method

29

First result: spam resistance in meta-search

An example to illustrate Local Kemenization …

30

First result: spam resistance in meta-search

B

A

C

E

D

A

B

D

E

C

B

C

A

D

E

A

B

D

C

E

31

First result: spam resistance in meta-search

B

A

C

E

D

A

B

D

E

C

B

C

A

D

E

A

B

D

C

E

A

B

32

First result: spam resistance in meta-search

B

A

C

E

D

A

B

D

E

C

B

C

A

D

E

A

B

D

C

E

A

B

B>A A>B B>A

33

First result: spam resistance in meta-search

B

A

C

E

D

A

B

D

E

C

B

C

A

D

E

A

B

D

C

E

B

A

34

First result: spam resistance in meta-search

B

A

C

E

D

A

B

D

E

C

B

C

A

D

E

A

B

D

C

E

B

A

D

35

First result: spam resistance in meta-search

B

A

C

E

D

A

B

D

E

C

B

C

A

D

E

A

B

D

C

E

B

A

D

A>D A>D A>D

36

First result: spam resistance in meta-search

B

A

C

E

D

A

B

D

E

C

B

C

A

D

E

A

B

D

C

E

B

A

D

37

First result: spam resistance in meta-search

B

A

C

E

D

A

B

D

E

C

B

C

A

D

E

A

B

D

C

E

B

A

D

C

38

First result: spam resistance in meta-search

B

A

C

E

D

A

B

D

E

C

B

C

A

D

E

A

B

D

C

E

B

A

D

C is preferred to D in two lists

D is preferred to C in one list

C

C>D D>C C>D

39

First result: spam resistance in meta-search

B

A

C

E

D

A

B

D

E

C

B

C

A

D

E

A

B

D

C

E

B

A

C

D

40

First result: spam resistance in meta-search

B

A

C

E

D

A

B

D

E

C

B

C

A

D

E

A

B

D

C

E

B

A

C

D

A>C A>C C>A

41

First result: spam resistance in meta-search

B

A

C

E

D

A

B

D

E

C

B

C

A

D

E

A

B

D

C

E

B

A

C

D

42

Outline

What is rank aggregation problemMotivationChallengesPreliminariesFirst result: spam resistance in meta-searchSecond result: Markov chain methodsApplicationsExperimentsConclusion

43

Second result: Markov chain methods

Markov chain A set of states S={1,2,...,n}An n x n matrix MBegins with an initial state xAt each step the system moves from state i to state j with probability Mij

44

Second result: Markov chain methods

Under some nice condition, system eventually reaches a fixed point irrespective of the initial state x

45

Second result: Markov chain methods

B

A

C

A

B

C

A

C

B

Original rankings

46

Second result: Markov chain methods

B

A

C

A

B

C

A

C

B

0.1 0.1 0.3 0.3

0.2

0.20.50.7

0.6

A

BC

Original rankings

47

Second result: Markov chain methods

B

A

C

A

B

C

A

C

B

0.1 0.1 0.3 0.3

0.2

0.20.50.7

0.6

A

BC

A

B

C

Original rankings Aggregated ranking

48

Second result: Markov chain methods

Assume the current state is page PMC1: The next state is chosen uniformly from the multiset of all pages that were ranked higher than or equal to P by some search engine that ranked PPlease refer to the paper for the rest …

MC2MC3

MC4

49

Second result: Markov chain methods

Assume the current state is page PMC1: The next state is chosen uniformly from the multiset of all pages that were ranked higher than or equal to P by some search engine that ranked PPlease refer to the paper for the rest …

MC2MC3MC4

50

Outline

What is rank aggregation problemMotivationChallengesPreliminariesFirst result: spam resistance in meta-searchSecond result: Markov chain methodsApplicationsExperimentsConclusion

51

Applications

Meta-searchSpam reductionMulti-criteria searchSearch engine comparison

52

Applications

Meta-searchSpam reductionMulti-criteria searchSearch engine comparison

53

Applications

Meta-searchSpam reductionMulti-criteria searchSearch engine comparison

54

Applications

Meta-searchSpam reductionMulti-criteria searchSearch engine comparison

55

Outline

What is rank aggregation problemMotivationChallengesPreliminariesFirst result: spam resistance in meta-searchSecond result: Markov chain methodsApplicationsExperimentsConclusion

56

Experiments

Experiments were conducted by using the following search engines: Altavista(AV), Alltheweb(AW), Excite(EX), Google(GG), Hotbot(HB), Lycos(LY) and Northernlight(NL)Experiment on meta-search using several keywords: “affirmative action”, alcoholism, sushi, ...

57

Experiments

Experiments were conducted by using the following search engines: Altavista(AV), Alltheweb(AW), Excite(EX), Google(GG), Hotbot(HB), Lycos(LY) and Northernlight(NL)Experiment on meta-search using several keywords: “affirmative action”, alcoholism, sushi, ...

58

Experiments

59

Experiments

SFO and MC4 outperform the other 4 algorithms

MC4 performs better than SFO most of the time

60

Experiments

Experiment on spam reduction using queries: Feng shui, organic vegetable, gardening

61

Experiments

62

Experiments

63

Experiments

64

Experiments

65

Experiments

Local Kemenization works!

66

Outline

What is rank aggregation problemMotivationChallengesPreliminariesFirst result: spam resistance in meta-searchSecond result: Markov chain methodsApplicationsExperimentsConclusion

67

Conclusion

Proposed several rank aggregation techniques using Markov chainEstablished the value of Extended CondorcetCriterion (ECC)

Spam resistanceFuture work

Obtain a qualitative understanding of why Markov chain methods perform well

68

Conclusion

Proposed several rank aggregation techniques using Markov chainEstablished the value of Extended CondorcetCriterion (ECC)

Spam resistanceFuture work

Obtain a qualitative understanding of why Markov chain methods perform well

69

Conclusion

Proposed several rank aggregation techniques using Markov chainEstablished the value of Extended CondorcetCriterion (ECC)

Spam resistanceFuture work

Obtain a qualitative understanding of why Markov chain methods perform well

70

Questions???