
Page 1:

A Framework for Result Diversification

Sreenivas Gollapudi
Search Labs, Microsoft Research

Joint work with Aneesh Sharma (Stanford), Samuel Ieong, Alan Halverson, and Rakesh Agrawal (Microsoft Research)

Page 2:

Ambiguous queries

wine 2009

Page 3:

Intuitive definition
◦ Represent a variety of relevant meanings for a given query
Mathematical definitions:
◦ Minimizing query abandonment
Want to represent different user categories
◦ Trade-off between relevance and novelty

Definition of Diversification

Page 4:

Query and document similarities
◦ Maximal Marginal Relevance [CG98]
◦ Personalized re-ranking of results [RD06]
Probability Ranking Principle not optimal [CK06]
◦ Query abandonment
Topical diversification [Z+05, AGHI09]
◦ Needs topical (categorical) information
Loss minimization framework [Z02, ZL06]
◦ "Diminishing returns" for docs with the same intent is a specific loss function [AGHI09]

Research on diversification

Page 5:

Express diversity requirements in terms of desired properties

Define objectives that satisfy these properties

Develop efficient algorithms

Metrics and evaluation methodologies

The framework

Page 6:

Inspired by similar approaches for
◦ Recommendation systems [Andersen et al. '08]
◦ Ranking [Altman, Tennenholtz '07]
◦ Clustering [Kleinberg '02]

Map the space of functions – a “basis vector”

Axiomatic approach

Page 7:

Input:
◦ Candidate documents: U = {u1, u2, …, un}, query q

◦ Relevance function: wq(ui)

◦ Distance function: dq(ui, uj) (symmetric, non-metric)

◦ Size k of output result set

Diversification Setup (1/2)

[Figure: candidate documents u1, …, u6; wq(u5) marks a relevance value and dq(u2, u4) a pairwise distance.]

Page 8:

Output:
◦ Diversified set S* of documents (|S*| = k)
◦ Diversification function f : S × wq × dq → R+, with S* = argmax_{|S| = k} f(S)

Diversification Setup (2/2)

[Figure: for k = 3, the selected set is S* = {u1, u2, u6}.]
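To make the selection step concrete, here is a brute-force sketch in Python (purely illustrative and an assumption of this write-up, not something from the talk; it enumerates all C(n, k) subsets, which is only feasible for tiny examples):

from itertools import combinations

def select_diverse(U, f, k):
    # S* = argmax over all size-k subsets S of U of f(S).
    # Exhaustive search over C(n, k) subsets; for exposition only.
    return max((set(S) for S in combinations(U, k)), key=f)

The rest of the talk is about which f to use (constrained by the axioms) and how to approximate this argmax efficiently.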

Page 9:

1. Scale-invariance
2. Consistency
3. Richness
4. Strength
   a) Relevance
   b) Diversity
5. Stability
6. Two technical properties

Axioms

Page 10:

S* = argmaxS f(S, w(·), d(·, ·))

= argmaxS f(S, w΄(·), d΄(·, ·))

◦ w΄(ui) = α · w(ui)
◦ d΄(ui, uj) = α · d(ui, uj), for some α > 0

Scale Invariance Axiom

• No built-in scale for f!


Page 11:

S* = argmaxS f(S, w(·), d(·, ·))

= argmaxS f(S, w΄(·), d΄(·, ·))

◦ w΄(ui) = w(ui) + ai (ai ≥ 0) for ui є S*
◦ d΄(ui, uj) = d(ui, uj) + bi (bi ≥ 0) for ui and/or uj є S*

Consistency Axiom

• Increasing relevance/diversity doesn't hurt!


Page 12:

S*(k) = argmaxS f(S, w(·), d(·, ·), k)
◦ S*(k) ⊆ S*(k+1) for all k

Stability Axiom

• Output set shouldn’t oscillate (change arbitrarily) with size


Page 13:

Impossibility result

Theorem: No function f can satisfy all of the axioms (Scale-invariance, Consistency, Richness, Strength of Relevance/Diversity, Stability, and the two technical properties) simultaneously.

Proof via a constructive argument.

Page 14:

Baseline for what is possible

Mathematical criteria for choosing f

Modular approach: f is independent of specific wq(·) and dq(·, ·)!

Axiomatic characterization – Summary

Page 15:

Express diversity requirements in terms of desired properties

Define objectives that satisfy these properties

Develop efficient algorithms

Metrics and evaluation methodologies

A Framework for Diversification

Page 16:

Max-sum (avg) objective (the formula is on the slide; a reconstruction follows below):

Diversification objectives

[Figure: over documents u1, …, u6, the objective selects S* = {u1, u2, u6} for k = 3 but S* = {u1, u3, u5, u6} for k = 4; the size-3 set is not contained in the size-4 set.]

Violates stability!
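The slide shows the max-sum formula only as an image. A plausible reconstruction in the notation of the setup slides, assuming the usual facility-dispersion style trade-off parameter λ (an assumption, not read off the slide):

f(S) = (k − 1) · Σ_{u ∈ S} w(u) + 2λ · Σ_{u,v ∈ S} d(u, v)

The first term rewards total relevance of the set; the second rewards the sum of pairwise distances within it.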

Page 17:

Max-min objective (the formula is on the slide; a reconstruction follows below):

Diversification objectives

[Figure: for k = 3 the objective selects S* = {u1, u2, u6} in one configuration and S* = {u1, u5, u6} in a slightly perturbed one.]

Violates consistency and stability!
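As above, the max-min formula is an image; a plausible reconstruction under the same assumptions:

f(S) = min_{u ∈ S} w(u) + λ · min_{u,v ∈ S} d(u, v)

Here the weakest relevance score and the closest pair in S determine the value.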

Page 18:

A taxonomy-based diversification objective
◦ Uses the analogy of marginal utility to determine whether to include more results from an already covered category
◦ Violates stability and one of the technical axioms

Other Diversification objectives

Page 19:

Express diversity requirements in terms of desired properties

Define objectives that satisfy these properties

Develop efficient algorithms

Metrics and evaluation methodologies

The Framework

Page 20:

Recast as facility dispersion (see the sketch below)
◦ Max-sum (MaxSumDispersion): maximize the sum of pairwise distances within S
◦ Max-min (MaxMinDispersion): maximize the minimum pairwise distance within S

Known approximation algorithms

Lower bounds

Lots of other facility dispersion objectives and algorithms

Algorithms for facility dispersion
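As an illustration of the known approximation algorithms mentioned above, a minimal Python sketch of the classic greedy pair-picking heuristic for MaxSumDispersion (an assumption for exposition; the exact algorithm and its guarantee used in the talk are not given in this transcript):

def greedy_max_sum_dispersion(U, d, k):
    # Repeatedly add the remaining pair with the largest distance d(u, v).
    # For the diversification variant one would score a pair as
    # w(u) + w(v) + 2*lam*d(u, v) instead of d(u, v) alone (assumption).
    U, S = set(U), set()
    while len(S) < k and len(U) >= 2:
        u, v = max(((a, b) for a in U for b in U if a != b),
                   key=lambda pair: d(pair[0], pair[1]))
        S.update((u, v))
        U.difference_update((u, v))
    if len(S) < k and U:   # k odd: top up with an arbitrary leftover point
        S.add(U.pop())
    return S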

Page 21:

Algorithm for categorical diversification:
  S = ∅
  ∀c ∈ C, U(c | q) ← P(c | q)
  while |S| < k do
    for d ∈ D do
      g(d | q, c) ← Σ_c U(c | q) · V(d | q, c)
    end for
    d* ← argmax g(d | q, c)
    S ← S ∪ {d*}
    ∀c ∈ C, U(c | q) ← (1 − V(d* | q, c)) · U(c | q)
    D ← D \ {d*}
  end while

P(c | q): conditional prob of intent c given query q

g(d | q, c): current prob of d satisfying q, c

Update the utility of a category
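A direct Python transcription of the pseudocode above, plus a tiny run whose numbers mirror the worked example on the next slide (a sketch, not the authors' code; the document identifiers r1…b2 are hypothetical):

def diversify(P, V, D, k):
    # P: dict c -> P(c | q); V: dict (d, c) -> V(d | q, c); D: candidate docs.
    U = dict(P)                       # U(c | q) starts at P(c | q)
    D, S = set(D), []
    while len(S) < k and D:
        # g(d | q, c) = sum over intents of U(c | q) * V(d | q, c)
        d_star = max(D, key=lambda d: sum(U[c] * V.get((d, c), 0.0) for c in U))
        S.append(d_star)
        for c in U:                   # discount the utility of covered intents
            U[c] *= 1.0 - V.get((d_star, c), 0.0)
        D.remove(d_star)
    return S

P = {"R": 0.8, "B": 0.2}
V = {("r1", "R"): 0.9, ("r2", "R"): 0.5, ("r3", "R"): 0.4,
     ("b1", "B"): 0.4, ("b2", "B"): 0.4}
print(diversify(P, V, ["r1", "r2", "r3", "b1", "b2"], k=3))
# -> ['r1', 'b1', 'b2'] (the order of the two B documents may vary)

Although P(R | q) = 0.8, only one of the three slots goes to intent R: after its best document is shown, R's residual utility drops to 0.08 and the remaining B documents score higher.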

Page 22:

Intent distribution: P(R | q) = 0.8, P(B | q) = 0.2.

An Example

[Figure: five candidate documents, three for intent R with quality V(d | q, c) = 0.9, 0.5, 0.4 and two for intent B with V(d | q, c) = 0.4, 0.4. With U(R | q) = 0.8 and U(B | q) = 0.2, the initial scores g(d | q, c) are 0.72, 0.40, 0.32 for the R documents and 0.08, 0.08 for the B documents. After the best R document is selected, U(R | q) becomes (1 − 0.9) × 0.8 = 0.08, so the B documents now have the highest marginal utility and are picked next.]

• Actually produces an ordered set of results
• Results are not proportional to the intent distribution
• Results are not chosen purely by (raw) quality
• Better results ⇒ fewer results need to be shown

Page 23:

Express diversity requirements in terms of desired properties

Define objectives that satisfy these properties

Develop efficient algorithms

Metrics and evaluation methodologies

The Framework

Page 24:

Approach
◦ Represent real queries
◦ Scale beyond a few user studies
Problem: hard to define ground truth
Use disambiguated information sources on the web as the ground truth
Incorporate intent into human judgments
◦ Can exploit the user distribution (need to be careful)

Evaluation Methodologies

Page 25:

Query = Wikipedia disambiguation page title

◦ Large-scale ground truth set
◦ Open source
◦ Growing in size

Wikipedia Disambiguation Pages

Page 26:

Novelty
◦ Coverage of Wikipedia topics
Relevance
◦ Coverage of top Wikipedia results

Metrics Based on Wikipedia Topics

Page 27:

Relevance function:
◦ 1/position
◦ Can use the search engine score
◦ Maybe use query category information
Distance function:
◦ Compute TF-IDF distances
◦ Jaccard similarity score for two documents A and B (a reconstruction follows below)

The Relevance and Distance Functions
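The Jaccard formula itself is an image on the slide; the standard form, written as a small Python helper over the two documents' term sets (a sketch; the talk's exact instantiation, e.g. which tokens or shingles are used, is not specified here):

def jaccard_distance(A, B):
    # d(A, B) = 1 - |A ∩ B| / |A ∪ B| over the two documents' term sets.
    A, B = set(A), set(B)
    return (1.0 - len(A & B) / len(A | B)) if (A | B) else 0.0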

Page 28:

Evaluating Novelty

Topics/categories = list of disambiguation topics

Given a set Sk of results:
◦ For each result, compute a distribution over topics (using our d(·, ·))
◦ Sum confidence over all topics
◦ Threshold to get # topics represented

Example:
◦ jaguar.com: Jaguar cat (0.1), Jaguar car (0.9)
◦ wikipedia.org/jaguar: Jaguar cat (0.8), Jaguar car (0.2)
Category confidence: Jaguar cat = 0.1 + 0.8 = 0.9; Jaguar car = 0.9 + 0.2 = 1.1
With threshold = 1.0: Jaguar cat → 0, Jaguar car → 1
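A small Python sketch of the novelty computation just described, using the jaguar numbers from this slide (the function itself is illustrative, not the authors' code):

def novelty(topic_dists, threshold=1.0):
    # Sum each topic's confidence across results, then count topics that clear the threshold.
    totals = {}
    for dist in topic_dists:                  # one dict per result: topic -> confidence
        for topic, conf in dist.items():
            totals[topic] = totals.get(topic, 0.0) + conf
    return sum(1 for total in totals.values() if total >= threshold)

results = [{"Jaguar cat": 0.1, "Jaguar car": 0.9},     # jaguar.com
           {"Jaguar cat": 0.8, "Jaguar car": 0.2}]     # wikipedia.org/jaguar
print(novelty(results))   # 1: only "Jaguar car" clears the 1.0 threshold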

Page 29:

Evaluating Relevance

Query: get the ranking from a search restricted to Wikipedia pages
◦ a(i) = position of Wikipedia topic i in this list
◦ b(i) = position of Wikipedia topic i in the list being tested
Relevance is measured in terms of reciprocal ranks (the formula is on the slide)
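One plausible instantiation of such a reciprocal-rank score (an assumption, not confirmed by the transcript) would be Σ_i 1 / (a(i) · b(i)), so that topics ranked high in both the Wikipedia-restricted list and the tested list contribute the most.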

Page 30:

Adding Intent to Human Judgments (Generalizing Relevance Metrics)

Take expectation over the distribution of intents
◦ Interpretation: how will the average user feel?
Consider NDCG@k
◦ Classic: NDCG(S; k | c) = DCG(S; k | c) / DCG_ideal(S; k | c)
◦ NDCG-IA depends on the intent distribution and the intent-specific NDCG:
  NDCG-IA(S; k) = Σ_c P(c | q) · NDCG(S; k | c)
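A minimal Python sketch of NDCG-IA as defined above (the log2 discount is one common DCG choice and is an assumption here, as are the input conventions):

import math

def dcg(gains):
    # Position-discounted sum of gains.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_ia(results, intents, k):
    # intents: dict c -> (P(c | q), gain function gain(d, c)).
    total = 0.0
    for c, (p, gain) in intents.items():
        top_k = [gain(d, c) for d in results[:k]]
        ideal = sorted((gain(d, c) for d in results), reverse=True)[:k]
        if dcg(ideal) > 0:
            total += p * dcg(top_k) / dcg(ideal)
    return total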

Page 31:

Evaluation using Mechanical Turk

Created two types of HITs on Mechanical Turk
◦ Query classification: workers are asked to choose among three interpretations
◦ Document rating (under the given interpretation)
Two additional evaluations
◦ MT classification + current ratings
◦ MT classification + MT document ratings

Page 32:

Some Important Questions

When is it right to diversify?
◦ Users have certain expectations about the workings of a search engine
What is the best way to diversify?
◦ Evaluate approaches beyond diversifying the retrieved results
Metrics that capture both relevance and diversity
◦ Some preliminary work suggests that there will be certain trade-offs to make

Page 33:

Questions?

Page 34:

Otherwise, need to encode an explicit user model in the metric
◦ Selection only needs k (which is 10)
Later, can rank the set according to relevance
◦ Personalize based on clicks
Alternative to stability:
◦ Select sets repeatedly (this loses information)
◦ Could refine the selection online, based on user clicks

Why frame diversification as set selection?

Page 35:

[Histogram: "Novelty difference over 650 ambiguous queries". Series: Max-sum, Max-min. X-axis: normalized difference in novelty between diversified and original results. Y-axis: frequency count.]

Novelty Evaluation – Effect of Algorithms

Page 36:

[Histogram: "Relevance difference over 650 ambiguous queries". Series: Max-sum, Max-min. X-axis: normalized difference in relevance between diversified and original results. Y-axis: frequency count.]

Relevance Evaluation – Effect of Algorithms

Page 37:

Product Evaluation – Anecdotal Result

• Results for query cd player

• Relevance: popularity
• Distance: from product hierarchy

Page 38:

Preliminary Results (100 queries)

[Plot: "Novelty for Max-sum as a function of thresholds and lambda", one curve per value of lambda. X-axis: thresholds for measuring novelty. Y-axis: fractional difference in novelty.]

Page 39:

Evaluation using Mechanical Turk

[Bar charts: MAP-IA, NDCG-IA, and MRR-IA at cutoffs 3, 5, and 10, comparing the diversified ranking ("Diverse") with Engine 1, Engine 2, and Engine 3.]

Page 40:

Other Measures of Success

Many metrics for relevance
◦ Normalized discounted cumulative gain at k (NDCG@k) [JK00]
◦ Mean average precision at k (MAP@k)
◦ Mean reciprocal rank (MRR)
Some metrics for diversity
◦ Maximal marginal relevance (MMR) [CG98]
◦ Nugget-based instantiation of NDCG [C+08]
Want a metric that takes into account both relevance and diversity

Page 41:

Problem Statement

DIVERSIFY(k): Given a query q, a set of documents D, a distribution P(c | q), quality estimates V(d | q, c), and an integer k,

find a set of docs S ⊆ D with |S| = k that maximizes

P(S | q) = Σ_c P(c | q) · (1 − Π_{d ∈ S} (1 − V(d | q, c)))

interpreted as the probability that the set S is relevant to the query over all possible intents. The sum over c accounts for multiple intents; the (1 − Π ...) factor is the probability of finding at least one relevant document for intent c.
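The objective above, written as a small Python function (a sketch; the dictionary conventions are assumptions):

def p_relevant(S, P, V):
    # P(S | q) = sum_c P(c | q) * (1 - prod over d in S of (1 - V(d | q, c)))
    total = 0.0
    for c, p in P.items():
        miss = 1.0
        for d in S:
            miss *= 1.0 - V.get((d, c), 0.0)
        total += p * (1.0 - miss)
    return total

The greedy loop on Page 21 builds S one document at a time by always adding the document with the largest marginal gain in this quantity.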

Page 42:

Discussion of Objective

Makes explicit use of taxonomy
◦ In contrast, similarity-based: [CG98], [CK06], [RKJ08]
Captures both diversification and doc relevance
◦ In contrast, coverage-based: [Z+05], [C+08], [V+08]
Specific form of "loss minimization" [Z02], [ZL06]
◦ "Diminishing returns" for docs with the same intent
Objective is order-independent
◦ Assumes that all users read k results
◦ May want to optimize Σ_k P(k) · P(S_k | q)