Query-drift prevention for robust query expansion - presentation

Robust Query Expansion Based on Query-Drift Prevention. Liron Zighelnic. Academic advisor: Dr. Oren Kurland. Based on our work at SIGIR '08. The Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology. 30.6.2009 - Information Systems Seminar.

Transcript of Query-drift prevention for robust query expansion - presentation

Page 1: Query-drift prevention for robust query expansion - presentation

Robust Query Expansion Based on Query-Drift Prevention

Liron Zighelnic
Academic advisor: Dr. Oren Kurland

Based on our work at SIGIR '08

The Faculty of Industrial Engineering and Management
Technion - Israel Institute of Technology

30.6.2009 - Information Systems Seminar

Page 2: Query-drift prevention for robust query expansion - presentation

Outline

1 Background
    Ad Hoc Retrieval

2 Query Expansion
    Motivation
    Pseudo Relevance Feedback
    The Performance Robustness Problem
    Query Expansion Models

3 Query-Drift Prevention
    Improving Robustness Using Fusion
    Improving Robustness Using Re-ordering Methods

4 Experimental Evaluation

5 Related Work

6 Summary

7 Questions

Page 3: Query-drift prevention for robust query expansion - presentation

Our Mission - Ad Hoc Retrieval

[Figure: an information need is expressed as a query q. The retrieval system matches q against the corpus C and returns a ranked list of documents d_1, d_2, d_3, d_4, ..., d_n, d_{n+1}, ... (d_i ∈ C), ordered by Score_init(d). D_init denotes the initially retrieved list.]


Page 5: Query-drift prevention for robust query expansion - presentation

Query Expansion - Motivation

Users tend to use (very) short queries

The polysemy problem (e.g., q: "Paris Hilton")

The vocabulary mismatch problem (e.g., q: "view photos", d: "nature pictures")

Expansion: Relevance Feedback vs. Pseudo Relevance Feedback (a.k.a. blind feedback) (Buckley et al. '94, Xu and Croft '96)


Page 7: Query-drift prevention for robust query expansion - presentation

Pseudo Relevance Feedback

[Figure: an expanded query is constructed from D_init. The retrieval system runs the expanded query and returns the expansion-based list PF(D_init) = d'_1, d'_2, d'_3, d'_4, ..., d'_n, ordered by Score_pf(d).]


Page 9: Query-drift prevention for robust query expansion - presentation

The Performance Robustness Problem

Problems:

D_init may contain many non-relevant documents.

The initially retrieved document list D_init may not manifest all query-related aspects (Buckley '04).

Consequences:

Query drift: the shift in “intention” from the original query to its expanded form (Mitra et al. '98), e.g., q: "Paris Hilton", q': "Paris Hilton Whitney model heiress".

While on average pseudo-feedback-based query expansion methods improve retrieval effectiveness over retrieval using the original query, there are numerous queries for which this is not true.


Page 12: Query-drift prevention for robust query expansion - presentation

The Performance Robustness Problem - Cont.

[Figure: "RM1 Query Drift - ROBUST Corpus Queries 301-350". Per-query bar chart; x-axis: queries (301-350); y-axis: difference in effectiveness, from -0.4 to 0.4.]


Page 14: Query-drift prevention for robust query expansion - presentation

Query Expansion Models

The Relevance Model - RM1 (Lavrenko and Croft '01): the relevance model paradigm assumes that there exists a (language) model RM1 that generates terms both in the query and in the relevant documents (see footnote 1).

$$p_{RM1}(w) \stackrel{def}{=} \sum_{d \in D_{init}} p_d(w)\, p_q(d)$$

The Interpolated Relevance Model - RM3 (Abdul-Jaleel et al. '04): query-anchoring at the model level:

$$p_{RM3}(w) \stackrel{def}{=} \lambda\, p_q(w) + (1-\lambda)\, p_{RM1}(w)$$

Footnote 1: p_x(y) denotes the "similarity" between x and y.
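A small worked example may help fix the two definitions. The numbers are invented for illustration and are not taken from the experiments: assume D_init = {d_1, d_2} with p_q(d_1) = 0.75 and p_q(d_2) = 0.25, a candidate expansion term w that does not occur in the query (p_q(w) = 0) with p_{d_1}(w) = 0.2 and p_{d_2}(w) = 0.4, and λ = 0.5.

$$
\begin{aligned}
p_{RM1}(w) &= p_{d_1}(w)\, p_q(d_1) + p_{d_2}(w)\, p_q(d_2) = 0.2 \cdot 0.75 + 0.4 \cdot 0.25 = 0.25 \\
p_{RM3}(w) &= \lambda\, p_q(w) + (1-\lambda)\, p_{RM1}(w) = 0.5 \cdot 0 + 0.5 \cdot 0.25 = 0.125
\end{aligned}
$$

Under these assumed values, the interpolation halves the weight of a term that is absent from the original query, which is the sense in which RM3 anchors the model to the query.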

Page 15: Query-drift prevention for robust query expansion - presentation

Query Expansion Models - Cont.

Rocchio-1: if we take the RM1 model and set p_q(d) to a uniform distribution, we get the following model:

$$p_{Rocchio1}(w) \stackrel{def}{=} \sum_{d \in D_{init}} p_d(w) \cdot \frac{1}{|D_{init}|}$$

where all documents in D_init are equal contributors to the constructed model.

Rocchio-3 (Rocchio '71): query-anchoring at the model level:

$$p_{Rocchio3}(w) \stackrel{def}{=} \lambda\, p_q(w) + (1-\lambda) \sum_{d \in D_{init}} p_d(w) \cdot \frac{1}{|D_{init}|}$$

Page 16: Query-drift prevention for robust query expansion - presentation

Query Expansion Models - Cont.

Model      Weigh with respect to p_q(d)   Interpolation with the original query   Definition
RM1        yes                            no                                      $\sum_{d \in D_{init}} p_d(w)\, p_q(d)$
RM3        yes                            yes                                     $\lambda\, p_q(w) + (1-\lambda) \sum_{d \in D_{init}} p_d(w)\, p_q(d)$
Rocchio1   no                             no                                      $\sum_{d \in D_{init}} p_d(w) \cdot \frac{1}{|D_{init}|}$
Rocchio3   no                             yes                                     $\lambda\, p_q(w) + (1-\lambda) \sum_{d \in D_{init}} p_d(w) \cdot \frac{1}{|D_{init}|}$
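The four models differ only in two switches, so they can be sketched with one function. This is a minimal illustration, not the authors' implementation: the function name, the dictionary-based representation of p_d(w), p_q(d) and p_q(w), and the toy numbers are all assumptions made for the example, and p_q(d) is assumed to sum to 1 over D_init.

```python
from collections import defaultdict


def expansion_model(doc_models, query_sim, query_model,
                    weigh_by_query, interpolate, lam=0.5):
    """Build a term distribution from D_init using the two switches.

    doc_models:  {doc_id: {term: p_d(w)}} for the documents in D_init
    query_sim:   {doc_id: p_q(d)}, assumed to sum to 1 over D_init
    query_model: {term: p_q(w)}
    """
    n = len(doc_models)
    model = defaultdict(float)
    for doc_id, p_d in doc_models.items():
        # RM1/RM3 weigh each document by p_q(d); Rocchio1/3 use 1/|D_init|.
        weight = query_sim[doc_id] if weigh_by_query else 1.0 / n
        for term, prob in p_d.items():
            model[term] += prob * weight
    if interpolate:
        # RM3/Rocchio3: anchor the model to the original query with lambda.
        terms = set(model) | set(query_model)
        return {t: lam * query_model.get(t, 0.0) + (1 - lam) * model.get(t, 0.0)
                for t in terms}
    return dict(model)


# Toy data (hypothetical): two feedback documents and a two-term query.
doc_models = {
    "d1": {"paris": 0.5, "hilton": 0.5},
    "d2": {"paris": 0.5, "france": 0.5},
}
query_sim = {"d1": 0.75, "d2": 0.25}
query_model = {"paris": 0.5, "hilton": 0.5}

rm1 = expansion_model(doc_models, query_sim, query_model, True, False)
rm3 = expansion_model(doc_models, query_sim, query_model, True, True, lam=0.5)
rocchio1 = expansion_model(doc_models, query_sim, query_model, False, False)
rocchio3 = expansion_model(doc_models, query_sim, query_model, False, True, lam=0.5)
```

On this toy input the off-query term "france" gets weight 0.125 under RM1 and 0.25 under Rocchio1, and interpolation with the query halves both values.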


Page 18: Query-drift prevention for robust query expansion - presentation

Our Idea - Using Fusion

Data fusion: combining retrieval methods or query representations.

Data fusion - motivation:

Using a variety of methods (results) utilizes different aspects of the search space and hence returns more relevant results.

Efficient in terms of performance, since fusion adds minimal overhead.

Page 19: Query-drift prevention for robust query expansion - presentation

Improving Robustness Using Fusion - Motivation

Documents ranked high by both retrieved lists are potentially relevant, since they constitute a good match to both forms of the presumed information need.

A document ranked high by the initial retrieval can be assumed to have a high surface-level similarity to the original query.

Query expansion can add aspects that were not in the original query but may be relevant to the information need and may improve the retrieval.

A document that is ranked high by both the initial retrieval and the expansion is assumed (potentially) to suffer less from query drift.

Documents that are retrieved using a variety of query representations have a high chance of being relevant (Belkin et al. '93, Robertson '97).

Page 20: Query-drift prevention for robust query expansion - presentation

Improving Robustness Using Fusion

The following retrieval methods operate on D_init ∪ PF(D_init).

Combmnz (Fox and Shaw '94) rewards documents that are ranked high in both D_init and PF(D_init) (see footnote 2):

$$Score_{combmnz}(d) \stackrel{def}{=} \big(\delta[d \in D_{init}] + \delta[d \in PF(D_{init})]\big) \cdot \left( \frac{Score_{init}(d)}{\sum_{d' \in D_{init}} Score_{init}(d')} + \frac{Score_{pf}(d)}{\sum_{d' \in PF(D_{init})} Score_{pf}(d')} \right)$$

Footnote 2: For statement s, δ[s] = 1 if s is true and 0 otherwise.
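The Combmnz score can be sketched in a few lines. This is an illustrative sketch rather than the authors' code: the function name, the dictionary inputs and the toy scores are assumptions, retrieval scores are assumed to be positive, and a document missing from one list is assumed to contribute 0 for that list.

```python
def combmnz(init_scores, pf_scores):
    """Fuse two retrieved lists with Combmnz over sum-normalized scores.

    init_scores: {doc_id: Score_init(d)} for d in D_init
    pf_scores:   {doc_id: Score_pf(d)} for d in PF(D_init)
    Assumes positive scores; a document absent from a list contributes 0.
    """
    init_total = sum(init_scores.values())
    pf_total = sum(pf_scores.values())
    fused = {}
    for d in set(init_scores) | set(pf_scores):
        # Number of lists that contain d: the sum of the two delta terms.
        hits = (d in init_scores) + (d in pf_scores)
        norm_sum = (init_scores.get(d, 0.0) / init_total
                    + pf_scores.get(d, 0.0) / pf_total)
        fused[d] = hits * norm_sum
    return fused


# Toy scores (hypothetical): d2 appears in both lists.
init_scores = {"d1": 5.0, "d2": 2.0, "d3": 1.0}   # D_init
pf_scores = {"d2": 3.0, "d4": 1.0}                # PF(D_init)

fused = combmnz(init_scores, pf_scores)
ranking = sorted(fused, key=fused.get, reverse=True)
print(ranking)  # ['d2', 'd1', 'd4', 'd3']
```

In this toy case d2 overtakes d1, the top initial result, because it is retrieved by both lists and its normalized score sum is doubled.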

Page 21: Query-drift prevention for robust query expansion - presentation

Improving Robustness Using Fusion - Cont.

The interpolation algorithm differentially weights the initial score and the pseudo-feedback-based score using an interpolation parameter λ:

$$Score_{interpolation}(d) \stackrel{def}{=} \lambda\, \delta[d \in D_{init}]\, \frac{Score_{init}(d)}{\sum_{d' \in D_{init}} Score_{init}(d')} + (1-\lambda)\, \delta[d \in PF(D_{init})]\, \frac{Score_{pf}(d)}{\sum_{d' \in PF(D_{init})} Score_{pf}(d')}$$
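A minimal sketch of the interpolation score follows, under the same assumptions as any simple fusion setup: positive retrieval scores held in dictionaries, with hypothetical names and toy values chosen only for illustration.

```python
def interpolation_fusion(init_scores, pf_scores, lam):
    """Interpolate sum-normalized initial and pseudo-feedback scores.

    init_scores: {doc_id: Score_init(d)} for d in D_init
    pf_scores:   {doc_id: Score_pf(d)} for d in PF(D_init)
    lam:         interpolation parameter in [0, 1]
    """
    init_total = sum(init_scores.values())
    pf_total = sum(pf_scores.values())
    fused = {}
    for d in set(init_scores) | set(pf_scores):
        # Each delta term switches off the part for a list that lacks d.
        init_part = init_scores[d] / init_total if d in init_scores else 0.0
        pf_part = pf_scores[d] / pf_total if d in pf_scores else 0.0
        fused[d] = lam * init_part + (1 - lam) * pf_part
    return fused


# Toy scores (hypothetical).
init_scores = {"d1": 5.0, "d2": 2.0, "d3": 1.0}   # D_init
pf_scores = {"d2": 3.0, "d4": 1.0}                # PF(D_init)

balanced = interpolation_fusion(init_scores, pf_scores, lam=0.5)
init_only = interpolation_fusion(init_scores, pf_scores, lam=1.0)
```

With lam=1.0 the fused scores reduce to the normalized initial scores, and with lam=0.0 to the normalized pseudo-feedback scores, so λ directly controls how strongly the final ranking is anchored to the original query.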


Page 23: Query-drift prevention for robust query expansion - presentation

Improving Robustness Using Re-ordering Methods

rerank

The rerank method (e.g., Kurland and Lee '04) re-orders the (top) pseudo-feedback-based retrieval results by the initial scores of documents. This method anchors the documents in PF(D_init) to the query by using their initial scores.

$$Score_{rerank}(d) \stackrel{def}{=} \delta[d \in PF(D_{init})]\, Score_{init}(d)$$

rev_rerank

The rev_rerank method re-orders the (top) initial retrieval results by the pseudo-feedback-based scores of documents.

$$Score_{rev\_rerank}(d) \stackrel{def}{=} \delta[d \in D_{init}]\, Score_{pf}(d)$$
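Both re-ordering methods amount to sorting one list by the other retrieval's scores. The sketch below assumes that a score is available for every document in the list being re-ordered, which is passed in as a callable; the function signatures and toy scores are hypothetical.

```python
def rerank(pf_list, score_init):
    """Re-order the documents of PF(D_init) by their initial scores."""
    return sorted(pf_list, key=score_init, reverse=True)


def rev_rerank(init_list, score_pf):
    """Re-order the documents of D_init by their pseudo-feedback scores."""
    return sorted(init_list, key=score_pf, reverse=True)


# Toy scores (hypothetical), available for every document of interest.
score_init = {"d1": 5.0, "d2": 2.0, "d3": 1.0, "d4": 0.5}.get
score_pf = {"d1": 0.2, "d2": 3.0, "d3": 0.1, "d4": 1.0}.get

init_list = ["d1", "d2", "d3"]   # D_init, in initial rank order
pf_list = ["d2", "d4", "d1"]     # PF(D_init), in expansion rank order

print(rerank(pf_list, score_init))     # ['d1', 'd2', 'd4']
print(rev_rerank(init_list, score_pf)) # ['d2', 'd1', 'd3']
```

Note that neither method changes which documents are returned: rerank keeps the membership of PF(D_init) and rev_rerank keeps the membership of D_init; only the order changes.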


Page 25: Query-drift prevention for robust query expansion - presentation

Evaluation

Evaluation methods:

MAP - Mean Average Precision (measure of effectiveness)

<Init - percentage of queries for which the expansion-based performance is worse than that of using the original query (measure of robustness)

TREC collections:

corpus    queries             disks
TREC      51-200              1-3
ROBUST    301-450, 601-700    4, 5
WSJ       151-200             1-2
SJMN      51-150              3
AP        51-150              1-3
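The two measures can be sketched as follows. This is an illustrative sketch, assuming that per-query average precision (AP) is the performance value compared for <Init; the function names, query ids and AP values are made up for the example.

```python
def average_precision(ranking, relevant):
    """AP of one ranked list, given the set of relevant document ids."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(ap_by_query):
    """MAP: the mean of the per-query AP values."""
    return sum(ap_by_query.values()) / len(ap_by_query)


def below_init(ap_init, ap_expanded):
    """<Init: percentage of queries whose expanded AP is below the initial AP."""
    worse = sum(1 for q in ap_init if ap_expanded[q] < ap_init[q])
    return 100.0 * worse / len(ap_init)


# Toy per-query AP values (hypothetical).
ap_init = {301: 0.5, 302: 0.25, 303: 0.25, 304: 0.5}
ap_expanded = {301: 0.75, 302: 0.125, 303: 0.5, 304: 0.5}

print(mean_average_precision(ap_init))      # 0.375
print(mean_average_precision(ap_expanded))  # 0.46875
print(below_init(ap_init, ap_expanded))     # 25.0
```

The toy numbers show why both measures are reported: MAP improves on average, yet one query in four is hurt by expansion.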

Page 26: Query-drift prevention for robust query expansion - presentation

Query Drift Prevention Methods Applied for RM1

[Figure, first panel: "Query Drift Prevention Methods Applied for RM1 - MAP". MAP (axis 10-35) per corpus (TREC, ROBUST, WSJ, SJMN, AP) for RM1, Interpolation, combmnz, rerank, rev_rerank and RM3.]

[Figure, second panel: "Query Drift Prevention Methods Applied for RM1 - Robustness". <Init (axis 10-50) per corpus (TREC, ROBUST, WSJ, SJMN, AP) for the same methods.]

“i” and “e” indicate statistically significant MAP differences with the initial ranking and RM1, respectively.

Page 27: Query-drift prevention for robust query expansion - presentation

Robustness of Expansion Methods w/o Combmnz

[Figure, first panel: "Robustness of Expansion Methods". <Init (axis 0-50) per corpus (TREC, ROBUST, WSJ, SJMN, AP) for RM1, RM3, Rocchio1 and Rocchio3.]

[Figure, second panel: "Robustness of Combmnz Applied for Expansion Methods". <Init (axis 0-50) per corpus (TREC, ROBUST, WSJ, SJMN, AP) for RM1 combmnz, RM3 combmnz, Rocchio1 combmnz and Rocchio3 combmnz.]

Page 28: Query-drift prevention for robust query expansion - presentation

Robustness Improvement Due to Combmnz

[Figure: "Robustness Improvement Due to Combmnz". % improvement (axis 0-0.6) per corpus (TREC, ROBUST, WSJ, SJMN, AP) and on average, for RM3 combmnz, RM1 combmnz, Rocchio3 combmnz and Rocchio1 combmnz.]

Page 29: Query-drift prevention for robust query expansion - presentation

RM3 - The Impact of λ on Effectiveness and Robustness

$$p_{RM3}(w) \stackrel{def}{=} \lambda\, p_q(w) + (1-\lambda)\, p_{RM1}(w)$$

Page 30: Query-drift prevention for robust query expansion - presentation

Comparison with a Cluster-Based Re-sampling Method (Lee et al. '08)

               TREC           ROBUST         WSJ            SJMN           AP
               MAP    <Init   MAP    <Init   MAP    <Init   MAP    <Init   MAP    <Init
RM3            20     28.7    30     28.1    34.8   20      24.6   29      29.1   28.3
RM3 combmnz    17.9   16.7    27.1   19.3    30.7   18      21.6   23      26.5   16.2
RM3 rerank     16.9   22.7    25.5   15.3    28.4   14      19.9   11      25.1   12.1
Clusters       19.8   31.3    29.9   32.9    32.7   24      25     31      29.4   28.3

Page 31: Query-drift prevention for robust query expansion - presentation

Related Work

Improving Robustness

Selecting, sampling and weighting documents from the initial search (e.g., Billerbeck and Zobel '03, Li and Croft '05, Tao and Zhai '06, Collins-Thompson and Callan '07)

Selecting and weighting terms (Mitra et al. '98, Carpineto et al. '01, Cao et al. '08)

Page 32: Query-drift prevention for robust query expansion - presentation

Related Work - Cont.

Using clustering (Lu et al. '97, Buckley et al. '98, Lee et al. '08)

Predicting whether a given expanded query will be more effective than the original one (Cronen-Townsend et al. '04)

Predicting which expansion form from a set of candidates will perform best (Winaver et al. '07)

Query-anchoring at the model level (Zhai and Lafferty '01, Abdul-Jaleel et al. '04)

Page 33: Query-drift prevention for robust query expansion - presentation

Summary

Fusion can potentially ameliorate query drift (similarity based vs. rank based)

Trade-off between effectiveness and robustness

Pre-retrieval vs. post-retrieval query anchoring

Page 34: Query-drift prevention for robust query expansion - presentation


Questions?

Thank you for your time