Reverted Indexing for Expansion and Feedback

Reverted Indexing for Feedback and Expansion Jeremy Pickens, Matthew Cooper, Gene Golovchinsky

Pickens, J., Cooper, M., and Golovchinsky, G. Reverted Indexing for Expansion and Feedback. In Proc. CIKM 2010, Toronto, Canada, ACM Press. See http://fxpal.com/?p=abstract&abstractID=581

Transcript of Reverted Indexing for Expansion and Feedback

Page 1: Reverted Indexing for Expansion and Feedback

Reverted Indexing for Feedback and Expansion

Jeremy Pickens, Matthew Cooper,

Gene Golovchinsky

Page 2: Reverted Indexing for Expansion and Feedback

Reverted Indexing for Feedback and Expansion

Jeremy Pickens

Catalyst Repository Systems

Page 3: Reverted Indexing for Expansion and Feedback

Query-Document Duality has long history

• Using queries to label documents

• Queries and documents as bipartite graph

– Used for random walks

– Used for partitioning

• Reverse Querying

Page 4: Reverted Indexing for Expansion and Feedback

Motivation – Three R’s

Retrievability

Reuse (Algorithmic)

Recall-Oriented Tasks

Page 5: Reverted Indexing for Expansion and Feedback

Our Key Contribution

We treat query result sets as unstructured text “documents” -- and index them

Page 6: Reverted Indexing for Expansion and Feedback

Outline

• Reverted Documents

• Reverted Indexing

• Experimental Setup

• Results

– Effectiveness

– Efficiency

• Related Work

• Future Extensions

Page 7: Reverted Indexing for Expansion and Feedback

Reverted Document

ID (Basis Query)

– Query Expression

– Ranking Algorithm

Body

– Results (docid)

– Results (score)

Page 8: Reverted Indexing for Expansion and Feedback

Basis Query (Reverted Document ID)

Query Expression                  Ranking Algorithm
giraffe                           BM25
cheetah                           BM25
gazelle                           BM25
gazelle                           Language Model
gazelle                           PL2 (Divergence from Randomness)
gazelle                           Y
gazelle                           B
gazelle                           G
fast cheetah                      BM25
cheetah AND NOT gazelle           Boolean
Latitude+Longitude of Zanzibar    Euclidean distance
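The pairing on this slide can be read as a small data structure: a basis query is a query expression plus a ranking algorithm, and the pair serves as the reverted document's ID. The sketch below is illustrative only; the class name `BasisQuery`, its field names, and the `reverted_docid` property are hypothetical, and the ID formatting simply copies the `[gazelle : BM25]` docid that appears on a later slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BasisQuery:
    """A basis query: a query expression paired with a ranking algorithm."""
    expression: str
    ranking: str

    @property
    def reverted_docid(self) -> str:
        # Assumed format, modeled on the "[gazelle : BM25]" docid in the deck.
        return f"[{self.expression} : {self.ranking}]"


basis = [
    BasisQuery("gazelle", "BM25"),
    BasisQuery("gazelle", "Language Model"),
    BasisQuery("cheetah AND NOT gazelle", "Boolean"),
]
ids = [b.reverted_docid for b in basis]
```

Because the ranking algorithm is part of the identity, the same expression under two rankers yields two distinct basis queries, which matches the repeated `gazelle` rows above.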

Page 9: Reverted Indexing for Expansion and Feedback

Reverted Document Body

Results (docid): Canonical URL and/or docid

Results (score):

1. Probability of Relevance

2. Cosine similarity

3. KL Divergence

4. Raw Rank

5. 1 or 0 (Boolean)

Page 10: Reverted Indexing for Expansion and Feedback

Result Set → Document Body

rank  docid  score  shift-scale  Ahn&Moffat
1     #415   0.82   10.0         10
2     #32    0.73   8.92         9
3     #63    0.62   7.57         8
4     #7     0.49   5.95         6
5     #56    0.35   4.24         4
6     #12    0.14   1.72         2
7     #108   0.12   1.36         1
8     #115   0.09   1.09         1
9     #42    0.08   1.0          1
10    #85    0.08   1.0          1
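The shift-scale column can be approximated with a linear map that sends the lowest score to 1 and the highest to 10. The sketch below is a minimal illustration under two assumptions: the target range [1, 10] is inferred from the table's endpoints, and plain rounding to the nearest integer stands in for the quantization step (it is not claimed to be the method behind the Ahn&Moffat column). Intermediate values come out slightly different from the slide (for example about 8.91 rather than 8.92), presumably because the slide's scores are shown rounded to two decimals.

```python
def shift_scale(scores, lo=1.0, hi=10.0):
    """Linearly map scores so that min(scores) -> lo and max(scores) -> hi."""
    s_min, s_max = min(scores), max(scores)
    if s_max == s_min:
        # Degenerate case: all scores equal, so give every result the top value.
        return [hi for _ in scores]
    span = s_max - s_min
    return [lo + (hi - lo) * (s - s_min) / span for s in scores]


# Retrieval scores from the slide, in rank order.
scores = [0.82, 0.73, 0.62, 0.49, 0.35, 0.14, 0.12, 0.09, 0.08, 0.08]
scaled = shift_scale(scores)

# Assumed quantization: round to the nearest integer.
quantized = [round(x) for x in scaled]
```

On these ten scores the rounded values agree with the integer column in the table.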

Page 11: Reverted Indexing for Expansion and Feedback

Result Set → Document Body

docid  Ahn&Moffat
#415   10
#32    9
#63    8
#7     6
#56    4
#12    2
#108   1
#115   1
#42    1
#85    1

<text>415 415 415 415 415 415 415 415 415 415 32 32 32 32 32 32 32 32 32 63 63 63 63 63 63 63 63 7 7 7 7 7 7 56 56 56 56 12 12 108 115 42 85</text>
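The step from the quantized table to the text body amounts to repeating each docid as many times as its integer weight, so that a standard indexer sees the weight as a term frequency. A minimal sketch, where the function name `body_from_weights` is hypothetical:

```python
def body_from_weights(weighted_docids):
    """Repeat each docid `weight` times, preserving rank order."""
    tokens = []
    for docid, weight in weighted_docids:
        tokens.extend([str(docid)] * weight)
    return " ".join(tokens)


# (docid, quantized weight) pairs copied from the slide.
results = [(415, 10), (32, 9), (63, 8), (7, 6), (56, 4),
           (12, 2), (108, 1), (115, 1), (42, 1), (85, 1)]
body = body_from_weights(results)
```

The resulting string has 43 tokens and matches the `<text>` content shown on the slide.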

Page 12: Reverted Indexing for Expansion and Feedback

Reverted Document

ID (Basis Query)

– Query Expression

– Ranking Algorithm

Body

– Results (docid)

– Results (score)

Page 13: Reverted Indexing for Expansion and Feedback

Reverted Document

<document>
<docid>[gazelle : BM25]</docid>
<text>415 415 415 415 415 415 415 415 415 415 32 32 32 32 32 32 32 32 32 63 63 63 63 63 63 63 63 7 7 7 7 7 7 56 56 56 56 12 12 108 115 42 85</text>
</document>
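A reverted document in this form can be generated with any XML writer. The sketch below uses Python's standard `xml.etree.ElementTree`; the element names come from the slide, while the function name `reverted_document_xml` is hypothetical. Using a real XML serializer rather than string concatenation is a design choice made here so that basis queries containing characters such as `<` are escaped.

```python
import xml.etree.ElementTree as ET


def reverted_document_xml(basis_id, body):
    """Serialize one reverted document using the element names from the slide."""
    doc = ET.Element("document")
    ET.SubElement(doc, "docid").text = basis_id
    ET.SubElement(doc, "text").text = body
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(doc, encoding="unicode")


xml_text = reverted_document_xml("[gazelle : BM25]", "415 415 32")
escaped = reverted_document_xml("[a < b : Boolean]", "7")
```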

Page 14: Reverted Indexing for Expansion and Feedback

Fin

Questions?

Page 15: Reverted Indexing for Expansion and Feedback

Outline

• Reverted Documents

• Reverted Indexing

• Experimental Setup

• Results

– Effectiveness

– Efficiency

• Related Work

• Future Extensions

Page 16: Reverted Indexing for Expansion and Feedback

Reverted Indexing

1. Choose a set of basis queries

2. For each basis query:

   1. Execute the query, producing results up to cutoff depth k

   2. Use results to create a “reverted document”

   3. Add the reverted document to the index

How basis queries are chosen (in these experiments): all singleton terms (unigrams) with df ≥ 2. The ranking algorithm for all basis queries is PL2.
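The procedure above can be sketched end to end on a toy corpus. This is an illustration under stated assumptions, not the authors' implementation: raw term frequency stands in for PL2, whitespace splitting stands in for real tokenization, and the function name `build_reverted_index` and the sample documents are made up. The basis-query rule (unigrams with df ≥ 2) and the cutoff depth k follow the slide.

```python
from collections import Counter


def build_reverted_index(docs, k=3):
    """docs: {docid: text}. Returns {basis_term: [(docid, score), ...]}."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized.values() for t in set(toks))

    # Step 1: basis queries are all unigrams with df >= 2.
    basis_terms = sorted(t for t, n in df.items() if n >= 2)

    reverted = {}
    for term in basis_terms:
        # Step 2.1: run the basis query (term frequency is a stand-in for PL2).
        scored = [(d, toks.count(term))
                  for d, toks in tokenized.items() if term in toks]
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        # Steps 2.2 and 2.3: keep the top k results as the reverted document.
        reverted[term] = scored[:k]
    return reverted


docs = {
    1: "elephant poaching in kenya",
    2: "kenya wildlife preserves fight poaching poaching",
    3: "wildlife photography tips",
}
index = build_reverted_index(docs, k=2)
```

Here only `kenya`, `poaching`, and `wildlife` occur in two or more documents, so only those three become reverted documents.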

Page 17: Reverted Indexing for Expansion and Feedback

Standard Index

Page 18: Reverted Indexing for Expansion and Feedback

Reverted Index

Page 19: Reverted Indexing for Expansion and Feedback
Page 20: Reverted Indexing for Expansion and Feedback

Reverted Index Statistics

Term Frequency: retrieval score of docid

Document Length: sum of retrieval scores of all docids retrieved by a basis query

Document Frequency: number of basis queries that docid was retrieved by
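These three correspondences can be computed directly from a collection of reverted documents. In the sketch below each reverted document is a mapping from docid to its (quantized) retrieval score; the function name `reverted_statistics` and the `cheetah` scores are hypothetical, chosen only to show a docid shared by two basis queries.

```python
from collections import defaultdict


def reverted_statistics(reverted_docs):
    """reverted_docs: {basis_query: {docid: retrieval_score}}."""
    tf = {}                # (basis_query, docid) -> term frequency
    doc_len = {}           # basis_query -> document length
    df = defaultdict(int)  # docid -> document frequency
    for basis, results in reverted_docs.items():
        # Document length: sum of scores of everything the basis query retrieved.
        doc_len[basis] = sum(results.values())
        for docid, score in results.items():
            tf[(basis, docid)] = score  # term frequency is the retrieval score
            df[docid] += 1              # one more basis query retrieved docid
    return tf, doc_len, dict(df)


reverted_docs = {
    "gazelle": {415: 10, 32: 9, 63: 8},
    "cheetah": {415: 3, 7: 6},  # made-up scores for illustration
}
tf, doc_len, df = reverted_statistics(reverted_docs)
```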

Page 21: Reverted Indexing for Expansion and Feedback

Outline

• Reverted Documents

• Reverted Indexing

• Experimental Setup

• Results

– Effectiveness

– Efficiency

• Related Work

• Future Extensions

Page 22: Reverted Indexing for Expansion and Feedback

Experiment: Relevance Feedback

1. Run initial query using PL2 (Terrier platform), e.g. [poaching wildlife preserves]

2. Judge top k documents for relevance

3. Select and weight query expansion terms using one of:

– KL Divergence

– Bo1

– PL2 retrieval on the Reverted Index

4. Expand using top 500 terms (strongest baseline @ 500)

5. Run expanded query using PL2

6. Evaluate

Page 23: Reverted Indexing for Expansion and Feedback

Reverted Index → Expansion

1. Original query = [poaching wildlife preserves]

2. Reverted query = [#415 #56 #42 #85]

3. Expanded query = [poaching^2.0 wildlife^1.24 preserves^1.0 poachers^0.57 tsavo^0.56 leakey^0.41 tusks^0.39 …]

term       original  retrieved  weight
poaching   1         1.0        2.0
poachers   0         0.57       0.57
tsavo      0         0.56       0.56
leakey     0         0.41       0.41
tusks      0         0.39       0.39
elephants  0         0.34       0.34
wildlife   1         0.24       1.24
kws        0         0.2        0.2
…          …         …          …
preserves  1         0          1.0
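In the table, each final weight is the sum of the "original" indicator (1 if the term is in the original query) and the "retrieved" score from the reverted index. A minimal sketch of that combination, using the values from the slide; the function name `combine_weights` is hypothetical, and the descending sort is an assumption based on the order of the expanded query shown.

```python
def combine_weights(original_terms, retrieved):
    """Add 1 for each original query term to its retrieved (reverted) score."""
    original = set(original_terms)
    terms = original | set(retrieved)
    weights = {
        t: (1.0 if t in original else 0.0) + retrieved.get(t, 0.0)
        for t in terms
    }
    # Highest weight first; ties broken alphabetically for determinism.
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))


original = ["poaching", "wildlife", "preserves"]
retrieved = {"poaching": 1.0, "poachers": 0.57, "tsavo": 0.56, "leakey": 0.41,
             "tusks": 0.39, "elephants": 0.34, "wildlife": 0.24, "kws": 0.2}
expanded = combine_weights(original, retrieved)
```

The three original terms end up at the top of the list because the added 1 outweighs every retrieved score below 1.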

Page 24: Reverted Indexing for Expansion and Feedback

Outline

• Reverted Documents

• Reverted Indexing

• Experimental Setup

• Results

– Effectiveness

– Efficiency

• Related Work

• Future Extensions

Page 25: Reverted Indexing for Expansion and Feedback

MAP

Page 26: Reverted Indexing for Expansion and Feedback

%Change

Page 27: Reverted Indexing for Expansion and Feedback

Residual MAP

Page 28: Reverted Indexing for Expansion and Feedback

%Change

Page 29: Reverted Indexing for Expansion and Feedback

Efficiency

• Two components to query expansion

– Selection and Weighting

– Execution of Expanded Query

Page 30: Reverted Indexing for Expansion and Feedback

Avg Selection Time

Page 31: Reverted Indexing for Expansion and Feedback

Avg Execution Time

Page 32: Reverted Indexing for Expansion and Feedback

Why would execution be faster?

Page 33: Reverted Indexing for Expansion and Feedback

Bo1                        Reverted_PL2
Term              Score    Term              Score
leakey            0.88     poaching          1.00
poaching          0.74     poachers          0.56
wildlife          0.73     tsavo             0.56
kenya             0.52     leakey            0.41
ivory             0.47     tusks             0.39
elephants         0.46     elephants         0.34
elephant          0.32     wildlife          0.24
deer              0.30     kws               0.20
poachers          0.28     kez               0.17
conservation      0.27     ivory             0.14
species           0.23     jealousies        0.14
tusks             0.19     elephant          0.14
african           0.19     conservationists  0.09
namibia           0.19     kenya             0.09
animals           0.17     fiefdom           0.08
africa            0.15     safaris           0.04
zimbabwe          0.15     conservationist   0.03
tsavo             0.14     egos              0.01
kenyan            0.13     kierie            0.00
conservationists  0.12     aphrodisiacs      0.00

Page 34: Reverted Indexing for Expansion and Feedback

Bo1                        Reverted_PL2
Term              DF       Term              DF
africa            20390    wildlife          2891
african           10636    kenya             1163
conservation      4298     ivory             1014
animals           3928     elephant          743
species           3479     elephants         356
wildlife          2891     poaching          331
kenya             1163     conservationists  293
ivory             1014     egos              269
zimbabwe          966      kez               173
deer              748      fiefdom           129
elephant          743      conservationist   125
namibia           483      poachers          117
kenyan            436      safaris           57
elephants         356      jealousies        56
poaching          331      tusks             42
conservationists  293      leakey            22
poachers          117      tsavo             12
tusks             42       aphrodisiacs      12
leakey            22       kws               9
tsavo             12       kierie            2
Average DF        2617     Average DF        391
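The averages in the last row can be checked from the DF values listed for the 20 terms in each column. The snippet below does only that arithmetic. Reading the summed DF as a rough proxy for how many postings an engine must traverse when executing the expanded query is an interpretation offered here, not something the slides state.

```python
# Document frequencies copied from the table, top to bottom.
bo1_df = [20390, 10636, 4298, 3928, 3479, 2891, 1163, 1014, 966, 748,
          743, 483, 436, 356, 331, 293, 117, 42, 22, 12]
reverted_df = [2891, 1163, 1014, 743, 356, 331, 293, 269, 173, 129,
               125, 117, 57, 56, 42, 22, 12, 12, 9, 2]


def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


avg_bo1 = average(bo1_df)        # rounds to the 2617 shown in the table
avg_rev = average(reverted_df)   # rounds to the 391 shown in the table
ratio = sum(bo1_df) / sum(reverted_df)
```

Under the postings-length interpretation, the Bo1 term list for this query would touch several times as many postings as the Reverted_PL2 list.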

Page 35: Reverted Indexing for Expansion and Feedback

Bo1                    Reverted_PL2
Term          DF       Term            DF
los           46748    transportation  15262
angeles       45147    freeway         3506
metro         39849    tunnel          2643
safety        22569    disasters       1822
fire          21257    subway          805
foot          13120    extinguished    452
traffic       12410    rtd             227
feet          12034    caved           193
hollywood     7677     shoring         158
heat          6004     roper           147
rail          5747     timbers         98
downtown      5390     shored          97
engineers     4308     pilgrimages     73
freeway       3506     asphyxiation    71
disasters     1822     smolder         29
firefighters  1489     busway          22
subway        805      grouting        21
rtd           227      smoldered       19
timbers       98       lutgen          10
busway        22       droped          2
Average DF    12511    Average DF      1283

Page 36: Reverted Indexing for Expansion and Feedback

Outline

• Reverted Documents

• Reverted Indexing

• Experimental Setup

• Results

– Effectiveness

– Efficiency

• Related Work

• Future Extensions

Page 37: Reverted Indexing for Expansion and Feedback

Related Work

Inspiration:

“Retrievability: An Evaluation Measure for Higher Order Information Access Tasks” --Azzopardi and Vinay, CIKM 2008

Azzopardi & Vinay take a document centric approach, examining whether documents (n)ever appear among top k results to any query

Page 38: Reverted Indexing for Expansion and Feedback

Related Work

Query-Document Duality has long history

– S. E. Robertson. “Query-Document Symmetry and Dual Models.” Journal of Documentation, 50(3), 1994

– B. Billerbeck, F. Scholer, H. E. Williams, and J. Zobel. Query Expansion Using Associated Queries. CIKM '03

– N. Craswell and M. Szummer. Random walks on the Query-Click Graph. SIGIR 2007

– Reverse Querying / alerting (various)

Page 39: Reverted Indexing for Expansion and Feedback

Future Extensions

Basis queries

– Query expression may be arbitrarily complex

– Ranking function may be arbitrarily complex (remember: ranking function is a part of the basis query)

Reverted queries

– Best Match: [#415 #56 #42 #85]

– Boolean: (#415 AND #56) OR (#42 AND #85)

– Other query operators:

[SYNONYM(#415 #56) #42 #85]

[ORDERED(#415 #56) #42 #85]
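The Boolean form of a reverted query can be illustrated with set operations: each docid maps to the set of basis queries that retrieved it, and AND/OR become intersection and union. The sketch below is a toy under stated assumptions; the index contents and the helper name `postings` are made up, and only the query shape `(#415 AND #56) OR (#42 AND #85)` comes from the slide.

```python
def postings(reverted_index, docid):
    """Set of basis queries whose reverted document contains docid."""
    return {basis for basis, docids in reverted_index.items() if docid in docids}


# Hypothetical reverted index: basis query -> docids it retrieved.
reverted_index = {
    "poaching": {415, 56, 42, 85},
    "tusks": {415, 56},
    "safaris": {42, 85},
    "wildlife": {415, 42},
}


def p(docid):
    return postings(reverted_index, docid)


# (#415 AND #56) OR (#42 AND #85)
matches = (p(415) & p(56)) | (p(42) & p(85))
```

In this toy index `wildlife` is excluded because it retrieved only one docid from each AND pair, whereas a best-match query over the same four docids would still give it a nonzero score.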

Page 40: Reverted Indexing for Expansion and Feedback

Motivation – Three R’s

Retrievability

Reuse (Algorithmic)

Recall-Oriented Tasks

Page 41: Reverted Indexing for Expansion and Feedback

Fin

Questions?