Exploring Sentence Level Query Expansion in Language Modeling Based Information Retrieval
Transcript of Exploring Sentence Level Query Expansion in Language Modeling Based Information Retrieval
[Slide 1]
Exploring Sentence Level Query Expansion in Language Modeling Based Information Retrieval
Debasis Ganguly, Johannes Leveling, Gareth Jones
[Slide 2]
Outline
Standard blind relevance feedback
Sentence based query expansion
Does it fit into LM?
Evaluation on FIRE Bengali and English ad-hoc topics
Comparison with term based query expansion
Conclusions
[Slide 3]
Standard Blind Relevance Feedback (BRF)
Assume top R documents from initial retrieval as relevant.
Extract feedback terms from these documents:
Choose terms that occur in the largest number of pseudo-relevant documents (e.g. VSM)
Choose terms with the highest RSV scores (e.g. BM25)
Choose terms with the highest LM scores (e.g. LM)
Expand the query with these terms and perform the final retrieval
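The steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes pre-tokenized documents and uses the VSM-style criterion from the slide (terms occurring in the most pseudo-relevant documents); the function and parameter names are made up for the example.

```python
from collections import Counter

def blind_relevance_feedback(query_terms, ranked_docs, R=10, T=20):
    """Expand a query via blind relevance feedback.

    Assumes the top R documents of the initial retrieval are relevant
    and picks the T terms that occur in the largest number of those
    documents; `ranked_docs` is a list of token lists, best-first.
    """
    feedback_docs = ranked_docs[:R]
    # Document frequency of each term within the pseudo-relevant set.
    df = Counter()
    for doc in feedback_docs:
        df.update(set(doc))
    # Keep the T most widespread terms not already in the query.
    expansion = [t for t, _ in df.most_common() if t not in query_terms][:T]
    return list(query_terms) + expansion
```

The RSV- or LM-based variants would only change the scoring step, ranking candidate terms by a relevance or language-model weight instead of document frequency.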
[Slide 4]
What standard BRF assumes (wrongly)
The whole document is relevant
All R feedback documents are equally relevant
(Diagram: query terms t1 and t2 scattered through the feedback documents)

[Slide 5]
Ideal scenario
The whole document is relevant.
Restrict the choice of feedback terms to the relevant segments of the documents
[Slide 6]
Can we get closer to the ideal?
It is impossible to know the relevant segments exactly, so instead extract the sentences most similar to the query, assuming these sentences constitute the relevant text chunks.
[Slide 7]
Sentence selection using rank
Not all documents are equally relevant, so make the number of sentences added from a document depend on its retrieval rank: higher-ranked documents contribute more sentences
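A sketch of this rank-sensitive sentence selection, under stated assumptions: documents arrive as lists of tokenized sentences, query-term overlap stands in for whatever similarity measure the paper actually uses, and the `m // rank` quota is one plausible way to make the sentence count fall off with rank; all names here are hypothetical.

```python
def select_feedback_sentences(query_terms, ranked_docs, m=5):
    """Pick expansion sentences per document, fewer at lower ranks.

    `ranked_docs` is a list of documents, best-first, each a list of
    sentences (token lists). The top document contributes up to m
    sentences; a document at rank k contributes roughly m / k.
    """
    qset = set(query_terms)
    selected = []
    for rank, sentences in enumerate(ranked_docs, start=1):
        n = max(1, m // rank)  # rank-dependent quota, at least one sentence
        # Rank sentences by term overlap with the query (a simple proxy
        # for query-sentence similarity) and keep the top n.
        scored = sorted(sentences, key=lambda s: len(qset & set(s)), reverse=True)
        selected.extend(scored[:n])
    return selected
```

Setting the quota to a constant instead of `m // rank` gives the BRFcns variant; the rank-dependent quota corresponds to BRFvns.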
[Slide 8]
In short
Documents are often composed of a few main topics and a series of short, sometimes densely discussed subtopics.
Feedback terms chosen from a whole document might introduce a topic shift.
Good expansion terms might exist in a particular subtopic.
Terms with close proximity to the query terms might be useful for feedback.
[Slide 9]
Does this fit into LM?
Noisy channel view: each document D1, D2, ..., Dn is a source that can generate the query.
Adding a part of D1 and a part of D2 to Q makes the expanded query Qexp look more like D1 and D2, which increases the likelihood of those documents generating Qexp.
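The intuition can be written out as a query-likelihood score. This is a sketch assuming Jelinek-Mercer smoothing (a linear mixture of document and collection models, with an assumed smoothing weight λ), not necessarily the exact form used in the SMART implementation:

```latex
P(Q_{\mathrm{exp}} \mid D) \;=\; \prod_{t \in Q_{\mathrm{exp}}} \bigl(\lambda\, P(t \mid D) + (1-\lambda)\, P(t \mid C)\bigr)
```

Terms copied from D1 into Qexp raise the corresponding factors P(t | D1), so D1's likelihood of generating the expanded query grows.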
[Slide 10]
Tools
The FIRE collection comprises newspaper articles from different genres (sports, business, etc.) in several Indian languages
The MorphAdorner package was used for sentence demarcation
Stopword lists: the standard SMART stopword list for English; the default stopword list provided by the FIRE organizers for Bengali
Stemmers: a rule-based stemmer for Bengali; Porter's stemmer for English
The LM implemented in SMART was used for indexing and retrieval
[Slide 11]
Setup
The baseline is standard BRF using terms that occur in the largest number of pseudo-relevant documents
Two variants of sentence-based expansion were evaluated:
BRFcns: a constant number of sentences from each feedback document
BRFvns: a variable number of sentences (depending on retrieval rank)
[Slide 12]
Parameter Settings
R: number of documents assumed to be relevant, varied in [10, 40]
T: number of terms to add, varied in [10, 40]
m: number of sentences to add from the top-ranked document, varied in [2, 10]
[Slide 13]
Best MAPs

BRF:

| Topics  | R  | T  | MAP    |
|---------|----|----|--------|
| EN-2008 | 10 | 10 | 0.5682 |
| EN-2010 | 10 | 30 | 0.4953 |
| BN-2008 | 20 | 40 | 0.3885 |
| BN-2010 | 10 | 30 | 0.4537 |

BRFcns:

| Topics  | R  | m | MAP    |
|---------|----|---|--------|
| EN-2008 | 30 | 5 | 0.5964 |
| EN-2010 | 20 | 4 | 0.5032 |
| BN-2008 | 20 | 4 | 0.4226 |
| BN-2010 | 10 | 5 | 0.4467 |

BRFvns:

| Topics  | R  | m  | MAP    |
|---------|----|----|--------|
| EN-2008 | 30 | 10 | 0.6015 |
| EN-2010 | 20 | 8  | 0.5102 |
| BN-2008 | 30 | 10 | 0.4302 |
| BN-2010 | 10 | 8  | 0.4581 |
[Slide 14]
Query drift analysis
Adding too many expansion terms can make the expanded query drift completely away from the original information need
Drift is measured by the per-query change in precision after feedback
An easy query is one for which P@20 of the initial retrieval is already high
Queries are categorized into groups by their initial-retrieval P@20
A good feedback algorithm should improve many (ideally hard) queries while hurting the performance of only a few (ideally easy) queries
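One way to tabulate this analysis, as a sketch: the function below buckets queries by whether initial P@20 clears a threshold (the 0.5 cutoff is an assumption of this example, not a value from the paper) and by whether feedback helped or hurt; all names are hypothetical.

```python
def feedback_impact(p20_initial, p20_feedback, easy_threshold=0.5):
    """Categorize per-query feedback impact by initial P@20.

    Takes two dicts mapping query id -> P@20 (before and after
    feedback). A query counts as easy when its initial P@20 is at
    least `easy_threshold`. A good feedback method should mostly
    populate hard_helped and rarely easy_hurt.
    """
    buckets = {"easy_hurt": 0, "easy_helped": 0, "hard_hurt": 0, "hard_helped": 0}
    for qid, before in p20_initial.items():
        after = p20_feedback[qid]
        group = "easy" if before >= easy_threshold else "hard"
        if after > before:
            buckets[group + "_helped"] += 1
        elif after < before:
            buckets[group + "_hurt"] += 1
        # Unchanged queries are left out of all four buckets.
    return buckets
```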
[Slide 15]

Query drift analysis
(Charts: per-query precision changes for BRF, BRFcns, and BRFvns)

[Slide 16]
Comparison to True Relevance Feedback
The best possible average precision in IR is obtained by True Relevance Feedback
A BRF method should be as close as possible to this oracle.
| Topic | \|T_TRF\| | o(\|T_BRF\|) | o(\|T_vns\|) |
|-------|-----------|--------------|--------------|
| EN08  | 937       | 743          | 912          |
| EN10  | 433       | 407          | 432          |
| BN08  | 979       | 744          | 955          |
| BN10  | 991       | 728          | 933          |
[Slide 17]
Conclusions
The new approach improves over standard BRF by
using sentences instead of whole documents
distinguishing between the amounts of pseudo-relevance assumed for different documents
It significantly improves MAP over standard BRF on four ad-hoc topic sets in two languages
It adds more true relevant terms than standard BRF does
[Slide 18]
Queries?