(Pseudo)-Relevance Feedback & Passage Retrieval
Ling573: NLP Systems & Applications
April 28, 2011
Roadmap
- Retrieval system walkthrough
- Improving document retrieval: compression & expansion techniques
- Passage retrieval: contrasting techniques; interactions with document retrieval
Major Issue
- All approaches operate on term matching
  - If a synonym, rather than the original term, is used, the approach can fail
- Develop more robust techniques: match "concept" rather than term
  - Mapping techniques: associate terms to concepts (aspect models, stemming)
  - Expansion approaches: add in related terms to enhance matching
Compression Techniques
- Reduce surface term variation to concepts
- Stemming
- Aspect models
  - Matrix representations typically very sparse
  - Reduce dimensionality to a small number of key aspects
  - Maps contextually similar terms together: latent semantic analysis (illustrative sketch below)
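As a concrete illustration of the aspect-model idea (not from the lecture; the toy corpus and dimensionality are purely illustrative), an LSA-style reduction can be sketched with a truncated SVD of the term-document matrix:

```python
# LSA-style sketch: build a term-document count matrix, then truncate its SVD
# so that contextually similar terms land near each other in a low-dimensional
# "aspect" space.
import numpy as np

docs = ["cat chases mouse", "dog chases cat", "ls lists files", "cat prints files"]
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                              # keep only k key aspects
term_vecs = U[:, :k] * S[:k]       # one row per term in the reduced concept space
doc_vecs = Vt[:k, :].T * S[:k]     # one row per document in the same space
```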
Expansion Techniques
- Can apply to query or document
- Thesaurus expansion: use a linguistic resource (thesaurus, WordNet) to add synonyms/related terms (example sketch below)
- Feedback expansion: add terms that "should have appeared"
  - User interaction: direct or relevance feedback
  - Automatic pseudo-relevance feedback
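A minimal sketch of thesaurus expansion via NLTK's WordNet interface (assumes nltk is installed and the WordNet data has been downloaded; the helper name and per-term cutoff are illustrative):

```python
# Thesaurus expansion sketch: add WordNet synonyms for each query term.
# Requires: pip install nltk, then nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

def expand_query(terms, max_per_term=3):
    expanded = list(terms)
    for t in terms:
        synonyms = set()
        for synset in wn.synsets(t):
            for lemma in synset.lemma_names():
                if lemma.lower() != t.lower():
                    synonyms.add(lemma.replace('_', ' '))
        expanded.extend(sorted(synonyms)[:max_per_term])
    return expanded

print(expand_query(["cat", "command"]))
```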
Query Refinement
- Typical queries are very short and ambiguous
  - cat: animal / Unix command
- Add more terms to disambiguate and improve matching
- Relevance feedback:
  - Retrieve with the original query; present results
  - Ask the user to tag relevant/non-relevant documents
  - "Push" the query toward relevant vectors, away from non-relevant ones
  - Vector intuition: add vectors from relevant documents, subtract vectors from non-relevant documents
Relevance Feedback
- Rocchio expansion formula (sketched below)
  - β + γ = 1 (e.g., 0.75, 0.25): amount of "push" in either direction
  - R: # relevant docs; S: # non-relevant docs; r: relevant document vectors; s: non-relevant document vectors
- Can significantly improve retrieval (though tricky to evaluate)
Collection-based Query Expansion
- Xu & Croft '97 (classic)
- Thesaurus expansion problematic: often ineffective
- Issues:
  - Coverage: many words, especially named entities, are missing from WordNet
  - Domain mismatch: fixed resources are 'general' or derived from some domain, which may not match the current search collection
    - cat/dog vs. cat/more/ls
- Instead, use collection-based evidence: global or local
Global Analysis
- Identifies word co-occurrence in the whole collection
- Applied to expand the current query; context can differentiate/group concepts
- Create an index of concepts:
  - Concepts = noun phrases (1-3 nouns long)
  - Representation: context, i.e. words in a fixed-length window of 1-3 sentences
  - Each concept is indexed by its context words, so it can be retrieved like a document
- Use the query to retrieve the 30 highest-ranked concepts; add them to the query
Local Analysis
- Aka local feedback, pseudo-relevance feedback
- Use the query to retrieve documents
- Select informative terms from the highly ranked documents
- Add those terms to the query
- Specifically (see the sketch below):
  - Add the 50 most frequent terms and the 10 most frequent 'phrases' (bigrams without stopwords)
  - Reweight terms
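A minimal sketch of this local-feedback loop; the `search` function and `stopwords` set are assumed to be supplied by the surrounding system, and the term-reweighting step is omitted:

```python
# Pseudo-relevance feedback sketch: expand the query with the most frequent
# terms and stopword-free bigrams from the top-ranked documents.
from collections import Counter

def prf_expand(query, search, stopwords, n_docs=10, n_terms=50, n_phrases=10):
    term_counts, phrase_counts = Counter(), Counter()
    for doc in search(query, n_docs):                  # top-ranked document texts
        tokens = [t.lower() for t in doc.split()]
        term_counts.update(t for t in tokens if t not in stopwords)
        phrase_counts.update(
            (a, b) for a, b in zip(tokens, tokens[1:])
            if a not in stopwords and b not in stopwords)
    expansion = [t for t, _ in term_counts.most_common(n_terms)]
    expansion += [" ".join(p) for p, _ in phrase_counts.most_common(n_phrases)]
    return query.split() + expansion                   # reweighting omitted
```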
Local Context Analysis
- Mixes the two previous approaches
- Use the query to retrieve the top n passages (300 words)
- Select the top m ranked concepts (noun sequences)
- Add them to the query and reweight
- Relatively efficient; applies local search constraints
Experimental Contrasts
- Improvements over baseline:
  - Local Context Analysis: +23.5% (relative)
  - Local Analysis: +20.5%
  - Global Analysis: +7.8%
- LCA is best and most stable across data sets: better term selection than global analysis
- All approaches have fairly high variance: they help some queries and hurt others
- Also sensitive to the number of terms added and the number of documents used
Example: expansion terms produced by Global Analysis, Local Analysis, and LCA for the query "What are the different techniques used to create self-induced hypnosis?" (term lists shown on the slide)
Passage Retrieval
- Documents are the wrong unit for QA
- Highly ranked documents have high-weight terms in common with the query, but that is not enough!
  - Matching terms scattered across the document vs. matching terms concentrated in a short span of the document
- Solution: from the ranked document list, select and rerank shorter spans: passage retrieval
Passage Ranking
- Goal: select the passages most likely to contain an answer (we want answers!)
- Factors in reranking:
  - Document rank
  - Answer type matching: restricted named entity recognition
  - Question match:
    - Question term overlap
    - Span overlap: n-gram, longest common sub-span
    - Query term density: short spans with more query terms (illustrative sketch below)
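One way to realize the query-term-density factor (an illustrative sketch, not any particular system's scoring function; the window size is arbitrary):

```python
# Query-term-density sketch: short spans containing many query terms score high.
def density_score(passage_tokens, query_terms, window=25):
    if not passage_tokens:
        return 0.0
    qset = {q.lower() for q in query_terms}
    best = 0.0
    for start in range(max(1, len(passage_tokens) - window + 1)):
        span = passage_tokens[start:start + window]
        hits = sum(1 for t in span if t.lower() in qset)
        best = max(best, hits / len(span))
    return best
```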
Quantitative Evaluation of Passage Retrieval for QA
- Tellex et al.
- Compare alternative passage ranking approaches: 8 different strategies + a voting ranker
- Assess interaction with document retrieval
Comparative IR Systems
- PRISE:
  - Developed at NIST
  - Vector space retrieval system
  - Optimized weighting scheme
- Lucene:
  - Boolean + vector space retrieval
  - Boolean retrieval results RANKED by tf-idf
  - Little control over the hit list
- Oracle: NIST-provided list of relevant documents
Comparing Passage Retrieval
- Eight different systems used in QA, differing in passage units and scoring factors
- MITRE (simplest reasonable approach; baseline):
  - Unit: sentence
  - Factor: term overlap count (sketch below)
- MITRE+stemming:
  - Factor: stemmed term overlap
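A sketch of the MITRE-style baseline as described: rank sentences by the number of overlapping query terms. The `stem` argument is a placeholder; passing a real stemmer gives the MITRE+stemming variant.

```python
# Baseline passage scoring: count query terms that appear in the sentence.
def overlap_score(sentence, question, stem=lambda w: w):
    sent_terms = {stem(w.lower()) for w in sentence.split()}
    query_terms = {stem(w.lower()) for w in question.split()}
    return len(sent_terms & query_terms)

def rank_sentences(sentences, question, stem=lambda w: w):
    return sorted(sentences, key=lambda s: overlap_score(s, question, stem),
                  reverse=True)
```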
Comparing Passage Retrieval
- Okapi BM25:
  - Unit: fixed-width sliding window
  - Factor: BM25 score with k1 = 2.0, b = 0.75 (sketch below)
- MultiText:
  - Unit: window starting and ending with a query term
  - Factors: sum of IDFs of matching query terms; length-based measure * number of matching terms
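For reference, the BM25 factor with the parameters above can be sketched as follows; IDF conventions vary across implementations, so treat this as one common variant rather than the exact formula used in the paper:

```python
import math

def bm25_score(window_tokens, query_terms, df, n_docs, avg_len, k1=2.0, b=0.75):
    """Score one fixed-width window against the query (one common BM25 variant)."""
    score = 0.0
    length_norm = k1 * (1 - b + b * len(window_tokens) / avg_len)
    for q in query_terms:
        tf = window_tokens.count(q)
        if tf == 0 or q not in df:
            continue
        idf = math.log(1 + (n_docs - df[q] + 0.5) / (df[q] + 0.5))
        score += idf * tf * (k1 + 1) / (tf + length_norm)
    return score
```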
Comparing Passage Retrieval
- IBM:
  - Unit: fixed passage length
  - Score: sum of
    - Matching words measure: sum of idfs of overlapping terms
    - Thesaurus match measure: sum of idfs of question words with synonyms in the document
    - Mis-match words measure: sum of idfs of question words NOT in the document
    - Dispersion measure: number of words between matching query terms
    - Cluster word measure: longest common substring
Comparing Passage Retrieval
- SiteQ:
  - Unit: n (= 3) sentences
  - Factor: match words by literal form, stem, or WordNet synonym
  - Score: sum of
    - sum of idfs of matched terms
    - density weight score * overlap count
Comparing Passage Retrieval
- Alicante:
  - Unit: n (= 6) sentences
  - Factor: non-length-normalized cosine similarity
- ISI:
  - Unit: sentence
  - Factors: weighted sum of proper name match, query term match, and stemmed match
Experiments
- Retrieval:
  - PRISE query: the verbatim question
  - Lucene query: conjunctive boolean query (stopped)
- Passage retrieval: 1000-word passages (sketch below)
  - Uses the top 200 retrieved docs
  - Finds the best passage in each doc
  - Returns up to 20 passages
  - Ignores the original document rank and retrieval score
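A sketch of that harness under stated assumptions: `retrieve_docs` and `score_passage` are hypothetical hooks standing in for the document-retrieval engine and whichever passage ranker is being evaluated.

```python
# Passage-retrieval harness sketch: best 1000-word passage from each of the
# top 200 documents, reranked by passage score alone (document rank ignored).
def retrieve_passages(question, retrieve_docs, score_passage,
                      n_docs=200, passage_len=1000, n_passages=20):
    candidates = []
    for doc in retrieve_docs(question, n_docs):
        tokens = doc.split()
        best = max(
            (tokens[i:i + passage_len]
             for i in range(0, max(1, len(tokens)), passage_len)),
            key=lambda p: score_passage(p, question))   # best passage in this doc
        candidates.append(best)
    candidates.sort(key=lambda p: score_passage(p, question), reverse=True)
    return [" ".join(p) for p in candidates[:n_passages]]
```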
Evaluation
- MRR (mean reciprocal rank; sketch below)
  - Strict: answer pattern matches in an officially judged relevant document
  - Lenient: answer pattern matches anywhere
- Also: percentage of questions with NO correct answer returned
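Mean reciprocal rank over the returned passages can be computed as below; `is_correct` is a hypothetical predicate implementing the strict or lenient pattern match.

```python
def mean_reciprocal_rank(ranked_passages_per_question, is_correct):
    """MRR: average over questions of 1/rank of the first correct passage (0 if none)."""
    total = 0.0
    for question, passages in ranked_passages_per_question:
        rr = 0.0
        for rank, passage in enumerate(passages, start=1):
            if is_correct(question, passage):
                rr = 1.0 / rank
                break
        total += rr
    return total / len(ranked_passages_per_question)
```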
Evaluation on Oracle Docs (results table shown on the slide)
Overall
- PRISE: higher recall, more correct answers
- Lucene: higher precision, fewer correct answers, but higher MRR
- Best systems: IBM, ISI, SiteQ
  - Relatively insensitive to the retrieval engine
Analysis
- Retrieval:
  - Boolean systems (e.g., Lucene) are competitive here, with good MRR
  - Boolean systems are usually worse on ad-hoc retrieval
- Passage retrieval:
  - Significant differences among rankers for PRISE and Oracle documents
  - Not significant for Lucene -> boost recall
- Techniques: density-based scoring improves results
  - Variants: proper-name exact match, cluster words, density score
Error Analysis
- 'What is an ulcer?'
  - After stopping -> 'ulcer'; term matching alone doesn't help; need the question type!
- Missing relations: 'What is the highest dam?'
  - Passages match 'highest' and 'dam', but not together
  - Include syntax?