Document Distance for the Automated Expansion of Relevance Judgements for Information Retrieval Evaluation
Diego Molla (1), Iman Amini (2), David Martinez (3)
(1) Macquarie University, Sydney, Australia
(2) NICTA and RMIT, Melbourne, Australia
(3) University of Melbourne, Melbourne, Australia
GEAR’14, 11 July 2014
Background Document Distance to Expand Qrels Evaluation Results
Contents
1 Background
  The Scenario
  Related Work
2 Document Distance to Expand Qrels
  Evaluation of Information Retrieval
  Data Sets
3 Evaluation Results
  Adding Pseudo-qrels
  Evaluation
GEAR 2014 Diego Molla, Iman Amini, David Martinez 2/26
Evidence Based Medicine
http://laikaspoetnik.wordpress.com/2009/04/04/evidence-based-medicine-the-facebook-of-medicine/
Journal of Family Practice’s “Clinical Inquiries”
Can We Use the List of References for Evaluation?
Expanding the Qrels
Büttcher et al. (2007) and Martinez et al. (2008) use ML methods. ⇒ But we don’t have negative judgements.
Sakai & Lin (2010) use most popular documents retrieved by several systems. ⇒ But we don’t have several systems.
The Cluster Hypothesis
Cluster Hypothesis (van Rijsbergen, 1979)
“Closely associated documents tend to be relevant to the same requests”
The cluster hypothesis has been used to improve the quality of retrieval systems.
It has not been used to improve the quality of the evaluation of retrieval systems.
Our distance metrics
tf.idf of all words after lowercasing and removing stop words, then top 200 PCA components.
Distance metric: d(x, y) = 1 − cos(x, y).
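The distance d(x, y) = 1 − cos(x, y) over tf.idf vectors can be sketched as follows. This is a minimal illustration, not the authors' implementation: the stop-word list, the tokenizer, the tf × log(N/df) weighting and the example sentences are all assumptions, and the reduction to the top 200 PCA components is omitted for brevity.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the slides do not say which list was used.
STOP_WORDS = {"the", "of", "a", "and", "in", "to", "for", "is"}

def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,;:!?()\"'") for w in text.lower().split())
    return [w for w in words if w and w not in STOP_WORDS]

def tfidf_vectors(docs):
    """Return one sparse tf.idf vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # Assumed weighting: raw term frequency times log(N / df).
        vectors.append(
            {t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return vectors

def cosine_distance(x, y):
    """d(x, y) = 1 - cos(x, y) for sparse vectors stored as dicts."""
    dot = sum(w * y.get(t, 0.0) for t, w in x.items())
    norm_x = math.sqrt(sum(w * w for w in x.values()))
    norm_y = math.sqrt(sum(w * w for w in y.values()))
    if norm_x == 0.0 or norm_y == 0.0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_x * norm_y)

# Made-up example documents.
docs = [
    "Aspirin reduces the risk of stroke.",
    "Aspirin and stroke risk in adults.",
    "Antibiotics for acute sinusitis.",
]
vecs = tfidf_vectors(docs)
d_close = cosine_distance(vecs[0], vecs[1])  # documents sharing vocabulary
d_far = cosine_distance(vecs[0], vecs[2])    # documents with no shared terms
```

Documents that share terms end up closer than documents with disjoint vocabulary, which is the property the cluster hypothesis relies on.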
Computing Distances and Binning
1. Gather closest distances

          q1                q2
      qrel11  qrel12    qrel21  qrel22
d1    0.89    0.75      0.04    0.35
d2    0.12    0.43      0.84    0.45
d3    0.34    0.56      0.75    0.92

=⇒

distance  doc  query  relevant
0.75      d1   q1     no
0.04      d1   q2     yes
0.12      d2   q1     yes
0.45      d2   q2     no
0.34      d3   q1     yes
0.75      d3   q2     no

2. Sort distances and bin

bin 1:  0.04  d1  q2  yes
        0.12  d2  q1  yes
bin 2:  0.34  d3  q1  yes
        0.45  d2  q2  no
bin 3:  0.75  d1  q1  no
        0.75  d3  q2  no

3. Compute bin ratio of relevance

bin 1:  1.0
bin 2:  0.5
bin 3:  0.0
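The three steps of this toy example can be reproduced with a short sketch. The data structures and function names are hypothetical, and fixed-size bins of two pairs are assumed because that is what the example shows.

```python
# Toy distance matrix from the example: distances[doc][query] lists the
# distance from the document to each known qrel of that query.
distances = {
    "d1": {"q1": [0.89, 0.75], "q2": [0.04, 0.35]},
    "d2": {"q1": [0.12, 0.43], "q2": [0.84, 0.45]},
    "d3": {"q1": [0.34, 0.56], "q2": [0.75, 0.92]},
}
# (document, query) pairs marked "yes" in the example.
relevant = {("d1", "q2"), ("d2", "q1"), ("d3", "q1")}

def closest_distances(distances):
    """Step 1: keep, per (doc, query), the distance to the nearest qrel."""
    rows = []
    for doc, per_query in distances.items():
        for query, dists in per_query.items():
            rows.append((min(dists), doc, query))
    return rows

def bin_relevance_ratios(rows, relevant, bin_size):
    """Steps 2 and 3: sort by distance, cut into bins, compute ratios."""
    rows = sorted(rows)
    ratios = []
    for start in range(0, len(rows), bin_size):
        chunk = rows[start:start + bin_size]
        hits = sum((doc, query) in relevant for _, doc, query in chunk)
        ratios.append(hits / len(chunk))
    return ratios

rows = closest_distances(distances)
ratios = bin_relevance_ratios(rows, relevant, bin_size=2)
print(ratios)  # [1.0, 0.5, 0.0]
```

The ratio drops as the distance to the nearest qrel grows, which is the pattern the binning is meant to expose.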
Data
The OHSUMED collection
Documents from MEDLINE.
63 queries, average of 50.97 qrels per query.
Only positive qrels.
Original TREC-9 runs not available.
How we used it
16 distinct runs of Terrier.
Pooled top 100 docs per run.
All docs not in set of qrels marked as negative qrels.
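A minimal sketch of this pooling step, under stated assumptions: the input layout (runs as ranked lists per query, positive qrels as sets), the function name and the toy data are hypothetical, and the sketch assumes that only pooled documents receive a judgement.

```python
def build_pool_qrels(runs, positive_qrels, depth=100):
    """Pool the top `depth` docs of every run and judge them.

    runs: {run_name: {query: ranked list of doc ids}}
    positive_qrels: {query: set of relevant doc ids}
    Returns {query: {doc: 1 or 0}}; pooled docs absent from the
    positive qrels are marked 0 (negative).
    """
    judged = {}
    for ranking_by_query in runs.values():
        for query, ranking in ranking_by_query.items():
            pool = judged.setdefault(query, {})
            known_positive = positive_qrels.get(query, set())
            for doc in ranking[:depth]:
                pool[doc] = 1 if doc in known_positive else 0
    return judged

# Toy data with a pool depth of 2 instead of 100.
runs = {
    "run1": {"q1": ["d1", "d2", "d3"]},
    "run2": {"q1": ["d2", "d4", "d5"]},
}
positive = {"q1": {"d2", "d5"}}
judged = build_pool_qrels(runs, positive, depth=2)
```

With a depth of 2, `d5` is relevant but never pooled, so it receives no judgement in this sketch.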
The TREC-8 collection
News documents.
qrels pooled the top 100 documents retrieved by the TREC-8 ad-hoc runs.
50 queries, average of 1,736 qrels (94.56 positive).
Positive and negative qrels.
How we used it
All 116 runs from the TREC-8 ad-hoc track.
Only first 100 qrels per query.
Distance vs. Relevance within the qrels
Adding Pseudo-qrels for Evaluation
1. Gather closest distances

          q1                q2
      qrel11  qrel12    qrel21  qrel22
d1    0.89    0.75      0.04    0.35
d2    0.12    0.43      0.84    0.45
d3    0.34    0.56      0.75    0.92

=⇒

distance  doc  query
0.75      d1   q1
0.04      d1   q2
0.12      d2   q1
0.45      d2   q2
0.34      d3   q1
0.75      d3   q2

2. Sort distances

distance  doc  query
0.04      d1   q2
0.12      d2   q1
0.34      d3   q1
0.45      d2   q2
0.75      d1   q1
0.75      d3   q2

3. Set threshold

Pairs in the sorted list whose distance falls below the threshold are added as pseudo-qrels.
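The thresholding step can be sketched as below, using the closest distances from the example. The threshold value of 0.4, the inclusive comparison and the function name are assumptions made for illustration; the slides do not state them.

```python
from collections import defaultdict

# Closest distance from each document to any qrel of each query,
# taken from the toy example: (distance, doc, query).
closest = [
    (0.75, "d1", "q1"), (0.04, "d1", "q2"),
    (0.12, "d2", "q1"), (0.45, "d2", "q2"),
    (0.34, "d3", "q1"), (0.75, "d3", "q2"),
]

def select_pseudo_qrels(closest, threshold):
    """Return {query: [docs]} for pairs at or below the threshold."""
    selected = defaultdict(list)
    for distance, doc, query in sorted(closest):  # nearest pairs first
        if distance <= threshold:
            selected[query].append(doc)
    return dict(selected)

# Hypothetical threshold, chosen only so that the example keeps three pairs.
pseudo = select_pseudo_qrels(closest, threshold=0.4)
print(pseudo)  # {'q2': ['d1'], 'q1': ['d2', 'd3']}
```

Lowering the threshold adds fewer but closer documents; raising it adds more documents at a higher risk of including non-relevant ones.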
Evaluation Method
Method
1 Oracle: Evaluate the sample runs using all the available qrels.
2 Select a subset of qrels and expand it with pseudo-qrels.
3 Compute Kendall’s Tau between Oracle and (2).
OHSUMED runs
16 algorithms from Terrier 3.5.
TREC runs
All 116 runs from TREC-8 ad-hoc track.
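Step 3 of the method compares two system rankings. The sketch below computes Kendall's tau in its simplest form (tau-a, with no tie correction); the slides do not say which variant was used, and the run names and scores are made up for illustration.

```python
from itertools import combinations

def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score dicts over the same systems."""
    systems = sorted(scores_a)
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        # Positive product: both evaluations order the pair the same way.
        sign = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical effectiveness scores for four runs.
oracle = {"runA": 0.31, "runB": 0.28, "runC": 0.25, "runD": 0.19}
expanded = {"runA": 0.22, "runB": 0.24, "runC": 0.17, "runD": 0.11}
tau = kendall_tau(oracle, expanded)
```

Here only the pair (runA, runB) is ordered differently, so five of the six pairs are concordant and tau is (5 − 1) / 6. A value of 1.0 would mean the expanded qrels rank the runs exactly as the oracle does.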
Evaluation Results — OHSUMED
Evaluation Results — TREC8
Evaluation Results — TREC8 (zoom)
Evaluation Results — TREC8 (different initial qrels)
Conclusions
Conclusions
Small evaluation improvement when adding documents similar to existing qrels.
The approach can be used when there are no negative qrels.
Further Work
Try other thresholds.
Try other distance metrics.
Double-check with human judgements.
Thank You
Questions?
Further information about our research: http://web.science.mq.edu.au/~diego/medicalnlp/
Diego Iman David