Tie-Breaking Bias: Effect of an Uncontrolled Parameter on Information Retrieval Evaluation

Guillaume Cabanac, Gilles Hubert,

Mohand Boughanem, Claude Chrisment

CLEF’10: Conference on Multilingual and Multimodal Information Access Evaluation

September 20-23, Padua, Italy

Outline

1. Motivation A tale about two TREC participants

2. Context IRS effectiveness evaluation

Issue Tie-breaking bias effects

3. Contribution Reordering strategies

4. Experiments Impact of the tie-breaking bias

5. Conclusion and Future Works

Effect of the Tie-Breaking Bias G. Cabanac et al.

A tale about two TREC participants (1/2)

1. Motivation Tie-breaking bias illustration G. Cabanac et al.

Topic 031 “satellite launch contracts”: 5 relevant documents

Chris vs. Ellen: one single difference

Why such a huge difference? Chris: unlucky. Ellen: lucky.

A tale about two TREC participants (2/2)

Only difference: the name of one document

After 15 days of hard work

Measuring the effectiveness of IRSs

User-centered vs. system-focused [Spärck Jones & Willett, 1997]

Evaluation campaigns

1958: Cranfield (UK)

1992: TREC, Text Retrieval Conference (USA)

1999: NTCIR, NII Test Collection for IR Systems (Japan)

2001: CLEF, Cross-Language Evaluation Forum (Europe)

…

“Cranfield” methodology

Task

Test collection: corpus, topics, qrels

Measures: MAP, P@X ... using trec_eval

2. Context & issue Tie-breaking bias G. Cabanac et al.

[Voorhees, 2007]

Runs are reordered prior to their evaluation

Qrels = qid, iter, docno, rel

Run = qid, iter, docno, rank, sim, run_id

Reordering by trec_eval: qid asc, sim desc, docno desc

Effectiveness measure (MAP, P@X, MRR…) = f (intrinsic_quality, luck)
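The reordering described on this slide (qid asc, sim desc, docno desc) can be sketched in a few lines of Python. This is only an illustration under assumptions: the RunLine type, the function name and the document identifiers are hypothetical, and the iter, rank and run_id fields of a run are left out because they do not take part in the sort keys listed above.

```python
from typing import NamedTuple


class RunLine(NamedTuple):
    """One line of a run, reduced to the fields used for reordering (hypothetical type)."""
    qid: str
    docno: str
    sim: float


def conventional_reorder(run):
    """Sort by qid asc, then sim desc, then docno desc.

    Python's sort is stable, so we sort by the least significant key first
    and by the most significant key last.
    """
    out = sorted(run, key=lambda r: r.docno, reverse=True)  # docno desc
    out = sorted(out, key=lambda r: r.sim, reverse=True)    # sim desc
    out = sorted(out, key=lambda r: r.qid)                  # qid asc
    return out


# Made-up document identifiers: two of them are tied with sim = 0.8.
run = [
    RunLine("031", "AP880212-0001", 0.8),
    RunLine("031", "WSJ870101-0001", 0.8),
    RunLine("031", "AP880301-0002", 0.9),
]
ordered = [r.docno for r in conventional_reorder(run)]
print(ordered)
```

Within the tied group, the document whose name sorts last alphabetically is ranked first, regardless of its relevance. That is the uncontrolled parameter the talk is about.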

Consequences of run reordering

Measures of effectiveness for an IRS s

RR(s,t): 1/rank of the 1st relevant document, for topic t

P(s,t,d): precision at document d, for topic t

AP(s,t): average precision for topic t

MAP(s): mean average precision
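As a rough illustration of the measures listed above, here is a minimal Python sketch. It assumes binary relevance and that AP is normalised by the number of relevant documents in the qrels; the function names and the toy data are hypothetical and are not taken from trec_eval.

```python
def reciprocal_rank(ranked, relevant):
    """RR(s,t): 1/rank of the first relevant document, 0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    """AP(s,t): mean of P(s,t,d) over the relevant documents d of topic t."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs, qrels):
    """MAP(s): mean of AP(s,t) over all topics t of the qrels."""
    aps = [average_precision(runs.get(t, []), qrels[t]) for t in qrels]
    return sum(aps) / len(aps) if aps else 0.0


# Toy data: ranked lists per topic and sets of relevant documents.
runs = {"t1": ["d1", "d2", "d3", "d4"], "t2": ["d5"]}
qrels = {"t1": {"d2", "d4"}, "t2": {"d5"}}

rr_t1 = reciprocal_rank(runs["t1"], qrels["t1"])    # first relevant doc at rank 2
ap_t1 = average_precision(runs["t1"], qrels["t1"])  # (1/2 + 2/4) / 2
map_s = mean_average_precision(runs, qrels)
print(rr_t1, ap_t1, map_s)
```

All three measures depend on the rank of each relevant document, so any change in how tied documents are ordered can change their values.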

Tie-breaking bias

Is the Wall Street Journal collection more relevant than Associated Press?

Problem 1: comparing 2 systems, AP(s1, t) vs. AP(s2, t)

Problem 2: comparing 2 topics, AP(s, t1) vs. AP(s, t2)

Chris vs. Ellen

3. Contribution Reordering strategies G. Cabanac et al.

Sensitive to document

Alternative unbiased reordering strategies

Conventional reordering (TREC): ties sorted Z → A

qid asc, sim desc, docno desc

Realistic reordering: relevant docs last

qid asc, sim desc, rel asc, docno desc

Optimistic reordering: relevant docs first

qid asc, sim desc, rel desc, docno desc
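The three reordering strategies can be compared on a toy result list. The sketch below is an illustration only: it handles a single topic (so the qid key is omitted), represents each result as a hypothetical (docno, sim, rel) tuple with made-up document names, and uses a simple AP computation that assumes binary relevance.

```python
def reorder(results, strategy):
    """Reorder (docno, sim, rel) tuples of one topic.

    conventional: sim desc, docno desc
    realistic:    sim desc, rel asc,  docno desc (relevant docs last in a tie)
    optimistic:   sim desc, rel desc, docno desc (relevant docs first in a tie)
    """
    out = sorted(results, key=lambda r: r[0], reverse=True)  # docno desc
    if strategy == "realistic":
        out = sorted(out, key=lambda r: r[2])                # rel asc
    elif strategy == "optimistic":
        out = sorted(out, key=lambda r: r[2], reverse=True)  # rel desc
    elif strategy != "conventional":
        raise ValueError(f"unknown strategy: {strategy}")
    return sorted(out, key=lambda r: r[1], reverse=True)     # sim desc


def average_precision(ranked, n_relevant):
    """AP over a ranked list of (docno, sim, rel) tuples."""
    hits, total = 0, 0.0
    for rank, (_, _, rel) in enumerate(ranked, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / n_relevant if n_relevant else 0.0


# Made-up results: three documents tied at sim = 0.5, one of them relevant.
results = [("AP-1", 0.9, 0), ("WSJ-2", 0.5, 0), ("LA-3", 0.5, 1), ("AP-4", 0.5, 0)]

ap = {s: average_precision(reorder(results, s), n_relevant=1)
      for s in ("realistic", "conventional", "optimistic")}
print(ap)
```

On this example the relevant document lands at rank 4, 3 or 2 depending on the strategy, so the same run gets three different AP values.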

ex aequo

Effect of the tie-breaking bias

Study of 4 TREC tasks: 22 editions, 1360 runs

Assessing the effect of tie-breaking

Proportion of document ties: how frequent is the bias?

Effect on measure values: top 3 observed differences; observed difference in %; significance of the observed difference with Student’s t-test (paired, one-tailed)

[Figure: timeline of the studied TREC editions between 1993 and 2004; task labels: routing, web, filtering]

3 GB of data from trec.nist.gov

4. Experiments Impact of the tie-breaking bias G. Cabanac et al.

Ties demographics

89.6% of the runs comprise ties

Ties are present all along the runs

Proportion of tied documents in submitted runs

On average, 10.6 docs in a tied group of docs

On average, 25.2% of a result list = tied documents
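Statistics of this kind can be computed per result list along the following lines. This is a sketch under assumptions: documents are considered tied when their sim values are exactly equal, a tied group is any group of two or more such documents, and the function name is hypothetical. How the figures above were averaged over topics and runs is not specified on the slide.

```python
from collections import Counter


def tie_stats(sims):
    """Return (proportion of tied documents, average size of a tied group)
    for the sim values of one result list."""
    counts = Counter(sims)
    tied_groups = [n for n in counts.values() if n > 1]  # groups of 2+ documents
    tied_docs = sum(tied_groups)
    proportion = tied_docs / len(sims) if sims else 0.0
    avg_group_size = tied_docs / len(tied_groups) if tied_groups else 0.0
    return proportion, avg_group_size


# Made-up sim values: one tied group of 3 documents and one of 2.
sims = [0.9, 0.7, 0.7, 0.7, 0.4, 0.2, 0.2, 0.1]
proportion, avg_group_size = tie_stats(sims)
print(proportion, avg_group_size)
```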

Effect on Reciprocal Rank (RR)

Effect on Average Precision (AP)

Effect on Mean Average Precision (MAP)

Difference of ranks computed on MAP not significant (Kendall’s τ)
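One way to compare the system rankings induced by two reordering strategies is Kendall's τ over the systems' MAP values. The sketch below implements the simple τ-a variant in pure Python; the function name and the MAP values are made up for illustration, and the slides do not say which variant or tool was used.

```python
from itertools import combinations


def kendall_tau_a(scores_a, scores_b):
    """Kendall's tau-a between two score dictionaries keyed by system name.

    A pair of systems is concordant when both score sets order it the same
    way, discordant when they order it in opposite ways.
    """
    systems = sorted(scores_a)
    concordant = discordant = 0
    for s1, s2 in combinations(systems, 2):
        product = (scores_a[s1] - scores_a[s2]) * (scores_b[s1] - scores_b[s2])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs if n_pairs else 0.0


# Hypothetical MAP values of four systems under two reordering strategies.
map_conventional = {"s1": 0.30, "s2": 0.25, "s3": 0.20, "s4": 0.10}
map_realistic = {"s1": 0.28, "s2": 0.19, "s3": 0.20, "s4": 0.09}

tau = kendall_tau_a(map_conventional, map_realistic)
print(tau)
```

A value of 1 means both strategies rank the systems identically. Here only the pair (s2, s3) is swapped, which gives 5 concordant pairs and 1 discordant pair out of 6.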

What we learnt: beware of tie-breaking for AP

Small effect on MAP, larger effect on AP

Measure bounds: AP_Realistic ≤ AP_Conventional ≤ AP_Optimistic

Failure analysis for the ranking process

Error bar = element of chance → potential for improvement

padre1, adhoc’94

Related works in IR evaluation

[Voorhees, 2007]

Topics reliability? [Buckley & Voorhees, 2000] 25; [Voorhees & Buckley, 2002] error rate; [Voorhees, 2009] n collections

Qrels reliability? [Voorhees, 1998] quality; [Al-Maskari et al., 2008] TREC vs. TREC

Measures reliability? [Buckley & Voorhees, 2000] MAP; [Sakai, 2008] ‘system bias’; [Moffat & Zobel, 2008] new measures

[Raghavan et al., 1989] Precall; [McSherry & Najork, 2008] Tied scores

Pooling reliability? [Zobel, 1998] approximation; [Sanderson & Joho, 2004] manual; [Buckley et al., 2007] size adaptation; [Cabanac et al., 2010] tie-breaking bias

Conclusions and future works

Context: IR evaluation

TREC and other campaigns based on trec_eval

Contributions

Measure = f (intrinsic_quality, luck) → tie-breaking bias

Measure bounds (realistic ≤ conventional ≤ optimistic)

Study of the tie-breaking bias effect (conventional, realistic) for RR, AP and MAP

Strong correlation, yet significant difference

No difference on system rankings (based on MAP)

Future works

Study of other / more recent evaluation campaigns

Reordering-free measures

Finer grained analyses: finding vs. ranking

Thank you
