CLEF 2005: Multilingual Retrieval by Combining Multiple Multilingual Ranked Lists


Page 1: CLEF 2005: Multilingual Retrieval by Combining Multiple Multilingual Ranked Lists

CLEF 2005: Multilingual Retrieval by Combining Multiple Multilingual Ranked Lists

Luo Si and Jamie Callan
Language Technology Institute, School of Computer Science
Carnegie Mellon University

CLEF 2005

Page 2

Task Definition

• Multi-8 Two Years On:
– multilingual information retrieval

• Multi-8 Merging Only:
– participants merge provided bilingual ranked lists

Page 3

Task 1: Multilingual Retrieval System
• Method Overview

Page 4

Task 1: Multilingual Retrieval System
• Text Preprocessing
– Stop words
– Stemming
– Decompounding
– Word translation
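The first two preprocessing steps can be sketched as follows; the stop-word list and the stemmer are tiny illustrative stand-ins for the actual CLEF resources, and decompounding and word translation are omitted for brevity:

```python
# Illustrative sketch of per-language text preprocessing: stop-word
# removal and stemming. STOPWORDS and stem() are stand-ins, not the
# actual CLEF stop lists or stemmers.

STOPWORDS = {"the", "a", "of", "and"}

def stem(token: str) -> str:
    # Crude suffix stripping as a placeholder for a real stemmer (e.g. Porter).
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text: str) -> list:
    tokens = [t.lower() for t in text.split()]
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [stem(t) for t in tokens]

print(preprocess("Merging of the ranked lists"))  # ['merg', 'rank', 'list']
```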

Page 5

Task 1: Multilingual Retrieval System
• Method Overview

Page 6

Task 1: Multilingual Retrieval System
• Method 1:
– Multilingual retrieval via query translation, no query feedback; raw score merge and Okapi system

• Method 2:
– Multilingual retrieval via query translation, with query feedback; raw score merge and Okapi system

• Method 3:
– Multilingual retrieval via document translation, no query feedback; raw score merge and Okapi system

• Method 4:
– Multilingual retrieval via document translation, with query feedback; raw score merge and Okapi system

• Method 5: UniNE system
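The raw score merge shared by Methods 1 through 4 can be sketched as follows: concatenate the per-language ranked lists and re-sort every document by its raw retrieval score. Document ids and scores here are fabricated for illustration:

```python
# Minimal sketch of a "raw score merge" across bilingual ranked lists.

def raw_score_merge(ranked_lists):
    """ranked_lists: list of lists of (doc_id, raw_score) pairs."""
    merged = [pair for lst in ranked_lists for pair in lst]
    return sorted(merged, key=lambda p: p[1], reverse=True)

english = [("e1", 12.3), ("e2", 9.8)]
french = [("f1", 11.0), ("f2", 7.5)]
print(raw_score_merge([english, french]))
# [('e1', 12.3), ('f1', 11.0), ('e2', 9.8), ('f2', 7.5)]
```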

Page 7

Task 1: Multilingual Retrieval System
• Method Overview

Page 8

Task 1: Multilingual Retrieval System
• Normalization
– Let drsk_mj denote the raw document score of the jth document retrieved from the mth ranked list for the kth query; each list's raw scores are normalized before combination.
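The normalization formula itself is not in the transcript; this sketch assumes the common min-max form, which rescales each ranked list's raw scores into [0, 1]:

```python
# Per-list score normalization, assuming a min-max form (the slide's
# exact formula is not preserved in the transcript).

def minmax_normalize(scores):
    lo, hi = min(scores), max(scores)
    if hi == lo:  # degenerate list: all scores equal
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

print(minmax_normalize([12.3, 9.8, 7.5]))  # best score maps to 1.0, worst to 0.0
```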

Page 9

Task 1: Multilingual Retrieval System
• Method Overview

Page 10

Task 1: Multilingual Retrieval System
• Combine Multilingual Ranked Lists
– (wm, rm) represent the weight of the vote and the exponential normalization factor for the mth ranked list
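One plausible reading of the combination rule, assumed here rather than taken from the slide's (unpreserved) formula, is that list m votes with weight wm on its normalized scores raised to the power rm:

```python
# Hedged sketch of combining multilingual ranked lists: each list m
# contributes w_m * (normalized score) ** r_m per document. The
# functional form is an assumed reading of (w_m, r_m), not the slide's
# exact formula. Document ids and scores are illustrative.

def combine_lists(lists, weights, exponents):
    """lists: sequence of dicts mapping doc_id -> normalized score."""
    combined = {}
    for scores, w, r in zip(lists, weights, exponents):
        for doc, s in scores.items():
            combined[doc] = combined.get(doc, 0.0) + w * (s ** r)
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

list_a = {"d1": 1.0, "d2": 0.5}
list_b = {"d2": 0.8, "d3": 0.3}
print(combine_lists([list_a, list_b], [0.6, 0.4], [1.0, 1.0]))
# d2 ranks first: 0.6 * 0.5 + 0.4 * 0.8 = 0.62 beats d1's 0.6
```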

Page 11

Task 1: Experimental Results: Multilingual Retrieval

• Qry/Doc: what was translated
• fb/nofb: with/without pseudo-relevance feedback
• UniNE: UniNE system

Page 12

Task 1: Experimental Results: Multilingual Retrieval

• MX: combined models
• W1/Trn: equal or learned weights

Page 13

Task 2: Results Merge for Multilingual Retrieval

• Merge ranked lists of eight different languages (i.e., bilingual or monolingual runs) into a single final list

• Logistic model on (rank, doc score):
– language-specific methods
– query-specific & language-specific methods

Page 14

Task 2: Results Merge for Multilingual Retrieval

• Learn a query-independent and language-specific merging model

• Estimates the probability of relevance of document dk_ij

• Model parameters estimated by:
– maximizing the log-likelihood (MLE)
– maximizing MAP
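A minimal sketch of such a query-independent, language-specific model, assuming a logistic function of (rank, score) fit by gradient-ascent MLE; the training pairs are fabricated, and rank is scaled by 100 purely for numeric stability:

```python
import math

# Logistic model mapping (rank, score) of a bilingual result to an
# estimated probability of relevance, fit by maximizing log-likelihood
# with plain stochastic gradient ascent.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(examples, lr=0.1, epochs=500):
    """examples: list of ((rank, score), relevant) pairs."""
    a = b = c = 0.0
    for _ in range(epochs):
        for (rank, score), rel in examples:
            x = rank / 100.0
            p = sigmoid(a + b * x + c * score)
            err = rel - p  # gradient of the log-likelihood w.r.t. the logit
            a += lr * err
            b += lr * err * x
            c += lr * err * score
    return a, b, c

def prob_relevant(params, rank, score):
    a, b, c = params
    return sigmoid(a + b * rank / 100.0 + c * score)

train = [((1, 0.9), 1), ((2, 0.7), 1), ((50, 0.2), 0), ((80, 0.1), 0)]
params = fit_logistic(train)
print(prob_relevant(params, 1, 0.9), prob_relevant(params, 80, 0.1))
```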

Page 15

Task 2: Results Merge for Multilingual Retrieval

• Learn a query-specific and language-specific merging model
– Calculate comparable scores for top-ranked documents in each language

(1) Combine scores of the query-translation-based and document-translation-based retrieval methods

(2) Build language-specific, query-specific logistic models to transform language-specific scores into comparable scores

Page 16

Task 2: Results Merge for Multilingual Retrieval

(2) Build language-specific, query-specific logistic models to transform language-specific scores into comparable scores

• Logistic model parameter estimation
– minimize the mean squared error between the exact normalized comparable scores and the estimated comparable scores

• Estimate comparable scores for all retrieved documents in each language

• Use comparable scores to create a merged multilingual result list
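A hedged sketch of step (2): fit, by gradient descent on the mean squared error, a logistic transform from a language-specific score to a comparable score. The (language score, exact comparable score) training pairs below are illustrative stand-ins for the downloaded top documents:

```python
import math

# Per-language, per-query logistic transform to comparable scores,
# fit by minimizing MSE on documents whose exact comparable scores
# are known.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_transform(pairs, lr=0.5, epochs=2000):
    """pairs: (language_specific_score, exact_comparable_score) tuples."""
    a = b = 0.0
    for _ in range(epochs):
        for x, y in pairs:
            p = sigmoid(a + b * x)
            grad = (p - y) * p * (1.0 - p)  # d(squared error)/d(logit)
            a -= lr * grad
            b -= lr * grad * x
    return lambda x: sigmoid(a + b * x)

transform = fit_transform([(0.9, 0.95), (0.5, 0.6), (0.1, 0.2)])
print(transform(0.9), transform(0.1))  # higher language score -> higher comparable score
```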

Page 17

Task 2: Experimental Results: Results Merge
• Query-independent, language-specific

• Mean average precision of merged multilingual lists of different methods on UniNE result lists

• Mean average precision of merged multilingual lists of different methods on HummingBird result lists

Estimating the parameters by maximizing MAP is more accurate than MLE.

Page 18

Task 2: Experimental Results: Results Merge
• Query-specific, language-specific

• Mean average precision of merged multilingual lists of different methods on UniNE result lists

• C_X: top X docs from each list merged by exact comparable scores

• Top_X_0.5: top X docs from each list downloaded for the logistic model to estimate comparable scores, which are then combined with the exact scores with equal weight

This means that the combination of estimated comparable scores and exact comparable scores can be more accurate than exact comparable scores alone in some cases.
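The equal-weight averaging in the Top_X_0.5 variant can be sketched as follows; document ids and scores are illustrative:

```python
# For the top-X downloaded documents, both an exact comparable score
# and a model-estimated comparable score exist; Top_X_0.5 averages
# them with equal weight (0.5 each).

def combine_equal_weight(exact, estimated):
    """exact, estimated: dicts mapping doc_id -> comparable score."""
    return {d: 0.5 * exact[d] + 0.5 * estimated[d] for d in exact}

exact = {"d1": 0.9, "d2": 0.4}
estimated = {"d1": 0.8, "d2": 0.5}
print(combine_equal_weight(exact, estimated))  # d1 -> ~0.85, d2 -> ~0.45
```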

Page 19

Task 2: Experimental Results: Results Merge
• Query-specific, language-specific

• Mean average precision of merged multilingual lists of different methods on HummingBird result lists

• Outperforms the query-independent and language-specific algorithm