CLEF 2005: Multilingual Retrieval by Combining Multiple Multilingual Ranked Lists


CLEF 2005: Multilingual Retrieval by Combining Multiple Multilingual Ranked Lists

Luo Si and Jamie Callan
Language Technology Institute, School of Computer Science
Carnegie Mellon University

CLEF 2005

Task Definition

• Multi-8 Two Years On:
  – multilingual information retrieval

• Multi-8 Merging Only:
  – participants merge provided bilingual ranked lists

Task 1: Multilingual Retrieval System
• Method Overview

Task 1: Multilingual Retrieval System
• Text Preprocessing

  – Stop words
  – Stemming
  – Decompounding
  – Word translation
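A minimal sketch of how such a preprocessing chain might be wired together; every helper here (stopwords, stemmer, decompounder, lexicon) is a hypothetical placeholder, not a component of the actual CLEF 2005 system:

    # Hypothetical sketch of the preprocessing steps listed above.
    def preprocess(terms, stopwords, stemmer, decompounder, lexicon):
        out = []
        for term in terms:
            if term.lower() in stopwords:              # stop word removal
                continue
            for piece in decompounder(term):           # decompounding (e.g., German)
                stem = stemmer(piece)                  # language-specific stemming
                out.extend(lexicon.get(stem, [stem]))  # word-by-word translation
        return out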

Task 1: Multilingual Retrieval System
• Method Overview

Task 1: Multilingual Retrieval System
• Method 1:
  – Multilingual retrieval via query translation, no query feedback; raw score merge and Okapi system

• Method 2:
  – Multilingual retrieval via query translation, with query feedback; raw score merge and Okapi system

• Method 3:
  – Multilingual retrieval via document translation, no query feedback; raw score merge and Okapi system

• Method 4:
  – Multilingual retrieval via document translation, with query feedback; raw score merge and Okapi system

• Method 5:
  – UniNE system

Task 1: Multilingual Retrieval System
• Method Overview

Task 1: Multilingual Retrieval System
• Normalization

  – Let drs^k_mj denote the raw document score of the jth document retrieved from the mth ranked list for the kth query; each list's raw scores are normalized before combination.
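The normalization formula itself appeared only as an image on the original slide; a standard per-list min-max normalization of the following form is consistent with the definition above (shown here as an assumption, not copied from the slide):

    drs'^{k}_{mj} = \frac{drs^{k}_{mj} - \min_{j'} drs^{k}_{mj'}}{\max_{j'} drs^{k}_{mj'} - \min_{j'} drs^{k}_{mj'}}

so that every list's scores fall into [0, 1] before they are combined.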

Task 1: Multilingual Retrieval System
• Method Overview

Task 1: Multilingual Retrieval System
• Combine Multilingual Ranked Lists

  – (w_m, r_m) represent the weight of the vote and the exponential normalization factor for the mth ranked list
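The combination formula was also an image on the original slide; one plausible reading of "weight of the vote" and "exponential normalization factor" is a weighted sum over lists of normalized scores raised to a per-list exponent (an illustrative assumption, not the slide's exact formula):

    S^{k}_{j} = \sum_{m} w_m \, (drs'^{k}_{mj})^{r_m}

where documents absent from the mth list contribute nothing for that m.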

Task 1: Experimental Results: Multilingual Retrieval

• Qry/Doc: whether queries or documents were translated
• fb/nofb: with/without pseudo relevance feedback
• UniNE: UniNE system

Task 1: Experimental Results: Multilingual Retrieval

• MX: combined models
• W1/Trn: equal or learned weights

Task 2: Results Merge for Multilingual Retrieval

• Merge ranked lists from eight different languages (provided bilingual or monolingual runs) into a single final list

• Query-independent, language-specific method: logistic model on (rank, doc score)

• Query-specific and language-specific method

Task 2: Results Merge for Multilingual Retrieval

• Learn Query-Independent and Language-Specific Merging Model

• Estimated probability of relevance of document d_k_ij (the jth document from the ith language list for the kth query)

• Model parameters estimated by either:

  – maximizing the log-likelihood (MLE), or

  – maximizing mean average precision (MAP) of the merged list
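A hedged sketch of one plausible form of this query-independent model: a logistic function of a document's score and rank in its source list, with parameters fit separately for each language (the exact feature transform is not visible in the transcript, so the linear form below is an assumption):

    P(rel \mid d_{k\_ij}) = \frac{1}{1 + \exp\big(-(a_i + b_i \, score_{k\_ij} + c_i \, rank_{k\_ij})\big)}

with a_i, b_i, c_i estimated per language i, either by maximizing the log-likelihood of training judgments (MLE) or by directly maximizing the mean average precision of the merged list (MAP).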

Task 2: Results Merge for Multilingual Retrieval

• Learn Query-Specific and Language-Specific Merging Model

– Calculate comparable scores for top ranked documents in each language

(1) Combine scores from the query-translation and document-translation retrieval methods

(2) Build language-specific query-specific logistic models to transform language-specific scores to comparable scores

Task 2: Results Merge for Multilingual Retrieval

(2) Build language-specific query-specific logistic models to transform language-specific scores to comparable scores

• Logistic model parameter estimation:
  – minimize the mean squared error between the exact normalized comparable scores and the estimated comparable scores
• Estimate comparable scores for all retrieved documents in each language
• Use comparable scores to create a merged multilingual result list
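A minimal Python sketch of this query-specific, language-specific merging step, assuming that for one query we already have the full language-specific ranked lists and exact comparable scores for the downloaded top documents; the names are hypothetical and this is not the authors' code:

    import numpy as np
    from scipy.optimize import curve_fit

    def logistic(score, a, b):
        # Map a language-specific score to an estimated comparable score.
        return 1.0 / (1.0 + np.exp(-(a + b * score)))

    def fit_query_specific_model(lang_scores, exact_comparable):
        # Least-squares fit of (a, b): minimizes the mean squared error between
        # exact and estimated comparable scores of the top-ranked documents.
        params, _ = curve_fit(logistic,
                              np.asarray(lang_scores, dtype=float),
                              np.asarray(exact_comparable, dtype=float),
                              p0=[0.0, 1.0], maxfev=10000)
        return params

    def merge_one_query(ranked_lists, exact_top_scores):
        # ranked_lists: {lang: [(doc_id, lang_specific_score), ...]}  full lists
        # exact_top_scores: {lang: {doc_id: exact_comparable_score}}  top docs only
        merged = []
        for lang, ranked in ranked_lists.items():
            top = [(s, exact_top_scores[lang][d])
                   for d, s in ranked if d in exact_top_scores[lang]]
            a, b = fit_query_specific_model([s for s, _ in top],
                                            [c for _, c in top])
            # Estimate comparable scores for every retrieved document in this language.
            merged.extend((doc_id, float(logistic(s, a, b))) for doc_id, s in ranked)
        merged.sort(key=lambda pair: pair[1], reverse=True)
        return merged

In the Top_X_0.5 condition reported below, the estimated comparable scores of the downloaded top documents are additionally averaged with their exact comparable scores (equal weights) before merging.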

Task 2: Experimental Results: Results Merge
• Query-independent, language-specific
• Mean average precision of merged multilingual lists of different methods on UniNE result lists

• Mean average precision of merged multilingual lists of different methods on HummingBird result lists

Training the merging model by maximizing MAP is more accurate than training by MLE.

Task 2: Experimental Results: Results Merge
• Query-specific, language-specific

• Mean average precision of merged multilingual lists of different methods on UniNE result lists

• C_X: top X docs from each list merged by exact comparable scores.

• Top_X_0.5: the top X docs from each list are downloaded to train the logistic model that estimates comparable scores; estimated and exact comparable scores are then combined with equal weight

This means that combining estimated and exact comparable scores can be more accurate than using exact comparable scores alone in some cases.

Task 2: Experimental Results: Results Merge
• Query-specific, language-specific
• Mean average precision of merged multilingual lists of different methods on HummingBird result lists

• Outperforms the query-independent, language-specific algorithm