13/05/07, LIST – DTSI – Interfaces, Cognitics and Virtual Reality Unit
The INFILE project: a crosslingual filtering systems evaluation campaign
Romaric Besançon, Stéphane Chaudiron, Djamel Mostefa, Ismaïl Timimi, Khalid Choukri
Overview
Goals and features of the INFILE campaign
Test collections:
  Documents
  Topics
  Assessments
Evaluation protocol
  Evaluation procedure
  Evaluation metrics
Conclusions
Goals and features of the INFILE Campaign
Information Filtering Evaluation
  filter documents according to long-term information needs (user profiles, i.e. topics)
  Adaptive: use simulated user feedback
  Following the TREC adaptive filtering task
Crosslingual
  three languages: English, French, Arabic
Close to the real activity of competitive intelligence (CI) professionals
  in particular, profiles developed by CI professionals (STI)
Pilot track in CLEF 2008
Test Collection
Built from a corpus of news from the AFP (Agence France Presse)
  almost 1.5 million news items in French, English and Arabic
For the information filtering task:
  100 000 documents to filter, in each language
NewsML format
  standard XML format for news (IPTC)
Document example
[Figure, spread over two slides: a sample NewsML document with callouts marking the document identifier, keywords, headline, location, IPTC category, AFP category and content]
Profiles
50 interest profiles
  20 profiles in the domain of science and technology
    developed by CI professionals from INIST, ARIST, Oto Research, Digiport
  30 profiles of general interest
Profiles
Each profile contains 5 fields:
  title: a few words description
  description: a one-sentence description
  narrative: a longer description of what is considered a relevant document
  keywords: a set of key words, key phrases or named entities
  sample: a sample of relevant document (one paragraph)
Participants may use any subset of the fields for their filtering
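
The five-field profile can be pictured as a small record from which a participant picks the fields to use. The sketch below is only an illustration: the class name `Profile`, the helper `build_query` and the sample content are hypothetical and do not reflect the campaign's actual file format.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Profile:
    """Hypothetical in-memory view of one interest profile (five fields)."""
    title: str
    description: str
    narrative: str
    keywords: tuple
    sample: str


def build_query(profile, use=("title", "keywords")):
    """Concatenate the chosen subset of profile fields into one query string."""
    valid = {f.name for f in fields(Profile)}
    unknown = set(use) - valid
    if unknown:
        raise ValueError(f"unknown profile fields: {sorted(unknown)}")
    parts = []
    for name in use:
        value = getattr(profile, name)
        # keywords is a tuple of terms; every other field is plain text
        parts.append(" ".join(value) if isinstance(value, tuple) else value)
    return " ".join(parts)


# Invented placeholder content, for illustration only.
example = Profile(
    title="service robots",
    description="News about service robots for home use.",
    narrative="Relevant documents report on products or research results.",
    keywords=("robot", "home automation"),
    sample="A company presented a new household robot today.",
)
query = build_query(example, use=("title", "keywords"))
```

A system that ignores the sample document would simply leave `"sample"` out of `use`.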
Constitution of the corpus
To build the corpus of documents to filter:
  find relevant documents for the profiles in the original corpus
  use a pooling technique with results of IR tools
The whole corpus is indexed with 4 IR engines (Lucene, Indri, Zettair and the CEA search engine)
Each search engine is queried independently using each of the 5 profile fields, all fields together, and all fields but the sample
  4 engines × 7 query variants = 28 runs
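
The count of 28 runs follows from crossing the four engines with seven query variants. A minimal sketch of that enumeration is given below; the identifiers are illustrative labels, not names used by the campaign.

```python
from itertools import product

ENGINES = ("Lucene", "Indri", "Zettair", "CEA")
FIELDS = ("title", "description", "narrative", "keywords", "sample")


def query_variants(field_names=FIELDS):
    """Each field alone, all fields together, and all fields but the sample."""
    variants = [(name,) for name in field_names]
    variants.append(tuple(field_names))
    variants.append(tuple(n for n in field_names if n != "sample"))
    return variants


def enumerate_runs():
    """One run per (engine, query variant) pair."""
    return list(product(ENGINES, query_variants()))


runs = enumerate_runs()
```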
Constitution of the corpus (2)
pooling using a "Mixture of Experts" model
  the first 10 documents of each run are taken
  first pool assessed
  a score is computed for each run and each topic according to the assessments of the first pool
  create next pool by merging runs using a weighted sum
    weights are proportional to the score
  ongoing assessments
keep all documents assessed
  documents returned by IR systems but judged not relevant form a set of difficult documents
choose random documents (noise)
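
The pooling loop above can be sketched for a single topic as follows. The slides do not say how a run is scored or what each run contributes to the weighted sum, so two assumptions are made here: a run's score is its precision over the judged documents in its top ranks, and a document at rank r contributes 1/r. All function names are hypothetical.

```python
def first_pool(runs, depth=10):
    """Union of the top `depth` documents of every run."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


def run_scores(runs, judged, depth=10):
    """Assumed score: share of relevant documents among a run's judged top ranks."""
    scores = {}
    for run_id, ranking in runs.items():
        top = [doc for doc in ranking[:depth] if doc in judged]
        scores[run_id] = sum(judged[doc] for doc in top) / len(top) if top else 0.0
    return scores


def next_pool(runs, judged, size=10, depth=10):
    """Merge runs by a weighted sum, with weights proportional to run scores."""
    scores = run_scores(runs, judged, depth)
    total = sum(scores.values())
    # Fall back to uniform weights if no run has found anything relevant yet.
    weights = {r: (s / total if total else 1.0 / len(runs)) for r, s in scores.items()}
    merged = {}
    for run_id, ranking in runs.items():
        for rank, doc in enumerate(ranking, start=1):
            if doc in judged:
                continue  # already assessed, no need to pool it again
            # Assumed contribution: reciprocal rank, scaled by the run weight.
            merged[doc] = merged.get(doc, 0.0) + weights[run_id] / rank
    ranked = sorted(merged, key=lambda doc: (-merged[doc], doc))
    return ranked[:size]
```

In use, `judged` maps each assessed document identifier to `True` or `False`, and the loop alternates between assessing a pool and calling `next_pool` again with the enlarged `judged` mapping.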
Evaluation procedure
One pass test
Interactive protocol using a client-server architecture (webservice communication)
  participant registers
  retrieves one document
  filters the document
  asks for feedback (on kept documents)
  retrieves new document
limited number of feedbacks (50)
new document available only if previous one has been filtered
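
The constraints of this protocol (one document at a time, feedback only on kept documents, a capped feedback budget) can be simulated locally. The class below is an in-memory stand-in with hypothetical method names; it is not the campaign's web service interface, and it assumes the feedback budget is counted per session.

```python
class FilteringSession:
    """In-memory stand-in for the evaluation server (hypothetical API)."""

    def __init__(self, documents, relevant_ids, max_feedback=50):
        self._docs = list(documents)      # list of (doc_id, text) pairs
        self._relevant = set(relevant_ids)
        self._next = 0
        self._pending = None
        self._kept = set()
        self.feedback_left = max_feedback

    def next_document(self):
        """Return the next document, or None when the stream is exhausted."""
        if self._pending is not None:
            raise RuntimeError("the previous document has not been filtered yet")
        if self._next >= len(self._docs):
            return None
        self._pending = self._docs[self._next]
        self._next += 1
        return self._pending

    def submit(self, doc_id, keep):
        """Record the keep/discard decision for the pending document."""
        if self._pending is None or self._pending[0] != doc_id:
            raise RuntimeError("decision does not match the pending document")
        if keep:
            self._kept.add(doc_id)
        self._pending = None

    def feedback(self, doc_id):
        """Relevance of a kept document, or None once the budget is spent."""
        if doc_id not in self._kept:
            raise RuntimeError("feedback is only available on kept documents")
        if self.feedback_left == 0:
            return None
        self.feedback_left -= 1
        return doc_id in self._relevant


def run_filter(session, keep_fn, learn_fn):
    """Client loop: fetch, decide, submit, then ask for feedback if kept."""
    kept = []
    while (doc := session.next_document()) is not None:
        doc_id, text = doc
        keep = keep_fn(text)
        session.submit(doc_id, keep)
        if keep:
            kept.append(doc_id)
            verdict = session.feedback(doc_id)
            if verdict is not None:
                learn_fn(text, verdict)
    return kept
```

Here `keep_fn` is the participant's filter and `learn_fn` is whatever update it performs on simulated user feedback; both are left to the caller.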
Evaluation metrics
Precision / Recall / F-measure
Utility (from TREC)

                 relevant   not relevant
retrieved           a            b
not retrieved       c            d

$P = \frac{a}{a+b}$, $R = \frac{a}{a+c}$
$F = \frac{2PR}{P+R}$
$u = w_1 \cdot a - w_2 \cdot b$
$u_n = \frac{\max(u / u_{max},\; u_{min}) - u_{min}}{1 - u_{min}}$
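
These measures translate directly into code once the four contingency counts are known. The sketch below uses the slide's symbols; the weights, the bounds `u_max` and `u_min`, and the choice to return 0.0 for an empty denominator are left to the caller or are assumptions, since the slides give no values for them.

```python
def precision(a, b):
    """P = a / (a + b); 0.0 when nothing was retrieved (assumed convention)."""
    return a / (a + b) if a + b else 0.0


def recall(a, c):
    """R = a / (a + c); 0.0 when there is no relevant document (assumed convention)."""
    return a / (a + c) if a + c else 0.0


def f_measure(p, r):
    """F = 2PR / (P + R)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def utility(a, b, w1, w2):
    """u = w1 * a - w2 * b: reward relevant retrieved, penalise non-relevant retrieved."""
    return w1 * a - w2 * b


def normalized_utility(u, u_max, u_min):
    """u_n = (max(u / u_max, u_min) - u_min) / (1 - u_min), which lies in [0, 1]."""
    if u_max <= 0 or u_min >= 1:
        raise ValueError("need u_max > 0 and u_min < 1")
    return (max(u / u_max, u_min) - u_min) / (1 - u_min)
```

The lower bound `u_min` keeps one profile with many wrongly retrieved documents from dominating the average over all profiles.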
Evaluation metrics (2)
Detection cost (from TDT)
  uses probability of missed documents and false alarms

                 relevant   not relevant
retrieved           a            b
not retrieved       c            d

$P_{miss} = \frac{c}{a+c}$
$P_{false} = \frac{b}{b+d}$
$c_{det} = c_{miss} \cdot P_{miss} \cdot P_{topic} + c_{false} \cdot P_{false} \cdot (1 - P_{topic})$
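
A minimal sketch of the detection cost, using the same contingency counts. The cost parameters and the topic prior are passed in by the caller because the slides do not state the values used in the campaign; returning 0.0 for an empty denominator is an assumed convention.

```python
def detection_cost(a, b, c, d, c_miss, c_false, p_topic):
    """c_det = c_miss * P_miss * P_topic + c_false * P_false * (1 - P_topic)."""
    p_miss = c / (a + c) if a + c else 0.0    # share of relevant documents missed
    p_false = b / (b + d) if b + d else 0.0   # share of non-relevant documents retrieved
    return c_miss * p_miss * p_topic + c_false * p_false * (1 - p_topic)
```

Unlike utility, a lower detection cost is better, and a system that retrieves nothing still pays the full miss cost.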
Evaluation metrics
per profile and averaged on all profiles
adaptivity: score evolution curve (values computed every 10 000 documents)
two experimental measures
  originality
    number of relevant documents a system uniquely retrieves
  anticipation
    inverse rank of first relevant document detected
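
The two experimental measures can be sketched for one profile as follows. The slides give only one-line definitions, so the reading of anticipation used here (rank counted among the profile's relevant documents in stream order, 0.0 if none is detected) is an assumption, and the function names are hypothetical.

```python
def originality(kept_by_system, relevant):
    """Per system: number of relevant documents that no other system retrieved."""
    scores = {}
    for name, kept in kept_by_system.items():
        others = set().union(
            *(k for other, k in kept_by_system.items() if other != name)
        )
        scores[name] = len((set(kept) & set(relevant)) - others)
    return scores


def anticipation(relevant_in_stream_order, kept):
    """Inverse rank of the first relevant document the system detected."""
    for rank, doc in enumerate(relevant_in_stream_order, start=1):
        if doc in kept:
            return 1.0 / rank
    return 0.0  # assumed value when no relevant document was detected
```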
Conclusions
INFILE campaign
  Information Filtering Evaluation: adaptive, crosslingual, close to real usage
Ongoing pilot track in CLEF 2008
  current: constitution of the corpus
  dry run mid-June
  evaluation campaign in July
  workshop in September
Work in progress
  the modelling of the filtering task assumed by the CI practitioners