
Desktop Search Evaluation

Sergey Chernov and Gianluca Demartini

TREC 2006, 16th November 2006

Pre-Track Workshop


Outline

Why do we need a Desktop Track?

What are the settings?

Does it solve THE Privacy Problem?

What do we do next?


Microsoft

Copernic

Beagle

Google

Yahoo

And roughly 20 more…

How do we compare their performance?


Proposed Track

Building the data set

Personal documents

Include activity logs containing the history of each file, query logs, email and clipboard usage, instant messenger history, ...

Activity logs and metadata should substitute for the missing hyperlink structure on a desktop


Main problem

Privacy Issue

Track will not run in 2007

Can be proposed for 2008


Questions

1 how to build the collection? (desktops from participants?)

2 how to protect privacy?

3 Data? (text docs, mails, pics, audio)

4 Tasks?

5 Topics?

6 Evaluation measures? binary or multi-graded relevance?

7 Logged information? Logged applications?


“Permanent” Information to Log

Permanent Information (Applied to)

URL (HTML)

Author (All files)

Recipients (Email messages)

Metadata tags (MP3)

Has/is attachment (Emails and attachments)

Saved picture's URL and saving time (Graphic files)


“Timeline” Information to Log

Timeline information (Applied to)

• Time of being in focus (All files)
• Time of being opened (All files)
• Being edited (All files)
• History of moving/renaming (All files)
• Request type: bookmark, clicked link, typed URL (HTML)
• Adding/editing an entry in calendar and tasks (Outlook Journal)
• Being printed (All files)
• Search queries in Google/MSN Search/Yahoo!/etc. (Browser search field)
• Clicked links (HTML)
• Text selections from the clipboard: text pieces within a file and the filename (Text files)
• Bookmarking time (Browser bookmarks)
• Instant Messenger status, contacts' statuses, sent filenames and links (IM History)
• Running applications (Task queue)
• IP address: user's address and addresses the user connects to
• Email status: change between received/read (Email client)
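As an illustration only, a few of the timeline events listed above could be stored as timestamped records, one per event. The slides do not define a log format, so the names below (EventType, TimelineEvent, history_of) and the sample file names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class EventType(Enum):
    # Hypothetical labels for a few of the timeline events on the slide.
    FOCUSED = "focused"
    OPENED = "opened"
    EDITED = "edited"
    MOVED_OR_RENAMED = "moved_or_renamed"
    PRINTED = "printed"


@dataclass(frozen=True)
class TimelineEvent:
    timestamp: datetime
    event_type: EventType
    path: str         # file the event applies to
    application: str  # application that produced the log entry


def history_of(path, events):
    """Return the events recorded for one file, oldest first."""
    return sorted((e for e in events if e.path == path), key=lambda e: e.timestamp)


# Made-up sample data, only to show the shape of a log.
events = [
    TimelineEvent(datetime(2006, 6, 12, 10, 5), EventType.EDITED, "deliverable.doc", "MS Word"),
    TimelineEvent(datetime(2006, 6, 12, 9, 58), EventType.OPENED, "deliverable.doc", "MS Word"),
    TimelineEvent(datetime(2006, 6, 13, 14, 0), EventType.PRINTED, "budget.xls", "MS Excel"),
]
file_history = history_of("deliverable.doc", events)
```

Keeping one flat record per event makes it straightforward to rebuild the history of each file by filtering and sorting, which is the kind of per-file history the proposed activity logs are meant to provide.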


Data Gathering

Data is not publicly available

Data format is known

Retrieval systems can be run on the data by the track coordinator and the results are sent back (see Spam Track)


Collection Structure

Text Documents, Emails and Instant Messages – yes

Images – ???

Audio – only metadata would be extracted

Video – no

What else?


Proposed Tasks

AdHoc Retrieval Task: find several documents containing pieces of necessary information

Known-Item Retrieval Task: find a single specific document

Folder Retrieval Task: find the folders with the relevant information


Topic Format

title: Eleonet project deliverable June

metadata: date:June topic:Eleonet project type:deliverable

task description: I am combining a new deliverable for the Eleonet project.

narrative: I am combining a new deliverable for the Eleonet project and I am looking for the last deliverable of the same type. I remember that the main contribution to this document has been done in June 2006.
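A minimal sketch of how the example topic above might be held in code, assuming the four fields shown on the slide (title, metadata, task description, narrative). The Topic class and the parse_metadata helper are hypothetical, and the parser assumes the three metadata keys of the example (date, topic, type).

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    title: str
    metadata: dict = field(default_factory=dict)
    task_description: str = ""
    narrative: str = ""


def parse_metadata(line, keys=("date", "topic", "type")):
    """Split 'date:June topic:Eleonet project type:deliverable' into a dict.

    Values may contain spaces, so the line is cut at each known 'key:' marker
    rather than at whitespace.
    """
    positions = sorted((line.find(k + ":"), k) for k in keys if k + ":" in line)
    result = {}
    for i, (start, key) in enumerate(positions):
        end = positions[i + 1][0] if i + 1 < len(positions) else len(line)
        result[key] = line[start + len(key) + 1:end].strip()
    return result


topic = Topic(
    title="Eleonet project deliverable June",
    metadata=parse_metadata("date:June topic:Eleonet project type:deliverable"),
    task_description="I am combining a new deliverable for the Eleonet project.",
)
```

The metadata dictionary could then be matched against logged properties of a file, such as its type or the month of its last edit, although the slides leave open how systems would use these fields.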


Relevance & Evaluation Measures

Use trec_eval to compute a set of common metrics

Binary relevance assessments or 3 levels?

Ranking is important:

MAP

Gain & Discount Metrics (DCG, nDCG, AWP, AGR, Q-m)

Incomplete assessments:

Bpref (/Rpref)
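To make three of the measures above concrete, here is a small sketch of average precision (MAP is its mean over topics), nDCG, and bpref for a single topic. It follows common textbook formulations and is not a substitute for trec_eval, whose exact conventions may differ. The document identifiers and grades in the usage lines are invented.

```python
import math


def average_precision(ranking, relevant):
    """AP for one topic; MAP is the mean of AP over all topics."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def dcg(gains):
    """DCG with a log2(rank + 1) discount (one common formulation)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranking, grades):
    """nDCG: DCG of the ranking divided by the DCG of the ideal ordering."""
    gains = [grades.get(doc, 0) for doc in ranking]
    ideal = sorted(grades.values(), reverse=True)[:len(ranking)]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def bpref(ranking, relevant, nonrelevant):
    """bpref: only judged documents count; unjudged ones are skipped."""
    r, n = len(relevant), len(nonrelevant)
    if r == 0:
        return 0.0
    bound = min(r, n)
    seen_nonrel, score = 0, 0.0
    for doc in ranking:
        if doc in nonrelevant:
            seen_nonrel += 1
        elif doc in relevant:
            score += 1.0 if bound == 0 else 1.0 - min(seen_nonrel, bound) / bound
    return score / r


# Invented example: d1 and d3 are relevant, d2 is judged non-relevant, d4 is unjudged.
ranking = ["d1", "d2", "d3", "d4"]
ap = average_precision(ranking, {"d1", "d3"})
graded = ndcg(ranking, {"d1": 2, "d3": 1})
bp = bpref(ranking, {"d1", "d3"}, {"d2"})
```

The graded input to ndcg corresponds to the "3 levels" option on the slide (0, 1, 2), while average_precision and bpref use binary judgments. bpref ignores unjudged documents, which is why it is listed for incomplete assessments.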


Logged Applications

Acrobat Reader

MS Word

MS Excel

MS Powerpoint

MS Internet Explorer

MS Outlook

Mozilla Firefox

Mozilla Thunderbird


The same questions again

1 how to build the collection? (desktops from participants?)

2 how to protect privacy?

3 Data? (text docs, mails, pics, audio)

4 Tasks?

5 Topics?

6 Evaluation measures?

7 Logged information? Logged applications?


Desktop Search Workshop Summary

• Strong interest – about 20 participants

• Main novelty – activity logs

• Privacy is still an issue

• We need a clear task definition (suggestion: “Find all documents related to a project”?)

• We are planning a workshop to discuss it further

• A mailing list is available – to subscribe visit https://info.l3s.uni-hannover.de/mailman/listinfo/personal-activity-search