Desktop Search Evaluation
![Page 1: Desktop Search Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022062803/568148d3550346895db5ef65/html5/thumbnails/1.jpg)
21/04/23 1Gianluca Demartini
Desktop Search Evaluation
Sergey Chernov and Gianluca Demartini
TREC 2006, 16th November 2006
Pre-Track Workshop
![Page 2: Desktop Search Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022062803/568148d3550346895db5ef65/html5/thumbnails/2.jpg)
Outline
Why do we need a Desktop Track?
What are the settings?
Does it solve THE Privacy Problem?
What do we do next?
![Page 3: Desktop Search Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022062803/568148d3550346895db5ef65/html5/thumbnails/3.jpg)
Microsoft
Copernicus
Beagle
Google
Yahoo
And roughly 20 more…
How do we compare their performance?
![Page 4: Desktop Search Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022062803/568148d3550346895db5ef65/html5/thumbnails/4.jpg)
Proposed Track
Building the DataSet
Personal documents
Include activity logs containing the history of each file, query logs, email and clipboard usage, instant messenger history, ...
Activity logs and metadata should substitute for the missing hyperlink structure on a desktop
![Page 5: Desktop Search Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022062803/568148d3550346895db5ef65/html5/thumbnails/5.jpg)
Main problem
Privacy Issue
Track will not run in 2007
Can be proposed for 2008
![Page 6: Desktop Search Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022062803/568148d3550346895db5ef65/html5/thumbnails/6.jpg)
Questions
1 how to build the collection? (desktops from participants?)
2 how to protect privacy?
3 Data? (text docs, mails, pics, audio)
4 Tasks?
5 Topics?
6 Evaluation measures? binary or multi-graded relevance?
7 Logged information? Logged applications?
![Page 7: Desktop Search Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022062803/568148d3550346895db5ef65/html5/thumbnails/7.jpg)
“Permanent” Information to Log
Permanent Information (Applied to)
URL (HTML)
Author (All files)
Recipients (Email messages)
Metadata tags (MP3)
Has/is attachment (Emails and attachments)
Saved picture's URL and saving time (Graphic files)
![Page 8: Desktop Search Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022062803/568148d3550346895db5ef65/html5/thumbnails/8.jpg)
“Timeline” Information to Log
Timeline information (Applied to)
• Time of being in focus (All files)
• Time of being opened (All files)
• Being edited (All files)
• History of moving/renaming (All files)
• Request type: bookmark, clicked link, typed URL (HTML)
• Adding/editing an entry in calendar and tasks (Outlook Journal)
• Being printed (All files)
• Search queries in Google/MSN Search/Yahoo!/etc. (Browser search field)
• Clicked links (HTML)
• Text selections from the clipboard: text pieces within a file and the filename (Text files)
• Bookmarking time (Browser bookmarks)
• Instant Messenger status, contacts' statuses, sent filenames and links (IM History)
• Running applications (Task queue)
• IP address: user's address and addresses the user connects to
• Email status: change between received/read (Email client)
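One way to picture a timeline log is as a stream of timestamped events, each tied to a resource and the application that produced it. The sketch below is only an illustration: the `TimelineEvent` class, its field names, and the event-type strings are hypothetical and do not come from the slides, which leave the log format open.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class TimelineEvent:
    """One logged desktop event (hypothetical schema, not defined in the slides)."""
    timestamp: datetime
    event_type: str      # assumed labels, e.g. "opened", "in_focus", "edited", "printed"
    resource: str        # file path, URL, or message identifier
    application: str     # e.g. "MS Word", "Mozilla Firefox"
    details: Dict[str, str] = field(default_factory=dict)


def history_of(resource: str, events: List[TimelineEvent]) -> List[TimelineEvent]:
    """Return all events for one resource in chronological order."""
    return sorted((e for e in events if e.resource == resource),
                  key=lambda e: e.timestamp)


# Made-up sample events, deliberately out of order.
events = [
    TimelineEvent(datetime(2006, 6, 12, 10, 5), "edited", "deliverable.doc", "MS Word"),
    TimelineEvent(datetime(2006, 6, 12, 9, 30), "opened", "deliverable.doc", "MS Word"),
    TimelineEvent(datetime(2006, 6, 12, 9, 45), "clicked_link", "http://example.org/",
                  "Mozilla Firefox"),
]
timeline = [e.event_type for e in history_of("deliverable.doc", events)]
```

Keeping every event in one flat record type makes it simple to rebuild the per-file history that is meant to stand in for the missing hyperlink structure.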
![Page 9: Desktop Search Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022062803/568148d3550346895db5ef65/html5/thumbnails/9.jpg)
Data Gathering
Data is not publicly available
Data format is known
Retrieval systems can be run on the data by the track coordinator, and the results are sent back (see Spam Track)
![Page 10: Desktop Search Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022062803/568148d3550346895db5ef65/html5/thumbnails/10.jpg)
Collection Structure
Text Documents, EMails and Instant Messages – yes
Images – ???
Audio – only metadata would be extracted
Video – no
What else?
![Page 11: Desktop Search Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022062803/568148d3550346895db5ef65/html5/thumbnails/11.jpg)
Proposed Tasks
AdHoc Retrieval Task: find several documents containing pieces of necessary information
Known-Item Retrieval Task: find a single specific document
Folder Retrieval Task: find the folders with the relevant information
![Page 12: Desktop Search Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022062803/568148d3550346895db5ef65/html5/thumbnails/12.jpg)
Topic Format
title: Eleonet project deliverable June
metadata: date:June topic:Eleonet project type:deliverable
task description: I am combining a new deliverable for the Eleonet project.
narrative: I am combining a new deliverable for the Eleonet project and I am looking for the last deliverable of the same type. I remember that the main contribution to this document was made in June 2006.
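The four topic fields map naturally onto a small record, with the metadata line split into key/value pairs. The following is a minimal sketch under assumptions: the `Topic` class and `parse_metadata` helper are hypothetical names, and the parsing rule (a value runs until the next `key:` token) is inferred from the single example above rather than from any specified format.

```python
import re
from dataclasses import dataclass
from typing import Dict


@dataclass
class Topic:
    """Hypothetical container for the four topic fields shown above."""
    title: str
    metadata: Dict[str, str]
    task_description: str
    narrative: str


def parse_metadata(line: str) -> Dict[str, str]:
    """Split 'key:value key:value' pairs; values may contain spaces.

    Assumption: a value ends where the next 'word:' token begins or at the
    end of the line, so values that themselves contain a colon are not handled.
    """
    pairs = re.findall(r"(\w+):(.*?)(?=\s+\w+:|$)", line)
    return {key: value.strip() for key, value in pairs}


topic = Topic(
    title="Eleonet project deliverable June",
    metadata=parse_metadata("date:June topic:Eleonet project type:deliverable"),
    task_description="I am combining a new deliverable for the Eleonet project.",
    narrative="I am looking for the last deliverable of the same type.",
)
```

A stricter delimiter between pairs (for example, one pair per line) would remove the ambiguity, which is one reason the field syntax would need to be fixed before topics are collected.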
![Page 13: Desktop Search Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022062803/568148d3550346895db5ef65/html5/thumbnails/13.jpg)
Relevance & Evaluation Measures
Use trec_eval to compute a set of common metrics
Binary relevance assessments or 3 levels?
Ranking is important:
MAP
Gain & Discount Metrics (DCG, nDCG, AWP, AGR, Q-m)
Incomplete assessments:
Bpref (/Rpref)
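To make the difference between these measures concrete, the sketch below computes average precision (MAP is its mean over topics), nDCG with a log2 discount, and bpref for a single ranked list. It is an illustrative approximation, not a replacement for trec_eval: the function names and the sample judgments are made up, and the bpref variant shown (penalty capped at R and divided by min(R, N)) is one common formulation that is assumed here.

```python
import math
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Binary-relevance AP for one topic; MAP is the mean of this over topics."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranking: List[str], gain_of: Dict[str, float]) -> float:
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(gain_of.values(), reverse=True))
    if ideal == 0:
        return 0.0
    return dcg([gain_of.get(doc, 0) for doc in ranking]) / ideal


def bpref(ranking: List[str], relevant: Set[str], nonrelevant: Set[str]) -> float:
    """bpref: unjudged documents are ignored instead of counted as nonrelevant."""
    R, N = len(relevant), len(nonrelevant)
    if R == 0:
        return 0.0
    denom = min(R, N)
    nonrel_seen, total = 0, 0.0
    for doc in ranking:
        if doc in relevant:
            total += 1.0 if denom == 0 else 1.0 - min(nonrel_seen, R) / denom
        elif doc in nonrelevant:
            nonrel_seen += 1
    return total / R


# Made-up judgments: d3 is unjudged, d1 is highly relevant, d4 partially.
ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant, nonrelevant = {"d1", "d4"}, {"d2", "d5"}
gains = {"d1": 2, "d4": 1}
ap = average_precision(ranking, relevant)
bp = bpref(ranking, relevant, nonrelevant)
nd = ndcg(ranking, gains)
```

The graded `gains` dictionary is where a 3-level relevance scale would enter; with binary assessments every relevant document simply gets gain 1.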
![Page 14: Desktop Search Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022062803/568148d3550346895db5ef65/html5/thumbnails/14.jpg)
Logged Applications
Acrobat Reader
MS Word
MS Excel
MS PowerPoint
MS Internet Explorer
MS Outlook
Mozilla Firefox
Mozilla Thunderbird
![Page 15: Desktop Search Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022062803/568148d3550346895db5ef65/html5/thumbnails/15.jpg)
The same questions again
1 how to build the collection? (desktops from participants?)
2 how to protect privacy?
3 Data? (text docs, mails, pics, audio)
4 Tasks?
5 Topics?
6 Evaluation measures?
7 Logged information? Logged applications?
![Page 16: Desktop Search Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022062803/568148d3550346895db5ef65/html5/thumbnails/16.jpg)
Desktop Search Workshop Summary
• Strong interest – about 20 participants
• Main novelty – activity logs
• Privacy is still an issue
• We need a clear task definition (suggestion: “Find all documents related to a project”?)
• We are planning a workshop to discuss it further
• A mailing list is available – to subscribe visit https://info.l3s.uni-hannover.de/mailman/listinfo/personal-activity-search