2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley...
-
date post
20-Dec-2015 -
Category
Documents
-
view
219 -
download
4
Transcript of 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley...
![Page 1: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/1.jpg)
2007.02.27 - SLIDE 1IS 240 – Spring 2007
Prof. Ray Larson University of California, Berkeley
School of InformationTuesday and Thursday 10:30 am - 12:00 pm
Spring 2007http://courses.ischool.berkeley.edu/i240/s07
Principles of Information Retrieval
Lecture 12: Evaluation Intro
![Page 2: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/2.jpg)
2007.02.27 - SLIDE 2IS 240 – Spring 2007
Today
• Mini-TREC Setup reports
• Evaluation of IR Systems – Precision vs. Recall– Cutoff Points– Test Collections/TREC– Blair & Maron Study
![Page 3: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/3.jpg)
2007.02.27 - SLIDE 3IS 240 – Spring 2007
Mini-TREC Setup Reports
• Why did you choose the system?
• How easy was it to set up the system?
• Have you indexed the data yet?– Issues and problems?
• Plans or ideas about how to improve or change the system?
![Page 4: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/4.jpg)
2007.02.27 - SLIDE 4IS 240 – Spring 2007
Today
• Mini-TREC Setup reports
• Evaluation of IR Systems – Precision vs. Recall– Cutoff Points– Test Collections/TREC– Blair & Maron Study
![Page 5: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/5.jpg)
2007.02.27 - SLIDE 5IS 240 – Spring 2007
Evaluation
• Why Evaluate?
• What to Evaluate?
• How to Evaluate?
![Page 6: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/6.jpg)
2007.02.27 - SLIDE 6IS 240 – Spring 2007
Why Evaluate?
• Determine if the system is desirable
• Make comparative assessments
• Test and improve IR algorithms
![Page 7: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/7.jpg)
2007.02.27 - SLIDE 7IS 240 – Spring 2007
What to Evaluate?
• How much of the information need is satisfied.
• How much was learned about a topic.
• Incidental learning:– How much was learned about the collection.– How much was learned about other topics.
• How inviting the system is.
![Page 8: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/8.jpg)
2007.02.27 - SLIDE 8IS 240 – Spring 2007
Relevance
• In what ways can a document be relevant to a query?– Answer precise question precisely.– Partially answer question.– Suggest a source for more information.– Give background information.– Remind the user of other knowledge.– Others ...
![Page 9: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/9.jpg)
2007.02.27 - SLIDE 9IS 240 – Spring 2007
Relevance
• How relevant is the document– for this user for this information need.
• Subjective, but• Measurable to some extent
– How often do people agree a document is relevant to a query
• How well does it answer the question?– Complete answer? Partial? – Background Information?– Hints for further exploration?
![Page 10: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/10.jpg)
2007.02.27 - SLIDE 10IS 240 – Spring 2007
What to Evaluate?
What can be measured that reflects users’ ability to use system? (Cleverdon 66)
– Coverage of Information– Form of Presentation– Effort required/Ease of Use– Time and Space Efficiency– Recall
• proportion of relevant material actually retrieved
– Precision• proportion of retrieved material actually relevant
effe
ctiv
enes
s
![Page 11: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/11.jpg)
2007.02.27 - SLIDE 11IS 240 – Spring 2007
Relevant vs. Retrieved
Relevant
Retrieved
All docs
![Page 12: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/12.jpg)
2007.02.27 - SLIDE 12IS 240 – Spring 2007
Precision vs. Recall
Relevant
Retrieved
|Collectionin Rel|
|edRelRetriev| Recall
|Retrieved|
|edRelRetriev| Precision
All docs
![Page 13: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/13.jpg)
2007.02.27 - SLIDE 13IS 240 – Spring 2007
Why Precision and Recall?
Get as much good stuff while at the same time getting as little junk as possible.
![Page 14: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/14.jpg)
2007.02.27 - SLIDE 14IS 240 – Spring 2007
Retrieved vs. Relevant Documents
Relevant
Very high precision, very low recall
![Page 15: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/15.jpg)
2007.02.27 - SLIDE 15IS 240 – Spring 2007
Retrieved vs. Relevant Documents
Relevant
Very low precision, very low recall (0 in fact)
![Page 16: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/16.jpg)
2007.02.27 - SLIDE 16IS 240 – Spring 2007
Retrieved vs. Relevant Documents
Relevant
High recall, but low precision
![Page 17: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/17.jpg)
2007.02.27 - SLIDE 17IS 240 – Spring 2007
Retrieved vs. Relevant Documents
Relevant
High precision, high recall (at last!)
![Page 18: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/18.jpg)
2007.02.27 - SLIDE 18IS 240 – Spring 2007
Precision/Recall Curves
• There is a tradeoff between Precision and Recall• So measure Precision at different levels of Recall• Note: this is an AVERAGE over MANY queries
precision
recall
x
x
x
x
![Page 19: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/19.jpg)
2007.02.27 - SLIDE 19IS 240 – Spring 2007
Precision/Recall Curves
• Difficult to determine which of these two hypothetical results is better:
precision
recall
x
x
x
x
![Page 20: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/20.jpg)
2007.02.27 - SLIDE 20IS 240 – Spring 2007
Precision/Recall Curves
![Page 21: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/21.jpg)
2007.02.27 - SLIDE 21IS 240 – Spring 2007
Document Cutoff Levels
• Another way to evaluate:– Fix the number of relevant documents retrieved at
several levels:• top 5• top 10• top 20• top 50• top 100• top 500
– Measure precision at each of these levels– Take (weighted) average over results
• This is sometimes done with just number of docs• This is a way to focus on how well the system
ranks the first k documents.
![Page 22: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/22.jpg)
2007.02.27 - SLIDE 22IS 240 – Spring 2007
Problems with Precision/Recall
• Can’t know true recall value – except in small collections
• Precision/Recall are related– A combined measure sometimes more
appropriate
• Assumes batch mode– Interactive IR is important and has different
criteria for successful searches– We will touch on this in the UI section
• Assumes a strict rank ordering matters.
![Page 23: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/23.jpg)
2007.02.27 - SLIDE 23IS 240 – Spring 2007
Relation to Contingency Table
• Accuracy: (a+d) / (a+b+c+d)• Precision: a/(a+b)• Recall: ?• Why don’t we use Accuracy for
IR?– (Assuming a large collection)– Most docs aren’t relevant – Most docs aren’t retrieved– Inflates the accuracy value
Doc is Relevant
Doc is NOT relevant
Doc is retrieved a b
Doc is NOT retrieved c d
![Page 24: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/24.jpg)
2007.02.27 - SLIDE 24IS 240 – Spring 2007
The E-Measure
Combine Precision and Recall into one number (van Rijsbergen 79)
PRb
bE
1
11 2
2
P = precisionR = recallb = measure of relative importance of P or R
For example,b = 0.5 means user is twice as interested in
precision as recall
)1/(1
1)1(
11
1
2
RP
E
![Page 25: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/25.jpg)
2007.02.27 - SLIDE 25IS 240 – Spring 2007
Old Test Collections
• Used 5 test collections– CACM (3204)– CISI (1460)– CRAN (1397)– INSPEC (12684)– MED (1033)
![Page 26: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/26.jpg)
2007.02.27 - SLIDE 26IS 240 – Spring 2007
TREC
• Text REtrieval Conference/Competition– Run by NIST (National Institute of Standards &
Technology)– 2001 was the 10th year - 11th TREC in November
• Collection: 5 Gigabytes (5 CRDOMs), >1.5 Million Docs– Newswire & full text news (AP, WSJ, Ziff, FT, San Jose
Mercury, LA Times)– Government documents (federal register,
Congressional Record)– FBIS (Foreign Broadcast Information Service)– US Patents
![Page 27: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/27.jpg)
2007.02.27 - SLIDE 27IS 240 – Spring 2007
TREC (cont.)
• Queries + Relevance Judgments– Queries devised and judged by “Information Specialists”– Relevance judgments done only for those documents
retrieved -- not entire collection!
• Competition– Various research and commercial groups compete (TREC
6 had 51, TREC 7 had 56, TREC 8 had 66)– Results judged on precision and recall, going up to a
recall level of 1000 documents
• Following slides from TREC overviews by Ellen Voorhees of NIST.
![Page 28: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/28.jpg)
2007.02.27 - SLIDE 28IS 240 – Spring 2007
![Page 29: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/29.jpg)
2007.02.27 - SLIDE 29IS 240 – Spring 2007
![Page 30: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/30.jpg)
2007.02.27 - SLIDE 30IS 240 – Spring 2007
![Page 31: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/31.jpg)
2007.02.27 - SLIDE 31IS 240 – Spring 2007
![Page 32: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/32.jpg)
2007.02.27 - SLIDE 32IS 240 – Spring 2007
![Page 33: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/33.jpg)
2007.02.27 - SLIDE 33IS 240 – Spring 2007
![Page 34: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/34.jpg)
2007.02.27 - SLIDE 34IS 240 – Spring 2007
Sample TREC queries (topics)
<num> Number: 168<title> Topic: Financing AMTRAK
<desc> Description:A document will address the role of the Federal Government in financing the operation of the National Railroad Transportation Corporation (AMTRAK)
<narr> Narrative: A relevant document must provide information on the government’s responsibility to make AMTRAK an economically viable entity. It could also discuss the privatization of AMTRAK as an alternative to continuing government subsidies. Documents comparing government subsidies given to air and bus transportation with those provided to aMTRAK would also be relevant.
![Page 35: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/35.jpg)
2007.02.27 - SLIDE 35IS 240 – Spring 2007
![Page 36: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/36.jpg)
2007.02.27 - SLIDE 36IS 240 – Spring 2007
![Page 37: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/37.jpg)
2007.02.27 - SLIDE 37IS 240 – Spring 2007
![Page 38: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/38.jpg)
2007.02.27 - SLIDE 38IS 240 – Spring 2007
![Page 39: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/39.jpg)
2007.02.27 - SLIDE 39IS 240 – Spring 2007
![Page 40: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/40.jpg)
2007.02.27 - SLIDE 40IS 240 – Spring 2007
![Page 41: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/41.jpg)
2007.02.27 - SLIDE 41IS 240 – Spring 2007
![Page 42: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/42.jpg)
2007.02.27 - SLIDE 42IS 240 – Spring 2007
![Page 43: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/43.jpg)
2007.02.27 - SLIDE 43IS 240 – Spring 2007
![Page 44: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/44.jpg)
2007.02.27 - SLIDE 44IS 240 – Spring 2007
![Page 45: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/45.jpg)
2007.02.27 - SLIDE 45IS 240 – Spring 2007
![Page 46: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/46.jpg)
2007.02.27 - SLIDE 46IS 240 – Spring 2007
TREC
• Benefits:– made research systems scale to large collections
(pre-WWW)– allows for somewhat controlled comparisons
• Drawbacks:– emphasis on high recall, which may be unrealistic for
what most users want– very long queries, also unrealistic– comparisons still difficult to make, because systems
are quite different on many dimensions– focus on batch ranking rather than interaction
• There is an interactive track.
![Page 47: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/47.jpg)
2007.02.27 - SLIDE 47IS 240 – Spring 2007
TREC is changing
• Ad hoc track suspended in TREC 9
• Emphasis on specialized “tracks”– Interactive track– Natural Language Processing (NLP) track– Multilingual tracks (Chinese, Spanish)– Filtering track– High-Precision– High-Performance
• http://trec.nist.gov/
![Page 48: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/48.jpg)
2007.02.27 - SLIDE 48IS 240 – Spring 2007
TREC Results
• Differ each year
• For the main track:– Best systems not statistically significantly
different– Small differences sometimes have big effects
• how good was the hyphenation model• how was document length taken into account
– Systems were optimized for longer queries and all performed worse for shorter, more realistic queries
![Page 49: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/49.jpg)
2007.02.27 - SLIDE 49IS 240 – Spring 2007
The TREC_EVAL Program
• Takes a “qrels” file in the form…– qid iter docno rel
• Takes a “top-ranked” file in the form…– qid iter docno rank sim run_id – 030 Q0 ZF08-175-870 0 4238 prise1
• Produces a large number of evaluation measures. For the basic ones in a readable format use “-o”
• Demo…
![Page 50: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/50.jpg)
2007.02.27 - SLIDE 50IS 240 – Spring 2007
Blair and Maron 1985
• A classic study of retrieval effectiveness– earlier studies were on unrealistically small collections
• Studied an archive of documents for a legal suit– ~350,000 pages of text– 40 queries– focus on high recall– Used IBM’s STAIRS full-text system
• Main Result: – The system retrieved less than 20% of the
relevant documents for a particular information need; lawyers thought they had 75%
• But many queries had very high precision
![Page 51: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/51.jpg)
2007.02.27 - SLIDE 51IS 240 – Spring 2007
Blair and Maron, cont.
• How they estimated recall– generated partially random samples of unseen
documents– had users (unaware these were random) judge them
for relevance
• Other results:– two lawyers searches had similar performance– lawyers recall was not much different from paralegal’s
![Page 52: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/52.jpg)
2007.02.27 - SLIDE 52IS 240 – Spring 2007
Blair and Maron, cont.
• Why recall was low– users can’t foresee exact words and phrases that will
indicate relevant documents• “accident” referred to by those responsible as:“event,” “incident,” “situation,” “problem,” …• differing technical terminology• slang, misspellings
– Perhaps the value of higher recall decreases as the number of relevant documents grows, so more detailed queries were not attempted once the users were satisfied
![Page 53: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/53.jpg)
2007.02.27 - SLIDE 53IS 240 – Spring 2007
What to Evaluate?
• Effectiveness– Difficult to measure– Recall and Precision are one way– What might be others?
![Page 54: 2007.02.27 - SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649d415503460f94a1cad0/html5/thumbnails/54.jpg)
2007.02.27 - SLIDE 54IS 240 – Spring 2007
Next Time
• No Class Thursday
• Attend Andrei Broder’s talk in CS this afternoon (4-5 in 310 Soda Hall)
• Next Week– Calculating standard IR measures
• and more on trec_eval
– Theoretical limits of Precision and Recall– Intro to Alternative evaluation metrics