© CvR SIGIR2002. Keith van Rijsbergen, Tampere, 12th August, 2002. Landmarks in...
© CvR SIGIR2002
Keith van Rijsbergen
Tampere, 12th August, 2002
Landmarks in Information Retrieval: the message out of the bottle
Introductory Remarks
• Exclusions – IE, TM, ..
• Commercial successes and failures
• Caveats
• Why we have survived.
• Where we were, where we are, where we are going.
Pre-history
Smee (1850)
Wells (1936)
Bush (1945)
Bagley (1951) MIT
Fairthorne (1945-52) RAE
Luhn (1958)
Mooers (1952)
Experimental Methodology
Cleverdon Cranfield
Lancaster Medlars
Keen Cranfield/Smart
Saracevic CWRU
Salton Smart
Sparck Jones Ideal Test Collection
Blair & Maron Stairs
Harman TREC
Evaluation
ABNO/OBNA (Fairthorne)
Precision, Recall -> trade-off (Cleverdon)
Probabilistic versions (Swets)
Measure-theoretic (Bollman)
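To make the precision/recall trade-off concrete, here is a minimal sketch in Python. It assumes set-based relevance judgments and a ranked result list; the function names and the toy ranking are hypothetical and do not come from the talk.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def precision_recall_at_cutoffs(ranking, relevant):
    """Compute (precision, recall) after each rank position of a ranked list."""
    return [precision_recall(ranking[:k], relevant)
            for k in range(1, len(ranking) + 1)]


# Toy data (assumed): five ranked documents, three of them relevant.
ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d5"}
for k, (p, r) in enumerate(precision_recall_at_cutoffs(ranking, relevant), start=1):
    print(f"cutoff {k}: precision={p:.2f} recall={r:.2f}")
```

Reading further down the ranking can only keep or raise recall, while precision tends to fall as non-relevant documents are picked up, which is the trade-off the slide refers to.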
‘the world in 1980 according to Belver Griffith’
Who is missing?
Landmarks
Luhn’s tf weighting
Architecture
Relevance Feedback
Stemming
Poisson Model -> BM25
Statistical weighting tf*idf
Various models
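As an illustration of statistical term weighting, the sketch below computes one common textbook variant of tf*idf: raw term frequency multiplied by log(N/df). This is an assumption made for illustration, not the formula of any particular system named on the slide, and the function name and toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Weight each term in each tokenised document by tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw within-document term frequency
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Toy collection (assumed): documents are already tokenised.
docs = [["retrieval", "model", "retrieval"], ["retrieval", "logic"]]
for w in tf_idf(docs):
    print(w)
```

A term that occurs in every document gets weight zero under this variant, while a term that is frequent in one document and rare in the collection gets a high weight.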
Luhn’s curve
What about evaluation?
[Diagram labels: Information Problem, Fictive Objects, Representation (twice), Query, Indexed Objects, Compare]
Architecture (Brenda Gerrie, 1983)
Time I (highlights for me)
1952 Mooers coins IR
1958 International Conference on Scientific Information
1960 Cranfield I
1960 Maron and Kuhns paper
1961 Towards IR, RAF
1961 (-1965) Smart built
1964 Washington conference on Association Methods
1966 Cranfield II
1968 Salton’s first book
197- Cranfield conferences
1975 CvR’s book
1975 Ideal test collection
1976 KSJ/SER JASIS paper
Time II
1978 1st SIGIR
1979 1st BCSIRSG
1980 1st joint ACM/BCS conference on IR
1981 KSJ book on IR Experiments
1982 Belkin et al ASK hypothesis
1983 - Okapi started
1985 RIAO-1
1986 CvR logic model
1990 Deerwester et al, LSI paper
1991 CoLIS 1 (in Tampere!)
1991 – Inquiry started
1992 Ingwersen’s book
1992 TREC-1
1998 Croft Ponte paper on language models
Dimensions
Matching: Exact Match/Partial (best) Match
Inference: Deduction/Induction
Model: Deterministic/Probabilistic
Classification: Monothetic/Polythetic
Query Language: Artificial/Natural
Query Definition: Complete/Incomplete
Query Dependence: Yes/No
Items wanted: Matching/Relevant
Error response: Sensitive/Insensitive
Logic: Classical/Non-classical
Representation: a priori/a posteriori
Language Models: Logical/Statistical
Probabilistic Retrieval
Maron and Kuhns
Miller (following Goffman)
SER/KSJ
Croft
Vector Space Model
Salton
Murray
Rocchio
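For readers unfamiliar with the vector space model, the following sketch shows cosine matching and a Rocchio-style relevance feedback update in plain Python. The weights alpha, beta and gamma and their default values are assumptions taken from common textbook presentations, not from the talk, and the helper names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors, dim):
    """Component-wise mean of a list of vectors (zero vector if empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query towards relevant documents and away from non-relevant ones."""
    dim = len(query)
    rel_c = centroid(relevant, dim)
    non_c = centroid(non_relevant, dim)
    return [alpha * q + beta * r - gamma * n
            for q, r, n in zip(query, rel_c, non_c)]


# Toy two-term vocabulary (assumed): the user judged one document relevant
# and one non-relevant after the first search.
query = [1.0, 0.0]
new_query = rocchio(query, relevant=[[0.0, 1.0]], non_relevant=[[1.0, 0.0]])
print(new_query, cosine(new_query, [0.0, 1.0]))
```

The modified query gains weight on the second term, so documents resembling the judged-relevant one score higher on the next iteration.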
Logical Model
For
Mooers/Fairthorne 1960+
Hillman 1965
Cooper/Maron 1970+
CvR 1986
Nie/Amati/Bruza/Huibers 1990+
Against
Bar-Hillel 1950+
Kasher 1966
Buried Treasure
Dependence e.g. C.T. Yu
Unified Probabilistic Model Maron/Cooper/SER
Co-relevance Ivie
Stochastic Processes Mandelbrot/Herdan
Brouwerian Logics Hillman
Error Analysis Hughes/Cover/Duda
Hypotheses/Principles
P & R trade-off – ABNO/OBNA
Exhaustivity/Specificity
Cluster Hypothesis
Association Hypothesis
Probability Ranking Principle
Logical Uncertainty Principle
ASK
Polyrepresentation
Items may be associated without apparent meaning but exploiting their association may help retrieval
Postulates of Impotence (according to Swanson, 1988)
• An information need cannot be expressed independent of context
• It is impossible to instruct a machine to translate a request into adequate search terms
• A document’s relevance depends on other seen documents
• It is never possible to verify whether all relevant documents have been found
• Machines cannot recognise meaning -> can’t beat human indexing etc
….more postulates
• Word-occurrence statistics can neither represent meaning nor substitute for it
• The ability of an IR system to support an iterative process cannot be evaluated in terms of single-iteration human relevance judgment
• You can have either subtle relevance judgments or highly effective mechanised procedures, but not both
• Thus, consistently effective fully automatic indexing and retrieval is not possible
?
Conclusions
Matching
Co-ordination is positively correlated with external relevance
Jackson, 1969 – Association Hypothesis
The larger the number of matching descriptive items, for a request and document, the more likely the document is to be relevant to the request
Sparck Jones, 1971 – Relevance Hypothesis
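The idea that more matching descriptive items make relevance more likely can be sketched as simple co-ordination level matching: score each document by the number of request terms it shares and rank by that count. The function names and the toy documents below are hypothetical and only illustrate the principle.

```python
def coordination_level(request_terms, doc_terms):
    """Number of distinct descriptive items shared by request and document."""
    return len(set(request_terms) & set(doc_terms))


def rank_by_coordination(request_terms, docs):
    """Rank document ids by decreasing co-ordination level with the request."""
    return sorted(docs,
                  key=lambda doc_id: coordination_level(request_terms, docs[doc_id]),
                  reverse=True)


# Toy index (assumed): document id -> list of index terms.
docs = {
    "d1": ["retrieval", "model"],
    "d2": ["retrieval", "model", "logic"],
    "d3": ["cats"],
}
print(rank_by_coordination(["retrieval", "logic", "model"], docs))
```

This is partial (best) match in its simplest form: no document has to match every request term, and the ranking reflects the degree of overlap.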
Inference
It is a common fallacy, underwritten at this date by the investment of several million dollars in a variety of retrieval hardware, that the algebra of Boole (1847) is the appropriate formalism for retrieval design….. The ‘logic’ of Brouwer, as invoked by Fairthorne, is one such weakening of the postulate system,……
Mooers, 1961
Another one: Logical Uncertainty Principle
CvR, 1986
Classification
Co-occurrence [of terms] as a basis for grouping makes for good swops i.e. permits substitutions which retrieve relevant rather than irrelevant documents.
Sparck Jones, 1971 – Classification Hypothesis
If an index term is good at discriminating relevant from non-relevant documents then any closely associated index term is also likely to be good at this.
CvR, 1979 – Association Hypothesis
Closely associated documents tend to be relevant to the same requests
CvR, 1971 – Cluster Hypothesis
Models
Vector Space/LSI
Probabilistic
Logical
Query Language
Artificial/Natural
Multilingual/cross-lingual
images
none at all
Query Definition
Complete/Incomplete
Independence/Dependence
Weighted/Unweighted
Query Expansion/one shot (feedback, web)
Sense disambiguation
Cross-lingual
Query Dependence
Relevance Feedback
Ostensive Retrieval
Context
Query Expansion
Items wanted
Relevance
ASK: Anomalous State of Knowledge
Situated Relevance
Error response
Precision and Recall
Logic
standard/non-standard
probabilistic logic
information flow/logic
Representation
Discrimination/Representation
Specificity/Exhaustivity
Language Models
NLP
Montague Semantics
Stochastic