Going further together Information Search & Retrieval: Problems, solutions, trends… Tony Rose, PhD...
![Page 1: Going further together Information Search & Retrieval: Problems, solutions, trends… Tony Rose, PhD MBCS CEng Vice-Chair, BCS IRSG.](https://reader036.fdocuments.in/reader036/viewer/2022062804/56649cf45503460f949c185f/html5/thumbnails/1.jpg)
going further together
Information Search & Retrieval: Problems, solutions, trends…
Tony Rose, PhD MBCS CEng
Vice-Chair, BCS IRSG
Contents
The BCS Information Retrieval SG
What is IR anyway?
How search engines work
Why search is hard
Where’s it all going?
Information Retrieval SG
Growing rapidly
– 750+ members
Annual conference (ECIR)
– FDIA
Various 1-day events
– Search Solutions
Informer
Discounts for various events, e.g. SIGIR
… is free to join!
Information Retrieval SG
Traditional focus on search (text retrieval)
– Knowledge management, Multimedia retrieval, User experience, Information visualisation, extraction, summarisation, etc.
Latest issue of Informer:
– “Searching for the Music You Like”
– “Exploring Maps through Geo-referenced Images and RDF Shared Metadata”
– “Using Semantic Relations to improve Question Answering”
– “Modeling & Annotation of Dance Media Semantics”
What is IR?
“Science of searching for:
– information in documents
– documents themselves
– metadata which describe documents,
– within databases
…whether relational stand-alone databases or hypertextually-networked databases such as the World Wide Web”
The Need for IR
In a word … Infoglut
800 MB of recorded information is produced per person per year [Computing magazine]
Up to 80% of corporate information is unstructured
– Documents, emails, images, voicemail, etc.
So… can’t we just use Google?
How do Search Engines Work?
On the surface:
1. Understand what the user wants
2. Find documents about that topic
In reality:
1. Count words
2. Apply a simple equation
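As a rough illustration of "count words, apply a simple equation", the sketch below ranks documents with a basic TF-IDF sum. The slides do not say which equation is meant; TF-IDF, the toy corpus and the function name `tfidf_scores` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_scores(query, docs):
    """Score each document against the query with a basic TF-IDF sum."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    scores = []
    for tokens in tokenised:
        counts = Counter(tokens)  # step 1: count words
        score = 0.0
        for term in query.lower().split():
            if term in counts:
                # step 2: apply a simple equation (term frequency * log inverse document frequency)
                score += counts[term] * math.log(n_docs / df[term])
        scores.append(score)
    return scores


# Hypothetical three-document corpus.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stock markets fell sharply",
]
scores = tfidf_scores("cat mat", docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 1, 2]
```

The rarer term ("mat") contributes more to the score than the commoner one ("cat"), which is why the first document ranks above the second.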
How do Search Engines Work?
1. Measure the conceptual distance between your query and each document in the DB
2. Return the best matches
[Source: Maristella Agosti, University of Padova]
The Central Problem in IR
Information Seeker → Concepts → Query Terms
Author → Concepts → Document Terms
Do these represent the same concepts?
[Source: Jimmy Lin, University of Maryland]
The Central Problem in IR
How do you represent the concepts?
– Documents and queries = “bag of words”
• Unordered set of terms + numeric weights
How do you calculate similarity?
– Set theory (e.g. Boolean)
– Algebraic (e.g. vector space)
– Probabilistic
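A minimal sketch of two of the similarity families named above: a set-theoretic (Boolean AND) match and an algebraic (vector space) cosine score over bags of words. Raw term counts as weights, whitespace tokenisation and all function names are assumptions made for illustration; the probabilistic family is not shown.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Unordered terms with numeric weights (here: raw counts)."""
    return Counter(text.lower().split())


def boolean_and(query, doc):
    """Set-theoretic match: every query term must appear in the document."""
    return set(query) <= set(doc)


def cosine_similarity(a, b):
    """Vector-space match: cosine of the angle between two term-weight vectors."""
    dot = sum(weight * b[term] for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = bag_of_words("information retrieval")
doc = bag_of_words("information retrieval is the science of searching for information")

print(boolean_and(query, doc))        # True
print(cosine_similarity(query, doc))  # a graded score between 0 and 1
```

The Boolean model only answers yes or no, whereas the cosine score gives a value that can be used to rank documents.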
IR models
[Source: Wikipedia]
How do we Evaluate Search?
Assume that results are either relevant or non-relevant
Precision:
– Proportion of retrieved documents that are relevant
Recall:
– Proportion of known-relevant documents that were actually retrieved
But what about: indexing / retrieval speed, query language, user experience, etc.?
[Figure: relevant vs. retrieved documents]
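The two measures can be computed directly from the retrieved set and the known-relevant set. The sketch below is a minimal illustration; the document IDs and the function name `precision_recall` are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, under binary relevance."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # documents that are both retrieved and relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result list and relevance judgements.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7", "d8", "d9"]

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 0.5 0.4
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the five known-relevant documents were found (recall 0.4).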
Why Search is Hard
Document representation
– Keywords are not enough
• Blind Venetian = Venetian Blind (same bag of words, different meaning)
– Terms are not independent
• Structural & discourse dependencies, co-references, etc.
Imperfect “stop lists”
– the, and, of…
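Both problems on this slide can be reproduced in a few lines. In the sketch below, the stop list extends the slide's "the, and, of…" with a few extra function words, and the "to be or not to be" query is a commonly cited example; both are assumptions added for illustration, as is the function name `index_terms`.

```python
from collections import Counter

# Hypothetical stop list: the slide's examples plus a few more function words.
STOP_WORDS = {"the", "and", "of", "a", "to", "be", "or", "not"}


def index_terms(text, stop_words=STOP_WORDS):
    """Bag-of-words representation with stop words removed."""
    return Counter(t for t in text.lower().split() if t not in stop_words)


# Word order is lost, so two different phrases look identical.
print(index_terms("blind venetian") == index_terms("venetian blind"))  # True

# An over-eager stop list can remove an entire query.
print(len(index_terms("to be or not to be")))  # 0
```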
Why Search is Hard
Morphological relationships
– Computer, computing, compute, computed…
Index documents using word stems
– False positives:
• organization, organ → organ
• police, policy → polic
• arm, army → arm
– False negatives:
• cylinder, cylindrical
• create, creation
• Europe, European
– Prefixes are particularly difficult
• Un*, dis*
• Delegate = de-leg-ate
• Ratify = rat-ify
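The sketch below shows how a crude suffix-stripping stemmer produces both kinds of error. It is a toy written for this example, not the Porter stemmer or any production algorithm; the suffix list and the name `naive_stem` are assumptions, and the slides do not say which stemmer produced their stems.

```python
# Hypothetical suffix list, checked longest-first.
SUFFIXES = ("ization", "ation", "ing", "ed", "er", "es", "s", "e", "y")


def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def conflated(a, b):
    """True if the stemmer maps both words to the same index term."""
    return naive_stem(a) == naive_stem(b)


# False positives: unrelated words end up sharing a stem.
for pair in [("organization", "organ"), ("police", "policy"), ("arm", "army")]:
    print(pair, "->", naive_stem(pair[0]), conflated(*pair))

# False negatives: related words keep different stems.
for pair in [("cylinder", "cylindrical"), ("create", "creation"), ("Europe", "European")]:
    print(pair, "->", naive_stem(pair[0]), naive_stem(pair[1]), conflated(*pair))
```

With this rule set the three false-positive pairs are conflated and the three false-negative pairs are not, mirroring the examples on the slide.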
Why Search is Hard
Named entity recognition
– Companies in New York
– New companies in York
NEs are highly discriminatory
– People
– Places
– Organisations
Many vertical applications
– e.g. bioscience
Why Search is Hard
Semantic relationships
– Car = automobile
– Buy = purchase
– Sick = ill
Synonym rings
– Car, automobile, truck, bus, taxi...
– Appropriate level of abstraction depends on user & task
Development of subject-specific taxonomies
– “concept matching”
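One simple way to use synonym rings is query expansion: each query term is replaced by the set of terms in its ring, and a document matches if it contains at least one variant of every term. The sketch below assumes this approach; the rings reuse the slide's word pairs, while the function names and the matching rule are illustrative assumptions.

```python
# Synonym rings built from the word pairs on the slide.
SYNONYM_RINGS = [
    {"car", "automobile"},
    {"buy", "purchase"},
    {"sick", "ill"},
]


def expand_query(query, rings=SYNONYM_RINGS):
    """Return one sorted list of acceptable variants per query term."""
    expanded = []
    for term in query.lower().split():
        variants = {term}
        for ring in rings:
            if term in ring:
                variants |= ring
        expanded.append(sorted(variants))
    return expanded


def matches(query, document, rings=SYNONYM_RINGS):
    """True if the document contains at least one variant of every query term."""
    doc_terms = set(document.lower().split())
    return all(doc_terms & set(group) for group in expand_query(query, rings))


print(expand_query("buy car"))  # [['buy', 'purchase'], ['automobile', 'car']]
print(matches("buy car", "where to purchase an automobile"))  # True
print(matches("buy car", "car hire"))  # False
```

Widening a ring (for example adding truck, bus and taxi to the car ring) raises recall at the cost of precision, which is why the appropriate level of abstraction depends on the user and task.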
Why Search is Hard
Word sense disambiguation
– “Bank”
• Financial institution?
• Part of a river?
• An aerial manoeuvre?
Active research area
– Categorisation & clustering of results
Google’s Insight
Exploit the link structure inherent in the web
– Calculate a measure of each document’s value
• Independent of any query
– “PageRank”
Overall relevance based on 100+ parameters
– Constant battle with SEOs
Enterprise search is a different proposition…
– As is desktop search
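The idea of a query-independent, link-based score can be sketched as a textbook power iteration. This is not Google's production algorithm: the damping factor of 0.85, the fixed iteration count, the handling of pages with no outgoing links and the four-page link graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a dict of page -> outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share regardless of its links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal parts to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical link graph: page "c" is linked to by every other page.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
rank = pagerank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

No query appears anywhere in the computation: the score depends only on who links to whom, so it can be calculated ahead of time and combined with query-dependent signals at search time.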
Where’s it all going?
Vertical search
– Jobs, travel, health, people, etc.
Rich media search
– Audio, video, TV, images
Specialised content search
– Blogs, news, classifieds
Social search
Personalisation
Where’s it all going?
Mobile search
Answer engines
– Active research community in Question Answering
Multi / cross-lingual search
Search agents
Human UI
Further Information
www.irsg.bcs.org
Informer
ECIR (March 2008, Glasgow)
Search Solutions 2008 (Sept 2008, London)