Knowledge-based - Heidelberg University › colloquium › docs › milne_slides.pdf
Knowledge-based Information Retrieval with Wikipedia
David Milne | Ian H. Witten
The University of Waikato | New Zealand
Koru | Wikipedia Link-based Measure | Wikification
Limitations of search engines
“Search is not solved”
Current search engines
don’t understand documents
don’t understand queries
Knowledge-based information retrieval
Consult an external knowledge base to find out what these characters mean and proactively do stuff with them
A fairly obvious, compelling idea
But one that hasn’t worked out
We haven’t had the right knowledge base
Computers aren’t accurate enough
Humans aren’t quick enough
Wikipedia | as a knowledge base
What topics/concepts are there? ~2 million articles and categories
How are topics referred to? ~5 million titles, redirects and anchors
How do topics relate to each other? ~60 million article and category links
[Figure: Wikipedia topics around “rugby”. Labels: football, team sports, ball sports, rugby league, touch rugby, rugby, rugby world cup, rugby union, all blacks, australia national rugby team, RWC, New Zealand national rugby team, Wallabies]
Wikipedia | as a knowledge base
WordNet: 118,000 synsets
ResearchCyc: 300,000 concepts
Wikipedia: 2,000,000 articles
Formal knowledge bases (WordNet, ResearchCyc): 20 years, $$$$$$$, 1 language
Wikipedia: 7 years, almost free, 250 languages
Wikipedia | as a knowledge base
[Figure: formal structure (◄) versus Wikipedia (►) for George W. Bush. Labels: George Walker Bush, George W. Bush, Dubya, Shrubya, Thief in chief, Presidents of the United States, Current national leaders, heads of state, Al-Qaeda, September 11, Iraq War, 2004 U.S. presidential election controversy and irregularities]
My research goals
“Wikipedia will provide significantly improved retrieval, as it is”
We don’t need to make it “tidy”
It’s not a question of sophisticated NLP or AI
It’s more about HCI
So let’s make a search engine that consults Wikipedia, and find out!
Koru
[Figure: Koru architecture. Labels: Wikipedia, Documents, Queries, Document Topics, Query Topics, Related Topics, WikiSaurus]
Koru | interface
security AND
(air carrier OR airline company OR airline industry OR flight company OR modern aviation OR passenger aircraft….) AND
(America OR American OR American continent…)
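The expanded query above can be sketched as a small function: each recognized topic becomes an OR-group of its alternative labels, and the groups are joined with AND. This is only an illustration, not Koru's code; `expand_query` and the `SYNONYMS` table are hypothetical stand-ins for the labels a Wikipedia-derived thesaurus would supply.

```python
def expand_query(topics, synonyms):
    """Build a boolean query: one OR-group per topic, groups joined with AND."""
    groups = []
    for topic in topics:
        # Original term first, then its alternative labels, without duplicates.
        terms = list(dict.fromkeys([topic] + synonyms.get(topic, [])))
        # Quote multi-word terms so they are matched as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        if len(quoted) == 1:
            groups.append(quoted[0])
        else:
            groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Hypothetical synonym table (assumption: in practice these labels would come
# from Wikipedia titles, redirects and anchors).
SYNONYMS = {
    "airline": ["air carrier", "airline company"],
    "America": ["American"],
}

query = expand_query(["security", "airline", "America"], SYNONYMS)
print(query)
```

Quoting multi-word labels is a design choice of this sketch; the slide shows the expansion without quotes.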
Koru | Evaluation
Wikipedia matches query terminology extremely well
Recognition and expansion of topics improves retrieval
e.g. “rugby world cup” vs. (“rugby world cup” OR rwc OR “webb ellis cup”)
Recognition of topics modifies query behavior
e.g. rugby world cup vs. “rugby world cup”
Related topics need further investigation
Extraction of thesaurus terms is inaccurate
What now
We need to improve how topics and the relations between them are extracted
Semantic Relatedness
Wikification
Semantic Relatedness
Given any two terms, what is the strength of the semantic relation between them?
Highly useful: AI, data mining, IR, NLP
But subjective
Term pair               Human rating
Radio / Television      6.77
…
Life / Stock            0.92
Car / Plane             5.77
Internet / Computer     7.58
Keyboard / Computer     7.62
Paper / Book            7.46
Tiger / Tiger           10.00
Cat / Tiger             7.35
Sex / Love              6.77
Semantic Relatedness | with Wikipedia
Two techniques have been developed already
WikiRelate!: 19% - 48%
Explicit Semantic Analysis: 75%
Scale and structure: GBs of text, millions of articles, hundreds of thousands of categories
Semantic Relatedness | Wikipedia links
Wikipedia has an extremely rich hyperlink structure that has been ignored so far.
[Figure: Wikipedia links around the articles Global Warming and Automobile. Labels: Petrol Engine, Fossil Fuel, 20th Century, Emission Standard, Bicycle, Diesel Engine, Carbon Dioxide, Air Pollution, Greenhouse Gas, Alternative Fuel, Transport, Vehicle, Henry Ford, Combustion Engine, Kyoto Protocol, Ozone, Greenhouse Effect, Planet, Audi, Battery (electricity), Arctic Circle, Environmental Skepticism, Greenpeace, Ecology]
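The slides do not give the formula behind the link-based measure. A minimal sketch, assuming a distance modelled on the Normalized Google Distance and computed over the sets of articles that link to each topic, might look like the following. The function name, the toy link sets and the article count are all hypothetical.

```python
import math


def link_relatedness(links_a, links_b, total_articles):
    """Relatedness of two articles from the sets of articles that link to them.

    Assumption: 1 minus a Normalized-Google-Distance-style distance, clipped at 0.
    """
    a, b = set(links_a), set(links_b)
    common = a & b
    if not common:
        return 0.0  # no shared in-links: treat the articles as unrelated
    distance = (math.log(max(len(a), len(b))) - math.log(len(common))) / (
        math.log(total_articles) - math.log(min(len(a), len(b)))
    )
    return max(0.0, 1.0 - distance)


# Toy in-link sets (hypothetical; real sets would come from a Wikipedia dump).
global_warming = {"Fossil Fuel", "Greenhouse Gas", "Kyoto Protocol", "Carbon Dioxide"}
automobile = {"Fossil Fuel", "Henry Ford", "Carbon Dioxide", "Vehicle"}

score = link_relatedness(global_warming, automobile, total_articles=1000)
print(round(score, 3))
```

Two articles that share most of the articles linking to them score close to 1, and articles with no shared in-links score 0.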
Semantic Relatedness | evaluation
Dataset                    WikiRelate   ESA    WLM
Miller & Charles           45%          73%    70%
Rubenstein & Goodenough    52%          82%    64%
WordSimilarity 353         49%          75%    69%
WikiRelate < WLM < ESA
Wikification
How do we accurately cross-reference documents with Wikipedia?
Wikipedia contains millions of examples of how to do this.
Which terms relate to concepts?
How do we resolve ambiguous terms?
How do we select the concepts that are relevant?
Wikification | identifying concept terms
Wikipedia’s links provide a huge vocabulary of which terms can resolve to which concepts
“Six central banks, including the Bank of England, have cut interest rates by half a percentage point in an effort to steady the faltering global economy.”
[Figure: candidate concepts for terms in the sentence. Labels: Six (number), Article (grammar), One half, Property. Percentages shown: 0.002%, 15%]
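One way to picture the vocabulary lookup is an anchor table that records, for each phrase, how often it occurs as a link versus how often it occurs at all. The sketch below scans a sentence for known anchors and keeps those that are linked often enough. The counts, the threshold and the names `ANCHOR_STATS`, `link_probability` and `candidate_terms` are hypothetical, not taken from the slides.

```python
# Hypothetical anchor statistics: anchor text -> (times used as a link, total occurrences).
ANCHOR_STATS = {
    "bank of england": (900, 1000),
    "interest rates": (300, 1000),
    "six": (2, 100000),
}


def link_probability(anchor):
    """Fraction of an anchor's occurrences in which it is used as a link."""
    linked, total = ANCHOR_STATS[anchor]
    return linked / total


def candidate_terms(text, max_len=3, min_prob=0.01):
    """Return (phrase, link probability) for known anchors found in the text."""
    words = [w.strip('.,"“”').lower() for w in text.split()]
    found = []
    # Try longer phrases first so multi-word anchors are listed before single words.
    for n in range(max_len, 0, -1):
        for i in range(len(words) - n + 1):
            gram = " ".join(words[i:i + n])
            if gram in ANCHOR_STATS and link_probability(gram) >= min_prob:
                found.append((gram, link_probability(gram)))
    return found


sentence = "Six central banks, including the Bank of England, have cut interest rates"
candidates = candidate_terms(sentence)
print(candidates)
```

With these toy counts, "six" is recognized as an anchor but discarded because it is almost never linked.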
Wikification | resolving ambiguity
For every link in Wikipedia, a human author has manually chosen the correct destination.
“Six central banks, including the Bank of England, have cut interest rates by half a percentage point in an effort to steady the faltering global economy.”
Senses of “bank”:
A movement in flight       0.3%
An underwater hill         0.3%
Edge of river or stream    1.8%
Financial institution      97.0%
“The story begins on the banks of the Rio Negro in the Central Amazon. A party of scientists is embarking on a voyage which they hope will provide answers to a five hundred year old mystery.”
recall 96% | precision 98%
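A simplified view of the disambiguation step weighs how common each sense is against how well it fits the surrounding context. The sketch below multiplies the two and picks the best sense. It is a heuristic stand-in, not the method behind the figures above: the commonness values assume the slide's percentages are link shares for “bank”, and the context scores are invented for illustration.

```python
# Assumption: the slide's percentages are the share of "bank" links pointing to each sense.
COMMONNESS = {
    "Financial institution": 0.970,
    "Edge of river or stream": 0.018,
    "An underwater hill": 0.003,
    "A movement in flight": 0.003,
}


def disambiguate(commonness, relatedness_to_context):
    """Pick the sense with the highest commonness x context-relatedness score."""
    scores = {
        sense: commonness[sense] * relatedness_to_context.get(sense, 0.0)
        for sense in commonness
    }
    return max(scores, key=scores.get)


# Hypothetical relatedness of each sense to the unambiguous topics in each sentence.
economy_context = {"Financial institution": 0.8, "Edge of river or stream": 0.05}
river_context = {"Financial institution": 0.01, "Edge of river or stream": 0.9}

print(disambiguate(COMMONNESS, economy_context))
print(disambiguate(COMMONNESS, river_context))
```

The second call shows why context matters: a rare sense can still win when the common sense is almost unrelated to the rest of the sentence.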
Wikification | selecting relevant concepts
Wikipedians do not link to every single article, only ones that readers would want to investigate
“Six central banks, including the Bank of England, have cut interest rates by half a percentage point in an effort to steady the faltering global economy.”
“The story begins on the banks of the Rio Negro in the Central Amazon. A party of scientists is embarking on a voyage which they hope will provide answers to a five hundred year old mystery.”
recall 74% | precision 74%
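Recall and precision figures like these compare the links a system proposes against the links human editors actually made. A minimal sketch of that arithmetic follows; the example link sets are hypothetical.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted links against human-made (gold) links."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical link sets for one document.
predicted_links = {"Central bank", "Bank of England", "Interest rate", "Six (number)"}
gold_links = {"Central bank", "Bank of England", "Interest rate", "World economy"}

precision, recall = precision_recall(predicted_links, gold_links)
print(precision, recall)
```

Here three of the four proposed links are correct and three of the four gold links are found, so both values are 0.75.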
What next?
Explore applications for Wikification
Topic Indexing
Document Clustering
Document Summarization
Revisit Koru
Apply semantic relatedness and wikification to knowledge base generation, query expansion, and exploratory search
Write up!
References
Milne, D., Medelyan, O. and Witten, I.H. Mining Domain-Specific Thesauri from Wikipedia: A case study. In Proceedings of WI 2006, Hong Kong.
Milne, D., Witten, I.H. and Nichols, D.M. A Knowledge-Based Search Engine Powered by Wikipedia. In Proceedings of CIKM 2007, Lisbon, Portugal.
Milne, D. and Witten, I.H. An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In Proceedings of WIKIAI 2008, Chicago, IL.
Milne, D. and Witten, I.H. Learning to link with Wikipedia. To appear in Proceedings of CIKM 2008, Napa Valley, California.
Websites and Demos
www.cs.waikato.ac.nz/~dnk2
www.nzdl.org/koru
wikipedia-miner.sourceforge.net
www.nzdl.org/wikification