Lecture 24: NER & Entity Linking
Kai-Wei Chang, CS @ University of Virginia
Course webpage: http://kwchang.net/teaching/NLP16
CS6501-NLP
Organizing knowledge
It's a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".
Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.
Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.
Slides are adapted from Dan Roth
Cross-document co-reference resolution
Reference resolution: disambiguation to Wikipedia
The “Reference” Collection has Structure
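[Figure: the Wikipedia titles that mentions link to are themselves connected by typed relations such as Used_In, Is_a, Succeeded, and Released]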
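The point of this slide is that the reference collection is a graph, not a flat list of titles: titles are nodes joined by typed relations. A minimal sketch, assuming a plain list of (subject, relation, object) triples. The relation names come from the slide, but the specific edges below are hypothetical, loosely suggested by the running "Chicago" example, since the slide's actual edges are not recoverable from the transcript.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples; relation names are from the
# slide, the edges themselves are illustrative guesses.
TRIPLES = [
    ("Chicago_(typeface)", "Used_In", "Classic_Mac_OS"),
    ("Mac_OS_8", "Succeeded", "Mac_OS_7.6"),
    ("Chicago_VIII", "Is_a", "Album"),
    ("Chicago_II", "Is_a", "Album"),
    ("Chicago_(band)", "Released", "Chicago_VIII"),
]


def build_graph(triples):
    """Index outgoing typed edges: node -> list of (relation, target)."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return graph


def neighbors(graph, node, relation=None):
    """Targets reachable from node in one hop, optionally filtered by relation."""
    return [obj for rel, obj in graph.get(node, []) if relation is None or rel == relation]


graph = build_graph(TRIPLES)
print(neighbors(graph, "Chicago_(band)"))          # ['Chicago_VIII']
print(neighbors(graph, "Mac_OS_8", "Succeeded"))   # ['Mac_OS_7.6']
```

Once titles carry typed edges like these, two candidate titles can be compared by how they are connected, which is what the later "global" inference exploits.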
Analysis of Information Networks
Wikipedia as a knowledge resource…
Wikification: The Reference Problem
Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.
Cycles of Knowledge: Grounding for/using Knowledge
Challenges
• Dealing with ambiguity of natural language
  • Mentions of entities and concepts could have multiple meanings
• Dealing with variability of natural language
  • A given concept could be expressed in many ways
• Wikification addresses these two issues in a specific way:
  • The Reference Problem
    • What is meant by this concept? (WSD + Grounding)
    • More than just co-reference (within and across documents)
General Challenges
• Ambiguity
• Concepts outside of Wikipedia (NIL)
  • Blumenthal ?
• Variability
• Scale
  • Millions of labels
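[Figure: variability in the example; "Connecticut", "CT", and "The Nutmeg State" name the same state, while "The New York Times", "Times", and "The Times" name the same newspaper]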
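Ambiguity and variability are two directions of the same many-to-many mapping between surface forms and titles: one surface form can point to several titles (ambiguity), and one title can be reached from several surface forms (variability). A minimal sketch, assuming a tiny hand-written dictionary that stands in for one mined from Wikipedia anchors, redirects, and titles; the entries and the function names `ambiguity` and `variability` are illustrative, not from the lecture.

```python
# Hypothetical surface-form -> candidate-title dictionary (illustrative entries only).
SURFACE_TO_TITLES = {
    "Chicago": ["Chicago", "Chicago_(band)", "Chicago_(typeface)"],
    "Connecticut": ["Connecticut"],
    "CT": ["Connecticut", "Computed_tomography"],
    "The Nutmeg State": ["Connecticut"],
    "The New York Times": ["The_New_York_Times"],
    "The Times": ["The_New_York_Times", "The_Times"],
    "Times": ["The_New_York_Times", "The_Times"],
}


def ambiguity(surface):
    """How many titles a surface form could mean; 0 means no candidate (possible NIL)."""
    return len(SURFACE_TO_TITLES.get(surface, []))


def variability(title):
    """All surface forms in the dictionary that may refer to the given title."""
    return sorted(s for s, titles in SURFACE_TO_TITLES.items() if title in titles)


print(ambiguity("Chicago"))         # 3
print(variability("Connecticut"))   # ['CT', 'Connecticut', 'The Nutmeg State']
```

At real scale the same table holds millions of entries, which is the "Scale" challenge above.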
Wikification: Subtasks
• Wikification and Entity Linking require addressing several sub-tasks:
  • Identifying Target Mentions
    • Mentions in the input text that should be Wikified
  • Identifying Candidate Titles
    • Candidate Wikipedia titles that could correspond to each mention
  • Candidate Title Ranking
    • Rank the candidate titles for a given mention
  • NIL Detection and Clustering
    • Identify mentions that do not correspond to a Wikipedia title
    • Entity Linking: cluster NIL mentions that represent the same entity.
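The last sub-task, NIL detection and clustering, can be sketched in a few lines. This is a minimal sketch under two loud assumptions: a mention is declared NIL when its best candidate scores below a fixed threshold (0.5 here, chosen arbitrarily), and NIL mentions are clustered by exact match of a lowercased, whitespace-normalized surface string. Practical systems typically use a learned NIL classifier and richer clustering features; the function names are hypothetical.

```python
from collections import defaultdict

NIL = "NIL"


def link_or_nil(ranked, threshold=0.5):
    """ranked: list of (title, score), best first. Return the top title, or NIL
    if there are no candidates or the top score is below the (assumed) threshold."""
    if not ranked or ranked[0][1] < threshold:
        return NIL
    return ranked[0][0]


def cluster_nil_mentions(mentions):
    """Group NIL mentions (doc_id, surface) whose normalized surface strings match.
    A crude heuristic, used here only to illustrate the clustering step."""
    clusters = defaultdict(list)
    for doc_id, surface in mentions:
        key = " ".join(surface.lower().split())
        clusters[key].append((doc_id, surface))
    return dict(clusters)


print(link_or_nil([("Chicago_(typeface)", 0.8), ("Chicago", 0.1)]))  # Chicago_(typeface)
print(link_or_nil([]))                                              # NIL
print(cluster_nil_mentions([("d1", "Acme  Widgets"), ("d2", "acme widgets")]))
```

The split between the two functions mirrors the slide: the first decides per mention whether any title applies, and the second groups the leftovers across documents so that each cluster can stand for one out-of-Wikipedia entity.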
High-level Algorithmic Approach
• Input: a text document d; Output: a set of pairs (m_i, t_i)
  • m_i are mentions in d; t_i are the corresponding Wikipedia titles, or NIL.
• (1) Identify mentions m_i in d
• (2) Local Inference
  • For each m_i in d:
    • Identify a set of relevant titles T(m_i)
    • Rank titles t_i ∈ T(m_i)
  [E.g., consider local statistics of how often the edges (m_i, t_i), (m_i, *), and (*, t_i) occur in the Wikipedia graph]
• (3) Global Inference
  • For each document d:
    • Consider all m_i ∈ d and all t_i ∈ T(m_i)
    • Re-rank titles t_i ∈ T(m_i)
  [E.g., if m and m' are related by virtue of being in d, their corresponding titles t and t' may also be related]
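The local statistics in step (2) can be made concrete with Wikipedia anchor-link counts: count(m, t) is how often anchor text m links to title t, count(m, *) sums over all titles for m, and count(*, t) sums over all anchors pointing to t. A minimal sketch, assuming a hypothetical in-memory table of counts (the numbers are invented for illustration, not mined from Wikipedia). The names `commonness` and `title_prior` are labels chosen for this sketch.

```python
from collections import Counter

# Hypothetical anchor-link counts: (anchor text m, target title t) -> count(m, t).
ANCHOR_COUNTS = Counter({
    ("Chicago", "Chicago"): 90,
    ("Chicago", "Chicago_(band)"): 8,
    ("Chicago", "Chicago_(typeface)"): 2,
    ("Chicago VIII", "Chicago_VIII"): 5,
})


def candidates(mention):
    """T(m): every title the anchor text m has linked to at least once."""
    return [t for (m, t) in ANCHOR_COUNTS if m == mention]


def commonness(mention, title):
    """count(m, t) / count(m, *): share of m's links that point to t."""
    total = sum(c for (m, _), c in ANCHOR_COUNTS.items() if m == mention)
    return ANCHOR_COUNTS[(mention, title)] / total if total else 0.0


def title_prior(title):
    """count(*, t) / count(*, *): how popular t is as a link target overall."""
    total = sum(ANCHOR_COUNTS.values())
    return sum(c for (_, t), c in ANCHOR_COUNTS.items() if t == title) / total


def rank_local(mention):
    """Baseline local ranker: order T(m) by commonness alone."""
    return sorted(candidates(mention), key=lambda t: commonness(mention, t), reverse=True)


print(rank_local("Chicago"))  # ['Chicago', 'Chicago_(band)', 'Chicago_(typeface)']
```

Ranking by commonness alone puts the city first for every "Chicago", including the typeface mention in the running example; that is the gap the global step (3) is meant to close.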
Local approach
§ Γ is a solution to the problem
  § A set of pairs (m, t)
  § m: a mention in the document
  § t: the matched Wikipedia title
[Figure: identified mentions in a text document are matched against Wikipedia articles; the local score of matching a mention to a title decomposes by mention m_i]
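The slide's formula for the local objective did not survive extraction. Assuming a local scoring function φ(m, t) (the symbol φ is our choice, not recovered from the slide), the objective the figure labels describe can be written as follows. Because the score decomposes by mention, maximizing the sum reduces to picking each mention's best title independently:

```latex
\Gamma^{*} = \arg\max_{\Gamma} \sum_{(m_i,\, t_i) \in \Gamma} \phi(m_i, t_i)
\qquad\Longrightarrow\qquad
t_i^{*} = \arg\max_{t \in T(m_i)} \phi(m_i, t)
```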
Global Approach: Using Additional Structure
[Figure: text document(s), e.g., news and blogs, linked to Wikipedia articles]
Adding a "global" term to evaluate how good the structure of the solution is:
• Use the local solutions Γ' (each mention considered independently).
• Evaluate the structure based on pairwise coherence scores Ψ(t_i, t_j).
• Choose those that satisfy document coherence conditions.
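One way to read these bullets: first compute the local solution Γ', then re-score each candidate t for mention m_i as φ(m_i, t) + Σ_{j≠i} Ψ(t, t'_j), where t'_j is mention m_j's pick in Γ'. A minimal sketch, assuming an unweighted sum of the two terms, coherence 0 for pairs not listed, and invented φ and Ψ values; the data and function names are hypothetical.

```python
# Hypothetical local scores phi(m, t) and pairwise coherence Psi(t, t'); all values invented.
CANDIDATES = {
    "Chicago": ["Chicago", "Chicago_(typeface)"],
    "Macintosh": ["Macintosh"],
}
PHI = {
    ("Chicago", "Chicago"): 0.9,
    ("Chicago", "Chicago_(typeface)"): 0.1,
    ("Macintosh", "Macintosh"): 1.0,
}
PSI = {frozenset(("Chicago_(typeface)", "Macintosh")): 1.0}


def local_solution(candidates, phi):
    """Gamma': each mention's best title under the local score alone."""
    return {m: max(titles, key=lambda t: phi[(m, t)]) for m, titles in candidates.items()}


def coherence(title, others, psi):
    """Sum of Psi between title and the other mentions' local picks (missing pairs = 0)."""
    return sum(psi.get(frozenset((title, o)), 0.0) for o in others)


def global_rerank(candidates, phi, psi):
    """Re-rank each mention's titles by local score plus coherence with Gamma'."""
    gamma_local = local_solution(candidates, phi)
    result = {}
    for m, titles in candidates.items():
        others = [t for m2, t in gamma_local.items() if m2 != m]
        result[m] = max(titles, key=lambda t: phi[(m, t)] + coherence(t, others, psi))
    return result


print(local_solution(CANDIDATES, PHI))       # {'Chicago': 'Chicago', 'Macintosh': 'Macintosh'}
print(global_rerank(CANDIDATES, PHI, PSI))   # {'Chicago': 'Chicago_(typeface)', 'Macintosh': 'Macintosh'}
```

Scoring against Γ' rather than searching over all joint assignments keeps the cost linear in the number of candidate pairs; an exhaustive joint search would be exact but exponential in the number of mentions.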