Lecture 24: NER & Entity Linking - University of Virginia...


Page 1: Lecture 24: NER & Entity Linking - University of Virginia ...kc2wc/teaching/NLP16/slides/25-NER.pdf

Lecture 24: NER & Entity Linking

Kai-Wei Chang

CS @ University of Virginia

[email protected]

Course webpage: http://kwchang.net/teaching/NLP16

CS6501-NLP

Page 2:

Organizing knowledge

It's a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".

Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.

Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

Slides are adapted from Dan Roth

Page 3:

Cross-document co-reference resolution

It's a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".

Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.

Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

Page 4:

Reference resolution: (disambiguation to Wikipedia)

It's a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".

Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.

Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

Page 5:

The "Reference" Collection has Structure

It's a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".

Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.

Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

Relation labels in the figure: Used_In, Is_a, Is_a, Succeeded, Released
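The relation labels on this slide suggest viewing the reference collection as a graph with typed edges. A minimal Python sketch of that idea follows. The node names and the endpoints of each edge are assumptions read off the three example sentences, because the figure itself did not survive extraction.

```python
from collections import defaultdict

# Hypothetical (head, relation, tail) triples; the endpoints are illustrative guesses.
TRIPLES = [
    ("Chicago (typeface)", "Is_a", "Font"),
    ("Chicago (typeface)", "Used_In", "Mac OS"),
    ("Chicago VIII", "Is_a", "Album"),
    ("Mac OS 8", "Succeeded", "Mac OS 7.6"),
    ("Mac OS 8", "Released", "1997"),
]


def build_graph(triples):
    """Index triples by head node so outgoing edges can be looked up."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph


graph = build_graph(TRIPLES)
print(graph["Chicago (typeface)"])
```

Storing edges by head node keeps the lookup of everything known about one entity to a single dictionary access, which is the kind of structure a disambiguation step could consult.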

Page 6:

Analysis of Information Networks

It's a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the "N".

Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.

Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

Page 7:

Wikipedia as a knowledge resource …

Relation labels in the figure: Used_In, Is_a, Is_a, Succeeded, Released

Page 8:

Wikification: The Reference Problem

Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.

Cycles of Knowledge: Grounding for/using Knowledge

Page 9:

Challenging

• Dealing with Ambiguity of Natural Language
  • Mentions of entities and concepts could have multiple meanings
• Dealing with Variability of Natural Language
  • A given concept could be expressed in many ways
• Wikification addresses these two issues in a specific way:
  • The Reference Problem
    • What is meant by this concept? (WSD + Grounding)
    • More than just co-reference (within and across documents)

Page 10:

General Challenges

Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.

• Ambiguity
• Concepts outside of Wikipedia (NIL)
  • Blumenthal ?
• Variability
• Scale
  • Millions of labels

Labels in the figure: Connecticut, CT, The Nutmeg State; Times, The New York Times, The Times
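Ambiguity and variability can be made concrete with a small alias table that maps surface forms to candidate titles. The sketch below is illustrative only: the table contents are taken from the labels on this slide, while the names `ALIASES`, `candidates`, `is_ambiguous`, and `surface_forms` are hypothetical. Treating "Blumenthal" as missing from the table is likewise an assumption made to show the NIL case.

```python
# Hypothetical alias table: lowercased surface form -> candidate Wikipedia titles.
ALIASES = {
    "times": ["The New York Times", "The Times"],
    "the nutmeg state": ["Connecticut"],
    "ct": ["Connecticut"],
    "connecticut": ["Connecticut"],
}


def candidates(mention):
    """Return candidate titles; an empty list signals a possible NIL mention."""
    return ALIASES.get(mention.lower(), [])


def is_ambiguous(mention):
    """Ambiguity: one surface form maps to more than one title."""
    return len(candidates(mention)) > 1


def surface_forms(title):
    """Variability: all known surface forms that can refer to one title."""
    return sorted(form for form, titles in ALIASES.items() if title in titles)


print(candidates("Times"))           # two candidates: ambiguous
print(surface_forms("Connecticut"))  # several forms: variability
print(candidates("Blumenthal"))      # no entry in this toy table: NIL candidate
```

At the scale mentioned on the slide (millions of labels), such a table would be built from Wikipedia anchor text rather than written by hand.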

Page 11:

Wikification: Subtasks

• Wikification and Entity Linking require addressing several sub-tasks:
  • Identifying Target Mentions
    • Mentions in the input text that should be Wikified
  • Identifying Candidate Titles
    • Candidate Wikipedia titles that could correspond to each mention
  • Candidate Title Ranking
    • Rank the candidate titles for a given mention
  • NIL Detection and Clustering
    • Identify mentions that do not correspond to a Wikipedia title
    • Entity Linking: cluster NIL mentions that represent the same entity.
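The last sub-task, NIL detection and clustering, can be sketched as follows. This is a deliberately simple illustration, not the method from the lecture: the score threshold of 0.5 and the clustering by normalized surface string are both assumptions, and the function names are hypothetical.

```python
from collections import defaultdict

NIL = "NIL"


def link_or_nil(scored_candidates, threshold=0.5):
    """Return the best-scoring title, or NIL if nothing clears the threshold.

    scored_candidates is a list of (title, score) pairs; the threshold value
    is an assumed cut-off, not one given in the slides.
    """
    if not scored_candidates:
        return NIL
    title, score = max(scored_candidates, key=lambda pair: pair[1])
    return title if score >= threshold else NIL


def cluster_nil_mentions(mentions):
    """Group NIL mentions by lowercased, stripped surface string.

    This is a crude stand-in for real clustering of mentions that denote
    the same out-of-Wikipedia entity.
    """
    clusters = defaultdict(list)
    for mention in mentions:
        clusters[mention.lower().strip()].append(mention)
    return dict(clusters)


print(link_or_nil([("Connecticut", 0.9), ("Connecticut River", 0.1)]))
print(link_or_nil([("Some Title", 0.2)]))
print(cluster_nil_mentions(["Blumenthal", "blumenthal ", "Dodd"]))
```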

Page 12:

High-level Algorithmic Approach

• Input: a text document d; Output: a set of pairs (m_i, t_i)
  • m_i are mentions in d; t_j(m_i) are corresponding Wikipedia titles, or NIL.
• (1) Identify mentions m_i in d
• (2) Local Inference
  • For each m_i in d:
    • Identify a set of relevant titles T(m_i)
    • Rank titles t_i ∈ T(m_i)
      [E.g., consider local statistics of edges (m_i, t_i), (m_i, *), and (*, t_i) occurrences in the Wikipedia graph]
• (3) Global Inference
  • For each document d:
    • Consider all m_i ∈ d and all t_i ∈ T(m_i)
    • Re-rank titles t_i ∈ T(m_i)
      [E.g., if m, m' are related by virtue of being in d, their corresponding titles t, t' may also be related]
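One way to read the local statistics in step (2) is as a ratio of link counts: how often mention m links to title t, divided by how often m links to anything. The sketch below ranks titles that way. The counts are invented for illustration and `LINK_COUNTS` and `rank_titles` are hypothetical names; the slides do not specify this exact scoring function.

```python
from collections import Counter

# Hypothetical anchor-text statistics: (mention, title) -> number of links.
LINK_COUNTS = Counter({
    ("chicago", "Chicago"): 900,
    ("chicago", "Chicago (band)"): 80,
    ("chicago", "Chicago (typeface)"): 20,
})


def rank_titles(mention, link_counts):
    """Rank titles t for mention m by count(m, t) / count(m, *)."""
    m = mention.lower()
    per_title = {t: c for (mm, t), c in link_counts.items() if mm == m}
    total = sum(per_title.values())  # this is count(m, *)
    if total == 0:
        return []  # no candidates at all: a possible NIL mention
    ranked = [(t, c / total) for t, c in per_title.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


for title, score in rank_titles("Chicago", LINK_COUNTS):
    print(f"{title}: {score:.2f}")
```

With these toy counts the city always ranks first, even in a sentence about Macintosh fonts. That limitation of purely local scoring is what the global inference step (3) is meant to address.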

Page 13:

Local approach

• Γ is a solution to the problem
  • A set of pairs (m, t)
    • m: a mention in the document
    • t: the matched Wikipedia Title

Labels in the figure: A text Document; Wikipedia Articles; Identified mentions

Local score of matching the mention to the title (decomposed by m_i)
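The formula on this slide did not survive extraction. A common way to write a local objective that matches the labels above (a solution Γ, and a local score decomposed by m_i) is shown below. The symbol φ for the local score and N for the number of mentions are assumptions, chosen to be consistent with the Ψ used for coherence on the next slide.

```latex
\Gamma^{*}_{\text{local}} = \arg\max_{\Gamma} \sum_{i=1}^{N} \phi(m_i, t_i)
```

Because the sum decomposes over mentions, each t_i can be chosen independently by maximizing φ(m_i, t) over the candidate set T(m_i).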

Page 14:

Global Approach: Using Additional Structure

Labels in the figure: Text Document(s): News, Blogs, …; Wikipedia Articles

Adding a "global" term to evaluate how good the structure of the solution is.
• Use the local solutions Γ' (each mention considered independently).
• Evaluate the structure based on pair-wise coherence scores Ψ(t_i, t_j)
• Choose those that satisfy document coherence conditions.
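The three bullets can be sketched in Python as a two-pass procedure: compute the local solution Γ', then re-pick each title using its local score plus its coherence with the other titles in Γ'. This is a minimal sketch under stated assumptions: the score tables `phi` and `psi` hold invented numbers, the function names are hypothetical, and summing local and coherence scores is one simple way to combine them, not necessarily the one used in the lecture.

```python
def local_solution(phi):
    """Gamma': pick the best title for each mention independently."""
    return {m: max(scores, key=scores.get) for m, scores in phi.items()}


def global_rerank(phi, psi):
    """Re-pick each title using local score plus coherence with Gamma'."""
    gamma_local = local_solution(phi)
    result = {}
    for m, scores in phi.items():
        # Titles chosen locally for the other mentions in the document.
        context = [t for other, t in gamma_local.items() if other != m]

        def total(t):
            coherence = sum(psi.get(frozenset((t, c)), 0.0) for c in context)
            return scores[t] + coherence

        result[m] = max(scores, key=total)
    return result


# Hypothetical local scores phi[mention][title] and pairwise coherence psi.
phi = {
    "Chicago": {"Chicago": 0.6, "Chicago (typeface)": 0.3},
    "Mac OS": {"Mac OS": 0.9},
}
psi = {frozenset(("Chicago (typeface)", "Mac OS")): 0.5}

print(local_solution(phi))
print(global_rerank(phi, psi))
```

In this toy example the local pass links "Chicago" to the city, while the global pass switches it to the typeface because that title is coherent with "Mac OS" elsewhere in the document.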
