Entity Linking - College of Engineering and Physical...

59
Enityt Linking Entity Linking Use cursor keys to ip through slides. Laura Dietz [email protected] University of Massachusetts

Transcript of Entity Linking - College of Engineering and Physical...

Page 1: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Enityt LinkingEntity Linking

Use cursor keys to flip through slides.

Laura Dietz [email protected]

University of Massachusetts

Page 2: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Given query mention in a source document,identify which Wikipedia entity it represents

Query Entity

Problem: Entity Linking

NIL

Page 3: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Problem: Example

Example Query:

Northern Ireland has a population of about one and a half million people. At the time of partition in 1921 Protestants / unionists had a two-thirds majority in the region. The first Prime Minister of Northern Ireland, Sir James Craig, described the state as having ‘a Protestant Parliament for a Protestant people.’ The state effectively discriminated against Catholics in housing, jobs, and political representation.

http://cain.ulst.ac.uk/othelem/incorepaper09.htm

Northern Ireland

Northern Ireland

Search for:

Page 4: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."
Page 5: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Problem: Example

Example Query:

Northern Ireland has a population of about one and a half million people. At the time of partition in 1921 Protestants / unionists had a two-thirds majority in the region. The first Prime Minister of Northern Ireland, Sir James Craig, described the state as having ‘a Protestant Parliament for a Protestant people.’ The state effectively discriminated against Catholics in housing, jobs, and political representation.

http://cain.ulst.ac.uk/othelem/incorepaper09.htm

James Craig

James Craig

Search for:

Page 6: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."
Page 7: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

near miss! :(

Page 8: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Overview

M1: Popularity MethodM2: Machine Learned SimilarityM3: Context with IRM4: Joint Assignment ModelM5: Joint Retrieval Model

Experimental ResultsOnline Demos

Page 9: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Challenges

Page 10: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Problem: Example

Example Query:

Northern Ireland has a population of about one and a half million people. At the time of partition in 1921 Protestants / unionists had a two-thirds majority in the region. The first Prime Minister of Northern Ireland, Sir James Craig, described the state as having ‘a Protestant Parliament for a Protestant people.’ The state effectively discriminated against Catholics in housing, jobs, and political representation.

http://cain.ulst.ac.uk/othelem/incorepaper09.htm

James Craig

Page 11: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Q: Query StringV: Name VariantsM: Neighbor MentionsS: Sentence

James Craig

Document Analysis

Name Variants: Within-doc CoreferenceNeighbor Mentions: NER Tagger (Alternative Mention Detection)Sentence: Term models

Symbol Notation:

Page 12: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Method 1: Popularity of Links

Step 1: Build a dictionary of names for each entity.

Step 2:Inspect all KB entities that have thequery mention as a name variant.

Step 3:Choose the entity with the most inlinks through this name.

Page 13: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Names and Links on Wikipedia

Page 14: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Ulster Unionists

Northern Ireland

Prime Minister ofNorthern Ireland

Sir James Craig

1st Viscount Craigavon

Northern Ireland

James Craig, 1st Viscount Craigavon

Irish Unionist

Unionism in Ireland

Ulster

Mining Name Variants and Neighbors

Page 15: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Pros & Cons: Popularity of Links

Works for very popular entitiessuch as "Northern Ireland"

Fails for entities with confusable names"James Craig", "Springfield", "Jaguar"

Page 16: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Method 1: Popularity of Links

Step 1: Build a dictionary of names for each entity.

Step 2:Inspect all KB entities that have thequery mention as a name variant.

Step 3:Choose the entity with the most inlinks through this name.

Page 17: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Method 2: Machine Learn Similarity

Step 1: Collect different similarity features ofquery mention and entities

Step 2:Machine learn the feature weightson training data (e.g. learning to rank)

Step 3:Apply similarity to query and each entity,select the most similar entity.

Page 18: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Method 2: Similarity Features

James Craig

JC, 1st Viscount Craigavon

title: James Craig, 1st Viscount Craigavonanchor text: Sir James Craig's Craig Administrationdisambiguation: James Craigfreebase name: Lord Craigavon

James Craig

James Craig (actor)

title: James Craig (actor)anchor text: James Craig James Craig indisambiguation: James Craigfreebase name: James Craig (actor)

James Craig

is exact title match?is disambiguation match?inlinks through this nameis approx match?TF-IDF similarity score

Page 19: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Features: Name variants, Document Terms, Links, Popularity ...

Query

Feature vector for supervised Re-rankingand classification

Re-ranking

NIL classification: Is it similar enough to be a match?

NIL?

Learn Similarity and NIL

Candidate Entities

Q: Query StringV: Name VariantsM: Neighbor MentionsS: Sentence

Page 20: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Pros & Cons:Machine Learn Similarity

Pro: Combination of different indicatorsof similarity; option to predict "NILs".

Pro: Can incorporate name variantsfound in the text (coreference tools)

Con: Requires selection of a pool of candidate entities, which can be large ("John Smith").

Will still fail on "James Craig", because thewrong James has more anchor text matches.

Page 21: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Method 3: Context Disambiguation

Step 1:Identify surrounding text, entities, etc.

Step 2:Issue search query containing all of it.

Page 22: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Different Kinds of Context

Example Query:

Northern Ireland has a population of about one and a half million people. At the time of partition in 1921 Protestants / unionists had a two-thirds majority in the region. The first Prime Minister of Northern Ireland, Sir James Craig, described the state as having ‘a Protestant Parliament for a Protestant people.’ The state effectively discriminated against Catholics in housing, jobs, and political representation.

http://cain.ulst.ac.uk/othelem/incorepaper09.htm

James Craig

James Craig +Name Variants + Neighbors + SentenceSearch for:

Page 23: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."
Page 24: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Method 3: Pros and Cons

Works for "James Craig"!

Problematic when neighbors are ambiguous:

"Lisa witnessed a shooting at Springfield high school".

(Unclear which "Lisa" and which "Springfield")

Page 25: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Method 3: Pros and Cons

Also problematic when neighbors don't provide enough disambiguation power

Example, all other James Craigs of Irelandwhich are less popular.

Page 26: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."
Page 27: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Method 4: Joint Assignment Models

Step 1: Identify all entity mentions in textStep 2: For each mention retrieve candidates

Step 3: Select the entity that maximizes:

across all neighbor entities

James Craig

Page 28: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

James Craig

Northern Ireland

Catholics American Catholic Church

Method 4 Example: Candidates

Page 29: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

James Craig

Northern Ireland

Catholics American Catholic Church

Method 4 Example: Correct Selection

Page 30: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

James Craig

Northern Ireland

Catholics American Catholic Church

Method 4 Example: Scoring

Page 31: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

James Craig

Northern Ireland

Catholics American Catholic Church

Method 4 Example: Wrong Selection

notcompatible

Page 32: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Method 4: Learn Similarities

As in Method 2, learn feature-based similarity

entity-entity similarity features:mutual links, same categories, RDF relations

mention-entitysimilarity

entity-entitysimilarity

Page 33: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Method 4: Joint Assignment Models

Step 1: Identify all entity mentions in textStep 2: For each mention retrieve candidates

Step 3: Select the entity that maximizes:

across all neighbor entities

James Craig

Page 34: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Method 4: Pros and Cons

Pro: Can mutually resolve uncertainty

Con: Requires a pool of candidates(trade-off runtime versus recall)

Con: expensive inference problem

May still fail on less popular James Craigsor when context does not resolve ambiguities.

Page 35: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Method 5: Joint Retrieval Model

Step 1: Identify all entity mentions in textStep 2: For each query mention: Issue a search query including query, neighboring mentions, sentenceWeighting each "ingredient" differently

Intuition: structured matching of text to KB

Page 36: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Names and Links on Wikipedia

Page 37: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Ulster Unionists

Northern Ireland

Prime Minister ofNorthern Ireland

Sir James Craig

1st Viscount Craigavon

Northern Ireland

James Craig, 1st Viscount Craigavon

Irish Unionist

Unionism in Ireland

Ulster

Mining Name Variants and Neighbors

Page 38: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

James Craig

Northern Ireland

Catholics

Ulster Unionists

Northern Ireland

Prime Minister ofNorthern Ireland

Nashville, Tennessee

B-Movies

Method 5 Example: Scoring

Page 39: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Method 4

Q: Query StringV: Name VariantsM: Neighbor MentionsS: Sentence

Integrate over Method 5

Connection between 4 and 5

Requires iterative optimization Can be solved inside asearch engine

Page 40: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Identify context of querymention

Need a Search Index for the KB

Preprocessing:build a special KB Index

neighbor-entity similarity features:neighbor occurs in entity's textneighbor is title of inlinks/outlinks

Page 41: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Special Wikipedia Index

Search Indexwith special Fields

Ulster Unionists

Northern Ireland

Prime Minister ofNorthern Ireland

Ulster Unionists

Northern Ireland

Page 42: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Neighbor-Entity Features

Ulster Unionists

Northern Ireland

James Craig

neighbor occurs in text?neighbor in inlink titles?neighbor in outlink titles?is approx match?TF-IDF similarity score

Northern Ireland

Machine learn the feature weightson training data (e.g. learning to rank)

Page 43: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Query mention-Entity Features

Ulster Unionists

Northern Ireland

James Craig

Machine learn the feature weightson training data (e.g. learning to rank)

is exact title match?is disambiguation match?inlinks through this nameis approx match?TF-IDF similarity score

Page 44: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Issue the Entity Linking IR Actually: structured matching

Method 5: Joint Retrieval Model

with special KB Index Select the entity that maximizes:

Page 45: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Method 5: Pros and Cons

Pro: Similar to joint assignment, but cheaper

Pro: Does not require pools (optimize in IR)

Pro: Can be combined with Machine Learning(Method 2) to improve precision.

Con: Fails when context is misleading

Page 46: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Really Difficult Example

Example Query:

ABC shot "Lost" in AustraliaABC

True entity: American Broadcasting Company

Context "Australia" and mention similaritywill point instead to Australian Broadcasting Corporation

Approach: Identify misleading neighbors (variant of M5)

Page 47: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

0 5 10 15 200.60

0.65

0.70

0.75

0.80

0.85

0.90

0.95

1.00

avera

ge r

eca

ll

2009

Q QV QVM_nrm QVM_nrm LTR

0 5 10 15 20

0.75

0.80

0.85

0.90

0.95

1.002010

0 5 10 15 20

0.60

0.65

0.70

0.75

0.80

0.85

0.90

0.95

1.002011

0 5 10 15 20

cutoff rank k

0.55

0.60

0.65

0.70

0.75

0.80

0.85

0.90

0.95

1.002012

Q: Query StringV: Name VariantsM: Neighbor MentionsS: Sentence

M1(Popularity)

variant of M5(Joint Retrieval)

M5 + M2(JR + ML)

TAC KBP Entity Linking Task

Page 48: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

References

M1 Popularity / Keyphraseness:Mihalcea et al. In CIKM, 2007. Wikify!: linking documents to encyclopedic knowledge.

M2 Machine Learn Mention-to-Entity SimilarityBunescu et al. In EACL, 2006. "Using Encyclopedic Knowledge for Named entity Disambiguation." M. Dredze, et al. In ACL, 2010 "Entity disambiguation for knowledge base population".

M4 Joint AssignmentSilviu Cucerzan. In EMNLP-CoNLL, 2007. "Large-scale named entity disambiguation based on wikipedia data."Ratinov et al. ACL 2011. "Local and global algorithms for disambiguation to wikipedia." Entity-to-Entity Features: Ceccarelli et al. In CIKM, 2013. "Learning relatedness measures for entity linking."

M5 Joint Retrieval ModelDalton et al. In OAIR, 2013. "A neighborhood relevance model for entity linking."

more: http://nlp.cs.rpi.edu/kbp/2014/elreading.htmlhttp://www.mendeley.com/groups/3339761/entity-linking-and-retrieval/

Page 49: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Toolkits & Online Demos

List of toolkits: http://nlp.cs.rpi.edu/kbp/2014/tools.html

Several Online Demos:UIUC Wikifierhttp://cogcomp.cs.illinois.edu/demo/wikify/

TagMe! http://tagme.di.unipi.it/

AIDA https://gate.d5.mpi-inf.mpg.de/webaida/

Page 50: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

UIUC Wikifier

Page 51: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

TagMe!

Page 52: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

AIDA (prior+sim+coherence)

Page 53: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

AIDA (prior only)

Page 54: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Another Example: Lisa Fletcher

Page 55: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

UIUC Wikifier

Page 56: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

TagMe!

Page 57: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

AIDA

Page 58: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Search Engine (DuckDuckGo)

Page 59: Entity Linking - College of Engineering and Physical Sciencesdietz/entitylinking/entity-linking-tutorial-slides.pdf"Large-scale named entity disambiguation based on wikipedia data."

Participate!

TAC KBP Entity Linking Taskhttp://nlp.cs.rpi.edu/kbp/2014/

SIGIR Entity Recognition and Disambiguation Challengehttp://web-ngram.research.microsoft.com/erd2014/

INEX 2014 Tweet Contextualization Trackhttps://inex.mmci.uni-saarland.de/tracks/qa/

Questions?email: [email protected]: http://ciir.cs.umass.edu/~dietz/