Relational Inference for Wikification

61
Relational Inference for Wikification Xiao Cheng and Dan Roth University of Illinois at Urbana- Champaign 1

description

Relational Inference for Wikification. Xiao Cheng and Dan Roth University of Illinois at Urbana-Champaign. Wikification. - PowerPoint PPT Presentation

Transcript of Relational Inference for Wikification

Page 1: Relational Inference for Wikification

1

Relational Inference for Wikification

Xiao Cheng and Dan RothUniversity of Illinois at Urbana-Champaign

Page 2: Relational Inference for Wikification

2

Wikification

Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.

Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.

Page 3: Relational Inference for Wikification

3

Applications

Knowledge Acquisition via Grounding Coreference Resolution

Learning-based multi-sieve co-reference resolution with knowledge (Ratinov et al. 2012)

Information Extraction Unsupervised relation discovery with sense disambiguation (Yao et al.

2012) Automatic Event Extraction with Structured Preference Modeling (Lu

and Roth, 2012 ) Text Classification

Gabrilovich and Markovitch, 2007; Chang et al., 2008 Entity Linking

Page 4: Relational Inference for Wikification

4

Ambiguity

Concepts outside of Wikipedia (NIL) Blumenthal ?

Variability

Scale Millions of labels

Challenges

Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.

ConnecticutCT

The Nutmeg StateTimesThe New York TimesThe Times

Page 5: Relational Inference for Wikification

5

State-of-the-art systems (Ratinov et al. 2011) can achieve the above with local and global statistical features Reaches bottleneck around 70%~ 85% F1 on non-wiki datasets What is missing?

Challenges

Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.

Page 6: Relational Inference for Wikification

6

, the of deposed , …

Motivating Example

Mubarak wife Egyptian President Hosni Mubarak

What are we missing with Bag of Words (BOW) models? Who is Mubarak?

Constraining interaction between concepts (Mubarak, wife, Hosni Mubarak)

Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …

Page 7: Relational Inference for Wikification

7

Relational Inference for Wikification

Our contribution Identify key textual relations for Wikification A global inference framework to incorporate relational knowledge

Significant improvement over state-of-the-art systems

Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …

(Mubarak, wife, Hosni Mubarak)

Page 8: Relational Inference for Wikification

8

Talk Outline

Why Wikification? Introduction

Motivation Approach

Wikification Pipeline Formulation Relational Analysis

Evaluation Result Entity Linking

Page 9: Relational Inference for Wikification

9

Mention Segmentation

Candidate Generation

Candidate Ranking

NIL Linking

Wikification

Mention Segmentatio

n

Candidate Generation

Candidate Ranking

Determine NILs

Page 10: Relational Inference for Wikification

10

Wikification Pipeline 1 - Mention Segmentation

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

sub-NP (Noun Phrase) chunks NER Regular expressions

Page 11: Relational Inference for Wikification

11

Wikification Pipeline 1 - Mention Segmentation

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Page 12: Relational Inference for Wikification

12

Wikification Pipeline 2 - Candidate Generation

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

k ek4

1 Socialist_Party_(France)

2 Socialist_Party_(Portugal)

3 Socialist_Party_of_America

4 Socialist_Party_(Argentina)

k ek3

1 Slobodan_Milošević

2 Milošević_(surname)

3 Boki_Milošević

4 Alexander_Milošević

Page 13: Relational Inference for Wikification

13

Wikification Pipeline 3 - Candidate Ranking

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

k ek4 sk

4

1 Socialist_Party_(France) 0.23

2 Socialist_Party_(Portugal) 0.16

3 Socialist_Party_of_America 0.07

4 Socialist_Party_(Argentina) 0.06

k ek3 sk

3

1 Slobodan_Milošević 0.7

2 Milošević_(surname) 0.1

3 Boki_Milošević 0.1

4 Alexander_Milošević 0.05

Local and global statistical features

Page 14: Relational Inference for Wikification

14

Wikification Pipeline 4 – Determine NILs

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

k ek4 sk

4

1 Socialist_Party_(France) 0.23

k ek3 sk

3

1 Slobodan_Milošević 0.7

Is the top candidate really what the text referred to?

Page 15: Relational Inference for Wikification

15

Talk Outline

Why Wikification? Introduction

Motivation Approach

Wikification Pipeline Formulation Relational Analysis

Evaluation Result Entity Linking

Page 16: Relational Inference for Wikification

16

Formulation (0)

Intuition Promote pairs of concepts coherent with textual relations

Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …

(Mubarak, wife, Hosni Mubarak)

Page 17: Relational Inference for Wikification

17

Formulate as an Integer Linear Program (ILP):

If no relation exists, collapse to the non-structured decision

Formulation (1)

Whether a relation exists between and

weight of a relation

Whether to output th candidate of the th mentionweight to output

Page 18: Relational Inference for Wikification

18

Formulation (2)

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

k ek4 sk

4

1 Socialist_Party_(France) 0.23

2 Socialist_Party_(Portugal) 0.16

3 Socialist_Party_of_America 0.07

4 Socialist_Party_(Argentina) 0.06

k ek3 sk

3

1 Slobodan_Milošević 0.7

2 Milošević_(surname) 0.1

3 Boki_Milošević 0.1

4 Alexander_Milošević 0.05

r(1,2)34

eki: whether a concept is chosen

ski : score of a concept

r(k,l)ij: whether a relation is present

w(k,l)ij : score of a relation

r(4,3)34

Page 19: Relational Inference for Wikification

19

Overall Approach

Wikification

Candidate

Generation

Relation Analysis

Relation Identification

Page 20: Relational Inference for Wikification

20

Relation Identification

ACE style in-document coreference Extract named entity-only coreference relations with high precision

Syntactico-Semantic relations (Chan & Roth ‘10)

Easy to extract with high precision Aim for high recall, as false-positives will be verified and discarded Sparse, but covers ~80% relation instances in ACE2004

Type Example

Premodifier Iranian Ministry of Defense

Possessive NYC’s stock exchange

Formulaic Chicago, Illinois

Preposition President of the US

Page 21: Relational Inference for Wikification

21

Relation Identification

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Argument 1 Relation Type Argument 2

Yugoslav President apposition Slobodan Milošević

Slobodan Milošević coreference Milošević

Milošević possessive Socialist Party

Page 22: Relational Inference for Wikification

22

Overall Approach

Wikification

Candidate

Generation

Relation Analysis

Relation Identification

Page 23: Relational Inference for Wikification

23

Relation Retrieval

Current approach Collect known mappings from Wikipedia page titles, hyperlinks… Limit to top-K candidates based on frequency of links (Ratinov et al.

2011) What concepts can “Socialist Party” refer to?

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Page 24: Relational Inference for Wikification

24

Uninformative Mentions

Page 25: Relational Inference for Wikification

25

Relation Retrieval

What concepts can “Socialist Party” refer to? More robust candidate generation

Identified relations are verified against a knowledge base (DBPedia) Retrieve relation arguments matching “(Milošević ,?,Socialist Party)”

as our new candidates

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Page 26: Relational Inference for Wikification

26

Query Pruning Only 2 queries per pair necessary due to strong baseline.

Relation Retrieval

q1=(Socialist Party of France,?, *Milošević*)q2=(Slobodan Milošević,?,*Socialist Party*)

k ek4 sk

4

1 Socialist_Party_(France) 0.23

k ek3 sk

3

1 Slobodan_Milošević 0.7

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Page 27: Relational Inference for Wikification

27

Relation Retrieval

Argument 1 Relation Type Argument 2

Milošević possessive Socialist Party

Page 28: Relational Inference for Wikification

28

Relation Retrieval

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

k ek4 sk

4

1 Socialist_Party_(France) 0.23

2 Socialist_Party_(Portugal) 0.16

3 Socialist_Party_of_America 0.07

4 Socialist_Party_(Argentina) 0.06

21 Socialist_Party_of_Serbia 0.0

k ek3 sk

3

1 Slobodan_Milošević 0.7

2 Milošević_(surname) 0.1

3 Boki_Milošević 0.1

4 Alexander_Milošević 0.05

𝑟34(1,21)=1

Page 29: Relational Inference for Wikification

29

Overall Approach

Wikification

Candidate

Generation

Relation Analysis

Relation Identification

Page 30: Relational Inference for Wikification

30

Relational Inference

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

k ek4 sk

4

1 Socialist_Party_(France) 0.23

2 Socialist_Party_(Portugal) 0.16

3 Socialist_Party_of_America 0.07

4 Socialist_Party_(Argentina) 0.06

21 Socialist_Party_of_Serbia 0.0

k ek3 sk

3

1 Slobodan_Milošević 0.7

2 Milošević_(surname) 0.1

3 Boki_Milošević 0.1

4 Alexander_Milošević 0.05

𝑟34(1,21)=1

𝑤34(1,21)=?

Page 31: Relational Inference for Wikification

31

Relation scoring

Query scoring as a tie-breaker between multiple relations Explicit relations are stronger than a hyperlink relations Normalize score for each pair of mention to

𝑤=1𝑍 𝑓 (𝑞 ,𝜎 )

Retrieved relation tupleRelation query

Page 32: Relational Inference for Wikification

32

Relational Inference

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

k ek4 sk

4

1 Socialist_Party_(France) 0.23

2 Socialist_Party_(Portugal) 0.16

3 Socialist_Party_of_America 0.07

4 Socialist_Party_(Argentina) 0.06

21 Socialist_Party_of_Serbia 0.0

k ek3 sk

3

1 Slobodan_Milošević 0.7

2 Milošević_(surname) 0.1

3 Boki_Milošević 0.1

4 Alexander_Milošević 0.05

𝑟34(1,21)=1

1

Page 33: Relational Inference for Wikification

33

Relational Inference - coreference

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

k ek2 sk

2

1 Slobodan_Milošević 1.0

k ek3 sk

3

1 Slobodan_Milošević 0.7

2 Milošević_(surname) 0.1

3 Boki_Milošević 0.1

4 Alexander_Milošević 0.05

𝑟23(1,1)=1

Page 34: Relational Inference for Wikification

34

Overall Approach

Wikification

Candidate

Generation

Relation Analysis

Relation Identification

Page 35: Relational Inference for Wikification

35

Determine unknown concepts (NILs)

How to capture the fact: “Dorothy Byrne” does not refer to any concept in Wikipedia

Identify coreferent nominal mention relations Generate better features for NIL classifier

Dorothy Byrne, a state coordinator for the Florida Green Party,…

k ek2 sk

2

1 Green_Party_of_Florida 1.0

k ek1 sk

1

1 Dorothy_Byrne_(British_Journalist)

0.6

2 Dorothy_Byrne_(mezzo-soprano)

0.4

nominal mention

Page 36: Relational Inference for Wikification

36

Determine unknown concepts (NILs)

Create NIL candidate for propagation

Dorothy Byrne, a state coordinator for the Florida Green Party,…

k ek2 sk

2

1 Green_Party_of_Florida 1.0

k ek1 sk

1

0 NIL 1.0

1 Dorothy_Byrne_(British_Journalist)

0.6

2 Dorothy_Byrne_(mezzo-soprano)

0.4

nominal mention

Page 37: Relational Inference for Wikification

37

Talk Outline

Why Wikification? Introduction

Motivation Approach

Wikification Pipeline Formulation Relational Analysis

Evaluation Result Entity Linking

Page 38: Relational Inference for Wikification

38

Wikification Performance Result

ACE MSNBC AQUAINT Wikipedia60

65

70

75

80

85

90

95

F1 Performance on Wikification datasets

Milne&WittenRatinov&RothRelational Inference

Page 39: Relational Inference for Wikification

39

Evaluation – TAC KBP Entity Linking

Wikipedia Articles

0

500000

1000000

1500000

2000000

2500000

3000000

3500000

4000000

4500000

5000000

TAC KBNon-TAC KB

Task Definition Links a named entity in a document to

either a TAC Knowledge Base (TAC KB) node or NIL

Cluster NIL entities Relevant Tasks

Wikification Cross-document coreference

Page 40: Relational Inference for Wikification

40

Evaluation – TAC KBP Entity Linking

Run Relational Inference (RI) Wikifier “as-is”: No retraining using TAC data

68

72

76

80

84

88

RI

CogComp

LCC

MS_MLI

NUShim

e

*Median

TAC KBP 2011 Entity Linking Performance

Micro AverageB³F1

System Names*Median of top 14 systems

Page 41: Relational Inference for Wikification

41

Conclusion

Importance of linguistic and world knowledge Identification of relational information benefits Wikification Introduced an inference framework to leverage better

language understandings Future work

Accumulate what we know about NIL concepts Joint entity typing, coreference and disambiguation Incorporate more relations

*Demo will be updated in a week at: http://cogcomp.cs.illinois.edu/demo/wikify Download at: http://cogcomp.cs.illinois.edu/page/download_view/Wikifier

Thank you!

Page 42: Relational Inference for Wikification

42

BACK UP SLIDESBack up slides

Page 43: Relational Inference for Wikification

43

Massive Textual Information

How can we know more from large volumes of possibly unfamiliar raw texts?

Page 44: Relational Inference for Wikification

44

Massive Textual Information

We naturally “look up” concepts and accumulate knowledge.

Page 45: Relational Inference for Wikification

45

Challenges

It’s a version of Chicago – the standard classic Macintosh menu font…

Tuesday marks the 142nd anniversary of an event that forever altered the course of Chicago's development as a then-young American city

By the time the New Orleans Saints kicked off at Soldier Field on Sunday afternoon, their woeful history in the Windy City was fully understood.

?

Page 46: Relational Inference for Wikification

Ambiguity

Variety

Concepts outside of Wikipedia (NIL)

Challenges

46

It’s a version of Chicago – the standard classic Macintosh menu font…

Tuesday marks the 142nd anniversary of an event that forever altered the course of Chicago's development as a then-young American city

By the time the New Orleans Saints kicked off at Soldier Field on Sunday afternoon, their woeful history in the Windy City was fully understood.

?Chicago Chicago

Chicago font

ChicagoChicago

The Windy City

Page 47: Relational Inference for Wikification

47

Organizing knowledge

Wikipedia as a source of “common sense” knowledge Naturally bridges rich structured knowledge and text data Comprehensive for most purposes

~4.3 million English articles as of today Cross-lingual

Estimated size of a printed version of Wikipedia as of August 2010. *Picture courtesy of Wikipedia

Page 48: Relational Inference for Wikification

48

Motivating Example

Relatively sparse, but act as hard constraints

Intuitively, we need to “de-coreference” this pair of mention Opens a new dimension in text understanding

helps all stages of Wikification

Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …

Mubarak

Hosni Mubarak

Egyptian President

wife

Page 49: Relational Inference for Wikification

49

Wikification Approach

Mention Segmentation Use Shallow Parsing, NER and regular expression to generate likely

mentions of concepts. Match nested mentions using dictionary. Discard unknown mentions.

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Page 50: Relational Inference for Wikification

50

Ranking

Features Local features from Bag of Words (BOW) representation, such as

various TFIDF windows. Global features from Bag of Concepts (BOC) representation.

What are we losing?

Mubarak

Hosni Mubarak

Egyptian President

wife

Page 51: Relational Inference for Wikification

51

Relation Extraction

In document coreference Extract entity-only coreference relations with high precision

Syntactico-Semantic relations (Chan & Roth ‘10)

Easy to extract with high precision Deep parsing too slow

Premodifier Iranian Ministry of Defense

Possessive NYC’s stock exchange

Formulaic Chicago, Illinois

Preposition President of the US

Page 52: Relational Inference for Wikification

52

Relational Inference

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party

Coreference

Possessive

Page 53: Relational Inference for Wikification

53

Relational Inference

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party

How to look up knowledge and use these relations?

Page 54: Relational Inference for Wikification

54

Wikification Approach

Recall concepts Rank concepts Determine unknown concepts (NILs)

Page 55: Relational Inference for Wikification

55

Determine NILs

Dorothy Bryne, a state coordinator for the Florida Green Party

Leverage nominal mention relations to construct limited context

Page 56: Relational Inference for Wikification

56

Relational Inference Example

...like Chicago O'Hare and [La Guardia] in [New York],...

Wikipedia Page Ski

LaGuardia_Airport 0.36La_Guardia 0.22Fiorello_La_Guardia 0.18La_Guardia,_Spain 0.11

Wikipedia Page Ski

New_York 0.53New_York_City 0.44New_York_(magazine) 0.01New_York_Knicks 0.01

Page 57: Relational Inference for Wikification

57

Relational Inference Scenario 1

...like Chicago O'Hare and [La Guardia] in [New York],...

Wikipedia Page ScoreLaGuardia_Airport 0.36La_Guardia 0.22Fiorello_La_Guardia 0.18La_Guardia,_Spain 0.11

Wikipedia Page ScoreNew_York 0.53New_York_City 0.44New_York_(magazine) 0.01New_York_Knicks 0.01

Γ = 0.18+0.53 = 0.71

Page 58: Relational Inference for Wikification

58

Relational Inference Scenario 2

...like Chicago O'Hare and [La Guardia] in [New York],...

Wikipedia Page ScoreLaGuardia_Airport 0.36La_Guardia 0.22Fiorello_La_Guardia 0.18La_Guardia,_Spain 0.11

Wikipedia Page ScoreNew_York 0.53New_York_City 0.44New_York_(magazine) 0.01New_York_Knicks 0.01

Γ = 0.36+0.44 = 0.8

Page 59: Relational Inference for Wikification

59

Relational Inference Conclusion

...like Chicago O'Hare and [La Guardia] in [New York],...

Pr(Γ= ) > Pr(Γ= )

Page 60: Relational Inference for Wikification

60

Constraint scaling

Lexical similarity as a tie-breaker between multiple relations Explicit relations are stronger than a hyperlink relations Normalize score for each pair of mention to [0,1]

Relation scaling factor

Query lexical similarityNormalization factor

: query: a retrieved relation triple

Page 61: Relational Inference for Wikification

61

Wikification Pipeline 1 - Mention Segmentation

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Sub noun phrase chunks and NER segments (Ratinov et al. 2011)

Regular expression capturing longer composite mentions President of the United States of America