PhD Day: Entity Linking using Ontology Modularization

PhD Day – 04/2014 Bianca Pereira

Presentation given at the 6th NLP PhD Day at the National University of Ireland, Galway (Insight) on 29/04/2014.

Transcript of PhD Day: Entity Linking using Ontology Modularization

Page 1: PhD Day: Entity Linking using Ontology Modularization

PhD Day – 04/2014 Bianca Pereira

Page 2: PhD Day: Entity Linking using Ontology Modularization

The PhD Route

Page 3: PhD Day: Entity Linking using Ontology Modularization
Page 4: PhD Day: Entity Linking using Ontology Modularization

Outline

Literature Review

Define the PhD topic

Page 5: PhD Day: Entity Linking using Ontology Modularization

DEFINING THE TOPIC

Page 6: PhD Day: Entity Linking using Ontology Modularization
Page 7: PhD Day: Entity Linking using Ontology Modularization
Page 8: PhD Day: Entity Linking using Ontology Modularization

Entity Linking is...

“Grounding entity mentions in documents to Knowledge Base entries”

- TAC-KBP 2009

Page 9: PhD Day: Entity Linking using Ontology Modularization
Page 10: PhD Day: Entity Linking using Ontology Modularization

Entity Resolution

Page 11: PhD Day: Entity Linking using Ontology Modularization
Page 12: PhD Day: Entity Linking using Ontology Modularization

http://en.wikipedia.org/wiki/The_Guardian http://en.wikipedia.org/wiki/National_Security_Agency

http://en.wikipedia.org/wiki/British_people http://en.wikipedia.org/wiki/Edward_snowden
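
The task illustrated on this slide can be sketched as a lookup from mentions to Knowledge Base entries. This is only a minimal illustration: the URIs are the ones listed above, while the surface forms ("NSA", "British", and so on) and the function name `link_mentions` are assumptions made for the example.

```python
# Hypothetical surface forms (assumed for illustration) mapped to the
# Knowledge Base entries listed on this slide.
LINKS = {
    "The Guardian": "http://en.wikipedia.org/wiki/The_Guardian",
    "NSA": "http://en.wikipedia.org/wiki/National_Security_Agency",
    "British": "http://en.wikipedia.org/wiki/British_people",
    "Edward Snowden": "http://en.wikipedia.org/wiki/Edward_snowden",
}


def link_mentions(text, links=LINKS):
    """Return (mention, URI) pairs for every known surface form found in text."""
    return [(mention, uri) for mention, uri in links.items() if mention in text]
```

A plain string match like this ignores ambiguity entirely, which is the part of the problem the rest of the talk focuses on.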

Page 13: PhD Day: Entity Linking using Ontology Modularization

PROBLEM SEEKING

Page 14: PhD Day: Entity Linking using Ontology Modularization

Types of Entity

Domains of Knowledge

Methods

Accuracy

Time

Page 15: PhD Day: Entity Linking using Ontology Modularization

Types of Entity

Named Entities

Unnamed Entities

Topics

Classes

Natural Language Processing

Statistics

Entity Linking

Page 16: PhD Day: Entity Linking using Ontology Modularization

Domains of Knowledge

Page 17: PhD Day: Entity Linking using Ontology Modularization

Methods

Page 18: PhD Day: Entity Linking using Ontology Modularization
Page 19: PhD Day: Entity Linking using Ontology Modularization

EVERYTHING !

Natural Language Processing

Statistics

Entity Linking

Page 20: PhD Day: Entity Linking using Ontology Modularization

PROBLEM DEFINITION

Page 21: PhD Day: Entity Linking using Ontology Modularization

Types of Entity

Named Entities

Given by Class

Given by Knowledge Base

Others

Page 22: PhD Day: Entity Linking using Ontology Modularization

Types of Entity

Named Entities

Given by Class

Given by Knowledge Base

Others

Page 23: PhD Day: Entity Linking using Ontology Modularization

Domains of Knowledge

Page 24: PhD Day: Entity Linking using Ontology Modularization

Domains of Knowledge

Cross-domain Knowledge Base

Page 25: PhD Day: Entity Linking using Ontology Modularization

Methods

“(…) Collective Inference over a set of entities can lead to better performance.”

- Stoyanov et al 2012

Page 26: PhD Day: Entity Linking using Ontology Modularization
Page 27: PhD Day: Entity Linking using Ontology Modularization
Page 28: PhD Day: Entity Linking using Ontology Modularization

Named Entity Recognition

Disambiguation

Page 29: PhD Day: Entity Linking using Ontology Modularization
Page 30: PhD Day: Entity Linking using Ontology Modularization

Named Entity Recognition

Page 31: PhD Day: Entity Linking using Ontology Modularization

Disambiguation

http://en.wikipedia.org/wiki/Michael_Jackson

http://en.wikipedia.org/wiki/Popular_music

http://en.wikipedia.org/wiki/Beat_It

http://en.wikipedia.org/wiki/Billie_Jean

http://en.wikipedia.org/wiki/Thriller_(song)

Page 32: PhD Day: Entity Linking using Ontology Modularization

Collective Inference algorithms are used for Disambiguation

Page 33: PhD Day: Entity Linking using Ontology Modularization
Page 34: PhD Day: Entity Linking using Ontology Modularization

[Figure: candidate entities URI1 to URI10]

Page 35: PhD Day: Entity Linking using Ontology Modularization

A Local Context is used to give the mention-candidate score
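
One common way to obtain such a mention-candidate score is to compare the words around the mention with a textual description of each candidate. The sketch below uses bag-of-words cosine similarity; the scoring function, the tokenisation and the names `bag_of_words`, `cosine` and `local_score` are assumptions for illustration, not the method of any specific system cited in this talk.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def local_score(mention_context, candidate_description):
    """Score one candidate using only the text around the mention."""
    return cosine(bag_of_words(mention_context), bag_of_words(candidate_description))
```

For a mention of "Michael Jackson" surrounded by words such as "pop singer", a candidate described as a pop singer would score higher than a candidate described as a writer.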

URI1

Page 36: PhD Day: Entity Linking using Ontology Modularization

There is coherence between entities in the same document.
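
Coherence can be sketched as the average pairwise relatedness of the entities chosen for a document. In the example below, relatedness is the Jaccard overlap of each entity's link set; this measure, the `links` structure and the function names are assumptions chosen for simplicity, and the cited systems use their own relatedness measures.

```python
from itertools import combinations


def relatedness(links_a, links_b):
    """Jaccard overlap between the link sets of two entities."""
    union = links_a | links_b
    return len(links_a & links_b) / len(union) if union else 0.0


def coherence(assignment, links):
    """Average pairwise relatedness of the entities chosen for one document."""
    pairs = list(combinations(assignment, 2))
    if not pairs:
        return 0.0
    return sum(relatedness(links[a], links[b]) for a, b in pairs) / len(pairs)
```

With toy link sets, choosing the singer for "Michael Jackson" next to "Beat It" yields a higher coherence than choosing an unrelated entity with the same name.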

Page 37: PhD Day: Entity Linking using Ontology Modularization

[Figure: candidate entities URI1 to URI10]

Page 38: PhD Day: Entity Linking using Ontology Modularization

[Figure: candidate entities URI1 to URI10]

Page 39: PhD Day: Entity Linking using Ontology Modularization

[Figure: candidate entities URI1 to URI10]

Page 40: PhD Day: Entity Linking using Ontology Modularization

Disambiguation using collective inference is an NP-hard problem.
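
The difficulty can be seen in a brute-force sketch: an exact search has to consider every combination of one candidate per mention, so the number of joint assignments is the product of the candidate set sizes. The function names and the `score` callback below are assumptions for illustration.

```python
from itertools import product
from math import prod


def search_space_size(candidates):
    """Number of joint assignments: the product of the candidate set sizes."""
    return prod(len(c) for c in candidates.values())


def exact_disambiguation(candidates, score):
    """Try every joint assignment and keep the best one.

    `score` is a hypothetical function that rates a tuple of URIs,
    one per mention, in the order of `candidates`.
    """
    mentions = list(candidates)
    best = max(product(*(candidates[m] for m in mentions)), key=score)
    return dict(zip(mentions, best))
```

Ten mentions with ten candidates each already give 10^10 joint assignments, which is why exhaustive search does not scale.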

Page 41: PhD Day: Entity Linking using Ontology Modularization

[Figure: candidate entities URI1 to URI10]

Page 42: PhD Day: Entity Linking using Ontology Modularization

[Figure: candidate entities URI1, URI4, URI5, URI6, URI7 and URI8; labels "230 candidates" and "24 candidates"]

Page 43: PhD Day: Entity Linking using Ontology Modularization

“The number of contexts [entities] is overwhelming and had to be reduced to a manageable size.”

- Cucerzan 2007

Page 44: PhD Day: Entity Linking using Ontology Modularization

“Much speed is gained by imposing a threshold below which all senses [candidates] are discarded”

- Milne and Witten 2008
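
Threshold-based pruning of this kind can be sketched in a few lines. The score attached to each candidate (for example a prior probability) and the threshold value are assumptions for illustration; the quote does not fix either of them.

```python
def prune_candidates(scored_candidates, threshold):
    """Discard every candidate whose score falls below the threshold."""
    return {
        uri: score
        for uri, score in scored_candidates.items()
        if score >= threshold
    }
```

For example, `prune_candidates({"URI1": 0.6, "URI2": 0.01, "URI3": 0.39}, 0.05)` keeps URI1 and URI3 and drops URI2. The pruning happens at disambiguation time, once per mention.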

Page 45: PhD Day: Entity Linking using Ontology Modularization

“Inference is NP Hard”

- Kulkarni et al 2009

Page 46: PhD Day: Entity Linking using Ontology Modularization

“(…) exact algorithms on large input graphs are infeasible.”

- Hoffart et al 2011

Page 47: PhD Day: Entity Linking using Ontology Modularization

Collective Inference - Accuracy

Page 48: PhD Day: Entity Linking using Ontology Modularization

Collective Inference - Time

With approximation algorithms, the running time is suitable for the task.
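
To give a feel for what an approximation looks like, the sketch below starts from the best local candidate for each mention and then greedily switches candidates while that improves the combined local and relatedness score. It is a generic greedy heuristic written for illustration, not the algorithm of Kulkarni et al. or Hoffart et al.; `local`, `related` and `max_rounds` are assumed inputs.

```python
def greedy_disambiguation(candidates, local, related, max_rounds=10):
    """Greedy approximation of joint disambiguation.

    candidates: mention -> list of URIs
    local:      mention -> {URI: local score}
    related:    function (URI, URI) -> relatedness score
    """
    # Start from the best local candidate of each mention.
    assignment = {
        m: max(c, key=lambda u, m=m: local[m][u]) for m, c in candidates.items()
    }
    for _ in range(max_rounds):
        changed = False
        for m, cands in candidates.items():
            others = [u for k, u in assignment.items() if k != m]

            def total(u):
                # Local score plus relatedness to the other current choices.
                return local[m][u] + sum(related(u, o) for o in others)

            best = max(cands, key=total)
            if total(best) > total(assignment[m]):
                assignment[m] = best
                changed = True
        if not changed:
            break
    return assignment
```

Each round costs time proportional to the number of candidates rather than to the number of joint assignments, at the price of possibly stopping at a local optimum.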

Page 49: PhD Day: Entity Linking using Ontology Modularization

Methods

Page 50: PhD Day: Entity Linking using Ontology Modularization

Recalling

Given by Knowledge Base

Cross-domain Knowledge Base

Page 51: PhD Day: Entity Linking using Ontology Modularization
Page 52: PhD Day: Entity Linking using Ontology Modularization

~ 5 MILLION entities

Page 53: PhD Day: Entity Linking using Ontology Modularization

~ 10 MILLION entities

Page 54: PhD Day: Entity Linking using Ontology Modularization

~ 43 MILLION entities

Page 55: PhD Day: Entity Linking using Ontology Modularization
Page 56: PhD Day: Entity Linking using Ontology Modularization

Problem Statement

The time spent on disambiguation for Entity Linking increases with the size of the Knowledge Base. This makes disambiguation with large Knowledge Bases infeasible.

Page 57: PhD Day: Entity Linking using Ontology Modularization

RELATED WORK

Page 58: PhD Day: Entity Linking using Ontology Modularization

Two solutions to the problem:

1.  Approximation Algorithms
2.  Dimensionality Reduction

Page 59: PhD Day: Entity Linking using Ontology Modularization

Approximation Algorithms

Kulkarni et al 2009, Hoffart et al 2011

Page 60: PhD Day: Entity Linking using Ontology Modularization

Dimensionality Reduction

[Figure: candidate entities URI1, URI4, URI5, URI6, URI7 and URI8 with labels 230 and 24, shown next to the full set URI1 to URI10]

Cucerzan 2007, Milne and Witten 2008, Hoffart et al 2011

Page 61: PhD Day: Entity Linking using Ontology Modularization

Dimensionality Reduction (candidate space)

Algorithm

Knowledge Base

Page 62: PhD Day: Entity Linking using Ontology Modularization

Dimensionality Reduction (candidate space)

Algorithm

Knowledge Base

Related Work

Page 63: PhD Day: Entity Linking using Ontology Modularization

Dimensionality Reduction (candidate space)

Algorithm

Knowledge Base

Related Work

Page 64: PhD Day: Entity Linking using Ontology Modularization

RESEARCH QUESTIONS

Page 65: PhD Day: Entity Linking using Ontology Modularization

R1. Is it possible to delimit a feasible maximum amount of time for disambiguation regardless of the size of the Knowledge Base?

R2. Is it possible to reduce the dimensionality directly in the Knowledge Base?

R3. Is it feasible to use exact algorithms for disambiguation using large Knowledge Bases?

Page 66: PhD Day: Entity Linking using Ontology Modularization

R1. Is it possible to delimit a feasible maximum amount of time for disambiguation regardless of the size of the Knowledge Base?

R2. Is it possible to reduce the dimensionality directly in the Knowledge Base?

R3. Is it feasible to use exact algorithms for disambiguation using large Knowledge Bases?

Page 67: PhD Day: Entity Linking using Ontology Modularization

R1. Is it possible to delimit a feasible maximum amount of time for disambiguation regardless of the size of the Knowledge Base?

R2. Is it possible to reduce the dimensionality directly in the Knowledge Base?

R3. Is it feasible to use exact algorithms for disambiguation using large Knowledge Bases?

Page 68: PhD Day: Entity Linking using Ontology Modularization

HYPOTHESES

Page 69: PhD Day: Entity Linking using Ontology Modularization

R1. Is it possible to delimit a feasible maximum amount of time for disambiguation regardless of the size of the Knowledge Base?

H1. There is a maximum size of candidate set that allows disambiguation in a feasible time.

Page 70: PhD Day: Entity Linking using Ontology Modularization

R1. Is it possible to delimit a feasible maximum amount of time for disambiguation regardless of the size of the Knowledge Base?

H2. If the Knowledge Base can be divided into subsets of constant ambiguity, then the candidate space is constant.

Page 71: PhD Day: Entity Linking using Ontology Modularization

R1. Is it possible to delimit a feasible maximum amount of time for disambiguation regardless of the size of the Knowledge Base?

Subset of constant ambiguity

Candidate space constant

Candidate space = maximum allowed size

Feasible time

Page 72: PhD Day: Entity Linking using Ontology Modularization

R2. Is it possible to reduce the dimensionality directly in the Knowledge Base?

H3. The relatedness between entities is a sufficient condition to reduce the dimensionality without loss of accuracy.

Page 73: PhD Day: Entity Linking using Ontology Modularization

R3. Is it feasible to use exact algorithms for disambiguation using large Knowledge Bases?

H4. Decreasing the ambiguity in the Knowledge Base is less time-consuming than decreasing it at disambiguation time.

Page 74: PhD Day: Entity Linking using Ontology Modularization

R3. Is it feasible to use exact algorithms for disambiguation using large Knowledge Bases?

H5. Exact algorithms can be used in a feasible time up to a maximum size of candidate space.

Page 75: PhD Day: Entity Linking using Ontology Modularization

PROPOSED SOLUTION

Page 76: PhD Day: Entity Linking using Ontology Modularization

Ontology Modularization for Disambiguation in Entity Linking

Page 77: PhD Day: Entity Linking using Ontology Modularization

Ontology Modularization

Page 78: PhD Day: Entity Linking using Ontology Modularization

Ontology Modularization

Page 79: PhD Day: Entity Linking using Ontology Modularization

How to Generate the Modules?

Semantic-Driven Strategies: Depends on the Application.

Structure-Driven Strategies: Graph Decomposition based on inter-relation.

Machine Learning Strategies: Data Mining and Clustering.
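
As a minimal illustration of a structure-driven strategy, the sketch below decomposes a Knowledge Base graph into modules by taking its connected components. This is the simplest possible graph decomposition and is only an assumption about what such a strategy could look like; `connected_modules` and the edge-list input are hypothetical.

```python
from collections import defaultdict


def connected_modules(edges):
    """Split an undirected entity graph into modules (connected components).

    edges: iterable of (entity, entity) pairs representing relations.
    """
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    seen, modules = set(), []
    for start in graph:
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from start.
        module, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in module:
                continue
            module.add(node)
            stack.extend(graph[node] - module)
        seen |= module
        modules.append(module)
    return modules
```

A real Knowledge Base is usually one large connected graph, so a practical decomposition would need a weaker criterion, for example cutting relations below some relatedness value.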

Page 80: PhD Day: Entity Linking using Ontology Modularization

EVALUATION

Page 81: PhD Day: Entity Linking using Ontology Modularization

H1. There is a maximum size of candidate set that allows disambiguation in a feasible time.

Page 82: PhD Day: Entity Linking using Ontology Modularization

H1. There is a maximum size of candidate set that allows disambiguation in a feasible time.

Perform an experiment using different collective inference approaches to discover how the time increases with the size of the candidate set.
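
Such an experiment could be organised as a small timing harness like the one below. A brute-force search stands in for the collective inference approaches under test, and the synthetic candidates, the `score=sum` placeholder and the function names are assumptions for illustration.

```python
import time
from itertools import product


def brute_force(candidates, score):
    """Stand-in for a collective inference method: exhaustive joint search."""
    return max(product(*candidates), key=score)


def time_by_candidate_size(sizes, n_mentions=4):
    """Measure the running time for each candidate set size.

    Uses synthetic integer candidates and `sum` as a placeholder score.
    """
    timings = {}
    for size in sizes:
        candidates = [list(range(size)) for _ in range(n_mentions)]
        start = time.perf_counter()
        brute_force(candidates, score=sum)
        timings[size] = time.perf_counter() - start
    return timings
```

Calling `time_by_candidate_size([2, 4, 8, 16])` returns one measurement per size; plotting these against a chosen time budget would indicate the largest candidate set that remains feasible for the method being tested.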

Page 83: PhD Day: Entity Linking using Ontology Modularization

H2. If the Knowledge Base can be divided into subsets of constant ambiguity, then the candidate space is constant.

Page 84: PhD Day: Entity Linking using Ontology Modularization

H2. If the Knowledge Base can be divided into subsets of constant ambiguity, then the candidate space is constant.

Perform Ontology Modularization aiming at a maximum ambiguity in each module.
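
Checking whether a set of modules respects such a limit could look like the sketch below. Here the ambiguity of a module is taken to be the largest number of its entities that share one surface form; this definition, the `labels` mapping and the function names are assumptions, since the slides do not define ambiguity formally.

```python
from collections import Counter


def module_ambiguity(module, labels):
    """Largest number of entities in the module sharing one surface form."""
    counts = Counter(label for entity in module for label in labels.get(entity, ()))
    return max(counts.values(), default=0)


def respects_limit(modules, labels, max_ambiguity):
    """True if no module exceeds the allowed ambiguity."""
    return all(module_ambiguity(m, labels) <= max_ambiguity for m in modules)
```

Under this definition, a module that contains two entities both named "Michael Jackson" has ambiguity 2, while placing them in separate modules gives each module ambiguity 1.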

Page 85: PhD Day: Entity Linking using Ontology Modularization

H3. The relatedness between entities is a sufficient condition to reduce the dimensionality without loss of accuracy.

Page 86: PhD Day: Entity Linking using Ontology Modularization

H3. The relatedness between entities is a sufficient condition to reduce the dimensionality without loss of accuracy.

Generate the module based on the same relatedness measure used by the original method and verify the accuracy.

Page 87: PhD Day: Entity Linking using Ontology Modularization

H4. Decreasing the ambiguity in the Knowledge Base is less time-consuming than decreasing it at disambiguation time.

Page 88: PhD Day: Entity Linking using Ontology Modularization

H4. Decreasing the ambiguity in the Knowledge Base is less time-consuming than decreasing it at disambiguation time.

Measure the time for disambiguation when reducing the dimensionality at disambiguation time and when using the Modularization approach.

Page 89: PhD Day: Entity Linking using Ontology Modularization

H5. Exact algorithms can be used in a feasible time up to a maximum size of candidate space.

Page 90: PhD Day: Entity Linking using Ontology Modularization

H5. Exact algorithms can be used in a feasible time up to a maximum size of candidate space.

Select a set of exact algorithms and measure the time for different sizes of candidate space.

Page 91: PhD Day: Entity Linking using Ontology Modularization
Page 92: PhD Day: Entity Linking using Ontology Modularization

Next Steps

Doctoral Consortium

TAC-KBP

First Experiments

Use Cases

Page 93: PhD Day: Entity Linking using Ontology Modularization

Thank you!

Bianca Pereira [email protected]