PhD Day: Entity Linking using Ontology Modularization

PhD Day – 04/2014 Bianca Pereira

The PhD Route

Outline

Literature Review

Define the PhD topic

DEFINING THE TOPIC

Entity Linking is...

“Grounding entity mentions in documents to Knowledge Base entries”

- TAC-KBP 2009

Entity Resolution

http://en.wikipedia.org/wiki/The_Guardian

http://en.wikipedia.org/wiki/National_Security_Agency

http://en.wikipedia.org/wiki/British_people

http://en.wikipedia.org/wiki/Edward_snowden

PROBLEM SEEKING

Types of Entity

Domains of Knowledge

Methods

Accuracy

Time

Types of Entity

Named Entities

Unnamed Entities

Topics

Classes

Natural Language Processing

Statistics

Entity Linking

Domains of Knowledge

Methods

EVERYTHING !

Natural Language Processing

Statistics

Entity Linking

PROBLEM DEFINITION

Types of Entity

Named Entities

Given by Class

Given by Knowledge Base

Others

Domains of Knowledge

Cross-domain Knowledge Base

Methods

“(…) Collective Inference over a set of entities can lead to better performance.”

- Stoyanov et al 2012

Named Entity Recognition and Disambiguation

Named Entity Recognition

Disambiguation

http://en.wikipedia.org/wiki/Michael_Jackson

http://en.wikipedia.org/wiki/Popular_music

http://en.wikipedia.org/wiki/Beat_It

http://en.wikipedia.org/wiki/Billie_Jean

http://en.wikipedia.org/wiki/Thriller_(song)

Collective Inference is a family of algorithms for Disambiguation

A Local Context is used to give the mention-candidate score
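
A minimal sketch of one way a mention-candidate score could be computed from local context: bag-of-words cosine similarity between the words around the mention and a short description of each candidate. This is an illustrative assumption, not the scoring used by the systems cited in these slides; the function names, the example sentence, and the candidate descriptions are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def local_scores(context, candidate_descriptions):
    """Score each candidate by word overlap with the mention's context.

    context: text surrounding the mention (hypothetical input).
    candidate_descriptions: dict candidate id -> short description text.
    """
    ctx = Counter(context.lower().split())
    return {
        candidate: cosine(ctx, Counter(text.lower().split()))
        for candidate, text in candidate_descriptions.items()
    }


# Toy data, invented for illustration only.
scores = local_scores(
    "jackson released the song thriller",
    {
        "Michael_Jackson": "american singer of the song thriller",
        "Jackson_city": "capital city of mississippi",
    },
)
```

In this toy case the singer shares three words with the context and the city shares none, so the singer gets the higher local score.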

There is coherence between entities in the same document.

Disambiguation using collective inference is an NP-hard problem.

230 candidates

24 candidates
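
To see why exact collective inference becomes expensive, here is a minimal brute-force sketch: it enumerates every joint assignment of candidates to mentions and keeps the one with the highest sum of local scores plus pairwise coherence. The objective, the function names, and the toy scores are assumptions for illustration, not the formulation of any cited paper. The number of joint assignments is the product of the per-mention candidate counts, so if the two counts shown above (230 and 24) are read as belonging to two mentions, that is already 230 × 24 = 5,520 combinations.

```python
from itertools import product
from math import prod


def search_space_size(candidate_counts):
    """Number of joint assignments an exact search has to consider."""
    return prod(candidate_counts)


def exact_collective_inference(local, coherence):
    """Brute-force search over all joint assignments.

    local: dict mention -> dict candidate -> local score.
    coherence: function (candidate, candidate) -> float.
    Returns (best assignment as dict mention -> candidate, best score).
    """
    mentions = list(local)
    best_assignment, best_score = None, float("-inf")
    for assignment in product(*(local[m] for m in mentions)):
        # Local part: how well each candidate fits its own mention.
        score = sum(local[m][c] for m, c in zip(mentions, assignment))
        # Collective part: coherence of every pair of chosen candidates.
        score += sum(
            coherence(a, b)
            for i, a in enumerate(assignment)
            for b in assignment[i + 1:]
        )
        if score > best_score:
            best_assignment, best_score = assignment, score
    return dict(zip(mentions, best_assignment)), best_score


# Toy data, invented for illustration only.
toy_local = {
    "Jackson": {"Michael_Jackson": 0.6, "Jackson_city": 0.7},
    "Thriller": {"Thriller_(song)": 0.5, "Thriller_(genre)": 0.5},
}
toy_related = {frozenset({"Michael_Jackson", "Thriller_(song)"}): 1.0}


def toy_coherence(a, b):
    return toy_related.get(frozenset({a, b}), 0.0)


toy_result, toy_score = exact_collective_inference(toy_local, toy_coherence)
```

In the toy data the city has the higher local score for "Jackson", but the coherence between the singer and the song outweighs it, which is the effect collective inference is meant to capture.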

“The number of contexts [entities] is overwhelming and had to be reduced to a manageable size.” - Cucerzan 2007

“Much speed is gained by imposing a threshold below which all senses [candidates] are discarded” - Milne and Witten 2008
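
The pruning idea in the two quotes above can be sketched as a simple filter: drop every candidate whose score falls below a threshold, and optionally cap the number of survivors. This is a generic illustration, not the procedure of the cited papers; the function name, the default threshold, and the example scores are assumptions.

```python
def prune_candidates(candidates, threshold=0.02, max_size=None):
    """Reduce a candidate set before disambiguation.

    candidates: dict candidate id -> score (for example a prior probability).
    threshold: candidates scoring below this value are discarded.
    max_size: optional cap on the number of candidates kept.
    Returns a dict ordered from highest to lowest score.
    """
    kept = [(c, s) for c, s in candidates.items() if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    if max_size is not None:
        kept = kept[:max_size]
    return dict(kept)


# Toy scores, invented for illustration only.
pruned = prune_candidates({"A": 0.5, "B": 0.01, "C": 0.2, "D": 0.05})
```

The trade-off is that a correct candidate with a low score is lost for good once it is pruned, so the threshold affects accuracy as well as time.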

“Inference is NP Hard” - Kulkarni et al 2009

“(…) exact algorithms on large input graphs are infeasible.” - Hoffart et al 2011

Collective Inference - Accuracy

Collective Inference - Time

With approximation algorithms, the running time is suitable for the task

Methods

Recalling

Given by Knowledge Base

Cross-domain Knowledge

~ 5 MILLION entities

Problem Statement

The time spent on disambiguation for Entity Linking increases with the size of the Knowledge Base. This makes disambiguation with large Knowledge Bases infeasible.

RELATED WORK

Two solutions for the problem...

1. Approximation Algorithms

2. Dimensionality Reduction

Approximation Algorithms

Kulkarni et al 2009, Hoffart et al 2011

Dimensionality Reduction

Cucerzan 2007, Milne and Witten 2008, Hoffart et al 2011

Dimensionality Reduction (candidate space)

Algorithm

Knowledge Base

Related Work

RESEARCH QUESTIONS

R1. Is it possible to set a feasible upper bound on the time for disambiguation regardless of the size of the Knowledge Base?

R2. Is it possible to reduce the dimensionality directly in the Knowledge Base?

R3. Is it feasible to use exact algorithms for disambiguation using large Knowledge Bases?

HYPOTHESES

H1. There is a maximum size of candidate set that allows disambiguation in a feasible time.

H2. If the Knowledge Base can be divided into subsets of constant ambiguity, then the candidate space is constant.

Subset of constant ambiguity

Candidate space constant

Candidate space = maximum allowed size

Feasible time

H3. The relatedness between entities is a sufficient condition to reduce the dimensionality without loss of accuracy.

H4. Decreasing the ambiguity in the Knowledge Base is less time-consuming than performing it at disambiguation time.

H5. Exact algorithms can be used in a feasible time up to a maximum size of candidate space.

PROPOSED SOLUTION

Ontology Modularization for Disambiguation in Entity Linking

Ontology Modularization

How to Generate the Modules?

Semantic-Driven Strategies: depend on the application.

Structure-Driven Strategies: graph decomposition based on inter-relation.

Machine Learning Strategies: data mining and clustering.
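
As a rough illustration of a structure-driven strategy, the sketch below splits a toy entity graph into modules using connected components, then measures ambiguity as the largest number of entities in a module that share one surface form. Connected components are only the simplest possible graph decomposition and are an assumption here, not the method proposed in these slides; the entity names, labels, and function names are hypothetical.

```python
from collections import defaultdict, deque


def connected_modules(nodes, edges):
    """Split an undirected entity graph into connected components."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, modules = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, module = deque([start]), set()
        while queue:  # breadth-first traversal of one component
            node = queue.popleft()
            module.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        modules.append(module)
    return modules


def max_ambiguity(module, labels):
    """Largest number of entities in the module sharing one surface form."""
    counts = defaultdict(int)
    for entity in module:
        counts[labels[entity]] += 1
    return max(counts.values(), default=0)


# Toy graph, invented for illustration only.
toy_nodes = ["Michael_Jackson", "Thriller_(song)", "Beat_It", "Jackson_city", "Mississippi"]
toy_edges = [
    ("Michael_Jackson", "Thriller_(song)"),
    ("Michael_Jackson", "Beat_It"),
    ("Jackson_city", "Mississippi"),
]
toy_labels = {
    "Michael_Jackson": "Jackson",
    "Jackson_city": "Jackson",
    "Thriller_(song)": "Thriller",
    "Beat_It": "Beat It",
    "Mississippi": "Mississippi",
}
toy_modules = connected_modules(toy_nodes, toy_edges)
```

In the toy graph the surface form "Jackson" has two candidates in the whole Knowledge Base but only one inside each module, which is the kind of ambiguity reduction the modules are meant to provide.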

EVALUATION

H1. There is a maximum size of candidate set that allows disambiguation in a feasible time.

Perform an experiment using different collective inference approaches to discover how the time increases with the size of the candidate set.

H2. If the Knowledge Base can be divided into subsets of constant ambiguity, then the candidate space is constant.

Perform Ontology Modularization aiming at a maximum ambiguity in each module.

H3. The relatedness between entities is a sufficient condition to reduce the dimensionality without loss of accuracy.

Generate the module based on the same relatedness measure used by the original method and verify the accuracy.

H4. Decreasing the ambiguity in the Knowledge Base is less time-consuming than performing it at disambiguation time.

Measure the disambiguation time when reducing the dimensionality at disambiguation time and when using the Modularization approach.

H5. Exact algorithms can be used in a feasible time up to a maximum size of candidate space.

Select a set of exact algorithms and measure the time for different sizes of candidate space.
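
A possible shape for such a timing experiment, sketched with a brute-force exact search on synthetic data. Everything here is an assumption for illustration: the scoring, the synthetic candidate generator, and the function names are hypothetical, and the timings it produces say nothing about real Entity Linking systems.

```python
import time
from itertools import product


def brute_force_best_score(candidates, coherence):
    """Exact search: best joint score over all candidate assignments.

    candidates: one list per mention of (candidate id, local score) pairs.
    """
    best = float("-inf")
    for assignment in product(*candidates):
        score = sum(local for _, local in assignment)
        score += sum(
            coherence(a[0], b[0])
            for i, a in enumerate(assignment)
            for b in assignment[i + 1:]
        )
        best = max(best, score)
    return best


def synthetic_candidates(n_mentions, k):
    """Deterministic fake data: k candidates for each of n_mentions mentions."""
    return [
        [((m, j), ((m * 31 + j * 17) % 10) / 10) for j in range(k)]
        for m in range(n_mentions)
    ]


def same_index_coherence(a, b):
    """Fake coherence: candidates with the same index count as related."""
    return 1.0 if a[1] == b[1] else 0.0


def time_by_candidate_size(sizes, n_mentions=3, repeats=3):
    """Return (candidates per mention, joint assignments, best time in seconds)."""
    rows = []
    for k in sizes:
        candidates = synthetic_candidates(n_mentions, k)
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            brute_force_best_score(candidates, same_index_coherence)
            timings.append(time.perf_counter() - start)
        rows.append((k, k ** n_mentions, min(timings)))
    return rows
```

Calling `time_by_candidate_size([2, 4, 8, 16])` gives one row per candidate-set size; plotting the third column against the first is one way to look for the largest size that still runs in a feasible time.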

Next Steps

Doctoral Consortium

TAC-KBP

First Experiments

Use Cases

Thank you!

Bianca Pereira bianca.pereira@insight-centre.org
