PhD Day: Entity Linking using Generic Linked Data Datasets
Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
Entity Linking Using Generic Linked Data Datasets
PhD Day – April/2013
Bianca Pereira
Agenda
Motivation
Problem
Related Work
Research Questions
Next Steps
Challenges
Motivation
Most of the content available on the web is unstructured natural language text.
How can natural language text be structured so that it is easier to process?
Motivation
There are three possible solutions to this problem:
Extract knowledge from text according to a given structure (ontology population from text).
Extract knowledge from text without using a previous structure (ontology learning from text).
Link mentions in text to entities in a structured knowledge base (entity linking).
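To make the third option concrete, here is a minimal sketch of entity linking by exact label lookup. It is a toy illustration only, not the method of any tool named in these slides. The knowledge base, the `ex:` identifiers and the function name `link_entities` are all assumptions made for the example.

```python
import re

# Hypothetical knowledge base: lowercased surface form -> entity identifier.
# The "ex:" identifiers are illustrative placeholders, not real dataset URIs.
KNOWLEDGE_BASE = {
    "galway": "ex:Galway",
    "ireland": "ex:Ireland",
    "deri": "ex:DERI",
}

def link_entities(text, kb=KNOWLEDGE_BASE):
    """Return (mention, start offset, entity id) for each KB label found in text."""
    links = []
    for match in re.finditer(r"\w+", text):
        # Exact, case-insensitive lookup of a single word; no disambiguation.
        entity = kb.get(match.group().lower())
        if entity is not None:
            links.append((match.group(), match.start(), entity))
    return links

print(link_entities("DERI is located in Galway, Ireland."))
```

A lookup like this only handles single-word mentions with one candidate entity each. Real entity linking also has to deal with multi-word names and with ambiguous mentions that match several entities.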
Motivation
Entity Linking..
.. enables reusing knowledge already published on the web.
.. can be used as the first step for ontology learning and population algorithms.
Motivation
Many datasets have been used for Entity Linking:
Relational datasets
Wikipedia
DBpedia, YAGO, MusicBrainz, Freebase, …
Motivation
Linked Data datasets are promising because..
.. many of them are public.
.. they are already structured.
.. they are interlinked.
.. they are available under diverse ownership.
.. they provide knowledge in diverse domains.
.. the LOD cloud is growing.
Motivation
There are already some Entity Linking solutions using Linked Data datasets.
Problem
Current entity linking approaches work only with a small, fixed number of Linked Data datasets.
AIDA (YAGO)
Alchemy API (CIA Factbook, CrunchBase, Freebase, GeoNames, MusicBrainz, OpenCyc, UMBEL, US Census, YAGO)
DBpedia Spotlight (DBpedia)
Open Calais (Calais)
Problem
Current tools work well with generic knowledge and public datasets. But what do we do if we want to..
.. link an enterprise text with a private dataset?
.. identify domain-specific entities?
Problem
AELA (Adaptive Entity Linking Approach) was developed to solve this problem..
Problem
What AELA does not solve is..
.. the recognition of generalized entities/topics (such as genes and diseases).
.. the recognition of individuals with the same name as their classes (such as ambulance, coffee machine and airplane).
Related Work
Which topics are related to Entity Linking?
Entity Resolution, coreference resolution, merge-purge, data deduplication, object identification, mention matching, tuple matching, record linkage, entity disambiguation, anaphora resolution, instance identification, database hardening, entity identification, identity resolution, reference reconciliation, record matching, name matching, identity uncertainty, duplicate detection, entity matching, instance matching, entity consolidation, entity reconciliation, object consolidation, topic consolidation, reference disambiguation, instance fusion, data fusion.
Research Questions
Which methods developed over the last five decades can be used to improve AELA's results?
How can AELA adapt itself to a given domain?
What are the use cases in which AELA can be applied? Is it better than previous approaches?
Can AELA be language-independent?
Research Questions
The most important question..
What is an entity!?
Object? Concept? Topic?
Next Steps
Survey the methods used in related areas.
Evaluate these methods within the AELA architecture.
Develop a method to select a suitable Linked Data dataset given the domain of the text.
Apply AELA to the news domain.
Evaluate AELA using datasets in other languages.
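One possible starting point for the dataset selection step is to score each candidate dataset by how many of its labels appear in the text. The sketch below is an assumption made for illustration, not a method proposed in these slides. The dataset names, the label sets and the function name `select_dataset` are all hypothetical.

```python
import re

# Hypothetical label samples per candidate dataset; a real system would
# draw these from the entity labels of each Linked Data dataset.
DATASET_LABELS = {
    "music": {"album", "band", "guitar", "tour"},
    "geography": {"river", "city", "mountain", "border"},
}

def select_dataset(text, dataset_labels=DATASET_LABELS):
    """Pick the dataset whose labels overlap most with the words in text."""
    tokens = set(re.findall(r"\w+", text.lower()))
    # Score = number of distinct dataset labels that occur in the text.
    scores = {name: len(tokens & labels) for name, labels in dataset_labels.items()}
    best = max(scores, key=scores.get)
    # Return None when no dataset shares any label with the text.
    return best if scores[best] > 0 else None

print(select_dataset("The band announced a new album and tour."))
```

Counting raw overlap favours datasets with many labels, so a practical version would probably need some normalisation by dataset size.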
Next Steps
Define entity.
Challenges
Large body of previous work
Big Data issues
Linked Data issues (standards and data quality)
Evaluation issues
QUESTIONS?