Semantic Web: Extracting and Mining Structured Data from Unstructured Content

Web Science Lecture

Besnik Fetahu

L3S Research Center, Leibniz Universität Hannover

May 20, 2014

1 Introduction

2 Semantic Web
    Ontologies
    Linked Data

3 Information Sources

4 Information Extraction and Text Mining
    Machine Reading
    Relation Extraction
    Named Entity Recognition and Disambiguation

5 Semantic Web Application Use Cases
    Knowledge Bases
    Entity Linking
    Entity Retrieval
    Linked Data Quality

6 Conclusions

7 Papers for Presentations

8 Resources

Introduction

• Large amounts of data.

• Heterogeneity of information: provenance, quality, content, representation, language, etc.

• ‘Unstructured’ vs. Structured.

• Ontologies and Knowledge Bases.

• Entities, topics, relations.

• Use cases: Machine translation, semantic search, etc.

Semantic Web

The Semantic Web vision

The ultimate goal of the Web of data is to enable computers to do more useful work and to develop systems that can support trusted interactions over the network. The term “Semantic Web” refers to W3C’s vision of the Web of linked data. Semantic Web technologies enable people to create data stores on the Web, build vocabularies, and write rules for handling data. Linked data are empowered by technologies such as RDF, SPARQL, OWL, and SKOS.

Semantic Web: Main Components

• Format: Turtle, N3, etc.

• Syntax: XML Schema

• Models: RDF

• Taxonomies: RDFS

• Ontologies: OWL

• Query languages: SPARQL

• Interchange formats: RIF

Semantic Web: Data Formats and Models

• XML data format

• RDF data representation (〈 subject, predicate, object 〉)
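As a minimal sketch of the triple model named above, the following Python snippet stores a few statements as (subject, predicate, object) tuples and renders them one statement per line. The `ex:` names and the example facts are illustrative assumptions, not data from the lecture, and the prefixed names stand in for full URIs.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical example data; prefixed names stand in for full URIs.
GRAPH = [
    Triple("ex:Hannover", "rdf:type", "ex:City"),
    Triple("ex:L3S", "ex:locatedIn", "ex:Hannover"),
    Triple("ex:L3S", "rdfs:label", '"L3S Research Center"'),
]


def to_lines(graph):
    """Render each triple as one 'subject predicate object .' statement."""
    return [f"{t.subject} {t.predicate} {t.obj} ." for t in graph]


def objects(graph, subject, predicate):
    """Return all objects asserted for a given subject and predicate."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]
```

Because every statement has the same three-part shape, new facts can be appended without changing any schema, which is the main contrast with a fixed XML document structure.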

Ontologies

Ontologies define the following concepts:

• Entities

• Relations

• Domains

• Rules

• Axioms

http://en.wikipedia.org/wiki/Ontology_(information_science)

Knowledge Representation: Differences in OWL ontologies

• OWL-Lite (OWL-Lite ⊂ OWL-DL): supports those users primarily needing a classification hierarchy and simple constraints. It supports cardinality constraints, but only permits cardinality values of 0 or 1.

• OWL-DL (OWL-DL ⊂ OWL Full): supports maximum expressiveness while retaining computational completeness and decidability. OWL-DL includes all OWL language constructs, but they can be used only under certain restrictions.

• OWL Full: is meant for users who want maximum expressiveness and the syntactic freedom of RDF with no computational guarantees.

Knowledge Representation: Ontologies and Schemas

• RDF Schema (RDFS)¹

1 classes: rdfs:Class

2 properties: rdf:Property, rdfs:subClassOf

3 domains: rdfs:domain

• Web Ontology Language (OWL)² (OWL-Lite, OWL-DL)

1 classes: owl:Class

2 properties: owl:equivalentClass, owl:sameAs

• Friend of a Friend (FOAF) ontology³

1 classes: foaf:Agent, foaf:Document, foaf:Organization, foaf:Person

• Simple Knowledge Organization System (SKOS) ontology⁴

1 classes: skos:Concept, skos:Collection

2 properties: skos:related, skos:broader, skos:narrower

¹ http://www.w3.org/TR/rdf-schema/
² http://www.w3.org/TR/owl-ref/
³ http://xmlns.com/foaf/spec/
⁴ http://www.w3.org/2008/05/skos
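To make the role of rdfs:subClassOf concrete, here is a small sketch of the inference it licenses: if a class is a subclass of another, every instance of the first is also an instance of the second, transitively. The two FOAF subclass links follow the FOAF specification; `ex:Professor` and the helper names are hypothetical and only serve the illustration.

```python
# Class hierarchy expressed with rdfs:subClassOf (child -> direct parents).
# ex:Professor is a hypothetical class added for illustration.
SUBCLASS_OF = {
    "ex:Professor": {"foaf:Person"},
    "foaf:Person": {"foaf:Agent"},
    "foaf:Organization": {"foaf:Agent"},
}


def superclasses(cls, hierarchy=SUBCLASS_OF):
    """Return all direct and inherited superclasses of cls (transitive closure)."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in hierarchy.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_instance_of(asserted_types, cls, hierarchy=SUBCLASS_OF):
    """True if any asserted type equals cls or is a (transitive) subclass of it."""
    return any(t == cls or cls in superclasses(t, hierarchy) for t in asserted_types)
```

A resource typed only as `ex:Professor` is therefore also recognised as a `foaf:Agent`, without that fact being stated explicitly.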

Knowledge Representation

• RDFS example

• Hierarchical class modelling

• OWL ontology example

www.mpi-inf.mpg.de/yago-naga/IJCAI11-tutorial/IJCAI11-tutorial/ijcai11-tutorial.pptx

Knowledge Representation: Ontologies vs. Taxonomies

Knowledge Representation: ABox vs. TBox

Linked Data

• RDF data published as triples 〈 subject, predicate, object 〉

• SPARQL: the standard query language over RDF data

• Linked Data principles:

1 URIs as names for things

2 Dereferenceable URIs

3 Provide information about things using standards: RDF, SPARQL

4 Interlink with other things

• Billions of triples

• Interlink all data into one gigantic graph: lod-cloud, schema.org...

• Microformats: RDFa for annotating web pages
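The core of querying such triple data is matching a pattern in which some positions are variables. The sketch below imitates that idea for a single triple pattern in plain Python; it is not a SPARQL engine, and the graph contents, the `ex:` and `dbpedia:` names, and the `match` helper are assumptions made for illustration.

```python
# Hypothetical miniature graph of (subject, predicate, object) triples.
GRAPH = [
    ("ex:Berlin", "rdf:type", "ex:City"),
    ("ex:Hannover", "rdf:type", "ex:City"),
    ("ex:Hannover", "owl:sameAs", "dbpedia:Hanover"),
]


def match(graph, pattern):
    """Yield variable bindings for one triple pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    A variable used twice in the pattern must bind to the same value.
    """
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding
```

The `owl:sameAs` triple is an example of the fourth principle: it links a local resource to the corresponding resource in another dataset, so a query such as `("ex:Hannover", "owl:sameAs", "?other")` returns the external identifier to follow.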

Everything done?

• Only a small fraction of data is actually structured

• Defining schemas, taxonomies, and ontologies manually and explicitly is cumbersome

• Large proportion of data is unstructured or semi-structured

• Can we automatically extract and model such content?

Information Sources

1 Semi-structured data: Wikipedia, WordNet

2 Social Streams: Twitter

3 News corpora: NYT Collection, Reuters, Wall Street Journal (WSJ)

4 Web pages: common-crawl, ClueWeb

5 Linked Data: lod-cloud

Information Extraction and Text Mining

• Very large corpora of unstructured text.

• Heterogeneity: languages, quality, domains.

• Rich underlying structure of unstructured text.

• Natural Language Processing (NLP): part-of-speech tagging (POS), named entity recognition (NER), coreference resolution (Co-Ref), Dependency Parsing (DP) etc.

• Utilise NLP output for information extraction (IE) based on syntactic, semantic and lexical patterns.

• Query and Entity based summarisation.

http://en.wikipedia.org/wiki/Wikipedia:Statistics

http://www.worldwidewebsize.com/

Machine Reading

• Autonomous understanding of text by machines

• Construct a belief based on the underlying corpus

• OpenIE: a domain-independent IE paradigm for extracting relations, classes, and entities.

• TextRunner (Etzioni et al. 2008): a self-supervised approach for OpenIE.

• Represent each relation as a triple 〈subject, predicate, object〉

• Understanding and semantics of extracted triples are still primitive

Machine Reading. Etzioni O., Banko M., Cafarella M. J. AAAI 2007

Machine Reading: TextRunner

1 Self-Supervised Learner

2 Single-pass extractor

3 Redundancy-Based Assessor

Relation Extraction

• DP of chunks of text for relation extraction

• Syntactic patterns for relation extraction

• Semantic and lexical patterns for relation extraction

• ReVerb: two-step approach, “relation first” rather than “arguments first”

1 identify relations

2 identify arguments

“Michael Webb appeared on Oprah...” ⇒ 〈Michael Webb; appear on; Oprah〉

Schmitz et al. 2007

Fader et al. 2011
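The “relation first” order can be sketched as follows: first find a verb-based relation phrase using a pattern over part-of-speech tags, then look for noun arguments immediately to its left and right. This is a simplified illustration under stated assumptions, not the ReVerb implementation: the tags are supplied by hand rather than by a tagger, the tag set (N, V, P, A, D) and the pattern are reduced, and no lemmatisation is performed, so the output keeps “appeared on” rather than “appear on”.

```python
import re

# Hand-tagged tokens as (word, coarse POS tag); a real pipeline would use a POS tagger.
SENTENCE = [("Michael", "N"), ("Webb", "N"), ("appeared", "V"), ("on", "P"), ("Oprah", "N")]

# Simplified relation pattern over the tag string: a verb, optionally followed by
# intervening words and a closing preposition (roughly V | V P | V W* P).
RELATION = re.compile(r"V(?:[NAD]*P)?")


def extract(tagged):
    """Return (arg1, relation, arg2) for the first relation phrase, or None."""
    tags = "".join(tag for _, tag in tagged)  # one character per token
    found = RELATION.search(tags)
    if found is None:
        return None
    start, end = found.span()

    # Step 1: the relation phrase itself.
    relation = " ".join(word for word, _ in tagged[start:end])

    # Step 2: the arguments, taken as the noun runs adjacent to the relation.
    left = start
    while left > 0 and tagged[left - 1][1] == "N":
        left -= 1
    right = end
    while right < len(tagged) and tagged[right][1] == "N":
        right += 1
    if left == start or right == end:
        return None  # a relation without both arguments is discarded

    arg1 = " ".join(word for word, _ in tagged[left:start])
    arg2 = " ".join(word for word, _ in tagged[end:right])
    return (arg1, relation, arg2)
```

Fixing the relation phrase before the arguments keeps the extractor from pairing two entities that merely co-occur in a sentence without a verb phrase connecting them.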

Relation Extraction

• ClausIE (del Corro et al., 2013): a clause-based approach for relation extraction

• Automated approach, less restrictive and with improved recall.

del Corro et al. 2013

Named Entity Recognition and Disambiguation

• Textual content has rich underlying syntactic and semantic structure

• Frequently extracted syntactic and semantic information: POS, Co-Ref and NER.

• Stanford CoreNLP: named entity recognition with specific entity types Person, Organisation, Place, Date.

• NED: named entity disambiguation of surface forms with entities from knowledge bases

1 DBpedia Spotlight

2 Wikiminer

3 AIDA ...
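Disambiguation maps an ambiguous surface form to one knowledge base entity. A minimal sketch of one common intuition, choosing the candidate whose description overlaps most with the surrounding text, is given below. It does not reproduce how DBpedia Spotlight, Wikiminer, or AIDA work; the candidate table, its context words, and the `disambiguate` function are hypothetical.

```python
# Hypothetical miniature knowledge base: surface form -> candidate entities,
# each described by a small bag of context words.
CANDIDATES = {
    "Paris": {
        "dbpedia:Paris": {"france", "capital", "city", "seine"},
        "dbpedia:Paris_Hilton": {"american", "celebrity", "hotel", "heiress"},
    },
}


def disambiguate(surface_form, context, candidates=CANDIDATES):
    """Pick the candidate whose description shares the most words with the context."""
    words = {w.strip(".,").lower() for w in context.split()}
    options = candidates.get(surface_form, {})
    if not options:
        return None  # surface form unknown to the knowledge base
    # On a tie, max() keeps the first candidate in insertion order.
    return max(options, key=lambda entity: len(options[entity] & words))
```

For example, the sentence “Paris is the capital of France.” shares two words with the description of the city and none with the other candidate, so the city is selected.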

Knowledge Bases

Prominent knowledge base examples:

1 WordNet knowledge base

2 Wikipedia encyclopaedia

3 DBpedia knowledge base

4 YAGO knowledge base

Entity Linking and Interlinking

• Semantic relatedness of entities

• Exploit existing knowledge base structures

• Latent relationships via semantic relations

http://www.visualdataweb.org/relfinder/relfinder.php

Entity Retrieval

• Search through structured data in the form of triples

• Weight different predicates differently

• Map user keyword queries to matching entities

Blanco et al. 2011
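The three points above can be illustrated with a small scoring sketch: each entity is described by predicate/value pairs, a keyword query is matched against the values, and a hit counts more or less depending on the predicate it occurs in. This is not the ranking model of the cited work; the weights, the entity descriptions, and the function names are assumptions chosen for the example.

```python
# Hypothetical predicate weights: a match in a label counts more than one in an abstract.
WEIGHTS = {"rdfs:label": 3.0, "dbo:abstract": 1.0}

# Hypothetical entity descriptions as predicate -> literal value.
ENTITIES = {
    "ex:Hannover": {
        "rdfs:label": "Hannover",
        "dbo:abstract": "city in Germany on the river Leine",
    },
    "ex:Leine": {
        "rdfs:label": "Leine",
        "dbo:abstract": "river in Germany flowing through Hannover",
    },
}


def score(query, description, weights=WEIGHTS):
    """Sum the weighted counts of query terms over all predicate values."""
    terms = query.lower().split()
    total = 0.0
    for predicate, text in description.items():
        tokens = text.lower().split()
        hits = sum(tokens.count(term) for term in terms)
        total += weights.get(predicate, 0.0) * hits
    return total


def search(query, entities=ENTITIES):
    """Return the entities with a non-zero score, best match first."""
    ranked = sorted(entities, key=lambda e: score(query, entities[e]), reverse=True)
    return [e for e in ranked if score(query, entities[e]) > 0]
```

With these weights the query “leine” ranks `ex:Leine` first, because the term matches its label, even though the same term also occurs in the abstract of `ex:Hannover`.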

Linked Data Quality

Zaveri et al. 2012

Conclusions

• Large volumes of unstructured and high-quality data

• High applicability of IE techniques for structuring unstructured data

• Availability of encyclopaedias in the form of knowledge bases

• Wide range of applications in Semantic Web

• Further expansion of knowledge bases with facts about the real world from unstructured text apart from Wikipedia Infoboxes

• Quality aspects of data

Papers for Presentations

1 YAGO: A Core of Semantic Knowledge Unifying WordNet and Wikipedia. Suchanek F., Kasneci Gj., Weikum G. In Proceedings of the 16th WWW, pages 697-706, 2007.

2 Semantic Stability in Social Tagging Streams. Wagner C., Singer P., Strohmaier M., Huberman B. CoRR, 2013.

3 Test-driven Evaluation of Linked Data Quality. Kontokostas D., Westphal P., Auer S., Hellmann S., Lehmann J., Cornelissen R., Zaveri A. In Proceedings of the 23rd WWW, pages 747-758, 2014.

4 Federated Entity Search Using On-the-Fly Consolidation. Herzig D., Mika P., Blanco R., Tran T. In Proceedings of the ISWC, pages 167-183.

5 Automatic Expansion of DBpedia Exploiting Wikipedia Cross-Language Information. Palmero Aprosio A., Giuliano C., Lavelli A. In Proceedings of the 11th ESWC, pages 397-411.

Resources

• Fabian Suchanek and Gerhard Weikum. 2013. Knowledge harvesting in the big-data era. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD ’13).

• Gerhard Weikum and Martin Theobald. 2010. From information to knowledge: harvesting entities and relationships from web sources. In Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS ’10).

• Roi Blanco, Peter Mika, and Sebastiano Vigna. 2011. Effective and efficient entity search in RDF data. In Proceedings of the 10th international conference on The semantic web (ISWC ’11).

• Jeffrey Pound, Peter Mika, and Hugo Zaragoza. 2010. Ad-hoc object retrieval in the web of data. In Proceedings of the 19th international conference on World wide web (WWW ’10).

• Nunes, B. P., Dietze, S., Casanova, M. A., Kawase, R., Fetahu, B. and Nejdl, W. "Combining a co-occurrence-based and a semantic measure for entity linking." In Proceedings of the 10th Extended Semantic Web Conference, 2013 (ESWC ’13).

• Zaveri, Amrapali, Rula, Anisa, Maurino, Andrea, Pietrobon, Ricardo, Lehmann, Jens and Auer, Sören. "Quality Assessment Methodologies for Linked Open Data." Semantic Web Journal (2014).

• Gangemi, Aldo. "A Comparison of Knowledge Extraction Tools for the Semantic Web." In Proceedings of the 10th Extended Semantic Web Conference, 2013 (ESWC ’13).

• Mendes, Pablo N., Jakob, Max, García-Silva, Andrés and Bizer, Christian. "DBpedia Spotlight: shedding light on the web of documents." In Proceedings of the 7th International Conference on Semantic Systems, 2011.

• Yosef, Mohamed Amir, Hoffart, Johannes, Bordino, Ilaria, Spaniol, Marc and Weikum, Gerhard. "AIDA: An Online Tool for Accurate Disambiguation of Named Entities in Text and Tables." PVLDB 4, no. 12 (2011): 1450-1453.

• Isabelle Augenstein, Sebastian Padó, and Sebastian Rudolph. 2012. LODifier: generating linked data from unstructured text. In Proceedings of the 9th international conference on The Semantic Web: research and applications (ESWC ’12).

• Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: a nucleus for a web of open data. In Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference (ISWC ’07/ASWC ’07).

• Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: a core of semantic knowledge. In Proceedings of the 16th international conference on World Wide Web (WWW ’07).

• Oren Etzioni, Michele Banko, Stephen Soderland, and Daniel S. Weld. 2008. Open information extraction from the web. Commun. ACM 51, 12 (December 2008), 68-74.

• Mausam, Michael Schmitz, Robert Bart, Stephen Soderland, and Oren Etzioni. 2012. Open language learning for information extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL ’12).

• Raymond J. Mooney and Razvan Bunescu. 2005. Mining knowledge from text using information extraction. SIGKDD Explor. Newsl.

• Chang Wang, James Fan, Aditya Kalyanpur, and David Gondek. 2011. Relation extraction with relation topics. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP ’11).

• Robert Isele, Anja Jentzsch, and Christian Bizer. Silk Server - Adding missing Links while consuming Linked Data. COLD 2010.

• Oren Etzioni. 2008. Machine reading at web scale. In Proceedings of the 2008 International Conference on Web Search and Data Mining.

• Luciano Del Corro and Rainer Gemulla. 2013. ClausIE: clause-based open information extraction. In Proceedings of the 22nd international conference on World Wide Web (WWW ’13).

• Rudi Studer, V. Richard Benjamins, and Dieter Fensel. Knowledge engineering: Principles and methods. Data & Knowledge Engineering, Volume 25, Issues 1–2, 1998, pages 161-197.

• Christian Bizer, Tom Heath, and Tim Berners-Lee. Linked Data - The Story So Far. International Journal on Semantic Web and Information Systems 5(3):1–22 (2009).

Thank you! Questions?