The web of interlinked data and knowledge stripped
Linked Data for Enterprise Information Integration
Dr. Sören Auer
Creating Knowledge out of Interlinked Data
Problem: Try to search for these things on the current Web:
• Apartments near German-English bilingual childcare in Passau
• ERP service providers with offices in Vienna and London
• Researchers working on multimedia topics in Eastern Europe
Information is available on the Web, but opaque to current search.
Why do we need the Data Web?
passau.de has everything about childcare in Passau.
Immobilienscout.de knows all about real estate offers in Germany.
[Figure: two Web servers, each backed by a database (DB), deliver HTML and RDF to a search engine]
Solution: complement text on Web pages with structured linked open data & intelligently combine/integrate/join such structured information from different sources:
Linked Data in a Nutshell
1. Uses the RDF data model (subject, predicate, object)
[Figure: RDF graph. IFMO organizes KESW2012; KESW2012 starts on 1.10.2012; KESW2012 takesPlaceIn St. Petersburg]
2. Is serialised in triples:
IFMO organizes KESW2012 .
KESW2012 starts "2012-10-01"^^xsd:date .
KESW2012 takesPlaceIn St._Petersburg .
3. Uses content negotiation
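A minimal sketch of the three example triples in Python, using only the standard library. The namespace http://example.org/ and the helper names are assumptions made for illustration; the slide gives only local names.

```python
from typing import NamedTuple

EX = "http://example.org/"  # hypothetical namespace, not from the slides
XSD = "http://www.w3.org/2001/XMLSchema#"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # already formatted as an N-Triples term


def iri(local_name: str) -> str:
    """Wrap a local name into a full IRI term."""
    return f"<{EX}{local_name}>"


def typed_literal(value: str, datatype: str) -> str:
    """Format a literal with an XML Schema datatype."""
    return f'"{value}"^^<{XSD}{datatype}>'


triples = [
    Triple(iri("IFMO"), iri("organizes"), iri("KESW2012")),
    Triple(iri("KESW2012"), iri("starts"), typed_literal("2012-10-01", "date")),
    Triple(iri("KESW2012"), iri("takesPlaceIn"), iri("St._Petersburg")),
]


def to_ntriples(ts):
    """Serialise one triple per line, each terminated by ' .'."""
    return "\n".join(f"{t.subject} {t.predicate} {t.obj} ." for t in ts)


print(to_ntriples(triples))
```

Each line of the output is one subject, predicate, object statement, which is all a consumer needs in order to merge the data with triples from other sources.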
The emerging Web of Data
[Figure: snapshots of the Linking Open Data cloud, 2007 to 2010]
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Can Linked Data help to solve the EII problem in a Fortune 500 company?
The situation at a world-leading car manufacturer (€97.76 billion revenue, 250,000 employees):
• 3,000 heterogeneous IT systems
• Different units (car, bus, truck etc.) with very different views
• No common language
• Inability to identify crucial entities (parts, locations etc.) enterprise-wide
There is no (and cannot be a) single Enterprise Information Model.
A distributed, iterative, bottom-up integration approach such as Linked Data might be able to help (pay-as-you-go).
Distributed Social Semantic Networking
From Intranet to Enterprise Data Web around a knowledge hub
Linked Data Lifecycle
• Extraction
• Storage / Querying
• Manual revision / Authoring
• Inter-linking / Fusing
• Classification / Enrichment
• Quality Analysis
• Evolution / Repair
• Search / Browsing / Exploration
Extraction
From unstructured sources
• NLP, text mining, annotation
From semi-structured sources
• DBpedia, LinkedGeoData, DataCube
From structured sources
• RDB2RDF
Transforming Wikipedia into a Knowledge Base
DBpedia extracts structured information from Wikipedia and makes it available on the Web as LOD. This allows users to:
• ask sophisticated queries against Wikipedia (e.g. universities in Brandenburg, mayors of elevated towns, soccer players),
• link other data sets on the Web to Wikipedia data.
DBpedia represents a community consensus.
The recently launched DBpedia Live transforms Wikipedia into a structured knowledge base.
S. Auer et al.: DBpedia - A Crystallization Point for the Web of Data. Journal of Web Semantics, Elsevier 2009. Most Cited Article 2006-10 Award.
S. Auer et al.: DBpedia: A Nucleus for a Web of Open Data. 6th International Semantic Web Conference ISWC07.
S. Auer et al.: What have Innsbruck and Leipzig in common? Extracting Semantics from Wiki Content. 4th European Semantic Web Conf. ESWC07.
Structure in Wikipedia
• Title
• Abstract
• Infoboxes
• Geo-coordinates
• Categories
• Images
• Links
– other language versions
– other Wikipedia pages
– to the Web
– redirects
– disambiguations
Infobox templates
Wikitext syntax:
{{Infobox Korean settlement
| title = Busan Metropolitan City
| img = Busan.jpg
| imgcaption = A view of the [[Geumjeong]] district in Busan
| hangul = 부산 광역시 ...
| area_km2 = 763.46
| pop = 3635389
| popyear = 2006
| mayor = Hur Nam-sik
| divs = 15 wards (Gu), 1 county (Gun)
| region = [[Yeongnam]]
| dialect = [[Gyeongsang]]
}}
RDF representation (http://dbpedia.org/resource/Busan):
dbp:Busan dbpp:title "Busan Metropolitan City"
dbp:Busan dbpp:hangul "부산 광역시"@Hang
dbp:Busan dbpp:area_km2 "763.46"^^xsd:float
dbp:Busan dbpp:pop "3635389"^^xsd:int
dbp:Busan dbpp:region dbp:Yeongnam
dbp:Busan dbpp:dialect dbp:Gyeongsang
...
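The mapping from infobox attributes to triples can be sketched in a few lines of Python. This is not the DBpedia extraction framework, only an illustration of the idea: the function name, the typing rules (integer, float, wiki link, plain string) and the shortened input are assumptions of this sketch.

```python
import re

WIKITEXT = """{{Infobox Korean settlement
| title = Busan Metropolitan City
| area_km2 = 763.46
| pop = 3635389
| region = [[Yeongnam]]
}}"""


def infobox_to_triples(subject, wikitext):
    """Turn '| key = value' lines into (subject, predicate, object) tuples."""
    triples = []
    for line in wikitext.splitlines():
        match = re.match(r"\|\s*(\w+)\s*=\s*(.+)", line.strip())
        if not match:
            continue  # skips the template header and the closing braces
        key, value = match.group(1), match.group(2).strip()
        link = re.fullmatch(r"\[\[([^\]|]+)\]\]", value)
        if link:  # a wiki link becomes a resource
            obj = "dbp:" + link.group(1).replace(" ", "_")
        elif re.fullmatch(r"-?\d+", value):
            obj = f'"{value}"^^xsd:int'
        elif re.fullmatch(r"-?\d+\.\d+", value):
            obj = f'"{value}"^^xsd:float'
        else:
            obj = f'"{value}"'
        triples.append((subject, "dbpp:" + key, obj))
    return triples


for triple in infobox_to_triples("dbp:Busan", WIKITEXT):
    print(*triple)
```

Real infoboxes are far messier (nested templates, units, lists in one value), which is the kind of variation a community-maintained mapping can address.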
A vast multi-lingual, multi-domain knowledge base
DBpedia extraction results in:
• descriptions of ca. 3.4 million things (1.5 million classified in a consistent ontology, including 312,000 persons, 413,000 places, 94,000 music albums, 49,000 films, 15,000 video games, 140,000 organizations, 146,000 species, 4,600 diseases)
• labels and abstracts for these 3.2 million things in up to 92 different languages; 1,460,000 links to images and 5,543,000 links to external web pages; 4,887,000 external links into other RDF datasets, 565,000 Wikipedia categories, and 75,000 YAGO categories
• altogether over 1 billion pieces of information (i.e. RDF triples): 257M from English edition, 766M from other language editions
• DBpedia Live (http://live.dbpedia.org/sparql/) & Mappings Wiki (http://mappings.dbpedia.org) integrate the community into a refinement cycle
• Upcoming DBpedia inline
DBpedia SPARQL Endpoint
# Musicians born in Berlin, with English descriptions.
# Prefix declarations (dbp:, skos:, foaf:, rdfs:) are not shown on the slide.
SELECT ?name ?birth ?description ?person WHERE {
  ?person dbp:birthPlace dbp:Berlin .
  ?person skos:subject dbp:Cat:German_musicians .
  ?person dbp:birth ?birth .
  ?person foaf:name ?name .
  ?person rdfs:comment ?description .
  FILTER (LANG(?description) = 'en')
} ORDER BY ?name
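A hedged sketch of how a client could prepare such a query for the endpoint over HTTP, using only the Python standard library. The request is built but not sent. The query is a reduced variant of the one on the slide, restricted to the foaf: and rdfs: patterns because their namespaces are unambiguous; the LIMIT and the function name are assumptions of this sketch.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"

# Reduced variant of the slide's query (assumption: only foaf: and rdfs: patterns).
QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?name ?description ?person WHERE {
  ?person foaf:name ?name .
  ?person rdfs:comment ?description .
  FILTER (LANG(?description) = 'en')
} ORDER BY ?name LIMIT 10
"""


def build_request(endpoint: str, query: str) -> Request:
    """Prepare (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Content negotiation: ask the server for SPARQL results in JSON.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url)
# To execute it, pass the request to urllib.request.urlopen (needs network access).
```

The Accept header is the same content-negotiation mechanism that Linked Data uses for resource URIs: the client states which representation it wants and the server picks the serialisation.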
DBpedia Applications: RelFinder
Muddy Boots (BBC): Annotate actors in BBC News with DBpedia identifiers
Open Calais (Reuters): named entities connected via owl:sameAs to DBpedia
Faviki (social bookmarking): uses DBpedia to group tags & multi-language support
Topbraid Composer (ontology editor): links entities to DBpedia
DBpedia Applications (3rd party)
Extraction: Relational Data
Many different approaches: D2R, Virtuoso RDF Views, Triplify, ...
No agreement on a formal semantics of RDB2RDF mapping
• LOD readiness, SPARQL-SQL translation
W3C RDB2RDF WG
Tool               | Triplify                    | Sparqlify                    | D2RQ           | Virtuoso RDF Views
Technology         | Scripting languages (PHP)   | Java                         | Java           | Whole middleware solution
SPARQL endpoint    | -                           | X                            | X              | X
Mapping language   | SQL                         | SPARQL CONSTRUCT Views + SQL | RDF based      | RDF based
Mapping generation | Manual                      | Semi-automatic               | Semi-automatic | Manual
Scalability        | Medium-high (but no SPARQL) | Very high                    | Medium         | High
Malhotra, Auer, Erling, Hausenblas: W3C RDB2RDF Incubator Group Report. W3C RDB2RDF Incubator Group, 2009.
Triplify: light-weight approach for Linked Data publishing from relational databases
Auer, Tramp, Aumüller, Lehmann, Hellmann: Triplify - Light-weight Linked Data Publication from Relational Databases. In 18th International World Wide Web Conference (WWW 2009).
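To illustrate the light-weight idea of publishing relational data through plain SQL queries, here is a small Python sketch on an in-memory SQLite database. It is not Triplify itself (which is written in PHP). The base URI, the table and the convention used here (first column identifies the subject, the remaining column names or aliases become predicates) are assumptions of this sketch.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI


def triplify(conn, type_name, sql):
    """First column = identifier; remaining column names = predicates."""
    cursor = conn.execute(sql)
    columns = [c[0] for c in cursor.description]
    triples = []
    for row in cursor:
        subject = f"<{BASE}{type_name}/{row[0]}>"
        for column, value in zip(columns[1:], row[1:]):
            if value is not None:  # NULL values produce no triple
                triples.append((subject, f"<{BASE}vocab/{column}>", f'"{value}"'))
    return triples


# Hypothetical toy data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(1, "Widget", 9.5), (2, "Gadget", None)])

MAPPING_SQL = "SELECT id, name AS label, price FROM product ORDER BY id"
for triple in triplify(conn, "product", MAPPING_SQL):
    print(*triple, ".")
```

The mapping is just an SQL query with column aliases, so a developer who knows the application's schema needs no additional mapping language.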
Sparqlify
• Rationale: exploit existing formalisms (SQL, SPARQL CONSTRUCT) as much as possible
• Flexible and versatile mapping language
• Translates one SPARQL query into exactly one efficiently executable SQL query
• Solid theoretical formalization based on SPARQL-relational algebra transformations
• Extremely scalable through an elaborate view candidate selection mechanism
• Used to publish 20B triples for LinkedGeoData
Stadler, Unbehauen, Auer, Lehmann: Sparqlify – Very Large Scale Linked Data Publication from Relational Databases. Submitted to VLDB-Journal.
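The idea of answering one SPARQL query with exactly one SQL query can be illustrated with a toy translator. This is not Sparqlify's mapping language or algorithm: the view definition (a dictionary from predicates to columns of a single table), the restriction to patterns that share one subject variable, and all names are assumptions of this sketch.

```python
import sqlite3

# Hypothetical view definition: predicate -> column of the "city" table.
TABLE, KEY = "city", "id"
VIEW = {"ex:name": "name", "ex:population": "population"}


def patterns_to_sql(patterns):
    """Translate patterns (?s <predicate> ?variable) sharing ?s into one SQL query."""
    select = [f"{TABLE}.{KEY} AS s"]
    where = []
    for predicate, variable in patterns:
        column = VIEW[predicate]
        select.append(f"{TABLE}.{column} AS {variable}")
        # A NULL cell means the triple does not exist, so the pattern cannot match.
        where.append(f"{TABLE}.{column} IS NOT NULL")
    return f"SELECT {', '.join(select)} FROM {TABLE} WHERE {' AND '.join(where)}"


# Hypothetical toy data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, population INTEGER)")
conn.executemany("INSERT INTO city VALUES (?, ?, ?)", [(1, "A", 100), (2, "B", None)])

# Corresponds to: SELECT ?s ?name ?pop WHERE { ?s ex:name ?name . ?s ex:population ?pop }
sql = patterns_to_sql([("ex:name", "name"), ("ex:population", "pop")])
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```

Because the whole graph pattern ends up in a single SQL statement, the relational engine's optimiser does the join planning instead of the RDF layer.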
[Figure: Sparqlify as a bridge between SPARQL CONSTRUCT and SQL views]
Storage and Querying
RDF Data Management
Querying is still a factor of 3-20 slower than relational data management (BSBM, DBpedia Benchmark), but offers more flexibility.
Performance increases steadily.
Comprehensive, well-supported open-source and commercial implementations are available:
• OpenLink's Virtuoso (open source + commercial)
• Big OWLIM (commercial), Swift OWLIM (open source)
• 4store (open source)
• Dydra (hosted)
• Bigdata (distributed)
• AllegroGraph (commercial)
• Mulgara (open source)
DBpedia Benchmark
• Uses DBpedia as data and a selection of 25 frequently executed queries
• Can generate fractions and multiples of DBpedia's size
• Does not resemble relational data
Performance differences observed with other benchmarks are amplified.
[Figure: geometric mean]
Morsey, Lehmann, Auer, Ngonga: DBpedia SPARQL Benchmark – Performance Assessment with Real Queries on Real Data. Int. Semantic Web Conf. (ISWC2011). Best-paper award.
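The figure label refers to the geometric mean, which is a common way to aggregate per-query performance so that a single very fast or very slow query does not dominate. A minimal sketch follows; the queries-per-second values are hypothetical placeholders, not benchmark results.

```python
import math


def geometric_mean(values):
    """n-th root of the product, computed via logarithms for numerical stability."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one value, all positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical queries-per-second figures for one triple store.
qps_per_query = [1.0, 10.0, 100.0]
print(geometric_mean(qps_per_query))
```

With these numbers the arithmetic mean would be 37, pulled up by the one fast query, while the geometric mean stays at the middle order of magnitude.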
Authoring
Two Kinds of Semantic Wikis
1. Semantic (Text) Wikis
• Authoring of semantically annotated texts
2. Semantic Data Wikis
• Direct authoring of structured information (i.e. RDF, RDF-Schema, OWL)
OntoWiki: a semantic data wiki
• Versatile domain-independent tool
• Serves as Linked Data / SPARQL endpoint on the Data Web
• Open-source project hosted at Google Code
• Not just a Wiki UI, but a whole framework for the development of Semantic Web applications
• Developed in PHP based on the Zend framework
• Very active developer and user community
• More than 500 downloads monthly
• Large number of use cases, including industry
[1] Auer, Dietzold, Riechert: OntoWiki - A Tool for Social, Semantic Collaboration. 5th International Semantic Web Conference, ISWC 2006.
[2] Riechert, Morgenstern, Auer, Tramp, Martin: Knowledge Engineering for Historians on the Example of the Catalogus Professorum Lipsiensis. 9th Int. Semantic Web Conference ISWC2010. Best paper award.
OntoWiki with a car model database loaded
Management of Enterprise Taxonomies with OntoWiki, based on the W3C SKOS standard
Corporate Language Management: 500k concepts in 20 languages
Search for "combi" also finds T-model
Structured knowledge base allows searching for specific data (e.g. cars with more than 6 seats)
… or with a fuel consumption of less than 5 litres per 100 km
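Both kinds of search can be sketched with toy data: synonym-aware lookup via preferred and alternative labels (as in SKOS), and filtering on structured attributes. All concepts, car records and function names below are hypothetical and only illustrate the principle.

```python
# Hypothetical SKOS-like concepts: one preferred label plus alternative labels.
CONCEPTS = {
    "ex:EstateCar": {"prefLabel": "T-model", "altLabels": {"combi", "estate"}},
    "ex:Van": {"prefLabel": "van", "altLabels": {"minibus"}},
}

# Hypothetical car records referring to the concepts above.
CARS = [
    {"model": "A", "body": "ex:EstateCar", "seats": 5, "l_per_100km": 4.8},
    {"model": "B", "body": "ex:Van", "seats": 7, "l_per_100km": 7.9},
]


def find_concepts(term):
    """Match a search term against preferred and alternative labels."""
    term = term.lower()
    return {
        uri for uri, concept in CONCEPTS.items()
        if term == concept["prefLabel"].lower()
        or term in {label.lower() for label in concept["altLabels"]}
    }


def search_cars(term):
    """Return the models whose body type matches the term or one of its synonyms."""
    concepts = find_concepts(term)
    return [car["model"] for car in CARS if car["body"] in concepts]


print(search_cars("combi"))
print([car["model"] for car in CARS if car["seats"] > 6])
print([car["model"] for car in CARS if car["l_per_100km"] < 5])
```

The synonym lookup works because the taxonomy, not the search string, carries the knowledge that "combi" and "T-model" name the same concept.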
From Intranet to Enterprise Data Web around a knowledge hub
Auer, Frischmuth, Klímek, Unbehauen, Holzweißig, Marquardt: Linked Data in Enterprise Information Integration. Submitted to Semantic Web Journal, 2012.
Linked Data & Collaboration for the Digital Humanities
Riechert, Morgenstern, Auer, Tramp, Martin: Knowledge Engineering for Historians on the Example of the Catalogus Professorum Lipsiensis. 9th International Semantic Web Conference (ISWC2010). Best Paper award.
OntoWiki
Dynamic views on knowledge bases
OntoWiki for the Catalogus Professorum Lipsiensis
RDF triples on resource details page
Dynamic suggestions from the Data Web
OntoWiki for the Catalogus Professorum Lipsiensis
CPM ontology
Catalogus Professorum Lipsiensis
Linking
© CC-BY-NC-ND by ~Dezz~ (residae on flickr)
Linking Entities on the Data Web
In an uncontrolled environment such as the Data Web, there will be a proliferation of equivalent or similar entity identifiers.
Manual link discovery:
• Sindice integration into UIs
• Semantic Pingback
Semi-automatic:
• SILK
• LIMES
Automatic / supervised:
• RAVEN [1]
[1] Ngonga, Lehmann, Auer, Höffner: RAVEN -- Active Learning of Link Specifications, OM@ISWC, 2011.
LIMES: Link Discovery in Metric Spaces
Similarity, equality or relatedness of entities can often be expressed using a distance metric (e.g. strings: edit distance, POIs: Euclidean distance)
Uses the characteristics of metric spaces, esp. consequences of the triangle inequality:
d(x, y) ≤ d(x, z) + d(z, y)
d(x, z) - d(z, y) ≤ d(x, y) ≤ d(x, z) + d(z, y)
Use pessimistic approximations of distances instead of computing them
Only compute distances when needed
The high-performance LIMES framework is available as open source and outperforms the state of the art by an order of magnitude
Ngonga, Auer: LIMES - A Time-Efficient Approach for Large-Scale Link Discovery on the Web of Data 22nd Int. Joint Conf. on Artificial Intelligence (IJCAI2011).
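The triangle-inequality argument can be turned into a simple filter: for a reference point z, the lower bound |d(x, z) - d(z, y)| ≤ d(x, y) lets us skip pairs whose bound already exceeds the threshold. The sketch below uses a single exemplar and Euclidean distance. It is only an illustration of the principle, not the LIMES implementation; function names and data are assumptions.

```python
import math


def link_with_exemplar(sources, targets, exemplar, threshold):
    """Return (links, number of exact distance computations between s and t)."""
    # Distances from the exemplar z to every target are computed only once.
    d_zt = [math.dist(exemplar, t) for t in targets]
    links, n_exact = [], 0
    for s in sources:
        d_sz = math.dist(s, exemplar)
        for t, d_tz in zip(targets, d_zt):
            # Pessimistic lower bound from the triangle inequality:
            # d(s, t) >= |d(s, z) - d(z, t)|, so the pair cannot be within range.
            if abs(d_sz - d_tz) > threshold:
                continue
            n_exact += 1  # only now is the real distance needed
            if math.dist(s, t) <= threshold:
                links.append((s, t))
    return links, n_exact


# Hypothetical points of interest as (x, y) coordinates.
sources = [(0, 0), (10, 10)]
targets = [(0, 1), (10, 11), (50, 50)]
links, n_exact = link_with_exemplar(sources, targets, exemplar=(0, 0), threshold=1.5)
print(links, n_exact)
```

The filter never discards a true match, because the bound is a guaranteed underestimate of the real distance; it only saves work on pairs that are provably too far apart.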
RAVEN - Towards Zero-Configuration Link Discovery
Active learning of link specifications
Ngonga Ngomo, Lehmann, Auer, Höffner: RAVEN: Towards Zero-Configuration Link Discovery. In OM 2012.
Active learning of link specifications
• Experiments even with very large KBs (Diseasome & DBpedia) show that with 10-20 examples an F-score of >95% can be achieved
• A learning iteration takes <1s
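For reference, the F-score of a set of discovered links against a reference linkset is the harmonic mean of precision and recall. A minimal sketch with hypothetical link pairs:

```python
def f_score(found_links, reference_links):
    """Harmonic mean of precision and recall over sets of (source, target) links."""
    found, reference = set(found_links), set(reference_links)
    true_positives = len(found & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(found)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical links between two datasets.
found = {("a", "x"), ("b", "y"), ("c", "z")}
reference = {("a", "x"), ("b", "y"), ("d", "w")}
print(f_score(found, reference))
```

In an active learning loop, a score like this would be evaluated against the examples the user has confirmed or rejected so far.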
Enrichment
Enrichment & Repair
Linked Data is mainly instance data!
The ORE (Ontology Repair and Enrichment) tool allows improving an OWL ontology by fixing inconsistencies and making suggestions for adding further axioms.
• Ontology debugging: uses OWL reasoning to detect inconsistencies and unsatisfiable classes, and detects the most likely sources of the problems. The user can create a repair plan while maintaining full control.
• Ontology enrichment: uses the DL-Learner framework to suggest definitions and super classes for existing classes in the KB. Works if instance data is available for harmonising schema and data.
http://aksw.org/Projects/ORE
Lehmann, Auer, Tramp: Class Expression Learning for Ontology Engineering. Journal of Web Semantics (JWS), 2011.
Improving Linked Data Quality by Ontology Learning
Supervised Machine Learning Task
Given:
• Background knowledge base
• Positive and negative examples (example = individual in ontology)
Goal:
• Find an OWL class expression / DL concept which
– covers as many positive examples as possible
– covers as few negative examples as possible
Concept C covers example a <=> a is an instance of C
An analogous problem can be defined for logic programs => Inductive Logic Programming
Hellmann, Lehmann, Auer: Learning of OWL Class Descriptions on Very Large Knowledge Bases. Int. Journal on Semantic Web & Information Systems (IJSWIS), Vol. 5, Issue 2, April-July 2009, ISSN: 1552-6283.
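The learning task can be made concrete with a toy scorer that ranks candidate concepts by how well they separate positive from negative examples. This sketch only considers named classes given as instance sets; a real learner searches a much larger space of complex class expressions. The knowledge base and all names are hypothetical.

```python
# Hypothetical toy knowledge base: class name -> set of instances.
KB = {
    "Person": {"anna", "bob", "carl", "dora"},
    "Professor": {"anna", "bob"},
    "Student": {"carl"},
}


def covers(concept, individual):
    """Concept C covers example a <=> a is an instance of C."""
    return individual in KB[concept]


def accuracy(concept, positives, negatives):
    """Share of examples classified correctly by the concept."""
    covered_pos = sum(covers(concept, a) for a in positives)
    rejected_neg = sum(not covers(concept, a) for a in negatives)
    return (covered_pos + rejected_neg) / (len(positives) + len(negatives))


def best_concept(positives, negatives):
    """Pick the candidate covering many positives and few negatives."""
    return max(KB, key=lambda c: accuracy(c, positives, negatives))


positives = {"anna", "bob"}
negatives = {"carl", "dora"}
print(best_concept(positives, negatives))
```

"Person" covers all positives but also all negatives, so it scores only 0.5 here; the scoring rewards the more specific concept.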
Quality Analysis
Linked Data Quality Analysis
Quality on the Data Web varies a lot
• Hand-crafted or expensively curated knowledge bases (e.g. DBLP, UMLS) vs. knowledge bases extracted from text or Web 2.0 sources (DBpedia)
Research challenge
• Establish measures for assessing the authority, provenance and reliability of Data Web resources
Opportunity for EII: employ crowd-sourced knowledge from the Data Web in the enterprise
FP7-IP DIACHRON: Managing the Evolution and Preservation of the Data Web. Started April 2013.
Evolution
© CC-BY-SA by alasis on flickr
EvoPat: Pattern-based KB Evolution
• Unified method for data evolution and ontology refactoring
• Modularized, declarative definition of evolution patterns => simple compared to an imperative description
• RDF representation of evolution patterns => patterns can be shared and reused on the Data Web
• Declarative definition of bad smells and corresponding evolution patterns promotes the (semi-)automatic improvement of information quality
Rieß, Heino, Dietzold, Auer: EvoPat - Pattern-Based Evolution and Refactoring of RDF Knowledge Bases. In: 9th International Semantic Web Conference ISWC2010.
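What a declarative evolution pattern buys can be shown with a toy engine: the pattern is pure data (what to select, what to delete, what to insert) and a generic function applies it. The dictionary format below is a stand-in invented for this sketch, not EvoPat's actual pattern representation.

```python
# Hypothetical pattern: replace one property by another (a simple refactoring).
RENAME_PROPERTY = {
    "select": ("?s", "ex:fullname", "?o"),
    "delete": [("?s", "ex:fullname", "?o")],
    "insert": [("?s", "foaf:name", "?o")],
}


def match(pattern, triple):
    """Return variable bindings if the triple matches the pattern, else None."""
    bindings = {}
    for p, v in zip(pattern, triple):
        if p.startswith("?"):
            if bindings.setdefault(p, v) != v:
                return None  # same variable bound to two different values
        elif p != v:
            return None
    return bindings


def instantiate(template, bindings):
    """Replace variables in a triple template by their bound values."""
    return tuple(bindings.get(term, term) for term in template)


def apply_pattern(pattern, graph):
    """Apply the delete and insert templates for every matching triple."""
    result = set(graph)
    for triple in graph:
        bindings = match(pattern["select"], triple)
        if bindings is None:
            continue
        result -= {instantiate(t, bindings) for t in pattern["delete"]}
        result |= {instantiate(t, bindings) for t in pattern["insert"]}
    return result


graph = {("ex:a", "ex:fullname", "Anna"), ("ex:a", "rdf:type", "foaf:Person")}
print(sorted(apply_pattern(RENAME_PROPERTY, graph)))
```

Since the pattern is data rather than code, it could itself be stored and exchanged like any other dataset, which is the point of the RDF representation mentioned above.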
Exploration
An ecosystem of LOD visualizations
LOD exploration widgets:
• Spatial faceted browsing
• Faceted browsing
• Statistical visualization
• Entity-/facet-based browsing
• Domain-specific visualizations
• …
Choreography layer:
• Dataset analysis (size, vocabularies, property histograms etc.)
• Selection of suitable visualization widgets
LOD datasets
Brunetti, Auer, García: The Linked Data Visualization Model. To appear in IJSWIS, 2012.
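The two steps named above, dataset analysis and widget selection, can be sketched as follows. The analysis computes size, a property histogram and vocabulary usage; the selection rules (geo coordinates suggest a map, data cube properties suggest charts) are assumptions made for this illustration, not rules taken from the cited model.

```python
from collections import Counter


def analyse(triples):
    """Compute dataset size, a property histogram and vocabulary usage."""
    properties = Counter(p for _, p, _ in triples)
    # The vocabulary is taken to be the prefix of each prefixed property name.
    vocabularies = Counter(p.split(":", 1)[0] for p in properties.elements())
    return {"size": len(triples), "properties": properties, "vocabularies": vocabularies}


def suggest_widgets(analysis):
    """Pick visualization widgets based on which properties occur (hypothetical rules)."""
    props = analysis["properties"]
    widgets = ["faceted browsing"]
    if "geo:lat" in props and "geo:long" in props:
        widgets.append("spatial faceted browsing")
    if "qb:dataSet" in props:
        widgets.append("statistical visualization")
    return widgets


# Hypothetical toy dataset.
triples = [
    ("ex:a", "geo:lat", "51.3"),
    ("ex:a", "geo:long", "12.4"),
    ("ex:a", "rdfs:label", "Leipzig"),
]
analysis = analyse(triples)
print(analysis["properties"].most_common())
print(suggest_widgets(analysis))
```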
LOD Life-(Washing-)cycle supported by the Debian-based LOD2 Stack
http://stack.lod2.eu
Linked Enterprise Intra Data Webs fill the gap between Intra-/Extranets and EIS/ERP
Unstructured Information Management
Structured Information Management
Support the long tail of enterprise information domains
• Human resources
• Requirements engineering
• Supply chains
When only data needs to be exchanged and integrated, SOA is quite expensive
Facilitates data integration along value-chains within and across enterprises
PricewaterhouseCoopers, Technology Forecast, 2009
Take home messages
• Linked Data is a promising technology for closing the gap between SOA and unstructured information management
• The wealth of knowledge available as LOD can be leveraged as background knowledge for enterprise applications
• The application of Linked Data in the enterprise is still largely unexplored (opportunity)
• Linked Data will make Enterprise Information Integration more flexible, iterative and cost-effective
Auer, Frischmuth, Klímek, Tramp, Unbehauen, Holzweißig, Marquardt: Linked Data in Enterprise Information Integration. Submitted to Semantic Web Journal.
AKSW: Bridging Theory with Applications
Foundations: marrying databases with RDF and ontologies
Tools & Datasets
Applications: bringing the Data Web to end users
• DBpedia: “Semantification” of Wikipedia
• Triplify: “Semantification” of (small) Web applications
• OntoWiki: collaborative creation of explicit knowledge via Semantic Wikis
• LIMES: link discovery framework for metric spaces
• Vakantieland: building Data Web applications
• SoftWiki: distributed, stakeholder-driven requirements engineering
• NLP2RDF: integrating natural language processing tool chains with LOD
• Enterprise Knowledge Bases: realizing knowledge hubs within an enterprise’s Data Intranet
• Thesaurus Management: defining corporate language & data
• DL-Learner: machine learning for ontologies
• Catalogus Professorum: prosopographical knowledge base
• LinkedGeoData: “Semantification” of OpenStreetMap
• LESS: semantification syndication
• RDB2RDF: mapping relational data to RDF
• ORE: ontology enrichment & repair
• …
AKSW Team
The LOD2 Gang
Thanks for your attention!
Sören Auer
http://www.informatik.uni-leipzig.de/~auer | http://aksw.org | http://lod2.org