Linked Data and Semantic Web Application Development by Peter Haase
Who am I and What am I Talking About? A Linked Data Perspective
[Figure: RDF graph describing the speaker. Edge labels: affiliation, develops, founder, project, worksOn, owl:sameAs. Node label: www.metaphacts.com]
For exercises, quiz and further material visit our website: http://www.euclid-project.eu
Other channels: eBook, Course, @euclid_project, euclidproject
EUCLID - Providing Linked Data
Semantic Technologies enabling Smart Data
§ Not just data, not just information, but actionable insights that support better decisions
[Figure: progression from Data to Information to Knowledge, with stages Raw Data Access, Sense Making, Actionable Insights, Decision Support]
DBpedia
Classes and properties for Wikipedia export (infoboxes), regularly updated
See http://wiki.dbpedia.org/
Linked (Open) Data
• Set of standards and principles for publishing, sharing and interrelating structured knowledge
• Data from different knowledge domains, self-described, linked and accessible
• From data silos to a Web of Data
• RDF as data model, SPARQL for querying
• Ontologies to describe the semantics
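To make the RDF data model concrete, here is a minimal sketch of a set of subject-predicate-object triples with a pattern-matching lookup, which is the basic operation that SPARQL generalizes. The data, the prefixed names, and the `match` helper are illustrative assumptions and not part of any RDF library.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Toy data; all names below are illustrative assumptions.
TRIPLES: set[Triple] = {
    ("ex:U2", "rdf:type", "mo:MusicGroup"),
    ("ex:U2", "foaf:name", "U2"),
    ("ex:U2", "foaf:made", "ex:Boy"),
    ("ex:Boy", "rdf:type", "mo:Release"),
}


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return all triples matching the pattern; None acts like a query variable."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )
```

Calling `match(p="foaf:made")` plays the role of the triple pattern `?s foaf:made ?o`; a real application would use a triple store and SPARQL instead.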
Linked Data Principles
1. Use URIs as names for things.
2. Use HTTP URIs so that users can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that users can discover more things.
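Principles 2 and 3 can be sketched as an HTTP lookup that asks for an RDF representation through content negotiation. This is a minimal sketch using only the Python standard library; the function names and the default media type `text/turtle` are assumptions, and whether a given server honours the `Accept` header depends on that server.

```python
import urllib.request


def build_lookup_request(uri: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Prepare an HTTP GET that asks the server for an RDF representation of `uri`."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def look_up(uri: str) -> str:
    """Dereference the URI and return the response body (requires network access)."""
    with urllib.request.urlopen(build_lookup_request(uri), timeout=10) as response:
        return response.read().decode("utf-8")
```

For example, `look_up("http://dbpedia.org/resource/Berlin")` would attempt to retrieve a Turtle description of that resource; the URI is only an example.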
Semantics on the Web
Semantic Web Stack, Berners-Lee (2006)
• Syntactic basis
• Basic data model
• Simple vocabulary (schema) language
• Expressive vocabulary (ontology) language
• Query language
• Application-specific declarative knowledge
• Digital signatures, recommendations
• Proof generation, exchange, validation
Ontologies
§ An ontology defines a domain of interest
– … in terms of the things you talk about in the domain, their attributes, as well as relationships between them
§ Ontologies are used to
– Share a common understanding about a domain among people and machines
– Enable reuse of domain knowledge
06.12.14
EUCLID – Building Linked Data applications
Categories of Linked Data Applications
Furthermore, Linked Data applications can be classified according to the following dimensions:
Dimension: Semantic technology depth
– Extrinsic: Use of semantics on the surface of the application.
– Intrinsic: Conventional technologies (e.g., RDBMS) are complemented or replaced with SW equivalents.
Dimension: Information flow direction
– Consuming: LD is retrieved from the source or via a wrapper.
– Producing: Publishes LD (in RDF-based formats).
Dimension: Semantic richness
– Shallow: Simple taxonomies, use of RDF or RDFS.
– Strong: High-level representation formalisms (OWL variants).
Dimension: Semantic integration
– Isolated: Creation of own vocabularies.
– Integrated: Reuse of information at schema or instance level.
Source: M. Martin and S. Auer. “Categorisation of Semantic Web Applications”
Example: ResearchSpace
• The ResearchSpace environment aims at providing a set of RDF data sets and tools to describe concepts and objects related to cultural historical research.
• The tools are highly interactive: they allow users to access the data and contribute to the data set by creating RDF annotations.
[Screenshots: Geo Mapper, Image Annotation]
Source: https://sites.google.com/a/researchspace.org/researchspace/
Example: ResearchSpace CRM Search System
[Screenshot labels: Search by predicates, Faceted search]
Source: Snapshot from https://www.youtube.com/watch?v=HCnwgq6ebAs
Benefits of Linked Data in the Enterprise
§ Enterprise Data Integration: Semantically integrate data scattered across different information systems, leading to transparent, streamlined information management with fewer redundancies and inconsistencies
§ Simplified publishing, sharing and reuse of data: increase openness and accessibility of enterprise data through open, standards-based APIs
§ Enrichment and contextualization through interlinking: increase added value by linking to Linked Open Data
§ Improved analytics: enable cross-organization analysis, interactive analytics, and reporting on top of a collaborative platform
Optique Case Study: Statoil Exploration
Experts in geology and geophysics develop stratigraphic models of unexplored areas
– Based on production and exploration data from nearby locations
– Analytics on:
• 1,000 TB of relational data
• using diverse schemata
• spread over 3,000 tables
• spread over multiple individual databases
– 900 experts in Statoil Exploration
– Up to 4 days for new data access queries
– Assistance from IT experts required
[Figure: Complex case. Labels: information need, translation, engineer / IT expert, specialized query, disparate sources]
Up to 80% of experts' time spent on data access
Ontology Based Data Access
Example Query
§ Find
– fields together with their remaining oil
– that are currently operated by Statoil
and
– show the types of wellbores located on these fields
General Architecture of Linked Data Applications
[Figure: three-tier architecture of a Linked Data application]
Presentation Tier
Logic Tier
Data Tier
– Data Access Component: SPARQL Wr., R2R Transf., LD Wrapper, Physical Wrapper
– Data Integration Component: Vocabulary Mapping, Interlinking, Cleansing
– Integrated Dataset (Triple Store)
– Republication Component
Sources: SPARQL Endpoints, Web Data accessed via APIs, RDF/XML, Linked Data, Relational Data
Architectural Patterns
1. The Crawling Pattern: Crawls or loads data in advance. Data is managed in one triple store, thus it can be accessed efficiently. The disadvantage of this pattern is that the data might not be up to date.
2. The On-The-Fly Dereferencing Pattern: URIs are dereferenced at the moment that the app requires the data. This pattern retrieves up-to-date data. Performance is affected when the app must dereference many URIs.
3. The (Federated) Query Pattern: Submits complex queries to a fixed set of data sources. Enables applications to work with current data directly retrieved from the sources. Finding optimal query execution plans over a large number of sources is a complex problem.
[Figure: the three patterns, each showing an App on top of a Data Access component; the crawling pattern adds a Cache]
Source: T. Heath, C. Bizer. Linked Data: Evolving the Web into a Global Data Space
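The trade-off between the crawling and the on-the-fly dereferencing patterns can be sketched as follows. This is a minimal sketch under stated assumptions: the class names are hypothetical, and a plain dictionary stands in for a remote source so that the example runs without network access.

```python
from typing import Callable

Fetch = Callable[[str], str]  # maps a URI to its current description


class CrawlingAccess:
    """Crawling pattern: load everything in advance, answer from the local copy."""

    def __init__(self, fetch: Fetch, uris: list[str]) -> None:
        self._store = {uri: fetch(uri) for uri in uris}

    def describe(self, uri: str) -> str:
        return self._store[uri]  # fast, but may be stale


class OnTheFlyAccess:
    """On-the-fly dereferencing pattern: fetch at the moment the app asks."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch

    def describe(self, uri: str) -> str:
        return self._fetch(uri)  # current, but one remote call per lookup


# Hypothetical remote source whose data changes over time.
remote = {"ex:U2": "version 1"}
crawled = CrawlingAccess(remote.__getitem__, ["ex:U2"])
live = OnTheFlyAccess(remote.__getitem__)
remote["ex:U2"] = "version 2"  # the source is updated after the crawl
```

After the update, `crawled.describe("ex:U2")` still returns the crawled copy while `live.describe("ex:U2")` returns the current value, which mirrors the freshness versus performance trade-off described above.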
Data Layer
Data Access Component
• Linked Data applications may implement a Mediator-Wrapper Architecture to access heterogeneous sources:
– Wrappers are built around each data source in order to provide a unified view of the retrieved data.
• The method to access the data depends on the Linked Data architectural pattern.
• The factors that determine the choice of pattern are:
– Number of data sources to access
– Requirement of consuming up-to-date data
– Tolerance to high response time
– Requirement of discovering new data sources
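A Mediator-Wrapper Architecture can be sketched in a few lines: each wrapper translates one source into a common triple representation, and the mediator exposes the union. All class names, the CSV layout, and the `ex:` and `foaf:name` terms used here are assumptions chosen for illustration.

```python
import csv
import io
from typing import Protocol

Triple = tuple[str, str, str]


class Wrapper(Protocol):
    def triples(self) -> list[Triple]: ...


class CsvWrapper:
    """Wraps CSV text with columns id,name (a hypothetical source layout)."""

    def __init__(self, text: str) -> None:
        self._text = text

    def triples(self) -> list[Triple]:
        rows = csv.DictReader(io.StringIO(self._text))
        return [(f"ex:{row['id']}", "foaf:name", row["name"]) for row in rows]


class DictWrapper:
    """Wraps an in-memory mapping of id -> name."""

    def __init__(self, names: dict[str, str]) -> None:
        self._names = names

    def triples(self) -> list[Triple]:
        return [(f"ex:{key}", "foaf:name", name) for key, name in self._names.items()]


class Mediator:
    """Presents the union of all wrapped sources as one set of triples."""

    def __init__(self, wrappers: list[Wrapper]) -> None:
        self._wrappers = wrappers

    def triples(self) -> set[Triple]:
        return {t for w in self._wrappers for t in w.triples()}


mediator = Mediator([CsvWrapper("id,name\n1,U2\n"), DictWrapper({"2": "Queen"})])
```

The application only talks to `mediator.triples()`, so adding a new kind of source means writing one more wrapper rather than changing the application.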
Data Layer (2)
Data Access Component (2)
• The data access component may be implemented by using one or a combination of the following tools:
Mechanisms: Tools (examples)
– Linked Data Crawlers: LDspider (https://code.google.com/p/ldspider/), Slug (https://code.google.com/p/slug-semweb-crawler/)
– Linked Data Client Libraries: Semantic Web Client Library (http://wifo5-03.informatik.uni-mannheim.de/bizer/ng4j/semwebclient/), The Tabulator (http://www.w3.org/2005/ajar/tab), Moriarty (https://code.google.com/p/moriarty/)
– SPARQL Client Libraries: Jena Semantic Web Framework (http://jena.apache.org/)
– Federated SPARQL Engines: ANAPSID (https://github.com/anapsid/anapsid), FedX (http://www.fluidops.com/fedx/), SPLENDID (https://code.google.com/p/rdffederator/)
– Search Engine APIs: Sindice (http://sindice.com/developers/api), Uberblic (http://uberblic.com/)
Data Layer (3)
Data Integration Component
• Consolidates the data retrieved from heterogeneous sources.
• This component may operate at:
– Schema level: Performs vocabulary mappings in order to translate data into a single unified schema. Links correspond to RDFS properties or OWL property and class axioms.
– Instance level: Performs entity resolution via owl:sameAs links. In case the data sources do not provide the links, further tools like Silk or Open Refine can be used to integrate the data.
[Figure: Data Access Component; Data Integration Component with Vocabulary Mapping, Interlinking, Cleansing]
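Instance-level integration via owl:sameAs amounts to grouping URIs into equivalence classes. The sketch below does this with a union-find structure; the function name and the prefixed URIs in the usage note are illustrative assumptions, and tools such as Silk address the harder problem of discovering the links in the first place.

```python
def same_as_clusters(links: list[tuple[str, str]]) -> list[set[str]]:
    """Group URIs into equivalence classes implied by owl:sameAs links (union-find)."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    # owl:sameAs is symmetric and transitive, so every link merges two classes.
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters: dict[str, set[str]] = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    # Sort clusters by their sorted members for a deterministic result.
    return sorted(clusters.values(), key=lambda c: sorted(c))
```

Given links `mb:U2 sameAs dbpedia:U2` and `dbpedia:U2 sameAs wd:U2`, all three URIs end up in one cluster even though `mb:U2` and `wd:U2` are never linked directly.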
Data Layer (4)
Integrated Dataset
• The dataset resulting from integration and consolidation can be cached in an RDF store.
• There are many solutions to deploy triple/RDF stores, e.g.:
– bigdata (http://www.bigdata.com/)
– OWLIM (http://www.ontotext.com/owlim)
– Jena TDB (http://jena.apache.org/documentation/tdb/)
– AllegroGraph (http://www.franz.com/agraph/allegrograph/)
– Virtuoso Universal Server (http://virtuoso.openlinksw.com/)
– RDF3x (https://code.google.com/p/rdf3x/)
Data Layer (5)
Republication Component
• Exposes portions of the integrated dataset as Linked Data
• There are different solutions to make the data accessible:
– Via SPARQL endpoints (e.g., Sesame OpenRDF SPARQL Endpoint, …)
– Via APIs (e.g., Linked Data API)
– As RDF dumps
– With the built-in means of your framework/CMS (e.g., Drupal, Information Workbench, …)
Application and Presentation Layers
• The logic layer implements sophisticated processing according to the functionalities of the application. This layer may include data mining components as well as reasoners that are not integrated in the data layer.
• The presentation layer displays the information to the user in various formats, including text, diagrams or other types of visualization.
LINKED DATA APPLICATION DEVELOPMENT FRAMEWORKS
Information Workbench
• Platform for development of Linked Data applications
Semantics- & Linked Data-based Integration of Enterprise and Open Data Sources (Semantic Web Data)
Intelligent Data Access and Analytics
• Visual exploration
• Semantic search
• Dashboarding and reporting
Collaboration and Knowledge Management Platform
• Wiki-based curation & authoring of data
• Collaborative workflows
Source: http://www.fluidops.com/information-workbench/
Information Workbench (2)
[Figure labels:]
• Data storage and management platform
• Reusable UI and data integration components
• Customized application solutions
• External resources to reuse data and create mashups
Data Integration: Data Provider Concept
Data providers support the periodic extraction & integration from external data sources into a central repository
• Lifting from arbitrary data formats to RDF (e.g., relational, XML, CSV)
• Parametrizable (e.g., connection information, refresh interval, …)
• Built-in UI for instantiating providers
• Intuitive interfaces and APIs for writing own, custom providers
[Figure: provider pipeline]
1. Connect to data source
2. Extract data from source
3. Convert data into RDF
4. Store RDF in repository
Examples: RDF, R2RML, XML2RDF, SPARQL
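The provider steps (connect, extract, convert, store) can be sketched as a small class. This is not the Information Workbench provider API; `CsvDataProvider`, its parameters, and the `vocab/` URI scheme are hypothetical, and a Python set stands in for the central repository.

```python
import csv
import io

Triple = tuple[str, str, str]


class CsvDataProvider:
    """Hypothetical provider: extracts rows from CSV text and converts them to triples."""

    def __init__(self, source_text: str, base_uri: str, key_column: str) -> None:
        self.source_text = source_text  # stands in for connection information
        self.base_uri = base_uri
        self.key_column = key_column

    def extract(self) -> list[dict[str, str]]:
        return list(csv.DictReader(io.StringIO(self.source_text)))

    def convert(self, rows: list[dict[str, str]]) -> list[Triple]:
        triples = []
        for row in rows:
            subject = f"{self.base_uri}{row[self.key_column]}"
            for column, value in row.items():
                if column != self.key_column and value:
                    triples.append((subject, f"{self.base_uri}vocab/{column}", value))
        return triples

    def run(self, repository: set[Triple]) -> int:
        """Extract, convert, and store; returns the number of newly added triples."""
        before = len(repository)
        repository.update(self.convert(self.extract()))
        return len(repository) - before


repository: set[Triple] = set()
provider = CsvDataProvider("id,name,country\n1,U2,Ireland\n", "http://example.org/", "id")
first_run = provider.run(repository)
second_run = provider.run(repository)  # a periodic refresh with unchanged data adds nothing
```

Because the repository is a set of triples, re-running the provider on unchanged data is idempotent, which is a convenient property for periodic refreshes.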
W3C RDB2RDF
• Task: Integrate data from relational DBMS with Linked Data
• Approach: map from relational schema to semantic vocabulary with R2RML
• Publishing: two alternatives
– Translate SPARQL into SQL on the fly
– Batch transform data into RDF, index and provide SPARQL access in a triplestore
[Figure: Data acquisition: Relational DBMS, R2RML Engine, LD Data set. Integrated Data in Triplestore: Vocabulary Mapping, Interlinking, Cleansing. Access / Publishing: SPARQL Endpoint]
W3C RDB2RDF (2)
• The W3C made, last year, two recommendations for mapping between relational databases and RDF:
– Direct mapping directly exposes data as RDF
• No allowance for vocabulary mapping
• No allowance for interlinking (unless URIs used in relational data)
– R2RML, the RDB to RDF mapping language
• Allows vocabulary mapping (subject, predicate and object maps with class options)
• Allows interlinking: URIs can be constructed
http://www.w3.org/2001/sw/rdb2rdf/
R2RML Class Mapping
• Declarative mappings with an RDF-based syntax:

lb:Artist a rr:TriplesMap ;
  rr:logicalTable [ rr:tableName "artist" ] ;
  rr:subjectMap [
    rr:class mo:MusicArtist ;
    rr:template "http://musicbrainz.org/artist/{gid}#_"
  ] ;
  rr:predicateObjectMap [
    rr:predicate mo:musicbrainz_guid ;
    rr:objectMap [ rr:column "gid" ; rr:datatype xsd:string ]
  ] .
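To illustrate what an R2RML engine does with the lb:Artist mapping, the following sketch applies its subject template, class, and predicate-object map to a single row of the artist table. It is a simplification written for illustration, not an R2RML implementation: the helper names are hypothetical, and the percent-encoding that R2RML prescribes for values inserted into IRI templates is omitted.

```python
import re

Triple = tuple[str, str, str]

# Values copied from the lb:Artist mapping; the row passed to map_row is illustrative.
SUBJECT_TEMPLATE = "http://musicbrainz.org/artist/{gid}#_"
SUBJECT_CLASS = "mo:MusicArtist"
PREDICATE_OBJECT_COLUMNS = [("mo:musicbrainz_guid", "gid")]


def expand_template(template: str, row: dict[str, str]) -> str:
    """Replace each {column} placeholder with the row's value for that column."""
    return re.sub(r"\{(\w+)\}", lambda m: row[m.group(1)], template)


def map_row(row: dict[str, str]) -> list[Triple]:
    """Produce the triples the mapping yields for one row of the artist table."""
    subject = expand_template(SUBJECT_TEMPLATE, row)
    triples = [(subject, "rdf:type", SUBJECT_CLASS)]
    for predicate, column in PREDICATE_OBJECT_COLUMNS:
        triples.append((subject, predicate, row[column]))
    return triples
```

Each row therefore yields one typing triple plus one triple per predicate-object map, all sharing the subject URI built from the `gid` column.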
Data Warehousing vs. Federation
Warehousing / Crawling
• Data is copied from the source into the warehouse
• Query runs in the warehouse
• Supported in IWB using data providers
Federation
• Data remains in federated DB
• Query is pushed down to federated DB
• Supported in IWB using SPARQL federation
[Figure: Warehousing: the DBs are loaded into the warehouse, which is queried. Federation: the query goes to the federation layer, which queries the DBs.]
Customizable User Interface
Demo available at http://musicbrainz.fluidops.net
[Screenshot labels:]
• Main view area
• Wiki page management
• View selection toolbar
• Current resource
• Navigation shortcuts
User Interface Concept: One Page URI
[Figure: a graph whose nodes each correspond to a resource page; label: Template:…]
Data Driven UI: Ontology as “Structural Backbone”
[Figure labels: RDF Data Graph, Ontology (RDFS/OWL), UI templates (e.g., Template:mo:MusicArtist), Resource page]
Different Views on Every Resource
Wiki View
Table View
Graph View
Pivot View
Widget-Based User Interface
Widgets are not static and can be integrated into the UI using a Wiki-style syntax.
[Figure: widget categories: Analytics and Reporting; Visualization and Exploration; Mashups with Social Media; Authoring and Content Creation]
Example: Add Widgets to Wiki
Example: Show top 10 released records for an artist

{{#widget: BarChart |
  query = 'SELECT distinct (COUNT(?Release) AS ?COUNT) ?label WHERE {
    ?? foaf:made ?Release .
    ?Release rdf:type mo:Release .
    ?Release dc:title ?label .
  }
  GROUP BY ?label
  ORDER BY DESC(?COUNT)
  LIMIT 10
  '
  | input = 'label'
  | output = 'COUNT'
}}
Music Example
Page of a class:
• Shows an overview of MusicArtist instances
See http://musicbrainz.fluidops.net/resource/mo:MusicArtist
Music Example (2)
Page of a class template:
• Defines a layout for displaying each resource of the class
• Uses semantic wiki syntax
See http://musicbrainz.fluidops.net/resource/Template:mo:MusicArtist
Music Example (3)
Page of a class instance:
• Displays the data about the resource according to the class template
See http://musicbrainz.fluidops.net/resource/?uri=http%3A%2F%2Fmusicbrainz.org%2Fartist%2Fb10bbbfc-cf9e-42e0-be17-e2c3e1d2600d%23_
Mashups with external sources
• Relevant information and UI elements from external sources can be incorporated in the wiki view
• IWB contains multiple mashup widgets for popular social media sources
– Twitter
– YouTube
– Facebook
– New York Times news
– LinkedIn
– …

{{#widget: Youtube |
  searchString = $SELECT ?x WHERE { ?? foaf:name ?x . }$
  | asynch = 'true'
}}

Template instantiation:
?? = http://musicbrainz.org/artist/a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432%23_
?x = "U2"
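Template instantiation can be pictured as substituting the current resource for the `??` placeholder before the query is evaluated. The sketch below is an assumption about the mechanism, written for illustration only; it is not the Information Workbench implementation, and the naive string replacement would also rewrite a `??` that happened to occur inside a string literal.

```python
def instantiate(query: str, resource_uri: str) -> str:
    """Replace the ?? placeholder with the current resource, written as a SPARQL IRI."""
    return query.replace("??", f"<{resource_uri}>")


# The artist URI from the slide, with %23 decoded to '#'.
u2 = "http://musicbrainz.org/artist/a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432#_"
instantiated = instantiate("SELECT ?x WHERE { ?? foaf:name ?x . }", u2)
```

The same template can then be reused on every artist page, since only the substituted resource changes.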
Triple Editor
Table View
• Edit structured data associated with a resource
• Change, add and remove triples
Ontology-Based Data Input
Triple Editor takes into account the ontology definition:
• Autosuggestion tool considers the domains and ranges of the properties
Example: properties available for the class mo:MusicGroup are suggested automatically
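Domain-based autosuggestion can be sketched as a lookup over rdfs:domain declarations that also follows rdfs:subClassOf upwards. The ontology fragment below is simplified and hypothetical (it is not a faithful copy of the Music Ontology axioms), and the function names are assumptions.

```python
from typing import Optional

# Simplified, hypothetical ontology fragment (not the actual Music Ontology axioms).
SUBCLASS_OF = {"mo:MusicGroup": "mo:MusicArtist", "mo:MusicArtist": "foaf:Agent"}
DOMAINS = {
    "mo:member": "mo:MusicGroup",
    "foaf:made": "foaf:Agent",
    "mo:track": "mo:Record",
}


def ancestors_and_self(cls: str) -> set[str]:
    """Collect the class and all of its superclasses."""
    seen: set[str] = set()
    current: Optional[str] = cls
    while current is not None and current not in seen:
        seen.add(current)
        current = SUBCLASS_OF.get(current)
    return seen


def suggest_properties(cls: str) -> list[str]:
    """Suggest properties whose declared domain is the class or one of its superclasses."""
    applicable = ancestors_and_self(cls)
    return sorted(p for p, domain in DOMAINS.items() if domain in applicable)
```

With this fragment, an editor for a mo:MusicGroup resource would offer `foaf:made` (inherited from foaf:Agent) and `mo:member`, but not `mo:track`.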
Validation of User Input
Validation uses property definitions in the ontology:
• The property myIntegerProperty has an associated rdfs:range definition.
• This ensures that all objects must be of XML schema type xsd:integer.
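Range-based validation can be sketched as a check of the entered value against the lexical form of the datatype named in rdfs:range. The `ex:` prefix, the lookup tables, and `is_valid` are assumptions made for illustration; only xsd:integer is covered.

```python
import re

# Hypothetical ontology fragment: property -> rdfs:range datatype.
RANGES = {"ex:myIntegerProperty": "xsd:integer"}

# Lexical form of xsd:integer: an optional sign followed by one or more digits.
LEXICAL_FORMS = {"xsd:integer": re.compile(r"[+-]?[0-9]+")}


def is_valid(prop: str, value: str) -> bool:
    """Accept the value if no range pattern is known or the lexical form matches."""
    datatype = RANGES.get(prop)
    pattern = LEXICAL_FORMS.get(datatype)
    return pattern is None or pattern.fullmatch(value) is not None
```

An editor could call `is_valid` before writing a triple and reject input such as `4.2` for myIntegerProperty.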
Russian Museum Project – Architecture and Use Cases
Linked Data Application for the Russian Museum
[Architecture figure labels:]
• Original data sources: Museums and other sources, Social networks (Web Crawl, RDF Dump)
• IWB Backend: Data Providers, Systap Bigdata
• Data: Russian Museum Data, British Museum Data, DBpedia Subset, User Data
• IWB Frontend: Ontology, Templates, Widgets, Cards
• Users: Museum visitor, Website visitor, Data Engineer
Use Case 1: Data Provisioning
• Data crawling
• Data transformation
• Data interlinking
• Data enrichment / Information extraction
• Data validation
Use Case 2: Search and Visualization
• Base templates for visualization
• Templates for external data
• PivotViewer
• Step-by-step visualization
• Extended Search widgets
• SemFacet
Use Case 3: Mobile App
• HTML5 Templates + CSS for mobile devices
• Simplified IWB Wiki View
• Google Glass App
• QR Code recognition
• Pattern / image recognition
Summary
§ Linked Data and Semantic Technologies
– From data to information to knowledge
– Graphs for integration of heterogeneous data in a variety of data models
– Ontologies for knowledge representation and interpretation of data
§ Linked Data applications
– Publishing and consuming Linked Data
– Main components and architecture
§ Standards-based, declarative models for all aspects of the application
– RDF: common data model
– OWL Ontology: conceptual domain model
– R2RML: integrating data sources
– SPARQL queries: expressing information needs
– Wiki templates: interfaces for interacting with the data
Contact us!
metaphacts GmbH
Kautzelweg 13
69190 Walldorf
Germany
p +49 6227 8308660
m +49 157 50152441
e [email protected]
@metaphacts