Linked Data and Semantic Web Application Development by Peter Haase

Linked Data and Semantic Application Development Peter Haase Saint Petersburg 4. December 2014

Transcript of Linked Data and Semantic Web Application Development by Peter Haase


Who am I and What am I Talking About? A Linked Data Perspective

[Figure: RDF graph about the speaker, with a node for www.metaphacts.com and edges labeled affiliation, develops, founder, project, worksOn and owl:sameAs]

For exercises, quizzes and further material visit our website: http://www.euclid-project.eu

Other channels: eBook, Course, @euclid_project (euclidproject)

EUCLID - Providing Linked Data

Semantic Technologies enabling Smart Data

§  Not just data, not just information, but actionable insights that support better decisions

[Figure: from Data to Information to Knowledge; stages: Raw Data Access, Sense Making, Actionable Insights, Decision Support]

Google Knowledge Graph


LinkedIn Economic Graph


Freebase

§  http://www.freebase.com

DBpedia

§  See http://wiki.dbpedia.org/
§  Classes and properties for Wikipedia export (infoboxes), regularly updated

Linked (Open) Data

•  Set of standards, principles for publishing, sharing and interrelating structured knowledge
•  Data from different knowledge domains, self-described, linked and accessible
•  From data silos to a Web of Data
•  RDF as data model, SPARQL for querying
•  Ontologies to describe the semantics

Linked Data Principles

1.  Use URIs as names for things.
2.  Use HTTP URIs so that users can look up those names.
3.  When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4.  Include links to other URIs, so that users can discover more things.
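A minimal sketch of principles 2 and 3 from the client side: looking up an HTTP URI while asking for an RDF representation via the Accept header. The function names and the choice of media types are assumptions made for illustration; the artist URI is the MusicBrainz example used later in these slides. Only the request construction is exercised here, since the actual lookup needs network access.

```python
from urllib.request import Request, urlopen

# Media types requested from the server (an illustrative choice).
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_lookup_request(uri: str) -> Request:
    """Build an HTTP GET request that asks for an RDF representation of `uri`."""
    if not uri.startswith(("http://", "https://")):
        raise ValueError("Linked Data names should be HTTP URIs")
    # A fragment (#...) is never sent to the server, so strip it before the lookup.
    document_uri = uri.split("#", 1)[0]
    return Request(document_uri, headers={"Accept": RDF_ACCEPT})


def look_up(uri: str, timeout: float = 10.0) -> bytes:
    """Dereference `uri` and return the raw response body (needs network access)."""
    with urlopen(build_lookup_request(uri), timeout=timeout) as response:
        return response.read()


request = build_lookup_request(
    "http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_"
)
print(request.full_url)
print(request.get_header("Accept"))
```

Whether a given server honours the Accept header is up to that server; the sketch only shows what a well-behaved client would send.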

Semantics on the Web

[Figure: Semantic Web Stack, Berners-Lee (2006); annotated layers: Syntactic basis; Basic data model; Simple vocabulary (schema) language; Expressive vocabulary (ontology) language; Query language; Application-specific declarative knowledge; Digital signatures, recommendations; Proof generation, exchange, validation]

Ontologies

§  An ontology defines a domain of interest
–  … in terms of the things you talk about in the domain, their attributes, as well as relationships between them
§  Ontologies are used to
–  Share a common understanding about a domain among people and machines
–  Enable reuse of domain knowledge

06.12.14  

EUCLID – Building Linked Data applications

Categories of Linked Data Applications

Linked Data applications can be classified according to the following dimensions (dimension, levels, description):

•  Semantic technology depth
–  Extrinsic: Use of semantics on the surface of the application.
–  Intrinsic: Conventional technologies (e.g., RDBMS) are complemented or replaced with SW equivalents.
•  Information flow direction
–  Consuming: LD is retrieved from the source or via a wrapper.
–  Producing: Publishes LD (in RDF-based formats).
•  Semantic richness
–  Shallow: Simple taxonomies, use of RDF or RDFS.
–  Strong: High-level representation formalisms (OWL variants).
•  Semantic integration
–  Isolated: Creation of own vocabularies.
–  Integrated: Reuse of information at schema or instance level.

Source: M. Martin and S. Auer. “Categorisation of Semantic Web Applications”

Linked Data Examples

NYTimes: http://data.nytimes.com/schools/schools.html

Some  Application  Scenarios  


BBC  

Example: ResearchSpace

•  The ResearchSpace environment aims to provide a set of RDF data sets and tools to describe concepts and objects related to cultural historical research.
•  The tools are highly interactive: they allow users to access the data and contribute to the data set by creating RDF annotations.

[Figure: screenshots of the Geo Mapper and Image Annotation tools]

Source: https://sites.google.com/a/researchspace.org/researchspace/

Example: ResearchSpace CRM Search System

[Figure: search interface showing search by predicates and faceted search]

Source: Snapshot from https://www.youtube.com/watch?v=HCnwgq6ebAs

Some  Application  Scenarios  


Linked  Government  Data:  USA  

Some  Application  Scenarios  


Linked  Government  Data:  UK  

Benefits of Linked Data in the Enterprise

§  Enterprise Data Integration: Semantically integrate data scattered across different information systems, leading to transparent, streamlined information management with fewer redundancies and inconsistencies
§  Simplified publishing, sharing and reuse of data: Increase openness and accessibility of enterprise data through open, standards-based APIs
§  Enrichment and contextualization through interlinking: Increase added value by linking to Linked Open Data
§  Improved analytics: Enable cross-organization analysis, interactive analytics, and reporting on top of a collaborative platform

Optique Case Study: Statoil Exploration

Experts in geology and geophysics develop stratigraphic models of unexplored areas
–  Based on production and exploration data from nearby locations
–  Analytics on:
•  1,000 TB of relational data
•  using diverse schemata
•  spread over 3,000 tables
•  spread over multiple individual databases
–  900 experts in Statoil Exploration
–  Up to 4 days for new data access queries
–  Assistance from IT experts required

[Figure: complex case; labels: information need, specialized query, engineer, IT expert, translation, disparate sources]

Ontology Based Data Access

Up to 80% of experts' time spent on data access

Example Query

§  Find
–  fields together with their remaining oil
–  that are currently operated by Statoil
and
–  show the types of wellbores located on these fields

Visual Query Formulation

Optique Demo Videos

http://www.youtube.com/user/optiqueproject
http://www.optique-project.eu

General  Architecture  of    Linked  Data  Applications  

[Figure: three-tier architecture. Presentation Tier; Logic Tier; Data Tier with Republication Component, Integrated Dataset (Triple Store), Data Integration Component (Vocabulary Mapping, Interlinking, Cleansing) and Data Access Component (SPARQL Wr., R2R Transf., LD Wrapper, Physical Wrapper). Sources: SPARQL Endpoints, Linked Data, RDF/XML, Relational Data, Web Data accessed via APIs.]

Architectural  Patterns  

1.  The Crawling Pattern: Crawls or loads data in advance. Data is managed in one triple store, thus it can be accessed efficiently. The disadvantage of this pattern is that the data might not be up to date.
2.  The On-The-Fly Dereferencing Pattern: URIs are dereferenced at the moment that the app requires the data. This pattern retrieves up-to-date data. Performance is affected when the app must dereference many URIs.
3.  The (Federated) Query Pattern: Submits complex queries to a fixed set of data sources. Enables applications to work with current data directly retrieved from the sources. Finding optimal query execution plans over a large number of sources is a complex problem.

[Figure: one diagram per pattern, with labels App, Data Access and Cache]

Source: T. Heath, C. Bizer. Linked Data: Evolving the Web into a Global Data Space
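The trade-off between the crawling pattern and the on-the-fly dereferencing pattern can be illustrated with a small sketch. Everything here is hypothetical: the class names, the `ex:` identifiers and the `fake_fetch` function, which stands in for a real HTTP lookup so that the example runs without network access.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
Fetch = Callable[[str], List[Triple]]


class CrawlingAccess:
    """Crawling pattern: load everything in advance, then answer from the local copy."""

    def __init__(self, fetch: Fetch, seeds: Iterable[str]) -> None:
        self._store: Dict[str, List[Triple]] = {uri: fetch(uri) for uri in seeds}

    def describe(self, uri: str) -> List[Triple]:
        return self._store.get(uri, [])


class OnTheFlyAccess:
    """On-the-fly dereferencing pattern: look the URI up each time it is needed."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch

    def describe(self, uri: str) -> List[Triple]:
        return self._fetch(uri)


# Hypothetical stand-in for a remote source whose data changes over time.
source: Dict[str, List[Triple]] = {"ex:U2": [("ex:U2", "ex:albums", "12")]}


def fake_fetch(uri: str) -> List[Triple]:
    return list(source.get(uri, []))


crawled = CrawlingAccess(fake_fetch, seeds=["ex:U2"])
live = OnTheFlyAccess(fake_fetch)

# The source is updated after the crawl has finished.
source["ex:U2"] = [("ex:U2", "ex:albums", "13")]

print(crawled.describe("ex:U2"))  # answered from the copy made at crawl time
print(live.describe("ex:U2"))     # answered by a fresh lookup
```

The crawled copy answers without touching the source but no longer reflects the update, while the on-the-fly variant pays one lookup per call to stay current.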

Data  Layer  

Data Access Component
•  Linked Data applications may implement a Mediator-Wrapper Architecture to access heterogeneous sources:
–  Wrappers are built around each data source in order to provide a unified view of the retrieved data.
•  The method to access the data depends on the Linked Data architectural pattern.
•  The factors that determine the choice of a pattern are:
–  Number of data sources to access
–  Requirement of consuming up-to-date data
–  Tolerance to high response time
–  Requirement of discovering new data sources
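One possible reading of the Mediator-Wrapper Architecture, as a sketch only: each wrapper turns its source into triples behind a common interface, and the mediator queries all wrappers to give the application a unified view. The class names, the `example.org` URIs and the plain-string predicates are assumptions for illustration, not part of any framework named in these slides.

```python
from typing import Dict, Iterable, List, Protocol, Tuple

Triple = Tuple[str, str, str]


class Wrapper(Protocol):
    """Common interface every source wrapper exposes to the mediator."""

    def triples(self) -> Iterable[Triple]: ...


class TableWrapper:
    """Wraps tabular rows (e.g. a relational result set) as triples."""

    def __init__(self, rows: List[Dict[str, str]], base_uri: str) -> None:
        self._rows = rows
        self._base_uri = base_uri

    def triples(self) -> Iterable[Triple]:
        for row in self._rows:
            subject = self._base_uri + row["id"]
            for column, value in row.items():
                if column != "id":
                    yield (subject, column, value)


class TripleListWrapper:
    """Wraps a source that already delivers triples (e.g. a parsed RDF dump)."""

    def __init__(self, data: List[Triple]) -> None:
        self._data = data

    def triples(self) -> Iterable[Triple]:
        return iter(self._data)


class Mediator:
    """Gives the application one unified view over all wrapped sources."""

    def __init__(self, wrappers: List[Wrapper]) -> None:
        self._wrappers = wrappers

    def describe(self, subject: str) -> List[Triple]:
        return [t for w in self._wrappers for t in w.triples() if t[0] == subject]


mediator = Mediator([
    TableWrapper([{"id": "u2", "name": "U2"}], base_uri="http://example.org/artist/"),
    TripleListWrapper([("http://example.org/artist/u2", "genre", "Rock")]),
])
print(mediator.describe("http://example.org/artist/u2"))
```

The application only talks to `Mediator.describe`, so adding a further source means adding one more wrapper rather than changing application code.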

Data  Layer  (2)  

Data Access Component (2)
•  The data access component may be implemented by using one or a combination of the following tools:

Mechanisms and Tools (Examples)
•  Linked Data Crawlers
–  LDspider https://code.google.com/p/ldspider/
–  Slug https://code.google.com/p/slug-semweb-crawler/
•  Linked Data Client Libraries
–  Semantic Web Client Library http://wifo5-03.informatik.uni-mannheim.de/bizer/ng4j/semwebclient/
–  The Tabulator http://www.w3.org/2005/ajar/tab
–  Moriarty https://code.google.com/p/moriarty/
•  SPARQL Client Libraries
–  Jena Semantic Web Framework http://jena.apache.org/
•  Federated SPARQL Engines
–  ANAPSID https://github.com/anapsid/anapsid
–  FedX http://www.fluidops.com/fedx/
–  SPLENDID https://code.google.com/p/rdffederator/
•  Search Engine APIs
–  Sindice http://sindice.com/developers/api
–  Uberblic http://uberblic.com/

Data Layer (3)

Data Integration Component
•  Consolidates the data retrieved from heterogeneous sources.
•  This component may operate at:
–  Schema level: Performs vocabulary mappings in order to translate data into a single unified schema. Links correspond to RDFS properties or OWL property and class axioms.
–  Instance level: Performs entity resolution via owl:sameAs links. In case the data sources do not provide the links, further tools like Silk or OpenRefine can be used to integrate the data.

[Figure: Data Integration Component (Vocabulary Mapping, Interlinking, Cleansing) and Data Access Component]
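Instance-level integration via owl:sameAs can be sketched as grouping URIs into clusters that denote the same entity. The version below uses a union-find structure, which is one common way to do this and not necessarily what Silk or OpenRefine do internally. The MusicBrainz URI is the U2 example from these slides; the other URIs are chosen for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def same_as_clusters(links: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group URIs connected by owl:sameAs links into equivalence clusters."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters: Dict[str, Set[str]] = defaultdict(set)
    for uri in list(parent):
        clusters[find(uri)].add(uri)
    return list(clusters.values())


# owl:sameAs statements as (subject, object) pairs; illustrative data only.
LINKS = [
    ("http://musicbrainz.org/artist/a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432#_",
     "http://dbpedia.org/resource/U2"),
    ("http://dbpedia.org/resource/U2", "http://example.org/band/u2"),
    ("http://example.org/museum/a", "http://example.org/museum/b"),
]

clusters = same_as_clusters(LINKS)
for cluster in clusters:
    print(sorted(cluster))
```

Because owl:sameAs is symmetric and transitive, the first two links put three URIs into one cluster even though the first and third are never linked directly.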

Data  Layer  (4)  

Integrated Dataset
•  The dataset resulting from integration and consolidation can be cached in an RDF store.
•  There are many solutions to deploy triple/RDF stores, e.g.:
–  bigdata (http://www.bigdata.com/)
–  OWLIM (http://www.ontotext.com/owlim)
–  Jena TDB (http://jena.apache.org/documentation/tdb/)
–  AllegroGraph (http://www.franz.com/agraph/allegrograph/)
–  Virtuoso Universal Server (http://virtuoso.openlinksw.com/)
–  RDF3x (https://code.google.com/p/rdf3x/)

Data  Layer  (5)  

Republication Component
•  Exposes portions of the integrated dataset as Linked Data
•  There are different solutions to make the data accessible:
–  Via SPARQL endpoints (e.g., Sesame OpenRDF SPARQL Endpoint, …)
–  Via APIs (e.g., Linked Data API)
–  As RDF dumps
–  With the built-in means of your framework/CMS (e.g., Drupal, Information Workbench, …)

Application and Presentation Layers

•  The logic layer implements sophisticated processing according to the functionalities of the application. This layer may include data mining components as well as reasoners that are not integrated in the data layer.
•  The presentation layer displays the information to the user in various formats, including text, diagrams or other types of visualization.

LINKED  DATA  APPLICATION  DEVELOPMENT  FRAMEWORKS  


Information  Workbench  

•  Platform for development of Linked Data applications
•  Semantics- & Linked Data-based integration of enterprise and open data sources (Semantic Web Data)
•  Intelligent data access and analytics
–  Visual exploration
–  Semantic search
–  Dashboarding and reporting
•  Collaboration and knowledge management platform
–  Wiki-based curation & authoring of data
–  Collaborative workflows

Source: http://www.fluidops.com/information-workbench/

Information Workbench (2)

[Figure: platform layers: Data storage and management platform; Reusable UI and data integration components; Customized application solutions; External resources to reuse data and create mashups]

Data  Integration:    Data  Provider  Concept  

Data providers support the periodic extraction & integration from external data sources into a central repository

•  Lifting from arbitrary data formats to RDF (e.g., relational, XML, CSV)
•  Parametrizable (e.g., connection information, refresh interval, ...)
•  Built-in UI for instantiating providers
•  Intuitive interfaces and APIs for writing own, custom providers

[Figure: provider pipeline: Connect to data source; Extract data from source; Convert data into RDF; Store RDF in repository. Examples: RDF, R2RML, XML2RDF, SPARQL]

W3C  RDB2RDF  

•  Task: Integrate data from relational DBMS with Linked Data
•  Approach: map from relational schema to semantic vocabulary with R2RML
•  Publishing: two alternatives
–  Translate SPARQL into SQL on the fly
–  Batch transform data into RDF, index and provide SPARQL access in a triplestore

[Figure: labels: Relational DBMS, R2RML Engine, LD Data set, Data acquisition, Access, Integrated Data in Triplestore (Vocabulary Mapping, Interlinking, Cleansing), SPARQL Endpoint, Publishing]

W3C RDB2RDF

•  In 2012 the W3C published two recommendations for mapping between relational databases and RDF:
–  Direct Mapping directly exposes data as RDF
•  No allowance for vocabulary mapping
•  No allowance for interlinking (unless URIs are used in the relational data)
–  R2RML, the RDB to RDF mapping language
•  Allows vocabulary mapping (subject, predicate and object maps with class options)
•  Allows interlinking: URIs can be constructed

http://www.w3.org/2001/sw/rdb2rdf/

R2RML  Class  Mapping  

•  Declarative mappings with an RDF-based syntax:

lb:Artist a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "artist" ] ;
    rr:subjectMap [
        rr:class mo:MusicArtist ;
        rr:template "http://musicbrainz.org/artist/{gid}#_"
    ] ;
    rr:predicateObjectMap [
        rr:predicate mo:musicbrainz_guid ;
        rr:objectMap [ rr:column "gid" ; rr:datatype xsd:string ]
    ] .
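To make the effect of this mapping concrete, the sketch below applies it by hand to rows of the "artist" table and prints the triples it would produce. This is not an R2RML engine: the template, class, predicate and datatype are hard-coded from the mapping above, the function name is hypothetical, and the sample row (including its "name" column) is illustrative. A real engine would also percent-encode template values, which is unnecessary here because a gid contains only IRI-safe characters.

```python
from typing import Dict, Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]

# rr:template from the subject map of lb:Artist.
SUBJECT_TEMPLATE = "http://musicbrainz.org/artist/{gid}#_"


def map_artist_rows(rows: Iterable[Dict[str, str]]) -> Iterator[Triple]:
    """Yield the triples that the lb:Artist mapping describes for each table row."""
    for row in rows:
        subject = "<" + SUBJECT_TEMPLATE.format(gid=row["gid"]) + ">"
        # rr:class in the subject map adds an rdf:type triple.
        yield (subject, "rdf:type", "mo:MusicArtist")
        # The predicate-object map copies the "gid" column as an xsd:string literal.
        yield (subject, "mo:musicbrainz_guid", '"' + row["gid"] + '"^^xsd:string')


# Illustrative row of the "artist" table; columns not named in the mapping are ignored.
rows = [{"gid": "a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432", "name": "U2"}]

triples = list(map_artist_rows(rows))
for s, p, o in triples:
    print(s, p, o, ".")
```

The "name" column produces no triple because the mapping declares no predicate-object map for it; adding one in R2RML would be the declarative counterpart of adding another `yield` here.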


Data Warehousing vs. Federation

Warehousing / Crawling
•  Data is copied from the source into the warehouse
•  Query runs in the warehouse
•  Supported in IWB using data providers

Federation
•  Data remains in federated DB
•  Query is pushed down to federated DB
•  Supported in IWB using SPARQL federation

[Figure: Warehouse: data is loaded from the DBs and the query runs on the warehouse. Federation: the query is passed on to the DBs.]

Customizable  User  Interface  

Demo available at http://musicbrainz.fluidops.net

[Figure: annotated screenshot: main view area, Wiki page management, view selection toolbar, current resource, navigation shortcuts]

User  Interface  Concept:    One  Page  URI  

[Figure: a Graph with several Resource pages and a Template:… page]

Data  Driven  UI:  Ontology  as  “Structural  Backbone”  

[Figure: RDF Data Graph, Ontology (RDFS/OWL) and UI templates; Template:mo:MusicArtist shown together with Resource pages]

Different  Views  on    Every  Resource  

Wiki  View  

Table  View  

Graph  View  

Pivot  View  


Widget-Based User Interface

Widgets are not static and can be integrated into the UI using a Wiki-style syntax.

[Figure: widget categories: Analytics and Reporting; Visualization and Exploration; Mashups with Social Media; Authoring and Content Creation]

Example: Add Widgets to Wiki

Example: Show top 10 released records for an artist

{{#widget: BarChart |
  query = 'SELECT distinct (COUNT(?Release) AS ?COUNT) ?label WHERE {
      ?? foaf:made ?Release .
      ?Release rdf:type mo:Release .
      ?Release dc:title ?label .
    }
    GROUP BY ?label
    ORDER BY DESC(?COUNT)
    LIMIT 10
  '
  | input = 'label'
  | output = 'COUNT'
}}

Music  Example  

Page of a class:
•  Shows an overview of MusicArtist instances

See http://musicbrainz.fluidops.net/resource/mo:MusicArtist

Music  Example  (2)  

Page of a class template:
•  Defines a layout for displaying each resource of the class
•  Uses semantic wiki syntax

See http://musicbrainz.fluidops.net/resource/Template:mo:MusicArtist

Music  Example  (3)  

Page of a class instance:
•  Displays the data about the resource according to the class template

See http://musicbrainz.fluidops.net/resource/?uri=http%3A%2F%2Fmusicbrainz.org%2Fartist%2Fb10bbbfc-cf9e-42e0-be17-e2c3e1d2600d%23_

Mashups  with  external  sources    

•  Relevant information and UI elements from external sources can be incorporated in the wiki view
•  IWB contains multiple mashup widgets for popular social media sources
–  Twitter
–  YouTube
–  Facebook
–  New York Times news
–  LinkedIn
–  …

{{#widget: Youtube | searchString = $SELECT ?x WHERE { ?? foaf:name ?x . }$ | asynch = 'true' }}

Template instantiation
?? = http://musicbrainz.org/artist/a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432%23_
?x = "U2"
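The template instantiation step can be pictured as a textual substitution: the `??` placeholder in a widget query is replaced by the URI of the resource whose page is being rendered. The sketch below is an assumption about the idea, not the Information Workbench implementation; the function name is hypothetical, and the URI is written with a literal `#` where the slide shows the percent-encoded form `%23`.

```python
def instantiate(query_template: str, resource_uri: str) -> str:
    """Replace the `??` placeholder with the current resource as a SPARQL IRI.

    Naive on purpose: a `??` inside a string literal would be replaced too.
    """
    return query_template.replace("??", "<" + resource_uri + ">")


template = "SELECT ?x WHERE { ?? foaf:name ?x . }"
current_resource = "http://musicbrainz.org/artist/a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432#_"

query = instantiate(template, current_resource)
print(query)
```

Evaluating the resulting query against the data would then bind ?x, which for this resource is "U2" according to the slide.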


Triple  Editor  

Table  View  

•  Edit structured data associated with a resource
•  Change, add and remove triples

Ontology-­‐Based  Data  Input  

Triple Editor takes into account the ontology definition:
•  Autosuggestion tool considers the domains and ranges of the properties

Example: properties available for the class mo:MusicGroup are suggested automatically

Validation  of  User  Input  

Validation uses property definitions in the ontology:

•  The property myIntegerProperty has an associated rdfs:range definition.
•  This requires all object values to be of XML Schema type xsd:integer.
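A minimal sketch of range-based input validation, assuming the ontology declares xsd:integer as the rdfs:range of myIntegerProperty. The `RANGES` and `VALIDATORS` tables and the function name are hypothetical; in a real application the ranges would be read from the ontology. The regular expression follows the lexical form of xsd:integer: an optional sign followed by decimal digits.

```python
import re
from typing import Callable, Dict

# Lexical form of xsd:integer: optional sign, then one or more decimal digits.
XSD_INTEGER = re.compile(r"[+-]?[0-9]+")

# rdfs:range per property, as it might be read from the ontology (hard-coded here).
RANGES: Dict[str, str] = {"myIntegerProperty": "xsd:integer"}

# One validator per supported datatype.
VALIDATORS: Dict[str, Callable[[str], bool]] = {
    "xsd:integer": lambda value: XSD_INTEGER.fullmatch(value) is not None,
}


def is_valid(prop: str, value: str) -> bool:
    """Check user input against the range declared for `prop`, if any."""
    datatype = RANGES.get(prop)
    validator = VALIDATORS.get(datatype) if datatype is not None else None
    # Properties without a known range are not restricted by this check.
    return True if validator is None else validator(value)


for candidate in ["42", "-7", "4.2", "forty-two"]:
    print(candidate, is_valid("myIntegerProperty", candidate))
```

Supporting a further datatype would mean adding one entry to `VALIDATORS`; properties without a declared range are accepted unchanged.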


Russian Museum Project – Architecture and Use Cases

[Figure: architecture diagram. Users: museum visitor, website visitor, data engineer. Components: IWB Frontend, IWB Backend, Systap Bigdata. Data sets: Russian Museum Data, British Museum Data, DBpedia Subset, User Data. Original data sources: museums and other sources, cards, social networks.]

Use Case 1: Data Provisioning
•  Data crawling
•  Data transformation
•  Data interlinking
•  Data enrichment / information extraction
•  Data validation

Use Case 2: Search and Visualization
•  Base templates for visualization
•  Templates for external data
•  PivotViewer
•  Step-by-step visualization
•  Extended search widgets
•  SemFacet

Use Case 3: Mobile App
•  HTML5 templates + CSS for mobile devices
•  Simplified IWB Wiki View
•  Google Glass app
•  QR code recognition
•  Pattern / image recognition

Linked Data Application for the Russian Museum

[Figure: application building blocks: Data, Data Providers, Ontology, Templates, Widgets, Web Crawl, RDF Dump]

Sample Visualization Russian Museum

Google Glass


Summary

§  Linked Data and Semantic Technologies
–  From data to information to knowledge
–  Graphs for integration of heterogeneous data in a variety of data models
–  Ontologies for knowledge representation and interpretation of data
§  Linked Data applications
–  Publishing and consuming Linked Data
–  Main components and architecture
§  Standards-based, declarative models for all aspects of the application
–  RDF: common data model
–  OWL ontology: conceptual domain model
–  R2RML: integrating data sources
–  SPARQL queries: expressing information needs
–  Wiki templates: interfaces for interacting with the data

Contact us!

metaphacts GmbH Kautzelweg 13 69190 Walldorf Germany p +49 6227 8308660 m +49 157 50152441 e [email protected] @metaphacts
