Hello Open World - Semtech 2009


Transcript of Hello Open World - Semtech 2009

Page 1: Hello Open World - Semtech 2009

Copyright 2008 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

Hello Open Data World!

The Web of Data for the Pragmatic Developer

Alexandre Passant & Giovanni Tummarello

SEMTECH, San Jose, California 2009

Page 2: Hello Open World - Semtech 2009


Outline

Content

Motivation

What is the Web of Data?

Creating Structured Data

Discovery, Accessing & Querying

Mash-ups & Advanced Topics


NB: Tutorial adapted from the WWW2009 “Hello Open World” presentation by Alexandre Passant and Michael Hausenblas

Page 3: Hello Open World - Semtech 2009


Speakers introduction

Alexandre Passant

Postdoctoral researcher, DERI Galway

Social Software and Semantic Web

http://apassant.net

Giovanni Tummarello

Research Fellow, DERI Galway

Web of Data Search and Mashups

http://g1o.net


Page 4: Hello Open World - Semtech 2009


DERI Galway

Enabling Networked Knowledge

Social Semantic Information Spaces

Semantic Reality

Approximately 130 people


Page 5: Hello Open World - Semtech 2009


Companion website

http://helloopenworld.net


Page 6: Hello Open World - Semtech 2009


What is this tutorial about?

What you will learn

Web of Data principles

Architecture principles for applications on the Web of Data

Finding and creating structured data

Using vocabularies and lightweight inference

Querying Web data with SPARQL and with Sindice

User interfaces and mash-ups

What you will not learn

Ontology mapping and alignment

Advanced rules languages

Complex SPARQL querying


Page 7: Hello Open World - Semtech 2009


What you should be able to do after this tutorial

Explain the Web of Data to your CTO / Students / Advisor / Grandmother

Spread the values of the Web of Data

Join the Web of Data

Effectively enrich your existing pages and applications with annotations that will help your data to be found and integrated

Leverage the Web of Data for your apps

Finding data “out there” and reusing it

Using open-source and xAMP technologies

Creating, consuming and mashing-up RDF data


Page 8: Hello Open World - Semtech 2009


Outline

Content

Motivation

What is the Web of Data?

Creating Structured Data

Discovery, Accessing & Querying

Mash-ups & Advanced Topics


Page 9: Hello Open World - Semtech 2009


Motivation

[Figure: Yahoo and Google search results showing rich snippets]

Page 10: Hello Open World - Semtech 2009


Motivation

More and more data is available on the Web

Structured data, in RDF, microformats, etc.

Reusing data adds value to your application!

Up to now, people have developed against proprietary APIs (such as those from Flickr, Google, etc.)

A waste of time for developers

The Web of Data …

Provides a uniform data model (RDF)

Provides a uniform API for accessing data (RDF/SPARQL)

Provides common semantics for this data (RDFS/OWL)

Enables serendipitous usage of data


Page 11: Hello Open World - Semtech 2009


Outline

Content

Motivation

What is the Web of Data?

Creating Structured Data

Discovery, Accessing & Querying

Mash-ups & Advanced Topics


Page 12: Hello Open World - Semtech 2009


What is the Web of Data?

A Web of information structured via standards and made available on the Web

Microformats and GRDDL

RDF using various serializations: RDF/XML, RDFa, etc.


Page 13: Hello Open World - Semtech 2009


RDF


RDF: Resource Description Framework

According to the RDF abstract syntax, RDF is a data model: a directed, labeled graph based on URIs

RDF is not XML!

RDF/XML is only one of several ways to serialize RDF data (others include N3 and RDFa)

Triple: (subject predicate object), e.g.

<http://sw-app.org/#i> <http://xmlns.com/foaf/0.1/knows> <http://apassant.net/alex> .
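To make the data model concrete, here is a minimal Python sketch (not part of the original tutorial): the graph is a set of (subject, predicate, object) tuples, written back out as N-Triples, the line-based syntax of the triple above. It assumes URI-only terms; real toolkits such as Jena, Sesame or rdflib also handle literals, blank nodes and the other serializations.

```python
# A minimal sketch: an RDF graph modelled as a Python set of (subject, predicate, object) tuples.
# Assumption: every term is a URI given as a plain string (no literals, no blank nodes).

def to_ntriples(graph):
    """Serialize a URI-only graph to N-Triples, one statement per line (sorted for stable output)."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(graph))

graph = {
    ("http://sw-app.org/#i", "http://xmlns.com/foaf/0.1/knows", "http://apassant.net/alex"),
}
# Adding the same statement again changes nothing: a graph is a *set* of triples.
graph.add(("http://sw-app.org/#i", "http://xmlns.com/foaf/0.1/knows", "http://apassant.net/alex"))

print(to_ntriples(graph))
```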

Page 14: Hello Open World - Semtech 2009


RDF


Page 15: Hello Open World - Semtech 2009


RDF

@prefix dcterms: <http://purl.org/dc/terms/> .

<http://deri.ie/teaching/tutorials/lod-intro>
    dcterms:title "Tutorial on Linked Data - A Practical Introduction" ;
    dcterms:creator <http://sw-app.org/mic.xhtml#i> ;
    dcterms:subject <http://dbpedia.org/resource/Linked_Data> .


Page 16: Hello Open World - Semtech 2009


“Semantic Data” on the Web

Web of Data == the Semantic Web?

Not really: it is the “Web”-facing part of it, a kind of subset

In contrast to the full-fledged Semantic Web vision, the Web of Data is more about publishing raw data in interoperable formats than about logical inference and reasoning over it


Page 17: Hello Open World - Semtech 2009


What is the Web of Data?


Page 18: Hello Open World - Semtech 2009


Web of data via “Linked Data”

Linked data principles, by Tim Berners-Lee, ca. 2006

Use URIs to identify things (anything, not just documents)

– “To benefit from and increase the value of the World Wide Web, agents should provide URIs as identifiers for resources”

– http://www.w3.org/TR/webarch/

Use HTTP URIs – globally unique names, distributed ownership

– allows people to look up things

Provide useful information in RDF when someone looks up a URI

Include RDF links to other URIs – to enable discovery of related information

Example: http://dbpedia.org/resource/Dublin
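Looking up a URI is an ordinary HTTP GET. The sketch below, using only the Python standard library, prepares such a lookup with an Accept header asking for RDF/XML instead of HTML. It assumes the server supports content negotiation; Linked Data servers often answer a request for a non-document URI with a 303 redirect to an RDF document, which urlopen follows automatically, but whether a given server still behaves this way is an assumption to check.

```python
import urllib.request

def build_rdf_request(uri, accept="application/rdf+xml"):
    """Prepare a GET request that asks the server for RDF instead of HTML (HTTP content negotiation)."""
    return urllib.request.Request(uri, headers={"Accept": accept})

req = build_rdf_request("http://dbpedia.org/resource/Dublin")
print(req.get_method(), req.full_url, req.get_header("Accept"))

# Actually sending it needs network access; urlopen follows redirects automatically:
# with urllib.request.urlopen(req) as resp:
#     print(resp.geturl())                     # the document the server finally served
#     print(resp.headers.get("Content-Type"))  # ideally application/rdf+xml
```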


Page 19: Hello Open World - Semtech 2009


Linked data growth

[Figure: the Linking Open Data cloud in 2007 and 2008]

Page 20: Hello Open World - Semtech 2009


Linked data growth

[Figure: the Linking Open Data cloud in 2008 and 2009]

Page 21: Hello Open World - Semtech 2009


Needed: shared vocabularies (Ontologies)


Ontologies provide common semantics for the Web of Data

“An ontology is a specification of a conceptualization.”

Main languages are RDFS and OWL

This tutorial will mainly focus on RDFS

OWL allows advanced axioms (constraints, unions …)

Classes and properties

:Person a rdfs:Class .

:father a rdf:Property .

:father rdfs:domain :Person .

:father rdfs:range :Person .

Page 22: Hello Open World - Semtech 2009


Ontologies


Hierarchies in ontologies

Are needed to define narrower / broader concepts

:LivingThing is broader than :Person

Can be applied to both classes and properties

:Person rdfs:subClassOf :LivingThing

:father rdfs:subPropertyOf :familyRelation

Inference engines can take advantage of them to create new facts

Can be used when querying information

Retrieve all :LivingThing instances with a :familyRelation

– Will also return instances of :Person that have a :father!
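A minimal sketch of how an inference engine can answer the query above, assuming only the two RDFS rules for subclasses and subproperties (rdfs9 and rdfs7 in the RDF Semantics specification). The resources :alice and :bob are hypothetical, and a production reasoner would index the data instead of rescanning it on every round.

```python
# Naive forward chaining for two RDFS entailment rules, applied until nothing new appears:
#   rdfs7: (p rdfs:subPropertyOf q) and (s p o)     =>  (s q o)
#   rdfs9: (C rdfs:subClassOf D) and (x rdf:type C) =>  (x rdf:type D)
# Terms are prefixed-name strings; this is an illustration, not a complete RDFS reasoner.
TYPE, SUBCLASS, SUBPROP = "rdf:type", "rdfs:subClassOf", "rdfs:subPropertyOf"

def infer(triples):
    facts = set(triples)
    while True:
        subclass = {(s, o) for s, p, o in facts if p == SUBCLASS}
        subprop = {(s, o) for s, p, o in facts if p == SUBPROP}
        new = set()
        for s, p, o in facts:
            for sub, sup in subprop:
                if p == sub:
                    new.add((s, sup, o))          # rdfs7
            if p == TYPE:
                for sub, sup in subclass:
                    if o == sub:
                        new.add((s, TYPE, sup))   # rdfs9
        if new <= facts:                           # fixpoint reached
            return facts
        facts |= new

data = {
    (":Person", SUBCLASS, ":LivingThing"),
    (":father", SUBPROP, ":familyRelation"),
    (":alice", TYPE, ":Person"),
    (":alice", ":father", ":bob"),
}
facts = infer(data)

# "Retrieve all :LivingThing instances with a :familyRelation"
living = {s for s, p, o in facts if p == TYPE and o == ":LivingThing"}
related = {s for s, p, o in facts if p == ":familyRelation"}
print(sorted(living & related))
```

Without the two rules the query returns nothing, since no triple in the data mentions :LivingThing or :familyRelation directly.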

Page 23: Hello Open World - Semtech 2009


Notable ontologies

Social networks and social data

FOAF – Friend Of A Friend

SIOC – Semantically-Interlinked Online Communities

Software development

DOAP – Description Of A Project

BEATLE - Bug And Enhancement Tracking LanguagE

Comprehensive / Top-level

YAGO (derived from Wikipedia)

OpenCyc

Taxonomies

SKOS – Simple Knowledge Organisation System


Page 24: Hello Open World - Semtech 2009


Zooming in: FOAF Ontology

A model to describe people and social networks

http://foaf-project.org

Concepts

Person, OnlineAccount, Document, etc.

Properties

name, homepage, holdsAccount, knows, etc.


Page 25: Hello Open World - Semtech 2009


FOAF in use

Google Social Graph API

http://code.google.com/intl/fr/apis/socialgraph/

Uses FOAF information already there on the Web to find

your contacts

http://socialgraph-resources.googlecode.com/svn/trunk/samples/findcontacts.html

E.g.: http://apassant.net

– http://socialgraph-resources.googlecode.com/svn/trunk/samples/findcontacts.html?q=http%3A%2F%2Fapassant.net

– Contacts found in various FOAF files that link to me and to my profile


Page 26: Hello Open World - Semtech 2009


Zooming in: SIOC Ontology

Describe Web communities and their social interactions

Who’s writing what, who’s answering whom, etc.

A simple model to ensure easy integration into existing applications

Lightweight: one core ontology, 4 modules

Plug-ins or core features for several CMSs

Drupal, WordPress, etc.

Enables interoperability between social applications

http://sioc-project.org


Page 27: Hello Open World - Semtech 2009


The SIOC ontology

The main classes and properties are:


Page 28: Hello Open World - Semtech 2009


Combining FOAF + SIOC

Page 29: Hello Open World - Semtech 2009


Adoption of SIOC


Page 30: Hello Open World - Semtech 2009


Which ontologies to use?

SearchMonkey Vocabularies

http://developer.yahoo.com/searchmonkey/smguide/profile_vocab.html

Page 31: Hello Open World - Semtech 2009


Which ontologies to use?

How to Publish Linked Data on the Web

http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/


Page 32: Hello Open World - Semtech 2009


Extending ontologies?

What if existing ontologies are not enough for your needs?

Create a new one

… or extend existing ones!

Ontologies can be extended in a decentralized way

E.g. you can create a subproperty of foaf:knows, “attendedTutorialWith”, in your own ontology

Publish it on your own site

Or use http://open.vocab.org


Page 33: Hello Open World - Semtech 2009


Attention: Domain and range

Domain and range of properties are descriptive, not prescriptive

Example: if we say :father rdfs:domain :Person

– Fathers are not limited to resources already declared as Persons

– But every father is inferred to be a Person!

Consequence 1: One triple is enough to convey several facts

Consequence 2: DON’T use a property whose domain is foaf:Person (e.g. foaf:knows) to describe a shoe

For details

Based on RDF semantics (Rule rdfs2)

http://www.w3.org/TR/rdf-mt/
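A sketch of rules rdfs2 (domain) and rdfs3 (range) in Python, on hypothetical data. Note that nothing is rejected: the declarations only add rdf:type facts, which is what "descriptive, not prescriptive" means in practice.

```python
# Sketch of RDFS rules rdfs2 (domain) and rdfs3 (range), applied once to a small graph.
TYPE, DOMAIN, RANGE = "rdf:type", "rdfs:domain", "rdfs:range"

def domain_range_entailments(triples):
    """Return the rdf:type triples entailed by rdfs:domain / rdfs:range declarations."""
    domains = {(s, o) for s, p, o in triples if p == DOMAIN}
    ranges = {(s, o) for s, p, o in triples if p == RANGE}
    entailed = set()
    for s, p, o in triples:
        for prop, cls in domains:
            if p == prop:
                entailed.add((s, TYPE, cls))   # rdfs2: the subject is typed with the domain
        for prop, cls in ranges:
            if p == prop:
                entailed.add((o, TYPE, cls))   # rdfs3: the object is typed with the range
    return entailed - set(triples)

data = {
    (":father", DOMAIN, ":Person"),
    (":father", RANGE, ":Person"),
    (":alice", ":father", ":bob"),
    (":myShoe", ":father", ":bob"),   # a modelling mistake, not a validation error
}
for t in sorted(domain_range_entailments(data)):
    print(t)
```

Here the single :alice triple yields two new facts (Consequence 1), and the misused property silently turns :myShoe into a :Person (Consequence 2).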


Page 34: Hello Open World - Semtech 2009


Linking Open Data Project

Community project with W3C support, started in early 2007 [LOD]: http://linkeddata.org

Idea: take existing (open) data sets and make them available on the Web in RDF

Interlink them with other data sets

Expand the network effect of Linked Data!

Raw Data Now!

Tim Berners-Lee TED talk

http://www.ted.com/index.php/talks/tim_berners_lee_on_the_next_web.html


Kudos to Tom Heath and Richard Cyganiak; the material in this section is heavily based on their work.

Page 35: Hello Open World - Semtech 2009


Linking Open Data Project


May 2007

Page 36: Hello Open World - Semtech 2009


Linking Open Data Project


Feb 2009

Page 37: Hello Open World - Semtech 2009


Notable datasets of the LOD cloud


Linking Open Data

Community project started in 2007 - http://linkeddata.org

DBpedia – http://dbpedia.org

Wikipedia in RDF: “more than 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films, 20,000 companies”

Geonames – http://geonames.org

“over eight million geographical names”: coordinates, etc.

Freebase - http://rdf.freebase.com/

“5203825 Topics 14110006 Named Entities”

E.g. http://rdf.freebase.com/rdf/en.blade_runner

Page 38: Hello Open World - Semtech 2009


Linking Open Data Project


DBpedia

Page 39: Hello Open World - Semtech 2009


Linking Open Data Project


Geonames

Page 40: Hello Open World - Semtech 2009


Querying DBpedia

Programmatically (via SPARQL, see later)

Via User Interface

http://wikipedia.aksw.org


Page 41: Hello Open World - Semtech 2009


Tools and Applications

The Linking Open Data homepage [LOD] lists tools for:

Browsing with Tabulator, VisiNav, Sig.ma, DBpedia Mobile, iLOD, etc.

Searching with Sindice, SWSE, Falcons, etc.

Mashups, e.g. Revyu, BBC Music, DERI Pipes

See further

http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/Applications


Page 42: Hello Open World - Semtech 2009


Tools and Applications


DBpedia Mobile

Page 43: Hello Open World - Semtech 2009


Applications integration


BBC music beta

Page 44: Hello Open World - Semtech 2009


Typical architectures of applications for the Web of Data (1/2)

From: Heitmann, B., et al., “Towards a reference architecture for Semantic Web applications,” Proceedings of the 1st Int. Web Science Conference, 2009


Page 45: Hello Open World - Semtech 2009


Typical architectures of applications for the Web of Data (2/2)

Data Interface: Abstraction layer hiding the implementation, number and distribution of persistence layers.

Persistence Layer: Persistent storage of data and run-time state.

User Interface: Human-accessible interface for using the application and viewing data (“read-only”).

Annotation User Interface: Edit, create, import or export data.

Integration Service: Merge the structure, syntax or semantics of data from multiple heterogeneous sources.

Search Engine: Search on content or semantic features.

Crawler: Retrieval of remote data for the integration service.


Page 46: Hello Open World - Semtech 2009


Outline

Content

Motivation

What is the Web of Data?

Creating Structured Data

Discovery, Accessing & Querying

Mash-ups & Advanced Topics


Page 47: Hello Open World - Semtech 2009


Creating Structured Data

Overview of different methods:

Create RDF/XML manually (using your favourite text editor or Web-based interfaces)

Create XHTML+RDFa documents and use a GRDDL transformation

– For both humans and machines!

Use exporters / wrappers for existing services

Use applications that natively expose RDF data

Provide mappings from RDBMS to RDF data

Hands-on!

We will go through several of them to create interlinked RDF data from various sources of structured data


Page 48: Hello Open World - Semtech 2009


Getting a FOAF profile

Or how to give yourself a URI

Be part of the Web of Data

Create your FOAF file

http://www.ldodds.com/foaf/foaf-a-matic (requires hosting; provided during the tutorial)

http://foafbuilder.qdos.com/builder/ (requires OpenID)

I already have a homepage, what about duplication of information?

Use RDFa to embed RDF annotations in your homepage!

More on this topic in a few slides


Page 49: Hello Open World - Semtech 2009


Extend your FOAF profile

The foaf:knows property aims to represent social connections between people

:alex foaf:knows :michael

Going further with the relationship vocabulary

http://vocab.org/relationship/: colleagueOf, hasMet …

Add some people from the workshop, validate, and upload to the workshop repository

http://www.w3.org/RDF/Validator/

http://helloopenworld.net/semtech09/data

You finally got a URI !

– http://helloopenworld.net/semtech09/data/apassant.rdf#me
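If you prefer writing the file yourself, here is a hedged Python sketch that produces a minimal FOAF profile in Turtle. The URIs are placeholders, and since FOAF-a-Matic generates RDF/XML (which is also what the W3C validator above checks), confirm which syntax the workshop repository accepts before uploading.

```python
def foaf_profile(me, name, knows):
    """Build a small FOAF profile in Turtle. `me` is your URI, `knows` a list of URIs of people you know."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')   # Turtle string escaping
    lines = [
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
        "",
        f"<{me}> a foaf:Person ;",
        f'    foaf:name "{escaped}"',
    ]
    for friend in knows:
        lines[-1] += " ;"                          # more statements about the same subject follow
        lines.append(f"    foaf:knows <{friend}>")
    lines[-1] += " ."                              # close the statement
    return "\n".join(lines)

print(foaf_profile("http://example.org/alex#me", "Alex",
                   ["http://example.org/michael#me"]))
```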


Page 50: Hello Open World - Semtech 2009


Defining personal interests

Instead of modeling interests as plain-text strings, use URIs to describe them!

Allows interlinking of various resources for advanced query purposes: “find all people that like movies directed by Tarantino”

And link them to you using foaf:topic_interest

:me foaf:topic_interest :movie

But … where to get these URIs?

The Linking Open Data cloud!

– Provides URIs for millions of concepts, especially thanks to DBpedia

Sindice can be used to find URIs for a given concept

– http://sindice.com
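The Tarantino query from this slide becomes a simple join once interests are URIs. A sketch on hypothetical data, merging FOAF profiles with DBpedia-style film descriptions; the director property URI is an assumption, not checked against the current DBpedia ontology.

```python
# Hypothetical data merged from two sources: FOAF profiles that use DBpedia URIs as interests,
# and DBpedia-style facts about films. The director property URI is an assumption.
FOAF_INTEREST = "http://xmlns.com/foaf/0.1/topic_interest"
DIRECTOR = "http://dbpedia.org/ontology/director"
DBR = "http://dbpedia.org/resource/"

graph = {
    ("http://example.org/alice#me", FOAF_INTEREST, DBR + "Pulp_Fiction"),
    ("http://example.org/bob#me", FOAF_INTEREST, DBR + "Blade_Runner"),
    (DBR + "Pulp_Fiction", DIRECTOR, DBR + "Quentin_Tarantino"),
    (DBR + "Blade_Runner", DIRECTOR, DBR + "Ridley_Scott"),
}

def fans_of_director(director, triples):
    """People whose foaf:topic_interest is a film by `director`: a join on the shared film URI."""
    films = {s for s, p, o in triples if p == DIRECTOR and o == director}
    return sorted(s for s, p, o in triples if p == FOAF_INTEREST and o in films)

print(fans_of_director(DBR + "Quentin_Tarantino", graph))
```

With plain-text interests ("pulp fiction", "Pulp Fiction (film)") the join would need fuzzy string matching; with a shared DBpedia URI it is an exact equality test.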


Page 51: Hello Open World - Semtech 2009


Defining personal interests


Page 52: Hello Open World - Semtech 2009


Defining personal interests


Page 53: Hello Open World - Semtech 2009


RDFa and GRDDL

GRDDL is a mechanism to transform any kind of XML to RDF

XHTML+RDFa is XML, hence GRDDL can extract it

Simply embed RDFa annotations in your HTML code

Indexed by Yahoo! SearchMonkey and Google

Done via XSLT, available at http://www.w3.org/2008/07/rdfa-xslt


Page 54: Hello Open World - Semtech 2009


RDFa and GRDDL

The GRDDL Primer at http://www.w3.org/TR/grddl-primer/#scheduling shows the overall processing of XHTML+RDFa:


Page 55: Hello Open World - Semtech 2009


RDFa and GRDDL

http://sdow2009.semanticweb.org


Page 56: Hello Open World - Semtech 2009


RDFa and GRDDL

http://sdow2009.semanticweb.org

Browse source to check RDFa annotations


Page 57: Hello Open World - Semtech 2009


RDFa and GRDDL

http://sdow2009.semanticweb.org

Header contains prefixes and links to the GRDDL transformation


Page 58: Hello Open World - Semtech 2009


RDFa and GRDDL

http://sdow2009.semanticweb.org

The webpage can be translated to native RDF/XML using an RDFa distiller: http://www.w3.org/2007/08/pyRdfa/


Page 59: Hello Open World - Semtech 2009


Add RDFa to your homepage

Choose the right DTD

http://www.w3.org/TR/rdfa-syntax/DTD/xhtml-rdfa-1.dtd

Add prefix definitions in the header

Depending on the ontologies you will use

Add the appropriate profile

E.g. http://ns.inria.fr/grddl/rdfa/

Add additional markup

E.g. rel, about, typeof

Example

http://helloopenworld.net/semtech2009/files/profile.html
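To see how such markup maps to triples, here is a toy Python extractor built on the standard html.parser module. It covers only xmlns prefixes, about, property and rel/href, resolves relative URIs against a given base, and assumes well-formed XHTML; for real pages use a conforming processor such as the pyRdfa distiller or the GRDDL XSLT mentioned earlier. The sample page is made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class TinyRDFaParser(HTMLParser):
    """Toy extractor for a small subset of RDFa 1.0: xmlns prefixes, about, property (+content), rel+href.
    Not a conforming processor: no typeof, resource, src, datatypes, multiple rel values or chaining.
    Assumes well-formed XHTML where empty elements are self-closed (<br />)."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.prefixes = {}
        self.subjects = [base]   # current subject for each open element
        self.literals = []       # per open element: None or (subject, property, text parts)
        self.triples = []

    def expand(self, curie):
        prefix, _, local = curie.partition(":")
        return self.prefixes[prefix] + local if prefix in self.prefixes else curie

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        for name, value in a.items():
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        subject = urljoin(self.base, a["about"]) if "about" in a else self.subjects[-1]
        self.subjects.append(subject)
        if "rel" in a and "href" in a:
            self.triples.append((subject, self.expand(a["rel"]), urljoin(self.base, a["href"])))
        literal = None
        if "property" in a:
            if "content" in a:
                self.triples.append((subject, self.expand(a["property"]), a["content"]))
            else:
                literal = (subject, self.expand(a["property"]), [])
        self.literals.append(literal)

    def handle_data(self, data):
        for literal in self.literals:
            if literal is not None:
                literal[2].append(data)   # text of nested elements counts too

    def handle_endtag(self, tag):
        if not self.literals:
            return                        # ignore stray end tags
        self.subjects.pop()
        literal = self.literals.pop()
        if literal is not None:
            subject, prop, parts = literal
            self.triples.append((subject, prop, "".join(parts).strip()))

page = """<div xmlns:foaf="http://xmlns.com/foaf/0.1/" about="#me">
  <span property="foaf:name">Alexandre Passant</span>
  <a rel="foaf:homepage" href="http://apassant.net/">homepage</a>
</div>"""
parser = TinyRDFaParser("http://example.org/profile.html")
parser.feed(page)
parser.close()
for triple in parser.triples:
    print(triple)
```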


Page 60: Hello Open World - Semtech 2009


Wrappers for existing sources

Creating and maintaining a FOAF file by hand can be a time-consuming task

How can we automatically get RDF data from existing sources?

What about Web 2.0 services to which we already give lots of personal information?

Most of them provide APIs to get structured information (JSON, XML …) about user profiles, content, etc.

API-to-RDF wrappers can easily be implemented
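An API-to-RDF wrapper is mostly a mapping from JSON fields to vocabulary terms. A sketch under stated assumptions: the JSON shape and the base URI are hypothetical and do not reproduce the Facebook or Flickr APIs.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

def profile_to_foaf(raw_json, base="http://example.org/users/"):
    """Map a hypothetical JSON profile {"username", "name", "friends"} onto FOAF triples.
    `base` is a made-up namespace under which the wrapper mints one URI per user."""
    profile = json.loads(raw_json)
    me = f"{base}{profile['username']}#me"
    triples = [(me, RDF_TYPE, FOAF + "Person"),
               (me, FOAF + "name", profile["name"])]
    for friend in profile.get("friends", []):
        triples.append((me, FOAF + "knows", f"{base}{friend}#me"))
    return triples

api_response = '{"username": "alex", "name": "Alex", "friends": ["michael", "giovanni"]}'
for t in profile_to_foaf(api_response):
    print(t)
```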


Page 61: Hello Open World - Semtech 2009


Wrappers for Web 2.0 services

Facebook wrapper

Generates a FOAF file from your Facebook profile

http://www.dcs.shef.ac.uk/~mrowe/foafgenerator.html

Flickr wrapper

Generates FOAF + SIOC + links to geographical information (using geonames.org)

http://apassant.net/home/2007/12/flickrdf


Page 62: Hello Open World - Semtech 2009


RDFification services

Translate many structured sources into RDF

URIBurner

– http://linkeddata.uriburner.com/

– Open Source, C++, based on Virtuoso

Any23

– Sindice sponsored

– Open Source, Java based

Swignition

– http://buzzword.org.uk/swignition/

– Perl based

Triplr

– Purely syntactic, fast

– http://triplr.org


Page 63: Hello Open World - Semtech 2009


Interlinking identities

The previous exporters create different URIs for the same person

Hence the need to unify your online identity on the Web of Data

owl:sameAs

Used to state that two resources with different URIs are about the same entity

http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/

owl:InverseFunctionalProperty

foaf:mbox, foaf:openid, etc.

“Inverse functional” properties can be used to uniquely identify a foaf:Person
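Two wrappers will typically mint two URIs for the same person. A sketch of "smushing" on an inverse functional property: if two subjects share the same foaf:mbox, they denote the same person, so an owl:sameAs link can be added. The URIs and mailboxes are hypothetical, and choosing the alphabetically smallest URI as the canonical one is an arbitrary design choice.

```python
from collections import defaultdict

FOAF_MBOX = "http://xmlns.com/foaf/0.1/mbox"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def same_as_links(triples, ifps=(FOAF_MBOX,)):
    """Subjects sharing a value of an inverse functional property denote the same thing:
    emit owl:sameAs links from one chosen URI (the smallest, arbitrarily) to each of the others."""
    holders = defaultdict(set)
    for s, p, o in triples:
        if p in ifps:
            holders[(p, o)].add(s)
    links = set()
    for subjects in holders.values():
        canonical, *others = sorted(subjects)
        for other in others:
            links.add((canonical, OWL_SAME_AS, other))
    return links

# Hypothetical URIs, as two different wrappers might mint them for the same person.
data = [
    ("http://example.org/facebook/alex#me", FOAF_MBOX, "mailto:alex@example.org"),
    ("http://example.org/flickr/alex#me", FOAF_MBOX, "mailto:alex@example.org"),
    ("http://example.org/flickr/bob#me", FOAF_MBOX, "mailto:bob@example.org"),
]
print(same_as_links(data))
```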


Page 64: Hello Open World - Semtech 2009


Interlinking identities and networks


Page 65: Hello Open World - Semtech 2009


Native export of RDF data

CMSs can expose RDF data natively using dedicated plug-ins

SIOC Export for Drupal: http://drupal.org/project/SIOC

Provides an RDF export of each blog post

– http://apassant.net/blog/2009/03/07/call-suggested-features-sparql-working-group

– http://apassant.net/sioc/node/235

Uses the RDF autodiscovery feature in the HTML header

– So that RDF can be discovered when browsing HTML

– Semantic Radar: http://sioc-project.org/firefox

RDFa to be included in Drupal 7 core! – http://groups.drupal.org/node/16597

– Hundreds of thousands of RDFa-powered websites
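Autodiscovery relies on a link element in the HTML head pointing to the RDF version of the page, which is what browser tools such as Semantic Radar look for. A sketch with Python's html.parser; accepting both rel="alternate" and rel="meta" is an assumption meant to cover common conventions, and the page is made up.

```python
from html.parser import HTMLParser

class RDFLinkFinder(HTMLParser):
    """Collect RDF autodiscovery links: <link rel="..." type="application/rdf+xml" href="...">.
    Accepts rel="alternate" or rel="meta" (assumed here to cover the common conventions)."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rels = set((a.get("rel") or "").lower().split())
        if rels & {"alternate", "meta"} and a.get("type") == "application/rdf+xml" and a.get("href"):
            self.found.append(a["href"])

page = """<html><head>
<link rel="stylesheet" type="text/css" href="/style.css" />
<link rel="meta" type="application/rdf+xml" title="SIOC" href="http://example.org/sioc/node/1" />
</head><body>...</body></html>"""
finder = RDFLinkFinder()
finder.feed(page)
print(finder.found)
```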


Page 66: Hello Open World - Semtech 2009


Overview: SIOC for vBulletin


Page 67: Hello Open World - Semtech 2009


Relational to RDF Mapping

Relational data (RDB) is structured data and can be mapped to RDF in a straightforward way

Main issues:

Closed-world vs. open-world modeling

Assigning URIs for entities (records)

Mapping language expressivity

For a state-of-the-art survey, see

http://www.w3.org/2005/Incubator/rdb2rdf/RDB2RDF_SurveyReport.pdf
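A sketch of the basic idea on an in-memory SQLite table, with a hard-coded mapping to FOAF. It touches two of the issues above: URIs are minted from primary keys under a hypothetical base, and a NULL value produces no triple rather than a negative statement. Tools such as D2RQ, Virtuoso or Triplify express such mappings declaratively instead.

```python
import sqlite3

BASE = "http://example.org/db/"          # hypothetical namespace for minted URIs
FOAF = "http://xmlns.com/foaf/0.1/"

def people_to_rdf(conn):
    """Hard-coded mapping of a `person` table to FOAF triples.
    URIs are minted from the primary key; a NULL column produces no triple
    (under the open-world view, a missing triple means "unknown", not "false")."""
    triples = []
    for pk, name, email in conn.execute("SELECT id, name, email FROM person ORDER BY id"):
        subject = f"{BASE}person/{pk}"
        triples.append((subject, FOAF + "name", name))
        if email is not None:
            triples.append((subject, FOAF + "mbox", f"mailto:{email}"))
    return triples

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [(1, "Alice", "alice@example.org"), (2, "Bob", None)])
for triple in people_to_rdf(conn):
    print(triple)
```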


Page 68: Hello Open World - Semtech 2009


Relational to RDF Mapping

Standardization

W3C RDB2RDF Incubator Group 2008/2009

Upcoming W3C RDB2RDF Working Group

Current solutions (see the state-of-the-art survey)

D2RQ

– http://www4.wiwiss.fu-berlin.de/bizer/d2rq/

– DBLP in RDF: http://dblp.l3s.de/d2r/

OpenLink’s Virtuoso

– http://www.openlinksw.com/virtuoso/

Triplify

– http://triplify.org


Page 69: Hello Open World - Semtech 2009


Outline

Content

Motivation

What is the Web of Data?

Creating Structured Data

Discovery, Accessing & Querying

Mash-ups & Advanced Topics


Page 70: Hello Open World - Semtech 2009


Discovery of RDF data

Discovery is the process of starting with a URI and learning more about the resources that can be accessed or described through it

... without using a search engine


Page 71: Hello Open World - Semtech 2009


Discovering RDF data

Simple case: Dereference

“Follow-Your-Nose” approach

Semantic Sitemaps

to find SPARQL endpoints or data dumps

http://sw.deri.org/2007/07/sitemapextension/

Links from within the RDF

voiD, vocabulary of interlinked datasets

– Describes what a dataset is about

– Provides quantitative data on interlinking (statistics)

– Conveys licensing, provenance and access information

– http://semanticweb.org/wiki/VoiD

rdfs:seeAlso, owl:imports or simply dereference other URIs
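Follow-your-nose discovery can be sketched as a breadth-first traversal. To stay self-contained, the "Web" below is a hypothetical dictionary from document URIs to the triples a lookup would return; in practice `fetch` would perform an HTTP GET and parse the response. Only rdfs:seeAlso links are followed here, and `max_docs` is an illustrative safety limit.

```python
from collections import deque

SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical mini-Web: document URI -> triples obtained by dereferencing it.
WEB = {
    "http://a.example/doc": [
        ("http://a.example/doc#me", SEE_ALSO, "http://b.example/doc"),
    ],
    "http://b.example/doc": [
        ("http://b.example/doc#me", FOAF_NAME, "Bob"),
        ("http://b.example/doc#me", SEE_ALSO, "http://a.example/doc"),  # cycle back to the start
    ],
}

def follow_your_nose(start, fetch, max_docs=10):
    """Breadth-first traversal: dereference a document, keep its triples, queue its rdfs:seeAlso targets."""
    visited, queue, graph = set(), deque([start]), []
    while queue and len(visited) < max_docs:
        doc = queue.popleft().split("#", 1)[0]   # fragments are not sent over HTTP
        if doc in visited:
            continue
        visited.add(doc)
        for s, p, o in fetch(doc):
            graph.append((s, p, o))
            if p == SEE_ALSO:
                queue.append(o)
    return graph, visited

graph, visited = follow_your_nose("http://a.example/doc", lambda uri: WEB.get(uri, []))
print(sorted(visited))
print(len(graph), "triples collected")
```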


Page 72: Hello Open World - Semtech 2009


Semantic Web Sitemaps

Easy to create metadata from your existing database (D2RQ, microformats, etc.)

But you need to tell the world about it!

More is needed to make your data useful (e.g. linking to OTHER URIs if your entities are not something completely “yours”)

Need to make the world know your data is there.

Semantic Web Sitemaps can help

Page 73: Hello Open World - Semtech 2009


Large quantities of linked data: how to expose them?

The fact that the data is HTTP-retrievable in small bits makes it crawlable.

But data producers are wary of this:

Millions of hits for each refresh

And clearly something better must be possible

Most data producers do in fact already provide full dumps of the base data

Or SPARQL endpoints

Page 74: Hello Open World - Semtech 2009


Extending Sitemaps to expose data

Sitemaps:

Originally by Google, quickly adopted by the others (Yahoo, MSN, etc.)

Expose the “deep web” by providing a list of pages “to be crawled”

Written in XML, referenced from robots.txt

Example:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

Page 75: Hello Open World - Semtech 2009


The Semantic Sitemap Extension

Example first:

<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
  <sc:dataset>
    <sc:datasetLabel>Product Catalog for Example.org</sc:datasetLabel>
    <sc:dataDumpLocation>http://example.org/cataloguedump.rdf</sc:dataDumpLocation>
    <sc:linkedDataPrefix>http://example.org/products/</sc:linkedDataPrefix>
    <changefreq>monthly</changefreq>
  </sc:dataset>
</urlset>
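A consumer can read the extension with a namespace-aware XML parser. Below is a sketch with Python's xml.etree.ElementTree on a copy of the example above (without the xsi attributes). It also looks for sc:sparqlEndpointLocation elements, assumed here to be part of the extension; check the specification at http://sw.deri.org/2007/07/sitemapextension/.

```python
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
SC = "{http://sw.deri.org/2007/07/sitemapextension/scschema.xsd}"

SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
  <sc:dataset>
    <sc:datasetLabel>Product Catalog for Example.org</sc:datasetLabel>
    <sc:dataDumpLocation>http://example.org/cataloguedump.rdf</sc:dataDumpLocation>
    <sc:linkedDataPrefix>http://example.org/products/</sc:linkedDataPrefix>
    <changefreq>monthly</changefreq>
  </sc:dataset>
</urlset>"""

def read_datasets(xml_text):
    """Return one dict per sc:dataset element found in a semantic sitemap."""
    root = ET.fromstring(xml_text)
    datasets = []
    for ds in root.iter(SC + "dataset"):
        datasets.append({
            "label": ds.findtext(SC + "datasetLabel"),
            "dumps": [(e.text or "").strip() for e in ds.findall(SC + "dataDumpLocation")],
            # Assumed element name; verify against the extension specification.
            "sparql": [(e.text or "").strip() for e in ds.findall(SC + "sparqlEndpointLocation")],
            "prefix": ds.findtext(SC + "linkedDataPrefix"),
            "changefreq": ds.findtext(SM + "changefreq"),   # plain sitemap element, default namespace
        })
    return datasets

print(read_datasets(SITEMAP))
```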


Page 81: Hello Open World - Semtech 2009


How it is meant to be used

As a crawler

If you are given a URL for an RDF site, check for the sitemap

If a dump is available, download that instead

As a client

If you have a dump, and want an update

Check the sitemap, to locate it in case it has changed position

Or to locate a SPARQL endpoint
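The decision logic above can be written as a small helper. This is a hypothetical sketch operating on a dictionary describing one sc:dataset entry (the dictionary shape is an assumption); preferring a dump over a SPARQL endpoint mirrors the crawler advice.

```python
def plan_fetch(dataset, previous_dump=None):
    """Decide how to obtain a dataset described in a semantic sitemap.
    `dataset` is a dict with optional "dumps", "sparql" and "prefix" keys (hypothetical shape);
    `previous_dump` is the dump URL a client used last time, if any."""
    dumps = dataset.get("dumps") or []
    endpoints = dataset.get("sparql") or []
    if dumps:
        location = dumps[0]
        moved = previous_dump is not None and previous_dump != location
        return {"action": "download-dump", "location": location, "moved": moved}
    if endpoints:
        return {"action": "query-sparql", "location": endpoints[0], "moved": False}
    # No dump and no endpoint: fall back to crawling URIs under the Linked Data prefix.
    return {"action": "crawl", "location": dataset.get("prefix"), "moved": False}

print(plan_fetch({"dumps": ["http://example.org/cataloguedump.rdf"]},
                 previous_dump="http://example.org/old/cataloguedump.rdf"))
print(plan_fetch({"prefix": "http://example.org/products/"}))
```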

Page 82: Hello Open World - Semtech 2009


Or Search!

Sindice Search Engine

http://sindice.com

Look up RDF documents by keywords and by property/value descriptions

Simple queries, but executed fast.

Fast indexing (20 to 60 minutes) of newly “pinged” information

Sindice can be thought of as a “spider in the middle” for application-to-application semantic communication via published data.


Page 83: Hello Open World - Semtech 2009


Searching the Web of Data

E.g. search for people who claim to know X


Page 84: Hello Open World - Semtech 2009


Query service over all the major linked datasets

Advanced queries (SPARQL, see later)

No pings

In general, best-effort results due to timeouts (on non-simple queries)


Lod.openlinksoftware.com

Page 85: Hello Open World - Semtech 2009


How to manipulate and query RDF?

Querying Web data at runtime

Needs to load RDF data into memory, can be quite slow

Storing information in RDF stores and using SPARQL

Involves data replication and the need to sync data from the Web with data in your RDF store

– Sindice API to help syncing

Lots of RDF stores available on the market

– Sesame

– Jena

– OpenLink Virtuoso

– AllegroGraph

– OpenAnzo

– Mulgara, etc.

Page 86: Hello Open World - Semtech 2009

SPARQL

SPARQL Protocol and RDF Query Language

“The SQL of the Semantic Web”

Both a protocol and a query language

– RDF data can be queried via REST

Four different query forms

SELECT, CONSTRUCT, ASK, DESCRIBE

We will mainly focus on the first one

SPARQL is based on a graph-matching approach

Retrieves statements that match some patterns in one (or more) RDF graph(s), independently of the serialization

The W3C SPARQL WG is currently working on new features

http://www.w3.org/2009/01/sparql-charter
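On the protocol side, a query is simply sent to an endpoint over HTTP. The sketch below only builds the request and does not send it: the SPARQL Protocol passes the query text in a `query` parameter, and the `Accept` header asks for the JSON results format. The endpoint URL is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# HYPOTHETICAL endpoint URL, for illustration only.
ENDPOINT = "http://example.org/sparql"


def build_query_request(endpoint, query, accept="application/sparql-results+json"):
    """Build (but do not send) a SPARQL Protocol query as an HTTP GET request."""
    # The query travels URL-encoded in the standard "query" parameter.
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": accept}, method="GET")


req = build_query_request(ENDPOINT, "ASK { ?s ?p ?o }")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen()` would perform the actual call; long queries are usually sent with POST instead to avoid URL length limits.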

Page 87: Hello Open World - Semtech 2009

SPARQL SELECT

SELECT all people and their names

http://helloopenworld.net/semtech2009/files/select1.sparql

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
  ?person a foaf:Person ;
          foaf:name ?name .
}
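To make the graph-matching idea concrete, here is a minimal in-memory sketch of how a WHERE clause like the one above can be evaluated: each triple pattern is matched against the data and joined with the bindings found so far. It is a toy, not a SPARQL engine: terms are plain strings, anything starting with "?" is treated as a variable, and the sample people other than Alex are hypothetical.

```python
# Terms are plain strings: full IRIs, bare literals, and "?name" variables.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; return None on conflict."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if extended.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def select(variables, patterns, graph):
    """Evaluate a basic graph pattern by joining one triple pattern at a time."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return [tuple(s[v] for v in variables) for s in solutions]


# Sample data: ALEX is the URI used in these slides; BOB and DERI_ORG are hypothetical.
ALEX = "http://helloopenworld.net/semtech2009/data/apassant.rdf#me"
BOB = "http://example.org/people/bob#me"
DERI_ORG = "http://example.org/org/deri"
GRAPH = [
    (ALEX, RDF_TYPE, FOAF + "Person"),
    (ALEX, FOAF + "name", "Alexandre Passant"),
    (BOB, RDF_TYPE, FOAF + "Person"),
    (BOB, FOAF + "name", "Bob"),
    (DERI_ORG, RDF_TYPE, FOAF + "Organization"),
    (DERI_ORG, FOAF + "name", "DERI"),
]

# ?person a foaf:Person ; foaf:name ?name .
WHERE = [("?person", RDF_TYPE, FOAF + "Person"), ("?person", FOAF + "name", "?name")]
print(select(["?person", "?name"], WHERE, GRAPH))
```

The organisation has a `foaf:name` but is not a `foaf:Person`, so the join drops it; that filtering is exactly what the shared `?person` variable expresses in the query.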

Page 88: Hello Open World - Semtech 2009

SPARQL CONSTRUCT

Construct an RDF graph from other ones

Can be seen as the XSLT of the Semantic Web

http://helloopenworld.net/semtech2009/files/construct1.sparql

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX semtech09: <http://ex.org/semtech09/>
CONSTRUCT { ?person a semtech09:attendee . }
WHERE { ?person a foaf:Person . }
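CONSTRUCT evaluates its WHERE clause like SELECT, then fills in the template once per solution. The sketch below covers only that second step: it takes the solutions (dictionaries from variable to term) as input. Triples with an unbound variable are skipped, and duplicates are dropped because the result is a graph; blank nodes in templates, which SPARQL renames per solution, are not handled. The sample people other than Alex are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ATTENDEE = "http://ex.org/semtech09/attendee"


def construct(template, solutions):
    """Instantiate a CONSTRUCT template for every WHERE-clause solution."""
    result = {}  # dict used as an insertion-ordered set of triples
    for binding in solutions:
        for pattern in template:
            terms = [binding.get(t) if t.startswith("?") else t for t in pattern]
            if None not in terms:  # skip triples with an unbound variable
                result[tuple(terms)] = None
    return list(result)


# Solutions of "?person a foaf:Person" (BOB is hypothetical).
ALEX = "http://helloopenworld.net/semtech2009/data/apassant.rdf#me"
BOB = "http://example.org/people/bob#me"
SOLUTIONS = [{"?person": ALEX}, {"?person": BOB}]

TEMPLATE = [("?person", RDF_TYPE, ATTENDEE)]  # ?person a semtech09:attendee .
for triple in construct(TEMPLATE, SOLUTIONS):
    print(triple)
```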

Page 89: Hello Open World - Semtech 2009

SPARQL DESCRIBE

Get information about a given resource

DESCRIBE is implementation-specific and can return different results depending on the triple store used

http://helloopenworld.net/semtech2009/files/desc1.sparql

DESCRIBE <http://helloopenworld.net/semtech2009/data/apassant.rdf#me>
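Because DESCRIBE is left to the implementation, here is just one plausible policy, sketched in Python: return the triples that have the resource as subject, and optionally those that point at it. Real stores may also follow blank nodes or use other rules; Bob and his data are hypothetical.

```python
ALEX = "http://helloopenworld.net/semtech2009/data/apassant.rdf#me"
BOB = "http://example.org/people/bob#me"  # hypothetical
FOAF = "http://xmlns.com/foaf/0.1/"

GRAPH = [
    (ALEX, FOAF + "name", "Alexandre Passant"),
    (ALEX, FOAF + "knows", BOB),
    (BOB, FOAF + "knows", ALEX),
    (BOB, FOAF + "name", "Bob"),
]


def describe(resource, graph, include_incoming=False):
    """One possible DESCRIBE policy: outgoing triples, plus incoming ones on request."""
    return [
        (s, p, o)
        for (s, p, o) in graph
        if s == resource or (include_incoming and o == resource)
    ]


for triple in describe(ALEX, GRAPH):
    print(triple)
```

Switching `include_incoming` on returns Bob's `foaf:knows` link as well, which illustrates why two stores can answer the same DESCRIBE differently.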

Page 90: Hello Open World - Semtech 2009

SPARQL ASK

Check if a particular pattern matches the RDF graph

Is Alex a foaf:Person?

http://helloopenworld.net/semtech2009/files/ask1.sparql

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
ASK { <http://helloopenworld.net/semtech2009/data/apassant.rdf#me> a foaf:Person . }
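ASK only reports whether at least one solution exists. For a ground pattern such as the one above (no variables), that reduces to a membership test; the Python sketch below handles a single triple pattern, with or without "?" variables. Bob and his data are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
ALEX = "http://helloopenworld.net/semtech2009/data/apassant.rdf#me"
BOB = "http://example.org/people/bob#me"  # hypothetical

GRAPH = [
    (ALEX, RDF_TYPE, FOAF_PERSON),
    (BOB, "http://xmlns.com/foaf/0.1/name", "Bob"),
]


def ask(pattern, graph):
    """Return True if at least one triple matches `pattern`."""
    def matches(triple):
        binding = {}
        for p_term, t_term in zip(pattern, triple):
            if p_term.startswith("?"):
                # A repeated variable must bind to the same term each time.
                if binding.setdefault(p_term, t_term) != t_term:
                    return False
            elif p_term != t_term:
                return False
        return True

    return any(matches(triple) for triple in graph)


print(ask((ALEX, RDF_TYPE, FOAF_PERSON), GRAPH))  # prints True
```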

Page 92: Hello Open World - Semtech 2009

SPARCool.net

http://sparcool.net

Run SPARQL queries on any URI that follows the Linked Data principles

http://sparcool.net/j/dbp:abstract;l=en/http://dbpedia.org/resource/Semantic_Web

Various formats for results (HTML, JSON, etc.)

Embed results in your documents using JSONP

Page 93: Hello Open World - Semtech 2009

Setup a SPARQL endpoint

Various open-source triple stores available

Virtuoso, Sesame, Joseki …

Based on various back-ends (MySQL, dedicated file systems …)

We will focus on xAMP solutions with ARC2

Lightweight RDF framework for PHP - http://arc.semsol.org

RDF store based on MySQL

Only a few lines of code to set up a repository

– http://helloopenworld.net/semtech2009/store1/index.phps

Using SPARQL+ to LOAD / UPDATE / DELETE RDF data

– since SPARQL is read-only

Used in several of our projects

– SMOB, LODr, etc.

Page 94: Hello Open World - Semtech 2009

Loading RDF data

SPARQL is a read-only language

SPARQL+ lets you add / modify / delete RDF data

LOAD <URI> [INTO <URI>]

Loads the RDF data from <URI> into the store, so that it can then be queried with SPARQL

LOAD your FOAF files into the RDF store

http://helloopenworld.net/semtech2009/store1/

E.g. LOAD <http://helloopenworld.net/semtech2009/data/apassant.rdf>

– NB: must be sent as an HTTP POST request
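Since a LOAD modifies the store, it goes over POST rather than GET. A minimal Python sketch that only builds such a request (it does not send it) could look like this. The store URL is the one on this slide; the assumption that the endpoint reads the statement from a form field named `query` (as the SPARQL Protocol does for queries) is ours and should be checked against the ARC2 endpoint configuration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Store URL from the slide. ASSUMPTION: it reads the statement from a "query" field.
STORE_ENDPOINT = "http://helloopenworld.net/semtech2009/store1/"


def build_load_request(document_uri, graph_uri=None, endpoint=STORE_ENDPOINT):
    """Build (but do not send) a SPARQL+ LOAD statement as a form-encoded POST."""
    statement = f"LOAD <{document_uri}>"
    if graph_uri is not None:
        statement += f" INTO <{graph_uri}>"  # optional target graph
    body = urlencode({"query": statement}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_load_request("http://helloopenworld.net/semtech2009/data/apassant.rdf")
print(req.get_method(), req.data)
```

Sending it is a matter of passing `req` to `urllib.request.urlopen()`.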

Page 95: Hello Open World - Semtech 2009

Lightweight inference

ARC2 does not provide an RDFS inference engine

But triggers can be used to write one

http://apassant.net/blog/2008/10/01/lightweight-subpropertyof-subclassof-inference-arc2

Rule rdfs7: inference on subproperties

http://www.w3.org/TR/rdf-mt/#RDFSRules

Can be done with SPARQL CONSTRUCT and ARC2 triggers

http://helloopenworld.net/semtech2009/store2/index.phps

http://helloopenworld.net/semtech2009/files/ARC2_SubPropertyInferenceTrigger.phps
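The subproperty rule itself is small. In the RDF Semantics rule table, rdfs7 says: if `aaa rdfs:subPropertyOf bbb` and `uuu aaa yyy`, then add `uuu bbb yyy`. The trigger approach above does this with a SPARQL CONSTRUCT whose results go back into the store; the Python sketch below instead applies the rule to an in-memory set of triples until nothing new appears, which also follows chains of subproperties. The `ex:` properties and Bob are hypothetical.

```python
RDFS_SUBPROPERTYOF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"


def apply_rdfs7(graph):
    """Forward-chain RDFS rule rdfs7 on a set of (s, p, o) triples to a fixpoint."""
    inferred = set(graph)
    while True:
        sub_props = [(sub, sup) for (sub, p, sup) in inferred if p == RDFS_SUBPROPERTYOF]
        new = {
            (s, sup, o)
            for (sub, sup) in sub_props
            for (s, p, o) in inferred
            if p == sub
        } - inferred
        if not new:  # fixpoint reached: nothing left to infer
            return inferred
        inferred |= new


KNOWS = "http://xmlns.com/foaf/0.1/knows"
COLLEAGUE = "http://example.org/vocab/colleagueOf"  # hypothetical
CLOSE = "http://example.org/vocab/closeColleagueOf"  # hypothetical
ALEX = "http://helloopenworld.net/semtech2009/data/apassant.rdf#me"
BOB = "http://example.org/people/bob#me"  # hypothetical

GRAPH = [
    (CLOSE, RDFS_SUBPROPERTYOF, COLLEAGUE),
    (COLLEAGUE, RDFS_SUBPROPERTYOF, KNOWS),
    (ALEX, CLOSE, BOB),
]
result = apply_rdfs7(GRAPH)
print((ALEX, KNOWS, BOB) in result)  # prints True
```

With this data, a `foaf:knows` query like the one on the next slide finds Bob only after inference, even though no `foaf:knows` triple was ever stated.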

Page 96: Hello Open World - Semtech 2009

Lightweight inference

LOAD your profile into the inference-enabled store

http://helloopenworld.net/semtech2009/store2

Try the following query on both stores

Only the second one performs subproperty inference

http://helloopenworld.net/semtech2009/files/select2.sparql

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person
WHERE { <YOUR_URI> foaf:knows ?person . }

Page 97: Hello Open World - Semtech 2009

Complex triggers

LOAD the RDF file corresponding to each user interest

Needed to avoid the issues of distributed SPARQL querying

Trigger designed using SPARQL SELECT + SPARUL LOAD

For each loaded file, check whether it contains any foaf:topic_interest values, and load the corresponding documents into the store

http://helloopenworld.net/semtech2009/files/ARC2_InterestLoadTrigger.phps

http://helloopenworld.net/semtech2009/store3
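The trigger's behaviour can be sketched as a work queue in Python: after each load, look for `foaf:topic_interest` values in the triples just loaded and queue those documents too. Here a hypothetical `fetch(uri)` function stands in for the HTTP LOAD, and a `loaded` set keeps a document from being loaded twice (which also stops cycles). This is an illustration of the idea, not the ARC2 trigger code; the sample "web" is hypothetical.

```python
from collections import deque

FOAF_TOPIC_INTEREST = "http://xmlns.com/foaf/0.1/topic_interest"


def load_with_interests(start_uri, fetch):
    """LOAD start_uri, then every document named by a foaf:topic_interest.

    fetch(uri) is a HYPOTHETICAL stand-in for an HTTP LOAD returning (s, p, o) triples.
    """
    store, loaded = [], set()
    queue = deque([start_uri])
    while queue:
        uri = queue.popleft()
        if uri in loaded:
            continue
        loaded.add(uri)
        triples = fetch(uri)
        store.extend(triples)
        # The "trigger": queue every interest found in what was just loaded.
        for _, p, o in triples:
            if p == FOAF_TOPIC_INTEREST and o not in loaded:
                queue.append(o)
    return store, loaded


# HYPOTHETICAL offline "web" so the sketch runs without network access.
PROFILE = "http://helloopenworld.net/semtech2009/data/apassant.rdf"
ALEX = PROFILE + "#me"
FILM = "http://dbpedia.org/resource/Reservoir_Dogs"
DBO_DIRECTOR = "http://dbpedia.org/ontology/director"
TARANTINO = "http://dbpedia.org/resource/Quentin_Tarantino"
FAKE_WEB = {
    PROFILE: [(ALEX, FOAF_TOPIC_INTEREST, FILM)],
    FILM: [(FILM, DBO_DIRECTOR, TARANTINO)],
}

store, loaded = load_with_interests(PROFILE, lambda uri: FAKE_WEB.get(uri, []))
print(sorted(loaded))
```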

Page 98: Hello Open World - Semtech 2009

SPARQL SELECT w/ Triggers

Advanced querying capabilities

http://helloopenworld.net/semtech2009/files/select3.sparql

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?person
WHERE {
  ?person foaf:topic_interest [
    dbo:director <http://dbpedia.org/resource/Quentin_Tarantino>
  ] .
}
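The blank node in this query is a two-hop join: first find what the director directed, then find people interested in one of those resources. The hand-written Python sketch below (sample data hypothetical) also shows why the loading trigger matters: without the DBpedia triples in the local store, the first hop finds nothing.

```python
FOAF_TOPIC_INTEREST = "http://xmlns.com/foaf/0.1/topic_interest"
DBO_DIRECTOR = "http://dbpedia.org/ontology/director"


def people_interested_in_films_by(director, store):
    """Evaluate ?person foaf:topic_interest [ dbo:director <director> ] by hand."""
    # Hop 1: resources the store says were directed by `director`.
    directed = {s for (s, p, o) in store if p == DBO_DIRECTOR and o == director}
    # Hop 2: join through the blank node to people interested in one of them.
    return sorted({s for (s, p, o) in store if p == FOAF_TOPIC_INTEREST and o in directed})


ALEX = "http://helloopenworld.net/semtech2009/data/apassant.rdf#me"
FILM = "http://dbpedia.org/resource/Reservoir_Dogs"
TARANTINO = "http://dbpedia.org/resource/Quentin_Tarantino"

profile_only = [(ALEX, FOAF_TOPIC_INTEREST, FILM)]
with_dbpedia = profile_only + [(FILM, DBO_DIRECTOR, TARANTINO)]
print(people_interested_in_films_by(TARANTINO, profile_only))  # prints []
print(people_interested_in_films_by(TARANTINO, with_dbpedia))
```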

Page 100: Hello Open World - Semtech 2009

Mash-ups & Advanced Topics

Applying and extending what we have learned

Displaying and rendering the Web of Data

End-user Interaction, UI

Read/write Web of Data

Writing Applications!

Page 101: Hello Open World - Semtech 2009

Exhibit faceted browsing

Exhibit

JavaScript library for faceted browsing

http://www.simile-widgets.org/exhibit/

Can be used directly on top of a SPARQL endpoint

Thanks to SPARQL CONSTRUCT and the Babel translation service

http://helloopenworld.net/semtech2009/files/construct2.sparql

http://helloopenworld.net/semtech2009/exhibit

http://microplanet.sioc-project.org/

– With geolocation services

Page 102: Hello Open World - Semtech 2009

Ubiquity command

Ubiquity

Mozilla Firefox command line for the Web

http://ubiquity.mozilla.com/

Find people who like a given topic when browsing Wikipedia

(1) Query DBpedia to find the related URI

(2) Query the RDF store to identify related people

Total: 2 SPARQL queries!

http://helloopenworld.net/semtech2009/ubiquity

http://en.wikipedia.org/wiki/Reservoir_Dogs
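The two queries behind the command can be sketched as Python string builders. The property used in the first query is an assumption: it relies on DBpedia linking resources to their Wikipedia article with `foaf:isPrimaryTopicOf` (older releases used `foaf:page`), so check the current data before relying on it. The second query targets the local store loaded earlier.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def dbpedia_lookup_query(wikipedia_url):
    """Query 1, sent to DBpedia: which resource is this Wikipedia page about?"""
    # ASSUMPTION: the page link is modelled with foaf:isPrimaryTopicOf.
    return (
        f"PREFIX foaf: <{FOAF}>\n"
        f"SELECT ?resource WHERE {{ ?resource foaf:isPrimaryTopicOf <{wikipedia_url}> . }}"
    )


def interested_people_query(resource_uri):
    """Query 2, sent to the local RDF store: who is interested in that resource?"""
    return (
        f"PREFIX foaf: <{FOAF}>\n"
        f"SELECT ?person WHERE {{ ?person foaf:topic_interest <{resource_uri}> . }}"
    )


print(dbpedia_lookup_query("http://en.wikipedia.org/wiki/Reservoir_Dogs"))
print(interested_people_query("http://dbpedia.org/resource/Reservoir_Dogs"))
```

Each `?resource` returned by the first query is plugged into the second, which is why the whole feature needs only two round trips.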

Page 103: Hello Open World - Semtech 2009

Ubiquity command

Page 104: Hello Open World - Semtech 2009

Mash-up tools

DERI Pipes, a tool for semantic mashups

http://pipes.deri.org

Page 105: Hello Open World - Semtech 2009

Sig.ma Mashup Maker

The Web of Data comes together

In an interactive mashup maker

Mashups can be embedded or queried programmatically

Page 106: Hello Open World - Semtech 2009

Geolocation mash-ups

Since data is interlinked, it’s easy to combine it

Can mix personal and public data, e.g. in organisations

Page 107: Hello Open World - Semtech 2009

Data directories

http://doapstore.org

A directory of software projects described with DOAP

Completely RDF-based

Page 108: Hello Open World - Semtech 2009

Ongoing work

Enabling writing to the Web of Data

“Pushback”

http://esw.w3.org/topic/PushBackDataToLegacySources

More demos at: http://ld2sd.deri.org/pushback/

Check out the code at http://code.google.com/p/pushback/ and contribute!

Using RDFa to automate web forms

RDForms: http://rdfs.org/ns/rdforms

Page 110: Hello Open World - Semtech 2009

Conclusion

Web of Data is a reality

Tools and technologies exist to

Create and describe data on the Web

Store and access data on the Web

Discover and query data on the Web

Build and expand your applications with data on the Web

Challenges

Technical issues such as scalability and usability

Social issues (trust, privacy, etc.)

Economic issues (building a critical mass)

Page 111: Hello Open World - Semtech 2009

Events

LDOW Series

http://events.linkeddata.org/

ESWC and ISWC

Major venues for academic research on the Semantic Web

SFSW: Scripting For the Semantic Web Workshop

– http://www.semanticscripting.org/

Triplification challenge

Applications that enable existing systems to become part of the Web of Data

http://triplify.org/Challenge/2009

Deadline June 30th, 2009

Page 112: Hello Open World - Semtech 2009

Feedback

Did you learn something during the tutorial?

Do you think you can now explain the Web of Data and build applications?

Which topics that you expected were not covered?

Feel free to discuss or contact us


#swig on irc.freenode.net
