SWT Lecture Session 2 - RDF

+ RDF Mariano Rodriguez-Muro, Free University of Bozen-Bolzano


Page 1: SWT Lecture Session 2 - RDF

+

RDF

Mariano Rodriguez-Muro, Free University of Bozen-Bolzano

Page 2: SWT Lecture Session 2 - RDF

+Disclaimer

License

This work is licensed under the Creative Commons Attribution-Share Alike 3.0 License http://creativecommons.org/licenses/by-sa/3.0/

Page 3: SWT Lecture Session 2 - RDF

+

RDF

Background

The Data Model

In-Detail

Syntax: Turtle, N-triple

Tools

Syntax: XML

Page 4: SWT Lecture Session 2 - RDF

+

RDF

Background

Page 5: SWT Lecture Session 2 - RDF

+URI, URL and IRI

URI = Uniform Resource Identifier

URL = Uniform Resource Locator (has a location on the WWW)

IRI = Internationalized Resource Identifier (uses Unicode)

Used for identifying resources (web, local, etc.)

Resources can be anything that has an identity in the context of an application (books, locations, humans, abstract concepts, etc.)

Analogous to, e.g., ISBN for books

URLs ⊆ URIs ⊆ IRIs

Page 6: SWT Lecture Session 2 - RDF

+URI, URL and IRI

scheme: type of URI, e.g. http, ftp, mailto, file, irc

authority: typically a domain name

path: e.g. /etc/passwd/

query: optional; provides non-hierarchical information. Usually for parameters, e.g. for a web service

fragment: optional; often used to address part of a retrieved resource, e.g. a section of an HTML file.

Good IRI design is important for semantic applications. More later.

scheme:[//authority]path[?query][#fragment]
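
The generic syntax above can be checked against a concrete identifier. The following is a minimal sketch using the standard java.net.URI class; the class name UriParts, the helper split and the example.org address are assumptions made up for illustration and are not part of the lecture.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

class UriParts {
    /** Splits a URI string into the five generic components named on the slide. */
    static Map<String, String> split(String text) {
        URI uri = URI.create(text);
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("scheme", uri.getScheme());
        parts.put("authority", uri.getAuthority());
        parts.put("path", uri.getPath());
        parts.put("query", uri.getQuery());       // null when the URI has no query
        parts.put("fragment", uri.getFragment()); // null when the URI has no fragment
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical address, chosen only to exercise every component
        split("http://example.org/books/list?lang=en#top")
            .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Note that java.net.URI handles URIs, not full IRIs: non-ASCII characters have to be percent-encoded before parsing.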

Page 7: SWT Lecture Session 2 - RDF

+QNames

Used in RDF as shorthand for long URIs

If prefix “foo” is bound to http://example.com/

Then foo:bar expands to

http://example.com/bar

Simple string concatenation

Not quite the same as XML namespaces

Mostly the same as CURIEs

Practically relevant due to IO restrictions

Necessary to fit any example on a page!
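
Since QName expansion is plain string concatenation, it fits in a few lines. This is a sketch only: the class QNames and the method expand are hypothetical names, and the prefix table is an ordinary map rather than any RDF library structure.

```java
import java.util.Map;

class QNames {
    /** Expands prefix:local into a full URI by concatenating namespace and local part. */
    static String expand(Map<String, String> prefixes, String qname) {
        int colon = qname.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a QName: " + qname);
        }
        String prefix = qname.substring(0, colon);
        String local = qname.substring(colon + 1);
        String namespace = prefixes.get(prefix);
        if (namespace == null) {
            throw new IllegalArgumentException("Unbound prefix: " + prefix);
        }
        return namespace + local; // no normalization, just concatenation
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = Map.of("foo", "http://example.com/");
        System.out.println(expand(prefixes, "foo:bar")); // http://example.com/bar
    }
}
```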

Page 8: SWT Lecture Session 2 - RDF

+

RDF

The Data Model

Page 9: SWT Lecture Session 2 - RDF

+RDF is…

Resource Description Framework

Page 10: SWT Lecture Session 2 - RDF

+RDF is…

The data model of Semantic Technologies and of the Semantic Web.

Page 11: SWT Lecture Session 2 - RDF

+RDF is…

A schema-less data model that features unambiguous identifiers and named relations between pairs of resources.

Page 22: SWT Lecture Session 2 - RDF

+Unambiguous Names

How many things are named “Boston”? How about “Riverside”?

So, we use URIs.

Instead of “Boston”: http://dbpedia.org/resource/Boston (QName: db:Boston)

And instead of “nickname” we use: http://example.org/terms/nickname (QName: dbo:nickname)

Page 24: SWT Lecture Session 2 - RDF

+Why RDF? What’s different here?

The graph data structure makes merging data with shared identifiers trivial (as we saw earlier)

Triples act as a least common denominator for expressing data

URIs for naming remove ambiguity: the same identifier means the same thing
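
The merging claim can be made concrete by treating a graph as a set of triples, so that merging is set union and shared URIs line up automatically. This is a minimal sketch under that assumption; GraphMerge, Triple and merge are illustrative names, and blank nodes (which would need renaming before a merge) are left out.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class GraphMerge {
    /** A triple of plain strings; a real library would use typed nodes. */
    record Triple(String subject, String predicate, String object) {}

    /** Merging two graphs is the union of their triple sets. */
    static Set<Triple> merge(Set<Triple> a, Set<Triple> b) {
        Set<Triple> merged = new LinkedHashSet<>(a);
        merged.addAll(b); // identical triples collapse into one
        return merged;
    }

    public static void main(String[] args) {
        String boston = "http://dbpedia.org/resource/Boston";
        String nickname = "http://example.org/terms/nickname";
        String inState = "http://example.org/terms/inState";

        Set<Triple> first = Set.of(new Triple(boston, nickname, "\"Beantown\""));
        Set<Triple> second = Set.of(
            new Triple(boston, nickname, "\"Beantown\""),
            new Triple(boston, inState, "http://dbpedia.org/resource/Massachusetts"));

        // Both sources talk about the same node because they use the same URI
        System.out.println(merge(first, second).size()); // 2
    }
}
```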

Page 25: SWT Lecture Session 2 - RDF

+

RDF

In-Detail

Page 26: SWT Lecture Session 2 - RDF

+RDF is…

A labeled, directed graph of relations between resources and literal values.

RDF graphs are sets of triples

Triples are made up of a subject, a predicate, and an object (spo)

Resources and relationships are named with URIs

subject --predicate--> object

Page 27: SWT Lecture Session 2 - RDF

+Triple

Resources are: IRIs (each denotes an object)

Subject: resource or blank node

Predicate: resource

Object: resource, literal or blank node

A triple is also called a “statement”

Page 28: SWT Lecture Session 2 - RDF

+Turtle syntax

Simple syntax for RDF

Triples are directly listed as such: S P O

IRIs are in < angle brackets >

End with full-stop “.”

Whitespace between terms is not significant

Page 30: SWT Lecture Session 2 - RDF

+In Turtle

<http://dbpedia.org/resource/Massachusetts> <http://example.org/terms/capital> <http://dbpedia.org/resource/Boston> .

<http://dbpedia.org/resource/Massachusetts> <http://example.org/terms/nickname> "The Bay State" .

<http://dbpedia.org/resource/Boston> <http://example.org/terms/inState> <http://dbpedia.org/resource/Massachusetts> .

<http://dbpedia.org/resource/Boston> <http://example.org/terms/nickname> "Beantown" .

<http://dbpedia.org/resource/Boston> <http://example.org/terms/population> "642109"^^<http://www.w3.org/2001/XMLSchema#integer> .

Page 31: SWT Lecture Session 2 - RDF

+Shortcuts

Prefixes (simple string concatenation)

Grouping of triples with the same subject using semi-colon ‘;’

Grouping of triples with the same subject and predicate using comma ‘,’

Page 32: SWT Lecture Session 2 - RDF

@prefix db: <http://dbpedia.org/resource/> .
@prefix dbo: <http://example.org/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

db:Massachusetts dbo:capital db:Boston .
db:Massachusetts dbo:nickname "The Bay State" .
db:Boston dbo:inState db:Massachusetts .
db:Boston dbo:nickname "Beantown" .
db:Boston dbo:population "642109"^^xsd:integer .

Page 33: SWT Lecture Session 2 - RDF

@prefix db: <http://dbpedia.org/resource/> .
@prefix dbo: <http://example.org/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

db:Massachusetts dbo:capital db:Boston ;
    dbo:nickname "The Bay State" .

db:Boston dbo:inState db:Massachusetts ;
    dbo:nickname "Beantown" ;
    dbo:population "642109"^^xsd:integer .

Page 34: SWT Lecture Session 2 - RDF

+Literals

Represent data values

Encoded as strings (the value)

Interpreted by means of datatypes

Literals without a type are treated the same as strings (but they are not equal to strings)

A literal without a type is called a plain literal. A plain literal may have a language tag

Datatypes are not defined by RDF; we reuse the XML Schema datatypes.

RDF does not require implementation support for any datatype. However, systems generally implement most XSD datatypes.

Page 35: SWT Lecture Session 2 - RDF

+Literals (cont.)

Typed literals: "Mariano"^^xsd:string, "2012-12-12"^^xsd:date

Plain literals and literals with a language tag: "France", "France"@fr, "Frankreich"@de

Equalities under simple RDF interpretation (lexical form matters):

"Mariano" != "Mariano"@es != "Mariano"^^xsd:string

"001"^^xsd:integer != "1"^^xsd:integer

Equalities under typed interpretation (lexical form doesn’t matter):

"123"^^xsd:integer == "0123"^^xsd:integer

Type hierarchy: "123.0"^^xsd:decimal == "00123"^^xsd:integer
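
The two notions of equality can be contrasted in a small sketch. It assumes a literal is just a (lexical form, datatype) pair and it only handles numeric datatypes; LiteralEquality, Literal, sameTerm and sameNumericValue are hypothetical names, not part of any RDF library.

```java
import java.math.BigDecimal;

class LiteralEquality {
    /** A typed literal reduced to its lexical form and a datatype name. */
    record Literal(String lexicalForm, String datatype) {}

    /** Simple interpretation: both the lexical form and the datatype must match. */
    static boolean sameTerm(Literal a, Literal b) {
        return a.equals(b);
    }

    /** Typed interpretation, numeric literals only: compare the denoted values. */
    static boolean sameNumericValue(Literal a, Literal b) {
        BigDecimal left = new BigDecimal(a.lexicalForm());
        BigDecimal right = new BigDecimal(b.lexicalForm());
        return left.compareTo(right) == 0; // compareTo ignores scale, unlike equals
    }

    public static void main(String[] args) {
        Literal a = new Literal("123", "xsd:integer");
        Literal b = new Literal("0123", "xsd:integer");
        Literal c = new Literal("123.0", "xsd:decimal");

        System.out.println(sameTerm(a, b));         // false: lexical forms differ
        System.out.println(sameNumericValue(a, b)); // true: both denote 123
        System.out.println(sameNumericValue(c, b)); // true: decimal 123.0 equals integer 123
    }
}
```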

May 12, 2009


Page 37: SWT Lecture Session 2 - RDF

+Type definition

Datatypes can be defined by the user, as with XML

New “derived simple types” are derived by restriction, as with XML. Complex types based on enumerations, unions and lists are also possible. Example:

<xsd:schema ...>
  <xsd:simpleType name="humanAge">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="0"/>
      <xsd:maxExclusive value="150"/>
    </xsd:restriction>
  </xsd:simpleType>
  ...
</xsd:schema>

Page 38: SWT Lecture Session 2 - RDF

+Modeling with RDF

Let’s revisit our motivating examples and do some modeling in RDF ourselves.

Given the following relational data, generate an RDF graph

Page 39: SWT Lecture Session 2 - RDF

+Exercise: Data set “A”: A simplified book store

<ID>              | Author | Title            | <Publisher> | Year
ISBN0-00-651409-X | id_xyz | The Glass Palace | id_qpr      | 2000

<ID>   | Name          | Home page
id_xyz | Ghosh, Amitav | http://www.amitavghosh.com

<ID> | Publisher Name
am   | Amazon
bn   | Barnes & Noble

Table labels: Sellers, Authors, Stores

Generate the RDF graph. Keys are marked with <>. Primary keys are underlined.

Steps: 1) Generate the graph 2) Adjust identifiers 3) Adjust names of relations and types

Page 40: SWT Lecture Session 2 - RDF

+Relational to Graph (not yet RDF)

Page 41: SWT Lecture Session 2 - RDF

+With proper URIs and bnodes

Page 42: SWT Lecture Session 2 - RDF

+Complete with rdf:type

In the lab: generate a Turtle file for this graph. Additionally, transform it into N3 and RDF/XML files using Sesame or Jena.

Page 43: SWT Lecture Session 2 - RDF

+

Tools

Systems and Frameworks

Page 44: SWT Lecture Session 2 - RDF

+Types of RDF Tools

Triple stores: built on a relational database, or native RDF stores

Development libraries

Full-featured application servers

Most RDF tools contain some elements of each of these.

Page 45: SWT Lecture Session 2 - RDF

+Finding RDF Tools

Community-maintained lists: http://esw.w3.org/topic/SemanticWebTools

Emphasis on large triple stores: http://esw.w3.org/topic/LargeTripleStores

Michael Bergman’s Sweet Tools searchable list: http://www.mkbergman.com/?page_id=325

Page 46: SWT Lecture Session 2 - RDF

+RDF Tools – (Some) Triple Stores

Tool           | Commercial or Open-source | Environment
Anzo           | Both                      | Java
ARC            | Open-source               | PHP
AllegroGraph   | Commercial                | Java, Prolog
Jena           | Open-source               | Java
Mulgara        | Open-source               | Java
Oracle RDF     | Commercial                | SQL / SPARQL
RDF::Query     | Open-source               | Perl
Redland        | Open-source               | C, many wrappers
Sesame         | Open-source               | Java
Talis Platform | Commercial                | HTTP (Hosted)
Virtuoso       | Both                      | C++

Page 47: SWT Lecture Session 2 - RDF

+

Jena

Page 48: SWT Lecture Session 2 - RDF

+Jena

Available at http://jena.apache.org/

Available under the Apache License.

Developed by HP Labs (now community-based development)

The best-known framework

Used to:

Create and manipulate RDF graphs

Query RDF graphs

Read/serialize RDF from/into different syntaxes

Perform inference

Build SPARQL endpoints

Tutorial: http://jena.apache.org/tutorials/rdf_api.html

Page 49: SWT Lecture Session 2 - RDF

+Basic operations

Creating a graph from Java URIs/Literals/Bnodes

Listing all “Statements”

Writing RDF (Turtle/N-Triple/XML)

Reading RDF

Prefixes

Querying (through the API)

Page 50: SWT Lecture Session 2 - RDF

+Creating a basic graph

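// Jena 3 and later; Jena 2.x uses the com.hp.hpl.jena package prefix instead
import org.apache.jena.rdf.model.*;
import org.apache.jena.vocabulary.VCARD;

// some definitions (class-level fields)
static String personURI = "http://somewhere/JohnSmith";
static String fullName = "John Smith";

// create an empty Model
Model model = ModelFactory.createDefaultModel();

// create the resource
Resource johnSmith = model.createResource(personURI);

// add the property
johnSmith.addProperty(VCARD.FN, fullName);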

Page 51: SWT Lecture Session 2 - RDF

+Creating a basic graph

// some definitions
String personURI = "http://somewhere/JohnSmith";
String givenName = "John";
String familyName = "Smith";
String fullName = givenName + " " + familyName;

// create an empty Model
Model model = ModelFactory.createDefaultModel();

// create the resource
// and add the properties cascading style
Resource johnSmith = model.createResource(personURI)
    .addProperty(VCARD.FN, fullName)
    .addProperty(VCARD.N,
        model.createResource()   // blank node holding the structured name
            .addProperty(VCARD.Given, givenName)
            .addProperty(VCARD.Family, familyName));

Page 52: SWT Lecture Session 2 - RDF

+Result (internally)

Page 53: SWT Lecture Session 2 - RDF

+Listing the statements of a model

// list the statements in the Model
StmtIterator iter = model.listStatements();

// print out the subject, predicate and object of each statement
while (iter.hasNext()) {
    Statement stmt = iter.nextStatement();     // get next statement
    Resource subject = stmt.getSubject();      // get the subject
    Property predicate = stmt.getPredicate();  // get the predicate
    RDFNode object = stmt.getObject();         // get the object

    System.out.print(subject.toString());
    System.out.print(" " + predicate.toString() + " ");
    if (object instanceof Resource) {
        System.out.print(object.toString());
    } else {
        // object is a literal
        System.out.print(" \"" + object.toString() + "\"");
    }
    System.out.println(" .");
}

Page 54: SWT Lecture Session 2 - RDF

+Output

http://somewhere/JohnSmith http://www.w3.org/2001/vcard-rdf/3.0#N anon:14df86:ecc3dee17b:-7fff .
anon:14df86:ecc3dee17b:-7fff http://www.w3.org/2001/vcard-rdf/3.0#Family "Smith" .
anon:14df86:ecc3dee17b:-7fff http://www.w3.org/2001/vcard-rdf/3.0#Given "John" .
http://somewhere/JohnSmith http://www.w3.org/2001/vcard-rdf/3.0#FN "John Smith" .

Page 55: SWT Lecture Session 2 - RDF

+Writing RDF

Use the model.write(OutputStream s) method

Any output stream is valid

By default it will write in RDF/XML format

Change the format by specifying it with: model.write(OutputStream s, String format)

Possible format strings: RDF/XML-ABBREV, N-TRIPLE, RDF/XML, TURTLE, TTL, N3

Page 56: SWT Lecture Session 2 - RDF

+Writing RDF

// now write the model in RDF/XML form to standard output
model.write(System.out);

<rdf:RDF
  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  xmlns:vcard='http://www.w3.org/2001/vcard-rdf/3.0#'>
  <rdf:Description rdf:about='http://somewhere/JohnSmith'>
    <vcard:FN>John Smith</vcard:FN>
    <vcard:N rdf:nodeID="A0"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="A0">
    <vcard:Given>John</vcard:Given>
    <vcard:Family>Smith</vcard:Family>
  </rdf:Description>
</rdf:RDF>

Page 57: SWT Lecture Session 2 - RDF

+Reading RDF

Use model.read(InputStream in, String base); the three-argument form model.read(InputStream in, String base, String lang) selects a syntax other than RDF/XML

// create an empty model
Model model = ModelFactory.createDefaultModel();

// use the FileManager to find the input file
InputStream in = FileManager.get().open(inputFileName);
if (in == null) {
    throw new IllegalArgumentException("File: " + inputFileName + " not found");
}

// read the RDF/XML file (null base URI: the file is assumed to contain no relative URIs)
model.read(in, null);

// write it to standard out
model.write(System.out);

Page 58: SWT Lecture Session 2 - RDF

+Prefixes

Prefixes are used in Turtle, RDF/XML and other syntaxes

Define prefixes prior to writing to obtain a “short” rendering

Page 59: SWT Lecture Session 2 - RDF

+Example

Model m = ModelFactory.createDefaultModel();
String nsA = "http://somewhere/else#";
String nsB = "http://nowhere/else#";
Resource root = m.createResource( nsA + "root" );
Property P = m.createProperty( nsA + "P" );
Property Q = m.createProperty( nsB + "Q" );
Resource x = m.createResource( nsA + "x" );
Resource y = m.createResource( nsA + "y" );
Resource z = m.createResource( nsA + "z" );
m.add( root, P, x ).add( root, P, y ).add( y, Q, z );

System.out.println( "# -- no special prefixes defined" );
m.write( System.out );

System.out.println( "# -- nsA defined" );
m.setNsPrefix( "nsA", nsA );
m.write( System.out );

System.out.println( "# -- nsA and cat defined" );
m.setNsPrefix( "cat", nsB );
m.write( System.out );

Page 60: SWT Lecture Session 2 - RDF

+Navigating the model

The API lets you query the model to get specific statements

Use model.getResource(…)

With a resource, use .getProperty to retrieve objects: resource.getProperty(…).getObject()

You can further add statements to the model through the resource

Page 61: SWT Lecture Session 2 - RDF

// retrieve the John Smith vcard resource from the model
Resource vcard = model.getResource(johnSmithURI);

// retrieve the value of the N property
Resource name = (Resource) vcard.getProperty(VCARD.N)
                                .getObject();

// the same value, using the typed accessor instead of a cast
Resource sameName = vcard.getProperty(VCARD.N)
                         .getResource();

// retrieve the full name (FN) property
String fullName = vcard.getProperty(VCARD.FN)
                       .getString();

Page 62: SWT Lecture Session 2 - RDF

// add two nickname properties to vcard
vcard.addProperty(VCARD.NICKNAME, "Smithy")
     .addProperty(VCARD.NICKNAME, "Adman");

// set up the output
System.out.println("The nicknames of \"" + fullName + "\" are:");
// list the nicknames
StmtIterator iter = vcard.listProperties(VCARD.NICKNAME);
while (iter.hasNext()) {
    System.out.println("    " + iter.nextStatement()
                                    .getObject()
                                    .toString());
}

Output:

The nicknames of "John Smith" are:
    Smithy
    Adman

Page 63: SWT Lecture Session 2 - RDF

+Last notes

Key API objects: Dataset, Model, Statement, Resource and Literal

The default model implementation is in-memory

Other implementations exist that use different storage methods:

TDB (native Jena store). Persistent, on-disk storage of models using Jena’s own data structures and indexing techniques.

SDB. Persistent storage through a relational database.

We’ll see more features as we advance in the course

Third parties offer their own triple stores through Jena’s API (OWLIM, Virtuoso, etc.)

Page 64: SWT Lecture Session 2 - RDF

+

Advanced RDF features

n-ary relations, reification, containers

Page 65: SWT Lecture Session 2 - RDF

+Data set “A”: A simplified book store

<ID>              | Author | Title            | <Publisher> | Year
ISBN0-00-651409-X | id_xyz | The Glass Palace | id_qpr      | 2000

<ID>   | Name          | Home page
id_xyz | Ghosh, Amitav | http://www.amitavghosh.com

<ID> | Publisher Name
am   | Amazon
bn   | Barnes & Noble

Table labels: Sellers, Authors, Stores

Sold-By:

<Book>            | <Store> | Price
ISBN0-00-651409-X | am      | 22.50
ISBN0-00-651409-X | bn      | 21.00

Page 66: SWT Lecture Session 2 - RDF

+N-ary relations

Not all relations are binary

All n-ary relations can be “encoded” as a set of binary relations using auxiliary nodes.

This process is called “reification” in conceptual modeling (do not confuse with reification in RDFS, to come later).
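
As an illustration, one row of the ternary Sold-By relation (book, store, price) can be split into three binary triples that share an auxiliary node. This is a sketch only: the property names ex:book, ex:store and ex:price, the blank node label _:offer1 and the class NaryRelation are assumptions invented for the example.

```java
import java.util.List;

class NaryRelation {
    record Triple(String subject, String predicate, String object) {}

    /** Encodes one (book, store, price) row as binary relations on an auxiliary node. */
    static List<Triple> encodeSoldBy(String auxNode, String book, String store, String price) {
        return List.of(
            new Triple(auxNode, "ex:book", book),
            new Triple(auxNode, "ex:store", store),
            new Triple(auxNode, "ex:price", price));
    }

    public static void main(String[] args) {
        // First row of the Sold-By table
        encodeSoldBy("_:offer1", "ex:ISBN0-00-651409-X", "ex:am", "\"22.50\"")
            .forEach(t -> System.out.println(
                t.subject() + " " + t.predicate() + " " + t.object() + " ."));
    }
}
```

The auxiliary node stands for the sale offer itself, which is why a blank node is a natural choice for it.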

Page 68: SWT Lecture Session 2 - RDF

+Blank Nodes

Nodes without an IRI

Unnamed resources

Complex nodes (later)

Representation of blank nodes is syntax-dependent

In Turtle we use an underscore followed by a colon, then an ID: _:b0, _:nodeX

The scope of a blank node ID is only the document it belongs to. That is, two different RDF files that both contain the blank node _:n0 DO NOT REFER TO THE SAME NODE

Page 69: SWT Lecture Session 2 - RDF

+Blank Nodes

Page 70: SWT Lecture Session 2 - RDF

+RDF Reification

How would you state in RDF: “The detective supposes that the butler killed the gardener”?

Page 73: SWT Lecture Session 2 - RDF

+RDF Reification

Reification makes it possible to state statements about statements

Use special vocabulary: rdf:subject, rdf:predicate, rdf:object, rdf:Statement

Warning: the triple

<Butler> <Killed> <Gardener>

is NOT in the graph.
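
The warning can be checked with a small sketch that builds the reification triples by hand. The names Reification, Triple and reify, the ex: terms and the blank node label _:s1 are assumptions made for this illustration; the rdf: terms are the vocabulary listed above.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class Reification {
    record Triple(String subject, String predicate, String object) {}

    /** Describes triple t through a statement node instead of asserting it. */
    static Set<Triple> reify(String statementNode, Triple t) {
        Set<Triple> graph = new LinkedHashSet<>();
        graph.add(new Triple(statementNode, "rdf:type", "rdf:Statement"));
        graph.add(new Triple(statementNode, "rdf:subject", t.subject()));
        graph.add(new Triple(statementNode, "rdf:predicate", t.predicate()));
        graph.add(new Triple(statementNode, "rdf:object", t.object()));
        return graph;
    }

    public static void main(String[] args) {
        Triple killed = new Triple("ex:Butler", "ex:killed", "ex:Gardener");
        Set<Triple> graph = reify("_:s1", killed);
        // The detective's supposition points at the statement node
        graph.add(new Triple("ex:Detective", "ex:supposes", "_:s1"));

        System.out.println(graph.size());           // 5
        System.out.println(graph.contains(killed)); // false: the killing itself is not asserted
    }
}
```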

Page 74: SWT Lecture Session 2 - RDF

+A reification puzzle

Know the story?

Page 75: SWT Lecture Session 2 - RDF

+Exercise

Express the following natural language sentences as a graph:

Maria saw Eric eating ice cream

The professor explained that the scientific community regards evolution theory as the truth

Page 76: SWT Lecture Session 2 - RDF

+Containers

Groups of resources

rdf:Bag. Group, possibly with duplicates, no order.

rdf:Seq. Group, possibly with duplicates, order matters.

rdf:Alt. Group, indicates alternatives.

Use rdf:type to indicate the type of container.

Use container membership properties to enumerate: rdf:_1, rdf:_2, rdf:_3, …, rdf:_n
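
The membership properties follow a simple numbering pattern, sketched below for an rdf:Seq. RdfContainer, Triple and toSeq are hypothetical names and the ex: resources are invented; only the rdf: terms come from the slide.

```java
import java.util.ArrayList;
import java.util.List;

class RdfContainer {
    record Triple(String subject, String predicate, String object) {}

    /** Produces the type triple plus one rdf:_n membership triple per member. */
    static List<Triple> toSeq(String containerNode, List<String> members) {
        List<Triple> triples = new ArrayList<>();
        triples.add(new Triple(containerNode, "rdf:type", "rdf:Seq"));
        for (int i = 0; i < members.size(); i++) {
            // Membership properties are numbered from 1
            triples.add(new Triple(containerNode, "rdf:_" + (i + 1), members.get(i)));
        }
        return triples;
    }

    public static void main(String[] args) {
        toSeq("_:students", List.of("ex:John", "ex:Peter", "ex:Mary"))
            .forEach(t -> System.out.println(
                t.subject() + " " + t.predicate() + " " + t.object() + " ."));
    }
}
```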

Page 77: SWT Lecture Session 2 - RDF

+Example

Page 78: SWT Lecture Session 2 - RDF

+Collections (closed containers)

Containers are open. No way to “close them”. Impossible to say “no other member exists”. Consider merging datasets.

A collection is a group of things represented as a linked list structure

The list is defined using the RDF vocabulary: rdf:List, rdf:first, rdf:rest and rdf:nil

Each node of the list is of type rdf:List (implicitly)
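
The linked list structure can be spelled out triple by triple. In this sketch every list node is a blank node with a made-up label (_:l0, _:l1, …), and RdfCollection, Triple and toList are illustrative names; the final rdf:rest pointing to rdf:nil is what closes the collection.

```java
import java.util.ArrayList;
import java.util.List;

class RdfCollection {
    record Triple(String subject, String predicate, String object) {}

    /** Builds the rdf:first / rdf:rest chain for the given members. */
    static List<Triple> toList(List<String> members) {
        List<Triple> triples = new ArrayList<>();
        for (int i = 0; i < members.size(); i++) {
            String node = "_:l" + i;
            // The last node points to rdf:nil, so no further member can be appended
            String rest = (i + 1 < members.size()) ? "_:l" + (i + 1) : "rdf:nil";
            triples.add(new Triple(node, "rdf:first", members.get(i)));
            triples.add(new Triple(node, "rdf:rest", rest));
        }
        return triples; // an empty member list yields no triples: the list is rdf:nil itself
    }

    public static void main(String[] args) {
        toList(List.of("ex:John", "ex:Peter", "ex:Mary"))
            .forEach(t -> System.out.println(
                t.subject() + " " + t.predicate() + " " + t.object() + " ."));
    }
}
```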

Page 79: SWT Lecture Session 2 - RDF

+Example

Page 80: SWT Lecture Session 2 - RDF

+

Syntax

Turtle and N-Triples

Page 81: SWT Lecture Session 2 - RDF

+Turtle

We have already covered ALMOST all the nuts and bolts

Missing: blank nodes, collections

Page 82: SWT Lecture Session 2 - RDF

+Blank nodes

Use square brackets to define a blank node

ex:book1 ex:title "RDF/XML Syntax Specification (Revised)" ;
    ex:editor [
        ex:fullName "Dave Beckett" ;
        ex:homePage <http://purl.org/net/dajobe/>
    ] .

Page 83: SWT Lecture Session 2 - RDF

+Collections

Use ( ) to write a collection (an rdf:List)

ex:course ex:students (
    ex:John
    ex:Peter
    ex:Mary
) .

Page 84: SWT Lecture Session 2 - RDF

+Turtle

Advantages and uses:

Easy to read and write manually or programmatically

Good performance for IO, supported by many tools

At the time of this lecture Turtle was not yet a W3C Recommendation (it became one in February 2014)

Page 85: SWT Lecture Session 2 - RDF

+N-Triples

Turtle minus its shortcuts:

No prefix definitions are allowed

No grouping shortcuts (semicolon, comma)

No other shortcuts either

Very simple to parse/generate (even through scripts)

Supported by most tools

VERY verbose. Wastes space/IO (problem is reduced with compression)

Page 86: SWT Lecture Session 2 - RDF

+RDF/XML

W3C Standard since 1999, revised in 2004

Used to be the only standard

Standard XML (works with any XML tools)

Different semantics than XML!