Post on 03-Oct-2021
First things first…
• Assignment of slots for final presentations • Q&A – I expect you to resend me corrected assignments, taking my feedback
into account • e.g. for Assignment 1: make sure that your FOAF file validates in an RDF
validator for Assignment 2: send me only parseable Turtle for Assignment 3: send me only running SPARQL queries, which you have tested.
don’t forget Assignment 4 (just published)
• Grades: • No exam necessary. • But no “Sehr Gut” unless you have been excellent in the assignments and in
your presentation. • I will send you some suggested grade after the presentation. • You can improve in an oral exam, if you want – by appointment.
Page 1
2012, Axel Polleres. All rights reserved.
Unit 7: Querying and Exchanging Data on the Web
Overview
• Linked Data – The idea • Why is it interesting for companies? • Which challenges are lying ahead? • XSPARQL: An approach to query and combine several Web Data Formats at
once.
Axel Polleres Page 3
Linked Data – The idea
1. Everything gets a URI (conferences, people, talks, …) 2. These URIs are linked via RDF describing relations 3. Relations are URIs again (e.g. :name) 4. When I dereference the URIs, I should find more information about them
4
4
Let Tim Berners-Lee explain it:
http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
(around 5:40)
http://www.ted.com/talks/tim_berners_lee_the_year_open_data_went_worldwide.html
Axel Polleres Page 5
Linked Data – The idea
2012, Axel Polleres. All rights reserved.
Why is this all interesting for companies?
© Siemens AG 2012. All rights reserved
Why is this interesting for companies?
Linked Data and Open Data (apart from Linked Open Data) are both emerging paradigms:
Linked Data apart from the “LOD cloud”: Enterprise Linked Data (for Knowledge Management within the Enterprise Online companies (eCommerce, Search) start to leverage and support Linked Data
2008-04-01 Author Page 7
Why is this interesting for companies?
Linked Data and Open Data (apart from Linked Open Data) are both emerging paradigms:
Open Data: Open Data is a trend towards transparency for Governments More Publically available Data leverages new Business Models (not only for SMEs!) Many Governments realize that Opening Data brings more revenue than selling it (EU) regulations force Cities and Governments to publish Data Trend towards harmonization (nationally, at European level, etc.)
2008-04-01 Author Page 8
Siemens Corporate Technology (CT) Networking the integrated technology company
Customers
Corporate Technology (CT)
Reg
ions
Sectors / Divisions
Energy Healthcare Industry Infrastructure & Cities
Chief Technology Officer (CTO)
Review innovation strategies
Drive technology based synergies
Secure innovation power
Technology assessments
Governance and guidance
Corporate Intellectual Property and Functions (CT IP) Intellectual property Standardization and regulation Information research
Corporate Research and Technologies (CT T) GTFs with multiple impact Pictures of the Future Accelerators
Chief Technology Office (CT O) Direct support
of CTO
Corporate Development Center (CT DC) Software development
partner for the Sectors
2012, Axel Polleres. All rights reserved.
Challenges ahead...
© Siemens AG 2012. All rights reserved
This Photo was taken by Böhringer Friedrich.
Challenges/Problems
The Linked Data Web is “brittle”…
Just like the normal Web is (did you ever try to run an HTML validator on google.com)?
Page 11
How good/bad is published Linked Data?
Page 12
ISWC2010
Journal of Web Semantics (forthcoming)
“Almost all infrastructural connectivity on the WoD is mediated by 3 servers, xmlns.com, dbpedia.org and purl.org, making the system very brittle.”
“conformance of data providers varies significantly for the different Linked Data guide- lines highlighted, which in turn may have implications for ad hoc consumers operating over the Web of Data.”
How much OWL is on the Web of Data? What’s missing for using Linked Data?
LDOW workshop @ WWW2012
DESWEB workshop @ICDE2012
Page 13
“Single-triple expressible OWL RL axioms are most prominent on the Web.”
“indexes for Linked Data in the Web are often incomplete and outdated.”
Needs rethinking in terms of applying traditional Database techniques.
Linked Data, RDFS and OWL: Linked Vocabularies
…
… Image from http://blog.dbtune.org/public/.081005_lod_constellation_m.jpg:; Giasson, Bergman
So what OWL is used out there?
Looked at Billion Triple Challenge 2011 Dataset 2.1 billion quadruples, crawled from… 7.4 million RDF/XML documents, covering… 791 (pay-level) domains
Count OWL features used in the dataset: Per use Per document Per domain Can be skewed by data
Ranked OWL features using PageRank: Rank documents based on dereferenceable links For each OWL feature, sum the rank of documents using it Intuition: Approximates probability of encountering an OWL feature
during a random walk of the data
Results of ranking (see paper for all details)
1 rdf:Property 5.74E-1 2 rdfs:range 4.67E-1 3 rdfs:domain 4.62E-1 4 rdfs:subClassOf 4.60E-1 5 rdfs:Class 4.45E-1 6 rdfs:subPropertyOf 2.35E-1 7 owl:Class 1.74E-1 8 owl:ObjectProperty 1.68E-1 9 rdfs:Datatype 1.68E-1 10 owl:DatatypeProperty 1.65E-1 11 owl:AnnotationProperty 1.60E-1 12 owl:FunctionalProperty 9.18E-2 13 owl:equivalentProperty 8.54E-2 14 owl:inverseOf 7.91E-2 15 owl:disjointWith 7.65E-2
Results of ranking (see paper for all details)
… 16 owl:sameAs 7.29E-2 17 owl:equivalentClass 5.24E-2 18 owl:InverseFunctionalProperty 4.79E-2 19 owl:unionOf 3.15E-2 20 owl:SymmetricProperty 3.13E-2 21 owl:TransitiveProperty 2.98E-2 22 owl:someValuesFrom 2.13E-2 23 rdf:_* 1.42E-2 24 owl:allValuesFrom 2.98E-3 25 owl:minCardinality 2.43E-3 26 owl:maxCardinality 2.14E-3 27 owl:cardinality 1.75E-3 28 owl:oneOf 4.13E-4 29 owl:hasValue 3.91E-4 30 owl:intersectionOf 3.37E-4 31 owl:NamedIndividual 3.37E-4
Observations?
RDFS features amongst the most prominently used OWL 2 features not yet used prominently
RDF | RDFS | OWL | OWL 2 x-axis is log-scale!
Observations?
(OWL) Features expressed with a single RDF triple are most prominent Roughly speaking, features not requiring blank nodes
e.g., sub-class/-property, inverse-of, equivalent property/class, sameas, domain/range, disjoint with, etc.
Not those requiring lists or n-ary predicate in RDF mapping e.g., union, intersection, cardinalities, all-disjoint, some/all/has-value restrictions, hasKey, pCAs, etc.
Single Triple (No BNodes) | Multi-Triple (Needs BNodes) x-axis is log-scale!
What Reasoning is needed?
Bottomline: A subset of OWL 2 RL (which is efficiently implementable, i.e. without ABox-joins) is sufficient to cover reasoning on most Linked Data sources!
Details, cf.
Page 20
However…
Not all Web Data is RDF (and OWL):
In fact, most Web Data is still in other formats: XML, CSV, JSON…
We need approaches to deal with these formats!
2012-04-17 Axel Polleres Page 21
XML
XML & RDF: one Web – two formats
<XML/> SOAP/WSDL
RSS HTML
SPARQL
XSLT/XQuery
XSPA
RQ
L
Page 22
A Sample Scenario…
Example: Favourite artists location
Using RDF allows to combine Last.fm info with other information on the web, e.g. location.
Last.fm knows what music you listen to, your most played artists, etc.
Display information about your favourite artists on a map
Show your top bands hometown in Google Maps.
Page 24
1) Get your favourite bands
Example: Favourite artists location How to implement the visualisation?
Last.fm shows your most listened bands
2) Get the hometown of the bands 3) Create a KML file to be displayed in Google Maps
Last.fm API:
http://www.last.fm/api
Last.fm is not so useful in this step
Page 25
1) Get your favourite bands
Example: Favourite artists location How to implement the visualisation?
2) Get the hometown of the bands 3) Create a KML file to be displayed in Google Maps
SPARQL XML Res
SPARQL
?
XQuery
XQuery
XQue
ry
Page 26
Transformation and Query Languages
XSLT
XML Transformation Language Syntax: XML
XPath
XPath is the common core Mostly used to select nodes
from an XML doc
SPARQL
Query Language for RDF Pattern based declarative
RDF world XML world
XQuery
XML Query Language non-XML syntax
Page 27
Querying XML Data from Last.fm with XQuery 1/2
Page 28
<lfm status="ok"> <topartists type="overall"> <artist rank="1"> <name>Therion</name> <playcount>4459</playcount> <url>http://www.last.fm/music/Therion</url> </artist> <artist rank="2"> <name>Nightwish</name> <playcount>3627</playcount> <url>http://www.last.fm/music/Nightwish</url> </artist> </topartists></lfm>
Last.fm API format: • root element: “lfm”, then “topartists” • sequence of “artist”
XPath steps: /lfm Selects the “lfm” root element
//artist Selects all the “artist” elements
XPath Predicates: //artist[@rank = 1]Selects the “artist” with rank 1
Querying this document with XPath:
Querying XML Data from Last.fm with XQuery 2/2
let $doc := "http://ws.audioscrobbler.com/2.0/user.gettopartist"for $artist in doc($doc)//artistwhere $artist[@rank = 2] return <artistData>{$artist}</artistData>
Query: Retrieve information regarding a users' 2nd top artists from the
Last.fm API
assign values to variables
iterate over sequences
filter expressions
create XML elements
Page 29
Querying XML Data from Last.fm 2/2
let $doc := "http://ws.audioscrobbler.com/2.0/user.gettopartist"for $artist in doc($doc)//artistwhere $artist[@rank = 2] return <artistData>{$artist}</artistData>
Query: Retrieve information regarding a users' 2nd top artists from the
Last.fm API
Result for user “jacktrades”
Page 30
Now what about RDF Data?
Lots of RDF Data out there, ready to “query the Web”
Page 31
XML vs. RDF
XML: “treelike” semi-structured Data (mostly schema-less, but “implicit” schema by tree structure… not easy to combine, e.g. how to combine lastfm data with wikipedia data?
2012-04-17 Axel Polleres Page 32
accountNam
e
likes
“Jacktrades”
RDF Simple, declarative, graph-style format based on dereferenceable URIs (= Linked Data)
Page 33
Kitee label
” Kitee”@en
Finland
Nightwish
origin
“Nightwish” label
33
<http://dbpedia.org/resource/Nightwish> <http://dbpedia.org/property/origin>
<http://dbpedia.org/resource/Kitee> .
<http://dbpedia.org/resource/Nightwish> <http://www.w3.org/2000/01/rdf-schema#label>
”Kitee”@es .
_:x <http://xmlns.com/foaf/0.1/accountName> “Jacktrades" .
_:x <http://graph.facebook.com/likes> <http://dbpedia.org/resource/Nightwish> .
XSPARQL
Transformation language
Consume and generate XML and RDF Syntactic extension of XQuery, ie.
XSPARQL = XQuery + SPARQL
34
Idea: One approach to conveniently query XML and RDF side-by-side: XSPARQL
Data Input (XML or RDF)
XSPARQL: Syntax overview (I)
Prefix declarations
Data Output (XML or RDF)
Page 35
“SPARQL-FOR-Clause” represents a SPARQL query
36
XSPARQL Syntax overview (II)
36
XQuery or SPARQL prefix declarations Any XQuery query
construct allows to create RDF
Page 36
Use case
37 Page 37
XSPARQL: Convert XML to RDF
prefix lastfm: <http://xsparql.deri.org/lastfm#>
let $doc := "http://ws.audioscrobbler.com/2.0/?method=user.gettopartists"for $artist in doc($doc)//artistwhere $artist[@rank < 6]construct { [] lastfm:topArtist {$artist//name}; lastfm:artistRank {$artist//@rank} . }
38
@prefix lastfm: <http://xsparql.deri.org/lastfm#> .
[ lastfm:topArtist "Therion" ; lastfm:artistRank "1" ] .[ lastfm:topArtist "Nightwish" ; lastfm:artistRank "2" ] .[ lastfm:topArtist "Blind Guardian" ; lastfm:artistRank "3" ] .[ lastfm:topArtist "Rhapsody of Fire" ; lastfm:artistRank "4" ] .[ lastfm:topArtist "Iced Earth" ; lastfm:artistRank "5" ] .
Result:
Query: Convert Last.fm top artists of a user into RDF
XSPARQL construct generates valid Turtle RDF
Page 38
Use case
Page 39
XSPARQL: Integrate RDF sources
prefix dbprop: <http://dbpedia.org/property/> prefix foaf: <http://xmlns.com/foaf/0.1/>
construct { $artist foaf:based_near $origin }from <http://dbpedia.org/resource/Nightwish>where { $artist dbprop:origin $origin }
Issue: determining the artist identifiers
Query: Retrieve the origin of an artist from DBPedia: Same as the SPARQL query
DBPedia does not have the map coordinates
Page 40
XSPARQL: Integrate RDF sources
Query: Retrieve the origin of an artist from DBPedia including map coordinates
DBPedia does not have the map coordinates
prefix wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>prefix dbprop: <http://dbpedia.org/property/>
for * from <http://dbpedia.org/resource/Nightwish>where { $artist dbprop:origin $origin }return let $hometown := fn:concat("http://api.geonames.org/search?type=rdf&q=",fn:encode-for-uri($origin))for * from $hometownwhere { [] wgs84_pos:lat $lat; wgs84_pos:long $long }limit 1construct { $artist wgs84_pos:lat $lat; wgs84_pos:long $long }
Page 41
Use case
Page 42
Output: KML XML format
43
<kml xmlns="http://www.opengis.net/kml/2.2"> <Document> <Placemark> <name>Hometown of Nightwish</name> <Point> <coordinates> 30.15,62.1,0 </coordinates> </Point> </Placemark> </Document></kml>
KML format: • root element: “kml”, then “Document” • sequence of “Placemark” • Each “Placemark” contains:
• “Name” element • “Point” element with the “coordinates”
Axel Polleres Page 43
prefix dbprop: <http://dbpedia.org/property/>
<kml><Document>{let $doc := "http://ws.audioscrobbler.com/2.0/?method=user.gettopartists”for $artist in doc($doc)//artistreturn let $artistName := fn:data($artist//name) let $uri := fn:concat("http://dbpedia.org/resource/", $artistName) for $origin from $uri where { [] dbprop:origin $origin } return let $hometown := fn:concat("http://api.geonames.org/search?type=rdf&q=", fn:encode-for-uri($origin)) for * from $hometown where { [] wgs84_pos:lat $lat; wgs84_pos:long $long } limit 1 return <Placemark> <name>{fn:concat("Hometown of ", $artistName)}</name> <Point><coordinates>{fn:concat($long, ",", $lat, ",0")} </coordinates></Point> </Placemark>}</Document></kml>
XSPARQL: Putting it all together Query: Display top artists origin in a map
Page 44
Axel Polleres Page 45
XSPARQL: Demo
45
http://xsparql.deri.org/lastfm
XSPARQL: another example…
46
Federated Queries in SPARQL1.1
PREFIX dbpedia2: <http://dbpedia.org/property/> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?N ?MyB FROM <http://polleres.net/foaf.rdf> { [ foaf:birthday ?MyB ].
SERVICE <http://dbpedia.org/sparql> { SELECT ?N WHERE { [ dbpedia2:born ?B; foaf:name ?N ]. FILTER ( Regex(Str(?B),str(?MyB)) ) } } }
Find which persons in DBPedia have the same birthday as Axel (foaf-file):
SPARQL 1.1 has new feature SERVICE to query remote endpoints PREFIX dbpedia2: <http://dbpedia.org/property/> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?N ?MyB FROM <http://polleres.net/foaf.rdf> { [ foaf:birthday ?MyB ].
SERVICE <http://dbpedia.org/sparql> { SELECT ?N WHERE { [ dbpedia2:born ?B; foaf:name ?N ]. FILTER ( Regex(str(?B),str(?MyB)) ) } } }
Doesn’t work!!! ?MyB unbound in SERVICE query
Page 47
Federated Queries in SPARQL1.1
48
PREFIX dbpedia2: <http://dbpedia.org/property/> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?N ?MyB FROM <http://polleres.net/foaf.rdf> { [ foaf:birthday ?MyB ].
SERVICE <http://dbpedia.org/sparql> { SELECT ?N WHERE { [ dbpedia2:born ?B; foaf:name ?N ]. FILTER ( Regex(Str(?B),str(?MyB)) ) } } }
Find which persons in DBPedia have the same birthday as Axel (foaf-file):
SPARQL 1.1 has new feature SERVICE to query remote endpoints PREFIX dbpedia2: <http://dbpedia.org/property/> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?N ?MyB FROM <http://polleres.net/foaf.rdf> { [ foaf:birthday ?MyB ].
SERVICE <http://dbpedia.org/sparql> { SELECT ?N WHERE { [ dbpedia2:born ?B; foaf:name ?N ]. } }
FILTER ( Regex(Str(?B),str(?MyB)) ) }
Doesn’t work either in practice as SERVICE endpoints often only returns limited results…
Axel Polleres Page 48
Federated Queries
49
prefix dbprop: <http://dbpedia.org/property/> prefix foaf: <http://xmlns.com/foaf/0.1/> prefix : <http://xsparql.deri.org/bday#>
let $MyB := for * from <http://polleres.net/foaf.rdf> where { [ foaf:birthday $B ]. } return $B
for * from <http://dbpedia.org/> endpoint <http://dbpedia.org/sparql> where { [ dbprop:born $B; foaf:name $N ]. filter ( regex(str($B),str($MyB)) ) } construct { :axel :sameBirthDayAs $N }
Specifies the endpoint to perform the query, similar to SERVICE in SPARQL1.1
Find which persons in DBPedia have the same birthday as Axel (foaf-file):
In XSPARQL:
Works! In XSPARQL bound values (?MyDB) are injected into the SPARQL subquery More direct control over “query execution plan”
Axel Polleres
Test Queries and play around…
2012-04-17 Axel Polleres Page 50
http://xsparql.deri.org/demo
Details about XSPARQL1.1 semantics and implementation
Check our Technical Report (just accepted at Springer’s Journal of Data Semantics):
Stefan Bischof, Stefan Decker, Thomas Krennwallner, Nuno Lopes, Axel Polleres. Mapping between RDF and XML with XSPARQL. Technical Report 2011. http://www.deri.ie/fileadmin/documents/DERI-TR-2011-04-04.pdf
BTW: First author started in this lecture two years ago! If you are interested in Internships, Diploma theses, PhD theses let me know!)
Page 51