Post on 20-Aug-2015
Releasing Relational Data to the Semantic Web
Alex Miller
amiller@revelytix.com
On the web, we identify things with a URI.
A URI is about "identifying" things, not "locating" things (a URL).
dbp: http://dbpedia.org/resource/

dbp:Chicago
dbp:The_Blues_Brothers_(film)
dbp:Wrigley_Field
dbp:Chicago_Cubs
dbp:Barack_Obama
dbp:Pizza
dbp:Chicago_(band)
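A prefixed name such as dbp:Chicago is shorthand for a full URI. The following is a minimal Python sketch of that expansion, using the two prefixes shown on the slides; the helper name `expand` is hypothetical and not part of any library.

```python
# Prefix table copied from the slides. expand() is a hypothetical helper
# written for illustration only.
PREFIXES = {
    "dbp": "http://dbpedia.org/resource/",
    "dbpo": "http://dbpedia.org/ontology/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dbp:Chicago' into a full URI."""
    prefix, _, local = curie.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return PREFIXES[prefix] + local


print(expand("dbp:Chicago"))
print(expand("dbp:Chicago_(band)"))
```

The city and the band share a label but expand to different URIs, which is the point of identifying things by URI rather than by name.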
dbp: http://dbpedia.org/resource/
dbpo: http://dbpedia.org/ontology/

[Diagram: the resources dbp:Chicago, dbp:The_Blues_Brothers_(film), dbp:Wrigley_Field, dbp:Chicago_Cubs, dbp:Barack_Obama, dbp:Pizza and dbp:Chicago_(band), linked by the predicates movie:film_location, dbpo:location, dbpo:owner and dbpo:residence]
[Diagram: the same graph, with one edge labelled Subject, Predicate, Object]
dbp:Wrigley_Field   dbpo:location   dbp:Chicago
<subject>           <predicate>     <object>
resource            resource        resource or value
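A triple maps naturally onto a three-element tuple. The sketch below, in Python, stores triples in a set and looks them up by pattern. The first triple is the one on the slide; the second follows the dbpo:owner pattern used later in the deck. The function name `match` is hypothetical.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# The first triple is taken from the slide; the second is assumed from the
# dbpo:owner pattern that appears later in the deck.
triples: Set[Triple] = {
    ("dbp:Wrigley_Field", "dbpo:location", "dbp:Chicago"),
    ("dbp:Chicago_Cubs", "dbpo:owner", "dbp:Wrigley_Field"),
}


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


print(match(p="dbpo:location"))
```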
dbp: http://dbpedia.org/resource/
ex: http://example.org/ontology/
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs: http://www.w3.org/2000/01/rdf-schema#

dbp:Chicago        is a   ex:City
dbp:Saint_Louis    is a   ex:City
dbp:San_Francisco  is a   ex:City
dbp:Chicago        rdf:type   ex:City
dbp:Saint_Louis    rdf:type   ex:City
dbp:San_Francisco  rdf:type   ex:City
ex:City            rdf:type   rdfs:Class
ex:City       rdfs:subClassOf   ex:Location
ex:City       rdf:type          rdfs:Class
ex:Location   rdf:type          rdfs:Class
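Under RDFS semantics, rdfs:subClassOf means every instance of the subclass is also an instance of the superclass. The Python sketch below applies that one rule to the example classes; the function name `infer_types` and the in-memory set representation are assumptions made for illustration, not a full RDFS reasoner.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

graph = {
    ("dbp:Chicago", RDF_TYPE, "ex:City"),
    ("ex:City", SUBCLASS, "ex:Location"),
}


def infer_types(triples):
    """Apply the subclass rule repeatedly until no new rdf:type triple appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (x, p1, c) in list(inferred):
            if p1 != RDF_TYPE:
                continue
            for (c2, p2, d) in list(inferred):
                is_superclass = p2 == SUBCLASS and c2 == c
                if is_superclass and (x, RDF_TYPE, d) not in inferred:
                    inferred.add((x, RDF_TYPE, d))
                    changed = True
    return inferred


for triple in sorted(infer_types(graph) - graph):
    print(triple)
```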
dbp:Chicago   rdf:type     ex:City
dbp:Chicago   ex:founded   1837
dbp:Chicago   ex:country   dbp:United_States
dbp:Chicago   ex:founded    1837
ex:founded    rdf:type      rdf:Property
ex:founded    rdfs:domain   ex:City
ex:founded    rdfs:range    xsd:gYear
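In RDFS, rdfs:domain is not a constraint but an entailment: any subject that uses the property is inferred to be an instance of the domain class. A minimal Python sketch of that rule, using the ex:founded example; the function name `infer_from_domain` is hypothetical.

```python
graph = {
    ("ex:founded", "rdfs:domain", "ex:City"),
    ("dbp:Chicago", "ex:founded", "1837"),
}


def infer_from_domain(triples):
    """Add (s, rdf:type, C) for every triple (s, p, o) where p has domain C."""
    domains = [(s, o) for (s, p, o) in triples if p == "rdfs:domain"]
    inferred = set(triples)
    for prop, cls in domains:
        for (s, p, _o) in triples:
            if p == prop:
                inferred.add((s, "rdf:type", cls))
    return inferred


for triple in sorted(infer_from_domain(graph) - graph):
    print(triple)
```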
dbp:Chicago_Cubs    dbpo:owner      dbp:Wrigley_Field
dbp:Wrigley_Field   dbpo:location   dbp:Chicago
dbp:Chicago         rdf:type        ex:City
dbp:Wrigley_Field   rdf:type        ex:Stadium
dbp:Chicago_Cubs    rdf:type        ex:Baseball_Team
[Diagram: ?owner, ?stadium and ?city replace the concrete resources in the graph]

?owner dbpo:owner ?stadium .
?stadium dbpo:location ?city .
?stadium rdf:type ex:Stadium .
?city rdf:type ex:City .
SELECT ?owner ?stadium ?city
WHERE {
  ?owner dbpo:owner ?stadium .
  ?stadium dbpo:location ?city .
  ?stadium rdf:type ex:Stadium .
  ?city rdf:type ex:City .
}
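The query above is a basic graph pattern: each triple pattern is matched against the data, and variables shared between patterns act as joins. The Python sketch below evaluates that pattern naively over the stadium example. The functions `unify` and `solve` are hypothetical names, and this is not how a production SPARQL engine is implemented.

```python
data = {
    ("dbp:Chicago_Cubs", "dbpo:owner", "dbp:Wrigley_Field"),
    ("dbp:Wrigley_Field", "dbpo:location", "dbp:Chicago"),
    ("dbp:Wrigley_Field", "rdf:type", "ex:Stadium"),
    ("dbp:Chicago", "rdf:type", "ex:City"),
}

pattern = [
    ("?owner", "dbpo:owner", "?stadium"),
    ("?stadium", "dbpo:location", "?city"),
    ("?stadium", "rdf:type", "ex:Stadium"),
    ("?city", "rdf:type", "ex:City"),
]


def unify(pattern_triple, data_triple, binding):
    """Extend binding so pattern_triple matches data_triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern_triple, data_triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def solve(patterns, triples):
    """Join the triple patterns one at a time, carrying bindings forward."""
    bindings = [{}]
    for pat in patterns:
        bindings = [
            extended
            for b in bindings
            for t in triples
            if (extended := unify(pat, t, b)) is not None
        ]
    return bindings


solutions = solve(pattern, data)
print(solutions)
```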
SPARQL
• Unions
• Joins
• Outer joins
• Filter with criteria
• Project expressions
• Sort
• Duplicate removal
• Slice (limit / offset)
• Aggregates (grouping, etc.)
• Subqueries
Music Database
Musicians:
MID  First  Last       Inst_ID
1    Eddie  Van Halen  10
2    Yo Yo  Ma         20
3    Kenny  G          30

Instruments:
IID  Instrument  Type
10   Guitar      String
20   Cello       String
30   Saxophone   Woodwind
Musician Schema
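music:Musician     rdf:type      rdfs:Class
music:Instrument   rdf:type      rdfs:Class

music:firstName    rdf:type      rdf:Property
music:lastName     rdf:type      rdf:Property
music:plays        rdf:type      rdf:Property
music:instName     rdf:type      rdf:Property
music:instType     rdf:type      rdf:Property

music:firstName    rdfs:domain   music:Musician
music:lastName     rdfs:domain   music:Musician
music:plays        rdfs:domain   music:Musician
music:plays        rdfs:range    music:Instrument
music:instName     rdfs:domain   music:Instrument
music:instType     rdfs:domain   music:Instrument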
Triples From Tables
Turn each key into a resource and specify the proper type of each resource:

Musicians:
artist:1 rdf:type music:Musician
artist:2 rdf:type music:Musician
artist:3 rdf:type music:Musician

Instruments:
instrument:10 rdf:type music:Instrument
instrument:20 rdf:type music:Instrument
instrument:30 rdf:type music:Instrument
Triples From Tables
Turn each cell into a triple based on the key, property (mapped per column), and value:

Musicians:
artist:1 music:firstName "Eddie"
artist:1 music:lastName "Van Halen"
artist:2 music:firstName "Yo Yo"
artist:2 music:lastName "Ma"
artist:3 music:firstName "Kenny"
artist:3 music:lastName "G"

Instruments:
instrument:10 music:instName "Guitar"
instrument:10 music:instType "String"
instrument:20 music:instName "Cello"
instrument:20 music:instType "String"
instrument:30 music:instName "Saxophone"
instrument:30 music:instType "Woodwind"
Triples From Tables
Turn each foreign key reference into a relationship between the foreign and primary resources:

Musicians:
artist:1 music:plays instrument:10
artist:2 music:plays instrument:20
artist:3 music:plays instrument:30
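The three "Triples From Tables" steps (type each key, emit one triple per cell, turn foreign keys into links) can be sketched in Python as follows. The rows are the ones from the Music Database slide; the function names and the dictionary layout are assumptions made for illustration.

```python
musicians = [
    {"MID": 1, "First": "Eddie", "Last": "Van Halen", "Inst_ID": 10},
    {"MID": 2, "First": "Yo Yo", "Last": "Ma", "Inst_ID": 20},
    {"MID": 3, "First": "Kenny", "Last": "G", "Inst_ID": 30},
]
instruments = [
    {"IID": 10, "Instrument": "Guitar", "Type": "String"},
    {"IID": 20, "Instrument": "Cello", "Type": "String"},
    {"IID": 30, "Instrument": "Saxophone", "Type": "Woodwind"},
]


def musician_triples(row):
    subject = f"artist:{row['MID']}"
    return [
        # Step 1: the primary key becomes a typed resource.
        (subject, "rdf:type", "music:Musician"),
        # Step 2: each remaining cell becomes a literal-valued triple.
        (subject, "music:firstName", row["First"]),
        (subject, "music:lastName", row["Last"]),
        # Step 3: the foreign key becomes a link to another resource.
        (subject, "music:plays", f"instrument:{row['Inst_ID']}"),
    ]


def instrument_triples(row):
    subject = f"instrument:{row['IID']}"
    return [
        (subject, "rdf:type", "music:Instrument"),
        (subject, "music:instName", row["Instrument"]),
        (subject, "music:instType", row["Type"]),
    ]


triples = [t for row in musicians for t in musician_triples(row)]
triples += [t for row in instruments for t in instrument_triples(row)]

for triple in triples:
    print(triple)
```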
R2RML
• "RDB to RDF Mapping Language"
• RDB2RDF Working Group at W3C
• ETL "data transformation" use case
• Dynamic "query translation" use case
• SPARQL to SQL
R2RML Triple Mapping

Instruments:
IID  Instrument  Type
10   Guitar      String

music:instName   rdfs:domain   music:Instrument
music:instType   rdfs:domain   music:Instrument

Triples Map (rr:tableName)
  Subject Map: "http://example.com/music/Inst-{iid}" (rr:class)
  Predicate Object Map
    Predicate Map (rr:predicate)
    Object Map (rr:column)
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix music: <http://example.com/music/> .
@prefix mapping: <http://example.com/ont/> .

mapping:InstrumentMapping a rr:TriplesMapClass;
    rr:tableName "Instruments";
    rr:subjectMap [
        rr:template "http://example.com/music/Inst-{iid}";
        rr:class music:Instrument
    ];
    rr:predicateObjectMap [
        rr:predicateMap [ rr:predicate music:instName ];
        rr:objectMap [ rr:column "instrument" ];
    ];
    rr:predicateObjectMap [
        rr:predicateMap [ rr:predicate music:instType ];
        rr:objectMap [ rr:column "type" ];
    ];
.
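To illustrate what a processor does with the mapping above in the ETL case, the Python sketch below applies the same subject template, class, and column-to-predicate pairs to one row. The dictionary is a hand-written stand-in for the Turtle mapping, and `apply_mapping` is a hypothetical function; a real R2RML processor would parse the mapping and percent-encode template values, which this sketch does not do.

```python
# Hand-written stand-in for the Turtle mapping; not parsed from R2RML.
mapping = {
    "table": "Instruments",
    "template": "http://example.com/music/Inst-{iid}",
    "class": "music:Instrument",
    "predicate_object_maps": [
        ("music:instName", "instrument"),
        ("music:instType", "type"),
    ],
}

rows = [{"iid": 10, "instrument": "Guitar", "type": "String"}]


def apply_mapping(mapping, rows):
    """Generate triples for each row according to the mapping."""
    triples = []
    for row in rows:
        # Subject map: fill the template from the row's columns.
        subject = mapping["template"].format(**row)
        triples.append((subject, "rdf:type", mapping["class"]))
        # Predicate-object maps: one triple per mapped column.
        for predicate, column in mapping["predicate_object_maps"]:
            triples.append((subject, predicate, row[column]))
    return triples


for triple in apply_mapping(mapping, rows):
    print(triple)
```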
SPARQL Protocol
• Standard HTTP API for calling a SPARQL processor
• Supported by all major triple stores and query processors
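In the SPARQL Protocol, a query can be sent as an HTTP GET with the query text in a `query` parameter. The Python sketch below only builds such a request with the standard library and does not send it; the endpoint URL is a placeholder and `build_sparql_request` is a hypothetical helper.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) an HTTP GET request carrying a SPARQL query."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialization of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# http://example.org/sparql is a placeholder, not a real endpoint.
request = build_sparql_request(
    "http://example.org/sparql",
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call against a real endpoint.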
SPARQL Federation
SELECT ?artist ?song ?buyLink
WHERE {
  SERVICE <http://listening> {
    ?listened rdf:type listen:event .
    ?listened listen:artist ?artist .
    ?listened listen:song ?song
  }
  OPTIONAL {
    SERVICE <http://amazon> {
      ?isbn rdf:type amaz:mp3 .
      ?isbn amaz:artist ?artist .
      ?isbn amaz:song ?song .
      ?isbn amaz:link ?buyLink
    }
  }
}
Call SPARQL endpoint that tracks your listening (like last.fm)
Call Amazon endpoint to get info on where to download the song.
Return Federated data
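The federated query combines the two SERVICE results with OPTIONAL, which behaves like a left outer join on the shared variables ?artist and ?song. The Python sketch below imitates that combination step on two in-memory result sets; the artist names, songs, and link are made-up sample data, and `left_join` is a hypothetical helper, not part of any federation engine.

```python
# Made-up sample bindings standing in for the two SERVICE results.
listening = [
    {"artist": "Artist A", "song": "Song 1"},
    {"artist": "Artist B", "song": "Song 2"},
]
store = [
    {"artist": "Artist A", "song": "Song 1",
     "buyLink": "http://example.org/buy/1"},
]


def left_join(left, right):
    """Keep every left binding; extend it when a compatible right binding exists."""
    results = []
    for l in left:
        matches = [
            {**l, **r}
            for r in right
            # Compatible means equal values on all shared variables.
            if all(l[k] == r[k] for k in l.keys() & r.keys())
        ]
        results.extend(matches or [l])
    return results


federated = left_join(listening, store)
for row in federated:
    print(row)
```

A song with no match in the store is still returned, only without a ?buyLink value, which is what OPTIONAL expresses.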
Federator
[Diagram: the Federator connected to an R2RML Endpoint (Database), a Web Endpoint, and SPARQL Endpoints (Triple Store, DBpedia), plus an Ontology and service registry]
Named graph mapping
• Services can provide named graphs, described in their service description
• Federator lets you create federated named graphs that map to service named graphs
Data integration
• Performance - data volume from sources is key
• Source capabilities
• Source statistics
Performance concerns: data volume
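[Diagram: a query flows from the Domain to the source and results flow back. The SPARQL criterion SELECT ... FILTER (?age >= 24) ... becomes the SQL criterion WHERE Person.age >= 24.]

Reduction factors:
• criteria
• minimal projection
• aggregation
• joins (sometimes)
• duplicate removal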
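The effect of pushing criteria down to the source can be seen with an in-memory SQLite table. In this Python sketch, the Person table and the age criterion follow the slide, while the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (name TEXT, age INTEGER)")
# Invented sample rows.
conn.executemany(
    "INSERT INTO Person VALUES (?, ?)",
    [("Ann", 31), ("Bob", 19), ("Cy", 24), ("Di", 17)],
)

# Without pushdown: every row leaves the source and is filtered afterwards.
all_rows = conn.execute("SELECT name, age FROM Person").fetchall()
filtered_locally = [row for row in all_rows if row[1] >= 24]

# With pushdown: the criterion travels with the query, so fewer rows move.
pushed_down = conn.execute(
    "SELECT name, age FROM Person WHERE age >= 24"
).fetchall()

print("rows transferred without pushdown:", len(all_rows))
print("rows transferred with pushdown:", len(pushed_down))
```

Both approaches produce the same answer; the difference is how much data crosses the boundary between the source and the federator.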
Data source capabilities
• SQL support
• Function support
• Function translation
• Inverse functions
• Data type mappings and translations
Data source statistics
• Table cardinality
• Column selectivity
• Column null density
• Join selectivity
“Which Marines that speak French and/or French Creole have had at least six months since their last deployment?”
Technologies
[Diagram: HR Domain, Mapping, Sources and HR Standards, annotated with the technologies Ontology development, Analytics, SPARQL Federation, SPARQL to database and Rules]
Collaborative Ontologies
[Diagram: Ontologists and Subject Matter Experts work on a shared Domain ontology through model, diagram, wiki and discuss activities]
RIF
• Rule Interchange Format, W3C recommendation
• Rule = IF - THEN statement
• Used to derive new triples from existing triples
• Dialects
  • Core
  • Framework for Logic Dialects (FLD)
  • Basic Logic Dialect (BLD)
  • Production Rules Dialect (PRD)
• Rex - Revelytix RIF Core implementation
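As an illustration of "IF - THEN" rules deriving new triples, the Python sketch below hard-codes one rule over the music data: IF a musician plays an instrument AND that instrument has type "String" THEN the musician is a string player. The class music:StringPlayer and the function `apply_rule` are hypothetical, and this is plain Python, not RIF syntax and not how Rex is implemented.

```python
facts = {
    ("artist:1", "music:plays", "instrument:10"),
    ("instrument:10", "music:instType", "String"),
    ("artist:3", "music:plays", "instrument:30"),
    ("instrument:30", "music:instType", "Woodwind"),
}


def apply_rule(triples):
    """IF ?m music:plays ?i AND ?i music:instType "String"
    THEN ?m rdf:type music:StringPlayer (a hypothetical class)."""
    derived = set(triples)
    for (m, p, i) in triples:
        if p != "music:plays":
            continue
        if (i, "music:instType", "String") in triples:
            derived.add((m, "rdf:type", "music:StringPlayer"))
    return derived


for triple in sorted(apply_rule(facts) - facts):
    print(triple)
```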
Enterprise Semantic Web
• Knoodl - collaborative ontology creation
• OntVis - ontology visualization (OWL)
• Spyder - SPARQL to SQL (RDF, R2RML)
• Federator - SPARQL federation (SPARQL 1.1, SPARQL Federation extensions)
• Rex - entailment with rules (RIF)
• Dashboards - analytics, visualization
More information
• Revelytix - http://revelytix.com
• Knoodl - http://knoodl.com
• OntVis - http://bit.ly/hLm3sd
• Spyder - http://revelytix.com/content/spyder
• Federator - beta coming soon...
• Rex - beta coming soon...