Managing Semantic Graphs with Stardog 4

Pavel Klinov

Senior Research Engineer, Complexible Inc.

Based on Evren Sirin’s talk “Taming Big Data Variety with Semantic Graph Databases” at Smart Data 2015

Overview

Graphs, semantic graphs, and data variety

Semantic graphs and data integration

• RDF as unified data model

• Virtual graphs

A little on Stardog (RDF database)

About Complexible

Leading semtech provider since 2006 (aka Clark & Parsia)

• software (Pellet, Stardog)

• W3C participation

Released Stardog 1.0 in 2012 (current version 4.0.1)

Raising a Series A round

http://complexible.com


Big Data Vs

Volume, Velocity, Variety, Veracity, Volatility, Value

Data variety is the real challenge

Based on a Paradigm4 survey of more than 100 data scientists

http://www.paradigm4.com/infographic2014/

Data Variety

Syntax: formats

Structure: schemas

https://www.flickr.com/photos/designmilk/8552219138

In complex enterprises with lots of data variety, most analytic challenges can be reduced to data integration.

Data integration space

Figure: integrated data vs. integration effort, positioning data lakes, data warehouses, and the sweet spot between them.

Data integration challenge

Data lakes: many separate sources (RDB, RDB, RDB, …)

How to query this as a single integrated data source?

Unified Data Model

Global coherent view over heterogeneous data

flexible and extensible

at the right level of abstraction

enabling automated processing and analysis

• querying

• constraint validation

• reasoning (making implicit knowledge explicit)

Graphs are everywhere

Knowledge Graph

Open Graph

Linked Open Data

Why graphs?

Generic data representation model

Utilize connectedness of the data

Flexible and extensible

Easy to compose and connect

Increasing number of graph database offerings (Neo4j, Titan, …)

Why not graphs?

No standards for syntax, semantics, or queries

RDF, briefly

RDF addresses this standardization gap for graphs

RDF data is a set of triples (edges), for example:

<emp:John, emp:worksFor, emp:Google>

Originally developed to publish and link data on the Web, thus Linked Data

But it can serve as a general graph data model
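To make "a set of triples" concrete, here is a minimal sketch in plain Python (not Stardog's storage model): each edge is a (subject, predicate, object) tuple, the graph is a set of them, and a lookup is a pattern where `None` acts as a wildcard. Prefixed strings stand in for IRIs, and the `emp:Mary` edge and the `match` helper are hypothetical additions for illustration.

```python
from typing import Optional

# An RDF graph modeled as a set of (subject, predicate, object) tuples.
# Prefixed strings stand in for full IRIs.
Triple = tuple[str, str, str]

graph: set[Triple] = {
    ("emp:John", "emp:worksFor", "emp:Google"),
    ("emp:Mary", "emp:worksFor", "emp:Google"),  # hypothetical extra edge
}

def match(g: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return the triples that match the pattern; None acts as a wildcard."""
    return sorted(
        t for t in g
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Because the graph is a set, re-adding an existing edge changes nothing.
graph.add(("emp:John", "emp:worksFor", "emp:Google"))
print(len(graph))
print(match(graph, p="emp:worksFor", o="emp:Google"))
```

Set semantics is the point of the design: RDF graphs have no duplicate edges, so merging two graphs is just set union.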

Figure: Abstract Graph vs. RDF Graph (source: http://www.w3.org/TR/rdf11-primer/)

RDF graphs are semantic graphs

RDF graphs are graphs with meaning:

• explicit references to terms and their definitions

• definitions have formal semantics

Important for creating unified data models

• thus supporting data integration

Important for declaratively describing complex information processing tasks

RDF serialization

```turtle
BASE <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX schema: <http://schema.org/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX wd: <http://www.wikidata.org/entity/>

<bob#me>
    a foaf:Person ;
    foaf:knows <alice#me> ;
    schema:birthDate "1990-07-04"^^xsd:date ;
    foaf:topic_interest wd:Q12418 .

wd:Q12418
    dcterms:title "Mona Lisa" ;
    dcterms:creator <http://dbpedia.org/resource/Leonardo_da_Vinci> .

<http://data.europeana.eu/item/04802/243FA8618938F4117025F17A8B813C5F9AA4D619>
    dcterms:subject wd:Q12418 .
```
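The Turtle above is shorthand: every term resolves to a full IRI. A minimal sketch of that resolution, assuming only the three term shapes used in the subject/predicate positions here (relative IRIs in angle brackets, prefixed names, and the keyword `a`); literals such as `"1990-07-04"^^xsd:date` are deliberately not handled, and `expand` is a hypothetical helper, not part of any Turtle library.

```python
from urllib.parse import urljoin

BASE = "http://example.org/"
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "schema": "http://schema.org/",
    "dcterms": "http://purl.org/dc/terms/",
    "wd": "http://www.wikidata.org/entity/",
}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def expand(term: str) -> str:
    """Expand a Turtle IRI term (relative IRI, prefixed name, or 'a') to a full IRI."""
    if term == "a":
        return RDF_TYPE  # Turtle's 'a' abbreviates rdf:type
    if term.startswith("<") and term.endswith(">"):
        return urljoin(BASE, term[1:-1])  # relative IRIs resolve against BASE
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    raise ValueError(f"cannot expand {term!r}")

# The IRI-valued statements about <bob#me>, written as in the Turtle source.
for s, p, o in [("<bob#me>", "a", "foaf:Person"),
                ("<bob#me>", "foaf:knows", "<alice#me>"),
                ("<bob#me>", "foaf:topic_interest", "wd:Q12418")]:
    print(expand(s), expand(p), expand(o))
```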


SPARQL query

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX schema: <http://schema.org/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?person ?title WHERE {
  ?person a foaf:Person ;
          schema:birthDate ?birthDate ;
          foaf:topic_interest ?interest .
  ?interest dcterms:title ?title ;
            dcterms:creator dbpedia:Leonardo_da_Vinci .
  FILTER (?birthDate < "1991-01-01"^^xsd:date)
}
```
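Conceptually, the WHERE clause is a basic graph pattern: each triple pattern extends a set of variable bindings, and the FILTER prunes the result. The sketch below evaluates this query with a naive nested-loop join over the Turtle data. It is an illustration of SPARQL semantics only, not how Stardog plans queries; IRIs are shortened (e.g. `bob`, `europeana:item`) and `a` is written as `rdf:type` for readability.

```python
from datetime import date

# The Turtle example as tuples; the xsd:date literal becomes a Python date.
DATA = {
    ("bob", "rdf:type", "foaf:Person"),
    ("bob", "foaf:knows", "alice"),
    ("bob", "schema:birthDate", date(1990, 7, 4)),
    ("bob", "foaf:topic_interest", "wd:Q12418"),
    ("wd:Q12418", "dcterms:title", "Mona Lisa"),
    ("wd:Q12418", "dcterms:creator", "dbpedia:Leonardo_da_Vinci"),
    ("europeana:item", "dcterms:subject", "wd:Q12418"),
}

# The query's basic graph pattern; strings starting with '?' are variables.
QUERY = [
    ("?person", "rdf:type", "foaf:Person"),
    ("?person", "schema:birthDate", "?birthDate"),
    ("?person", "foaf:topic_interest", "?interest"),
    ("?interest", "dcterms:title", "?title"),
    ("?interest", "dcterms:creator", "dbpedia:Leonardo_da_Vinci"),
]

def is_var(x):
    return isinstance(x, str) and x.startswith("?")

def extend(binding, pattern, triple):
    """Unify one triple pattern with one data triple; None if they clash."""
    new = dict(binding)
    for pat, val in zip(pattern, triple):
        if is_var(pat):
            if pat in new and new[pat] != val:
                return None
            new[pat] = val
        elif pat != val:
            return None
    return new

def evaluate(patterns, data):
    """Nested-loop join: every pattern extends every surviving binding."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in data
                    if (b2 := extend(b, pattern, t)) is not None]
    return bindings

results = [(b["?person"], b["?title"])
           for b in evaluate(QUERY, DATA)
           if b["?birthDate"] < date(1991, 1, 1)]  # the FILTER
print(results)  # [('bob', 'Mona Lisa')]
```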



Schema (aka ontology)

Figure (built up over several slides): the schema links Person and Organization to Agent via rdfs:subClassOf, declares worksFor owl:inverseOf hasEmployee, and includes an rdfs:range axiom. The data adds Bob rdf:type Person and Bob worksFor ACME. The final builds add the inferred edges: ACME hasEmployee Bob, plus additional rdf:type edges.
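The inferred edges in the figure follow from three rules applied until nothing new appears: subclass propagation of rdf:type, rdfs:range typing, and owl:inverseOf. Below is a minimal forward-chaining sketch of just those rules. It is not Stardog's reasoner, which by default reasons at query time. One assumption to flag: the diagram does not say which property carries the rdfs:range axiom, so this sketch assumes `worksFor rdfs:range Organization`.

```python
# Short names stand in for full IRIs.
RDF_TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"
RANGE, INVERSE = "rdfs:range", "owl:inverseOf"

schema = {
    ("Person", SUBCLASS, "Agent"),
    ("Organization", SUBCLASS, "Agent"),
    ("worksFor", INVERSE, "hasEmployee"),
    ("worksFor", RANGE, "Organization"),  # ASSUMPTION: range axiom read off the diagram
}
data = {
    ("Bob", RDF_TYPE, "Person"),
    ("Bob", "worksFor", "ACME"),
}

def infer(triples):
    """Apply the three rules repeatedly until a fixpoint is reached."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                # x rdf:type C, C subClassOf D  =>  x rdf:type D
                if p == RDF_TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, RDF_TYPE, o2))
                # x P y, P range C  =>  y rdf:type C
                if p2 == RANGE and p == s2:
                    new.add((o, RDF_TYPE, o2))
                # x P y, P inverseOf Q  =>  y Q x (and the converse direction)
                if p2 == INVERSE and p == s2:
                    new.add((o, o2, s))
                if p2 == INVERSE and p == o2:
                    new.add((o, s2, s))
        if new <= closure:
            return closure
        closure |= new

kb = infer(schema | data)
for triple in sorted(kb - schema - data):
    print(triple)
```

The naive pairwise loop is quadratic per round, which is fine for a slide-sized graph; real reasoners index triples by predicate instead.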

Semantic models in RDF are:

Interoperable: no vendor lock-in

Actionable: run queries against them

Expressive: describe arbitrary (hyper) graphs

Flexible: adapt to changing data, new data, etc.

Reusable: by different apps in other domains

Viewing RDBs as RDF graphs

Take this:

Figure: relational database tables

And view it as something like:

Figure: an RDF graph (source: http://www.w3.org/TR/rdb2rdf-ucr/)

R2RML: mapping from RDB to RDF

R2RML is a W3C standard for mapping RDB sources to RDF

The mapping is conceptual; vendors can:

• extract, transform, and load the data as RDF

• query it on the fly (virtual graphs)

Direct and customizable mappings
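A direct mapping needs no hand-written rules: each row becomes a subject IRI built from the table name and primary key, each column becomes a predicate, and each row gets an rdf:type of its table. The sketch below follows the shape of the W3C Direct Mapping (`Table/pk=value`, `Table#column`, NULLs produce no triple) but simplifies it: foreign-key references and literal datatypes are ignored, and the base IRI and sample row are hypothetical.

```python
from urllib.parse import quote

BASE = "http://example.com/base/"  # hypothetical base IRI

def direct_map(table, pk, rows):
    """Simplified direct mapping: one subject per row, one predicate per column."""
    triples = []
    for row in rows:
        subject = f"{BASE}{quote(table)}/{quote(pk)}={quote(str(row[pk]))}"
        triples.append((subject, "rdf:type", BASE + quote(table)))
        for column, value in row.items():
            if value is not None:  # NULL cells produce no triple
                triples.append((subject, f"{BASE}{quote(table)}#{quote(column)}", value))
    return triples

# Hypothetical row in the EMP table used by the later examples.
for t in direct_map("EMP", "empno",
                    [{"empno": 7369, "ename": "SMITH", "job": "CLERK", "deptno": 20}]):
    print(t)
```

A customizable mapping (R2RML, or Stardog's own mapping syntax) replaces these mechanical IRIs and predicates with terms from the unified domain model.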

Virtual graphs in Stardog

1. Register: name, properties, mappings

2. Use in queries

```sparql
SELECT * {
  GRAPH <virtual://dept> {
    ?person a emp:Employee ;
            emp:department ?department .
  }
  ?department foaf:organization <urn:engineering> .
}
```
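The query above mixes two sources: the `GRAPH <virtual://dept>` block is answered from the relational database through the mapping, and the `foaf:organization` pattern from triples stored locally. A minimal sketch of that split, using an in-memory SQLite table as a hypothetical stand-in for the registered source; the rows, department IRIs, and the `urn:sales` value are made up, and Stardog actually translates the pattern into SQL rather than scanning rows like this.

```python
import sqlite3

# Hypothetical relational source standing in for the one behind <virtual://dept>.
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "EMP" ("empno" INTEGER PRIMARY KEY, "ename" TEXT, "deptno" INTEGER)')
db.executemany('INSERT INTO "EMP" VALUES (?, ?, ?)',
               [(7369, "SMITH", 20), (7499, "ALLEN", 30)])

# Triples stored locally in the graph database (hypothetical values).
local = {
    ("dept:20", "foaf:organization", "urn:engineering"),
    ("dept:30", "foaf:organization", "urn:sales"),
}

def virtual_dept():
    """Produce the virtual graph's triples from SQL at query time; nothing is copied."""
    for empno, deptno in db.execute('SELECT "empno", "deptno" FROM "EMP"'):
        yield (f"emp:{empno}", "rdf:type", "emp:Employee")
        yield (f"emp:{empno}", "emp:department", f"dept:{deptno}")

def engineering_staff():
    """GRAPH part against the virtual graph, foaf:organization against local triples."""
    vg = list(virtual_dept())
    employees = {s for s, p, o in vg if p == "rdf:type" and o == "emp:Employee"}
    return sorted(
        (person, dept)
        for person, p, dept in vg
        if p == "emp:department"
        and person in employees
        and (dept, "foaf:organization", "urn:engineering") in local
    )

print(engineering_staff())  # [('emp:7369', 'dept:20')]

# A row inserted later is visible to the next query, with no reload step.
db.execute('INSERT INTO "EMP" VALUES (?, ?, ?)', (7900, "JAMES", 20))
print(engineering_staff())  # [('emp:7369', 'dept:20'), ('emp:7900', 'dept:20')]
```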

Customizable mapping example

```
emp:{"empno"} a emp:Employee ;
    emp:name "{\"ename\"}" ;
    emp:role emp:{ROLE} ;
    emp:department dept:{"deptno"} ;
    sm:map [
        sm:query """
            SELECT \"empno\", \"ename\", \"deptno\",
                   (CASE \"job\"
                        WHEN 'CLERK' THEN 'general-office'
                        WHEN 'NIGHTGUARD' THEN 'security'
                        WHEN 'ENGINEER' THEN 'engineering'
                    END) AS ROLE
            FROM \"EMP\"
        """ ;
    ] .
```
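The mapping has two halves: an SQL query that reshapes the rows (here a CASE translates job codes into role names) and triple templates whose `{column}` placeholders are filled from each result row. A rough Python sketch of that evaluation over SQLite, with hypothetical sample rows; it is not Stardog's mapping engine, and the rule that a NULL in a referenced column suppresses the triple is an assumption borrowed from R2RML's template semantics.

```python
import re
import sqlite3

db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "EMP" ("empno" INTEGER, "ename" TEXT, "job" TEXT, "deptno" INTEGER)')
db.executemany('INSERT INTO "EMP" VALUES (?, ?, ?, ?)', [
    (7369, "SMITH", "CLERK", 20),
    (7902, "FORD", "ENGINEER", 20),
    (7839, "KING", "PRESIDENT", 10),  # job not covered by the CASE, so ROLE is NULL
])

# The sm:query from the mapping, without the backslash escapes.
SQL = """
SELECT "empno", "ename", "deptno",
       (CASE "job" WHEN 'CLERK' THEN 'general-office'
                   WHEN 'NIGHTGUARD' THEN 'security'
                   WHEN 'ENGINEER' THEN 'engineering' END) AS ROLE
FROM "EMP"
"""

# Triple templates; each {name} refers to a column of the SQL result.
TEMPLATES = [
    ("emp:{empno}", "rdf:type", "emp:Employee"),
    ("emp:{empno}", "emp:name", '"{ename}"'),
    ("emp:{empno}", "emp:role", "emp:{ROLE}"),
    ("emp:{empno}", "emp:department", "dept:{deptno}"),
]
FIELD = re.compile(r"\{(\w+)\}")

def run_mapping():
    cur = db.execute(SQL)
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        values = dict(zip(columns, row))
        for template in TEMPLATES:
            fields = FIELD.findall("".join(template))
            # ASSUMPTION (R2RML-style): a NULL in any referenced column yields no triple.
            if any(values[f] is None for f in fields):
                continue
            triples.append(tuple(part.format(**values) for part in template))
    return triples

for t in run_mapping():
    print(t)
```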

Data integration with unified domain model and R2RML

Reasoning with virtual graphs

Get results which

• do not exist in the data lakes

• but follow from the domain models and mappings

Turn your data lakes into deductive databases…

… without them noticing!

Reasoning with virtual graphs: example

Article database

Author | Article
John | http://nature.com/123

Publisher database

Publisher | Name
Springer | http://springer.com/LCNS

Goal: query for all publications across both databases

Figure (built up over several slides): the mapped data says John authors nature:123, nature:123 rdf:type Article, and Springer publishes springer:lncs. The domain model adds Article rdfs:subClassOf Publication and an rdfs:range of Publication for publishes, so both nature:123 and springer:lncs get an inferred rdf:type Publication edge.



Stardog: Semantic Graph Database

The leading RDF database

Pure Java: any JVM language, full REST bindings

Client-server, embedded, middleware modes

Rich feature set

Supports property graphs (TinkerPop)

ACID Transactions, High Availability, Hot backup/restore, JMX server monitoring, Access & Audit logging, RBAC security model, LDAP integration, SPARQL 1.1 queries, OWL 2 Reasoning, Proof trees, Integrity constraints, Full-text search, Geospatial support, Virtual graphs, Provenance support

Single-node Scalability

Scale up to 50B triples on modest hardware

● 32 cores, 256 GB RAM, 2 x 7200RPM HDDs, < $10K cost

Load rates up to 500k triples/second

● That’s 100M triples in 3 min, 1B in 30 min, and 20B in 20 hours

Best-of-breed query answering performance

● Query 100M triples with a throughput of 3M+ queries/hour, 1B at 500k queries/hour, and 10B at 20k queries/hour (BSBM, 64 clients)
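A quick back-of-the-envelope check of the figures above, using only the numbers on the slide. At the peak rate, 100M and 1B triples take about 3.3 and 33 minutes, matching the rounded claims, while 20B would take about 11 hours. The stated 20 hours therefore implies a lower sustained rate at that scale, consistent with 500k/s being an "up to" figure. The throughput numbers are also converted to queries per second.

```python
PEAK_RATE = 500_000  # triples/second, the slide's "up to" figure

def load_time_hours(triples, rate=PEAK_RATE):
    """Hours needed to load `triples` at a constant `rate`."""
    return triples / rate / 3600

for n in (100e6, 1e9, 20e9):
    print(f"{n:.0e} triples: {load_time_hours(n) * 60:.1f} min at the peak rate")

# Rate implied by the slide's "20B in 20 hours"
sustained = 20e9 / (20 * 3600)
print(f"implied rate for 20B: {sustained:,.0f} triples/s")  # 277,778

# BSBM throughput (64 clients), converted from per hour to per second
for label, per_hour in (("100M", 3e6), ("1B", 500e3), ("10B", 20e3)):
    print(f"{label}: {per_hour / 3600:,.1f} queries/s")
```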

Stardog for Big Data (coming 2016)

HDFS-backed storage

Horizontal partitioning of data

Advanced query planner and optimization

Parallel query execution with async messaging

Questions?

@klinovp

http://complexible.com, http://stardog.com

