SHACL: Shaping the Big Ball of Data Mud

Post on 15-Apr-2017

Shaping the Big Ball of Data Mud

W3C's Shapes Constraint Language (SHACL)

Richard Cyganiak

Lotico Berlin Semantic Web Meetup, 17 November 2016

Semantic Web

RDF, SPARQL, OWL, RDFS

Strengths
• Flexible can-say-anything data model
• Merging data is trivial
• Shared, explicit meaning thanks to URIs
• Mixing and matching of schemas; partial understanding
• Painstakingly developed vocabularies
• “Neutral ground” for modelling
• SPARQL

Weaknesses
• Overgeneralisation: works for anything, but great at nothing
• “RDF tax”
• Logic foundations and web foundations can be baggage
• Maps poorly to common programming language data structures
• Schemaless nature makes optimisation difficult
• Not good at semi-structured

Application Areas
• Knowledge graphs
• Publishing
• Life sciences
• Fraud detection & identity management
• Data integration & analysis

The V’s of Big Data: Volume, Velocity, Variety

https://www.w3.org/blog/2010/05/linked-data-its-is-not-like-th/

Validation? Constraint checking?

RDF is supposedly self-describing.

RDF

Schema.org

Simple Knowledge Organization System (SKOS)

Dublin Core

Data Cube Vocabulary

R2RML

Linked Data Platform (LDP)

Why is RDFS not enough?

Why is RDFS not enough?
• RDF “Schema”, and schemas are for validation, right?
• It’s a misnomer; should be “RDF Vocabulary Definition Language”
• Very limited expressivity
• Not the right semantics for validation
• ex:capital rdfs:range ex:City . ex:Berlin ex:capital ex:Germany . => …?
  => ex:Germany a ex:City
• Invalid data -> infer more invalid data
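A minimal sketch of why RDFS range semantics do not validate anything. It uses plain Python tuples as triples (an assumption for illustration, not a real RDF library) and implements only the rdfs:range entailment rule. The function name `rdfs_range_closure` is hypothetical.

```python
# Toy RDFS range entailment over triples stored as plain tuples.
# Rule: (p rdfs:range C) and (s p o) entail (o rdf:type C).

RDFS_RANGE = "rdfs:range"
RDF_TYPE = "rdf:type"


def rdfs_range_closure(triples):
    """Return the input triples plus every type triple entailed by rdfs:range."""
    ranges = {(s, o) for s, p, o in triples if p == RDFS_RANGE}
    inferred = set(triples)
    for s, p, o in triples:
        for prop, cls in ranges:
            if p == prop:
                inferred.add((o, RDF_TYPE, cls))
    return inferred


data = {
    ("ex:capital", RDFS_RANGE, "ex:City"),
    # Subject and object are the wrong way round: this is the "invalid" triple.
    ("ex:Berlin", "ex:capital", "ex:Germany"),
}

new_triples = rdfs_range_closure(data) - data
print(new_triples)  # {('ex:Germany', 'rdf:type', 'ex:City')}
```

Nothing is rejected: the mistaken triple is accepted and used to derive a second mistaken triple.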

Why is OWL not enough?

Why is OWL not enough?
• De facto a constraint language: logical contradiction => invalid
• Very expressive
• But targeted at logic modelling, not validity constraints
• Not the right semantics for validation
• ex:Dublin ex:inCountry ex:Ireland, ex:USA => …?
  => ex:Ireland owl:sameAs ex:USA
• Open world assumption
• No unique name assumption
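A minimal sketch of the OWL behaviour in the Dublin example, assuming ex:inCountry has been declared a functional property (the slide does not say so, but the owl:sameAs inference requires that or an equivalent cardinality restriction). Triples are plain tuples and `functional_property_closure` is a hypothetical helper, not a reasoner API.

```python
def functional_property_closure(triples, functional_props):
    """Infer owl:sameAs between values of the same functional property.

    With no unique name assumption, two different names for the value of a
    functional property are taken to denote the same thing, not a conflict.
    """
    values_by_key = {}
    for s, p, o in triples:
        if p in functional_props:
            values_by_key.setdefault((s, p), []).append(o)

    inferred = set()
    for values in values_by_key.values():
        values = sorted(values)  # deterministic order for the sketch
        for a, b in zip(values, values[1:]):
            inferred.add((a, "owl:sameAs", b))
    return inferred


data = {
    ("ex:Dublin", "ex:inCountry", "ex:Ireland"),
    ("ex:Dublin", "ex:inCountry", "ex:USA"),
}

same_as = functional_property_closure(data, {"ex:inCountry"})
print(same_as)  # {('ex:Ireland', 'owl:sameAs', 'ex:USA')}
```

A validator with closed-world, unique-name semantics would instead report that ex:Dublin has two values for ex:inCountry.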

ICV: OWL closed-world semantics in Stardog

Why is SPARQL not enough?

http://spinrdf.org/

Why is SPARQL not enough?
• SPARQL ASK seems ideal for constraint validation
• Very expressive
• Efficient implementations
• But writing even simple constraints can be tedious

Other proposals

ShEx — Shape Expressions

http://shex.io/

So, something new?

Validation? Constraint checking?

SHACL: Shapes Constraint Language

SHACL Overview
• A language for “checking RDF graphs against conditions”
• Produced by W3C Data Shapes Working Group
• Work in progress, some features at risk
• 4th Working Draft: August 2016
• Should be done by June 2017
• Like RDFS and OWL, SHACL constraints are themselves written in RDF
• SPARQL underneath (for evaluation semantics and extensibility)

ex:PersonShape
    a sh:Shape ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:predicate ex:ssn ;
        sh:maxCount 1 ;
        sh:datatype xsd:string ;
        sh:pattern "^\\d{3}-\\d{2}-\\d{4}$" ;
    ] ;
    sh:property [
        sh:predicate ex:child ;
        sh:class ex:Person ;
        sh:nodeKind sh:IRI ;
    ] ;
    sh:property [
        sh:path [ sh:inversePath ex:child ] ;
        sh:name "parent" ;
        sh:maxCount 2 ;
    ] .

How a Shape works

Diagram: Dimitris Kontokostas

Targets: Initial selection of focus nodes
• Node target
• Class instance target
• Subjects-of target
• Objects-of target
• SPARQL-based selection (advanced)
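The first four target types can be sketched as simple selections over a set of triples. This is an illustration under assumptions: triples are plain tuples, `focus_nodes` is a hypothetical function, the target names follow the final SHACL Recommendation (sh:targetNode, sh:targetClass, sh:targetSubjectsOf, sh:targetObjectsOf), and class targets ignore subclass instances.

```python
def focus_nodes(target, triples):
    """Select focus nodes for one (kind, value) target declaration."""
    kind, value = target
    if kind == "sh:targetNode":
        # Node target: the named node itself, whether or not it occurs in the data.
        return {value}
    if kind == "sh:targetClass":
        # Class instance target: every node typed with the class.
        return {s for s, p, o in triples if p == "rdf:type" and o == value}
    if kind == "sh:targetSubjectsOf":
        # Subjects-of target: every subject of a triple with the property.
        return {s for s, p, o in triples if p == value}
    if kind == "sh:targetObjectsOf":
        # Objects-of target: every object of a triple with the property.
        return {o for s, p, o in triples if p == value}
    raise ValueError(f"unsupported target kind: {kind}")


data = {
    ("ex:Alice", "rdf:type", "ex:Person"),
    ("ex:Alice", "ex:child", "ex:Bob"),
    ("ex:Carol", "ex:child", "ex:Bob"),
}

print(focus_nodes(("sh:targetClass", "ex:Person"), data))      # {'ex:Alice'}
print(focus_nodes(("sh:targetObjectsOf", "ex:child"), data))   # {'ex:Bob'}
```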

Node constraints
Constraints about the focus node itself:
• Node kind (IRI, blank node, literal)
• IRI stem (namespace)
• IRI regex
• SPARQL query constraint (advanced)

Property constraints
Constraints about a certain outgoing or incoming property of the focus node(s):
• Cardinality
• Class
• Datatype
• Node kind (IRI, blank node, literal)
• String min/max length, string regex
• Numeric min/max
• Value must match another shape
• Value must not match another shape

Other features
• Combine constraints with logical OR/any (default: AND/all)
• Property-pair comparison (=, <, >)
• Severities (Violation, Warning, Info)
• Annotations (name, description, grouping, order)
• Define additional types of constraints based on SPARQL (advanced)

Violation reports can be produced in RDF

ex:ExampleConstraintViolation
    a sh:ValidationResult ;
    sh:severity sh:Violation ;
    sh:focusNode ex:Bob ;
    sh:path ex:age ;
    sh:value "twenty two" ;
    sh:message "ex:age must be literal of datatype xsd:integer." ;
    sh:sourceConstraintComponent sh:DatatypeConstraintComponent ;
    sh:sourceShape ex:PersonShape .
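A sketch of how a datatype check might produce a result shaped like the report above. The result is a Python dict keyed by the same property names rather than real RDF; `Literal` and `check_datatype` are hypothetical, and only the declared datatype is compared (a real validator would also check the lexical form).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal."""
    value: str
    datatype: str = "xsd:string"


def check_datatype(triples, focus_node, path, datatype, source_shape):
    """Return one result dict per value of `path` whose datatype differs."""
    results = []
    for s, p, o in triples:
        if s != focus_node or p != path:
            continue
        # IRIs (plain strings here) have no datatype and therefore also fail.
        if getattr(o, "datatype", None) != datatype:
            results.append({
                "rdf:type": "sh:ValidationResult",
                "sh:severity": "sh:Violation",
                "sh:focusNode": focus_node,
                "sh:path": path,
                "sh:value": getattr(o, "value", o),
                "sh:message": f"{path} must be literal of datatype {datatype}.",
                "sh:sourceConstraintComponent": "sh:DatatypeConstraintComponent",
                "sh:sourceShape": source_shape,
            })
    return results


data = [("ex:Bob", "ex:age", Literal("twenty two"))]

report = check_datatype(data, "ex:Bob", "ex:age", "xsd:integer", "ex:PersonShape")
print(report[0]["sh:message"])  # ex:age must be literal of datatype xsd:integer.
```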

Relationship to Rules
• Rules: “If someone says this, then I say that.”
• SHACL can’t do this.
• Does not replace SWRL, Jena Rules, RIF, SPIN Rules

Uses and implementations

SHACL in TopBraid Composer: Shapes + Constraints

SHACL support is available in the TopBraid Composer Free Edition. http://www.topquadrant.com/downloads/

SHACL in TopBraid Composer: SPARQL-based constraints

SHACL in TopQuadrant’s web products (EVN, EDG)

SHACL Protégé Plugin

http://me-at-big.blogspot.de/2015/07/shacl4p-shapes-constraint-language.html

Repairing SKOS taxonomies with SHACL

Validation of SKOS with SHACL, and extension of SHACL with specification of repair strategies.

Christian Mader and Monika Solanki, http://ceur-ws.org/Vol-1666/paper-06.pdf

Validating the “bag of crisps”…
• Validation is often not about correct/incorrect or valid/invalid
• Constraints-first (e.g., SQL)
• Well-formed vs valid (e.g., XML Schema)
• Validation is often about completeness and correctness for a specific purpose: “This is what I produce”; “This is what I understand”
• Assumption is that there may be other statements
• Different consumers may apply different constraints
• SHACL should work well in this flexible, multi-source, multi-consumer world.

“Anyone can say anything about anything”

RDF, SPARQL, OWL, RDFS

Statements: What is being said?

What words do we have?

What makes logical sense to say?

What did you say about XYZ?

OWL SHACL

Is that word used correctly?

What do you need to know from me?

You can't say that here!

I’d never say that!

2017

richard@topquadrant.com

Backup slides