An RDF and XML Database

Transcript of An RDF and XML Database

Page 1: An RDF and XML Database

An RDF and XML Database
John Snelson, Lead Engineer
23rd October 2013

Page 2: An RDF and XML Database

Slide 2 Copyright © 2013 MarkLogic® Corporation. All rights reserved.

MarkLogic

DATABASE

SEARCH

APPLICATION SERVICES

Page 3: An RDF and XML Database

Data ≠ Information

Page 4: An RDF and XML Database

Data + Context = Information

Page 5: An RDF and XML Database

Dynamic Semantic Publishing: BBC Sports

The Challenge
• Size and complexity: # of athletes, # of teams, # of assets (match reports, statistics, etc.), # of relations (facts)

Goals
• Rich user experience: see information in context, personalize content, easy navigation, intelligently serve ads (outside of UK)
• Manageable: static pages? Too many, changing too fast. Limited number of journalists. Automate as much as possible.

Page 6: An RDF and XML Database

Dynamic Semantic Publishing: A Solution

XML Database
• Store, manage documents: stories, blogs, feeds, profiles
• Store, manage values: statistics
• Full-text search
• Performance, scalability
• Robustness

Triple Store
• Metadata about documents: tagged by journalists, added (semi-)automatically, inferred
• Facts reported by journalists
• Linked Open Data for real-world facts

Page 7: An RDF and XML Database

Dynamic Semantic Publishing: Understanding Data

[Figure: graph of entities connected by edges labeled "played in", "plays in", and "plays for"]

Page 8: An RDF and XML Database

Dynamic Semantic Publishing: Scaling Up

Page 9: An RDF and XML Database

What is RDF?

[Figure: RDF graph. Nodes: :person4, :person5, :person20, :place5. Edge labels: :has-child, :has-parent, :spouse, :birth-place. Literal property: :person4 :first-name "John"]

Page 10: An RDF and XML Database

What is RDF?

• Schema-less
• Triple granularity
• Open world assumption
• Joins: the cost of granularity

Page 11: An RDF and XML Database

What is Semantics?

Data stored in triples, expressed as Subject : Predicate : Object

Example:
"John Smith" : livesIn : "London"
"London" : isIn : "England"

Page 12: An RDF and XML Database

What is Semantics?

Rules tell us something about the triples

Example: If (A livesIn X) AND (X isIn Y) then (A livesIn Y)

Inference: "John Smith" : livesIn : "England"
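The rule on this slide can be sketched as a small forward-chaining loop. This is a minimal illustration in Python, not how MarkLogic evaluates rules; the function name `infer_lives_in` and the tuple representation of triples are assumptions made for the example.

```python
# Forward-chaining sketch for the rule on the slide:
#   If (A livesIn X) AND (X isIn Y) then (A livesIn Y)
# Triples are plain (subject, predicate, object) tuples (an assumption).


def infer_lives_in(triples):
    """Return the input triples plus every livesIn triple the rule derives."""
    facts = set(triples)
    changed = True
    while changed:  # keep applying the rule until a pass adds nothing new
        changed = False
        for (a, p1, x) in list(facts):
            if p1 != "livesIn":
                continue
            for (x2, p2, y) in list(facts):
                if p2 == "isIn" and x2 == x:
                    derived = (a, "livesIn", y)
                    if derived not in facts:
                        facts.add(derived)
                        changed = True
    return facts


facts = {
    ("John Smith", "livesIn", "London"),
    ("London", "isIn", "England"),
}

# Only the newly derived triples.
inferred = infer_lives_in(facts) - facts
print(inferred)
```

Because the loop runs to a fixpoint, a longer chain of isIn facts would also be followed, one derived triple feeding the next.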

Page 13: An RDF and XML Database

What is Semantics?

[Figure: graph of the example. "John Smith" livesIn "London"; "London" isIn "England"; inferred edge: "John Smith" livesIn "England"]

Page 14: An RDF and XML Database

Semantics Architecture

[Figure: architecture diagram with boxes labeled XQY, XSLT, SQL, SPARQL, GRAPH, SPARQL, and TRIPLE]

Page 15: An RDF and XML Database

Triple Index

• 3 triple orders
• Cached for performance
• Works seamlessly with other indexes
• Security
• 150 bytes per triple on disk
• Billions of triples per host
• Scaling out horizontally
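To illustrate why an index might keep three triple orders, here is a minimal in-memory sketch in Python. The slide does not say which three orders MarkLogic uses; SPO, POS, and OSP are an assumption, chosen because together they give every combination of bound positions a sorted prefix to scan. The class `TripleIndex` and the sample data are hypothetical.

```python
from bisect import bisect_left

# Each order is a permutation of (subject, predicate, object) positions.
# Which orders a real triple index keeps is an assumption here.
ORDERS = {"SPO": (0, 1, 2), "POS": (1, 2, 0), "OSP": (2, 0, 1)}


class TripleIndex:
    def __init__(self, triples):
        # One sorted copy of the data per order.
        self.rows = {
            name: sorted(tuple(t[i] for i in perm) for t in triples)
            for name, perm in ORDERS.items()
        }

    def _pick_order(self, pattern):
        """Choose the order whose leading positions are all bound."""
        best = ("SPO", ())
        for name, perm in ORDERS.items():
            prefix = []
            for i in perm:
                if pattern[i] is None:
                    break
                prefix.append(pattern[i])
            if len(prefix) > len(best[1]):
                best = (name, tuple(prefix))
        return best

    def match(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None is a wildcard."""
        pattern = (s, p, o)
        name, prefix = self._pick_order(pattern)
        perm = ORDERS[name]
        rows = self.rows[name]
        # Binary search to the first row that could carry the prefix.
        start = bisect_left(rows, prefix)
        for row in rows[start:]:
            if row[: len(prefix)] != prefix:
                break  # left the contiguous range of matching rows
            triple = [None, None, None]
            for pos, value in zip(perm, row):
                triple[pos] = value
            # Defensive check for any bound position outside the prefix.
            if all(b is None or b == v for b, v in zip(pattern, triple)):
                yield tuple(triple)


# Hypothetical sample data in the style of the slides.
TRIPLES = [
    (":person4", ":first-name", "John"),
    (":person4", ":birth-place", ":place5"),
    (":person22", ":birth-place", ":place13"),
    (":person5", ":spouse", ":person4"),
]

index = TripleIndex(TRIPLES)
born = set(index.match(p=":birth-place"))
johns = list(index.match(s=":person4", o="John"))
```

Each lookup becomes a binary search followed by a contiguous scan, at the price of storing the data three times. For scale, at the 150 bytes per triple quoted on the slide, one billion triples would occupy roughly 150 GB on disk.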

Page 16: An RDF and XML Database

RDF Loading

RDF

Page 17: An RDF and XML Database

Triples Embedded in Documents

…
<sem:triple>
  <sem:subject>http://example.org/kennedy/person12</sem:subject>
  <sem:predicate>http://example.org/kennedy/last-name</sem:predicate>
  <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">Lawford</sem:object>
</sem:triple>
…
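As a rough illustration of what "embedded" means, the sketch below pulls the triple back out of a document with Python's standard XML parser. The wrapper element `article`, the function `extract_triples`, and the namespace URI bound to the `sem:` prefix are assumptions; the slide elides the surrounding document and does not show the namespace declaration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the sem: prefix (not shown on the slide).
SEM_NS = "http://marklogic.com/semantics"

# Hypothetical wrapper document around the triple from the slide.
DOC = """
<article xmlns:sem="http://marklogic.com/semantics">
  <title>Example wrapper document</title>
  <sem:triple>
    <sem:subject>http://example.org/kennedy/person12</sem:subject>
    <sem:predicate>http://example.org/kennedy/last-name</sem:predicate>
    <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">Lawford</sem:object>
  </sem:triple>
</article>
""".strip()


def extract_triples(xml_text):
    """Return (subject, predicate, object, datatype) for each sem:triple."""
    ns = {"sem": SEM_NS}
    root = ET.fromstring(xml_text)
    found = []
    for triple in root.iter(f"{{{SEM_NS}}}triple"):
        subject = triple.findtext("sem:subject", namespaces=ns).strip()
        predicate = triple.findtext("sem:predicate", namespaces=ns).strip()
        obj = triple.find("sem:object", ns)
        found.append((subject, predicate, obj.text.strip(), obj.get("datatype")))
    return found


embedded = extract_triples(DOC)
print(embedded)
```

The point of the design is that the triple travels with the document that asserts it, so the same document can serve full-text, element, and triple queries.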

Page 18: An RDF and XML Database

Content, Data, and Semantics

<SAR>
  <title>Suspicious vehicle near airport</title>
  <date>2012-11-12Z</date>
  <type>observation/surveillance</type>
  <threat>
    <type>suspicious activity</type>
    <category>suspicious vehicle</category>
  </threat>
  <location>
    <lat>37.497075</lat>
    <long>-122.363319</long>
  </location>
  <triple>
    <subject>IRIID</subject>
    <predicate>isa</predicate>
    <object>license-plate</object>
  </triple>
  <triple>
    <subject>IRIID</subject>
    <predicate>value</predicate>
    <object>ABC 123</object>
  </triple>
  <description>A blue van with license plate ABC 123 was observed parked behind the airport sign…</description>
</SAR>

Page 19: An RDF and XML Database

Content, Data, and Semantics

[Figure: the SAR document from the previous slide, with callouts labeling its parts: "Semantic (RDF) Triples", "Unstructured full-text", "Geospatial Data"]

Page 20: An RDF and XML Database

RDF Values

<http://example.org/kennedy/person4>
"string value"^^xs:string
"987"^^xs:double
"2013-04-09"^^xs:date
"bonjour"@fr
_:blank1
"simple"

Page 21: An RDF and XML Database

Datatype Mapping

Datatype                 SPARQL                  XQuery
Typed Literal            "2013-04-09"^^xs:date   xs:date("2013-04-09")
IRI                      <http://example.com>    sem:iri("http://example.com")
Blank Node               _:blank1                sem:blank("…")
Simple Literal           "simple"                xs:string("simple")
Language Tagged Literal  "bonjour"@fr            rdf:langString("bonjour", "fr")

Page 22: An RDF and XML Database

SPARQL

• Executed using the triple index
• SPARQL 1.0 + much of SPARQL 1.1
• Cost-based optimization
• Join ordering and algorithms

select * where {
  ?person :birth-place ?place;
          :first-name "John"
}
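The query on this slide has two triple patterns that share ?person, so answering it means joining them. The sketch below shows that join in its most naive form, a nested loop over an in-memory list; it is an illustration only and says nothing about the join algorithms or ordering the optimizer actually chooses. The functions `match_pattern` and `solve`, the optional `initial` bindings, and the sample data are assumptions.

```python
def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`.

    Terms starting with "?" are variables; anything else must match exactly.
    Returns the extended binding, or None if the triple does not fit.
    """
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None
    return extended


def solve(patterns, triples, initial=None):
    """Nested-loop join: feed each pattern's solutions into the next one."""
    solutions = [dict(initial or {})]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical data in the style of the slides.
TRIPLES = [
    (":person22", ":birth-place", ":place13"),
    (":person22", ":first-name", "John"),
    (":person4", ":birth-place", ":place5"),
    (":person4", ":first-name", "John"),
    (":person5", ":birth-place", ":place5"),
    (":person5", ":first-name", "Rose"),
]

solutions = solve(
    [("?person", ":birth-place", "?place"), ("?person", ":first-name", "John")],
    TRIPLES,
)
for row in solutions:
    print(row)
```

Evaluating the more selective pattern first would shrink the intermediate result, which is the kind of decision the slide's "join ordering" bullet refers to.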

Page 23: An RDF and XML Database

Executing SPARQL

sem:sparql("
  prefix : <http://example.org/kennedy/>
  select * {
    ?person :first-name ?first;
            :last-name ?last;
            :alma-mater [:ivy-league :true]
  }",
  map:entry("first", "John"),
  (),
  cts:collection-query("mycollection"))

Page 24: An RDF and XML Database

Returning Binding Solutions

select * where {
  ?person :birth-place :place5
}

select * where {
  ?person :birth-place ?place;
          :first-name "John"
}

Page 25: An RDF and XML Database

Solution Results

person      place
:person22   :place13
:person4    :place5

map:map

Page 26: An RDF and XML Database

SPARQL Query Results XML Format

sem:query-result-serialize(
  sem:sparql("select * { … }"),
  "xml")
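For readers who have not seen it, the W3C SPARQL Query Results XML Format wraps variable names in a head element and each solution in a result element. The Python sketch below builds that structure by hand. It assumes the serializer on this slide targets the W3C format; the exact output of the MarkLogic function is not shown in the deck, and `serialize_results` and the sample IRIs are made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format.
SRX_NS = "http://www.w3.org/2005/sparql-results#"


def serialize_results(variables, solutions):
    """Serialize solutions to SPARQL results XML.

    `solutions` is a list of dicts mapping a variable name to a
    (kind, value) pair, where kind is "uri", "literal", or "bnode".
    """
    ET.register_namespace("", SRX_NS)  # emit it as the default namespace
    root = ET.Element(f"{{{SRX_NS}}}sparql")
    head = ET.SubElement(root, f"{{{SRX_NS}}}head")
    for name in variables:
        ET.SubElement(head, f"{{{SRX_NS}}}variable", {"name": name})
    results = ET.SubElement(root, f"{{{SRX_NS}}}results")
    for solution in solutions:
        result = ET.SubElement(results, f"{{{SRX_NS}}}result")
        for name, (kind, value) in solution.items():
            binding = ET.SubElement(result, f"{{{SRX_NS}}}binding", {"name": name})
            ET.SubElement(binding, f"{{{SRX_NS}}}{kind}").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = serialize_results(
    ["person", "place"],
    [
        {
            "person": ("uri", "http://example.org/kennedy/person4"),
            "place": ("uri", "http://example.org/kennedy/place5"),
        }
    ],
)
print(xml_text)
```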

Page 27: An RDF and XML Database

Returning Triples

describe :person4

construct {
  ?bp :uses-name ?fn
} where {
  ?person :birth-place ?bp;
          :first-name ?fn
}

Page 28: An RDF and XML Database

Triple Results

sem:triple

:place0 :uses-name "Ethel", "Jeffrey", "Kara" .
:place1 :uses-name "Edward", "James" .
:place10 :uses-name "Robert", "Sheila", "Stephen" .

sem:iri

Page 29: An RDF and XML Database

Querying Named Graphs

select *
from <http://my_graph>
where { ?s ?p ?o }

select * where {
  graph <http://my_graph> { ?s ?p ?o }
}

collection
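One way to picture a named graph is as a label attached to each triple, which is also how the "collection" callout reads: the graph name groups the triples much as a collection groups documents. The Python sketch below models that with quads; the data, the second graph name, and `triples_in_graph` are hypothetical.

```python
# Each quad is (graph, subject, predicate, object); the data is made up.
QUADS = [
    ("http://my_graph", ":person4", ":first-name", "John"),
    ("http://my_graph", ":person4", ":birth-place", ":place5"),
    ("http://other_graph", ":person5", ":first-name", "Rose"),
]


def triples_in_graph(quads, graph):
    """Return the triples whose graph label equals `graph`."""
    return [(s, p, o) for (g, s, p, o) in quads if g == graph]


# Both queries on the slide match { ?s ?p ?o } against this one graph only.
my_graph = triples_in_graph(QUADS, "http://my_graph")
print(my_graph)
```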

Page 30: An RDF and XML Database

Restricting The Datasets

let $options := "properties"
let $query := cts:and-query((
  cts:directory-query("/triples/"),
  cts:element-range-query(xs:QName("date"), ">", $date)
))
return sem:sparql("…", (), (), $options, $query)

Page 31: An RDF and XML Database

Creating Triples

Returning sem:triple values
• sem:triple()
• sem:rdf-parse()
• sem:rdf-get()
• sem:rdf-builder()

Inserting to a database
• sem:rdf-load()
• sem:rdf-insert()

Page 32: An RDF and XML Database

Graph Store API

declare function graph-insert(
  $graphname as sem:iri,
  $triples as sem:triple*,
  [$permissions as element(sec:permission)*,
   $collections as xs:string*,
   $quality as xs:int?,
   $forest-ids as xs:unsignedLong*]
) as xs:string*;

declare function graph-delete(
  $graphname as sem:iri
) as empty-sequence();

Page 33: An RDF and XML Database

Conclusion

• Semantics can enhance your data-oriented and search applications.
• XQuery and SPARQL work well together.
• A combined RDF and XML database simplifies working with the two technologies together.
• Try MarkLogic 7: http://www.marklogic.com/early-access/

Page 34: An RDF and XML Database

Any Questions?