An RDF and XML Database

Post on 23-Feb-2016


An RDF and XML Database
John Snelson, Lead Engineer
23rd October 2013

Slide 2 Copyright © 2013 MarkLogic® Corporation. All rights reserved.

MarkLogic

DATABASE

SEARCH

APPLICATION SERVICES

Slide 3

Data ≠ Information

Slide 4

Data + Context = Information

Slide 5

Dynamic Semantic Publishing: BBC Sports

The Challenge
• Size and complexity: # of athletes, # of teams, # of assets (match reports, statistics, etc.), # of relations (facts)

Goals
• Rich user experience: see information in context, personalize content, easy navigation, intelligently serve ads (outside of UK)
• Manageable: static pages? Too many, changing too fast; limited number of journalists; automate as much as possible

Slide 6

Dynamic Semantic Publishing: A Solution

XML Database
• Store, manage documents: stories, blogs, feeds, profiles
• Store, manage values: statistics
• Full-text search
• Performance, scalability
• Robustness

Triple Store
• Metadata about documents: tagged by journalists, added (semi-)automatically, inferred
• Facts reported by journalists
• Linked Open Data for real-world facts

Slide 7

Dynamic Semantic Publishing: Understanding Data

[Diagram: edges labeled "played in", "plays in", and "plays for"]

Slide 8

Dynamic Semantic Publishing: Scaling Up

Slide 9

What is RDF?

[Diagram: RDF graph with the nodes :person4, :person5, :person20, and :place5 and the literal "John"; edges labeled :has-child, :has-parent, :spouse, :birth-place, and :first-name]

Slide 10

What is RDF?

• Schema-less
• Triple granularity
• Open world assumption
• Joins - the cost of granularity

RDF

Slide 11

What is Semantics?

Data stored in triples, expressed as Subject : Predicate : Object

Example:
"John Smith" : livesIn : "London"
"London" : isIn : "England"

Slide 12

What is Semantics?

Data stored in triples, expressed as Subject : Predicate : Object

Example:
"John Smith" : livesIn : "London"
"London" : isIn : "England"

Rules tell us something about the triples

Example:
If (A livesIn X) AND (X isIn Y) then (A livesIn Y)

Inference: "John Smith" : livesIn : "England"
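The rule on this slide can be sketched as a small forward-chaining loop. This is only a minimal Python illustration, assuming triples are plain tuples. The function name `infer_lives_in` is hypothetical, and nothing here claims to show how MarkLogic itself evaluates rules.

```python
# Triples as (subject, predicate, object) tuples, taken from the slide's example.
triples = {
    ("John Smith", "livesIn", "London"),
    ("London", "isIn", "England"),
}


def infer_lives_in(facts):
    """Apply the rule: (A livesIn X) AND (X isIn Y) => (A livesIn Y).

    Repeats until no new triple is produced, so longer isIn chains
    are followed as well.
    """
    known = set(facts)
    while True:
        new = {
            (a, "livesIn", y)
            for (a, p1, x) in known if p1 == "livesIn"
            for (x2, p2, y) in known if p2 == "isIn" and x2 == x
        } - known
        if not new:
            return known
        known |= new


# Only the triples that were derived, not the ones that were stated.
inferred = infer_lives_in(triples) - triples
print(inferred)
```

Keeping the stated and the inferred triples apart, as the last lines do, makes it easy to see what the rule contributed.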

Slide 13

What is Semantics?

Data stored in triples, expressed as Subject : Predicate : Object

Example:
"John Smith" : livesIn : "London"
"London" : isIn : "England"

Rules tell us something about the triples

[Diagram: "John Smith" -livesIn-> "London" -isIn-> "England", plus a second livesIn edge from "John Smith" to "England"]

Slide 15

Semantics Architecture

[Diagram labels: TRIPLE; XQY, XSLT, SQL, SPARQL; GRAPH, SPARQL]

Slide 16

Triple Index

• 3 triple orders
• Cached for performance
• Works seamlessly with other indexes
• Security
• 150 bytes per triple on disk
• Billions of triples per host
• Scaling out horizontally

TRIPLE
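Why several triple orders help can be sketched with a toy in-memory index. This is a minimal Python illustration, not MarkLogic's implementation. The slide does not name the three orders, so the SPO, POS, and OSP permutations below are an assumption, and the class name `TripleIndex` is hypothetical.

```python
from bisect import bisect_left


class TripleIndex:
    """Toy index keeping the same triples sorted in three orders.

    The orders (SPO, POS, OSP) are assumed for illustration; with these
    three, any combination of bound terms is a prefix of some order.
    """

    ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}

    def __init__(self, triples):
        self.sorted = {
            name: sorted(tuple(t[i] for i in perm) for t in triples)
            for name, perm in self.ORDERS.items()
        }

    @staticmethod
    def _prefix_len(name, bound):
        """Count how many leading positions of this order are bound."""
        n = 0
        for c in name:
            if bound[c] is None:
                break
            n += 1
        return n

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None means 'any value'."""
        bound = {"s": s, "p": p, "o": o}
        # Pick the order in which the bound terms come first, so that all
        # matches form one contiguous run in the sorted list.
        name = max(self.ORDERS, key=lambda n: self._prefix_len(n, bound))
        prefix = tuple(bound[c] for c in name[: self._prefix_len(name, bound)])
        rows = self.sorted[name]
        perm = self.ORDERS[name]
        out = []
        for row in rows[bisect_left(rows, prefix):]:
            if row[: len(prefix)] != prefix:
                break
            # Undo the permutation to get back (subject, predicate, object).
            triple = [None] * 3
            for pos, i in enumerate(perm):
                triple[i] = row[pos]
            out.append(tuple(triple))
        return out


index = TripleIndex([
    ("John Smith", "livesIn", "London"),
    ("London", "isIn", "England"),
])
print(index.match(p="isIn"))
print(index.match(s="John Smith", o="London"))
```

Each lookup is a binary search followed by a sequential scan, which is the property that makes sorted triple orders easy to cache and to merge-join.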

Slide 17

RDF Loading

RDF

Slide 18

Triples Embedded in Documents

…
<sem:triple>
  <sem:subject>http://example.org/kennedy/person12</sem:subject>
  <sem:predicate>http://example.org/kennedy/last-name</sem:predicate>
  <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">Lawford</sem:object>
</sem:triple>
…
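To make the idea of triples living inside an ordinary XML document concrete, the following Python sketch pulls `sem:triple` elements out of a document with the standard library. The wrapper element `person`, the function name `embedded_triples`, and the namespace URI bound to `sem:` are assumptions, since the slide elides the surrounding document and does not show the namespace declaration.

```python
import xml.etree.ElementTree as ET

# Namespace URI assumed for the sem: prefix; the slide does not show it.
SEM_NS = "http://marklogic.com/semantics"

# Hypothetical wrapper document around the triple shown on the slide.
doc = f"""
<person xmlns:sem="{SEM_NS}">
  <name>Lawford</name>
  <sem:triple>
    <sem:subject>http://example.org/kennedy/person12</sem:subject>
    <sem:predicate>http://example.org/kennedy/last-name</sem:predicate>
    <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">Lawford</sem:object>
  </sem:triple>
</person>
""".strip()


def embedded_triples(xml_text):
    """Yield (subject, predicate, object, datatype) for each sem:triple."""
    root = ET.fromstring(xml_text)
    ns = {"sem": SEM_NS}
    for t in root.iter(f"{{{SEM_NS}}}triple"):
        obj = t.find("sem:object", ns)
        yield (
            t.findtext("sem:subject", namespaces=ns).strip(),
            t.findtext("sem:predicate", namespaces=ns).strip(),
            obj.text.strip(),
            obj.get("datatype"),
        )


print(list(embedded_triples(doc)))
```

The rest of the document (here, the `name` element) stays available to ordinary XML processing, which is the point of embedding the triples rather than storing them separately.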

Slide 19

Content, Data, and Semantics

<SAR>
  <title>Suspicious vehicle near airport</title>
  <date>2012-11-12Z</date>
  <type>observation/surveillance</type>
  <threat>
    <type>suspicious activity</type>
    <category>suspicious vehicle</category>
  </threat>
  <location>
    <lat>37.497075</lat>
    <long>-122.363319</long>
  </location>
  <description>A blue van with license plate ABC 123 was observed parked behind the airport sign…</description>
  <triple>
    <subject>IRIID</subject>
    <predicate>isa</predicate>
    <object>license-plate</object>
  </triple>
  <triple>
    <subject>IRIID</subject>
    <predicate>value</predicate>
    <object>ABC 123</object>
  </triple>
</SAR>

Slide 20

Content, Data, and Semantics

[Diagram: the <SAR> document from the previous slide drawn as a tree, with regions labeled "Unstructured full-text", "Geospatial Data", and "Semantic (RDF) Triples"]

Slide 21

RDF Values

<http://example.org/kennedy/person4>
"string value"^^xs:string
"987"^^xs:double
"2013-04-09"^^xs:date
"bonjour"@fr
_:blank1
"simple"

Slide 22

Datatype Mapping

Datatype                 SPARQL                  XQuery
Typed Literal            "2013-04-09"^^xs:date   xs:date("2013-04-09")
IRI                      <http://example.com>    sem:iri("http://example.com")
Blank Node               _:blank1                sem:blank("…")
Simple Literal           "simple"                xs:string("simple")
Language Tagged Literal  "bonjour"@fr            rdf:langString("bonjour", "fr")
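The five kinds of RDF value in the mapping can be told apart from their written form alone. The sketch below is a rough Python illustration of that classification, assuming well-formed input in the notation used on these slides. The function name `classify_term` is hypothetical and the checks are deliberately simpler than a real SPARQL or Turtle parser.

```python
def classify_term(term):
    """Name the kind of RDF term, using the row names of the mapping."""
    if term.startswith("<") and term.endswith(">"):
        return "IRI"
    if term.startswith("_:"):
        return "Blank Node"
    closing = term.rfind('"')
    if term.startswith('"') and closing > 0:
        # Whatever follows the closing quote decides the literal kind.
        suffix = term[closing + 1:]
        if suffix.startswith("^^"):
            return "Typed Literal"
        if suffix.startswith("@"):
            return "Language Tagged Literal"
        if suffix == "":
            return "Simple Literal"
    raise ValueError(f"unrecognized RDF term: {term!r}")


examples = [
    "<http://example.org/kennedy/person4>",
    '"987"^^xs:double',
    '"bonjour"@fr',
    "_:blank1",
    '"simple"',
]
for term in examples:
    print(term, "->", classify_term(term))
```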

Slide 23

SPARQL

• Executed using the triple index
• SPARQL 1.0 + much of SPARQL 1.1
• Cost-based optimization
• Join ordering and algorithms

select * where {
  ?person :birth-place ?place;
          :first-name "John"
}

SPARQL

Slide 24

Executing SPARQL

sem:sparql("
  prefix : <http://example.org/kennedy/>
  select * {
    ?person :first-name ?first;
            :last-name ?last;
            :alma-mater [:ivy-league :true]
  }",
  map:entry("first", "John"),
  (),
  cts:collection-query("mycollection"))

Slide 25

Returning Binding Solutions

select * where {
  ?person :birth-place :place5
}

select * where {
  ?person :birth-place ?place;
          :first-name "John"
}

Slide 26

Solution Results

person place

:person22 :place13

:person4 :place5

map:map
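How a two-pattern query turns into rows of bindings can be sketched as a naive join over triples. The Python below is a minimal illustration, not the engine's algorithm: it tries every triple for every pattern, where a real engine would use the triple index and cost-based join ordering. The sample data is made up to reproduce the two rows shown on the slide, and the names `match_pattern` and `solve` are hypothetical. Each solution is a plain dict, loosely analogous to the map:map values mentioned above.

```python
def is_var(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for want, have in zip(pattern, triple):
        if is_var(want):
            # A variable must keep the value it was already bound to.
            if result.setdefault(want, have) != have:
                return None
        elif want != have:
            return None
    return result


def solve(patterns, triples):
    """Join the patterns left to right, returning one dict per solution."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return solutions


# Hypothetical data; literals keep their quotes to distinguish them from IRIs.
data = [
    (":person4", ":first-name", '"John"'),
    (":person4", ":birth-place", ":place5"),
    (":person22", ":first-name", '"John"'),
    (":person22", ":birth-place", ":place13"),
    (":person5", ":first-name", '"Rose"'),
    (":person5", ":birth-place", ":place5"),
]
query = [
    ("?person", ":birth-place", "?place"),
    ("?person", ":first-name", '"John"'),
]
for row in solve(query, data):
    print(row["?person"], row["?place"])
```

The shared variable `?person` is what joins the two patterns: a binding from the first pattern survives only if the second pattern can be matched with the same value.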

Slide 27

SPARQL Query Results XML Format

sem:query-result-serialize(
  sem:sparql("select * { … }"),
  "xml")

Slide 28

Returning Triples

describe :person4

construct {
  ?bp :uses-name ?fn
}
where {
  ?person :birth-place ?bp;
          :first-name ?fn
}

Slide 29

Triple Results

sem:triple

:place0 :uses-name "Ethel", "Jeffrey", "Kara" .
:place1 :uses-name "Edward", "James" .
:place10 :uses-name "Robert", "Sheila", "Stephen" .

sem:iri

Slide 30

Querying Named Graphs

select *
from <http://my_graph>
where { ?s ?p ?o }

select * where {
  graph <http://my_graph> { ?s ?p ?o }
}

collection

Slide 31

Restricting The Datasets

let $options := "properties"
let $query := cts:and-query((
  cts:directory-query("/triples/"),
  cts:element-range-query(xs:QName("date"), ">", $date)
))
return sem:sparql("…", (), (), $options, $query)

Slide 32

Creating Triples

Returning sem:triple values
• sem:triple()
• sem:rdf-parse()
• sem:rdf-get()
• sem:rdf-builder()

Inserting to a database
• sem:rdf-load()
• sem:rdf-insert()

Slide 33

Graph Store API

declare function graph-insert(
  $graphname as sem:iri,
  $triples as sem:triple*,
  [$permissions as element(sec:permission)*,
   $collections as xs:string*,
   $quality as xs:int?,
   $forest-ids as xs:unsignedLong*]
) as xs:string*;

declare function graph-delete(
  $graphname as sem:iri
) as empty-sequence();

Slide 34

Conclusion

• Semantics can enhance your data-oriented and search applications.
• XQuery and SPARQL work well together.
• A combined RDF and XML database simplifies working with the two technologies together.
• Try MarkLogic 7: http://www.marklogic.com/early-access/

Slide 35

Any Questions?