Improve Efficiency of Mapping Data Between XML and RDF with XSPARQL

Copyright 2011 Digital Enterprise Research Institute. All rights reserved. Digital Enterprise Research Institute, www.deri.ie. Int. Conf. on Web Reasoning and Rule Systems, August 28, 2011. Improve Efficiency of Mapping Data between XML and RDF with XSPARQL. Stefan Bischof, Nuno Lopes, and Axel Polleres.

Transcript of Improve Efficiency of Mapping Data Between XML and RDF with XSPARQL

Page 1: Improve Efficiency of Mapping Data Between XML and RDF with XSPARQL


Digital Enterprise Research Institute www.deri.ie

Int. Conf. on Web Reasoning and Rule Systems, August 28, 2011

Improve Efficiency of Mapping Data between XML and RDF with XSPARQL

Stefan Bischof, Nuno Lopes, and Axel Polleres


Page 2: Improve Efficiency of Mapping Data Between XML and RDF with XSPARQL


XSPARQL: Bridging the gap between XML and RDF


[Diagram: XML with XQuery on one side, RDF with SPARQL on the other, both combined in XSPARQL; example with XML and FOAF data]

Page 3: Improve Efficiency of Mapping Data Between XML and RDF with XSPARQL


Problem: Evaluating Nested Graph Patterns

for $p $name from <persons.rdf>
where { $p a foaf:Person .
        $p foaf:name $name . }
return
  <person>
    <name>{ $name }</name>
    { for $fname from <persons.rdf>
      where { $p foaf:knows $friend . $friend foaf:name $fname . }
      return <friend>{ $fname }</friend> }
  </person>


[Slide annotations mark the SPARQL and the XQuery parts of the query]
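The cost of the naive evaluation can be made concrete with a small Python simulation. It assumes a toy in-memory triple list in place of persons.rdf and a hypothetical CountingEndpoint class standing in for the SPARQL engine. Neither is part of XSPARQL. The inner graph pattern depends on $p, so it is sent once per outer result, and N persons cost 1 + N SPARQL calls.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for persons.rdf: (subject, predicate, object) triples.
TRIPLES: List[Tuple[str, str, str]] = [
    ("alice", "rdf:type", "foaf:Person"), ("alice", "foaf:name", "Alice"),
    ("bob", "rdf:type", "foaf:Person"), ("bob", "foaf:name", "Bob"),
    ("carol", "rdf:type", "foaf:Person"), ("carol", "foaf:name", "Carol"),
    ("alice", "foaf:knows", "bob"), ("alice", "foaf:knows", "carol"),
    ("bob", "foaf:knows", "carol"),
]


class CountingEndpoint:
    """Simulated SPARQL engine that counts every call made to it."""

    def __init__(self, triples: List[Tuple[str, str, str]]) -> None:
        self.triples = triples
        self.names = {s: o for s, p, o in triples if p == "foaf:name"}
        self.calls = 0

    def persons(self) -> List[Tuple[str, str]]:
        # Outer pattern: { $p a foaf:Person . $p foaf:name $name . }
        self.calls += 1
        return [(s, self.names[s]) for s, p, o in self.triples
                if p == "rdf:type" and o == "foaf:Person" and s in self.names]

    def friend_names(self, person: str) -> List[str]:
        # Inner pattern with $p already bound by the outer loop:
        # { $p foaf:knows $friend . $friend foaf:name $fname . }
        self.calls += 1
        return [self.names[o] for s, p, o in self.triples
                if p == "foaf:knows" and s == person and o in self.names]


def naive_nested(endpoint: CountingEndpoint) -> Dict[str, List[str]]:
    # The inner pattern is sent once per outer result: 1 + N calls in total.
    return {name: endpoint.friend_names(p) for p, name in endpoint.persons()}


endpoint = CountingEndpoint(TRIPLES)
print(naive_nested(endpoint), endpoint.calls)
```

With the three sample persons this makes four calls. On real data, each extra call also pays the round-trip overhead between the XQuery and SPARQL engines.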

Page 4: Improve Efficiency of Mapping Data Between XML and RDF with XSPARQL


One Approach: Nested Loop Join in XQuery

let $friendlist :=
  for $p1 $fname from <persons.rdf>
  where { $p1 foaf:knows $friend . $friend foaf:name $fname . }
  return <friend p1="{ $p1 }">{ $fname }</friend>
for $p $name from <persons.rdf>
where { $p a foaf:Person .
        $p foaf:name $name . }
return
  <person>
    <name>{ $name }</name>
    { for $friend in $friendlist
      where $friend/@p1 = $p
      return <friend>{ $friend/text() }</friend> }
  </person>


[Slide annotations mark the SPARQL and the XQuery parts of the query, and the join performed in XQuery]
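The same rewriting can be sketched in Python as a toy simulation. The triple list and the CountingEndpoint class are hypothetical stand-ins, not part of XSPARQL. The inner pattern is issued once with $p1 unbound, and the join is then done in the host language, as the XQuery for/where over $friendlist does.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for persons.rdf: (subject, predicate, object) triples.
TRIPLES: List[Tuple[str, str, str]] = [
    ("alice", "rdf:type", "foaf:Person"), ("alice", "foaf:name", "Alice"),
    ("bob", "rdf:type", "foaf:Person"), ("bob", "foaf:name", "Bob"),
    ("carol", "rdf:type", "foaf:Person"), ("carol", "foaf:name", "Carol"),
    ("alice", "foaf:knows", "bob"), ("alice", "foaf:knows", "carol"),
    ("bob", "foaf:knows", "carol"),
]


class CountingEndpoint:
    """Simulated SPARQL engine that counts every call made to it."""

    def __init__(self, triples: List[Tuple[str, str, str]]) -> None:
        self.triples = triples
        self.names = {s: o for s, p, o in triples if p == "foaf:name"}
        self.calls = 0

    def persons(self) -> List[Tuple[str, str]]:
        # Outer pattern: { $p a foaf:Person . $p foaf:name $name . }
        self.calls += 1
        return [(s, self.names[s]) for s, p, o in self.triples
                if p == "rdf:type" and o == "foaf:Person" and s in self.names]

    def friend_list(self) -> List[Tuple[str, str]]:
        # Inner pattern with $p1 unbound, so one call covers every person:
        # { $p1 foaf:knows $friend . $friend foaf:name $fname . }
        self.calls += 1
        return [(s, self.names[o]) for s, p, o in self.triples
                if p == "foaf:knows" and o in self.names]


def nested_loop_join(endpoint: CountingEndpoint) -> Dict[str, List[str]]:
    friendlist = endpoint.friend_list()  # fetched once, like $friendlist
    result: Dict[str, List[str]] = {}
    for p, name in endpoint.persons():
        # Host-side nested loop join: scan the whole friend list for each person.
        result[name] = [fname for p1, fname in friendlist if p1 == p]
    return result


endpoint = CountingEndpoint(TRIPLES)
print(nested_loop_join(endpoint), endpoint.calls)
```

The number of SPARQL calls drops to two, independent of the number of persons. The host-side join still compares every person with every friend row, however.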

Page 5: Improve Efficiency of Mapping Data Between XML and RDF with XSPARQL


Evaluation Results


[Figure: Time (sec, log scale, 1 to 1000) against dataset size (MB, log scale, 1 to 100) for the Naive XSPARQL Implementation, Nested Loop WHERE Clause, Nested Loop XPath, Sort-Merge, Merge Graph Patterns, and Named Graph. Annotation: "scales with number of saved SPARQL calls".]

Page 6: Improve Efficiency of Mapping Data Between XML and RDF with XSPARQL


Future Work/My PhD Proposal

• Formalise the integrated language XSPARQL
  – Formalism combining XQuery (functional) with SPARQL (relational algebra)

• Optimise XSPARQL using this formal model
  – Currently only manual optimisations
  – Useful for any approach manipulating both XML and RDF data

• RDFS + OWL reasoning
  – Add different kinds of reasoning to the formal model
  – SPARQL 1.1 entailment regimes


Page 7: Improve Efficiency of Mapping Data Between XML and RDF with XSPARQL


Improve Efficiency of Mapping Data between XML and RDF with XSPARQL

Stefan Bischof, Nuno Lopes, and Axel Polleres

XSPARQL: Bridging the gap between XML and RDF
‣ Language to map data between XML and RDF
‣ Combines the strengths of the XQuery and SPARQL query languages
‣ Provides XQuery’s function library to SPARQL
‣ Provides SPARQL’s graph pattern matching facility to XQuery

Prototype: Rewrite XSPARQL to XQuery
‣ Uses standard XQuery and SPARQL engines
‣ Try the prototype: http://xsparql.deri.org/demo

Problem: Evaluating Nested Graph Patterns
‣ Loops with nested graph patterns result in a large number of interactions between the XQuery and SPARQL engines
‣ Prototype evaluates such joins naively as nested loop joins
‣ Prototype is unable to exploit the high similarity of the SPARQL calls

Proposed Optimisations
‣ Minimise communication overhead for problematic queries
‣ Reduce the number of interactions between XQuery and SPARQL
‣ Perform only a fixed number of SPARQL calls, independent of the data size, by moving the join
‣ Move join to pure XQuery
  – Nested loop join using an XQuery WHERE clause or XPath
  – Tail-recursive implementation of sort-merge join
‣ Move join to SPARQL
  – Join by merging SPARQL graph patterns
  – Join using named graph injection in the triple store
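The sort-merge variant can be sketched in Python. The function name and input shapes are assumptions for illustration only. The poster describes a tail-recursive XQuery implementation, while this sketch uses an ordinary loop, which is what such a tail recursion amounts to, since Python does not eliminate tail calls. Both inputs are sorted on the person IRI and then consumed in a single pass.

```python
from typing import List, Tuple


def sort_merge_join(persons: List[Tuple[str, str]],
                    friends: List[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Attach friend names to persons by merging two lists sorted on the person IRI.

    persons: (person IRI, name) pairs, assumed to hold one row per person.
    friends: (person IRI, friend name) pairs; a person may appear many times.
    """
    left, right = sorted(persons), sorted(friends)
    result: List[Tuple[str, List[str]]] = []
    j = 0
    # A loop in place of tail recursion over the two input positions.
    for p, name in left:
        # Advance past friend rows whose key sorts before the current person.
        while j < len(right) and right[j][0] < p:
            j += 1
        # Consume the run of friend rows that share this person's key.
        fnames: List[str] = []
        while j < len(right) and right[j][0] == p:
            fnames.append(right[j][1])
            j += 1
        result.append((name, fnames))
    return result


persons = [("bob", "Bob"), ("alice", "Alice"), ("carol", "Carol")]
friends = [("bob", "Carol"), ("alice", "Carol"), ("alice", "Bob")]
print(sort_merge_join(persons, friends))
```

Sorting costs O(n log n + m log m) and the merge itself is linear. The nested loop join, by contrast, needs O(n · m) comparisons.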

Evaluation: Optimisations on several data sizes
‣ XMark benchmarks for XQuery adapted to the XSPARQL use case
‣ Optimisations are applicable to the 3 slowest of the 20 queries

Results: XSPARQL can be faster
‣ Optimisations always performed better than standard XSPARQL
‣ SPARQL join optimisations were the fastest (when applicable)
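One way to move the join into SPARQL by merging graph patterns is sketched below. The poster does not give the merged query, so this is an assumption. The inner pattern is placed in an OPTIONAL block so that persons without friends are kept, results are ordered by person, and the flat rows are regrouped on the client. OUTER, INNER, merge_patterns and regroup are hypothetical names. The result rows are made up for illustration rather than produced by a SPARQL engine.

```python
from itertools import groupby
from typing import List, Optional, Tuple

# Graph patterns of the outer and inner loop of the nested query, in SPARQL syntax.
OUTER = "?p a foaf:Person . ?p foaf:name ?name ."
INNER = "?p foaf:knows ?friend . ?friend foaf:name ?fname ."


def merge_patterns(outer: str, inner: str) -> str:
    """Build one SPARQL query that covers both loops.

    OPTIONAL keeps persons without friends, since the nested query still emits a
    <person> element for them; ORDER BY lets the client regroup in one pass.
    """
    return ("PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
            "SELECT ?p ?name ?fname WHERE {\n"
            f"  {outer}\n"
            f"  OPTIONAL {{ {inner} }}\n"
            "} ORDER BY ?p ?name")


Row = Tuple[str, str, Optional[str]]


def regroup(rows: List[Row]) -> List[Tuple[str, List[str]]]:
    """Rebuild the nested person/friend structure from flat rows sorted by person."""
    return [(name, [fname for _, _, fname in group if fname is not None])
            for (_, name), group in groupby(rows, key=lambda r: (r[0], r[1]))]


print(merge_patterns(OUTER, INNER))
# Hypothetical result rows; an unbound ?fname is represented as None.
rows: List[Row] = [("alice", "Alice", "Bob"), ("alice", "Alice", "Carol"),
                   ("bob", "Bob", "Carol"), ("carol", "Carol", None)]
print(regroup(rows))
```

Only one SPARQL call is needed, and the join itself is evaluated by the SPARQL engine.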

Future Work: More Optimisations and Features
‣ Also query relational databases
‣ Create a concise formalisation of XSPARQL
‣ Exploit properties of XSPARQL fragments for optimisation
‣ Support SPARQL 1.1 and SPARQL 1.1 Entailment Regimes

More information: http://xsparql.deri.org/

Acknowledgements: This work has been funded by Science Foundation Ireland, Grant No. SFI/08/CI/I1380 (Lion-2) and by an IRCSET scholarship.

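[Figure: prototype architecture. The XSPARQL rewriter translates an XSPARQL query into an XQuery query; the XQuery engine evaluates it over XML data and calls the SPARQL engine for RDF data; the output is XML or RDF.]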


Conclusion: Maintainable and Efficient Mapping
‣ Performance of standard XSPARQL is drastically reduced for queries containing nested graph patterns
‣ Performance of such queries improves with each of the proposed optimisations
‣ XSPARQL can provide better performance than ad-hoc setups for mapping data between XML and RDF


Questions about XSPARQL, syntax, semantics, implementation, prototype, optimisation, performance, RDF/XML …

… visit us at our poster in the afternoon!

Thanks for your attention!
