ADT 2010 XQuery Updates in MonetDB/XQuery Other Approaches...

Page 1: ADT 2010 XQuery Updates in MonetDB/XQuery & Other Approaches to XQuery Processing

Stefan Manegold
Stefan.Manegold@CWI.nl
http://www.cwi.nl/~manegold/

Slides: homepages.cwi.nl/~manegold/teaching/adt/lectures/...

Stefan.Manegold@CWI.nl · MonetDB/XQuery: Updates · ADT 2010

Page 2: Schedule

• 09.11.2010:
  • RDBMS back-end support for XML/XQuery (1/2):
    • Document Representation (XPath Accelerator, Pre/Post plane)
• 16.11.2010:
  • XPath navigation (Staircase Join)
  • XQuery to Relational Algebra Compiler:
    • Item- & Sequence-Representation
    • Efficient FLWoR Evaluation (Loop-Lifting)
    • Optimization
• 23.11.2010:
  • RDBMS back-end support for XML/XQuery (2/2):
    • Updateable Document Representation
  • Other (DB-) approaches to XML/XQuery processing

Page 3: XML/XQuery Updates

XQuery Update Facility 1.0, W3C Candidate Recommendation: http://www.w3.org/TR/xquery-update-10/

• Categorize updates into
  • Value updates
  • Structural updates

(MonetDB/XQuery does not yet support the latest syntax changes made by W3C; for details see http://monetdb.cwi.nl/XQuery/Documentation/XQuery-Updates.html)

Page 4: Value Updates

do replace value of fn:doc("bib.xml")/books/book[1]/price
   with fn:doc("bib.xml")/books/book[1]/price * 1.1

do replace value of fn:doc("bib.xml")/books/book[2]/@isbn
   with "90-6196-517-9"

do rename fn:doc("bib.xml")/books/book[3]/author[1]
   into "primary-author"

do rename fn:doc("bib.xml")/journals/journal[9]/@isbn
   into "issn"

=> map directly to simple value updates in relational storage
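
The mapping to simple value updates can be illustrated with a minimal Python sketch. The node table below (one row per node, keyed by pre, with name and value columns) is a hypothetical layout chosen for illustration and is not the actual MonetDB/XQuery schema. The point it tries to show is that a value replacement or a rename touches a single cell and leaves the structural columns (pre, size, level) alone.

```python
# Hypothetical node table: pre -> row. Column names are assumptions.
nodes = {
    0: {"size": 3, "level": 0, "kind": "elem", "name": "book", "value": None},
    1: {"size": 0, "level": 1, "kind": "attr", "name": "isbn", "value": "unknown"},
    2: {"size": 1, "level": 1, "kind": "elem", "name": "price", "value": None},
    3: {"size": 0, "level": 2, "kind": "text", "name": None, "value": "50"},
}

def structure(table):
    """Structural columns only: (pre, size, level) for every row."""
    return [(pre, row["size"], row["level"]) for pre, row in sorted(table.items())]

def replace_value(table, pre, new_value):
    """'do replace value of ...' becomes an update of one value cell."""
    table[pre]["value"] = new_value

def rename(table, pre, new_name):
    """'do rename ... into ...' becomes an update of one name cell."""
    table[pre]["name"] = new_name

before = structure(nodes)
replace_value(nodes, 3, str(round(float(nodes[3]["value"]) * 1.1, 2)))  # price * 1.1
replace_value(nodes, 1, "90-6196-517-9")                                # new @isbn value
rename(nodes, 1, "issn")                                                # @isbn -> @issn
after = structure(nodes)
```

In this sketch `before == after` holds, which is why no tree re-encoding is needed for this class of updates.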

Page 5: Structural Updates

do insert attribute isbn {"90-6196-517"}
   into fn:doc("bib.xml")/books/book[17]

do delete fn:doc("bib.xml")/books/book[2]/@wrong

do insert <author>Stefan Manegold</author>
   after fn:doc("bib.xml")/books/book[33]/author[last()]

do replace fn:doc("bib.xml")/books/book[44]/author[1]
   with fn:doc("bib.xml")/books/book[33]/author[last()]

do delete fn:doc("bib.xml")/books/book[author = "Kermit"]

=> How to implement on pre-/post-encoding?
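
To see why structural updates are harder, here is a minimal Python sketch of a naive subtree insertion on a dense pre/size/level table. It uses the ten-node example tree a..j and the insertion of <k><l/><m/></k> as first child of /a/f/g from this lecture. The function name and the list-of-tuples layout are assumptions made for illustration only.

```python
# Rows are (name, size, level); a row's list position is its pre rank.
doc = [("a", 9, 0), ("b", 3, 1), ("c", 2, 2), ("d", 0, 3), ("e", 0, 3),
       ("f", 4, 1), ("g", 0, 2), ("h", 2, 2), ("i", 0, 3), ("j", 0, 3)]

def insert_first_child(rows, parent_pre, subtree):
    """Insert `subtree` (levels relative to its root) as first child of parent_pre.

    Returns the new table and the number of existing rows whose pre rank shifts.
    """
    parent_level = rows[parent_pre][2]
    grown = []
    for pre, (name, size, level) in enumerate(rows):
        # The parent and all of its ancestors gain len(subtree) descendants.
        if pre <= parent_pre <= pre + size:
            size += len(subtree)
        grown.append((name, size, level))
    new_rows = [(n, s, l + parent_level + 1) for n, s, l in subtree]
    result = grown[: parent_pre + 1] + new_rows + grown[parent_pre + 1:]
    shifted = len(rows) - (parent_pre + 1)  # every following row gets a new pre
    return result, shifted

# g has pre rank 6 in the example tree.
updated, shifted = insert_first_child(doc, 6, [("k", 2, 0), ("l", 0, 1), ("m", 0, 1)])
```

Besides the size changes on the ancestor path, every tuple after the insertion point receives a new pre rank, so the cost of this naive approach grows with the document size rather than with the size of the inserted subtree.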

Page 6: XML/XQuery Updates

do insert <k><l/><m/></k> as first into /a/f/g

Page 7: XML/XQuery Updates

do insert <k><l/><m/></k> as first into /a/f/g

Page 8: XML/XQuery Updates

Page 9: XML/XQuery Updates

Page 10: XML/XQuery Updates

Staircase Join

Page 11: XML Storage Revisited

pre  post  node
0    9     a
1    3     b
2    2     c
3    0     d
4    1     e
5    8     f
6    4     g
7    7     h
8    5     i
9    6     j

pre  size  level
0    9     0
1    3     1
2    2     2
3    0     3
4    0     3
5    4     1
6    0     2
7    2     2
8    0     3
9    0     3

post = pre + size - level

Allow holes, define logical pages:

pre  size  level
0    11    0
1    5     1
2    -1    null
3    null  null
4    2     2
5    0     3
6    0     3
7    4     1
8    0     2
9    2     2
10   0     3
11   0     3

rid  size  level  nid
0    11    0      N0
1    5     1      N1
2    -1    null   null
3    0     null   null
4    2     2      N2
5    0     3      N3
6    0     3      N4
7    4     1      N5
8    0     2      N6
9    2     2      N7
10   0     3      N8
11   0     3      N9
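
The relation post = pre + size - level can be checked with a short Python sketch that derives all four values from a nested representation of the example tree a..j. The tuple-based tree layout and the function name `encode` are assumptions for illustration.

```python
# Example tree as (name, [children]).
tree = ("a", [("b", [("c", [("d", []), ("e", [])])]),
              ("f", [("g", []), ("h", [("i", []), ("j", [])])])])

def encode(root):
    """Return one dict per node, in preorder, with pre, post, size and level."""
    rows = []
    post_rank = 0

    def visit(node, level):
        nonlocal post_rank
        name, children = node
        pre = len(rows)
        rows.append(None)               # reserve the preorder slot
        for child in children:
            visit(child, level + 1)
        size = len(rows) - pre - 1      # number of descendants
        rows[pre] = {"name": name, "pre": pre, "post": post_rank,
                     "size": size, "level": level}
        post_rank += 1                  # postorder rank is assigned on the way up

    visit(root, 0)
    return rows

rows = encode(tree)
formula_holds = all(r["post"] == r["pre"] + r["size"] - r["level"] for r in rows)
```

Because post can be recomputed this way, storing size and level instead of post loses no information.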

Page 12: XML Storage Revisited

(pre | post, pre | size | level and the logical pre | size | level view with holes as on page 11)

post = pre + size - level

Allow holes, define logical pages:

rid  size  level  nid
0    11    0      N0
1    5     1      N1
2    -1    null   null
3    0     null   null
4    0     2      N6
5    2     2      N7
6    0     3      N8
7    0     3      N9
8    2     2      N2
9    0     3      N3
10   0     3      N4
11   4     1      N5

page  map
0     0
1     2
2     1

rid = pre.swizzle( )
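
A minimal Python sketch of what a swizzle step of this kind might compute: translate a logical position (pre, counted including holes) into a physical rid through the page map. The page size of 4 tuples is an assumption read off the 12-tuple example, and the arithmetic is an illustration, not the MonetDB implementation.

```python
PAGE_SIZE = 4                    # assumption: 4 tuples per logical page
page_map = {0: 0, 1: 2, 2: 1}    # logical page -> physical page (from the example)

def swizzle(pre):
    """Map a logical position to the physical rid holding that tuple."""
    page, offset = divmod(pre, PAGE_SIZE)
    return page_map[page] * PAGE_SIZE + offset

# Logical positions 4..7 (nodes N2..N5) live in physical page 2, i.e. rids 8..11.
rids = [swizzle(pre) for pre in range(12)]
```

With such a map, inserting a new logical page only appends tuples to the rid table and changes the page map, instead of moving existing tuples.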

Page 13: XML Storage Revisited

Update-friendly
• rid-table is append-only
• rid-tuples may be unused
• rid = autoincrement column

MonetDB:
• rid not stored but computed (virtual oid)
• allows positional lookup/join

Opportunity currently not exploited by other RDBMS.

Occurs widely in our XQuery translation.

[Table: rid | size | level | nid, as on page 12]


Page 15: MonetDB/XQuery

Our own XML DBMS with (almost..) full XQuery support.
• Built purely on an RDBMS, namely MonetDB

Pathfinder compiler & "staircase join":
– Universität Tübingen (Torsten Grust, et al.)
– Technical University Twente (Maurice van Keulen, et al.)

MonetDB High-Performance DBMS:
– CWI Amsterdam (Peter Boncz, Stefan Manegold, ...)

Useful for:
• Large XML databases!
• Querying XML annotations (multimedia, forensic NFI)
• XML information retrieval
• ...

[Figure: XQuery -> Pathfinder Compiler -> Relational Algebra -> RDBMS (MonetDB)]

Page 16: Research Projects & Extensions

• Value indices
• Runtime optimization
  • SIGMOD'09 [Abdel Kader, Boncz, v. Keulen, Manegold]
• Algebraic Query Optimization
  • Grust, Rittinger, et al. (Universität Tübingen)
• Distributed XQuery, P2P XQuery
  • SOAP group communication, XQuery RPC
  • VLDB'07 [Zhang, Boncz]
• Benchmarking beyond XMark
  • ExpDB'06 Workshop [Manegold]
• Support for XML Interval Annotations
  • XIME-P'06 Workshop [Alink et al.]
• XQuery + Information Retrieval: PF/Tijah

Page 17: Conclusions

• Relational approach can be scalable & fast
• MonetDB/XQuery compares favorably with all other available systems
• Techniques that made it work
  • Property-driven peephole optimization: order & other properties
  • Loop-lifted XPath steps: evaluate sets of context nodes in a single pass
  • Support for dense (autoincrement) keys: positional lookup
• Background Information & Literature
  http://monetdb-xquery.org
  http://pathfinder-xquery.org

Page 18: Schedule

Stefan.Manegold@CWI.nl · Other XQuery Processing Approaches · ADT 2010

• 09.11.2010:
  • RDBMS back-end support for XML/XQuery (1/2):
    • Document Representation (XPath Accelerator, Pre/Post plane)
• 16.11.2010:
  • XPath navigation (Staircase Join)
  • XQuery to Relational Algebra Compiler:
    • Item- & Sequence-Representation
    • Efficient FLWoR Evaluation (Loop-Lifting)
    • Optimization
• 23.11.2010:
  • RDBMS back-end support for XML/XQuery (2/2):
    • Updateable Document Representation
  • Other (DB-) approaches to XML/XQuery processing

Page 19: Topics

Other approaches & techniques (selection, far from complete!)

Document storage / tree encoding:
  ORDPATH
  DataGuides

XPath processing:
  Tree patterns, holistic twig joins

Page 20: Fixed-Width Tree Encodings & Updates

Fixed-width tree encodings (like XPath Accelerator) are
  Good for read(-only) processing
    small footprint, positional lookup, staircase join
  But inherently static

Milo et al., PODS 2002:

"There is a sequence of updates (subtree insertions) for any persistent tree encoding scheme E (where each node keeps its initial encoding label even under updates), such that E needs labels of length Ω(N) to encode the resulting tree of N nodes."


Page 22: XML/XQuery Updates

do insert <k><l/><m/></k> as first into /a/f/g

Page 23: XML/XQuery Updates

do insert <k><l/><m/></k> as first into /a/f/g

MonetDB/XQuery hack: exploit paging & mmap trick
but: updating pg|off is still O(N)

Page 24: Fixed-Width Tree Encodings & Updates

Fixed-width tree encodings (like XPath Accelerator) are
  Good for read(-only) processing
    small footprint, positional lookup, staircase join
  But inherently static

Non-solutions:
  Gaps in the encoding (never large enough)
  Encoding based on decimal fractions (limited precision)

Possible solution:
  Variable-width tree encodings:
    Cheaper updates
    At the expense of more expensive read(-only) processing
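
Why gaps are "never large enough" can be illustrated with a small Python sketch. It assumes integer labels and a midpoint strategy (both are illustrative choices, not taken from the slides) and counts how many nodes can be inserted at the same position before no free label remains and relabeling becomes necessary.

```python
def midpoint_label(left, right):
    """Integer label strictly between left and right, or None if the gap is used up."""
    if right - left < 2:
        return None
    return (left + right) // 2

def inserts_until_relabel(gap):
    """Repeatedly insert directly after the left neighbour until the gap is exhausted."""
    left, right = 0, gap
    count = 0
    while True:
        mid = midpoint_label(left, right)
        if mid is None:
            return count
        right = mid      # the next insert again goes between `left` and the new node
        count += 1

small = inserts_until_relabel(1024)
large = inserts_until_relabel(1 << 20)
```

Under this strategy a gap of width g survives only about log2(g) adversarial insertions, so enlarging the gaps postpones relabeling but does not avoid it. Fractional labels behave the same way once the available precision is used up.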

Page 25: A Variable-Width Tree Encoding: ORDPATH

O'Neil et al., SIGMOD 2004.

Page 26: ORDPATH Encoding: Example

Pages 27-29: ORDPATH: Insertion Between Siblings
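
Since the figures for these slides are not part of the transcript, here is a minimal Python sketch of the insertion idea as commonly described for ORDPATH: initial labels use odd ordinals, and an insertion between two adjacent siblings uses an even "caret" component followed by an odd ordinal, so no existing label changes. Labels are modelled as tuples of integers, and the function only covers the simplest cases. Both choices are assumptions made for illustration.

```python
def insert_between(left, right):
    """Label for a new sibling; `left` or `right` may be None at the ends.

    Simplified sketch: handles adjacent odd ordinals under a common prefix,
    appending after the last sibling, and inserting before the first sibling.
    """
    if left is None and right is None:
        raise ValueError("need at least one neighbouring sibling")
    if right is None:
        return left[:-1] + (left[-1] + 2,)      # after the last sibling
    if left is None:
        return right[:-1] + (right[-1] - 2,)    # before the first sibling
    same_prefix = left[:-1] == right[:-1]
    if not same_prefix or left[-1] % 2 == 0 or right[-1] - left[-1] != 2:
        raise NotImplementedError("only adjacent odd ordinals are handled here")
    return left[:-1] + (left[-1] + 1, 1)        # even caret + odd ordinal

middle = insert_between((1, 3), (1, 5))   # between 1.3 and 1.5
last = insert_between((1, 5), None)       # after 1.5
first = insert_between(None, (1, 1))      # before 1.1
```

Comparing the tuples component by component keeps the new label between its neighbours, while 1.3 and 1.5 themselves stay untouched.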

Page 30: Is ORDPATH suitable for XQuery?

• Mapping core operations of the XQuery processing model to operations on ORDPATH labels:
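
The table of operations is not preserved in the transcript. As a hedged illustration of what such a mapping can look like, the following Python sketch implements a few label operations on ORDPATH-style labels modelled as integer tuples (odd components are real levels, even components are carets). The function names and the tuple model are assumptions, not the paper's bit-level encoding.

```python
def depth(label):
    """Tree depth = number of odd components (even carets do not add a level)."""
    return sum(1 for c in label if c % 2 != 0)

def is_ancestor(a, d):
    """a is an ancestor of d if a's label is a proper prefix of d's label."""
    return len(a) < len(d) and d[:len(a)] == a

def parent(label):
    """Drop the final odd ordinal and any carets directly in front of it."""
    rest = label[:-1]
    while rest and rest[-1] % 2 == 0:
        rest = rest[:-1]
    return rest or None

def document_order(labels):
    """Document order is the component-wise (lexicographic) label order."""
    return sorted(labels)

labels = [(1, 5), (1, 4, 1), (1,), (1, 3), (1, 3, 1)]
ordered = document_order(labels)
```

Note that 1.4.1 is a child of 1 at the same depth as 1.3 and 1.5, even though its label has one more component.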

Page 31: ORDPATH: Variable-Length Node Encoding

• For a 10 MB XML sample document, the authors of ORDPATH observed label lengths between 6 and 12 bytes.
• ORDPATH labels encode root-to-node paths => common prefixes.
  => Label comparisons often need to inspect encoding bits at the far right.
• MS SQL Server employs further path encodings organized in reverse (node-to-root) order.
• Note:
  - Preorder ranks fit into CPU registers.
  - 4-byte pre's are sufficient for 2^32 = 4G nodes (11 GB XMark fits easily).
  - 8-byte pre's are sufficient for 2^64 nodes, i.e., "the universe" ...

Page 32: Topics

Other approaches & techniques (selection, far from complete!)

Document storage / tree encoding:
  ORDPATH
  DataGuides

XPath processing:
  Tree patterns, holistic twig joins

Page 33: DataGuides

XPath Accelerator, ORDPATH & similar encoding schemes
  encode the document's tree structure in the node ranks/labels they assign

DataGuides
  Developed in the context of the Lore project (DBMS for semi-structured data)
  Stanford University, Goldman & Widom, VLDB 1997
  encode the document's tree structure in relation names

Observation:
  Each node is uniquely identified by its path from the root
  Paths of siblings with equal tag names can be unified,
  provided we keep their relative order (rank) explicitly

Page 34: DataGuides

Definition

Given a semistructured data instance DB, a DataGuide for DB is a graph G s.t.:
- every path in DB also occurs in G
- every path in G occurs in DB
- every path in G is unique
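
The three conditions can be checked mechanically. The following Python sketch does so for acyclic, edge-labelled graphs given as adjacency dictionaries. The graph representation, the function names, and the small person/name/project instance are assumptions made for illustration.

```python
def label_paths(edges, root):
    """All label paths starting at root in an acyclic edge-labelled graph.

    edges: dict mapping a node to a list of (label, target) pairs.
    """
    paths = set()
    stack = [(root, ())]
    while stack:
        node, path = stack.pop()
        for label, target in edges.get(node, []):
            new_path = path + (label,)
            paths.add(new_path)
            stack.append((target, new_path))
    return paths

def paths_are_unique(edges):
    """Every label path reaches one node: no node has two equal-labelled out-edges."""
    for out in edges.values():
        labels = [label for label, _ in out]
        if len(labels) != len(set(labels)):
            return False
    return True

def is_dataguide(g_edges, g_root, db_edges, db_root):
    same_paths = label_paths(g_edges, g_root) == label_paths(db_edges, db_root)
    return same_paths and paths_are_unique(g_edges)

db = {"r": [("person", "p1"), ("person", "p2")],
      "p1": [("name", "n1")],
      "p2": [("name", "n2"), ("project", "x")]}
guide = {"R": [("person", "P")],
         "P": [("name", "N"), ("project", "X")]}

guide_ok = is_dataguide(guide, "R", db, "r")
db_as_guide_ok = is_dataguide(db, "r", db, "r")
```

The instance itself fails as its own DataGuide because the path person is not unique, while the three-edge guide summarizes the same paths exactly once.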

Page 35: DataGuides

Example:

Page 36: DataGuides

■ Multiple DataGuides for the same data:

Page 37: DataGuides

Definition: Let p, p' be two path expressions and G a graph; we define
  p ≡_G p' if p(G) = p'(G),
i.e., p and p' are indistinguishable on G.

Definition: G is a strong DataGuide for a database DB if ≡_G is the same as ≡_DB.

Example:
- G1 is a strong DataGuide
- G2 is not strong

person.project ≢_DB dept.project
person.project ≢_G1 dept.project
person.project ≡_G2 dept.project

Page 38: DataGuides

■ Constructing the strong DataGuide G:

Nodes(G) = {{root}}
Edges(G) = ∅
while changes do
    choose s in Nodes(G), a in Labels
    add s' = {y | x in s, (x -a-> y) in Edges(DB)} to Nodes(G)
    add (s -a-> s') to Edges(G)

• Use hash table for Nodes(G)
• This is precisely the powerset automaton construction.
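
The construction can be written as a worklist algorithm. The Python sketch below is one possible rendering, assuming the database is an adjacency dictionary of (label, target) pairs. It uses a Python set of frozensets in the role of the hash table for Nodes(G), and it only creates target sets for labels that actually occur, so the empty set never becomes a node.

```python
from collections import deque

def strong_dataguide(db_edges, root):
    """Powerset-style construction: DataGuide nodes are sets of database nodes."""
    start = frozenset([root])
    nodes = {start}                 # hash-based set of target sets
    edges = {}                      # (source_set, label) -> target_set
    queue = deque([start])
    while queue:
        s = queue.popleft()
        by_label = {}
        for x in s:
            for label, y in db_edges.get(x, []):
                by_label.setdefault(label, set()).add(y)
        for label, targets in by_label.items():
            t = frozenset(targets)
            edges[(s, label)] = t
            if t not in nodes:      # a new target set is a new DataGuide node
                nodes.add(t)
                queue.append(t)
    return nodes, edges

# Hypothetical instance: persons and a department share project objects.
db = {"r": [("person", "p1"), ("person", "p2"), ("dept", "d1")],
      "p1": [("project", "x1")],
      "p2": [("project", "x2")],
      "d1": [("project", "x1")]}

nodes, edges = strong_dataguide(db, "r")
```

In this instance person.project and dept.project reach different target sets, so they end up in different DataGuide nodes, which is the property that distinguishes the strong DataGuide.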

Page 39: Monet XML approach

Early attempt to store and query XML data in MonetDB
By Albrecht Schmidt
Not related to Pathfinder & MonetDB/XQuery

Pages 40-42: Monet XML approach

Page 43: Monet XML approach

Early attempt to store and query XML data in MonetDB
By Albrecht Schmidt
Not related to Pathfinder & MonetDB/XQuery
No XQuery compiler
XMark queries are hand-crafted and -optimized in MIL
Child, Descendant, Parent & Ancestor steps become regular expressions on the relation names (i.e., catalog)
Open: preceding & following steps?
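
How location steps can turn into regular expressions over relation names is sketched below in Python. The catalog contents and the "/"-separated naming scheme are assumptions made for illustration and are not the actual Monet XML catalog format.

```python
import re

# Hypothetical catalog: one relation per distinct root-to-node path.
catalog = [
    "/site",
    "/site/people",
    "/site/people/person",
    "/site/people/person/name",
    "/site/regions",
    "/site/regions/item",
    "/site/regions/item/name",
]

def child_step(context, tag):
    """child::tag: context path followed by exactly one more component."""
    patterns = [re.escape(c) + "/" + re.escape(tag) for c in context]
    return [r for r in catalog if any(re.fullmatch(p, r) for p in patterns)]

def descendant_step(context, tag):
    """descendant::tag: any number of components in between."""
    patterns = [re.escape(c) + "(/[^/]+)*/" + re.escape(tag) for c in context]
    return [r for r in catalog if any(re.fullmatch(p, r) for p in patterns)]

def parent_step(context):
    """parent::*: strip the last path component."""
    parents = {re.sub(r"/[^/]+$", "", c) for c in context}
    return sorted(p for p in parents if p)

names = descendant_step(["/site"], "name")
persons = child_step(["/site/people"], "person")
item = parent_step(["/site/regions/item/name"])
```

The steps are resolved against the catalog alone. Sibling order is not visible in the relation names, which is consistent with preceding and following steps being listed as open.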

Page 44: Topics

Other approaches & techniques (selection, far from complete!)

Document storage / tree encoding:
  ORDPATH
  DataGuides

XPath processing:
  Tree patterns, holistic twig joins

Page 45: Twig Join Algorithms

So far: interpreted XPath expressions in an imperative manner
  Evaluated XPath expressions step-by-step, as stated in the query
  Given /axis1::test1/axis2::test2/.../axisn::testn,
  we first evaluated /, then XPath step axis1::test1, then step axis2::test2, ...

This may not always be the best choice:
  Intermediate results can get very large, even if the final result is small.

Database context => think in a declarative manner
  DBMS optimizer / engine can evaluate query in "best" order

Page 46: Tree Patterns

In fact, XPath is a declarative language.

/descendant::timeline/child::event

"Find all nodes v1, v2, and v3, such that
  v1 is a document root,
  v2 is a descendant element of v1 and is named timeline, and
  v3 is a child element of v2 and is named event.
All nodes of type v3 form the query result."

Observe the combination of
  (a) predicates on single nodes, and
  (b) structural conditions between these nodes.

Page 47: Tree Patterns

Structural conditions: intuitively expressed as tree patterns:
  Nodes labeled with node predicates
  Structural conditions:
    Double line: ancestor/descendant relationships
    Single line: parent/child relationships

Arbitrary predicates are allowed, but typical are predicates on tag names:
  Nodes labeled with requested tag name
  Document root: label /
  If no /-node is specified: search for pattern anywhere in the document

[Figure: tree pattern with nodes timeline and event]

Page 48: Tree Patterns

[Figure: tree pattern with nodes timeline and event]

Page 49: Tree Patterns

Not limited to path patterns
May also be twig patterns
Mapping between tree patterns and XPath is in general not trivial

Examples:

[Figure: example tree patterns with nodes a to j]

Page 50: PathStack Algorithm

Page 51: PathStack Algorithm: Path Patterns

[Figure: path pattern example with nodes a, b, c, d, e]

Pages 52-54: PathStack Algorithm: Path Patterns

Page 55: PathStack Algorithm: Path Patterns

[Figure: stacks for the pattern nodes timeline and event; annotations: first timeline node visited, second timeline node visited, first event node visited]

Pages 56-64: PathStack Algorithm: Path Patterns
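
The step-by-step figures for these slides are missing from the transcript. The following Python sketch shows a stack-based evaluation of path patterns in the spirit of PathStack, under several stated assumptions: nodes carry a region encoding (start = preorder rank, end = preorder rank of the last descendant), only ancestor/descendant edges are supported, and each pattern node reads its own stream of matching elements sorted by start. The names `Node` and `path_stack` are illustrative, and the code is a simplified rendering rather than the published algorithm verbatim.

```python
from typing import NamedTuple

class Node(NamedTuple):
    start: int   # preorder rank of the node
    end: int     # preorder rank of its last descendant (start + size)
    tag: str

def path_stack(doc, pattern):
    """Match pattern[0]//pattern[1]//...//pattern[-1] (descendant edges only)."""
    n = len(pattern)
    streams = [sorted(x for x in doc if x.tag == t) for t in pattern]
    pos = [0] * n
    # One stack per non-leaf pattern node; entries are
    # (node, index of the top of the parent stack at push time).
    stacks = [[] for _ in range(n)]
    results = []

    def expand(q, top, suffix):
        """Enumerate all ancestor chains encoded by the stacks."""
        for node, ptr in stacks[q][: top + 1]:
            if node.start >= suffix[0].start:
                continue            # an element is not its own ancestor
            chain = (node,) + suffix
            if q == 0:
                results.append(chain)
            else:
                expand(q - 1, ptr, chain)

    while pos[-1] < len(streams[-1]):
        live = [q for q in range(n) if pos[q] < len(streams[q])]
        q = min(live, key=lambda k: streams[k][pos[k]].start)
        e = streams[q][pos[q]]
        pos[q] += 1
        for stack in stacks:        # drop entries that cannot contain e or later nodes
            while stack and stack[-1][0].end < e.start:
                stack.pop()
        if q > 0 and not stacks[q - 1]:
            continue                # no candidate ancestor: e cannot contribute
        ptr = len(stacks[q - 1]) - 1 if q > 0 else -1
        if q < n - 1:
            stacks[q].append((e, ptr))
        elif n == 1:
            results.append((e,))
        else:
            expand(q - 1, ptr, (e,))   # leaf element: output its solutions now
    return results

# root(0) > timeline(1) > [timeline(2) > event(3)], event(4), note(5)
doc = [Node(0, 5, "root"), Node(1, 5, "timeline"), Node(2, 3, "timeline"),
       Node(3, 3, "event"), Node(4, 4, "event"), Node(5, 5, "note")]

pairs = [(a.start, b.start) for a, b in path_stack(doc, ["timeline", "event"])]
triples = path_stack(doc, ["root", "timeline", "event"])
```

Each input stream is read once, and the stacks compactly represent all partial matches that can still be extended. Parent/child edges would need an additional level check when solutions are output, which this sketch leaves out.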

Page 65: ADT 2010 XQuery Updates in MonetDB/XQuery Other Approaches ...homepages.cwi.nl/~manegold/teaching/adt/lectures/... · 1 Stefan.Manegold@CWI.nl MonetDB/XQuery: Updates ADT 2010 ADT

65

[email protected] Other Xquery Processing Approaches ADT 2010

PathStack Algorithm: Twig PatternsPathStack Algorithm: Twig Patterns

So far we only considered path patterns

Can we extend our ideas for efficient twig pattern evaluation?

Idea:

Decompose twig patterns into multiple path patterns.

All path patterns start from the same root.

Use PathStack for each of them and merge their results.

Page 66: ADT 2010 XQuery Updates in MonetDB/XQuery Other Approaches ...homepages.cwi.nl/~manegold/teaching/adt/lectures/... · 1 Stefan.Manegold@CWI.nl MonetDB/XQuery: Updates ADT 2010 ADT

66

[email protected] Other Xquery Processing Approaches ADT 2010

PathStack Algorithm: Twig PatternsPathStack Algorithm: Twig Patterns

Example: Decompose twig pattern into path patterns

a

b

c d

e

PathStack Algorithm: Twig Patterns

Example: Decompose twig pattern into path patterns

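[Figure: a twig pattern with query nodes a, b, c, d, e, rooted at a, and its decomposition into two root-to-leaf path patterns that share the prefix a, b.]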

Summary (1/5)

XML

Document markup

Data exchange

Semi-structured

Tree model

DTDs

XML Schema

XPath

Navigation, location steps, axes, node tests, predicates, functions

XQuery

Sequences & Iterations (FLWoR expressions)

Summary (2/5)

XML Data Management

XML file processors

XML databases

XML integration platforms

RDBMS with XML functionality, SQL/XML

Relational XML storage: schema-based vs. schema-oblivious

Summary (3/5)

Purely Relational XML/XQuery processing: MonetDB/XQuery

Document encoding: XPath Accelerator (pre/post plane)

XPath navigation: Staircase Join

XQuery to Relational Algebra translation

Item- & Sequence-representation

Iterations: Loop-lifting

Loop-lifted staircase join

Peephole Optimization

Order-awareness, sort avoidance

XML/XQuery Update Support

Summary (4/5)

Other approaches & techniques

Document storage/encoding:

ORDPATH

DataGuides

XPath processing:

Tree patterns, holistic twig joins

Summary (5/5)

Literature

Slides

Literature references in slides

Literature references on website:

http://www.cwi.nl/~manegold/teaching/adt/html/xquery.html

• Tentamen / Exam:

Tuesday December 21 2010

09:00 – 11:00

Zaal / Room: A1.14

Projects: Join the MonetDB Team!

• Own ideas, suggestions, initiative welcome!

• Master Student Projects (6 Months)

• Various projects, each consisting of both research & implementation

• See monetdb.cwi.nl/Development/Research/Projects/ for a sample list

• Feel free to come with your own idea(s)!

• Implementation Projects

• Both short-term & long-term

• E.g. open feature requests: sf.net/tracker/?group_id=56967

• Become owner/maintainer of some (new) part of MonetDB

• We are (desperately) looking for Windows SW-development & system experts!

We Offer...

• 24x7x365 support & advice

• Membership in a kind & friendly Family-Team of Experts

• Chance to participate in & contribute to a large & successful open-source research project

• Lots of experience, exciting research & fun

• Desk & workstation at CWI

Fridge, microwave, free coffee, free soup, free cake (occasionally)

Master Students only (possibly part-time)

Limited availability => FCFS!

Some pocket money (internship allowance, "stagevergoeding")

Master Students only

Limited availability => FCFS!

...