The Connection Factory Jeroen van Rotterdam, CTO May 19th, WWW9.

The Connection Factory

Jeroen van Rotterdam, CTO

May 19th, WWW9

Contents

- Xhive setup

- XPath

- XPath performance issues within XML collections

Xhive

- OO-XML database

- Highly scalable

- High granularity

- W3C DOM L2 compliant

- XPath 1.0 compliant

Architecture

[Diagram labels: Xhive Core, OODB, XPath, DOM Core L2, Extended DOM, DOM Traversal, DB Administrator, Schema, SQL loader, RMI Layer ( EJB / CORBA / SOAP ), RMI Layer, Client]

Why XPath

Competing solutions:

- XML-QL: Where-In constructs

- XQL: limited

- SQL: no alternative

XPath is a complete pattern-matching language.

XPath

Advantages:

- fairly complete

- multiple axes

- supported by W3C

- base for XPointer, XLink

- base for XML Query WG

- user-defined functions

Disadvantages:

- document oriented

- slightly different tree model

- no updates

Extending DOM

Collection setup:

Every document is a “Bastard Node”

[Diagram: a Library Node reaches its Document Nodes through getFirstChild() and getLastChild(); getParentNode() on a Document Node returns null.]
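The collection setup can be sketched as follows. This is a minimal illustration using Python's xml.dom.minidom, not the Xhive API; the LibraryNode class and its members are hypothetical names chosen to mirror the DOM methods on the slide.

```python
from xml.dom.minidom import parseString


class LibraryNode:
    """Hypothetical container whose children are whole documents."""

    def __init__(self):
        self.childNodes = []

    def appendChild(self, document):
        # The library keeps the reference; the document itself is not changed.
        self.childNodes.append(document)
        return document

    @property
    def firstChild(self):
        return self.childNodes[0] if self.childNodes else None

    @property
    def lastChild(self):
        return self.childNodes[-1] if self.childNodes else None


library = LibraryNode()
book = library.appendChild(parseString("<book/>"))
article = library.appendChild(parseString("<article/>"))
```

The library can walk down to its documents, but a standard DOM document still reports no parent, which is the "Bastard Node" situation.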

Library Node

Advantages

- Natural extension of DOM

- extensible

- closely related to directory structures

- searchable with XPath

Library Node

Disadvantages

- potential bottleneck

XPath

- XPath in a large PDOM collection environment:

1. Address memory issues

2. Solve differences in specs

3. Address performance issues

Memory issues

- Avoid recursion

- Make subresults capable of being persisted
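Avoiding recursion can be sketched as a tree walk with an explicit stack. This is an illustration on Python's ElementTree, not the Xhive implementation; the function name descendants is an assumption.

```python
import xml.etree.ElementTree as ET


def descendants(root):
    """Yield the descendants of root in document order, without recursion."""
    # Children are pushed in reverse so the first child is popped first.
    stack = list(reversed(list(root)))
    while stack:
        node = stack.pop()
        yield node
        stack.extend(reversed(list(node)))
```

The depth of the document then costs stack entries in an ordinary list rather than call frames, and the generator hands out one node at a time.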

Solve differences

Differences in specs are, for instance:

- getParent on attributes (XPath) vs. ownerElement (DOM)

- namespace nodes (present in XPath, not in DOM)
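The attribute difference can be shown with Python's xml.dom.minidom, used here only as a stand-in DOM. The helper xpath_parent is a hypothetical name for the small adapter an XPath engine on top of a DOM would need.

```python
from xml.dom.minidom import Node, parseString


def xpath_parent(node):
    """Return the parent of a node as the XPath data model defines it."""
    if node.nodeType == Node.ATTRIBUTE_NODE:
        # DOM gives attributes no parent; the element is the ownerElement.
        return node.ownerElement
    return node.parentNode


doc = parseString('<chapter id="c1"><title>Intro</title></chapter>')
chapter = doc.documentElement
id_attr = chapter.getAttributeNode("id")
```

Here id_attr.parentNode is None in the DOM view, while xpath_parent(id_attr) gives the chapter element that XPath expects.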

Performance

Increase XPath performance:

- Query analysis

- Avoid reparsing

- Lazy evaluation

- Index structures

- Cache strategy

- DTD analysis

- Statistical data

Performance

1. Query analysis:

a. Can I simplify my query?

For instance: /child::chapter[5+5]
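One possible simplification is folding constant arithmetic in a predicate before evaluation, so that [5+5] becomes [10]. The sketch below works on the query text with a regular expression and only handles +, - and * on numeric literals; a real engine would fold constants in its parsed query tree. All names are assumptions.

```python
import ast
import operator
import re

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}
# A predicate made up only of digits, arithmetic signs, dots, spaces, parentheses.
_CONSTANT_PREDICATE = re.compile(r"\[([0-9+\-*\s.()]+)\]")


def _fold(node):
    """Evaluate a purely numeric expression tree, or raise ValueError."""
    if isinstance(node, ast.Expression):
        return _fold(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_fold(node.left), _fold(node.right))
    raise ValueError("not a constant arithmetic expression")


def simplify(query):
    """Replace constant arithmetic predicates by their value."""
    def replace(match):
        try:
            value = _fold(ast.parse(match.group(1).strip(), mode="eval"))
        except (ValueError, SyntaxError):
            return match.group(0)  # leave anything unexpected untouched
        return "[%g]" % value
    return _CONSTANT_PREDICATE.sub(replace, query)
```

With this sketch, simplify("/child::chapter[5+5]") yields "/child::chapter[10]", and predicates such as [paragraph] or [..] pass through unchanged.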

Performance

1. Query analysis:

b. Does your query depend on the context node?

Absolute queries are context independent:

“Give me all chapters where the title is the same as the book title”

//chapter[title=string(/book/title)]

Evaluate string(/book/title) only once.
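Hoisting the context-independent part out of the predicate can be sketched like this, using Python's ElementTree and hand-written filtering instead of a real XPath engine. The function names are assumptions.

```python
import xml.etree.ElementTree as ET


def string_value(element):
    """XPath string-value of an element: all descendant text, concatenated."""
    return "".join(element.itertext())


def chapters_titled_like_book(book):
    """Hand-written equivalent of //chapter[title=string(/book/title)]."""
    # The absolute sub-expression does not depend on the chapter being
    # tested, so it is evaluated once, outside the loop.
    first_title = book.find("title")
    book_title = string_value(first_title) if first_title is not None else ""
    return [
        chapter
        for chapter in book.iter("chapter")
        if any(string_value(t) == book_title for t in chapter.findall("title"))
    ]
```

A naive evaluator would walk /book/title again for every chapter; here the comparison inside the loop is a plain string test.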

Performance

2. Storing parsed queries:

“Compile” and optimize each query only once
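Storing parsed queries can be sketched with a cache keyed on the query text. The "compiler" below is only a stand-in that splits a simple path into steps; it does not handle predicates containing slashes. The names compile_query and stats are assumptions.

```python
from functools import lru_cache

stats = {"parses": 0}  # counts how often a query text is really parsed


@lru_cache(maxsize=256)
def compile_query(text):
    """Parse a query once; later calls with the same text hit the cache."""
    stats["parses"] += 1
    # Stand-in for real parsing and optimization.
    return tuple(step for step in text.split("/") if step)
```

Calling compile_query("/book/chapter") repeatedly returns the same tuple of steps while the parse counter only increases on the first call.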

Performance

3. Lazy evaluation:

For instance, conversions of node-sets:

- to boolean (evaluate only the first node)

- to string (first node in document order)

- to number (string converted to number)

Example: “give me all chapters which have paragraphs”

/chapter[paragraph]

Finding 1 paragraph will do
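The paragraph example can be sketched with a generator, so that the node-set is produced on demand and the boolean test stops at the first hit. This uses Python's ElementTree; the visited list exists only to make the amount of work observable, and all names are assumptions.

```python
import xml.etree.ElementTree as ET


def child_elements(node, name, visited):
    """Lazily yield the children of node with the given name."""
    for child in node:
        visited.append(child.tag)  # record every child actually inspected
        if child.tag == name:
            yield child


def has_child(node, name, visited):
    """Boolean value of the node-set: true as soon as one node is found."""
    return next(child_elements(node, name, visited), None) is not None
```

For a chapter with many paragraphs, has_child inspects a single child and then stops.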

Performance

4. Indexing:

- getFirstChildElementByName(String name)

- getNextSiblingElementBySameName()

- getFirstChildByType( short type )

- getNextSiblingByType( short type )
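How such index methods might work can be sketched with a per-parent index on child name. This is an in-memory illustration, not the Xhive storage layout; IndexedNode and its Python method names are hypothetical counterparts of the first two methods above.

```python
class IndexedNode:
    """Element that indexes its children by name as they are appended."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self._by_name = {}       # child name -> children in document order
        self._next_same = None   # next sibling carrying the same name

    def append(self, child):
        same = self._by_name.setdefault(child.name, [])
        if same:
            same[-1]._next_same = child  # link previous same-named sibling
        same.append(child)
        self.children.append(child)
        return child

    def first_child_element_by_name(self, name):
        same = self._by_name.get(name)
        return same[0] if same else None

    def next_sibling_element_by_same_name(self):
        return self._next_same
```

A step such as child::chapter can then jump from chapter to chapter without visiting the siblings in between.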

Performance

5. Caching strategy:

top-level paging/cluster strategy

[Diagram: Library Node, Document Nodes, Root elements]

Performance

6. Use DTD information:

For instance: /child::chapter/child::book[4]

Might return null immediately if the DTDs in use show that a chapter can never have a book child.
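The DTD check can be sketched as a satisfiability test on the element names of a path, run before touching any data. The content model below is a made-up example, not taken from a real DTD, and can_match is a hypothetical name.

```python
# Hypothetical content model: element name -> child names the DTD allows.
ALLOWED_CHILDREN = {
    "book": {"title", "chapter"},
    "chapter": {"title", "paragraph"},
}
# Hypothetical set of element names allowed as document element.
DOCUMENT_ELEMENTS = {"book", "chapter"}


def can_match(steps, document_elements, allowed_children):
    """Return False if the DTD rules out this chain of child steps."""
    candidates = set(document_elements)
    for step in steps:
        if step not in candidates:
            return False  # the DTD never allows this element here
        candidates = allowed_children.get(step, set())
    return True
```

With this model, the steps chapter then book cannot match, so the query can be answered with an empty result at once.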

Performance

7. Gather statistical info:

DTDs or XML Schemas specify which structures may occur, not what is actually in your collection.
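Gathering such statistics can be sketched as counting which parent/child combinations really occur in the stored documents. This uses Python's ElementTree over in-memory documents; child_statistics is a hypothetical name.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def child_statistics(documents):
    """Count actual (parent name, child name) pairs over a collection."""
    counts = Counter()
    for root in documents:
        for parent in root.iter():
            for child in parent:
                counts[(parent.tag, child.tag)] += 1
    return counts
```

A pair with a count of zero tells the query planner that a step is empty in practice, even when the DTD would allow it.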

Conclusion

- DOM within database environments

- XPath on top of a PDOM

- XPath is fairly complete

- Focus on performance

WWW9

Beta testers, Developers wanted.

Email: [email protected]

Have fun...
