The Connection Factory Jeroen van Rotterdam, CTO May 19th, WWW9.

The Connection Factory

Jeroen van Rotterdam, CTO

May 19th, WWW9

Contents

- Xhive setup

- XPath

- XPath performance issues within XML collections

Xhive

- OO-XML database

- Highly scalable

- High granularity

- W3C DOM L2 compliant

- XPath 1.0 compliant

Architecture

[Architecture diagram. Labelled components: Xhive Core, OODB, XPath, DOM Core L2, Extended DOM, DOM Traversal, DB Administrator, Schema, SQL loader, RMI Layer (EJB / CORBA / SOAP), RMI Layer, Client.]

Why XPath

Competing solutions:

- XML-QL: Where-In constructs

- XQL: limited

- SQL: no alternative

XPath is a complete pattern-matching language.

XPath

Advantages:

- fairly complete

- multiple axes

- supported by W3C

- basis for XPointer, XLink

- basis for the XML Query WG

- user-defined (extension) functions

Disadvantages:

- document-oriented

- tree model differs slightly from the DOM's

- no updates

Extending DOM

Collection setup:

Every document is a “Bastard Node”

[Diagram: Library Node and Document Nodes, with the labels getFirstChild(), getLastChild(), null and getParentNode().]
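
One way to read the "Bastard Node" diagram is that each document reports the library as its parent, while the library does not expose documents through child navigation. The sketch below models that reading only; it is an interpretation of the slide, not Xhive's confirmed behavior, and the class and method names (LibraryNode, DocumentNode, add_document, get_document) are hypothetical.

```python
class DocumentNode:
    """Hypothetical document node that knows which library it belongs to."""

    def __init__(self, name, library):
        self.name = name
        self._library = library

    def get_parent_node(self):
        # The document acknowledges the library as its parent.
        return self._library


class LibraryNode:
    """Hypothetical collection node that hides its documents from child navigation."""

    def __init__(self, name):
        self.name = name
        self._documents = {}

    def add_document(self, name):
        document = DocumentNode(name, self)
        self._documents[name] = document
        return document

    def get_document(self, name):
        # Documents are reached by name, not by walking children.
        return self._documents.get(name)

    def get_first_child(self):
        return None

    def get_last_child(self):
        return None


library = LibraryNode("books")
document = library.add_document("a.xml")
print(document.get_parent_node() is library, library.get_first_child())
```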

Library Node

Advantages

- Natural extension of DOM

- extensible

- closely related to directory structures

- searchable with XPath

Library Node

Disadvantages

- potential bottleneck

XPath

- XPath in a large PDOM collection environment:

1. Address memory issues

2. Solve differences in specs

3. Address performance issues

Memory issues

- Avoid recursion

- Make intermediate results capable of being persisted
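
A minimal sketch of the first point, assuming Python's xml.dom.minidom as a stand-in for the persistent DOM: the traversal keeps its own stack instead of recursing, so tree depth is not limited by the call stack. The function name iter_document_order is illustrative.

```python
from xml.dom import minidom


def iter_document_order(node):
    """Yield node and all its descendants in document order, without recursion."""
    stack = [node]
    while stack:
        current = stack.pop()
        yield current
        # Push children right to left so the leftmost child is visited first.
        stack.extend(reversed(current.childNodes))


doc = minidom.parseString("<a><b><c/></b><d/></a>")
names = [n.tagName for n in iter_document_order(doc.documentElement)
         if n.nodeType == n.ELEMENT_NODE]
print(names)
```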

Solve differences

Differences between the specs include, for instance:

- getParent on attributes vs. ownerElement

- namespace nodes
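
The attribute difference can be illustrated with Python's xml.dom.minidom, used here only as a convenient DOM implementation. In the DOM an attribute has no parent node and points to its element through ownerElement, whereas XPath treats the element as the attribute's parent. The helper xpath_parent is a hypothetical name for one way of bridging the two.

```python
from xml.dom import minidom


def xpath_parent(node):
    """Return the parent as XPath sees it: an attribute's parent is its element."""
    if node.nodeType == node.ATTRIBUTE_NODE:
        return node.ownerElement
    return node.parentNode


doc = minidom.parseString('<chapter id="c1"><title>Intro</title></chapter>')
chapter = doc.documentElement
id_attr = chapter.getAttributeNode("id")

print(id_attr.parentNode)                # DOM view of the attribute's parent
print(xpath_parent(id_attr) is chapter)  # XPath view of the attribute's parent
```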

Performance

Increase Xpath performance:

- Query analysis

- Avoid reparsing

- Lazy evaluation

- Index structures

- Cache strategy

- DTD analysis

- Statistical data

Performance

1. Query analysis:

a. Can the query be simplified?

For instance: /child::chapter[5+5] can be rewritten as /child::chapter[10]
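
The simplification above is a form of constant folding. The sketch below shows the idea on the query string itself, under the assumption that only predicates made of integer literals combined with +, - and * need folding; a real optimizer would work on the parsed query. The function name fold_constant_predicates is illustrative.

```python
import ast
import operator
import re

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

# A predicate such as [5+5]: integers joined by +, - or *.
_CONST_PREDICATE = re.compile(r"\[\s*(\d+(?:\s*[+*-]\s*\d+)+)\s*\]")


def _eval_constant(node):
    """Evaluate an expression tree that contains only integers and + - *."""
    if isinstance(node, ast.Expression):
        return _eval_constant(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_constant(node.left),
                                   _eval_constant(node.right))
    raise ValueError("not a constant integer expression")


def fold_constant_predicates(query):
    """Replace constant arithmetic predicates by their value."""
    def replace(match):
        try:
            value = _eval_constant(ast.parse(match.group(1), mode="eval"))
        except (SyntaxError, ValueError):
            return match.group(0)  # leave anything unexpected untouched
        return "[%d]" % value
    return _CONST_PREDICATE.sub(replace, query)


print(fold_constant_predicates("/child::chapter[5+5]"))
```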

Performance

1. Query analysis:

b. Does the query depend on the context node?

Absolute queries are context-independent:

“Give me all chapters where the title is the same as the book title”

//chapter[title=string(/book/title)]

Evaluate string(/book/title) only once.
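
A hand-written sketch of that optimization, using Python's xml.etree.ElementTree on a small sample document (the document and the function names are made up for illustration): the context-independent part, string(/book/title), is computed once before the loop over chapters rather than once per chapter.

```python
import xml.etree.ElementTree as ET


def string_value(element):
    """XPath string-value of an element: all descendant text concatenated."""
    return "".join(element.itertext())


def chapters_titled_like_book(root):
    # Context-independent part: evaluated once, outside the loop.
    title_node = root.find("title") if root.tag == "book" else None
    book_title = string_value(title_node) if title_node is not None else ""
    # Context-dependent part: evaluated per chapter.
    return [chapter for chapter in root.iter("chapter")
            if any(string_value(t) == book_title
                   for t in chapter.findall("title"))]


book = ET.fromstring(
    "<book><title>XML</title>"
    "<chapter id='1'><title>Intro</title></chapter>"
    "<chapter id='2'><title>XML</title></chapter></book>")
matches = [chapter.get("id") for chapter in chapters_titled_like_book(book)]
print(matches)
```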

Performance

2. Storing parsed queries:

“Compile” and optimize each query only once
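
One simple way to get this effect is to memoize the compile step on the query text. In the sketch below, compile_query is a stand-in that only splits a plain location path into steps; it is not a real XPath parser, and the cache size is an arbitrary assumption.

```python
from functools import lru_cache


@lru_cache(maxsize=256)
def compile_query(query):
    """Stand-in for parsing and optimizing: split a simple path into steps."""
    return tuple(step for step in query.split("/") if step)


first = compile_query("/book/chapter/title")
second = compile_query("/book/chapter/title")  # served from the cache

print(first, first is second)
```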

Performance

3. Lazy evaluation:

For instance, operations on node-sets:

- booleans (evaluate first node)

- strings (first in document order)

- numbers (string to number)

Example: “give me all chapters which have paragraphs”

/chapter[paragraph]

Finding 1 paragraph will do
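
A sketch of the boolean case with Python generators, again using xml.dom.minidom as a stand-in DOM. The existence test stops at the first matching child instead of building the whole node-set. The sample document nests its chapters under a book element, and the helper names are illustrative.

```python
from xml.dom import minidom


def child_elements(node, name):
    """Lazily yield the child elements called name, one at a time."""
    child = node.firstChild
    while child is not None:
        if child.nodeType == child.ELEMENT_NODE and child.tagName == name:
            yield child
        child = child.nextSibling


def has_child(node, name):
    # Stops at the first match; later siblings are never visited.
    return next(child_elements(node, name), None) is not None


doc = minidom.parseString(
    "<book><chapter id='1'><paragraph/><paragraph/></chapter>"
    "<chapter id='2'><title/></chapter></book>")
with_paragraphs = [chapter.getAttribute("id")
                   for chapter in child_elements(doc.documentElement, "chapter")
                   if has_child(chapter, "paragraph")]
print(with_paragraphs)
```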

Performance

4. Indexing:

- getFirstChildElementByName( String name )

- getNextSiblingElementBySameName()

- getFirstChildByType( short type )

- getNextSiblingByType( short type )
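
How these methods are backed by index structures is not described on the slide. The sketch below is only one possible illustration: a per-parent index from element name to children, built over xml.dom.minidom, so that name-based lookups avoid scanning unrelated siblings. The class ChildNameIndex and its snake_case methods are hypothetical analogues of the first two methods above, not the Xhive API.

```python
from collections import defaultdict
from xml.dom import minidom


class ChildNameIndex:
    """Hypothetical index over one parent's child elements, keyed by name."""

    def __init__(self, parent):
        self._by_name = defaultdict(list)
        self._position = {}
        for child in parent.childNodes:
            if child.nodeType == child.ELEMENT_NODE:
                siblings = self._by_name[child.tagName]
                # Remember where this child sits among same-named siblings.
                self._position[id(child)] = len(siblings)
                siblings.append(child)

    def first_child_element_by_name(self, name):
        siblings = self._by_name.get(name)
        return siblings[0] if siblings else None

    def next_sibling_element_by_same_name(self, element):
        siblings = self._by_name.get(element.tagName, [])
        position = self._position.get(id(element))
        if position is None or position + 1 >= len(siblings):
            return None
        return siblings[position + 1]


doc = minidom.parseString(
    "<book><title/><chapter id='1'/><para/><chapter id='2'/></book>")
index = ChildNameIndex(doc.documentElement)
first_chapter = index.first_child_element_by_name("chapter")
second_chapter = index.next_sibling_element_by_same_name(first_chapter)
print(first_chapter.getAttribute("id"), second_chapter.getAttribute("id"))
```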

Performance

5. Caching strategy:

top level paging/cluster strategy

[Diagram levels: Library Node, Document Nodes, Root elements]

Performance

6. Use DTD information:

For instance: /child::chapter/child::book[4]

This can return null straight away if you have information on the DTDs used.
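
A sketch of such a check, assuming the DTD has already been reduced to a table of which child elements each element may contain. The CONTENT_MODEL table and the can_match function are made up for illustration. If a step in the path is impossible according to the table, the query can be answered without touching the data.

```python
# Hypothetical content model derived from a DTD: element -> allowed children.
CONTENT_MODEL = {
    "#document": {"book"},
    "book": {"title", "chapter"},
    "chapter": {"title", "paragraph"},
    "title": set(),
    "paragraph": set(),
}


def can_match(child_steps, content_model=CONTENT_MODEL):
    """Return False if a chain of child:: steps is ruled out by the DTD."""
    parent = "#document"
    for name in child_steps:
        if name not in content_model.get(parent, set()):
            return False  # no document valid against the DTD can match
        parent = name
    return True


print(can_match(["chapter", "book"]))  # steps of /child::chapter/child::book[4]
print(can_match(["book", "chapter"]))
```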

Performance

7. Gather statistical info:

DTDs or XML Schemas specify structures that may occur, not what is actually in your collection.
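
As a rough illustration of what such statistics could look like, the sketch below counts how often each parent/child element pair actually occurs in a small in-memory collection, using xml.etree.ElementTree. The sample documents and the function name collect_child_statistics are assumptions; which statistics Xhive gathers is not stated on the slide.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def collect_child_statistics(documents):
    """Count occurrences of each (parent name, child name) pair."""
    counts = Counter()
    for xml_text in documents:
        root = ET.fromstring(xml_text)
        for parent in root.iter():
            for child in parent:
                counts[(parent.tag, child.tag)] += 1
    return counts


collection = [
    "<book><title/><chapter><paragraph/></chapter></book>",
    "<book><title/><chapter/></book>",
]
stats = collect_child_statistics(collection)
print(stats[("book", "chapter")], stats[("chapter", "title")])
```

A pair with a count of zero, such as a title inside a chapter here, may be allowed by the DTD but is known to be absent from this collection.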

Conclusion

- DOM within database environments

- XPath on top of a PDOM

- XPath is fairly complete

- Focus on performance

WWW9

Beta testers and developers wanted.

Email: [email protected]

Have fun...