The Connection Factory Jeroen van Rotterdam, CTO May 19th, WWW9.
The Connection Factory
Jeroen van Rotterdam, CTO
May 19th, WWW9
Contents
- Xhive setup
- XPath
- XPath performance issues within XML collections
Xhive
- OO-XML database
- Highly scalable
- High granularity
- W3C DOM L2 compliant
- XPath 1.0 compliant
Architecture
[Diagram labels: Xhive Core; OODB; XPath; DOM Core L2; Extended DOM; DOM Traversal; DB Administrator; Schema; SQL loader; RMI Layer (EJB / CORBA / SOAP); RMI Layer; Client]
Why XPath
Competing solutions:
- XML-QL: Where-In constructs
- XQL: limited
- SQL: no alternative
XPath is a complete pattern-matching language.
Xpath
Advantages:
- fairly complete
- multiple axes
- supported by W3C
- base for XPointer, XLink
- base for XML Query WG
- user-defined functions
Disadvantages:
- document oriented
- slightly different tree model
- no updates
Extending DOM
Collection setup:
Every document is a “Bastard Node”
[Diagram: a Library Node and its Document Nodes; labels: getFirstChild(), getLastChild(), getParentNode(), null]
Library Node
Advantages
- Natural extension of DOM
- extendible
- closely related to directory structures
- searchable with XPath
Library Node
Disadvantages
- potential bottleneck
Xpath
- XPath in a large PDOM collection environment:
1. Address memory issues
2. Solve differences in specs
3. Address performance issues
Memory issues
- Avoid recursion
- Make subresults capable of being made persistent
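A minimal sketch of the "avoid recursion" point, using the standard Java DOM API rather than anything Xhive-specific: a document-order walk driven by an explicit stack, so the depth of the tree never shows up as call-stack depth. The class and method names are hypothetical and only illustrate the idea.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Xhive.
final class IterativeWalk {

    /** Returns the names of all elements at or below root, in document order, without recursion. */
    static List<String> elementNames(Node root) {
        List<String> names = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                names.add(node.getNodeName());
            }
            // Push children right to left so the leftmost child is popped first.
            for (Node child = node.getLastChild(); child != null; child = child.getPreviousSibling()) {
                stack.push(child);
            }
        }
        return names;
    }

    /** Parses a small in-memory sample document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }
}
```

The explicit stack lives on the heap, so in a persistent setting it could in principle be paged or stored like any other intermediate result.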
Solve differences
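Differences between the specs include, e.g.:
- getParent on attributes vs. ownerElement (a DOM attribute has no parent node; in XPath the element is the attribute's parent)
- namespace nodes (a node type in XPath, not in the DOM)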
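The attribute difference can be observed with the standard Java DOM and javax.xml.xpath APIs. This is only an illustration of the two data models; the class and method names are made up for the example and say nothing about how Xhive reconciles them.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class.
final class AttrParentDemo {

    /** DOM view: getParentNode() on an Attr is null. Returns the parent's name, or null. */
    static String domParentOfAttr(Document doc) {
        Attr id = doc.getDocumentElement().getAttributeNode("id");
        Node parent = id.getParentNode();
        return parent == null ? null : parent.getNodeName();
    }

    /** DOM Level 2 view: the element is reached through getOwnerElement(). */
    static String domOwnerOfAttr(Document doc) {
        Attr id = doc.getDocumentElement().getAttributeNode("id");
        return id.getOwnerElement().getNodeName();
    }

    /** XPath view: the parent axis ("..") of an attribute selects its element. */
    static String xpathParentOfAttr(Document doc) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate("name(/chapter/@id/..)", doc);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }
}
```

An XPath engine layered on a DOM therefore has to map the parent axis of an attribute onto ownerElement instead of getParentNode().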
Performance
Increase Xpath performance:
- Query analysis
- Avoid reparsing
- Lazy evaluation
- Index structures
- Cache strategy
- DTD analysis
- Statistical data
Performance
1. Query analysis:
a. Can I simplify my query?
e.g. /child::chapter[5+5], which simplifies to /child::chapter[10]
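As a toy illustration of this kind of simplification, the sketch below folds predicates that consist only of a sum of two integer literals. It works on the query text with a regular expression, which is an assumption made to keep the example short; a real optimizer would fold constants on the parsed expression tree and would not be fooled by brackets inside string literals. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy optimizer, not Xhive's query analyzer.
final class PredicateFolder {

    // Matches predicates of the exact form [a+b] with non-negative integer literals.
    private static final Pattern CONSTANT_SUM =
            Pattern.compile("\\[\\s*(\\d+)\\s*\\+\\s*(\\d+)\\s*\\]");

    /** Replaces every [a+b] predicate with the precomputed [sum]. */
    static String fold(String query) {
        Matcher m = CONSTANT_SUM.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            long sum = Long.parseLong(m.group(1)) + Long.parseLong(m.group(2));
            m.appendReplacement(out, "[" + sum + "]");
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Calling PredicateFolder.fold("/child::chapter[5+5]") yields /child::chapter[10], so the addition is done once at analysis time instead of once per candidate node.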
Performance
1. Query analysis:
b. Does your query depend on the context node?
Absolute queries are context independent:
“Give me all chapters where the title is the same as the book title”
//chapter[title=string(/book/title)]
Evaluate string(/book/title) only once.
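One way to get this effect with the standard javax.xml.xpath API is to evaluate the absolute subexpression up front and bind its value to a variable. The sketch does the hoisting by hand; the variable name $bookTitle and the class name are assumptions for the example, and an engine would do the equivalent rewrite internally.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class.
final class HoistedSubquery {

    /** Counts chapters whose title equals the book title, computing the book title once. */
    static int countChaptersTitledLikeBook(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Context-independent part: evaluated exactly once, outside the predicate.
            final String bookTitle = xpath.evaluate("string(/book/title)", doc);
            // Expose the precomputed value as $bookTitle.
            xpath.setXPathVariableResolver(
                    name -> "bookTitle".equals(name.getLocalPart()) ? bookTitle : null);
            NodeList hits = (NodeList) xpath.evaluate(
                    "//chapter[title=$bookTitle]", doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }
}
```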
Performance
2. Storing parsed queries:
“Compile” and optimize each query only once
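With the standard Java API the same idea looks roughly like the sketch below: query strings are compiled into XPathExpression objects once and reused afterwards. The cache class is hypothetical and deliberately single-threaded, since XPathExpression objects are not guaranteed to be thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

// Hypothetical cache of parsed queries, keyed by query text.
final class CompiledQueryCache {

    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private final Map<String, XPathExpression> cache = new HashMap<>();
    private int compilations = 0;

    /** Returns the compiled form of the query, parsing it only on first use. */
    XPathExpression get(String query) {
        XPathExpression compiled = cache.get(query);
        if (compiled == null) {
            try {
                compiled = xpath.compile(query);
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("invalid XPath: " + query, e);
            }
            cache.put(query, compiled);
            compilations++;
        }
        return compiled;
    }

    /** Number of times a query string actually had to be parsed. */
    int compilations() {
        return compilations;
    }
}
```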
Performance
3. Lazy evaluation:
e.g. operations on node-sets
- booleans (evaluate first node)
- strings (first in doc order)
- number (string to number)
Example: “give me all chapters which have paragraphs”
/chapter[paragraph]
Finding one paragraph is enough.
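The boolean case can be sketched on plain DOM as an existence test that returns on the first hit. The counter is there only to make the early exit visible; the class is hypothetical and not Xhive code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo of evaluating a predicate such as [paragraph] lazily.
final class LazyExists {

    /** Number of child nodes inspected so far (for illustration only). */
    int inspected = 0;

    /** True as soon as one child element with the given name is found. */
    boolean hasChildElement(Node parent, String name) {
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            inspected++;
            if (child.getNodeType() == Node.ELEMENT_NODE && name.equals(child.getNodeName())) {
                return true; // the first match settles the boolean; later siblings are never read
            }
        }
        return false;
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }
}
```

For a chapter with three paragraphs, only one child is inspected before the predicate is known to be true; building the full node-set first would touch all three.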
Performance
4. Indexing:
- getFirstChildElementByName(String name)
- getNextSiblingElementBySameName()
- getFirstChildByType( short type )
- getNextSiblingByType( short type )
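The slides give only the method names, so the following is a guess at the semantics of the two name-based accessors, written as static helpers over plain DOM. These helpers scan siblings linearly; the point of the indexed versions is presumably that the database can answer the same calls without visiting the nodes in between. All names in the sketch are hypothetical stand-ins.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical, unindexed stand-ins for the name-based accessors.
final class NameAccessors {

    /** First child element of parent with the given name, or null. */
    static Element firstChildElementByName(Node parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Next following sibling element with the same name as element, or null. */
    static Element nextSiblingElementBySameName(Element element) {
        String name = element.getNodeName();
        for (Node n = element.getNextSibling(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Evaluates a step like child::chapter by hopping from match to match. */
    static int countChildrenByName(Node parent, String name) {
        int count = 0;
        for (Element e = firstChildElementByName(parent, name); e != null;
                e = nextSiblingElementBySameName(e)) {
            count++;
        }
        return count;
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }
}
```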
Performance
5. Caching strategy:
top level paging/cluster strategy
[Diagram labels: Library Node; Document Nodes; Root elements]
Performance
6. Use DTD information:
e.g. /child::chapter/child::book[4]
Can be answered with an empty result (null) without touching the data, if the DTDs in use show that a chapter cannot contain a book.
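A minimal sketch of such a check, assuming every document in the collection is valid against a known DTD. The content model used in the usage note (book may contain title and chapter, chapter may contain title and paragraph) is invented for the example, as are the class and method names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical pruning step based on parent/child rules taken from a DTD.
final class DtdPruner {

    private final Map<String, Set<String>> allowedChildren = new HashMap<>();

    /** Records that the DTD allows the given child element names inside parent. */
    DtdPruner allow(String parent, String... children) {
        allowedChildren.put(parent, new HashSet<>(Arrays.asList(children)));
        return this;
    }

    /** False when a chain of child:: steps cannot match any valid document. */
    boolean canMatch(String... childSteps) {
        for (int i = 1; i < childSteps.length; i++) {
            Set<String> allowed = allowedChildren.get(childSteps[i - 1]);
            if (allowed == null || !allowed.contains(childSteps[i])) {
                return false; // this parent/child pair is ruled out by the DTD
            }
        }
        return true;
    }
}
```

With new DtdPruner().allow("book", "title", "chapter").allow("chapter", "title", "paragraph"), the call canMatch("chapter", "book") returns false, so the query can be answered as empty before any node is read.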
Performance
7. Gather statistical info:
DTDs or XML Schemas specify which structures may occur, not what is actually in your collection.
Conclusion
- DOM within database environments
- XPath on top of a PDOM
- XPath is fairly complete
- Focus on performance