VTD-XML: The Future of XML Processing
-
Upload
guo-albert -
Category
Technology
-
view
3.395 -
download
1
Transcript of VTD-XML: The Future of XML Processing
VTD-XML◦ Motivations Behind VTD-XML◦ Why VTD-XML?◦ When to Use VTD-XML?◦ Known Limitations◦ Basic Concept◦ Essential Classes◦ Shortcomings
Typical Programming Flows Demo Reference
Agenda
2
*Numerous*, well-known issues of old XML processing models, below summarizes a few:
Comparison with DOM, SAX and PULL◦ http://vtd-xml.sourceforge.net/userGuide/5.html
Motivations Behind VTD-XML
3
Type Features
DOM Too slow and resource intensive
SAX Forward only; treat XML as CSV; performance/memory benefits insufficient to justify its difficulty
Pull Only programming style change; inherit most of the problems from SAX
The next generation XML processing model that is simultaneously:◦ The world's most memory-efficient random-access
XML parser. ◦ The world's fastest XML parser◦ The world's fastest XPath 1.0 implementation.◦ The world's most efficient XML indexer that
seamlessly integrates with your XML applications.◦ The world's only incremental-update capable XML
parser capable of cutting, pasting, splitting and assembling XML documents with max efficiency.
◦ The world's only XML parser that allows you to use XPath to process 256 GB XML documents.
Why VTD-XML?
4
The scenarios that you may consider using VTD-XML◦ Large XML files that DOM can’t handle◦ Performance-critical transactional Web-
Services/SOA applications◦ Native XML database applications◦ Network-based XML content
switching/routing/security applications
When to Use VTD-XML?
5
Not yet support external entities (those declared within DTD)
Not yet process DTD (return as a single VTD record)
Schema validation feature is planned for a future release.
Extreme long (>=512 chars) element/attribute names or ultra deep document (>= 255 levels) will cause parse exception
http://vtd-xml.sourceforge.net/userGuide/0.html
Known Limitations
6
Basic ConceptNon-extractive tokenization based on
Virtual Token Descriptor (VTD): use 64-bit integers to encode offsets, lengths, token types, depths
The XML document is kept intact and un-decoded.
7
Basic Concept – cont. In other words, in vast majority of the cases
string allocation is *unnecessary*, and nothing but a waste of CPU and memory
VTD-XML performs many string operations directly on VTD recordsString to VTD record comparison (both boolean
and lexicographically)Direct conversions from VTD records to ints,
longs, floats and doublesVTD record to String conversion also provided,
but avoid them whenever possible for performance reasons
8
Basic Concept – cont.VTD-XML’s document hierarchy consists
*exclusively* of elementsMove a single, global cursor to different
locations in the document treeMany VTDNav’s methods identify a VTD
record with its index value -1 corresponds to “no such record”
9
Essential Classes
10
Class Description
VTDGen Encapsulates the parsing, indexing routines
VTDNav VTD navigator allows cursor-based random access and various functions operating on VTD records
AutoPilot Contains XPath and Node iteration functions
XMLModifier
Incrementally update XML
Essential Classes – cont.
11
Class Descriotion
ParseException Thrown during parsing when XML is not well-formed
IndexingReadException Thrown by VTDGen when there is error in loading index
IndexingWriteException Thrown by VTDGen when there is error writing index
NavException Thrown when there is an exception condition when navigating VTD records
PilotException Child class of NavException; thrown when using autoPilot to perform node iteration.
XPathParseException Thrown by autoPilot when compiling an XPath expression
XPathEvalException Thrown by autoPilot when evaluating an XPath expression
ModifyException Thrown by XMLModifier when updating XML file
Poor exception handling
Shortcomings
12
If this method does not execute properly, it will just return false from parseFile method, and does not report any exception message.
Add BufferedInput Stream in parseFile method to avoid running out of read buffer max size in UNIX platform
Shortcomings – cont.
13
You need to modify the build.bat to rebuild VTD-XML jar file, then set it into class path.
//add commons-io jar file into the first linejavac -classpath .;D:\lib\commons-io-1.4\commons-io-1.4.jar com\ximpleware\*.javajavac com\ximpleware\xpath\*.javajavac com\ximpleware\parser\*.java…Finally, you just need to execute build.bat file. Then it will generate the brand-new jar file for you.
Typical Programming Flows
Call VTDGen’s parse()
Start with a byte buffer containing the content of XML, call set_doc() of VTDGen
Move VTDNav’s cursor manually tovarious locations and perform corresponding application logic
Obtain an instanceVTDNav from VTDGen
Call VTDGen’s parseFile(…)
Instantiate autoPilot for nodeiteration and XPath to performCorresponding application logic
Call VTDGen’s loadIndex(…)
14
Demo
15
1. Add <age> tag after <geneder>
16
1. Add <age> tag after <geneder> – cont.
17
Compiled XPath expression
Binded with NTDNav
Moved to gender cursor, and added <age> tag after <gender> tag
Assigned age value
Outputted to new xml file
2. Remove <age> tag
18
2. Remove <age> tag – cont.
19
Compiled XPath expression
Binded with NTDNav
Outputted to new xml file
Remove <age>
3. Add Contact info after <age> tag
20
3. Add Contact info after <age> tag – cont.
21
Compiled XPath expression
Binded with NTDNav
Assigned age value
Outputted to new xml file
Inserted new value after <gender> tag
4. Visit XML file
22
2009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:339) - name=Albert, gender=男2009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:341) - phone=02-111111112009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:341) - phone=09111111112009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:341) - phone=09155555552009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:339) - name=Mandy, gender=女2009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:341) - phone=02-222222222009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:341) - phone=09122222222009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:339) - name=Verio, gender=男2009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:341) - phone=0913333333
4. Visit XML file – cont.
23
Compiled XPath expression
Binded with NTDNav
4. Visit XML file – cont.
24
Official site◦ http://vtd-xml.sourceforge.net/
Jar files, source code, sample code◦ http://sourceforge.net/projects/vtd-xml/files/
JavaDoc◦ http://vtd-xml.sourceforge.net/javadoc/
Accelerate WSS applications with VTD-XML◦ http://
www.javaworld.com/javaworld/jw-01-2007/jw-01-vtd.html
Reference
25
Process SOAP with VTD-XML◦ http://
jimmyzhang.sys-con.com/node/48764/mobile XPath Tutorial:
◦ http://www.w3schools.com/XPath/default.asp Sample code
◦ http://sites.google.com/site/junyuo/Home/code
26
Reference – cont.