VTD-XML: The Future of XML Processing

26
VTD-XML: The Future of XML Processing Albert Guo [email protected]

Transcript of VTD-XML: The Future of XML Processing

Page 1: VTD-XML: The Future of XML Processing

VTD-XML: The Future of XML Processing

Albert [email protected]

Page 2: VTD-XML: The Future of XML Processing

VTD-XML◦ Motivations Behind VTD-XML◦ Why VTD-XML?◦ When to Use VTD-XML?◦ Known Limitations◦ Basic Concept◦ Essential Classes◦ Shortcomings

Typical Programming Flows Demo Reference

Agenda

2

Page 3: VTD-XML: The Future of XML Processing

*Numerous*, well-known issues of old XML processing models, below summarizes a few:

Comparison with DOM, SAX and PULL◦ http://vtd-xml.sourceforge.net/userGuide/5.html

Motivations Behind VTD-XML

3

Type Features

DOM Too slow and resource intensive

SAX Forward only; treat XML as CSV; performance/memory benefits insufficient to justify its difficulty

Pull Only programming style change; inherit most of the problems from SAX

Page 4: VTD-XML: The Future of XML Processing

The next generation XML processing model that is simultaneously:◦ The world's most memory-efficient random-access

XML parser. ◦ The world's fastest XML parser◦ The world's fastest XPath 1.0 implementation.◦ The world's most efficient XML indexer that

seamlessly integrates with your XML applications.◦ The world's only incremental-update capable XML

parser capable of cutting, pasting, splitting and assembling XML documents with max efficiency.

◦ The world's only XML parser that allows you to use XPath to process 256 GB XML documents.

Why VTD-XML?

4

Page 5: VTD-XML: The Future of XML Processing

The scenarios that you may consider using VTD-XML◦ Large XML files that DOM can’t handle◦ Performance-critical transactional Web-

Services/SOA applications◦ Native XML database applications◦ Network-based XML content

switching/routing/security applications

When to Use VTD-XML?

5

Page 6: VTD-XML: The Future of XML Processing

Not yet support external entities (those declared within DTD)

Not yet process DTD (return as a single VTD record)

Schema validation feature is planned for a future release.

Extreme long (>=512 chars) element/attribute names or ultra deep document (>= 255 levels) will cause parse exception

http://vtd-xml.sourceforge.net/userGuide/0.html

Known Limitations

6

Page 7: VTD-XML: The Future of XML Processing

Basic ConceptNon-extractive tokenization based on

Virtual Token Descriptor (VTD): use 64-bit integers to encode offsets, lengths, token types, depths

The XML document is kept intact and un-decoded.

7

Page 8: VTD-XML: The Future of XML Processing

Basic Concept – cont. In other words, in vast majority of the cases

string allocation is *unnecessary*, and nothing but a waste of CPU and memory

VTD-XML performs many string operations directly on VTD recordsString to VTD record comparison (both boolean

and lexicographically)Direct conversions from VTD records to ints,

longs, floats and doublesVTD record to String conversion also provided,

but avoid them whenever possible for performance reasons

8

Page 9: VTD-XML: The Future of XML Processing

Basic Concept – cont.VTD-XML’s document hierarchy consists

*exclusively* of elementsMove a single, global cursor to different

locations in the document treeMany VTDNav’s methods identify a VTD

record with its index value -1 corresponds to “no such record”

9

Page 10: VTD-XML: The Future of XML Processing

Essential Classes

10

Class Description

VTDGen Encapsulates the parsing, indexing routines

VTDNav VTD navigator allows cursor-based random access and various functions operating on VTD records

AutoPilot Contains XPath and Node iteration functions

XMLModifier

Incrementally update XML

Page 11: VTD-XML: The Future of XML Processing

Essential Classes – cont.

11

Class Descriotion

ParseException Thrown during parsing when XML is not well-formed

IndexingReadException Thrown by VTDGen when there is error in loading index

IndexingWriteException Thrown by VTDGen when there is error writing index

NavException Thrown when there is an exception condition when navigating VTD records

PilotException Child class of NavException; thrown when using autoPilot to perform node iteration.

XPathParseException Thrown by autoPilot when compiling an XPath expression

XPathEvalException Thrown by autoPilot when evaluating an XPath expression

ModifyException Thrown by XMLModifier when updating XML file

Page 12: VTD-XML: The Future of XML Processing

Poor exception handling

Shortcomings

12

If this method does not execute properly, it will just return false from parseFile method, and does not report any exception message.

Page 13: VTD-XML: The Future of XML Processing

Add BufferedInput Stream in parseFile method to avoid running out of read buffer max size in UNIX platform

Shortcomings – cont.

13

You need to modify the build.bat to rebuild VTD-XML jar file, then set it into class path.

//add commons-io jar file into the first linejavac -classpath .;D:\lib\commons-io-1.4\commons-io-1.4.jar com\ximpleware\*.javajavac com\ximpleware\xpath\*.javajavac com\ximpleware\parser\*.java…Finally, you just need to execute build.bat file. Then it will generate the brand-new jar file for you.

Page 14: VTD-XML: The Future of XML Processing

Typical Programming Flows

Call VTDGen’s parse()

Start with a byte buffer containing the content of XML, call set_doc() of VTDGen

Move VTDNav’s cursor manually tovarious locations and perform corresponding application logic

Obtain an instanceVTDNav from VTDGen

Call VTDGen’s parseFile(…)

Instantiate autoPilot for nodeiteration and XPath to performCorresponding application logic

Call VTDGen’s loadIndex(…)

14

Page 15: VTD-XML: The Future of XML Processing

Demo

15

Page 16: VTD-XML: The Future of XML Processing

1. Add <age> tag after <geneder>

16

Page 17: VTD-XML: The Future of XML Processing

1. Add <age> tag after <geneder> – cont.

17

Compiled XPath expression

Binded with NTDNav

Moved to gender cursor, and added <age> tag after <gender> tag

Assigned age value

Outputted to new xml file

Page 18: VTD-XML: The Future of XML Processing

2. Remove <age> tag

18

Page 19: VTD-XML: The Future of XML Processing

2. Remove <age> tag – cont.

19

Compiled XPath expression

Binded with NTDNav

Outputted to new xml file

Remove <age>

Page 20: VTD-XML: The Future of XML Processing

3. Add Contact info after <age> tag

20

Page 21: VTD-XML: The Future of XML Processing

3. Add Contact info after <age> tag – cont.

21

Compiled XPath expression

Binded with NTDNav

Assigned age value

Outputted to new xml file

Inserted new value after <gender> tag

Page 22: VTD-XML: The Future of XML Processing

4. Visit XML file

22

2009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:339) - name=Albert, gender=男2009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:341) - phone=02-111111112009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:341) - phone=09111111112009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:341) - phone=09155555552009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:339) - name=Mandy, gender=女2009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:341) - phone=02-222222222009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:341) - phone=09122222222009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:339) - name=Verio, gender=男2009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:341) - phone=0913333333

Page 23: VTD-XML: The Future of XML Processing

4. Visit XML file – cont.

23

Compiled XPath expression

Binded with NTDNav

Page 24: VTD-XML: The Future of XML Processing

4. Visit XML file – cont.

24

Page 25: VTD-XML: The Future of XML Processing

Official site◦ http://vtd-xml.sourceforge.net/

Jar files, source code, sample code◦ http://sourceforge.net/projects/vtd-xml/files/

JavaDoc◦ http://vtd-xml.sourceforge.net/javadoc/

Accelerate WSS applications with VTD-XML◦ http://

www.javaworld.com/javaworld/jw-01-2007/jw-01-vtd.html

Reference

25

Page 26: VTD-XML: The Future of XML Processing

Process SOAP with VTD-XML◦ http://

jimmyzhang.sys-con.com/node/48764/mobile XPath Tutorial:

◦ http://www.w3schools.com/XPath/default.asp Sample code

◦ http://sites.google.com/site/junyuo/Home/code

26

Reference – cont.