
Dan Suciu XML Toolkit 1

From Searching Text to Querying XML Streams

Dan Suciu

www.cs.washington.edu/homes/suciu

Dan Suciu XML Toolkit 2

About Me

• Born 1957, Romania
• BS: Bucharest, PhD: University of Pennsylvania
• Now: University of Washington (Seattle)

My work is on semistructured data
• Book: Data on the Web: From Relations to Semistructured Data and XML

Past/present projects:
• XML-QL = precursor of XQuery
• XMill = the XML compressor
• XML toolkit

Dan Suciu XML Toolkit 3

Motivation

• Text databases
  – Studied over the past 15 years
  – Traditional client/server model
  – Struggled with lack of standard text syntax

• Recently, new standard: XML
  – Traditional client/server: in today’s DBMS
  – New applications: stream processing

• This talk: processing stream XML data
  – My motivation: work on the XML Toolkit project

Dan Suciu XML Toolkit 4

Outline

• Background

• The XML stream processing problem

• Basic XML processing with automata

• Adapting automata to XML

• Stream indexes

• Conclusions

Dan Suciu XML Toolkit 5

Background: Relational Databases

• Structured, stored in tables

• Schema separate from data

• Queries: precise, refer to schema and data (SQL)

BOOKS:

ISBN        Title                     Year  Publisher
0201537710  Foundations of Databases  1995  AW
155860622X  Data on the Web           1999  MK

AUTHOR:

AID  Name       Country
44   Abiteboul  FR
06   Buneman    UK
62   Hull       USA
12   Suciu      USA
29   Vianu      USA

WROTE:

ISBN        AID
0201537710  44
0201537710  62
0201537710  29
155860622X  44
155860622X  06
155860622X  12

Hard to publish, easy to query precisely
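As a rough illustration of what "easy to query precisely" means for these three tables, the sketch below loads them into an in-memory SQLite database and asks for the titles written by a given author. The lowercase table and column names and the helper `titles_by` are assumptions made for this example; they do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books  (isbn TEXT PRIMARY KEY, title TEXT, year INTEGER, publisher TEXT);
CREATE TABLE author (aid TEXT PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE wrote  (isbn TEXT, aid TEXT);

INSERT INTO books VALUES
  ('0201537710', 'Foundations of Databases', 1995, 'AW'),
  ('155860622X', 'Data on the Web', 1999, 'MK');
INSERT INTO author VALUES
  ('44', 'Abiteboul', 'FR'), ('06', 'Buneman', 'UK'), ('62', 'Hull', 'USA'),
  ('12', 'Suciu', 'USA'), ('29', 'Vianu', 'USA');
INSERT INTO wrote VALUES
  ('0201537710', '44'), ('0201537710', '62'), ('0201537710', '29'),
  ('155860622X', '44'), ('155860622X', '06'), ('155860622X', '12');
""")

def titles_by(name, country):
    """Titles of books written by the author with this name and country, oldest first."""
    rows = conn.execute(
        """SELECT b.title
           FROM books b
           JOIN wrote w  ON w.isbn = b.isbn
           JOIN author a ON a.aid = w.aid
           WHERE a.name = ? AND a.country = ?
           ORDER BY b.year""",
        (name, country),
    )
    return [title for (title,) in rows]

print(titles_by("Abiteboul", "FR"))
```

The query names schema elements (tables, columns) and data values together, which is the precision the slide contrasts with keyword search.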

Dan Suciu XML Toolkit 6

Background: Text Databases

• Unstructured, stored in documents

• No schema, only data

• Queries: imprecise, refer to data only (keywords)

Foundations of Databases,
Abiteboul (FR), Hull (USA), Vianu (USA)
Addison Wesley,
1995

Data on the Web
Abiteboul (FR), Buneman (UK), Suciu (USA)
Morgan Kaufmann,
1999

Easy to publish, hard to query precisely

Dan Suciu XML Toolkit 7

Background: XML Data

• Semistructured

• Schema and data are together: self-describing

• Queries: precise, refer to schema and data (XPath, XQuery)

<bib>
  <book>
    <title> Foundations… </title>
    <author> <name> Abiteboul </name> <country> FR </country> </author>
    <author> <name> Hull </name> <country> USA </country> </author>
    <author> <name> Vianu </name> <country> USA </country> </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
  …
</bib>

XML: Easier to publish, easy to query precisely

Dan Suciu XML Toolkit 8

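Background: XML Data

[Figure: the XML document drawn as a tree. The root bib has book and paper children; their children include title, author, publisher, and journal; each author has name and country children. Leaf values shown include "Data on the Web", "Abiteboul", "FR", "Buneman", "UK", and "Addison Wesley".]

Data model = tree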

Dan Suciu XML Toolkit 9

Background: XML Data

• Querying with XPath (and XQuery)
• This talk: XPath queries restricted to:

tag    /    //    *    [ ]    path="constant"

Dan Suciu XML Toolkit 10

Background: XPath in One Slide

/bib/book/author/name
    tag, / : this is precisely the “region algebra”, e.g. use proximal nodes [Navarro&Baeza-Yates’97]

/bib/book//name/*/zip
    //, * : navigate partially known structure

/bib/book[author/name="Abiteboul"]
/bib/book[year="1995" and author[name="Abiteboul" and country="FR"]]
    [ ] : conjunctive queries à la SQL
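To make the three kinds of construct concrete, here is a minimal sketch that evaluates the slide's example expressions over a small in-memory document with Python's standard `xml.etree.ElementTree`. That module supports only a subset of XPath, so the predicates are applied in plain Python; the sample document and the helper names (`text`, `books_by`, `books_1995_fr_abiteboul`) are assumptions for illustration. This is ordinary tree evaluation, not the streaming evaluation the talk is about.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book><title>Foundations of Databases</title>
    <author><name>Abiteboul</name><country>FR</country></author>
    <author><name>Hull</name><country>USA</country></author>
    <author><name>Vianu</name><country>USA</country></author>
    <publisher>Addison Wesley</publisher><year>1995</year></book>
  <book><title>Data on the Web</title>
    <author><name>Abiteboul</name><country>FR</country></author>
    <author><name>Buneman</name><country>UK</country></author>
    <author><name>Suciu</name><country>USA</country></author>
    <publisher>Morgan Kaufmann</publisher><year>1999</year></book>
</bib>"""

root = ET.fromstring(BIB)  # root is the <bib> element

def text(elem, path):
    """Stripped text of the first match of path under elem, or '' if none."""
    return (elem.findtext(path) or "").strip()

# /bib/book/author/name  (tag and / only)
names = [(n.text or "").strip() for n in root.findall("book/author/name")]

# /bib/book//name  (descendant axis: any depth below book)
deep_names = [(n.text or "").strip()
              for b in root.findall("book") for n in b.iter("name")]

# /bib/book[author/name="Abiteboul"]
def books_by(name):
    return [b for b in root.findall("book")
            if any((n.text or "").strip() == name for n in b.findall("author/name"))]

# /bib/book[year="1995" and author[name="Abiteboul" and country="FR"]]
def books_1995_fr_abiteboul():
    return [b for b in root.findall("book")
            if text(b, "year") == "1995"
            and any(text(a, "name") == "Abiteboul" and text(a, "country") == "FR"
                    for a in b.findall("author"))]

print([text(b, "title") for b in books_by("Suciu")])
```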

Dan Suciu XML Toolkit 11

Outline

• Background

• The XML stream processing problem

• Basic XML processing with automata

• Adapting automata to XML

• Stream indexes

• Conclusions

Dan Suciu XML Toolkit 12

Main Application: XML Packet Routing

• Selective Dissemination of Information [Altinel&Franklin’00, Chan et al.’02]

• XML content routing [Snoeren et al.’01]

• SOAP Message routing in Application Servers

Dan Suciu XML Toolkit 13

XML Packet Routing

[Figure: many XML packets in transit, each of the form:]

<doc>
  <tag> value </tag>
</doc>

Dan Suciu XML Toolkit 14

XPath expressions:

/bib/book/publisher="MK"
/bib/book[category="recent"]/title="Web"
/bib/book//address//*/zip="123"
/bib/book//address//*="Galaxy"
/bib/book/category="recent"
/bib/book/address="123"
/bib/book/address/field="567"

[Figure: an input XML stream (<bib> <book> ... </bib>) passes through the XPath expressions, producing output XML streams.]

Dan Suciu XML Toolkit 15

The XML Stream Processing Problem

Given:
• A set of XPath expressions
• An incoming stream of XML documents

Decide:
• For each document, which expressions it matches

Hard:
• Large number of XPath expressions, e.g. 10^3 to 10^6
• Streaming XML data, high throughput, e.g. 5MB/s

Easy:
• Shallow XML data, e.g. depth=20
• Short XPath expressions

Dan Suciu XML Toolkit 16

The Approaches

Basic techniques
• NFA plus optimizations:
  – XFilter/YFilter [Altinel&Franklin’00]
  – XTrie [Chan et al.’02]
• DFA:
  – XML Toolkit

Beyond the obvious
• Stream indexes (XML Toolkit)
• Stream views

Dan Suciu XML Toolkit 17

Outline

• Background

• The XML stream processing problem

• Basic XML processing with automata

• Adapting automata to XML

• Stream indexes

• Conclusions

Dan Suciu XML Toolkit 18

From XPath to NFA

/catalog/product[category="tools"][*/price = 200]/quantity//price

[Figure: the NFA built from this expression, with transitions labeled catalog, product, category, "tools", *, price, 200, quantity, *, price.]

Extra processing needed to combine branches (not in this talk)

Dan Suciu XML Toolkit 19

Basic NFA Evaluation

XPath:

/bib/book/publisher="MK"
/bib/book[category="recent"]/title
/bib/book//address//*/zip="123"
/bib/book//address//*="Galaxy"
/bib/book/category="recent"
/bib/book/address="123"
/bib/book/address/field="567"
/bib/book/tag="some"
/bib/book[category="recent"]/title
/bib/book//address//*="Seattle"
/bib/book//address//*="Galaxy"
/bib/book/category="recent"
/bib/book/address="Lisbon"
/bib/book/address/field="some"
. . .
/bib/book/publisher="AW"
/bib/book[category="recent"]/title
/bib/book//address//*="123"
/bib/book//address//*="Galaxy"
/bib/book/category="new"
/bib/book/address="London"
/bib/book/address/field="some"
/bib/book/category="old"

[Figure: SAX events from the input stream (<bib> <book> ... </bib>) drive an NFA built from the XPath expressions. A STACK holds sets of current states, e.g. {1,55,99,...}, {2,3,543,43,254}, {3,66,102,4534,...}.]

Dan Suciu XML Toolkit 20

Basic NFA Evaluation

Properties:
• Space = linear in the number of XPath expressions
• Throughput = decreases linearly as the number of XPath expressions grows

Systems:

• XFilter [Altinel&Franklin’00], YFilter

• XTrie [Chan et al.’02]
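The stack-of-state-sets idea can be sketched in a few lines. The example below is a simplified illustration, not the XFilter, YFilter, or XTrie implementation: it handles only linear paths built from tag, /, //, and *, and it ignores predicates and value tests. The names `parse_path`, `NFAFilter`, and `matching_queries` are hypothetical.

```python
import re
import xml.sax

def parse_path(path):
    """Split '/a//b/*' into [('/', 'a'), ('//', 'b'), ('/', '*')]."""
    return re.findall(r"(//|/)([^/]+)", path)

class NFAFilter(xml.sax.ContentHandler):
    """NFA state (q, i) means: the first i steps of query q have been matched."""

    def __init__(self, paths):
        super().__init__()
        self.steps = [parse_path(p) for p in paths]
        # Bottom of the stack: every query is in its initial state.
        self.stack = [frozenset((q, 0) for q in range(len(paths)))]
        self.matched = set()

    def startElement(self, tag, attrs):
        nxt = set()
        for q, i in self.stack[-1]:
            steps = self.steps[q]
            if i == len(steps):
                continue                      # accepting state, nothing to extend
            axis, want = steps[i]
            if axis == "//":
                nxt.add((q, i))               # self-loop: // may skip this element
            if want == "*" or want == tag:
                nxt.add((q, i + 1))
                if i + 1 == len(steps):
                    self.matched.add(q)
        self.stack.append(frozenset(nxt))     # one state set per open element

    def endElement(self, tag):
        self.stack.pop()                      # back to the parent's state set

def matching_queries(xml_text, paths):
    handler = NFAFilter(paths)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return sorted(paths[q] for q in handler.matched)

doc = "<bib><book><title>T</title><author><name>N</name></author></book></bib>"
queries = ["/bib/book/author/name", "/bib/book//name", "/bib//zip",
           "/bib/*/title", "/bib/paper/title"]
print(matching_queries(doc, queries))
```

Each start-element event touches every state in the current set, which is why the per-event cost grows with the number of active queries.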

Dan Suciu XML Toolkit 21

Basic DFA Evaluation

XPath:

/bib/book/publisher="MK"
/bib/book[category="recent"]/title
/bib/book//address//*/zip="123"
/bib/book//address//*="Galaxy"
/bib/book/category="recent"
/bib/book/address="123"
/bib/book/address/field="567"
/bib/book/tag="some"
/bib/book[category="recent"]/title
/bib/book//address//*="Seattle"
/bib/book//address//*="Galaxy"
/bib/book/category="recent"
/bib/book/address="Lisbon"
/bib/book/address/field="some"
. . .
/bib/book/publisher="AW"
/bib/book[category="recent"]/title
/bib/book//address//*="123"
/bib/book//address//*="Galaxy"
/bib/book/category="new"
/bib/book/address="London"
/bib/book/address/field="some"
/bib/book/category="old"

[Figure: SAX events from the input stream (<bib> <book> ... </bib>) drive DFAs built from the XPath expressions. A STACK holds a single current state per level, e.g. 1, 552, 399.]

Dan Suciu XML Toolkit 22

Basic DFA Evaluation

Properties:
• Throughput = constant!
• Space = GOOD QUESTION

System:

• XML Toolkit [University of Washington]
  http://xmltk.sourceforge.net

Dan Suciu XML Toolkit 23

XMLTK: An XML Toolkit for Scalable XML Stream Processing

I. Avila-Campillo, T.J. Green, A. Gupta, M. Onizuka,

D. Raven, D. Suciu

Dan Suciu XML Toolkit 24

Motivation

• Lots of data sits in large text files
  – ad hoc data formats

• “Queried” with Unix command line tools
  – grep, sort, tail, etc.

• Would be nice to XML-ize it...

• ...but then the Unix command line tools won’t work any more.

Dan Suciu XML Toolkit 25

Example

• In the old Unix world…

Text file (columns: score, decision, paperID, title):

6 accept P054 “Theory of XML parsing”
7 reject P021 “Experience with an XML optimizer”
7 accept P069 “Towards a unified theory of data models”
. . . . . .

• Find the top ten rejected papers (in score order):

grep "reject" papers.txt | sort | tail -n 10

Dan Suciu XML Toolkit 26

Example (cont’d)

• In the new XML world…

<submissions>
  <paper>
    <score> 6 </score>
    <decision> accept </decision>
    <paperID> P054 </paperID>
    <title> Theory of XML parsing </title>
  </paper>
  <paper>
    <score> 3 </score>
    <decision> reject </decision>
    <paperID> P021 </paperID>
    <title> Experience with an XML optimizer </title>
  </paper>
  . . . . .

… can’t use those tools anymore

Dan Suciu XML Toolkit 27

Example (cont’d)

Doing it with the XML Toolkit:

xsort -c /submissions -e paper[decision/text()="reject"] -k score/text() papers.xml |
  xtail -c /submissions -e paper -n 10

Finds top ten rejected <paper>s, in <score> order
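For readers who want to see what this pipeline computes, here is a small in-memory sketch in Python. It is not how the toolkit works (the toolkit processes the file as a stream); it only mirrors the result: keep the rejected papers under /submissions, order them by score, and keep the last n. The function name `top_rejected` is hypothetical, and comparing scores as text (as plain Unix `sort` would) is an assumption, since the slide does not say how xsort compares keys.

```python
import xml.etree.ElementTree as ET

def top_rejected(xml_text, n=10):
    """Return the last n rejected <paper> elements in ascending <score> order."""
    root = ET.fromstring(xml_text)                       # context: /submissions
    rejected = [p for p in root.findall("paper")         # item: paper[decision="reject"]
                if (p.findtext("decision") or "").strip() == "reject"]
    # Key: score/text(), compared as text (assumption, see the lead-in).
    rejected.sort(key=lambda p: (p.findtext("score") or "").strip())
    return rejected[-n:]                                 # xtail: keep the last n items

SAMPLE = """<submissions>
  <paper><score> 6 </score><decision> accept </decision><paperID> P054 </paperID></paper>
  <paper><score> 3 </score><decision> reject </decision><paperID> P021 </paperID></paper>
  <paper><score> 7 </score><decision> reject </decision><paperID> P070 </paperID></paper>
  <paper><score> 1 </score><decision> reject </decision><paperID> P001 </paperID></paper>
</submissions>"""

ids = [p.findtext("paperID").strip() for p in top_rejected(SAMPLE, n=2)]
print(ids)
```

The paper IDs P070 and P001 in the sample are made up for the example.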

Dan Suciu XML Toolkit 28

Goals of the XML Toolkit

Simple, scalable tools for XML processing

• Provides service: there are people who need this

• Provides a research platform: for XML stream processing

Dan Suciu XML Toolkit 29

Outline

• The tools

• The XPath processing engine

• Conclusions

Dan Suciu XML Toolkit 30

The Tools

Current tools:
• xsort (will talk only about this one)
• xagg
• xnest
• xflatten
• xdelete
• xpair
• xhead
• xtail
• file2xml
• xmill

May look like plenty, but actually still incomplete...

Dan Suciu XML Toolkit 31

XSort: Definition

General form:

xsort (-c XPathExpr (-e XPathExpr (-k XPathExpr)*)*)*

-c = the context, i.e. where to sort

-e = the item, i.e. what to sort

-k = the key, i.e. what to sort on

Dan Suciu XML Toolkit 32

XSort: Definition

[Figure: XSort applied to a tree. Before: context nodes c holding items e1 to e9 in document order. After: the same context nodes with their items reordered.]

Dan Suciu XML Toolkit 33

XSort Examples

Examples illustrated on data like this:

<bib>
  <book>
    <author>Elliotte Rusty Harold</author>
    <author>W. Scott Means</author>
    <title>XML in a Nutshell</title>
    <publisher>O'Reilly</publisher>
    <year>2001</year>
    <isbn>0-596-00058-8</isbn>
  </book>
  <paper>
    <author>Sylvain Devillers</author>
    <title>XML and XSLT Modeling for Multimedia Bitstream Manipulation.</title>
    <year>2001</year>
    <booktitle>WWW Posters</booktitle>
    <ee>http://www10.org/cdrom/posters/1112.pdf</ee>
    <url>db/conf/www/www2001p.html#Devillers01</url>
  </paper>
  . . . . .

Dan Suciu XML Toolkit 34

XSort: Examples

xsort -c /bib -e paper -k title/text()

Sorts the <paper>s by <title>. The <book>s are dropped from the output.

<bib>
  <paper> . . . </paper>
  <paper> . . . </paper>
  . . . . .
</bib>

Compare to…

xsort -c /bib -e * -k title/text()

xsort -c /bib -e paper -k title/text() -e book -k title/text()
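The difference between these three invocations is easier to see on a toy model. The sketch below assumes the semantics suggested by the slides: each item goes to the first -e pattern it matches, items are output group by group in the order of the -e options, each group is ordered by its key, and items matching no pattern are dropped. Representing items as (tag, title) pairs and the function name `xsort_items` are simplifications made up for this example.

```python
def xsort_items(items, specs):
    """items: (tag, title) pairs in document order, all under one context.
    specs: one (pattern, sort_by_title) pair per -e option, in command-line order."""
    ranked = []
    for pos, (tag, title) in enumerate(items):
        for rank, (pattern, by_title) in enumerate(specs):
            if pattern in ("*", tag):
                key = title if by_title else ""
                # pos keeps document order among items with equal rank and key
                ranked.append((rank, key, pos, tag, title))
                break                       # first matching -e wins
        # items that match no -e pattern are dropped from the output
    ranked.sort()
    return [(tag, title) for _, _, _, tag, title in ranked]

items = [("book", "XML in a Nutshell"), ("paper", "XML and XSLT"),
         ("paper", "A Query Language"), ("book", "Data on the Web")]

only_papers = xsort_items(items, [("paper", True)])                 # -e paper -k title
all_mixed   = xsort_items(items, [("*", True)])                     # -e * -k title
papers_then_books = xsort_items(items, [("paper", True), ("book", True)])

print(all_mixed)
print(papers_then_books)
```

With `-e *` papers and books are interleaved by title; with two separate -e options all papers precede all books.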

Dan Suciu XML Toolkit 35

XSort: Examples

xsort -c /bib -e paper/author -k lastName/text() -k firstName/text()

Sorts the <author>s, by <lastName> then <firstName>

<bib>
  <author> . . . </author>
  <author> . . . </author>
  . . . . .
</bib>

Dan Suciu XML Toolkit 36

XSort: Examples

xsort -c /bib -e paper -e article -e book -e *

<paper>s first, then <article>s, then <book>s, then all the rest

<bib>
  <paper> . . . </paper>
  <paper> . . . </paper>
  . . . . .
  <article> . . . </article>
  . . . . .
  <book> . . . </book>
  . . . . .
</bib>

Dan Suciu XML Toolkit 37

XSort: Examples

xsort -c /bib/* -e author -e title -e year -e *

Normalize all entries: <author>s first, then <title>s, then <year>s, then all the other elements

xsort -c /bib/paper -e author -e * -c /bib/book -e title -e *

In <paper>s list the <author>s first; in <book>s list the <title> first; leave other entries unchanged

Dan Suciu XML Toolkit 38

XSort: Implementation

• Sorts one context at a time, copies the rest
• For each context:
  – Create a “global key” for each item
  – Sort items, with a two-pass, multiway merge sort
• Quote from Databases 101 (news from the trenches):
  – with disk blocks of 4KB and 128MB of main memory, one can sort files up to 4TB in two passes!
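The 4TB figure follows from the standard textbook estimate for two-pass multiway merge sort: pass one writes sorted runs of roughly memory size, and pass two can merge about as many runs as there are block-sized buffers in memory. A short check, ignoring the one buffer reserved for output (the function name `two_pass_capacity` is made up for this sketch):

```python
KB, MB, TB = 2**10, 2**20, 2**40

def two_pass_capacity(block_size, memory):
    """Approximate largest file sortable in two passes."""
    buffers = memory // block_size   # block-sized buffers that fit in memory
    # Pass 1: sorted runs of about `memory` bytes each.
    # Pass 2: merge up to `buffers` runs, one input buffer per run.
    return buffers * memory

capacity = two_pass_capacity(4 * KB, 128 * MB)
print(capacity // TB, "TB")   # 32768 buffers x 128MB = 4TB
```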

Dan Suciu XML Toolkit 39

XSort: Performance

xsort -c /dblp -e * -k title/text()

Size (KB)     Xalan (sec)   Xsort (sec)
0.41          0.08          0.00
4.91          0.09          0.00
76.22         0.27          0.02
991.79        2.52          0.26
9671.79       27.45         2.85
100964.43     -             43.97
1009643.71    -             461.36

1GB sorted in about 8 minutes!

Dan Suciu XML Toolkit 40

Outline

• The tools

• The XPath processing engine

• Conclusions

Dan Suciu XML Toolkit 41

The XPath Processor

Common to all tools is the following problem:

Given:
• Set of correlated XPath expressions
• Stream of SAX events

Decide:
• When the expressions are true (reported as variable events)

Dan Suciu XML Toolkit 43

The XPath Processor

How we did it:
• All XPath expressions → Deterministic Finite Automaton
  – Restriction: no predicates yet (current work...)
• Does this scale to many, many XPath expressions?
  – Yes, if we compute the DFA lazily (upcoming ICDT’2003 paper)
• Evaluation time = parsing time
• Can do even better with a Stream IndeX (next)
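One way to picture "computing the DFA lazily" is to treat each set of NFA states as a DFA state and to fill in the transition table only for the (state, tag) pairs that actually occur in the stream. The sketch below illustrates that idea under simplifying assumptions (linear paths with tag, /, //, and * only; no predicates); it is not the toolkit's code, and the names `LazyDFA` and `run` are hypothetical.

```python
import re
import xml.sax

class LazyDFA(xml.sax.ContentHandler):
    def __init__(self, paths):
        super().__init__()
        self.paths = list(paths)
        self.steps = [re.findall(r"(//|/)([^/]+)", p) for p in self.paths]
        self.initial = frozenset((q, 0) for q in range(len(self.paths)))
        self.table = {}                 # (dfa_state, tag) -> (dfa_state, accepted queries)
        self.stack = [self.initial]
        self.matched = set()

    def _expand(self, state, tag):
        """Subset construction for a single transition, done on first use only."""
        nxt, accepted = set(), set()
        for q, i in state:
            steps = self.steps[q]
            if i == len(steps):
                continue
            axis, want = steps[i]
            if axis == "//":
                nxt.add((q, i))
            if want in ("*", tag):
                nxt.add((q, i + 1))
                if i + 1 == len(steps):
                    accepted.add(q)
        return frozenset(nxt), frozenset(accepted)

    def startDocument(self):
        self.stack = [self.initial]     # the transition table survives across documents
        self.matched = set()

    def startElement(self, tag, attrs):
        key = (self.stack[-1], tag)
        if key not in self.table:       # first time this transition is needed
            self.table[key] = self._expand(*key)
        state, accepted = self.table[key]
        self.stack.append(state)
        self.matched |= accepted

    def endElement(self, tag):
        self.stack.pop()

def run(dfa, xml_text):
    xml.sax.parseString(xml_text.encode("utf-8"), dfa)
    return sorted(dfa.paths[q] for q in dfa.matched)

dfa = LazyDFA(["/bib/book/title", "/bib//title", "/bib/paper"])
doc = "<bib><book><title>A</title></book><book><title>B</title></book></bib>"
print(run(dfa, doc), len(dfa.table))
```

Once the table is warm, each SAX event costs one dictionary lookup regardless of how many expressions were compiled, which is the sense in which evaluation time approaches parsing time.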

Dan Suciu XML Toolkit 44

Stream IndeX (SIX)

News: The parser is the main bottleneck in XPath stream processing!

Solution: “Index” the XML stream, parse only partially

Definition: The SIX = a table of (start, end) offsets

Dan Suciu XML Toolkit 45

Stream IndeX (SIX): Construction

XML:

<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book>
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>

SIX:

element    start  end
bib        0      1490124
book       3      409023
publisher  12     423
author     426    879
author     978    . . .
. . .

Dan Suciu XML Toolkit 46

Stream IndeX (SIX): Skip Parsing

XPath:

/bib/paper/title
. . .

XML:

<bib>
  <book>    (Skip Parsing)
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book>    (Skip Parsing)
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
  <paper>
  . . . . . .
</bib>
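A minimal sketch of both halves of the idea, under strong simplifying assumptions: tags have no attributes, there are no comments, CDATA sections, or self-closing tags, and the "end" offset is taken to be the position just past the closing tag. The function names `build_six` and `find_with_skips` and the regex-based tokenizer are made up for this illustration; the slides do not describe the toolkit's actual index format beyond "(start, end) offsets".

```python
import re

TAG = re.compile(rb"<(/?)([A-Za-z_][\w.-]*)\s*>")   # simple open or close tag

def build_six(data):
    """One full scan: return [(tag, start, end)] in document order."""
    six, open_slots = [], []
    for m in TAG.finditer(data):
        name = m.group(2).decode()
        if m.group(1):                               # closing tag: fill in the end offset
            slot = open_slots.pop()
            six[slot] = (name, six[slot][1], m.end())
        else:
            open_slots.append(len(six))
            six.append((name, m.start(), None))
    return six

def find_with_skips(data, six, path):
    """Evaluate a simple /a/b/c path, jumping over elements that cannot match.
    Returns (matching element contents, number of bytes never tokenized)."""
    want = path.strip("/").split("/")
    end_of = {start: end for _, start, end in six}
    results, skipped, stack, pos = [], 0, [], 0
    while True:
        m = TAG.search(data, pos)
        if m is None:
            return results, skipped
        if m.group(1):                               # closing tag of a matched prefix
            stack.pop()
            pos = m.end()
            continue
        name, end = m.group(2).decode(), end_of[m.start()]
        if len(stack) >= len(want) or want[len(stack)] != name:
            skipped += end - m.end()                 # skip parsing: jump past the element
            pos = end
        elif len(stack) + 1 == len(want):            # full match: take its content
            close = data.rfind(b"<", m.end(), end)   # where the closing tag begins
            results.append(data[m.end():close].decode().strip())
            pos = end
        else:
            stack.append(name)
            pos = m.end()

data = b"<bib><book><title>DB</title></book><paper><title>SIX</title></paper></bib>"
six = build_six(data)
print(six)
print(find_with_skips(data, six, "/bib/paper/title"))
```

In this toy document the whole <book> subtree is jumped over in one step; on a real stream the index would be produced once by the sender and reused by every consumer.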

Dan Suciu XML Toolkit 47

Stream IndeX (SIX) in XML Stream Processing

[Figure: XML streams (<bib> <book>...</bib>) each accompanied by a SIX stream of (start, end) offset pairs such as (0, 205), (30, 66), (72, 188), (90, 110), (95, 98). Labels: SIX, XML (e.g. DIME).]

The SIX stream is about 6% of the data stream

And can be made MUCH smaller

Throughput improvements from SIX (stable)

[Plot: throughput (MB/s, axis 0 to 35) against XML stream size (MB, axis 55 to 105), with and without SIX, for Theta = 3%, 8%, and 14%.]

Dan Suciu XML Toolkit 49

Effect of Decreasing the SIX Size

[Plot: throughput (MB/s, axis 0 to 30) and SIX size (KB, log axis 1 to 10000) against the size of XML elements deleted (0k to 10k).]

Dan Suciu XML Toolkit 50

Outline

• The tools

• The XPath processing engine

• Conclusions

Dan Suciu XML Toolkit 51

Conclusions

• The toolkit is already available:
  – http://www.cs.washington.edu/homes/suciu/XMLTK
  – http://xmltk.sourceforge.net

• What it does so far it does very well:
  – Sorting, aggregation, nest/unnest

• But doesn’t do too much:
  – Restricted selections, no projections, no restructurings yet
  – Volunteers welcome!

• Can one process XML data without parsing it completely?
  – SIX