Challenges in handling XML: performance and memory usage 15.11.2001 Sami Poikonen Republica oy.


Challenges in handling XML: performance and memory usage

15.11.2001

Sami Poikonen

Republica oy


Republica Oy is Finland’s leading provider of products and services based on XML standards.

Founded: 1996

Employees: 70+ (11/2001)

Offices: Helsinki, Jyväskylä


TOC

1. DOM
2. SAX
3. DOM or SAX or something else...
4. Transformations
5. Conclusions


Parsing XML: DOM

• Document Object Model
• standard API for accessing and creating XML data
• tree-based
• programming language independent
• developed by W3C
• whole document is read into memory
• read and write
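The "read and write" and "whole document is read into memory" points can be illustrated with a minimal sketch. It assumes Python's standard `xml.dom.minidom` as the DOM implementation (the slides do not name one), and the sample document is a shortened version of the book example used in this deck.

```python
from xml.dom.minidom import parseString

# Parse: the whole document is read into memory as a tree of nodes.
doc = parseString('<book type="pokkari"><title>Tuntematon sotilas</title></book>')
book = doc.documentElement

# Read: navigate the in-memory tree to reach the title text.
title = book.getElementsByTagName("title")[0].firstChild.data

# Write: create new nodes and attach them to the same tree.
author = doc.createElement("author")
name = doc.createElement("name")
name.setAttribute("first", "Väinö")
name.setAttribute("last", "Linna")
author.appendChild(name)
book.appendChild(author)

# Serialize the modified tree back to XML text.
serialized = book.toxml()
print(title)
print(serialized)
```

Because every node stays in memory until the document is released, memory use grows with document size, which is the trade-off the rest of the deck is about.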


DomNode book
|
|-->DomNode title
|   |
|   |-->DomNode text
|
|-->DomNode author
    |
    |-->DomNode name

<?xml version="1.0"?>
<book type="pokkari">
  <title>Tuntematon sotilas</title>
  <author>
    <name first="Väinö" last="Linna"/>
  </author>
</book>
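To make the mapping from document to node tree concrete, here is a small sketch that parses the book document and prints its element and text nodes. It assumes Python's `xml.dom.minidom`; the helper name `outline` is hypothetical, and whitespace-only text nodes are skipped so the output resembles the tree on the slide.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

BOOK_XML = """<?xml version="1.0"?>
<book type="pokkari">
  <title>Tuntematon sotilas</title>
  <author>
    <name first="Väinö" last="Linna"/>
  </author>
</book>"""


def outline(node, depth=0):
    """Return one indented line per element or non-blank text node."""
    lines = []
    if node.nodeType == Node.ELEMENT_NODE:
        lines.append("  " * depth + "element " + node.tagName)
    elif node.nodeType == Node.TEXT_NODE and node.data.strip():
        lines.append("  " * depth + "text " + repr(node.data.strip()))
    # Recurse into children; text nodes simply have none.
    for child in node.childNodes:
        lines.extend(outline(child, depth + 1))
    return lines


doc = parseString(BOOK_XML)
tree_lines = outline(doc.documentElement)
print("\n".join(tree_lines))
```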


Parsing XML: SAX

• Simple API for XML
• API for accessing XML data
• event based
• programming language independent
• not defined by W3C
• application has to store fragments into memory
• read only


<?xml version="1.0"?>
<poem>
<line>Roses are red,</line>
<line>Violets are blue.</line>
<line>Sugar is sweet,</line>
<line>and I love you.</line>
</poem>

Start element: poem
Start element: line
End element: line
Start element: line
End element: line
Start element: line
End element: line
Start element: line
End element: line
End element: poem
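The event sequence for the poem can be reproduced with a short sketch, assuming Python's standard `xml.sax` module. The handler class `EventLogger` is a hypothetical name. It also shows the "application has to store fragments" point: the parser keeps nothing, so the handler buffers the text of each line itself.

```python
import xml.sax

POEM_XML = b"""<?xml version="1.0"?>
<poem>
<line>Roses are red,</line>
<line>Violets are blue.</line>
<line>Sugar is sweet,</line>
<line>and I love you.</line>
</poem>"""


class EventLogger(xml.sax.ContentHandler):
    """Record element events and keep the text of each <line>."""

    def __init__(self):
        super().__init__()
        self.events = []
        self.lines = []
        self._buffer = None  # active only while inside a <line>

    def startElement(self, name, attrs):
        self.events.append("Start element: " + name)
        if name == "line":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect the pieces.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        self.events.append("End element: " + name)
        if name == "line":
            self.lines.append("".join(self._buffer))
            self._buffer = None


handler = EventLogger()
xml.sax.parseString(POEM_XML, handler)
print("\n".join(handler.events))
```

Only the fragments the handler chooses to keep stay in memory, which is why this style suits huge documents and streams.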


DOM or SAX or something else?

DOM:
• read and write
• need to move back and forth in data
• document is human created

SAX:
• read only
• huge data or streams
• data is machine generated

Best of both worlds? Adaptive parsing!
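The slides do not define how adaptive parsing works, so the following is only one possible reading of a "best of both worlds" approach, not the speaker's product: stream through the document as events and build a small DOM subtree only for the parts of interest. The sketch assumes Python's standard `xml.dom.pulldom`; the order document and the function `large_orders` are hypothetical.

```python
from xml.dom import pulldom

# Hypothetical sample data; a real input could be far larger.
ORDERS_XML = """<orders>
  <order id="1"><item>pen</item><total>3</total></order>
  <order id="2"><item>desk</item><total>250</total></order>
  <order id="3"><item>lamp</item><total>40</total></order>
</orders>"""


def large_orders(xml_text, threshold):
    """Stream the document; build a DOM subtree for one <order> at a time."""
    result = []
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "order":
            # Pull the rest of this element into a small DOM subtree.
            events.expandNode(node)
            total_node = node.getElementsByTagName("total")[0]
            if int(total_node.firstChild.data) >= threshold:
                result.append(node.getAttribute("id"))
    return result


big = large_orders(ORDERS_XML, 40)
print(big)
```

Each order can be navigated back and forth like a DOM document, while memory use stays bounded by the size of a single order rather than the whole file.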


Transformations

• XSLT: XSL Transformations
• XSLT processors are built to use DOM
• XSLT to Java conversion: still uses DOM
• SAX based custom-made application for transformations
• Adaptive parsing with data binding?
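As an illustration of a SAX based custom-made transformation, the sketch below renames elements while the document streams through, without building a tree. It assumes Python's standard `xml.sax.saxutils` (`XMLFilterBase` and `XMLGenerator`); the class `RenameFilter`, the function `transform`, and the element mapping are hypothetical, and a real transformation would need far more logic than renaming.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class RenameFilter(XMLFilterBase):
    """Pass SAX events through unchanged, except for renamed elements."""

    def __init__(self, parent, mapping):
        super().__init__(parent)
        self._mapping = mapping

    def startElement(self, name, attrs):
        super().startElement(self._mapping.get(name, name), attrs)

    def endElement(self, name):
        super().endElement(self._mapping.get(name, name))


def transform(xml_bytes, mapping):
    """Stream xml_bytes through the filter and return the output text."""
    out = io.StringIO()
    reader = RenameFilter(xml.sax.make_parser(), mapping)
    # XMLGenerator turns the filtered events back into XML text.
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(io.BytesIO(xml_bytes))
    return out.getvalue()


result = transform(
    b"<poem><line>Roses are red,</line></poem>",
    {"poem": "verse", "line": "row"},
)
print(result)
```

The filter never holds more than the current event, so memory use does not grow with input size, in contrast to a DOM-based XSLT processor.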



Conclusions

• When building XML applications, you have to think about how you will handle large chunks of data

• Choosing between SAX and DOM is not always trivial

• There are also smarter ways to parse XML

• Adaptive parsing with data binding brings much-needed performance to transformations

• It is easy to reach the limits of XSLT processing capabilities

• In some cases, problems handling XML streams and large files have led people to assume that handling them is almost impossible


Contact Information

Republica Oy
http://www.republica.fi/
Survontie 9
40500 Jyväskylä
http://www.x-fetch.com/

Sami Poikonen
Vice President, Solutions
tel. 040 301 [email protected]