Before today’s lecture


Page 1: Before today’s lecture

1 Before today’s lecture

• Personal Project
– Due date (including a demo of your work): 4/12
– Grading scheme
   • Application (50%): all XML documents, schema documents, application source code, Web-based interfaces, other source code
   • Paper (40%): project paper
   • Demonstration (10%): design layout and functionality

Page 2: Before today’s lecture

2 Before today’s lecture

• Final Project
– Group members:
   • Deadline for forming your group: before 4/10
   • Send the name list of your group members to 尚純 or 紹楷
   • For those who can’t form a team, we’ll make a group for you. The group members will be posted on 4/12
   • If you want to make a change, the deadline is 4/15
– Project topics:
   • Topics will be posted on the web; pick one and send your topic to 尚純 or 紹楷
   • Alternatively, send a proposal for a topic of your own.
   • The proposal should include reference information on the topic and the scope of the project.
• Teaching Assistants: 吳尚純 [email protected], 李紹楷 [email protected]

Page 3: Before today’s lecture

3

Simple API for XML (SAX)

Is SAX too hard for mortal programmers? And is the domination of DOM a bad thing?

Page 4: Before today’s lecture

4

• Introduction

• XML Parsing Operations

• The SAX API

• How SAX Processing Works

• SAX-based parsers

• Events

• A SAX Example: Step by Step

• Example (SAX1.0): Tree Diagram

• SAX 2.0

• Example: Printing the notes in an XML document

• Summary

Page 5: Before today’s lecture

5 Introduction

• Processing XML
– Create a parser object → point the object to an XML document → process

• Basic operations for processing an XML document
– A basic XML processing architecture
– 3 key layers: the XML documents, the application, and the infrastructure for working with XML documents

[Figure: XML document(s), character stream, parser and serializer, standardized XML APIs, application]

Page 6: Before today’s lecture

6 Introduction (cont.)

• Basic Operations (cont.)
– Parsing is the first step that enables an application to work with an XML document.
– The parsing process breaks up the text of an XML document into small identifiable pieces (nodes).
– The parser breaks documents into pieces recognized as start and end tags, attribute-value pairs, chunks of text content, processing instructions, comments, and so on.
– These pieces are fed into the application through well-defined APIs that implement a particular parsing model.
– Four parsing models are commonly in use:

Page 7: Before today’s lecture

7 Introduction (cont.)

• Basic Operations (cont.)
– Four parsing models are commonly in use:

1. Pull Parsing

① The application always asks the parser to give it the next piece of information

② It is as if the application has to “pull” the information out of the parser; the application initiates the communication

③ The XML community has not yet defined standard APIs for pull parsing

④ That could happen soon because of its popularity!

2. Push Parsing

① The parser sends notifications to the application during the parsing process

② The notifications are sent in “reading” order (i.e., in their order of appearance in the document)

Page 8: Before today’s lecture

8 Introduction (cont.)

• Basic Operations (cont.)

2. Push Parsing (cont.)

③ Notifications are typically implemented as event callbacks in the application

④ Known as event-based parsing

⑤ Simple API for XML (SAX) is the standard for push parsing

3. One-step Parsing

① The parser reads the whole XML doc. and generates a data structure (a parse tree) describing its entire contents (elements, attributes,… etc.)

② W3C Standard : XML DOM (Document Object Model): specifies the types of objects that will be included in the parse tree, their properties, and operations

③ The DOM is a language- and platform-independent API.

④ The biggest problems are memory overhead and poor computational efficiency

Page 9: Before today’s lecture

9 Introduction (cont.)

• Basic Operations (cont.)

4. Hybrid Parsing

① Combines characteristics of the other parsing models to create efficient parsers for special scenarios

② Let’s separate the concepts of loading and parsing to analyze the situation

– Loading the document: one-step parsing

– Parsing the rest of the document: providing the application with partial information extracted from the document

③ For example, push + one-step parsing

– The application thinks it is working with a one-step parser; in reality, the parsing process has just begun

– As the application keeps accessing more objects in the DOM tree, the parsing continues incrementally

– Just enough of the XML document is parsed at any given point to give the application the objects it wants to see

Page 10: Before today’s lecture

10 An example of hybrid parsing

• In Sun's reference implementation, the DOM API builds on the SAX API.

• Sun's implementation of the Document Object Model (DOM) API uses the SAX libraries to read in XML data and construct the tree of data objects that constitutes the DOM.

• Sun's implementation also provides a framework to help output the object tree as XML data

Page 11: Before today’s lecture

11 Introduction (cont.)

• Why define many models?
– Trade-offs between memory efficiency, computational efficiency, and ease of programming
– A table is presented to compare the trade-offs of the models

Model             Control of parsing   Control of context   Memory efficiency   Computational efficiency   Ease of programming
Pull              Application          Application          High                Highest                    Low
Push (SAX)        Parser               Application          High                High                       Low
One-step (DOM)    Parser               Parser               Lowest              Lowest                     High
One-step (JDOM)   Parser               Parser               Low                 Low                        Highest
Hybrid (DOM)      Parser               Parser               Medium              Medium                     High
Hybrid (JDOM)     Parser               Parser               Medium              Medium                     Highest

Page 12: Before today’s lecture

12 Introduction (cont.)

• How to choose between SAX and DOM: the choice depends on several factors:
– Purpose of the application:
   • If you need to make changes to the data and output it as XML, then in most cases DOM is the way to go.
   • With SAX this is much more complex to program, as you'd have to make changes to a copy of the data rather than to the data itself.
– Amount of data: for large files, SAX is a better bet.
– How the data will be used: if only a small amount of the data will actually be used, you may be better off using SAX to extract it into your application.
– On the other hand, if you know that you will need to refer back to large amounts of information that has already been processed, SAX is probably not the right choice.
– The need for speed: SAX implementations are normally faster than DOM implementations.

• It's important to remember that SAX and DOM are not mutually exclusive.
   • You can use DOM to create a stream of SAX events.
   • You can use SAX to create a DOM tree.
• In fact, most parsers used to create DOM trees are actually using SAX to do it!
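The point that SAX and DOM can feed each other can be sketched with the JAXP identity transformer. This is only one possible route and not necessarily the one the slides have in mind; the class name SaxDomBridge and both method names are hypothetical, chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; not part of JAXP or of the lecture code.
class SaxDomBridge {

    // SAX -> DOM: a SAX parser reads the text, and the identity
    // transformer collects the resulting events into a DOM tree.
    static Document saxToDom(String xml) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            DOMResult result = new DOMResult();
            identity.transform(new SAXSource(new InputSource(new StringReader(xml))), result);
            return (Document) result.getNode();
        } catch (TransformerException e) {
            throw new IllegalStateException("could not build a DOM tree", e);
        }
    }

    // DOM -> SAX: walk an existing tree and replay it as SAX events,
    // here collecting only the element names seen by startElement.
    static List<String> domToSaxElementNames(Document doc) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                names.add(qName);
            }
        };
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.transform(new DOMSource(doc), new SAXResult(handler));
        } catch (TransformerException e) {
            throw new IllegalStateException("could not replay the DOM tree", e);
        }
        return names;
    }
}
```

Calling saxToDom on a small document and passing the result to domToSaxElementNames should list the element names in document order.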

Page 13: Before today’s lecture

13 The SAX APIs

• SAX (The Simple API for XML )

– SAX is the Simple API for XML, originally a Java-only API.

– SAX was the first widely adopted API for XML in Java, and is a “de facto” standard.

– The current version is SAX 2.0.x, and there are versions for several programming language environments other than Java

– Another method for accessing XML document’s contents

– Developed by XML-DEV mailing-list members

– Uses event-based model

• Notifications (events) are raised as document is parsed

Page 14: Before today’s lecture

14 The SAX APIs (cont.)

• SAX parsing architecture: uses the common abstract factory design pattern

1. Create an instance of SAXParserFactory (used to create an instance of SAXParser)

2. SAXReader (in the API, the org.xml.sax.XMLReader wrapped by the SAXParser): the event trigger. When its parse() method is invoked, the reader starts firing events to the application by invoking the registered callbacks

3. Those callback methods are defined by the interfaces ContentHandler, ErrorHandler, DTDHandler, and EntityResolver.

Page 15: Before today’s lecture

15 The SAX APIs (cont.)

• Here is a summary of the key objects in SAX APIs:

• SAXParserFactory

Creates an instance of the parser determined by the system property, javax.xml.parsers.SAXParserFactory

• SAXParser

Defines several kinds of parse() methods. In general, you pass an XML data source and a DefaultHandler object to the parser, which processes the XML and invokes the appropriate methods in the handler object.

• SAXReader

Carries on the conversation with the SAX event handlers you define

Page 16: Before today’s lecture

16 The SAX APIs (cont.)

• DefaultHandler

Implements the ContentHandler, ErrorHandler, DTDHandler, and EntityResolver interfaces (with null methods), so you can override only the ones you're interested in.

• ContentHandler

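Defines methods such as startDocument, endDocument, startElement, and endElement, which are invoked when the parser recognizes the corresponding parts of the document, and the methods characters and processingInstruction, which are invoked when the parser encounters the text in an XML element or an inline processing instruction, respectively.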

• ErrorHandler

Methods in response to various parsing errors.

• DTDHandler

Defines methods you will generally never be called upon to use. Used when processing a DTD to recognize and act on declarations for an unparsed entity.

Page 17: Before today’s lecture

17 The SAX APIs (cont.)

• Being event-based means that the parser reads an XML document from beginning to end.

• Each time it recognizes a syntax construction, it notifies the application that is running it.

• The SAX parser notifies the application by calling methods from the ContentHandler interface.

• For example, when the parser recognizes a start tag (beginning with a less-than symbol, "<"), it calls the startElement method;

Page 18: Before today’s lecture

18 The SAX API (cont.)

• when it comes to character data, it calls the characters method;

• when it recognizes an end tag (beginning with a less-than symbol followed by a slash, "</"), it calls the endElement method

• To illustrate, let's look at an example XML document and walk through what the parser does for each line.

Page 19: Before today’s lecture

19 How SAX Processing Works

• SAX analyzes an XML stream as it goes by, much like an old ticker tape.

• Consider the following XML code snippet:

   <?xml version="1.0"?>
   <samples>
      <server>UNIX</server>
      <monitor>color</monitor>
   </samples>

• A SAX processor analyzing this code snippet would generate, in general, the following events:

   Start document
   Start element (samples)
   Characters (white space)
   Start element (server)
   Characters (UNIX)
   End element (server)
   Characters (white space)
   Start element (monitor)
   Characters (color)
   End element (monitor)
   Characters (white space)
   End element (samples)
   End document
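The event list for this snippet can be reproduced with a handler that simply records each callback. The following is a minimal sketch, assuming a JAXP parser is available; EventTrace and trace are hypothetical names. A parser may deliver character data in several chunks, so the number of "Characters" lines can differ from the list on the slide.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records every SAX callback as one line of text.
class EventTrace extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() { events.add("Start document"); }

    @Override
    public void endDocument() { events.add("End document"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        events.add("Start element (" + qName + ")");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("End element (" + qName + ")");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length);
        events.add(text.trim().isEmpty()
                ? "Characters (white space)"
                : "Characters (" + text + ")");
    }

    // Parses the given XML text and returns the recorded events in order.
    static List<String> trace(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            EventTrace handler = new EventTrace();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n<samples>\n  <server>UNIX</server>\n"
                   + "  <monitor>color</monitor>\n</samples>";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```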

Page 20: Before today’s lecture

20 How SAX Processing Works (cont.)

• The SAX API allows a developer to capture these events and act on them
– What does “the developer” stand for here?

• SAX processing involves the following steps:
1. Create an event handler.
2. Create the SAX parser.
3. Assign the event handler to the parser.
4. Parse the document, sending each event to the handler.
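The four steps can be sketched as follows, using the XMLReader that a JAXP SAXParser wraps so that "assigning the handler" is an explicit call. This is an illustrative sketch only; ElementCounter and countElements are hypothetical names, and the XML is passed in as a string for convenience.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that only counts start tags.
class ElementCounter extends DefaultHandler {
    private int elements = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        elements++;
    }

    int getElements() {
        return elements;
    }

    static int countElements(String xml) {
        try {
            // Step 1: create an event handler.
            ElementCounter handler = new ElementCounter();
            // Step 2: create the SAX parser (and get its underlying reader).
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // Step 3: assign the event handler to the parser.
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);
            // Step 4: parse the document; each event is sent to the handler.
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.getElements();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements(
                "<samples><server>UNIX</server><monitor>color</monitor></samples>"));
    }
}
```

With SAXParser.parse(source, handler), as used later in these slides, steps 3 and 4 collapse into a single call.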

Page 21: Before today’s lecture

21 How SAX Processing Works (cont.)

• The pros and cons of event-based processing

– The advantages of this kind of processing are much like the advantages of streaming media. (like interpreter?)

– Analysis can get started immediately, rather than waiting for all of the data to be processed.

– The application simply examines the data as it goes by; it doesn't need to store it in memory, which is a huge advantage when it comes to large documents.

Page 22: Before today’s lecture

22 How SAX Processing Works (cont.)

• The pros and cons of event-based processing (cont.)

– In fact, an application doesn't even have to parse the entire document; it can stop when certain criteria have been satisfied.

– In general, SAX is also much faster than the alternative, the DOM.

– On the other hand, because the application is not storing the data in any way, it is impossible to make changes to it using SAX, or to move backwards in the data stream.
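SAX has no dedicated "stop" call, so one common way to stop early is to throw an exception from a callback once the criterion is met and catch it around parse(). The sketch below assumes that approach; FirstMatchFinder, StopParsing, and firstText are hypothetical names, not part of the SAX API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: captures the text of the first matching element,
// then aborts the parse instead of reading the rest of the document.
class FirstMatchFinder extends DefaultHandler {

    // Marker exception used only to signal "we have what we need".
    static final class StopParsing extends SAXException {
        StopParsing() {
            super("stop requested by handler");
        }
    }

    private final String wanted;
    private final StringBuilder text = new StringBuilder();
    private boolean inside = false;

    FirstMatchFinder(String wanted) {
        this.wanted = wanted;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (qName.equals(wanted)) {
            inside = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inside) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (inside && qName.equals(wanted)) {
            throw new StopParsing(); // criterion satisfied: stop here
        }
    }

    // Returns the text of the first <element>, or null if there is none.
    static String firstText(String xml, String element) {
        FirstMatchFinder finder = new FirstMatchFinder(element);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), finder);
            return null; // reached the end of the document without a match
        } catch (StopParsing stop) {
            return finder.text.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }
}
```

The trade-off noted on the slide still applies: once the parse has stopped, there is no way to go back in the stream without parsing again.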

Page 23: Before today’s lecture

23 SAX-based Parsers

• SAX-based parsers
– The textbook uses Sun Microsystems’ JAXP

• Tools
– A text editor: XML files are simply text. To create and read them, a text editor is all you need.
– Java™ 2 SDK, Standard Edition version 1.4.x: SAX support has been built into the latest version of Java (available at http://java.sun.com/j2se/1.4.2/download.html), so you won't need to install any separate classes. If you are using an earlier version of Java, such as Java 1.3.x, you'll also need
   • an XML parser such as the Apache project's Xerces-Java (available at http://xml.apache.org/xerces2-j/index.html),
   • or Sun's Java API for XML Parsing (JAXP), part of the Java Web Services Developer Pack (available at http://java.sun.com/webservices/downloads/webservicespack.html).
   • You can also download the official version from SourceForge (available at http://sourceforge.net/project/showfiles.php?group_id=29449).
– Other languages: should you wish to adapt the examples, SAX implementations are also available in other programming languages.
– You can find information on C, C++, Visual Basic, Perl, and Python implementations of a SAX parser at http://www.saxproject.org/?selected=langs.

Page 24: Before today’s lecture

24 Some SAX-based parsers

Product     Description
JAXP        Sun’s JAXP is available from java.sun.com/xml. JAXP supports both SAX and DOM.
Xerces      Apache’s Xerces parser is available at www.apache.org. Xerces supports both SAX and DOM.
MSXML 3.0   Microsoft’s msxml parser is available at msdn.microsoft.com/xml. This parser supports both SAX and DOM.

Page 25: Before today’s lecture

25 Setup

• Java applications to illustrate SAX API
– Java 2 Standard Edition required
   • Download at www.java.sun.com/j2se
   • Installation instructions: www.deitel.com/faq/java3install.htm
– JAXP required
   • Download at java.sun.com/xml/download.html

Page 26: Before today’s lecture

26 Events

• SAX parser
– Invokes certain methods (Fig. 9.2) when events occur
– Programmers override these methods to process data

Page 27: Before today’s lecture

27 Fig. 9.2 Methods invoked by the SAX parser

Method Name             Description
setDocumentLocator      Invoked at the beginning of parsing.
startDocument           Invoked when the parser encounters the start of an XML document.
endDocument             Invoked when the parser encounters the end of an XML document.
startElement            Invoked when the start tag of an element is encountered.
endElement              Invoked when the end tag of an element is encountered.
characters              Invoked when text characters are encountered.
ignorableWhitespace     Invoked when whitespace that can be safely ignored is encountered.
processingInstruction   Invoked when a processing instruction is encountered.

Page 28: Before today’s lecture

28 The SAX API – an Example

   <priceList>                     [parser calls startElement]
     <coffee>                      [parser calls startElement]
       <name>Mocha Java</name>     [parser calls startElement, characters, and endElement]
       <price>11.95</price>        [parser calls startElement, characters, and endElement]
     </coffee>                     [parser calls endElement]
   </priceList>                    [parser calls endElement]

• The default implementations of the methods that the parser calls do nothing
• You need to write a subclass implementing the appropriate methods to get the functionality you want
• For example, suppose you want to get the price per pound for Mocha Java.
• You would write a class extending DefaultHandler (the default implementation of ContentHandler) in which you write your own implementations of the methods startElement and characters
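A handler for the Mocha Java lookup might look like the sketch below. It is an illustration under assumptions, not the textbook's code: CoffeePriceHandler and priceOf are hypothetical names, and the sketch also overrides endElement (in addition to startElement and characters) because the text of an element is only known to be complete when its end tag arrives.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: remembers the most recent <name> and stores the
// <price> that follows it when the name matches the coffee we want.
class CoffeePriceHandler extends DefaultHandler {
    private final String wantedCoffee;
    private final StringBuilder text = new StringBuilder();
    private String currentName;
    private String price; // stays null if the coffee is not listed

    CoffeePriceHandler(String wantedCoffee) {
        this.wantedCoffee = wantedCoffee;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // start collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called more than once per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("name")) {
            currentName = text.toString().trim();
        } else if (qName.equals("price") && wantedCoffee.equals(currentName)) {
            price = text.toString().trim();
        } else if (qName.equals("coffee")) {
            currentName = null; // leaving this coffee entry
        }
    }

    // Returns the price text for the given coffee, or null if not found.
    static String priceOf(String xml, String coffee) {
        CoffeePriceHandler handler = new CoffeePriceHandler(coffee);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return handler.price;
    }
}
```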

Page 29: Before today’s lecture

29 The SAX API – an Example (cont.)

• Your code has three tasks.
– Scan the command line for the name (or URI) of an XML file.
– Create a parser object.
– Tell the parser object to parse the XML file named on the command line, and tell it to send your code all of the SAX events it generates.

• Step I: Scan the command line
– Look for an argument. If there isn't an argument, you print an error message and exit.
– Otherwise, assume that the first argument is the name or URI of an XML file

   public static void main(String argv[]) {
      if (argv.length == 0 ||
          (argv.length == 1 && argv[0].equals("-help"))) {
         // Print an error message and exit...
      }
      PrintOutline s1 = new PrintOutline();
      s1.parseURI(argv[0]);
   }

Page 30: Before today’s lecture

30 The SAX API – an Example (cont.)

• Step II: Create a parser object
– To create a parser object, use JAXP's SAXParserFactory API to create a SAXParser

public void parseURI(String uri) {

try {

SAXParserFactory spf = SAXParserFactory.newInstance();

SAXParser sp = spf.newSAXParser();

. . .

Page 31: Before today’s lecture

31 The SAX API – an Example (cont.)

• Step III: Parse the file and handle any events
– Now that we've created our parser object, we need to have it parse the file. That's done with the parse() method
– Notice that the parse() method takes two arguments. The first is the URI of the XML document, while the second is an object that implements the SAX event handlers

   public void parseURI(String uri) {
      try {
         SAXParserFactory spf = SAXParserFactory.newInstance();
         SAXParser sp = spf.newSAXParser();
         sp.parse(uri, this);
      } catch (Exception e) {
         System.err.println(e);
      }
   }

Page 32: Before today’s lecture

32 The SAX API – an Example (cont.)

– In the case of PrintOutline, you're extending the SAX DefaultHandler class:

– DefaultHandler has an implementation of a number of event handlers. These implementations do nothing, which means all your code has to do is implement handlers for the events you care about.

– Note: The exception handling above is sloppy; as an exercise for the reader, feel free to handle specific exceptions, such as SAXException or java.io.IOException.

– A major benefit of the DefaultHandler class is that it shields you from having to implement all of the event handlers.

– DefaultHandler implements all of the event handlers; you just override the ones you care about.

public class PrintOutline extends DefaultHandler{

…….

}

Page 33: Before today’s lecture

33 The SAX API – an Example (cont.)

• Step IV: Implementing event handlers
– startDocument() event handler
– It simply writes out a basic XML declaration, regardless of whether one was in the original XML document or not.
– Currently the base SAX API doesn't return the details of the XML declaration

   public void startDocument() {
      System.out.println("<?xml version=\"1.0\"?>");
   }

Page 34: Before today’s lecture

34 The SAX API – an Example (cont.)

• Next, here's what you do for startElement():
– Print the name of the element and its attributes
– Namespace URI in braces before the element's local name
– rawName contains the raw XML 1.0 name, which is used when the element doesn't have a namespace URI

   public void startElement(String namespaceURI, String localName,
                            String rawName, Attributes attrs) {
      System.out.print("<");
      System.out.print(rawName);
      if (attrs != null) {
         int len = attrs.getLength();
         for (int i = 0; i < len; i++) {
            System.out.print(" ");
            System.out.print(attrs.getQName(i));
            System.out.print("=\"");
            System.out.print(attrs.getValue(i));
            System.out.print("\"");
         }
      }
      System.out.print(">");
   }

Page 35: Before today’s lecture

35 The SAX API – an Example (cont.)

• More event handling
– characters(): since you're printing the XML document to the console, you simply print the portion of the character array that relates to this event

   public void characters(char ch[], int start, int length) {
      System.out.print(new String(ch, start, length));
   }

– endElement(): simply write out the end tag
– endDocument(): nothing essential to do; it is included just for completeness.

   public void endElement(String namespaceURI, String localName,
                          String rawName) {
      System.out.print("</");
      System.out.print(rawName);
      System.out.print(">");
   }

   public void endDocument() {
      System.out.println("End of Document");
   }

Page 36: Before today’s lecture

36 The SAX API – an Example (cont.)

• Step V: Error handling:
– SAX defines the ErrorHandler interface;
– It is implemented by DefaultHandler;
– It contains three methods: warning, error, and fatalError (as defined by the XML specification)
   • warning(): Issued in response to a warning
   • error(): Issued in response to an error condition.
   • fatalError(): Issued in response to a fatal error

   public void warning(SAXParseException ex) {
      System.err.println("[Warning] " + getLocationString(ex) + ": " + ex.getMessage());
   }

   public void error(SAXParseException ex) {
      System.err.println("[Error] " + getLocationString(ex) + ": " + ex.getMessage());
   }

   public void fatalError(SAXParseException ex) throws SAXException {
      System.err.println("[Fatal Error] " + getLocationString(ex) + ": " + ex.getMessage());
      throw ex;
   }
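The handlers above call getLocationString(ex), which the slides do not show. The following is one plausible implementation, offered as an assumption rather than the original helper: it formats "file:line:column" from the accessors that SAXParseException provides. The wrapper class LocationStrings is hypothetical.

```java
import org.xml.sax.SAXParseException;

// Hypothetical holder for the helper; in PrintOutline the method would
// simply be a private member of the handler class.
class LocationStrings {

    // Builds "file:line:column" from the position stored in the exception.
    static String getLocationString(SAXParseException ex) {
        StringBuilder sb = new StringBuilder();
        String systemId = ex.getSystemId();
        if (systemId != null) {
            // Keep only the last path segment of the document's URI.
            int slash = systemId.lastIndexOf('/');
            sb.append(slash >= 0 ? systemId.substring(slash + 1) : systemId);
        }
        sb.append(':').append(ex.getLineNumber());
        sb.append(':').append(ex.getColumnNumber());
        return sb.toString();
    }
}
```

A parser reports -1 for a line or column it does not know, so those values can appear in the output as-is.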

Page 37: Before today’s lecture

37 Example: Tree Diagram

• Java application
– Parse XML document with SAX-based parser
– Output document data as tree diagram
– extends org.xml.sax.HandlerBase
   • implements interface EntityResolver
      – Handles external entities
   • implements interface DTDHandler
      – Handles notations and unparsed entities
   • implements interface DocumentHandler
      – Handles parsing events
   • implements interface ErrorHandler
      – Handles errors

Page 38: Before today’s lecture

Outline 38

Fig. 9.3 Application to create a tree diagram for an XML document.

import specifies location of classes needed by application

Assists in formatting

Override method to output parsed document’s URL

   // Fig. 9.3 : Tree.java
   // Using the SAX Parser to generate a tree diagram.

   import java.io.*;
   import org.xml.sax.*;  // for HandlerBase class
   import javax.xml.parsers.SAXParserFactory;
   import javax.xml.parsers.ParserConfigurationException;
   import javax.xml.parsers.SAXParser;

   public class Tree extends HandlerBase {
      private int indent = 0;  // indentation counter

      // returns the spaces needed for indenting
      private String spacer( int count )
      {
         String temp = "";

         for ( int i = 0; i < count; i++ )
            temp += " ";

         return temp;
      }

      // method called before parsing
      // it provides the document location
      public void setDocumentLocator( Locator loc )
      {
         System.out.println( "URL: " + loc.getSystemId() );
      }

Page 39: Before today’s lecture

Outline 39

Fig. 9.3 Application to create a tree diagram for an XML document. (Part 2)

Overridden method called when root node encountered

Overridden method called when end of document is encountered

Overridden method called when start tag is encountered

Output each attribute’s name and value (if any)

      // method called at the beginning of a document
      public void startDocument() throws SAXException
      {
         System.out.println( "[ document root ]" );
      }

      // method called at the end of the document
      public void endDocument() throws SAXException
      {
         System.out.println( "[ document end ]" );
      }

      // method called at the start tag of an element
      public void startElement( String name,
         AttributeList attributes ) throws SAXException
      {
         System.out.println( spacer( indent++ ) +
            "+-[ element : " + name + " ]");

         if ( attributes != null )

            for ( int i = 0; i < attributes.getLength(); i++ )
               System.out.println( spacer( indent ) +
                  "+-[ attribute : " + attributes.getName( i ) +
                  " ] \"" + attributes.getValue( i ) + "\"" );
      }

Page 40: Before today’s lecture

Outline 40

Fig. 9.3 Application to create a tree diagram for an XML document. (Part 3)

Overridden method called when end of element is encountered

Overridden method called when processing instruction is encountered

Overridden method called when character data is encountered

      // method called at the end tag of an element
      public void endElement( String name ) throws SAXException
      {
         indent--;
      }

      // method called when a processing instruction is found
      public void processingInstruction( String target,
         String value ) throws SAXException
      {
         System.out.println( spacer( indent ) +
            "+-[ proc-inst : " + target + " ] \"" + value + "\"" );
      }

      // method called when characters are found
      public void characters( char buffer[], int offset,
         int length ) throws SAXException
      {
         if ( length > 0 ) {
            String temp = new String( buffer, offset, length );

            System.out.println( spacer( indent ) +
               "+-[ text ] \"" + temp + "\"" );
         }
      }

Page 41: Before today’s lecture

Outline 41

Fig. 9.3 Application to create a tree diagram for an XML document. (Part 4)

Overridden method called when ignorable whitespace is encountered

Overridden method called when error (usually validation) occurs

Overridden method called when problem is detected (but not considered error)

Method main starts application

      // method called when ignorable whitespace is found
      public void ignorableWhitespace( char buffer[],
         int offset, int length )
      {
         if ( length > 0 ) {
            System.out.println( spacer( indent ) + "+-[ ignorable ]" );
         }
      }

      // method called on a non-fatal (validation) error
      public void error( SAXParseException spe )
         throws SAXParseException
      {
         // treat non-fatal errors as fatal errors
         throw spe;
      }

      // method called on a parsing warning
      public void warning( SAXParseException spe )
         throws SAXParseException
      {
         System.err.println( "Warning: " + spe.getMessage() );
      }

      // main method
      public static void main( String args[] )
      {
         boolean validate = false;

Page 42: Before today’s lecture

Outline 42

Fig. 9.3 Application to create a tree diagram for an XML document. (Part 5)

Allow command-line arguments (if we want to validate document)

SAXParserFactory can instantiate SAX-based parser

         if ( args.length != 2 ) {
            System.err.println( "Usage: java Tree [validate] " +
               "[filename]\n" );
            System.err.println( "Options:" );
            System.err.println( " validate [yes|no] : " +
               "DTD validation" );
            System.exit( 1 );
         }

         if ( args[ 0 ].equals( "yes" ) )
            validate = true;

         SAXParserFactory saxFactory =
            SAXParserFactory.newInstance();

         saxFactory.setValidating( validate );

Page 43: Before today’s lecture

Outline 43

Fig. 9.3 Application to create a tree diagram for an XML document. (Part 6)

Instantiate SAX-based parser

Handles errors (if any)

         try {
            SAXParser saxParser = saxFactory.newSAXParser();
            saxParser.parse( new File( args[ 1 ] ), new Tree() );
         }
         catch ( SAXParseException spe ) {
            System.err.println( "Parse Error: " + spe.getMessage() );
         }
         catch ( SAXException se ) {
            se.printStackTrace();
         }
         catch ( ParserConfigurationException pce ) {
            pce.printStackTrace();
         }
         catch ( IOException ioe ) {
            ioe.printStackTrace();
         }

         System.exit( 0 );
      }
   }

Page 44: Before today’s lecture

Outline 44

Fig. 9.4 XML document spacing1.xml.

XML document does not reference DTD

XML document with elements test, example and object

Root element test contains attribute name with value “ spacing 1 ”

Note that whitespace is preserved: attribute value (line 7), line feed (end of line 7), indentation (line 8) and line feed (end of line 8)

1 <?xml version = "1.0"?>

2

3 <!-- Fig. 9.4 : spacing1.xml -->

4 <!-- Whitespaces in nonvalidating parsing -->

5 <!-- XML document without DTD -->

6

7 <test name = " spacing 1 ">

8 <example><object>World</object></example>

9 </test>

URL: file:C:/Tree/spacing1.xml
[ document root ]
+-[ element : test ]
 +-[ attribute : name ] " spacing 1 "
 +-[ text ] ""
 +-[ text ] " "
 +-[ element : example ]
  +-[ element : object ]
   +-[ text ] "World"
 +-[ text ] ""
[ document end ]

Page 45: Before today’s lecture

Outline 45

Fig. 9.5 XML document spacing2.xml.

DTD checks document’s characters, so any “removable” whitespace is ignorable

Line feed at line 14, spaces at beginning of line 15 and line feed at line 15 are ignored

1 <?xml version = "1.0"?>
2
3 <!-- Fig. 9.5 : spacing2.xml -->
4 <!-- Whitespace and nonvalidated parsing -->
5 <!-- XML document with DTD -->
6
7 <!DOCTYPE test [
8 <!ELEMENT test (example)>
9 <!ATTLIST test name CDATA #IMPLIED>
10 <!ELEMENT example (object*)>
11 <!ELEMENT object (#PCDATA)>
12 ]>
13
14 <test name = " spacing 2 ">
15 <example><object>World</object></example>
16 </test>

URL: file:C:/Tree/spacing2.xml
[ document root ]
+-[ element : test ]
 +-[ attribute : name ] " spacing 2 "
 +-[ ignorable ]
 +-[ ignorable ]
 +-[ element : example ]
  +-[ element : object ]
   +-[ text ] "World"
 +-[ ignorable ]
[ document end ]

Page 46: Before today’s lecture

Outline 46

Fig. 9.6 Well-formed XML document.

Invalid document because element example cannot contain element item

Validation disabled, so document parses successfully

Parser does not process text in CDATA section and returns character data

<?xml version = "1.0"?>

<!-- Fig. 9.6 : notvalid.xml -->
<!-- Validation and non-validation -->

<!DOCTYPE test [
<!ELEMENT test (example)>
<!ELEMENT example (#PCDATA)>
]>

<test>
   <?test message?>
   <example><item><![CDATA[Hello & Welcome!]]></item></example>
</test>

URL: file:C:/Tree/notvalid.xml
[ document root ]
+-[ element : test ]
 +-[ ignorable ]
 +-[ ignorable ]
 +-[ proc-inst : test ] "message"
 +-[ ignorable ]
 +-[ ignorable ]
 +-[ element : example ]
  +-[ element : item ]
   +-[ text ] "Hello & Welcome!"
 +-[ ignorable ]
[ document end ]

Page 47: Before today’s lecture

Outline 47

Fig. 9.6 Well-formed XML document.(Part 2)

Validation enabled

Parsing terminates when fatal error occurs at element item

URL: file:C:/Tree/notvalid.xml
[ document root ]
+-[ element : test ]
 +-[ ignorable ]
 +-[ ignorable ]
 +-[ proc-inst : test ] "message"
 +-[ ignorable ]
 +-[ ignorable ]
 +-[ element : example ]
Parse Error: Element "example" does not allow "item"

Page 48: Before today’s lecture

Outline 48

Fig. 9.7 Checking an XML document without a DTD for validity.

Validation disabled in first output, so document parses successfully

Validation enabled in second output, and parsing fails because DTD does not exist

<?xml version = "1.0"?>

<!-- Fig. 9.7 : valid.xml -->
<!-- DTD-less document -->

<test>
   <example>Hello &amp; Welcome!</example>
</test>

URL: file:C:/Tree/valid.xml
[ document root ]
+-[ element : test ]
   +-[ text ] ""
   +-[ text ] " "
   +-[ element : example ]
      +-[ text ] "Hello "
      +-[ text ] "&"
      +-[ text ] " Welcome!"
   +-[ text ] ""
[ document end ]

URL: file:C:/Tree/valid.xml
[ document root ]
Warning: Valid documents must have a <!DOCTYPE declaration.
Parse Error: Element type "test" is not declared.

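The two outcomes in Figs. 9.6 and 9.7 can be reproduced with JAXP. The following is a minimal sketch, not the lecture's Tree.java: the class name ValidationDemo and its validityErrors helper are made up for illustration, and the two documents are inlined as strings instead of being read from C:/Tree. Unlike the Tree application, which stops at the first validity error, this sketch only records the errors the parser reports and lets parsing continue.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the lecture code.
class ValidationDemo {

    // Fig. 9.6: well-formed, but the DTD declares <example> as (#PCDATA) only.
    static final String NOT_VALID =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE test [\n"
      + "<!ELEMENT test (example)>\n"
      + "<!ELEMENT example (#PCDATA)>\n"
      + "]>\n"
      + "<test><example><item><![CDATA[Hello & Welcome!]]></item></example></test>";

    // Fig. 9.7: well-formed, but there is no DTD at all.
    static final String NO_DTD =
        "<?xml version=\"1.0\"?>\n"
      + "<test><example>Hello &amp; Welcome!</example></test>";

    // Parses the document and returns the validity errors the parser reported.
    static List<String> validityErrors(String xml, boolean validate) {
        final List<String> errors = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(validate);          // the switch discussed above
            SAXParser parser = factory.newSAXParser();
            DefaultHandler handler = new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());       // record and keep parsing
                }
            };
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        for (boolean validate : new boolean[] { false, true }) {
            System.out.println("validation=" + validate
                + " notvalid.xml errors: " + validityErrors(NOT_VALID, validate).size()
                + ", valid.xml errors: " + validityErrors(NO_DTD, validate).size());
        }
    }
}
```

With validation off, both documents parse without reported errors because they are well-formed. With validation on, both produce errors: the first violates its DTD and the second has no DTD to validate against.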

Example: Tree Diagram (Summary)

• SAX 1.0 supported!

• When compiling, the following messages appear:

“Tree.java uses or overrides a deprecated API”

“Recompile with -deprecation for details”

• After compiling, three warnings (class has been deprecated) were issued:

1. HandlerBase should be replaced by DefaultHandler

2. & 3. AttributeList should be replaced by Attributes

• It is better to replace SAX 1.0 with SAX 2.0

• Problem with Xerces vs. JAXP


SAX 2.0

• SAX 2.0
– Recently released
– We have been using JAXP
– Xerces parser (Apache) supports SAX 2.0


SAX 2.0 (cont.)

• SAX 2.0 major changes
– Class HandlerBase replaced with DefaultHandler
– AttributeList replaced with Attributes
– Element and attribute processing support namespaces
– Loading and parsing processes have changed
• Alternative methods can be applied
– Methods for retrieving and setting parser properties
• e.g., whether parser performs validation
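In SAX 2.0, such parser settings are exposed as named features on XMLReader. A minimal sketch, assuming the JAXP parser bundled with the JDK: FeatureDemo, newReader and setAndGet are illustrative names, while the two feature URIs are the standard SAX 2.0 identifiers for validation and namespace processing.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical demo class; not part of the lecture code.
class FeatureDemo {

    // Standard SAX 2.0 feature identifiers.
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    // Obtains a SAX 2.0 XMLReader through JAXP.
    static XMLReader newReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Sets a feature, then reads it back from the parser.
    static boolean setAndGet(XMLReader reader, String feature, boolean value) {
        try {
            reader.setFeature(feature, value);
            return reader.getFeature(feature);
        } catch (Exception e) {
            // SAXNotRecognizedException or SAXNotSupportedException
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        XMLReader reader = newReader();
        System.out.println("validation: " + setAndGet(reader, VALIDATION, true));
        System.out.println("namespaces: " + setAndGet(reader, NAMESPACES, true));
    }
}
```

Features must be set before parse() is called; a parser that does not know a feature name signals this with an exception instead of silently ignoring it.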


Fig. 9.10 Java application that indents an XML document.

Replace class HandlerBase with class DefaultHandler

Provides same service as that of SAX 1.0

1   // Fig. 9.10 : printXML.java
2   // Using the SAX Parser to indent an XML document.
3
4   import java.io.*;
5   import org.xml.sax.*;
6   import org.xml.sax.helpers.*;
7   import javax.xml.parsers.SAXParserFactory;
8   import javax.xml.parsers.ParserConfigurationException;
9   import javax.xml.parsers.SAXParser;
10
11  public class PrintXML extends DefaultHandler {
12     private int indent = 0; // indentation counter
13
14     // returns the spaces needed for indenting
15     private String spacer( int count )
16     {
17        String temp = "";
18
19        for ( int i = 0; i < count; i++ )
20           temp += " ";
21
22        return temp;
23     }
24
25     // method called at the beginning of a document
26     public void startDocument() throws SAXException
27     {
28        System.out.println( "<?xml version = \"1.0\"?>" );
29     }
30


Fig. 9.10 Java application that indents an XML document. (Part 2)

Provides same service as that of SAX 1.0

Method startElement now has four arguments (namespace URI, element name, qualified element name and element attributes)

Attributes are now stored in Attributes object

Method endElement now has three arguments (namespace URI, element name and qualified element name)

31     // method called at the end of the document
32     public void endDocument() throws SAXException
33     {
34        System.out.println( "---[ document end ]---" );
35     }
36
37     // method called at the start tag of an element
38     public void startElement( String uri, String eleName,
39        String raw, Attributes attributes ) throws SAXException
40     {
41
42        System.out.print( spacer( indent ) + "<" + raw );
43
44        if ( attributes != null )
45           // getQName is used because getLocalName returns "" when namespace processing is off
46           for ( int i = 0; i < attributes.getLength(); i++ )
47              System.out.print( " "+ attributes.getQName( i ) +
48                 " = " + "\"" +
49                 attributes.getValue( i ) + "\"" );
50        System.out.println( ">" );
51        indent += 3;
52     }
53
54     // method called at the end tag of an element
55     public void endElement( String uri, String eleName,
56        String raw ) throws SAXException
57     {
58        indent -= 3;
59        System.out.println( spacer(indent) + "</" + raw + ">");
60     }
61

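The namespace URI and local-name arguments of startElement are only filled in when namespace processing is switched on. The sketch below shows the difference, assuming the JAXP parser bundled with the JDK. The class NameDemo, the sample document and the urn:example:books namespace are invented for illustration. A JAXP factory leaves namespace processing off unless setNamespaceAware(true) is called, and in that case the URI and local name arrive as empty strings for elements and attributes alike, so only the qualified name is usable.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the lecture code.
class NameDemo extends DefaultHandler {

    // Made-up sample document using a namespace prefix.
    static final String SAMPLE =
        "<bk:book xmlns:bk='urn:example:books' id='1'>"
      + "<bk:title>SAX</bk:title></bk:book>";

    final List<String> elements = new ArrayList<>();    // "uri|localName|qName"
    final List<String> attributes = new ArrayList<>();  // "localName/qName"

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attrs) {
        elements.add(uri + "|" + localName + "|" + qName);
        for (int i = 0; i < attrs.getLength(); i++) {
            attributes.add(attrs.getLocalName(i) + "/" + attrs.getQName(i));
        }
    }

    // Parses the document with namespace processing on or off.
    static NameDemo parse(String xml, boolean namespaceAware) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            NameDemo handler = new NameDemo();
            factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (boolean aware : new boolean[] { true, false }) {
            NameDemo result = parse(SAMPLE, aware);
            System.out.println("namespaceAware=" + aware
                + " elements=" + result.elements
                + " attributes=" + result.attributes);
        }
    }
}
```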

Fig. 9.10 Java application that indents an XML document. (Part 3)

Provides same service as that of SAX 1.0

Provides same service as that of SAX 1.0

62     // method called when characters are found
63     public void characters( char buffer[], int offset,
64        int length ) throws SAXException
65     {
66        if ( length > 0 ) {
67           String temp = new String( buffer, offset, length );
68
69           if ( !temp.trim().equals( "" ) )
70              System.out.println( spacer(indent) + temp.trim() );
71        }
72     }
73
74     // method called when a processing instruction is found
75     public void processingInstruction( String target,
76        String value ) throws SAXException
77     {
78        System.out.println( spacer( indent ) +
79           "<?" + target + " " + value + "?>");
80     }
81
82     // main method
83     public static void main( String args[] )
84     {
85


Fig. 9.10 Java application that indents an XML document. (Part 4)

Create Xerces SAX-based parser

SAX-based parser parses InputSource

86        try {
87           XMLReader saxParser = ( XMLReader ) Class.forName(
88              "org.apache.xerces.parsers.SAXParser" ).newInstance();
89
90           saxParser.setContentHandler( new PrintXML() );
91           FileReader reader = new FileReader( args[ 0 ] );
92           saxParser.parse( new InputSource( reader ) );
93        }
94        catch ( SAXParseException spe ) {
95           System.err.println( "Parse Error: " + spe.getMessage() );
96        }
97        catch ( SAXException se ) {
98           se.printStackTrace();
99        }
100       catch ( Exception e ) {
101          e.printStackTrace();
102       }
103
104       System.exit( 0 );
105    }
106 }


To use JAXP instead of the Xerces-specific class, replace lines 86-93 with the following code:

XMLReader xmlReader = null;
try {
   SAXParserFactory spfactory = SAXParserFactory.newInstance();
   SAXParser saxParser = spfactory.newSAXParser();
   xmlReader = saxParser.getXMLReader();
   xmlReader.setContentHandler( new PrintXML() );
   xmlReader.setErrorHandler( new PrintXML() );
   FileReader reader = new FileReader( args[ 0 ] );
   xmlReader.parse( new InputSource( reader ) );
}


Fig. 9.11 Sample execution of printXML.java

Processing instruction that links to stylesheet

<?xml version = "1.0"?>

<!-- Fig. 9.11 : test.xml -->

<?xml:stylesheet type = "text/xsl" href = "something.xsl"?>

<test>
   <example value = "100">Hello and Welcome!</example>

   <a>
      <b>12345</b>
   </a>
</test>

Output

<?xml version = "1.0"?>
<?xml:stylesheet type = "text/xsl" href = "something.xsl"?>
<test>
   <example value = "100">
      Hello and Welcome!
   </example>
   <a>
      <b>
         12345
      </b>
   </a>
</test>
---[ document end ]---


Summary

• SAX is a faster, more lightweight way to read and manipulate XML data than the Document Object Model (DOM).

• SAX is an event-based processor that allows you to deal with elements, attributes, and other data as they show up in the original document (streaming events).

• Because of this architecture, SAX is a read-only system, but that doesn't prevent you from using the data. Make a copy and process it!
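"Make a copy and process it" can be as simple as collecting the pieces of interest while the events stream past. A minimal sketch, assuming a document shaped like test.xml from Fig. 9.11; the class TextCollector and its collect helper are illustrative names, not part of the lecture code.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: copies element text into a map during parsing.
class TextCollector extends DefaultHandler {

    // Same structure as test.xml in Fig. 9.11, without the stylesheet link.
    static final String SAMPLE =
        "<test>\n"
      + "   <example value=\"100\">Hello and Welcome!</example>\n"
      + "   <a>\n"
      + "      <b>12345</b>\n"
      + "   </a>\n"
      + "</test>";

    final Map<String, String> textByElement = new LinkedHashMap<>();
    private final StringBuilder buffer = new StringBuilder();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attrs) {
        buffer.setLength(0);                  // start a fresh text run
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length);     // may be called several times per run
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = buffer.toString().trim();
        if (!text.isEmpty()) {
            textByElement.put(qName, text);   // our own copy of the data
        }
        buffer.setLength(0);
    }

    // Parses the document and returns the collected copy.
    static Map<String, String> collect(String xml) {
        try {
            TextCollector handler = new TextCollector();
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)), handler);
            return handler.textByElement;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collect(SAMPLE));
    }
}
```

Once parsing has finished, the map can be changed freely; the parser never sees it, which is the sense in which SAX itself stays read-only.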


Summary (cont.)

• Resources
– For a basic grounding in XML, read through the "Introduction to XML" tutorial (developerWorks, August 2002).
– See the official SAX 2.0 page (http://www.saxproject.org).
– Learn to use a SAX filter to manipulate data (developerWorks, October 2001).
– Read about using SAX filters for flexible processing (developerWorks, March 2003).
– Find out how to build SAX-like apps in PHP (developerWorks, March 2003).
– Learn how to set up a SAX parser (developerWorks, July 2003).
– Learn more about validation and the SAX ErrorHandler interface (developerWorks, June 2001).
– Understand how to stop a SAX parser when you have enough data (developerWorks, June 2002).
– Explore XSL transformations to and from a SAX stream (developerWorks, July 2002).
– Turn a SAX stream into a DOM or JDOM object with "Converting from SAX" (developerWorks, April 2001).
– Download the Java 2 SDK, Standard Edition version 1.4.2 (http://java.sun.com/j2se/1.4.2/download.html).
– SAX was developed by the members of the XML-DEV mailing list. Try the Java version, now a SourceForge project (http://sourceforge.net/project/showfiles.php?group_id=29449).
– Try SAX implementations available in other programming languages.
– Get IBM's XML-related tools such as the DB2 XML Extender, which provides a bridge between XML and relational systems. Visit the DB2 Developer Domain to learn more about DB2.
– Find out how you can become an IBM Certified Developer in XML and related technologies.


That’s it for today! Have a nice and lovely spring holiday!

• Do not forget to check the web site for important messages regarding the demo date of your personal project.


getLocationString()

• The private method gives more details about the error.
• The SAXParseException class defines methods such as getLineNumber() and getColumnNumber() to provide the line and column number where the error occurred.
• getLocationString merely formats this information into a useful string.
• Putting this code into a separate method means you don't have to include this code in every error handler.

private String getLocationString(SAXParseException ex) {
   StringBuffer str = new StringBuffer();
   String systemId = ex.getSystemId();
   if (systemId != null) {
      int index = systemId.lastIndexOf('/');
      if (index != -1)
         systemId = systemId.substring(index + 1);
      str.append(systemId);
   }
   str.append(':');
   str.append(ex.getLineNumber());
   str.append(':');
   str.append(ex.getColumnNumber());
   return str.toString();
}
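One way the helper might be shared by the three ErrorHandler callbacks is sketched below. The class ReportingHandler, its check method, the message format and the demo.xml system identifier are assumptions made for illustration; only getLocationString follows the slide.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: every callback formats its location the same way.
class ReportingHandler extends DefaultHandler {

    final List<String> messages = new ArrayList<>();

    @Override
    public void warning(SAXParseException ex) {
        report("Warning", ex);
    }

    @Override
    public void error(SAXParseException ex) {
        report("Error", ex);
    }

    @Override
    public void fatalError(SAXParseException ex) throws SAXException {
        report("Fatal Error", ex);
        throw ex;                             // stop parsing
    }

    private void report(String level, SAXParseException ex) {
        messages.add("[" + level + "] " + getLocationString(ex)
            + ": " + ex.getMessage());
    }

    // Same logic as the slide: "file:line:column".
    private String getLocationString(SAXParseException ex) {
        StringBuffer str = new StringBuffer();
        String systemId = ex.getSystemId();
        if (systemId != null) {
            int index = systemId.lastIndexOf('/');
            if (index != -1)
                systemId = systemId.substring(index + 1);
            str.append(systemId);
        }
        str.append(':');
        str.append(ex.getLineNumber());
        str.append(':');
        str.append(ex.getColumnNumber());
        return str.toString();
    }

    // Parses the text and returns the formatted problem reports.
    static List<String> check(String xml, String systemId) {
        ReportingHandler handler = new ReportingHandler();
        try {
            InputSource in = new InputSource(new StringReader(xml));
            in.setSystemId(systemId);         // only used for reporting here
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (SAXException e) {
            // already recorded by fatalError
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.messages;
    }

    public static void main(String[] args) {
        // <b> is never closed, so the parser reports a fatal error on line 2.
        System.out.println(check("<a>\n<b></a>", "file:///tmp/demo.xml"));
    }
}
```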


Processing Instruction

• Processing Instructions

• An XML file can also contain processing instructions that give commands or information to an application that is processing the XML data.

• Processing instructions have the following format:

<?target instructions?>
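A SAX handler receives each processing instruction already split into its target and its instructions. A minimal sketch: PiCollector and the sample document are illustrative, and the sample uses the target xml-stylesheet, the spelling defined by the W3C recommendation for associating style sheets with XML documents.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records every processing instruction it is given.
class PiCollector extends DefaultHandler {

    // Made-up sample with one instruction before and one inside the root element.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
      + "<?xml-stylesheet type=\"text/xsl\" href=\"something.xsl\"?>\n"
      + "<test><?test message?></test>";

    final List<String> instructions = new ArrayList<>();  // "target|instructions"

    @Override
    public void processingInstruction(String target, String data) {
        instructions.add(target + "|" + data);
    }

    // Parses the document and returns the recorded instructions in document order.
    static List<String> collect(String xml) {
        try {
            PiCollector handler = new PiCollector();
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)), handler);
            return handler.instructions;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String pi : collect(SAMPLE)) {
            System.out.println(pi);
        }
    }
}
```

The XML declaration on the first line looks like a processing instruction but is not reported as one, so only two entries are collected.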


• At the most basic level:
– An application can directly output XML markup
– In the figure, this is indicated by the application working with a character stream
– Simple? Not really: the application must handle all the basic syntax rules (start and end tags, attribute quoting, etc.), which makes it a good topic for the final project!

• Parsing and serialization:
– Parse the XML document first
– Construct a data structure describing the XML document
– Emit XML markup from the data structure (serialization)
– Use the API for the processing methods
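To give a feel for the syntax rules an application takes on when it writes markup straight to a character stream, here is a deliberately small sketch. TinyXmlWriter, escape and element are hypothetical names; the code handles a single attribute and plain text content only, and it does not check that names are legal XML names.

```java
// Hypothetical helper: writes one element by hand, escaping special characters.
class TinyXmlWriter {

    // Replaces the characters that would otherwise be read as markup.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;");  break;
                case '<': out.append("&lt;");   break;
                case '>': out.append("&gt;");   break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Emits <name attrName="attrValue">text</name> with matching tags and quoting.
    static String element(String name, String attrName,
                          String attrValue, String text) {
        return "<" + name + " " + attrName + "=\"" + escape(attrValue) + "\">"
            + escape(text)
            + "</" + name + ">";
    }

    public static void main(String[] args) {
        System.out.println(element("example", "value", "100", "Hello & Welcome!"));
    }
}
```

Even this small case needs escaping, quoting and matching end tags, which is why most applications build a data structure and let a serializer emit the markup.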