Simple API for XML (SAX)

Aug ’10 – Dec ’10

Introduction to SAX

• Simple API for XML, or SAX, was developed as a standardized way to parse an XML document

• It enables more efficient analysis of large XML documents

This chapter covers the following:

❑ What SAX is

❑ Where to download it and how to set it up

❑ How and when to use the primary SAX interfaces


Problem with DOM

Before traversal starts, DOM has to build up a massive in-memory map of the document

This takes up space and time

If DOM is used to extract a small amount of information from the document, this can be extremely inefficient

DOM is better suited for small XML documents


How SAX works

As the XML parser parses the document, it returns a stream of events back to the application

There are events for the start of the document, the end of the document, the start and end of each element, the contents of each element, etc.

Once parsing has started, the application cannot interrupt the parser to go back and look at an earlier part of the document

Unlike DOM, which gives access to the entire document at once, SAX stores little or nothing from event to event

This lets SAX use far less memory than DOM and generally makes it faster
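The event stream described above can be made visible with a small handler that records each callback in the order it arrives. This is a minimal sketch, not the deck's TrainReader: the class name EventTrace is hypothetical, and it assumes the parser bundled with the JDK (obtained through JAXP's SAXParserFactory) rather than a separately installed Xerces-J.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records the order in which SAX events arrive.
class EventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() { events.add("startDocument"); }

    @Override
    public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    // Parses the given XML text and returns the recorded event names.
    static List<String> trace(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        EventTrace handler = new EventTrace();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        // Prints [startDocument, start:train, start:car, end:car, end:train, endDocument]
        System.out.println(trace("<train><car/></train>"));
    }
}
```

The handler keeps only a list of short strings; nothing of the document itself is retained between events, which is the property the slide contrasts with DOM.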


Where to get SAX

SAX is specified as a set of Java interfaces

Downloads are available at http://saxproject.org

Xerces-J is a parser developed to work with SAX

Downloads are available at http://xml.apache.org/xerces-j

Needs Java Development Kit, release 1.1 or later


Receiving SAX Events

Write a Java class that implements one of the SAX interfaces

public class Myclass implements ContentHandler

ContentHandler is the name of the interface. It is the most important interface in SAX

The ContentHandler interface defines the callback methods for content-related events

It is better to extend the DefaultHandler class, which provides default implementations of the methods in the ContentHandler interface

public class Myclass extends DefaultHandler
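The benefit of extending DefaultHandler is that only the callbacks of interest need to be written. The sketch below overrides a single method and inherits the empty defaults for everything else. The class name CarCounter and its count helper are hypothetical, and the parser is assumed to be the JDK's built-in one obtained through SAXParserFactory.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: overrides only startElement, inherits the rest.
class CarCounter extends DefaultHandler {
    int cars = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("car".equals(localName)) {
            cars++;
        }
    }

    // Counts <car> elements in the given XML text.
    static int count(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so localName is populated
        XMLReader reader = factory.newSAXParser().getXMLReader();
        CarCounter counter = new CarCounter();
        reader.setContentHandler(counter);
        reader.parse(new InputSource(new StringReader(xml)));
        return counter.cars;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("<train><car/><car/><car/></train>")); // prints 3
    }
}
```

Implementing ContentHandler directly would require a body for every method in the interface, even the ones the application does not care about.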


ContentHandler Interface

Designed to control the reporting of events for the content of the document

Includes information about text, attributes, processing instructions, elements and the document itself

ContentHandler Methods

Event            Description

startDocument    Event to notify the application that the parser has read the start of the document

endDocument      Event to notify the application that the parser has read the end of the document

startElement     Event to notify the application that the parser has read an element start-tag


ContentHandler Interface

Event                    Description

endElement               Event to notify the application that the parser has read an element end-tag

skippedEntity            Event to notify the application that the parser has skipped an external entity

processingInstruction    Event to notify the application that the parser has read a processing instruction

startPrefixMapping       Event to notify the application that the parser has read an XML namespace declaration, and that a new namespace prefix is in scope


Example: TrainReader

createXMLReader function

XMLReader reader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");

Creates an XMLReader object using a factory helper object by sending a registered parser name to the factory function

setContentHandler function

reader.setContentHandler(this);

Tells the XMLReader which class should receive events about the content of the XML document
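For readers without Xerces-J on the classpath, the same two steps can be sketched with the parser that ships with the JDK. This is an assumption on my part, not the deck's setup: it obtains the XMLReader through JAXP's SAXParserFactory instead of passing a parser class name to XMLReaderFactory. The class name TrainReaderSketch is hypothetical; the two messages mirror the ones that appear in the deck's sample output.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the deck's TrainReader class.
class TrainReaderSketch extends DefaultHandler {
    private final StringBuilder log = new StringBuilder();

    @Override
    public void startDocument() { log.append("Start of the train\n"); }

    @Override
    public void endDocument() { log.append("End of the train\n"); }

    // Parses XML text and returns the messages produced by the callbacks.
    String read(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        // Step 1: obtain an XMLReader (JAXP route, assumed here).
        XMLReader reader = factory.newSAXParser().getXMLReader();
        // Step 2: register this object as the receiver of content events.
        reader.setContentHandler(this);
        reader.parse(new InputSource(new StringReader(xml)));
        return log.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Running train reader");
        System.out.print(new TrainReaderSketch().read("<train/>"));
    }
}
```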


Handling Element Events

startElement function

public void startElement(String uri, String localName, String qName, Attributes atts)

The first three parameters help to identify the element the parser encountered

The fourth parameter, Attributes, is used to look up attributes and their values

endElement function

public void endElement(String uri, String localName, String qName)


Handling Element Events: startElement

The first three parameters help identify the element either by its namespace name and local name or by its prefixed name

This makes it possible to tell apart similarly named elements from different vocabularies

If the parser encounters the following element:

<myPrefix:myElement xmlns:myPrefix="http://example.com">

uri          http://example.com

localName    myElement

qName        myPrefix:myElement

If the element name has no prefix, the localName and qName should be the same
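The three name parameters can be inspected directly. The sketch below joins them into one string for the first element the parser reports. The class name NameParts is hypothetical, and the parser is assumed to be the JDK's built-in one with namespace processing switched on, since without it uri and localName may be empty.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: captures uri, localName and qName of the root element.
class NameParts extends DefaultHandler {
    private String seen;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (seen == null) {
            seen = uri + " | " + localName + " | " + qName;
        }
    }

    static String describe(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        NameParts handler = new NameParts();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.seen;
    }

    public static void main(String[] args) throws Exception {
        // Prints: http://example.com | myElement | myPrefix:myElement
        System.out.println(describe("<myPrefix:myElement xmlns:myPrefix=\"http://example.com\"/>"));
        // Prints:  | myElement | myElement   (empty namespace URI, no prefix)
        System.out.println(describe("<myElement/>"));
    }
}
```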


Handling Element Events: Attributes

The Attributes interface gives the ability to easily look up the attributes and their values at the start of each element

The default Attributes interface provides the following functions:

getLength       Determines the number of attributes available in the Attributes list

getIndex        Retrieves the index of a specific attribute in the list. Uses the attribute’s qualified name, or both its local name and namespace URI

getLocalName    Retrieves a specific attribute’s local name by sending the index in the list

getQName        Retrieves a specific attribute’s qualified name by sending the index in the list

getURI          Retrieves a specific attribute’s namespace URI by sending the index in the list


Handling Element Events

getType         Retrieves a specific attribute’s type by sending the index in the list, by using the attribute’s qualified name, or by using both the local name and the namespace URI. If there is no DTD, this function will always return CDATA

getValue        Retrieves a specific attribute’s value by sending the index in the list, by using the attribute’s qualified name, or by using both the local name and the namespace URI

Some parsers expose extended behavior through an interface called Attributes2, which makes it possible to check

whether an attribute was declared in a DTD

whether or not the attribute value appeared in the XML document

if it appeared because of a DTD attribute default declaration
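The index-based lookups listed above are typically used in a loop from 0 to getLength() - 1. The following sketch collects every attribute of every element into a sorted map. The class name AttributeLister is hypothetical, the parser is assumed to be the JDK's built-in one, and the map is sorted by name because SAX does not promise any particular attribute order.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: lists attributes as qName -> "value (type)".
class AttributeLister extends DefaultHandler {
    private final Map<String, String> found = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        for (int i = 0; i < atts.getLength(); i++) {
            found.put(atts.getQName(i), atts.getValue(i) + " (" + atts.getType(i) + ")");
        }
    }

    static Map<String, String> list(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        AttributeLister handler = new AttributeLister();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.found;
    }

    public static void main(String[] args) throws Exception {
        // No DTD is supplied, so every attribute type is reported as CDATA.
        System.out.println(list("<car type=\"Engine\" id=\"c1\"/>"));
    }
}
```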


Element and Attribute Events

public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException
{
    if (localName.equals("car"))
    {
        if (atts != null)
        {
            System.out.println("Car: " + atts.getValue("type"));
        }
    }
}

Output:

Running train reader
Start of the train
Car: Engine
Car: Baggage
Car: Dining
End of the train


Handling Character Content

public void characters(char[] ch, int start, int len) throws SAXException

Used to retrieve the character content between two tags

Characters are delivered as a buffer

start and len indicate the starting position and the length of the data to read when copying from the buffer

The parser may report the characters for an element in multiple chunks, so the application has to accumulate them


Handling Character Content

To retrieve the character content of the color tag in train.xml

private boolean isColor;
private String trainCarType = "";
private StringBuffer trainCarColor = new StringBuffer();

public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException
{
    if (localName.equals("car"))
    {
        if (atts != null)
            trainCarType = atts.getValue("type");
    }
    if (localName.equals("color"))
    {
        // A new color element starts: discard text left over from the previous one
        trainCarColor.setLength(0);
        isColor = true;
    }
    else
        isColor = false;
}


Handling Character Content

public void characters(char[] ch, int start, int len) throws SAXException
{
    // Append each chunk; the parser may deliver the text in several calls
    if (isColor)
        trainCarColor.append(ch, start, len);
}

public void endElement(String uri, String localName, String qName) throws SAXException
{
    if (isColor)
    {
        System.out.println("The color of the " + trainCarType + " car is " + trainCarColor.toString());
    }
}


Handling Character Content

Output:

Running train reader
Start of the train
The color of the Engine car is Black
The color of the Baggage car is Green
The color of the Dining car is Green and Yellow
End of the train


When to ignore IgnorableWhitespace

public void ignorableWhitespace(char[] ch, int start, int len) throws SAXException

Similar to the characters event

The parser may call this function multiple times within a single element

Whitespace such as spaces, tabs and line feeds, which is used to make the XML document more readable, is often not important to the application

<car type="Engine">
    <color>Black</color>
    <weight>512 tons</weight>
</car>


When to ignore IgnorableWhitespace

The only way for the SAX parser to know that whitespace is ignorable is when an element is declared in the DTD to not contain PCDATA

Validating parsers must report this event; a non-validating parser may also report it if it reads and uses the DTD

If the parser has no knowledge of the DTD, it assumes that all character data, including whitespace, is important
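The effect of the DTD can be observed by parsing the same indented content twice, once with a DTD that declares element-only content and once without any DTD. The sketch below counts the characters delivered through each callback. The class name WhitespaceProbe and the sample documents are hypothetical, and the parser is assumed to be the JDK's built-in one with validation switched on through SAXParserFactory.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical probe: separates ordinary text from ignorable whitespace.
class WhitespaceProbe extends DefaultHandler {
    int ignorableChars = 0;
    final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int len) {
        text.append(ch, start, len);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int len) {
        ignorableChars += len;
    }

    static WhitespaceProbe parse(String xml, boolean validating) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(validating);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        WhitespaceProbe probe = new WhitespaceProbe();
        reader.setContentHandler(probe);
        reader.setErrorHandler(probe); // inherited no-op error() keeps validation errors quiet
        reader.parse(new InputSource(new StringReader(xml)));
        return probe;
    }

    public static void main(String[] args) throws Exception {
        String body = "<car>\n  <color>Black</color>\n</car>";
        String dtd = "<!DOCTYPE car [<!ELEMENT car (color)><!ELEMENT color (#PCDATA)>]>\n";

        WhitespaceProbe withDtd = parse(dtd + body, true);
        System.out.println("with DTD: ignorable=" + withDtd.ignorableChars + " text=[" + withDtd.text + "]");

        WhitespaceProbe noDtd = parse(body, false);
        System.out.println("no DTD:   ignorable=" + noDtd.ignorableChars + " text=[" + noDtd.text + "]");
    }
}
```

With the DTD, the indentation inside car arrives through ignorableWhitespace and the text buffer holds only the color; without it, the same whitespace arrives through characters and ends up mixed into the text.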


Skipped Entities

Alerts the application that the SAX parser has encountered an entity that it can or must skip

An entity can be skipped for several reasons:

The entity is a reference to an external resource that cannot be parsed or cannot be found

The entity is an external general entity and the http://xml.org/sax/features/external-general-entities feature is set to false

The entity is an external parameter entity and the http://xml.org/sax/features/external-parameter-entities feature is set to false

public void skippedEntity(String name) throws SAXException


Skipped Entities

The skippedEntity event is declared as follows:

public void skippedEntity(String name) throws SAXException

The name parameter is the name of the entity that was skipped

The name parameter will begin with "%" in the case of a parameter entity

SAX considers the external DTD subset an entity

If the name parameter is "[dtd]", it means the external DTD subset was not processed


Processing Instructions

Used to pass specific instructions to applications

SAX lets the application receive these special instructions through the processingInstruction event

public void processingInstruction(String target, String data) throws SAXException

If the processing instruction in the XML document is as follows:

<?TrainApplication instructionForTrainPrograms?>

target - TrainApplication

data - instructionForTrainPrograms

The XML declaration is not a processing instruction
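A short sketch of receiving this event: the handler below stores the target and data of each processing instruction it is given. The class name PiCapture is hypothetical and the parser is assumed to be the JDK's built-in one. The sample document starts with an XML declaration to show that it is not delivered through this callback.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records each processing instruction as "target -> data".
class PiCapture extends DefaultHandler {
    private final List<String> instructions = new ArrayList<>();

    @Override
    public void processingInstruction(String target, String data) {
        instructions.add(target + " -> " + data);
    }

    static List<String> capture(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        PiCapture handler = new PiCapture();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.instructions;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>"
                + "<?TrainApplication instructionForTrainPrograms?>"
                + "<train/>";
        // Prints [TrainApplication -> instructionForTrainPrograms]
        System.out.println(capture(xml));
    }
}
```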


Namespace Prefixes

SAX processors fire a startPrefixMapping and an endPrefixMapping event for any namespace declaration

public void startPrefixMapping(String prefix, String uri) throws SAXException

public void endPrefixMapping(String prefix) throws SAXException

The prefix parameter is the namespace prefix that is being declared

In the case of a default namespace declaration, the prefix should be an empty string

The uri parameter is the namespace URI that is being declared

xmlns:example="http://example.com"

prefix - example

uri - http://example.com
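The two events can be observed with a handler that stores each declared prefix and counts how many mappings go out of scope. This is a sketch under assumptions: the class name PrefixWatcher and the default-namespace URI are hypothetical, and the parser is the JDK's built-in one with namespace processing enabled.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: tracks namespace declarations seen during parsing.
class PrefixWatcher extends DefaultHandler {
    final Map<String, String> declared = new LinkedHashMap<>();
    int ended = 0;

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        declared.put(prefix, uri); // prefix is "" for a default namespace
    }

    @Override
    public void endPrefixMapping(String prefix) {
        ended++;
    }

    static PrefixWatcher watch(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // prefix mapping events require namespace processing
        XMLReader reader = factory.newSAXParser().getXMLReader();
        PrefixWatcher handler = new PrefixWatcher();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        PrefixWatcher w = watch("<train xmlns=\"http://example.com/default\""
                + " xmlns:example=\"http://example.com\"/>");
        System.out.println(w.declared + ", mappings ended: " + w.ended);
    }
}
```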


Stopping the process in exceptional circumstances

To stop processing, create and throw a new SAXException

For example, to check the Engine color and throw an exception if it is not Black:

public void endElement(String uri, String localName, String qName) throws SAXException
{
    if (isColor)
    {
        System.out.println("The color of the " + trainCarType + " car is " + trainCarColor.toString());
        if ((trainCarType.equals("Engine")) && (!trainCarColor.toString().equals("Black")))
        {
            throw new SAXException("The engine is not black ! ");
        }
    }
    isColor = false;
}

If the Engine color is not Black, the parsing process will be stopped.


Stopping the process in exceptional circumstances

Output:

Running train reader..
Start of the train
The color of the Engine car is Red
Exception in thread "main" org.xml.sax.SAXException: The engine is not black !
    at TrainReader.endElement (TrainReader.java:80)
    at org.apache.xerces…
    ........
    at TrainReader.read
    at TrainReader.main

When the exception is raised, it stops the whole application

This is because the exception is not handled anywhere


Stopping the process in exceptional circumstances

Add a try..catch block to handle the exception

public void read(String fileName) throws Exception
{
    XMLReader reader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
    reader.setContentHandler(this);
    try
    {
        reader.parse(fileName);
    }
    catch (SAXException e)
    {
        // The exception thrown from endElement surfaces here instead of ending the program
        System.out.println("Parsing stopped ! " + e.getMessage());
    }
}

Output:

Running train reader..
Start of the train
The color of the Engine car is Red
Parsing stopped ! The engine is not black !


Providing the location of the Error

SAX can provide line number and column position information for the error using the setDocumentLocator event

The setDocumentLocator event allows the parser to pass the application a Locator interface

The methods of the Locator object include:

getLineNumber      Retrieves the line number for the current event

getColumnNumber    Retrieves the column number for the current event

getSystemId        Retrieves the system identifier of the document for the current event

getPublicId        Retrieves the public identifier of the document for the current event


Providing the location of the Error

private Locator trainLocator = null;

public void setDocumentLocator(Locator loc)
{
    // Keep the Locator so later callbacks can ask where the parser currently is
    trainLocator = loc;
}

public void endElement(String uri, String localName, String qName) throws SAXException
{
    if (isColor)
    {
        System.out.println("The color of the " + trainCarType + " car is " + trainCarColor.toString());
        if ((trainCarType.equals("Engine")) && (!trainCarColor.toString().equals("Black")))
        {
            if (trainLocator != null)
                throw new SAXException("The engine is not black ! at line " + trainLocator.getLineNumber() + ", column " + trainLocator.getColumnNumber());
        }
    }
    isColor = false;
}


Providing the location of the Error

Output:

Running train reader..
Start of the train
The color of the Engine car is Red
Parsing stopped ! The engine is not black ! at line 4, column 20

The Locator object makes it easy to notify the user where the error occurred in the XML document

The information provided by the Locator object is not always exact; it is an approximation intended for error reporting
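As a variation on the string concatenation used in the deck, the position can also travel inside the exception itself: org.xml.sax.SAXParseException has a constructor that takes a message and a Locator, and the catching code can then call getLineNumber() on the exception. The sketch below is an alternative under assumptions, not the deck's code. The class name LocatedTrainCheck and the sample document are hypothetical, and the parser is the JDK's built-in one.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: rejects a non-black engine and reports where it was found.
class LocatedTrainCheck extends DefaultHandler {
    private Locator locator;
    private String carType = "";
    private final StringBuilder text = new StringBuilder();

    @Override
    public void setDocumentLocator(Locator loc) { locator = loc; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("car".equals(localName)) {
            carType = String.valueOf(atts.getValue("type"));
        }
        text.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int len) { text.append(ch, start, len); }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if ("color".equals(localName) && "Engine".equals(carType)
                && !"Black".contentEquals(text)) {
            // The Locator's current position is copied into the exception.
            throw new SAXParseException("The engine is not black !", locator);
        }
    }

    // Returns the line number carried by the exception, or -1 if parsing succeeds.
    static int failingLine(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(new LocatedTrainCheck());
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return -1;
        } catch (SAXParseException e) {
            return e.getLineNumber();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<train>\n<car type=\"Engine\">\n<color>Red</color>\n</car>\n</train>";
        System.out.println("Parsing stopped at line " + failingLine(xml));
    }
}
```

Carrying the position in the exception keeps the message text clean and lets the caller decide how to format the location.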


ErrorHandler Interface

Used to receive error events

Add a call to setErrorHandler and set the validation feature

warning       Allows the parser to notify the application of a warning it has encountered in the parsing process

error         Allows the parser to notify the application that it has encountered an error. Even though the parser has encountered an error, parsing can continue

fatalError    Allows the parser to notify the application that it has encountered a fatal error and cannot continue parsing. Well-formedness errors are reported through this event


ErrorHandler Interface

Add an internal DTD to validate the document

<?xml version="1.0"?>
<!DOCTYPE train [
    <!ELEMENT train (car*)>
    <!ELEMENT car (color,weight,length,occupants)>
    <!ATTLIST car type CDATA #IMPLIED>
    <!ELEMENT color (#PCDATA)>
    <!ELEMENT weight (#PCDATA)>
    <!ELEMENT length (#PCDATA)>
    <!ELEMENT occupants (#PCDATA)>
]>


ErrorHandler Interface

Modifying the read function

public void read(String fileName) throws Exception
{
    XMLReader reader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
    reader.setContentHandler(this);
    reader.setErrorHandler(this);
    try
    {
        reader.setFeature("http://xml.org/sax/features/validation", true);
    }
    catch (SAXException e)
    {
        System.err.println("Cannot activate validation");
    }
    try
    {
        reader.parse(fileName);
    }
    catch (SAXException e)
    {
        System.out.println("Parsing stopped ! " + e.getMessage());
    }
}


ErrorHandler Interface

Add Error Handling Functions

public void warning(SAXParseException exception) throws SAXException
{
    System.err.println("[Warning] " + exception.getMessage() + " at line " + exception.getLineNumber() + ", column " + exception.getColumnNumber());
}

public void error(SAXParseException exception) throws SAXException
{
    System.err.println("[Error] " + exception.getMessage() + " at line " + exception.getLineNumber() + ", column " + exception.getColumnNumber());
}

public void fatalError(SAXParseException exception) throws SAXException
{
    System.err.println("[Fatal Error] " + exception.getMessage() + " at line " + exception.getLineNumber() + ", column " + exception.getColumnNumber());
    throw exception;
}

Introduce errors into the XML document and check:

Change an element to a new element name (violates validity against the DTD)
Remove the closing > bracket of a tag (violates well-formedness)
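The two experiments can be sketched in a self-contained form. This is an illustration under assumptions, not the slides' train reader: it uses the JDK's bundled parser via SAXParserFactory, a shortened DTD in which car is declared EMPTY, and the hypothetical names ErrorHandlerSketch and report. Instead of printing, the handler collects the messages so that the difference between error (parsing continues) and fatalError (parsing stops) is visible in the returned list.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ErrorHandlerSketch extends DefaultHandler {
    // Shortened, hypothetical DTD: a train holds any number of empty car elements.
    static final String DTD =
        "<!DOCTYPE train [<!ELEMENT train (car*)><!ELEMENT car EMPTY>]>";

    final List<String> log = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        log.add("[Warning] " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) {
        log.add("[Error] " + e.getMessage());      // validity problem: parsing goes on
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        log.add("[Fatal Error] " + e.getMessage());
        throw e;                                    // well-formedness problem: stop
    }

    /** Validates DTD + body and returns every message the parser reported. */
    static List<String> report(String body) {
        ErrorHandlerSketch handler = new ErrorHandlerSketch();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature("http://xml.org/sax/features/validation", true);
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(DTD + body)));
        } catch (SAXParseException e) {
            // Already recorded by fatalError before it was rethrown.
        } catch (Exception e) {
            handler.log.add("[Stopped] " + e);
        }
        return handler.log;
    }

    public static void main(String[] args) {
        System.out.println(report("<train><wagon/></train>"));  // renamed element
        System.out.println(report("<train><car></train>"));     // missing end tag
    }
}
```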


DTDHandler Interface

To receive events about declarations

It supports generating events only for notations and unparsed entities

notationDecl          Allows the parser to notify the application that it has read a notation declaration

unparsedEntityDecl    Allows the parser to notify the application that it has read an unparsed entity declaration.

Events for declarations of elements, attributes and internal entities are made available in one of the extension interfaces, DeclHandler

To use the DTDHandler interface:
reader.setDTDHandler(this);
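A small sketch of a DTDHandler in use, assuming the JDK's bundled parser; the class name DtdHandlerSketch, the SAMPLE document and its gif notation and logo entity are hypothetical and exist only for illustration. DefaultHandler already implements DTDHandler with empty methods, so only the two callbacks are overridden.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class DtdHandlerSketch extends DefaultHandler {
    // Hypothetical document declaring one notation and one unparsed entity.
    static final String SAMPLE =
        "<!DOCTYPE train [\n"
        + " <!ELEMENT train EMPTY>\n"
        + " <!NOTATION gif SYSTEM \"image/gif\">\n"
        + " <!ENTITY logo SYSTEM \"logo.gif\" NDATA gif>\n"
        + "]>\n"
        + "<train/>";

    final List<String> seen = new ArrayList<>();

    @Override
    public void notationDecl(String name, String publicId, String systemId) {
        seen.add("notation " + name);
    }

    @Override
    public void unparsedEntityDecl(String name, String publicId,
                                   String systemId, String notationName) {
        // The entity is only declared; the parser does not read logo.gif.
        seen.add("unparsed entity " + name + " (notation " + notationName + ")");
    }

    /** Returns the notation and unparsed-entity declarations found in the DTD. */
    static List<String> declarations(String xml) {
        DtdHandlerSketch handler = new DtdHandlerSketch();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setDTDHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.seen;
    }

    public static void main(String[] args) {
        System.out.println(declarations(SAMPLE));
    }
}
```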


EntityResolver Interface

Lets the application control how a SAX parser behaves when it attempts to resolve external entity references within the DTD.

The EntityResolver interface defines one function:

resolveEntity    Allows the application to handle the resolution of entity lookups for the parser

To use the EntityResolver interface:
reader.setEntityResolver(this);

Allows the application to control how the processor opens and connects to external resources.
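As an illustration of that control, the sketch below answers every external entity lookup with canned text so the parser never opens the referenced URL. It assumes the JDK's bundled parser with its default settings; EntityResolverSketch, SAMPLE, the owner entity and the replacement text "Acme Rail" are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class EntityResolverSketch extends DefaultHandler {
    // Hypothetical document: &owner; points at a host that cannot be reached.
    static final String SAMPLE =
        "<!DOCTYPE train [<!ENTITY owner SYSTEM \"http://example.invalid/owner.txt\">]>"
        + "<train>&owner;</train>";

    private final StringBuilder text = new StringBuilder();

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Illustrative policy: serve local text instead of fetching systemId.
        return new InputSource(new StringReader("Acme Rail"));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    /** Parses the XML and returns its character content after entity expansion. */
    static String expand(String xml) {
        EntityResolverSketch handler = new EntityResolverSketch();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver(handler);
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.text.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand(SAMPLE));
    }
}
```

Returning null from resolveEntity instead would ask the parser to open the system identifier itself, which is the default behavior.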


Features and Properties

Some behavior of SAX parsers is controlled by setting features and properties

Working with Features

To change the value of a feature in SAX, call the setFeature function of the XMLReader

public void setFeature(String name, boolean value) throws SAXNotRecognizedException, SAXNotSupportedException

Parsers may not support or recognize every feature

The getFeature function lets you check the value of any feature

public boolean getFeature(String name) throws SAXNotRecognizedException, SAXNotSupportedException
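Because a parser may reject a feature, calls to setFeature and getFeature are usually wrapped so that both exceptions are handled. The sketch below assumes the JDK's bundled parser; FeatureSketch, enable and newReader are hypothetical names, and the second feature URI in main is deliberately made up so that the parser does not recognize it.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

class FeatureSketch {
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    /** Tries to switch a feature on and reports what the parser made of it. */
    static String enable(XMLReader reader, String feature) {
        try {
            reader.setFeature(feature, true);
            return "enabled=" + reader.getFeature(feature);
        } catch (SAXNotRecognizedException e) {
            return "not recognized";            // the parser has never heard of it
        } catch (SAXNotSupportedException e) {
            return "recognized but not settable to this value";
        }
    }

    /** Creates an XMLReader from the JDK's default SAX implementation. */
    static XMLReader newReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("no SAX parser available", e);
        }
    }

    public static void main(String[] args) {
        XMLReader reader = newReader();
        System.out.println(enable(reader, VALIDATION));
        System.out.println(enable(reader, "http://example.invalid/no-such-feature"));
    }
}
```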


Features and Properties

Working with Features

http://xml.org/sax/features/validation
Controls whether or not the parser will validate the document as it parses

http://xml.org/sax/features/external-general-entities
Controls whether or not external general entities should be processed

http://xml.org/sax/features/external-parameter-entities
Controls whether or not external parameter entities should be processed

http://xml.org/sax/features/xml-1.1
Read-only feature that returns true if the parser supports both XML 1.1 and XML 1.0


Features and Properties

Working with Properties

Used to connect helper objects to an XMLReader

SAX comes with a set of extension interfaces, DeclHandler and LexicalHandler, that let the application receive additional events about the XML document

The only way to register these handlers with the XMLReader is through the setProperty function

public void setProperty(String name, Object value) throws SAXNotRecognizedException, SAXNotSupportedException

public Object getProperty(String name) throws SAXNotRecognizedException, SAXNotSupportedException


Features and Properties

Working with Properties

http://xml.org/sax/properties/declaration-handler
Specifies the DeclHandler object registered to receive events for declarations within the DTD

http://xml.org/sax/properties/lexical-handler
Specifies the LexicalHandler object registered to receive lexical events such as comments, CDATA sections and entity references

http://xml.org/sax/properties/document-xml-version
Read-only property that describes the actual version of the XML document, such as "1.0" or "1.1"


Extension Interfaces

DeclHandler Interface: for declarations within the DTD

The DeclHandler interface declares the following events:

attributeDecl         Allows the parser to notify the application that it has read an attribute declaration

elementDecl           Allows the parser to notify the application that it has read an element declaration

externalEntityDecl    Allows the parser to notify the application that it has read an external entity declaration

internalEntityDecl    Allows the parser to notify the application that it has read an internal entity declaration
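A sketch of registering a DeclHandler through the declaration-handler property, assuming the JDK's bundled parser. It extends org.xml.sax.ext.DefaultHandler2, which supplies empty implementations of the extension interfaces; DeclHandlerSketch, SAMPLE and declarations are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class DeclHandlerSketch extends DefaultHandler2 {
    // Hypothetical document with two element declarations and one attribute.
    static final String SAMPLE =
        "<!DOCTYPE train [\n"
        + " <!ELEMENT train (car*)>\n"
        + " <!ELEMENT car EMPTY>\n"
        + " <!ATTLIST car type CDATA #IMPLIED>\n"
        + "]>\n"
        + "<train><car type=\"Engine\"/></train>";

    final List<String> seen = new ArrayList<>();

    @Override
    public void elementDecl(String name, String model) {
        seen.add("element " + name);
    }

    @Override
    public void attributeDecl(String eName, String aName, String type,
                              String mode, String value) {
        seen.add("attribute " + eName + "/" + aName + " (" + type + ")");
    }

    /** Returns the element and attribute declarations reported for the DTD. */
    static List<String> declarations(String xml) {
        DeclHandlerSketch handler = new DeclHandlerSketch();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // Extension handlers have no dedicated setter; they go in as properties.
            reader.setProperty("http://xml.org/sax/properties/declaration-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.seen;
    }

    public static void main(String[] args) {
        System.out.println(declarations(SAMPLE));
    }
}
```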


Extension Interfaces

LexicalHandler Interface: for lexical events

The LexicalHandler interface declares the following events:

comment       Allows the parser to notify the application that it has read a comment

startCDATA    Allows the parser to notify the application that it has encountered a CDATA section start marker

endCDATA      Allows the parser to notify the application that it has encountered a CDATA section end marker

Other events supported are startDTD, endDTD, startEntity, endEntity

To register:

reader.setProperty("http://xml.org/sax/properties/lexical-handler", this);
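The registration call can be tried out with a short sketch, assuming the JDK's bundled parser and org.xml.sax.ext.DefaultHandler2 as a convenient base class. LexicalHandlerSketch, SAMPLE and events are hypothetical names; the handler simply records the lexical events in the order they arrive.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class LexicalHandlerSketch extends DefaultHandler2 {
    // Hypothetical document containing one comment and one CDATA section.
    static final String SAMPLE =
        "<train><!-- heading east --><note><![CDATA[a < b]]></note></train>";

    final List<String> seen = new ArrayList<>();

    @Override
    public void comment(char[] ch, int start, int length) {
        seen.add("comment: " + new String(ch, start, length).trim());
    }

    @Override
    public void startCDATA() {
        seen.add("startCDATA");   // the text itself still arrives via characters()
    }

    @Override
    public void endCDATA() {
        seen.add("endCDATA");
    }

    /** Returns the lexical events reported while parsing the XML text. */
    static List<String> events(String xml) {
        LexicalHandlerSketch handler = new LexicalHandlerSketch();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.seen;
    }

    public static void main(String[] args) {
        System.out.println(events(SAMPLE));
    }
}
```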


Good SAX and Bad SAX

Advantages

Simple

Doesn’t load the whole document into memory

The parser has a smaller footprint than DOM

It is faster

Focuses on real content rather than the way it is laid out

Good for filtering data: lets you concentrate on the subset of interest


Good SAX and Bad SAX

Disadvantages

You receive the data in the order SAX delivers it, with no control over the order in which the parser traverses the document

SAX programming requires fairly intricate state keeping

If the focus is on analyzing an entire document, DOM is much better


Consumers, Producers and Filters

In addition to consuming events from an XMLReader, it is possible to write a class that produces SAX events

e.g. a class that reads a comma-delimited file and fires SAX events

Events can be filtered as they pass from the XMLReader to the event handler

A SAX filter acts as a middleman between the parser and the application

Filters can insert, remove or modify events before passing them on to the application
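A sketch of such a middleman, built on the standard helper class org.xml.sax.helpers.XMLFilterImpl and the JDK's bundled parser. The filtering rule (dropping every occupants element together with its content) and the names FilterSketch, SAMPLE and elementNames are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class FilterSketch extends XMLFilterImpl {
    static final String SAMPLE =
        "<train><car><color>Black</color><occupants>2</occupants></car></train>";

    private int skipDepth = 0;   // greater than 0 while inside a dropped element

    FilterSketch(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (skipDepth > 0 || qName.equals("occupants")) {
            skipDepth++;         // swallow the event instead of forwarding it
            return;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (skipDepth > 0) {
            skipDepth--;
            return;
        }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) {
            super.characters(ch, start, length);
        }
    }

    /** Returns the element names the application sees after filtering. */
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            FilterSketch filter = new FilterSketch(parser);
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String u, String l, String qName, Attributes a) {
                    names.add(qName);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));  // parse via the filter
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames(SAMPLE));
    }
}
```

The application talks only to the filter: it registers its handler on the filter and calls the filter's parse method, which drives the real parser underneath.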


Other Languages

C++, Perl, Python, Pascal, Visual Basic, .NET, Curl