Unit 10 Schema Data Processing. Key Concepts DOM basics DOM components Nodes Node types Node...


Page 1: Unit 10 Schema Data Processing

Page 2: Key Concepts

• DOM basics
• DOM components
• Nodes
• Node types
• Node hierarchy

Page 3: Document Object Model

• In-memory data storage.
• Hierarchical data storage model.
• Allows navigation between nodes.
• Allows data insertion into, and data retrieval from, the DOM.
• Language-neutral representation of data.
• Supported by .NET languages, and by the Java, Python, and Perl parsers listed on Page 4.
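The points above can be sketched in a few lines of Java using the standard JAXP and org.w3c.dom packages. This is only a minimal illustration: the class name DomBasicsSketch and the element names are assumptions chosen for the example, not part of the original slides.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical class name, used only for this illustration.
class DomBasicsSketch {

    // Builds a small tree in memory, then reads a value back out of it.
    static String insertAndRetrieve() {
        Document document;
        try {
            document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }

        // Insertion: attach a root element and one child holding text.
        Element root = document.createElement("root");
        document.appendChild(root);
        Element first = document.createElement("first");
        first.appendChild(document.createTextNode("Joe"));
        root.appendChild(first);

        // Navigation and retrieval: walk from the document down to the text.
        Node child = document.getDocumentElement().getFirstChild();
        return child.getNodeName() + "=" + child.getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(insertAndRetrieve()); // prints "first=Joe"
    }
}
```

Nothing is written to disk here; the whole tree lives in memory until it is explicitly serialized.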

Page 4: Some DOM-based parsers

Parser     Description
JAXP       Sun Microsystems’ Java API for XML Parsing (JAXP) is available at no charge from java.sun.com/xml.
XML4J      IBM’s XML Parser for Java (XML4J) is available at no charge from www.alphaworks.ibm.com/tech/xml4j.
Xerces     Apache’s Xerces Java Parser is available at no charge from xml.apache.org/xerces.
msxml      Microsoft’s XML parser (msxml) version 2.0 is built into Internet Explorer 5.5. Version 3.0 is also available at no charge from msdn.microsoft.com/xml.
4DOM       4DOM is a parser for the Python programming language and is available at no charge from fourthought.com/4Suite/4DOM.
XML::DOM   XML::DOM is a Perl module that we use in Chapter 17 to manipulate XML documents using Perl. For additional information, visit www-4.ibm.com/software/developer/library/xml-perl2.

Page 5: DOM Components

Class/Interface                   Description
Document interface                Represents the XML document’s top-level node, which provides access to all the document’s nodes, including the root element.
Node interface                    Represents an XML document node.
NodeList interface                Represents a read-only list of Node objects.
Element interface                 Represents an element node. Derives from Node.
Attr interface                    Represents an attribute node. Derives from Node.
CharacterData interface           Represents character data. Derives from Node.
Text interface                    Represents a text node. Derives from CharacterData.
Comment interface                 Represents a comment node. Derives from CharacterData.
ProcessingInstruction interface   Represents a processing instruction node. Derives from Node.
CDATASection interface            Represents a CDATA section. Derives from Text.
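The "Derives from" relationships in the table can be checked directly with instanceof. The sketch below assumes the JDK's default JAXP implementation; the class and method names are hypothetical and exist only for this example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.Text;

// Hypothetical class name, used only for this illustration.
class DomHierarchySketch {

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // A CDATA section is a Text node, and therefore also CharacterData.
    static boolean cdataIsText() {
        Node cdata = newDocument().createCDATASection("a < b");
        return cdata instanceof Text && cdata instanceof CharacterData;
    }

    // A comment is CharacterData but not Text.
    static boolean commentIsCharacterDataOnly() {
        Node comment = newDocument().createComment("note");
        return comment instanceof CharacterData && !(comment instanceof Text);
    }

    // An attribute is represented as a Node like every other component.
    static boolean attrIsNode() {
        Object attribute = newDocument().createAttribute("gender");
        return attribute instanceof Node && attribute instanceof Attr;
    }

    public static void main(String[] args) {
        System.out.println(cdataIsText());
        System.out.println(commentIsCharacterDataOnly());
        System.out.println(attrIsNode());
    }
}
```

Because everything derives from Node, code that only needs generic tree operations can work with Node references and downcast only when it needs type-specific methods.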

Page 6: XML Nodes

<root>
  <name>
    <first>Joe</first>
    <middle/>
    <last>Smith</last>
  </name>
</root>
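To see how this document maps onto nodes, the following sketch parses it from a string and lists the children of the name element. It assumes a well-formed document with a closing root tag, and the XML is written without indentation so that no whitespace-only text nodes appear between elements. The class name XmlNodesSketch is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this illustration.
class XmlNodesSketch {

    // Same content as the slide, on one line to avoid whitespace text nodes.
    static final String XML =
        "<root><name><first>Joe</first><middle/><last>Smith</last></name></root>";

    static String childSummary() {
        Document document;
        try {
            document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }

        // root -> name: the first (and only) child of the root element.
        Element name = (Element) document.getDocumentElement().getFirstChild();

        // List each child of name and whether it holds any child nodes.
        StringBuilder summary = new StringBuilder();
        NodeList children = name.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            summary.append(child.getNodeName())
                   .append(child.hasChildNodes() ? "(text) " : "(empty) ");
        }
        return summary.toString().trim();
    }

    public static void main(String[] args) {
        // prints "first(text) middle(empty) last(text)"
        System.out.println(childSummary());
    }
}
```

The text "Joe" is itself a node: it is a Text child of the first element, not a property of the element.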

Page 7: Node Types

Node type          Description
Document           The document root, which is a container for all document nodes.
DocumentFragment   Temporary container holding a subset of document nodes.
DocumentType       A <!DOCTYPE…> node.
EntityReference    Entity reference text.
Element            An element node.
Attr               An attribute of an element, represented as a node.

Page 8: Node Types (cont'd)

Node type               Description
ProcessingInstruction   Node containing processing instruction information.
Comment                 Comment node.
Text                    Text value of an element or attribute.
CDATASection            Node representing a CDATA section.
Entity                  DTD <!ENTITY…> declaration.
Notation                DTD notation declaration.
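At run time, each node reports its type through Node.getNodeType(), which returns one of the constants defined on the Node interface. The sketch below maps a few of those constants back to the names used in the tables; the class name and the sample nodes are assumptions made for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical class name, used only for this illustration.
class NodeTypeSketch {

    // Translates a node's numeric type code into the table's type name.
    static String typeName(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:               return "Document";
            case Node.DOCUMENT_FRAGMENT_NODE:      return "DocumentFragment";
            case Node.ELEMENT_NODE:                return "Element";
            case Node.ATTRIBUTE_NODE:              return "Attr";
            case Node.TEXT_NODE:                   return "Text";
            case Node.CDATA_SECTION_NODE:          return "CDATASection";
            case Node.COMMENT_NODE:                return "Comment";
            case Node.PROCESSING_INSTRUCTION_NODE: return "ProcessingInstruction";
            default:                               return "other";
        }
    }

    static String sample() {
        Document document;
        try {
            document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return String.join(",",
                typeName(document),
                typeName(document.createElement("contact")),
                typeName(document.createAttribute("gender")),
                typeName(document.createComment("hi")),
                typeName(document.createCDATASection("<")));
    }

    public static void main(String[] args) {
        // prints "Document,Element,Attr,Comment,CDATASection"
        System.out.println(sample());
    }
}
```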

Page 9: Creating an XML Document

// Fig. 8.14 : BuildXml.java
// Creates element node, attribute node, comment node,
// processing instruction and a CDATA section.

import java.io.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import javax.xml.parsers.*;
import com.sun.xml.tree.XmlDocument;

public class BuildXml {
    private Document document;

    public BuildXml()
    {

        DocumentBuilderFactory factory =
            DocumentBuilderFactory.newInstance();

        try {

            // get DocumentBuilder
            DocumentBuilder builder =
                factory.newDocumentBuilder();

Page 10: Creating an XML Document (cont'd)

            // create root node
            document = builder.newDocument();
        }
        catch ( ParserConfigurationException pce ) {
            pce.printStackTrace();
        }

        Element root = document.createElement( "root" );
        document.appendChild( root );

        // add a comment to XML document
        Comment simpleComment = document.createComment(
            "This is a simple contact list" );
        root.appendChild( simpleComment );

        // add a child element
        Node contactNode = createContactNode( document );
        root.appendChild( contactNode );

        // add a processing instruction
        ProcessingInstruction pi =
            document.createProcessingInstruction(
                "myInstruction", "action silent" );
        root.appendChild( pi );

Page 11: Creating an XML Document (cont'd)

        // add a CDATA section
        CDATASection cdata = document.createCDATASection(
            "I can add <, >, and ?" );
        root.appendChild( cdata );

        try {

            // write the XML document to a file
            ( (XmlDocument) document).write( new FileOutputStream(
                "myDocument.xml" ) );
        }
        catch ( IOException ioe ) {
            ioe.printStackTrace();
        }
    }

    public Node createContactNode( Document document )
    {

        // create FirstName and LastName elements
        Element firstName = document.createElement( "FirstName" );
        Element lastName = document.createElement( "LastName" );

        firstName.appendChild( document.createTextNode( "Sue" ) );
        lastName.appendChild( document.createTextNode( "Green" ) );

Page 12: Creating an XML Document (cont'd)

        // create contact element
        Element contact = document.createElement( "contact" );

        // create an attribute
        Attr genderAttribute = document.createAttribute( "gender" );
        genderAttribute.setValue( "F" );

        // append attribute to contact element
        contact.setAttributeNode( genderAttribute );
        contact.appendChild( firstName );
        contact.appendChild( lastName );
        return contact;
    }

    public static void main( String args[] )
    {
        BuildXml buildXml = new BuildXml();
    }
}
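BuildXml writes its output through com.sun.xml.tree.XmlDocument, a Sun-specific class that is not part of the standard Java API and may not be available on a current JDK. As a possible substitute (an assumption on our part, not the method used in the listing), the standard javax.xml.transform API can serialize any DOM Document. The class name WriteXmlSketch and the reduced sample tree are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name, used only for this illustration.
class WriteXmlSketch {

    // Serializes a DOM tree to a string with an identity transform.
    static String serialize(Document document) {
        try {
            Transformer transformer =
                    TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(
                    OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(
                    new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds a cut-down version of the contact document from BuildXml.
    static String sample() {
        Document document;
        try {
            document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        Element root = document.createElement("root");
        document.appendChild(root);
        Element contact = document.createElement("contact");
        contact.setAttribute("gender", "F");
        Element firstName = document.createElement("FirstName");
        firstName.appendChild(document.createTextNode("Sue"));
        contact.appendChild(firstName);
        root.appendChild(contact);
        return serialize(document);
    }

    public static void main(String[] args) {
        System.out.println(sample());
    }
}
```

To write to a file as BuildXml does, the StreamResult could wrap a java.io.File for "myDocument.xml" instead of a StringWriter.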

Page 13: XML Document Result

// Fig. 8.15 : TraverseDOM.java
// Traverses DOM and prints various nodes.

import java.io.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import javax.xml.parsers.*;
import com.sun.xml.tree.XmlDocument;

public class TraverseDOM {
    private Document document;

    public TraverseDOM( String file )
    {
        try {

            // obtain the default parser
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
            factory.setValidating( true );
            DocumentBuilder builder = factory.newDocumentBuilder();

            // set error handler for validation errors
            builder.setErrorHandler( new MyErrorHandler() );

Page 14: XML Document Result (cont’d)

            // obtain document object from XML document
            document = builder.parse( new File( file ) );
            processNode( document );
        }
        catch ( SAXParseException spe ) {
            System.err.println(
                "Parse error: " + spe.getMessage() );
            System.exit( 1 );
        }
        catch ( SAXException se ) {
            se.printStackTrace();
        }
        catch ( FileNotFoundException fne ) {
            System.err.println( "File \'"
                + file + "\' not found. " );
            System.exit( 1 );
        }
        catch ( Exception e ) {
            e.printStackTrace();
        }
    }

    public void processNode( Node currentNode )
    {
        switch ( currentNode.getNodeType() ) {

Page 15: XML Document Result (cont’d)

            // process a Document node
            case Node.DOCUMENT_NODE:
                Document doc = ( Document ) currentNode;

                System.out.println(
                    "Document node: " + doc.getNodeName() +
                    "\nRoot element: " +
                    doc.getDocumentElement().getNodeName() );
                processChildNodes( doc.getChildNodes() );
                break;

            // process an Element node
            case Node.ELEMENT_NODE:
                System.out.println( "\nElement node: " +
                    currentNode.getNodeName() );
                NamedNodeMap attributeNodes =
                    currentNode.getAttributes();

                for ( int i = 0; i < attributeNodes.getLength(); i++ ) {
                    Attr attribute = ( Attr ) attributeNodes.item( i );

                    System.out.println( "\tAttribute: " +
                        attribute.getNodeName() + " ; Value = " +
                        attribute.getNodeValue() );
                }

                processChildNodes( currentNode.getChildNodes() );
                break;

Page 16: XML Document Result (cont’d)

            // process a text node and a CDATA section
            case Node.CDATA_SECTION_NODE:
            case Node.TEXT_NODE:
                Text text = ( Text ) currentNode;

                if ( !text.getNodeValue().trim().equals( "" ) )
                    System.out.println( "\tText: " +
                        text.getNodeValue() );
                break;
        }
    }

    public void processChildNodes( NodeList children )
    {
        if ( children.getLength() != 0 )

            for ( int i = 0; i < children.getLength(); i++ )
                processNode( children.item( i ) );
    }

    public static void main( String args[] )
    {
        if ( args.length < 1 ) {
            System.err.println(
                "Usage: java TraverseDOM <filename>" );
            System.exit( 1 );
        }

        TraverseDOM traverseDOM = new TraverseDOM( args[ 0 ] );
    }
}
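TraverseDOM installs a MyErrorHandler, but the slides do not show that class. The following is a hypothetical reconstruction, assuming it simply implements the standard org.xml.sax.ErrorHandler interface: warnings are reported, and errors are rethrown so that the catch ( SAXParseException spe ) block in the constructor can handle them. The describe helper is an addition for this example.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical reconstruction: the original class is not shown in the slides.
class MyErrorHandler implements ErrorHandler {

    // Warnings are reported but do not stop parsing.
    @Override
    public void warning(SAXParseException exception) {
        System.err.println("Warning: " + describe(exception));
    }

    // Validation errors are rethrown so the caller's catch block sees them.
    @Override
    public void error(SAXParseException exception) throws SAXException {
        throw exception;
    }

    // Fatal (well-formedness) errors are rethrown as well.
    @Override
    public void fatalError(SAXParseException exception) throws SAXException {
        throw exception;
    }

    // Helper (an addition for this sketch): formats location plus message.
    static String describe(SAXParseException exception) {
        return "line " + exception.getLineNumber()
                + ": " + exception.getMessage();
    }
}
```

Without an error handler that rethrows, a validating parser would typically report validation errors and keep going, so the traversal would run on a document that failed validation.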