
© Prof. Dr. Stefan Böttcher - http://wwwcs.upb.de/cs/boettcher – invited talk at UTS - 29/03/06 - XML Data Management / 1

XML Data Management

Prof. Dr. Stefan Böttcher
University of Paderborn (Germany)

1. XML standards: XML , DTD , XML Schema

2. DOM , SAX , XPath

3. XSLT


Data centric XML - XML data storage

<doc>
  <order>
    <customer> Alice </customer>
    <PC> pc400 </PC>
  </order>
  <order>
    <customer> Bob </customer>
    <PC> pc500 </PC>
  </order>
  <order>
    <customer> Carla </customer>
    <PC> pc600 </PC>
  </order>
</doc>

% customer PC
order( Alice pc400 ).
order( Bob pc500 ).
order( Carla pc600 ).

[Figure labels: doc; markup (tag); content]


eXtensible Markup Language (XML)

XML - a family of standards:

XML (eXtensible Markup Language)
  data format exchangeable across different operating systems, applications, and enterprises
  often used for content

XPath
  path expressions used for navigation in XML trees
  used within other XML standards (e.g. XSL(T))

XSL (eXtensible Stylesheet Language)
  used to describe layout of content / to convert data

many more standards: XQuery (queries), DTD (type definition), XML Schema (integrity constraints)


Unique Standard for Content

DTD or XML Schema:
  defines the structure of all XML trees exchanged
  => unique data format for all participants
  data formats exchangeable across company borders

New data exchange formats and languages based on XML, for example:
  ebXML (E-Business XML) as a basis for OTA (OpenTravel Alliance)
  data exchange between travel agency, airline, etc.

Consequence of these standards: (economic) pressure to use the standard


Separation of content and layout

content (product1.xml)

content (product2.xml)

layout (customer1.xsl)

layout ( technican2.xsl)

HTML file

combines requested data with requested layout


Separation of content and layout (2)

consequences:

• 1 (content) data source for different layouts (technician, seller, customer, re-seller, ...)

• layout may change without changing content (different logo, different seller or customer, different employee or job, new view of data)

• reuse 1 layout for different content (frame with company logo, ...)

• content may change without changing layout (new prices, …)


XML on Java servers

• XML + XSL separate layout and content
  • layout (.xsl file)
  • content data (.xml file)
  • combine them in the web server

[Figure: the browser (client) calls a servlet (server); the servlet takes an XML file and an XSL file as input, transforms XML+XSL into HTML, and returns the generated HTML page to the browser.]
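The server-side step (combine content and layout into HTML) can be sketched with the JAXP transformation API that ships with the JDK. This is a minimal sketch, not the servlet used in the talk: the class name XsltOnServerSketch, the inline XML and the inline stylesheet are assumptions made for illustration, and a real servlet would read both from files and write the result to the HTTP response.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltOnServerSketch {

    // Content (.xml) and layout (.xsl) are inline strings here (assumption);
    // on a server they would be separate files.
    static final String XML =
        "<order><customer>Alice</customer><PC>pc400</PC></order>";

    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/'>"
      + "<html><body><p>customer: <xsl:value-of select='order/customer'/></p></body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML document and returns the generated page.
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter html = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(html));
            return html.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```

Swapping in a different stylesheet string changes the layout without touching the content, which is the separation described above.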


XML syntax

XML - Prolog:

<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
    version, character set, standalone="yes": without DTD !

<?xml-stylesheet type="text/xsl" href="xmlbsp1.xsl"?>
    used stylesheet (only inside IE5)

XML - main part:

<order>
  <customer> Alice </customer>
  <PC> pc400 </PC>
</order>

<customer> is a start tag, Alice is a text node, </customer> is an end tag; start tag, content, and end tag together form an element.


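XML syntax (2)

In the XML main part:

<offers>
  <offer supplier="vobis" item="pc500"></offer>
  <offer supplier="IBM" item="pc600"/>
</offers>

supplier and item are attributes; "vobis" and "pc500" are attribute values.
"/>" marks the end of a tag without text. Both notations produce an element with no text node, so the choice between them is arbitrary.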


XML syntax (3)

all tags must be closed ( <tag> ... </tag> or <singleTag /> )

incorrectly nested tags are not allowed ( <tag1> <tag2> ... </tag1> </tag2> )

case-sensitive ( <tag> is different from <Tag> )

attribute values must be quoted ( e.g. <p align="center"> )

text must be enclosed in elements
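These rules are what an XML parser checks as well-formedness. The following is a minimal sketch using the JAXP DOM parser from the JDK; the class name WellFormedCheck and the method isWellFormed are hypothetical names chosen for this illustration. A violation of any rule above is reported by the parser as a fatal error, which the sketch turns into a false result.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {

    // Returns true if the parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;                       // not well-formed
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e); // environment problem, not an XML problem
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<tag>text</tag>",                  // closed tag
            "<singleTag />",                    // empty-element tag
            "<tag>text",                        // tag not closed
            "<tag1><tag2></tag1></tag2>",       // incorrectly nested
            "<tag></Tag>",                      // case mismatch
            "<p align=center>text</p>",         // unquoted attribute value
            "text<tag/>"                        // text outside any element
        };
        for (String s : samples) {
            System.out.println(isWellFormed(s) + "  " + s);
        }
    }
}
```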


XML document as a tree

<doc>
  <customer name="Alice">
    <order> ... </order>
    <address> </address>
  </customer>
  <customer>
    <order/>
    <address/>
  </customer>
</doc>

doc
  customer (name = "Alice")
    order
    address
  customer
    order
    address


XML node types

7 kinds of nodes:

root - has no parent node
element
text - leaf node (has no child node)
attribute - leaf node (has no child node)
comment - leaf node (has no child node)
namespace - leaf node (has no child node)
processing-instruction - leaf node (has no child node)


DTD and XML Schema

DTD (the older standard):
+ defines the structure (nesting of tags) of the documents
    <customer>
      <order>
        <item> …
+ defines structural dependencies, e.g. every order contains at least one item element

XML Schema (the newer standard) additionally:
+ binds XML elements to types defined in the XML Schema
+ defines domains
+ defines integrity constraints


Document Type Definition (DTD)

<!-- DTD xmlbsp2d.dtd for example xmlbsp2d.xml -->
<!ELEMENT orders ( order )* >
<!ELEMENT order ( customer , PC ) >
<!ELEMENT customer (#PCDATA) >
<!ELEMENT PC (#PCDATA) >

<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>
<!DOCTYPE orders SYSTEM "xmlbsp2d.dtd">
<?xml-stylesheet type="text/xsl" href="xmlbsp2.xsl"?>
<orders>
  <order>
    <customer> Alice </customer>
    <PC> pc400 </PC>
  </order>
  <order> ... </order>
</orders>

( order )* : arbitrarily many order elements
( customer , PC ) : sequence required
(#PCDATA) : parsed character data
orders : root element
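Checking a document against such a DTD can be sketched with the validating mode of the JAXP DOM parser. This is an illustrative sketch under stated assumptions: the DTD is embedded as an internal subset so that the example is self-contained (the slide references the external file xmlbsp2d.dtd instead), and the names DtdValidationSketch and validate are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdValidationSketch {

    // Same declarations as in the DTD above, embedded as an internal subset (assumption).
    static final String DOCTYPE =
        "<!DOCTYPE orders ["
      + "<!ELEMENT orders (order)*>"
      + "<!ELEMENT order (customer, PC)>"
      + "<!ELEMENT customer (#PCDATA)>"
      + "<!ELEMENT PC (#PCDATA)>"
      + "]>";

    // Returns the validity errors reported for the given document body.
    static List<String> validate(String body) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);            // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());     // validity errors are recoverable
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String valid =
            "<orders><order><customer>Alice</customer><PC>pc400</PC></order></orders>";
        String missingPc =
            "<orders><order><customer>Bob</customer></order></orders>";
        System.out.println("valid document:   " + validate(valid));
        System.out.println("PC element missing: " + validate(missingPc));
    }
}
```

An empty list means the document conforms to the DTD; the second sample violates the required sequence ( customer , PC ).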


Element declarations in DTDs

<!ELEMENT PC (#PCDATA) >          text (no elements)
<!ELEMENT offer EMPTY >           empty
<!ELEMENT supplies (offer) >      1 sub-element
<!ELEMENT offers (offer)* >       arbitrarily many sub-elements
<!ELEMENT order (customer,PC) >   sequence
<!ELEMENT payment (cash|card) >   choice
<!ELEMENT E ((A|B)*,C,(D)?)+ >    parentheses

?  0 or 1
*  arbitrarily many
+  at least 1


Attribute Declarations in DTDs

<!-- DTD xmlbsp2d.dtd for the example xmlbsp2d.xml -->
<!ELEMENT offers (offer)* >
<!ELEMENT offer EMPTY >
<!ATTLIST offer supplier CDATA #REQUIRED
                item     CDATA #REQUIRED >

<offers>
  <offer supplier="vobis" item="pc500"></offer>
  <offer supplier="IBM" item="pc600"/>
</offers>

CDATA : type (character data)
#REQUIRED : attribute must occur
(offer)* : arbitrarily many
offers : root element
EMPTY : empty


Part 1 (XML) - summary

• XML : tree structure for content

• DTD : structure definition

• XML-Schema additionally: type checking and logical consistency checking

well documented standards

http://www.w3c.org


Part 2: Search and Navigation in XML Documents

• DOM - Parser

• SAX - Parser

• the XML Path language XPath


Axes in XML document trees (1)

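[Figure: document tree with the nodes doc, customer, order, address, PC, user manual, and the attribute @nr, annotated with the axes self, child, parent, ancestor, ancestor-or-self, descendant, descendant-or-self, following, following-sibling, and attribute.]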


Axes in XML document trees (2)

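<doc>
  <customer> … </customer>
  <customer>
    <name> … </name>
    <order> ... </order>
    <address> … </address>
  </customer>
  <customer> … </customer>
</doc>

[Figure: the corresponding tree (doc with three customer children; the middle customer has the children name, order, and address), annotated with the axes ancestor::, descendant::, preceding::, following::, and self::.]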


Axes in XML document trees (3)

The following axes select for a given context node:

• child:: its child nodes
• descendant:: its descendants (= children and their descendants)
• parent:: the parent node (only the root does not have a parent)
• ancestor:: nodes on the path to the root (= parent and its ancestors)
• following-sibling:: siblings (nodes with the same parent) that follow in document order (empty for attribute and namespace nodes)
• preceding-sibling:: inverse of following-sibling (empty for attribute and namespace nodes)
• following:: all nodes following the context node in document order (excluding descendant, attribute, and namespace nodes)
• preceding:: all nodes preceding the context node in document order (excluding ancestor, attribute, and namespace nodes)
• attribute:: its attributes (empty for each non-element node)
• namespace:: its namespace nodes (empty for each non-element node)


Axes in XML document trees (4)

More axes that can be used in XPath expressions:

the following axes select for a given context node:

• self:: the context node itself

• descendant-or-self:: the context node and its descendants

• ancestor-or-self:: the context node and its ancestors
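To see what the individual axes select, one can count the element (or attribute) nodes on each axis for a fixed context node. The sketch below uses the XPath API of the JDK. The sample document, the class name XPathAxesSketch, and the choice of the first order element as context node are assumptions made for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XPathAxesSketch {

    // Hypothetical sample document: two customers, the first order contains a PC.
    static final String XML =
        "<doc>"
      + "<customer name='Alice'><order><PC nr='1'/></order><address/></customer>"
      + "<customer><order/><address/></customer>"
      + "</doc>";

    // Counts the nodes selected by axis::* when the context node
    // is the order element of the first customer.
    static int count(String axis) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node context = (Node) xpath.evaluate(
                "/doc/customer[1]/order", doc, XPathConstants.NODE);
            Double n = (Double) xpath.evaluate(
                "count(" + axis + "::*)", context, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] axes = {
            "self", "child", "descendant", "descendant-or-self",
            "parent", "ancestor", "ancestor-or-self",
            "following-sibling", "preceding-sibling",
            "following", "preceding", "attribute"
        };
        for (String axis : axes) {
            System.out.println(axis + "::* selects " + count(axis) + " node(s)");
        }
    }
}
```

Note that ancestor::* counts only element nodes (customer and doc); the root node above doc is not matched by the node test *.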


The Document Object Model (DOM)

XML document as a tree in main memory:

+ the program can navigate arbitrarily from node to node
+ easy to program
- consumes much memory
- long loading time until document is in main memory

[Figure: document tree: doc with two customer children, each with the children order and address; the first customer has the attribute name = "Alice".]


DOM Parser Java API (1)

// DOMParser: e.g. org.apache.xerces.parsers.DOMParser (assumption; the slide does not name the package)
DOMParser parser = new DOMParser();          // instantiate parser
try {
    parser.parse(uri);                       // parse text found at uri
    Document doc = parser.getDocument();     // get document root
    recurseNodes(doc, …);                    // work on document
} catch (Exception e) { … }

public void recurseNodes(Node node, …)       // recursively on all nodes
{
    … ;
    switch (node.getNodeType())              // depending on node type
    {
        case Node.DOCUMENT_NODE: …           // if root node …
        case Node.ELEMENT_NODE: …            // if element node …
        case Node.TEXT_NODE: …               // if text node …
    }
}


DOM Parser Java API (2)

public void recurseNodes(Node node, …)               // recursively on all nodes
{
    …
    String name = node.getNodeName();                 // read node name (element name for elements)
    …
    NodeList nodes = node.getChildNodes();            // collect all children
    for (int i = 0; i < nodes.getLength(); i++)
        recurseNodes(nodes.item(i), "");              // recursive call for each child node
    …
    NamedNodeMap attributes = node.getAttributes();   // get attribute list (null for non-element nodes)
    if (attributes != null)
        for (int i = 0; i < attributes.getLength(); i++) {
            Node current = attributes.item(i);        // get 1 attribute
            System.out.print(" " + current.getNodeName() +       // attribute name
                             "=\"" + current.getNodeValue() +    // attribute value
                             "\"");                               // quote attribute value
        }
}
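The two fragments above can be combined into a complete program. The following sketch uses the JAXP DocumentBuilder from the JDK instead of the DOMParser class on the slides, so that it runs without additional libraries; the class name DomTraversalSketch, the depth parameter, and the sample document are assumptions made for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomTraversalSketch {

    // Visits the node and, recursively, all of its children in document order.
    static void recurseNodes(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) out.append("  ");   // indent by tree depth
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                out.append("document");
                break;
            case Node.ELEMENT_NODE:
                out.append("element ").append(node.getNodeName());
                NamedNodeMap attributes = node.getAttributes();
                for (int i = 0; i < attributes.getLength(); i++) {
                    Node a = attributes.item(i);
                    out.append(" ").append(a.getNodeName())
                       .append("=\"").append(a.getNodeValue()).append("\"");
                }
                break;
            case Node.TEXT_NODE:
                out.append("text ").append(node.getNodeValue().trim());
                break;
            default:
                out.append("other node");
        }
        out.append("\n");
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            recurseNodes(children.item(i), depth + 1, out);
        }
    }

    // Loads the whole document into main memory, then walks the tree.
    static String dump(String xml) {
        try {
            Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            recurseNodes(doc, 0, out);
            return out.toString();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(dump(
            "<doc><customer name=\"Alice\"><order>pc400</order><address/></customer></doc>"));
    }
}
```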


Simple API for XML (SAX)

Parser accesses at most one XML element node at a time:
- can navigate and process nodes only in document order
- less flexible programming than DOM
+ needs less space in main memory
+ loading document nodes into main memory is fast

Order in which the parser visits the element nodes:

 1. <doc>
 2.   <customer name="Alice">
 3.     <order>
 4. 5. 6.   ...
        </order>
 7.     <address> </address>
      </customer>
 8.   <customer>
 9.     <order/>
10.     <address/>
      </customer>
    </doc>

[Figure: the same numbers 1. to 10. attached to the nodes of the document tree.]


SAX Parser Java API

// Parser calls this procedure once, when parsing the document starts
public void startDocument() throws SAXException { … }

// SAX parser calls this once for each start tag of an element
public void startElement( String namespaceURI, String localName,
                          String qName, Attributes atts)
        throws SAXException {
    …                                               // code example:
    for (int i = 0; i < atts.getLength(); i++) {     // for each attribute
        out.println( atts.getQName(i) + "=\"" + atts.getValue(i) + "\"");
    }                                               // output attribute name and attribute value
    …
}

// SAX parser calls this once for each end tag of an element
public void endElement( String namespaceURI, String localName,
                        String qName) throws SAXException { … }

// SAX parser calls this once when end of document is reached
public void endDocument() throws SAXException { … }
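The callbacks above can be wired into a runnable program as follows. This is a minimal sketch using the SAX parser bundled with the JDK; the class name SaxEventsSketch, the event log, and the sample document are assumptions for illustration. The handler records one entry per callback, which makes the strict document order of SAX processing visible.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventsSketch {

    // Parses the text and returns the callbacks in the order the parser issued them.
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startDocument() {
                    log.add("startDocument");
                }
                @Override
                public void startElement(String namespaceURI, String localName,
                                         String qName, Attributes atts) {
                    StringBuilder entry = new StringBuilder("start " + qName);
                    for (int i = 0; i < atts.getLength(); i++) {   // for each attribute
                        entry.append(" ").append(atts.getQName(i))
                             .append("=\"").append(atts.getValue(i)).append("\"");
                    }
                    log.add(entry.toString());
                }
                @Override
                public void endElement(String namespaceURI, String localName,
                                       String qName) {
                    log.add("end " + qName);
                }
                @Override
                public void endDocument() {
                    log.add("endDocument");
                }
            });
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String event : events(
                "<doc><customer name=\"Alice\"><order/><address/></customer></doc>")) {
            System.out.println(event);
        }
    }
}
```

Because the handler never sees more than the current event, any state needed later (for example the enclosing customer) has to be stored by the application itself.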


Navigation along axes of an XML document

Axes:

child-axis
  long form:  /child::doc/child::customer/child::order
  short form: /doc/customer/order

attribute-axis
  long form:  /child::doc/child::customer/attribute::name
  short form: /doc/customer/@name

[Figure: XML document tree: doc with two customer children, each with the children order and address; the first customer has the attribute name = "Alice".]


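XML Path language XPath (1)

/    root node
.    current context node

/child::doc/child::customer    absolute path (starting at the root)
./child::order/child::PC       relative path (starting at the current context node)

An XPath expression is a sequence of location steps (e.g. child::doc, child::customer).

[Figure: document tree: doc with child customer (attribute name = "Alice"); customer has the children order and address; order has the child PC.]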


XPath (2): Retrieval of XML data

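XPath expression:

/child::doc/child::customer[attribute::name="Alice"]/child::order

Each part between slashes (e.g. child::customer[…]) is a location step; [attribute::name="Alice"] is a filter expression.

[Figure: XML document tree: doc with two customer children, each with the children order and address; the first customer has the attribute name = "Alice".]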
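The retrieval above can be tried out with the XPath API of the JDK. This is a sketch under assumptions: the class name XPathRetrievalSketch and the order contents pc400 and pc500 are made up for illustration, since the tree on the slide does not show any text inside the order elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathRetrievalSketch {

    // Tree from the slide; the text inside the order elements is hypothetical.
    static final String XML =
        "<doc>"
      + "<customer name='Alice'><order>pc400</order><address/></customer>"
      + "<customer><order>pc500</order><address/></customer>"
      + "</doc>";

    // Evaluates the expression against XML and returns the text of each selected node.
    static List<String> select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(XML)),
                XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // long form with filter expression
        System.out.println(select(
            "/child::doc/child::customer[attribute::name='Alice']/child::order"));
        // equivalent short form
        System.out.println(select("/doc/customer[@name='Alice']/order"));
        // without the filter, the orders of all customers are selected
        System.out.println(select("/doc/customer/order"));
    }
}
```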


XPath (3) – Location steps

XPath-Location-Expression ::=
      LocationStep1 / … / LocationStepN      (relative path)
  | / LocationStep1 / … / LocationStepN      (absolute path)

e.g. child::customer[attribute::name="Alice"] / descendant::order

LocationStepI ::= Axis-Specifier '::' NodeTest ( '[' FilterExpression ']' )*

examples (given in long form):

  child::customer[attribute::name="Alice"]
  parent::*                                   (the node test * is always successful)
  descendant-or-self::address
  ancestor-or-self::*[descendant::customer]


XPath (4): Node (name) tests

axis-specifier::Ename
  selects only elements (or attributes) with the name Ename that are reachable from the context node through the specified axis

axis-specifier::*
  selects all elements (or attributes) that are reachable from the context node through the specified axis

example:
descendant-or-self::customer
  selects all customer elements among the current context node and its descendants


Summary XML-Navigation and XPath

• XML : tree structure of data, includes axes

• DTD and XML-Schema : type checking, consistency checking

• DOM : XML parser - loads the complete document into main memory

• SAX : XML parser - works in document order

• XPath : declarative path language, supports qualified search using filters

documentation sources at http://www.w3c.org


Part 3: Transformation of XML Documents using XSLT


XML and XSL - examples (1)

XML:

<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<?xml-stylesheet type="text/xsl" href="xmlbsp1.xsl"?>
<order>
  <customer>Alice</customer>
  <PC>pc400</PC>
</order>

XSL:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
</xsl:stylesheet>

[Figure: result of XML+XSL]


XSL default templates for elements, …

default template for elements and the root:

<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>

transform inner nodes

default template for text nodes and attribute nodes:

<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>

shows text values and attribute values
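The effect of the default templates can be observed by applying a stylesheet that contains no templates at all. The following sketch uses the JAXP transformer from the JDK; the class name DefaultTemplatesSketch is hypothetical, and the XML declaration is suppressed so that only the text produced by the default templates remains.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DefaultTemplatesSketch {

    // A stylesheet without any template: only the default templates apply.
    static final String EMPTY_XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'/>";

    static String apply(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(EMPTY_XSL)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Elements are traversed, text nodes are copied to the output.
        System.out.println(apply(
            "<order><customer>Alice</customer><PC>pc400</PC></order>"));
        // Attribute values do not appear: the default element template
        // applies templates to child nodes only, and attributes are not children.
        System.out.println(apply(
            "<offers><offer supplier='vobis' item='pc500'/></offers>"));
    }
}
```

The default template for attributes therefore only fires when a stylesheet selects attributes explicitly, for example with <xsl:apply-templates select="@*"/>.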


XML and XSL - examples (1a)

XML:

<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<order>
  <customer>Alice</customer>
  <PC>pc400</PC>
</order>

XSL:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="order/*">
    found a successor node of order
  </xsl:template> <!-- does not visit child nodes of visited nodes ! -->
</xsl:stylesheet>

[Figure: result of XML+XSL]


XML and XSL - examples (1b)

XML:

<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<order>
  <customer>Alice</customer>
  <PC>pc400</PC>
</order>

XSL:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="*">   <!-- applicable to each element (Tag) -->
    node found
    <xsl:apply-templates/>   <!-- continue with child nodes -->
  </xsl:template>            <!-- including text nodes (PCDATA) -->
</xsl:stylesheet>

[Figure: result of XML+XSL]


XML and XSL - example (1c)

<xsl:stylesheet ...>
  <xsl:template match="*">        <!-- applicable to each element -->
    node found: <xsl:value-of select="."/>         <!-- show text included by current node -->
    its successor node: <xsl:apply-templates/>     <!-- process child nodes too -->
  </xsl:template>                 <!-- text (PCDATA) nodes are processed too -->
</xsl:stylesheet>

[Figure: result of XML+XSL]


XML and XSL - example (1e)

XML:

<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<?xml-stylesheet type="text/xsl" href="xmlbsp1e.xsl"?>
<order>
  <customer>Alice</customer>
  <PC>pc400</PC>
</order>

XSL:

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="/">
    <html> <body>
      customer and PC – in HTML
      customer is <xsl:value-of select="order/customer"/>
      PC is <xsl:value-of select="order/PC"/>
    </body> </html>
  </xsl:template>
</xsl:stylesheet>

[Figure: result of XML+XSL]


Node types & XSL default templates

default template for elements and the root:

<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>

transform inner nodes

default template for text nodes and attributes:

<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>

show text values and attribute values


Node types and XSL default templates (2)

default template for comments and processing instructions :

<xsl:template match="comment()|processing-instruction()"></xsl:template>

do nothing with comments and processing instructions

default behaviour for namespace nodes

do not output namespace nodes


HTML file and HTML output

<html>
<body>
<table width="100%" border="1">
  <tr>
    <td> customer : </td>
    <td> PC : </td>
  </tr>
  <tr>
    <td>Alice</td>
    <td>pc500</td>
  </tr>
  <tr>
    <td>Bob</td>
    <td>pc600</td>
  </tr>
</table>
</body>
</html>


XSLT stylesheet and HTML output

first template:

<html>
<body>
<table width="100%" border="1">
  <tr>
    <td> customer : </td>
    <td> PC : </td>
  </tr>

  separate template (start here for every customer node, repeat):

  <tr>
    <td> Name of customer </td>
    <td> PC of customer </td>
  </tr>

  (until here for every customer node)

</table>
</body>
</html>


XSLT stylesheet and HTML output

<xsl:template match="/">
  <html>
  <body>
  <table width="100%" border="1">
    <tr>
      <td> customer : </td>
      <td> PC : </td>
    </tr>
    <xsl:apply-templates/>   <!-- work on inner nodes too -->
  </table>
  </body>
  </html>
</xsl:template>

<xsl:template match="order">   <!-- for every order node -->
  <tr>
    <td> <xsl:value-of select="customer"/> </td>
    <td> <xsl:value-of select="PC"/> </td>
  </tr>
</xsl:template>   <!-- example program xmlbsp2.xsl -->


Generating an SQL script file with XSLT

<xsl:template match="/">
  create table order( customer char(10) , PC char(10) ) ;
  <xsl:apply-templates/>
</xsl:template>

<xsl:template match="order">
  insert into order values( <xsl:value-of select="customer"/> ,
                            <xsl:value-of select="PC"/> ) ;
</xsl:template>

example program: xmlbsp2sql.xsl
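A runnable variant of this idea, sketched with the JAXP transformer from the JDK, is shown below. It is not the file xmlbsp2sql.xsl itself: the class name SqlScriptSketch, the xsl:output method="text" declaration, the explicit line breaks, and the single quotes around the inserted values are assumptions added so that the generated lines have the form of SQL string literals. Whether a database accepts order as a table name without quoting depends on the database.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SqlScriptSketch {

    static final String XML =
        "<orders>"
      + "<order><customer>Alice</customer><PC>pc400</PC></order>"
      + "<order><customer>Bob</customer><PC>pc500</PC></order>"
      + "</orders>";

    // Stylesheet modelled on the two templates above; method='text' and the
    // quotes around the values are additions for this sketch.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "create table order( customer char(10) , PC char(10) ) ;&#10;"
      + "<xsl:apply-templates/>"
      + "</xsl:template>"
      + "<xsl:template match='order'>"
      + "insert into order values( '<xsl:value-of select='customer'/>' , "
      + "'<xsl:value-of select='PC'/>' ) ;&#10;"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Generates the SQL script for the given XML document.
    static String generate(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter script = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(script));
            return script.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(generate(XML));
    }
}
```

Values containing a quote character would need escaping before such a script is executed; the sketch does not handle that case.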


Advantages of XML and XSL

Generate HTML on a web server:
transform: data.xml + layout.xsl -> x.html

XML is
• transformable by XSL files
• transformable by application programs (e.g. written in Java)
• compressible and storable as compact data (zip, …)
• exchangeable across applications, devices, enterprises, …
• combinable with very many other languages (C#, ...)
• ...


More advantages of XSL

database -> XML -> WML or pdf or ps or HTML

database -> XML -> database

XML document 1 -> XML document 2

[Figure: a database query on the original database (DB) yields XML1; XSL stylesheets transform XML1 into pdf, HTML, and WML, into XML2 for another company, and into input for another database (DB).]


Flat XML database mappings

flat database XML mapping:

database          <database> … </database>
table             <table> … </table>
row               <row> … </row>
attribute value   <value> … </value>

advantage(s):
+ easy to understand
+ easy to implement

database
  table
    row
      value … value
    row
      value … value
    …
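The flat mapping is simple enough to sketch in a few lines. In the following illustration a table is represented as a list of rows, each row being a list of attribute values; the class name FlatXmlMapping, this in-memory representation, and the escaping helper are assumptions, and a real program would read the rows from a database (for example via JDBC).

```java
import java.util.Arrays;
import java.util.List;

class FlatXmlMapping {

    // Escapes the characters that must not occur literally in XML text.
    static String escape(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;");
    }

    // Maps one table (a list of rows of attribute values) to the flat XML form
    // database / table / row / value.
    static String toXml(List<List<String>> rows) {
        StringBuilder xml = new StringBuilder("<database><table>");
        for (List<String> row : rows) {
            xml.append("<row>");
            for (String value : row) {
                xml.append("<value>").append(escape(value)).append("</value>");
            }
            xml.append("</row>");
        }
        return xml.append("</table></database>").toString();
    }

    public static void main(String[] args) {
        List<List<String>> orders = Arrays.asList(
            Arrays.asList("Alice", "pc400"),
            Arrays.asList("Bob", "pc500"));
        System.out.println(toXml(orders));
    }
}
```

The mapping loses the table and column names; they would have to be added, for instance as attributes, if the XML is meant to be self-describing.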


XSL stylesheet generates database script

XSL file: "layout" and XPath expressions selecting content from the XML file

create table order( … )

  start here for every customer

insert into order values ( name of customer , order of customer , address of customer ) ;

  until here for every customer

[Figure: a database query on the DB yields XML1; the XSL file generates the DB script from XML1; the database script is then run against the DB.]