XML, DTD, XML Schema, and XSLT
Jianguo Lu, University of Windsor
Where we are
• XML
• DTD
• XML Schema
• XML Namespace
• XPath
• DOM Tree
• XSLT
Name Conflict
• Problem: the same tag name, table, is used both for an HTML table and for a piece of furniture.
• Solution: add a prefix to the tag names
<table>
  <tr><td>Apples</td><td>Bananas</td></tr>
</table>
<table>
  <name>African Coffee Table</name>
  <width>80</width>
  <length>120</length>
</table>
<h:table>
  <h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>
</h:table>
<f:table>
  <f:name>African Coffee Table</f:name>
  <f:width>80</f:width>
  <f:length>120</f:length>
</f:table>
Namespaces
[Figure: two namespaces that both contain the name table. HTML namespace: html, body, table, tr, th, td. Furniture namespace: table, name, price, width, length, height.]
XML namespace
• An XML document may use more than one schema.
• Since each schema was developed independently, name clashes may appear.
• The solution is to use a different prefix for each schema
  – prefix:name
<prod:product xmlns:prod="http://example.org/prod">
  <prod:number>557</prod:number>
  <prod:size system="US-DRESS">10</prod:size>
</prod:product>
Namespace names
• Namespace names are URIs.
  – Many namespace names take the form of HTTP URIs.
• A namespace name is not meant to point to the location where a resource resides.
  – It is intended to provide a unique name that can be associated with a particular organization.
  – The URI MAY point to a schema, but it need not.
Namespace declaration
• A namespace is declared using an attribute whose name starts with "xmlns" (xmlns:prefix="URI").
• You can declare multiple namespaces in one element.
<ord:order xmlns:ord="http://example.org/ord"
           xmlns:prod="http://example.org/prod">
  <ord:number>123ABC123</ord:number>
  <prod:product>
    <prod:number>557</prod:number>
    <prod:size system="US-DRESS">10</prod:size>
  </prod:product>
</ord:order>
Default namespace declaration
• A default namespace declaration (xmlns="URI") maps unprefixed element names to a namespace. It does not apply to unprefixed attribute names.
<order xmlns="http://example.org/ord"
       xmlns:prod="http://example.org/prod">
  <number>123ABC123</number>
  <prod:product>
    <prod:number>557</prod:number>
    <prod:size system="US-DRESS">10</prod:size>
  </prod:product>
</order>
Scope of namespace declaration
• A namespace declaration can appear in any start tag.
• Its scope is the element where it is declared, including all of that element's descendants.
<order xmlns="http://example.org/ord">
  <number>123ABC123</number>
  <prod:product xmlns:prod="http://example.org/prod">
    <prod:number>557</prod:number>
    <prod:size system="US-DRESS">10</prod:size>
  </prod:product>
</order>
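The scope rules can be checked by asking a namespace-aware parser which namespace each name ends up in. This is a minimal Java sketch, assuming the JDK's built-in JAXP DOM parser (the slides do not name one); the class and method names are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Reports which namespace each name in the order example belongs to. */
final class NamespaceScopeDemo {

    // The scope example: default namespace on <order>, prod prefix declared on <prod:product>.
    static final String ORDER =
          "<order xmlns='http://example.org/ord'>"
        + "<number>123ABC123</number>"
        + "<prod:product xmlns:prod='http://example.org/prod'>"
        + "<prod:number>557</prod:number>"
        + "<prod:size system='US-DRESS'>10</prod:size>"
        + "</prod:product>"
        + "</order>";

    /** Parses with namespace processing switched on (JAXP leaves it off by default). */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Namespace URI of the first element whose qualified name (prefix:local) is qName. */
    static String elementNamespace(String xml, String qName) throws Exception {
        Element element = (Element) parse(xml).getElementsByTagName(qName).item(0);
        return element.getNamespaceURI();
    }

    /** Namespace URI of the unprefixed system attribute on prod:size; null means "no namespace". */
    static String systemAttributeNamespace(String xml) throws Exception {
        Element size = (Element) parse(xml).getElementsByTagName("prod:size").item(0);
        return size.getAttributeNode("system").getNamespaceURI();
    }

    /** False if the parser rejects the document, e.g. because a prefix is used outside its scope. */
    static boolean isNamespaceWellFormed(String xml) throws Exception {
        try {
            parse(xml);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNamespace(ORDER, "number"));      // default namespace applies
        System.out.println(elementNamespace(ORDER, "prod:number")); // prod prefix applies
        System.out.println(systemAttributeNamespace(ORDER));        // default namespace does not apply
        // prod:number used outside the element that declares prod is rejected:
        System.out.println(isNamespaceWellFormed("<order><prod:number>557</prod:number></order>"));
    }
}
```

In this sketch, the unprefixed number element lands in http://example.org/ord, the unprefixed system attribute has no namespace, and a prod: element outside the declaring element makes the parser fail.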
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.books.org"
            xmlns="http://www.books.org"
            elementFormDefault="qualified">
  <xsd:element name="BookStore">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Title" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Author" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Date" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="ISBN" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Publisher" minOccurs="1" maxOccurs="1"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Title" type="xsd:string"/>
  <xsd:element name="Author" type="xsd:string"/>
  <xsd:element name="Date" type="xsd:string"/>
  <xsd:element name="ISBN" type="xsd:string"/>
  <xsd:element name="Publisher" type="xsd:string"/>
</xsd:schema>
The elements and datatypes that are used to construct schemas (schema, element, complexType, sequence, string) come from the http://…/XMLSchema namespace.
From Costello
targetNamespace indicates that the elements defined by this schema (BookStore, Book, Title, Author, Date, ISBN, Publisher) are to go in the http://www.books.org namespace.
ref="Book" is referencing a Book element declaration. The Book in what namespace? Since there is no namespace qualifier, it is referencing the Book element in the default namespace, which is the targetNamespace. Thus, this is a reference to the Book element declaration in this schema.
The default namespace is http://www.books.org, which is the targetNamespace.
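One way to see the effect of targetNamespace is to validate instance documents against the BookStore schema. A minimal Java sketch, assuming the JDK's javax.xml.validation API and made-up book data; minOccurs="1" and maxOccurs="1" are left out because they are the defaults.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Validates BookStore instances against the BookStore schema. */
final class BookStoreValidation {

    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://www.books.org' xmlns='http://www.books.org'"
        + " elementFormDefault='qualified'>"
        + "<xsd:element name='BookStore'><xsd:complexType><xsd:sequence>"
        + "<xsd:element ref='Book' maxOccurs='unbounded'/>" // resolved via the default namespace
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "<xsd:element name='Book'><xsd:complexType><xsd:sequence>"
        + "<xsd:element ref='Title'/><xsd:element ref='Author'/><xsd:element ref='Date'/>"
        + "<xsd:element ref='ISBN'/><xsd:element ref='Publisher'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "<xsd:element name='Title' type='xsd:string'/>"
        + "<xsd:element name='Author' type='xsd:string'/>"
        + "<xsd:element name='Date' type='xsd:string'/>"
        + "<xsd:element name='ISBN' type='xsd:string'/>"
        + "<xsd:element name='Publisher' type='xsd:string'/>"
        + "</xsd:schema>";

    /** A one-book store whose elements are placed in the given default namespace (sample values are made up). */
    static String bookStore(String namespace) {
        return "<BookStore xmlns='" + namespace + "'><Book>"
            + "<Title>Sample Title</Title><Author>Sample Author</Author><Date>2000</Date>"
            + "<ISBN>0-000-00000-0</ISBN><Publisher>Sample Publisher</Publisher>"
            + "</Book></BookStore>";
    }

    /** True if valid; without a custom ErrorHandler the Validator throws on the first error. */
    static boolean isValid(String instance) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(instance)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(bookStore("http://www.books.org")));     // in the targetNamespace
        System.out.println(isValid(bookStore("http://example.org/other"))); // wrong namespace
    }
}
```

The second instance uses the same element names but a different namespace, so the validator finds no declaration for its BookStore element.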
From Costello
Import in XML Schema
• Now that we understand namespaces, we can introduce some more advanced features of XML Schema.
• The import element allows you to access elements and types in a different namespace.
<xsd:schema …>
  <xsd:import namespace="A" schemaLocation="A.xsd"/>
  <xsd:import namespace="B" schemaLocation="B.xsd"/>
  …
</xsd:schema>
[Figure: C.xsd imports A.xsd (namespace A) and B.xsd (namespace B).]
Example
[Figure: Camera.xsd imports Nikon.xsd, Olympus.xsd, and Pentax.xsd.]
From Costello
Nikon.xsd
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.nikon.com"
            xmlns="http://www.nikon.com"
            elementFormDefault="qualified">
  <xsd:complexType name="body_type">
    <xsd:sequence>
      <xsd:element name="description" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
Olympus.xsd
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.olympus.com"
            xmlns="http://www.olympus.com"
            elementFormDefault="qualified">
  <xsd:complexType name="lens_type">
    <xsd:sequence>
      <xsd:element name="zoom" type="xsd:string"/>
      <xsd:element name="f-stop" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
Pentax.xsd
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.pentax.com"
            xmlns="http://www.pentax.com"
            elementFormDefault="qualified">
  <xsd:complexType name="manual_adapter_type">
    <xsd:sequence>
      <xsd:element name="speed" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
From Costello
Camera.xsd
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.camera.org"
            xmlns:nikon="http://www.nikon.com"
            xmlns:olympus="http://www.olympus.com"
            xmlns:pentax="http://www.pentax.com"
            elementFormDefault="qualified">
  <xsd:import namespace="http://www.nikon.com" schemaLocation="Nikon.xsd"/>
  <xsd:import namespace="http://www.olympus.com" schemaLocation="Olympus.xsd"/>
  <xsd:import namespace="http://www.pentax.com" schemaLocation="Pentax.xsd"/>
  <xsd:element name="camera">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="body" type="nikon:body_type"/>
        <xsd:element name="lens" type="olympus:lens_type"/>
        <xsd:element name="manual_adapter" type="pentax:manual_adapter_type"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
Here (type="nikon:body_type") I am using the body_type that is defined in the Nikon namespace.
From Costello
Camera.xml
<?xml version="1.0"?>
<c:camera xmlns:c="http://www.camera.org"
          xmlns:nikon="http://www.nikon.com"
          xmlns:olympus="http://www.olympus.com"
          xmlns:pentax="http://www.pentax.com"
          … …>
  <c:body>
    <nikon:description>Ergonomically designed casing for easy handling</nikon:description>
  </c:body>
  <c:lens>
    <olympus:zoom>300mm</olympus:zoom>
    <olympus:f-stop>1.2</olympus:f-stop>
  </c:lens>
  <c:manual_adapter>
    <pentax:speed>1/10,000 sec to 100 sec</pentax:speed>
  </c:manual_adapter>
</c:camera>
From Costello
Include
• The include element allows you to access components in other schemas.
  – All the schemas you include must have the same target namespace as your schema (i.e., the schema that is doing the include), or no target namespace at all, in which case their components are placed in your schema's namespace.
  – The net effect of include is as though you had typed all the definitions directly into the containing schema.
<xsd:schema …>
  <xsd:include schemaLocation="LibraryBook.xsd"/>
  <xsd:include schemaLocation="LibraryEmployee.xsd"/>
  …
</xsd:schema>
[Figure: Library.xsd includes LibraryBook.xsd and LibraryEmployee.xsd.]
From Costello
Library.xsd
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.library.org"
            xmlns="http://www.library.org"
            elementFormDefault="qualified">
  <xsd:include schemaLocation="LibraryBook.xsd"/>
  <xsd:include schemaLocation="LibraryEmployee.xsd"/>
  <xsd:element name="Library">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Books">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element ref="Book" maxOccurs="unbounded"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="Employees">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element ref="Employee" maxOccurs="unbounded"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
ref="Book" and ref="Employee" reference element declarations in other schemas (the included LibraryBook.xsd and LibraryEmployee.xsd).
From Costello
XPath
• A language for addressing parts of an XML document.
  – It operates on the tree data model of XML.
  – It has a non-XML syntax.
• XPath uses path expressions to select parts (nodes) of an XML document, such as elements and attributes.
• XPath defines a library of standard functions (string, numeric, boolean, and node-set functions) and supports arithmetic expressions.
• XPath is a major component of XSLT and of XML query languages such as XQuery.
• XPath is a W3C standard (a W3C Recommendation).
What is XPath
• Like traditional file paths
  – XPath uses path expressions to identify nodes in an XML document. These path expressions look very much like the expressions you see when you work with a computer file system:
    – public_html/569/xml.ppt
    – Books/book/author/name/FirstName
• Absolute path (starts at the root): /library/author/book
• Relative path (starts at the current context node): author/book
XML path example
<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>
Example path expressions:
/library
/library/author
//author
/library/@location
//book[@title="Artificial Intelligence"]
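The example paths can be tried directly. A minimal Java sketch, assuming the JDK's javax.xml.xpath API with the library document embedded as a string (single-quoted attributes avoid Java escaping); the class and helper names are illustrative.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Evaluates the example path expressions against the library document. */
final class LibraryPaths {

    static final String LIBRARY =
          "<library location='Bremen'>"
        + "<author name='Henry Wise'>"
        + "<book title='Artificial Intelligence'/>"
        + "<book title='Modern Web Services'/>"
        + "<book title='Theory of Computation'/>"
        + "</author>"
        + "<author name='William Smart'>"
        + "<book title='Artificial Intelligence'/>"
        + "</author>"
        + "<author name='Cynthia Singleton'>"
        + "<book title='The Semantic Web'/>"
        + "<book title='Browser Technology Revised'/>"
        + "</author>"
        + "</library>";

    /** The nodes addressed by the expression (the document is re-parsed on each call for simplicity). */
    static NodeList select(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        InputSource source = new InputSource(new StringReader(LIBRARY));
        return (NodeList) xpath.evaluate(expression, source, XPathConstants.NODESET);
    }

    /** The string value of the first addressed node. */
    static String value(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(LIBRARY)));
    }

    public static void main(String[] args) throws Exception {
        String[] paths = {
            "/library",
            "/library/author",
            "//author",
            "/library/@location",
            "//book[@title='Artificial Intelligence']"
        };
        for (String path : paths) {
            System.out.println(path + " -> " + select(path).getLength() + " node(s)");
        }
        System.out.println("/library/@location = " + value("/library/@location"));
    }
}
```

For this document the five paths address 1, 3, 3, 1, and 2 nodes, and the location attribute's value is Bremen.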
XML Path Example
• Address all author elements that are children of library: /library/author
  – Addresses all author elements that are children of the library element node, which resides immediately below the root.
  – In general, /t1/t2/…/tn, where each t(i+1) is a child node of ti, is a path through the tree representation.
• Address all author elements anywhere in the document: //author
  – Here // says that we should consider all elements in the document and check whether they are named author.
  – This path expression addresses all author elements anywhere in the document.
XPath example
• Select the location attribute nodes within library element nodes: /library/@location
  – The symbol @ is used to denote attribute nodes.
• Select all title attribute nodes within book elements anywhere in the document that have the value "Artificial Intelligence": //book/@title[. = "Artificial Intelligence"]
  – Written as //book/@title="Artificial Intelligence", the expression is a comparison that returns true or false rather than a set of nodes.
• Select all books with the title "Artificial Intelligence": /library/author/book[@title="Artificial Intelligence"]
  – The test within square brackets is a filter expression (predicate). It restricts the set of addressed nodes.
  – Difference from the previous query: this query addresses book elements whose title satisfies a certain condition, whereas the previous query collects title attribute nodes of book elements.
[Figure: tree representation of the library document: root → library → author elements; each author has a name attribute (e.g., Henry …) and book children, each with a title attribute (e.g., Artificial Intelligence).]
XPath syntax
• A path expression consists of a series of steps, separated by slashes.
• A step consists of
  – an axis specifier,
  – a node test, and
  – an optional predicate.
• An axis specifier determines the tree relationship between the nodes to be addressed and the context node.
  – E.g., parent, ancestor, child (the default), following-sibling, preceding-sibling, attribute.
  – // is an abbreviation that uses the descendant-or-self axis: //x stands for /descendant-or-self::node()/child::x.
  – child::book selects all book elements that are children of the current node.
• A node test specifies which nodes to address.
  – The most common node tests are element names, e.g., /library/author
  – * addresses all element nodes, e.g., /library/*
  – comment() selects all comment nodes, e.g., /library/comment()
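Abbreviated steps and their full axis::node-test forms should address the same nodes. A minimal Java sketch, assuming the JDK's javax.xml.xpath API; the comment node in the document is a hypothetical addition so that comment() has something to match.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Compares abbreviated XPath steps with their axis::node-test forms on the library document. */
final class XPathAxes {

    // The library document plus one comment node (added only so that comment() has something to match).
    static final String LIBRARY =
          "<library location='Bremen'>"
        + "<!-- sample comment -->"
        + "<author name='Henry Wise'>"
        + "<book title='Artificial Intelligence'/><book title='Modern Web Services'/>"
        + "<book title='Theory of Computation'/>"
        + "</author>"
        + "<author name='William Smart'><book title='Artificial Intelligence'/></author>"
        + "<author name='Cynthia Singleton'>"
        + "<book title='The Semantic Web'/><book title='Browser Technology Revised'/>"
        + "</author>"
        + "</library>";

    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(LIBRARY)));
    }

    /** Number of nodes the path addresses when evaluated from the given context node. */
    static int count(String path, Node context) throws Exception {
        Double n = (Double) XPathFactory.newInstance().newXPath()
            .evaluate("count(" + path + ")", context, XPathConstants.NUMBER);
        return n.intValue();
    }

    /** String value of the first node the path addresses. */
    static String value(String path, Node context) throws Exception {
        return XPathFactory.newInstance().newXPath().evaluate(path, context);
    }

    /** The first author element, used as the context node for a relative path. */
    static Node firstAuthor(Document doc) throws Exception {
        return (Node) XPathFactory.newInstance().newXPath()
            .evaluate("/library/author[1]", doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse();
        // Abbreviated and unabbreviated syntax address the same nodes.
        System.out.println(count("/library/author", doc) + " = " + count("/child::library/child::author", doc));
        System.out.println(count("//book", doc) + " = " + count("/descendant-or-self::node()/child::book", doc));
        // Node tests: * matches element children only, comment() matches comment children.
        System.out.println(count("/library/*", doc) + " elements, " + count("/library/comment()", doc) + " comment");
        // A relative path is evaluated from the context node (Henry Wise's author element).
        System.out.println(count("child::book", firstAuthor(doc)));
        // Other axes: parent and following-sibling.
        System.out.println(value("//book[@title='The Semantic Web']/parent::author/@name", doc));
        System.out.println(count("//book[@title='Artificial Intelligence']/following-sibling::book", doc));
    }
}
```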
XPath syntax
• Predicates (or filter expressions) are optional and are used to refine the set of addressed nodes.
  – E.g., [1] selects the first node.
  – [position()=last()] selects the last node.
  – [position() mod 2 = 0] selects the nodes at even positions.
• XPath also has a more verbose, unabbreviated syntax (e.g., child::author, attribute::title); we have mainly presented the abbreviated syntax.
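Positional predicates count positions among the nodes selected by their own step. A minimal Java sketch on the library document, assuming the JDK's javax.xml.xpath API; the helper names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Positional predicates evaluated on the library document. */
final class XPathPredicates {

    static final String LIBRARY =
          "<library location='Bremen'>"
        + "<author name='Henry Wise'>"
        + "<book title='Artificial Intelligence'/><book title='Modern Web Services'/>"
        + "<book title='Theory of Computation'/>"
        + "</author>"
        + "<author name='William Smart'><book title='Artificial Intelligence'/></author>"
        + "<author name='Cynthia Singleton'>"
        + "<book title='The Semantic Web'/><book title='Browser Technology Revised'/>"
        + "</author>"
        + "</library>";

    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(LIBRARY)));
    }

    /** Number of nodes the path addresses. */
    static int count(String path, Node context) throws Exception {
        Double n = (Double) XPathFactory.newInstance().newXPath()
            .evaluate("count(" + path + ")", context, XPathConstants.NUMBER);
        return n.intValue();
    }

    /** String value of the first node the path addresses. */
    static String value(String path, Node context) throws Exception {
        return XPathFactory.newInstance().newXPath().evaluate(path, context);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse();
        System.out.println(value("/library/author[1]/@name", doc));                      // Henry Wise
        System.out.println(value("/library/author[position()=last()]/@name", doc));      // Cynthia Singleton
        System.out.println(value("/library/author[position() mod 2 = 0]/@name", doc));   // William Smart
        // Positions restart for each author: books 1 and 3 of the first author.
        System.out.println(count("/library/author[1]/book[position() mod 2 = 1]", doc)); // 2
    }
}
```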
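More examples
• Address the first author element node in the XML document: //author[1]
  – Strictly, //author[1] selects every author element that is the first author child of its parent, while (//author)[1] selects the first author in the whole document. In this document both give Henry Wise's author element, because all authors share one parent.
• Address the last book element within the first author element node in the document: //author[1]/book[last()]
• Address all book element nodes without a title attribute: //book[not(@title)]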
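The difference between //x[1] and (//x)[1] shows up as soon as several parents have matching children. A minimal Java sketch on the library document, assuming the JDK's javax.xml.xpath API; the helper names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** The "more examples" queries, plus a comparison of //x[1] and (//x)[1]. */
final class MoreXPathExamples {

    static final String LIBRARY =
          "<library location='Bremen'>"
        + "<author name='Henry Wise'>"
        + "<book title='Artificial Intelligence'/><book title='Modern Web Services'/>"
        + "<book title='Theory of Computation'/>"
        + "</author>"
        + "<author name='William Smart'><book title='Artificial Intelligence'/></author>"
        + "<author name='Cynthia Singleton'>"
        + "<book title='The Semantic Web'/><book title='Browser Technology Revised'/>"
        + "</author>"
        + "</library>";

    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(LIBRARY)));
    }

    static int count(String path, Node context) throws Exception {
        Double n = (Double) XPathFactory.newInstance().newXPath()
            .evaluate("count(" + path + ")", context, XPathConstants.NUMBER);
        return n.intValue();
    }

    static String value(String path, Node context) throws Exception {
        return XPathFactory.newInstance().newXPath().evaluate(path, context);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse();
        // All authors share one parent, so both forms pick one node here.
        System.out.println(count("//author[1]", doc) + " " + count("(//author)[1]", doc));      // 1 1
        // Books have three different parents: //book[1] picks the first book of each author.
        System.out.println(count("//book[1]", doc) + " " + count("(//book)[1]", doc));          // 3 1
        System.out.println(value("//author[1]/book[last()]/@title", doc));                      // Theory of Computation
        System.out.println(count("//book[not(@title)]", doc) + " " + count("//book[@title]", doc)); // 0 6
    }
}
```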
How to process XML
• XML does not DO anything by itself.
• Process XML using general-purpose languages
  – Java, Perl, C++, …
  – DOM is the basis.
• Process XML using special-purpose languages
  – Transform the XML: XSLT, e.g., "translate the stock XML file to an HTML table."
  – Query the XML: XQuery, e.g., "tell me the stocks that are higher than 100."
DOM (Document Object Model)
• What: DOM is an application programming interface (API) for processing XML documents.
  – http://www.w3c.org/DOM/
• Why:
  – A single, standard interface.
  – Platform and language independence.
• How: It defines the logical structure of documents and the way to access and manipulate them.
  – With the Document Object Model, one can
    • create an object tree,
    • navigate its structure,
    • access, add, modify, or delete elements, etc.
XML tree hierarchy
• XML can be described by a tree hierarchy.
[Figure: a document as a tree of units and sub-units, labelled with the parent, child, and sibling relationships.]
DOM tree model
• Generic tree model
  – Node
    • Type, name, value
    • Attributes
    • Parent node
    • Previous and next sibling nodes
    • First and last child nodes
  – Many other entities extend Node
    • Document
    • Element
    • Attribute
    • …
[Figure: a node with its parent, previous and next siblings, and first and last children.]
DOM class hierarchy
Node
  Document
  DocumentFragment
  DocumentType
  CharacterData
    Text
      CDATASection
    Comment
  Attr
  Element
  Entity
  EntityReference
  Notation
  ProcessingInstruction
NodeList
NamedNodeMap
JavaDoc of DOM API: http://xml.apache.org/xerces-j/apiDocs/index.html
Remarks on javadoc
• javadoc is a command included in the JDK.
• It is a useful tool for generating HTML documentation for your programs, so that you can use a browser to look at the description of the classes.
• JavaDoc describes classes, their relationships, methods, attributes, and comments.
• When you write Java programs, the JavaDoc is the first place you should look:
  – For core Java, there is JavaDoc describing every class in the standard library;
  – To learn how to use DOM, look at the JavaDoc of the org.w3c.dom package.
• If you are a serious Java programmer:
  – you should have the core JDK JavaDoc ready on your hard disk;
  – you should generate the JavaDoc for other people to look at.
• To run javadoc, type javadoc *.java at the command prompt (e.g., D>javadoc *.java). This generates JavaDoc for all the Java source files in the current directory.
Methods in Node interface
• Three categories of methods
  – Node characteristics
    • name, type, value
  – Contextual location and access to relatives
    • parents, siblings, children, ancestors, descendants
  – Node modification
    • edit, delete, rearrange child nodes
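A short Java sketch that uses one or two methods from each category, assuming the JDK's built-in DOM implementation. The <currency> element it appends is hypothetical and only there to show modification; the document is written without whitespace between tags so that every child is an element or element text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Node characteristics, relatives, and modification on the stock document. */
final class NodeMethodsDemo {

    static final String STOCKS =
          "<stocks>"
        + "<stock exchange='nasdaq'><name>amazon corp</name><symbol>amzn</symbol><price>16</price></stock>"
        + "<stock exchange='nyse'><name>IBM inc</name><price>102</price></stock>"
        + "</stocks>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    static Element firstStock(Document doc) {
        return (Element) doc.getElementsByTagName("stock").item(0);
    }

    /** Relatives: walks the children with getFirstChild()/getNextSibling() and collects their names. */
    static List<String> childNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            names.add(child.getNodeName());
        }
        return names;
    }

    /** Modification: deletes <symbol> and appends a new (hypothetical) <currency> element. */
    static void editFirstStock(Document doc) {
        Element stock = firstStock(doc);
        Node symbol = stock.getElementsByTagName("symbol").item(0);
        stock.removeChild(symbol);
        Element currency = doc.createElement("currency");
        currency.setTextContent("USD");
        stock.appendChild(currency);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(STOCKS);
        Element stock = firstStock(doc);
        // Characteristics: an element's node value is null; its text lives in a Text child.
        System.out.println(stock.getNodeName() + " " + stock.getNodeType() + " " + stock.getNodeValue());
        Node nameText = stock.getFirstChild().getFirstChild();
        System.out.println(nameText.getNodeName() + " " + nameText.getNodeType() + " " + nameText.getNodeValue());
        // Relatives: parent and children.
        System.out.println(stock.getParentNode().getNodeName() + " " + childNames(stock));
        // Modification.
        editFirstStock(doc);
        System.out.println(childNames(stock));
    }
}
```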
XML parser and DOM
• When you parse an XML document with a DOM parser, you get back a tree structure that contains all of the elements of your document.
• DOM also provides a variety of functions you can use to examine the contents and structure of the document.
[Figure: XML document → XML parser → DOM tree → DOM API → your XML application.]
DOM tree and DOM classes
[Figure: DOM tree of a stocks document. The stocks element has two stock children: one with exchange="nyse" (name IBM, price 105) and one with exchange="nasdaq" (name Amazon inc, symbol amzn, price 15.45); the text values are Text nodes. The figure labels the DOM classes and some of their methods:
  Document: getElementsByTagName()
  Node: getFirstChild(), getParentNode(), getNextSibling()
  Element: getAttribute(String), getTagName()
  Attr: getName(), getValue()]
Use Java to process XML
• Tasks:
  – How to construct the DOM tree from an XML text file?
  – How to get the list of stock elements?
  – How to get the attribute value of the second stock element?
• Construct the Document object:
  – You need to use an XML parser (XML4J);
  – Remember to import the necessary packages;
  – The benefit of DOM: only the lines that create the parser change if you switch to another DOM XML parser; the rest of the program stays the same.
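A minimal Java sketch of the three tasks, assuming the JDK's standard JAXP API (javax.xml.parsers) in place of the XML4J parser named above. Only the lines that obtain a DocumentBuilder depend on the parser; getElementsByTagName and getAttribute are the standard org.w3c.dom interfaces. Class and method names are illustrative.

```java
import java.io.File;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** The three DOM tasks for stock.xml. */
final class StockTasks {

    /** Task 1: build the DOM tree from an XML text file (the only parser-specific code). */
    static Document fromFile(File file) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(file);
    }

    /** Same as fromFile, but reads the XML from a string (handy for testing). */
    static Document fromString(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Task 2: all stock elements, in document order. */
    static NodeList stocks(Document doc) {
        return doc.getElementsByTagName("stock");
    }

    /** Task 3: the exchange attribute of the second stock element. */
    static String secondStockExchange(Document doc) {
        Element second = (Element) stocks(doc).item(1);
        return second.getAttribute("exchange");
    }

    public static void main(String[] args) throws Exception {
        Document doc = fromFile(new File(args.length > 0 ? args[0] : "stock.xml"));
        System.out.println("stock elements: " + stocks(doc).getLength());
        System.out.println("second stock exchange: " + secondStockExchange(doc));
    }
}
```

Run against the stock.xml used in these slides, it should report two stock elements and nyse for the second one.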
Get the first stock element
<?xml version="1.0" ?>
<stocks>
  <stock exchange="nasdaq">
    <name>amazon corp</name>
    <symbol>amzn</symbol>
    <price>16</price>
  </stock>
  <stock exchange="nyse">
    <name>IBM inc</name>
    <price>102</price>
  </stock>
</stocks>
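A minimal Java sketch, assuming the JDK's DOM parser: because stock.xml has whitespace between tags, getFirstChild() on the root returns a whitespace Text node, so the first stock element has to be found by skipping non-element nodes. Names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Finding the first stock element in stock.xml. */
final class FirstStock {

    // stock.xml with whitespace between the tags, as on the slide.
    static final String STOCK_XML =
          "<?xml version='1.0'?>"
        + "<stocks> <stock exchange='nasdaq'> <name>amazon corp</name> <symbol>amzn</symbol>"
        + " <price>16</price> </stock> <stock exchange='nyse'> <name>IBM inc</name>"
        + " <price>102</price> </stock> </stocks>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    /** Pitfall: the first child of <stocks> is the whitespace Text node before the first <stock>. */
    static Node firstChildOfRoot(Document doc) {
        return doc.getDocumentElement().getFirstChild();
    }

    /** The first child of the root element that is an Element, skipping Text and other node types. */
    static Element firstStock(Document doc) {
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        return null; // the root has no element children
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(STOCK_XML);
        System.out.println(firstChildOfRoot(doc).getNodeType() == Node.TEXT_NODE); // true
        Element stock = firstStock(doc);
        System.out.println(stock.getTagName() + " " + stock.getAttribute("exchange")); // stock nasdaq
    }
}
```

doc.getElementsByTagName("stock").item(0) reaches the same element without walking the children.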
Navigate to the next sibling of the first stock element
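Navigating with getNextSibling() hits the same whitespace issue. A minimal Java sketch, assuming the JDK's DOM parser and the stock.xml above; the helper nextElementSibling is a hypothetical name, not part of the org.w3c.dom Node interface.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Moving from the first stock element to the next one. */
final class NextStock {

    static final String STOCK_XML =
          "<?xml version='1.0'?>"
        + "<stocks> <stock exchange='nasdaq'> <name>amazon corp</name> <symbol>amzn</symbol>"
        + " <price>16</price> </stock> <stock exchange='nyse'> <name>IBM inc</name>"
        + " <price>102</price> </stock> </stocks>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    /** The next sibling that is an element, skipping Text (e.g. whitespace) and other nodes; null if none. */
    static Element nextElementSibling(Node node) {
        for (Node n = node.getNextSibling(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(STOCK_XML);
        Element first = (Element) doc.getElementsByTagName("stock").item(0);
        // The raw next sibling is the whitespace between </stock> and <stock>.
        System.out.println(first.getNextSibling().getNodeType() == Node.TEXT_NODE); // true
        Element second = nextElementSibling(first);
        System.out.println(second.getAttribute("exchange"));                       // nyse
        System.out.println(nextElementSibling(second));                            // null
    }
}
```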
Be aware of the Text objects between elements
[Figure: the DOM tree of stock.xml. Each name, symbol, and price element has a Text child holding its value, and the whitespace between the elements also becomes Text nodes.]
Question: How many children does the stocks node have?
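The answer depends on the whitespace. A Java sketch, assuming the JDK's DOM parser and stock.xml written with whitespace between tags: there the stocks element has five children (two stock elements and three whitespace Text nodes); with no whitespace between the tags it would have only the two stock elements. Names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Counts the children of <stocks>, by node type. */
final class StocksChildren {

    static final String STOCK_XML =
          "<?xml version='1.0'?>"
        + "<stocks> <stock exchange='nasdaq'> <name>amazon corp</name> <symbol>amzn</symbol>"
        + " <price>16</price> </stock> <stock exchange='nyse'> <name>IBM inc</name>"
        + " <price>102</price> </stock> </stocks>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    /** Number of children of parent whose node type equals type. */
    static int countChildren(Node parent, short type) {
        int n = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == type) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws Exception {
        Element stocks = parse(STOCK_XML).getDocumentElement();
        System.out.println("all children: " + stocks.getChildNodes().getLength());              // 5
        System.out.println("element children: " + countChildren(stocks, Node.ELEMENT_NODE));    // 2
        System.out.println("text children: " + countChildren(stocks, Node.TEXT_NODE));          // 3
    }
}
```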
Remarks on XML parsers
• There are several different ways to categorize parsers:
  – Validating versus non-validating parsers:
    • It takes a significant amount of effort for an XML parser to process a DTD and make sure that every element in an XML document follows the rules of the DTD;
    • If you only want to find tags and extract information, use a non-validating parser;
    • Validation can usually be turned on or off in a parser.
  – Parsers that support the Document Object Model (DOM);
  – Parsers that support the Simple API for XML (SAX);
  – Parsers written in a particular language (Java, C++, Perl, etc.).
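A sketch of turning validation on and off with JAXP, assuming a small DTD written here for the stock document (it is not from the slides). The non-validating parser accepts a stock element without a price because the document is still well-formed; the validating parser rejects it. A custom ErrorHandler is set because, by default, the JDK parser prints validity errors rather than throwing them.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Parses the same documents with DTD validation switched off and on. */
final class ValidatingParse {

    // A small internal DTD for the stock document (written for this sketch).
    static final String DTD =
          "<!DOCTYPE stocks ["
        + "<!ELEMENT stocks (stock+)>"
        + "<!ELEMENT stock (name, symbol?, price)>"
        + "<!ATTLIST stock exchange CDATA #REQUIRED>"
        + "<!ELEMENT name (#PCDATA)>"
        + "<!ELEMENT symbol (#PCDATA)>"
        + "<!ELEMENT price (#PCDATA)>"
        + "]>";

    static final String VALID =
        DTD + "<stocks><stock exchange='nyse'><name>IBM inc</name><price>102</price></stock></stocks>";

    // Well-formed, but the stock element lacks the price the DTD requires.
    static final String MISSING_PRICE =
        DTD + "<stocks><stock exchange='nyse'><name>IBM inc</name></stock></stocks>";

    /** True if parsing succeeds; validity errors only matter when validating is true. */
    static boolean parses(String xml, boolean validating) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                // ignore warnings
            }

            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e; // validity errors (reported only when validating)
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                throw e; // well-formedness errors (always reported)
            }
        });
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parses(MISSING_PRICE, false)); // non-validating: accepted
        System.out.println(parses(MISSING_PRICE, true));  // validating: rejected
        System.out.println(parses(VALID, true));          // validating: accepted
    }
}
```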
History
[Figure: the XSL family of specifications: XSL, split into XSL formatting objects (high-precision graphics, e.g., PDF) and XSLT (low-precision graphics, e.g., HTML, text, XML), alongside XPath, XLink/XPointer, XQuery, and XML Schemas.]
XSLT (Extensible Stylesheet Language Transformations)
• XSLT Version 1.0 became a W3C Recommendation in 1999.
• XSLT is used to transform XML into other formats.
[Figure: one XML document transformed by three stylesheets (XSLT 1, XSLT 2, XSLT 3) into text, HTML, and XML.]
XSLT basics
• XSLT is an XML document itself.
• It is a tree transformation language.
[Figure: an XML document and an XSLT stylesheet are the inputs of an XSLT processor.]
• It is a rule-based declarative language.
  – An XSLT program consists of a sequence of rules (templates).
  – It is a functional programming language.
XSLT Example: transform to another XML
stock.xml:
<?xml version="1.0" ?>
<stocks>
  <stock exchange="nasdaq">
    <name>amazon corp</name>
    <symbol>amzn</symbol>
    <price>16</price>
  </stock>
  <stock exchange="nyse">
    <name>IBM inc</name>
    <price>102</price>
  </stock>
</stocks>
output:
<?xml version="1.0"?>
<companies>
  <company>
    <value>24 CAD</value>
    <name>amazon corp</name>
  </company>
  <company>
    <value>153 CAD</value>
    <name>IBM inc</name>
  </company>
</companies>
• Rename the elements.
• Remove the attribute and the symbol element.
• Change the order of name and price.
• Convert US dollars to CAD (price × 1.5).
A most simple XSLT
Template definition and call
If statement
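A hedged sketch of xsl:if on the stock example, assuming the JDK's built-in XSLT processor (javax.xml.transform) and a stylesheet written for illustration: it answers "the stocks that are higher than 100" by emitting the name of each stock whose price exceeds 100.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Lists the stocks priced above 100 using xsl:if. */
final class StockIf {

    static final String STOCK_XML =
          "<?xml version='1.0'?>"
        + "<stocks> <stock exchange='nasdaq'> <name>amazon corp</name> <symbol>amzn</symbol>"
        + " <price>16</price> </stock> <stock exchange='nyse'> <name>IBM inc</name>"
        + " <price>102</price> </stock> </stocks>";

    // Illustrative stylesheet: the stock template emits a name only when the xsl:if test holds.
    static final String IF_XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:apply-templates select='stocks/stock'/>"
        + "</xsl:template>"
        + "<xsl:template match='stock'>"
        + "<xsl:if test='price &gt; 100'>"
        + "<xsl:value-of select='name'/><xsl:text>;</xsl:text>"
        + "</xsl:if>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML and returns the result as a string. */
    static String transform(String stylesheet, String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(IF_XSL, STOCK_XML)); // IBM inc;
    }
}
```

XSLT 1.0's xsl:if has no else branch; xsl:choose with xsl:when and xsl:otherwise covers that case.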
XSLT rule: <xsl:template>
xslt template for <stock>:
<xsl:template match="stock">
  <company>
    <value><xsl:value-of select="price*1.5"/> CAD</value>
    <name><xsl:value-of select="name"/></name>
  </company>
</xsl:template>
Applied to the first stock element of stock.xml (exchange="nasdaq", price 16), the template produces this part of the output:
<company>
  <value>get the value of <price> * 1.5, i.e., 24 CAD</value>
  <name>get the value of <name>, i.e., amazon corp</name>
</company>
XSLT process model
toXML.xsl:
<xsl:template match="/">
  <companies>
    <xsl:apply-templates select="stocks/stock"/>
  </companies>
</xsl:template>
<xsl:template match="stock">
  <company>
    <value><xsl:value-of select="price*1.5"/> CAD</value>
    <name><xsl:value-of select="name"/></name>
  </company>
</xsl:template>
How the xslt output is built:
• Apply template 1 (match="/") to the document root: it writes <companies> and applies templates to each stocks/stock element.
  – Apply template 2 to <stock> 1: <company><value>get the value of <price>*1.5, i.e., 24 CAD</value><name>get the value of <name>, i.e., amazon corp</name></company>
  – Apply template 2 to <stock> 2: <company><value>get the value of <price>*1.5, i.e., 153 CAD</value><name>get the value of <name>, i.e., IBM inc</name></company>
• Template 1 then closes </companies>.
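The toXML.xsl templates can be run from Java with the JDK's built-in XSLT processor (javax.xml.transform). A minimal sketch, assuming the stylesheet and stock.xml are embedded as strings; the xsl:stylesheet wrapper is added here because the templates above are shown without it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Runs the toXML.xsl templates on stock.xml. */
final class StockToXml {

    static final String STOCK_XML =
          "<?xml version='1.0'?>"
        + "<stocks> <stock exchange='nasdaq'> <name>amazon corp</name> <symbol>amzn</symbol>"
        + " <price>16</price> </stock> <stock exchange='nyse'> <name>IBM inc</name>"
        + " <price>102</price> </stock> </stocks>";

    // Template 1 matches the root; template 2 matches each stock element.
    static final String TO_XML =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/'>"
        + "<companies><xsl:apply-templates select='stocks/stock'/></companies>"
        + "</xsl:template>"
        + "<xsl:template match='stock'>"
        + "<company>"
        + "<value><xsl:value-of select='price*1.5'/> CAD</value>"
        + "<name><xsl:value-of select='name'/></name>"
        + "</company>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet and returns the result without an XML declaration. */
    static String transform(String stylesheet, String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(stylesheet)));
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(TO_XML, STOCK_XML));
    }
}
```

The symbol element and the exchange attribute simply never reach the output, because no template copies them.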
Transforming XML to HTML
toHTML.xsl
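One possible toHTML.xsl, sketched under the assumption that the goal is an HTML table with one row per stock (the column layout is made up for illustration). It runs through the same javax.xml.transform API that a server-side setup could use instead of the Xalan command line.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Turns stock.xml into an HTML table. */
final class StockToHtml {

    static final String STOCK_XML =
          "<?xml version='1.0'?>"
        + "<stocks> <stock exchange='nasdaq'> <name>amazon corp</name> <symbol>amzn</symbol>"
        + " <price>16</price> </stock> <stock exchange='nyse'> <name>IBM inc</name>"
        + " <price>102</price> </stock> </stocks>";

    // Hypothetical toHTML.xsl: a header row, then one table row per stock element.
    static final String TO_HTML =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/'>"
        + "<html><body><table border='1'>"
        + "<tr><th>Name</th><th>Price (USD)</th></tr>"
        + "<xsl:apply-templates select='stocks/stock'/>"
        + "</table></body></html>"
        + "</xsl:template>"
        + "<xsl:template match='stock'>"
        + "<tr><td><xsl:value-of select='name'/></td><td><xsl:value-of select='price'/></td></tr>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet and returns the serialized HTML. */
    static String transform(String stylesheet, String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(TO_HTML, STOCK_XML));
    }
}
```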
Running XSLT from the client side
• The browser gets the XML and the XSLT and interprets them inside the browser.
• How do you specify the XSL stylesheet associated with the XML file?
  – <?xml-stylesheet type="text/xsl" href="stock.xsl"?>
• Advantages:
  – Easy to develop and deploy.
• Disadvantages:
  – Not every browser supports XML+XSL;
  – Browsers do not support all XSLT features;
  – Not secure: the whole XML file is sent to the client, even if you only want to show part of the data;
  – Not efficient.
[Figure: the web server sends the XML and the XSLT to the browser.]
Run XSLT from the server side
• The XSL processor transforms the XML, using the XSLT stylesheet, into HTML, and the web server sends the HTML to the browser.
• Popular tool: Xalan
  java -classpath xalan/bin/xalan.jar org.apache.xalan.xslt.Process -in stock.xml -xsl stock.xsl -out stock.html
[Figure: on the web server, an XSL processor produces HTML, which is sent to the browser.]
Why XML is useful
• Data exchange
• Data integration
Why XML is useful (cont.)
• Present to different devices
XML references
• For XML and related specifications: www.w3c.org
• For Java support for XML, such as XML parsers and XSLT processors: www.apache.org
• For XML technologies: www.xml.com
• XML integrated development environment: www.xmlspy.com