2
A Complete XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE addresses SYSTEM "http://www.addbook.com/addresses.dtd">
<addresses>
  <person ssno="123 4589">
    <name>Lisa Simpson</name>
    <tel> 0131-828 1234 </tel>
    <tel> 078-4701 7775 </tel>
    <email> [email protected] </email>
  </person>
</addresses>
Required
Optional
Link to document defining the XML elements
3
Defining the structure of an XML file
We can check if an XML file is well-formed
  by looking at it
  or by loading it into a browser: if it is well-formed, it will be displayed
However, how can we check that a well-formed file contains the correct elements in the correct quantities?
We need to write a specification for the XML file
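Well-formedness can also be checked programmatically rather than by eye. The following is a minimal sketch using Python's standard xml.etree.ElementTree parser; the helper name is_well_formed and the two sample strings are illustrative assumptions, not part of the lecture material. Note that it only checks well-formedness, not whether the right elements are present.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents for illustration.
good = "<addresses><person><name>Lisa Simpson</name></person></addresses>"
bad = "<addresses><person><name>Lisa Simpson</person></addresses>"  # <name> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

The parser rejects the second string because of the mismatched tags, but it would happily accept a document with ten name elements per person; that is the gap a DTD or schema fills.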
4
Defining the structure of an XML file
There are 2 main alternatives
  Document Type Definitions (DTDs)
    Original and simple
  XML Schema
    More versatile and complex
We will look at both, concentrating on XML Schema
5
Example: An Address Book
<person ssn="4444">
  <name> Homer Simpson </name>
  <tel> 2543 </tel>
  <tel> 2544 </tel>
  <email> [email protected] </email>
</person>
One or more persons
Each person has
  exactly one name
  up to 4 tel nos
  optionally one email
  an attribute (ssn)
6
DTD - Specifying the Structure
In a DTD, we can specify the permitted content for each element using regular expressions
  The regular expression describes the pattern of child elements
For a person element, the regular expression is name, title?, tel*, email+
7
What’s in a person Element?
This means
  name = there must be a name element
  title? = there is an optional title element (i.e., 0 or 1 title elements)
  name, title? = the name element is followed by an optional title element
  tel* = there are 0 or more tel elements
  email+ = there are 1 or more email elements
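To see why the content model really is a regular expression, it can be written as one over the sequence of child tag names. This is only a sketch: the encoding of tags as a space-terminated string, the name PERSON_MODEL, and the sample person fragments (including the example.org address) are assumptions made for illustration, and a real DTD validator does much more than this.

```python
import re
import xml.etree.ElementTree as ET

# "name, title?, tel*, email+" rewritten as a regex over child tag names,
# where each tag name is followed by a single space (illustrative encoding).
PERSON_MODEL = re.compile(r"name (title )?(tel )*(email )+")


def matches_person_model(person_xml):
    """Check the order and number of a person's children against the model."""
    person = ET.fromstring(person_xml)
    child_tags = "".join(child.tag + " " for child in person)
    return PERSON_MODEL.fullmatch(child_tags) is not None


# Hypothetical person elements for illustration.
ok = ("<person><name>Homer</name><tel>2543</tel><tel>2544</tel>"
      "<email>homer@example.org</email></person>")
no_email = "<person><name>Homer</name><tel>2543</tel></person>"
wrong_order = ("<person><tel>2543</tel><name>Homer</name>"
               "<email>homer@example.org</email></person>")

print(matches_person_model(ok))
print(matches_person_model(no_email))
print(matches_person_model(wrong_order))
```

Only the first fragment satisfies the model: the second has no email element (email+ needs at least one) and the third puts tel before name.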
8
DTD For the Address Book
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE addressbook [
  <!ELEMENT addressbook (person*)>
  <!ELEMENT person (name, title?, tel*, email+)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT tel (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ATTLIST person ssn CDATA #REQUIRED>
]>
PCDATA means parsed character data
The content models in parentheses, e.g. (name, title?, tel*, email+), are the regular expressions
9
Attributes in a DTD
XML elements can have attributes.
General syntax for DTD:
<!ATTLIST element-name
  attribute-name1 type1 default-value1
  ….
  attribute-nameN typeN default-valueN>
Example: <!ATTLIST person ssn CDATA #REQUIRED>
CDATA means character data
The default value could be #REQUIRED or #IMPLIED (meaning optional)
10
Connecting a Document with its DTD
A DTD can be internal (part of the document file)
<?xml version="1.0"?>
<!DOCTYPE db [<!ELEMENT ...> … ]>
<db> ... </db>
Or external (the DTD and the document are in different files)
A DTD from the local file system:
<!DOCTYPE db SYSTEM "schema.dtd">
A DTD from a remote file system:
<!DOCTYPE db SYSTEM "http://www.schemaauthority.com/schema.dtd">
11
Valid Documents
A document with a DTD is valid if it conforms to the DTD, i.e.,
  the document conforms to the regular-expression grammar,
  types of attributes are correct,
  and constraints on references are satisfied
12
Problems with DTDs
DTDs are rather weak specifications by DB & programming-language standards
Some limitations:
  Only one base type: PCDATA
  No constraints, e.g. on the range of values or the exact frequency of occurrence (beyond ?, * and +)
  Not easily parsed (since they are not XML)
  Not easy to express that element a has exactly the children c, d, e in any order
13
XML Schema
DTDs are now being superseded by XML Schemas. XML Schemas provide the following features
  XML syntax
    So they can be parsed and validated with standard XML tools
  Data types other than #PCDATA
    There are built-in types such as integer, float, boolean, string and many others
  Greater control over permitted constructs
    Can specify maximum and minimum occurrences
    Can use regular expressions to set patterns to be matched
  Support for modularity and inheritance
14
XML Schema continued
XML Schemas are more precise, and therefore more complicated, than DTDs
They were designed to replace DTDs, but DTDs are very well established, and simpler
http://www.w3schools.com/schema
15
Schema types
There are some basic built-in types such as xs:string, xs:decimal, xs:integer, xs:ID
Each element is composed of either simple types or complex types. A complex type is often a sequence of elements
The content of the type can be declared as shown in the following example. A type can also be declared, named and referred to.
Notice the use of minOccurs and maxOccurs. Their default is 1.
16
Simple Schema Example
<?xml version="1.0" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="people">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" maxOccurs="unbounded">
          <!-- details of the person element: see Schema Example Continued -->
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
The XML declaration and the xs:schema element are standard stuff
xmlns:xs="http://www.w3.org/2001/XMLSchema" declares the namespace
people is the top-level element
17
Namespaces
You’ll see namespaces when using XML schemas and stylesheets.
There is a namespace associated with the tags used in each, which lets them be used unambiguously, e.g. a schema element vs. a chemical element
A namespace is identified by
  a short prefix, e.g. xs
  a unique URL
18
Namespace declaration
So at the start of a document we must specify what namespaces we are using.
In the schema example, we are using the XML schema namespace with the xs prefix
We declare this namespace in an attribute in the top-level element
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
We then use the xs prefix in all the XML Schema elements e.g. complexType, sequence, element etc
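One way to see that the URL, not the prefix, is what identifies the namespace is to parse the same schema fragment with two different prefixes. The sketch below uses Python's standard xml.etree.ElementTree, which expands each prefixed tag to {namespace-URL}local-name; the second prefix (xsd) and the variable names are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# The same fragment written with two different prefixes (illustrative).
with_xs = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="people"/>
</xs:schema>
"""
with_xsd = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="people"/>
</xsd:schema>
"""

root_xs = ET.fromstring(with_xs)
root_xsd = ET.fromstring(with_xsd)

# ElementTree stores tags as "{namespace-URL}local-name", so the prefix disappears.
print(root_xs.tag)
print(root_xs.tag == root_xsd.tag)

# Lookups are done by namespace URL; the prefix in the map is just a local alias.
element = root_xsd.find("s:element", {"s": XS})
print(element.get("name"))
```

Both documents produce the same tag names after parsing, which is why the choice of xs as the prefix is a convention rather than a requirement.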
19
Schema Example Continued
Details of the person element
<xs:element name="person" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="tel" type="xs:string"/>
      <xs:element name="email" type="xs:string" minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
    <xs:attribute name="sssNo" type="xs:integer" use="required"/>
  </xs:complexType>
</xs:element>
A person is a complex type which is a sequence of elements and an attribute
The declarations ending in /> (name, tel, email, sssNo) are empty elements
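The effect of minOccurs and maxOccurs in the person declaration can be illustrated with a small hand-written check. This is a rough sketch, not a schema validator: the BOUNDS table is copied by hand from the declaration above (with the default of 1 filled in where minOccurs or maxOccurs is omitted), the function name occurrence_errors is hypothetical, and the check ignores element order and the sssNo attribute.

```python
import xml.etree.ElementTree as ET

# (minOccurs, maxOccurs) per child element, transcribed from the schema above.
# name and tel omit both attributes, so both default to 1.
BOUNDS = {"name": (1, 1), "tel": (1, 1), "email": (0, 1)}


def occurrence_errors(person_xml):
    """Return a list of messages for children that occur too rarely or too often."""
    person = ET.fromstring(person_xml)
    counts = {tag: 0 for tag in BOUNDS}
    errors = []
    for child in person:
        if child.tag in counts:
            counts[child.tag] += 1
        else:
            errors.append("unexpected element: " + child.tag)
    for tag, (low, high) in BOUNDS.items():
        if not low <= counts[tag] <= high:
            errors.append(
                "{}: found {}, expected {}..{}".format(tag, counts[tag], low, high)
            )
    return errors


# Hypothetical person elements for illustration.
one_tel = '<person sssNo="1"><name>Homer</name><tel>2543</tel></person>'
two_tels = '<person sssNo="1"><name>Homer</name><tel>2543</tel><tel>2544</tel></person>'

print(occurrence_errors(one_tel))
print(occurrence_errors(two_tels))
```

With the declaration as written, a person with two tel elements is rejected, because tel has no maxOccurs and so is limited to exactly one occurrence.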
20
Exercise 1
Create a schema for the holiday house example.
Each home has an id, a name and a location.
Additionally, each home has between one and three sets of contact details.
Contact details consist of a name and a phone number, and optionally an email address and website.
21
Restrictions on elements
You can also restrict the values of the data
  to a range
    <xs:minInclusive value="0"/>
    <xs:maxInclusive value="120"/>
  to an enumerated list
    <xs:enumeration value="Audi"/>
    <xs:enumeration value="Golf"/>
    <xs:enumeration value="BMW"/>
  to a pattern
    <xs:pattern value="([a-z])*"/>
    Means 0 or more lowercase alphabetic chars
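The three kinds of restriction correspond to checks that are easy to write by hand, which may help in reading them. The sketch below mirrors the facet values above in plain Python; the function names are hypothetical. One assumption worth stating: an xs:pattern must match the whole value, so re.fullmatch is used rather than re.search.

```python
import re

# Enumeration values and pattern copied from the restrictions above.
CARS = {"Audi", "Golf", "BMW"}
LOWERCASE = re.compile(r"([a-z])*")


def in_range(value):
    """minInclusive=0 and maxInclusive=120: both bounds are allowed values."""
    return 0 <= value <= 120


def in_enumeration(value):
    """The value must be exactly one of the listed strings."""
    return value in CARS


def matches_pattern(value):
    """The whole value must consist of 0 or more lowercase letters."""
    return LOWERCASE.fullmatch(value) is not None


print(in_range(120), in_range(121))
print(in_enumeration("Golf"), in_enumeration("golf"))
print(matches_pattern("abc"), matches_pattern(""), matches_pattern("Abc"))
```

Note that the empty string satisfies the pattern, since * allows zero repetitions, and that enumeration values are case-sensitive.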
22
Declaring your own types
Named types can be used for elements or attributes.
Here’s an example which specifies restrictions on the attribute
A named type is declared
<xs:simpleType name="ssstype">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="0"/>
  </xs:restriction>
</xs:simpleType>
And used as the attribute type
<xs:attribute name="sssNo" type="ssstype" use="required"/>
23
More complex Schemas
The previous example shows a simple schema.
It is also possible to make the schema easier to maintain
  by declaring all the simple elements first and then referring to them in the body of the document
  by naming the declarations of simple and complex types, which can then be used later in the document, more than once if necessary
See http://www.w3schools.com/Schema/schema_example.asp if you are interested
24
Referring to a schema
Save your schema in a file with the extension .xsd.
Linking a schema definition with a document is done using a special attribute of the root node of the document:
<people xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="people.xsd">
25
Validating
Validators
  http://www.w3.org/2001/03/webdata/xsv
    I don’t seem to be able to revalidate with the same filenames
  http://tools.decisionsoft.com/schemaValidate/
    No problems, nicer layout
  Others also on the web
26
XML: Summary
XML lets you choose application specific element names and define special purpose document types.
We need a document type definition or a schema to define the allowed markup.
What can we do with our valid document? – next 2 lectures