1 XML DTD & XML Schema Monica Farrow G30 email : [email protected].


1

XML

DTD & XML Schema

Monica Farrow G30 email : [email protected]

2

A Complete XML Document

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE addresses SYSTEM "http://www.addbook.com/addresses.dtd">
<addresses>
  <person ssno="123 4589">
    <name>Lisa Simpson</name>
    <tel> 0131-828 1234 </tel>
    <tel> 078-4701 7775 </tel>
    <email> [email protected] </email>
  </person>
</addresses>

Required

Optional

Link to document defining the XML elements

3

Defining the structure of an XML file

We can check whether an XML file is well-formed by looking at it, or by loading it into a browser.

If it is well-formed, it will be displayed.

However, how can we check that the well-formed file contains the correct elements in the correct quantities? We need to write a specification for the XML file.
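As a rough illustration of the well-formedness check, here is a minimal Python sketch using the standard library parser. The helper name is_well_formed and the sample strings are made up for this example. Note that this only tests well-formedness: it says nothing about whether the right elements appear in the right quantities.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical sample documents, loosely based on the address book.
good = "<addresses><person><name>Lisa Simpson</name></person></addresses>"
bad = "<addresses><person><name>Lisa Simpson</person></addresses>"  # <name> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```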

4

Defining the structure of an XML file

There are 2 main alternatives:
  Document Type Definitions (DTDs): original and simple
  XML Schema: more versatile and complex

We will look at both, concentrating on XML Schema.

5

Example: An Address Book

<person ssn="4444">
  <name> Homer Simpson </name>
  <tel> 2543 </tel>
  <tel> 2544 </tel>
  <email> [email protected] </email>
</person>

Up to 4 tel nos

Optionally one email

Exactly one name

An attribute

One or more persons

6

DTD - Specifying the Structure

In a DTD, we can specify the permitted content for each element using regular expressions, which describe the pattern of child elements.

For a person element, the regular expression is name, title?, tel*, email+

7

What’s in a person Element?

This means:
  name = there must be a name element
  title? = there is an optional title element (i.e., 0 or 1 title elements)
  name, title? = the name element is followed by an optional title element
  tel* = there are 0 or more tel elements
  email+ = there are 1 or more email elements
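To see why this is called a regular expression, the content model can be rewritten as an ordinary regex over the list of child tag names. The following Python sketch is only an illustration of the idea, not how a real DTD validator works. The names PERSON_MODEL and person_matches_model and the sample data are assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET

# "name, title?, tel*, email+" rewritten as a Python regex over a
# space-terminated list of child tag names.
PERSON_MODEL = re.compile(r"name (title )?(tel )*(email )+")

def person_matches_model(person_xml):
    """Check the order and number of a person's children against the model."""
    person = ET.fromstring(person_xml)
    child_tags = "".join(child.tag + " " for child in person)
    return PERSON_MODEL.fullmatch(child_tags) is not None

# Hypothetical sample data.
ok = ("<person><name>Homer</name><tel>2543</tel><tel>2544</tel>"
      "<email>homer@example.org</email></person>")
no_email = "<person><name>Homer</name><tel>2543</tel></person>"
wrong_order = ("<person><tel>2543</tel><name>Homer</name>"
               "<email>homer@example.org</email></person>")

print(person_matches_model(ok))
print(person_matches_model(no_email))
print(person_matches_model(wrong_order))
```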

8

DTD For the Address Book

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE addressbook [
  <!ELEMENT addressbook (person*)>
  <!ELEMENT person (name, title?, tel*, email+)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT tel (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ATTLIST person ssn CDATA #REQUIRED>
]>

PCDATA means parsed character data

Regular expressions

9

Attributes in a DTD

XML elements can have attributes. General syntax for a DTD:

<!ATTLIST element-name
    attribute-name1 type1 default-value1
    …
    attribute-nameN typeN default-valueN>

Example: <!ATTLIST person ssn CDATA #REQUIRED>

CDATA means character data. The default value could be #REQUIRED or #IMPLIED (meaning optional).

10

Connecting a Document with its DTD

A DTD can be internal (part of the document file):

<?xml version="1.0"?>
<!DOCTYPE db [<!ELEMENT ...> … ]>
<db> ... </db>

Or external (the DTD and the document are in different files).

A DTD from the local file system:
<!DOCTYPE db SYSTEM "schema.dtd">

A DTD from a remote file system:
<!DOCTYPE db SYSTEM "http://www.schemaauthority.com/schema.dtd">

11

Valid Documents

A document with a DTD is valid if it conforms to the DTD, i.e., the document conforms to the regular-expression grammar, the types of attributes are correct, and constraints on references are satisfied.

12

Problems with DTDs

DTDs are rather weak specifications by DB & programming-language standards.

Some limitations:
  Only one base type: PCDATA
  Also no constraints, e.g. range of values, frequency of occurrence
  Not easily parsed (since they are not XML)
  Not easy to express that element a has exactly the children c, d, e in any order
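The last limitation can be made concrete. A DTD content model has no "any order" operator, so every ordering has to be spelled out, and the number of orderings grows as n! with the number of children. The Python sketch below generates such a model. The function name any_order_model is made up for this example, and the alternatives are nested so that each choice starts with a different element.

```python
def any_order_model(children):
    """Build a DTD content model allowing each child exactly once, in any order."""
    if len(children) == 1:
        return children[0]
    alternatives = []
    for i, first in enumerate(children):
        # Pick one child to come first, then recurse on the remaining ones.
        rest = children[:i] + children[i + 1:]
        alternatives.append("(" + first + ", " + any_order_model(rest) + ")")
    return "(" + " | ".join(alternatives) + ")"

model = any_order_model(["c", "d", "e"])
print("<!ELEMENT a " + model + ">")
```

Even for three children the declaration is already hard to read, which is the point the slide makes.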

13

XML Schema

DTDs are now being superseded by XML schemas. XML schemas provide the following features:
  XML syntax
    So schemas can be parsed and validated with standard XML tools
  Data types other than #PCDATA
    There are built-in types such as integer, float, boolean, string and many others
  Greater control over permitted constructs
    Can specify maximum and minimum occurrences
    Can use regular expressions to set patterns to be matched
  Support for modularity and inheritance

14

XML Schema continued

XML Schemas are more precise, and therefore more complicated, than DTDs.

They were designed to replace DTDs, but DTDs are very well established, and simpler.

See http://www.w3schools.com/schema

15

Schema types

There are some basic built-in types such as xs:string, xs:decimal, xs:integer, xs:ID

Each element has either a simple type or a complex type. A complex type is often a sequence of elements.

The content of the type can be declared inline, as shown in the following example. A type can also be declared, named and referred to.

Notice the use of minOccurs and maxOccurs. Their default is 1.

16

Simple Schema Example

<?xml version="1.0" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="people">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" maxOccurs="unbounded">
          ... details of the person element (continued on a later slide) ...
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

standard stuff

Top-level element

Namespace

17

Namespaces

You’ll see namespaces when using XML schemas and stylesheets.

There is a namespace associated with the tags used in each that lets them be used unambiguously, e.g. a schema element vs. a chemical element.

A namespace is identified by:
  A short prefix, e.g. xs
  A unique URL

18

Namespace declaration

So at the start of a document we must specify what namespaces we are using.

In the schema example, we are using the XML schema namespace with the xs prefix

We declare this namespace in an attribute in the top-level element:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

We then use the xs prefix in all the XML Schema elements, e.g. complexType, sequence, element etc.
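The role of the prefix can be seen by parsing a small schema fragment. In the Python sketch below, which uses the standard library, the parser replaces the prefix with the full namespace URL, so a document using a different prefix (xsd here, chosen only for illustration) for the same URL gives the same element names.

```python
import xml.etree.ElementTree as ET

with_xs = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="people"/>
</xs:schema>"""

# Same namespace URL, different (arbitrary) prefix.
with_xsd = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="people"/>
</xsd:schema>"""

root_xs = ET.fromstring(with_xs)
root_xsd = ET.fromstring(with_xsd)

# ElementTree writes qualified names as "{namespace-url}local-name".
print(root_xs.tag)
print(root_xs.tag == root_xsd.tag)

XS = "{http://www.w3.org/2001/XMLSchema}"
print(root_xs.find(XS + "element").get("name"))
```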

19

Schema Example Continued

Details of the person element:

<xs:element name="person" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="tel" type="xs:string"/>
      <xs:element name="email" type="xs:string" minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
    <xs:attribute name="sssNo" type="xs:integer" use="required"/>
  </xs:complexType>
</xs:element>

A person is a complex type which is a sequence of elements and an attribute

Empty element
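To make the effect of minOccurs and maxOccurs concrete, the Python sketch below checks the children of a person element against the sequence by hand. It is not a schema validator: the table PERSON_SEQUENCE is copied manually from the schema above, the function name and sample data are made up, and the sssNo attribute and the element types are not checked.

```python
import xml.etree.ElementTree as ET

# (tag, minOccurs, maxOccurs) in sequence order; name and tel use the default of 1.
PERSON_SEQUENCE = [("name", 1, 1), ("tel", 1, 1), ("email", 0, 1)]

def matches_sequence(element, sequence):
    """Check that the children appear in order, within the occurrence bounds."""
    tags = [child.tag for child in element]
    pos = 0
    for tag, min_occurs, max_occurs in sequence:
        count = 0
        while pos < len(tags) and tags[pos] == tag and count < max_occurs:
            pos += 1
            count += 1
        if count < min_occurs:
            return False
    return pos == len(tags)  # no unexpected children left over

# Hypothetical sample data.
no_email = ET.fromstring(
    '<person sssNo="4444"><name>Homer Simpson</name><tel>2543</tel></person>')
two_tels = ET.fromstring(
    '<person sssNo="4444"><name>Homer Simpson</name><tel>2543</tel><tel>2544</tel></person>')

print(matches_sequence(no_email, PERSON_SEQUENCE))
print(matches_sequence(two_tels, PERSON_SEQUENCE))
```

The second person is rejected because tel has no maxOccurs in the schema, so at most one is allowed.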

20

Exercise 1

Create a schema for the holiday house example. Each home has an id, a name and a location. Additionally, each home has between one and three sets of contact details. Contact details consist of a name and a phone number, and optionally an email address and website.

21

Restrictions on elements

You can also restrict the values of the data to:

a range
<xs:minInclusive value="0"/>
<xs:maxInclusive value="120"/>

an enumerated list
<xs:enumeration value="Audi"/>
<xs:enumeration value="Golf"/>
<xs:enumeration value="BMW"/>

a pattern
<xs:pattern value="([a-z])*"/>
Means 0 or more lowercase alphabetic chars
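The three kinds of restriction correspond to checks that are easy to write down directly. The Python sketch below mirrors them using the values from this slide. The function names are made up, and this only illustrates what the facets mean: a real validator applies them from the schema. An XML Schema pattern has to match the whole value, which is why fullmatch is used, and XML Schema regex syntax differs from Python's in general, although this particular pattern reads the same in both.

```python
import re

def in_range(value, min_inclusive=0, max_inclusive=120):
    """Mirror of minInclusive / maxInclusive: both bounds are allowed values."""
    return min_inclusive <= value <= max_inclusive

def in_enumeration(value, allowed=("Audi", "Golf", "BMW")):
    """Mirror of an enumeration: the value must be one of the listed strings."""
    return value in allowed

def matches_pattern(value, pattern=r"([a-z])*"):
    """Mirror of a pattern: the entire value must match."""
    return re.fullmatch(pattern, value) is not None

print(in_range(120), in_range(121))
print(in_enumeration("Golf"), in_enumeration("Ford"))
print(matches_pattern("abc"), matches_pattern("Abc"))
```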

22

Declaring your own types

Named types can be used for elements or attributes. Here’s an example which specifies restrictions on the attribute.

A named type is declared:
<xs:simpleType name="ssstype">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="0"/>
  </xs:restriction>
</xs:simpleType>

And used as the attribute type:
<xs:attribute name="sssNo" type="ssstype" use="required"/>

23

More complex Schemas

The previous example shows a simple schema. It is also possible to make the schema easier to maintain:
  By declaring all the simple elements first and then referring to them in the body of the document
  By naming the declarations of simple and complex types, which can then be used later in the document, and more than once if necessary

See http://www.w3schools.com/Schema/schema_example.asp if you are interested

24

Referring to a schema

Save your schema in a file with the extension xsd.

Linking a schema definition with a document is done using a special attribute of the root node of the document:

<people
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="people.xsd">
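A tool that wants to find the schema for a document reads this attribute from the root element. The Python sketch below shows one way to do that with the standard library. The sample document is hypothetical (the root element is closed here so that it parses), and the snippet only reads the attribute, it does not load or apply the schema.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

doc = """<people
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="people.xsd">
</people>"""

root = ET.fromstring(doc)
# Namespaced attributes are stored under "{namespace-url}local-name".
location = root.get("{" + XSI + "}noNamespaceSchemaLocation")
print(location)
```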

25

Validating

Validators:

http://www.w3.org/2001/03/webdata/xsv
  I don’t seem to be able to revalidate with the same filenames

http://tools.decisionsoft.com/schemaValidate/
  No problems, nicer layout

Others are also on the web

26

XML: Summary

XML lets you choose application-specific element names and define special-purpose document types.

You need a document type definition or schema to define the allowed markup.

What can we do with our valid document? (next 2 lectures)

27

Exercise 2

Alter the schema given in the lecture notes so that there must be between 1 and 4 tel numbers, which must be in the range 1000 to 9999.

Create a simple type for tel.