e-Science e-Business e-Government and their Technologies
XML Schema

Bryan Carpenter, Geoffrey Fox, Marlon Pierce
Pervasive Technology Laboratories
Indiana University, Bloomington IN 47404
January 12 2004

http://www.grid2004.org/spring2004

Introduction

We saw that DTDs provide an approach to validating XML documents: ensuring they have the structure expected for a particular application.

With the increasing use of XML for data-centric applications—e.g. XML formats for messages exchanged by Web Services—limitations of DTDs (which were inherited from SGML) soon became apparent.

XML Schema is a more recent validation framework for XML, which attempts to address the shortcomings of DTDs for data-centric applications, for example by providing a much richer set of data types.

Problems with DTDs

DTDs have some clear limitations:

• Restricted set of data types: attribute data is either general character data, name tokens, ID or IDREF (or arcane cases); element content is either general character data or nested elements or some mixture.

For data-centric applications, we might want a value to be a well-formed number, date, etc, etc.

• DTDs are not convenient for dealing with XML Namespaces—essential for modularity on the Web.

• The uniqueness and consistency requirements associated with ID, IDREF are powerful, but could be much more refined.

• There are various obscure constraints on element content specifications, needed purely for historical SGML compatibility.

XML Schema

XML Schema addresses all the issues mentioned on the previous slide.
• It also has the interesting property that an XML Schema is itself a well-formed XML document; some people consider this a significant advantage.

This is the good news. The less good news is that the XML Schema 1.0 specification is longer by almost an order of magnitude than the basic XML specification—DTDs and all.

General Comparison: DTDs vs XML Schema

• DTDs: A DTD defines all elements, etc, in one type of document.
  XML Schema: A schema defines all elements, etc, in a single namespace.

• DTDs: For documents with multiple namespaces, somehow patch together one large DTD.
  XML Schema: For documents with multiple namespaces, use multiple schemas.

• DTDs: Directly define structures of named elements.
  XML Schema: Define structures of complex types of element; then declare named elements of that type.

• DTDs: Limited built-in data types for attributes.
  XML Schema: Extensive built-in simple types for attributes and element content.

• DTDs: Complex entity substitution mechanism.
  XML Schema: No entity substitution mechanism.

Reading Material

The XML Schema Specification itself comes in parts 0, 1, and 2. Parts 1 and 2 are long and tough to read, but part 0 is a reasonable (“non-normative”) introduction: XML Schema Part 0: Primer, May 2001. http://www.w3.org/TR/xmlschema-0/

There are some good and bad books. A good one is: Definitive XML Schema, Priscilla Walmsley, Prentice Hall, 2002.

There is a comprehensive (but again rather long) tutorial introduction to XML Schema by Roger Costello at: http://www.xfront.com/

“Report” Format Revisited

When discussing DTDs we described a simple “report” format.

Here is a slightly expanded version of the DTD given there:

<!DOCTYPE report [
  <!ELEMENT report (title, (paragraph | figure)*, bibliography?) >
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT paragraph (#PCDATA)>
  <!ELEMENT figure EMPTY>
  <!ATTLIST figure source CDATA #REQUIRED >
  <!ELEMENT bibliography (reference)* >
  …
] >

We begin our detailed discussion of schema by considering how to give an equivalent XML Schema for this document.

Declaring a paragraph Element

The report schema is surprisingly long: we will build up to it in several incremental steps. First consider the paragraph element.

Using a DTD, we declared this element by:

<!ELEMENT paragraph (#PCDATA)>

An equivalent declaration in XML Schema might be:

<xsd:element name="paragraph" type="xsd:string"/>

• xsd:element is itself an element in the XML Schema namespace; this example assumes we use xsd as the prefix for that namespace.

• xsd:string is a predefined type in that namespace.

xsd:string Primitive Type

XML Schema has a complex system of types. Different types may describe:
1. the allowed values of attributes,
2. the allowed content of elements, or
3. the allowed content and the allowed attributes of elements.

There is a subset of types, called the simple types, that can be used in either of the first two roles.

One of the simplest of all is string. Used as an attribute type, this is equivalent to the DTD type CDATA; used as an element type, this is equivalent to the DTD content specification (#PCDATA).

Declaring a report Element

We initially simplify to a schema in which a report consists only of a series of paragraphs.

In a DTD a possible declaration of the root element would be:

<!ELEMENT report (paragraph)*>

An equivalent declaration in XML Schema might be:

<xsd:element name="report">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="paragraph" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

Elements with Complex Type

This rather verbose declaration says:
• The element named report has complex type.
• The content associated with this complex type is a sequence of elements.
• This sequence consists of at least 0 and at most an unbounded number of occurrences of paragraph elements.

Here the xsd:element element has different roles:
• Outermost xsd:element declares the element named report.
• Innermost xsd:element uses the element named paragraph, declared elsewhere.

The role is determined by the presence or absence of the ref attribute.

Local Declarations

In fact xsd:element can have a third role, which is considered to be a combined declaration and use, e.g.:

<xsd:element name="report">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="paragraph" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

• Here the report element has its own local declaration of paragraph; no separate global declaration is necessary.

Global vs Local Element Declarations

Declarations that occur as children of the top-level schema element are global declarations.
• These are the only declarations that can actually be “used” from elsewhere.

“Local declarations” (like the one illustrated on the previous slide) are “used” exactly once at their point of declaration.
• This is different from the concept of local declarations in most programming languages.

• Local element declarations interact with namespaces in a non-obvious way: perhaps best avoided until you are sure you know what you are doing.

Global vs Local Type Definitions

The type of the report element was specified by an xsd:complexType element nested within the element declaration.

The type of the paragraph element was specified by a type attribute on the declaration, referencing a named type.

In fact types, like elements, can always be defined locally where they are used, or defined globally, then referenced from a point of use.

The following slide illustrates yet another way to declare report.

Named Type Definitions

In this version we introduce a named complex type called reportType, then declare the report element with this type:

<xsd:complexType name="reportType">
  <xsd:sequence>
    <xsd:element ref="paragraph" minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>

<xsd:element name="report" type="reportType"/>

This abstraction facility—introducing new named types—is a central theme of XML Schema.

A Complete XML Schema

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.grid2004.org/ns/report1"
            xmlns="http://www.grid2004.org/ns/report1">

  <xsd:element name="report">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="paragraph" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <xsd:element name="paragraph" type="xsd:string"/>

</xsd:schema>

Remarks

Recall this schema is essentially equivalent to the DTD:

<!DOCTYPE report [
  <!ELEMENT report (paragraph)* >
  <!ELEMENT paragraph (#PCDATA)>
] >

Clearly the schema has more baggage (or more added value, according to your point of view!)

Our schema declares two element names, report and paragraph, and puts them in a namespace called http://www.grid2004.org/ns/report1.

Namespace Considerations

The root element of any schema is a schema element from the http://www.w3.org/2001/XMLSchema namespace.

The targetNamespace attribute on this element specifies which namespace the elements declared here “go into”.

We have seen the other namespace attributes before:
• The xmlns:xsd attribute associates the prefix xsd with the XML Schema namespace.
• The xmlns attribute makes http://www.grid2004.org/ns/report1 the default namespace for this document.
• Often one uses xsd as the prefix for schema elements, and makes the target namespace the default namespace of the schema document, but neither is essential.


An XML Instance Document

<?xml version="1.0"?>

<report xmlns="http://www.grid2004.org/ns/report1"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:schemaLocation="http://www.grid2004.org/ns/report1 report1.xsd">

<paragraph>Recently uncovered documents prove... </paragraph>

<paragraph>The author is grateful to W3C for making this research possible.</paragraph>

</report>

Namespace Considerations

Assuming the document vocabulary belongs to a namespace, we must declare this namespace.
• In this example http://www.grid2004.org/ns/report1 is declared as the default namespace.

If the instance document is to be validated against a schema, we must normally define where the schema for the namespace is located.

This is done here by putting an attribute schemaLocation on the root element of the document.

This attribute is itself defined in a standard namespace, called http://www.w3.org/2001/XMLSchema-instance. So we must introduce a prefix for this (xsi is traditional).

schemaLocation

The value of the schemaLocation attribute should be a pair of IRIs: a namespace name and the corresponding Schema URI.
• If the document uses more than one namespace, the value can be several consecutive pairs.
• All tokens are separated by white space.

In this example the schema should be in the file report1.xsd in the same directory as the instance document.

Schema Validation Using dom.Writer

If I save the instance document in a file called “xsdreport1.xml”, and the schema in a file called “report1.xsd”, I can validate the file with the Xerces parser by using the dom.Writer sample application as follows:

> java dom.Writer -v -s -f xsdreport1.xml

• If validation is successful, this simply prints a formatted version of the input file. If schema validation fails, you will see error messages early in the output.

• The -v -s flags are needed here. Without -s the parser will try to do just DTD validation. -f means “full” schema validation, presumably a good thing.

Schema Validation from Java

Unfortunately it doesn’t seem to be possible to enable XML Schema validation in Xerces using the “vendor-neutral” JAXP API.
• The DOM Level 3 API will enable this, but it is not finalized or fully deployed at the time of this writing.

For now you must directly use the “proprietary” org.apache.xerces.parsers.DOMParser Xerces implementation class.

Use is sketched on the next slide.

The Xerces DOMParser API

import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.*;
…
static final String VALIDATION_FEATURE_ID =
    "http://xml.org/sax/features/validation";
static final String SCHEMA_VALIDATION_FEATURE_ID =
    "http://apache.org/xml/features/validation/schema";
static final String SCHEMA_FULL_CHECKING_FEATURE_ID =
    "http://apache.org/xml/features/validation/schema-full-checking";
…
DOMParser parser = new DOMParser();

// Turn Schema Validation on
parser.setFeature(VALIDATION_FEATURE_ID, true);
parser.setFeature(SCHEMA_VALIDATION_FEATURE_ID, true);
parser.setFeature(SCHEMA_FULL_CHECKING_FEATURE_ID, true);

parser.setErrorHandler(new MyErrorHandler());

parser.parse(uri);  // uri is XML instance file

Document document = parser.getDocument();
…
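Since these slides were written, JAXP 1.3 (bundled with Java 5 and later) has added the javax.xml.validation package, which gives a vendor-neutral route to schema validation. The following is only a minimal sketch under that assumption, not code from the course: the class name ReportValidator is hypothetical, and the report1 schema and a small instance document are inlined as strings so the example is self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class ReportValidator {
    static final String NS = "http://www.grid2004.org/ns/report1";

    // The complete report1 schema from the earlier slide, inlined as a string.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
      + "    targetNamespace='" + NS + "' xmlns='" + NS + "'>"
      + "  <xsd:element name='report'>"
      + "    <xsd:complexType><xsd:sequence>"
      + "      <xsd:element ref='paragraph' minOccurs='0' maxOccurs='unbounded'/>"
      + "    </xsd:sequence></xsd:complexType>"
      + "  </xsd:element>"
      + "  <xsd:element name='paragraph' type='xsd:string'/>"
      + "</xsd:schema>";

    // Returns true if the instance validates; schema or I/O problems are rethrown.
    static boolean isValid(String instanceXml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            try {
                schema.newValidator()
                      .validate(new StreamSource(new StringReader(instanceXml)));
                return true;
            } catch (SAXException validationError) {
                return false;  // the default error handler throws on the first error
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("schema or input could not be read", e);
        }
    }

    public static void main(String[] args) {
        String good = "<report xmlns='" + NS + "'>"
                    + "<paragraph>Recently uncovered documents prove...</paragraph>"
                    + "</report>";
        String bad = "<report xmlns='" + NS + "'><figure/></report>";
        System.out.println("good: " + isValid(good));
        System.out.println("bad:  " + isValid(bad));
    }
}
```

Because the schema is handed to the validator directly, the instance here does not need an xsi:schemaLocation attribute; with the Xerces feature flags above, the parser instead locates the schema from that hint.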

More on Complex Types

If an element may have nested elements, or if it may have attributes, it must be described by a complex type.
• If neither of these conditions holds (the element has only character data content and no attributes) it is usually more convenient to use a simple type.

Attributes on complex types are specified by an attribute element, e.g.:

<xsd:element name="figure">

<xsd:complexType>

<xsd:attribute name="source" type="xsd:string"/>

</xsd:complexType>

</xsd:element>

Attribute Declarations

Like element declarations, attributes may be declared globally, then used inside a complex type declaration, through an xsd:attribute element with a ref attribute.
• In contrast to the situation with elements, local declaration of attributes is often a natural choice.

The figure example above has a complex type with no content. In general attribute specifications go after the content specification, in the body of the xsd:complexType element.

Element Sequences and Choices

To finish this introductory foray into XML Schema, we restore our report element back to its original specification. The XML Schema declaration is given on the next slide.

Recall this is supposed to be equivalent to the DTD declaration:

<!ELEMENT report (title, (paragraph | figure)*, bibliography?) >

• The use of the xsd:sequence and xsd:choice elements should be reasonably self explanatory.

• Note how the minOccurs, maxOccurs attributes replace use of the *, ? operators: both have default values of 1.

Original report Element Structure

<xsd:element name="report">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="title"/>
      <xsd:choice minOccurs="0" maxOccurs="unbounded">
        <xsd:element ref="paragraph"/>
        <xsd:element ref="figure"/>
      </xsd:choice>
      <xsd:element ref="bibliography" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
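To see the sequence and choice rules in action, the declaration above can be exercised from Java. This is a rough sketch rather than part of the original slides: it assumes the javax.xml.validation API of later JAXP releases, uses no target namespace for brevity, and invents simple declarations for title, figure and bibliography (the bibliography content is elided in the DTD, so it is treated here as plain text purely for illustration). The class name FullReportCheck is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class FullReportCheck {
    // report follows the slide; the other four declarations are illustrative assumptions.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "  <xsd:element name='report'>"
      + "    <xsd:complexType><xsd:sequence>"
      + "      <xsd:element ref='title'/>"
      + "      <xsd:choice minOccurs='0' maxOccurs='unbounded'>"
      + "        <xsd:element ref='paragraph'/>"
      + "        <xsd:element ref='figure'/>"
      + "      </xsd:choice>"
      + "      <xsd:element ref='bibliography' minOccurs='0'/>"
      + "    </xsd:sequence></xsd:complexType>"
      + "  </xsd:element>"
      + "  <xsd:element name='title' type='xsd:string'/>"
      + "  <xsd:element name='paragraph' type='xsd:string'/>"
      + "  <xsd:element name='figure'>"
      + "    <xsd:complexType>"
      + "      <xsd:attribute name='source' type='xsd:string' use='required'/>"
      + "    </xsd:complexType>"
      + "  </xsd:element>"
      + "  <xsd:element name='bibliography' type='xsd:string'/>"
      + "</xsd:schema>";

    // Returns true if the instance validates; schema or I/O problems are rethrown.
    static boolean isValid(String instanceXml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            try {
                schema.newValidator()
                      .validate(new StreamSource(new StringReader(instanceXml)));
                return true;
            } catch (SAXException validationError) {
                return false;
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("schema or input could not be read", e);
        }
    }

    public static void main(String[] args) {
        // Title first, then any mix of paragraphs and figures, then an optional bibliography.
        System.out.println(isValid(
            "<report><title>T</title><paragraph>p</paragraph>"
          + "<figure source='a.png'/><bibliography/></report>"));
        // Missing the mandatory title (minOccurs defaults to 1).
        System.out.println(isValid("<report><paragraph>p</paragraph></report>"));
    }
}
```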

Simple Types: Schema Datatypes

XML Schema Simple Types

Recall simple types can be used to describe the values of attributes, or the content of elements that have no nested elements (“character data” content).

So far we only illustrated one simple type built in to XML Schema: namely string.
• As an attribute type this is similar to the DTD attribute type CDATA; as an element type, it is similar to the DTD content specification (#PCDATA).

Most of the details of simple types are defined in the W3 recommendation XML Schema Part 2: Datatypes.

Built In and User-Defined Types

XML Schema provides over 40 built in simple types.

It also provides flexible mechanisms for creating your own simple types,
• which may in fact impose rather complex patterns on text content.


Schema Built In Types

Built In Simple Types

Simple Type           Examples (comma separated)
string                Confirm this is electric
normalizedString      Confirm this is electric
token                 Confirm this is electric
base64Binary          GpM7
hexBinary             0FB7
byte                  -1, 126
unsignedByte          0, 126
integer               -126789, -1, 0, 1, 126789
positiveInteger       1, 126789
negativeInteger       -126789, -1
nonNegativeInteger    0, 1, 126789
nonPositiveInteger    -126789, -1, 0
int                   -1, 126789675
unsignedInt           0, 1267896754
long                  -1, 12678967543233
unsignedLong          0, 12678967543233
short                 -1, 12678
unsignedShort         0, 12678
decimal               -1.23, 0, 123.4, 1000.00
float                 -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN
double                -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN
boolean               true, false, 1, 0
time                  13:20:00.000, 13:20:00.000-05:00
dateTime              1999-05-31T13:20:00.000-05:00
duration              P1Y2M3DT10H30M12.3S
date                  1999-05-31
gMonth                --05--
gYear                 1999
gYearMonth            1999-02
gDay                  ---31
gMonthDay             --05-31
Name                  shipTo
QName                 po:USAddress
NCName                USAddress
anyURI                http://www.example.com/, http://www.example.com/doc.html#ID5
language              en-GB, en-US, fr
ID
IDREF
IDREFS
ENTITY
ENTITIES
NOTATION
NMTOKEN               US, Brésil
NMTOKENS              US UK, Brésil Canada Mexique

Creating New Simple Types

There are three basic approaches to building new simple types (deriving simple types):
• Restricting facets of an existing simple type.
• Creating a list type from an existing simple type.
• Creating a union type from some existing simple types.

The most sophisticated mechanism is the first: restriction using facets.

Facets

The 19 primitive types (the built in types derived directly from anySimpleType) have a set of constraining facets restricting allowed values.

The constraining facets of a simple type are a subset of:
• length, minLength, maxLength, pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minExclusive, minInclusive, totalDigits, fractionDigits

• Restricted types have all the facets of their base types—though values of the facets may be different.

• There is no way for schema writers to introduce new facets—users cannot directly restrict anySimpleType.

• Technically simple types have additional fundamental facets, but values of these flags cannot be set directly. They are: equal, ordered, bounded, cardinality, numeric

Restriction

Here is a characteristic example of restriction:

<xsd:simpleType name="singleDigit">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="-9"/>
    <xsd:maxInclusive value="9"/>
  </xsd:restriction>
</xsd:simpleType>

This starts from the built in xsd:integer, and defines a derived type singleDigit by setting the facet minInclusive to -9 and the facet maxInclusive to 9.

Thus the type singleDigit represents a whole number between -9 and +9.
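A quick way to try out a restricted type is to wrap it in an element and validate a few values. The sketch below is an illustration only: it assumes the javax.xml.validation API available in later JAXP releases, and the <digit> element and the class name SingleDigitCheck are hypothetical additions used to carry the values.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SingleDigitCheck {
    // The slide's singleDigit type, plus a hypothetical <digit> element to carry values.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "  <xsd:simpleType name='singleDigit'>"
      + "    <xsd:restriction base='xsd:integer'>"
      + "      <xsd:minInclusive value='-9'/>"
      + "      <xsd:maxInclusive value='9'/>"
      + "    </xsd:restriction>"
      + "  </xsd:simpleType>"
      + "  <xsd:element name='digit' type='singleDigit'/>"
      + "</xsd:schema>";

    // True if <digit>value</digit> validates against the schema above.
    static boolean isValid(String value) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            String instance = "<digit>" + value + "</digit>";
            try {
                schema.newValidator()
                      .validate(new StreamSource(new StringReader(instance)));
                return true;
            } catch (SAXException validationError) {
                return false;
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("schema or input could not be read", e);
        }
    }

    public static void main(String[] args) {
        for (String v : new String[] {"-9", "0", "9", "10"}) {
            System.out.println(v + " -> " + isValid(v));
        }
    }
}
```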

Length

The facets length, minLength, maxLength allow one to constrain the length of an item like a string (they also constrain the number of items in a list type, see later).

Values of length, minLength, maxLength should be non-negative integers. Example:

<xsd:simpleType name="state">
  <xsd:restriction base="xsd:string">
    <xsd:length value="2"/>
  </xsd:restriction>
</xsd:simpleType>

defines a type state representing strings containing exactly two characters.
• These facets are supported by all primitive types other than boolean, numeric and date- and time-related types. They are also supported by list types.

Pattern

Perhaps the most powerful facet is pattern, which allows one to specify a regular expression: any allowed value must satisfy the pattern of this expression.

Example

<xsd:simpleType name="weekday">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="(Mon|Tues|Wednes|Thurs|Fri)day"/>
  </xsd:restriction>
</xsd:simpleType>

defines a type weekday representing the names of the week days.
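One point worth seeing concretely is that a Schema pattern must match the whole value, not just part of it. The following sketch is an illustration only: it assumes the javax.xml.validation API of later JAXP releases, and the <day> element and the class name WeekdayPatternCheck are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class WeekdayPatternCheck {
    // The slide's weekday type, plus a hypothetical <day> element to carry values.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "  <xsd:simpleType name='weekday'>"
      + "    <xsd:restriction base='xsd:string'>"
      + "      <xsd:pattern value='(Mon|Tues|Wednes|Thurs|Fri)day'/>"
      + "    </xsd:restriction>"
      + "  </xsd:simpleType>"
      + "  <xsd:element name='day' type='weekday'/>"
      + "</xsd:schema>";

    // True if <day>value</day> validates against the schema above.
    static boolean isValid(String value) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            String instance = "<day>" + value + "</day>";
            try {
                schema.newValidator()
                      .validate(new StreamSource(new StringReader(instance)));
                return true;
            } catch (SAXException validationError) {
                return false;
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("schema or input could not be read", e);
        }
    }

    public static void main(String[] args) {
        // "Mondays" contains a match but is not itself a match, so it is rejected.
        for (String v : new String[] {"Monday", "Friday", "Saturday", "Mondays"}) {
            System.out.println(v + " -> " + isValid(v));
        }
    }
}
```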

Regular Expressions

XML Schema has its own notation for regular expressions, but very much based on the corresponding Perl notation.

For the most part Schema uses a subset of the Perl 5 grammar for regular expressions.
• Includes most of the purely “declarative” features from Perl regular expressions, but omits many “procedural” features related to search, matching algorithm, substitution, etc.

XML Schema adds a few features of its own, e.g.:
• Matching characters legal in XML names.

• Character class subtraction.

• Inherits general XML escape mechanisms for Unicode characters, replacing analogous Perl mechanisms.

Metacharacters

The following characters, called metacharacters, have special roles in Schema regular expressions:

. \ ? * + | { } ( ) [ ]

• Like Perl, but treats }, ] uniformly as metacharacters, and omits search-related metacharacters ^ and $.

To match these characters literally in patterns, one must escape them with \, e.g.:
• The pattern “2\+2” matches the string “2+2”.
• The pattern “f\(x\)” matches the string “f(x)”.

Escape Sequences

In general one should use XML character references to include hard-to-type characters. But for convenience Schema regular expressions allow:
• \n matches a newline character (same as &#xA;)
• \r matches a carriage return character (same as &#xD;)
• \t matches a tab character (same as &#x9;)

All other escape sequences (except \- and \^, used only in character class expressions) match any single character out of some set of possible values.
• For example \d matches any decimal digit, so the pattern “Boeing \d\d\d” matches the strings “Boeing 747”, “Boeing 777”, etc.

Multicharacter Escapes

The simplest patterns matching classes of characters are:
• . matches any character except carriage return or newline.
• \d matches any decimal digit.
• \s matches any white space character.
• \i matches any character that can start an XML name.
• \c matches any character that can appear in an XML name.
• \w matches any “word” character (excludes punctuation, etc.)

The escapes \D, \S, \I, \C and \W are negative forms, e.g. \D matches any character except a decimal digit.
• Similar to Perl, except: Perl doesn’t have \i, \I; Perl uses \c, \C for other things; detailed definitions of \w, \W are different.

Category Escapes

A large and interesting family of escapes is based on the Unicode standard. General form in Perl or Schema is \p{Name} where Name is a Unicode-defined class name.
• The negative form \P{Name} matches any character not in the class.

Simple examples include: \p{L} (any letter), \p{Lu} (upper case letters), \p{Ll} (lower case letters), etc.

More interesting cases are based on the Unicode block names for alphabets, e.g.:
• \p{IsBasicLatin}, \p{IsLatin-1Supplement}, \p{IsGreek}, \p{IsArabic}, \p{IsDevanagari}, \p{IsHangulJamo}, \p{IsCJKUnifiedIdeographs}, etc.

Character Class Expressions

Allow you to define terms that match any character from a custom set of characters. Basic syntax is familiar from Perl and UNIX:

[List-of-characters]

or the negative form:

[^List-of-characters]

Here List-of-characters can include individual characters, and also ranges of the form First-Last where First and Last are characters.

Examples:
• [RGB] matches one of R, G, or B.
• [0-9A-F] or [\dA-F] match one of 0, 1, …, 9, A, B, …, F.
• [^\r\n] matches anything except CR, NL (same as . ).

Class Subtractions

A feature of XML Schema, not present in Perl 5. A class character expression can take the form:

[List-of-characters-Class-char-expr]

or:

[^List-of-characters-Class-char-expr]

where Class-char-expr is another class character expression.

Example:
• [a-zA-Z-[aeiouAEIOU]] matches any consonant in the Latin alphabet.
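The consonant example can be tried out by using the expression as a pattern facet. The sketch below is an illustration only: it assumes the javax.xml.validation API of later JAXP releases, and the consonant type, the <letter> element and the class name ConsonantCheck are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class ConsonantCheck {
    // A hypothetical type whose pattern is the class subtraction from the slide.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "  <xsd:simpleType name='consonant'>"
      + "    <xsd:restriction base='xsd:string'>"
      + "      <xsd:pattern value='[a-zA-Z-[aeiouAEIOU]]'/>"
      + "    </xsd:restriction>"
      + "  </xsd:simpleType>"
      + "  <xsd:element name='letter' type='consonant'/>"
      + "</xsd:schema>";

    // True if <letter>value</letter> validates against the schema above.
    static boolean isValid(String value) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            String instance = "<letter>" + value + "</letter>";
            try {
                schema.newValidator()
                      .validate(new StreamSource(new StringReader(instance)));
                return true;
            } catch (SAXException validationError) {
                return false;
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("schema or input could not be read", e);
        }
    }

    public static void main(String[] args) {
        for (String v : new String[] {"b", "Z", "a", "E", "1"}) {
            System.out.println(v + " -> " + isValid(v));
        }
    }
}
```

Note that java.util.regex does not accept this subtraction syntax, which is one reason the sketch goes through a schema validator rather than Java's own regular expression classes.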

Sequences and Alternatives

Finally, the universal core of regular expressions. If Pattern1 and Pattern2 are regular expressions, then:
• Pattern1Pattern2 matches any string made by putting a string accepted by Pattern1 in front of a string accepted by Pattern2.
• Pattern1|Pattern2 matches any string that would be accepted by Pattern1, or any string accepted by Pattern2.

Parentheses just group things together:
• (Pattern1) matches any string accepted by Pattern1.

An example given earlier:
• (Mon|Tues|Wednes|Thurs|Fri)day matches any of the strings Monday, Tuesday, Wednesday, Thursday, or Friday.
• Equivalent to Monday|Tuesday|Wednesday|Thursday|Friday.

Quantifiers

… and if Pattern1 is a regular expression:
• Pattern1? matches the empty string or any string accepted by Pattern1.
• Pattern1+ matches any string accepted by Pattern1, or by Pattern1Pattern1, or by Pattern1Pattern1Pattern1, or …
• Pattern1* matches the empty string or any string accepted by Pattern1+.

If n, m are numbers, Perl and XML Schema also allow the shorthand forms:
• Pattern1{n} is equivalent to Pattern1 repeated n times.
• Pattern1{m,n} matches any string accepted by Pattern1 repeated m times or m + 1 times or … or n times.
• Pattern1{m,} matches any string accepted by Pattern1 repeated m or more times.

Using Patterns in Restriction

All simple types (including lists and enumerations) support the pattern facet, e.g.:

<xsd:simpleType name="multiplesOfFive">
  <xsd:restriction base="xsd:integer">
    <xsd:pattern value="[+-]?\d*[05]"/>
  </xsd:restriction>
</xsd:simpleType>

defines a subtype of integer including all numbers ending with digits 0 or 5.

The pattern facet can appear more than once in a single restriction: interpretation is as if patterns were combined with |.
• Conversely if the pattern facet is specified in restriction of a base type that was itself defined using a pattern, allowed values must satisfy both patterns.
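The multiplesOfFive type shows a pattern working together with the base type: the value must be a legal integer and also match the pattern. The following sketch is an illustration only, assuming the javax.xml.validation API of later JAXP releases; the <n> element and the class name MultiplesOfFiveCheck are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class MultiplesOfFiveCheck {
    // The slide's multiplesOfFive type, plus a hypothetical <n> element to carry values.
    // The backslash in \d is doubled only because this is a Java string literal.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "  <xsd:simpleType name='multiplesOfFive'>"
      + "    <xsd:restriction base='xsd:integer'>"
      + "      <xsd:pattern value='[+-]?\\d*[05]'/>"
      + "    </xsd:restriction>"
      + "  </xsd:simpleType>"
      + "  <xsd:element name='n' type='multiplesOfFive'/>"
      + "</xsd:schema>";

    // True if <n>value</n> validates against the schema above.
    static boolean isValid(String value) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            String instance = "<n>" + value + "</n>";
            try {
                schema.newValidator()
                      .validate(new StreamSource(new StringReader(instance)));
                return true;
            } catch (SAXException validationError) {
                return false;
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("schema or input could not be read", e);
        }
    }

    public static void main(String[] args) {
        for (String v : new String[] {"15", "-20", "0", "7", "12"}) {
            System.out.println(v + " -> " + isValid(v));
        }
    }
}
```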

Enumeration

The enumeration facet allows one to select a finite subset of allowed values from a base type, e.g.:

<xsd:simpleType name="weekday">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="Monday"/>
    <xsd:enumeration value="Tuesday"/>
    <xsd:enumeration value="Wednesday"/>
    <xsd:enumeration value="Thursday"/>
    <xsd:enumeration value="Friday"/>
  </xsd:restriction>
</xsd:simpleType>

• Behaves like a very restricted version of pattern?
• All primitive types except boolean support the enumeration facet. List and union types also support this facet.

White Space

The whiteSpace facet controls how white space in a value received from the parser is processed, prior to Schema validation. It can take three values:
• preserve: no white space processing, beyond what base XML does.
• replace: convert every white space character (Line Feed, etc) to a space character (#x20).
• collapse: like replace; all leading or trailing spaces are then removed, and sequences of spaces are replaced by single spaces.

Note analogies to “Normalization of Attribute Values” in base XML.

All simple types except union types have this facet, but usually you don’t explicitly set it in restriction: just inherit values from built in types.

All built in types have collapse, except string which has preserve and normalizedString which has replace.
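The practical effect of the inherited whiteSpace value shows up when the same enumeration is applied to two different base types. The sketch below is an illustration only, assuming the javax.xml.validation API of later JAXP releases; the element names dayAsToken and dayAsString and the class name WhiteSpaceCheck are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class WhiteSpaceCheck {
    // Same enumeration on two bases: token inherits collapse, string inherits preserve.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "  <xsd:element name='dayAsToken'>"
      + "    <xsd:simpleType><xsd:restriction base='xsd:token'>"
      + "      <xsd:enumeration value='Monday'/>"
      + "    </xsd:restriction></xsd:simpleType>"
      + "  </xsd:element>"
      + "  <xsd:element name='dayAsString'>"
      + "    <xsd:simpleType><xsd:restriction base='xsd:string'>"
      + "      <xsd:enumeration value='Monday'/>"
      + "    </xsd:restriction></xsd:simpleType>"
      + "  </xsd:element>"
      + "</xsd:schema>";

    // Returns true if the instance validates; schema or I/O problems are rethrown.
    static boolean isValid(String instanceXml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            try {
                schema.newValidator()
                      .validate(new StreamSource(new StringReader(instanceXml)));
                return true;
            } catch (SAXException validationError) {
                return false;
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("schema or input could not be read", e);
        }
    }

    public static void main(String[] args) {
        // Surrounding spaces are collapsed away for the token-based type only.
        System.out.println(isValid("<dayAsToken> Monday </dayAsToken>"));
        System.out.println(isValid("<dayAsString> Monday </dayAsString>"));
    }
}
```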

Other Facets

The facets maxInclusive, maxExclusive, minExclusive, minInclusive are supported only by numeric and date- and time-related types, and define bounds of value ranges.

The facets totalDigits, fractionDigits are defined for the primitive type decimal, and thus all numeric types derived from decimal, and for no other types.

List Types

We define a type representing a white-space-separated list of items using the list element, e.g.:

<xsd:simpleType name="listOfDays">
  <xsd:list itemType="weekday"/>
</xsd:simpleType>

This introduces a type that takes values like “”, “Monday”, “Monday Monday”, “Tuesday Wednesday Thursday”, etc.

List types can be restricted using the length-related facets, and the pattern, enumeration and whiteSpace facets.

The <list> element may contain an anonymous <simpleType> element instead of having an itemType attribute.

A list value is split according to its white space content prior to validation of the items in the list.
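The splitting rule means each item is checked individually against the item type. The sketch below is an illustration only, assuming the javax.xml.validation API of later JAXP releases; it reuses the pattern-based weekday type from earlier, and the <days> element and the class name ListOfDaysCheck are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class ListOfDaysCheck {
    // weekday and listOfDays follow the slides; <days> is a hypothetical carrier element.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "  <xsd:simpleType name='weekday'>"
      + "    <xsd:restriction base='xsd:string'>"
      + "      <xsd:pattern value='(Mon|Tues|Wednes|Thurs|Fri)day'/>"
      + "    </xsd:restriction>"
      + "  </xsd:simpleType>"
      + "  <xsd:simpleType name='listOfDays'>"
      + "    <xsd:list itemType='weekday'/>"
      + "  </xsd:simpleType>"
      + "  <xsd:element name='days' type='listOfDays'/>"
      + "</xsd:schema>";

    // True if <days>value</days> validates against the schema above.
    static boolean isValid(String value) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            String instance = "<days>" + value + "</days>";
            try {
                schema.newValidator()
                      .validate(new StreamSource(new StringReader(instance)));
                return true;
            } catch (SAXException validationError) {
                return false;
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("schema or input could not be read", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("Tuesday Wednesday Thursday"));
        System.out.println(isValid("Monday Sunday"));  // second item is not a weekday
    }
}
```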

Union Types

A union type takes values from any one of a set of base types.

<xsd:simpleType name="maxOccursType">
  <xsd:union memberTypes="xsd:integer">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="unbounded"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:union>
</xsd:simpleType>

The types in the union are specified in the list-valued attribute memberTypes, or by nested anonymous <simpleType> elements, or by a combination of the two, as above.

Union types can be restricted using the pattern or enumeration facets.
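A value is accepted by a union if it is accepted by at least one member type. The sketch below is an illustration only, assuming the javax.xml.validation API of later JAXP releases; the <limit> element and the class name MaxOccursCheck are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class MaxOccursCheck {
    // The slide's maxOccursType, plus a hypothetical <limit> element to carry values.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "  <xsd:simpleType name='maxOccursType'>"
      + "    <xsd:union memberTypes='xsd:integer'>"
      + "      <xsd:simpleType>"
      + "        <xsd:restriction base='xsd:string'>"
      + "          <xsd:enumeration value='unbounded'/>"
      + "        </xsd:restriction>"
      + "      </xsd:simpleType>"
      + "    </xsd:union>"
      + "  </xsd:simpleType>"
      + "  <xsd:element name='limit' type='maxOccursType'/>"
      + "</xsd:schema>";

    // True if <limit>value</limit> validates against the schema above.
    static boolean isValid(String value) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            String instance = "<limit>" + value + "</limit>";
            try {
                schema.newValidator()
                      .validate(new StreamSource(new StringReader(instance)));
                return true;
            } catch (SAXException validationError) {
                return false;
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("schema or input could not be read", e);
        }
    }

    public static void main(String[] args) {
        for (String v : new String[] {"3", "unbounded", "many"}) {
            System.out.println(v + " -> " + isValid(v));
        }
    }
}
```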

Prohibiting Derivation

You may, for some reason, have a simple type that you don’t want anybody to derive further types from.

Do this by specifying the final attribute on the <simpleType> element. Its value is a list containing a subset of values from list, union, restriction, extension.
• These specify which sorts of derivation are disallowed. Note extension is a way of deriving a complex type from a simple type. It will be discussed in the next section.

Give the final attribute the value “#all” for blanket prohibition of any derivation from this simple type.
• Can also prevent the value of individual facets from being changed in subsequent restrictions by specifying fixed="true" on the facet elements.


Complex Types

Element Content, and Attributes

Simple types allow us to declare elements that have only parsed character content (no nested elements). E.g. the declaration:

<xsd:element name="dayItHappened" type="weekday"/>

might validate instance elements like:

<dayItHappened>Monday</dayItHappened>

<dayItHappened>Tuesday</dayItHappened>

(Because weekday restricts xsd:string, whose whiteSpace facet is preserve, leading or trailing spaces around the value would not be accepted.)

But if we need elements with element content, or elements with attributes, we must declare those elements to have complex type.

Complex Type Hierarchy

We saw that a set of built in simple types were derived from xsd:anySimpleType, and that new simple types could be derived from a base type by restriction, list, or union.

There are no built in complex types, other than the so-called ur-type, represented as xsd:anyType.

All other complex types are derived by one or more steps of restriction and extension from xsd:anyType.
• Complex types can also be created by extension of a simple type, but simple types are also notionally restrictions of xsd:anyType.

Restriction

A restriction of a base type is a new type. All allowed instances of the new type are also instances of the base type. But the restricted type doesn’t allow all possibilities allowed by the base type.
• Think of the example of restricting xsd:string to 4 characters using the length facet. Strings of length 4 are also allowed by xsd:string, but the new type is more restrictive.
• In the complex case, we might have a complex base type that allows attribute att optionally. A restricted type might not allow att at all. Another restriction of the same base might require att.
• Or we might have a base type that allows 0 or more nested elm elements. The restricted type might require exactly 1 nested elm element.

Extension

An extension of a base type is a new type. An extension allows extra attributes or extra content that are not allowed in instances of the base type.
• At first blush this sounds like the opposite of restriction, but this isn’t strictly true.

• If, for example, type E extends a type B by adding a required attribute att, then instances of B are not allowed instances of E (because they don’t have the required attribute). So we have that E is an extension of B, but there is no sense in which B could be a restriction of E.

Some such inverse relation exists if all extra attributes and content are optional in the extended type, but this isn’t a required feature of extension.

Complex Content and Simple Content

We have seen that XML Schema complex types define both some allowed nested elements, and some allowed attribute specifications. Complex types that allow nested elements are said to have complex content.

But Schema distinguishes as a special case complex types having simple content: elements with such types may have attributes, but they cannot have nested elements.

This is presumably a useful distinction, but it does introduce one more layer of complexity into the syntax for complex type derivation.

Basic Forms of Complex Type Definition

Complex content, restriction (upper left in table):

<complexType>
  <complexContent>
    <restriction base="type">
      allowed element content
      allowed attributes
    </restriction>
  </complexContent>
</complexType>

Complex content, extension (upper right in table):

<complexType>
  <complexContent>
    <extension base="type">
      extra element content
      extra attributes
    </extension>
  </complexContent>
</complexType>

Simple content, restriction (lower left in table):

<complexType>
  <simpleContent>
    <restriction base="type">
      facet restrictions
      allowed attributes
    </restriction>
  </simpleContent>
</complexType>

Simple content, extension (lower right in table):

<complexType>
  <simpleContent>
    <extension base="type">
      extra attributes
    </extension>
  </simpleContent>
</complexType>

Remarks

When one restricts a type one generally must specify all allowed element content and attributes.

When one extends a type one generally must specify just the extra element content and attributes.
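A minimal sketch of the restriction case, assuming hypothetical type names base and restricted: the base allows 0 or more elm elements and an optional att, and the restricted type restates the full content model while narrowing it to exactly one elm and a required att.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="base">
    <xsd:sequence>
      <xsd:element name="elm" type="xsd:string"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
    <xsd:attribute name="att" type="xsd:string"/>
  </xsd:complexType>

  <!-- The restriction repeats the allowed content, with tighter limits -->
  <xsd:complexType name="restricted">
    <xsd:complexContent>
      <xsd:restriction base="base">
        <xsd:sequence>
          <xsd:element name="elm" type="xsd:string"/>
        </xsd:sequence>
        <xsd:attribute name="att" type="xsd:string" use="required"/>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:schema>
```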

Requirements on Base Type

The base type must be a complex type in all cases except simpleContent/extension (lower right in table), in which case the base can be a simple type.

If the derived type has complexContent, the base type must have complex content.
• True for extension or restriction.
• Under some conditions, using a special form described later, a base type with complex content can be restricted to a type with simple content.

Schematic Inheritance Diagram

[Figure: inheritance diagram rooted at xsd:anyType. Nodes: Simple Types, and Complex Types split into SimpleContent and ComplexContent. Edge labels: restriction, extension, list, union, and restriction† (†: see later for syntax).]

Defining a Complex Type with no Base?

In the introductory lecture we seemed to avoid this complexity: didn’t we just define complex types out of “thin air”?

Actually the XML Schema specification says that:

    <complexType>
      allowed element content
      allowed attributes
    </complexType>

is “shorthand” for

    <complexType>
      <complexContent>
        <restriction base="xsd:anyType">
          allowed element content
          allowed attributes
        </restriction>
      </complexContent>
    </complexType>

So in reality we were directly restricting the ur-type, which allows any attributes and any content!

Defining Element Content

Where we wrote allowed element content or extra element content in the syntax for complex type definitions, what should appear is a model group.

A model group is exactly one of:
• an <xsd:sequence/> element, or
• an <xsd:choice/> element, or
• an <xsd:all/> element.

(The element content appearing in the type definition may also be a globally defined model group, referenced through an <xsd:group/> element. The global definition, a named <xsd:group/> element, just contains one of the three elements above.)
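The parenthetical remark about global model groups can be sketched as follows. The group name titledText and the element section are hypothetical, chosen only for illustration.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="title" type="xsd:string"/>
  <xsd:element name="paragraph" type="xsd:string"/>

  <!-- Global, named model group wrapping a single sequence -->
  <xsd:group name="titledText">
    <xsd:sequence>
      <xsd:element ref="title"/>
      <xsd:element ref="paragraph" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:group>

  <!-- The complex type refers to the group instead of repeating it -->
  <xsd:element name="section">
    <xsd:complexType>
      <xsd:group ref="titledText"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```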

Sequence

An <xsd:sequence/> model group contains a series of particles.

A particle is an <xsd:element/> element, another model group, or a wildcard.

As expected, this model just says the element content represented by those items should appear in sequence.

E.g.

    <xsd:sequence>
      <xsd:element ref="title"/>
      <xsd:element ref="paragraph" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>

says that exactly one occurrence of a title element is followed by any number of occurrences of paragraph elements.

Choice

An <xsd:choice/> model group also contains a series of particles, with the same options as for sequence.

The element information validated by this model should match exactly one of the particles in the choice.

E.g.

    <xsd:choice minOccurs="0" maxOccurs="unbounded">
      <xsd:element ref="paragraph"/>
      <xsd:sequence>
        <xsd:element ref="figure"/>
        <xsd:element ref="caption"/>
      </xsd:sequence>
    </xsd:choice>

matches a sequence of paragraph elements interleaved with consecutive pairs of figure and caption elements.

All

The <xsd:all/> model group is peculiar to XML Schema.

All particles it contains must be <xsd:element/>s.

The element information validated should match a sequence of the particles in any order.

There are several constraints:
• The maxOccurs attribute of each particle must be 1.
• The minOccurs attribute of each particle must be 0 or 1.
• The <xsd:all/> model group can only occur at the top level of a complex type’s content model, and must itself have maxOccurs = 1 (its own minOccurs may be 0 or 1).

In view of the fact that minOccurs of a particle can be 0, subset might be a better name than all.
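A small sketch of an <xsd:all/> group that respects these constraints. The element names person, firstName, lastName and nickname are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="person">
    <xsd:complexType>
      <!-- Children may appear in any order, each at most once -->
      <xsd:all>
        <xsd:element name="firstName" type="xsd:string"/>
        <xsd:element name="lastName" type="xsd:string"/>
        <!-- minOccurs="0" makes this child optional -->
        <xsd:element name="nickname" type="xsd:string" minOccurs="0"/>
      </xsd:all>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```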

Element Wildcard

The element wildcard particle <xsd:any/> matches and validates any element in the instance document.
• Though one can restrict the namespace of the matched element, as described below.

E.g.

    <xsd:sequence minOccurs="0" maxOccurs="unbounded">
      <xsd:element ref="header"/>
      <xsd:any/>
    </xsd:sequence>

matches a sequence of consecutive pairs of elements, where the first element in each pair is a header, and the second can be any kind of element.

Options on <xsd:any/>

The <xsd:any/> element takes the usual optional maxOccurs, minOccurs attributes.

It allows a namespace attribute taking one of the values:
• ##any (the default),
• ##other (any namespace except the target namespace),
• a list of namespace names, optionally including either ##targetNamespace or ##local.

This controls what elements the wildcard matches, according to namespace.

It also allows a processContents attribute taking one of the values strict, skip, lax (default strict), controlling the extent to which the contents of the matched element are validated.
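To illustrate how these options combine, here is a sketch in which the namespace URI http://example.org/ns/doc and the element name envelope are assumptions made up for the example:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.org/ns/doc"
            xmlns="http://example.org/ns/doc">
  <xsd:element name="header" type="xsd:string"/>

  <xsd:element name="envelope">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="header"/>
        <!-- Any number of elements from other namespaces; they are
             validated only if a declaration for them can be found -->
        <xsd:any namespace="##other" processContents="lax"
                 minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```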

Parsing and Determinism

Recall the rule about determinism of content models in DTDs. We claimed XML retained this purely for compatibility with SGML.

Perhaps surprisingly, XML Schema retains exactly the same rule, calling it the Unique Particle Attribution constraint.

It has to be imposed slightly more carefully here because of the possibility of wildcard particles and substitution groups (discussed later).
• Unclear why it was retained. Perhaps to improve the efficiency of parsing, especially in the presence of substitution groups? Or to simplify the Particle Derivation OK constraints for restriction of complex types (see later)?

Mixed Content

XML Schema scores a big win over DTDs in the way mixed content is handled.

One simply specifies the attribute mixed on the complexContent element, giving it the value true.
• In the abbreviated form for restriction of the ur-type, the mixed attribute appears on the complexType element.

This specifies that the element content defined by the model particles can be interleaved with character data (without limiting how the elements themselves are arranged).

Mixed Content Example

This element declaration

    <xsd:element name="body">
      <xsd:complexType mixed="true">
        <xsd:choice minOccurs="0" maxOccurs="unbounded">
          <xsd:element ref="p"/>
          <xsd:element ref="a"/>
        </xsd:choice>
      </xsd:complexType>
    </xsd:element>

allows the body element to contain <p/> and <a/> elements, with text interleaved freely between them.

mixed and Inheritance

So an <xsd:complexContent/> with mixed="true" indicates a mixed complex type.

And an <xsd:complexContent/> with mixed="false" (the default) indicates an element-only complex type.

A mixed complex content type may be restricted to an element-only type (if the element content allows it).

Perhaps surprisingly, an element-only complex content type may not be extended to a mixed type.

Restricting Mixed Content to Simple Content

If the model group of a mixed complex type can match the empty sequence of elements, then the type may have content that is text-only.

Then it is logically possible to restrict the type to one with simple content. There is a special syntax for this:

    <complexType>
      <simpleContent>
        <restriction base="mixed-complex-content-type">
          <simpleType>
            usual content of simpleType element
          </simpleType>
          allowed attributes
        </restriction>
      </simpleContent>
    </complexType>

Expanded Complex Type Inheritance

[Figure: expanded inheritance diagram for complex types, rooted at xsd:anyType. Nodes: SimpleContent, and Complex Content split into Mixed and Element-only. Edge labels: restriction and extension.]

Empty Elements

XML Schema doesn’t have any unique way of representing elements that must be empty.

The simplest way to do this is to omit the allowed element content in a complex content restriction.

Can such an element also be mixed (i.e. have pure text content)?
• Logically it seems this should be possible (I believe it is allowed by Xerces).
• But it seems to be forbidden by the XML Schema specification, which singles out this case and says such an element is strictly empty.
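A minimal sketch of an empty element, written in the abbreviated form (restriction of the ur-type) with no model group. The element name pagebreak and its style attribute are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- No sequence, choice or all: the element may carry the
       attribute below but no nested elements -->
  <xsd:element name="pagebreak">
    <xsd:complexType>
      <xsd:attribute name="style" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```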


Attributes and Local Declarations

Defining Allowed Attributes

Where we wrote allowed attributes or extra attributes in the syntax for complex type definitions, what should appear is a sequence of attribute declarations in the form of <xsd:attribute/> elements.
• These may be followed by an optional attribute wildcard.

(The attribute declaration list may also include globally defined attribute groups, referenced through <xsd:attributeGroup/> elements. These will be discussed later.)

Simple Attribute Declarations

A straightforward example of an attribute declaration was given in the introductory lecture:

    <xsd:element name="figure">
      <xsd:complexType>
        <xsd:attribute name="source" type="xsd:string"/>
      </xsd:complexType>
    </xsd:element>

In general the value of the type attribute can be any simple type.
• Though unusual, it is also allowed to include an anonymous <xsd:simpleType/> definition in the body of the <xsd:attribute/>, instead of specifying the type attribute.

Default Rules

As with DTDs, one can specify whether the use of an attribute is optional (the default) or required.

One can also specify a default value (if the attribute is optional).

Alternatively one can specify a fixed value for the attribute (whether the attribute is optional or required).
• default and fixed are mutually exclusive.

DTD Attribute Defaults Revisited

Attribute list declaration:

    <!ATTLIST a
      val CDATA "nothing"
      fix CDATA #FIXED "constant"
      req CDATA #REQUIRED
      opt CDATA #IMPLIED>

Instances of element a:

    <a val="something" fix="constant" req="reading" opt="extra"/>

    <a req="no experience"/>

    <a fix="variable"/>

The first two instances are valid. The third is invalid: fix differs from its fixed value, and the required attribute req is missing.

Schema Attribute Occurrence

Equivalent Schema declaration:

    <xsd:attribute name="val" type="xsd:string" use="optional" default="nothing"/>
    <xsd:attribute name="fix" type="xsd:string" fixed="constant"/>
    <xsd:attribute name="req" type="xsd:string" use="required"/>
    <xsd:attribute name="opt" type="xsd:string"/>

• Note fix and opt implicitly have use="optional" (we could have omitted this specification for val too).
• Unlike DTDs, it is possible to have an attribute that is both fixed and required.

Complex Content Plus Attributes

Putting things together, here is a declaration of a body element that allows mixed content plus a style attribute.

    <xsd:element name="body">
      <xsd:complexType mixed="true">
        <xsd:choice minOccurs="0" maxOccurs="unbounded">
          <xsd:element ref="p"/>
          <xsd:element ref="a"/>
        </xsd:choice>
        <xsd:attribute name="style" type="xsd:string"/>
      </xsd:complexType>
    </xsd:element>

Simple Content plus Attributes

Here is a declaration of an anchor element that allows simple content plus an href attribute.

    <xsd:element name="anchor">
      <xsd:complexType>
        <xsd:simpleContent>
          <xsd:extension base="xsd:string">
            <xsd:attribute name="href" type="xsd:anyURI"/>
          </xsd:extension>
        </xsd:simpleContent>
      </xsd:complexType>
    </xsd:element>

Attribute Wildcards

An attribute wildcard is represented by an <xsd:anyAttribute/> element.

There can be at most one such element in a complex type definition, and it must appear after any normal attribute declarations.

Such a declaration allows any attribute, optionally limited by namespace.

The namespace and processContents attributes on <xsd:anyAttribute/> work as for <xsd:any/>.
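A sketch of an attribute wildcard placed after a normal attribute declaration. The choice of namespace="##other" and processContents="skip" is only an illustrative assumption.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="figure">
    <xsd:complexType>
      <xsd:attribute name="source" type="xsd:anyURI" use="required"/>
      <!-- The wildcard comes last; here it admits attributes from
           other namespaces and does not validate their values -->
      <xsd:anyAttribute namespace="##other" processContents="skip"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```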

Attributes and Namespaces

By default, attributes declared as we have illustrated (inside an <xsd:complexType/>) do not become part of the target namespace.
• Instead these attributes are local properties of any element they are attached to. The element itself may or may not belong to a namespace.

In instance documents, names of these attributes must not be prefixed with a namespace prefix.

Creating Attributes in a Namespace

There are three ways to put attributes into the target namespace:
• Declare them “globally”, directly inside the top level <xsd:schema/> element. Reference the attribute declaration inside the complex type definition (like element references), or
• specify the attribute form="qualified" on a local <xsd:attribute/> declaration, or
• specify the attribute attributeFormDefault="qualified" on the <xsd:schema/> element.

After this, these attributes must be prefixed in instance documents with a namespace prefix.
• Recall default namespace declarations (xmlns="namespace") don’t work for attributes: you must introduce a non-empty prefix.
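As an illustration of the second option, the sketch below qualifies a single local attribute. The namespace URI http://example.org/ns/doc and the prefix d are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.org/ns/doc">
  <xsd:element name="figure">
    <xsd:complexType>
      <!-- form="qualified" puts this local attribute into the
           target namespace -->
      <xsd:attribute name="source" type="xsd:anyURI" form="qualified"/>
    </xsd:complexType>
  </xsd:element>
  <!-- A matching instance needs a prefix on the attribute, e.g.
       <d:figure xmlns:d="http://example.org/ns/doc" d:source="image.png"/> -->
</xsd:schema>
```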

Locally Declared Elements

XML Schema goes to some lengths to maintain symmetry between elements and attributes.

Because the most natural way of declaring attributes is locally (private to a complex type), it must therefore be possible to declare elements local to the complex type.
• Even if this is less obviously natural for elements. It leads to some clumsy constraints, e.g.: two local element declaration particles with the same name in the model group of the same complex type must have the same type.

The same rules apply: if an element is declared locally (inside an <xsd:complexType/>), by default it does not belong to a namespace.

In this case its name must not be prefixed with a namespace prefix in instance documents.

Creating Elements in a Namespace

There are three ways to put elements into the target namespace:
• Declare them “globally”, directly inside the top level <xsd:schema/> element. Reference the element declaration inside the complex type definition, or
• specify the attribute form="qualified" on a local <xsd:element/> declaration, or
• specify the attribute elementFormDefault="qualified" on the <xsd:schema/> element.

After this, these elements must be prefixed in instance documents with a namespace prefix (or there must be a default namespace declaration in effect).

elementFormDefault and attributeFormDefault

Summary:
• These attributes on the <xsd:schema/> element take the values “qualified” or “unqualified”.
• The defaults for both are “unqualified”.
• They control whether or not elements and attributes declared locally in <xsd:complexType/> definitions belong to the target namespace.
• This property can also be controlled by form attributes on the individual declarations.

None of these attributes has any effect on elements or attributes declared globally (at the top level in the <xsd:schema/> element)! Effectively such declarations are all qualified.
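The summary can be sketched with one schema that qualifies local elements but leaves local attributes unqualified. The namespace URI is a made-up example.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.org/ns/doc"
            elementFormDefault="qualified">
  <!-- Global declaration: always in the target namespace -->
  <xsd:element name="figure">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Local element: in the target namespace only because of
             elementFormDefault="qualified" above -->
        <xsd:element name="caption" type="xsd:string"/>
      </xsd:sequence>
      <!-- Local attribute: attributeFormDefault keeps its default,
           so source stays outside the target namespace -->
      <xsd:attribute name="source" type="xsd:anyURI"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

An instance could then place figure and caption in the namespace through a default namespace declaration, while source is written without a prefix.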


Inheritance and Substitution

Polymorphism?

We have presented the mechanisms by which new types can be derived from old types (albeit we have omitted some details for complex types).

Through these mechanisms, inheritance provides useful ways to recycle existing definitions.

But it doesn’t in itself provide all the benefits of OOP. In particular, we have not presented any analogue of polymorphism.

Schema tries to provide some of the OO flexibility in use of instances through type substitution and substitution groups.

Type Substitution

The most basic mechanism for “polymorphism” is type substitution.

In essence this says that if a particle (in a content model, say) is declared to be an element with a particular type, then the corresponding element item in the instance document may have a type derived from the particle type.

Actually this only introduces new possibilities if the derivation involves extension.

A Basis for Extension

Suppose we have the complex type declaration:

    <xsd:complexType name="figureType">
      <xsd:attribute name="source" type="xsd:anyURI"/>
    </xsd:complexType>

and suppose this is used as follows:

    <xsd:element name="figure" type="figureType"/>

    <xsd:element name="report">
      <xsd:complexType>
        <xsd:choice minOccurs="0" maxOccurs="unbounded">
          <xsd:element ref="paragraph"/>
          <xsd:element ref="figure"/>
        </xsd:choice>
      </xsd:complexType>
    </xsd:element>

i.e. a report is a sequence of interleaved paragraph and figure elements, and a figure just has an attribute referencing a source image file.

Extension Example

Now suppose that, without modifying any existing definitions and declarations, we want to allow figures in reports to have captions. We can do this if we introduce the extended type:

    <xsd:complexType name="captionFigureType">
      <xsd:complexContent>
        <xsd:extension base="figureType">
          <xsd:sequence>
            <xsd:element name="caption" type="xsd:string"/>
          </xsd:sequence>
        </xsd:extension>
      </xsd:complexContent>
    </xsd:complexType>

• This complex type inherits the attribute source from its base type, and adds a nested caption element. (The added element is wrapped in an <xsd:sequence/>, because extra element content must be a model group.)

Example Instance Document

    <report xmlns="http://www.grid2004.org/ns/report4"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.grid2004.org/ns/report4 report4.xsd">
      <paragraph>Recently uncovered documents prove... </paragraph>
      <figure xsi:type="captionFigureType" source="notafake.jpg">
        <caption>Irrefutable proof of ancient XML.</caption>
      </figure>
    </report>

xsi:type

As illustrated above, the element information item may have any type derived by extension from the type the element was declared with.
• In general it may be derived by a mixture of extension and restriction.

This isn’t quite a free lunch, though. There is no way for an XML processor to automatically infer the type of an element instance; instead this approach requires the XML author to explicitly specify the intended type using the xsi:type attribute.
• This limits the attractiveness of this approach to “polymorphism”.

Substitution Groups

A more author-friendly approach to document polymorphism is based on element declarations.

This approach uses so-called substitution groups.
• Each substitution group is a set of element declarations.
• One of these is singled out as the head declaration.

Where a content model includes a reference to the head as a particle, the instance document can have any member of the associated substitution group.

Substitution Group Example

Suppose the earlier definitions of figureType, <figure/>, <report/>, and captionFigureType are in effect.

Now suppose we declare a new element <captionFigure/>, having type captionFigureType, and belonging to a substitution group headed by <figure/>.

Then a possible instance document would be:

    <report … >
      <paragraph>Recently uncovered documents prove... </paragraph>
      <captionFigure source="notafake.jpg">
        <caption>Irrefutable proof of ancient XML.</caption>
      </captionFigure>
    </report>

Remarks

Important things to note:
• Again we haven’t modified the original declaration of the report element, which still says it contains figure elements.
• Because captionFigure is in the substitution group of figure, automatically it is allowed to appear in place of figure in the instance.
• We no longer need the clumsy xsi:type attribute; the actual type of the element information item can now be easily inferred from the element name (through its declaration, described shortly).

Creating Substitution Groups

Groups are implicit: the implementation is more like a new kind of inheritance hierarchy, one relating element declarations rather than type definitions.

A new element declaration specifies at most one direct substitution group affiliation. This is another element declaration. The “affiliation” now heads a group containing the new declaration.
• In practice an affiliation works almost exactly like a base type, except it involves element declarations, not types.
• If the affiliation itself belongs to a different group, the new declaration automatically joins that group. Generally an element can be in several (perfectly nested) groups.

Group Creation Example

In our example we could declare captionFigure as follows:

    <xsd:element name="captionFigure" type="captionFigureType" substitutionGroup="figure"/>

• This says <figure/> is the substitution group affiliation of <captionFigure/>.
• Or in other words <captionFigure/> is in the substitution group headed by <figure/>.
• In general the type attribute may be omitted: the type then defaults to that of the substitution group affiliation (again emphasizing the analogy with inheritance).

Notional Substitution Group Hierarchy

This way of looking at things isn’t part of the XML Schema specification, but it may be a useful mnemonic:

[Figure: notional hierarchy relating <xsd:any/>, <report/>, <figure/>, and <captionFigure/>, with <captionFigure/> under its head <figure/>.]

Substitution and Type Inheritance

It is required that all elements in a substitution group headed by element <Name/> have either the same type as <Name/>, or a type derived from it by steps of extension and restriction.

Note that substitution may be used without type inheritance.
• In other words, all elements in the substitution group may have the same type as their head.
• Consider the example of internationalization: you might want many interchangeable elements with identical structure but different names (for different languages).
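The internationalization idea might be sketched like this, reusing figureType from the earlier slides. The element names abbildung and illustration are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="figureType">
    <xsd:attribute name="source" type="xsd:anyURI"/>
  </xsd:complexType>

  <!-- Head of the substitution group -->
  <xsd:element name="figure" type="figureType"/>

  <!-- Members with no type attribute: each takes the head's type,
       so all three names share one structure -->
  <xsd:element name="abbildung" substitutionGroup="figure"/>
  <xsd:element name="illustration" substitutionGroup="figure"/>
</xsd:schema>
```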

Blocking Substitutions

We have described two kinds of substitution involving an element: the structure of an element can be substituted using xsi:type, or the whole element can be substituted by a member of its substitution group.

It is quite likely that a schema writer will want to block some such substitutions.
• Many applications will require elements to have exactly the originally specified form.
• We need a way to prevent this form being corrupted by (say) unexpected addition of an element to a substitution group.

block Attribute of <xsd:element/>

The value of the block attribute on <xsd:element/> should be a list containing a possibly empty subset of the values extension, restriction, and substitution (or simply #all).

It defines the disallowed substitutions for this element.
• If a particle in a content model has substitution in its disallowed substitutions, the document instance may not replace the element by members of its substitution group.
• If an element has extension in its disallowed substitutions, then neither xsi:type nor a substitution group substitution allows the instance to validate against a type whose derivation from the particle type involves steps of extension.
• Appearance of restriction in the disallowed substitutions has an analogous effect.
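For instance, a schema writer who wants figure to keep exactly its declared form might write something like the following sketch (the particular combination of block values is an assumption for illustration):

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="figureType">
    <xsd:attribute name="source" type="xsd:anyURI"/>
  </xsd:complexType>

  <!-- Instances may not replace figure by members of its substitution
       group, nor use xsi:type to select a type derived by extension -->
  <xsd:element name="figure" type="figureType"
               block="substitution extension"/>
</xsd:schema>
```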

block Attribute of <xsd:complexType/>

A block attribute may also be specified on the <xsd:complexType/> element. Its value is a list containing a subset of the values extension and restriction (or simply #all).

It defines the prohibited substitutions for this type.
• If the type of an element has extension in its prohibited substitutions, then neither xsi:type nor a substitution group substitution is allowed to validate the instance against a type whose derivation from the particle type involves any extension steps. Such validation is also prevented if the prohibited substitutions of any intervening types in the chain of derivation include extension.
• Appearance of restriction in the prohibited substitutions has an analogous effect.

Note the block attributes of <complexType/> and <element/> are independent, and constraints from both must be satisfied.
• But it is “as if” an element acquires all blocked substitutions of its type.

blockDefault Attribute of <xsd:schema/>

Unless otherwise specified, all substitutions are allowed.

You may want to change this globally to something more conservative.

Do this by specifying the blockDefault attribute on the <xsd:schema/> element.
• Allowed values for this attribute are the same as for the block attribute on <xsd:element/>.

Prohibiting Derivation

The final attribute on <xsd:complexType/> works in the same way as the corresponding attribute for <xsd:simpleType/>.

Its value may be either a list containing a subset of the values extension and restriction, or simply #all. It prohibits either or both kinds of derivation using this type as base.

Although final and block can be used to similar ends, they operate quite differently:
• final controls how you define new types derived from this type.
• block controls how you substitute elements of this type in the document instance.

Substitution Group Exclusions

An <xsd:element/> declaration likewise allows a final attribute, with the same allowed values as final on <xsd:complexType/>.

Its value defines the substitution group exclusions for this element, which control its use as the head of a substitution group.
• If an element has extension in its substitution group exclusions, it may not be the substitution group affiliation of another element whose type is derived from the type of this element by steps including extension.
• Appearance of restriction in the substitution group exclusions has an analogous effect.

By all rights, it should be possible to put substitution in this set. But it isn’t!

finalDefault Attribute of <xsd:schema/>

For completeness we mention that the <xsd:schema/> element allows a finalDefault attribute, which works in a way very much analogous to the blockDefault attribute.

Still to Come on Inheritance

By no means have we yet covered every aspect of inheritance.

Notably we haven’t discussed what exactly is a legal restriction or extension of a complex type (particularly with respect to the content model).

This is quite complicated in general, and it will be covered in the final section.


XML Schema Identity Constraints

Identifiers and References Revisited

Slightly extended version of an example from the lectures on DTDs:

    <agency>
      <agent name="Alice" boss="Alice"/>
      <agent name="Bob" boss="Alice"/>
      <agent name="Carole" boss="Alice"/>
      <agent name="Dave" boss="Bob"/>
    </agency>

[Figure: the reporting hierarchy described by this document. Alice is at the top, Bob and Carole report to Alice, and Dave reports to Bob.]

Using DTDs, we assumed name was declared with type ID, and attribute boss was declared with type IDREF.

Identity Constraints

Recall that the attribute types ID and IDREF imply interesting constraints on values of those attributes:
• Within any individual XML document, every attribute of type ID must be specified with a different value from every other attribute of type ID.
• The value of any attribute of type IDREF must be the same as the value of an attribute of type ID specified somewhere in the same document.

These properties are obviously very useful and natural if we need to identify individual elements in a document.

XML Schema supports the ID and IDREF simple types. But it also introduces additional, much more general mechanisms for achieving similar ends.

Use of XPath

In an earlier lecture-set we gave a brief introduction to XPath.
• Recall that XPath is a notation for representing a subset of nodes in a single XML document.

The basic idea of XML Schema identity constraints is to use XPath expressions to identify groups of “fields” within an XML document that act as either identifiers or references.
• Uniqueness/existence constraints hold within/across these groups.

This is more flexible than the DTD mechanism, because:
• XPath allows one to single out more refined sets of fields.
• One may have multiple categories of identifier in the same document.

Example

    <xsd:element name="agency">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element ref="agent" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
      </xsd:complexType>
      <xsd:key name="agentName">
        <xsd:selector xpath="agent"/>
        <xsd:field xpath="@name"/>
      </xsd:key>
      <xsd:keyref refer="agentName" name="agentBoss">
        <xsd:selector xpath="agent"/>
        <xsd:field xpath="@boss"/>
      </xsd:keyref>
    </xsd:element>

General Remarks

The element <xsd:key/> defines a key field called agentName.

The element <xsd:keyref/> defines a key reference field called agentBoss.

These definitions are inside the declaration of the element <agency/>.
• This implies that the scope of the uniqueness and related constraints is an individual <agency/> element.
• This may or may not be the top-level element of a document.

The fields themselves are specified by XPath expressions (details follow).

Defining a Key

We have the example:

    <xsd:key name="agentName">
      <xsd:selector xpath="agent"/>
      <xsd:field xpath="@name"/>
    </xsd:key>

• The name of the key is agentName.
• The <xsd:selector/> element defines the set of nodes labeled by this key. In our case, it is the set of all agent elements nested directly in the agency element.
• The <xsd:field/> element defines the field within each labeled node that acts as the key. In our case, the name attribute of the node.

Validity Constraints on Keys

Every node identified by the XPath expression in the <xsd:selector/> element must have exactly one descendant node identified by the XPath expression in the <xsd:field/> element.
• This descendant, whose value is the key field, must be an attribute or an element with simple type.

No two nodes identified by <xsd:selector/> may have the same value for their key fields.
• This constraint holds within the body of the scope element (the <agency/> element in our example).
• But the same value of the key field is allowed on different <agent/> nodes inside different <agency/> elements.

Defining a Key Reference

We have the example:

    <xsd:keyref refer="agentName" name="agentBoss">
      <xsd:selector xpath="agent"/>
      <xsd:field xpath="@boss"/>
    </xsd:keyref>

• The refer attribute is the name of the key to which we refer.
• The <xsd:selector/> and <xsd:field/> elements identify the nodes whose values are the actual references. They work in essentially the same way as in <xsd:key/>. The two-stage approach to identifying the relevant fields is less obviously natural in this case. But it supports the generalization to multiple key fields, described below.
• The name of the key reference is agentBoss. This attribute is required (though it is unclear what this name is used for).

Multiple Key Fields

An <xsd:key/> element can have multiple <xsd:field/> elements, e.g.:

    <xsd:key name="fullName">
      <xsd:selector xpath=".//person"/>
      <xsd:field xpath="@firstName"/>
      <xsd:field xpath="@lastName"/>
    </xsd:key>

• For validity, this implies every <person/> element in scope has firstName and lastName attributes, and that each combined (firstName, lastName) pair of values is unique.

An <xsd:keyref/> element that refers to this key must have exactly the same number of <xsd:field/> elements.
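A sketch of a two-field key with a matching two-field key reference. The element names staff and manager, the attributes first and last, and the name managerName are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="staff">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="person" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:attribute name="firstName" type="xsd:string" use="required"/>
            <xsd:attribute name="lastName" type="xsd:string" use="required"/>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="manager" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:attribute name="first" type="xsd:string" use="required"/>
            <xsd:attribute name="last" type="xsd:string" use="required"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>

    <xsd:key name="fullName">
      <xsd:selector xpath="person"/>
      <xsd:field xpath="@firstName"/>
      <xsd:field xpath="@lastName"/>
    </xsd:key>

    <!-- Two fields, matched in order against the two key fields -->
    <xsd:keyref name="managerName" refer="fullName">
      <xsd:selector xpath="manager"/>
      <xsd:field xpath="@first"/>
      <xsd:field xpath="@last"/>
    </xsd:keyref>
  </xsd:element>
</xsd:schema>
```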

Relating Key References to Keys

The fact that keys and key references are scoped to element declarations introduces some “interesting” complications.

Things might be straightforward if a <keyref/> always referred to a <key/> defined in the same element declaration.

You might be forgiven for thinking this should “obviously” be the case. But actually the Schema specification allows a <keyref/> to refer to a <key/> defined in a different element declaration.

Referencing Keys in Nested Elements

Suppose a key, Key, is defined in the declaration of element B.

Also suppose a key reference, Ref, refers to this key and is defined in the declaration of element A.

Now a field of Ref, scoped to an instance of A, is allowed to point to fields of Key scoped to an instance of B that is a descendant of the A instance.
• This can lead to ambiguous references, because the key uniqueness constraints apply only within a single B instance, and there could be several Bs nested in the A instance.
• The specification gives a slightly clumsy recipe for resolving such ambiguities (illustrated below).

Features

The rule on the previous slide can introduce interesting behavior even when the <xsd:keyref/> and the <xsd:key/> are defined in the same element declaration.
• This can happen if instances of the element can nest inside one another.

In the example on the next slide, the key is the value of <key/> elements directly nested inside a <scope/> element, and the reference is the value of a <ref/> element directly nested in a <scope/> element. The <scope/> elements are also allowed to self-nest.

An Interesting Case

    <xsd:element name="scope">
      <xsd:complexType>
        <xsd:choice minOccurs="0" maxOccurs="unbounded">
          <xsd:element ref="key"/>
          <xsd:element ref="ref"/>
          <xsd:element ref="scope"/>
        </xsd:choice>
      </xsd:complexType>
      <xsd:key name="key">
        <xsd:selector xpath="key"/>
        <xsd:field xpath="."/>
      </xsd:key>
      <xsd:keyref refer="key" name="ref">
        <xsd:selector xpath="ref"/>
        <xsd:field xpath="."/>
      </xsd:keyref>
    </xsd:element>

Examples

1st example:

    <scope>
      <scope>
        <key>keyval</key>
      </scope>
      <ref>keyval</ref>
    </scope>

2nd example:

    <scope>
      <scope>
        <key>keyval</key>
      </scope>
      <key>keyval</key>
      <ref>keyval</ref>
    </scope>

3rd example (Illegal!):

    <scope>
      <scope>
        <key>keyval</key>
      </scope>
      <scope>
        <key>keyval</key>
      </scope>
      <ref>keyval</ref>
    </scope>

4th example:

    <scope>
      <scope>
        <key>keyval</key>
      </scope>
      <scope>
        <key>keyval</key>
      </scope>
      <key>keyval</key>
      <ref>keyval</ref>
    </scope>


Remarks

Examples here follow the rules in the section of the XML Schema specification called Schema Information Set Contribution: Identity-constraint Table.

The rule is basically that a key reference can refer to a key field scoped to a descendant element. But if there are conflicts, you ignore any potential reference targets arising from children (this rule applies recursively).

In the 3rd example, all potential targets arise from children, and are conflicting, so they should be ignored. Thus the reference is illegal.

• The 2nd and 4th examples are OK: conflicts are resolved by ignoring targets from children, leaving just the local target.

• Xerces 2.6.2, however, also accepts the 3rd example!
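The conflict-resolution recipe can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how any real validator works: the names Scope, node_table and is_valid are invented here, key and reference values are plain strings, and each constraint has a single field. Entries from child scopes propagate upwards, clashing child entries are dropped, and a local entry always wins.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    """One <scope> element: values of its direct <key>/<ref> children, plus nested scopes."""
    keys: list = field(default_factory=list)
    refs: list = field(default_factory=list)
    children: list = field(default_factory=list)


def node_table(scope):
    """Return {key value: defining node} visible at `scope`, after conflict resolution."""
    # Entries from the scope's own <key> children; the key constraint
    # itself requires these values to be distinct.
    local = {}
    for position, value in enumerate(scope.keys):
        if value in local:
            raise ValueError(f"duplicate key {value!r} within one scope")
        local[value] = (id(scope), position)

    # Entries propagated upwards from the tables of child scopes.
    inherited, conflicting = {}, set()
    for child in scope.children:
        for value, node in node_table(child).items():
            if value in inherited and inherited[value] != node:
                conflicting.add(value)
            inherited[value] = node

    # Child entries that clash with each other are dropped entirely; a child
    # entry that clashes with a local entry is overridden by the local one.
    table = {v: n for v, n in inherited.items() if v not in conflicting}
    table.update(local)
    return table


def is_valid(scope):
    """Check the key and keyref constraints at `scope` and at every nested scope."""
    try:
        table = node_table(scope)
    except ValueError:
        return False
    refs_ok = all(ref in table for ref in scope.refs)
    return refs_ok and all(is_valid(child) for child in scope.children)


def leaf():
    """A nested <scope> holding a single <key>keyval</key>."""
    return Scope(keys=["keyval"])


examples = [
    Scope(refs=["keyval"], children=[leaf()]),
    Scope(keys=["keyval"], refs=["keyval"], children=[leaf()]),
    Scope(refs=["keyval"], children=[leaf(), leaf()]),
    Scope(keys=["keyval"], refs=["keyval"], children=[leaf(), leaf()]),
]
print([is_valid(e) for e in examples])  # [True, True, False, True]
```

The four entries of examples mirror the four instance documents on the Examples slide, and only the third is rejected.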


Uniqueness Constraints

The <xsd:unique/> element works almost exactly like the <xsd:key/> element, except that it is not required that the identifying fields exist for every node identified by the selector.

• If fields exist in the node instance, they must be unique across all selected nodes.

Note that the specification lets the refer attribute of a keyref name either a key or a unique constraint.


Namespaces

The examples given in this section were simplified in that the XPath expressions did not allow for a target namespace.

Recall that XPath expressions always require use of qualified names: an unprefixed name in an XPath expression never picks up a default namespace. If you are using identity constraints in a schema with a target namespace, you must declare a prefix for that namespace, and use that prefix on (say) element names appearing in the xpath attributes.


Imports and Includes

[To Be Added]


“Particle Derivation OK”


Inheritance in OOP and XML

We saw that XML Schema makes heavy use of a concept of type inheritance.

This concept is clearly inspired by the corresponding concept in Object Oriented Programming. But the analogy between XML and OOP is by no means exact.

In OOP, a class has a set of disjoint, essentially independent, named members (fields and methods).

• In derivation, this set can be extended, or named members can be individually overridden.

In XML, a complex type has a set of attributes and a content model.

• The attributes behave much like the independent members of a class, and the set of attributes can naturally be extended during derivation.

• The analogy works much less well for content models. The complex ordering and nesting relations within element content limit the options for extension.

• And, while perhaps more “mathematically natural” than extension, we will see restriction of content models has its own implementation problems.


Extension and Restriction

Unlike typical OOP languages, XML Schema distinguishes two different forms of type derivation, called extension and restriction.

• The analogy between Schema type extension and OOP inheritance should be fairly clear.

• The analogy between Schema type restriction and OOP inheritance may be less obvious.

• It is based on the insight that when a new class is derived, the new constructors and methods generally introduce new sets of constraints or restrictions (“invariants”) on members already in the base class.

Consider a class Square, which may be derived from a base class Rectangle. The derived class imposes the new invariant width=height.

So OOP inheritance includes aspects of both extension and restriction.
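As a minimal sketch of the Square and Rectangle example, here is one way it might look in Python. The member names (width, height, area, diagonal) are assumptions made for the illustration. The constructor establishes the invariant, and nothing here protects it against later assignment to width or height.

```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    def __init__(self, side):
        # Restriction: the constructor establishes the invariant width == height.
        super().__init__(side, side)

    def diagonal(self):
        # Extension: a member that the base class does not have.
        return self.width * 2 ** 0.5


square = Square(3)
print(square.area())                  # 9
print(square.width == square.height)  # True
```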


Attributes and Complex Type Extension

Recall that the typical syntax for extension looks like:

<complexType>
  <complexContent>
    <extension base="base-type">
      extra element content
      extra attributes
    </extension>
  </complexContent>
</complexType>

The extra attributes are generally just added to the set of attributes of base-type.

Some attributes in extra attributes may have the same name (and namespace) as attributes in base-type; any such attribute must also have identical type to its namesake in base-type.

• But the new version could have a different default value, say.

If extra attributes includes an attribute wildcard, it must represent a superset of any attribute wildcard in base-type.


Attributes and Complex Type Restriction

If an attribute appearing in a restriction of a complex type is also an explicitly declared attribute of the base-type, then:

• The simple type of the attribute in the new type must be identical to the attribute’s type in the base-type, or derived from it by steps of restriction.

• If the attribute is fixed in the base-type, it must be fixed with the same value in the new type.

• If the attribute is required in the base-type, so must it be in the new type.

Otherwise, there must be a wildcard in the base-type that matches the attribute declared in the new type. Note:

• If an attribute is required in the base-type, it must be an explicitly declared attribute of the new type.

• If an attribute was optional in the base-type, it may be specified in the new type with use="prohibited". This is the same as omitting the attribute in the new type (and the attribute might still be allowed by a wildcard!).

If there is an attribute wildcard in the restricted type, it must be a subset of a wildcard in base-type.


Content Models and Extension

Consider an extension of a complex type with complex content that adds non-empty extra element content.

The extra element content must be a particle, and the element content of the new type is

<xsd:sequence>
  base-type element content
  extra element content
</xsd:sequence>

(unless the base-type content model was empty, in which case it is just the extra element content). Notes:

• This would be illegal if the base-type element content was an <xsd:all/> particle. You can’t extend such content.

• If the base-type element content is an <xsd:choice/>, there is no way to extend the set of choices: you can only add extra particles in sequence.


Content Models and Restriction

The idea of restricting a content model is fairly intuitive, e.g.:

• Where there is an <xsd:choice/> of several particles, the restricted model may offer a reduced choice; perhaps it replaces the <xsd:choice/> with just one of the particles it contained.

• Where there is an optional particle (say minOccurs="0" and maxOccurs="1") the restricted model might make the particle mandatory (minOccurs="1") or, conversely, simply omit it.

More generally the restricted model may subset the minOccurs..maxOccurs range as it sees fit.

• Where there is an <xsd:any/> wildcard (or an element particle that heads a substitution group) the restricted model might replace it by a more specific element particle.

Although these ideas seem intuitive, it isn’t particularly easy to prove automatically that one content model is a valid restriction of another.


Particle Derivation OK

Defining the conditions under which one particle is a legal restriction of another particle is one of the more complex parts of the (generally quite complex) XML Schema specification.

You will find the rules in the section of the specification called Constraints on Particle Schema Components.

The relevant subsections start with the rule called Particle Valid (Restriction). This gives some rules for reducing particles to a “canonical” form, then delegates to more specialized rules with names like Particle Derivation OK (X:Y -- R), where X, Y, R depend on the case.


Canonical Form

Before comparing two particles to see if one is a valid restriction of the other, both should be reduced to a certain canonical form:

• Any occurrence of an element particle that is the head of a substitution group is replaced by an explicit <xsd:choice/> between element particles for all members of the substitution group.

• Empty groups are discarded.

• Redundant singleton <xsd:sequence/>, <xsd:choice/>, <xsd:all/> particles are replaced by the single particle they contain.

• An “associative rule” is applied to eliminate <xsd:sequence/> particles nested inside other <xsd:sequence/> particles (subject to some conditions on minOccurs, maxOccurs). Likewise for <xsd:choice/>.
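The group-simplification steps can be sketched roughly in Python as follows. This is an illustration under assumptions, not the specification's exact rules: Particle and simplify are invented names, substitution-group expansion is left out, and the occurrence condition is taken to be simply that the inner group occurs exactly once.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    kind: str             # "element", "sequence" or "choice"
    name: str = ""        # element name; unused for groups
    lo: int = 1           # minOccurs
    hi: float = 1         # maxOccurs (float("inf") for "unbounded")
    children: tuple = ()  # member particles of a group


def simplify(particle, parent_kind=None):
    """Return the list of particles that replaces `particle` inside its parent."""
    if particle.kind == "element":
        return [particle]

    # Simplify the members first, splicing in whatever replaces each one.
    members = []
    for child in particle.children:
        members.extend(simplify(child, particle.kind))

    if not members:
        return []  # empty group: discard

    once = particle.lo == 1 and particle.hi == 1
    if once and (len(members) == 1 or parent_kind == particle.kind):
        # Redundant singleton group, or a group nested in one of the same
        # kind (the "associative rule"): keep only the members.
        return members

    return [Particle(particle.kind, lo=particle.lo, hi=particle.hi,
                     children=tuple(members))]


model = Particle("sequence", children=(
    Particle("element", "title"),
    Particle("sequence", children=(
        Particle("sequence", children=(
            Particle("element", "paragraph", 0, float("inf")),
        )),
        Particle("element", "figure"),
    )),
    Particle("choice"),  # an empty group
))

(flat,) = simplify(model)
print([p.name for p in flat.children])  # ['title', 'paragraph', 'figure']
```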


Comparing <sequence/> with <sequence/>

There are many specific versions of the Particle Derivation OK rule—basically one for every kind of particle you might try to restrict to any other kind of particle.

We don’t attempt to mention all of them here—just a couple of interesting cases.

For example, consider the case where you are trying to restrict an <xsd:sequence/> particle in an existing content model to an <xsd:sequence/> particle in a new content model.

The exact rule that takes care of this case is called Particle Derivation OK (All:All, Sequence:Sequence -- Recurse).


All:All, Sequence:Sequence -- Recurse

The occurrence ranges (minOccurs, maxOccurs) of the original and restricted <sequence/> must be consistent with restriction.

Less trivially, there must exist an order-preserving mapping from the particles in the restricted <sequence/> to particles of the original <sequence/>, such that:

• Each particle in the restricted <sequence/> is a valid restriction of its image particle (under the map). Here we recursively apply the definition of Particle Derivation OK, hence the Recurse in the title.

• Any particle of the original <sequence/> that is not in the range of the map is emptiable, i.e. can match empty content.

It happens that the same rule is used for <all/> groups, hence the All:All in the title.


Schematic Example

Original:

<xsd:sequence>
  <xsd:element ref="title"/>
  <xsd:element ref="paragraph" minOccurs="0" maxOccurs="unbounded"/>
  <xsd:element ref="figure" minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>

Restricted:

<xsd:sequence>
  <xsd:element ref="title"/>
  <xsd:element ref="captionFigure" minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>

The arrows on the original slide (title to title, captionFigure to figure) illustrate an order-preserving map with the required properties:

• The title particle is (trivially) a valid restriction of the title particle, and captionFigure is a valid restriction of figure.

• The original paragraph particle is not in the range of the map, but is emptiable (because minOccurs is 0).


Determinism?

The requirement in the Sequence:Sequence -- Recurse rule that “there exists” a suitable map looks rather cavalier: how are we to actually discover whether this map exists?

• In other words, the rule doesn’t seem to give a deterministic prescription for checking whether one model is a restriction of the other.


A Prescription A “greedy” prescription that will sometimes find a

suitable, order-preserving map is this:• Visit the particles of the restricted model in turn, trying to

find a match for each. At any time we have a “next candidate” particle from the original model, for possible matching (initially the first particle of the original model).

• If the current particle in the restricted model is a valid restriction of the “next candidate”, take the candidate as the mapping of the current particle and carry on to the next particles in both models.

• Otherwise, if the current particle is not a valid restriction of the candidate, but the candidate is emptiable, try again with the immediately following particle in the original model as “next candidate”.

• Otherwise, this prescription fails to find a map.
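The prescription above can be sketched in Python. The sketch rests on several assumptions: Elem, make_restricts and greedy_map are invented names, every particle is a plain element particle, and "valid restriction" is reduced to "same element name (or a pair asserted via substitutes) with a narrower occurrence range". The final check that leftover particles of the original are emptiable is not spelled out in the bullets but is required by the rule itself.

```python
from dataclasses import dataclass

UNBOUNDED = float("inf")


@dataclass(frozen=True)
class Elem:
    name: str
    lo: int = 1    # minOccurs
    hi: float = 1  # maxOccurs


def emptiable(particle):
    return particle.lo == 0


def make_restricts(substitutes=()):
    """Build the 'is a valid restriction of' test.

    `substitutes` holds (restricted name, original name) pairs that are
    simply asserted to be valid, e.g. through a substitution group.
    """
    allowed = set(substitutes)

    def restricts(r, b):
        same = r.name == b.name or (r.name, b.name) in allowed
        return same and b.lo <= r.lo and r.hi <= b.hi

    return restricts


def greedy_map(restricted, original, restricts):
    """Return the index in `original` matched to each restricted particle, or None."""
    mapping, candidate = [], 0
    for current in restricted:
        while True:
            if candidate == len(original):
                return None                # ran out of candidates
            if restricts(current, original[candidate]):
                mapping.append(candidate)  # take the candidate
                candidate += 1
                break
            if not emptiable(original[candidate]):
                return None                # cannot skip this candidate
            candidate += 1                 # skip an emptiable candidate
    # Particles left over in the original must all be emptiable.
    if not all(emptiable(p) for p in original[candidate:]):
        return None
    return mapping


original = [Elem("title"), Elem("paragraph", 0, UNBOUNDED), Elem("figure", 0, UNBOUNDED)]
restricted = [Elem("title"), Elem("captionFigure", 0, UNBOUNDED)]
restricts = make_restricts([("captionFigure", "figure")])
print(greedy_map(restricted, original, restricts))  # [0, 2]
```

The printed list says that title maps to particle 0 and captionFigure to particle 2 of the original, skipping the emptiable paragraph particle.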


A Case Where that Prescription Fails

Original:

<xsd:sequence>
  <xsd:element ref="paragraph" minOccurs="0" maxOccurs="unbounded"/>
  <xsd:element ref="figure" minOccurs="0" maxOccurs="unbounded"/>
  <xsd:element ref="paragraph"/>
</xsd:sequence>

Restricted:

<xsd:sequence>
  <xsd:element ref="paragraph"/>
</xsd:sequence>

The “greedy” prescription will try to match the paragraph particle in the restricted model to the first paragraph particle of the original model. But the resulting map is unsatisfactory, because then the final paragraph particle of the original model is not in the range of the map, nor is it emptiable.

Meanwhile, in fact, “there exists” a suitable map: just map the paragraph particle of the restricted model to the final particle of the original.
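That a suitable map exists here can be checked by brute force. The Python sketch below tries every order-preserving choice of images; find_map and restricts are names invented for the illustration, particles are reduced to (element name, minOccurs, maxOccurs) tuples, and "order-preserving" is assumed to mean strictly increasing positions. The search is exponential in general, so it is a demonstration rather than a practical checker.

```python
from itertools import combinations

INF = float("inf")

# Particles as (element name, minOccurs, maxOccurs) tuples.
original = [("paragraph", 0, INF), ("figure", 0, INF), ("paragraph", 1, 1)]
restricted = [("paragraph", 1, 1)]


def restricts(r, b):
    """Same element name and an occurrence range inside the original's."""
    return r[0] == b[0] and b[1] <= r[1] and r[2] <= b[2]


def find_map(restricted, original):
    """Try every order-preserving choice of images; return the first that works."""
    positions = range(len(original))
    for image in combinations(positions, len(restricted)):
        matched = all(restricts(r, original[j]) for r, j in zip(restricted, image))
        # Every original particle outside the image must be emptiable.
        skipped_ok = all(original[j][1] == 0 for j in positions if j not in image)
        if matched and skipped_ok:
            return list(image)
    return None


print(find_map(restricted, original))  # [2]
```

The search rejects position 0 (the greedy choice) because the final paragraph particle would be left unmatched, and settles on position 2.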


Unique Particle Attribution to the Rescue!?

But the “Original” model on the previous slide is an illegal content model according to the Unique Particle Attribution rule!

• Recall this is the XML Schema analogue of a rule about DTDs, which says content models must be “deterministic”.

While the XML Schema specification doesn’t spell this out, it seems semi-plausible that, if content models satisfy the Unique Particle Attribution rule, then a simple greedy prescription will find the order-preserving mapping required by Particle Derivation OK, if such a mapping exists.

• This makes checking Particle Valid (Restriction) tractable.


Clause 1.5

Finally, we note that there is a slightly mysterious clause in the section of the Schema specification called Schema Component Constraint: Derivation Valid (Extension), which is supposed to ensure that, in a chain of derivation, nothing removed by a restriction may be added back by a subsequent extension.

• We omit the details here! The rule isn’t very clearly stated in the specification (IMHO).


Conclusion

In this section we have just briefly touched on the issues of what constitutes a valid extension or restriction of a content model.

The general rules are complicated. If you intend to use these capabilities of XML Schema in non-trivial ways, expect surprises!
