e-Science e-Business e-Government and their Technologies
XML Schema
Bryan Carpenter, Geoffrey Fox, Marlon Pierce
Pervasive Technology Laboratories
Indiana University Bloomington IN 47404
January 12 2004
http://www.grid2004.org/spring2004
Introduction
We saw that DTDs provide an approach to validating XML documents: ensuring they have the structure expected for a particular application.
With the increasing use of XML for data-centric applications—e.g. XML formats for messages exchanged by Web Services—limitations of DTDs (which were inherited from SGML) soon became apparent.
XML Schema is a more recent validation framework for XML, which attempts to address the shortcomings of DTDs for data-centric applications, for example by providing a much richer set of data types.
Problems with DTDs
DTDs have some clear limitations:
• Restricted set of data types: attribute data is either general character data, name tokens, ID or IDREF (or arcane cases); element content is either general character data or nested elements or some mixture.
For data-centric applications, we might want a value to be a well-formed number, date, etc, etc.
• DTDs are not convenient for dealing with XML Namespaces—essential for modularity on the Web.
• The uniqueness and consistency requirements associated with ID, IDREF are powerful, but could be much more refined.
• There are various obscure constraints on element content specifications, needed purely for historical SGML compatibility.
XML Schema
XML Schema addresses all the issues mentioned on the previous slide.
• It also has the interesting property that an XML Schema is itself a well-formed XML document; some people consider this a significant advantage.
This is the good news. The less good news is that the XML Schema 1.0 specification is longer by almost an order of magnitude than the basic XML specification—DTDs and all.
General Comparison
DTDs | XML Schema
A DTD defines all elements, etc, in one type of document. | A schema defines all elements, etc, in a single namespace.
For documents with multiple namespaces, somehow patch together one large DTD. | For documents with multiple namespaces, use multiple schemas.
Directly define structures of named elements. | Define structures of complex types of element; then declare named element of that type.
Limited built-in data types for attributes. | Extensive built-in simple types for attributes and element content.
Complex entity substitution mechanism. | No entity substitution mechanism.
Reading Material
The XML Schema Specification itself comes in parts 0, 1, and 2. Parts 1 and 2 are long and tough to read, but part 0 is a reasonable (“non-normative”) introduction: XML Schema Part 0: Primer, May 2001. http://www.w3.org/TR/xmlschema-0/
There are some good and bad books. A good one is: Definitive XML Schema, Priscilla Walmsley, Prentice Hall, 2002.
There is a comprehensive (but again rather long) tutorial introduction to XML Schema by Roger Costello at: http://www.xfront.com/
“Report” Format Revisited
When discussing DTDs we described a simple “report” format.
Here is a slightly expanded version of the DTD given there:
<!DOCTYPE report [
<!ELEMENT report
(title, (paragraph | figure)*, bibliography?) >
<!ELEMENT title (#PCDATA)>
<!ELEMENT paragraph (#PCDATA)>
<!ELEMENT figure EMPTY>
<!ATTLIST figure source CDATA #REQUIRED >
<!ELEMENT bibliography (reference)* >
…
] >
We begin our detailed discussion of schema by considering how to give an equivalent XML Schema for this document.
Declaring a paragraph Element
The report schema is surprisingly long: we will build up to it in several incremental steps.
First consider the paragraph element. Using a DTD, we declared this element by:
<!ELEMENT paragraph (#PCDATA)>
An equivalent declaration in XML Schema might be:
<xsd:element name="paragraph" type="xsd:string"/>
• xsd:element is itself an element in the XML Schema namespace; this example assumes we use xsd as the prefix for that namespace.
• xsd:string is a predefined type in that namespace.
xsd:string Primitive Type
XML Schema has a complex system of types. Different types may describe:
1. the allowed values of attributes,
2. the allowed content of elements, or
3. the allowed content and the allowed attributes of elements.
There is a subset of types, called the simple types, that can be used in either of the first two roles.
One of the simplest of all is string. Used as an attribute type, this is equivalent to the DTD type CDATA; used as an element type, this is equivalent to the DTD content specification (#PCDATA).
Declaring a report Element
We initially simplify to a schema in which a report consists only of a series of paragraphs.
In a DTD, a possible declaration of the root element would be:
<!ELEMENT report (paragraph)*>
An equivalent declaration in XML Schema might be:
<xsd:element name="report">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="paragraph" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
Elements with Complex Type
This rather verbose declaration says:
• The element named report has complex type.
• The content associated with this complex type is a sequence of elements.
• This sequence consists of at least 0 and at most an unbounded number of occurrences of paragraph elements.
Here the xsd:element element has different roles:
• Outermost xsd:element declares the element named report.
• Innermost xsd:element uses the element named paragraph, declared elsewhere.
The role is determined by the presence or absence of the ref attribute.
Local Declarations
In fact xsd:element can appear in a third role, which is considered to be a combined declaration and use, e.g.:
<xsd:element name="report">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="paragraph" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
• Here the report element has its own local declaration of paragraph; no separate global declaration is necessary.
Global vs Local Element Declarations
Declarations that occur as children of the top-level schema element are global declarations.
• These are the only declarations that can actually be “used” from elsewhere.
“Local declarations”, like the one illustrated on the previous slide, are “used” exactly once at their point of declaration.
• This is different from the concept of local declarations in most programming languages.
• Local element declarations interact with namespaces in a non-obvious way: perhaps best avoided until you are sure you know what you are doing.
Global vs Local Type Definitions
The type of the report element was specified by an xsd:complexType element nested within the element declaration.
The type of the paragraph element was specified by a type attribute on the declaration, referencing a named type.
In fact types, like elements, can always be defined locally where they are used, or defined globally, then referenced from a point of use.
The following slide illustrates yet another way to declare report.
Named Type Definitions
In this version we introduce a named complex type called reportType, then declare the report element with this type:
<xsd:complexType name="reportType">
  <xsd:sequence>
    <xsd:element ref="paragraph" minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>
<xsd:element name="report" type="reportType"/>
This abstraction facility—introducing new named types—is a central theme of XML Schema.
A Complete XML Schema
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.grid2004.org/ns/report1"
            xmlns="http://www.grid2004.org/ns/report1">
  <xsd:element name="report">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="paragraph" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="paragraph" type="xsd:string"/>
</xsd:schema>
Remarks
Recall this schema is essentially equivalent to the DTD:
<!DOCTYPE report [
<!ELEMENT report (paragraph)* >
<!ELEMENT paragraph (#PCDATA)> ] >
Clearly the schema has more baggage (or more added value, according to your point of view!)
Our schema declares two element names, report and paragraph, and puts them in a namespace called http://www.grid2004.org/ns/report1.
Namespace Considerations
The root element of any schema is a schema element from the http://www.w3.org/2001/XMLSchema namespace.
The targetNamespace attribute on this element specifies which namespace the elements declared here “go into”.
We have seen the other namespace attributes before:
• The xmlns:xsd attribute associates the prefix xsd with the XML Schema namespace.
• The xmlns attribute makes http://www.grid2004.org/ns/report1 the default namespace for this document.
• Often one uses xsd as the prefix for schema elements, and makes the target namespace the default namespace of the schema document, but neither is essential.
An XML Instance Document
<?xml version="1.0"?>
<report xmlns="http://www.grid2004.org/ns/report1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.grid2004.org/ns/report1 report1.xsd">
<paragraph>Recently uncovered documents prove... </paragraph>
<paragraph>The author is grateful to W3C for making this research possible.</paragraph>
</report>
Namespace Considerations
Assuming the document vocabulary belongs to a namespace, we must declare this namespace.
• In this example http://www.grid2004.org/ns/report1 is declared as the default namespace.
If the instance document is to be validated against a schema, we must normally define where the schema for the namespace is located.
This is done here by putting an attribute schemaLocation on the root element of the document.
This attribute is itself defined in a standard namespace, called http://www.w3.org/2001/XMLSchema-instance. So we must introduce a prefix for this (xsi is traditional).
schemaLocation
The value of the schemaLocation attribute should be a pair of URIs: a namespace name and the corresponding schema URI.
• If the document uses more than one namespace, the value can be several consecutive pairs.
• All tokens are separated by white space.
In this example the schema should be in the file report1.xsd in the same directory as the instance document.
Schema Validation Using dom.Writer
If I save the instance document in a file called “xsdreport1.xml”, and the schema in a file called “report1.xsd”, I can validate the file with the Xerces parser by using the dom.Writer sample application as follows:
> java dom.Writer -v -s -f xsdreport1.xml
• If validation is successful, this simply prints a formatted version of the input file. If schema validation fails, you will see error messages early in the output.
• The -v -s flags are needed here. Without -s the parser will try to do just DTD validation. -f means “full” schema validation, presumably a good thing.
Schema Validation from Java
Unfortunately it doesn’t seem to be possible to enable XML Schema validation in Xerces using the “vendor-neutral” JAXP API.
• The DOM Level 3 API will enable this, but it is not finalized or fully deployed at the time of this writing.
For now you must directly use the “proprietary” org.apache.xerces.parsers.DOMParser Xerces implementation class.
Use is sketched on the next slide.
The Xerces DOMParser API
import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.*;
…
static final String VALIDATION_FEATURE_ID =
    "http://xml.org/sax/features/validation";
static final String SCHEMA_VALIDATION_FEATURE_ID =
    "http://apache.org/xml/features/validation/schema";
static final String SCHEMA_FULL_CHECKING_FEATURE_ID =
    "http://apache.org/xml/features/validation/schema-full-checking";
…
DOMParser parser = new DOMParser();
// Turn Schema Validation on
parser.setFeature(VALIDATION_FEATURE_ID, true);
parser.setFeature(SCHEMA_VALIDATION_FEATURE_ID, true);
parser.setFeature(SCHEMA_FULL_CHECKING_FEATURE_ID, true);
parser.setErrorHandler(new MyErrorHandler());
parser.parse(uri); // uri is XML instance file
Document document = parser.getDocument();
…
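Later JAXP releases (JAXP 1.3, bundled with Java 5 and later) added the javax.xml.validation package, which offers a vendor-neutral route to the same result. The following is a minimal sketch of that alternative, not the approach used in these slides. It embeds the report1 schema and two sample instances as strings; the class name JaxpReportValidation and the helper isValid are illustrative names, not part of any library.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class JaxpReportValidation {

    // The report1 schema from the slides, embedded as a string.
    static final String REPORT_SCHEMA =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://www.grid2004.org/ns/report1'"
        + " xmlns='http://www.grid2004.org/ns/report1'>"
        + "<xsd:element name='report'><xsd:complexType><xsd:sequence>"
        + "<xsd:element ref='paragraph' minOccurs='0' maxOccurs='unbounded'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "<xsd:element name='paragraph' type='xsd:string'/>"
        + "</xsd:schema>";

    // Returns true if instanceText validates against schemaText.
    static boolean isValid(String schemaText, String instanceText) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema =
                factory.newSchema(new StreamSource(new StringReader(schemaText)));
            Validator validator = schema.newValidator();
            try {
                // With no ErrorHandler set, validation errors are thrown.
                validator.validate(new StreamSource(new StringReader(instanceText)));
                return true;
            } catch (SAXException validationError) {
                return false;
            }
        } catch (SAXException | IOException e) {
            // The schema itself could not be compiled, or input could not be read.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ns = "http://www.grid2004.org/ns/report1";
        String good = "<report xmlns='" + ns + "'><paragraph>Some text.</paragraph></report>";
        String bad = "<report xmlns='" + ns + "'><figure/></report>";
        System.out.println(isValid(REPORT_SCHEMA, good)); // true
        System.out.println(isValid(REPORT_SCHEMA, bad));  // false: figure is not declared
    }
}
```

Because the schema is handed to the Validator directly, no xsi:schemaLocation attribute is needed on the instance in this sketch.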
More on Complex Types
If an element may have nested elements, or if it may have attributes, it must be described by a complex type.
• If neither of these conditions holds (the element has only character data content and no attributes), it is usually more convenient to use a simple type.
Attributes on complex types are specified by an attribute element, e.g.:
<xsd:element name="figure">
<xsd:complexType>
<xsd:attribute name="source" type="xsd:string"/>
</xsd:complexType>
</xsd:element>
Attribute Declarations
Like element declarations, attributes may be declared globally, then used inside a complex type declaration, through an xsd:attribute element with a ref attribute.
• In contrast to the situation with elements, local declaration of attributes is often a natural choice.
The figure example above has a complex type with no content. In general attribute specifications go after the content specification, in the body of the xsd:complexType element.
Element Sequences and Choices
To finish this introductory foray into XML Schema, we restore our report element back to its original specification. The XML Schema declaration is given on the next slide.
Recall this is supposed to be equivalent to the DTD declaration:
<!ELEMENT report (title, (paragraph | figure)*, bibliography?) >
• The use of the xsd:sequence and xsd:choice elements should be reasonably self explanatory.
• Note how the minOccurs, maxOccurs attributes replace use of the *, ? operators: both have default values of 1.
Original report Element Structure
<xsd:element name="report">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="title"/>
      <xsd:choice minOccurs="0" maxOccurs="unbounded">
        <xsd:element ref="paragraph"/>
        <xsd:element ref="figure"/>
      </xsd:choice>
      <xsd:element ref="bibliography" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
Simple Types: Schema Datatypes
XML Schema Simple Types
Recall simple types can be used to describe the values of attributes, or the content of elements that have no nested elements (“character data” content).
So far we only illustrated one simple type built in to XML Schema: namely string.
• As an attribute type this is similar to the DTD attribute type CDATA; as an element type, it is similar to the DTD content specification (#PCDATA).
Most of the details of simple types are defined in the W3C Recommendation XML Schema Part 2: Datatypes.
Built In and User-Defined Types
XML Schema provides over 40 built in simple types.
It also provides flexible mechanisms for creating your own simple types,
• which may in fact impose rather complex patterns on text content.
Schema Built In Types
Built In Simple Types
Simple Type Examples (comma separated)
string Confirm this is electric
normalizedString Confirm this is electric
token Confirm this is electric
base64Binary GpM7
hexBinary 0FB7
byte -1, 126
unsignedByte 0, 126
Built In Simple Types (continued)
Simple Type Examples (comma separated)
integer -126789, -1, 0, 1, 126789
positiveInteger 1, 126789
negativeInteger -126789, -1
nonNegativeInteger 0, 1, 126789
nonPositiveInteger -126789, -1, 0
int -1, 126789675
unsignedInt 0, 1267896754
long -1, 12678967543233
unsignedLong 0, 12678967543233
short -1, 12678
unsignedShort 0, 12678
Built In Simple Types (continued)
Simple Type Examples (comma separated)
decimal -1.23, 0, 123.4, 1000.00
float -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN
double -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN
boolean true, false, 1, 0
Built In Simple Types (continued)
Simple Type Examples (comma separated)
time 13:20:00.000, 13:20:00.000-05:00
dateTime 1999-05-31T13:20:00.000-05:00
duration P1Y2M3DT10H30M12.3S
date 1999-05-31
gMonth --05--
gYear 1999
gYearMonth 1999-02
gDay ---31
gMonthDay --05-31
Built In Simple Types (continued)
Simple Type Examples (comma separated)
Name shipTo
QName po:USAddress
NCName USAddress
anyURI http://www.example.com/, http://www.example.com/doc.html#ID5
language en-GB, en-US, fr
Built In Simple Types (continued)
Simple Type Examples (comma separated)
ID
IDREF
IDREFS
ENTITY
ENTITIES
NOTATION
NMTOKEN US, Brésil
NMTOKENS US UK, Brésil Canada Mexique
Creating New Simple Types
There are three basic approaches to building new simple types (deriving simple types):
• Restricting facets of an existing simple type.
• Creating a list type from an existing simple type.
• Creating a union type from some existing simple types.
The most sophisticated mechanism is the first: restriction using facets.
Facets
The 19 primitive types (the built in types derived directly from anySimpleType) have a set of constraining facets restricting allowed values.
The constraining facets of a simple type are a subset of:
• length, minLength, maxLength, pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minExclusive, minInclusive, totalDigits, fractionDigits
• Restricted types have all the facets of their base types—though values of the facets may be different.
• There is no way for schema writers to introduce new facets—users cannot directly restrict anySimpleType.
• Technically simple types have additional fundamental facets, but values of these flags cannot be set directly. They are: equal, ordered, bounded, cardinality, numeric
Restriction
Here is a characteristic example of restriction:
<xsd:simpleType name="singleDigit">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="-9"/>
    <xsd:maxInclusive value="9"/>
  </xsd:restriction>
</xsd:simpleType>
This starts from the built in xsd:integer, and defines a derived type singleDigit by setting the facet minInclusive to -9 and the facet maxInclusive to 9.
Thus the type singleDigit represents a whole number between -9 and +9.
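One way to try out a simple type definition like this is to wrap it in a throwaway schema and validate candidate values against it. The sketch below does so with the standard javax.xml.validation API (available in Java 5 and later, so not the Xerces-specific route used elsewhere in these slides). The class name SingleDigitCheck, the helper accepts, and the wrapper element named value are all illustrative assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SingleDigitCheck {

    // The singleDigit definition from the slide.
    static final String SINGLE_DIGIT =
        "<xsd:simpleType name='singleDigit'>"
        + "<xsd:restriction base='xsd:integer'>"
        + "<xsd:minInclusive value='-9'/>"
        + "<xsd:maxInclusive value='9'/>"
        + "</xsd:restriction>"
        + "</xsd:simpleType>";

    // Builds a schema declaring one element <value> of the named type, then
    // validates <value>text</value>. The text must not contain XML markup.
    static boolean accepts(String typeDefinitions, String typeName, String text) {
        String schemaText =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
            + typeDefinitions
            + "<xsd:element name='value' type='" + typeName + "'/>"
            + "</xsd:schema>";
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(schemaText)));
            try {
                schema.newValidator().validate(
                    new StreamSource(new StringReader("<value>" + text + "</value>")));
                return true;
            } catch (SAXException validationError) {
                return false;
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e); // the schema itself is broken
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(SINGLE_DIGIT, "singleDigit", "7"));   // true
        System.out.println(accepts(SINGLE_DIGIT, "singleDigit", "-9"));  // true
        System.out.println(accepts(SINGLE_DIGIT, "singleDigit", "10"));  // false: above maxInclusive
        System.out.println(accepts(SINGLE_DIGIT, "singleDigit", "3.5")); // false: not an integer
    }
}
```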
Length
The facets length, minLength, maxLength allow you to constrain the length of an item like a string (they also constrain the number of items in a list type, see later).
Values of length, minLength, maxLength should be non-negative integers. Example:
<xsd:simpleType name="state">
<xsd:restriction base="xsd:string">
<xsd:length value="2"/>
</xsd:restriction>
</xsd:simpleType>
defines a type state representing strings containing exactly two characters.
• These facets are supported by all primitive types other than boolean, numeric, and date- and time-related types. They are also supported by list types.
Pattern
Perhaps the most powerful facet is pattern, which allows you to specify a regular expression: any allowed value must match this expression.
Example:
<xsd:simpleType name="weekday">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="(Mon|Tues|Wednes|Thurs|Fri)day"/>
  </xsd:restriction>
</xsd:simpleType>
defines a type weekday representing the names of the week days.
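A pattern must match the whole value, not just part of it. The sketch below illustrates this by validating a few strings against the weekday type with the standard javax.xml.validation API. The class name WeekdayPatternCheck, the helper accepts, and the wrapper element named value are illustrative assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class WeekdayPatternCheck {

    // The pattern-based weekday definition from the slide.
    static final String WEEKDAY =
        "<xsd:simpleType name='weekday'>"
        + "<xsd:restriction base='xsd:string'>"
        + "<xsd:pattern value='(Mon|Tues|Wednes|Thurs|Fri)day'/>"
        + "</xsd:restriction>"
        + "</xsd:simpleType>";

    // Builds a schema declaring one element <value> of the named type, then
    // validates <value>text</value>. The text must not contain XML markup.
    static boolean accepts(String typeDefinitions, String typeName, String text) {
        String schemaText =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
            + typeDefinitions
            + "<xsd:element name='value' type='" + typeName + "'/>"
            + "</xsd:schema>";
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(schemaText)));
            try {
                schema.newValidator().validate(
                    new StreamSource(new StringReader("<value>" + text + "</value>")));
                return true;
            } catch (SAXException validationError) {
                return false;
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e); // the schema itself is broken
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(WEEKDAY, "weekday", "Monday"));         // true
        System.out.println(accepts(WEEKDAY, "weekday", "Sunday"));         // false
        System.out.println(accepts(WEEKDAY, "weekday", "Monday morning")); // false: whole value must match
    }
}
```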
Regular Expressions
XML Schema has its own notation for regular expressions, but it is very much based on the corresponding Perl notation.
For the most part Schema uses a subset of the Perl 5 grammar for regular expressions.
• Includes most of the purely “declarative” features from Perl regular expressions, but omits many “procedural” features related to search, matching algorithm, substitution, etc.
XML Schema adds a few features of its own, e.g.:
• Matching characters legal in XML names.
• Character class subtraction.
• Inherits general XML escape mechanisms for Unicode characters, replacing analogous Perl mechanisms.
Metacharacters
The following characters, called metacharacters, have special roles in Schema regular expressions:
. \ ? * + | { } ( ) [ ]
• Like Perl, but treats }, ] uniformly as metacharacters, and omits search-related metacharacters ^ and $.
To match these characters literally in patterns, you must escape them with \, e.g.:
• The pattern “2\+2” matches the string “2+2”.
• The pattern “f\(x\)” matches the string “f(x)”.
Escape Sequences
In general one should use XML character references to include hard-to-type characters. But for convenience Schema regular expressions allow:
• \n matches a newline character (same as &#xA;)
• \r matches a carriage return character (same as &#xD;)
• \t matches a tab character (same as &#x9;)
All other escape sequences (except \- and \^, used only in character class expressions) match any single character out of some set of possible values.
• For example \d matches any decimal digit, so the pattern “Boeing \d\d\d” matches the strings “Boeing 747”, “Boeing 777”, etc.
Multicharacter Escapes
The simplest patterns matching classes of characters are:
• . matches any character except carriage return or newline.
• \d matches any decimal digit.
• \s matches any white space character.
• \i matches any character that can start an XML name.
• \c matches any character that can appear in an XML name.
• \w matches any “word” character (excludes punctuation, etc.)
The escapes \D, \S, \I, \C and \W are negative forms, e.g. \D matches any character except a decimal digit.
• Similar to Perl, except: Perl doesn’t have \i, \I; Perl uses \c, \C for other things; detailed definitions of \w, \W are different.
Category Escapes
A large and interesting family of escapes is based on the Unicode standard.
General form in Perl or Schema is
\p{Name}
where Name is a Unicode-defined class name.
• The negative form \P{Name} matches any character not in the class.
Simple examples include: \p{L} (any letter), \p{Lu} (upper case letters), \p{Ll} (lower case letters), etc.
More interesting cases are based on the Unicode block names for alphabets, e.g.:
• \p{IsBasicLatin}, \p{IsLatin-1Supplement}, \p{IsGreek}, \p{IsArabic}, \p{IsDevanagari}, \p{IsHangulJamo}, \p{IsCJKUnifiedIdeographs}, etc.
Character Class Expressions
Allow you to define terms that match any character from a custom set of characters.
Basic syntax is familiar from Perl and UNIX:
[List-of-characters]
or the negative form:
[^List-of-characters]
Here List-of-characters can include individual characters, and also ranges of the form First-Last where First and Last are characters.
Examples:
• [RGB] matches one of R, G, or B.
• [0-9A-F] or [\dA-F] match one of 0, 1, …, 9, A, B, …, F.
• [^\r\n] matches anything except CR, NL (same as . ).
Class Subtractions
A feature of XML Schema, not present in Perl 5. A class character expression can take the form:
[List-of-characters-Class-char-expr]
or:
[^List-of-characters-Class-char-expr]
where Class-char-expr is another class character expression.
Example:
• [a-zA-Z-[aeiouAEIOU]] matches any consonant in the Latin alphabet.
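Since class subtraction is not part of Java's or Perl's own regular expression syntax in this form, a simple way to experiment with it is through a schema validator. The sketch below wraps the consonant pattern in a simple type and checks a few values with the standard javax.xml.validation API. The type name consonant, the class name ClassSubtractionCheck, the helper accepts, and the wrapper element named value are illustrative assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class ClassSubtractionCheck {

    // A hypothetical type whose values are single consonants.
    static final String CONSONANT =
        "<xsd:simpleType name='consonant'>"
        + "<xsd:restriction base='xsd:string'>"
        + "<xsd:pattern value='[a-zA-Z-[aeiouAEIOU]]'/>"
        + "</xsd:restriction>"
        + "</xsd:simpleType>";

    // Builds a schema declaring one element <value> of the named type, then
    // validates <value>text</value>. The text must not contain XML markup.
    static boolean accepts(String typeDefinitions, String typeName, String text) {
        String schemaText =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
            + typeDefinitions
            + "<xsd:element name='value' type='" + typeName + "'/>"
            + "</xsd:schema>";
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(schemaText)));
            try {
                schema.newValidator().validate(
                    new StreamSource(new StringReader("<value>" + text + "</value>")));
                return true;
            } catch (SAXException validationError) {
                return false;
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e); // the schema itself is broken
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(CONSONANT, "consonant", "b"));  // true
        System.out.println(accepts(CONSONANT, "consonant", "Z"));  // true
        System.out.println(accepts(CONSONANT, "consonant", "a"));  // false: vowels are subtracted
        System.out.println(accepts(CONSONANT, "consonant", "bc")); // false: exactly one character
    }
}
```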
Sequences and Alternatives
Finally, the universal core of regular expressions. If Pattern1 and Pattern2 are regular expressions, then:
• Pattern1Pattern2 matches any string made by putting a string accepted by Pattern1 in front of a string accepted by Pattern2.
• Pattern1|Pattern2 matches any string that would be accepted by Pattern1, or any string accepted by Pattern2.
Parentheses just group things together:
• (Pattern1) matches any string accepted by Pattern1.
An example given earlier:
• (Mon|Tues|Wednes|Thurs|Fri)day matches any of the strings Monday, Tuesday, Wednesday, Thursday, or Friday.
• Equivalent to Monday|Tuesday|Wednesday|Thursday|Friday.
Quantifiers
… and if Pattern1 is a regular expression:
• Pattern1? matches the empty string or any string accepted by Pattern1.
• Pattern1+ matches any string accepted by Pattern1, or by Pattern1Pattern1, or by Pattern1Pattern1Pattern1, or …
• Pattern1* matches the empty string or any string accepted by Pattern1+.
If n, m are numbers, Perl and XML Schema also allow the shorthand forms:
• Pattern1{n} is equivalent to Pattern1 repeated n times.
• Pattern1{m,n} matches any string accepted by Pattern1 repeated m times or m + 1 times or … or n times.
• Pattern1{m,} matches any string accepted by Pattern1 repeated m or more times.
Using Patterns in Restriction
All simple types (including lists and unions) support the pattern facet, e.g.:
<xsd:simpleType name="multiplesOfFive">
  <xsd:restriction base="xsd:integer">
    <xsd:pattern value="[+-]?\d*[05]"/>
  </xsd:restriction>
</xsd:simpleType>
defines a subtype of integer including all numbers ending with digits 0 or 5.
The pattern facet can appear more than once in a single restriction: interpretation is as if patterns were combined with |.
• Conversely if the pattern facet is specified in restriction of a base type that was itself defined using a pattern, allowed values must satisfy both patterns.
Enumeration
The enumeration facet allows one to select a finite subset of allowed values from a base type, e.g.:
<xsd:simpleType name="weekday">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="Monday"/>
    <xsd:enumeration value="Tuesday"/>
    <xsd:enumeration value="Wednesday"/>
    <xsd:enumeration value="Thursday"/>
    <xsd:enumeration value="Friday"/>
  </xsd:restriction>
</xsd:simpleType>
• Behaves like a very restricted version of pattern?
• All primitive types except boolean support the enumeration facet. List and union types also support this facet.
White Space
This facet controls how white space in a value received from the parser is processed, prior to Schema validation. It can take three values:
• preserve: no white space processing, beyond what base XML does.
• replace: Convert every white space character (Line Feed, etc) to a space character (#x20).
• collapse: Like replace. All leading or trailing spaces are then removed. Also sequences of spaces are replaced by single spaces.
Note analogies to “Normalization of Attribute Values” in base XML.
All simple types except union types have this facet, but usually you don’t explicitly set it in restriction: just inherit values from built in types.
All built in types have collapse, except string which has preserve and normalizedString which has replace.
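The practical effect of the inherited whiteSpace value can be seen by restricting two different base types with the same enumeration. In the sketch below, strictDay restricts xsd:string (preserve) and lenientDay restricts xsd:token (collapse); both type names, the class name WhiteSpaceFacetCheck, the helper accepts, and the wrapper element named value are illustrative assumptions. The check uses the standard javax.xml.validation API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class WhiteSpaceFacetCheck {

    // Same enumeration, two base types with different inherited whiteSpace values.
    static final String TYPES =
        "<xsd:simpleType name='strictDay'>"
        + "<xsd:restriction base='xsd:string'>"   // whiteSpace = preserve
        + "<xsd:enumeration value='Monday'/>"
        + "</xsd:restriction></xsd:simpleType>"
        + "<xsd:simpleType name='lenientDay'>"
        + "<xsd:restriction base='xsd:token'>"    // whiteSpace = collapse
        + "<xsd:enumeration value='Monday'/>"
        + "</xsd:restriction></xsd:simpleType>";

    // Builds a schema declaring one element <value> of the named type, then
    // validates <value>text</value>. The text must not contain XML markup.
    static boolean accepts(String typeDefinitions, String typeName, String text) {
        String schemaText =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
            + typeDefinitions
            + "<xsd:element name='value' type='" + typeName + "'/>"
            + "</xsd:schema>";
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(schemaText)));
            try {
                schema.newValidator().validate(
                    new StreamSource(new StringReader("<value>" + text + "</value>")));
                return true;
            } catch (SAXException validationError) {
                return false;
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e); // the schema itself is broken
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(TYPES, "strictDay", "Monday"));    // true
        System.out.println(accepts(TYPES, "strictDay", " Monday "));  // false: spaces are preserved
        System.out.println(accepts(TYPES, "lenientDay", " Monday ")); // true: spaces are collapsed first
    }
}
```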
Other Facets
The facets maxInclusive, maxExclusive, minExclusive, minInclusive are supported only by numeric and date- and time-related types, and define bounds of value ranges.
The facets totalDigits, fractionDigits are defined for the primitive type decimal, and thus all numeric types derived from decimal, and for no other types.
List Types
We define a type representing a white-space-separated list of items using the list element, e.g.:
<xsd:simpleType name="listOfDays">
  <xsd:list itemType="weekday"/>
</xsd:simpleType>
This introduces a type that takes values like “”, “Monday”, “Monday Monday”, “Tuesday Wednesday Thursday”, etc.
List types can be restricted using the length-related facets, and the pattern, enumeration and whiteSpace facets.
The <list> element may contain an anonymous <simpleType> element instead of having an itemType attribute.
A list value is split according to its white space content prior to validation of the items in the list.
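The item-by-item validation of a list value can be tried out as follows. This sketch combines the enumerated weekday type with the listOfDays definition and checks a few values using the standard javax.xml.validation API. The class name ListTypeCheck, the helper accepts, and the wrapper element named value are illustrative assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class ListTypeCheck {

    // The enumerated weekday type plus the listOfDays list type from the slides.
    static final String TYPES =
        "<xsd:simpleType name='weekday'>"
        + "<xsd:restriction base='xsd:string'>"
        + "<xsd:enumeration value='Monday'/>"
        + "<xsd:enumeration value='Tuesday'/>"
        + "<xsd:enumeration value='Wednesday'/>"
        + "<xsd:enumeration value='Thursday'/>"
        + "<xsd:enumeration value='Friday'/>"
        + "</xsd:restriction></xsd:simpleType>"
        + "<xsd:simpleType name='listOfDays'>"
        + "<xsd:list itemType='weekday'/>"
        + "</xsd:simpleType>";

    // Builds a schema declaring one element <value> of the named type, then
    // validates <value>text</value>. The text must not contain XML markup.
    static boolean accepts(String typeDefinitions, String typeName, String text) {
        String schemaText =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
            + typeDefinitions
            + "<xsd:element name='value' type='" + typeName + "'/>"
            + "</xsd:schema>";
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(schemaText)));
            try {
                schema.newValidator().validate(
                    new StreamSource(new StringReader("<value>" + text + "</value>")));
                return true;
            } catch (SAXException validationError) {
                return false;
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e); // the schema itself is broken
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(TYPES, "listOfDays", "Monday"));                     // true
        System.out.println(accepts(TYPES, "listOfDays", "Tuesday Wednesday Thursday")); // true
        System.out.println(accepts(TYPES, "listOfDays", "Monday Sunday"));              // false: Sunday is not a weekday
    }
}
```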
Union Types
A union type takes values from any one of a set of base types.
<xsd:simpleType name="maxOccursType">
  <xsd:union memberTypes="xsd:integer">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="unbounded"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:union>
</xsd:simpleType>
The types in the union are specified in the list-valued attribute memberTypes, or by nested anonymous <simpleType> elements, or by a combination of the two, as above.
Union types can be restricted using the pattern or enumeration facets.
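To see which values a union like maxOccursType admits, the definition can be checked with a validator. The sketch below does this with the standard javax.xml.validation API; the class name UnionTypeCheck, the helper accepts, and the wrapper element named value are illustrative assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class UnionTypeCheck {

    // The maxOccursType union from the slide: an integer, or the word "unbounded".
    static final String MAX_OCCURS_TYPE =
        "<xsd:simpleType name='maxOccursType'>"
        + "<xsd:union memberTypes='xsd:integer'>"
        + "<xsd:simpleType>"
        + "<xsd:restriction base='xsd:string'>"
        + "<xsd:enumeration value='unbounded'/>"
        + "</xsd:restriction>"
        + "</xsd:simpleType>"
        + "</xsd:union>"
        + "</xsd:simpleType>";

    // Builds a schema declaring one element <value> of the named type, then
    // validates <value>text</value>. The text must not contain XML markup.
    static boolean accepts(String typeDefinitions, String typeName, String text) {
        String schemaText =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
            + typeDefinitions
            + "<xsd:element name='value' type='" + typeName + "'/>"
            + "</xsd:schema>";
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(schemaText)));
            try {
                schema.newValidator().validate(
                    new StreamSource(new StringReader("<value>" + text + "</value>")));
                return true;
            } catch (SAXException validationError) {
                return false;
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e); // the schema itself is broken
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(MAX_OCCURS_TYPE, "maxOccursType", "3"));         // true: an integer
        System.out.println(accepts(MAX_OCCURS_TYPE, "maxOccursType", "unbounded")); // true: the enumerated string
        System.out.println(accepts(MAX_OCCURS_TYPE, "maxOccursType", "many"));      // false: matches neither member
    }
}
```

Note that this union as written also admits negative integers; a further restriction would be needed to exclude them.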
Prohibiting Derivation
You may, for some reason, have a simple type that you don’t want anybody to derive further types from.
Do this by specifying the final attribute on the <simpleType> element. Its value is a list containing a subset of values from list, union, restriction, extension.
• These specify which sorts of derivation are disallowed. Note extension is a way of deriving a complex type from a simple type. It will be discussed in the next section.
Give the final attribute the value “#all” for blanket prohibition of any derivation from this simple type.
• Can also prevent the value of individual facets from being changed in subsequent restrictions by specifying fixed="true" on the facet elements.
Complex Types
Element Content, and Attributes
Simple types allow us to declare elements that have only parsed character content (no nested elements). E.g. the declaration:
<xsd:element name="dayItHappened" type="weekday"/>
might validate instance elements like:
<dayItHappened>Monday</dayItHappened>
<dayItHappened>Tuesday</dayItHappened>
But if we need elements with element content, or elements with attributes, we must declare those elements to have complex type.
Complex Type Hierarchy
We saw that a set of built in simple types were derived from xsd:anySimpleType, and that new simple types could be derived from a base type by restriction, list, or union.
There are no built in complex types, other than the so-called ur-type, represented as xsd:anyType.
All other complex types are derived by one or more steps of restriction and extension from xsd:anyType.
• Complex types can also be created by extension of a simple type, but simple types are also notionally restrictions of xsd:anyType.
Restriction
A restriction of a base type is a new type.
All allowed instances of the new type are also instances of the base type. But the restricted type doesn’t allow all possibilities allowed by the base type.
• Think of the example of restricting xsd:string to 4 characters using the length facet. Strings of length 4 are also allowed by xsd:string, but the new type is more restrictive.
• In the complex case, we might have a complex base type that allows attribute att optionally. A restricted type might not allow att at all. Another restriction of the same base might require att.
• Or we might have a base type that allows 0 or more nested elm elements. The restricted type might require exactly 1 nested elm element.
Extension
An extension of a base type is a new type.
An extension allows extra attributes or extra content that are not allowed in instances of the base type.
• At first blush this sounds like the opposite of restriction, but this isn’t strictly true.
• If, for example, type E extends a type B by adding a required attribute att, then instances of B are not allowed instances of E (because they don’t have the required attribute). So we have that E is an extension of B, but there is no sense in which B could be a restriction of E.
Some such inverse relation exists if all extra attributes and content are optional in the extended type, but this isn’t a required feature of extension.
Complex Content and Simple Content
We have seen that XML Schema complex types define both some allowed nested elements, and some allowed attribute specifications. Complex types that allow nested elements are said to have complex content.
But Schema distinguishes as a special case complex types having simple content: elements with such types may have attributes, but they cannot have nested elements.
This is presumably a useful distinction, but it does introduce one more layer of complexity into the syntax for complex type derivation.
Basic Forms of Complex Type Definition
Complex content, restriction:
<complexType>
  <complexContent>
    <restriction base="type">
      allowed element content
      allowed attributes
    </restriction>
  </complexContent>
</complexType>
Complex content, extension:
<complexType>
  <complexContent>
    <extension base="type">
      extra element content
      extra attributes
    </extension>
  </complexContent>
</complexType>
Simple content, restriction:
<complexType>
  <simpleContent>
    <restriction base="type">
      facet restrictions
      allowed attributes
    </restriction>
  </simpleContent>
</complexType>
Simple content, extension:
<complexType>
  <simpleContent>
    <extension base="type">
      extra attributes
    </extension>
  </simpleContent>
</complexType>
Remarks
When one restricts a type one generally must specify all allowed element content and attributes.
When one extends a type one generally must specify just the extra element content and attributes.
6868
Requirements on Base Type
The base type must be a complex type in all cases except simpleContent/extension (lower right in table), in which case the base can be a simple type.
If the derived type has complexContent, the base type must have complex content.
• True for extension or restriction.
• Under some conditions, using a special form described later, a base type with complex content can be restricted to a type with simple content.
6969
Schematic Inheritance Diagram
[Diagram: inheritance hierarchy rooted at xsd:anyType, with regions for Simple Types and Complex Types (SimpleContent and ComplexContent). Arrows are labeled restriction, extension, list, and union. Arrows marked † use a syntax described later.]
7070
Defining a Complex Type with no Base?
In the introductory lecture we seemed to avoid this complexity: didn’t we just define complex types out of “thin air”?
Actually the XML Schema specification says that:
<complexType>
  allowed element content
  allowed attributes
</complexType>
is “shorthand” for
<complexType>
  <complexContent>
    <restriction base="xsd:anyType">
      allowed element content
      allowed attributes
    </restriction>
  </complexContent>
</complexType>
So in reality we were directly restricting the ur-type, which allows any attributes and any content!
7171
Defining Element Content
Where we wrote allowed element content or extra element content in the syntax for complex type definitions, what should appear is a model group.
A model group is exactly one of:
• an <xsd:sequence/> element, or
• an <xsd:choice/> element, or
• an <xsd:all/> element.
(The element content appearing in the type definition may also be a globally defined model group, referenced through an <xsd:group/> element. The global definition—a named <xsd:group/> element—just contains one of the three elements above.)
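A hedged sketch of a globally defined model group and a reference to it. The names titledText and sectionType are hypothetical, the title and paragraph elements are assumed to be declared globally elsewhere, and the schema is assumed to have no target namespace (otherwise the ref values would need a prefix).

```xml
<!-- Named (global) model group: it just wraps one of the three group elements. -->
<xsd:group name="titledText">
  <xsd:sequence>
    <xsd:element ref="title"/>
    <xsd:element ref="paragraph" minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:group>

<!-- A complex type whose element content is the referenced group. -->
<xsd:complexType name="sectionType">
  <xsd:group ref="titledText"/>
</xsd:complexType>
```

The reference appears exactly where an inline sequence, choice, or all element would otherwise go.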
7272
Sequence
An <xsd:sequence/> model group contains a series of particles.
A particle is an <xsd:element/> element, another model group, or a wildcard.
As expected, this model just says the element content represented by those items should appear in sequence.
E.g.
<xsd:sequence>
  <xsd:element ref="title"/>
  <xsd:element ref="paragraph" minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
says that exactly one occurrence of a title element is followed by any number of occurrences of paragraph elements.
7373
Choice
An <xsd:choice/> model group also contains a series of particles, with the same options as for sequence.
The element information validated by this model should match exactly one of the particles in the choice.
E.g.
<xsd:choice minOccurs="0" maxOccurs="unbounded">
  <xsd:element ref="paragraph"/>
  <xsd:sequence>
    <xsd:element ref="figure"/>
    <xsd:element ref="caption"/>
  </xsd:sequence>
</xsd:choice>
matches a sequence of paragraph elements interleaved with consecutive pairs of figure and caption elements.
7474
All
The <xsd:all/> model group is peculiar to XML Schema.
All particles it contains must be <xsd:element/>s.
The element information validated should match a sequence of the particles in any order.
There are several constraints:
• The maxOccurs attribute of each particle must be 1.
• The minOccurs attribute of each particle must be 0 or 1.
• The <xsd:all/> model group can only occur at the top level of a complex type’s content model, and must itself have maxOccurs = 1 (its minOccurs may be 0 or 1).
In view of the fact that minOccurs of a particle can be 0, subset might be a better name than all?
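A small sketch of an all group under these constraints. The element names address, street, city, and postcode are invented for the example.

```xml
<xsd:element name="address">
  <xsd:complexType>
    <!-- The all group sits at the top level of the content model. -->
    <xsd:all>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <!-- minOccurs="0": this child may be left out entirely. -->
      <xsd:element name="postcode" type="xsd:string" minOccurs="0"/>
    </xsd:all>
  </xsd:complexType>
</xsd:element>
```

Under this declaration the street and city children may appear in either order, each exactly once, and postcode may appear at most once anywhere among them.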
7575
Element Wildcard
The element wildcard particle <xsd:any/> matches and validates any element in the instance document.
• Though one can restrict the namespace of the matched element, as described below.
E.g.
<xsd:sequence minOccurs="0" maxOccurs="unbounded">
  <xsd:element ref="header"/>
  <xsd:any/>
</xsd:sequence>
matches a sequence of consecutive pairs of elements, where the first element in each pair is a header, and the second can be any kind of element.
7676
Options on <xsd:any/>
The <xsd:any/> element takes the usual optional maxOccurs, minOccurs attributes.
It allows a namespace attribute taking one of the values:
• ##any (the default),
• ##other (any namespace except the target namespace),
• a list of namespace names, optionally including ##targetNamespace and/or ##local.
This controls what elements the wildcard matches, according to namespace.
It also allows a processContents attribute taking one of the values strict, skip, lax (default strict), controlling the extent to which the contents of the matched element are validated.
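A hedged sketch combining these options, reusing the header element from the earlier wildcard example (assumed to be declared globally):

```xml
<xsd:sequence>
  <xsd:element ref="header"/>
  <!-- Any number of elements from namespaces other than the target namespace.
       lax: validate a matched element only if a declaration for it can be found. -->
  <xsd:any namespace="##other" processContents="lax"
           minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
```

With processContents="skip" the matched elements would only need to be well-formed. With strict (the default) a declaration must be available for each of them.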
7777
Parsing and Determinism
Recall the rule about determinism of content models in DTDs. We claimed XML retained this purely for compatibility with SGML.
Perhaps surprisingly, XML Schema retains exactly the same rule, calling it the Unique Particle Attribution constraint.
It has to be imposed slightly more carefully here because of the possibility of wildcard particles and substitution groups (discussed later).
• Unclear why it was retained. Perhaps to improve the efficiency of parsing, especially in the presence of substitution groups? Or to simplify the Particle Derivation OK constraints for restriction of complex types (see later)?
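To make the constraint concrete, here is a sketch of a content model that violates Unique Particle Attribution, followed by an equivalent model that satisfies it. The element names a, b, and c are placeholders assumed to be declared globally.

```xml
<!-- Non-deterministic: on seeing an <a/>, a validator cannot tell
     which of the two "a" particles it matches without looking ahead. -->
<xsd:choice>
  <xsd:sequence>
    <xsd:element ref="a"/>
    <xsd:element ref="b"/>
  </xsd:sequence>
  <xsd:sequence>
    <xsd:element ref="a"/>
    <xsd:element ref="c"/>
  </xsd:sequence>
</xsd:choice>

<!-- Deterministic rewrite accepting the same documents: a, then b or c. -->
<xsd:sequence>
  <xsd:element ref="a"/>
  <xsd:choice>
    <xsd:element ref="b"/>
    <xsd:element ref="c"/>
  </xsd:choice>
</xsd:sequence>
```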
7878
Mixed Content
XML Schema scores a big win over DTDs in the way mixed content is handled.
One simply specifies the attribute mixed on the complexContent element, giving it the value true.
• In the abbreviated form for restriction of the ur-type, the mixed attribute appears on the complexType element.
This specifies that the element content defined by the model particles can be interleaved with character data (without limiting how the elements themselves are arranged).
7979
Mixed Content Example
This element declaration
<xsd:element name="body">
  <xsd:complexType mixed="true">
    <xsd:choice minOccurs="0" maxOccurs="unbounded">
      <xsd:element ref="p"/>
      <xsd:element ref="a"/>
    </xsd:choice>
  </xsd:complexType>
</xsd:element>
allows the body element to contain <p/> and <a/> elements, with text freely interleaved between them.
8080
mixed and Inheritance
So an <xsd:complexContent/> with mixed="true" indicates a mixed complex type.
And an <xsd:complexContent/> with mixed="false" (the default) indicates an element-only complex type.
A mixed complex content type may be restricted to an element-only type (if the element content allows it).
Perhaps surprisingly, an element-only complex content type may not be extended to a mixed type.
8181
Restricting Mixed Content to Simple Content
If the model group of a mixed complex type can match the empty sequence of elements, then the type may have content that is text-only.
Then it is logically possible to restrict the type to one with simple content. There is a special syntax for this:
<complexType>
  <simpleContent>
    <restriction base="mixed-complex-content-type">
      <simpleType>
        usual content of simpleType element
      </simpleType>
      allowed attributes
    </restriction>
  </simpleContent>
</complexType>
8282
Expanded Complex Type Inheritance
[Diagram: the Complex Types part of the inheritance hierarchy rooted at xsd:anyType, showing SimpleContent and Complex Content, with Complex Content split into Mixed and Element-only types. Arrows are labeled restriction and extension.]
8383
Empty Elements
XML Schema doesn’t have any unique way of representing elements that must be empty.
The simplest way to do this is to omit the allowed element content in a complex content restriction.
Can such an element also be mixed (i.e. have pure text content)?
• Logically it seems this should be possible (I believe it is allowed by Xerces).
• But it seems to be forbidden by the XML Schema specification, which singles out this case and says such an element is strictly empty.
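A minimal sketch of an element that must be empty, using the abbreviated form (an implicit restriction of the ur-type) with no model group. The element name br and its attribute clear are hypothetical.

```xml
<xsd:element name="br">
  <xsd:complexType>
    <!-- No sequence, choice, or all: no nested elements are allowed. -->
    <xsd:attribute name="clear" type="xsd:string"/>
  </xsd:complexType>
</xsd:element>
```

Under this declaration <br/> and <br clear="all"/> validate, while <br>text</br> does not.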
8484
Attributes and Local Declarations
8585
Defining Allowed Attributes
Where we wrote allowed attributes or extra attributes in the syntax for complex type definitions, what should appear is a sequence of attribute declarations in the form of <xsd:attribute/> elements.
• These may be followed by an optional attribute wildcard.
(The attribute declaration list may also include globally defined attribute groups, referenced through <xsd:attributeGroup/> elements. These will be discussed later.)
8686
Simple Attribute Declarations
A straightforward example of an attribute declaration was given in the introductory lecture:
<xsd:element name="figure">
  <xsd:complexType>
    <xsd:attribute name="source" type="xsd:string"/>
  </xsd:complexType>
</xsd:element>
In general the value of the type attribute can be any simple type.
• Though unusual, it is also allowed to include an anonymous <xsd:simpleType/> definition in the body of the <xsd:attribute/>, instead of specifying the type attribute.
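For the anonymous case, a hedged sketch in which the attribute name align and its three allowed values are invented for illustration:

```xml
<!-- No type attribute: the simple type is defined inline instead. -->
<xsd:attribute name="align">
  <xsd:simpleType>
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="left"/>
      <xsd:enumeration value="center"/>
      <xsd:enumeration value="right"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:attribute>
```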
8787
Default Rules
As with DTDs, one can specify whether the use of an attribute is optional (the default) or required.
One can also specify a default value (if the attribute is optional).
Alternatively one can specify a fixed value for the attribute (whether the attribute is optional or required).
• default and fixed are mutually exclusive.
8888
DTD Attribute Defaults Revisited
Attribute list declaration:
<!ATTLIST a
  val CDATA "nothing"
  fix CDATA #FIXED "constant"
  req CDATA #REQUIRED
  opt CDATA #IMPLIED>
Instances of element a:
<a val="something" fix="constant" req="reading" opt="extra"/>
<a req="no experience"/>
<!-- OK: val = "nothing", fix = "constant", opt absent. -->
<a fix="variable"/>
<!-- Invalid! fix not "constant" and req unspecified. -->
8989
Schema Attribute Occurrence
Equivalent Schema declaration:
<xsd:attribute name="val" type="xsd:string" use="optional" default="nothing"/>
<xsd:attribute name="fix" type="xsd:string" fixed="constant"/>
<xsd:attribute name="req" type="xsd:string" use="required"/>
<xsd:attribute name="opt" type="xsd:string"/>
• Note fix and opt implicitly have use="optional" (we could have omitted this specification for val too).
• Unlike DTDs, it is possible to have an attribute that is both fixed and required.
9090
Complex Content Plus Attributes
Putting things together, here is a declaration of a body element that allows mixed content plus a style attribute.
<xsd:element name="body">
  <xsd:complexType mixed="true">
    <xsd:choice minOccurs="0" maxOccurs="unbounded">
      <xsd:element ref="p"/>
      <xsd:element ref="a"/>
    </xsd:choice>
    <xsd:attribute name="style" type="xsd:string"/>
  </xsd:complexType>
</xsd:element>
9191
Simple Content plus Attributes
Here is a declaration of an anchor element that allows simple content plus an href attribute.
<xsd:element name="anchor">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute name="href" type="xsd:anyURI"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>
9292
Attribute Wildcards
An attribute wildcard is represented by an <xsd:anyAttribute/> element.
There can be at most one such element in a complex type definition, and it must appear after any normal attribute declarations.
Such a declaration allows any attribute, optionally limited by namespace.
The namespace and processContents attributes on <xsd:anyAttribute/> work as for <xsd:any/>.
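A brief sketch of an attribute wildcard placed after a normal attribute declaration. The type name openFigureType is hypothetical.

```xml
<xsd:complexType name="openFigureType">
  <xsd:attribute name="source" type="xsd:anyURI"/>
  <!-- Also accept any attribute from a namespace other than the target
       namespace, without validating its value. -->
  <xsd:anyAttribute namespace="##other" processContents="skip"/>
</xsd:complexType>
```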
9393
Attributes and Namespaces
By default, attributes declared as we have illustrated (inside an <xsd:complexType/>) do not become part of the target namespace.
• Instead these attributes are local properties of any element they are attached to. The element itself may or may not belong to a namespace.
In instance documents, names of these attributes must not be prefixed with a namespace prefix.
9494
Creating Attributes in a Namespace
There are three ways to put attributes into the target namespace:
• Declare them “globally”, directly inside the top level <xsd:schema/> element. Reference the attribute declaration inside the complex type definition (like element references), or
• specify the attribute form="qualified" on a local <xsd:attribute/> declaration, or
• specify the attribute attributeFormDefault="qualified" on the <xsd:schema/> element.
After this, these attributes must be prefixed in instance documents with a namespace prefix.
• Recall default namespace declarations (xmlns="namespace") don’t work for attributes: you must introduce a non-empty prefix.
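A sketch of the first option, a globally declared attribute referenced from a complex type. The namespace URI http://example.org/ns/fig and the prefix fig are placeholders, not taken from the lectures.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:fig="http://example.org/ns/fig"
            targetNamespace="http://example.org/ns/fig">
  <!-- Global attribute declaration: belongs to the target namespace. -->
  <xsd:attribute name="source" type="xsd:anyURI"/>

  <xsd:element name="figure">
    <xsd:complexType>
      <!-- Reference to the global declaration, by qualified name. -->
      <xsd:attribute ref="fig:source"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

A matching instance must then prefix the attribute:

```xml
<fig:figure xmlns:fig="http://example.org/ns/fig" fig:source="a.jpg"/>
```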
9595
Locally Declared Elements
XML Schema goes to some lengths to maintain symmetry between elements and attributes.
Because the most natural way of declaring attributes is locally (private to a complex type), it must therefore be possible to declare elements local to the complex type.
• Even if this is less obviously natural for elements. It leads to some clumsy constraints, e.g.: two local element declaration particles with the same name in the model group of the same complex type must have the same type.
The same rules apply: if an element is declared locally (inside an <xsd:complexType/>), by default it does not belong to a namespace.
In this case its name must not be prefixed with a namespace prefix in instance documents.
9696
Creating Elements in a Namespace
There are three ways to put elements into the target namespace:
• Declare them “globally”, directly inside the top level <xsd:schema/> element. Reference the element declaration inside the complex type definition, or
• specify the attribute form="qualified" on a local <xsd:element/> declaration, or
• specify the attribute elementFormDefault="qualified" on the <xsd:schema/> element.
After this, these elements must be prefixed in instance documents with a namespace prefix (or there must be a default namespace declaration in effect).
9797
elementFormDefault and attributeFormDefault
Summary:
• These attributes on the <xsd:schema/> element take the values “qualified” or “unqualified”.
• The defaults for both are “unqualified”.
• They control whether or not elements and attributes declared locally in <xsd:complexType/> definitions belong to the target namespace.
• This property can also be controlled by form attributes on the individual declarations.
None of these attributes has any effect on elements or attributes declared globally (at the top level in the <xsd:schema/> element)! Effectively such declarations are all qualified.
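The following sketch shows the combined effect on one small schema. The namespace URI http://example.org/ns/report and the names title and version are assumptions for the example.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.org/ns/report"
            elementFormDefault="qualified">
  <!-- Global declaration: always in the target namespace. -->
  <xsd:element name="report">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Local element: qualified because of elementFormDefault. -->
        <xsd:element name="title" type="xsd:string"/>
      </xsd:sequence>
      <!-- Local attribute: unqualified, since attributeFormDefault
           keeps its default value. -->
      <xsd:attribute name="version" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

An instance consistent with this schema:

```xml
<r:report xmlns:r="http://example.org/ns/report" version="1">
  <r:title>Example</r:title>
</r:report>
```

Without elementFormDefault="qualified", the title element would have to appear unprefixed and outside any default namespace.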
9898
Inheritance and Substitution
9999
Polymorphism?
We have presented the mechanisms by which new types can be derived from old types (albeit we have omitted some details for complex types).
Through these mechanisms, inheritance provides useful ways to recycle existing definitions.
But it doesn’t in itself provide all the benefits of OOP; in particular we have not presented any analogue of polymorphism.
Schema tries to provide some of the OO flexibility in use of instances through type substitution and substitution groups.
100100
Type Substitution
The most basic mechanism for “polymorphism” is type substitution.
In essence this says that if a particle (in a content model, say) is declared to be an element with a particular type, then the corresponding element item in the instance document may have a type derived from the particle type.
Actually this only introduces new possibilities if the derivation involves extension.
101101
A Basis for Extension
Suppose we have the complex type declaration:
<xsd:complexType name="figureType">
  <xsd:attribute name="source" type="xsd:anyURI"/>
</xsd:complexType>
and suppose this is used as follows:
<xsd:element name="figure" type="figureType"/>
<xsd:element name="report">
  <xsd:complexType>
    <xsd:choice minOccurs="0" maxOccurs="unbounded">
      <xsd:element ref="paragraph"/>
      <xsd:element ref="figure"/>
    </xsd:choice>
  </xsd:complexType>
</xsd:element>
i.e. a report is a sequence of interleaved paragraph and figure elements, and a figure just has an attribute referencing a source image file.
102102
Extension Example
Now suppose that, without modifying any existing definitions and declarations, we want to allow figures in reports to have captions.
We can do this if we introduce the extended type:
<xsd:complexType name="captionFigureType">
  <xsd:complexContent>
    <xsd:extension base="figureType">
      <xsd:sequence>
        <xsd:element name="caption" type="xsd:string"/>
      </xsd:sequence>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>
• This complex type inherits the attribute source from its base type, and adds a nested caption element.
103103
Example Instance Document
<report xmlns="http://www.grid2004.org/ns/report4"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.grid2004.org/ns/report4 report4.xsd">
<paragraph>Recently uncovered documents prove... </paragraph>
<figure xsi:type="captionFigureType" source="notafake.jpg">
<caption>Irrefutable proof of ancient XML.</caption>
</figure>
</report>
104104
xsi:type
As illustrated above, the element information item may have any type derived by extension from the type the element was declared with.
• In general it may be derived by a mixture of extension and restriction.
This isn’t quite a free lunch, though. There is no way for an XML processor to automatically infer the type of an element instance; instead this approach requires the XML author explicitly specify the intended type using the xsi:type attribute.
• This limits the attractiveness of this approach to “polymorphism”.
105105
Substitution Groups
A more author-friendly approach to document polymorphism is based on element declarations.
This approach uses so-called substitution groups.
• Each substitution group is a set of element declarations.
• One of these is singled out as the head declaration.
Where a content model includes a reference to the head as a particle, the instance document can have any member of the associated substitution group.
106106
Substitution Group Example
Suppose the earlier definitions of figureType, <figure/>, <report/>, and captionFigureType are in effect.
Now suppose we declare a new element <captionFigure/>, having type captionFigureType, and belonging to a substitution group headed by <figure/>.
Then a possible instance document would be:
<report … >
  <paragraph>Recently uncovered documents prove... </paragraph>
  <captionFigure source="notafake.jpg">
    <caption>Irrefutable proof of ancient XML.</caption>
  </captionFigure>
</report>
107107
Remarks
Important things to note:
• Again we haven’t modified the original declaration of the report element, which still says it contains figure elements.
• Because captionFigure is in the substitution group of figure, automatically it is allowed to appear in place of figure in the instance.
• We no longer need the clumsy xsi:type attribute; the actual type of the information element can now be easily inferred from the element name (through its declaration, described shortly).
108108
Creating Substitution Groups
Groups are implicit: the implementation is more like a new kind of inheritance hierarchy, one relating element declarations rather than type definitions.
A new element declaration specifies at most one direct substitution group affiliation. This is another element declaration. The “affiliation” now heads a group containing the new declaration.
• In practice an affiliation works almost exactly like a base type, except it involves element declarations, not types.
• If the affiliation itself belongs to a different group, the new declaration automatically joins that group—generally an element can be in several (perfectly nested) groups.
109109
Group Creation Example
In our example we could declare captionFigure as follows:
<xsd:element name="captionFigure" type="captionFigureType" substitutionGroup="figure"/>
• This says <figure/> is the substitution group affiliation of <captionFigure/>.
• Or in other words <captionFigure/> is in the substitution group headed by <figure/>.
• The type attribute here may be omitted: the type defaults to that of the substitution group affiliation (again emphasizing the analogy with inheritance).
110110
Notional Substitution Group Hierarchy
This way of looking at things isn’t part of the XML Schema specification, but it may be mnemonic:
[Diagram: a hierarchy with <xsd:any/> at the top, <figure/> and <report/> beneath it, and <captionFigure/> beneath <figure/>.]
111111
Substitution and Type Inheritance
It is required that all elements in a substitution group headed by element <Name/> have either the same type as <Name/>, or a type derived from it by steps of extension and restriction.
Note that substitution may be used without type inheritance.
• In other words, all elements in the substitution group may have the same type as their head.
• Consider the example of internationalization: you might want many interchangeable elements with identical structure but different names (for different languages).
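A hedged sketch of the internationalization case. The element names title, titre, and titel are invented, and a schema without a target namespace is assumed.

```xml
<!-- Head of the substitution group. -->
<xsd:element name="title" type="xsd:string"/>

<!-- Members with the same structure but different names. The type attribute
     is omitted, so each takes the type of its affiliation (xsd:string). -->
<xsd:element name="titre" substitutionGroup="title"/>
<xsd:element name="titel" substitutionGroup="title"/>
```

Any content model that references title then also accepts titre or titel in the instance document.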
112112
Blocking Substitutions
We have described two kinds of substitution involving an element: the structure of an element can be substituted using xsi:type, or the whole element can be substituted by a member of its substitution group.
It is quite likely that a schema writer will want to block some such substitutions.
• Many applications will require elements to have exactly the originally specified form.
• We need a way to prevent this form being corrupted by (say) unexpected addition of an element to a substitution group.
113113
block Attribute of <xsd:element/>
The value of the block attribute on <xsd:element/> should be a list containing a possibly empty subset of the values extension, restriction, and substitution (or simply #all).
It defines the disallowed substitutions for this element.
• If a particle in a content model has substitution in its disallowed substitutions, the document instance may not replace the element by members of its substitution group.
• If an element has extension in its disallowed substitutions, then neither xsi:type nor a substitution group substitution allows the instance to validate against a type whose derivation from the particle type involves steps of extension.
• Appearance of restriction in the disallowed substitutions has an analogous effect.
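Applied to the running example, a sketch of how the figure declaration might block substitution group members while leaving xsi:type available:

```xml
<!-- With this declaration, <captionFigure/> may no longer appear in place
     of <figure/>, but <figure xsi:type="captionFigureType" ...> still may. -->
<xsd:element name="figure" type="figureType" block="substitution"/>
```

Writing block="substitution extension" or block="#all" instead would rule out the xsi:type route to captionFigureType as well.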
114114
block Attribute of <xsd:complexType/>
A block attribute may also be specified on the <xsd:complexType/> element. Its value is a list containing a subset of the values extension and restriction (or simply #all).
It defines the prohibited substitutions for this type.
• If the type of an element has extension in its prohibited substitutions, then neither xsi:type nor a substitution group substitution is allowed to validate the instance against a type whose derivation from the particle type involves any extension steps.
Such validation is also prevented if the prohibited substitutions of any intervening types in the chain of derivation include extension.
• Appearance of restriction in the prohibited substitutions has an analogous effect.
Note the block attributes of <complexType/> and <element/> are independent, and constraints from both must be satisfied.
• But it is “as if” an element acquires all blocked substitutions of its type.
115115
blockDefault Attribute of <xsd:schema/>
Unless otherwise specified, all substitutions are allowed.
You may want to change this globally to something more conservative.
Do this by specifying the blockDefault attribute on the <xsd:schema/> element.
• Allowed values for this attribute are the same as for the block attribute on <xsd:element/>.
116116
Prohibiting Derivation
The final attribute on <xsd:complexType/> works in the same way as the corresponding attribute for <xsd:simpleType/>.
Its value may be either a list containing a subset of the values extension and restriction, or simply #all. It prohibits either or both kinds of derivation using this type as base.
Although final and block can be used to similar ends, their modus operandi are quite different:
• final controls how you define new types derived from this type.
• block controls how you substitute elements of this type in the document instance.
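To contrast final with block, here is a sketch that reuses figureType from the earlier slides with a final attribute added for illustration:

```xml
<!-- No type may be derived from figureType by extension. -->
<xsd:complexType name="figureType" final="extension">
  <xsd:attribute name="source" type="xsd:anyURI"/>
</xsd:complexType>
```

With this definition, the earlier captionFigureType (an extension of figureType) would be rejected as an error in the schema itself, before any instance document is considered.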
117117
Substitution Group Exclusions
An <xsd:element/> declaration likewise allows a final attribute, with the same allowed values as final on <xsd:complexType/>.
Its value defines the substitution group exclusions for this element, which control its use as the head of a substitution group.
• If an element has extension in its substitution group exclusions, it may not be the substitution group affiliation of another element whose type is derived from the type of this element by steps including extension.
• Appearance of restriction in the substitution group exclusions has an analogous effect.
By all rights, it should be possible to put substitution in this set. But it isn’t!
118118
finalDefault Attribute of <xsd:schema/>
For completeness we mention that the <xsd:schema/> element allows a finalDefault attribute, which works in a way very much analogous to the blockDefault attribute.
119119
Still to Come on Inheritance
By no means have we yet covered every aspect of inheritance.
Notably we haven’t discussed what exactly is a legal restriction or extension of a complex type (particularly with respect to the content model).
This is quite complicated in general, and it will be covered in the final section.
120120
XML Schema Identity Constraints
121121
Identifiers and References Revisited
Slightly extended version of an example from the lectures on DTDs:
<agency>
  <agent name="Alice" boss="Alice"/>
  <agent name="Bob" boss="Alice"/>
  <agent name="Carole" boss="Alice"/>
  <agent name="Dave" boss="Bob"/>
</agency>
[Diagram: the agents Alice, Bob, Carole, and Dave.]
Using DTDs, we assumed name was declared with type ID, and attribute boss was declared with type IDREF.
122122
Identity Constraints
Recall that the attribute types ID and IDREF imply interesting constraints on values of those attributes:
• Within any individual XML document, every attribute of type ID must be specified with a different value from every other attribute of type ID.
• The value of any attribute of type IDREF must be the same as the value of an attribute of type ID specified somewhere in the same document.
These properties are obviously very useful and natural if we need to identify individual elements in a document.
XML Schema supports the ID and IDREF simple types. But it also introduces additional, much more general mechanisms for achieving similar ends.
123123
Use of XPath
In an earlier lecture-set we gave a brief introduction to XPath.
• Recall that XPath is a notation for representing a subset of nodes in a single XML document.
The basic idea of XML Schema identity constraints is to use XPath expressions to identify groups of “fields” within an XML document that act as either identifiers or references.
• Uniqueness/existence constraints hold within/across these groups.
This is more flexible than the DTD mechanism, because:
• XPath allows one to single out more refined sets of fields.
• One may have multiple categories of identifier in the same document.
124124
Example
<xsd:element name="agency">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="agent" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:key name="agentName">
    <xsd:selector xpath="agent"/>
    <xsd:field xpath="@name"/>
  </xsd:key>
  <xsd:keyref refer="agentName" name="agentBoss">
    <xsd:selector xpath="agent"/>
    <xsd:field xpath="@boss"/>
  </xsd:keyref>
</xsd:element>
125125
General Remarks
The element <xsd:key/> defines a key field called agentName.
The element <xsd:keyref/> defines a key reference field called agentBoss.
These definitions are inside the declaration of the element <agency/>.
• This implies that the scope of the uniqueness and related constraints is an individual <agency/> element.
• This may or may not be the top-level element of a document.
The fields themselves are specified by XPath expressions (details follow).
126126
Defining a Key
We have the example:
<xsd:key name="agentName">
  <xsd:selector xpath="agent"/>
  <xsd:field xpath="@name"/>
</xsd:key>
• The name of the key is agentName.
• The <xsd:selector/> element defines the set of nodes labeled by this key.
In our case, it is the set of all agent elements nested directly in the agency element.
• The <xsd:field/> element defines the field within each labeled node that acts as the key.
In our case, the name attribute of the node.
127127
Validity Constraints on Keys
Every node identified by the XPath expression in the <xsd:selector/> element must have exactly one descendant node identified by the XPath expression in the <xsd:field/> element.
• This descendant, whose value is the key field, must be an attribute or an element with simple type.
No two nodes identified by <xsd:selector/> may have the same value for their key fields.
• This constraint holds within the body of the scope element (the <agency/> element in our example).
• But the same value of the key field is allowed on different <agent/> nodes inside different <agency/> elements.
128128
Defining a Key Reference
We have the example:
<xsd:keyref refer="agentName" name="agentBoss">
  <xsd:selector xpath="agent"/>
  <xsd:field xpath="@boss"/>
</xsd:keyref>
• The refer attribute is the name of the key to which we refer.
• The <xsd:selector/> and <xsd:field/> elements identify the nodes whose values are the actual references.
They work in essentially the same way as in <xsd:key/>.
The two-stage approach to identifying the relevant fields is less obviously natural in this case. But it supports the generalization to multiple key fields, described below.
• The name of the key reference is agentBoss. This attribute is required (though it is unclear what this name is used for).
129129
Multiple Key Fields
An <xsd:key/> element can have multiple <xsd:field/> elements, e.g.:
<xsd:key name="fullName">
  <xsd:selector xpath=".//person"/>
  <xsd:field xpath="@firstName"/>
  <xsd:field xpath="@lastName"/>
</xsd:key>
• For validity, this implies every <person/> element in scope has firstName and lastName attributes, and that the combination of the two values is unique.
An <xsd:keyref/> element that refers to this key must have exactly the same number of <xsd:field/> elements.
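A sketch of a key reference that could accompany the fullName key above. The constraint name managerRef and the attributes managerFirstName and managerLastName are hypothetical.

```xml
<!-- Two fields, matching the two fields of the fullName key, in the same order. -->
<xsd:keyref name="managerRef" refer="fullName">
  <xsd:selector xpath=".//person"/>
  <xsd:field xpath="@managerFirstName"/>
  <xsd:field xpath="@managerLastName"/>
</xsd:keyref>
```

The pair (managerFirstName, managerLastName) on a person is compared field by field against the (firstName, lastName) pairs collected by the key.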
130130
Relating Key References to Keys
The fact that keys and key references are scoped to element declarations introduces some “interesting” complications.
Things might be straightforward if a <keyref/> always referred to a <key/> defined in the same element declaration.
You might be forgiven for thinking this should “obviously” be the case. But actually the Schema specification allows a <keyref/> to refer to a <key/> defined in a different element declaration.
131131
Referencing Keys in Nested Elements
Suppose a key, Key, is defined in the declaration of element B.
Also suppose a key reference, Ref, refers to this key and is defined in the declaration of element A.
Now a field of Ref, scoped to an instance of A, is allowed to point to fields of Key scoped to an instance of B that is a descendant of the A instance.
• This can lead to ambiguous references, because the key uniqueness constraints apply only within a single B instance, and there could be several Bs nested in the A instance.
• The specification gives a slightly clumsy recipe for resolving such ambiguities (illustrated below).
132132
Features
The rule on the previous slide can introduce interesting behavior even when the <xsd:keyref/> and the <xsd:key/> are defined in the same element declaration.
• This can happen if instances of the element can nest inside one another.
In the example on the next slide, the key is the value of <key/> elements directly nested inside a <scope/> element, and the reference is the value of a <ref/> element directly nested in a <scope/> element. The <scope/> elements are also allowed to self-nest.
133133
An Interesting Case
<xsd:element name="scope">
  <xsd:complexType>
    <xsd:choice minOccurs="0" maxOccurs="unbounded">
      <xsd:element ref="key"/>
      <xsd:element ref="ref"/>
      <xsd:element ref="scope"/>
    </xsd:choice>
  </xsd:complexType>
  <xsd:key name="key">
    <xsd:selector xpath="key"/>
    <xsd:field xpath="."/>
  </xsd:key>
  <xsd:keyref refer="key" name="ref">
    <xsd:selector xpath="ref"/>
    <xsd:field xpath="."/>
  </xsd:keyref>
</xsd:element>
134134
Examples
1st example:
<scope>
  <scope>
    <key>keyval</key>
  </scope>
  <ref>keyval</ref>
</scope>

2nd example:
<scope>
  <scope>
    <key>keyval</key>
  </scope>
  <key>keyval</key>
  <ref>keyval</ref>
</scope>

3rd example (Illegal!):
<scope>
  <scope>
    <key>keyval</key>
  </scope>
  <scope>
    <key>keyval</key>
  </scope>
  <ref>keyval</ref>
</scope>

4th example:
<scope>
  <scope>
    <key>keyval</key>
  </scope>
  <scope>
    <key>keyval</key>
  </scope>
  <key>keyval</key>
  <ref>keyval</ref>
</scope>
135
Remarks
Examples here follow the rules in the section of the XML Schema specification called Schema Information Set Contribution: Identity-constraint Table.
The rule is basically that a key reference can refer to a key field scoped to a descendant element. But if there are conflicts, you ignore any potential reference targets arising from children (this rule applies recursively).
In the 3rd example, all potential targets arise from children, and are conflicting, so they should be ignored. Thus the reference is illegal.
• The 2nd and 4th examples are OK: conflicts are resolved by ignoring targets from children, leaving just the local target.
• Xerces 2.6.2, however, also accepts the 3rd example!
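The recipe can be sketched in a few lines of Python. This only illustrates the summary given in these remarks, not the full text of the specification. The function names key_table and refs_resolve are invented for the sketch, and the <scope/>, <key/>, <ref/> element names of the example schema are hard-wired.

```python
import xml.etree.ElementTree as ET


def key_table(scope):
    """Map each key value visible at this <scope/> to the <key/> node it identifies."""
    inherited, conflicts = {}, set()
    for child in scope.findall("scope"):
        for value, node in key_table(child).items():
            if value in conflicts:
                continue
            if value in inherited:
                # Same value arrives from two different children: ambiguous, so drop it.
                del inherited[value]
                conflicts.add(value)
            else:
                inherited[value] = node
    # Keys declared directly in this scope win over anything coming from children.
    # (Uniqueness of the local keys themselves is not checked in this sketch.)
    local = {k.text: k for k in scope.findall("key")}
    return {**inherited, **local}


def refs_resolve(scope):
    """True if every <ref/> in this scope, and in every nested scope, finds a target."""
    table = key_table(scope)
    return (all(r.text in table for r in scope.findall("ref"))
            and all(refs_resolve(c) for c in scope.findall("scope")))


EXAMPLES = [
    "<scope><scope><key>keyval</key></scope><ref>keyval</ref></scope>",
    "<scope><scope><key>keyval</key></scope><key>keyval</key><ref>keyval</ref></scope>",
    "<scope><scope><key>keyval</key></scope><scope><key>keyval</key></scope>"
    "<ref>keyval</ref></scope>",
    "<scope><scope><key>keyval</key></scope><scope><key>keyval</key></scope>"
    "<key>keyval</key><ref>keyval</ref></scope>",
]

results = [refs_resolve(ET.fromstring(x)) for x in EXAMPLES]
print(results)  # [True, True, False, True]
```

Only the 3rd document is rejected, matching the reading of the specification given here (and differing from the Xerces 2.6.2 behavior noted above).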
136
Uniqueness Constraints
The <xsd:unique/> element works almost exactly like the <xsd:key/> element, except that it is not required that the identifying fields exist for every node identified by the selector.
• If fields exist in the node instance, they must be unique across all selected nodes.
Like a key, a unique constraint can be the target of a keyref: the specification requires the refer attribute to resolve to either a key or a unique constraint.
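The difference between the two constraints can be shown with a small Python sketch. It assumes a single field per constraint and represents a missing field by None. The function names are made up for illustration.

```python
def has_duplicates(values):
    """True if any value occurs more than once."""
    seen = set()
    for v in values:
        if v in seen:
            return True
        seen.add(v)
    return False


def satisfies_unique(field_values):
    """<xsd:unique/>: absent fields (None) are ignored, present ones must differ."""
    return not has_duplicates(v for v in field_values if v is not None)


def satisfies_key(field_values):
    """<xsd:key/>: as for unique, but every selected node must also have the field."""
    return None not in field_values and satisfies_unique(field_values)


# One entry per node picked out by the selector.
values = ["a", None, "b", None]
print(satisfies_unique(values), satisfies_key(values))  # True False
```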
137
Namespaces
The examples given in this section were simplified in that the XPath expressions did not allow for a target namespace.
Recall that an unprefixed name in an XPath expression always refers to a name in no namespace; a default namespace declaration does not apply to it. If you are using identity constraints in a schema with a target namespace, you must therefore declare a prefix for that namespace, and use that prefix on (say) element names appearing in the xpath attributes.
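The effect is easy to reproduce outside a schema processor. The sketch below uses Python's xml.etree.ElementTree, which supports only a subset of XPath but treats unprefixed names the same way for this purpose. The namespace URI and the prefix t are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:scopes"  # hypothetical target namespace

# The instance document uses a default namespace, so its elements carry no prefix.
doc = ET.fromstring(
    f'<scope xmlns="{NS}"><key>keyval</key><ref>keyval</ref></scope>'
)

unprefixed = doc.findall("key")             # looks for <key/> in no namespace
prefixed = doc.findall("t:key", {"t": NS})  # prefix bound to the namespace

print(len(unprefixed), len(prefixed))  # 0 1
```

In a schema, the corresponding step is to bind a prefix to the target namespace on the schema element and write, for example, xpath="t:key" in the selector.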
138
Imports and Includes
[To Be Added]
139
“Particle Derivation OK”
140
Inheritance in OOP and XML
We saw that XML Schema makes heavy use of a concept of type inheritance.
This concept is clearly inspired by the corresponding concept in Object Oriented Programming. But the analogy between XML and OOP is by no means exact.
In OOP, a class has a set of disjoint, essentially independent, named members (fields and methods).
• In derivation, this set can be extended, or named members can be individually overridden.
In XML, a complex type has a set of attributes and a content model.
• The attributes behave much like the independent members of a class, and the set of attributes can naturally be extended during derivation.
• The analogy works much less well for content models. The complex ordering and nesting relations within element content limit the options for extension.
• And, while perhaps more “mathematically natural” than extension, we will see restriction of content models has its own implementation problems.
141
Extension and Restriction
Unlike typical object-oriented programming languages, XML Schema distinguishes two different forms of type derivation, called extension and restriction.
• The analogy between Schema type extension and OOP inheritance should be fairly clear.
• The analogy between Schema type restriction and OOP inheritance may be less obvious.
• It is based on the insight that when a new class is derived, the new constructors and methods generally introduce new sets of constraints or restrictions (“invariants”) on members already in the base class.
Consider a class Square, which may be derived from a base class Rectangle. The derived class imposes the new invariant width=height.
So OOP inheritance includes aspects of both extension and restriction.
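A minimal Python rendering of the Rectangle and Square example, for illustration only. The area and diagonal methods are additions made for this sketch, and Python does nothing to stop later code from breaking the invariant by assigning to width directly.

```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    def __init__(self, side):
        # Restriction: the constructor only admits states with width == height.
        super().__init__(side, side)

    def diagonal(self):
        # Extension: a member that the base class does not have.
        return self.width * 2 ** 0.5


s = Square(3)
print(s.width == s.height, s.area())  # True 9
```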
142
Attributes and Complex Type Extension
Recall typical syntax for extension is like:
<complexType>
  <complexContent>
    <extension base="base-type">
      extra element content
      extra attributes
    </extension>
  </complexContent>
</complexType>
The extra attributes are generally just added to the set of attributes of base-type.
Some attributes in extra attributes may have the same name (and namespace) as attributes in base-type; any such attribute must also have identical type to its namesake in base-type.
• But the new version could have a different default value, say.
If extra attributes includes an attribute wildcard, it must represent a superset of any attribute wildcard in base-type.
143
Attributes and Complex Type Restriction
If an attribute appearing in a restriction of a complex type is also an explicitly declared attribute of the base-type, then:
• The simple type of the attribute in the new type must be identical to the attribute’s type in the base-type, or derived from it by steps of restriction.
• If the attribute is fixed in the base-type, it must be fixed with the same value in the new type.
• If the attribute is required in the base-type, so must it be in the new type.
Otherwise, there must be a wildcard in the base-type that matches the attribute declared in the new type.
Note:
• If an attribute is required in the base-type, it must be an explicitly declared attribute of the new type.
• If an attribute was optional in the base-type, it may be specified in the new type with use="prohibited". This is the same as omitting the attribute in the new type (and the attribute might still be allowed by a wildcard!)
If there is an attribute wildcard in the restricted type, it must be a subset of a wildcard in base-type.
144
Content Models and Extension
Consider an extension of a complex type with complex content that adds non-empty extra element content.
The extra element content must be a particle, and the element content of the new type is
<xsd:sequence>
  base-type element content
  extra element content
</xsd:sequence>
(unless the base-type content model was empty, when it is just the extra element content).
Notes:
• This would be illegal if the base-type element content was an <xsd:all/> particle. You can’t extend such content.
• If the base-type element content is an <xsd:choice/>, there is no way to extend the set of choices: you can only add extra particles in sequence.
145
Content Models and Restriction
The idea of restricting a content model is fairly intuitive, e.g.:
• Where there is an <xsd:choice/> of several particles, the restricted model may offer a reduced choice; perhaps it replaces the <xsd:choice/> with just one of the particles it contained.
• Where there is an optional particle (say minOccurs="0" and maxOccurs="1") the restricted model might make the particle mandatory (minOccurs="1") or, conversely, simply omit it.
More generally the restricted model may subset the minOccurs..maxOccurs range as it sees fit.
• Where there is an <xsd:any/> wildcard (or an element particle that heads a substitution group) the restricted model might replace it by a more specific element particle.
Although these ideas seem intuitive, it isn’t particularly easy to prove automatically that one content model is a valid restriction of another.
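The occurrence-range part of these rules is the easy part, and can be sketched as a simple interval check. The function name is hypothetical, and None is used here to stand for maxOccurs="unbounded".

```python
def range_is_restriction(new_min, new_max, base_min, base_max):
    """True if new_min..new_max lies within base_min..base_max.

    A max of None stands for maxOccurs="unbounded".
    """
    if new_min < base_min:
        return False
    if base_max is None:
        return True
    return new_max is not None and new_max <= base_max


# Optional particle (0..1) made mandatory (1..1), or omitted altogether (0..0).
print(range_is_restriction(1, 1, 0, 1))     # True
print(range_is_restriction(0, 0, 0, 1))     # True
# Widening 0..1 to 0..unbounded is not a restriction.
print(range_is_restriction(0, None, 0, 1))  # False
```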
146
Particle Derivation OK
Defining the conditions under which one particle is a legal restriction of another particle is one of the more complex parts of the (generally quite complex) XML Schema specification.
You will find the rules in the section of the specification called Constraints on Particle Schema Components.
The relevant subsections start with the rule called Particle Valid (Restriction). This gives some rules for reducing particles to a “canonical” form, then delegates to more specialized rules with names like Particle Derivation OK (X:Y -- R), where X, Y, R depend on the case.
147
Canonical Form
Before comparing two particles to see if one is a valid restriction of the other, both should be reduced to a certain canonical form:
• Any occurrence of an element particle that is the head of a substitution group is replaced by an explicit <xsd:choice/> between element particles for all members of the substitution group.
• Empty groups are discarded.
• Redundant singleton <xsd:sequence/>, <xsd:choice/>, <xsd:all/> particles are replaced by the single particle they contain.
• An “associative rule” is applied to eliminate <xsd:sequence/> particles nested inside other <xsd:sequence/> particles (subject to some conditions on minOccurs, maxOccurs). Likewise for <xsd:choice/>.
148
Comparing <sequence/> with <sequence/>
There are many specific versions of the Particle Derivation OK rule—basically one for every kind of particle you might try to restrict to any other kind of particle.
We don’t attempt to mention all of them here—just a couple of interesting cases.
For example, consider the case where you are trying to restrict an <xsd:sequence/> particle in an existing content model to an <xsd:sequence/> particle in a new content model.
The exact rule that takes care of this case is called Particle Derivation OK (All:All, Sequence:Sequence -- Recurse).
149
All:All, Sequence:Sequence -- Recurse
The occurrence ranges (minOccurs, maxOccurs) of the original and restricted <sequence/> must be consistent with restriction.
Less trivially, there must exist an order-preserving mapping from the particles in the restricted <sequence/> to particles of the original <sequence/>, such that:
• Each particle in the restricted <sequence/> is a valid restriction of its image particle (under the map).
Here we recursively apply the definition of Particle Derivation OK, hence the Recurse in the title.
• Any particle of the original <sequence/> that is not in the range of the map is emptiable, i.e. can match empty content.
It happens that the same rule is used for <all/> groups, hence the All:All in the title.
150
Schematic Example
Original:
<xsd:sequence>
  <xsd:element ref="title"/>
  <xsd:element ref="paragraph" minOccurs="0" maxOccurs="unbounded"/>
  <xsd:element ref="figure" minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
Restricted:
<xsd:sequence>
  <xsd:element ref="title"/>
  <xsd:element ref="captionFigure" minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
The order-preserving map title → title, captionFigure → figure (drawn as arrows on the original slide) has the required properties:
• The title particle is (trivially) a valid restriction of the title particle, and captionFigure is a valid restriction of figure.
• The original paragraph particle is not in the range of the map, but is emptiable (because minOccurs is 0).
151
Determinism?
The requirement in the Sequence:Sequence -- Recurse rule that “there exists” a suitable map looks rather cavalier: how are we to actually discover whether this map exists?
• In other words, the rule doesn’t seem to give a deterministic prescription for checking whether one model is a restriction of the other.
152
A Prescription
A “greedy” prescription that will sometimes find a suitable, order-preserving map is this:
• Visit the particles of the restricted model in turn, trying to find a match for each. At any time we have a “next candidate” particle from the original model, for possible matching (initially the first particle of the original model).
• If the current particle in the restricted model is a valid restriction of the “next candidate”, take the candidate as the mapping of the current particle and carry on to the next particles in both models.
• Otherwise, if the current particle is not a valid restriction of the candidate, but the candidate is emptiable, try again with the immediately following particle in the original model as “next candidate”.
• Otherwise, this prescription fails to find a map.
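The prescription above can be written down as a short Python sketch. Several things here are assumptions made for illustration: particles are reduced to an element name plus an occurrence range, the "valid restriction" test is simplified to a name match and a range check, and captionFigure is taken to be an allowed stand-in for figure, as the schematic example asserts. The final check on leftover original particles is not spelled out in the bullets, but is needed for the map to satisfy the rule.

```python
from typing import NamedTuple, Optional


class Particle(NamedTuple):
    name: str
    min_occurs: int = 1
    max_occurs: Optional[int] = 1  # None stands for "unbounded"


# Assumption for this sketch: captionFigure may stand in for figure.
SUBSTITUTES = {("captionFigure", "figure")}


def emptiable(p):
    return p.min_occurs == 0


def restricts(new, old):
    """Simplified stand-in for the recursive Particle Derivation OK test."""
    same = new.name == old.name or (new.name, old.name) in SUBSTITUTES
    lower_ok = new.min_occurs >= old.min_occurs
    upper_ok = old.max_occurs is None or (
        new.max_occurs is not None and new.max_occurs <= old.max_occurs)
    return same and lower_ok and upper_ok


def greedy_map(restricted, original):
    """Return the indices in original matched by each restricted particle, or None."""
    mapping, j = [], 0
    for r in restricted:
        # Skip candidates that do not match, as long as they are emptiable.
        while j < len(original) and not restricts(r, original[j]):
            if not emptiable(original[j]):
                return None
            j += 1
        if j == len(original):
            return None
        mapping.append(j)
        j += 1
    # Original particles left unmatched at the end must also be emptiable.
    if not all(emptiable(p) for p in original[j:]):
        return None
    return mapping


original = [Particle("title"), Particle("paragraph", 0, None), Particle("figure", 0, None)]
restricted = [Particle("title"), Particle("captionFigure", 0, None)]
print(greedy_map(restricted, original))  # [0, 2]
```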
153
A Case Where that Prescription Fails
Original:
<xsd:sequence>
  <xsd:element ref="paragraph" minOccurs="0" maxOccurs="unbounded"/>
  <xsd:element ref="figure" minOccurs="0" maxOccurs="unbounded"/>
  <xsd:element ref="paragraph"/>
</xsd:sequence>
Restricted:
<xsd:sequence>
  <xsd:element ref="paragraph"/>
</xsd:sequence>
The “greedy” prescription will try to match the paragraph particle in the restricted model to the first paragraph particle of the original model. But the resulting map is unsatisfactory, because then the final paragraph particle of the original model is not in the range of the map, nor is it emptiable.
Meanwhile, in fact, “there exists” a suitable map: just map the paragraph particle of the restricted model to the final particle of the original.
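That a suitable map exists for this pair of models can be confirmed with a search that backtracks instead of committing to the first match. This is a sketch under simplifying assumptions: particles are plain (name, minOccurs, maxOccurs) tuples, and "valid restriction" is reduced to a name match plus an occurrence-range check. The function names are hypothetical.

```python
UNBOUNDED = None


def emptiable(p):
    return p[1] == 0


def restricts(new, old):
    """Simplified: same element name and a narrower occurrence range."""
    (n_name, n_min, n_max), (o_name, o_min, o_max) = new, old
    upper_ok = o_max is UNBOUNDED or (n_max is not UNBOUNDED and n_max <= o_max)
    return n_name == o_name and n_min >= o_min and upper_ok


def find_map(restricted, original, i=0, j=0):
    """Backtracking search; returns a list of indices into original, or None."""
    if i == len(restricted):
        # Everything left over in the original model must be emptiable.
        return [] if all(emptiable(p) for p in original[j:]) else None
    for k in range(j, len(original)):
        if restricts(restricted[i], original[k]):
            rest = find_map(restricted, original, i + 1, k + 1)
            if rest is not None:
                return [k] + rest
        if not emptiable(original[k]):
            break  # a non-emptiable particle cannot be skipped
    return None


original = [("paragraph", 0, UNBOUNDED), ("figure", 0, UNBOUNDED), ("paragraph", 1, 1)]
restricted = [("paragraph", 1, 1)]
print(find_map(restricted, original))  # [2]
```

The search first tries the leading paragraph particle, as the greedy prescription does, finds that the trailing one would be left unmatched, and then backs up to map onto the final particle instead.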
154
Unique Particle Attribution to the Rescue!?
But the “Original” model on the previous slide is an illegal content model according to the Unique Particle Attribution rule!
• Recall this is the XML Schema analogue of a rule about DTDs, which says content models must be “deterministic”.
While the XML Schema specification doesn’t spell this out, it seems semi-plausible that, if content models satisfy the Unique Particle Attribution rule, then a simple greedy prescription will find the order-preserving mapping required by Particle Derivation OK, if such a mapping exists.
• This would make checking Particle Valid (Restriction) tractable.
155
Clause 1.5
Finally, we note that there is a slightly mysterious clause in the section of the Schema specification called Schema Component Constraint: Derivation Valid (Extension), which is supposed to ensure that, in a chain of derivation, nothing removed by a restriction may be added back by a subsequent extension.
• We omit the details here! The rule isn’t very clearly stated in the specification (IMHO).
156
Conclusion
In this section we have just briefly touched on the issues of what constitutes a valid extension or restriction of a content model.
The general rules are complicated. If you intend to use these capabilities of XML Schema in non-trivial ways, expect surprises!