1 Implementing Variable Content Containers XML Schemas: Best Practices A set of guidelines for...

53
1 Implementing Variable Content Containers XML Schemas: Best Practices A set of guidelines for designing XML Schemas Created by discussions on xml-dev

Transcript of 1 Implementing Variable Content Containers XML Schemas: Best Practices A set of guidelines for...

1

Implementing Variable Content Containers

XML Schemas: Best Practices

A set of guidelines for designing XML Schemas

Created by discussions on xml-dev

2

Issue

• What are the methods for designing an element which is to be comprised of variable content?

<Catalogue> - variable content section -</Catalogue>

<Book> or <Magazine> or ...

Catalogue is called a variable content container

3

4 Methods• There are 4 methods for implementing variable

content containers:

– Method 1: use an abstract element and element substitution

– Method 2: use a <choice> element

– Method 3: use an abstract type and type substitution

– -Method 4: use a dangling type

• In the following slides each method will be described, along with its advantages and disadvantages

4

Method 1 - use an Abstract Element and Element Substitution

<Catalogue> - variable content section </Catalogue>

Publication (abstract)

<Book>

<Magazine>

substitutionGroup

"substitutable for"

"substitutable for"

5

Method 1 - use an Abstract Element and Element Substitution

• In the Schema:– Declare Catalogue to be comprised of

Publication, where Publication is declared abstract

– Declare Book and Magazine to be in a substitutionGroup with Publication

• In an Instance Document:– The contents of <Catalogue> can be any element

that is in the substitutionGroup

6

<xsd:element name="Publication" abstract="true" type="PublicationType"/><xsd:element name="Book" substitutionGroup="Publication" type="BookType"/><xsd:element name="Magazine" substitutionGroup="Publication" type="MagazineType"/><xsd:element name="Catalogue"> <xsd:complexType> <xsd:sequence> <xsd:element ref="Publication" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType></xsd:element>

Method 1 - use an Abstract Element and Element Substitution

7

Method 1 - use an Abstract Element and Element Substitution

PublicationType

BookType MagazineType

In order for Book and Magazine to be in a substitutionGroup withPublication, their type (BookType and MagazineType, respectively) must be the same as, or derived from Publication's type (PublicationType)

8

<xsd:complexType name="PublicationType"> <xsd:sequence> <xsd:element name="Title" type="xsd:string"/> <xsd:element name="Author" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/> <xsd:element name="Date" type="xsd:year"/> </xsd:sequence> </xsd:complexType><xsd:complexType name="BookType"> <xsd:complexContent> <xsd:extension base="PublicationType" > <xsd:sequence> <xsd:element name="ISBN" type="xsd:string"/> <xsd:element name="Publisher" type="xsd:string"/> </xsd:sequence> </xsd:extension> </xsd:complexContent></xsd:complexType><xsd:complexType name="MagazineType"> <xsd:complexContent> <xsd:restriction base="PublicationType"> <xsd:sequence> <xsd:element name="Title" type="xsd:string"/> <xsd:element name="Author" type="xsd:string" minOccurs="0" maxOccurs="0"/> <xsd:element name="Date" type="xsd:year"/> </xsd:sequence> </xsd:restriction> </xsd:complexContent></xsd:complexType>

9

<?xml version="1.0"?><Catalogue xmlns="http://www.catalogue.org" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation= "http://www.catalogue.org Catalogue.xsd"> <Book> <Title>Illusions The Adventures of a Reluctant Messiah</Title> <Author>Richard Bach</Author> <Date>1977</Date> <ISBN>0-440-34319-4</ISBN> <Publisher>Dell Publishing Co.</Publisher> </Book> <Magazine> <Title>Natural Health</Title> <Date>1999</Date> </Magazine> <Book> <Title>The First and Last Freedom</Title> <Author>J. Krishnamurti</Author> <Date>1954</Date> <ISBN>0-06-064831-7</ISBN> <Publisher>Harper &amp; Row</Publisher> </Book></Catalogue>

Do Lab1

10

Method 1 Advantages - Extensibility

• Extensible: other schemas can extend the set of elements that may occur in the variable content container

<xsd:include schemaLocation="Catalogue.xsd"/><xsd:complexType name="CDType"> <xsd:complexContent> <xsd:extension base="PublicationType" > <xsd:sequence> <xsd:element name="RecordingCompany" type="xsd:string"/> </xsd:sequence> </xsd:extension> </xsd:complexContent></xsd:complexType><xsd:element name="CD" substitutionGroup="Publication" type="CDType"/>

CD.xsd

Now the variablecontent containermay contain<Book> or<Magazine> or<CD>

11

Method 1 Advantages - Extensibility

• Note how the CD schema extended the Catalogue schema, without modifying it!

• Oftentimes you would like to extend a schema, but it is read-only (under someone else's control). With this technique you are able to extend a read-only schema!

12

<?xml version="1.0"?><Catalogue xmlns="http://www.catalogue.org" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation= "http://www.catalogue.org CD.xsd"> <Book> <Title>Illusions The Adventures of a Reluctant Messiah</Title> <Author>Richard Bach</Author> <Date>1977</Date> <ISBN>0-440-34319-4</ISBN> <Publisher>Dell Publishing Co.</Publisher> </Book> <CD> <Title>Timeless Serenity</Title> <Author>Dyveke Spino</Author> <Date>1984</Date> <RecordingCompany>Dyveke Spino Productions</RecordingCompany> </CD> <Magazine> <Title>Natural Health</Title> <Date>1999</Date> </Magazine> <Book> <Title>The First and Last Freedom</Title> <Author>J. Krishnamurti</Author> <Date>1954</Date> <ISBN>0-06-064831-7</ISBN> <Publisher>Harper &amp; Row</Publisher> </Book></Catalogue>

13

Method 1 Advantages - Semantic Cohesion

• All the elements that may occur in the variable content container must have a type that derives from the abstract element's type. Thus, the elements have a type hierarchy to bind them together, to provide structural (and, by implication, semantic) coherence among the elements.

14

Method 1 Disadvantages - No Independent Elements

• The variable content container cannot contain elements whose type does not derive from the abstract element's type, or is not in a substitutionGroup with the abstract element - as would typically be the case with independently developed elements.

15

Method 1 Disadvantages - Limited Structural Variability

• Over time a schema will evolve, and the kinds of elements which may occur in the variable content container will typically grow. There is no way to know in what direction it will grow. The new elements may be conceptually related but structurally vastly different from the original set of elements. The abstract element's type (e.g., PublicationType) may have been a good base type for the original set of elements which were all structurally related, but may not be a good base type for the new elements which have vastly different structures.

• So you are faced with a tradeoff -

– create a simple base type to support lots of different structures (but then you can make less assumptions about the structure of the members), or

– create a rich base type to support strong data type checking (but then you reduce the ability to add elements with radically different types)

16

Method 1 Disadvantages - Nonscalable Processing

• Suppose that Catalogue is declared to contain two abstract elements - Publication and Retailer, and there can be multiple occurrences of each. Here's a sample instance document:

<Catalogue> <Book> … </Book> <Magazine> … </Magazine> <Book> … </Book> <MarketBasket> … </MarketBasket> <Macys> … </Macys></Catalogue>

How would you write, say, a stylesheet to process all the Publication elements?

17

Method 1 Disadvantages - Nonscalable Processing

Answer: you would need to write special-case code:

<xsl:if test="Book"> - process Book</xsl:if><xsl:if test="Magazine"> - process Magazine</xsl:if>

This is not scalable. Each time a new element is added to Publication's substitutionGroup (e.g., CD), the stylesheet breaks (i.e., needs to be updated)

18

Method 1 Disadvantages - No Control over Namespace Exposure

• A requirement of using substitutionGroup is that all elements in a substitutionGroup must be declared globally. As a consequence, there is no way to hide (localize) the namespaces of the elements in the variable content container. This fails the Best Practice guideline which states that you should design your schema to be able to turn namespace exposure on/off using elementFormDefault as a namespace switch.

19

Method 2 - use a <choice> Element

<Catalogue> - variable content section </Catalogue>

<choice> <element name="Book" …/> <element name="Magazine" …/>

</choice>

20

Method 2 - use a <choice> Element

<xsd:element name="Catalogue"> <xsd:complexType> <xsd:choice maxOccurs="unbounded"> <xsd:element ref="Book"/> <xsd:element ref="Magazine"/> </xsd:choice> </xsd:complexType> </xsd:element>

Do Lab2

21

Method 2 Advantages - Independent Elements

• The elements in the variable content container do not need a common type ancestry. Thus, the variable content container can contain dissimilar, independent, loosely coupled elements.

22

Method 2 Disadvantages - Nonextensible

• This method requires that the contents of the <choice> element be modified to include additional elements. However, the schema may be outside your control (read-only), so you may not be able to get it extended to include your additional elements. Thus, this method has serious extensibility restrictions!

23

Method 2 Disadvantages - No Semantic Coherence

• The <choice> element allows you to group together dissimilar elements. The elements in the variable choice container have no type hierarchy to bind them together, to provide structural (and, by implication, semantic) coherence among the elements. Thus, you can make no assumptions about the structure of the elements.

24

Method 3 - use an Abstract Type and Type Substitution

<Catalogue> <Publication xsi:type="…"> - variable content section </Publication> </Catalogue>

PublicationType (abstract)

BookType MagazineType

25

Method 3 - use an Abstract Type and Type Substitution

• In the Schema:– Declare Catalogue to be comprised of Publication,

where Publication is declared to be of type PublicationType, which is abstract

– Declare BookType and MagazineType to derive from PublicationType

• In an Instance Document:– The contents of <Catalogue> will be <Publication>, and

its content can be any type that derives from PublicationType

26

<xsd:complexType name="PublicationType" abstract="true"> ...</xsd:complexType><xsd:complexType name="BookType"> <xsd:complexContent> <xsd:extension base="PublicationType" > ... </xsd:extension> </xsd:complexContent></xsd:complexType><xsd:complexType name="MagazineType"> <xsd:complexContent> <xsd:restriction base="PublicationType"> ... </xsd:restriction> </xsd:complexContent> </xsd:complexType> <xsd:element name="Catalogue"> <xsd:complexType> <xsd:sequence> <xsd:element name="Publication" type="PublicationType" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> </xsd:element>

27

<?xml version="1.0"?><Catalogue xmlns="http://www.catalogue.org" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation= "http://www.catalogue.org Catalogue.xsd"> <Publication xsi:type="BookType"> <Title>Illusions The Adventures of a Reluctant Messiah</Title> <Author>Richard Bach</Author> <Date>1977</Date> <ISBN>0-440-34319-4</ISBN> <Publisher>Dell Publishing Co.</Publisher> </Publication> <Publication xsi:type="MagazineType"> <Title>Natural Health</Title> <Date>1999</Date> </Publication> <Publication xsi:type="BookType"> <Title>The First and Last Freedom</Title> <Author>J. Krishnamurti</Author> <Date>1954</Date> <ISBN>0-06-064831-7</ISBN> <Publisher>Harper &amp; Row</Publisher> </Publication></Catalogue>

Do Lab3

28

Method 3 Advantages - Extensibility

• Extensible: other schemas can extend the set of elements that may occur in the variable content container

<include schemaLocation="Catalogue.xsd"/><complexType name="CDType"> <complexContent> <extension base="PublicationType" > <sequence> <element name="RecordingCompany" type="string"/> </sequence> </extension> </complexContent></complexType>

CD.xsd

Now the content of<Publication> may beBookType, or MagazineType, or CDType

29

<?xml version="1.0"?><Catalogue xmlns="http://www.catalogue.org" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation= "http://www.catalogue.org CD.xsd"> <Publication xsi:type="BookType"> <Title>Illusions The Adventures of a Reluctant Messiah</Title> <Author>Richard Bach</Author> <Date>1977</Date> <ISBN>0-440-34319-4</ISBN> <Publisher>Dell Publishing Co.</Publisher> </Publication> <Publication xsi:type="CDType"> <Title>Timeless Serenity</Title> <Author>Dyveke Spino</Author> <Date>1984</Date> <RecordingCompany>Dyveke Spino Productions</RecordingCompany> </Publication> <Publication xsi:type="MagazineType"> <Title>Natural Health</Title> <Date>1999</Date> </Publication> <Publication xsi:type="BookType"> <Title>The First and Last Freedom</Title> <Author>J. Krishnamurti</Author> <Date>1954</Date> <ISBN>0-06-064831-7</ISBN> <Publisher>Harper &amp; Row</Publisher> </Publication></Catalogue>

30

Method 3 Advantages - Minimal Dependencies

• In Method 1 to extend the variable content container you need access to both the abstract element (Publication) and its type (PublicationType). With method 3 you only need access to the abstract type.

• Thus, to extend a schema there are fewer dependencies

31

Method 3 Advantages - Uniform, Scalable Processing

With this method instance documents will look different than we saw with the other

two methods. Namely, <Catalogue> will not contain variable content. Instead, it will

always contain the same element (Publication). However, that element will contain

variable content:

<Catalogue>

<Publication xsi:type="BookType"> ... </Publication> <Publication xsi:type="MagazineType"> ... </Publication> <Publication xsi:type="BookType"> ... </Publication>

</Catalogue>

The contents of Catalogue can be uniformly processed, even as new types (e.g, CDType) areadded. For example, a stylesheet could process each Publication element as follows:

<xsl:for-each select="Publication"> -- do something --

</xsl:for-each>

32

Method 3 Advantages - Semantic Cohesion

• All the elements that may occur in the variable content container must have types that derive from the abstract type. Thus, the elements have a type hierarchy to bind them together, to provide structural (and, by implication, semantic) coherence among the elements.

33

Method 3 Advantages - Control over Namespace Exposure

• Recall that the key to controlling namespace exposure is to embed the elements within type definitions. That's exactly what this method does. Consequently, we can control exposure of the namespace of the elements using elementFormDefault. This is consistent with the Best Practice design recommendation we issued for hide (localize) versus expose namespaces.

34

Method 3 Disadvantages - No Independent Elements

• The variable content container cannot contain elements whose type does not derive from the abstract type - as would typically be the case with independently developed elements.

35

Method 3 Disadvantages - Limited Structural Variability

• Over time the kinds of elements which may occur in the variable content container will typically grow. There is no way to know in what direction it will grow. The new elements may be conceptually related but structurally vastly different from the original set of elements. The abstract type (e.g., PublicationType) may have been a good base type for the original set of elements which were all structurally related, but may not be a good base type for the new elements which have vastly different structures.

• So you are faced with a tradeoff:

– create a simple base type to support lots of different structures (but then you can make less assumptions about the structure of the members), or

– create a rich base type to support strong data type checking (but then you reduce the ability to add elements with radically different types)

36

Method 4 - Motivation

• Thus far our variable content container has always contained complex content (i.e., child elements).

• Suppose that we want to create a variable content container to hold simple content? None of the previous methods can be used.

• We need a method that allows us to create simpleType variable content containers.

37

Method 4 - Motivation

Problem: Suppose that we have an element, sensor, which contains the name of a weather station sensor. For example:

<sensor>Barometric Pressure</sensor>

Several things to note: 1. This element holds a simpleType. 2. Each weather station may have sensors that are unique to it. Consequently, we must design our schema so that the sensor element can be customized by each weather station.

38

Method 4 - use a Dangling Type

Here's the method - when you create sensor, declare it to be of a type from another namespace. Then, for the <import> element don't provide a schemaLocation (schemaLocation is optional withthe import element). Thus, the element is declared to be of a type, for which no particular schema is identified, i.e., we have a dangling type definition!

39

Method 4 - use a Dangling TypeIn your schema, declare the sensor element:

<xsd:element name="sensor" type="s:sensor_type"/>

Note that I declared the sensor element to have a type "sensor_type", which is in a different namespace - the sensor namespace:

xmlns:s="http://www.sensor.org"

Now here's the key - when you <import> this namespace, don't provide a value for schemaLocation! For example:

<xsd:import namespace="http://www.sensor.org"/>

40

Method 4 - use a Dangling Type

The instance document must then identify a schema that implements sensor_type. Thus, at run time (validation time) we are matching up the reference to sensor_type with the implementation of sensor_type. For example, an instance document may have this:

xsi:schemaLocation= "http://www.weather-station.org weather-station.xsd http://www.sensor.org boston-sensors.xsd"

In this instance document schemaLocation is identifying a schema, boston-sensors.xsd, which provides an implementation of sensor_type.

41

<?xml version="1.0"?><xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.weather-station.org" xmlns="http://www.weather-station.org" xmlns:s="http://www.sensor.org" elementFormDefault="qualified"> <xsd:import namespace="http://www.sensor.org"/> <xsd:element name="weather-station"> <xsd:complexType> <xsd:sequence> <xsd:element name="sensor" type="s:sensor_type" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> </xsd:element></xsd:schema>

weather-station.xsd

An import with noschemaLocation!

sensor is the variablecontent container. Itcontains a sensor_typefor which no schemahas been indicated!It is a dangling type!

42

<?xml version="1.0"?><xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.sensor.org" xmlns="http://www.sensor.org" elementFormDefault="qualified"> <xsd:simpleType name="sensor_type"> <xsd:restriction base="xsd:string"> <xsd:enumeration value="barometer"/> <xsd:enumeration value="thermometer"/> <xsd:enumeration value="anenometer"/> </xsd:restriction> </xsd:simpleType></xsd:schema>

boston-sensors.xsd

This schema provides an implementation for the dangling type, sensor_type.

43

<?xml version="1.0"?><weather-station xmlns="http://www.weather-station.org" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation= "http://www.weather-station.org weather-station.xsd http://www.sensor.org boston-sensors.xsd"> <sensor>thermometer</sensor> <sensor>barometer</sensor> <sensor>anenometer</sensor></weather-station>

boston-weather-station.xml

In the instance document we provide a schemawhich implementsthe dangling type.

44

<?xml version="1.0"?><xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.sensor.org" xmlns="http://www.sensor.org" elementFormDefault="qualified"> <xsd:simpleType name="sensor_type"> <xsd:restriction base="xsd:string"> <xsd:enumeration value="barometer"/> <xsd:enumeration value="thermometer"/> <xsd:enumeration value="anenometer"/> <xsd:enumeration value="hygrometer"/> </xsd:restriction> </xsd:simpleType></xsd:schema>

london-sensors.xsd

This schema provides a different implementation for the dangling type, sensor_type.

45

<?xml version="1.0"?><weather-station xmlns="http://www.weather-station.org" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation= "http://www.weather-station.org weather-station.xsd http://www.sensor.org london-sensors.xsd"> <sensor>thermometer</sensor> <sensor>barometer</sensor> <sensor>hygrometer</sensor> <sensor>anenometer</sensor></weather-station>

london-weather-station.xml

The London weather station is able to customize thecontent of <sensor> by using london-sensors.xsd,which defines sensor_type appropriately for theLondon weather station. Wow!

46

Method 4 - Summary

Summary: The key to this design pattern is: 1. When you declare the variable content container element give it a type that is in another namespace, e.g., s:sensor_type

2. When you <import> that namespace don't provide a value for schemaLocation, e.g.,

<xsd:import namespace="http://www.sensors.org"/>

3. Create any number of implementations of the type:

- boston-sensors.xsd - london-sensors.xsd

4. In instance documents identify the schema that shall be used to implement the type, e.g.,

xsi:schemaLocation= "http://www.weather-station.org weather-station.xsd ----> http://www.sensor.org london-sensors.xsd"

47

Method 4 - Note

• In the previous example the implementation of the dangling type (sensor_type) was a simpleType. It need not be a simpleType. It could be a complexType.

• Thus, this method can be used to create variable content containers which are either complex or simple.

48

Method 4 Advantages - Dynamic

• A schema which contains a dangling type is very dynamic. It does not statically hard-code the identity of a schema to implement the type. Rather, it empowers the instance document author to identify a schema that implements the dangling type.

• Thus, at instance-document-creation the dangling type implementation is provided (rather than at schema-document-creation)

49

Method 4 Advantages - Both for Simple and Complex Variable

Content Containers• A dangling type can be implemented using

either a simpleType or a complexType. The other methods are only applicable to creating variable content containers with a complex type.

50

Method 4 Disadvantages - Must be in Another Namespace

• The implementation of the dangling type must be in another namespace. It cannot be in the same namespace as the variable content container element.

• None of the schema validators implement dangling types (yet). The best that you can do today is "fake it" with anyType (see next slides)

51

<?xml version="1.0"?><xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.weather-station.org" xmlns="http://www.weather-station.org" xmlns:s="http://www.sensor.org" elementFormDefault="qualified"> <xsd:import namespace="http://www.sensor.org"/> <xsd:element name="weather-station"> <xsd:complexType> <xsd:sequence> <xsd:element name="sensor" type="s:sensor_type" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> </xsd:element></xsd:schema>

<?xml version="1.0"?><xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.weather-station.org" xmlns="http://www.weather-station.org" xmlns:s="http://www.sensor.org" elementFormDefault="qualified"> <xsd:element name="weather-station"> <xsd:complexType> <xsd:sequence> <xsd:element name="sensor" type="xsd:anyType" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> </xsd:element></xsd:schema>

Instead ofthese...

Use this.

Cont. -->

52

Do Lab4

<?xml version="1.0"?><weather-station xmlns="http://www.weather-station.org" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation= "http://www.weather-station.org weather-station.xsd http://www.sensor.org boston-sensors.xsd"> <sensor>thermometer</sensor> <sensor>barometer</sensor> <sensor>anenometer</sensor></weather-station>

<?xml version="1.0"?><weather-station xmlns="http://www.weather-station.org" xmlns:s="http://www.sensor.org" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation= "http://www.weather-station.org weather-station-using-anyType.xsd http://www.sensor.org boston-sensors.xsd"> <sensor xsi:type="s:sensor_type">thermometer</sensor> <sensor xsi:type="s:sensor_type">barometer</sensor> <sensor xsi:type="s:sensor_type">anenometer</sensor></weather-station>

Instead of this ...

Use this (specify thetype explicitly)

53

Best Practice

• If you have a hard requirement for independent, dissimilar elements then you have no choice but to use Method 2 (<choice> element)

• If you have a hard requirement for a variable content container which contains simple content then you have no choice but to use Method 4 (dangling types)

• In all other cases, the combination of method 4 and method 3 (that is, use a dangling type, but provide an implementation of the dangling type which uses method 3) provides the most flexibility.