XSDL & Relax: 2 new schema languages for XML
Rajasekar Krishnamurthy
Outline
• DTDs and their drawbacks
• XML Schema Requirements
• XSDL
• RELAX
• Other Schema specifications
Sample XML document
<?xml version="1.0"?>
<book>
  <title>Intro to XML</title>
  <price>72.50</price>
  <author>
    <name>Albert Einstein</name>
    <email>[email protected]</email>
    <phone>608-236-4112</phone>
  </author>
</book>
Equivalent DTD
<!ELEMENT book   (title, price, author*)>
<!ELEMENT title  (#PCDATA)>
<!ELEMENT price  (#PCDATA)>
<!ELEMENT author (name, email, phone)>
<!ELEMENT name   (#PCDATA)>
<!ELEMENT email  (#PCDATA)>
<!ELEMENT phone  (#PCDATA)>
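To make the DTD's content models concrete, here is a minimal Python sketch, not a real DTD validator. It assumes each content model can be rewritten as a regular expression over the sequence of child element names; the `CONTENT_MODELS` table and the `valid` function are hypothetical names, and attributes, entities and mixed content are ignored.

```python
import re
import xml.etree.ElementTree as ET

# Each content model from the DTD, rewritten as a regular expression over
# the child element names (each name followed by a space). Elements declared
# as (#PCDATA) must have no element children, i.e. the empty string.
CONTENT_MODELS = {
    "book": r"(title )(price )(author )*",
    "author": r"(name )(email )(phone )",
    "title": r"", "price": r"", "name": r"", "email": r"", "phone": r"",
}

def valid(elem):
    """Return True if elem and all its descendants match CONTENT_MODELS."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        return False  # element not declared in the DTD
    children = "".join(child.tag + " " for child in elem)
    if re.fullmatch(model, children) is None:
        return False
    return all(valid(child) for child in elem)

# The sample document, with the email text omitted.
doc = ET.fromstring(
    "<book><title>Intro to XML</title><price>72.50</price>"
    "<author><name>Albert Einstein</name><email/>"
    "<phone>608-236-4112</phone></author></book>"
)
print(valid(doc))  # True
```

The same check rejects an author that uses title, firstname and lastname instead of name, because the author content model admits only name, email and phone.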
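Drawbacks of DTD
<book>
  <title>Intro to XML</title>
  <price>72.50</price>
  <author>
    <title>Dr.</title>
    <firstname>Albert</firstname>
    <lastname>Einstein</lastname>
    <email>[email protected]</email>
    <phone>608-236-4112</phone>
  </author>
</book>
Here title is used both for the book's title and for the author's honorific. DTD element declarations are global, so both uses must share a single declaration.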
What is a schema?
• Model for describing a class of documents
• Common vocabulary for applications exchanging documents
• Formally expresses the syntactic, structural and value constraints applicable to instance documents
XML Schema requirements
• Mechanisms for constraining document structure
• Inheritance
• Embedded documentation
• Application-specific constraints
• Primitive data typing
• Creation of user-defined datatypes
• Addressing the evolution of schemas
Application Scenarios
• Electronic commerce transaction processing
• Traditional document authoring/editing
• Query formulation and optimization
• Open and uniform transfer of data between applications, including databases
• Metadata interchange
XML Schema Definition Language
• Enhanced datatypes
• Written in XML
• Separates element tags from types
  – local namespaces
• Inheritance: derive new type definitions
• Identity constraints
• Support for namespaces
Sample XML schema
<schema>
  <element name="book" type="booktype"/>
  <complexType name="booktype">
    <sequence>
      <element name="title" type="string"/>
      <element name="price" type="float"/>
      <element name="author" type="authortype" minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
</schema>
Sample schema (contd.)
<schema>
  <complexType name="authortype">
    <sequence>
      <element name="name" type="name"/>
      <element name="email" type="email"/>
      <element name="phone" type="phonenumber"/>
      <element name="address" type="address" minOccurs="0"/>
    </sequence>
  </complexType>
</schema>
Schema in graphical form
book
  title
  price
  author*
    name
    email
    phone
    address?
Schema Components
• Building blocks that comprise the abstract data model of the schema
• Primary components
  – simple type definitions
  – complex type definitions
  – attribute declarations
  – element declarations
Schema Components (contd.)
• Secondary components
  – attribute group definitions
  – identity constraint definitions
  – model group definitions
  – notation declarations
• Helper components
  – annotations
  – model groups
  – particles
  – wildcards
Type Definitions
• Separate the tag name from the type of elements
• Types can be
  – simple types
    • represent leaf nodes in the graph
    • replace PCDATA in DTDs
  – complex types
    • can have elements and attributes in their content
Sample complexType declaration
<complexType name="address">
  <sequence>
    <element name="name" type="string" minOccurs="0"/>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
  </sequence>
  <attribute name="country" type="string" default="US"/>
</complexType>
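A minimal sketch of what the default value US on the country attribute means for an instance: a processor that supplies defaults adds country="US" when the attribute is missing. The `ATTRIBUTE_DEFAULTS` table and `apply_defaults` function are hypothetical, and keying defaults by element name (rather than by type, as the schema does) is a simplification.

```python
import xml.etree.ElementTree as ET

# Hypothetical table of declared attribute defaults, keyed by element name
# (a simplification: the schema attaches the default to the address type).
ATTRIBUTE_DEFAULTS = {"address": {"country": "US"}}

def apply_defaults(elem):
    """Add every declared default attribute that the instance omits.
    Modifies the tree in place and returns it."""
    for name, value in ATTRIBUTE_DEFAULTS.get(elem.tag, {}).items():
        elem.attrib.setdefault(name, value)  # keep explicit values
    for child in elem:
        apply_defaults(child)
    return elem

addr = apply_defaults(ET.fromstring("<address><city>Madison</city></address>"))
print(addr.get("country"))  # US
```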
Simple type: Pattern
<simpleType name="phonenumber">
  <restriction base="string">
    <pattern value="\d{3}-\d{3}-\d{4}"/>
  </restriction>
</simpleType>
• Other facets: enumeration, range
• Other simple types: list, union
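A short Python sketch of how the phonenumber pattern facet constrains values. XML Schema patterns must match the entire value (they are implicitly anchored), so the sketch uses `re.fullmatch`; Python's regex dialect differs from XML Schema's in general, but this particular pattern means the same in both. `is_phonenumber` is a hypothetical helper name.

```python
import re

# The phonenumber pattern facet. XML Schema patterns must match the whole
# value (implicitly anchored), hence re.fullmatch rather than re.search.
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")

def is_phonenumber(value):
    """True if value is in the lexical space of the phonenumber type."""
    return PHONE_PATTERN.fullmatch(value) is not None

print(is_phonenumber("608-236-4112"))   # True
print(is_phonenumber("608-23-4112"))    # False: middle group has 2 digits
print(is_phonenumber("608-236-41120"))  # False: extra trailing digit
```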
Elements
• Global elements
  – can occur as the root of the document
  – can be included/imported/referenced
• Local elements
  – can occur only in the specific context
  – sibling elements with the same name need to have the same content model, e.g. the two author elements in <!ELEMENT book (author*, title, author*)>
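The sibling rule above corresponds to XML Schema's "Element Declarations Consistent" constraint. A minimal Python sketch, assuming a content model is reduced to a hypothetical list of (element name, type name) pairs for its local declarations:

```python
from collections import defaultdict

def consistent_siblings(declarations):
    """declarations: (element name, type name) pairs for the local element
    declarations of one content model. Returns True if every name that
    occurs more than once is always declared with the same type."""
    types_by_name = defaultdict(set)
    for name, type_name in declarations:
        types_by_name[name].add(type_name)
    return all(len(types) == 1 for types in types_by_name.values())

# (author*, title, author*) with both author declarations of type authortype
ok = [("author", "authortype"), ("title", "string"), ("author", "authortype")]
# hypothetical variant in which the second author uses another type
bad = [("author", "authortype"), ("title", "string"), ("author", "contacttype")]
print(consistent_siblings(ok), consistent_siblings(bad))  # True False
```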
Sample schema
<schema>
  <element name="book" type="booktype"/>
  <complexType name="booktype">
    <sequence>
      <element name="title" type="string"/>
      <element name="price" type="float"/>
      <element name="author" type="authortype" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
</schema>
Element Content
• Complex types from simple types
  – <price currency="USDollar">23</price>
• Mixed content
  – <price>amount in US-dollars is <amount>23</amount> only</price>
• Empty content
  – <price currency="USDollars" amount="23"/>
Building content models
<!ELEMENT author ((name | (title, firstname, lastname)), email, phone)>
<author>
  <lastname>Einstein</lastname>
  <title>Dr.</title>
  <firstname>Albert</firstname>
  <email>[email protected]</email>
  <phone>608-236-4112</phone>
</author>
<author>
  <name>Albert Einstein</name>
  <email>[email protected]</email>
  <phone>608-23-4112</phone>
</author>
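Building content models (contd.)
<complexType name="authortype">
  <sequence>
    <choice>
      <element name="name" type="name"/>
      <all>
        <element name="title" type="titletype"/>
        <element name="firstname" type="string"/>
        <element name="lastname" type="string"/>
      </all>
    </choice>
    <element name="email" type="email"/> ...
  </sequence>
</complexType>
The all group accepts title, firstname and lastname in any order, so this schema also admits the first author instance above, whose order the DTD's (title, firstname, lastname) sequence rejects. Note that the final W3C XML Schema 1.0 Recommendation permits an all group only as the entire content model of a complex type, so nesting it inside choice is not valid there.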
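A minimal Python sketch of the author content model, (name | all(title, firstname, lastname)), email, phone, checked over the list of child element names. The trailing email, phone part is taken from the DTD on the previous slide; `valid_author` is a hypothetical helper, not part of any schema library.

```python
def valid_author(children):
    """children: the child element names of an <author>, in document order.
    Sketch of (name | all(title, firstname, lastname)), email, phone:
    the all group needs each of its three names exactly once, in any order."""
    if children[-2:] != ["email", "phone"]:
        return False
    head = children[:-2]
    if head == ["name"]:
        return True
    return sorted(head) == sorted(["title", "firstname", "lastname"])

print(valid_author(["lastname", "title", "firstname", "email", "phone"]))  # True
print(valid_author(["name", "email", "phone"]))  # True
print(valid_author(["name", "title", "email", "phone"]))  # False
```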
Content models
• Can represent any content model expressible with an XML 1.0 DTD, and more
• Does not allow non-determinism
  – ((email, name) | (email, expandedname)) is illegal
  – it should be written as (email, (name | expandedname))
• Does not allow ambiguity
  – (author*, contactauthor*, author*) is not allowed, because a sequence of author elements can be derived in multiple ways
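To see why (author*, contactauthor*, author*) is ambiguous, the sketch below counts the ways a flat sequence of optional-repeat items can match a list of element names; more than one way means the model is ambiguous for that input. XML Schema rejects such models up front through its Unique Particle Attribution constraint. `count_parses` is a hypothetical helper, and it only handles flat sequences, not nested choices such as the email example.

```python
from functools import lru_cache

def count_parses(model, names):
    """Count the ways a flat sequence model can match a list of element names.
    model: (name, starred) items, e.g. ("author", True) stands for author*.
    A count above 1 means the model is ambiguous for that input."""
    model, names = tuple(model), tuple(names)

    @lru_cache(maxsize=None)
    def ways(i, j):
        # number of ways model[i:] can match names[j:]
        if i == len(model):
            return 1 if j == len(names) else 0
        name, starred = model[i]
        here = j < len(names) and names[j] == name
        if not starred:
            return ways(i + 1, j + 1) if here else 0
        # a starred item either stops here or consumes one more occurrence
        return ways(i + 1, j) + (ways(i, j + 1) if here else 0)

    return ways(0, 0)

ambiguous = [("author", True), ("contactauthor", True), ("author", True)]
print(count_parses(ambiguous, ["author", "author"]))  # 3
print(count_parses([("author", True), ("contactauthor", True)],
                   ["author", "author"]))             # 1
```

The three parses are: both authors in the first author*, one in each, or both in the last.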
Deriving new types
• Two ways of deriving new types from existing types
• By extension
  – similar to inheritance in programming languages
• By restriction
  – declarations are more limited than in the base type
Deriving by Extension
<complexType name="USAddress">
  <sequence>
    <element name="name" type="string" minOccurs="0"/>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
    <element name="state" type="USState"/>
    <element name="zip" type="positiveInteger"/>
  </sequence>
</complexType>
Declare Base Type
<complexType name="address">
  <sequence>
    <element name="name" type="string" minOccurs="0"/>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
  </sequence>
</complexType>
Derive By Extension
<complexType name="USAddress">
  <complexContent>
    <extension base="address">
      <sequence>
        <element name="state" type="USState"/>
        <element name="zip" type="positiveInteger"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>
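A minimal Python sketch of derivation by extension: the derived content model is the base type's sequence followed by the particles the extension adds. The list-of-pairs representation and the `extend` function are hypothetical, and occurrence constraints such as minOccurs are ignored.

```python
# Hypothetical representation: a complex type's content is a list of
# (element name, type name) pairs in sequence order.
ADDRESS = [("name", "string"), ("street", "string"), ("city", "string")]

def extend(base, added):
    """Derivation by extension: the derived content model is the base
    type's sequence followed by the newly added particles."""
    return base + added  # builds a new list; the base type is unchanged

US_ADDRESS = extend(ADDRESS, [("state", "USState"), ("zip", "positiveInteger")])
print([name for name, _ in US_ADDRESS])
# ['name', 'street', 'city', 'state', 'zip']
```

The result has the same element order as the USAddress type written out in full, which is why extension can only append to the end of the base content model.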
Using Derived Types
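<address xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="USAddress">
  <street>1210, W.Dayton Street</street>
  <city>Madison</city>
  <state>WI</state>
  <zip>53706</zip>
</address>
<address>
  <street>1210, W.Dayton Street</street>
  <city>Madison</city>
</address>
The first instance uses xsi:type to declare that it is a USAddress, which permits state and zip; the second is validated against the base address type.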
Deriving By Restriction
<complexType name="modifiedAddress">
  <complexContent>
    <restriction base="address">
      <sequence>
        <element name="name" type="string" minOccurs="0" maxOccurs="0"/>
        <element name="street" type="string"/>
        <element name="city" type="string"/>
      </sequence>
    </restriction>
  </complexContent>
</complexType>
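A simplified Python sketch of the check behind derivation by restriction: every element's derived occurrence range must lie inside its base range, and no new elements may appear. It assumes flat sequences in the same order and that name is optional in the base, as in the address type declared earlier with minOccurs="0"; `is_valid_restriction` is a hypothetical helper, and real restriction checking in XML Schema is more involved.

```python
import math

UNBOUNDED = math.inf  # stands for maxOccurs="unbounded"

def is_valid_restriction(base, derived):
    """Simplified check for a restriction of a flat sequence.
    base, derived: dicts mapping element name -> (minOccurs, maxOccurs).
    An element left out of `derived` counts as minOccurs=maxOccurs=0."""
    if not set(derived) <= set(base):
        return False  # a restriction cannot introduce new elements
    for name, (bmin, bmax) in base.items():
        dmin, dmax = derived.get(name, (0, 0))
        # each derived occurrence range must lie inside the base range
        if dmin < bmin or dmax > bmax:
            return False
    return True

# assumes name is optional in the base, as in the address type with minOccurs="0"
ADDRESS = {"name": (0, 1), "street": (1, 1), "city": (1, 1)}
MODIFIED = {"name": (0, 0), "street": (1, 1), "city": (1, 1)}
print(is_valid_restriction(ADDRESS, MODIFIED))  # True
```

If name were required in the base (minOccurs 1), forbidding it with maxOccurs="0" would no longer be a valid restriction.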
Identity Constraints
• Can specify integrity constraints: uniqueness, key, keyref
• Constraints can be locally scoped
• Can be applied to attributes, elements or their contents
  – in a DTD, an ID can only be an attribute
• Can create keys/keyrefs from a combination of element and attribute content
Sample constraint
<element name="book" type="booktype">
  <unique name="uniqueauthor">
    <selector xpath="author"/>
    <field xpath="title"/>
    <field xpath="firstname"/>
    <field xpath="lastname"/>
  </unique>
</element>
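A Python sketch of what the uniqueauthor constraint checks, using `xml.etree.ElementTree` paths in place of the schema's XPath subset. Within a book, no two authors may have the same (title, firstname, lastname) values; authors lacking any of the fields are skipped, as for xs:unique. `check_unique` is a hypothetical helper, and it compares whitespace-stripped text rather than typed values.

```python
import xml.etree.ElementTree as ET

def check_unique(scope, selector, fields):
    """Sketch of an xs:unique check within `scope`.
    selector: path (relative to scope) of the nodes that must be unique.
    fields: paths (relative to each selected node) whose text forms the key.
    Nodes lacking any field are skipped, as for xs:unique."""
    seen = set()
    for node in scope.findall(selector):
        values = []
        for path in fields:
            target = node.find(path)
            if target is None:
                break  # missing field: this node does not take part
            values.append((target.text or "").strip())
        else:
            key = tuple(values)
            if key in seen:
                return False  # duplicate (title, firstname, lastname)
            seen.add(key)
    return True

AUTHOR = ("<author><title>Dr.</title><firstname>Albert</firstname>"
          "<lastname>Einstein</lastname></author>")
book = ET.fromstring("<book><title>Intro to XML</title>" + AUTHOR + AUTHOR + "</book>")
print(check_unique(book, "author", ["title", "firstname", "lastname"]))  # False
```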
Other features
• Importing schema components
  – type libraries
• Redefining types & groups
• Namespaces
  – target namespaces
• Allow undeclared value: support for namespace-unaware documents
Other features (contd.)
• Any element
  – allows any well-formed XML to appear
  – can be restricted to a set of namespaces
• Any attribute
• anyType
  – base type for all complex types
  – does not constrain content in any way
  – default type when none is specified
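Main drawback of XSDL
The specification is complex and hard to read. For example, it defines when one element declaration may be substituted for another as follows:
An element declaration (call it D) together with a blocking constraint (a subset of {substitution, extension, restriction}, the value of a {disallowed substitutions}) is validly substitutable for another element declaration (call it C) if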
1.1 the blocking constraint does not contain substitution;
1.2 There is a chain of {substitution group affiliation}s from D to C, that is, either D's {substitution group affiliation} is C, or D's {substitution group affiliation}'s {substitution group affiliation} is C, or . . .;
1.3 The set of all {derivation method}s involved in the derivation of D's {type definition} from C's {type definition} does not intersect with the union of the blocking constraint, C's {prohibited substitutions} and the {prohibited substitutions} of any intermediate {type definition}s in the derivation of D's {type definition} from C's {type definition}.
Main drawback of XSDL (contd.)
• For a sequence, the maximum of the effective total range is unbounded if the {max occurs} of any wildcard or element declaration particle in the group's {particles} or the maximum part of the effective total range of any of the group particles in the group's {particles} is unbounded, or if any of those is non-zero and the {max occurs} of the particle itself is unbounded, otherwise the product of the particle's {max occurs} and the sum of the {max occurs} of every wildcard or element declaration particle in the group's {particles} and the maximum part of the effective total range of each of the group particles in the group's {particles} (or 0 if there are no {particles})
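The quoted rule is easier to follow as code. This Python sketch computes the maximum of the effective total range of a sequence from the maxima of its children, with `math.inf` standing for unbounded. `sequence_max` is a hypothetical name, and treating "unbounded times a zero sum" as 0 is an assumption, since the quoted text leaves that product undefined.

```python
import math

UNBOUNDED = math.inf  # stands for maxOccurs="unbounded"

def sequence_max(particle_max, child_maxes):
    """Maximum of the effective total range of a sequence group particle.
    particle_max: the sequence particle's own {max occurs}.
    child_maxes: for each child particle, its {max occurs} (element or
    wildcard particle) or the maximum of its effective total range
    (group particle)."""
    if any(m == UNBOUNDED for m in child_maxes):
        return UNBOUNDED
    if particle_max == UNBOUNDED:
        # unbounded if any child is non-zero; with all children 0, the
        # product would be unbounded * 0, which this sketch takes as 0
        return UNBOUNDED if any(m != 0 for m in child_maxes) else 0
    return particle_max * sum(child_maxes)  # sum() is 0 with no particles

print(sequence_max(1, [1, 1, UNBOUNDED]))  # inf  (booktype: title, price, author*)
print(sequence_max(2, [1, 3]))             # 8
```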
RELAX
• Developed by Makoto Murata and others in Japan
• Based on hedge automaton theory
• Borrows rich datatypes from XML Schema Part 2
• Submitted to ISO for fast-track standardization
• Ease of translation from/to DTDs
Main features of RELAX
• Separates element tag name and type
  – context-sensitive content models
• Allows content models similar to XML Schema
• Allows definition of element and attribute groups
• Annotations
• Include mechanism for large schemas
Features absent in RELAX
• Support for namespaces (possibly coming soon)
• Identity constraints
• Inheritance
• New datatypes
XSDL vs. RELAX
• RELAX allows sibling elements to have different types
  – allows the content model (author, title, author) where the two author elements have different content models
  – introduces ambiguity: for the content model (title, author*, author*), <title>XYZ</title><author/> is ambiguous
XSDL vs. RELAX (contd.)
• In RELAX, a single type can have multiple definitions
  – the definition that matches an instance element is found by exhaustive search
  – at least one match needs to be found
• For example, nametype can be defined both as name and as expandedname: it is then a choice between the two definitions
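A minimal Python sketch of this exhaustive search: a label maps to several alternative definitions, every definition is tried, and the element is valid if at least one matches. The `RULES` table and `matching_definitions` are hypothetical. Real RELAX definitions are regular expressions, whereas these are exact sequences of child names, taken from the two author instances shown earlier.

```python
# Hypothetical rule table in the spirit of RELAX: the label "authortype"
# has two separate definitions (sequences of child element names).
RULES = {
    "authortype": [
        ("name", "email", "phone"),
        ("title", "firstname", "lastname", "email", "phone"),
    ],
}

def matching_definitions(label, children):
    """Try every definition of `label` (exhaustive search) and return the
    indices of those that match; the element is valid if any index is found."""
    return [i for i, definition in enumerate(RULES.get(label, []))
            if tuple(children) == definition]

print(matching_definitions("authortype", ["name", "email", "phone"]))  # [0]
print(matching_definitions("authortype", ["title", "email", "phone"]))  # []
```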
Extending existing types
• XSDL uses inheritance: extension only appends to the base content model, e.g. it can change (title, author*) to (title, author*, contactauthor)
• In RELAX, the new type definition is written out in full, so (title, author*) can also be changed to (title, contactauthor, author*)
Using attribute values
• <price type="int">10</price>
• <price type="string">ten</price>
• The content model of the price element is switched based on the value of its type attribute
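A minimal Python sketch of switching the content model of price on its type attribute: type="int" requires an integer, type="string" accepts any text. `valid_price` is a hypothetical helper, and rejecting any other type value is an assumption of the sketch.

```python
import re

def valid_price(type_attr, text):
    """Pick the content model of <price> from its type attribute:
    type="int" requires an optionally signed integer, type="string"
    accepts any text. Other values are rejected (an assumption)."""
    if type_attr == "int":
        return re.fullmatch(r"[+-]?\d+", text.strip()) is not None
    if type_attr == "string":
        return True
    return False

print(valid_price("int", "10"), valid_price("int", "ten"), valid_price("string", "ten"))
# True False True
```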
XSDL vs. RELAX
• RELAX
  – membership checking in linear time in the SAX model
• XSDL
  – type assignment in linear time in the SAX/DOM models, ignoring integrity constraints
Other Schema proposals
• XDR (XML-Data Reduced)
  – Microsoft's BizTalk framework
• SOX (Schema for Object-oriented XML)
  – Commerce One
• DSD
  – AT&T and BRICS
• Schematron