XML Chap8 Sebesta Web2

Introduction to XML, updated by Dr Suthikshn Kumar



Contents

• Introduction
• Syntax of XML
• XML Document Structure
• Namespaces
• XML Schemas
• Displaying Raw XML Documents
• Displaying XML Documents with CSS
• XSLT Style Sheets
• XML Processors
• Web Services
• Summary


Intro to XML

The Extensible Markup Language (XML) is a general-purpose markup language.

It is classified as an extensible language because it allows its users to define their own tags.

Its primary purpose is to facilitate the sharing of structured data across different information systems, particularly via the Internet.

It is used both to encode documents and to serialize data. In the latter context, it is comparable with other text-based serialization languages such as JSON and YAML.

It started as a simplified subset of the Standard Generalized Markup Language (SGML), and is designed to be relatively human-legible.

By adding semantic constraints, application languages can be implemented in XML. These include XHTML, RSS, MathML, GraphML, Scalable Vector Graphics, MusicXML, and thousands of others.

Moreover, XML is sometimes used as the specification language for such application languages.

XML is recommended by the World Wide Web Consortium. It is a fee-free open standard. The W3C recommendation specifies both the lexical grammar, and the requirements for parsing.


Introduction

What is XML?
• XML stands for EXtensible Markup Language
• XML is a markup language much like HTML
• XML was designed to describe data
• XML tags are not predefined. You must define your own tags
• XML uses a Document Type Definition (DTD) or an XML Schema to describe the data
• XML with a DTD or XML Schema is designed to be self-descriptive
• XML is a W3C Recommendation

XML is a W3C Recommendation
• The Extensible Markup Language (XML) became a W3C Recommendation on 10 February 1998.


The Main Difference Between XML and HTML

XML was designed to carry data.
• XML is not a replacement for HTML. XML and HTML were designed with different goals:
• XML was designed to describe data and to focus on what data is. HTML was designed to display data and to focus on how data looks.
• HTML is about displaying information, while XML is about describing information.

XML Does not DO Anything
XML was not designed to DO anything.
• Maybe it is a little hard to understand, but XML does not DO anything. XML was created to structure, store and to send information.
• The following example is a note to Tove from Jani, stored as XML:

    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>

• The note has a header and a message body. It also has sender and receiver information. But still, this XML document does not DO anything. It is just pure information wrapped in XML tags. Someone must write a piece of software to send, receive or display it.

XML is Free and Extensible
XML tags are not predefined. You must "invent" your own tags.
• The tags used to mark up HTML documents and the structure of HTML documents are predefined. The author of HTML documents can only use tags that are defined in the HTML standard (like <p>, <h1>, etc.).
• XML allows the author to define his own tags and his own document structure.
• The tags in the example above (like <to> and <from>) are not defined in any XML standard. These tags are "invented" by the author of the XML document.


XML is a Complement to HTML

XML is not a replacement for HTML.
• It is important to understand that XML is not a replacement for HTML. In future Web development it is most likely that XML will be used to describe the data, while HTML will be used to format and display the same data.

• My best description of XML is this: XML is a cross-platform, software and hardware independent tool for transmitting information.

XML in Future Web Development
XML is going to be everywhere.
• We have been participating in XML development since its creation. It has been amazing to see how quickly the XML standard has been developed and how quickly a large number of software vendors have adopted the standard.
• We strongly believe that XML will be as important to the future of the Web as HTML has been to the foundation of the Web, and that XML will be the most common tool for all data manipulation and data transmission.


XML can Separate Data from HTML

With XML, your data is stored outside your HTML.
• When HTML is used to display data, the data is stored inside your HTML. With XML, data can be stored in separate XML files. This way you can concentrate on using HTML for data layout and display, and be sure that changes in the underlying data will not require any changes to your HTML.

• XML data can also be stored inside HTML pages as "Data Islands". You can still concentrate on using HTML only for formatting and displaying the data.

XML is Used to Exchange Data
With XML, data can be exchanged between incompatible systems.

• In the real world, computer systems and databases contain data in incompatible formats. One of the most time-consuming challenges for developers has been to exchange data between such systems over the Internet.

• Converting the data to XML can greatly reduce this complexity and create data that can be read by many different types of applications.
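As a rough illustration of the exchange idea above, the sketch below serializes a few in-memory records to XML and reads them back with Python's standard `xml.etree.ElementTree` module. The record fields, the tag names `customers` and `customer`, and the helper names are hypothetical and chosen only for this example; they do not come from the slides.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records, root_tag="customers", item_tag="customer"):
    """Serialize a list of flat dicts (string keys and values) to an XML string."""
    root = ET.Element(root_tag)
    for record in records:
        item = ET.SubElement(root, item_tag)
        for key, value in record.items():
            child = ET.SubElement(item, key)  # one child element per field
            child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_records(xml_text):
    """Parse XML produced by records_to_xml back into a list of dicts."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in item} for item in root]


# Hypothetical sample data, used only to exercise the round trip.
records = [
    {"name": "Tove", "city": "Oslo"},
    {"name": "Jani", "city": "Bergen"},
]
xml_text = records_to_xml(records)
print(xml_text)
print(xml_to_records(xml_text) == records)
```

Because the intermediate form is plain text, the receiving side could just as well be written in another language or run on another platform; only the agreed tag names matter.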

XML and B2B
With XML, financial information can be exchanged over the Internet.

• Expect to see a lot about XML and B2B (Business To Business) in the near future.

• XML is going to be the main language for exchanging financial information between businesses over the Internet. A lot of interesting B2B applications are under development.


XML Can be Used to Share Data

With XML, plain text files can be used to share data.
• Since XML data is stored in plain text format, XML provides a software- and hardware-independent way of sharing data.
• This makes it much easier to create data that different applications can work with. It also makes it easier to expand or upgrade a system to new operating systems, servers, applications, and new browsers.

XML Can be Used to Store Data
With XML, plain text files can be used to store data.

• XML can also be used to store data in files or in databases. Applications can be written to store and retrieve information from the store, and generic applications can be used to display the data.

XML Can Make your Data More Useful
With XML, your data is available to more users.

• Since XML is independent of hardware, software and application, you can make your data available to more than just standard HTML browsers.

• Other clients and applications can access your XML files as data sources, like they are accessing databases. Your data can be made available to all kinds of "reading machines" (agents), and it is easier to make your data available for blind people, or people with other disabilities.

XML Can be Used to Create New Languages
XML is the mother of WAP and WML.

• The Wireless Markup Language (WML), used to markup Internet applications for handheld devices like mobile phones, is written in XML.

If Developers Have Sense
If they DO have sense, all future applications will exchange their data in XML.


XML Syntax

As long as only well-formedness is required, XML is a generic framework for storing any amount of text or any data whose structure can be represented as a tree.

The only indispensable syntactical requirement is that the document has exactly one root element (alternatively called the document element). This means that the text must be enclosed between a root opening tag and a corresponding closing tag. The following is a well-formed XML document:

    <book>This is a book.... </book>

The root element can be preceded by an optional XML declaration. This declaration states what version of XML is in use (normally 1.0); it may also contain information about character encoding and external dependencies.

    <?xml version="1.0" encoding="UTF-8"?>

The specification requires that processors of XML support the pan-Unicode character encodings UTF-8 and UTF-16 (UTF-32 is not mandatory). The use of more limited encodings, such as those based on ISO/IEC 8859, is acknowledged, and such encodings are widely used and supported.

Comments can be placed anywhere in the tree, including in the text if the content of the element is text or #PCDATA:

    <!-- This is a comment. -->

In any meaningful application, additional markup is used to structure the contents of the XML document. The text enclosed by the root tags may contain an arbitrary number of XML elements. The basic syntax for one element is:

    <name attribute="value">content</name>


Example: Recipe for making bread

    <?xml version="1.0" encoding="UTF-8"?>
    <recipe name="bread" prep_time="5 mins" cook_time="3 hours">
      <title>Basic bread</title>
      <ingredient amount="3" unit="cups">Flour</ingredient>
      <ingredient amount="0.25" unit="ounce">Yeast</ingredient>
      <ingredient amount="1.5" unit="cups" state="warm">Water</ingredient>
      <ingredient amount="1" unit="teaspoon">Salt</ingredient>
      <instructions>
        <step>Mix all ingredients together.</step>
        <step>Knead thoroughly.</step>
        <step>Cover with a cloth, and leave for one hour in warm room.</step>
        <step>Knead again.</step>
        <step>Place in a bread baking tin.</step>
        <step>Cover with a cloth, and leave for one hour in warm room.</step>
        <step>Bake in the oven at 350° for 30 minutes.</step>
      </instructions>
    </recipe>
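To show how a program might read such a document, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. It embeds a shortened copy of the recipe (the XML declaration and most steps are left out to keep it brief); the function name `list_ingredients` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the bread recipe, for illustration only.
RECIPE = """<recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="3" unit="cups">Flour</ingredient>
  <ingredient amount="0.25" unit="ounce">Yeast</ingredient>
  <ingredient amount="1.5" unit="cups" state="warm">Water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
  <instructions>
    <step>Mix all ingredients together.</step>
    <step>Knead thoroughly.</step>
  </instructions>
</recipe>"""


def list_ingredients(xml_text):
    """Return (amount, unit, name) tuples for every ingredient element."""
    recipe = ET.fromstring(xml_text)
    return [
        (item.get("amount"), item.get("unit"), item.text)
        for item in recipe.findall("ingredient")
    ]


recipe = ET.fromstring(RECIPE)
ingredients = list_ingredients(RECIPE)
steps = [step.text for step in recipe.findall("instructions/step")]

print(recipe.get("name"), recipe.findtext("title"))
for amount, unit, name in ingredients:
    print(amount, unit, name)
```

Attributes are read with `get()`, while the text between an element's tags is available as `.text`, which mirrors the `<name attribute="value">content</name>` syntax.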


An Example XML Document

XML documents use a self-describing and simple syntax.

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>

The first line in the document - the XML declaration - defines the XML version and the character encoding used in the document. In this case the document conforms to the 1.0 specification of XML and uses the ISO-8859-1 (Latin-1/West European) character set.

The next line describes the root element of the document (like it was saying: "this document is a note"):

    <note>

The next 4 lines describe 4 child elements of the root (to, from, heading, and body):

    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>

And finally the last line defines the end of the root element:

    </note>

Can you detect from this example that the XML document contains a Note to Tove from Jani? Don't you agree that XML is pretty self-descriptive?

All XML Elements Must Have a Closing Tag
With XML, it is illegal to omit the closing tag.

In HTML some elements do not have to have a closing tag. The following code is legal in HTML:

    <p>This is a paragraph
    <p>This is another paragraph

In XML all elements must have a closing tag, like this:

    <p>This is a paragraph</p>
    <p>This is another paragraph</p>

Note: You might have noticed from the previous example that the XML declaration did not have a closing tag. This is not an error. The declaration is not a part of the XML document itself. It is not an XML element, and it should not have a closing tag.


Problems

1. Create an XML document for a library to store its book list. The details of each book, such as title, author, publisher, year of publication, category, and reference number, should be captured by the XML document.

2. Create XML documents for a travel agency. The file flights.xml captures the data on available flights, with details such as flight number, airline, number of seats, destination, etc. The file hotels.xml captures the data regarding the hotels: categories, rooms, tariff, address, etc.

3. An XML file is being designed to capture the data in a white pages telephone directory. Design the XML to capture data such as name, phone number, street, address, city, etc.

4. An XML file is being used for storing student information in a university. The information about each student, i.e., name, USN, semester, branch, college name, date of birth, contact phone, and contact address, is to be stored in the XML. Design the XML.


XML Tags are Case Sensitive

Unlike HTML, XML tags are case sensitive.
• With XML, the tag <Letter> is different from the tag <letter>.
• Opening and closing tags must therefore be written with the same case:

    <Message>This is incorrect</message>
    <message>This is correct</message>

XML Elements Must be Properly Nested
Improper nesting of tags makes no sense to XML.
• In HTML some elements can be improperly nested within each other like this:

    <b><i>This text is bold and italic</b></i>

• In XML all elements must be properly nested within each other like this:

    <b><i>This text is bold and italic</i></b>

XML Documents Must Have a Root Element
All XML documents must contain a single tag pair to define a root element.
• All other elements must be within this root element.
• All elements can have sub elements (child elements). Sub elements must be correctly nested within their parent element:

    <root>
      <child>
        <subchild>.....</subchild>
      </child>
    </root>


XML Attribute Values Must be Quoted

With XML, it is illegal to omit quotation marks around attribute values.

XML elements can have attributes in name/value pairs just like in HTML. In XML the attribute value must always be quoted. Study the two XML documents below. The first one is incorrect, the second is correct:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <note date=12/11/2002>
      <to>Tove</to>
      <from>Jani</from>
    </note>

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <note date="12/11/2002">
      <to>Tove</to>
      <from>Jani</from>
    </note>

The error in the first document is that the date attribute in the note element is not quoted. This is correct: date="12/11/2002". This is incorrect: date=12/11/2002.

With XML, White Space is Preserved
With XML, the white space in your document is not truncated. This is unlike HTML. With HTML, a sentence like this:

    Hello              my name is Tove,

will be displayed like this:

    Hello my name is Tove,

because HTML reduces multiple, consecutive white space characters to a single white space.
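The difference can be checked with a few lines of Python. This is only a sketch: the `greeting` tag is a made-up name, and the whitespace collapsing on the last line merely imitates what an HTML renderer shows; it is not a browser.

```python
import xml.etree.ElementTree as ET

# The sentence from the slide, with a run of 14 spaces after "Hello".
SENTENCE = "Hello" + " " * 14 + "my name is Tove"

elem = ET.fromstring("<greeting>" + SENTENCE + "</greeting>")

xml_text = elem.text                     # the XML parser keeps every space
html_like = " ".join(xml_text.split())   # imitation of HTML whitespace collapsing

print(repr(xml_text))
print(repr(html_like))
```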


Comments in XML

The syntax for writing comments in XML is similar to that of HTML.

    <!-- This is a comment -->

There is Nothing Special About XML
There is nothing special about XML. It is just plain text with the addition of some XML tags enclosed in angle brackets.

Software that can handle plain text can also handle XML. In a simple text editor, the XML tags will be visible and will not be handled specially.

In an XML-aware application however, the XML tags can be handled specially. The tags may or may not be visible, or have a functional meaning, depending on the nature of the application.


XML Elements are Extensible
XML documents can be extended to carry more information.
• Look at the following XML NOTE example:

    <note>
      <to>Tove</to>
      <from>Jani</from>
      <body>Don't forget me this weekend!</body>
    </note>

• Let's imagine that we created an application that extracted the <to>, <from>, and <body> elements from the XML document to produce this output:

    MESSAGE
    To: Tove
    From: Jani

    Don't forget me this weekend!

• Imagine that the author of the XML document added some extra information to it:

    <note>
      <date>2002-08-01</date>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>

• Should the application break or crash? No. The application should still be able to find the <to>, <from>, and <body> elements in the XML document and produce the same output.
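The imagined application can be sketched in Python with the standard `xml.etree.ElementTree` module. The function name `render_message` is hypothetical; the point is only that looking elements up by name keeps working when unrelated elements are added.

```python
import xml.etree.ElementTree as ET

ORIGINAL = (
    "<note><to>Tove</to><from>Jani</from>"
    "<body>Don't forget me this weekend!</body></note>"
)
EXTENDED = (
    "<note><date>2002-08-01</date><to>Tove</to><from>Jani</from>"
    "<heading>Reminder</heading>"
    "<body>Don't forget me this weekend!</body></note>"
)


def render_message(xml_text):
    """Build the MESSAGE output from the to, from and body elements only."""
    note = ET.fromstring(xml_text)
    return "MESSAGE\nTo: {}\nFrom: {}\n\n{}".format(
        note.findtext("to"), note.findtext("from"), note.findtext("body")
    )


print(render_message(ORIGINAL))
# The extra <date> and <heading> elements are simply ignored.
print(render_message(EXTENDED) == render_message(ORIGINAL))
```

An application that instead relied on element positions (for example "the first child is the recipient") would break on the extended document, which is why lookup by name is the safer design.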

XML documents are Extensible.


XML Elements have Relationships

Elements are related as parents and children.

To understand XML terminology, you have to know how relationships between XML elements are named, and how element content is described.

Imagine that this is a description of a book:

    My First XML
      Introduction to XML
        • What is HTML
        • What is XML
      XML Syntax
        • Elements must have a closing tag
        • Elements must be properly nested

Imagine that this XML document describes the book:

    <book>
      <title>My First XML</title>
      <prod id="33-657" media="paper"></prod>
      <chapter>Introduction to XML
        <para>What is HTML</para>
        <para>What is XML</para>
      </chapter>
      <chapter>XML Syntax
        <para>Elements must have a closing tag</para>
        <para>Elements must be properly nested</para>
      </chapter>
    </book>

Book is the root element. Title, prod, and chapter are child elements of book. Book is the parent element of title, prod, and chapter. Title, prod, and chapter are siblings (or sister elements) because they have the same parent.


Elements have Content
Elements can have different content types.

• An XML element is everything from (including) the element's start tag to (including) the element's end tag.

• An element can have element content, mixed content, simple content, or empty content. An element can also have attributes.

• In the example above, book has element content, because it contains other elements. Chapter has mixed content because it contains both text and other elements. Para has simple content (or text content) because it contains only text. Prod has empty content, because it carries no information.

• In the example above only the prod element has attributes. The attribute named id has the value "33-657". The attribute named media has the value "paper". 
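The four content types can be told apart mechanically. The sketch below uses Python's standard `xml.etree.ElementTree` on the book document; `content_type` is a hypothetical helper, and treating whitespace-only text as "no text" is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

BOOK = """<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>"""


def content_type(elem):
    """Classify an element as 'element', 'mixed', 'simple' or 'empty'."""
    has_children = len(elem) > 0
    # Text may appear before the first child (elem.text) or after any
    # child (child.tail); whitespace-only runs are ignored here.
    pieces = [elem.text] + [child.tail for child in elem]
    has_text = any(piece and piece.strip() for piece in pieces)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "simple"
    return "empty"


book = ET.fromstring(BOOK)
kinds = {elem.tag: content_type(elem) for elem in book.iter()}
print(kinds)
print(book.find("prod").attrib)
```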

Element Naming
XML elements must follow these naming rules:
• Names can contain letters, numbers, and other characters
• Names must not start with a number or punctuation character
• Names must not start with the letters xml (or XML, or Xml, etc)
• Names cannot contain spaces


XML Attributes

XML elements can have attributes.
• From HTML you will remember this: <IMG SRC="computer.gif">. The SRC attribute provides additional information about the IMG element.
• In HTML (and in XML) attributes provide additional information about elements:

    <img src="computer.gif">
    <a href="demo.asp">

• Attributes often provide information that is not a part of the data. In the example below, the file type is irrelevant to the data, but important to the software that wants to manipulate the element:

    <file type="gif">computer.gif</file>

• If the attribute value itself contains double quotes it is necessary to use single quotes, like in this example:

    <gangster name='George "Shotgun" Ziegler'>

• Note: If the attribute value itself contains single quotes it is necessary to use double quotes, like in this example:

    <gangster name="George 'Shotgun' Ziegler">

Use of Elements vs. Attributes


Well Formed XML Documents

A "Well Formed" XML document has correct XML syntax.
• A "Well Formed" XML document is a document that conforms to the XML syntax rules that were described in the previous chapters:
• XML documents must have a root element
• XML elements must have a closing tag
• XML tags are case sensitive
• XML elements must be properly nested
• XML attribute values must always be quoted

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>
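These rules can be tried out with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which raises `ParseError` on input that is not well formed; `is_well_formed` and the sample labels are hypothetical names, and each bad sample breaks exactly one of the rules listed above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "well formed": "<note><to>Tove</to><from>Jani</from></note>",
    "two root elements": "<to>Tove</to><from>Jani</from>",
    "missing closing tag": "<p>This is a paragraph",
    "case mismatch": "<Message>This is incorrect</message>",
    "improper nesting": "<b><i>This text is bold and italic</b></i>",
    "unquoted attribute": "<note date=12/11/2002></note>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, "->", "accepted" if ok else "rejected")
```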


XML Document Structure

An XML document often uses two auxiliary files:
• A DTD: a file that specifies its tag set and its structural and syntactic rules
• A style sheet that describes how the content of the document is to be printed or displayed

An XML document consists of one or more entities, which are logically related collections of information, ranging from a single character to a book chapter.

Reasons for breaking a document into entities:
• Defining a large document as a number of smaller parts makes it easier to manage.
• If the same data appears in more than one place in the document, defining it as an entity allows any number of references to a single copy of the data.
• Many documents include images. Each image must be a separate entity, called a binary entity.


XML Document Structure

When an XML processor encounters the name of a nonbinary entity in a document, it replaces the name with the value it references. Binary entities can be handled only by applications that deal with the document, such as browsers.

Entity names can have any length. They must begin with a letter, an underscore, or a colon.

A reference to an entity is its name with a prepended ampersand and an appended semicolon. If apple_image is the name of an entity, &apple_image; is a reference to it.

When several predefined entities must appear near each other in an XML document, their references clutter the content and make it difficult to read. In such cases a character data section can be used. The content of a character data section is not parsed by the XML parser, so any markup inside it is treated as literal text rather than as tags.

    <![CDATA[ content ]]>
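A short comparison may make the trade-off concrete. In this sketch (Python's standard `xml.etree.ElementTree`; the `code` tag and the expression inside it are made up for the example) the same text is written once with predefined entity references and once in a character data section.

```python
import xml.etree.ElementTree as ET

# Same content, written with predefined entity references ...
with_refs = ET.fromstring("<code>if (a &lt; b &amp;&amp; b &lt; c)</code>")

# ... and inside a character data section, where < and & need no escaping.
with_cdata = ET.fromstring("<code><![CDATA[if (a < b && b < c)]]></code>")

print(with_refs.text)
print(with_cdata.text)
print(with_refs.text == with_cdata.text)
```

Both forms give the application the same character data; the CDATA version is simply easier to read in the source document.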


Document Type Definitions

A DTD is a set of structural rules called declarations, which specify a set of elements that can appear in the document as well as how and where these elements may appear.

Use of a DTD is related to the use of an external style sheet for XHTML.

DTDs are used when the same tag set definition is used by a collection of documents, perhaps by a collection of users, and the collection must have a consistent and uniform structure.

The purpose of DTD is to define a standard form for a collection of XML documents.

This form is specified as the tag and attributes sets, as well as the rules that define how they can appear in a document.

All documents in the collection can be tested against the DTD to determine whether they conform to the rules it describes.

A DTD can be:
• Internal, if it is embedded in the XML document
• External, if it is stored as a separate file (the preferable choice)


DTD syntax

A DTD is a sequence of declarations enclosed in the block of a DOCTYPE markup declaration. Each declaration within this block has the form of a markup declaration:

    <!keyword …>

Four possible keywords can be used:
• ELEMENT, used to define tags
• ATTLIST, used to define tag attributes
• ENTITY, used to define entities
• NOTATION, used to define data type notations


Elements

The declarations of a DTD have a form that is related to that of the rules of context-free grammars, which are commonly written in Backus-Naur Form (BNF).

Each element declaration in a DTD specifies the structure of one category of elements. The declaration provides the name of the element whose structure is being defined along with the specification of the structure of that element.

    <!ELEMENT element_name (list of names of child elements)>

Example:

    <!ELEMENT memo (from, to, date, re, body)>

This declaration describes a tree structure: memo is the parent, and from, to, date, re, and body are its children, in that order.

Child element specification modifiers are borrowed from regular expressions:
• + : one or more occurrences
• * : zero or more occurrences
• ? : zero or one occurrence

Ex:

    <!ELEMENT person (parent+, age, spouse?, sibling*)>

The leaf nodes of a DTD specify the data types of the content of their parent nodes:
• #PCDATA for parsed character data
• EMPTY when the element has no content
• ANY when the element may contain any content

The form of the leaf element is as follows:

    <!ELEMENT element-name (#PCDATA)>
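Since the modifiers come from regular expressions, a simple content model can be checked by translating it into one. The Python sketch below is an illustration of that idea only, not a DTD validator: it handles a flat, comma-separated list of names with optional +, * or ? and nothing else (no choices with |, no nested groups). The function names are hypothetical.

```python
import re


def content_model_to_regex(model):
    """Translate e.g. "parent+, age, spouse?, sibling*" into a compiled regex.

    Each child name is matched together with one trailing space, so a
    sequence of children is encoded as "parent parent age ".
    """
    parts = []
    for item in model.split(","):
        item = item.strip()
        modifier = item[-1] if item[-1] in "+*?" else ""
        name = item.rstrip("+*?")
        parts.append("(?:{} ){}".format(re.escape(name), modifier))
    return re.compile("".join(parts))


def children_match(model, child_names):
    """Return True if the ordered child names satisfy the content model."""
    sequence = "".join(name + " " for name in child_names)
    return content_model_to_regex(model).fullmatch(sequence) is not None


PERSON = "parent+, age, spouse?, sibling*"
print(children_match(PERSON, ["parent", "parent", "age", "sibling"]))
print(children_match(PERSON, ["age"]))
print(children_match(PERSON, ["parent", "age", "spouse", "spouse"]))
```

The second call fails because at least one parent is required, and the third because spouse may appear at most once.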


Attributes

The attributes of an element are declared separately from the element declarations in a DTD. An attribute declaration must include the name of the element to which the attribute belongs, the attribute's name, and its type. The general form of an attribute declaration is as follows:

    <!ATTLIST element_name attribute_name attribute_type [default_value]>

If more than one attribute is declared for a given element, the declarations can be combined.

The most common type for attributes is CDATA, for any string of characters.

The default value of an attribute can be:
• a value: the default used when the element does not specify one
• #FIXED value: the value cannot be changed
• #REQUIRED: every instance of the element must specify a value
• #IMPLIED: the value may or may not be specified in an element

    <!ATTLIST airplane places CDATA "4">
    <!ATTLIST airplane engine_type CDATA #REQUIRED>
    <!ATTLIST airplane price CDATA #IMPLIED>
    <!ATTLIST airplane manufacturer CDATA #FIXED "cessna">

The following element is valid for this DTD:

    <airplane places = "10" engine_type = "jet"></airplane>
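The effect of the four kinds of default can be simulated in a few lines of plain Python. This is a hand-written model of the rules above, not a call into any DTD library; the table `AIRPLANE_ATTLIST`, the constants and the function `apply_attlist` are all hypothetical names for this sketch.

```python
REQUIRED, IMPLIED, FIXED, DEFAULT = "#REQUIRED", "#IMPLIED", "#FIXED", "default"

# Hand-transcribed from the four ATTLIST declarations for airplane.
AIRPLANE_ATTLIST = {
    "places": (DEFAULT, "4"),
    "engine_type": (REQUIRED, None),
    "price": (IMPLIED, None),
    "manufacturer": (FIXED, "cessna"),
}


def apply_attlist(attlist, given):
    """Check the given attributes and return them with defaults filled in."""
    undeclared = set(given) - set(attlist)
    if undeclared:
        raise ValueError("undeclared attributes: {}".format(sorted(undeclared)))
    result = dict(given)
    for name, (kind, value) in attlist.items():
        if kind == REQUIRED and name not in given:
            raise ValueError("missing required attribute: " + name)
        if kind == FIXED:
            if given.get(name, value) != value:
                raise ValueError("fixed attribute cannot be changed: " + name)
            result[name] = value
        elif kind == DEFAULT:
            result.setdefault(name, value)
        # IMPLIED attributes are simply left out when not given.
    return result


# The element from the slide: <airplane places = "10" engine_type = "jet">
print(apply_attlist(AIRPLANE_ATTLIST, {"places": "10", "engine_type": "jet"}))
# With places omitted, the declared default "4" is used.
print(apply_attlist(AIRPLANE_ATTLIST, {"engine_type": "jet"}))
```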


Entities

Entities can be defined so that they can be referenced anywhere in the content of an XML document, in which case they are called general entities.

The form of an entity declaration:

    <!ENTITY [%] entity_name "entity_value">

The optional % specifies that the entity is a parameter entity (an entity defined to be referenced only in markup declarations).

Ex:

    <!ENTITY jfk "John Fitzgerald Kennedy">

Any XML document that uses a DTD that includes this declaration can specify the complete name with just the reference &jfk;
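Entity replacement can be observed with Python's standard `xml.etree.ElementTree`, as sketched below. Note the assumption: this parser does not fetch external DTD files, so the declaration is placed in an internal DTD subset here; the `speaker` element is a made-up name for the example.

```python
import xml.etree.ElementTree as ET

# The entity is declared in an internal DTD subset so that the parser
# can see it without loading any external file.
DOC = """<!DOCTYPE speaker [
  <!ENTITY jfk "John Fitzgerald Kennedy">
]>
<speaker>&jfk;</speaker>"""

speaker = ET.fromstring(DOC)
print(speaker.text)  # the reference has been replaced by the entity value
```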


Sample planes.dtd

Consider a booklet of ads for used aircraft.

    <?xml version="1.0" encoding="utf-8"?>
    <!-- planes.dtd -->
    <!ELEMENT planes_for_sale (ad+)>
    <!ELEMENT ad (year, make, model, color, description, price?, seller, location)>
    <!ELEMENT year (#PCDATA)>
    <!ELEMENT make (#PCDATA)>
    <!ELEMENT model (#PCDATA)>
    <!ELEMENT color (#PCDATA)>
    <!ELEMENT description (#PCDATA)>
    <!ELEMENT price (#PCDATA)>
    <!ELEMENT seller (#PCDATA)>
    <!ELEMENT location (city, state)>
    <!ELEMENT city (#PCDATA)>
    <!ELEMENT state (#PCDATA)>

    <!ATTLIST seller phone CDATA #REQUIRED>
    <!ATTLIST seller email CDATA #IMPLIED>

    <!ENTITY c "Cessna">
    <!ENTITY p "Piper">
    <!ENTITY b "Beechcraft">


Planes.xml

    <?xml version = "1.0" encoding = "utf-8"?>
    <!-- planes.xml -->
    <!DOCTYPE planes_for_sale SYSTEM "planes.dtd">
    <planes_for_sale>
      <ad>
        <year> 1977 </year>
        <make> &c; </make>
        <model> Skyhawk </model>
        <color> Light Blue and White </color>
        <description> 685 hours, full IFR… </description>
        <price> 23,495 </price>
        <seller phone = "555-222-3333"> Skyway Aircraft </seller>
        <location>
          <city> Rapid City </city>
          <state> South Dakota </state>
        </location>
      </ad>
    </planes_for_sale>


DTD

Document Type Definition (DTD), defined slightly differently within the XML and SGML (the language XML was derived from) specifications, is one of several SGML and XML schema languages, and is also the term used to describe a document or portion thereof that is authored in the DTD language.

A DTD is primarily used for the expression of a schema via a set of declarations that conform to a particular markup syntax and that describe a class, or type, of SGML or XML documents, in terms of constraints on the structure of those documents.

As an expression of a schema, a DTD specifies, in effect, the syntax of an "application" of SGML or XML, such as the derivative language HTML or XHTML. This syntax is usually a less general form of the syntax of SGML or XML.

In a DTD, the structure of a class of documents is described via element and attribute-list declarations.

Element declarations name the allowable set of elements within the document, and specify whether and how declared elements and runs of character data may be contained within each element.

Attribute-list declarations name the allowable set of attributes for each declared element, including the type of each attribute value, if not an explicit set of valid value(s).


Associating DTDs with documents

A DTD is associated with an XML document via a Document Type Declaration, which is a tag that appears near the start of the XML document. The declaration establishes that the document is an instance of the type defined by the referenced DTD.

The declarations in a DTD are divided into an internal subset and an external subset. The declarations in the internal subset are embedded in the Document Type Declaration in the document itself. The declarations in the external subset are located in a separate text file. The external subset may be referenced via a public identifier and/or a system identifier. Programs for reading documents may not be required to read the external subset.

Examples

Here is an example of a Document Type Declaration containing both public and system identifiers:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Here is an example of a Document Type Declaration that encapsulates an internal subset consisting of a single entity declaration:

    <!DOCTYPE foo [ <!ENTITY greeting "hello"> ]>
    <!DOCTYPE bar [ <!ENTITY greeting "hello"> ]>


An XML DTD example

An example of an XML file which makes use of and conforms to this DTD follows. It assumes the DTD is identifiable by the relative URI reference "example.dtd":

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE people_list SYSTEM "example.dtd">
    <people_list>
      <person>
        <name>Fred Bloggs</name>
        <birthdate>27/11/2008</birthdate>
        <gender>Male</gender>
      </person>
    </people_list>


DTD problems:

1. Create a DTD for a catalog of cars where each car has the child elements make, model, year, color, engine, number_of_doors, transmission_type and accessories. The engine element has the child elements number_of_cylinders and fuel_system (carbureted or fuel injected). The accessories element has the attributes radio, air_conditioning, power_windows, power_steering and power_brakes, each of which is required and has the possible values yes and no. Entities must be declared for the names of popular car makes.

2. Create an XML document with at least three instances of the car element defined in the DTD above. Process this document using the DTD and produce a display of the raw XML document.

3. Design an XML document to store information about patients in a hospital. Information about patients must include name (in three parts), social security number, age, room number, primary insurance company (including member id number, group number, phone number, and address), secondary insurance company (in the same sub parts as for the primary insurance company), known medical problems, and known drug allergies. Both attributes and nested tags must be included. Make up sample data for at least four patients.

4. Write a DTD for the document described above, with the following restrictions: the name, social security number, age, room number, and primary insurance company are required. All the other elements are optional, as are middle names.


Valid XML Documents

A "Valid" XML document also conforms to a DTD.

A "Valid" XML document is a "Well Formed" XML document, which also conforms to the rules of a Document Type Definition (DTD):

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE note SYSTEM "InternalNote.dtd">
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>

XML DTD
A DTD defines the legal elements of an XML document. The purpose of a DTD is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements.

XML Schema
XML Schema is an XML based alternative to DTD. W3C supports an alternative to DTD called XML Schema.

A General XML Validator


XML Errors will Stop you

Errors in XML documents will stop your XML program.
• The W3C XML specification states that a program should not continue to process an XML document if it finds an error. The reason is that XML software should be easy to write, and that all XML documents should be compatible.
• With HTML it was possible to create documents with lots of errors (like when you forget an end tag). One of the main reasons that HTML browsers are so big and incompatible, is that they have their own ways to figure out what a document should look like when they encounter an HTML error.
• With XML this should not be possible.

• To help you syntax-check your xml, we have used Microsoft's XML parser to create an XML validator.

• Paste your XML in the text area below, and syntax-check it by pressing the "Validate" button.

Syntax-check your XML File - IE Only

• You can also syntax-check your XML file by typing the URL of your file into the input field below and then pressing the "Validate" button.

• Filename:

If you want to syntax-check an error-free XML file, you can paste the following address into the filename field: http://www.w3schools.com/xml/cd_catalog.xml

• Note: If you get the error "Access denied" when accessing this file, it is because your Internet Explorer security settings do not allow access across domains!
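The "errors will stop you" rule can be seen with any conforming parser, not only the IE-based validator. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the two sample strings and the helper name check are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical inputs: the second one is missing the </from> end tag.
well_formed = "<note><to>Tove</to><from>Jani</from></note>"
broken = "<note><to>Tove</to><from>Jani</note>"


def check(xml_text):
    """Return None if the text is well formed, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        # The parser gives up at the first error instead of guessing a repair.
        return str(err)


print(check(well_formed))
print(check(broken))
```

Unlike an HTML browser, the parser makes no attempt to recover: the broken document produces an error message and no tree at all.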

Page 36: XML Chap8 Sebesta Web2

Namespaces

It is often convenient to construct XML documents that include tag sets that are defined for and used by other documents.

When a tag set is available and appropriate, it is better to use it than to reinvent a new collection of element types.

An XML namespace is a collection of element names used in XML documents.

The name of a namespace usually has the form of a Uniform Resource Identifier (URI).

A namespace for the elements of the hierarchy rooted at a particular element is declared as the value of the attribute xmlns.

<element_name xmlns[:prefix] = "URI">

Ex: <birds xmlns:bd = "http://www.abundon.org/names/species">

Within the birds element, including all of its children elements, the names from the namespace must be prefixed with bd, as in:

<bd:lark>

Page 37: XML Chap8 Sebesta Web2

Example for NameSpace usage

<states
    xmlns = "http://www.states-info.org/states"
    xmlns:cap = "http://www.states-info.org/state-capitals">
  <state>
    <name> South Dakota </name>
    <population> 754844 </population>
    <capital>
      <cap:name> Pierre </cap:name>
      <cap:population> 12429 </cap:population>
    </capital>
  </state>
  <!-- more states -->
</states>
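To see how a parser keeps the two name elements apart, here is a small sketch using Python's standard xml.etree.ElementTree module. It works on a cleaned-up copy of the states document, with both xmlns declarations placed inside the start tag. The prefix "st" in the lookup table is an assumption chosen for this sketch; only the URIs matter to the parser.

```python
import xml.etree.ElementTree as ET

STATES_XML = """<states
    xmlns="http://www.states-info.org/states"
    xmlns:cap="http://www.states-info.org/state-capitals">
  <state>
    <name> South Dakota </name>
    <population> 754844 </population>
    <capital>
      <cap:name> Pierre </cap:name>
      <cap:population> 12429 </cap:population>
    </capital>
  </state>
</states>"""

# Prefixes used in the queries below; "st" is our own label for the default namespace.
NS = {
    "st": "http://www.states-info.org/states",
    "cap": "http://www.states-info.org/state-capitals",
}

root = ET.fromstring(STATES_XML)

# The same local name "name" resolves to two different qualified names.
state_name = root.find("st:state/st:name", NS).text.strip()
capital_name = root.find("st:state/st:capital/cap:name", NS).text.strip()

print(root.tag)
print(state_name, "/", capital_name)
```

Internally the parser stores each name together with its namespace URI, which is why the state's name and the capital's name never collide.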

Page 38: XML Chap8 Sebesta Web2

XML Schemas

DTDs have several disadvantages:

• DTDs are written in a syntax unrelated to XML, so they cannot be analyzed with an XML processor.

• It can be confusing to deal with two different syntactic forms, one to define a document and one to define its structure.

• DTDs do not allow restrictions on the form of data that can be content of a particular tag.

XML Schema by W3C is one of the alternatives to DTD.

An XML schema is an XML document, so it can be parsed with an XML parser.

It also provides far more control over data types than do DTDs.

Data in a specific element can be required to be of any one of 44 different data types.

The user can even define new types with constraints on existing data types.

To promote the transition from DTDs to XML schemas, XML Schema was designed to allow any DTD to be automatically converted to an equivalent XML schema.

Page 39: XML Chap8 Sebesta Web2

Schema Fundamentals

Schemas can conveniently be related to the idea of a class and an object in an object-oriented programming language.

A schema is similar to a class definition; an XML document that conforms to the structure defined in the schema is similar to an object of the schema’s class.

XML documents that conform to a specific schema are considered instances of that schema.

Schemas have two primary purposes:

• A schema specifies the structure of its instance documents, including which elements and attributes may appear as well as where and how often they may appear.

• A schema specifies the data type of every element and attribute of its instance XML documents (this is where schemas outshine DTDs).

Page 40: XML Chap8 Sebesta Web2

Example Schema

<?xml version = "1.0" encoding = "utf-8"?>
<!-- planes.xsd - a simple schema for planes.xml -->
<xsd:schema
    xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
    targetNamespace = "http://cs.uccs.edu/planeSchema"
    xmlns = "http://cs.uccs.edu/planeSchema"
    elementFormDefault = "qualified">
  <xsd:element name = "planes">
    <xsd:complexType>
      <!-- xsd:sequence is used because XML Schema 1.0 limits maxOccurs to 1 inside xsd:all -->
      <xsd:sequence>
        <xsd:element name = "make"
                     type = "xsd:string"
                     minOccurs = "1"
                     maxOccurs = "unbounded" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Page 41: XML Chap8 Sebesta Web2

planes.xml

<?xml version = "1.0" encoding = "utf-8"?>
<!-- planes.xml -->
<planes
    xmlns = "http://cs.uccs.edu/planeSchema"
    xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation = "http://cs.uccs.edu/planeSchema planes.xsd">
  <make> Cessna </make>
  <make> Piper </make>
  <make> Beechcraft </make>
</planes>
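The instance document ties itself to its schema through the xsi:schemaLocation attribute. The sketch below assumes the usual form of that attribute, a namespace URI followed by whitespace and the schema's location, and uses Python's standard xml.etree.ElementTree module to read it. It only inspects the document and does not validate it against planes.xsd, since the Python standard library has no XML Schema validator.

```python
import xml.etree.ElementTree as ET

PLANES_XML = """<planes
    xmlns="http://cs.uccs.edu/planeSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://cs.uccs.edu/planeSchema planes.xsd">
  <make> Cessna </make>
  <make> Piper </make>
  <make> Beechcraft </make>
</planes>"""

XSI = "http://www.w3.org/2001/XMLSchema-instance"
PLANE_NS = "http://cs.uccs.edu/planeSchema"

root = ET.fromstring(PLANES_XML)

# schemaLocation holds whitespace-separated (namespace, location) pairs.
tokens = root.get("{%s}schemaLocation" % XSI).split()
schema_for = dict(zip(tokens[0::2], tokens[1::2]))

# Every make element lives in the schema's target namespace.
makes = [m.text.strip() for m in root.findall("{%s}make" % PLANE_NS)]

print(schema_for)
print(makes)
```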

Page 42: XML Chap8 Sebesta Web2

Validating Instances of Schemas

Validation determines whether a given XML instance document conforms to its schema.

XSV (XML Schema Validator), from the University of Edinburgh, is one tool for this.

XSV can be used online. If the schema is not in the correct format, the validator will report that it could not find the specified schema.

Page 43: XML Chap8 Sebesta Web2

Browser Support

Mozilla Firefox

• As of version 1.0.2, Firefox has support for XML and XSLT (and CSS).

Mozilla

• Mozilla includes Expat for XML parsing and has support to display XML + CSS. Mozilla also has some support for Namespaces.

• Mozilla is available with an XSLT implementation.

Netscape

• As of version 8, Netscape uses the Mozilla engine, and therefore it has the same XML / XSLT support as Mozilla.

Opera

• As of version 9, Opera has support for XML and XSLT (and CSS). Version 8 supports only XML + CSS.

Internet Explorer

• As of version 6, Internet Explorer supports XML, Namespaces, CSS, XSLT, and XPath.

Note: Internet Explorer 5 also has XML support, but the XSL part is NOT compatible with the official W3C XSL Recommendation!

Page 44: XML Chap8 Sebesta Web2

Viewing XML Files

In Firefox and Internet Explorer: Open the XML file (typically by clicking on a link). The XML document will be displayed with color-coded root and child elements. A plus (+) or minus sign (-) to the left of the elements can be clicked to expand or collapse the element structure. To view the raw XML source (without the + and - signs), select "View Page Source" or "View Source" from the browser menu.

In Netscape 6: Open the XML file, then right-click in the XML file and select "View Page Source". The XML document will then be displayed with color-coded root and child elements.

In Opera 7: Open the XML file, then right-click in the XML file and select "Frame" / "View Source". The XML document will be displayed as plain text.

In Opera 8: Open the XML file, then right-click in the XML file and select "Source". The XML document will be displayed as plain text.

Look at this XML file: note.xml

Note: Do not expect XML files to be formatted like HTML documents!

Viewing an Invalid XML File

If an erroneous XML file is opened, the browser will report the error.

XML documents do not carry information about how to display the data. Since XML tags are "invented" by the author of the XML document, browsers do not know if a tag like <table> describes an HTML table or a dining table. Without any information about how to display the data, most browsers will just display the XML document as it is.

Page 45: XML Chap8 Sebesta Web2

Displaying XML documents with CSS

A CSS file that has style information for the elements in an XML document can be developed.

The other way is to use the XSLT style sheet technology.

XSLT provides far more power over the appearance of the document's display.

XSLT is not supported by all browsers.

The form of a CSS style sheet for an XML document is simple. It is just a list of element names, each followed by a brace-delimited set of the element's CSS attributes.

planes.css

/* planes.css */
ad { display: block; margin-top: 15px; color: blue; }
year, make, model { color: red; font-size: 16pt; }

Using it in an XML document:

<?xml-stylesheet type = "text/css" href = "planes.css" ?>

Page 46: XML Chap8 Sebesta Web2

Displaying your XML Files with CSS?

It is possible to use CSS to format an XML document. Below is an example of how to use a CSS style sheet to format an XML document:

Take a look at this XML file: The CD catalog
Then look at this style sheet: The CSS file
Finally, view: The CD catalog formatted with the CSS file

Below is a fraction of the XML file. The second line, <?xml-stylesheet type="text/css" href="cd_catalog.css"?>, links the XML file to the CSS file:

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/css" href="cd_catalog.css"?>
<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <ARTIST>Bonnie Tyler</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>CBS Records</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
  . . . .
</CATALOG>

Note: Formatting XML with CSS is NOT the future of how to style XML documents. XML documents should be styled by using the W3C's XSL standard!

Page 47: XML Chap8 Sebesta Web2

Displaying XML with XSL

XSL is the preferred style sheet language of XML.

XSL (the eXtensible Stylesheet Language) is far more sophisticated than CSS. One way to use XSL is to transform XML into HTML before it is displayed by the browser, as demonstrated in these examples:

Below is a fraction of the XML file. The second line, <?xml-stylesheet type="text/xsl" href="simple.xsl"?>, links the XML file to the XSL file:

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="simple.xsl"?>
<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description> two of our famous Belgian Waffles </description>
    <calories>650</calories>
  </food>
</breakfast_menu>

Page 48: XML Chap8 Sebesta Web2

XML Data Embedded in HTML

An XML data island is XML data embedded into an HTML page.

Assume we have the following XML document ("note.xml"):

<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

Then, in an HTML document, you can embed the XML file above with the <xml> tag. The id attribute of the <xml> tag defines an ID for the data island, and the src attribute points to the XML file to embed:

<html>
<body>
<xml id="note" src="note.xml"></xml>
</body>
</html>

However, the embedded XML data is, up to this point, not visible to the user. The next step is to format and display the data in the data island by binding it to HTML elements.

Bind Data Island to HTML Elements

In the next example, we will embed an XML file called "cd_catalog.xml" into an HTML file. The HTML file looks like this:

<html>
<body>
<xml id="cdcat" src="cd_catalog.xml"></xml>
<table border="1" datasrc="#cdcat">
<tr>
<td><span datafld="ARTIST"></span></td>
<td><span datafld="TITLE"></span></td>
</tr>
</table>
</body>
</html>

Example explained: The datasrc attribute of the <table> tag binds the HTML table element to the XML data island. The datasrc attribute refers to the id attribute of the data island.

<td> tags cannot be bound to data, so we are using <span> tags. The <span> tag allows the datafld attribute to refer to the XML element to be displayed. In this case, it is datafld="ARTIST" for the <ARTIST> element and datafld="TITLE" for the <TITLE> element in the XML file. As the XML is read, additional rows are created for each <CD> element.
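The binding step, one table row per <CD> record and one cell per bound field, can be imitated outside the browser. The following sketch uses Python's standard library and is not the browser's data-binding mechanism; the function name bind_table and the shortened two-record catalog are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# A shortened stand-in for cd_catalog.xml with only the bound fields.
CATALOG_XML = """<CATALOG>
  <CD><TITLE>Empire Burlesque</TITLE><ARTIST>Bob Dylan</ARTIST></CD>
  <CD><TITLE>Hide your heart</TITLE><ARTIST>Bonnie Tyler</ARTIST></CD>
</CATALOG>"""


def bind_table(xml_text, fields):
    """Build an HTML table with one row per record and one cell per field."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root:  # each child of the root plays the role of a <CD> record
        cells = "".join(
            "<td>" + escape(record.findtext(field, "")) + "</td>"
            for field in fields
        )
        rows.append("<tr>" + cells + "</tr>")
    return '<table border="1">' + "".join(rows) + "</table>"


table = bind_table(CATALOG_XML, ["ARTIST", "TITLE"])
print(table)
```

The field list plays the part of the datafld attributes: changing it to ["TITLE"] would produce a one-column table without touching the XML.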

Page 49: XML Chap8 Sebesta Web2

Example: XML News

XMLNews is a specification for exchanging news and other information.

Using such a standard makes it easier for both news producers and news consumers to produce, receive, and archive any kind of news information across different hardware, software, and programming languages.

An example XMLNews document:

<?xml version="1.0" encoding="ISO-8859-1"?>
<nitf>
  <head>
    <title>Colombia Earthquake</title>
  </head>
  <body>
    <headline>
      <hl1>143 Dead in Colombia Earthquake</hl1>
    </headline>
    <byline>
      <bytag>By Jared Kotler, Associated Press Writer</bytag>
    </byline>
    <dateline>
      <location>Bogota, Colombia</location>
      <date>Monday January 25 1999 7:28 ET</date>
    </dateline>
  </body>
</nitf>

Page 50: XML Chap8 Sebesta Web2

Parsing XML Documents

To manipulate an XML document, you need an XML parser. The parser loads the document into your computer's memory. Once the document is loaded, its data can be manipulated using the DOM. The DOM treats the XML document as a tree.

Microsoft's XML Parser

Microsoft's XML parser is a COM component that comes with Internet Explorer 5 and higher. Once you have installed Internet Explorer, the parser is available to scripts. Microsoft's XML parser supports all the necessary functions to traverse the node tree, access the nodes and their attribute values, insert and delete nodes, and convert the node tree back to XML.

To create an instance of Microsoft's XML parser, use the following code:

JavaScript: var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
VBScript: set xmlDoc = CreateObject("Microsoft.XMLDOM")
ASP: set xmlDoc = Server.CreateObject("Microsoft.XMLDOM")

The following code fragment loads an existing XML document ("note.xml") into Microsoft's XML parser:

var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false;
xmlDoc.load("note.xml");

The first line of the script above creates an instance of the XML parser. The second line turns off asynchronous loading, to make sure that the parser will not continue execution of the script before the document is fully loaded. The third line tells the parser to load an XML document called "note.xml".
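The same DOM operations (traverse the tree, read a node, delete a node, convert the tree back to XML) are available outside Internet Explorer. As a rough sketch, the snippet below uses Python's standard xml.dom.minidom module instead of Microsoft's parser, and parses the note document from a string rather than loading "note.xml" from disk; both substitutions are assumptions made to keep the example self-contained.

```python
from xml.dom.minidom import parseString

NOTE_XML = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

# Parsing is synchronous here: the whole tree exists once parseString returns.
doc = parseString(NOTE_XML)
root = doc.documentElement

# Traverse the tree: keep element nodes, skip the whitespace text nodes.
child_names = [n.tagName for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]

# Access a node's text content.
to_text = root.getElementsByTagName("to")[0].firstChild.data

# Delete a node, then convert the tree back to XML.
heading = root.getElementsByTagName("heading")[0]
root.removeChild(heading)
serialized = root.toxml()

print(child_names)
print(to_text)
print(serialized)
```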

Page 51: XML Chap8 Sebesta Web2
Page 52: XML Chap8 Sebesta Web2

XML DTD Example

A very simple XML DTD to describe a list of persons is given below:

<!ELEMENT people_list (person*)>
<!ELEMENT person (name, birthdate?, gender?, socialsecuritynumber?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT birthdate (#PCDATA)>
<!ELEMENT gender (#PCDATA)>
<!ELEMENT socialsecuritynumber (#PCDATA)>

Taking this line by line, it says:

people_list is a valid element name, and an instance of such an element contains any number of person elements. The * denotes there can be 0 or more person elements within the people_list element.

person is a valid element name, and an instance of such an element contains one element named name, followed by one named birthdate (optional), then gender (also optional) and socialsecuritynumber (also optional). The ? indicates that an element is optional. The reference to the name element has no ?, so a person element must contain a name element.

name is a valid element name, and an instance of such an element contains parseable character data (#PCDATA).

birthdate is a valid element name, and an instance of such an element contains character data.

gender is a valid element name, and an instance of such an element contains character data.

socialsecuritynumber is a valid element name, and an instance of such an element contains character data.
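The content models of people_list and person read much like regular expressions over child element names, and that reading can be tried out directly. The sketch below is a hand-rolled check in Python, not a DTD validator: it only tests the order and optionality of child elements and ignores the #PCDATA rules. The function names and the sample documents are assumptions made up for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# (name, birthdate?, gender?, socialsecuritynumber?) written as a regular
# expression over a comma-terminated list of child element names.
PERSON_MODEL = re.compile(r"name,(birthdate,)?(gender,)?(socialsecuritynumber,)?")


def person_matches_model(person):
    """Check the children of one person element against the content model."""
    child_tags = "".join(child.tag + "," for child in person)
    return PERSON_MODEL.fullmatch(child_tags) is not None


def check_people_list(xml_text):
    """people_list (person*): zero or more person elements and nothing else."""
    root = ET.fromstring(xml_text)
    if root.tag != "people_list":
        return False
    return all(
        child.tag == "person" and person_matches_model(child) for child in root
    )


# Hypothetical sample documents.
ok = "<people_list><person><name>A</name><gender>x</gender></person></people_list>"
wrong_order = "<people_list><person><gender>x</gender><name>A</name></person></people_list>"
no_name = "<people_list><person><birthdate>1990</birthdate></person></people_list>"

print(check_people_list(ok))
print(check_people_list(wrong_order))
print(check_people_list(no_name))
```

The second and third documents fail for the reasons given in the line-by-line reading: the children must appear in the declared order, and name carries no ?, so it cannot be left out.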