XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup...

57
XML

Transcript of XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup...

Page 1: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

XML

Page 2: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

2Microsoft

• The Extensible Markup Language (XML) is a general-purpose markup language.

• It is classified as an extensible language because it allows its users to define their own tags.

• XML is a markup language for documents containing structured information.

• Its primary purpose is to facilitate the sharing of structured data across different information systems, particularly via the Internet.

• XML is recommended by the World Wide Web Consortium. It is a fee-free open standard.

What is XML?

Page 3: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

3Microsoft

So XML is Just Like HTML?

• No. In HTML, both the tag semantics and the tag set are fixed.

• An <h1> is always a first level heading and the tag <mscit> is meaningless.

• XML specifies neither semantics nor a tag set.

Page 4: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

4Microsoft

Why Is XML Important?

• Plain Text– Since XML is not a binary format, you can create and edit files

with anything from a standard text editor to a visual development environment.

– That makes it easy to debug your programs, and makes it useful for storing small amounts of data.

– XML provides scalability for anything from small configuration files to a company-wide data repository.

• Data Identification– XML tells you what kind of data you have, not how to display it.

Page 5: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

5Microsoft

Why Is XML Important?

• XML can Separate Data from HTML– With XML, your data is stored outside your HTML.

• XML is Used to Exchange Data– With XML, data can be exchanged between incompatible

systems.

• XML and B2B– With XML, financial information can be exchanged over the

Internet.

Page 6: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

6Microsoft

Why Is XML Important?

• XML Can be Used to Share Data– With XML, plain text files can be used to share data.– Since XML data is stored in plain text format, XML provides a

software- and hardware-independent way of sharing data.– In the real world, computer systems and databases contain

data in incompatible formats. One of the most time-consuming challenges for developers has been to exchange data between such systems over the Internet.

– Converting the data to XML can greatly reduce this complexity and create data that can be read by many different types of applications.

Page 7: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

7Microsoft

Why Is XML Important?

• XML Can Be Used to Create Specialized Vocabularies– XML is an extensible standard. – By using XML as a base, you can create your own

vocabularies. Wireless Application Protocol (WAP), Wireless Markup Language (WML), and Simple Object Access Protocol (SOAP) are some examples of specialized XML vocabularies.

Page 8: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

8Microsoft

Other Applications of XML

• Molecular sciences with CML

– Peter Murray-Rust’s Chemical Markup Language (CML)

• Science and math with MathML

– The Mathematical Markup Language (MathML) is an XML application for mathematical equations.

• Webcasting with CDF

– Microsoft’s Channel Definition Format (CDF) is an XML application for defining channels. Web sites use channels to upload information to readers who subscribe to the site rather than waiting for them to come and get it. This is alternately called Webcasting or push.

Page 9: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

9Microsoft

Other Applications of XML

• Software updates through OSD– The Open Software Description (OSD) format is an XML

application co-developed by Marimba and Microsoft for updating software automatically. OSD defines XML tags that describe software components. The description of a component includes the version of the component, its underlying structure, and its relationships to and dependencies on other components

• Vector graphics with both PGML and VML– Vector graphics better than GIF, JPEG images– The Precision Graphics Markup Language (PGML) from

IBM, Adobe, Netscape, and Sun.– The Vector Markup Language (VML) from Microsoft,

Macromedia, Autodesk, Hewlett-Packard, and Visio

Page 10: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

10Microsoft

Other Applications of XML

• Financial data with OFX– the Open Financial Exchange Format (OFX)

is an XML application used to describe financial data of the type you’re likely to store in a personal finance product like Money or Quicken. Any program that understands OFX can read OFX data. And since OFX is fully documented and non-roprietary (unlike the binary formats of Money, Quicken, and other programs) it’s easy for programmers to write the code to understand OFX.

Page 11: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

11Microsoft

Other Applications of XML

• Financial data with OFX

Page 12: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

12Microsoft

Other Applications of XML

• Financial data with OFX

Page 13: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

13Microsoft

Other Applications of XML

• Automated voice responses with VoxML• Financial data with OFX• Legally binding forms with XFDL• Human resources job information with

HRML• Meta-data through RDF• Internal use of XML by various companies,

including Microsoft, Federal Express, and Netscape

Page 14: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

14Microsoft

XML Syntax

• The syntax rules of XML are very simple and very strict.

An Example XML Document

XML documents use a self-describing and simple syntax.

<?xml version="1.0" encoding="ISO-8859-1"?>

<note>

<to>Students</to>

<from>Faculty</from>

<heading>Reminder</heading>

<body>Learn XML</body>

</note>

Page 15: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

15Microsoft

XML Syntax

<?xml version="1.0" encoding="ISO-8859-1"?> • defines the XML version and the character encoding used in

the document. • In this case the document conforms to the 1.0 specification of

XML and uses the ISO-8859-1 (Latin-1/West European) character set.

<note>

The next line describes the root element of the document

Page 16: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

16Microsoft

XML Syntax

The next 4 lines describe 4 child elements of the root (to, from, heading, and body)

<to>Students</to>

<from>Faculty</from>

<heading>Reminder</heading>

<body>Learn XML</body>

</note>

last line defines the end of the root element

Page 17: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

17Microsoft

XML Syntax

• All XML Elements Must Have a Closing Tag• With XML, it is illegal to omit the closing tag.• In HTML some elements do not have to have a closing tag.

The following code is legal in HTML:

<p>This is a paragraph

<p>This is another paragraph

XML all elements must have a closing tag, like this:

<p>This is a paragraph</p>

<p>This is another paragraph</p>

Page 18: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

18Microsoft

XML Syntax

• XML Tags are Case Sensitive

<Message>This is incorrect</message>

<message>This is correct</message>

Page 19: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

19Microsoft

XML Syntax

• XML Elements Must be Properly Nested• Improper nesting of tags makes no sense to XML.

<b><i>This text is bold and italic</b></i>

In XML all elements must be properly nested within each other like this:

<b><i>This text is bold and italic</i></b>

Page 20: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

20Microsoft

XML Syntax

• XML Documents Must Have a Root Element• All XML documents must contain a single tag pair to define a

root element.• All other elements must be within this root element.• All elements can have sub elements (child elements). Sub

elements must be correctly nested within their parent element:

<root>

<child>

<subchild>.....</subchild>

</child>

</root>

Page 21: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

21Microsoft

XML Syntax

• XML Attribute Values Must be Quoted• With XML, it is illegal to omit quotation marks around attribute

values. • XML elements can have attributes in name/value pairs just like in

HTML. • In XML the attribute value must always be quoted. <?xml version="1.0" encoding="ISO-8859-1"?> <note date=12/11/2002> <to>Students</to> </note>

<?xml version="1.0" encoding="ISO-8859-1"?> <note date="12/11/2002"> <to>Students</to> </note>

Page 22: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

22Microsoft

XML Syntax

• With XML, White Space is Preserved• With XML, the white space in your document is not truncated.

Comments in XML• The syntax for writing comments in XML is similar to that of

HTML.

<!-- This is a comment -->

Page 23: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

23Microsoft

XML Sibling Elements

<book> <title>My First XML</title> <prod id="33-657" edia="paper"></prod><chapter>Introduction to XML

<para>What is HTML</para> <para>What is XML</para>

</chapter> <chapter>XML Syntax

<para>Elements must have a closing tag</para> <para>Elements must be properly nested</para>

</chapter> </book> Sibling Elements: Title, prod, and chapter are siblings (or sister elements) because they have the same parent.

Page 24: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

24Microsoft

Use of Elements vs. Attributes

• Data can be stored in child elements or in attributes.

<person gender="female">

<firstname>ABC</firstname>

<lastname>XYZ</lastname>

</person>

<person>

<gender>female</gender>

<firstname>ABC</firstname>

<lastname>XYZ</lastname>

</person>

Page 25: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

25Microsoft

Use of Elements vs. Attributes

Some of the problems with using attributes are: • attributes cannot contain multiple values (child elements can) • attributes are not easily expandable (for future changes) • attributes cannot describe structures (child elements can) • attributes are more difficult to manipulate by program code • attribute values are not easy to test against a Document Type

Definition (DTD) - which is used to define the legal elements of an XML document

Page 26: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

26Microsoft

Well Formed XML Document

Well Formed XML Documents• A "Well Formed" XML document has correct XML syntax.• A "Well Formed" XML document is a document that conforms

to the following:– XML documents must have a root element – XML elements must have a closing tag – XML tags are case sensitive – XML elements must be properly nested – XML attribute values must always be quoted

Page 27: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

27Microsoft

Why usa a DTD?

• With a DTD, each of your XML files can carry a description of its own format.

• With a DTD, independent groups of people can agree to use a standard DTD for interchanging data.

• Your application can use a standard DTD to verify that the data you receive from the outside world is valid.

• You can also use a DTD to verify your own data.

Page 28: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

28Microsoft

Valid XML

• Valid XML Documents– A "Valid" XML document also conforms to a DTD.– A "Valid" XML document is a "Well Formed" XML document,

which also conforms to the rules of a Document Type Definition (DTD)

Page 29: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

29Microsoft

Valid XML

• XML DTD – This is a statement of rules for an XML document– The purpose of a DTD is to define the legal building blocks of

an XML document. – It helps ensure the accuracy of the information you collect.– It helps ensure that the information gathered is in the most

usable format for your business needs.

Page 30: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

30Microsoft

DTD

• A DTD can be declared inline inside an XML document, or as an external reference.

• Internal DTD Declaration• If the DTD is declared inside the XML file, it should be

wrapped in a DOCTYPE definition with the following syntax:

<!DOCTYPE root-element [element-declarations]>

Page 31: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

31Microsoft

DTD Terms

Page 32: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

32Microsoft

Inline to DTD

<?xml version="1.0"?> <!DOCTYPE note [ <!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)> ]> <note>

<to>Students</to> <from>Faculty</from> <heading>Reminder</heading> <body>Don't forget to learn XML !!!</body>

</note>

Page 33: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

33Microsoft

External DTD Declaration

• If the DTD is declared in an external file, it should be wrapped in a DOCTYPE definition with the following syntax:– <!DOCTYPE root-element SYSTEM "filename">

<?xml version="1.0"?> <!DOCTYPE note SYSTEM "note.dtd"> <note>

File "note.dtd" which contains the DTD:<!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)>

Page 34: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

34Microsoft

DTD Explanation

• !DOCTYPE note defines that the root element of this document is note

• !ELEMENT note defines that the note element contains four elements: "to,from,heading,body“

• !ELEMENT to defines the to element to be of type "#PCDATA"

• !ELEMENT from defines the from element to be of type "#PCDATA"

• !ELEMENT heading defines the heading element to be of type "#PCDATA"

• !ELEMENT body defines the body element to be of type "#PCDATA"

Page 35: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

35Microsoft

DTD Elements

• DTD elements can be defined as follows:• <!ELEMENT Name EMPTY>

– Sometimes, you may want an element type to remain empty with no content to call its own

• <!ELEMENT Name ANY>– If you want your element to serve as a catch-all box that you

can put anything in, you may want to use another type of content specification: ANY.

– If you declare an element to contain ANY content, you allow that element type to hold any element or character data.

Page 36: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

36Microsoft

Element Attribute

• <!ATTLIST element-name attribute-name datatype defaultvalue>– The attribute-list declaration begins with the !ATTLIST string,

followed by white space.– Next is the element name, the associated attribute’s name, its

type, and a default value.– For Example

<!ATTLIST customer custType CDATA #REQUIRED>

<!ATTLIST price priceType (Retail | Wholesale) #REQUIRED>

Page 37: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

37Microsoft

Element Attribute

• #REQUIRED means you must always include the attribute when the element is used.

• No specific default value for the attribute can be included in this case, so you must include a value for it in your– <!ATTLIST element-name attribute-name CDATA

#REQUIRED>

• #IMPLIED means the attribute is optional. The attribute may be used in an element, but no default value is provided if the attribute isn’t used. – <!ATTLIST element-name attribute-name CDATA #IMPLIED>

Page 38: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

38Microsoft

Element Attribute

• #FIXED means the attribute is optional, but if used, the attribute must always take on the default value assigned in the DTD. – <!ATTLIST element-name attribute-name #FIXED “value”>

Page 39: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

39Microsoft

Element and its attribute example

Page 40: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

40Microsoft

DTD PCDATA

• PCDATA means parsed character data.• Think of character data as the text found between the start

tag and the end tag of an XML element.• PCDATA is text that WILL be parsed by a parser.

Page 41: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

41Microsoft

DTD CDATA

• CDATA means character data.

• CDATA is text that will NOT be parsed by a parser.

• Tags inside the text will NOT be treated as markup and entities will not be expanded.

Page 42: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

42Microsoft

XML Schema

• XML Schema is an XML-based alternative to DTDs.• An XML Schema describes the structure of an XML

document.• The XML Schema language is also referred to as XML Schema

Definition (XSD).

Page 43: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

43Microsoft

An XML Schema

The purpose of an XML Schema is to define the legal building

blocks of an XML document, just like a DTD.• defines elements that can appear in a document • defines attributes that can appear in a document • defines which elements are child elements • defines the order of child elements • defines the number of child elements • defines whether an element is empty or can include text • defines data types for elements and attributes • defines default and fixed values for elements and attributes

Page 44: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

44Microsoft

XML Schema

• XML Schemas use XML Syntax• You don't have to learn a new language • You can use your XML editor to edit your Schema files • You can use your XML parser to parse your Schema files • You can manipulate your Schema with the XML DOM

Page 45: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

45Microsoft

XML Schema Example

• The following example is an XML Schema file called "note.xsd" that defines the elements of the XML document ("note.xml")

<xs:element name="note"> <xs:complexType>

<xs:sequence> <xs:element name="to" type="xs:string"/> <xs:element name="from" type="xs:string"/> <xs:element name="heading" type="xs:string"/>

<xs:element name="body" type="xs:string"/> </xs:sequence>

</xs:complexType> </xs:element>

Page 46: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

46Microsoft

XML Schema

• The note element is a complex type because it contains other elements.

• The other elements (to, from, heading, body) are simple types because they do not contain other elements.

Page 47: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

47Microsoft

XML Schema

• Simple element is an XML element that contains only text. It cannot contain any other elements or attributes.

• The text can be of many different types. It can be one of the types included in the XML Schema definition (boolean, string, date, etc.), or it can be a custom type that you can define yourself.

• You can also add restrictions (facets) to a data type in order to limit its content, or you can require the data to match a specific pattern.

Page 48: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

48Microsoft

XML Schema

• The syntax for defining a simple element is:

<xs:element name=“abc" type=“xyz"/>

• Where abc is the name of the element and xyz is the data type of the element.

• XML Schema has a lot of built-in data types. The most common types are:– xs:string – xs:decimal – xs:integer – xs:boolean – xs:date – xs:time

Page 49: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

49Microsoft

XML Schema

• Here are some simple XML elements:

<lastname>Raj</lastname>

<age>36</age>

<dateborn>1987-03-27</dateborn>

• Here are the corresponding simple element definitions:

<xs:element name="lastname" type="xs:string"/>

<xs:element name="age" type="xs:integer"/>

<xs:element name="dateborn" type="xs:date"/>

Page 50: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

50Microsoft

XML Schema

• Simple elements may have a default value OR a fixed value specified.

• Default value is automatically assigned to the element when no other value is specified. In the following example the default value is "red":

<xs:element name="color" type="xs:string“ default="red"/>

• Fixed value is also automatically assigned to the element, and you cannot specify another value. In the following example the fixed value is "red":

<xs:element name="color" type="xs:string" fixed="red"/>

Page 51: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

51Microsoft

XML Schema

• The syntax for defining an attribute is:

<xs:attribute name=“abc" type=“xyz"/>

– Where abc is the name of the attribute and xyz specifies the data type of the attribute.

– Simple elements can’t have attributes!

Page 52: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

52Microsoft

XML Schema

• When an XML element or attribute has a data type defined, it puts restrictions on the element's or attribute's content.

• If an XML element is of type "xs:date" and contains a string like "Hello World", the element will not validate.

• With XML Schemas, you can also add your own restrictions to your XML elements and attributes.

Page 53: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

53Microsoft

XML Schema

• The following example defines an element called "age" with a restriction. The value of age cannot be lower than 0 or greater than 120:

<xs:element name="age"> <xs:simpleType> <xs:restriction base="xs:integer"> <xs:minInclusive value="0"/> <xs:maxInclusive value="120"/> </xs:restriction> </xs:simpleType> </xs:element>

Page 54: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

54Microsoft

XML Schema

• The example below defines an element called "car" with a restriction. The only acceptable values are: Audi, Ford, BMW:

<xs:element name="car" type="carType"/> <xs:simpleType name="carType"> <xs:restriction base="xs:string"> <xs:enumeration value="Audi"/> <xs:enumeration value=“Ford"/> <xs:enumeration value="BMW"/> </xs:restriction> </xs:simpleType>

• Note: In this case the type "carType" can be used by other elements because it is not a part of the "car" element.

Page 55: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

55Microsoft

Page 56: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

56Microsoft

Page 57: XML. 2 Microsoft The Extensible Markup Language (XML) is a general-purpose markup language. markup language It is classified as an extensible language.

57Microsoft

An XML Document