Introduction to DTD

A Document Type Definition (DTD) defines the legal building blocks of an XML document. It defines the document structure with a list of legal elements and attributes. A DTD can be declared inline inside an XML document, or as an external reference.



XML DTD

• A DTD is a set of rules that lets us specify our own set of elements and attributes.
• A DTD is a grammar that indicates which tags are legal in an XML document.
• An XML document is valid if it has an attached DTD and is structured according to the rules defined in that DTD.


DTD Example

<!DOCTYPE BOOKLIST [
<!ELEMENT BOOKLIST (BOOK)*>
<!ELEMENT BOOK (AUTHOR)>
<!ELEMENT AUTHOR (FIRSTNAME,LASTNAME)>
<!ELEMENT FIRSTNAME (#PCDATA)>
<!ELEMENT LASTNAME (#PCDATA)>
<!ATTLIST BOOK GENRE (Science|Fiction) #REQUIRED>
<!ATTLIST BOOK FORMAT (Paperback|Hardcover) "Paperback">
]>

<BOOKLIST>
  <BOOK GENRE = "Science" FORMAT = "Hardcover">
    <AUTHOR>
      <FIRSTNAME> RICHRD </FIRSTNAME>
      <LASTNAME> KARTER </LASTNAME>
    </AUTHOR>
  </BOOK>
</BOOKLIST>
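The effect of the ATTLIST default can be observed with a short Python sketch. It assumes the standard-library `xml.etree.ElementTree` parser, which reads the internal DTD subset but does not validate against it. The second BOOK entry and the helper name `book_formats` are hypothetical additions for illustration; that BOOK omits FORMAT so the declared default comes into play.

```python
import xml.etree.ElementTree as ET

# Corrected DTD from the slide, plus a hypothetical second BOOK without FORMAT.
DOC = """<!DOCTYPE BOOKLIST [
<!ELEMENT BOOKLIST (BOOK)*>
<!ELEMENT BOOK (AUTHOR)>
<!ELEMENT AUTHOR (FIRSTNAME,LASTNAME)>
<!ELEMENT FIRSTNAME (#PCDATA)>
<!ELEMENT LASTNAME (#PCDATA)>
<!ATTLIST BOOK GENRE (Science|Fiction) #REQUIRED>
<!ATTLIST BOOK FORMAT (Paperback|Hardcover) "Paperback">
]>
<BOOKLIST>
  <BOOK GENRE="Science" FORMAT="Hardcover">
    <AUTHOR><FIRSTNAME>RICHRD</FIRSTNAME><LASTNAME>KARTER</LASTNAME></AUTHOR>
  </BOOK>
  <BOOK GENRE="Fiction">
    <AUTHOR><FIRSTNAME>Ann</FIRSTNAME><LASTNAME>Example</LASTNAME></AUTHOR>
  </BOOK>
</BOOKLIST>"""


def book_formats(xml_text):
    """Return the FORMAT attribute of every BOOK, in document order."""
    root = ET.fromstring(xml_text)
    return [book.get("FORMAT") for book in root.findall("BOOK")]


if __name__ == "__main__":
    # The second value comes from the ATTLIST default, not from the document.
    print(book_formats(DOC))
```

Because this parser is non-validating, it would not reject a BOOK whose GENRE is missing or outside the enumerated list; checking that requires a validating parser.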


Internal DTD Declaration

If the DTD is declared inside the XML file, it should be wrapped in a DOCTYPE definition with the following syntax:

<!DOCTYPE root-element [element-declarations]>

Example XML document with an internal DTD:

<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend</body>
</note>


The DTD above is interpreted like this:

• !DOCTYPE note defines that the root element of this document is note
• !ELEMENT note defines that the note element contains four elements: "to,from,heading,body"
• !ELEMENT to defines the to element to be of type "#PCDATA"
• !ELEMENT from defines the from element to be of type "#PCDATA"
• !ELEMENT heading defines the heading element to be of type "#PCDATA"
• !ELEMENT body defines the body element to be of type "#PCDATA"

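The same reading of the note DTD can be reproduced programmatically. The sketch below assumes Python's standard-library `xml.parsers.expat` module, whose `StartDoctypeDeclHandler` and `ElementDeclHandler` callbacks report the DOCTYPE name and each ELEMENT declaration; the function name `list_declarations` is made up for this example.

```python
from xml.parsers import expat

NOTE_XML = """<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend</body>
</note>"""


def list_declarations(xml_text):
    """Return (root element name, declared element names) as reported by expat."""
    info = {"root": None, "elements": []}
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called once for <!DOCTYPE name ...>.
        info["root"] = name

    def on_element_decl(name, model):
        # Called once per <!ELEMENT name ...>; the content model is ignored here.
        info["elements"].append(name)

    parser.StartDoctypeDeclHandler = on_doctype
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(xml_text, True)
    return info["root"], info["elements"]


if __name__ == "__main__":
    print(list_declarations(NOTE_XML))
```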

External DTD Declaration

If the DTD is declared in an external file, the XML document refers to it through a DOCTYPE definition with the following syntax:

<!DOCTYPE root-element SYSTEM "filename">

This is the same XML document as above, but with an external DTD:

<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

And this is the file "note.dtd" which contains the DTD:

<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
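As a rough illustration of how a parser sees the external reference, the following Python sketch uses the standard-library `xml.parsers.expat` module to report the SYSTEM identifier from the DOCTYPE line. With default settings this parser records the reference but does not open "note.dtd", so the example runs without that file. The function name `doctype_reference` is an assumption made for this example.

```python
from xml.parsers import expat

EXTERNAL_NOTE = """<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""


def doctype_reference(xml_text):
    """Return (root name, system id, public id) taken from the DOCTYPE line."""
    seen = {}
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        seen["doctype"] = (name, system_id, public_id)

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return seen.get("doctype")


if __name__ == "__main__":
    print(doctype_reference(EXTERNAL_NOTE))
```

Actually checking the document against the rules in note.dtd needs a validating parser, which the Python standard library does not provide.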


Why Use a DTD?

• With a DTD, each of your XML files can carry a description of its own format.
• With a DTD, independent groups of people can agree to use a standard DTD for interchanging data.
• Your application can use a standard DTD to verify that the data you receive from the outside world is valid.
• You can also use a DTD to verify your own data.


DTD - XML Building Blocks

• The main building blocks of both XML and HTML documents are elements.

The Building Blocks of XML Documents

• Seen from a DTD point of view, all XML documents (and HTML documents) are made up of the following building blocks:
  • Elements
  • Attributes
  • Entities
  • PCDATA
  • CDATA


Elements

• Elements are the main building blocks of both XML and HTML documents.
• Elements can contain text, other elements, or be empty.

Attributes

• Attributes provide extra information about elements.
• Attributes are always placed inside the opening tag of an element. Attributes always come in name/value pairs.


Entities

• Some characters have a special meaning in XML, like the less than sign (<) that defines the start of an XML tag.
• Most of you know the HTML entity "&nbsp;". This "non-breaking space" entity is used in HTML to insert an extra space in a document. Entities are expanded when a document is parsed by an XML parser.
• The following entities are predefined in XML:

Entity reference    Character
&lt;                <
&gt;                >
&amp;               &
&quot;              "
&apos;              '
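Entity expansion can be seen in a small Python sketch. It assumes the standard-library `xml.etree.ElementTree` parser and the `escape` helper from `xml.sax.saxutils`; the `msg` element and the `roundtrip` function are made-up names for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def roundtrip(text):
    """Escape text for use as element content, then parse it back."""
    xml_text = "<msg>" + escape(text) + "</msg>"  # escape() handles &, < and >
    return xml_text, ET.fromstring(xml_text).text


def expand_predefined():
    """Parse all five predefined entity references and return the expanded text."""
    return ET.fromstring("<msg>&lt;&gt;&amp;&quot;&apos;</msg>").text


if __name__ == "__main__":
    print(roundtrip("a < b & c > d"))
    print(expand_predefined())
```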


PCDATA

• PCDATA means parsed character data.
• Think of character data as the text found between the start tag and the end tag of an XML element.
• PCDATA is text that WILL be parsed by a parser. The text will be examined by the parser for entities and markup.
• Tags inside the text will be treated as markup and entities will be expanded.
• However, parsed character data should not contain any &, <, or > characters; these need to be represented by the &amp; &lt; and &gt; entities, respectively.

CDATA

• CDATA means character data.
• CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as markup and entities will not be expanded.
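The difference between parsed and unparsed text can be sketched in Python with the standard-library `xml.etree.ElementTree` parser. As an assumption for illustration, the unparsed case is written as a CDATA section (`<![CDATA[ ... ]]>`); the `p` and `b` element names and the `describe` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Same characters twice: once as ordinary parsed content, once inside a CDATA section.
PARSED = "<p>Use <b>bold</b> &amp; more</p>"
LITERAL = "<p><![CDATA[Use <b>bold</b> &amp; more]]></p>"


def describe(xml_text):
    """Return the leading text and the child element names of the root element."""
    root = ET.fromstring(xml_text)
    return {"text": root.text, "children": [child.tag for child in root]}


if __name__ == "__main__":
    print(describe(PARSED))   # <b> becomes a child element
    print(describe(LITERAL))  # everything stays plain text
```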


DTD - Elements

• In a DTD, elements are declared with an ELEMENT declaration.

Declaring Elements

• In a DTD, XML elements are declared with an element declaration with the following syntax:

<!ELEMENT element-name category>
or
<!ELEMENT element-name (element-content)>

Empty Elements

Empty elements are declared with the category keyword EMPTY:

<!ELEMENT element-name EMPTY>

Example:

<!ELEMENT br EMPTY>


• <!ELEMENT element-name (#PCDATA)> - Parsed character data
• <!ELEMENT element-name ANY> - Elements with any contents

Elements with Children (sequences)

<!ELEMENT element-name (child1)>
or
<!ELEMENT element-name (child1,child2,...)>

Declaring Only One Occurrence of an Element

<!ELEMENT element-name (child-name)>

Declaring Zero or More Occurrences of an Element

<!ELEMENT element-name (child-name*)>
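One way to read these content models is as regular expressions over the sequence of child element names. The Python sketch below is a hand-rolled illustration of that idea, not how any real validator is implemented. It handles only names, commas, parentheses, and the `*`, `+`, `?`, `|` operators, and it does not handle #PCDATA, EMPTY, or ANY. The function names are assumptions.

```python
import re

# Approximation of an XML name, good enough for this sketch.
NAME = re.compile(r"[A-Za-z_:][\w.:-]*")


def model_to_regex(content_model):
    """Translate a DTD content model such as '(a,b*)' into a compiled regex."""
    compact = re.sub(r"\s+", "", content_model)
    # Each child name becomes a group that matches 'name' plus a trailing space.
    with_names = NAME.sub(lambda m: "(?:" + re.escape(m.group()) + " )", compact)
    # A comma means 'followed by', which is plain concatenation in a regex.
    return re.compile(with_names.replace(",", ""))


def children_match(content_model, child_tags):
    """Check whether a list of child tag names satisfies the content model."""
    joined = "".join(tag + " " for tag in child_tags)
    return model_to_regex(content_model).fullmatch(joined) is not None


if __name__ == "__main__":
    print(children_match("(child1,child2)", ["child1", "child2"]))
    print(children_match("(child-name)", []))
    print(children_match("(child-name*)", []))
```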


XML: Motivation

• Data interchange is critical in today’s networked world
  – Examples:
    • Banking: funds transfer
    • Order processing (especially inter-company orders)
    • Scientific data
      – Chemistry: ChemML, …
      – Genetics: BSML (Bio-Sequence Markup Language), …
  – Paper flow of information between organizations is being replaced by electronic flow of information
• Each application area has its own set of standards for representing information
• XML has become the basis for all new generation data interchange formats


XML Motivation (Cont.)

• Earlier generation formats were based on plain text with line headers indicating the meaning of fields

– Similar in concept to email headers

– Does not allow for nested structures, no standard “type” language

– Tied too closely to low level document structure (lines, spaces, etc)

• Each XML based standard defines what are valid elements, using

– XML type specification languages to specify the syntax

• DTD (Document Type Definition)

• XML Schema

– Plus textual descriptions of the semantics

• XML allows new tags to be defined as required

– However, this may be constrained by DTDs

• A wide variety of tools is available for parsing, browsing and querying XML documents/data


Structure of XML Data

• Tag: label for a section of data
• Element: section of data beginning with <tagname> and ending with matching </tagname>
• Elements must be properly nested
  – Proper nesting
    • <account> … <balance> …. </balance> </account>
  – Improper nesting
    • <account> … <balance> …. </account> </balance>
  – Formally: every start tag must have a unique matching end tag, that is in the context of the same parent element.
• Every document must have a single top-level element
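These well-formedness rules can be checked with a brief Python sketch, assuming the standard-library `xml.etree.ElementTree` parser, which raises `ParseError` on badly nested tags and on more than one top-level element. The helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as a single, properly nested XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # Proper nesting
    print(is_well_formed("<account><balance>400</balance></account>"))
    # Improper nesting: </account> closes before </balance>
    print(is_well_formed("<account><balance>400</account></balance>"))
    # Two top-level elements
    print(is_well_formed("<account/><account/>"))
```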


Example of Nested Elements

<bank-1>
  <customer>
    <customer_name> Hayes </customer_name>
    <customer_street> Main </customer_street>
    <customer_city> Harrison </customer_city>
    <account>
      <account_number> A-102 </account_number>
      <branch_name> Perryridge </branch_name>
      <balance> 400 </balance>
    </account>
    <account> … </account>
  </customer>
  .
  .
</bank-1>
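A nested structure like this maps naturally onto a tree that a program can walk. The Python sketch below assumes the standard-library `xml.etree.ElementTree` module and uses a cut-down copy of the bank data without the elided parts; the function name `accounts_by_customer` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the slide's data; the elided accounts are left out.
BANK = """<bank-1>
  <customer>
    <customer_name> Hayes </customer_name>
    <customer_street> Main </customer_street>
    <customer_city> Harrison </customer_city>
    <account>
      <account_number> A-102 </account_number>
      <branch_name> Perryridge </branch_name>
      <balance> 400 </balance>
    </account>
  </customer>
</bank-1>"""


def accounts_by_customer(xml_text):
    """Map each customer name to a list of (account number, balance) pairs."""
    root = ET.fromstring(xml_text)
    result = {}
    for customer in root.findall("customer"):
        name = customer.findtext("customer_name").strip()
        result[name] = [
            (acct.findtext("account_number").strip(), int(acct.findtext("balance")))
            for acct in customer.findall("account")
        ]
    return result


if __name__ == "__main__":
    print(accounts_by_customer(BANK))
```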


Structure of XML Data (Cont.)

• Mixture of text with sub-elements is legal in XML.
  – Example:

    <account>
      This account is seldom used any more.
      <account_number> A-102</account_number>
      <branch_name> Perryridge</branch_name>
      <balance>400 </balance>
    </account>

  – Useful for document markup, but discouraged for data representation
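Mixed content is one reason data-oriented code gets awkward: the free text has to be picked apart from the sub-elements. A small Python sketch, assuming the standard-library `xml.etree.ElementTree` module (which exposes the text before the first child as `.text`), with a hypothetical helper `split_mixed`:

```python
import xml.etree.ElementTree as ET

ACCOUNT = """<account>
  This account is seldom used any more.
  <account_number> A-102</account_number>
  <branch_name> Perryridge</branch_name>
  <balance>400 </balance>
</account>"""


def split_mixed(xml_text):
    """Separate the free text of the root element from its child-element fields."""
    root = ET.fromstring(xml_text)
    free_text = (root.text or "").strip()
    fields = {child.tag: (child.text or "").strip() for child in root}
    return free_text, fields


if __name__ == "__main__":
    print(split_mixed(ACCOUNT))
```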


Attributes

• Elements can have attributes

  <account acct-type = "checking" >
    <account_number> A-102 </account_number>
    <branch_name> Perryridge </branch_name>
    <balance> 400 </balance>
  </account>

• Attributes are specified by name=value pairs inside the starting tag of an element
• An element may have several attributes, but each attribute name can only occur once

  <account acct-type = "checking" monthly-fee="5">
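The attribute rules can be illustrated with a short Python sketch, assuming the standard-library `xml.etree.ElementTree` parser, which exposes attributes as a dictionary and rejects a start tag that repeats an attribute name. The helper name `read_attributes` is hypothetical, and the tags are written as empty elements so each string is a complete document.

```python
import xml.etree.ElementTree as ET


def read_attributes(element_xml):
    """Return the attributes of the root element as a dict, or None if not well-formed."""
    try:
        return dict(ET.fromstring(element_xml).attrib)
    except ET.ParseError:
        return None


if __name__ == "__main__":
    # Several attributes with distinct names are fine.
    print(read_attributes('<account acct-type = "checking" monthly-fee="5"/>'))
    # Repeating acct-type makes the document ill-formed.
    print(read_attributes('<account acct-type="checking" acct-type="savings"/>'))
```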