XML Basics

Wednesday May 12, 1999 SD99

Copyright 1999 Elliotte Rusty Harold

[email protected]

http://metalab.unc.edu/xml/slides/

What is XML?

• Extensible Markup Language

• A syntax for documents

• A Meta-Markup Language

• A Structural and Semantic language, not a formatting language

• Not just for Web pages

XML is a Meta Markup Language

• Not like HTML, troff, LaTeX

• Make up the tags you need as you need them

• The tags you create can be documented in a Document Type Definition (DTD)

• A meta syntax for domain-specific markup languages like MusicML, MathML, and CML

XML describes structure and semantics, not formatting

• XML documents form a tree

• Element and attribute names reflect the kind of the element

• Formatting can be added with a style sheet

A Song Description in HTML

<dt>Hot Cop
<dd> by Jacques Morali, Henri Belolo, and Victor Willis
<ul>
  <li>Producer: Jacques Morali
  <li>Publisher: PolyGram Records
  <li>Length: 6:20
  <li>Written: 1978
  <li>Artist: Village People
</ul>

A Song Description in XML

<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
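Because the element names say what each piece of data is, a program can ask for exactly the fields it wants. The following is a minimal Python sketch, not part of the original slides: it assumes the song document is held in a string, and it uses the standard-library xml.etree.ElementTree module as a stand-in for whatever parser you prefer. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The song description from the slide, held in a string for this sketch.
SONG_XML = """<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>"""

song = ET.fromstring(SONG_XML)

# Each field is addressed by name, not by its position or formatting.
title = song.findtext("TITLE")
composers = [c.text for c in song.findall("COMPOSER")]
year = int(song.findtext("YEAR"))

print(title, composers, year)
```

Doing the same with the HTML version would mean guessing that the text after "Written:" in some list item is the year.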

Style Sheets provide formatting

SONG {display: block}
TITLE {display: block; font-family: Helvetica, serif;
       font-size: 20pt; font-weight: bold}
COMPOSER {display: block;
          font-family: Times, Times New Roman, serif;
          font-size: 14pt; font-style: italic}
ARTIST {display: block;
        font-family: Times, Times New Roman, serif;
        font-size: 14pt; font-weight: bold;
        font-style: italic}
PUBLISHER {display: block; font-size: 14pt;
           font-family: Times, Times New Roman, serif}
LENGTH {display: block;
        font-family: Times, Times New Roman, serif;
        font-size: 14pt}
YEAR {display: block;
      font-family: Times, Times New Roman, serif;
      font-size: 14pt}

Attaching style sheets to documents

• Processing Instruction

<?xml-stylesheet type="text/css" href="song.css"?>

• Converter Program

What is XML used for?

• Domain-Specific Markup Languages

• Self-Describing Data

• Interchange of Data Among Applications

• Structured and Integrated Data

Domain-Specific Markup Languages

• Non-proprietary format

• Don't pay for what you don't use

Self-Describing Data

• Much data is lost due to format problems

• XML is very simple

• XML is self-describing

• XML is well documented

<PERSON ID="p1100" SEX="M">
  <NAME>
    <GIVEN>Judson</GIVEN>
    <SURNAME>McDaniel</SURNAME>
  </NAME>
  <BIRTH>
    <DATE>21 Feb 1834</DATE>
  </BIRTH>
  <DEATH>
    <DATE>9 Dec 1905</DATE>
  </DEATH>
</PERSON>

Interchange of Data Among Applications

• E-commerce

• Syndication

Structured and Integrated Data

• Can specify relationships between elements

• Can assemble data from multiple sources

XML Applications

• A specific markup language that uses the XML meta-syntax is called an XML application

• Different XML applications have their own more constricted syntaxes and vocabularies within the broader XML syntax

• Further syntax can be layered on top of this; e.g. data typing through DCDs or other schemas

Example XML Applications

• Web Pages

• Mathematical Equations

• Music Notation

•  Vector Graphics

• Metadata

• and more… 

Mathematical Markup Language

Channel Definition Format

<?xml version="1.0"?>
<CHANNEL HREF="http://metalab.unc.edu/xml/index.html">
  <TITLE>Cafe con Leche</TITLE>
  <ITEM HREF="http://metalab.unc.edu/xml/books.html">
    <TITLE>Books about XML</TITLE>
  </ITEM>
  <ITEM HREF="http://metalab.unc.edu/xml/tradeshows.html">
    <TITLE>Trade shows and conferences about XML</TITLE>
  </ITEM>
  <ITEM HREF="http://metalab.unc.edu/xml/lists.htm">
    <TITLE>Mailing Lists dedicated to XML</TITLE>
  </ITEM>
</CHANNEL>

Classic Literature

• The Complete Plays of Shakespeare

• The Bible

• The Koran

• The Book of Mormon

 Vector Graphics

•  Vector Markup Language (VML)

 –  Internet Explorer 5.0

 –  Microsoft Office 2000

• Scalable Vector Graphics (SVG)

The Resource Description Framework (RDF)

• Meta-data

• Dublin Core

• Better Web searching

An Example of RDF

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/DC/">
  <rdf:Description about="http://metalab.unc.edu/xml/">
    <dc:CREATOR>Elliotte Rusty Harold</dc:CREATOR>
    <dc:TITLE>Cafe con Leche</dc:TITLE>
  </rdf:Description>
</rdf:RDF>
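The rdf: and dc: prefixes are shorthand for the namespace URIs declared on the root element, and a program looks elements up by URI rather than by prefix. Here is a rough Python sketch of reading the metadata back out with the standard-library xml.etree.ElementTree module. It assumes the well-formed version of the example (closing quotes on both URLs), and the NS mapping and variable names are illustrative choices, not part of RDF.

```python
import xml.etree.ElementTree as ET

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/DC/">
  <rdf:Description about="http://metalab.unc.edu/xml/">
    <dc:CREATOR>Elliotte Rusty Harold</dc:CREATOR>
    <dc:TITLE>Cafe con Leche</dc:TITLE>
  </rdf:Description>
</rdf:RDF>"""

# Prefix-to-URI mapping used only for lookups in this script.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/DC/",
}

root = ET.fromstring(RDF_XML)
description = root.find("rdf:Description", NS)

resource = description.get("about")  # the page being described
creator = description.findtext("dc:CREATOR", namespaces=NS)
title = description.findtext("dc:TITLE", namespaces=NS)

print(resource, creator, title)
```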

XML for XML

• XSL: The Extensible Stylesheet Language

• DCD: The Document Content Description Schema Language

• XLL: The Extensible Linking Language

XSL: The Extensible Stylesheet Language

•  XSL Transformations

•  XSL Formatting Objects

DCD: The Document Content Description Schema Language

• Data Typing in XML is Weak

• <MONTH>9</MONTH>

<DCD>
  <ElementDef Type="MONTH" Model="Data" Datatype="i1" Min="1" Max="12" />
</DCD>
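Without a schema layer, the application itself has to check that the text of MONTH is a number in range. The sketch below shows, in Python, the kind of hand-written check that the DCD declaration above would make unnecessary. It does not use a DCD processor (none is assumed to exist in the standard library); the function name and its minimum and maximum parameters are hypothetical and simply mirror the Min and Max attributes.

```python
import xml.etree.ElementTree as ET


def month_is_valid(xml_text, minimum=1, maximum=12):
    """Return True if xml_text is a MONTH element holding an integer in range."""
    element = ET.fromstring(xml_text)
    if element.tag != "MONTH":
        return False
    try:
        # XML hands the content over as text; the typing is up to us.
        value = int((element.text or "").strip())
    except ValueError:
        return False
    return minimum <= value <= maximum


print(month_is_valid("<MONTH>9</MONTH>"))
print(month_is_valid("<MONTH>13</MONTH>"))
print(month_is_valid("<MONTH>September</MONTH>"))
```

All three sample documents are well-formed XML; only the first satisfies the data-type constraint.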

XLL: The Extensible Linking Language

• Any element can be a link

• Links can be bi-directional

• Links can be separated from the documents they connect

<footnote xlink:form="simple" href="footnote7.xml">7</footnote>

File Formats, In-house applications, and other behind the scenes uses

• Microsoft Office 2000

• Federal Express Web API

• Netscape What’s Related 

Hello XML

<?xml version="1.0" standalone="yes"?>
<FOO>
Hello XML!
</FOO>

• Plain ASCII or UTF-8 text

• .xml is standard file extension

• Any standard text editor will work 

The XML Declaration 

• version attribute

 – required

 – always has the value 1.0

• standalone attribute

 – yes

 – no

• encoding attribute

 – UTF-8

 – 8859_1

 – etc.

 <?xml version="1.0" standalone="yes"?>  
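A parser reports the three declaration attributes to the program that is reading the document. The following is a small Python sketch using the standard-library Expat binding (xml.parsers.expat); the function name read_xml_declaration is made up for this example. Expat reports standalone as 1 for "yes", 0 for "no", and -1 when the attribute is omitted.

```python
import xml.parsers.expat


def read_xml_declaration(document):
    """Return (version, encoding, standalone) from a document's XML declaration."""
    found = {}

    def on_declaration(version, encoding, standalone):
        found["declaration"] = (version, encoding, standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(document, True)  # True marks this as the final chunk
    return found.get("declaration")


hello = read_xml_declaration(
    b'<?xml version="1.0" standalone="yes"?><FOO>Hello XML!</FOO>'
)
encoded = read_xml_declaration(
    b'<?xml version="1.0" encoding="UTF-8" standalone="no"?><FOO>Hello XML!</FOO>'
)
print(hello, encoded)
```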

The FOO element

• Start tag <FOO>

• Contents "Hello XML!"

• End tag </FOO>

<FOO>
Hello XML!
</FOO>

greeting.xml

<?xml version="1.0" standalone="yes"?>
<GREETING>
Hello XML!
</GREETING>

Style sheets

• Separate from the XML document

• Different Languages

 – Cascading Style Sheets Level 1 (CSS1)

    Internet Explorer 5.0

    Mozilla 5.0

 – Cascading Style Sheets Level 2 (CSS2)

    Internet Explorer 5 (partial)

    Mozilla 5.0 (partial)

 – Extensible Style Language (XSL)

    Internet Explorer 5.0 (older draft, buggy)

    LotusXSL, XT, other non-browser converters

 – Document Style Semantics and Specification Language (DSSSL)

xml-stylesheet

Style sheets are attached via an xml-stylesheet processing instruction in the prolog

<?xml version="1.0" standalone="yes"?>
<?xml-stylesheet type="text/css" href="greeting.css"?>
<GREETING>Hello XML!</GREETING>

 – type attribute has the value text/css or text/xsl

 – href attribute is a URL to the style sheet, possibly relative

• Can also use non-browser converters like

greeting.css

GREETING {display: block;
          font-size: 24pt;
          font-weight: bold}

A larger example: Baseball statistics

• Examine the data

• Design a vocabulary for the data

• Write a style sheet

Sample statistics

http://cbs.sportsline.com/u/baseball/mlb/stats.htm

Organizing the Data

• XML documents are trees.

• XML elements contain other elements as well as text

• Within these limits there's more than one way to organize the data

 – Hierarchically

 – Relationally

 – Objects

What is the Root Element

• The League?

• The Season?

• A custom Document element?

The Root Element

<?xml version="1.0"?>
<SEASON>
</SEASON>

• Choose SEASON for the root element

• Everything else will be a descendant of SEASON

• This is not the only possible choice

What are the Immediate Children of the Root?

• Leagues?

• Teams?

• Players?

• Games?

Child Elements

<?xml version="1.0"?>
<SEASON>
  <YEAR>
    1998
  </YEAR>
</SEASON>

White space in XML is not especially significant

<?xml version="1.0"?>
<SEASON><YEAR>1998</YEAR></SEASON>
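"Not especially significant" deserves a caveat: the parser still passes the white space inside an element to the application, and it is the application that usually chooses to ignore it. Here is a short Python sketch, assuming the two layouts of the SEASON document are held in strings (the constant names are illustrative):

```python
import xml.etree.ElementTree as ET

SPREAD_OUT = """<?xml version="1.0"?>
<SEASON>
  <YEAR>
    1998
  </YEAR>
</SEASON>"""

COMPACT = '<?xml version="1.0"?><SEASON><YEAR>1998</YEAR></SEASON>'

raw_spread = ET.fromstring(SPREAD_OUT).findtext("YEAR")
raw_compact = ET.fromstring(COMPACT).findtext("YEAR")

# The raw text differs because the line breaks and indentation are reported.
same_raw = raw_spread == raw_compact

# After trimming, both documents carry the same data.
same_stripped = raw_spread.strip() == raw_compact.strip()

print(repr(raw_spread), repr(raw_compact), same_raw, same_stripped)
```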

Leagues

• Major league baseball is divided into two leagues

• Each league has

 – a name

 – three divisions

Divisions

• Each division has

 – name

 – 4-6 teams

Teams

• Each team has

 – Name

 – City

 – Players

Player Data

• Each player has

 – First name

 – Last name

 – Position

 – Statistics

Player Batting Statistics

• G Games Played

• GS Games Started

• AB At Bats

• R Runs

• H Hits

• 2B Doubles

• 3B Triples

• HR Home Runs

• RBI Runs Batted In

• SB Stolen Bases

• CS Caught Stealing

• SH Sacrifice Hits

• SF Sacrifice Flies

• Err Errors

• PB Pitcher Balked

• BB Base on Balls (Walks)

• SO Strike Outs

• HBP Hit By Pitch

What does a player look like

• Long names vs. short names 

The Complete 1998 Major League

• Long version 

A Style Sheet

• 1998shortstats.xml

• baseballstats.css

• <?xml-stylesheet type="text/css" href="baseballstats.css"?>

• styled1998shortstats.xml

Cascading Style Sheets

• Partially supported by Mozilla and IE 5.0

• Full W3C Recommendation

The Default Rule

• Not every element needs a rule

• The root element should be at least display: block

SEASON {font-size: 14pt; background-color: white;
        color: black; display: block}

A style rule for the YEAR element

• Make it look like a title

YEAR {display: block;
      font-size: 32pt;
      font-weight: bold;
      text-align: center}

Style Rules for Division and League Names

LEAGUE_NAME {display: block;
             text-align: center;
             font-size: 28pt;
             font-weight: bold}
DIVISION_NAME {display: block;
               text-align: center;
               font-size: 24pt;
               font-weight: bold}

Alternate Style Rules for Division and League Names

LEAGUE_NAME, DIVISION_NAME {display: block;
                            text-align: center;
                            font-weight: bold}
LEAGUE_NAME {font-size: 28pt}
DIVISION_NAME {font-size: 24pt}

Style Rules for Teams

• Team name and Team city must be one title

• Must be inline elements

• Previous and following must be block elements

TEAM_CITY {font-size: 20pt; font-weight: bold; font-style: italic}
TEAM_NAME {font-size: 20pt; font-weight: bold; font-style: italic}
TEAM, PLAYER {display: block}

Style Rules for Players

TEAM {display: table}
TEAM_CITY {display: table-caption}
TEAM_NAME {display: table-caption}
PLAYER {display: table-row}
SURNAME, GIVEN_NAME, POSITION, GAMES, GAMES_STARTED,
AT_BATS, RUNS, HITS, DOUBLES, TRIPLES, HOME_RUNS, RBI,
STEALS, CAUGHT_STEALING, SACRIFICE_HITS, SACRIFICE_FLIES,
ERRORS, WALKS, STRUCK_OUT, HIT_BY_PITCH {display: table-cell}

Finished Style Sheet

SEASON {font-size: 14pt; background-color: white;
        color: black; display: block}
YEAR {display: block; font-size: 32pt;
      font-weight: bold; text-align: center}
LEAGUE_NAME {display: block; text-align: center;
             font-size: 28pt; font-weight: bold}
DIVISION_NAME {display: block; text-align: center;
               font-size: 24pt; font-weight: bold}
TEAM_CITY {font-size: 20pt; font-weight: bold;
           font-style: italic}
TEAM_NAME {font-size: 20pt; font-weight: bold;
           font-style: italic}
TEAM {display: block}
PLAYER {display: block}

Possible Extensions

• There should be captions like "RBI" or "At Bats."

• Derived numbers like batting averages are not included.

• The titles are short. E.g. "1998" instead of "1998 Major League Baseball".

• The document is so long it's hard to read. Something similar to IE5's collapsible outline view would be nice.

• Pitcher stats should be separated from batter stats.

Possible Solutions

• CSS Level 2

•  XSL

•  XSL + JavaScript

Well-formedness Rules

• Open and close all tags

• Empty tags end with />

• There is a unique root element

• Elements may not overlap

• Attribute values are quoted

• < and & are only used to start tags and entities

• Only the five predefined entity references are used
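Any XML parser enforces these rules, so the quickest test of well-formedness is to try to parse the document. Here is a minimal Python sketch that uses the standard-library xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are illustrative, one per rule above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if a parser accepts text as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    "closed tags": "<P>Hello</P>",
    "unclosed tag": "<P>Hello",
    "empty tag": "<DOC><BR/></DOC>",
    "two roots": "<A/><B/>",
    "overlapping elements": "<B><I>text</B></I>",
    "unquoted attribute": "<A HREF=http://metalab.unc.edu/xml/>link</A>",
    "raw ampersand": "<H1>O'Reilly & Associates</H1>",
    "undeclared entity": "<P>&copy; 1999</P>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(name, "->", "well-formed" if ok else "NOT well-formed")
```

Only "closed tags" and "empty tag" pass; every other sample breaks exactly one rule from the list.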

Open and close all tags

Empty tags end with />

• <BR/>, <HR/>, and <IMG/> instead of <BR>, <HR>, and <IMG>

• Web browsers deal inconsistently with these

• Can use <BR></BR> <HR></HR> <IMG></IMG> instead

There is a unique root element

• One element completely contains all other elements of the document

• This is HTML in HTML files

• XML Declaration is not an element

<?xml version="1.0" standalone="yes"?>
<GREETING>
Hello XML!
</GREETING>

Elements may not overlap

• If an element contains a start tag for an element, it must also contain the corresponding end tag

• Empty elements may appear anywhere

• Every non-root element has a parent element

Attribute values are quoted

• Good:

 – <A HREF="http://metalab.unc.edu/xml/">

• Bad:

 – <A HREF=http://metalab.unc.edu/xml/>

< and & are only used to start tags and entities

• Good:

 – <H1>O'Reilly &amp; Associates</H1>

• Bad:

 – <H1>O'Reilly & Associates</H1>

• Good:

 – <CODE>for (int i = 0; i &lt;= args.length; i++ ) { </CODE>

• Bad:

 – <CODE>for (int i = 0; i <= args.length; i++ ) { </CODE>
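Programs that generate XML should escape text on the way out instead of relying on authors to remember the rule. Here is a small Python sketch using escape from the standard-library xml.sax.saxutils module, which replaces &, < and > with their entity references; the variable names are illustrative.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw_heading = "O'Reilly & Associates"
raw_code = "for (int i = 0; i <= args.length; i++ ) {"

# Escape the text before wrapping it in tags.
heading_xml = "<H1>" + escape(raw_heading) + "</H1>"
code_xml = "<CODE>" + escape(raw_code) + "</CODE>"

# A parser turns the entity references back into the original characters.
round_trip = ET.fromstring(code_xml).text

print(heading_xml)
print(code_xml)
print(round_trip == raw_code)
```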

Only the five predefined entity references are used

• Good:

 – &amp;

 – &lt;

 – &gt;

 – &quot;

 – &apos;

• Bad:

 – &copy;

 – &reg;

 – &tm;

 – &alpha;

 – &eacute;

 – &nbsp;

 – etc.
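The "bad" references are familiar from HTML, but a document without a DTD has nothing that defines them. The Python sketch below shows the difference. It also uses a numeric character reference (&#169;), which is not mentioned on the slide but needs no declaration, as one way to get the same character. The helper name parses is illustrative.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if text is accepted by the parser."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# The five predefined references always work.
predefined_ok = parses("<P>&amp; &lt; &gt; &quot; &apos;</P>")
predefined_text = ET.fromstring("<P>&amp; &lt; &gt; &quot; &apos;</P>").text

# An HTML entity with no declaration is rejected.
html_entity_ok = parses("<P>&copy; 1999</P>")

# A numeric character reference needs no declaration.
char_ref_text = ET.fromstring("<P>&#169; 1999</P>").text

print(predefined_ok, html_entity_ok, repr(predefined_text), repr(char_ref_text))
```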

DTDs and Validity

• A Document Type Definition describes the elements and attributes that may appear in a document

• Validation compares a particular document against a DTD

• Well-formedness is a prerequisite for validity

What is a DTD?

• a list of the elements, tags, attributes, and entities contained in a document, and their relationship to each other

• internal vs. external DTDs

The importance of validation

• Ensures that data is correct before feeding it into a program

• Ensures that a format is followed

• Establishes what must be supported

• Not all documents need to be valid; sometimes well-formed is enough

A DTD for greeting.xml

• greeting.xml:

<?xml version="1.0"?>
<GREETING>
Hello XML!
</GREETING>

• greeting.dtd:

<!ELEMENT GREETING (#PCDATA)>

Document Type Declarations

<?xml version="1.0"?>
<!DOCTYPE GREETING SYSTEM "greeting.dtd">
<GREETING>
Hello XML!
</GREETING>

• specifies the root element

• gives a URL for the DTD

Invalid Documents

• Valid:

<GREETING>
various random text but no markup
</GREETING>

• Invalid: anything else including

<GREETING>
  <sometag>various random text</sometag>
  <someEmptyTag/>
</GREETING>

 – or

<GREETING>
  <GREETING>various random text</GREETING>
</GREETING>
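To make the distinction concrete, here is a Python sketch that hard-codes the single rule from greeting.dtd. It is not DTD validation: the Python standard library is not assumed to include a validating parser, so the function is_valid_greeting is a hypothetical hand-written stand-in that checks only what <!ELEMENT GREETING (#PCDATA)> demands, namely a GREETING root with no child elements.

```python
import xml.etree.ElementTree as ET


def is_valid_greeting(text):
    """Check a document by hand against <!ELEMENT GREETING (#PCDATA)>."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False  # not even well-formed, so it cannot be valid
    # #PCDATA means text only: the root may not contain child elements.
    return root.tag == "GREETING" and len(root) == 0


valid = "<GREETING>various random text but no markup</GREETING>"
with_markup = (
    "<GREETING><sometag>various random text</sometag>"
    "<someEmptyTag/></GREETING>"
)
nested = "<GREETING><GREETING>various random text</GREETING></GREETING>"

print(is_valid_greeting(valid))
print(is_valid_greeting(with_markup))
print(is_valid_greeting(nested))
```

All three samples are well-formed; only the first is valid against this DTD.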

Validating Tools

• Command line programs like XJParse

• Online validators

 – http://www.stg.brown.edu/service/xmlvalid/

 – http://www.cogsci.ed.ac.uk/%7Erichard/xml-check.html

• Browsers

Element Declarations

• Each element must be declared in an <!ELEMENT> declaration.

• An <!ELEMENT> declaration gives the name and content model of the element

• The content model uses a simple regular expression-like grammar to precisely specify what is and isn't allowed in an element

Content Specifications

• ANY

• #PCDATA

• Sequences

• Choices

• Mixed Content

• Modifiers

• Empty

ANY

<!ELEMENT SEASON ANY>

• A SEASON can contain any child element and/or raw text (parsed character data)

#PCDATA

<!ELEMENT YEAR (#PCDATA)>

• Parsed Character Data; i.e. raw text, no markup

#PCDATA

• Valid:

<YEAR>1999</YEAR>
<YEAR>99</YEAR>
<YEAR>1999 C.E.</YEAR>
<YEAR>
The year of our Lord one thousand, nine hundred, and ninety-nine
</YEAR>

• Invalid:

<YEAR>
  <MONTH>January</MONTH>
  <MONTH>February</MONTH>
  <MONTH>March</MONTH>
  <MONTH>April</MONTH>
  <MONTH>May</MONTH>
  <MONTH>June</MONTH>
  <MONTH>July</MONTH>
  <MONTH>August</MONTH>
  <MONTH>September</MONTH>
  <MONTH>October</MONTH>
  <MONTH>November</MONTH>
  <MONTH>December</MONTH>
</YEAR>

Child Elements

• To declare that a LEAGUE element must have a LEAGUE_NAME child:

<!ELEMENT LEAGUE (LEAGUE_NAME)>
<!ELEMENT LEAGUE_NAME (#PCDATA)>

Sequences

• Separate multiple required child elements with commas; e.g.

<!ELEMENT SEASON (YEAR, LEAGUE, LEAGUE)>
<!ELEMENT LEAGUE (LEAGUE_NAME, DIVISION, DIVISION, DIVISION)>

One or More Children +

<!ELEMENT DIVISION_NAME (#PCDATA)>
<!ELEMENT DIVISION (DIVISION_NAME, TEAM+)>

Zero or More Children *

<!ELEMENT TEAM (TEAM_CITY, TEAM_NAME, PLAYER*)>
<!ELEMENT TEAM_CITY (#PCDATA)>
<!ELEMENT TEAM_NAME (#PCDATA)>

Zero or One Children ?

<!ELEMENT PLAYER (GIVEN_NAME, SURNAME, POSITION, GAMES,
  GAMES_STARTED, AT_BATS?, RUNS?, HITS?, DOUBLES?,
  TRIPLES?, HOME_RUNS?, RBI?, STEALS?, CAUGHT_STEALING?,
  SACRIFICE_HITS?, SACRIFICE_FLIES?, ERRORS?, WALKS?,
  STRUCK_OUT?, HIT_BY_PITCH?, WINS?, LOSSES?, SAVES?,
  COMPLETE_GAMES?, SHUT_OUTS?, ERA?, INNINGS?,
  EARNED_RUNS?, HIT_BATTER?, WILD_PITCHES?, BALK?,
  WALKED_BATTER?, STRUCK_OUT_BATTER?)>

Finished DTD

Choices

<!ELEMENT PAYMENT (CASH | CREDIT_CARD)>
<!ELEMENT PAYMENT (CASH | CREDIT_CARD | CHECK)>

Grouping With Parentheses

• Parentheses combine several elements into a single element.

• Parenthesized element can be nested inside other parentheses in place of a single element.

• The parenthesized element can be suffixed with a plus sign, an asterisk, or a question mark.

<!ELEMENT dl (dt, dd)*>
<!ELEMENT ARTICLE (TITLE, (P | PHOTO | GRAPH | SIDEBAR |
  PULLQUOTE | SUBHEAD)*, BYLINE?)>
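The resemblance between content models and regular expressions can be taken quite literally. The Python sketch below hand-translates the two declarations above into regular expressions over the sequence of child element names. It is an illustration of the idea only, not how a validating parser works internally; it ignores any text between the children, and the names ARTICLE_MODEL, DL_MODEL and children_match are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# <!ELEMENT dl (dt, dd)*>
DL_MODEL = re.compile(r"(?:dt dd )*")

# <!ELEMENT ARTICLE (TITLE, (P | PHOTO | GRAPH | SIDEBAR |
#                    PULLQUOTE | SUBHEAD)*, BYLINE?)>
ARTICLE_MODEL = re.compile(
    r"TITLE (?:(?:P|PHOTO|GRAPH|SIDEBAR|PULLQUOTE|SUBHEAD) )*(?:BYLINE )?"
)


def children_match(xml_text, model):
    """Return True if the root's child element names fit the content model."""
    root = ET.fromstring(xml_text)
    # Write the children as "NAME NAME NAME " so the regex sees whole names.
    names = "".join(child.tag + " " for child in root)
    return model.fullmatch(names) is not None


good_article = (
    "<ARTICLE><TITLE>t</TITLE><P>a</P><PHOTO/><P>b</P>"
    "<BYLINE>x</BYLINE></ARTICLE>"
)
title_not_first = "<ARTICLE><P>a</P><TITLE>t</TITLE></ARTICLE>"
two_bylines = (
    "<ARTICLE><TITLE>t</TITLE><BYLINE>x</BYLINE>"
    "<BYLINE>y</BYLINE></ARTICLE>"
)

print(children_match(good_article, ARTICLE_MODEL))
print(children_match(title_not_first, ARTICLE_MODEL))
print(children_match(two_bylines, ARTICLE_MODEL))
print(children_match("<dl><dt>a</dt><dd>b</dd></dl>", DL_MODEL))
print(children_match("<dl><dt>a</dt></dl>", DL_MODEL))
```

The comma becomes concatenation, the vertical bar stays an alternation, and the *, + and ? suffixes carry over unchanged.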

Mixed Content

• Both #PCDATA and child elements in a choice

<!ELEMENT TEAM (#PCDATA | TEAM_CITY | TEAM_NAME | PLAYER)*>

• #PCDATA must come first

• #PCDATA cannot be used in a sequence

Empty elements

<!ELEMENT BR EMPTY>
<!ELEMENT IMG EMPTY>
<!ELEMENT HR EMPTY>

Internal DTDs

<?xml version="1.0"?>
<!DOCTYPE GREETING [
  <!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>
Hello XML!
</GREETING>

Internal DTD Subsets

<?xml version="1.0"?>
<!DOCTYPE GREETING SYSTEM "greeting.dtd" [
  <!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>
Hello XML!
</GREETING>

• Internal declarations override external declarations

Programming with XML

• Java works best

• C, Perl, Python etc. can also be used

• Unicode support is the biggest issue

SAX, the Simple API for XML

• Event based

• Programs can plug in different parsers
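"Event based" means the parser calls your code as it meets each start tag, run of text, and end tag, rather than handing you a finished tree. The slides recommend Java, but the same model is available in Python's standard-library xml.sax module, which the sketch below uses. The handler class name and the two-team sample document are illustrative only.

```python
import xml.sax


class TeamNameHandler(xml.sax.ContentHandler):
    """Collect the text of every TEAM_NAME element as the parser reports it."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "TEAM_NAME":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._in_name:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "TEAM_NAME":
            self.names.append("".join(self._buffer).strip())
            self._in_name = False


DOCUMENT = b"""<SEASON>
  <YEAR>1998</YEAR>
  <LEAGUE>
    <LEAGUE_NAME>National League</LEAGUE_NAME>
    <DIVISION>
      <DIVISION_NAME>East</DIVISION_NAME>
      <TEAM><TEAM_CITY>Atlanta</TEAM_CITY><TEAM_NAME>Braves</TEAM_NAME></TEAM>
      <TEAM><TEAM_CITY>Florida</TEAM_CITY><TEAM_NAME>Marlins</TEAM_NAME></TEAM>
    </DIVISION>
  </LEAGUE>
</SEASON>"""

handler = TeamNameHandler()
xml.sax.parseString(DOCUMENT, handler)
print(handler.names)
```

The handler never names a particular parser, which is what lets programs plug in different ones; in Python, xml.sax.make_parser() plays that role.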

The Document Object Model (DOM)
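Where SAX reports events, the DOM builds the whole document as a tree of objects in memory that a program can walk and modify. Here is a minimal Python sketch using the standard-library xml.dom.minidom module, a lightweight DOM implementation, chosen here only for illustration:

```python
from xml.dom import minidom

doc = minidom.parseString("<SEASON><YEAR>1998</YEAR></SEASON>")
root = doc.documentElement

# Walk the tree: find the YEAR element and read its text node.
year_element = root.getElementsByTagName("YEAR")[0]
year_text = year_element.firstChild.data

# Change the tree in memory, then write it back out as XML.
league = doc.createElement("LEAGUE")
root.appendChild(league)
serialized = root.toxml()

print(root.tagName, year_text)
print(serialized)
```

The trade-off against SAX is memory: the whole document has to fit in the tree before the program can use it.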

To Learn More: Books

•  XML: Extensible Markup Language

 –  IDG Books 1998

 –  ISBN 0-76453-199-9

• The XML Bible

 –  IDG Books 1999

 –  ISBN 0-76453-236-7

Questions?