XML: slides courtesy of Prof. Ellis Horowitz @ USC (posted 20-Dec-2015)
1

XML

Slides courtesy of Prof. Ellis Horowitz @ USC

2

What is XML
• XML stands for Extensible Markup Language
  – the World Wide Web Consortium (W3C) directs the effort
• XML isn't a markup language like HTML, but rather a system for defining other markup languages.
• XML is a common syntax for expressing structure in data, and as a result a way for others to define new tags
  – whereas the <H1> tag in HTML specifies text to be presented in a certain typeface and weight, an XML tag explicitly identifies the kind of information it surrounds:
    an <AUTHOR> tag might identify the author of a document,
    a <PRICE> tag could contain an item's cost in an inventory list

3

SGML, XML and HTML
• The parent of HTML and XML is Standard Generalized Markup Language (SGML), an ISO standard for electronic document exchange
• SGML competes with other standards, mainly de facto standards, like Adobe PDF (Acrobat), Microsoft RTF (Rich Text Format) and popular word processor file formats like Microsoft Word.
• Both XML and HTML are document formats derived from SGML.
  – Thus they all share certain characteristics, such as a similar syntax and the use of bracketed tags.
  – But HTML is an application of SGML, whereas XML is a subset of SGML.
• XML documents can be read by any SGML authoring or viewing tool.
• XML is less complex than SGML, and it is designed to work across a limited-bandwidth network such as the Internet.

4

Why Are Developers Excited about XML?

• Domain-Specific Markup Languages
  – A DTD precisely describes the format
  – DTDs verify that documents adhere to the format
  – Ensures interoperability of unrelated tools
• Self-Describing Data
  – DTDs explain the format so reverse engineering isn't as necessary
  – Comments in DTDs can go even further
    <!-- This should be a four digit year like "1999", not a two-digit year like "99" -->
    <!ELEMENT YEAR (#PCDATA)>
• Interchange of Data Among Applications
  – E-commerce and syndication
  – DTDs make sure that two independent applications speak the same language
  – DTDs detect malformed data
  – DTDs verify correct data
• Structured and Integrated Data
  – Can specify relationships between elements using element declarations
  – Can assemble data from multiple sources using external entity references declared in the DTD

5

XML Applications
• Chemical Markup Language (CML)
  – Jumbo: the first general-purpose XML browser
  – Assigns each XML element to a Java class that knows how to render that element
  – http://www.xml-cml.org
• Mathematical Markup Language (MathML)
  – The Amaya browser
• Synchronized Multimedia Integration Language (SMIL)
• Scalable Vector Graphics (SVG)
• MusicML
• FoodWebML, GuiML

6

A Song Description in HTML

<dt>Hot Cop
<dd> by Jacques Morali, Henri Belolo, and Victor Willis
<ul>
<li>Producer: Jacques Morali
<li>Publisher: PolyGram Records
<li>Length: 6:20
<li>Written: 1978
<li>Artist: Village People
</ul>

7

A Song Description in XML

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<SONG LENGTH="6:20">
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
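Because each tag names the kind of information it surrounds, a program can pull individual fields out of the song description without guessing at layout. The following is a minimal sketch in Python using the standard-library `xml.etree.ElementTree` module; the function name `describe_song` and the returned dictionary keys are assumptions made for illustration, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The song document from the slide (XML declaration omitted for brevity).
SONG_XML = """<SONG LENGTH="6:20">
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>"""


def describe_song(xml_text):
    """Extract a few fields by element name (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("TITLE"),
        # findall returns every COMPOSER child, in document order
        "composers": [c.text.strip() for c in root.findall("COMPOSER")],
        "year": int(root.findtext("YEAR")),
        # LENGTH is an attribute of SONG, not a child element
        "length": root.get("LENGTH"),
    }


song = describe_song(SONG_XML)
print(song["title"], song["composers"], song["year"], song["length"])
```

Doing the same with the HTML version would mean parsing strings such as "Producer: ..." out of list items, which is the contrast the two slides are drawing.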

8

Using XSLT

Attaching style sheets to documents

9

Attaching style sheets to documents

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<?xml-stylesheet type="text/css" href="song.css"?>
<SONG LENGTH="6:20">
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>

Using CSS – simpler, but limited

10

Well-formedness

• All XML documents must be well-formed
• Well-formedness rules:
  – Open and close all tags
  – Empty tags end with />
  – There is a unique root element
  – Elements may not overlap
  – Attribute values are quoted
  – < and & are only used to start tags and entities

• Parsers are required to reject malformed documents.

• This improves compatibility and interoperability.
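The rule that parsers must reject malformed documents is easy to observe in practice. Below is a minimal sketch, assuming Python's standard-library `xml.etree.ElementTree` as the parser; the helper name `is_well_formed` is hypothetical. Each sample string violates (or satisfies) one of the rules listed above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = [
    "<p>The quick brown fox</p>",        # tags opened and closed
    "<br/>",                             # empty tag ends with />
    "<p>The quick brown fox",            # end tag missing
    "<a><b></a></b>",                    # elements overlap
    "<a/><b/>",                          # no unique root element
    "<DIV ALIGN=CENTER></DIV>",          # attribute value not quoted
    "<H1>O'Reilly & Associates</H1>",    # bare & in text
]

for text in samples:
    print(is_well_formed(text), text)
```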

11

Well-formedness Rules
• Open and close all tags
• Empty tags end with />
• There is a unique root element
• Elements may not overlap
• Attribute values are quoted
• < and & are only used to start tags and entities
• Only the five predefined entity references are used

12

What is a Document Type Definition
• A Document Type Definition (DTD) is a set of syntax rules for tags. It tells you
  – what tags you can use in a document,
  – what order they should appear in,
  – which tags can appear inside other ones,
  – which tags have attributes, and so on.

• Originally developed for use with SGML, a DTD can be part of an XML document, but it's usually a separate document or series of documents.

• Because XML is not a language itself, but rather a system for defining languages, it doesn't have a universal DTD the way HTML does. Instead, each industry or organization that wants to use XML for data exchange can define its own DTDs.

• If an organization uses XML to tag documents for internal use only, it can create its own private DTD.

13

Validity

• To be valid, an XML document must

1. Be well-formed

2. Have a Document Type Definition (DTD)

3. Comply with the constraints specified in the DTD

14

Validity is not always sufficient

• DTDs cannot specify anything about the text contents of an element, for example:
  – That an element must contain a number
  – That an element must contain a date
  – That a date must be between 1970 and 2001
  – etc.

• Custom validation layers can sit on top of XML validation

• Schemas will add this
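A custom validation layer of the kind mentioned above could look like the following minimal sketch. It is written in Python with the standard-library parser; the function name `check_year`, the YEAR element, and the 1970 to 2001 range are taken from the slides' examples or assumed for illustration. It checks exactly the things a DTD cannot express: that the content is a four-digit number and that it falls in a range.

```python
import re
import xml.etree.ElementTree as ET


def check_year(song_xml, low=1970, high=2001):
    """Return a list of problems with the YEAR element (empty if none)."""
    root = ET.fromstring(song_xml)   # rejects malformed input first
    text = (root.findtext("YEAR") or "").strip()
    if not re.fullmatch(r"[0-9]{4}", text):
        return ["YEAR must be a four-digit number, got %r" % text]
    if not low <= int(text) <= high:
        return ["YEAR %s is outside %d-%d" % (text, low, high)]
    return []


print(check_year("<SONG><YEAR>1978</YEAR></SONG>"))   # no problems
print(check_year("<SONG><YEAR>99</YEAR></SONG>"))     # two-digit year
print(check_year("<SONG><YEAR>2015</YEAR></SONG>"))   # out of range
```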

15

XML Schemas

• An XML-based syntax, or schema, for defining how an XML document is marked up.

• Recommended by Microsoft as an alternative to Document Type Definition (DTD)

• DTDs have many drawbacks, including the use of non-XML syntax, no support for data-typing, and non-extensibility.

• XML Schema improves upon DTDs in several ways, including the use of XML syntax, and support for data-typing and namespaces.

• For example, an XML Schema allows you to specify an element as an integer, a float, a boolean, a URL, etc.

• The XML parser in Internet Explorer 5 can validate an XML document with both a DTD and an XML Schema.

16

How to process XML? Java Parsers

• DOM Parser – tree structure

• SAX Parser – event driven approach

• DOM Parser makes use of SAX parser to parse and then create a tree structure
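The difference between the two parsing styles can be sketched as follows. The slides discuss Java parsers; this example instead uses Python's standard-library `xml.dom.minidom` and `xml.sax` modules, which follow the same DOM and SAX models, so treat it as an illustration of the two approaches rather than the Java API. The names `composers_dom`, `composers_sax`, and `ComposerHandler` are hypothetical.

```python
import xml.dom.minidom
import xml.sax

SONG_XML = (
    "<SONG><TITLE>Hot Cop</TITLE>"
    "<COMPOSER>Jacques Morali</COMPOSER>"
    "<COMPOSER>Henri Belolo</COMPOSER></SONG>"
)


def composers_dom(xml_text):
    """DOM: build the whole tree in memory, then navigate it."""
    doc = xml.dom.minidom.parseString(xml_text)
    return [n.firstChild.data for n in doc.getElementsByTagName("COMPOSER")]


class ComposerHandler(xml.sax.ContentHandler):
    """SAX: react to start/end/character events as the parser emits them."""

    def __init__(self):
        super().__init__()
        self.composers = []
        self._buffer = None          # collects text while inside COMPOSER

    def startElement(self, name, attrs):
        if name == "COMPOSER":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)   # text may arrive in chunks

    def endElement(self, name):
        if name == "COMPOSER":
            self.composers.append("".join(self._buffer))
            self._buffer = None


def composers_sax(xml_text):
    handler = ComposerHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.composers


print(composers_dom(SONG_XML))
print(composers_sax(SONG_XML))
```

The DOM version is shorter but holds the entire document in memory; the SAX version keeps only the state it needs, which matters for large documents.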

17

DTDs – Content Definitions
• Content model definitions describe what may be contained in an instance of an element
  – names of allowed or forbidden elements
  – DTD entities
  – document text
• The syntax for expressing content is a form of regular expression:
  – (…)    delimits a group
  – A | B  either A or B
  – A, B   A followed by B
  – A & B  A and B in any order (SGML only; XML DTDs do not support &)
  – A?     A occurs zero or one time
  – A*     A occurs zero or more times
  – A+     A occurs one or more times

18

Element Declarations

• Each tag must be declared in a <!ELEMENT> declaration.

• A <!ELEMENT> declaration gives the name and content model of the element

• The content model uses a simple regular expression-like grammar to precisely specify what is and isn't allowed in an element

19

Content Specifications
• ANY
  – <!ELEMENT catalog ANY>
  – A catalog can contain any child element and/or raw text (parsed character data)
• #PCDATA
  – Parsed Character Data; i.e. raw text, no markup. For example,
  – <year>1984</year>
  – <!ELEMENT year (#PCDATA)>
• Sequences
• Choices
• Mixed Content
• Modifiers
• EMPTY

20

#PCDATA
There are a number of elements in the example document that only contain PCDATA:

<!ELEMENT category (#PCDATA)>
<!ELEMENT abstract (#PCDATA)>
<!ELEMENT keyword (#PCDATA)>
<!ELEMENT last_updated (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT middle_name (#PCDATA)>
<!ELEMENT last_name (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT instruments (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT length (#PCDATA)>

21

Comments in DTDs
• DTDs seem fundamentally more obfuscated than C.
• Comments can improve this by giving example elements
• Comments are the same as in HTML; e.g. <!-- Comment -->

<!-- e.g. "1999 New York Women Composers", not "Copyright 1999 New York Women Composers" -->

<!ELEMENT copyright (#PCDATA)>

22

Child Elements
<date><year>1994</year></date>
• To declare that a date element must have a year child:

<!ELEMENT date (year)>

23

Child Elements
• You only have to declare the immediate children

<maintainer email="[email protected]" url="http://www.macfaq.com/personal.html">
  <name>
    <first_name>Elliotte</first_name>
    <middle_name>Rusty</middle_name>
    <last_name>Harold</last_name>
  </name>
</maintainer>
<composer id="c1">
  <name>
    <first_name>Julie</first_name>
    <middle_name></middle_name>
    <last_name>Mandel</last_name>
  </name>
</composer>

• To declare that an element must have exactly one name child:

<!ELEMENT maintainer (name)>
<!ELEMENT composer (name)>

24

Sequences
<name>
  <first_name>Elliotte</first_name>
  <middle_name>Rusty</middle_name>
  <last_name>Harold</last_name>
</name>
• Separate multiple required child elements with commas; e.g.

<!ELEMENT name (first_name, middle_name, last_name)>

• A list of child elements separated by commas is called a sequence

25

More Sequences
• To use a sequence in an ELEMENT declaration:
  – The element being described must have only child elements, no mixed content
  – You must know the order of the child elements
  – You must know the type of each child element
  – You must know the number of child elements
  – The number can be relaxed with wild cards (the ?, * and + suffixes)

26

One or More Children +
<cataloging_info>
  <abstract>Compositions by the members of New York Women Composers</abstract>
  <keyword>music publishing</keyword>
  <keyword>scores</keyword>
  <keyword>women composers</keyword>
  <keyword>New York</keyword>
</cataloging_info>
• The + suffix indicates that one or more of that element is required at that point

<!ELEMENT cataloging_info (abstract, keyword+)>

27

A DTD for Songs
<!ELEMENT SONG (TITLE, COMPOSER+, PRODUCER*, PUBLISHER*, LENGTH?, YEAR?, ARTIST+)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT COMPOSER (#PCDATA)>
<!ELEMENT PRODUCER (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!ELEMENT LENGTH (#PCDATA)>
<!-- This should be a four digit year like "1999", not a two-digit year like "99" -->
<!ELEMENT YEAR (#PCDATA)>
<!ELEMENT ARTIST (#PCDATA)>
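Since content models are a form of regular expression, the SONG declaration above can be checked against a list of child element names with an ordinary regex engine. The sketch below is an illustration only, not how a validating parser is implemented: it handles element-only models built from names, parentheses, `,`, `|`, `?`, `*` and `+`, and ignores `#PCDATA`, `ANY` and `EMPTY`. The function names `content_model_to_regex` and `children_match` are hypothetical.

```python
import re

SONG_MODEL = ("(TITLE, COMPOSER+, PRODUCER*, PUBLISHER*, "
              "LENGTH?, YEAR?, ARTIST+)")


def content_model_to_regex(model):
    """Translate an element-only DTD content model into a Python regex.

    Each element name becomes a group matching that name followed by a
    space, so a child list can be matched as one space-terminated string.
    """
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[()|?*+,]", model):
        if token == ",":
            continue                      # sequence = plain concatenation
        elif token == "(":
            parts.append("(?:")
        elif token in {")", "|", "?", "*", "+"}:
            parts.append(token)           # same meaning in both notations
        else:
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def children_match(model, child_names):
    """True if the ordered child names satisfy the content model."""
    sequence = "".join(name + " " for name in child_names)
    return content_model_to_regex(model).fullmatch(sequence) is not None


hot_cop = ["TITLE", "COMPOSER", "COMPOSER", "COMPOSER",
           "PRODUCER", "PUBLISHER", "YEAR", "ARTIST"]
print(children_match(SONG_MODEL, hot_cop))             # children of the song document
print(children_match(SONG_MODEL, ["TITLE", "ARTIST"]))  # no COMPOSER
```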

28

Internal DTDs

<?xml version="1.0"?>

<!DOCTYPE GREETING [

<!ELEMENT GREETING (#PCDATA)> ]>

<GREETING>

Hello XML!

</GREETING>

29

Complete Example – Mail Message
Suppose we describe an email message as consisting of:
  a title;
  a header made of:
    the sender;
    the recipient;
    a subject;
  the body text made of:
    four paragraphs;
    quoted material;

<!--Mail System DTD-->
<!ELEMENT mail - - (head,body)>
<!ELEMENT head - O ((TO & FR) & SH?)>
<!ELEMENT body - O (p*)>
<!ELEMENT TO - O (#PCDATA)>
<!ELEMENT FR - O (#PCDATA)>
<!ELEMENT SH - O (#PCDATA)>
<!ELEMENT p - O ((#PCDATA|cite)*)>
<!ELEMENT cite - - (#PCDATA)>

The tags are <MAIL><HEAD><BODY><TO><FR><SB><P><CITE>

<!-- is a comment, (head,body) implies a group with body following head

TO is followed by FR and both must appear, ? means SB is optional, P may occur zero or more times

Note: this DTD is written in SGML syntax. The "- -" and "- O" tag-omission flags and the & connector are not allowed in an XML DTD.


32

Open and close all tags
• Good:
  – <p>The quick brown fox jumped over the lazy dog</p>
  – <li>A very <B>important</B> point</li>
  – Copyright 1999 Ellis Horowitz<br></br>
• Bad:
  – The quick brown fox jumped over the lazy dog<p>
  – <li>A very <B>important point
  – Copyright 1999 Ellis Horowitz<br>

33

Empty tags end with />

• <BR/>, <HR/>, and <IMG/> instead of <BR>, <HR>, and <IMG>

• Web browsers deal inconsistently with these

• Can use <BR></BR> <HR></HR> <IMG></IMG> instead

34

There is a unique root element
• One element completely contains all other elements of the document
• This is HTML in HTML files
• The XML declaration and xml-stylesheet processing instruction are not elements

35

Elements may not overlap
• If an element contains a start tag for an element, it must also contain the corresponding end tag
• Empty elements may appear anywhere
• Every non-root element has a parent element

36

Attribute values are quoted
• Good:
  – <A HREF="http://metalab.unc.edu/xml/">
  – <DIV ALIGN="CENTER">
  – <EMBED SRC="minnesotaswale.aif" hidden="true">
• Bad:
  – <A HREF=http://metalab.unc.edu/xml/>
  – <DIV ALIGN=CENTER>
  – <EMBED SRC=minnesotaswale.aif hidden=true>
  – <EMBED SRC="minnesotaswale.aif" hidden>

37

< and & are only used to start tags and entities

• Good:
  <H1>O'Reilly &amp; Associates</H1>
• Bad:
  <H1>O'Reilly & Associates</H1>
• Good:
  <CODE>for (int i = 0; i &lt;= args.length; i++ ) { </CODE>
• Bad:
  <CODE>for (int i = 0; i <= args.length; i++ ) { </CODE>
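When XML is generated by a program, the "good" forms above are usually produced by an escaping helper rather than by hand. A minimal sketch, assuming Python's standard-library `xml.sax.saxutils.escape` (which replaces `&`, `<` and `>`); the wrapper name `code_element` is hypothetical:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def code_element(source_line):
    """Wrap a line of source code in <CODE>, escaping markup characters."""
    return "<CODE>" + escape(source_line) + "</CODE>"


java_line = "for (int i = 0; i <= args.length; i++ ) {"
markup = code_element(java_line)
print(markup)

# Parsing the escaped markup gives back the original text unchanged.
round_trip = ET.fromstring(markup).text
print(round_trip == java_line)
```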

38

Only the five predefined entity references are used

• Good:
  – &amp;
  – &lt;
  – &gt;
  – &quot;
  – &apos;
• Bad:
  – &copy;
  – &reg;
  – &tm;
  – &alpha;
  – &eacute;
  – &nbsp;
  – etc.
• DTDs loosen this restriction by allowing you to define new entities, even in an invalid document.
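The three cases above (predefined entity, undefined entity, entity declared in a DTD) can be observed directly. This is a minimal sketch using Python's standard-library `xml.etree.ElementTree`; the helper name `parsed_text` and the sample documents are assumptions for illustration. The third document declares `copy` in an internal DTD subset, which is enough for a non-validating parser to expand it.

```python
import xml.etree.ElementTree as ET


def parsed_text(xml_text):
    """Return the root element's text, or None if the parser rejects it."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None


# The five predefined entity references need no declaration.
predefined = parsed_text("<t>&amp; &lt; &gt; &quot; &apos;</t>")

# &copy; is an HTML entity, not an XML one: rejected without a declaration.
undefined = parsed_text("<t>&copy; 1999</t>")

# Declaring the entity in an internal DTD subset makes the reference legal.
declared = parsed_text(
    '<!DOCTYPE t [<!ENTITY copy "&#169;">]><t>&copy; 1999</t>'
)

print(repr(predefined), repr(undefined), repr(declared))
```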


42

Compare DTD & Schema

43

http://www.w3schools.com/schema/schema_schema.asp

44

A DTD for Songs
<!ELEMENT SONG (TITLE, COMPOSER+, PRODUCER*, PUBLISHER*, LENGTH?, YEAR?, ARTIST+)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT COMPOSER (#PCDATA)>
<!ELEMENT PRODUCER (#PCDATA)>
<!ELEMENT PUBLISHER (#PCDATA)>
<!-- This should be a four digit year like "1999", not a two-digit year like "99" -->
<!ELEMENT YEAR (#PCDATA)>
<!ELEMENT ARTIST (#PCDATA)>
<!ATTLIST SONG LENGTH CDATA #IMPLIED>

45

A Valid Song Document
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<!DOCTYPE SONG SYSTEM "song.dtd">
<SONG LENGTH="6:20">
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>

46

XSLT - XSL Transformations
XSL (eXtensible Stylesheet Language) consists of two parts: XSL Transformations and XSL Formatting Objects.
• An XSLT stylesheet is an XML document defining a transformation for a class of XML documents.
• A stylesheet separates contents and logical structure from presentation.
• Not intended as a completely general-purpose XML transformation language; it was designed for XSL Formatting Objects. Nevertheless, XSLT is generally useful.

The basic idea: (diagram not reproduced in this transcript)

The basic design: XSLT is declarative and based on pattern-matching and templates

47

Song.xml processed with song2HTML.xsl

48

song2HTML.xsl

49

song2HTML.xsl

50

Transformer.java

51

Processing model

template rule = pattern + template

Construction of result tree fragment:
• the source tree is processed by processing the root
• a single node is processed by
  1. finding the template rule with the best matching pattern
  2. instantiating its template (creates fragment + continues processing recursively)
• a node list is processed by processing each node in order

current node: the node currently being processed
current node list: the node list currently being processed (used for evaluation context later)
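The processing model above can be imitated in a few lines of ordinary code. The sketch below is a toy illustration in Python, not an XSLT engine: patterns are reduced to an element name or the wildcard `*`, "best match" means an exact name beats the wildcard, and the wildcard rule only recurses into children (the real XSLT built-in rules also copy text). The class `Stylesheet` and all rule names are hypothetical.

```python
import xml.etree.ElementTree as ET


class Stylesheet:
    """Toy model of 'template rule = pattern + template'."""

    def __init__(self):
        self.rules = {}                    # pattern -> template function

    def template(self, pattern):
        def register(func):
            self.rules[pattern] = func
            return func
        return register

    def process_node(self, node):
        # 1. find the rule with the best matching pattern
        rule = self.rules.get(node.tag) or self.rules["*"]
        # 2. instantiate its template; it may continue recursively
        return rule(node, self.process_list)

    def process_list(self, nodes):
        # a node list is processed by processing each node in order
        return "".join(self.process_node(n) for n in nodes)


sheet = Stylesheet()


@sheet.template("*")
def default_rule(node, apply_templates):
    return apply_templates(list(node))     # simplified: drops text


@sheet.template("SONG")
def song_rule(node, apply_templates):
    return "<ul>" + apply_templates(list(node)) + "</ul>"


@sheet.template("TITLE")
def title_rule(node, apply_templates):
    return "<li><b>" + node.text + "</b></li>"


@sheet.template("COMPOSER")
def composer_rule(node, apply_templates):
    return "<li>" + node.text + "</li>"


source = ET.fromstring(
    "<SONG><TITLE>Hot Cop</TITLE>"
    "<COMPOSER>Jacques Morali</COMPOSER><YEAR>1978</YEAR></SONG>"
)
result = sheet.process_node(source)        # processing starts at the root
print(result)
```

YEAR has no rule of its own, so it falls through to the wildcard rule and contributes nothing to the output, which shows how the choice of rules determines what appears in the result tree.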


60

CSS Examples – self study

61

A Blank Style Sheet

<?xml version="1.0" encoding="ISO-8859-1"?>

<?xml-stylesheet type="text/css" href="compositions1.css"?>

<catalog> ... </catalog>

62

The Default Rule
• Not every element needs a rule
• The root element should be at least display: block

catalog { font-family: New York, Times New Roman, serif; font-size: 14pt; background-color: white; color: black; display: block }

63

A style rule for the category element
• Make it look like an H1 heading

category { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 32pt; font-weight: bold; text-align: center }

catalog { font-family: New York, Times New Roman, serif; font-size: 14pt; background-color: white; color: black; display: block }

64

A style rule for the composer element

• Make it look like a level 2 head

• No need to style the first, middle, and last names separately

composer { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 24pt; font-weight: bold; text-align: left }

65

A style rule for the title element

composition title { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 18pt; font-weight: bold; text-align: left }

66

Style Rules for composition children

composition * {display:list-item}

description {display: block}

67

Finished Style Sheet
category { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 32pt; font-weight: bold; text-align: center }
catalog { font-family: New York, Times New Roman, serif; font-size: 14pt; background-color: white; color: black; display: block }
composer { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 24pt; font-weight: bold; text-align: left }
composition title { display: block; font-family: Helvetica, Arial, sans-serif; font-size: 18pt; font-weight: bold; text-align: left }
composition * { display: list-item }
description { display: block }
/* cataloging_info is only for search engines */
cataloging_info { display: none; color: #FFFFFF }
last_updated, copyright, maintainer { display: block; font-size: small }
copyright:before { content: "Copyright " }
last_updated:before { content: "Last Modified " }
last_updated { margin-top: 2ex }


69

Day Planner – example DTD

70

Planner Application
