Introduction to XML 1. The XML Language

University of Nottingham, School of Computer Science & Information Technology. Tim Brailsford.


Page 1: Introduction to XML 1.  The XML Language

University of Nottingham

School of Computer Science & Information Technology

Introduction to XML 1. The XML Language

Tim Brailsford

Page 2: Introduction to XML 1.  The XML Language

Markup Languages

The word “Markup” is derived from the printing industry.
  Detailed stylistic instructions for typesetting.
  Usually hand-written on the copy (e.g. underlining some text that is to be set in italics).
Markup languages do the same job for computerised documentation systems.
Markup adds logical structure to a document, or indicates how it is to be laid out (on paper or screen).
Markup languages are a set of instructions that are amenable to automatic processing.

Page 3: Introduction to XML 1.  The XML Language

Markup Languages (cont.)

Usually a sequence of characters in a text file that indicates the structure or behaviour of the content.
For example (in HTML):

This is <B>bold</B> and this is <I>italic</I>
<TITLE>This is the title.</TITLE>

Markup may be created by directly editing the symbols, but is more usually hidden from end-users.
Examples:
  HTML
  RTF
  HyTime

Page 4: Introduction to XML 1.  The XML Language

Generalised Markup Languages

Proprietary markup languages are problematic.
Generalised markup languages are languages for defining markup languages.
  Metalanguages
  SGML

Page 5: Introduction to XML 1.  The XML Language

SGML - History

Standard Generalised Markup Language
1969 - GML from IBM
  text editing
  formatting
  information retrieval
1980 - SGML first published
1980s - SGML adopted by US IRS & DOD
1986 - ISO standard
  ISO 8879: Information processing--Text and office systems--Standard Generalized Markup Language (SGML), ([Geneva]: ISO, 1986).

Page 6: Introduction to XML 1.  The XML Language

SGML

SGML defines a system of tag markup.

<TAG>This is a pair of SGML tags</TAG>

SGML is a standard for how to specify a tag set.
  Document Type Definition (DTD)
SGML documents contain structural elements that can be described without consideration of how they are displayed.
SGML application.
  HTML is an SGML application.

Page 7: Introduction to XML 1.  The XML Language

Benefits of SGML

Documents are created by thinking in terms of structure rather than appearance (which may change over time).
Documents are portable because any SGML compliant software can interpret them by reference to the DTD.
Documents originally intended for one medium can easily be re-purposed for other media, such as the computer display screen.

Page 8: Introduction to XML 1.  The XML Language

What is XML?

XML is based upon SGML, but is substantially simplified for use on the WWW.
Like SGML, XML is a metalanguage
  arbitrary definition of elements
  <TITLE> <PARAGRAPH> <ChapterHeading> <PRICE> <PARTNUMBER> <MANUFACTURER> <ExamGrade>
Syntax may optionally be described by a DTD
  Well formed documents obey the XML syntax rules and do not need a DTD
  Valid documents are well formed and also conform to a DTD
Style and content are completely separate
  XML documents contain content
  Style is specified by stylesheets
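As a minimal sketch of what "well formed" means in practice, the following Python uses the standard library's `xml.etree.ElementTree` parser, which checks syntax only and does not validate against a DTD. The helper name `is_well_formed` and the sample strings are assumptions made for illustration; they are not part of the original slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text obeys the XML syntax rules (no DTD is consulted)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Arbitrary element names are fine, as long as the tags are properly paired.
good = "<PRICE currency='GBP'>29.99</PRICE>"
# Mismatched opening and closing tags break well-formedness.
bad = "<PRICE>29.99</PARTNUMBER>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

Checking validity as well would need a validating parser and the DTD itself, which this sketch deliberately leaves out.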

Page 9: Introduction to XML 1.  The XML Language

Example XML Applications

MathML - maths
CML - chemistry
SVG - vector graphics
XHTML - WWW
SMIL - synchronised multimedia
MusicML - sheet music
FpML - financial products
RETML - real estate transactions
and many, many others

Page 10: Introduction to XML 1.  The XML Language

XML Elements

XML documents consist of one or more elements.
Elements consist of a pair of tags and (optionally) enclosed text.

<TITLE>The XML Companion</TITLE>

Elements may have attributes.

<TITLE type="book">The XML Companion</TITLE>

Elements may contain other elements.

<REFERENCE>
  <TITLE type="book">The XML Companion</TITLE>
</REFERENCE>

Empty elements may be self closing.

<PICTURE src="mypic.jpg"></PICTURE>

<PICTURE src="mypic.jpg" />
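The element, attribute and nesting rules above can be explored with a short Python sketch using the standard library's `xml.etree.ElementTree`. The variable names are illustrative assumptions; the XML strings are the slide's own examples, written with straight quotes.

```python
import xml.etree.ElementTree as ET

# An element containing another element, which has an attribute and text.
reference = ET.fromstring(
    '<REFERENCE><TITLE type="book">The XML Companion</TITLE></REFERENCE>'
)
title = reference.find("TITLE")
title_tag = title.tag           # the element name
title_type = title.get("type")  # the attribute value
title_text = title.text         # the enclosed text

# The two spellings of an empty element parse to the same thing.
long_form = ET.fromstring('<PICTURE src="mypic.jpg"></PICTURE>')
short_form = ET.fromstring('<PICTURE src="mypic.jpg" />')
same_attributes = long_form.attrib == short_form.attrib
both_empty = (
    long_form.text is None
    and short_form.text is None
    and len(long_form) == 0
    and len(short_form) == 0
)

print(title_tag, title_type, title_text)
print(same_attributes, both_empty)
```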

Page 11: Introduction to XML 1.  The XML Language

Contents vs Style

XML tags contain meaning not appearance.
This allows extra information to be extracted.
Consider the example of the scientific names of animals.
  scientific names are in Latin
  by convention they are always printed in italics

The scientific name of the domestic dog is Canis familiaris, and of the domestic cat is Felis catus.

Page 12: Introduction to XML 1.  The XML Language

In HTML

<P>The <I>scientific</I> name of the domestic dog is <I>Canis familiaris</I>, and of the domestic cat is <I>Felis catus.</I></P>

NB there is no distinction between scientific names and emphasis.

Page 13: Introduction to XML 1.  The XML Language

In XML

<P>The <emph>scientific</emph> name of the domestic dog is <sci>Canis familiaris</sci>, and of the domestic cat is <sci>Felis catus.</sci></P>

NB emphasis and scientific names are different tags. They may both be displayed as italic, but they can be treated separately.
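To illustrate "treated separately", here is a hedged Python sketch that pulls the scientific names out of the XML version of the sentence while ignoring the emphasised word. It uses the standard library's `xml.etree.ElementTree`; the variable names are assumptions for illustration. The same extraction is not possible from the HTML version, where both kinds of text sit in `<I>` tags.

```python
import xml.etree.ElementTree as ET

paragraph = ET.fromstring(
    "<P>The <emph>scientific</emph> name of the domestic dog is "
    "<sci>Canis familiaris</sci>, and of the domestic cat is "
    "<sci>Felis catus.</sci></P>"
)

# Collect only the <sci> elements; the trailing full stop inside the
# last tag is stripped so that only the name itself remains.
scientific_names = [el.text.rstrip(".") for el in paragraph.iter("sci")]

# Emphasis is a different tag, so it can be handled independently.
emphasised = [el.text for el in paragraph.iter("emph")]

print(scientific_names)
print(emphasised)
```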

Page 14: Introduction to XML 1.  The XML Language

Rendering of XML

XML files contain content not appearance
Stylesheets contain appearance and behaviour
XML data is rendered by being transformed into some form suitable for display
  RTF (for simple printing)
  PDF or PostScript (for printing or display)
  HTML (for display over the web)
  HTML 4.0 / DHTML (for complex interfaces)
The transformation is defined by a stylesheet
Rendering may be done by standalone software, or by a web browser, or on a web server.
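The idea of rendering by transformation can be sketched in Python. This is not XSL: the standard library has no XSLT engine, so a hand-written mapping table stands in for the stylesheet. The names `STYLE` and `render`, the choice of HTML tags, and the shortened sample sentence are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical "stylesheet": which HTML tag each XML element is rendered as.
STYLE = {"P": "p", "emph": "em", "sci": "i"}


def render(element: ET.Element) -> ET.Element:
    """Recursively copy an XML element into an HTML element chosen by STYLE."""
    html = ET.Element(STYLE.get(element.tag, "span"))
    html.text = element.text
    for child in element:
        rendered = render(child)
        rendered.tail = child.tail  # keep the text that follows the child
        html.append(rendered)
    return html


source = ET.fromstring(
    "<P>The <emph>scientific</emph> name of the domestic dog is "
    "<sci>Canis familiaris</sci>.</P>"
)
html_text = ET.tostring(render(source), encoding="unicode")
print(html_text)
```

Changing only the mapping table changes the appearance, while the XML content stays untouched, which is the separation the slide describes.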

Page 15: Introduction to XML 1.  The XML Language

Standalone Rendering

[Diagram: XML and XSL are combined to produce HTML.]

Page 16: Introduction to XML 1.  The XML Language

Client Side Rendering

[Diagram: a server (any) delivers XML and XSL to a browser with an XSL engine (e.g. MS IE > 5.0), which produces HTML.]

Page 17: Introduction to XML 1.  The XML Language

Server Side Rendering

[Diagram: a server with an XSL engine (e.g. Apache/Tomcat/Cocoon) transforms XML with XSL into HTML, which is delivered to a browser (any).]

Page 18: Introduction to XML 1.  The XML Language

Client vs Server Stylesheets

Client side stylesheets are processed in the client
  XML is delivered to the client
  XSL/CSS must be supported by the client
  MS IE supports CSS & XSLT (non-standard in 5.x, mostly standard in 6.x)
  Netscape 7 & Mozilla support CSS and possibly XSLT via plugins.
Server side stylesheets are processed in the server
  XML is not delivered to the client; it is transformed, usually to HTML or PDF
  XSL/CSS must be supported by the server
  Cocoon is an Open Source project, implementing XSL as a Java servlet
  Any browser can then be used

Page 19: Introduction to XML 1.  The XML Language

Cocoon on Nottingham Servers

Any file placed in a directory called public_html is accessible within the Nottingham network with the url: http://www.cs.nott.ac.uk/~username/filename
Files with the .xml extension are automatically processed by Cocoon.
Providing that they have an XSL stylesheet and the correct Cocoon processing instructions they will be “transformed” into (usually) HTML.

Page 20: Introduction to XML 1.  The XML Language

A Simple XML Document

<?xml version="1.0" ?>

<booklist title="Some XML Books">

</booklist>

Page 21: Introduction to XML 1.  The XML Language

A Simple XML Document

<?xml version="1.0" ?>

<booklist title="Some XML Books">

</booklist>

Root element (one per document)

XML declaration

Page 22: Introduction to XML 1.  The XML Language

A Simple XML Document

<?xml version="1.0" ?>
<!DOCTYPE booklist SYSTEM "books.dtd" >

<booklist title="Some XML Books">

</booklist>

Page 23: Introduction to XML 1.  The XML Language

A Simple XML Document

<?xml version="1.0" ?>
<!DOCTYPE booklist SYSTEM "books.dtd" >

<booklist title="Some XML Books">

</booklist>

Define root element and specify DTD.

Page 24: Introduction to XML 1.  The XML Language

A Simple XML Document

<?xml version="1.0" ?>
<!DOCTYPE booklist SYSTEM "books.dtd" >
<!-- This is a comment -->

<booklist title="Some XML Books">

</booklist>

Page 25: Introduction to XML 1.  The XML Language

A Simple XML Document

<?xml version="1.0" ?>
<!DOCTYPE booklist SYSTEM "books.dtd" >
<!-- This is a comment -->

<booklist title="Some XML Books">

</booklist>

This is a comment (as SGML / HTML)

Page 26: Introduction to XML 1.  The XML Language

A Simple XML Document

<?xml version="1.0" ?>
<!DOCTYPE booklist SYSTEM "books.dtd" >
<!-- This is a comment -->
<?xml-stylesheet type="text/xsl" href="iti-xml2.xsl"?>

<booklist title="Some XML Books">

</booklist>

Page 27: Introduction to XML 1.  The XML Language

A Simple XML Document

<?xml version="1.0" ?>
<!DOCTYPE booklist SYSTEM "books.dtd" >
<!-- This is a comment -->
<?xml-stylesheet type="text/xsl" href="iti-xml2.xsl"?>

<booklist title="Some XML Books">

</booklist>

This defines the XSL stylesheet

Page 28: Introduction to XML 1.  The XML Language

A Simple XML Document

<?xml version="1.0" ?>
<!DOCTYPE booklist SYSTEM "books.dtd" >
<!-- This is a comment -->
<?xml-stylesheet type="text/xsl" href="books3.xsl"?>
<?cocoon-process type="xslt"?>

<booklist title="Some XML Books">

</booklist>

Page 29: Introduction to XML 1.  The XML Language

A Simple XML Document

<?xml version="1.0" ?>
<!DOCTYPE booklist SYSTEM "books.dtd" >
<!-- This is a comment -->
<?xml-stylesheet type="text/xsl" href="books3.xsl"?>
<?cocoon-process type="xslt"?>

<booklist title="Some XML Books">

</booklist>

This is a Cocoon processing directive (NB not standard XML, but required by Cocoon 1.7.4).

Page 30: Introduction to XML 1.  The XML Language

Adding Content

<booklist title="Some XML Books">
  <book>
    <author>
      <name>St. Laurent</name>
      <initial>S</initial>
    </author>
    <date>1998</date>
    <title edition="Second">XML: A Primer</title>
    <publisher>MIS Press</publisher>
    <website href="http://www.simonstl.com/xmlprim/" />
    <rating stars="4"/>
  </book>
</booklist>
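Once content is marked up this way, software can read it back by element name. The sketch below is one possible way to do so with Python's standard `xml.etree.ElementTree`; the summary format and the variable names are assumptions for illustration, and the XML is the slide's own book entry.

```python
import xml.etree.ElementTree as ET

BOOKLIST = """\
<booklist title="Some XML Books">
  <book>
    <author>
      <name>St. Laurent</name>
      <initial>S</initial>
    </author>
    <date>1998</date>
    <title edition="Second">XML: A Primer</title>
    <publisher>MIS Press</publisher>
    <website href="http://www.simonstl.com/xmlprim/" />
    <rating stars="4"/>
  </book>
</booklist>
"""

root = ET.fromstring(BOOKLIST)
list_title = root.get("title")

summaries = []
for book in root.findall("book"):
    name = book.findtext("author/name")        # nested element text
    initial = book.findtext("author/initial")
    year = book.findtext("date")
    title = book.find("title")
    stars = int(book.find("rating").get("stars"))  # attribute on an empty element
    summaries.append(
        f"{name}, {initial}. ({year}) {title.text}, "
        f"{title.get('edition')} edition, {stars} stars"
    )

print(list_title)
for line in summaries:
    print(line)
```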

Page 31: Introduction to XML 1.  The XML Language

Benefits of a DTD

DTDs are optional in XML
DTD allows validation of documents
DTD defines the application
  Vital for collaborative development
  IPR implications
DTD allows entity definitions (i.e. symbols, shortcuts, “foreign” characters etc.).
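As a small illustration of an entity used as a shortcut, the sketch below declares an entity in an internal DTD subset and lets the parser expand it. The entity name `pub` and the one-element document are hypothetical, chosen only for this example; Python's standard `xml.etree.ElementTree` expands internal entities but does not validate against the DTD.

```python
import xml.etree.ElementTree as ET

# The DOCTYPE carries an internal subset that defines the entity "pub".
WITH_ENTITY = """\
<?xml version="1.0" ?>
<!DOCTYPE book [
  <!ENTITY pub "MIS Press">
]>
<book><publisher>&pub;</publisher></book>
"""

book = ET.fromstring(WITH_ENTITY)
publisher = book.findtext("publisher")  # the entity reference is expanded

# Without the definition, the same reference is a syntax error.
try:
    ET.fromstring("<book><publisher>&pub;</publisher></book>")
    undefined_rejected = False
except ET.ParseError:
    undefined_rejected = True

print(publisher)
print(undefined_rejected)
```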

Page 32: Introduction to XML 1.  The XML Language

XML Namespaces

Namespaces are mechanisms to ensure that elements are unique
Namespaces in XML are optional
Consider the following:

<title>The Title</title>

<title text="The Title" />

<title>
  <text>The Title</text>
</title>

Page 33: Introduction to XML 1.  The XML Language

Ensuring uniqueness

Unique element names

<title-one>The Title</title-one>

<title-two text="The Title" />

<title-three>
  <text>The Title</text>
</title-three>

Unique attribute content

<title ns="one">The Title</title>

<title ns="two" text="The Title" />

<title ns="3">
  <text>The Title</text>
</title>

Page 34: Introduction to XML 1.  The XML Language

xmlns attribute

The xmlns attribute is used to declare namespaces
  This must be a URI

<title xmlns="http://www.cs.nott.ac.uk/~tjb/NSdemo-one">The Title</title>

<title xmlns="http://www.cs.nott.ac.uk/~tjb/NSdemo-two" text="The Title" />

<title xmlns="http://www.cs.nott.ac.uk/~tjb/NSdemo-three">
  <text>The Title</text>
</title>

Page 35: Introduction to XML 1.  The XML Language

Namespace Abbreviations

If an element doesn’t have a namespace defined it inherits that of its parent.

<demo xmlns="http://www.cs.nott.ac.uk/~tjb/NSdemo-two">
  <title text="The Title" />
</demo>

Where multiple namespaces are used together aliases may be declared

<demo xmlns:first="http://www.cs.nott.ac.uk/~tjb/NSdemo-one"
      xmlns:second="http://www.cs.nott.ac.uk/~tjb/NSdemo-two">
  <first:title>The Title</first:title>
  <second:title text="The Title" />
</demo>
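A hedged Python sketch of how a namespace-aware parser sees these two examples: the standard library's `xml.etree.ElementTree` reports each element name together with its namespace URI, written as `{uri}name`, so the two `title` elements stay distinct. The prefixes `a` and `b` in the lookup table are arbitrary assumptions; only the URIs have to match those in the document.

```python
import xml.etree.ElementTree as ET

ONE = "http://www.cs.nott.ac.uk/~tjb/NSdemo-one"
TWO = "http://www.cs.nott.ac.uk/~tjb/NSdemo-two"

# Default namespace: the child inherits it from its parent.
inherited = ET.fromstring(
    f'<demo xmlns="{TWO}"><title text="The Title" /></demo>'
)
inherited_child_tag = inherited[0].tag

# Aliased namespaces: two different "title" elements side by side.
aliased = ET.fromstring(
    f'<demo xmlns:first="{ONE}" xmlns:second="{TWO}">'
    "<first:title>The Title</first:title>"
    '<second:title text="The Title" />'
    "</demo>"
)
aliased_tags = [child.tag for child in aliased]

# Lookups use our own prefixes; the aliases in the document do not matter.
prefixes = {"a": ONE, "b": TWO}
first_title_text = aliased.find("a:title", prefixes).text
second_title_attr = aliased.find("b:title", prefixes).get("text")

print(inherited_child_tag)
print(aliased_tags)
```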

Page 36: Introduction to XML 1.  The XML Language

XML Namespaces

Namespaces in XML are optional
Namespaces ensure that elements are unique
In different contexts a given tag might mean different things - e.g. consider <BOOK>
  To me it might mean a book in a bibliography
  To a bookshop it might contain stock details
  To a travel agent it might contain information about flight bookings!
Namespaces attach unique labels to a given tag set.
  URLs are usually used as namespace labels.