ITR3 lecture 2: XML Thomas Krichel 2002-10-16. Structure URIs (we will come back to them in lecture...


ITR3 lecture 2: XML

Thomas Krichel

2002-10-16

Structure

• URIs (we will come back to them in lecture 3)

• XML

• Sofix xml example

Literature

• Castro, Elizabeth (2001) “XML for the World Wide Web” Peachpit Press

• RFC 2396

• http://openlib.org/home/krichel/lis900gp02i

Uniform Resource Identifiers URI

• A Uniform Resource Identifier (URI) is a compact string of characters for identifying an abstract or physical resource.

• They provide a simple and extensible means for identifying a resource.

Universal concept of “resource”

• A resource can be anything that has identity. Not all resources are network “retrievable”.

• The resource identifier identifies a resource, not necessarily the state the resource is in at a particular point in time.

Benefits of uniformity

• it allows different types of resource identifiers to be used in the same context, even when the mechanisms used to access those resources may differ

• it allows uniform semantic interpretation of common syntactic conventions across different types of resource identifiers

Benefits of extensibility

• it allows the introduction of new types of resource identifiers without interfering with the way that existing identifiers are used

• it allows the identifiers to be reused in many different contexts, thus permitting new applications or protocols to leverage a pre-existing, large, and widely-used set of resource identifiers.

Transcribability

The URI syntax was designed with global transcribability as one of its main concerns.

• A URI is a sequence of characters, not a sequence of bytes.

• A URI may be transcribed from a non-network source, and thus should consist of characters that are most likely to be able to be typed into a computer.

• A URI often needs to be remembered by people, and it is easier for people to remember a URI when it consists of meaningful components.

Therefore it has a restricted set of characters, only US ASCII.
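
The restriction to US ASCII can be illustrated with a small sketch in Python. It uses the standard library function urllib.parse.quote to rewrite other characters as %XX escapes. The helper name to_ascii_uri_path and the sample path are made up for illustration, and the use of UTF-8 before escaping is Python's default rather than something RFC 2396 prescribes.

```python
from urllib.parse import quote, urlsplit


def to_ascii_uri_path(text):
    """Percent-encode characters that may not appear literally in a URI path."""
    # quote() encodes the text as UTF-8 bytes (Python's default, an assumption
    # here) and writes each unsafe byte as %XX, so only US-ASCII remains.
    return quote(text, safe="/")


encoded = to_ascii_uri_path("/home/caf\u00e9 menu")
print(encoded)            # /home/caf%C3%A9%20menu
print(encoded.isascii())  # True

# A URI is built from meaningful components that people can remember.
parts = urlsplit("http://openlib.org/home/krichel")
print(parts.scheme, parts.netloc, parts.path)
```

The escaped form is harder to read but can be typed on any keyboard, which is the trade-off the syntax makes.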

XML

• Stands for eXtensible Markup Language

• It is a recommendation by the World Wide Web Consortium (W3C). It is a new (1998) markup language that will transport a lot of content over the Internet in the future.

• In terms of complexity, it sits between HTML and SGML.

Importance of XML

• XML will be, for the information industry, what the container is for international shipping.

• A uniform syntactic convention for the encoding of any piece of information expressed as textual data (i.e. as characters)

• The character set is Unicode; unless the document declares otherwise, the encoding is assumed to be UTF-8 (or UTF-16 when a byte order mark is present).

HTML and XML

• HTML comes with predefined tags such as HTML, HEAD, TITLE, BODY, H1, H2, P, UL, LI, IMG, A, EM, B etc

• XML allows you to define and use your own tags.

• XML has not yet replaced HTML. It lacks native support for images and links.

XML and SGML

• SGML is the Standard Generalized Markup Language, an ISO standard (ISO 8879:1986)

• Very complicated, to the extent that no software fully implementing it has ever been written

• The XML specification was written by SGML aficionados who were aware of its problems

Original design goals

• XML shall be straightforwardly usable over the Internet.

• XML shall support a wide variety of applications.

• XML shall be compatible with SGML.

• It shall be easy to write programs which process XML documents.

• The number of optional features in XML is to be kept to the absolute minimum, ideally zero.

• XML documents should be human-legible and reasonably clear.

• The XML design should be prepared quickly.

• The design of XML shall be formal and concise.

• XML documents shall be easy to create.

• Terseness in XML markup is of minimal importance.

Well-formed & valid XML

• Every piece of data that wants to be XML has to obey a set of rules. Otherwise it is just not XML.

• These rules ensure that the document is “well-formed”.

• In addition, the XML document may obey other rules; in that case it is called “valid”.
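
As a rough illustration of what "well-formed" means in practice, the following Python sketch asks the standard library parser xml.etree.ElementTree whether a string parses at all. The function name is_well_formed is an assumption made for this example. Note that this parser checks well-formedness only; it does not check validity against any additional rules.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # The parser refuses anything that breaks the basic rules of XML.
        return False
    return True


print(is_well_formed("<a><b>x</b></a>"))  # True
print(is_well_formed("<a><b>x</a></b>"))  # False: tags overlap
print(is_well_formed("<a>x"))             # False: missing closing tag
```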

XML element

• Syntax <name>contents</name>

• Where name is the name of the element and contents is the contents of the element.

• <name> is called the opening tag

• </name> is called the closing tag

• Examples

– <sex>F</sex>

– <story>Once upon a time there was…. </story>

• Element names are case-sensitive. They must start with a letter or “_”.

• Element names must not start with “xml” in any capitalization.
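
The naming rules on this slide can be sketched as a small check in Python. This is a simplified, ASCII-only approximation: real XML names may also contain many non-ASCII letters, and the colon is reserved for namespaces. The function name is_acceptable_element_name is invented for this example.

```python
import re

# Simplified pattern (an assumption for this sketch): an ASCII letter or
# underscore first, then ASCII letters, digits, dots, hyphens or underscores.
_ASCII_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_acceptable_element_name(name):
    """Apply the slide's rules to an ASCII element name."""
    if not _ASCII_NAME.fullmatch(name):
        return False
    # Names starting with "xml" in any capitalization are reserved.
    return not name.lower().startswith("xml")


for candidate in ["story", "_note", "2nd", "XmlThing"]:
    print(candidate, is_acceptable_element_name(candidate))
```

Because names are case-sensitive, <Story> and <story> are different elements even though both pass this check.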

Attributes to XML elements

• Are name/value pairs that further qualify element contents

• Syntax <name attribute_name="attribute_value">contents</name>

• Examples

– <temperature unit="F">64</temperature>

– <swearword language="fr">con</swearword>

• Attribute names have to obey the same rules as element names.

• Attribute values must be surrounded by single or double quotes.
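
A short Python sketch shows how software reading the temperature example sees the attribute and the contents separately. It uses xml.etree.ElementTree from the standard library; the helper names read_temperature and parses are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET


def read_temperature(xml_text):
    """Return (value, unit) from a <temperature unit="...">value</temperature> element."""
    element = ET.fromstring(xml_text)
    # The attribute qualifies the contents: 64 means little without its unit.
    return float(element.text), element.get("unit")


def parses(xml_text):
    """Return True if the parser accepts the text."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(read_temperature('<temperature unit="F">64</temperature>'))  # (64.0, 'F')
print(parses("<temperature unit='F'>64</temperature>"))            # True: single quotes
print(parses("<temperature unit=F>64</temperature>"))              # False: no quotes
```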

Empty elements

• Elements that are empty may be written as <name/>. This is a shorthand for <name></name>.

• Empty elements may have attributes.

• Example:

– <grade value='A'/>
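
That the short form is only a shorthand can be checked with a small Python sketch: both spellings of the grade element produce the same parsed result in xml.etree.ElementTree. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<grade value='A'/>")
long_form = ET.fromstring("<grade value='A'></grade>")


def summary(element):
    """Collect the parts of an element that a reader of the XML can observe."""
    return element.tag, dict(element.attrib), element.text, len(element)


# Both forms yield the same tag, the same attributes, and no contents.
print(summary(short_form) == summary(long_form))  # True
print(short_form.get("value"))                    # A
```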

Processing instructions

• They are instructions to the software reading the XML.

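• General syntax is

<?target data?>

where target names the application the instruction is meant for. The data is often written as attribute-like pairs:

<?name attribute_name1="attribute_value1" attribute_name2="attribute_value2" …?>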
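
The following Python sketch reads a processing instruction back out of a document with xml.dom.minidom from the standard library. The xml-stylesheet instruction and the file name style.css are used here only as an illustrative assumption, and the function name first_processing_instruction is invented for the example.

```python
from xml.dom import minidom


def first_processing_instruction(xml_text):
    """Return (target, data) of the first processing instruction before the root."""
    document = minidom.parseString(xml_text)
    for node in document.childNodes:
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
            # The parser hands over the data as one string; splitting it into
            # attribute-like pairs is left to the application.
            return node.target, node.data
    return None


sample = '<?xml-stylesheet href="style.css" type="text/css"?><root/>'
print(first_processing_instruction(sample))
```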

Comments

• Start with <!--

• End with -->

• May not contain a double hyphen

• Comments may not be nested i.e. no comments inside other comments.

Nesting elements

• Elements are allowed to contain other elements.

• Elements that contain other elements are called parent elements.

• Elements that are contained in another element are children of that element.

• Elements must be properly nested, i.e. the closing tag of a child element must appear before the closing tag of its parent element.

Root and prolog

• There must be one root element that contains all other elements in the document.

• The prolog is what appears before the root element.

• The prolog may contain the XML declaration.

XML declaration

• The XML declaration looks like a processing instruction, although the XML specification treats it as a separate construct. It is written as

<?xml version="1.0"?>

• If the XML declaration is present, it must come at the very start of the document, before any whitespace or other content.

• You can declare your character encoding in the XML declaration, like

<?xml version="1.0" encoding="ucs-2"?>
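
The effect of the encoding declaration can be sketched in Python. The example below uses iso-8859-1 rather than ucs-2, purely as an assumption for illustration, because it is easy to produce with str.encode. The element name and its contents are made up.

```python
import xml.etree.ElementTree as ET

# The declaration tells the parser how to turn the bytes back into characters.
document = '<?xml version="1.0" encoding="iso-8859-1"?><name>caf\u00e9</name>'
raw_bytes = document.encode("iso-8859-1")

element = ET.fromstring(raw_bytes)
print(element.text == "caf\u00e9")  # True


def parses(data):
    """Return True if the parser accepts the data."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


# A declaration that is not at the very start is rejected.
print(parses(b' <?xml version="1.0"?><a/>'))  # False
print(parses(b'<?xml version="1.0"?><a/>'))   # True
```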

Quote special symbols

• & is written as &amp;

• < is written as &lt;

• > is written as &gt;

• " is written as &quot;

• ' is written as &apos;

• Example <story content="she pronounced the &quot;l-word&quot;"/>
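
In practice the quoting is usually left to a library. The sketch below uses xml.sax.saxutils.escape from the Python standard library to build the story example, then parses it again to confirm that the original text comes back. The function name story_element is an assumption for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def story_element(content):
    """Build an empty <story> element whose content attribute holds the text."""
    # escape() always rewrites &, < and >; the extra mapping asks it to
    # rewrite double quotes too, since the value sits inside double quotes.
    quoted = escape(content, {'"': "&quot;"})
    return f'<story content="{quoted}"/>'


original = 'she pronounced the "l-word"'
markup = story_element(original)
print(markup)  # <story content="she pronounced the &quot;l-word&quot;"/>

# Parsing undoes the quoting and returns the original characters.
print(ET.fromstring(markup).get("content") == original)  # True
```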

Document Type Definition DTD

• DTDs are a legacy SGML tool to further define and refine the contents of an XML document. XML can be defined by an SGML

• Still in use, mainly in legacy applications.

• Not covered here, because there are more powerful replacements.

Example application: sofix

• Sofix is an XML based cataloging format for classical music CDs.

• It is named after Sophie C. Rigny.

• It is a creation of Thomas Krichel.

• Used for teaching purposes only.

Key concepts in Sofix

• Item: an individual CD or a collection of CDs kept physically together (i.e. sold together)

• Work: a piece of music as recorded on a CD. For simplicity, we do not distinguish between composition and recording of that composition.

• Track: semantics associated with physical separation of tracks on the disk

Sofix in XML

<item>

<work>

<track>

</track>

</work>

</item>

Sofix general rules

• Record all titles in English. If no English title is provided, use a translation if it is obvious. If the translation is not obvious, use the original language.

• All personal names as Lastname, Firstname

• Translatable names in English.

Contents of <item>

<labelname>name of label</labelname>

<number>number of the CD</number>

(followed by the works on the CD)

Contents of <work>

• <title>title of the work</title>

• <compositionyear>year when work was composed</compositionyear>

• <recordingyear>year when the recording was made</recordingyear>

• <contributor role="contributor role">name of contributor</contributor>

• Possibly many contributors, followed by a series of tracks

Contributor roles

alto, alto_sax, bariton, bass, bassoon, chamber orchestra, cello, choir, choir_master, clarinett, composer, conductor, flute, french_horn, horn, oboe, orchestra, organ, piano, piano_trio, prepared_piano, recorder, soprano, speaker, string_orchestra, string_quartett, viola, violin, xylophone

Contents of <track>

• <title> full title as given on CD</title>

• <time> minutes:seconds</time>

where minutes and seconds are numbers.
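
To see how the pieces of Sofix fit together, here is a minimal Python sketch that reads one item and adds up the track times of each work. The record itself is invented for illustration: the label, number, titles, years, names and times are placeholders, not data from a real CD. The function names track_seconds and work_durations are likewise assumptions.

```python
import xml.etree.ElementTree as ET

# A made-up Sofix record; every value below is a placeholder.
SOFIX_SAMPLE = """\
<item>
  <labelname>Example Label</labelname>
  <number>EX-0001</number>
  <work>
    <title>Example Sonata</title>
    <compositionyear>1800</compositionyear>
    <recordingyear>1990</recordingyear>
    <contributor role="composer">Lastname, Firstname</contributor>
    <contributor role="piano">Doe, Jane</contributor>
    <track>
      <title>I. Allegro</title>
      <time>7:30</time>
    </track>
    <track>
      <title>II. Adagio</title>
      <time>5:05</time>
    </track>
  </work>
</item>
"""


def track_seconds(time_text):
    """Convert a minutes:seconds string into a number of seconds."""
    minutes, seconds = time_text.strip().split(":")
    return int(minutes) * 60 + int(seconds)


def work_durations(item_xml):
    """Map each work title in an item to the total length of its tracks."""
    item = ET.fromstring(item_xml)
    durations = {}
    for work in item.findall("work"):
        # findtext("title") looks at direct children only, so it returns the
        # title of the work and not the title of one of its tracks.
        total = sum(track_seconds(t.findtext("time")) for t in work.findall("track"))
        durations[work.findtext("title")] = total
    return durations


print(work_durations(SOFIX_SAMPLE))  # {'Example Sonata': 755}
```

The nesting does the bookkeeping here: a track belongs to the work that contains it, so no separate identifiers are needed to link them.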

http://openlib.org/home/krichel

Thank you for your attention!