Fergus Fahey - DRI/ARA(I) Training: Introduction to EAD - Introduction to XML
INTRODUCTION TO XML
Fergus Fahey – Training officer ARA(I)
Format of Workshop
• Description of XML features
• Practical exercise
What is XML
• XML stands for EXtensible Markup Language.
• XML was designed to store and transport data.
• XML was designed to be both human- and machine-readable.
• XML is a software- and hardware-independent tool for storing and transporting data.
• “XML does not DO anything”
• Very widely used to store and share data:
• By libraries, to share bibliographic data
• By software applications, e.g. podcast metadata
• By banks, e.g. to process Single Euro Payments Area (SEPA) payments
eXtensible Markup Language (XML)
XML Does Not DO Anything
<tramTicket>
  <type>return</type>
  <from>Central 1</from>
  <to>Red 2</to>
  <validUntil>Last Tram</validUntil>
  <date>31 Jul 06</date>
  <for>Adult</for>
  <on>Luas only</on>
  <timeIssued>21:15</timeIssued>
  <price>2.90</price>
  <number>6004375019</number>
</tramTicket>
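The ticket only holds data; some other program has to read it and act on it. A minimal Python sketch using the standard-library ElementTree module (the function name `describe_ticket` is hypothetical) that reads a few fields and builds a sentence from them:

```python
import xml.etree.ElementTree as ET

# The tram ticket from the slide. The XML itself only stores the data.
TICKET = """<tramTicket>
  <type>return</type>
  <from>Central 1</from>
  <to>Red 2</to>
  <validUntil>Last Tram</validUntil>
  <date>31 Jul 06</date>
  <for>Adult</for>
  <on>Luas only</on>
  <timeIssued>21:15</timeIssued>
  <price>2.90</price>
  <number>6004375019</number>
</tramTicket>"""


def describe_ticket(xml_text: str) -> str:
    """Parse the ticket and build a human-readable sentence from some fields."""
    root = ET.fromstring(xml_text)
    # findtext() returns the text of the first matching child element.
    return (
        f"{root.findtext('for')} {root.findtext('type')} ticket "
        f"from {root.findtext('from')} to {root.findtext('to')}, "
        f"price {root.findtext('price')}"
    )


print(describe_ticket(TICKET))
```

Any "doing" (printing, checking validity times, charging a fare) lives in the program, not in the XML.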
Before XML there was HTML, and before HTML there were formats such as MARC.
MARC record: raw fields, then the processed display
000 02617cam 22004931a 450
001 1197435
005 20030227130037.0
008 940923s1840 enkabcf 00 0 eng u
035 __ |a (UPRA)CTYXRL7078-B
035 __ |9 CAF1680YL
040 __ |c UPRA |d CtY-BR
043 __ |a n-us---
090 __ |a \Za W679\ |b +840s
100 1_ |a Willis, Nathaniel Parker, |d 1806-1867.
245 10 |a American scenery, or, Land, lake, and river illustrations of transatlantic nature : |b 246
246 30 |a Land, lake and river illustrations of transatlantic nature
260 __ |a London : |b George Virtue, |c 1840
Author: Willis, Nathaniel Parker, 1806-1867.
Title: American scenery, or, Land, lake, and river illustrations
of transatlantic nature : uniform with Dr. Beattie's
Switzerland, Scotland, & Waldenses / from drawings by
W.H. Bartlett, engraved in the first style of the art,
by R. Wallis, J. Cousen, Willmore, Brandard, Adlard,
Richardson, &c ; the literary department by N.P. Willis.
American scenery
Land, lake and river illustrations of transatlantic nature
Published: London : George Virtue, 1840
Description: 30 parts : ills., map, port. ; 29 cm.
Location: BEINECKE (Non-Circulating)
Call Number: 2003 +56
Library has: pt.1-pt.30
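The "Author:" display line is produced by a program that knows what MARC tag 100 (main entry, personal name) and its subfield codes mean. A rough Python sketch, assuming the space-separated tag, two-character indicators and `|`-delimited subfields shown in the record above; `parse_marc_line` is a hypothetical helper, not a real MARC library, and it handles data fields only (not control fields such as 001):

```python
def parse_marc_line(line: str) -> dict:
    """Split one display-formatted MARC data field into tag, indicators and subfields."""
    tag, indicators, rest = line.split(" ", 2)
    subfields = []
    # Text before the first '|' is empty, so skip it.
    for chunk in rest.split("|")[1:]:
        code, value = chunk[0], chunk[1:].strip()
        subfields.append((code, value))
    return {"tag": tag, "indicators": indicators, "subfields": subfields}


field = parse_marc_line("100 1_ |a Willis, Nathaniel Parker, |d 1806-1867.")
# Join subfield values ($a name, $d dates) to rebuild the display line.
author = " ".join(value for _, value in field["subfields"])
print("Author:", author)
```

The meaning of each field is carried by numeric codes, which is why the raw record needs processing before a person can read it.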
HTML: HyperText Markup Language
• HTML was designed to display data, with the focus on how data looks (unlike the MARC example).
• HTML has predefined tags:
• <b> for bold
• <p> for paragraph
• HTML tags relate to the layout and appearance of text, data and images.
• HTML is permissive, i.e. a browser will still render a page that includes invalid tags.
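The contrast between a tolerant HTML parser and a strict XML parser can be seen in a short Python sketch. Python's `html.parser` stands in for a browser here (an analogy, not a browser engine); the markup and the `TagCollector` class are made up for illustration:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Invalid markup: <b> and <blink2> are never closed, and <blink2> is not an HTML tag.
BROKEN = "<p>The <b>cat sat on the <blink2>mat</p>"


class TagCollector(HTMLParser):
    """Record every start tag; the HTML parser does not reject bad markup."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(BROKEN)
collector.close()
print("HTML parser read tags:", collector.tags)

# The XML parser stops with an error at the first mismatched end tag.
try:
    ET.fromstring(BROKEN)
    xml_error = None
except ET.ParseError as err:
    xml_error = str(err)
print("XML parser error:", xml_error)
```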
HTML
<html>
  <p>
    The <b>cat</b> sat on the
    <i>mat</i>
  </p>
  <img src="catonmat.jpg"/>
</html>
Displayed as: The cat sat on the mat (“cat” in bold, “mat” in italics)
XML
<animal type="cat">
  <name>Felix</name>
  <colour>white</colour>
  <state>seated</state>
  <surface>mat</surface>
  <attire>Dickie bow</attire>
  <mood>Happy</mood>
</animal>
The Difference Between XML and HTML
• The XML language has no predefined tags
• The tags in the Luas ticket example above (like <to> and <price>) are not defined in any XML standard. These tags are "invented" by the author of the XML document.
• HTML works with predefined tags like <p>, <b>, <img>,
etc.
• With XML, the author must define both the tags and the
document structure.
• XML Separates Data from Presentation
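Because XML has no predefined tags, a program can create whatever element names the author chooses. A short sketch that rebuilds the cat example with Python's ElementTree (variable names are illustrative only):

```python
import xml.etree.ElementTree as ET

# None of these tag names come from a standard: the author invents them.
animal = ET.Element("animal", {"type": "cat"})
for tag, text in [("name", "Felix"), ("colour", "white"), ("state", "seated"),
                  ("surface", "mat"), ("attire", "Dickie bow"), ("mood", "Happy")]:
    ET.SubElement(animal, tag).text = text

# Serialise to a string; nothing here says how the data should look on screen.
xml_text = ET.tostring(animal, encoding="unicode")
print(xml_text)
```

The output contains only data and structure; any bold, italics or layout would have to be added separately when the data is displayed.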
XML Tree
<eu>  (root element)
  <memberstate>  (element)
    <name>  (element)  Text: Belgium
    <area>  (element)  Text: 30,528
    <population>  (element)  Text: 11,190,845
    <headOfstate>  (element, with attribute "type")
      <lastName>  (element)  Text: Saxe-Coburg-Gotha
      <firstName>  (element)  Text: Philippe
    <capital>  (element)
      <name>  (element)  Text: Brussels
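A parser turns the document into exactly this kind of tree of elements, attributes and text. A small Python sketch that walks the tree recursively and prints one indented line per element (the `outline` function is hypothetical, written for this illustration):

```python
import xml.etree.ElementTree as ET

EU = """<eu>
  <memberstate>
    <name>Belgium</name>
    <area>30,528</area>
    <population>11,190,845</population>
    <headOfstate type="Constitutional Monarch">
      <lastName>Saxe-Coburg-Gotha</lastName>
      <firstName>Philippe</firstName>
    </headOfstate>
    <capital>
      <name>Brussels</name>
    </capital>
  </memberstate>
</eu>"""


def outline(element, depth=0):
    """Return one indented line per element: tag, attributes and any text."""
    label = element.tag
    if element.attrib:
        label += f" {element.attrib}"
    text = (element.text or "").strip()  # ignore whitespace-only text
    if text:
        label += f": {text}"
    lines = ["  " * depth + label]
    for child in element:  # iterating over an element yields its child elements
        lines.extend(outline(child, depth + 1))
    return lines


lines = outline(ET.fromstring(EU))
print("\n".join(lines))
```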
XML Syntax
• XML documents must contain one root element that is the parent of all other
elements
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
XML Syntax example
<memberstate>
  <name>Belgium</name>
  <area>30,528</area>
  <population>11,190,845</population>
  <headOfstate type="Constitutional Monarch">
    <lastName>Saxe-Coburg-Gotha</lastName>
    <firstName>Philippe</firstName>
  </headOfstate>
  <capital>
    <name>Brussels</name>
    <population AdministrativeDivision="Capital Region">1,138,854</population>
  </capital>
</memberstate>
XML Elements
• An XML element is everything from (including) the element's start tag to (including) the element's end tag.
<population>11,190,845</population>
• An element can contain:
• text
• attributes
• other elements
• or a mix of the above
<capital>
<name>Brussels</name>
<population AdministrativeDivision="Capital Region">1,138,854</population>
</capital>
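The different kinds of element content map onto separate properties in Python's ElementTree: child elements by iteration, text via `.text`, attributes via `.get()`. Mixed content (text and elements interleaved) also uses `.tail`, the text that follows a child element. A sketch using the capital example plus the HTML sentence from earlier, parsed as XML:

```python
import xml.etree.ElementTree as ET

capital = ET.fromstring(
    "<capital><name>Brussels</name>"
    '<population AdministrativeDivision="Capital Region">1,138,854</population>'
    "</capital>"
)
population = capital.find("population")
print([child.tag for child in capital])            # other elements
print(population.text)                             # text
print(population.get("AdministrativeDivision"))    # attribute

# Mixed content: .text is the text before the first child,
# and each child's .tail is the text that follows it.
sentence = ET.fromstring("<p>The <b>cat</b> sat on the <i>mat</i></p>")
print(repr(sentence.text), [(c.tag, c.text, c.tail) for c in sentence])
print("".join(sentence.itertext()))                # all text in document order
```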
XML Attributes
• Attributes are designed to contain data related to a specific element.
<headOfstate type="Constitutional Monarch">
<lastName>Saxe-Coburg-Gotha</lastName>
<firstName>Philippe</firstName>
</headOfstate>
The same information stored as a child element instead of an attribute:
<headOfstate>
<type>Constitutional Monarch</type>
<lastName>Saxe-Coburg-Gotha</lastName>
<firstName>Philippe</firstName>
</headOfstate>
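Both versions hold the same information, but a program has to know which form is used: attributes are read with `.get()`, child elements with `findtext()`. A sketch with a hypothetical helper that accepts either form:

```python
import xml.etree.ElementTree as ET

AS_ATTRIBUTE = """<headOfstate type="Constitutional Monarch">
  <lastName>Saxe-Coburg-Gotha</lastName>
  <firstName>Philippe</firstName>
</headOfstate>"""

AS_ELEMENT = """<headOfstate>
  <type>Constitutional Monarch</type>
  <lastName>Saxe-Coburg-Gotha</lastName>
  <firstName>Philippe</firstName>
</headOfstate>"""


def head_of_state_type(xml_text: str):
    """Return 'type' whether it is stored as an attribute or as a child element."""
    element = ET.fromstring(xml_text)
    # .get() reads an attribute (None if absent); findtext() reads a child's text.
    return element.get("type") or element.findtext("type")


print(head_of_state_type(AS_ATTRIBUTE))
print(head_of_state_type(AS_ELEMENT))
```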
XML Namespaces
• In XML, element names are defined by the developer. This often results in a conflict when trying to mix XML documents from different XML applications.
• This XML carries HTML table information:
<table>
  <tr>
    <td>Apples</td><td>Bananas</td>
  </tr>
</table>
• This XML carries information about a table (a piece of furniture):
<table>
  <name>African Coffee Table</name>
  <width>80</width>
  <length>120</length>
</table>
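XML Namespaces
• Prefixes such as h: and f: keep the two kinds of table apart. Each prefix must be bound to a namespace URI with an xmlns attribute; the URI is only a unique identifier and does not have to point to a real web page (the furniture URI below is a made-up example).
<h:table xmlns:h="http://www.w3.org/1999/xhtml">
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>
<f:table xmlns:f="http://example.com/furniture">
  <f:name>African Coffee Table</f:name>
  <f:width>80</f:width>
  <f:length>120</f:length>
</f:table>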
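A namespace-aware parser identifies each element by its namespace URI, not by the prefix. A Python sketch, assuming the XHTML namespace for the HTML table and a made-up `http://example.com/furniture` URI for the furniture; both tables are wrapped in a hypothetical `<root>` so the document has a single root element:

```python
import xml.etree.ElementTree as ET

DOC = """<root xmlns:h="http://www.w3.org/1999/xhtml"
      xmlns:f="http://example.com/furniture">
  <h:table><h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr></h:table>
  <f:table>
    <f:name>African Coffee Table</f:name><f:width>80</f:width><f:length>120</f:length>
  </f:table>
</root>"""

# Map the prefixes used in search paths to the same namespace URIs.
NS = {"h": "http://www.w3.org/1999/xhtml", "f": "http://example.com/furniture"}

root = ET.fromstring(DOC)
cells = [td.text for td in root.findall("h:table/h:tr/h:td", NS)]
furniture = root.find("f:table", NS)
print(cells)
print(furniture.findtext("f:name", namespaces=NS))
print(furniture.tag)  # ElementTree stores the tag as {namespace-URI}localname
```

Using an undeclared prefix (for example `<h:table/>` with no `xmlns:h`) makes the parser raise a ParseError.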
Validating XML: syntax rules
• XML documents must have a root element
• XML elements must have a closing tag
• XML tags are case sensitive
• XML elements must be properly nested
• XML attribute values must be quoted
<eu>
….
</eu>
Wrong: <lastName>Mattarella</Lastname>
Right: <lastName>Mattarella</lastName>
<eu>
<headOfstate type="Non executive President">
<eu>
<country>
<headOfstate type="Non executive President">
Wrong: <population AdministrativeDivision=Capital Region>
Right: <population AdministrativeDivision="Capital Region">
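Each syntax rule can be tested by handing a broken and a corrected fragment to a parser: the parser refuses anything that is not well formed. A Python sketch with one hypothetical pair of small fragments per rule (`is_well_formed` is a helper written for this example):

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the standard-library XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# (rule, broken fragment, corrected fragment)
CASES = [
    ("one root element", "<eu/><eu/>", "<eu><country/><country/></eu>"),
    ("closing tag", "<eu><lastName>Mattarella</eu>",
     "<eu><lastName>Mattarella</lastName></eu>"),
    ("case sensitive", "<lastName>Mattarella</Lastname>",
     "<lastName>Mattarella</lastName>"),
    ("proper nesting", "<eu><country></eu></country>", "<eu><country></country></eu>"),
    ("quoted attributes",
     "<population AdministrativeDivision=Capital Region>1,138,854</population>",
     '<population AdministrativeDivision="Capital Region">1,138,854</population>'),
]

for rule, broken, fixed in CASES:
    print(f"{rule}: broken={is_well_formed(broken)} fixed={is_well_formed(fixed)}")
```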
Validating XML – DTD
• An XML document with correct syntax is called "Well Formed".
• An XML document validated against a DTD is both "Well Formed" and "Valid".
• An XML parser only knows what is valid if you tell it, e.g. it doesn’t know that a member state has a head of state but a capital does not.
• These rules are written in a DTD (Document Type Definition), either as a separate .dtd file or, as below, inside the document’s DOCTYPE declaration.
<!DOCTYPE eu [
  <!ELEMENT eu (memberstate*)>
  <!ELEMENT memberstate (name,area,population,headOfstate,capital)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT area (#PCDATA)>
  <!ELEMENT headOfstate (lastName,firstName)>
  <!ELEMENT capital (name,population)>
  <!ELEMENT firstName (#PCDATA)>
  <!ELEMENT lastName (#PCDATA)>
  <!ELEMENT population (#PCDATA)>
  <!ATTLIST headOfstate type CDATA "0">
  <!ATTLIST population AdministrativeDivision CDATA "0">
]>
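Python's standard-library XML parsers check well-formedness only; they do not validate against a DTD (a validating tool such as xmllint or lxml would read the DTD itself). To show what one DTD rule means, here is a hand-written stand-in for the `memberstate` content model. It is a toy check of that single rule, not a DTD validator, and `check_memberstate` is a hypothetical name:

```python
import xml.etree.ElementTree as ET

# Stand-in for: <!ELEMENT memberstate (name,area,population,headOfstate,capital)>
MEMBERSTATE_CHILDREN = ["name", "area", "population", "headOfstate", "capital"]


def check_memberstate(xml_text: str) -> list:
    """Return a list of problems; an empty list means this one rule is satisfied."""
    problems = []
    root = ET.fromstring(xml_text)  # raises ParseError if not well formed
    for state in root.iter("memberstate"):
        found = [child.tag for child in state]
        if found != MEMBERSTATE_CHILDREN:  # exact children, in exact order
            problems.append(
                f"{state.findtext('name')}: expected {MEMBERSTATE_CHILDREN}, got {found}"
            )
    return problems


valid = ("<eu><memberstate><name>Belgium</name><area>30,528</area>"
         "<population>11,190,845</population><headOfstate/><capital/>"
         "</memberstate></eu>")
invalid = "<eu><memberstate><name>Belgium</name><capital/></memberstate></eu>"

print(check_memberstate(valid))
print(check_memberstate(invalid))
```

Both documents are well formed; only the second breaks the rule, which is exactly the difference between "Well Formed" and "Valid".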
Three types of error
• Not well formed: missing closing tag, tags not matching, tags not nested correctly
• Not valid: doesn’t comply with the DTD rules
• Information is wrong: XML will not spot this in most circumstances, though it may spot it if the information doesn’t comply with a rule.
• Won’t spot:
<lastName>O’Higgins</lastName>
<firstName>Michael D.</firstName>
• Might spot (if expecting alphabetic characters only):
<lastName>O’Higgins</lastName>
<firstName>Michael D.</firstName>
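A DTD cannot restrict which characters appear inside #PCDATA, so the "alphabetic characters only" rule is applied here in Python instead. The rule and the helper names are hypothetical; the point is that a rule catches only data that breaks it, while a wrong but plausible value still gets through:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule for illustration: a surname may contain the letters A-Z only.
LETTERS_ONLY = re.compile(r"[A-Za-z]+")


def breaks_surname_rule(xml_text: str) -> bool:
    """True if <lastName> contains anything other than the letters A-Z / a-z."""
    last_name = ET.fromstring(xml_text).findtext("lastName") or ""
    return LETTERS_ONLY.fullmatch(last_name) is None


def head_of_state(last_name: str) -> str:
    """Build a small headOfstate fragment around the given surname."""
    return (f"<headOfstate><lastName>{last_name}</lastName>"
            f"<firstName>Michael D.</firstName></headOfstate>")


print(breaks_surname_rule(head_of_state("O'Higgins")))  # apostrophe breaks the rule
print(breaks_surname_rule(head_of_state("Smith")))      # wrong, but letters only
print(breaks_surname_rule(head_of_state("Higgins")))    # correct, passes
```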
XML and XSLT
• XSLT is one of a number of technologies used to process XML.
• In our example we will use XSLT to pick out individual XML elements and use HTML to display them in a web browser.
• In my experience, writing XSLT is not easy: it is more difficult than any other programming language I’ve used.
• The good news: you don’t necessarily have to use XSLT to use XML or EAD.
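Python's standard library does not include an XSLT processor, so the following is not XSLT; it is a rough Python illustration of the same idea: pick out individual XML elements and wrap them in HTML for a browser, keeping the data (XML) separate from the presentation (HTML). `to_html` is a hypothetical name:

```python
import xml.etree.ElementTree as ET
from html import escape

MEMBER_STATE = """<memberstate>
  <name>Belgium</name>
  <population>11,190,845</population>
  <capital><name>Brussels</name></capital>
</memberstate>"""


def to_html(xml_text: str) -> str:
    """Select a few elements and place them in an HTML page."""
    state = ET.fromstring(xml_text)
    # "name" matches only a direct child, so the capital's <name> is not picked up here.
    name = escape(state.findtext("name"))
    capital = escape(state.findtext("capital/name"))
    population = escape(state.findtext("population"))
    return (f"<html><body><h1>{name}</h1>"
            f"<p>Capital: {capital}</p><p>Population: {population}</p>"
            f"</body></html>")


print(to_html(MEMBER_STATE))
```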
Useful links
• W3Schools XML tutorial http://www.w3schools.com/xml/default.asp
• W3Schools XSLT tutorial http://www.w3schools.com/xsl/default.asp