Intro to XML in libraries
Kyle Banerjee
Why do libraries use XML?
• Easy to share information
• Strict syntax and human readability make it easy to work with
• Create any structure you need
• Many tools for all operating systems
• Schema support
• Namespace support
Disadvantages
• Requires an external application
• Verbose
• Inefficient
• Picky – everything stops when data is not well formed
• No intrinsic data types
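The last two points are easy to see in practice. The sketch below assumes Python's standard `xml.etree.ElementTree` module (the slides do not name a tool) and uses a made-up `is_well_formed` helper and sample strings: one mismatched tag is enough to make the parser refuse the whole document, and element content always comes back as text that the application must convert itself.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<inventory><book><title>My Dog</title></book></inventory>"
# <title> is closed by </book>: a single mismatch makes the whole document unusable.
bad = "<inventory><book><title>My Dog</book></inventory>"

print(is_well_formed(good))
print(is_well_formed(bad))

# No intrinsic data types: the parser hands back a string, never a number.
count = ET.fromstring("<count>42</count>").text
print(type(count).__name__, int(count) + 1)
```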
Encoded Archival Description (EAD)
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
NISO Circulation Interchange Protocol (NCIP)
<!DOCTYPE NCIPMessage PUBLIC "-//NISO//NCIP DTD Version 1.0//EN" "http://www.niso.org/ncip/v1_0/imp1/dtd/ncip_v1_0.dtd">
<NCIPMessage version="http://www.niso.org/ncip/v1_0/imp1/dtd/ncip_v1_0.dtd">
  <LookupUserResponse>
    <ResponseHeader>
      <FromAgencyId>
        <UniqueAgencyId>
          <Scheme>http://136.181.125.166:6601/IRCIRCD?target=get_scheme_values&amp;scheme=UniqueAgencyId</Scheme>
          <Value>zv229</Value>
        </UniqueAgencyId>
      </FromAgencyId>
      <ToAgencyId>
        <UniqueAgencyId>
          <Scheme>http://136.181.125.166:6601/IRCIRCD?target=get_scheme_values&amp;scheme=UniqueAgencyId</Scheme>
          <Value>melir</Value>
        </UniqueAgencyId>
      </ToAgencyId>
    </ResponseHeader>
… [rest of entry deleted]
MARCXML
<record xmlns="http://www.loc.gov/MARC21/slim">
<leader>00000cas a2200000 4500</leader>
<controlfield tag="001">1798471</controlfield>
<controlfield tag="008">750909d19722001sw qx p ob 0 a0eng</controlfield>
<datafield ind1=" " ind2=" " tag="010">
<subfield code="a">75640778</subfield>
</datafield>
<datafield ind1=" " ind2=" " tag="022">
<subfield code="a">0105-0397</subfield>
<subfield code="l">0105-0397</subfield>
<subfield code="2">1</subfield>
</datafield>
… [rest of record deleted]
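Because the record declares a default namespace, a program has to name that namespace when it looks for fields. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the record is a shortened copy of the one above, closed by hand for illustration, and the `marc` prefix is an arbitrary label chosen here.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the record above; the closing </record> is added for illustration.
MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000cas a2200000 4500</leader>
  <controlfield tag="001">1798471</controlfield>
  <datafield ind1=" " ind2=" " tag="022">
    <subfield code="a">0105-0397</subfield>
  </datafield>
</record>"""

# "marc" is only a local label; what matters is the namespace URI it maps to.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

record = ET.fromstring(MARCXML)

# Without the namespace the lookup finds nothing.
unqualified = record.find("controlfield")

control_number = record.find("marc:controlfield[@tag='001']", NS).text
issn = record.find("marc:datafield[@tag='022']/marc:subfield[@code='a']", NS).text

print(unqualified, control_number, issn)
```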
Dublin Core (DC)
<qdc:qualifieddc xmlns:qdc="http://epubs.cclrc.ac.uk/xmlns/qdc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://epubs.cclrc.ac.uk/xmlns/qdc/ http://epubs.cclrc.ac.uk/xsd/qdc.xsd">
<dc:creator>Huntington, C. L.</dc:creator>
<dc:title>Horseshoe Bend near Wolf Creek, Southern Pacific Railroad, Shasta Route</dc:title>
<dc:date>1908-00-00</dc:date>
<dc:date>1900-1909</dc:date>
<dc:subject>Railroad tracks; Forests; Railroad locomotives</dc:subject>
<dc:coverage>Josephine County (Ore.)</dc:coverage>
<dc:type>Image</dc:type>
<dc:source>Postcards</dc:source>
<dc:source>Gerald W. Williams Collection</dc:source>
<dc:title>Umpqua Album</dc:title>
<dcterms:isPartOf>WilliamsG:Horseshoe Bend</dcterms:isPartOf>
… [rest of record deleted]
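Note that Dublin Core elements may repeat: this record has two `dc:date` and two `dc:title` values. One way to handle that, sketched here with Python's standard library, is to collect every value into a list per element name. The record is a shortened copy of the one above (the `xsi:schemaLocation` attribute is dropped and the root element is closed by hand), and the `fields` dictionary is purely illustrative.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened copy of the record above, closed by hand for illustration.
QDC = """<qdc:qualifieddc xmlns:qdc="http://epubs.cclrc.ac.uk/xmlns/qdc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:creator>Huntington, C. L.</dc:creator>
  <dc:title>Horseshoe Bend near Wolf Creek, Southern Pacific Railroad, Shasta Route</dc:title>
  <dc:date>1908-00-00</dc:date>
  <dc:date>1900-1909</dc:date>
  <dc:title>Umpqua Album</dc:title>
  <dcterms:isPartOf>WilliamsG:Horseshoe Bend</dcterms:isPartOf>
</qdc:qualifieddc>"""

DC = "http://purl.org/dc/elements/1.1/"

# Map each simple Dublin Core element name to the list of all its values.
fields = defaultdict(list)
for child in ET.fromstring(QDC):
    # ElementTree spells qualified names as "{namespace-uri}localname".
    if child.tag.startswith("{" + DC + "}"):
        local_name = child.tag.split("}", 1)[1]
        fields[local_name].append(child.text)

for name, values in fields.items():
    print(name, values)
```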
Search / Retrieve via URL (SRU)
And enough other stuff to blow your mind
• RDF
• Darwin Core
• VRA Core
• MODS
• MADS
• PBCore
• Webapps and other cool stuff
XML is not a language
• It’s a grammar that specifies a structure for exchanging information
• XML cannot do anything by itself
• When most people talk about XML, they are actually referring to a family of related technologies
• Don’t confuse XML (a data structure standard) with content standards such as AACR2R/RDA, DACS, LCNAF, LCSH, MeSH, and AAT
Interpreting XML
• Common methods are Document Object Model (DOM) and Simple API for XML (SAX)
• DOM is more common and far more powerful. Best for smaller files and documents
• SAX is much faster and requires much less memory. Best for large files
DOM vs. SAX
XML Document
<?xml version="1.0"?>
<inventory>
  <book>
    <title>My Dog</title>
  </book>
  <book>
    <title>My Cat</title>
  </book>
</inventory>
DOM (tree structure)
inventory
  book
    title
      My Dog
  book
    title
      My Cat
SAX (linear events)
Start document
Start element: inventory
Start element: book
Start element: title
Characters: My Dog
End element: title
End element: book
Start element: book
Start element: title
Characters: My Cat
End element: title
End element: book
End element: inventory
End document
DOM basics
• Platform independent way to represent and interact with XML documents
• All nodes and relationships are accessible
• Great for generating and displaying documents (e.g. EAD), interpreting messages (e.g. NCIP, OAI-PMH)
• Must load entire document into memory – terrible for transferring millions of records
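As a rough illustration of these points, the sketch below uses Python's standard `xml.dom.minidom` (one DOM implementation among many; the slides do not prescribe one) on the small inventory document. The whole document is parsed into memory first, after which any node can be reached, walked upward from, or extended. The added "My Fish" book is invented for the example.

```python
from xml.dom import minidom

XML = """<?xml version="1.0"?>
<inventory>
  <book><title>My Dog</title></book>
  <book><title>My Cat</title></book>
</inventory>"""

# The entire document is read into an in-memory tree before anything else happens.
doc = minidom.parseString(XML)

# Every node is reachable: collect the text inside each <title>.
titles = [t.firstChild.data for t in doc.getElementsByTagName("title")]

# Relationships are reachable too: walk from a title up to its parent element.
parent = doc.getElementsByTagName("title")[0].parentNode.tagName

# The tree can also be modified, which suits generating documents.
new_book = doc.createElement("book")
new_title = doc.createElement("title")
new_title.appendChild(doc.createTextNode("My Fish"))  # invented sample value
new_book.appendChild(new_title)
doc.documentElement.appendChild(new_book)

print(titles, parent, len(doc.getElementsByTagName("book")))
```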
SAX (Simple API for XML)
• Not formally defined by a standards body (a de facto standard, unlike the W3C DOM)
• Relies on events – detects beginnings/ends of elements, attributes, etc.
• Does not require loading file into memory
• Great for extracting info from large files but awkward for interpreting documents
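The event model can be sketched with Python's standard `xml.sax` module. The `EventLogger` class is a hypothetical handler written for this example; it records each event as the parser reports it, without ever building a tree of the document.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records each SAX event in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("Start document")

    def endDocument(self):
        self.events.append("End document")

    def startElement(self, name, attrs):
        self.events.append(f"Start element: {name}")

    def endElement(self, name):
        self.events.append(f"End element: {name}")

    def characters(self, content):
        # Skip whitespace-only text; a parser may also deliver long text in several pieces.
        if content.strip():
            self.events.append(f"Characters: {content.strip()}")


XML = (
    b"<inventory>"
    b"<book><title>My Dog</title></book>"
    b"<book><title>My Cat</title></book>"
    b"</inventory>"
)

handler = EventLogger()
xml.sax.parseString(XML, handler)

for event in handler.events:
    print(event)
```

For a large file, `xml.sax.parse` accepts a filename or file object in place of the in-memory bytes used here, so the handler sees the events as the file is read.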
Common Alternatives to XML
XML Document
<?xml version="1.0"?>
<inventory>
  <book>
    <title>My Dog</title>
  </book>
  <book>
    <title>My Cat</title>
  </book>
</inventory>
JSON
{"inventory": {"book": [{"title": "My Dog"}, {"title": "My Cat"}]}}
Delimited
Inventory
Item type    Title
book         My Dog
book         My Cat
Why Delimited or JSON?
• Delimited
– Easiest to parse
– Works great with tabular data
– Not good for arbitrary and nested structures
• JSON
– Much simpler and easier to use
– Bad for situations where markup languages are appropriate (e.g. documents)
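To make the comparison concrete, here is a small sketch that turns the inventory document into both alternatives using only Python's standard library. The tab delimiter and the "Item type" and "Title" column names are assumptions made for the example. Since a JSON object cannot usefully repeat the key "book", the books are written as a list.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

XML = """<inventory>
  <book><title>My Dog</title></book>
  <book><title>My Cat</title></book>
</inventory>"""

root = ET.fromstring(XML)
titles = [book.findtext("title") for book in root.findall("book")]

# JSON: nested structure survives, with the repeated <book> elements as a list.
as_json = json.dumps({"inventory": {"book": [{"title": t} for t in titles]}})

# Delimited: one flat row per item (tab-separated here by assumption).
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["Item type", "Title"])
for title in titles:
    writer.writerow(["book", title])
as_delimited = buffer.getvalue()

print(as_json)
print(as_delimited)
```

The flat rows work here only because every item has the same two fields; a record with nested or repeating parts would not fit the table without extra conventions.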
XML = Data Duct Tape
• Very useful and is here to stay
• Best uses are documents, messaging, and data transport
• Can be used for almost anything, but is sometimes not a good choice
XML and Life after MARC
• Use of XML will expand as the role of the traditional catalog wanes
• Expect growth as libraries need to provide access to a greater variety of resources
• XML will be critical as linked data becomes more common
What You Should Do Now
• Be aware of what XML is
• Know what it is good for
• Learn specifics on an as-needed basis
Thank You!
Kyle Banerjee