Excellent XML – systems interoperability at the Wellcome Library
![Page 1: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/1.jpg)
Excellent XML – systems interoperability at the Wellcome
Library
EIUG 11th Conference, Stirling University
1 & 2 September 2005
Margaret Savage-Jones
![Page 2: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/2.jpg)
Wellcome Library Systems
Millennium - Innovative Interfaces Inc.
http://catalogue.wellcome.ac.uk Includes online requesting
from closed stack since mid 2003
Calm - Archive system – DS Ltd http://archives.wellcome.ac.uk
Online access to archive & mss holdings
Miro/MedPhoto image system – System Simulation Ltd
http://medphoto.wellcome.ac.uk
Online access to over 100,000 images, image retrieval & delivery
![Page 3: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/3.jpg)
Underlying protocol: OAI-PMH
Open Archives Initiative Protocol for Metadata Harvesting - protocol for sharing and harvesting metadata between different OAI-compliant systems
Based on XML and HTTP
One system (CALM or MedPhoto) exposes metadata via an OAI repository. This metadata is harvested by the other system (Millennium) and then loaded
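As a rough illustration of the harvesting side, the sketch below reads the record headers out of an OAI-PMH `ListRecords` response using only the Python standard library. The sample response, its identifiers and the function name `harvest_headers` are invented for illustration; this is not how XML Harvester itself is implemented.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 response elements.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Made-up response for illustration; not real output from CALM or MedPhoto.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:rec-1</identifier>
        <datestamp>2005-05-12</datestamp>
      </header>
      <metadata><placeholder/></metadata>
    </record>
    <record>
      <header status="deleted">
        <identifier>oai:example.org:rec-2</identifier>
        <datestamp>2005-05-20</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""


def harvest_headers(response_xml):
    """Return identifier, datestamp and deleted flag for each harvested record."""
    root = ET.fromstring(response_xml)
    headers = []
    for header in root.iter(OAI_NS + "header"):
        headers.append({
            "identifier": header.findtext(OAI_NS + "identifier"),
            "datestamp": header.findtext(OAI_NS + "datestamp"),
            "deleted": header.get("status") == "deleted",
        })
    return headers


for entry in harvest_headers(SAMPLE_RESPONSE):
    print(entry)
```

The datestamp in each header is what makes date-based selective harvesting possible: a harvester can ask only for records created or modified in a given range.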
![Page 4: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/4.jpg)
Motivation
With a MARC21 catalogue, an ISAD(G) archive system & a bespoke image repository, it was a strategic objective to make these systems interoperate
Phase II of the Closed Stack project - Western Manuscripts and Archives had to be requestable online by summer 2004
XML Harvester development by Innovative with Michigan State University 2001-02. Wellcome placed an order for XML Harvester in January 2003
With CALM ver 4 it was possible to export EAD XML
![Page 5: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/5.jpg)
Benefits
Online requesting - Western MSS & Archives collections
One circulation system to manage and one set of circ stats
Same interface for all online requests from stack
Archives & manuscripts like other collections
Image sets for library objects displayed in Web OPAC
User can jump from one system to another
No need to rekey user search in other system
Selective harvesting for onward record updating
![Page 6: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/6.jpg)
Example: archive record (from Crick Coll.)
![Page 7: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/7.jpg)
Harvested archive record in Web OPAC
![Page 8: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/8.jpg)
Image of the archive item
![Page 9: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/9.jpg)
Encoded Archival Description (EAD)
Initially XML Harvester dealt only with EAD and needed
encodinganalogs for parsing. Developed with Michigan
State University (MSU), whose EAD finding aids had
MARC encodinganalogs. Harvester parser read these tags.
Encodinganalogs are attributes in XML records indicating the
field, subfield, indicators etc. in another descriptive encoding
system, e.g. the MARC21 equivalent of an EAD tagged element
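To make the idea concrete, here is a minimal Python sketch of reading encodinganalog attributes and turning them into MARC-style field data. It assumes the attribute value has the form tag, optional extra characters, `$`, subfield code (as in `85607$u`); how the real XML Harvester parser interprets the characters between the tag and the `$` is not stated in these slides, so they are returned uninterpreted. The sample XML and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two elements carrying ENCODINGANALOG attributes.
SAMPLE = """<hyperlink>
  <url ENCODINGANALOG="85607$u">http://example.org/xml/1.html</url>
  <text ENCODINGANALOG="85607$z">View full manuscript record</text>
</hyperlink>"""


def parse_encodinganalog(value):
    """Split e.g. '85607$u' into (tag, extra characters, subfield code)."""
    field, _, subfield = value.partition("$")
    return field[:3], field[3:], subfield


def fields_from_analogs(xml_text):
    """Collect (tag, extra, subfield, text) for every element with an analog."""
    root = ET.fromstring(xml_text)
    fields = []
    for element in root.iter():
        analog = element.get("ENCODINGANALOG")
        text = (element.text or "").strip()
        if analog and text:
            tag, extra, subfield = parse_encodinganalog(analog)
            fields.append((tag, extra, subfield, text))
    return fields


for field in fields_from_analogs(SAMPLE):
    print(field)
```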
![Page 10: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/10.jpg)
Archive system metadata
Hierarchical, tree structure with collection and component item
level records catalogued in General International Standard Archival
Description, ISAD(G)
Field export from CALM using the default EAD DTD subset left some
fields empty – had to export as “DServe Natural” XML, which
includes field tags. Catalog.xml output with catalog.DTD
![Page 11: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/11.jpg)
Pilot – used “Haddad” catalogue XML
Used a small set of 87 XML Arabic records – a local variant
of the ‘MASTER’ XML DTD – as a pilot to test XML Harvester
Used stylesheets to filter unwanted fields, add encodinganalogs
and put 87 .xml files in a web server directory ready to be
harvested
![Page 12: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/12.jpg)
![Page 13: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/13.jpg)
Web crawler
Harvester reaches the XML files through port 80.
We added a page to the Millennium screens directory
listing files with redirections to the web server folder.
Harvester opened the page, scanned for `HREF’ strings
which directed it to the XML records (file.xml)
The XML Harvester parser read tags from encodinganalogs
to create MARC21 records, writing to a file for loading
![Page 14: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/14.jpg)
Redirection screen
<html>
<head>
<title> Harvester Test</title>
</head>
<body>
<em>Mss Files</em><br>
<strong> Sample Screen # 2</strong>
<PRE>
Test to confirm if harvester can crawl files deposited on wtcalm01
</pre>
<A HREF=http://wtcalm01.wellcome.ac.uk/xml/002.xml>002</A>
<A HREF=http://wtcalm01.wellcome.ac.uk/xml/83.xml>83</A>
<A HREF=http://wtcalm01.wellcome.ac.uk/xml/82.xml>82</A>
</body>
</html>
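The crawl step described for this redirection screen can be sketched in Python with the standard library's `html.parser`: open the page, collect every `HREF` that points at an `.xml` file, and hand those URLs on for parsing. This is an illustrative stand-in, not the Harvester's own code; the class and function names are made up, and the `notes.html` link is added only to show filtering.

```python
from html.parser import HTMLParser

# Cut-down copy of the redirection screen, plus one non-XML link for contrast.
PAGE = """<html><body>
<A HREF=http://wtcalm01.wellcome.ac.uk/xml/002.xml>002</A>
<A HREF=http://wtcalm01.wellcome.ac.uk/xml/83.xml>83</A>
<a href="notes.html">notes</a>
</body></html>"""


class XmlLinkCollector(HTMLParser):
    """Collect href values of <a> tags that end in .xml."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().endswith(".xml"):
                self.links.append(value)


def xml_links(page_html):
    collector = XmlLinkCollector()
    collector.feed(page_html)
    collector.close()
    return collector.links


print(xml_links(PAGE))
```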
![Page 15: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/15.jpg)
Example – encodinganalogs for 856
<hyperlink>
<url ENCODINGANALOG="85607$u">
<xsl:text>http://wisdom.welcome.ac.uk/xml/</xsl:text>
<xsl:value-of select="substring-after(//idno,'WMS Arabic')"/>
<xsl:text>.html</xsl:text>
</url>
<text ENCODINGANALOG="85607$z">View full manuscript record</text>
</hyperlink>
![Page 16: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/16.jpg)
Harvested MARC21 “Haddad” record
![Page 17: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/17.jpg)
Links: to PDF and Request button
![Page 18: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/18.jpg)
Lessons
Arabic records would be loaded only once but records from
CALM would need regular reharvesting/overlay
Need a more sophisticated approach than crawling a web
directory – XML Harvester can harvest from OAI Repository and
use datestamps in OAI to harvest records created, or modified
in specified date range
XSLT could be used to transform records to MARC21 OAI
without using encodinganalogs.
![Page 19: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/19.jpg)
Archives OAI repository
Built on CALM server using freeware University of Illinois
Provider service tool (Runs under Windows IIS)
Other Requirements:
Microsoft 2000 server
Microsoft IIS ver 4 or higher
Microsoft ASP
Microsoft XML Parser (MSXML) 4.0
Microsoft ActiveX Data Objects and ODBC compliant datasource, i.e. MS Access 97+ database
Firewall access on port 80
![Page 20: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/20.jpg)
Key decisions
Metadata export – chose full CALM record XML DTD (not EAD)
Matchpoint – decided to load contents of Calm RefNo field to Millennium 001 indexed in `o’
Also had to consider:
Hierarchical record level to harvest
Navigation between the two systems
Millennium parameters
![Page 21: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/21.jpg)
Decision: Record level to harvest
A “Collection” could consist of more than 40 boxes. Must have
1:1 record relationship to make requesting and retrieval work
Decision to exclude archives Collection records & use Component
level records. Each of these represents 1 item (box, folder, piece)
and links to a single bib record with attached item for circulation
in Millennium
![Page 22: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/22.jpg)
Decision: Navigation
Archivists wanted the archives (CALM) interface to offer
the main search route for Western Archives & MSS
User is taken from CALM record into Millennium to place
their request then back to their CALM record to continue
browsing their hit list – two links were needed
Forward: runs cgi script to search Millennium for
corresponding bib record
Back: 856 with URL link (can be inserted by Harvester)
![Page 23: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/23.jpg)
Example: Links
Forward: cgi script runs search of Millennium `o’ index for
match on CALM RefNo value
http://catalogue.wellcome.ac.uk/search/o?SEARCH=PPCRI%2FA%2F1%2F2%2F8
Back: RefNo PP/CRI/A/1/2/8 built into OAI record URL linking
to CALM web front end - RefNo value built into search string
http://archives.wellcome.ac.uk/DServe/dserve.exe?&dsqIni=Dserve.ini&dsqApp=Archive&dsqCmd=show.tcl&dsqDb=Catalog&dsqPos=0&dsqSearch=((text)='PP/CRI/A/1/2/8')
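A small Python sketch of how the two link values could be assembled. The forward link simply percent-encodes the search key for the Millennium `o` index; the back link wraps the RefNo in the `dsqSearch` expression shown above. Note that the forward example searches on `PPCRI/A/1/2/8` while the RefNo is `PP/CRI/A/1/2/8`; the slides do not say how that key is derived, so the function below takes the key as given. Function names are illustrative only.

```python
from urllib.parse import quote

OPAC_SEARCH = "http://catalogue.wellcome.ac.uk/search/o"


def forward_link(search_key):
    """Build the Millennium 'o' index search URL for a given key."""
    # safe="" forces "/" to be encoded as %2F, as in the example URL.
    return OPAC_SEARCH + "?SEARCH=" + quote(search_key, safe="")


def back_link_search(ref_no):
    """Build the dsqSearch expression that finds a CALM record by RefNo."""
    return "((text)='" + ref_no + "')"


print(forward_link("PPCRI/A/1/2/8"))
print(back_link_search("PP/CRI/A/1/2/8"))
```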
![Page 24: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/24.jpg)
Calm XML export file
<?xml version="1.0" encoding="utf-8" ?>
<record>
  <DScribeRecord>
    <RecordType>Component</RecordType>
    <IDENTITY />
    <RefNo>MS4385/4404</RefNo>
    <AltRefNo>MS.4404</AltRefNo>
    <PreviousNumbers />
    <Title>Notes and extracts on Chemistry, Volumetric Analysis, (etc.)</Title>
    <Date>c. 1865</Date>
    <Level>Item</Level>
    <Extent>1 volume</Extent>
    <UserText5>Bentley House</UserText5>
    <Location />
    <UserText3>Western MSS series 3 - Requestable</UserText3>
    <UserWrapped9 />
    <UserText6 />
    <UserText7 />
![Page 25: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/25.jpg)
Mapping Calm XML to Marc21
Field tags used: 001, 008, 245, 260, 500, 506, 655, 856
And 949 to make the item. Harvester inserts a 99x tag with load
identification code e.g. CALM20040820225128
Found that Component records do not have `author’ which is
only held at Collection level – but not a problem
‘Mock’ bib and item records keyed to Millennium to:
- demonstrate navigation & agree content with team
- act as a benchmark when harvested records loaded
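The mapping step can be pictured with a short Python sketch. Only the RefNo to 001 mapping comes from these slides; the Title to 245 |a and Date to 260 |c pairings below are assumptions chosen for illustration, and the real mapping was done in XSLT rather than Python.

```python
import xml.etree.ElementTree as ET

# Calm field -> (MARC tag, subfield code or None for a control field).
FIELD_MAP = {
    "RefNo": ("001", None),  # stated in the slides (matchpoint)
    "Title": ("245", "a"),   # assumption for illustration
    "Date": ("260", "c"),    # assumption for illustration
}

# Closed-off fragment based on the export example in this deck.
SAMPLE = """<DScribeRecord>
  <RefNo>MS4385/4404</RefNo>
  <Title>Notes and extracts on Chemistry</Title>
  <Date>c. 1865</Date>
  <Location />
</DScribeRecord>"""


def calm_to_marc(record_xml):
    """Return sorted (tag, data) pairs for the mapped, non-empty Calm fields."""
    root = ET.fromstring(record_xml)
    fields = []
    for name, (tag, subfield) in FIELD_MAP.items():
        value = root.findtext(name)
        if not value:
            continue  # skip empty elements such as <Location />
        data = value if subfield is None else "|" + subfield + value
        fields.append((tag, data))
    return sorted(fields)


for tag, data in calm_to_marc(SAMPLE):
    print(tag, data)
```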
![Page 26: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/26.jpg)
XSLT – Extensible Stylesheet Language Transformations
Used XSLT to split the single XML output file into 48,000 component
.xml records using <DScribeRecord> as the record delimiter
and then transform them to MARC21 OAI records listed to
XML Harvester by our OAI repository
The OAI repository installed on the CALM staging server
uses the University of Illinois Provider service tool - freeware
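The splitting operation can be sketched as follows. This is a Python stand-in for the approach described on this slide, not the stylesheet or script actually used, and it assumes (hypothetically) that one export file holds many `<DScribeRecord>` elements under a single root, each with a `RefNo` child.

```python
import xml.etree.ElementTree as ET

# Assumed layout: several component records inside one export file.
EXPORT = (
    "<record>"
    "<DScribeRecord><RefNo>MS1/1</RefNo><Title>First</Title></DScribeRecord>"
    "<DScribeRecord><RefNo>MS1/2</RefNo><Title>Second</Title></DScribeRecord>"
    "</record>"
)


def split_records(export_xml, delimiter="DScribeRecord"):
    """Return {RefNo: standalone XML string} for each delimited record."""
    root = ET.fromstring(export_xml)
    documents = {}
    for record in root.iter(delimiter):
        ref_no = record.findtext("RefNo")
        # tostring() also emits trailing whitespace after the element; strip it.
        documents[ref_no] = ET.tostring(record, encoding="unicode").strip()
    return documents


for ref_no, xml_text in split_records(EXPORT).items():
    print(ref_no, "->", xml_text)
```

Each resulting string could then be written to its own `.xml` file and passed through the transformation to MARC21 OAI.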
![Page 27: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/27.jpg)
Millennium parameters
To cope with `open’ v `closed’ archive collections
– new codes were added to archives records and mapped to
new Millennium branch codes which would trigger Millcirc rules
New branch codes added to Request Rules, Determiner Table,
WWWOPTIONS, Locations served
New MATTYPE to exclude Western Mss and archives from the
Asian Mss scope
![Page 28: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/28.jpg)
Config file for archives record harvest
@LOGLEVEL=CONFIG
@DBNAME=CALM
@URL=http://wtcalm02/oai/oai.asp
@CREATEOVERLAYFROMURI=true
@9XXMARCTAG=991
@USEOAI=true
@DATE=20000606000000
@SHOWMETADATA=true
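For anyone scripting around such files, the `@KEY=value` layout is easy to read programmatically. The Python sketch below is a hypothetical helper, not part of the XML Harvester product, and it assumes the 14-digit `@DATE` value is a YYYYMMDDHHMMSS timestamp.

```python
from datetime import datetime

CONFIG = """@LOGLEVEL=CONFIG
@DBNAME=CALM
@URL=http://wtcalm02/oai/oai.asp
@CREATEOVERLAYFROMURI=true
@9XXMARCTAG=991
@USEOAI=true
@DATE=20000606000000
@SHOWMETADATA=true"""


def parse_harvester_config(text):
    """Turn '@KEY=value' lines into a dict, ignoring anything else."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("@") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line[1:].partition("=")
        settings[key] = value
    return settings


def parse_stamp(value):
    """Interpret a 14-digit value as YYYYMMDDHHMMSS (an assumption)."""
    return datetime.strptime(value, "%Y%m%d%H%M%S")


settings = parse_harvester_config(CONFIG)
print(settings["URL"], parse_stamp(settings["DATE"]))
```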
![Page 29: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/29.jpg)
Management interface for XML Harvester
![Page 30: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/30.jpg)
Archive record: Request link to Web OPAC
![Page 31: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/31.jpg)
Harvested archive record in Millennium
![Page 32: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/32.jpg)
Patron login screen to place request
![Page 33: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/33.jpg)
Confirmation of request
![Page 34: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/34.jpg)
Interoperation sought with image system
To integrate MedPhoto, a bespoke photo library system,
and Millennium for seamless display and ordering of images
MedPhoto holds images and records for more than 60,000 items
catalogued in Millennium – Iconographic collection, archives &
manuscripts, rare books etc.
Specific need for Millennium User to see images associated with
library objects
![Page 35: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/35.jpg)
Media management interface
![Page 36: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/36.jpg)
Config file for image URL harvest
@LOGLEVEL=CONFIG
@DBNAME=MEDPHOTO
@URL=http://aquarius.wellcome.ac.uk:6969/ixbin/hixserv
@RECID_MARCTAG=001
@CREATEOVERLAYFROMURI=true
@9XXMARCTAG=991
@USEOAI=true
@REQUIRE_EADID=false
@DATE=20000606000000
@OAIFROMDATE=20050701000000
@OAIUNTILDATE=20050731000000
@OAISET=bib
![Page 37: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/37.jpg)
Selective Harvesting – images
Harvest full “bib” set and load to Millennium populating 962s
then each month request list of all new image URLs created since
the last harvest with a Millennium .b number in their record.
<http://medphoto.wellcome.ac.uk:6969/ixbin/hixserv?verb=ListRecords&metadataPrefix=marc21&set=bib&from=2005-05-01&until=2005-05-31>
(for records in May)
<http://medphoto.wellcome.ac.uk:6969/ixbin/hixserv?verb=ListRecords&metadataPrefix=marc21&set=bib&from=2005-06-01&until=2005-06-30>
(for records in June and so on)
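The monthly request URLs follow a regular pattern, so they could be generated rather than typed. A minimal Python sketch, assuming the same base URL, `marc21` prefix and `bib` set as in the examples above; the function names are made up for illustration.

```python
import calendar
from urllib.parse import urlencode

BASE_URL = "http://medphoto.wellcome.ac.uk:6969/ixbin/hixserv"


def month_window(year, month):
    """Return (first day, last day) of a month as YYYY-MM-DD strings."""
    last_day = calendar.monthrange(year, month)[1]
    prefix = f"{year:04d}-{month:02d}-"
    return prefix + "01", prefix + f"{last_day:02d}"


def monthly_harvest_url(year, month, metadata_prefix="marc21", set_spec="bib"):
    """Build a ListRecords URL restricted to one calendar month."""
    start, end = month_window(year, month)
    query = urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_spec,
        "from": start,
        "until": end,
    })
    return BASE_URL + "?" + query


print(monthly_harvest_url(2005, 5))
print(monthly_harvest_url(2005, 6))
```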
![Page 38: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/38.jpg)
Harvesting: Image OAI repository
OAI repository built by SSL on MedPhoto server
Metadata matchpoint: the .b bib record no. is the common element
between Millennium and MedPhoto
XML Harvester selectively requests the record set “bib”, in which all
records have .b nos, parses the returned list of MARC21 OAI records
and creates a file of MARC records for loading
Matches on .b and overlays, inserting a 962 for each image
962 |u holds the URL for the thumbnail and |e holds the “launchpad” URL
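The match-and-overlay logic can be pictured with a toy Python model: bib records are keyed by .b number, and each harvested image record adds one 962 field to the bib it matches. The data structures, function name and example URLs here are hypothetical; in practice the overlay is done by Millennium's record loader, not by code like this.

```python
def overlay_images(bibs, harvested):
    """Add a 962 field per matched image; return the .b numbers that did not match.

    bibs: dict mapping .b number -> list of (tag, data) fields
    harvested: list of dicts with 'bib', 'thumb' and 'launchpad' keys
    """
    unmatched = []
    for image in harvested:
        fields = bibs.get(image["bib"])
        if fields is None:
            unmatched.append(image["bib"])
            continue
        # |u carries the thumbnail URL, |e the "launchpad" URL.
        fields.append(("962", "|u" + image["thumb"] + "|e" + image["launchpad"]))
    return unmatched


bibs = {".b12857890": [("245", "|aExample title")]}
harvested = [
    {"bib": ".b12857890", "thumb": "http://example.org/t/1", "launchpad": "http://example.org/l/1"},
    {"bib": ".b12857890", "thumb": "http://example.org/t/2", "launchpad": "http://example.org/l/2"},
    {"bib": ".b00000000", "thumb": "http://example.org/t/3", "launchpad": "http://example.org/l/3"},
]
print(overlay_images(bibs, harvested))
print(bibs[".b12857890"])
```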
![Page 39: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/39.jpg)
MARC21 record ready to load
File Name: DONE-MEDPHOTO_20050601192747.marc (411,392 bytes) Offset: 256 Blocks: 1 - 2
LEADER 00403nam a2200085uu 4500
DIRECTORY
001000900000 035001500009 856008000024 962018500104 991002800289
TAGS
1 000 00403nam a2200085uu 4500@
2 001 L0027751@
3 035 |a.b12857890@
4 856 4 1 |uhttp://medphoto.wellcome.ac.uk/ixbin/imageserv?MIDMIRO=L0027751|zView image@
5 962 |a000:000:URL:b0000000:000000:0:0:0:0:0:0|tImage|vn|uhttp://medphoto.wellcome.ac.uk/ixbin/hixclient.exe?MIROPAC=L0027751|ehttp://medphoto.wellcome.ac.uk/ixbin/imageserv?MIRO=L0027751@
6 991 |aMEDPHOTO{228}20050601192747@
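The leader and directory in this dump can be cross-checked with a few lines of Python. In the MARC21 exchange format each directory entry is 12 characters (3-digit tag, 4-digit field length, 5-digit starting position), leader positions 0-4 hold the record length and positions 12-16 the base address of data. The sketch below assumes the space-separated display used in the dump; in a raw record the entries run together.

```python
LEADER = "00403nam a2200085uu 4500"
DIRECTORY = "001000900000 035001500009 856008000024 962018500104 991002800289"


def parse_directory(directory):
    """Return (tag, field length, start offset) for each 12-character entry."""
    entries = []
    for chunk in directory.split():
        if len(chunk) != 12 or not chunk.isdigit():
            raise ValueError("bad directory entry: " + chunk)
        entries.append((chunk[:3], int(chunk[3:7]), int(chunk[7:])))
    return entries


entries = parse_directory(DIRECTORY)
record_length = int(LEADER[0:5])
base_address = int(LEADER[12:17])

# Leader (24) + directory entries + one field terminator gives the base address.
print(base_address == 24 + 12 * len(entries) + 1)
# Base address + all field lengths + one record terminator gives the record length.
print(record_length == base_address + sum(length for _, length, _ in entries) + 1)
for tag, length, start in entries:
    print(tag, length, start)
```

Both checks hold for this record, and each field starts exactly where the previous one ends, which is a quick way to confirm a harvested file is structurally sound before loading.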
![Page 40: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/40.jpg)
![Page 41: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/41.jpg)
Example: with |t default
![Page 42: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/42.jpg)
![Page 43: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/43.jpg)
“Launch pad”
We saw an opportunity for further integration – used an
intermediate screen – URL delivered by MedPhoto repository and
loaded to 962 |e
User can hotlink from this “launch pad” into image system
to register, use a light box, email, download or order the
image online from the image system before returning to
Web OPAC
![Page 44: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/44.jpg)
![Page 45: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/45.jpg)
![Page 46: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/46.jpg)
What we used
XML Harvester product (III)
OAI repository software
VBScript – for file splitting operation
Instant Saxon (command line XSLT processor)
Microsoft MSXML core services (e.g. ver 5)
Media Management for 962 (or load URLs to 856)
Three OAI-PMH compliant library systems
Shared Record IDs as matchpoints
Some experience of working with stylesheets
Some experience of load tables and record loading
![Page 47: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/47.jpg)
Work in progress
Harvesting legacy catalogues/XML for other Asian MSS
e.g. Iskander and Jain project (with Oxford University)
Complete testing and batch loading of 60,000 thumbnail and
“launchpad” URLs to 962s
Establish routines to manage updates for new, deleted
or amended records – utilise OAI-PMH selective harvesting
Further automation of routines where practicable
![Page 48: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/48.jpg)
Wish List/Enhancements
Global edit for 962 tag
More documentation for XML Harvester
Access to underlying harvester parameters e.g. for XSLT
processor and XML parser
Automation of selective harvesting for maintenance
![Page 49: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/49.jpg)
Useful links
XML http://www.w3.org/XML
EAD http://www.loc.gov/ead/
OAI software http://oai.grainger.uiuc.edu/projectinfo.htm
XSLT http://saxon.sourceforge.net/saxon6.4.3/instant.html
http://www.openarchives.org/OAI/openarchivesprotocol.html
http://www.openarchives.org/OAI/2.0/guidelines-marcxml.htm
OAI tutorial http://www.oaiforum.org/tutorial
OAI repository testing http://re.cs.uct.ac.za/
![Page 50: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/50.jpg)
Some example records
http://catalogue.wellcome.ac.uk/record=b1465521
http://catalogue.wellcome.ac.uk/record=b1580232
http://catalogue.wellcome.ac.uk/record=b1313568
http://catalogue.wellcome.ac.uk/record=b1613633
http://catalogue.wellcome.ac.uk/search/o?SEARCH=PPCRI%2FA%2F1%2F2%2F8
![Page 51: Excellent XML – systems interoperability at the Wellcome Library](https://reader036.fdocuments.in/reader036/viewer/2022062800/568140f0550346895dacbc89/html5/thumbnails/51.jpg)
Excellent XML: systems interoperability at the Wellcome Library
Thanks for your attention
Margaret Savage-Jones
Library Systems Administrator