Using <METS> and <MODS> to Create XML Standards-based Digital Library Applications
Morgan Cundiff & Nate Trail
Network Development and MARC Standards Office (NDMSO)
Library of Congress
XML is the lingua franca of the Web
» Web pages increasingly use XHTML
» Business use for data exchange/messaging
» Family of technologies can be leveraged
• XML Schema, XSLT, XPath, and XQuery
» Software tools widely available (open source)
• Storage, editing, parsing, validating, transforming and publishing XML – constantly and actively improved
» Microsoft Office 2003 supports XML as document format (WordML and ExcelML)
» Web 2.0 applications based on XML (AJAX, Semantic Web, Web Services, etc.)
XML (Extensible Markup Language)
“XML has become the de-facto standard for representing metadata descriptions of resources on the Internet.”
Dr. Jane Hunter, University of Queensland, Australia
Working towards MetaUtopia – A Survey of Current Metadata Research
Interoperability and Standards
“In moving from dispersed digital collections to interoperable digital libraries, the most important activity we need to focus on is standards… most important is the wide variety of metadata standards [including] descriptive metadata… administrative metadata…, structural metadata, and terms and conditions metadata…”
Dr. Howard Besser, New York University
The Next Stage: Moving from Isolated Digital Collections to Interoperable Digital Libraries
XML and Digital Libraries
» Family of XML data standards
• METS – Metadata Encoding and Transmission Standard
• MODS – Metadata Object Description Schema
• MIX – Metadata for Images in XML
• PREMIS – PREservation Metadata Implementation Strategies
• TEI – Text Encoding Initiative
• EAD – Encoded Archival Description
» METS Implementors
• Library of Congress, OCLC, RLG, California Digital Library (CDL), Harvard, Princeton, National Library of Portugal, National Library of Wales, Indiana University, Stanford, New York University, University of Göttingen, Oxford University, and more …
» METS Software Tools
• METS Toolkit & DRS METS Archive Tool (Dmart) for Audio Deposit (Harvard), 7train METS Generation Tool (CDL), MEX Authoring Tools (Das Bundesarchiv), ContentE (Biblioteca Nacional Digital, Portugal), METS Navigator (Indiana University DL Program), ResCarta Metadata Creation Tool (ResCarta Foundation), and more …
» METS listserv: 550 subscribers
XML at LC: A Historical Perspective
» 1995 – American Memory released (not XML-based)
» 1998 – XML 1.0 becomes W3C Recommendation
» 2002 – METS and MODS released
» 2002 – Digital Audio-Visual Preservation Prototyping Project (first use of METS, MODS, and MIX at LC)
» 2003 – Patriotic Melodies (first use of METS and MODS in production at LC; later added to I Hear America Singing)
» 2003 – Veterans History Project database released, MINERVA project (MODS)
» 2004 – I Hear America Singing released (since renamed to LC Presents)
» 2004 – Justice Blackmun Papers collection released
» 2006 – National Digital Newspaper Project as repository submission package at LC (LC and partners, first use of METS, MODS, MIX, and PREMIS together)
» 2006 – Ser2Dig (Digital Serials workgroup, METS for multi-volume monographs)
» 2006 – Draft METS profile for “article-level” historical newspapers
What is METS?
» Metadata Encoding and Transmission Standard
» An XML Schema for the purpose of creating XML document instances that express…
• the hierarchical structure of digital library objects
• the names and locations of the files that comprise the digital object
• the associated metadata (e.g., MODS)
» METS can be used as a tool for modeling real world objects, such as specific document types
What is MODS?
» Metadata Object Description Schema
» An XML Schema designed for expressing bibliographic data
• Can be viewed as an alternative to the MARC format
• Especially useful for XML-based digital library projects
• Can be used as an extension schema to METS
» Note to catalogers: MODS does not make you obsolete! The same knowledge and skills needed for traditional cataloging (AACR, controlled vocabularies, etc.) still apply. You will only need to learn a different syntax (i.e., different from MARC) for expressing bibliographic information in machine-readable form.
Structure of METS
» There are 7 sections in a METS document
<mets>
<metsHdr/> - METS header (document talks about itself)
<dmdSec/> - Descriptive metadata (MODS, etc.)
<amdSec/> - Administrative metadata (copyright info., etc.)
<fileSec/> - File section (names and locations of files)
<structMap/> - Structural map (relationships of the parts)
<structLink/> - Linking information
<behaviorSec/> - Binding executables/actions to object
</mets>
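The seven-section layout can be sketched in Python with the standard library's ElementTree. This is only an illustration of the top-level shape: the helper name `build_skeleton` is hypothetical, and the empty elements it produces would not pass METS schema validation, since several sections need attributes or children that are left out here.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"  # METS namespace URI
ET.register_namespace("mets", METS_NS)

# The seven top-level sections, in the order listed on the slide.
SECTIONS = ["metsHdr", "dmdSec", "amdSec", "fileSec",
            "structMap", "structLink", "behaviorSec"]


def build_skeleton():
    """Return a <mets:mets> element with one empty child per section.

    Hypothetical helper for illustration; the result shows the layout
    only and is not a schema-valid METS document.
    """
    root = ET.Element(f"{{{METS_NS}}}mets")
    for name in SECTIONS:
        ET.SubElement(root, f"{{{METS_NS}}}{name}")
    return root


skeleton = build_skeleton()
# Strip the "{namespace}" part of each tag to recover the local names.
child_names = [child.tag.split("}")[1] for child in skeleton]
print(child_names)
```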
Wrap Descriptive Metadata in METS
» Use <mdWrap> to embed descriptive metadata within a METS document
<mets>
  …
  <dmdSec>
    <mdWrap>
      <xmlData>
        <!-- insert metadata from different namespace here -->
      </xmlData>
    </mdWrap>
  </dmdSec>
  …
</mets>
» The metadata wrap section acts as a “socket” to hold metadata from other XML schemas or “vocabularies”
<dmdSec> with MODS Extension Schema
<mets:mets>
  …
  <mets:dmdSec>
    <mets:mdWrap>
      <mets:xmlData>
        <mods:mods></mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  …
</mets:mets>
» <mets:dmdSec> is the descriptive metadata section
» MODS data is contained inside the metadata wrap section
» Prefixes before element names identify the schema
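As a rough sketch of the wrapping pattern, the following Python builds a dmdSec that carries a small MODS record inside mdWrap/xmlData. The function name `wrap_mods` and the ID value `DMD1` are assumptions made for the example. The slide omits attributes for brevity; the sketch adds `MDTYPE="MODS"` on mdWrap and an `ID` on dmdSec because the METS schema expects them.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
# Registering prefixes makes the serialized output use mets: and mods:.
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)


def wrap_mods(title, dmd_id="DMD1"):
    """Build a minimal METS fragment with a MODS title inside <mdWrap>.

    Hypothetical helper; dmd_id is an arbitrary illustrative ID.
    """
    root = ET.Element(f"{{{METS_NS}}}mets")
    dmd = ET.SubElement(root, f"{{{METS_NS}}}dmdSec", {"ID": dmd_id})
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "MODS"})
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    # Everything below xmlData belongs to the MODS namespace.
    mods = ET.SubElement(xml_data, f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    return root


doc = wrap_mods("Bernstein conducts Beethoven")
serialized = ET.tostring(doc, encoding="unicode")
print(serialized)
```

The two namespace prefixes in the output play the role described above: they tell a reader, and a parser, which schema each element comes from.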
<dmdSec> with <mods:relatedItem>
<mets:mets>
  …
  <mets:dmdSec>
    <mets:mdWrap>
      <mets:xmlData>
        <mods:mods>
          <mods:relatedItem type="constituent">
            <mods:relatedItem type="constituent"></mods:relatedItem>
          </mods:relatedItem>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  …
</mets:mets>
The MODS relatedItem element can be nested and can be used to express a hierarchy.
<mods:mods>
  <mods:titleInfo>
    <mods:title>Bernstein conducts Beethoven</mods:title>
  </mods:titleInfo>
  <mods:name>
    <mods:namePart>Bernstein, Leonard</mods:namePart>
  </mods:name>
  <mods:relatedItem type="constituent">
    <mods:titleInfo>
      <mods:title>Symphony No. 5</mods:title>
    </mods:titleInfo>
    <mods:name>
      <mods:namePart>Beethoven, Ludwig van</mods:namePart>
    </mods:name>
    <mods:relatedItem type="constituent">
      <mods:titleInfo>
        <mods:partName>Allegro con moto</mods:partName>
      </mods:titleInfo>
    </mods:relatedItem>
    <mods:relatedItem type="constituent">
      <mods:titleInfo>
        <mods:partName>Adagio</mods:partName>
      </mods:titleInfo>
    </mods:relatedItem>
  </mods:relatedItem>
</mods:mods>
MODS relatedItem type="constituent"
» Child element to MODS
» relatedItem element uses MODS content model
• titleInfo, name, subject, physicalDescription, note, etc.
» Makes it possible to create rich analytics for contained works within a MODS record
» Repeatable and nestable recursively
• Making it possible to build a hierarchical tree structure
» Makes it possible to associate descriptive data with any structural element
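Because relatedItem nests recursively, a short recursive walk is enough to recover the tree. The sketch below is one way to do that in Python with ElementTree; the function names `label` and `outline` are hypothetical, and the sample is a trimmed copy of the Bernstein/Beethoven record with the name elements left out.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

# Trimmed copy of the slide's example record (name elements omitted).
SAMPLE = """
<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:titleInfo><mods:title>Bernstein conducts Beethoven</mods:title></mods:titleInfo>
  <mods:relatedItem type="constituent">
    <mods:titleInfo><mods:title>Symphony No. 5</mods:title></mods:titleInfo>
    <mods:relatedItem type="constituent">
      <mods:titleInfo><mods:partName>Allegro con moto</mods:partName></mods:titleInfo>
    </mods:relatedItem>
    <mods:relatedItem type="constituent">
      <mods:titleInfo><mods:partName>Adagio</mods:partName></mods:titleInfo>
    </mods:relatedItem>
  </mods:relatedItem>
</mods:mods>
"""


def label(node):
    """Return the title or part name from the node's own titleInfo."""
    for path in ("mods:titleInfo/mods:title", "mods:titleInfo/mods:partName"):
        found = node.find(path, NS)
        if found is not None and found.text:
            return found.text.strip()
    return "(untitled)"


def outline(node, depth=0):
    """Return (depth, label) pairs for a node and its constituents."""
    rows = [(depth, label(node))]
    # Only direct children are matched, so each level recurses by itself.
    for item in node.findall("mods:relatedItem[@type='constituent']", NS):
        rows.extend(outline(item, depth + 1))
    return rows


tree = outline(ET.fromstring(SAMPLE))
for depth, text in tree:
    print("  " * depth + text)
```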
METS 2 Hierarchies: Logical & Physical
<mets:mets>
  <mets:dmdSec>
    <mets:mdWrap>
      <mets:xmlData>
        <mods:mods>
          <mods:relatedItem>
            <mods:relatedItem></mods:relatedItem>
          </mods:relatedItem>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:fileSec></mets:fileSec>
  <mets:structMap>
    <mets:div>
      <mets:div></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>
» Hierarchy to represent “logical” structure (nested relatedItems)
» Hierarchy to represent “physical” structure (nested div elements)
Linking in METS Documents
(XML ID/IDREF links)
[Diagram: the parts of a METS document, connected by XML ID/IDREF links]
DescMD: mods, relatedItem, relatedItem
AdminMD: techMD (mix), sourceMD, digiprovMD, rightsMD
fileGrp: file, file
StructMap: div, div, fptr, div, fptr
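One of these links, from an fptr in the structMap to a file in the file section, can be followed with a few lines of Python. This is a minimal sketch: the sample document, its IDs, and the file names are invented for illustration, and `resolve_file_pointers` is a hypothetical helper rather than part of any METS toolkit.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Invented sample: two files, each referenced from a page-level div.
SAMPLE = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="F1"><mets:FLocat LOCTYPE="URL" xlink:href="page1.tif"/></mets:file>
      <mets:file ID="F2"><mets:FLocat LOCTYPE="URL" xlink:href="page2.tif"/></mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div TYPE="book">
      <mets:div TYPE="page"><mets:fptr FILEID="F1"/></mets:div>
      <mets:div TYPE="page"><mets:fptr FILEID="F2"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""


def resolve_file_pointers(root):
    """Pair each fptr's FILEID with the href of the file it points at."""
    # Index every <mets:file> by its ID so each IDREF is a dict lookup.
    by_id = {f.get("ID"): f for f in root.iterfind(".//mets:file", NS)}
    resolved = []
    for fptr in root.iterfind(".//mets:fptr", NS):
        file_id = fptr.get("FILEID")
        target = by_id.get(file_id)
        flocat = target.find("mets:FLocat", NS) if target is not None else None
        # None marks a dangling reference or a file without FLocat.
        href = flocat.get(XLINK_HREF) if flocat is not None else None
        resolved.append((file_id, href))
    return resolved


links = resolve_file_pointers(ET.fromstring(SAMPLE))
print(links)
```

The same index-then-look-up approach would apply to the other IDREF attributes, such as DMDID pointing at descriptive metadata.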
What is a METS Profile?
» Description of a class of METS documents
• Provides document authors and programmers guidance to create and process conformant METS documents
» XML document using a schema
• Expresses the requirements that a METS document must satisfy
» “Data standard” in its own right
• A sufficiently explicit METS Profile may be considered a “data standard”
» METS Profiles are human-readable prose and not intended to be “machine actionable”
METS Profile Excerpt
» Recorded Event – structMap requirement
METS Profiles Used in LC Presents
» Sheet Music
» Musical Score (score, score and parts, or a set of parts only)
» Print Material (books, pamphlets, etc)
» Music Manuscript (score or sketches)
» Recorded Event (audio or video)
» PDF Document
» Bibliographic Record
» Photograph
» Compact Disc
» Collection
Multiple Inputs to Common Data Format
» Inputs: new digital objects, legacy database, harvest of American Memory objects
» Output: profile-based METS object
» A common data format for searching and display
Example 1: New Digital Object
» METS Musical Score Profile
» Library of Congress March by John Philip Sousa
» Musical score and parts
Example 2: New Digital Object
» METS Recorded Event Profile
» Juilliard String Quartet
» Sound Recording
Example 3: Legacy Database
» METS Bibliographic Record Profile
» Duke Ellington & His Orchestra (1962) [Motion Picture]
» Bibliographic Information
Convert database from Filemaker Pro to a single XML file.
XSLT stylesheet creates 14,000 METS/MODS records.
XSL-FO stylesheet creates single PDF document.
Example 4: American Memory Harvest
» METS Photograph Profile
» William P. Gottlieb Collection: Portrait of Louis Armstrong
» Photographic object
Convert file of 1600 MARC records, using marc4j, to XML modsCollection (single file).
Use XSLT stylesheet to create 1600 records conforming to the METS photograph profile.
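The slides describe this batch step as an XSLT stylesheet. As a stand-in, the sketch below shows the same idea in plain Python: read one modsCollection file and emit one METS wrapper per MODS record. It is not the authors' stylesheet; `split_collection` and the `MODS1`, `MODS2` ID pattern are assumptions, the second sample record is a placeholder, and the output holds only a dmdSec, so it is far from a complete profile-conformant record.

```python
import copy
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)

# Tiny stand-in for the 1600-record file; second record is a placeholder.
COLLECTION = """
<mods:modsCollection xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:mods>
    <mods:titleInfo><mods:title>Portrait of Louis Armstrong</mods:title></mods:titleInfo>
  </mods:mods>
  <mods:mods>
    <mods:titleInfo><mods:title>Second photograph (placeholder)</mods:title></mods:titleInfo>
  </mods:mods>
</mods:modsCollection>
"""


def split_collection(collection_xml):
    """Return one METS element per <mods:mods> child of the collection."""
    collection = ET.fromstring(collection_xml)
    documents = []
    records = collection.findall(f"{{{MODS_NS}}}mods")
    for number, record in enumerate(records, start=1):
        mets = ET.Element(f"{{{METS_NS}}}mets")
        dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": f"MODS{number}"})
        wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "MODS"})
        data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
        # Copy the record so the source collection is left untouched.
        data.append(copy.deepcopy(record))
        documents.append(mets)
    return documents


documents = split_collection(COLLECTION)
for mets in documents:
    print(ET.tostring(mets, encoding="unicode"))
```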
Logical & Physical Relationships
Logical (MODS)
<mods:mods ID="ver01">
  <mods:titleInfo>
    <mods:title>Original Work</mods:title>
  </mods:titleInfo>
  <mods:relatedItem type="otherVersion" ID="ver02">
    <mods:titleInfo>
      <mods:title>Derivative Work 1</mods:title>
    </mods:titleInfo>
  </mods:relatedItem>
  <mods:relatedItem type="otherVersion" ID="ver03">
    <mods:titleInfo>
      <mods:title>Derivative Work 2</mods:title>
    </mods:titleInfo>
  </mods:relatedItem>
</mods:mods>
Physical (METS structMap)
<mets:structMap>
  <mets:div TYPE="photo:photoObject" DMDID="MODS1">
    <mets:div TYPE="photo:version" DMDID="ver01">
      <mets:div TYPE="photo:image">
        <mets:fptr FILEID="FN10081"/>
      </mets:div>
    </mets:div>
    <mets:div TYPE="photo:version" DMDID="ver02">
      <mets:div TYPE="photo:image">
        <mets:fptr FILEID="FN10090"/>
      </mets:div>
      <mets:div TYPE="photo:version" DMDID="ver03">
        <mets:div TYPE="photo:image">
          <mets:fptr FILEID="FN1009F"/>
        </mets:div>
      </mets:div>
    </mets:div>
  </mets:div>
</mets:structMap>
» The mods:mods and mods:relatedItem type="otherVersion" elements create a sequence of 3 nodes
» The div TYPE="photo:version" elements correspond to the 3 nodes through a logical sequence of ID to DMDID relationships
Validation in METS Profiles
» 3 levels of validation for METS objects
» Validation of XML (well-formed)
» Validation of METS/MODS (XML Schema)
» Validation of METS Profile
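Two of the three levels can be sketched with Python's standard library alone. Level 1 is a well-formedness check; level 3 is shown as a single made-up profile rule (every photo:version div must carry a DMDID that resolves to an ID in the document), which stands in for whatever a real profile requires. Level 2, XML Schema validation, needs a schema-aware validator such as Xerces, which the standard library does not provide, so it is left out. The function names and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}


def check_well_formed(text):
    """Level 1: return the parsed root, or None if not well-formed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


def check_profile_rule(root):
    """Level 3 (illustrative rule only): list problems with version divs."""
    known_ids = {el.get("ID") for el in root.iter() if el.get("ID")}
    problems = []
    for div in root.iterfind(".//mets:div", NS):
        if div.get("TYPE") != "photo:version":
            continue
        # DMDID may hold several space-separated references.
        refs = (div.get("DMDID") or "").split()
        if not refs:
            problems.append("version div without DMDID")
        for ref in refs:
            if ref not in known_ids:
                problems.append(f"unresolved DMDID: {ref}")
    return problems


GOOD = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:dmdSec ID="ver01"/>
  <mets:structMap><mets:div TYPE="photo:version" DMDID="ver01"/></mets:structMap>
</mets:mets>"""
BAD = GOOD.replace('DMDID="ver01"', 'DMDID="ver99"')
BROKEN = "<mets:mets><unclosed></mets:mets>"

for name, text in (("GOOD", GOOD), ("BAD", BAD), ("BROKEN", BROKEN)):
    root = check_well_formed(text)
    if root is None:
        print(name, "is not well-formed")
    else:
        print(name, check_profile_rule(root))
```

Running the levels in order matters: a document that fails an earlier level cannot be meaningfully checked at a later one.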
Example 1: Aggregation
» METS Song Collection Object
» Hierarchy of METS documents
Collection members include sheet music, an audio recording, a manuscript, and a biography of the composer.
Example 2: Aggregation
» MODS relatedItem type="host"
» memberOf: Baseball sheet music
Objects can be related to a virtual aggregate, in this case “Baseball sheet music”.
Example 3: Aggregation
» “See also” reference
» MODS relatedItem (no type)
Example: Administrative Metadata
» PREMIS and MIX for digital images
Software/Tools for METS/MODS
» Emacs – text editor (used to edit MODS)
» nxml-mode – plug-in for schema-aware XML editing
» XML Schemas for METS, MODS, MIX, PREMIS
» Cygwin – bash shell command line and tools
» Saxon – XSLT transformations
» Xerces – XML validation
» mysql-jdbc-connector – connect to MySQL
» SRU – retrieve records from ILS
» Cocoon – facilities to retrieve and load records, retrieve XML version of a file system, etc.
» Ant – used to automate all of the above tasks and create pipelines of multiple tasks (runs from Emacs)
Advantages of METS/MODS Approach
» Ability to model complex library objects
» Ease of change and extension
• Both the data and the application
» Use of modern, non-proprietary software tools
» Use of XSLT for…
• Legacy data conversion
• Batch METS creation and editing
• Web displays and behaviors
» Use of a common syntax – XML
• For data creation, editing, storage and searching
» Creation of multiple outputs from XML
• HTML/XHTML for Web display; PDF for printing
» Ease of editing
• Single records or selected batches of records
» Ability to validate data
» Ability to aggregate disparate data sources
» Ease of data management and publishing
» Excellent positioning for the future
• New web applications (Web 2.0)
• Repository submission and OAI harvesting
• Cooperative projects (test interoperability)