Using <METS> and <MODS> to Create XML Standards-based Digital Library Applications

Morgan Cundiff & Nate Trail

Network Development and MARC Standards Office (NDMSO)

Library of Congress


XML is the lingua franca of the Web

» Web pages increasingly use XHTML

» Business use for data exchange/messaging

» Family of technologies can be leveraged

• XML Schema, XSLT, XPath, and XQuery

» Software tools widely available (open source)

• Storage, editing, parsing, validating, transforming and publishing XML – constantly and actively improved

» Microsoft Office 2003 supports XML as document format (WordML and ExcelML)

» Web 2.0 applications based on XML (AJAX, Semantic Web, Web Services, etc.)

XML (Extensible Markup Language)

“XML has become the de-facto standard for representing metadata descriptions of resources on the Internet.”

Dr. Jane Hunter, University of Queensland, Australia

Working towards MetaUtopia – A Survey of Current Metadata Research

Interoperability and Standards

“In moving from dispersed digital collections to interoperable digital libraries, the most important activity we need to focus on is standards… most important is the wide variety of metadata standards [including] descriptive metadata… administrative metadata…, structural metadata, and terms and conditions metadata…”

Dr. Howard Besser, New York University

The Next Stage: Moving from Isolated Digital Collections to Interoperable Digital Libraries

XML and Digital Libraries

» Family of XML data standards

• METS – Metadata Encoding and Transmission Standard

• MODS – Metadata Object Description Schema

• MIX – Metadata for Images in XML

• PREMIS – PREservation Metadata Implementation Strategies

• TEI – Text Encoding Initiative

• EAD – Encoded Archival Description

XML and Digital Libraries

» METS Implementors

• Library of Congress, OCLC, RLG, California Digital Library (CDL), Harvard, Princeton, National Library of Portugal, National Library of Wales, Indiana University, Stanford, New York University, University of Göttingen, Oxford University, and more …

» METS Software Tools

• METS Toolkit & DRS METS Archive Tool (Dmart) for Audio Deposit (Harvard), 7train METS Generation Tool (CDL), MEX Authoring Tools (Das Bundesarchiv), ContentE (Biblioteca Nacional Digital, Portugal), METS Navigator (Indiana University DL Program), ResCarta Metadata Creation Tool (ResCarta Foundation), and more …

» METS listserv: 550 subscribers

XML at LC: A Historical Perspective

» 1995 – American Memory released (not XML-based)

» 1998 – XML 1.0 becomes W3C Recommendation

» 2002 – METS and MODS released

» 2002 – Digital Audio-Visual Preservation Prototyping Project (first use of METS, MODS, and MIX at LC)

» 2003 – Patriotic Melodies (first use of METS and MODS in production at LC; later added to I Hear America Singing)

» 2003 – Veterans History Project database released, MINERVA project (MODS)

continued…

XML at LC: A Historical Perspective

» 2004 – I Hear America Singing released (since renamed to LC Presents)

» 2004 – Justice Blackmun Papers collection released

» 2006 – National Digital Newspaper Project as repository submission package at LC (LC and partners, 1st use of METS, MODS, MIX, PREMIS)

» 2006 – Ser2Dig (Digital Serials workgroup, METS for multi-volume monographs)

» 2006 – Draft METS profile for “article-level” historical newspapers

What is METS?

» Metadata Encoding and Transmission Standard

» An XML Schema for the purpose of creating XML document instances that express…

• the hierarchical structure of digital library objects

• the names and locations of the files that comprise the digital object

• the associated metadata (e.g., MODS)

» METS can be used as a tool for modeling real world objects, such as specific document types

What is MODS?

» Metadata Object Description Schema

» An XML Schema designed for expressing bibliographic data

• Can be viewed as an alternative to the MARC format

• Especially useful for XML-based digital library projects

• Can be used as an extension schema to METS


» Note to catalogers: MODS does not make you obsolete! The same knowledge and skills needed for traditional cataloging (AACR, controlled vocabularies, etc.) still apply. You will only need to learn a different syntax (i.e., different from MARC) for expressing bibliographic information in machine-readable form.

Structure of METS

» There are 7 sections in a METS document

<mets>

<metsHdr/> - METS header (document talks about itself)

<dmdSec/> - Descriptive metadata (MODS, etc.)

<amdSec/> - Administrative metadata (copyright info., etc.)

<fileSec/> - File section (names and locations of files)

<structMap/> - Structural map (relationships of the parts)

<structLink/> - Linking information

<behaviorSec/> - Binding executables/actions to object

</mets>
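The seven sections listed above can be sketched with Python's standard `xml.etree.ElementTree` module. This is only a minimal illustration: the helper names `build_skeleton` and `section_names` are hypothetical, and an empty skeleton like this one is not expected to validate against the METS schema (for example, a real structMap needs at least one div).

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# The seven top-level sections, in the order the slide lists them.
SECTIONS = ["metsHdr", "dmdSec", "amdSec", "fileSec",
            "structMap", "structLink", "behaviorSec"]


def build_skeleton():
    """Return a <mets:mets> element with one empty child per section."""
    root = ET.Element(f"{{{METS_NS}}}mets")
    for name in SECTIONS:
        ET.SubElement(root, f"{{{METS_NS}}}{name}")
    return root


def section_names(root):
    """List the local names of the top-level children (namespace stripped)."""
    return [child.tag.split("}", 1)[1] for child in root]


skeleton = build_skeleton()
print(section_names(skeleton))
```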

Wrap Descriptive Metadata in METS

» Use <mdWrap> to embed descriptive metadata within a METS document

<mets>
  …
  <dmdSec>
    <mdWrap>
      <xmlData>
        <!-- insert metadata from different namespace here -->
      </xmlData>
    </mdWrap>
  </dmdSec>
  …
</mets>

Metadata wrap section acts as “socket” to hold metadata from other XML schemas or “vocabularies”

<dmdSec> with MODS Extension Schema

<mets:mets>
  …
  <mets:dmdSec>
    <mets:mdWrap>
      <mets:xmlData>
        <mods:mods></mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  …
</mets:mets>

Descriptive metadata section

MODS data contained inside the metadata wrap section

Use of prefixes before element names to identify schema
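As a rough sketch of the wrapping pattern, the Python below builds a dmdSec that holds a MODS record inside mdWrap and xmlData, and shows how the two prefixes map to two namespaces. The helper names `q` and `wrap_mods` and the identifier `DMD1` are assumptions made for illustration; the title is borrowed from the Bernstein example later in the deck.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map; the prefixes identify which schema an element belongs to.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "mods": "http://www.loc.gov/mods/v3",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, local):
    """Build a namespace-qualified tag name such as {uri}dmdSec."""
    return f"{{{NS[prefix]}}}{local}"


def wrap_mods(mods_element, dmd_id):
    """Place a MODS record inside dmdSec > mdWrap > xmlData."""
    dmd = ET.Element(q("mets", "dmdSec"), {"ID": dmd_id})
    wrap = ET.SubElement(dmd, q("mets", "mdWrap"), {"MDTYPE": "MODS"})
    xml_data = ET.SubElement(wrap, q("mets", "xmlData"))
    xml_data.append(mods_element)
    return dmd


mods = ET.Element(q("mods", "mods"))
title_info = ET.SubElement(mods, q("mods", "titleInfo"))
ET.SubElement(title_info, q("mods", "title")).text = "Bernstein conducts Beethoven"

dmd_sec = wrap_mods(mods, "DMD1")  # "DMD1" is a made-up identifier
print(ET.tostring(dmd_sec, encoding="unicode"))
```

Because the socket is just an xmlData element, the same function shape would apply to other extension schemas by changing MDTYPE and the wrapped element.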

<dmdSec> with <mods:relatedItem>

<mets:mets>
  …
  <mets:dmdSec>
    <mets:mdWrap>
      <mets:xmlData>
        <mods:mods>
          <mods:relatedItem type="constituent">
            <mods:relatedItem type="constituent"></mods:relatedItem>
          </mods:relatedItem>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  …
</mets:mets>

The MODS relatedItem element can be nested and can be used to express a hierarchy.

<mods:mods>
  <mods:titleInfo>
    <mods:title>Bernstein conducts Beethoven</mods:title>
  </mods:titleInfo>
  <mods:name>
    <mods:namePart>Bernstein, Leonard</mods:namePart>
  </mods:name>
  <mods:relatedItem type="constituent">
    <mods:titleInfo>
      <mods:title>Symphony No. 5</mods:title>
    </mods:titleInfo>
    <mods:name>
      <mods:namePart>Beethoven, Ludwig van</mods:namePart>
    </mods:name>
    <mods:relatedItem type="constituent">
      <mods:titleInfo>
        <mods:partName>Allegro con moto</mods:partName>
      </mods:titleInfo>
    </mods:relatedItem>
    <mods:relatedItem type="constituent">
      <mods:titleInfo>
        <mods:partName>Adagio</mods:partName>
      </mods:titleInfo>
    </mods:relatedItem>
  </mods:relatedItem>
</mods:mods>

MODS relatedItem type="constituent"

» Child element to MODS

» relatedItem element uses MODS content model

• titleInfo, name, subject, physicalDescription, note, etc.

» Makes it possible to create rich analytics for contained works within a MODS record

» Repeatable and nestable recursively

• Making it possible to build a hierarchical tree structure

» Makes it possible to associate descriptive data with any structural element
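Because relatedItem nests recursively, a short recursive walk is enough to recover the tree. The sketch below uses a trimmed copy of the Bernstein record from the earlier slide; the function names `label` and `outline` are hypothetical, and only title and partName are read.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"

# Trimmed copy of the example record (name elements omitted for brevity).
RECORD = """\
<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:titleInfo><mods:title>Bernstein conducts Beethoven</mods:title></mods:titleInfo>
  <mods:relatedItem type="constituent">
    <mods:titleInfo><mods:title>Symphony No. 5</mods:title></mods:titleInfo>
    <mods:relatedItem type="constituent">
      <mods:titleInfo><mods:partName>Allegro con moto</mods:partName></mods:titleInfo>
    </mods:relatedItem>
    <mods:relatedItem type="constituent">
      <mods:titleInfo><mods:partName>Adagio</mods:partName></mods:titleInfo>
    </mods:relatedItem>
  </mods:relatedItem>
</mods:mods>
"""


def label(node):
    """Return the node's own title or partName (direct titleInfo child only)."""
    info = node.find(f"{MODS}titleInfo")
    if info is None:
        return ""
    for child in info:
        if child.tag in (f"{MODS}title", f"{MODS}partName"):
            return (child.text or "").strip()
    return ""


def outline(node, depth=0):
    """Yield (depth, label) for a node and its nested constituent relatedItems."""
    yield depth, label(node)
    for item in node.findall(f"{MODS}relatedItem"):
        if item.get("type") == "constituent":
            yield from outline(item, depth + 1)


tree = list(outline(ET.fromstring(RECORD)))
for depth, text in tree:
    print("  " * depth + text)
```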

METS 2 Hierarchies: Logical & Physical

<mets:mets>
  <mets:dmdSec>
    <mets:mdWrap>
      <mets:xmlData>
        <mods:mods>
          <mods:relatedItem>
            <mods:relatedItem></mods:relatedItem>
          </mods:relatedItem>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:fileSec></mets:fileSec>
  <mets:structMap>
    <mets:div>
      <mets:div></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>

Hierarchy to represent “logical” structure (nested relatedItems)

Hierarchy to represent “physical” structure (nested div elements)

Linking in METS Documents (XML ID/IDREF links)

[Diagram, built up over six slides: boxes for the sections of a METS document, connected by XML ID/IDREF links.]

• DescMD: mods, relatedItem, relatedItem

• AdminMD: techMD (mix), sourceMD, digiprovMD, rightsMD

• fileGrp: file, file

• StructMap: div, div, fptr, div, fptr
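One of the ID/IDREF links, from an fptr in the structMap to a file in the file section, can be followed with a small lookup table. This is a sketch under assumptions: the sample document, the IDs `F1` and `F2`, the file names, and the helper names are all invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Made-up two-page object: each page div points at one file via FILEID.
DOC = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="F1"><mets:FLocat LOCTYPE="URL" xlink:href="page1.tif"/></mets:file>
      <mets:file ID="F2"><mets:FLocat LOCTYPE="URL" xlink:href="page2.tif"/></mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div TYPE="book">
      <mets:div TYPE="page"><mets:fptr FILEID="F1"/></mets:div>
      <mets:div TYPE="page"><mets:fptr FILEID="F2"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""


def files_by_id(root):
    """Map each file's ID attribute to the href of its FLocat child."""
    href = f"{{{NS['xlink']}}}href"
    index = {}
    for file_el in root.iterfind(".//mets:file", NS):
        loc = file_el.find("mets:FLocat", NS)
        index[file_el.get("ID")] = loc.get(href) if loc is not None else None
    return index


def resolve_fptrs(root):
    """Return (FILEID, location) for every fptr, in document order."""
    index = files_by_id(root)
    return [(fptr.get("FILEID"), index.get(fptr.get("FILEID")))
            for fptr in root.iterfind(".//mets:fptr", NS)]


links = resolve_fptrs(ET.fromstring(DOC))
print(links)
```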

What is a METS Profile?

» Description of a class of METS documents

• provides document authors and programmers guidance to create and process conformant METS documents

» XML document using a schema

• Expresses the requirements that a METS document must satisfy

» “Data standard” in its own right

• A sufficiently explicit METS Profile may be considered a “data standard”

» METS Profiles are human-readable prose and not intended to be “machine actionable”

METS Profile Excerpt

» Recorded Event – structMap requirement

METS Profiles Used in LC Presents

» Sheet Music

» Musical Score (score, score and parts, or a set of parts only)

» Print Material (books, pamphlets, etc)

» Music Manuscript (score or sketches)

» Recorded Event (audio or video)

» PDF Document

» Bibliographic Record

» Photograph

» Compact Disc

» Collection

Multiple Inputs to Common Data Format

[Diagram: three inputs (New Digital Objects, Legacy Database, Harvest of American Memory Objects) feed a Profile-based METS Object, a common data format for searching and display.]

Example 1: New Digital Object

» METS Musical Score Profile

» Library of Congress March by John Philip Sousa

» Musical score and parts

Example 2: New Digital Object

» METS Recorded Event Profile

» Juilliard String Quartet

» Sound Recording

Example 3: Legacy Database

» METS Bibliographic Record Profile

» Duke Ellington & His Orchestra (1962) [Motion Picture]

» Bibliographic Information

Convert database from Filemaker Pro to a single XML file.

XSLT stylesheet creates 14,000 METS/MODS records.

XSL-FO stylesheet creates single PDF document.
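The slides do this conversion with an XSLT stylesheet. The sketch below swaps in plain Python (`xml.etree.ElementTree`) only to illustrate the one-file-to-many-records step. The `<database>`/`<row>`/`<title>` export layout, the sample titles, and the `DMD` identifiers are assumptions, since the actual export format is not shown.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)

# Hypothetical export layout; the real database export is not shown in the slides.
EXPORT = """\
<database>
  <row><title>First film</title></row>
  <row><title>Second film</title></row>
</database>
"""


def row_to_mets(row, number):
    """Turn one exported row into a minimal METS document wrapping MODS."""
    mets = ET.Element(f"{{{METS_NS}}}mets")
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": f"DMD{number}"})
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "MODS"})
    data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    mods = ET.SubElement(data, f"{{{MODS_NS}}}mods")
    info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(info, f"{{{MODS_NS}}}title").text = row.findtext("title", default="")
    return mets


def convert(export_text):
    """Create one METS/MODS record per row of the single export file."""
    rows = ET.fromstring(export_text).findall("row")
    return [row_to_mets(row, n) for n, row in enumerate(rows, start=1)]


records = convert(EXPORT)
print(len(records))
```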

Example 4: American Memory Harvest

» METS Photograph Profile

» William P. Gottlieb Collection: Portrait of Louis Armstrong

» Photographic object

Convert file of 1600 MARC records, using marc4j, to XML modsCollection (single file).

Used XSLT stylesheet to create 1600 records conforming to the METS photograph profile.

Logical & Physical Relationships

Logical (MODS)

<mods:mods ID="ver01">
  <mods:titleInfo>
    <mods:title>Original Work</mods:title>
  </mods:titleInfo>
  <mods:relatedItem type="otherVersion" ID="ver02">
    <mods:titleInfo>
      <mods:title>Derivative Work 1</mods:title>
    </mods:titleInfo>
  </mods:relatedItem>
  <mods:relatedItem type="otherVersion" ID="ver03">
    <mods:titleInfo>
      <mods:title>Derivative Work 2</mods:title>
    </mods:titleInfo>
  </mods:relatedItem>
</mods:mods>

Physical (METS structMap)

<mets:structMap>
  <mets:div TYPE="photo:photoObject" DMDID="MODS1">
    <mets:div TYPE="photo:version" DMDID="ver01">
      <mets:div TYPE="photo:image">
        <mets:fptr FILEID="FN10081"/>
      </mets:div>
    </mets:div>
    <mets:div TYPE="photo:version" DMDID="ver02">
      <mets:div TYPE="photo:image">
        <mets:fptr FILEID="FN10090"/>
      </mets:div>
      <mets:div TYPE="photo:version" DMDID="ver03">
        <mets:div TYPE="photo:image">
          <mets:fptr FILEID="FN1009F"/>
        </mets:div>
      </mets:div>
    </mets:div>
  </mets:div>
</mets:structMap>

mods:mods and mods:relatedItem type="otherVersion" elements create a sequence of 3 nodes

div TYPE="photo:version" elements correspond to the 3 nodes using a logical sequence of ID to DMDID relationships
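The ID to DMDID relationships can be checked mechanically: every DMDID on a div should match some ID in the document. The sketch below is an assumption-laden illustration, not the profile's own tooling. The sample flattens the div nesting, adds a dmdSec with ID `MODS1`, and includes a deliberately broken reference (`ver04`); the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"

# Simplified sample: "ver04" is a deliberately dangling reference.
DOC = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="MODS1">
    <mets:mdWrap MDTYPE="MODS"><mets:xmlData>
      <mods:mods ID="ver01">
        <mods:relatedItem type="otherVersion" ID="ver02"/>
        <mods:relatedItem type="otherVersion" ID="ver03"/>
      </mods:mods>
    </mets:xmlData></mets:mdWrap>
  </mets:dmdSec>
  <mets:structMap>
    <mets:div TYPE="photo:photoObject" DMDID="MODS1">
      <mets:div TYPE="photo:version" DMDID="ver01"/>
      <mets:div TYPE="photo:version" DMDID="ver02"/>
      <mets:div TYPE="photo:version" DMDID="ver04"/>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""


def dangling_dmdids(root):
    """Return DMDID values on div elements that match no ID in the document."""
    known = {el.get("ID") for el in root.iter() if el.get("ID")}
    missing = []
    for div in root.iter(f"{METS}div"):
        # DMDID may hold several space-separated references.
        for ref in (div.get("DMDID") or "").split():
            if ref not in known:
                missing.append(ref)
    return missing


dangling = dangling_dmdids(ET.fromstring(DOC))
print(dangling)
```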

Validation in METS Profiles

» 3 levels of validation for METS objects

» Validation of XML (well-formed)

» Validation of METS/MODS (XML Schema)

» Validation of METS Profile
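A rough sketch of the first and third levels follows, using only Python's standard library. The second level (XML Schema validation) needs a schema-aware validator, such as Xerces from the tools list, and is not attempted here. The profile rule checked below (the top-level div must carry a given TYPE) is a hypothetical example of a profile requirement, not a quotation from any registered profile.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"

GOOD = (
    '<mets:mets xmlns:mets="http://www.loc.gov/METS/">'
    '<mets:structMap><mets:div TYPE="photo:photoObject"/></mets:structMap>'
    '</mets:mets>'
)
BAD_XML = "<mets><unclosed></mets>"  # mismatched tags, so not well-formed


def is_well_formed(text):
    """Level 1: does the text parse as XML at all?"""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def profile_errors(text, required_root_type):
    """Level 3 (illustrative rule): the top-level div must have the given TYPE."""
    root = ET.fromstring(text)
    struct_map = root.find(f"{METS}structMap")
    if struct_map is None:
        return ["no structMap"]
    top = struct_map.find(f"{METS}div")
    if top is None or top.get("TYPE") != required_root_type:
        return [f"top-level div must have TYPE={required_root_type!r}"]
    return []


print(is_well_formed(GOOD), is_well_formed(BAD_XML))
print(profile_errors(GOOD, "photo:photoObject"))
```

Since METS Profiles are written as human-readable prose, a check like this has to be hand-coded from the profile text for each rule.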

Example 1: Aggregation

» METS Song Collection Object

» Hierarchy of METS documents

Collection members include sheet music, an audio recording, a manuscript, and a biography of the composer.

Example 2: Aggregation

» MODS relatedItem type="host"

» memberOf: Baseball sheet music

Objects can be related to a virtual aggregate – in this case “Baseball sheet music”
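A virtual aggregate of this kind can be assembled by grouping records on the title of their host relatedItem. The sketch below assumes the host is identified by its titleInfo/title; the record titles are made-up sample data and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

MODS = "{http://www.loc.gov/mods/v3}"


def make_record(title, host=None):
    """Build a tiny MODS record; titles used below are made-up sample data."""
    mods = ET.Element(f"{MODS}mods")
    info = ET.SubElement(mods, f"{MODS}titleInfo")
    ET.SubElement(info, f"{MODS}title").text = title
    if host is not None:
        related = ET.SubElement(mods, f"{MODS}relatedItem", {"type": "host"})
        host_info = ET.SubElement(related, f"{MODS}titleInfo")
        ET.SubElement(host_info, f"{MODS}title").text = host
    return mods


def group_by_host(records):
    """Map each host title to the titles of the records that point at it."""
    groups = defaultdict(list)
    for record in records:
        own_title = record.findtext(f"{MODS}titleInfo/{MODS}title")
        for related in record.findall(f"{MODS}relatedItem"):
            if related.get("type") == "host":
                host = related.findtext(f"{MODS}titleInfo/{MODS}title")
                groups[host].append(own_title)
    return dict(groups)


records = [
    make_record("Sample song A", host="Baseball sheet music"),
    make_record("Sample song B", host="Baseball sheet music"),
    make_record("Sample song C"),  # no host, so it joins no aggregate
]
aggregates = group_by_host(records)
print(aggregates)
```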

Example 3: Aggregation

» “See also” reference

» MODS relatedItem (no type)

Example: Administrative Metadata

» PREMIS and MIX for digital images

Software/Tools for METS/MODS

» Emacs – text editor (used to edit MODS)

» nxml-mode – plug-in for schema-aware XML editing

» XML Schemas for METS, MODS, MIX, PREMIS

Software/Tools for METS/MODS

» cygwin – bash shell command line and tools

» Saxon – XSLT transformations

» Xerces – XML validation

» mysql-jdbc-connector – connect to mySQL

» SRU – retrieve records from ILS

» Cocoon – facilities to retrieve and load records, retrieve xml version of a file system, etc.

» Ant – used to automate all of the above tasks and create pipelines of multiple tasks (runs from Emacs)

continued…

Advantages of METS/MODS Approach

» Ability to model complex library objects

» Ease of change and extension

• both the data and the application

» Use of modern, non-proprietary software tools

» Use of XSLT for…

• Legacy data conversion

• Batch METS creation and editing

• Web displays and behaviors

» Use of a common syntax – XML

• For data creation, editing, storage and searching

continued…

Advantages of METS/MODS Approach

» Creation of multiple outputs from XML

• HTML/XHTML for Web display; PDF for printing

» Ease of editing

• Single records or selected batches of records

» Ability to validate data

» Ability to aggregate disparate data sources

» Ease of data management and publishing

» Excellent positioning for the future

• New web applications (Web 2.0)

• Repository submission and OAI harvesting

• Cooperative projects (test interoperability)