Millennium and XML: Repurposing and Customizing Metadata

Lucas Mak and Dao Rong Gong, Michigan State University
May 17 - 20, 2009




Today’s Outline

Overview of Metadata

Millennium system and XML

Overview of XSLT

Case Studies
1. Sunday School Books Collection
2. New Book List

Conclusions and Observations


Metadata

Structured data or information about an information resource.

Types of metadata:
– Descriptive
– Administrative/Rights
– Preservation
– Technical
– Structural


Descriptive Metadata

Popular descriptive metadata standards:
– Dublin Core (Simple & Qualified)
– MODS
– MARCXML
– VRA Core
– IEEE LOM
– TEI Header
– EAD


Innovative XML

XML records from Millennium
Retrieved through HTTP query
Data arrangement based on MARC fields
– But a MARC field and its subfields are siblings, not parent and child
Optimized for WebPAC display
– Brief record (for search result index page display)
  • Contains data from MARC 245, publication year, and record ID
– Full record (for both public and staff MARC display of an individual record)


[Screenshots: public display; staff MARC display]


Millennium System and XML

[Diagram: Millennium; output as delimited text, MARC, and XML; /xrecord, XML Server, OAI Harvester, Metadata Builder, and Content Pro]


/xrecord


XML Server

XML server query string (search for title “xslt”):

http://magic.msu.edu/xmlopac/?xml=<WXREQ_ROOT><KEY>txslt</KEY></WXREQ_ROOT>
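The query string above carries a small XML request in the xml parameter. A minimal Python sketch of building it programmatically, assuming the host and the WXREQ_ROOT/KEY structure shown in the example, and assuming that the leading t in txslt selects the title index (as the "search for title" label suggests). build_xml_query is a hypothetical helper; the sketch only builds the URL and does not contact the server.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

# Host and path copied from the example query string above.
BASE_URL = "http://magic.msu.edu/xmlopac/"


def build_xml_query(index_code: str, term: str) -> str:
    """Build an XML server query URL (hypothetical helper).

    index_code is the one-letter index prefix; "t" (title) is taken from
    the example above, any other prefix is an assumption.
    """
    # Escape &, < and > inside the search term so the request stays well-formed XML.
    request = f"<WXREQ_ROOT><KEY>{escape(index_code + term)}</KEY></WXREQ_ROOT>"
    # Percent-encode the whole request (including "<" and "/") for the query string.
    return f"{BASE_URL}?xml={quote(request, safe='')}"


if __name__ == "__main__":
    print(build_xml_query("t", "xslt"))
```

Percent-encoding the request explicitly avoids relying on how a particular HTTP client treats raw angle brackets in a URL.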


OAI Harvester


MetaData Builder


MetaData Builder


Content Pro in Encore


XSLT

Extensible Stylesheet Language Transformations

Current version: 2.0

“Transformation” means:
– Manipulation of XML documents by creating a new document based on the original document

Usages in library context:
– Crosswalking
  • Data selection and manipulation
– Web display
  • Example: converting EAD into HTML for web display


XSLT

Uses XPath expressions to select/filter data nodes
– By name of element
  • <xsl:for-each select="marc:leader">
– By value of element and/or attribute
  • <xsl:for-each select="marc:datafield[@tag=650 and @ind2='0']">
  • <xsl:if test="$leader7='c'">
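To experiment with these selections outside an XSLT processor, Python's ElementTree accepts a limited XPath subset. It has no "and" operator, so the two attribute tests are written as chained predicates. The record below is a hypothetical minimal MARCXML sample, and reading $leader7 as leader position 07 is an assumption about how that variable is defined in the stylesheet.

```python
import xml.etree.ElementTree as ET

# MARC 21 slim namespace, bound to the "marc" prefix used in the slides.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical minimal MARCXML record, for illustration only.
RECORD = """<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim">
  <marc:leader>00000nam a2200000 a 4500</marc:leader>
  <marc:datafield tag="650" ind1=" " ind2="0">
    <marc:subfield code="a">Sunday schools</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="650" ind1=" " ind2="7">
    <marc:subfield code="a">Religion</marc:subfield>
  </marc:datafield>
</marc:record>"""

root = ET.fromstring(RECORD)

# By element name, like select="marc:leader".
leader = root.find("marc:leader", NS).text

# By attribute values, like select="marc:datafield[@tag=650 and @ind2='0']".
# ElementTree's XPath subset has no "and", so the tests are chained predicates.
lcsh_fields = root.findall("marc:datafield[@tag='650'][@ind2='0']", NS)

# Like test="$leader7='c'", assuming $leader7 holds leader position 07
# (XPath substring($leader, 8, 1), Python index 7).
is_collection = leader[7] == "c"

print(len(lcsh_fields), leader[7], is_collection)
```

For this sample, only the first 650 matches, and leader position 07 is "m", so the collection test is false.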


Case Study One

Sunday School Books Collection
– 19th century publications by religious societies
– 170 titles digitized and cataloged
Data conversion needs
– Source: Millennium
– Target: Content Pro
– Conversions in:
  • Format: .marc to XML
  • Schema and data structure: MARC to Qualified Dublin Core


Options for Data Migration

[Diagram: two candidate paths from Millennium to Content Pro (QDC): (1) Create Lists → MARC file → MARCEdit → MARCXML → XSLT; (2) HTTP query → Innovative XML → XSLT]


Segment of Innovative XML

Callouts:
– Siblings
– MARC field/subfield as value of element
– Field indicator as value of element


Segment of MARC21XML

Callouts:
– Parent-child
– MARC field/subfield as value of element attribute
– Field indicator as value of element attribute


Issues with Innovative XML for Data Conversion Needs
– Data structured differently from MARC21XML
  • Is an existing “Innovative XML to DC/QDC” XSLT available?
– Not optimized for data manipulation
  • Complications in data selection
    » A data node is selected by matching criteria against values in individual elements
    » A series of matches may be needed to select just one node
  • Processing efficiency
    » Data selection involves multiple upward, downward, and lateral movements
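The selection problem can be made concrete by pulling the same value, 650 second indicator 0 $y, from both structures. This is a sketch only: the Innovative-style fragment is simplified, and its element names (VARFLD, MARCINFO, MARCTAG, INDICATOR2, MARCSUBFLD, SUBFIELDINDICATOR, SUBFIELDDATA) are assumptions modeled on the sibling layout described above, not a verified schema.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical Innovative-style fragment: the field header and
# its subfields are siblings, and tags/codes/indicators are element values.
# Element names are assumptions, not a verified Innovative schema.
INNOVATIVE = """<RECORD>
  <VARFLD>
    <MARCINFO><MARCTAG>650</MARCTAG><INDICATOR1> </INDICATOR1><INDICATOR2>0</INDICATOR2></MARCINFO>
    <MARCSUBFLD><SUBFIELDINDICATOR>a</SUBFIELDINDICATOR><SUBFIELDDATA>Sunday schools</SUBFIELDDATA></MARCSUBFLD>
    <MARCSUBFLD><SUBFIELDINDICATOR>y</SUBFIELDINDICATOR><SUBFIELDDATA>19th century</SUBFIELDDATA></MARCSUBFLD>
  </VARFLD>
</RECORD>"""

# The same field as MARCXML: subfields are children, tags/codes are attributes.
MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Sunday schools</subfield>
    <subfield code="y">19th century</subfield>
  </datafield>
</record>"""


def temporal_from_innovative(xml_text):
    """650 second indicator 0, $y: one value match per criterion."""
    values = []
    for varfld in ET.fromstring(xml_text).iter("VARFLD"):
        info = varfld.find("MARCINFO")
        if info is None:
            continue
        # Matches 1 and 2: go down into MARCINFO and compare element values.
        if info.findtext("MARCTAG") != "650" or info.findtext("INDICATOR2") != "0":
            continue
        # Move laterally to the sibling MARCSUBFLD elements.
        for sub in varfld.findall("MARCSUBFLD"):
            # Match 3: compare the code, then read the data from a sibling element.
            if sub.findtext("SUBFIELDINDICATOR") == "y":
                values.append(sub.findtext("SUBFIELDDATA"))
    return values


def temporal_from_marcxml(xml_text):
    """Same selection in MARCXML: one path, criteria held in attributes."""
    ns = {"marc": "http://www.loc.gov/MARC21/slim"}
    path = "marc:datafield[@tag='650'][@ind2='0']/marc:subfield[@code='y']"
    return [sf.text for sf in ET.fromstring(xml_text).findall(path, ns)]


print(temporal_from_innovative(INNOVATIVE), temporal_from_marcxml(MARCXML))
```

Both functions return ['19th century'] for the sample; the difference lies in how many value comparisons and sibling hops the first one needs.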


Final Path of Data Migration

[Diagram: final path: Millennium (.marc) → Create Lists → MARC file → MARCEdit → MARCXML → XSLT → Content Pro (QDC)]


Design of XSLT

Based on LC’s “MARC To Simple DC” XSLT
– Customized mappings according to LC’s suggestions
– Crosswalking strategies
  • Conditional processing (i.e. matching)
    » boolean(), contains(), starts-with()
    » <xsl:if>, <xsl:choose>, <xsl:when>
  • String manipulation
    » Used in both conditional processing and data selection for output
    » substring(), substring-before(), substring-after(), translate(), concat(), normalize-space()
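When checking mappings outside an XSLT processor, the XPath 1.0 string functions listed above need some care in Python: substring() counts from 1, translate() deletes characters that have no replacement, and substring-before()/substring-after() return an empty string instead of failing when the separator is missing. A sketch of helpers with those semantics, limited to integer arguments for substring(); the function names are hypothetical.

```python
import re


def xpath_substring(s, start, length=None):
    """substring(s, start, length) for integer arguments (1-based positions)."""
    end = len(s) + 1 if length is None else start + length
    begin = max(start, 1)
    return s[begin - 1:max(end - 1, begin - 1)]


def substring_before(s, sep):
    """substring-before(): "" when sep does not occur (or is empty)."""
    if not sep:
        return ""
    head, found, _ = s.partition(sep)
    return head if found else ""


def substring_after(s, sep):
    """substring-after(): "" when sep does not occur; s itself when sep is empty."""
    if not sep:
        return s
    _, found, tail = s.partition(sep)
    return tail if found else ""


def translate(s, src, dst):
    """translate(): src[i] -> dst[i]; characters of src past len(dst) are deleted."""
    table = {}
    for i, ch in enumerate(src):
        # Only the first occurrence of a character in src counts.
        table.setdefault(ch, dst[i] if i < len(dst) else "")
    return "".join(table.get(ch, ch) for ch in s)


def normalize_space(s):
    """normalize-space(): collapse runs of XPath whitespace and trim the ends."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" \t\r\n")


# concat() is plain "+", contains() is "in", starts-with() is str.startswith().
print(xpath_substring("Title.", 1, len("Title.") - 1), translate("--aaa--", "abc-", "ABC"))
```

The substring() call in the last line mirrors the “chop the last character” idiom used in the de-duplication stylesheet.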


Design of XSLT

Conditional Processing & String Manipulation in De-duplication

    <xsl:for-each select="marc:datafield[@tag=246]/marc:subfield[@code='a']">
      <!-- $dataField245Lower, $upperCase and $lowerCase are defined elsewhere in the stylesheet -->
      <xsl:if test="not(contains($dataField245Lower,
          translate(substring(normalize-space(.), 1, string-length(normalize-space(.)) - 1),
                    $upperCase, $lowerCase)))">
        <xsl:element name="dcterms:alternative">
          <xsl:value-of select="normalize-space(substring(., 1, string-length() - 1))"/>
        </xsl:element>
      </xsl:if>
    </xsl:for-each>

Callouts:
– Convert 245 and 246 to lower case before comparing
– Chop the trailing period (.); note that substring(..., 1, string-length() - 1) drops the last character whether or not it is a period
– Compare MARC 246 against MARC 245
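The same de-duplication logic can be tested in Python against sample titles. In this sketch, the 245 is assumed to be already assembled into one string (the stylesheet's $dataField245Lower is defined elsewhere), and a period is stripped only when present, unlike the XSLT, which drops the last character unconditionally. str.lower() also folds non-ASCII letters that a translate() table may not cover. The function name and sample titles are hypothetical.

```python
def alternative_titles(field_245, values_246a):
    """Return the 246 $a values worth outputting as dcterms:alternative.

    A 246 $a is skipped when its lower-cased text, minus a trailing
    period, already occurs inside the lower-cased 245.
    """
    title_245_lower = field_245.lower()
    keep = []
    for value in values_246a:
        # Collapse whitespace, then chop a trailing period only if present.
        cleaned = " ".join(value.split())
        if cleaned.endswith("."):
            cleaned = cleaned[:-1]
        # Mirrors not(contains($dataField245Lower, ...)) in the stylesheet.
        if cleaned.lower() not in title_245_lower:
            keep.append(cleaned)
    return keep


print(alternative_titles(
    "A guide for Sunday school teachers.",
    ["Guide for Sunday school teachers.", "Teachers' handbook"],
))
```

The first 246 is dropped because it is contained in the 245; the second is kept intact because it has no trailing period to strip.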


Design of XSLT

No <dcterms:alternative> for MARC 246


Design of XSLT

Predicate
• Used for data selection and de-duplication

    <!-- Output MARC 650$y as <dcterms:temporal> -->
    <xsl:for-each select="marc:datafield[@tag=650 and @ind2='0']
        [not(marc:subfield[@code='y'] = preceding-sibling::marc:datafield[@tag=650 and @ind2='0']/marc:subfield[@code='y'])]
        /marc:subfield[@code='y']">
      <xsl:element name="dcterms:temporal">
        <xsl:value-of select="normalize-space(self::node())"/>
      </xsl:element>
    </xsl:for-each>

Callouts:
– [@tag=650 and @ind2='0'] selects LCSH only
– [not(... = preceding-sibling:: ...)] selects unique 650$y only
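The predicate's behavior can be mirrored in Python with a set of already-seen values. One subtlety carries over from XPath: = between two node-sets is true when any pair matches, so a 650 with one repeated $y is skipped entirely, including any new $y it also carries. The sample record is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical record: two LCSH 650s share the same $y, one 650 is not LCSH.
SAMPLE = """<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim">
  <marc:datafield tag="650" ind1=" " ind2="0">
    <marc:subfield code="a">Sunday schools</marc:subfield>
    <marc:subfield code="y">19th century</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="650" ind1=" " ind2="0">
    <marc:subfield code="a">Religious education</marc:subfield>
    <marc:subfield code="y">19th century</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="650" ind1=" " ind2="0">
    <marc:subfield code="a">Children's literature</marc:subfield>
    <marc:subfield code="y">1800-1850</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="650" ind1=" " ind2="7">
    <marc:subfield code="y">20th century</marc:subfield>
  </marc:datafield>
</marc:record>"""


def unique_temporal(xml_text):
    """Return 650 _0 $y values, mirroring the preceding-sibling predicate."""
    root = ET.fromstring(xml_text)
    output = []
    seen = set()  # $y values of every earlier 650 _0, kept or not
    for field in root.findall("marc:datafield[@tag='650'][@ind2='0']", NS):
        ys = [sf.text or "" for sf in field.findall("marc:subfield[@code='y']", NS)]
        # XPath "=" between node-sets is true if ANY pair of values is equal,
        # so one repeated $y excludes the whole field.
        if not any(y in seen for y in ys):
            # normalize-space(), approximated with split/join.
            output.extend(" ".join(y.split()) for y in ys)
        seen.update(ys)
    return output


print(unique_temporal(SAMPLE))
```

For the sample, the second 650 is skipped as a duplicate and the 650 with second indicator 7 is never selected.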


Design of XSLT

Hard-coding

Inserted elements that are global to all records

    <!-- Output <dc:format>application/pdf</dc:format> -->
    <xsl:element name="dc:format">
      <xsl:text>application/pdf</xsl:text>
    </xsl:element>
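Hard-coding amounts to writing the same element into every output record regardless of the source data. A minimal ElementTree sketch that emits one mapped element (dc:title) next to the constant dc:format; the record wrapper element is hypothetical, while the dc namespace URI is the standard DCMI one.

```python
import xml.etree.ElementTree as ET

# Standard DCMI namespace for the dc: prefix.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)


def build_qdc(title):
    """Build a minimal DC-style record; "record" is a hypothetical wrapper."""
    record = ET.Element("record")
    # Mapped element: its value comes from the source MARC record.
    ET.SubElement(record, f"{{{DC}}}title").text = title
    # Hard-coded element: the same value is written into every record.
    ET.SubElement(record, f"{{{DC}}}format").text = "application/pdf"
    return ET.tostring(record, encoding="unicode")


print(build_qdc("A guide for Sunday school teachers"))
```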


Segment of Source MARCXML


Segment of Output QDC XML


Case Study Two

Library’s book lists

Issues with featured list


Case Study Two
Existing New Book List
– Newly cataloged books for the browse shelf
– New approach using XML and XSLT
New features design
– Sorting
– RSS feed
– Customization
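The slides list sorting as a feature without naming the sort keys. A sketch, assuming hypothetical entries with a title, a count of nonfiling characters (in the spirit of the MARC 245 second indicator), and a date added; in XSLT the equivalent would typically be done with xsl:sort.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class NewBook:
    """One entry in the new book list (hypothetical fields)."""
    title: str
    nonfiling: int  # leading characters to ignore, cf. MARC 245 second indicator
    added: date


def filing_title(book):
    # Drop an initial article such as "The " and compare case-insensitively.
    return book.title[book.nonfiling:].lower()


def sort_books(books, key="title"):
    """Sort by filing title, or by date added (newest first, then title)."""
    if key == "title":
        return sorted(books, key=filing_title)
    if key == "added":
        return sorted(books, key=lambda b: (-b.added.toordinal(), filing_title(b)))
    raise ValueError(f"unknown sort key: {key}")


books = [
    NewBook("The XSLT primer", 4, date(2009, 5, 4)),
    NewBook("Metadata basics", 0, date(2009, 5, 3)),
    NewBook("A guide to MARC", 2, date(2009, 5, 3)),
]
print([b.title for b in sort_books(books)])
print([b.title for b in sort_books(books, "added")])
```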


New Book List Based on XML File

Millennium XML server outputs two files
– Entire new book list over a rolling period of time
– List of daily added books
New Book List program output
– Book list in HTML format
– RSS feed for daily added books
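For the RSS output, a minimal RSS 2.0 feed needs a channel with title, link, and description, plus one item per book. A sketch using ElementTree; the feed title, the example.edu URLs, and the (title, link, added) tuple layout are placeholders, not the library's actual feed.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(channel_title, channel_link, books):
    """Build an RSS 2.0 feed from (title, link, added) tuples."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required RSS 2.0 channel elements.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Books added to the catalog today"
    for title, link, added in books:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        # RSS 2.0 expects RFC 822-style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(added)
    return ET.tostring(rss, encoding="unicode")


feed = build_rss(
    "New Books (daily)",
    "https://library.example.edu/newbooks",
    [("A guide to MARC", "https://catalog.example.edu/record=b1000001",
      datetime(2009, 5, 18, tzinfo=timezone.utc))],
)
print(feed)
```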


Path of Data Processing

[Diagram components: Millennium, EXPECT, XML output, XSLT, web server & PHP, Internet]


Design of XSLT


Design of XSLT


Design of XSLT


Putting It Together


Putting It Together


Observations and Challenges

Millennium system and XML
– XSLT processor within Millennium and customizing Innovative XML output
Using XML as data source
– Large XML file size
XSLT and data processing
– XSLT data manipulation
– Lack of built-in functions for conditional data looping, etc.