OCLC Online Computer Library Center Metadata Standards Eric Childress OCLC Washington, DC November 18, 2003 FEDLINK OCLC Users Group Meeting


OCLC Online Computer Library Center

Metadata Standards

Eric Childress

OCLC

Washington, DC

November 18, 2003

FEDLINK OCLC Users Group Meeting


Overview

Fundamentals
– Types of metadata
– Document mark-up languages & character encodings

MetaMap

Metadata formats:
– MARC, MODS
– DC, ONIX
– TEI, EAD, METS, MIX
– RDF, FGDC, COSATI


Fundamentals: 5 types of metadata

Descriptive
– Title, author, summary, topic, etc.

Technical & Structural
– File size, software needed, file type(s), presentation instructions, etc.

Administrative (a.k.a. “meta-metadata”)
– Record number, record date, record source, etc.

Rights
– Copyright ownership, use privileges, etc.

Management
– [Typically by/for owning agency]: price paid, circulation restrictions, etc.


Fundamentals: Markup languages

Markup languages:
– Address the structure of a document
– Convey instructions to software that will process text to:
  • Index the text for searching
  • Render the text (e.g., for screen display or print)
  • Transform the text (e.g., for a voice synthesizer) for some output device(s)
– The markup is generally invisible to end-users

Extensible Markup Language (XML):
– XML is a metalanguage
  • Agencies define their own XML to suit their task
    – By creating Document Type Definitions (DTDs) or XML schemas
– Data is separate from presentation instructions
  • Presentation instructions go in a style sheet
– Offers just the right mix of flexibility and structure
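The separation of data from presentation can be sketched in a few lines of Python. This is only an illustration: the element names (`record`, `title`, `creator`, `date`) are made up for the example and do not come from any standard, and the two render functions stand in for what a style sheet would do.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the element names are invented for this example.
RECORD_XML = """<record>
  <title>Metadata Standards</title>
  <creator>Eric Childress</creator>
  <date>2003-11-18</date>
</record>"""


def parse_record(xml_text):
    """Return the record's child elements as a {tag: text} dictionary."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def render_plain(fields):
    """One presentation of the data: a labelled plain-text display."""
    return "\n".join(f"{tag.capitalize()}: {value}" for tag, value in fields.items())


def render_citation(fields):
    """A second presentation of the same data: a one-line citation."""
    return f"{fields['creator']} ({fields['date'][:4]}). {fields['title']}."


fields = parse_record(RECORD_XML)
print(render_plain(fields))
print(render_citation(fields))
```

The XML itself never changes; only the rendering function does, which is the role a style sheet plays for real XML documents.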


Fundamentals: Character encodings

Character encoding:
– Used for communicating text characters in a computing environment
– Hundreds of character encoding standards exist
– Character conversion is complex and expensive

Unicode:
– A single, “comprehensive” global encoding standard
– Includes characters from scripts of all major modern, most minor, and selected ancient languages
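A small Python sketch of why conversion needs care: the same name produces different bytes under different encodings, and decoding with the wrong one yields wrong characters without raising an error. The sample name and the choice of ISO 8859-2 as the legacy encoding are assumptions made for illustration.

```python
# Sample text written with escapes so the source stays ASCII: "Dvořák, Antonín"
text = "Dvo\u0159\u00e1k, Anton\u00edn"

utf8_bytes = text.encode("utf-8")          # Unicode (UTF-8 serialization)
latin2_bytes = text.encode("iso-8859-2")   # a legacy Central European encoding

# Same characters, different byte sequences and lengths.
print(len(text), len(utf8_bytes), len(latin2_bytes))

# Decoding legacy bytes with the wrong table silently corrupts the text.
wrong = latin2_bytes.decode("iso-8859-1")
print(wrong == text)


def convert(data, source_encoding, target_encoding="utf-8"):
    """Convert bytes between encodings; the source encoding must be known."""
    return data.decode(source_encoding).encode(target_encoding)


print(convert(latin2_bytes, "iso-8859-2") == utf8_bytes)
```

The conversion only works because the source encoding is stated explicitly; when it is unknown or mislabelled, the result is the kind of silent corruption shown by `wrong`.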


MetaMap

http://mapageweb.umontreal.ca/turner/meta/english/metamap.html


MARC 21

MARC 21 (ISO 2709)
– ISO 2709-based metadata communications protocol
– Choice of two character encoding options:
  • MARC 8 (ASCII, ANSEL, selected ISO, EACC)
  • Unicode (limited to equivalents of MARC 8 repertoire)
– XML expression is now also an option
– Maintenance agency: Library of Congress w/ NLC, BL

Strengths:
  • Well-maintained, mature standard
  • Widely adopted by library communities
  • Large universe of MARC 21 records available
  • Wide choice of software vendors

Weaknesses (in the present & future):
  • Virtually unused outside of libraries
  • Limits on field and record size
  • Restricted range of scripts supported
  • Limited ability to convey complex relationships, hierarchy, attributes at tag/subfield level
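The size limits follow from the ISO 2709 layout itself: a 24-character leader holds the record length in 5 digits, and each 12-character directory entry holds a field length in 4 digits. The sketch below builds and re-reads a tiny record to show this. It is a simplified illustration that handles data fields only (no control fields such as 001), and the leader values other than the lengths are placeholder assumptions.

```python
FIELD_END = b"\x1e"    # field terminator
RECORD_END = b"\x1d"   # record terminator
SUBFIELD = b"\x1f"     # subfield delimiter


def build_record(fields):
    """Serialize data fields given as (tag, indicators, [(code, value), ...])."""
    directory = b""
    body = b""
    for tag, indicators, subfields in fields:
        data = indicators.encode("ascii")
        for code, value in subfields:
            data += SUBFIELD + code.encode("ascii") + value.encode("utf-8")
        data += FIELD_END
        if len(data) > 9999:  # directory stores the field length in 4 digits
            raise ValueError(f"field {tag} exceeds 9999 bytes")
        # Directory entry: tag (3) + field length (4) + start offset (5).
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    base = 24 + len(directory) + 1   # leader + directory + its terminator
    total = base + len(body) + 1     # plus the record terminator
    if total > 99999:  # leader stores the record length in 5 digits
        raise ValueError("record exceeds 99999 bytes")
    # Placeholder leader: only the two lengths are computed here.
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + FIELD_END + body + RECORD_END


def parse_record(record):
    """Read the fields back using the leader and directory."""
    leader = record[:24].decode("ascii")
    base = int(leader[12:17])             # base address of data
    directory = record[24:base - 1]       # drop the directory terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        raw = record[base + start:base + start + length - 1]
        indicators = raw[:2].decode("ascii")
        subfields = [(chunk[:1].decode("ascii"), chunk[1:].decode("utf-8"))
                     for chunk in raw[2:].split(SUBFIELD)[1:]]
        fields.append((tag, indicators, subfields))
    return fields


sample = [
    ("100", "1 ", [("a", "Childress, Eric")]),
    ("245", "10", [("a", "Metadata standards")]),
]
record = build_record(sample)
print(record[:24].decode("ascii"))
print(parse_record(record))
```

Because lengths are fixed-width decimal strings, a single field cannot exceed 9,999 bytes and a whole record cannot exceed 99,999 bytes, whatever the content.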


MODS

Metadata Object Description Schema (MODS)
– Essentially MARC 21 recast in an XML-native framework
  • Text-based tags rather than numeric ones
  • Selected clusters of related MARC 21 attributes condensed into single MODS element
– MARC 21 readily converts to MODS, but you can’t do a lossless reverse conversion of MODS to MARC 21
– Maintenance agency: Library of Congress

Value of MODS:
– A rich, library-oriented XML metadata schema
– Optimized for from-MARC conversion of legacy records
– Well-suited as a metadata format for OAI harvesting

Applications of MODS:
– LC planning to convert 100K American Memory records
– Minerva project, U of Chicago Press, California Digital Library, others using or planning to use for records for web sites, e-texts.
– OpenOffice Bibliographic Project
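To make "text-based tags" and "condensed clusters" concrete, here is a minimal sketch that maps three MARC 21 fields to MODS elements. It is not the Library of Congress conversion; the input shape (a dictionary of tag to subfields) and the sample values are assumptions for illustration, and only a handful of MODS elements are covered.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def _child(parent, name, text=None):
    """Append a MODS-namespaced child element and return it."""
    element = ET.SubElement(parent, f"{{{MODS_NS}}}{name}")
    if text is not None:
        element.text = text
    return element


def marc_to_mods(marc):
    """Map a simplified MARC dict {tag: {subfield: value}} to a MODS element."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    if "245" in marc:   # title statement -> titleInfo/title
        _child(_child(mods, "titleInfo"), "title", marc["245"].get("a"))
    if "100" in marc:   # main entry, personal name -> name/namePart
        _child(_child(mods, "name"), "namePart", marc["100"].get("a"))
    if "260" in marc:   # publication subfields gathered into one originInfo
        origin = _child(mods, "originInfo")
        publication = marc["260"]
        if "a" in publication:
            _child(_child(origin, "place"), "placeTerm", publication["a"])
        if "b" in publication:
            _child(origin, "publisher", publication["b"])
        if "c" in publication:
            _child(origin, "dateIssued", publication["c"])
    return mods


# Illustrative input only; not a real catalog record.
marc = {
    "245": {"a": "Metadata standards"},
    "100": {"a": "Childress, Eric"},
    "260": {"a": "Washington, DC", "c": "2003"},
}
mods = marc_to_mods(marc)
print(ET.tostring(mods, encoding="unicode"))
```

The sketch drops the MARC indicators entirely, which hints at why the reverse conversion cannot be lossless: the MODS output no longer records which indicator values the source field carried.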


MARC 21 & MODS

Feature                 | MARC 21  | MARC 21 Unicode | MARC XML | MARC Slim    | MODS
Structure               | ISO 2709 | ISO 2709        | XML      | XML          | XML
Encoding                | MARC 8   | Unicode         | Unicode  | Unicode      | Unicode
Repertoire of scripts   | JACKPHY  | JACKPHY         | JACKPHY  | JACKPHY      | Unicode
Conversion from MARC 21 | lossless | lossless        | lossless | minimal loss | lossless
Conversion to MARC 21   | lossless | lossless        | lossless | lossless?    | minor loss
  Bibliographic         | OCLC     | OCLC R          | OCLC R   | OCLC R       | OCLC DCPS
  Authority             | OCLC     | OCLC R          | OCLC R   | OCLC R       | x
  Classification        | x        | OCLC R          | OCLC R   | x            |
  Community             | x        | x               | x        | x            |
  Holdings              | OCLC     | x               | x        | x            |


Dublin Core

Dublin Core Metadata Element Set
– ISO 15836:2003(E) The Dublin Core metadata element set
– A standard for cross-domain resource description
  • Designed primarily to support discovery and retrieval
– Defines semantics but not syntax (i.e. container)
– Choice of simple or qualified DC
– Maintenance agency: Dublin Core Metadata Initiative (DCMI) hosted by OCLC Research

Value of Dublin Core:
– Simplicity, extensibility, interoperability
– Worldwide adoption (DCMES translated into 20+ languages)
– Usable as crosswalk between major metadata standards

Applications of Dublin Core:
– Open Archives Initiative (OAI) mandates DC metadata
– Wide variety of extended versions in use:
  • In digital library, archives, museums projects
  • By e-government programs (AU, CA, DK, FI, IE, NZ, UK)
– OCLC usage: Connexion, DCPS, ContentDM, Research
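"Semantics but not syntax" means the same simple DC description can travel in more than one container. The sketch below writes one set of element/value pairs as XML and as HTML `meta` tags. The record values and the `metadata` wrapper element are assumptions for the example; the `DC.` prefix for `meta` names follows the commonly used convention for embedding DC in HTML.

```python
import xml.etree.ElementTree as ET
from html import escape

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Illustrative simple DC description as (element, value) pairs.
record = [
    ("title", "Metadata Standards"),
    ("creator", "Childress, Eric"),
    ("date", "2003-11-18"),
    ("type", "Text"),
]


def to_xml(pairs):
    """Container 1: DC elements inside a hypothetical <metadata> wrapper."""
    root = ET.Element("metadata")
    for element, value in pairs:
        ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


def to_html_meta(pairs):
    """Container 2: the same pairs as HTML <meta> tags."""
    return "\n".join(
        f'<meta name="DC.{element}" content="{escape(value, quote=True)}">'
        for element, value in pairs
    )


print(to_xml(record))
print(to_html_meta(record))
```

Both outputs carry identical element names and values; only the container differs, which is what lets DC act as a crosswalk between otherwise unrelated formats.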


ONIX

ONIX International (Online Information Exchange):
– Standard data exchange format for publishers & jobbers
  • Based on EPICS (EDItEUR Product Information Communication Standards)
– For representing and communicating book industry product information in electronic form
  • Offers two levels of richness (level 1 & level 2)
– XML schema with Unicode encoding
– Maintenance agency: EDItEUR working with input from the Book Industry Communication (BIC) and the Book Industry Study Group (BISG)

Value of ONIX:
– Meets needs of publishers, jobbers, retail sellers for:
  • Easier access to richer book data (including bibliographic data, cover art, blurbs, TOCs, UPC data, and much more)
  • An inexpensive-to-implement common data exchange format

Applications of ONIX:
– Primarily oriented towards publishers, jobbers, retailers
  • Most major players (Amazon, Baker & Taylor, etc.) now using/supporting ONIX
– Some interest by libraries & ILS vendors in ONIX


TEI

Text Encoding Initiative (TEI):
– For complex markup of literary texts
– Both SGML & XML DTDs available
– TEI “header” (TEIH) can be used as a metadata record
– Maintenance agency: TEI Consortium
  • TEI Consortium has executive offices in Bergen, Norway, and is hosted at four university sites worldwide: the Univ. of Bergen, Brown Univ., Oxford Univ., and the Univ. of Virginia
  • Maintains “P4” Guidelines for Electronic Text Encoding and Interchange

Value of TEI:
– Designed to meet the needs of scholarly research community (esp. in the humanities) for a variety of activities including:
  • Adding in-line academic commentary in e-texts
  • As an aid to research by supporting special indexing points, etc.

Applications of TEI:
– Widely used by major humanities electronic text collections such as CETH, UVa e-text center, many others.


EAD

Encoded Archival Description (EAD)
– A format for expressing electronic archival finding aids
– EAD DTD (Version 2002) is designed to function as both an SGML and XML DTD
– Maintained jointly by the Library of Congress and the Society of American Archivists (SAA)

Value of EAD:
– Effectively an organized presentation of a collection of documents (typically in an archive or manuscript collection)
  • EAD header carries metadata for the finding aid
  • Provides for simple or complex mark-up to support varying levels of indexing
  • Well-suited for interweaving narrative with links to specific objects in a collection (either directly to the object or via a record for the object that may link to the object).

Applications of EAD:
– Conversion of existing paper finding aids to electronic form
– Widely used by academic institutions and archives in North America
– RLG Archival Resources database hosts copies of many EADs


METS

Metadata Encoding and Transmission Standard (METS)
– A standard “shell” for encoding data essential for retrieving, preserving, and serving up digital resources
  • Six modules define descriptive, administrative, structural, rights and other metadata
  • Some parts of a METS object may be external (e.g., a MODS record for the descriptive metadata)
– Maintenance agency: Library of Congress

Value of METS:
– Need for METS identified at DLF metadata experts meetings
  • Varied local approaches to non-descriptive metadata not scaling well & offering little interoperability between agencies
– Offers a standard mode for object “packaging” for preservation, institutional repositories, other activities

Applications of METS:
– LC: planning to use with selected moving images, audio recordings, folk life mixed media collections
– OCLC DCPS, RLG, Harvard, Stanford, UC Berkeley, National Library of Wales exploring or using for variety of projects
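The "shell" idea, with descriptive metadata kept external, can be sketched as follows. This is a rough illustration rather than a complete or schema-validated METS document: it builds only a descriptive section that points to an external MODS record, a file section, and a structural map. The object identifier and all URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)

HREF = f"{{{XLINK_NS}}}href"


def q(name):
    """Qualify a METS element name with its namespace."""
    return f"{{{METS_NS}}}{name}"


def build_mets(object_id, mods_url, file_urls):
    """Build a skeletal METS wrapper around external metadata and files."""
    mets = ET.Element(q("mets"), {"OBJID": object_id})

    # Descriptive metadata lives elsewhere; METS only references it.
    dmd = ET.SubElement(mets, q("dmdSec"), {"ID": "DMD1"})
    ET.SubElement(dmd, q("mdRef"),
                  {"LOCTYPE": "URL", "MDTYPE": "MODS", HREF: mods_url})

    group = ET.SubElement(ET.SubElement(mets, q("fileSec")), q("fileGrp"))
    struct_map = ET.SubElement(mets, q("structMap"))
    div = ET.SubElement(struct_map, q("div"), {"DMDID": "DMD1"})

    for number, url in enumerate(file_urls, start=1):
        file_id = f"FILE{number}"
        file_el = ET.SubElement(group, q("file"), {"ID": file_id})
        ET.SubElement(file_el, q("FLocat"), {"LOCTYPE": "URL", HREF: url})
        # The structural map ties each file into the object's structure.
        ET.SubElement(div, q("fptr"), {"FILEID": file_id})
    return mets


# Hypothetical identifier and locations, for illustration only.
mets = build_mets(
    "example-object-1",
    "http://example.org/records/1.mods.xml",
    ["http://example.org/images/1.tif", "http://example.org/images/2.tif"],
)
print(ET.tostring(mets, encoding="unicode"))
```

The wrapper carries identifiers and pointers rather than the descriptive record itself, so the MODS record can be maintained separately from the package.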


MIX

Metadata for Images in XML (MIX)
– XML schema for a set of technical data elements required to manage digital image collections
– Format for interchange and/or storage of the data specified in the NISO Draft Standard Data Dictionary: Technical Metadata for Digital Still Images (version 1.2)
– Still in early development and testing phases
– Collaboration of: Library of Congress and NISO Technical Metadata for Digital Still Images Standards Committee

Value of MIX:
– Provides a common XML schema for expressing technical data particular to digital still images
– Can be used with other schema such as METS and MODS as part of a comprehensive approach to managing and preserving digital images

Applications of MIX:
– OCLC DCPS, LC, others planning or testing
– MIX still in nascent stage of development and testing


Summary

Feature | DC | ONIX | TEI | EAD | METS | MIX
Structure | | XML | XML | XML | XML | XML
Encoding | | Unicode | Unicode | Unicode | Unicode | Unicode
Repertoire of scripts | | Unicode | Unicode | Unicode | Unicode | Unicode
Conversion from MARC 21 | Lossiness varies | Minimal loss | Header only - lossy | Header only - lossy | |
Conversion to MARC 21 | Minimal loss | Some ONIX-only data may be lost | Header only - lossless | Header only - lossless | |
Chief purpose | Simple description for discovery & retrieval | Publisher product info exchange | Markup of scholarly Etexts | Markup of electronic finding aids | Shell with technical data | Technical data for digital images
Primary user base | e-Govt, Libraries, Museums, Archives | Publishers, Jobbers | Humanities scholars | Archives, Libraries | Archives, Libraries | Archives, Libraries
Maintenance agency | DCMI | EDItEUR | TEI Consortium | LC w/ SAA | LC | LC


RDF

Resource Description Framework (RDF)
– Graph theory (i.e. arcs and nodes)-influenced, XML syntax-based metalanguage for expressing metadata about web resources
– Designed to convey metadata for machine consumption (raw RDF is not very human-readable)
– Fundamental building block of RDF is the triple (subject + predicate + object)
– Maintained by the W3C; RDF specification under revision

Value of RDF:
– A subject of debate (typically RDF vs. XML)!
– Pro: Model-based expression of metadata critical to the Semantic Web (i.e. derived connections); more flexible, scalable and forgiving standard than XML
– Con: RDF carries unneeded processing overhead vs. XML; RDF specification has too many flaws; few use RDF

Applications of RDF:
– Open Directory Project, selected software (e.g., Siderean)
– OCLC Connexion exports Dublin Core in RDF/XML
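The triple model needs no special library to illustrate. The sketch below stores a few (subject, predicate, object) statements as plain tuples, filters them by pattern, and prints them in an N-Triples-style line format. The document URI is hypothetical, the Dublin Core namespace supplies the predicates, and every object is treated as a plain literal for simplicity.

```python
DC = "http://purl.org/dc/elements/1.1/"
DOC = "http://example.org/slides/metadata-standards"  # hypothetical URI

# Each statement is one arc in the graph: subject -> predicate -> object.
triples = [
    (DOC, DC + "title", "Metadata Standards"),
    (DOC, DC + "creator", "Eric Childress"),
    (DOC, DC + "date", "2003-11-18"),
]


def match(statements, subject=None, predicate=None, obj=None):
    """Return the statements that agree with every given (non-None) part."""
    return [
        (s, p, o) for s, p, o in statements
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def to_ntriples(statements):
    """Serialize as N-Triples-style lines, all objects as plain literals."""
    lines = []
    for s, p, o in statements:
        literal = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{literal}" .')
    return "\n".join(lines)


print(match(triples, predicate=DC + "creator"))
print(to_ntriples(triples))
```

Because every statement has the same three-part shape, statements from different sources can be merged into one list and queried together, which is the property the "Pro" argument relies on.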


CSDGM (a.k.a. FGDC)

Content Standard for Digital Geospatial Metadata (CSDGM) [better known as “FGDC”]
– CSDGM Version 2 - FGDC-STD-001-1998
– Defines a common set of terminology and definitions for the documentation of digital geospatial data
– Maintained by Federal Geographic Data Committee (FGDC) [an interagency committee]
– Crosswalk of FGDC to ISO 19115:2003(E) Geographic information - Metadata available; ANSI technical amendment for ISO-FGDC harmonization in progress

Value of FGDC:
– Provides common standard for publishing metadata about geospatial resources
– Widely used by government and business
– Many systems and applications support the standard

Applications of FGDC:
– Adopted or usable by major geospatial agencies in the West.
– Usefulness extended with profiles (e.g. Biological Data)


COSATI

Committee on Scientific and Technical Information (COSATI)
– Cataloging rules and record format for the descriptive cataloging of technical reports and similar documents
– Field tags are alpha strings (not numerical like MARC)
– Related COSATI subject category list can be used
– Owned by CENDI (the Commerce, Energy, NASA, Defense Information Managers Group) [successor to COSATI]

Value of COSATI:
– Supports straightforward capture of useful metadata for scientific and technical information

Applications of COSATI:
– Used by a number of science/technical and defense U.S. federal agencies
– Small number of library systems (e.g., SIRSI) support COSATI record import/export
– COSATI can be converted to MARC if desired


OCLC Online Computer Library Center

Questions


Links

Dublin Core: http://www.dublincore.org

EAD: http://www.loc.gov/ead

FGDC: http://www.fgdc.gov/metadata/meta_stand.html

MARC 21: http://lcweb.loc.gov/marc/marcdocz.html

MARCXML: http://www.loc.gov/marc/marcxml.html

METS: http://www.loc.gov/standards/mets

MIX: http://www.loc.gov/standards/mix

MODS: http://www.loc.gov/standards/mods

ONIX: http://www.editeur.org/onix.html

RDF: http://www.w3.org/RDF

TEI: http://www.tei-c.org

OCLC Research: http://www.oclc.org/research