Edward A. Fox fox@vt CS DLRL Virginia Tech, Blacksburg, VA, USA

The OAI PMH (Open Archives Initiative Protocol for Metadata Harvesting). MetaScholar Initiative All-Project Meeting, Atlanta, GA, 6/18/2002. Edward A. Fox, CS DLRL, Virginia Tech, Blacksburg, VA, USA

Page 1:

The OAI PMH (Open Archives Initiative Protocol for Metadata Harvesting)

MetaScholar Initiative All-Project Meeting

Atlanta, GA 6/18/2002

Edward A. Fox

CS DLRL, Virginia Tech, Blacksburg, VA, USA

Page 2:

Acknowledgements

• Sponsors: Mellon Foundation, SOLINET, NSF, DLF, CNI, UK’s JISC, Virginia’s CIT, …

• OAI Team: Steering Committee, Technical Committee, Developers, Data Providers, Service Providers

• Emory Team, Partners around Southeast

• VT Colleagues: Hussein Suleman, Rohit Kelapure, Ming Luo, Ryan Richardson, Marcos Goncalves, Priya Shivakumar, Baoping Zhang, students working on term projects, …

Page 3:

Contents

• Early history

• Key concepts

• Examples

• ODL, XOAI

• OAI Tools

• Technical Plan

• Conclusion

Page 4:

Open Archives Initiative

OAI

www.openarchives.org

Page 5:

Open Archives Initiative (OAI)

• xxx@LANL, high-energy physics (Ginsparg, 1991)

• CSTR + WATERS = NCSTRL (Lagoze, 1994)

• xxx + NCSTRL = CoRR collaboration (1998)

• Universal Preprint Service protoproto, Oct. 21-22, 1999, Santa Fe – led by LANL, CNI, DLF, Mellon --> OAi

• Santa Fe Convention (see Feb 2000 D-Lib Magazine article)

• Archives -> Open Archives

• Support unique archive identifiers

• Implement metadata set(s) (DC, using XML)

• Implement OA harvesting protocol

• Register the archive

• Build tools, layer other services: linking, searching, …

Page 6:

OAi Philosophy

• Self-archiving = submission mechanism

• Long-term storage system = archive

• Open interface = harvesting mechanism

• Data provider + service provider

• Start with “gray literature”

• e-prints/pre-prints, reports, dissertations, …

Page 7:

Began as “archives of the world unite!”

OAI

Page 8:

Open Archives (protoproto)

• ArXiv & Los Alamos National Lab

• CogPrints & U. Southampton

• NACA & NASA (reports)

• NCSTRL & Cornell U.

• NDLTD & Virginia Tech

• RePEc & U. Surrey

• Total of around 200K records

Page 9:

Original Open Archives Members

• American Physical Society

• California Digital Library

• Caltech

• Coalition for Networked Info.

• Cornell University

• Harvard University

• Library of Congress

• Los Alamos Nat’l Lab

• Mellon Foundation

• NASA Langley Research Cntr

• Old Dominion University

• Stanford University

• U. of Ghent

• U. of Surrey

• U. of Southampton

• Vanderbilt University

• Virginia Tech

• Washington University

Page 11:

Now is a Technical Umbrella for Practical Interoperability…

[Figure labels: Reference Libraries, Publishers, E-Print Archives, Museums]

…that can be exploited by different communities

Page 12:

The World According to OAI

[Figure labels: Service Providers (Discovery, Current Awareness, Preservation); Metadata harvesting; Data Providers]

Page 13:

Aggregation through OAI Harvesting – Black Box Perspective

[Figure labels: OA 1, OA 2, OA 3, OA 4, OA 5, OA 6, OA 7]

Page 14:

Aggregation through OAI Harvesting – By Organization

[Figure labels: Theology, Emory, GA, UGA, U FL, UTK, AmSo, Library]

Page 15:

Aggregation through OAI Harvesting – By Topic

[Figure labels: Confederate Constitution, Civil War, History, Oral, Sports, Culture, AmSo, Diaries]

Page 16:

Approaches to Aggregation

[Figure labels: Build By Discipline, Build By Institution]

Page 17:

Types of Access Possible

[Figure labels: Build By Discipline, Build By Institution; access by Year, Category, Personage, Author, Genre, Query, …]

Page 18:

OAI Repository

Required: Protocol

[Figure: DO boxes and MDO boxes inside the repository]

Page 19:

Metadata vs. Data

• Data refers to digital objects or digital representations of objects

• Metadata is information about the objects (e.g. title, author, etc.)

• OAI focuses on metadata, with the implicit understanding that metadata usually contains useful links to the source digital objects
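To make the distinction concrete, here is a minimal sketch of a metadata record that describes a digital object and links back to it. The field values and the object URL are hypothetical; only the Dublin Core element namespace is standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(title: str, creator: str, object_url: str) -> ET.Element:
    """Build a tiny metadata record; dc:identifier carries the link to the data."""
    record = ET.Element("record")
    for name, value in (
        ("title", title),
        ("creator", creator),
        ("identifier", object_url),
    ):
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


# Hypothetical example values, for illustration only.
record = dublin_core_record(
    "An Example Thesis",
    "A. Student",
    "http://example.org/objects/etd-1.pdf",
)
print(ET.tostring(record, encoding="unicode"))
```

The record is what a harvester would collect; the PDF behind the identifier stays at the data provider.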

Page 20:

Metadata: Complex to Simple

MARC (>$50) Dublin Core (DC)

Page 21:

repository

[Figure labels: repository, OAI protocol, harvester, support data, harvesting data, items]

Page 22:

identifiers

oai-identifier = oai:archive-identifier:record-identifier

• oai: registered URI scheme

• archive-identifier: registered within OAI

• record-identifier: unique ID within archive (syntax is archive-specific)

example = oai:ncstrl:ncstrl.cornellcs/TR94-1418

locally unique key for extracting a record from a repository
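The three-part identifier syntax above can be taken apart mechanically. The following is a minimal sketch; the function and class names are illustrative and not part of any OAI tool.

```python
from typing import NamedTuple


class OaiIdentifier(NamedTuple):
    scheme: str   # always "oai"
    archive: str  # archive identifier, registered within OAI
    record: str   # record identifier, syntax chosen by the archive


def parse_oai_identifier(identifier: str) -> OaiIdentifier:
    """Split oai:archive-identifier:record-identifier into its parts."""
    # Split on the first two colons only, because the archive-specific
    # record part may itself contain colons or slashes.
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not an oai-identifier: {identifier!r}")
    return OaiIdentifier(*parts)


example = parse_oai_identifier("oai:ncstrl:ncstrl.cornellcs/TR94-1418")
print(example.archive)  # ncstrl
print(example.record)   # ncstrl.cornellcs/TR94-1418
```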

Page 23:

selective harvesting - datestamps

[Figure: a repository of records; harvest within date range]
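A harvester asks for a date range by adding `from` and `until` arguments to its request. Below is a minimal sketch of building such a request URL; the base URL is hypothetical, and `oai_dc` is assumed as the metadata prefix.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    from_date: Optional[date] = None,
    until_date: Optional[date] = None,
) -> str:
    """Build a ListRecords request limited to an optional datestamp range."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date.isoformat()    # YYYY-MM-DD
    if until_date is not None:
        params["until"] = until_date.isoformat()
    return f"{base_url}?{urlencode(params)}"


# http://example.org/oai is a placeholder, not a real repository.
print(list_records_url("http://example.org/oai",
                       from_date=date(2002, 1, 1),
                       until_date=date(2002, 6, 18)))
```

Leaving out both dates asks for every record, which is what a first full harvest would do.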

Page 24:

selective harvesting - sets

[Figure: records in a repository grouped into sets S1 and S2; harvest within set]
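Seen from the repository side, harvesting within a set means returning only the records that belong to it. The sketch below uses a hypothetical in-memory table of records and set names to illustrate the idea, including a record that sits in both S1 and S2.

```python
from typing import Dict, List, Set

# Hypothetical repository contents: record identifier -> sets it belongs to.
RECORD_SETS: Dict[str, Set[str]] = {
    "oai:example:1": {"S1"},
    "oai:example:2": {"S1", "S2"},  # a record may belong to several sets
    "oai:example:3": {"S2"},
    "oai:example:4": set(),         # or to none at all
}


def harvest_within_set(record_sets: Dict[str, Set[str]], set_name: str) -> List[str]:
    """Return the identifiers of all records that are members of set_name."""
    return sorted(
        identifier
        for identifier, sets in record_sets.items()
        if set_name in sets
    )


print(harvest_within_set(RECORD_SETS, "S1"))  # ['oai:example:1', 'oai:example:2']
print(harvest_within_set(RECORD_SETS, "S2"))  # ['oai:example:2', 'oai:example:3']
```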

Page 25:

Summary: Protocol for Metadata Harvesting

• Service Requests: Identify, ListMetadataFormats, ListSets, GetRecord, ListIdentifiers, ListRecords

• Metadata Multiplicity

• Date (and Time) Ranges

• Resumption Tokens
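Resumption tokens let a repository return a long list in pieces: each partial response carries a token, and the harvester repeats the request with that token until none comes back. The sketch below shows that loop against a fake in-memory repository. The token encoding (a plain offset) and all function names are assumptions made for illustration; real repositories choose their own opaque tokens.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Page = Tuple[List[str], Optional[str]]  # (records, resumption token or None)


def make_fake_repository(records: List[str], page_size: int) -> Callable[[Dict[str, str]], Page]:
    """Stand-in for an HTTP request to a repository (hypothetical)."""
    def fetch(params: Dict[str, str]) -> Page:
        # Here the token is simply the offset of the next page.
        start = int(params.get("resumptionToken", "0"))
        end = start + page_size
        token = str(end) if end < len(records) else None
        return records[start:end], token
    return fetch


def harvest(fetch: Callable[[Dict[str, str]], Page], metadata_prefix: str = "oai_dc") -> Iterator[str]:
    """Yield every record, following resumption tokens until the list is complete."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        records, token = fetch(params)
        yield from records
        if not token:
            return
        # Follow-up requests carry the token in place of the original arguments.
        params = {"verb": "ListRecords", "resumptionToken": token}


fetch = make_fake_repository(["r1", "r2", "r3", "r4", "r5"], page_size=2)
print(list(harvest(fetch)))  # ['r1', 'r2', 'r3', 'r4', 'r5']
```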

Page 26:

Harvesting vs. Federation

• Competing approaches to interoperability

• Federation is when services are run remotely on remote data (e.g., federated searching)

• Harvesting is when data/metadata is transferred from the remote source to the destination where the services are located (e.g., union catalogues)

• Federation requires more effort at each remote source but is easier for the local system and vice versa for harvesting

• OAI (currently) focuses on harvesting
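The trade-off can be sketched in a few lines. In the toy model below, every class and function name is hypothetical: a federated search makes each remote source run the query, while harvesting copies the metadata once and then searches a local union catalog.

```python
from typing import Iterable, List


class RemoteSource:
    """Toy remote archive that can either answer queries or hand over metadata."""

    def __init__(self, titles: Iterable[str]) -> None:
        self._titles = list(titles)
        self.search_calls = 0  # how often the remote side had to run a query

    def search(self, term: str) -> List[str]:
        """Federation: the remote source does the search work itself."""
        self.search_calls += 1
        return [t for t in self._titles if term.lower() in t.lower()]

    def list_records(self) -> List[str]:
        """Harvesting: the remote source only exports its metadata."""
        return list(self._titles)


def federated_search(sources: Iterable[RemoteSource], term: str) -> List[str]:
    hits: List[str] = []
    for source in sources:
        hits.extend(source.search(term))
    return hits


def build_union_catalog(sources: Iterable[RemoteSource]) -> List[str]:
    catalog: List[str] = []
    for source in sources:
        catalog.extend(source.list_records())
    return catalog


def local_search(catalog: Iterable[str], term: str) -> List[str]:
    return [t for t in catalog if term.lower() in t.lower()]


sources = [RemoteSource(["Civil War Diaries", "Oral History"]),
           RemoteSource(["Sports Culture", "War Letters"])]
catalog = build_union_catalog(sources)   # one transfer, done ahead of time
print(local_search(catalog, "war"))      # ['Civil War Diaries', 'War Letters']
print(federated_search(sources, "war"))  # ['Civil War Diaries', 'War Letters']
```

Both routes return the same hits here; the difference is where the work happens. Only the federated call touches the remote sources at query time.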

Page 28:

Example 1: Union Collection of ETDs (Electronic Theses and Dissertations, for Networked Digital Library of Theses and Dissertations, NDLTD)

[Figure labels: VIRTUA; MARIAN; Merged Metadata Collection; Virginia Tech ETD Archive; Duisburg ETD Archive; Humboldt ETD Archive; …; Future: recommender, …. Legend: OAI Data Provider, OAI Service Provider, OAI Harvesting]

Page 29:

Example 1: Details

[Figure labels: NDLTD Site / Member (Student Entry, Local DB, OAI Server, Local Search / Browse); NDLTD Central (OAI Harvester, Name Authority Service (e.g. OCLC), MARIAN Union Catalog); VTLS Union Catalog (MARC DB, Virtua, Conversion); Alternate MARC Transport (ftp? tapes?); Librarian Verification / Validation / Enrichment / Maintenance]

Page 30:

Example 2: NSDL Information Architecture

Essentially as developed by the Technical Infrastructure Workgroup

[Figure labels: Portals & Clients; NSDL Services; Other NSDL Services; CI Services (annotation, discussion, personalization, authentication, browsing); Core Services: information retrieval; Core Services: metadata gathering; Core Collection-Building Services (harvesting, protocols); NSDL Collections; Special Databases; referenced items & collections; Core NSDL “Bus”. Layers: User Interfaces, Usage Enhancement, Collection Building]

Page 31:

Example 2: CITIDEL -> NSDL

• Computing and Information Technology Interactive Digital Education Library

• A collection project in the National STEM (science, technology, engineering, and mathematics) education Digital Library – NSDL

• www.nsdl.nsf.gov

• www.nsdl.org

Page 32:

Example 2: CITIDEL

Distributed repository structure

[Figure labels: Laboratories Repository; Applets Repository; Papers Repository; Syllabi Repository; . . .; OAI Data Provider; OAI Data Harvester; Union Metadata Repository; Digital Library Services]

Page 33:

Example 2: NSDL Collections (themes relevant to our projects)

• Discovery of content

• Classification and cataloguing

• Acquisition and/or linking; referencing

• Disciplinary-based themes define a natural body of content, but other possibilities are also encouraged

• Software tool suites for analysis, modeling, simulation, or visualization

• Reviewed commentary on pedagogy

Page 35:

Open Digital Libraries

XOAI-PMH

• Dissertation work of Hussein Suleman (member of OAI technical committee)

• Extending the OAI protocol

• Supporting rapid development of DLs using networks of components

• Demonstrated with NDLTD, CSTC

• Described in Dec. 2001 D-Lib Magazine article, and article scheduled for publication

Page 36:

Open Digital Libraries

Components

• Running now: XML-File (data provider from file system); Union, search, browse, recent, filter; E-journal support system

• Class projects: High performance multilingual search; Recommender; User rating

• Others discussed: Classification/categorization and browsing

Page 37:

Component System Approach

• (Open) DL = Network of Extended OAs

[Figure labels: Local Archive; Data Input; Remote Archive; Metadata Repository; Browse; Search; Recommend; Resource Discovery; User Interface. Legend: OAI/ODL archive, OAI/ODL protocol]

Page 38:

Example Architecture (NDLTD)

[Figure labels: Virginia Tech; Humboldt; Duisburg; Dresden; PhysNet; CalTech; MIT; MIT Filter; Union Catalog; Browse; Search; Recent; User Interface. Legend: OAI/ODL archive, OAI/ODL protocol]

Page 40:

OAI Tools

• Related resources, e.g., XML, Unicode

• Submission / author support

• XML Schema Validator

• Servers and utilities, e.g., ARC, Kepler, EPrints

• Repository Explorer: Interactive Browsing; Testing of parameters; Multiple views of data; Multilingual support; Automatic test suite

Page 41:

Author’s tools

www.physik.uni-oldenburg.de/EPS/mmm

Page 42:

XSV Schema Validator

Page 43:

ARC (arc.cs.odu.edu)

Page 46:

VT Tool: Repository Explorer

• The Repository Explorer is a tool for browsing and testing Open Archives, by Hussein Suleman

• You issue commands and see the results

• You also can perform a sequence of automatic tests

• http://purl.org/net/oai_explorer

Page 47:

VT Tool: RE 1.3

Page 48:

VT Tool: Request, Response

Request: http://scholar.lib.vt.edu/theses/OAI/cgi-bin/index.pl?verb=GetRecord&metadataPrefix=oai_etdms&identifier=oai:VTETD:etd-520112859651791

[Figure: screenshot of the response]
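The request on this slide is an ordinary URL, so its parts can be read with standard URL tools. This minimal sketch splits it into the repository base URL and the protocol arguments. The helper names are illustrative, and nothing is sent over the network.

```python
from typing import Dict
from urllib.parse import parse_qs, urlsplit

# The GetRecord request shown on the slide.
REQUEST_URL = (
    "http://scholar.lib.vt.edu/theses/OAI/cgi-bin/index.pl"
    "?verb=GetRecord&metadataPrefix=oai_etdms"
    "&identifier=oai:VTETD:etd-520112859651791"
)


def base_url(url: str) -> str:
    """Return the repository address without the query string."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}{parts.path}"


def request_arguments(url: str) -> Dict[str, str]:
    """Return the protocol arguments (verb, metadataPrefix, ...) as a dict."""
    query = urlsplit(url).query
    # parse_qs maps each name to a list; each argument appears once here.
    return {name: values[0] for name, values in parse_qs(query).items()}


args = request_arguments(REQUEST_URL)
print(base_url(REQUEST_URL))  # http://scholar.lib.vt.edu/theses/OAI/cgi-bin/index.pl
print(args["verb"])           # GetRecord
print(args["metadataPrefix"]) # oai_etdms
print(args["identifier"])     # oai:VTETD:etd-520112859651791
```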

Page 50:

What will central service look like? (1 of 2)

• Harvesting from local sites

• Rich content, drawn from all participating sites

• Data management

• Logging and reporting

• Repository/preservation/mirroring

• Adding/updating/deleting

• User interface and support for digital librarians and data providers

Page 51:

What will central service look like? (2 of 2)

• Adding value

• De-duping

• Categorization/classification -> browsing

• Normalization/standardization -> authority control

• Tools for communication/collaboration/annotation -> security/privacy

• User interface for both general users and scholars

Page 52:

What are needs at local sites?

• Increasing OAI expertise

• Connecting OAI with local systems

• Supporting standards, normalization

• Supporting continual updating

• Passing enhancements upstream

Page 53:

How can VT help? (1 of 2)

• Usability studies for central site

• Help develop consensus

• Help plan system architecture & services

• Education/training

• Provide and support tools/systems

• Help sites engage, become OAI compliant

Page 54:

How can VT help? (2 of 2)

• Standards: MARC-XML

• ODL Suite: Download and configure; Use in packaged forms, or re-architected

• Support: Connecting your system into OAI; Help with OAI Tools

Page 55:

MARC XML-DTD

• XML Transport format for US-MARC records

• Standardized metadata exchange format for traditional library services joining OAI

Page 57:

Conclusion

• Rethink your efforts in terms of providers of: Data, Services

• Reduced work for data providers: Tools available; Don’t need to offer services

• Reduced work for service providers: Others provide the data; Can use tools and systems for OAI, XOAI

• Results: More data becoming available; To more people; Supported by improved services

• MetaScholar can be a win-win-win project!

Page 58:

Links

• Open Archives Initiative: http://www.openarchives.org

• OAI Metadata Harvesting Protocol: http://www.openarchives.org/OAI/openarchivesprotocol.htm

• Virginia Tech DLRL OAI Projects: http://www.dlib.vt.edu/projects/OAI/ and http://oai.dlib.vt.edu/odl

• Repository Explorer: http://purl.org/net/oai_explorer

• NDLTD: http://www.ndltd.org

Page 59:

More Links

• ARC Cross-Archive Search Service: http://arc.cs.odu.edu/

• XML Schema Validator: http://www.w3.org/2001/03/webdata/xsv

• Dublin Core Metadata Initiative: http://www.dublincore.org

• E-Prints DL-in-a-box: http://www.eprints.org

• XML Tools at W3C: http://www.w3.org/XML/#software