Edward A. Fox fox@vt CS DLRL Virginia Tech, Blacksburg, VA, USA

Post on 20-Jan-2016

The OAI PMH (Open Archives Initiative Protocol for Metadata Harvesting). MetaScholar Initiative All-Project Meeting, Atlanta, GA, 6/18/2002. Edward A. Fox, fox@vt.edu, CS DLRL, Virginia Tech, Blacksburg, VA, USA.

The OAI PMH (Open Archives Initiative Protocol for Metadata Harvesting)

MetaScholar Initiative All-Project Meeting

Atlanta, GA 6/18/2002

Edward A. Fox, fox@vt.edu

CS DLRL, Virginia Tech, Blacksburg, VA, USA

Acknowledgements

• Sponsors: Mellon Foundation, SOLINET, NSF, DLF, CNI, UK’s JISC, Virginia’s CIT, …

• OAI Team: Steering Committee, Technical Committee, Developers, Data Providers, Service Providers

• Emory Team, Partners around Southeast

• VT Colleagues: Hussein Suleman, Rohit Kelapure, Ming Luo, Ryan Richardson, Marcos Goncalves, Priya Shivakumar, Baoping Zhang, students working on term projects, …

Contents

• Early history

• Key concepts

• Examples

• ODL, XOAI

• OAI Tools

• Technical Plan

• Conclusion

Open Archives Initiative

OAI

www.openarchives.org

openarchives@openarchives.org

Open Archives Initiative (OAI)

• xxx@LANL, high-energy physics (Ginsparg, 1991)

• CSTR + WATERS = NCSTRL (Lagoze, 1994)

• xxx + NCSTRL = CoRR collaboration (1998)

• Universal Preprint Service protoproto, Oct. 21-22, 1999, Santa Fe – led by LANL, CNI, DLF, Mellon --> OAI

• Santa Fe Convention (see Feb 2000 D-Lib Magazine article)

• Archives -> Open Archives

  • Support unique archive identifiers

  • Implement metadata set(s) (DC, using XML)

  • Implement OA harvesting protocol

  • Register the archive

• Build tools, layer other services: linking, searching, …

OAI Philosophy

• Self-archiving = submission mechanism

• Long-term storage system = archive

• Open interface = harvesting mechanism

• Data provider + service provider

• Start with “gray literature”

  • e-prints/pre-prints, reports, dissertations, …

Began as “archives of the world unite!”

OAI

Open Archives (protoproto)

• ArXiv & Los Alamos National Lab

• CogPrints & U. Southampton

• NACA & NASA (reports)

• NCSTRL & Cornell U.

• NDLTD & Virginia Tech

• RePEc & U. Surrey

• Total of around 200K records

Original Open Archives Members

• American Physical Society

• California Digital Library

• Caltech

• Coalition for Networked Info.

• Cornell University

• Harvard University

• Library of Congress

• Los Alamos Nat’l Lab

• Mellon Foundation

• NASA Langley Research Cntr

• Old Dominion University

• Stanford University

• U. of Ghent

• U. of Surrey

• U. of Southampton

• Vanderbilt University

• Virginia Tech

• Washington University

Now is a Technical Umbrella for Practical Interoperability…

…that can be exploited by different communities

[Figure labels: Reference Libraries; Publishers; E-Print Archives; Museums; Discovery; Current Awareness; Preservation; Service Providers; Data Providers; Metadata harvesting]

The World According to OAI

Aggregation through OAI Harvesting – Black Box Perspective

[Figure labels: OA 1; OA 2; OA 3; OA 4; OA 5; OA 6; OA 7]

Aggregation through OAI Harvesting – By Organization

[Figure labels: Theology; Emory; GA; UGA; U FL; UTK; AmSo; Library]

Aggregation through OAI Harvesting – By Topic

[Figure labels: Confederate Constitution; Civil War; History; Oral; Sports Culture; AmSo; Diaries]

Approaches to Aggregation

• Build By Discipline

• Build By Institution

Types of Access Possible

• Build By Discipline

• Build By Institution

• Year, Category, Personage, Author, Genre, Query, …

OAI Repository

Required: Protocol

[Figure: boxes labeled DO, each accompanied by boxes labeled MDO]

Metadata vs. Data

• Data refers to digital objects or digital representations of objects

• Metadata is information about the objects (e.g. title, author, etc.)

• OAI focuses on metadata, with the implicit understanding that metadata usually contains useful links to the source digital objects

Metadata: Complex to Simple

MARC (>$50) Dublin Core (DC)

repository

[Figure labels: repository; OAI protocol; harvester; support data; harvesting data]

items

identifiers

oai-identifier = oai:archive-identifier:record-identifier

• oai: Registered URI Scheme

• archive-identifier: Archive Identifier, registered within OAI

• record-identifier: Unique ID within archive (syntax is archive-specific)

example = oai:ncstrl:ncstrl.cornellcs/TR94-1418

locally unique key for extracting a record from a repository
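The three-part identifier syntax above can be illustrated with a minimal Python sketch. The names `OAIIdentifier` and `parse_oai_identifier` are hypothetical helpers introduced here, not part of any OAI tool; the only input is the NCSTRL example from the slide.

```python
from typing import NamedTuple


class OAIIdentifier(NamedTuple):
    """Hypothetical container for the three parts of an oai-identifier."""
    scheme: str      # always "oai"
    archive_id: str  # archive identifier, registered within OAI
    record_id: str   # unique ID within the archive (archive-specific syntax)


def parse_oai_identifier(identifier: str) -> OAIIdentifier:
    # Split on the first two colons only: the record part is archive-specific
    # and may itself contain further colons.
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not an oai-identifier: {identifier!r}")
    return OAIIdentifier(*parts)


example = parse_oai_identifier("oai:ncstrl:ncstrl.cornellcs/TR94-1418")
print(example.archive_id)  # ncstrl
print(example.record_id)   # ncstrl.cornellcs/TR94-1418
```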

selective harvesting - datestamps

[Figure labels: repository; harvest within date range; record; record]

selective harvesting - sets

[Figure labels: repository; harvest within set; S1; S2; record; record; record]
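Selective harvesting by datestamp and by set comes down to optional arguments on a harvesting request. The sketch below only builds the request URL and sends nothing. It assumes the protocol's `from`, `until`, `set`, and `metadataPrefix` argument names; the base URL `http://archive.example.org/oai` and the helper name `list_records_url` are made up for illustration.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, until_date=None, set_spec=None):
    """Build a ListRecords request; the optional arguments narrow the harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date    # lower bound of the date range
    if until_date:
        params["until"] = until_date  # upper bound of the date range
    if set_spec:
        params["set"] = set_spec      # harvest only records in this set
    return base_url + "?" + urlencode(params)


# Hypothetical repository; set name S1 is taken from the figure.
url = list_records_url("http://archive.example.org/oai",
                       from_date="2002-01-01",
                       until_date="2002-06-18",
                       set_spec="S1")
print(url)
```

Leaving out all three optional arguments requests every record the repository can supply in the given metadata format.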

Summary: Protocol for Metadata Harvesting

• Service Requests

  • Identify

  • ListMetadataFormats

  • ListSets

  • GetRecord

  • ListIdentifiers

  • ListRecords

• Metadata Multiplicity

• Date (and Time) Ranges

• Resumption Tokens
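Resumption tokens let a repository return a long list in several partial responses. The following is a minimal sketch of the harvester side, assuming a caller-supplied `fetch` function that takes the request arguments and returns the response XML as a string. The canned pages, the `oai:demo:` identifiers, and `fake_fetch` are invented stand-ins so the loop can run without a network, and the XML is heavily simplified compared with a real response.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix so the sketch ignores namespace details."""
    return tag.rsplit("}", 1)[-1]


def harvest_identifiers(fetch: Callable[[Dict[str, str]], str],
                        metadata_prefix: str = "oai_dc") -> Iterator[str]:
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(params))
        token = None
        for element in root.iter():
            name = local_name(element.tag)
            if name == "identifier" and element.text:
                yield element.text.strip()
            elif name == "resumptionToken":
                token = (element.text or "").strip()
        if not token:  # missing or empty token: the list is complete
            return
        # A follow-up request carries only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token}


# Canned, simplified responses standing in for a real repository.
PAGES = {
    None: ("<OAI-PMH><ListIdentifiers>"
           "<header><identifier>oai:demo:1</identifier></header>"
           "<header><identifier>oai:demo:2</identifier></header>"
           "<resumptionToken>page2</resumptionToken>"
           "</ListIdentifiers></OAI-PMH>"),
    "page2": ("<OAI-PMH><ListIdentifiers>"
              "<header><identifier>oai:demo:3</identifier></header>"
              "<resumptionToken/>"
              "</ListIdentifiers></OAI-PMH>"),
}


def fake_fetch(params: Dict[str, str]) -> str:
    return PAGES[params.get("resumptionToken")]


harvested = list(harvest_identifiers(fake_fetch))
print(harvested)  # ['oai:demo:1', 'oai:demo:2', 'oai:demo:3']
```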

Harvesting vs. Federation

• Competing approaches to interoperability

• Federation is when services are run remotely on remote data (e.g., federated searching)

• Harvesting is when data/metadata is transferred from the remote source to the destination where the services are located (e.g., union catalogues)

• Federation requires more effort at each remote source but is easier for the local system and vice versa for harvesting

• OAI (currently) focuses on harvesting
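The contrast between the two approaches can be sketched with toy in-memory data. Everything here is hypothetical: the archive names, the titles, and the function names are placeholders, and a real system would talk to remote servers rather than to a dictionary.

```python
# Two pretend remote archives, each holding a few titles.
REMOTE_ARCHIVES = {
    "archive_a": ["Civil War diaries", "Oral history of Georgia"],
    "archive_b": ["Confederate Constitution", "History of sports culture"],
}


def remote_search(archive: str, term: str) -> list:
    """Federation: a search service that every remote source must run itself."""
    return [t for t in REMOTE_ARCHIVES[archive] if term.lower() in t.lower()]


def federated_search(term: str) -> list:
    """The local system only forwards the query and merges the answers."""
    hits = []
    for archive in REMOTE_ARCHIVES:
        hits.extend(remote_search(archive, term))
    return hits


def harvest_all() -> list:
    """Harvesting: copy the metadata once into a local union catalogue."""
    union = []
    for titles in REMOTE_ARCHIVES.values():
        union.extend(titles)
    return union


UNION_CATALOG = harvest_all()


def local_search(term: str) -> list:
    """After harvesting, the search service runs entirely at the destination."""
    return [t for t in UNION_CATALOG if term.lower() in t.lower()]


print(federated_search("history"))
print(local_search("history"))
```

Both calls return the same hits here; what differs is where the search runs and which side carries the implementation effort.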

Example 1: Union Collection of ETDs (Electronic Theses and Dissertations, for Networked Digital Library of Theses and Dissertations, NDLTD)

[Figure labels: VIRTUA; Merged Metadata Collection; MARIAN; Virginia Tech ETD Archive; Duisburg ETD Archive; Humboldt ETD Archive; …; Future: recommender, …]

[Legend: OAI Data Provider; OAI Service Provider; OAI Harvesting]

Example 1: Details

[Figure labels: NDLTD Site / Member; Local DB; OAI Server; Local Search / Browse; Student Entry; NDLTD Central; OAI Harvester; Name Authority Service (e.g. OCLC); MARIAN Union Catalog; VTLS Union Catalog; MARC DB; Virtua; Conversion; Alternate MARC Transport (ftp? tapes?); Librarian Verification / Validation / Enrichment / Maintenance]

Example 2: NSDL Information Architecture

Essentially as developed by the Technical Infrastructure Workgroup

[Figure labels: referenced items & collections; Special Databases; NSDL Services; Other NSDL Services; CI Services: annotation, discussion, personalization, authentication, browsing; Core Services: information retrieval, metadata gathering; Core Collection-Building Services: harvesting, protocols; Portals & Clients; Usage Enhancement; Collection Building; User Interfaces; NSDL Collections; Core NSDL “Bus”]

Example 2: CITIDEL -> NSDL

• Computing and Information Technology Interactive Digital Education Library

• A collection project in the National STEM (science, technology, engineering, and mathematics) education Digital Library – NSDL

• www.nsdl.nsf.gov

• www.nsdl.org

Example 2: CITIDEL Distributed repository structure

[Figure labels: Union Metadata Repository; OAI Data Provider; Laboratories Repository; Applets Repository; Papers Repository; Syllabi Repository; . . .; Digital Library Services; OAI Data Harvester]

Example 2: NSDL Collections (themes relevant to our projects)

• Discovery of content

• Classification and cataloguing

• Acquisition and/or linking; referencing

• Disciplinary-based themes define a natural body of content, but other possibilities are also encouraged

• Software tool suites for analysis, modeling, simulation, or visualization

• Reviewed commentary on pedagogy

Open Digital Libraries XOAI-PMH

• Dissertation work of Hussein Suleman (member of OAI technical committee)

• Extending the OAI protocol

• Supporting rapid development of DLs using networks of components

• Demonstrated with NDLTD, CSTC

• Described in Dec. 2001 D-Lib Magazine article, and article scheduled for publication

Open Digital Libraries Components

• Running now

  • XML-File (data provider from file system)

  • Union, search, browse, recent, filter

  • E-journal support system

• Class projects

  • High performance multilingual search

  • Recommender

  • User rating

• Others discussed

  • Classification/categorization and browsing

Component System Approach

• (Open) DL = Network of Extended OAs

[Figure labels: Local Archive; Data Input; Remote Archive; Browse; Metadata Repository; Search; Recommend; Resource Discovery; User Interface]

[Legend: OAI/ODL archive; OAI/ODL protocol]

Example Architecture (NDLTD)

[Figure labels: Humboldt; Duisburg; MIT Filter; MIT; Browse; Union Catalog; Search; Recent; User Interface; User Interface; Virginia Tech; PhysNet; CalTech; Dresden]

[Legend: OAI/ODL archive; OAI/ODL protocol]

OAI Tools

• Related resources, e.g., XML, Unicode

• Submission / author support

• XML Schema Validator

• Servers and utilities, e.g., ARC, Kepler, EPrints

• Repository Explorer

  • Interactive Browsing

  • Testing of parameters

  • Multiple views of data

  • Multilingual support

  • Automatic test suite

Author’s tools

www.physik.uni-oldenburg.de/EPS/mmm

XSV Schema Validator

ARC (arc.cs.odu.edu)

VT Tool: Repository Explorer

• The Repository Explorer is a tool for browsing and testing Open Archives, by Hussein Suleman

• You issue commands and see the results

• You also can perform a sequence of automatic tests

• http://purl.org/net/oai_explorer

VT Tool: RE 1.3

VT Tool: Request, Response

Request

http://scholar.lib.vt.edu/theses/OAI/cgi-bin/index.pl?verb=GetRecord&metadataPrefix=oai_etdms&identifier=oai:VTETD:etd-520112859651791

Response
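The GetRecord request shown on this slide can be taken apart with the Python standard library to see which protocol arguments it carries. This is only an illustrative sketch: it parses the URL string from the slide and does not contact the server.

```python
from urllib.parse import parse_qs, urlsplit

# Request URL exactly as shown on the slide.
REQUEST = ("http://scholar.lib.vt.edu/theses/OAI/cgi-bin/index.pl"
           "?verb=GetRecord&metadataPrefix=oai_etdms"
           "&identifier=oai:VTETD:etd-520112859651791")

parts = urlsplit(REQUEST)

# Everything before the '?' addresses the repository's OAI interface.
base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"

# parse_qs returns a list per key; each argument appears once here.
arguments = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(base_url)
for name, value in arguments.items():
    print(f"{name} = {value}")
```

The output lists the verb (`GetRecord`), the requested metadata format (`oai_etdms`), and the identifier of the single record being fetched.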

What will central service look like? (1 of 2)

• Harvesting from local sites

• Rich content, drawn from all participating sites

• Data management

  • Logging and reporting

  • Repository/preservation/mirroring

  • Adding/updating/deleting

• User interface and support for digital librarians and data providers

What will central service look like? (2 of 2)

• Adding value

  • De-duping

  • Categorization/classification -> browsing

  • Normalization/standardization -> authority control

  • Tools for communication/collaboration/annotation -> security/privacy

• User interface for both general users and scholars
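As one illustration of the value-adding steps above, de-duping harvested records could start from a normalized title and creator key. This is a rough sketch under strong assumptions, not the project's method: the records, the field names, and the matching rule are all invented, and real de-duplication would need far more careful matching.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace (a crude normalization)."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def dedup_key(record: dict) -> tuple:
    return (normalize(record["title"]), normalize(record["creator"]))


def de_duplicate(records: list) -> list:
    """Keep the first record seen for each normalized (title, creator) key."""
    seen = {}
    for record in records:
        seen.setdefault(dedup_key(record), record)
    return list(seen.values())


# Hypothetical records harvested from two sites.
harvested_records = [
    {"identifier": "oai:siteA:1", "title": "Civil War Diaries", "creator": "Smith, J."},
    {"identifier": "oai:siteB:9", "title": "Civil war diaries.", "creator": "Smith, J"},
    {"identifier": "oai:siteA:2", "title": "Oral History", "creator": "Jones, A."},
]

unique_records = de_duplicate(harvested_records)
print([r["identifier"] for r in unique_records])  # ['oai:siteA:1', 'oai:siteA:2']
```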

What are needs at local sites?

• Increasing OAI expertise

• Connecting OAI with local systems

• Supporting standards, normalization

• Supporting continual updating

• Passing enhancements upstream

How can VT help? (1 of 2)

• Usability studies for central site

• Help develop consensus

• Help plan system architecture & services

• Education/training

• Provide and support tools/systems

• Help sites engage, become OAI compliant

How can VT help? (2 of 2)

• Standards

  • MARC-XML

• ODL Suite

  • Download and configure

  • Use in packaged forms, or re-architected

• Support

  • Connecting your system into OAI

  • Help with OAI Tools

MARC XML-DTD

• XML Transport format for US-MARC records

• Standardized metadata exchange format for traditional library services joining OAI

Conclusion

• Rethink your efforts in terms of providers of

  • Data, Services

• Reduced work for data providers

  • Tools available

  • Don’t need to offer services

• Reduced work for service providers

  • Others provide the data

  • Can use tools and systems for OAI, XOAI

• Results

  • More data becoming available

  • To more people

  • Supported by improved services

• MetaScholar can be a win-win-win project!

Links

• Open Archives Initiative

  • http://www.openarchives.org

• OAI Metadata Harvesting Protocol

  • http://www.openarchives.org/OAI/openarchivesprotocol.htm

• Virginia Tech DLRL OAI Projects

  • http://www.dlib.vt.edu/projects/OAI/

  • http://oai.dlib.vt.edu/odl

• Repository Explorer

  • http://purl.org/net/oai_explorer

• NDLTD

  • http://www.ndltd.org

More Links

• ARC Cross-Archive Search Service

  • http://arc.cs.odu.edu/

• XML Schema Validator

  • http://www.w3.org/2001/03/webdata/xsv

• Dublin Core Metadata Initiative

  • http://www.dublincore.org

• E-Prints DL-in-a-box

  • http://www.eprints.org

• XML Tools at W3C

  • http://www.w3.org/XML/#software