Edward A. Fox fox@vt CS DLRL Virginia Tech, Blacksburg, VA, USA
The OAI PMH (Open Archives Initiative Protocol for Metadata Harvesting)
MetaScholar Initiative All-Project Meeting
Atlanta, GA 6/18/2002
Edward A. Fox
CS DLRL, Virginia Tech, Blacksburg, VA, USA
Acknowledgements
• Sponsors: Mellon Foundation, SOLINET, NSF, DLF, CNI, UK’s JISC, Virginia’s CIT, …
• OAI Team: Steering Committee, Technical Committee, Developers, Data Providers, Service Providers
• Emory Team, Partners around Southeast
• VT Colleagues: Hussein Suleman, Rohit Kelapure, Ming Luo, Ryan Richardson, Marcos Goncalves, Priya Shivakumar, Baoping Zhang, students working on term projects, …
Contents
• Early history
• Key concepts
• Examples
• ODL, XOAI
• OAI Tools
• Technical Plan
• Conclusion
Open Archives Initiative (OAI)
• xxx@LANL, high-energy physics (Ginsparg, 1991)
• CSTR + WATERS = NCSTRL (Lagoze, 1994)
• xxx + NCSTRL = CoRR collaboration (1998)
• Universal Preprint Service protoproto, Oct. 21-22, 1999, Santa Fe; led by LANL, CNI, DLF, Mellon --> OAI
• Santa Fe Convention (see Feb 2000 D-Lib Magazine article)
• Archives -> Open Archives
• Support unique archive identifiers
• Implement metadata set(s) (DC, using XML)
• Implement OA harvesting protocol
• Register the archive
• Build tools, layer other services: linking, searching, …
OAI Philosophy
• Self-archiving = submission mechanism
• Long-term storage system = archive
• Open interface = harvesting mechanism
• Data provider + service provider
• Start with “gray literature”: e-prints/pre-prints, reports, dissertations, …
Began as “archives of the world unite!”
OAI
Open Archives (protoproto)
• ArXiv & Los Alamos National Lab
• CogPrints & U. Southampton
• NACA & NASA (reports)
• NCSTRL & Cornell U.
• NDLTD & Virginia Tech
• RePEc & U. Surrey
• Total of around 200K records
Original Open Archives Members
• American Physical Society
• California Digital Library
• Caltech
• Coalition for Networked Info.
• Cornell University
• Harvard University
• Library of Congress
• Los Alamos Nat’l Lab
• Mellon Foundation
• NASA Langley Research Cntr
• Old Dominion University
• Stanford University
• U. of Ghent
• U. of Surrey
• U. of Southampton
• Vanderbilt University
• Virginia Tech
• Washington University
Now is a Technical Umbrella for Practical Interoperability…
…that can be exploited by different communities
[Figure: communities under the umbrella: Reference Libraries, Publishers, E-Print Archives, Museums]
The World According to OAI
[Figure: Service Providers (Discovery, Current Awareness, Preservation) and Data Providers, linked by metadata harvesting]
Aggregation through OAI Harvesting: Black Box Perspective
[Figure: open archives OA 1 through OA 7 aggregated by harvesting]
Aggregation through OAI Harvesting: By Organization
[Figure labels: Theology, Emory, GA, UGA, U FL, UTK, AmSo, Library]
Aggregation through OAI Harvesting: By Topic
[Figure labels: Confederate Constitution, Civil War, History, Oral, Sports, Culture, AmSo, Diaries]
Approaches to Aggregation
[Figure labels: Build By Discipline, Build By Institution]
Types of Access Possible
[Figure labels: Build By Discipline, Build By Institution; access by Year, Category, Personage, Author, Genre, Query, …]
OAI Repository
Required: Protocol
[Figure: a repository holding items labeled DO, each accompanied by one or more items labeled MDO]
Metadata vs. Data
• Data refers to digital objects or digital representations of objects
• Metadata is information about the objects (e.g. title, author, etc.)
• OAI focuses on metadata, with the implicit understanding that metadata usually contains useful links to the source digital objects
Metadata: Complex to Simple
MARC (>$50) Dublin Core (DC)
repository
[Figure: a harvester and a repository linked by the OAI protocol; other labels: support, harvesting, data, items]
identifiers
oai-identifier = oai:archive-identifier:record-identifier
• oai = registered URI scheme
• archive-identifier = archive identifier, registered within OAI
• record-identifier = unique ID within archive (syntax is archive-specific)
example = oai:ncstrl:ncstrl.cornellcs/TR94-1418
locally unique key for extracting a record from a repository
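The three-part identifier syntax can be illustrated with a short Python sketch. The names `OAIIdentifier` and `parse_oai_identifier` are made up for this illustration and are not part of any OAI tool; the sketch assumes only the `oai:archive-identifier:record-identifier` layout shown on the slide.

```python
from typing import NamedTuple


class OAIIdentifier(NamedTuple):
    """Hypothetical container for the three parts of an oai-identifier."""
    scheme: str      # the registered URI scheme, always "oai"
    archive_id: str  # archive identifier, registered within OAI
    record_id: str   # unique ID within the archive (archive-specific syntax)


def parse_oai_identifier(identifier: str) -> OAIIdentifier:
    """Split 'oai:archive-identifier:record-identifier' into its parts."""
    # Split at most twice so any colon inside the record part is preserved.
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not an oai-identifier: {identifier!r}")
    return OAIIdentifier(*parts)


if __name__ == "__main__":
    print(parse_oai_identifier("oai:ncstrl:ncstrl.cornellcs/TR94-1418"))
```

Splitting only on the first two colons leaves the record part untouched, which matters because its syntax is left to each archive.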
selective harvesting - datestamps
[Figure: harvesting within a date range selects a subset of the records in a repository]
selective harvesting - sets
[Figure: a repository divided into sets S1 and S2; harvesting within set S1 selects only its records]
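Selective harvesting by datestamp and by set comes down to adding optional arguments to a harvesting request. The sketch below builds such a request URL in Python. The function name `list_records_url` and the base URL `http://archive.example.org/oai` are placeholders invented for this example; the argument names `from`, `until`, and `set` are those of the OAI protocol.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str,
                     from_date: Optional[str] = None,
                     until_date: Optional[str] = None,
                     set_spec: Optional[str] = None) -> str:
    """Build a ListRecords request, optionally limited by date range and set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date    # lower bound on record datestamps
    if until_date:
        params["until"] = until_date  # upper bound on record datestamps
    if set_spec:
        params["set"] = set_spec      # harvest only records within this set
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical repository address, for illustration only.
    print(list_records_url("http://archive.example.org/oai", "oai_dc",
                           from_date="2002-01-01", until_date="2002-06-18",
                           set_spec="S1"))
```

Omitting all three optional arguments yields a request for every record the repository can deliver in the given metadata format.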
Summary: Protocol for Metadata Harvesting
• Service Requests: Identify, ListMetadataFormats, ListSets, GetRecord, ListIdentifiers, ListRecords
• Metadata Multiplicity
• Date (and Time) Ranges
• Resumption Tokens
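Resumption tokens let a repository return a long list in several partial responses. A harvester might follow them roughly as in the Python sketch below. This is an illustration under assumptions: `fetch` is a caller-supplied function that sends the request arguments and returns the XML response text, the helper names are invented, and elements are matched by local name so the sketch does not depend on a particular protocol version's namespace.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix of the form '{uri}name'."""
    return tag.rsplit("}", 1)[-1]


def harvest_identifiers(fetch: Callable[[Dict[str, str]], str],
                        metadata_prefix: str) -> Iterator[str]:
    """Yield every identifier, following resumption tokens until none is left."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(params))
        token = None
        for elem in root.iter():
            name = local_name(elem.tag)
            if name == "identifier" and elem.text:
                yield elem.text.strip()
            elif name == "resumptionToken":
                token = (elem.text or "").strip()
        if not token:
            return  # no token, or an empty one: the list is complete
        # A follow-up request carries only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token}


if __name__ == "__main__":
    # In-memory stand-in for a repository that answers in two parts.
    pages = {
        None: "<r><identifier>oai:a:1</identifier>"
              "<resumptionToken>t1</resumptionToken></r>",
        "t1": "<r><identifier>oai:a:2</identifier><resumptionToken/></r>",
    }
    print(list(harvest_identifiers(
        lambda p: pages[p.get("resumptionToken")], "oai_dc")))
```

Passing `fetch` in as a parameter keeps the loop testable without a network connection; a real harvester would plug in an HTTP client there.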
Harvesting vs. Federation
• Competing approaches to interoperability
• Federation is when services are run remotely on remote data (e.g., federated searching)
• Harvesting is when data/metadata is transferred from the remote source to the destination where the services are located (e.g., union catalogues)
• Federation requires more effort at each remote source but less at the local system; harvesting is the reverse
• OAI (currently) focuses on harvesting
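The contrast between the two approaches can be made concrete with a toy Python sketch. Everything in it is hypothetical: the archive names, the titles, and the function names are invented, and a plain substring match stands in for a real search service.

```python
from typing import Dict, List

# Hypothetical remote archives: archive name -> record titles held there.
REMOTE_ARCHIVES: Dict[str, List[str]] = {
    "archive_a": ["Digital Libraries and Metadata", "Quantum Physics Preprint"],
    "archive_b": ["Metadata Harvesting in Practice"],
}


def remote_search(archive: str, term: str) -> List[str]:
    """Stands in for a search service that runs at the remote source."""
    return [t for t in REMOTE_ARCHIVES[archive] if term.lower() in t.lower()]


def federated_search(term: str) -> List[str]:
    """Federation: forward each query to every remote source at query time."""
    hits: List[str] = []
    for archive in REMOTE_ARCHIVES:
        hits.extend(remote_search(archive, term))
    return hits


def harvest_all() -> List[str]:
    """Harvesting: copy the metadata to the service site ahead of time."""
    return [t for titles in REMOTE_ARCHIVES.values() for t in titles]


def harvested_search(union_catalogue: List[str], term: str) -> List[str]:
    """Search runs locally, over the previously harvested union catalogue."""
    return [t for t in union_catalogue if term.lower() in t.lower()]


if __name__ == "__main__":
    catalogue = harvest_all()  # done once, or on a schedule
    print(federated_search("metadata"))
    print(harvested_search(catalogue, "metadata"))
```

In the federated case every remote source has to run `remote_search`; in the harvested case the sources only expose their metadata and the service site does the rest, which is the division of effort the slide describes.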
Example 1: Union Collection of ETDs (Electronic Theses and Dissertations, for Networked Digital Library of Theses and Dissertations, NDLTD)
[Figure: VIRTUA, MARIAN, and future services (recommender, …) sit above a Merged Metadata Collection, which is harvested from the Virginia Tech ETD Archive, Duisburg ETD Archive, Humboldt ETD Archive, …; legend: OAI Data Provider, OAI Service Provider, OAI Harvesting]
Example 1: Details
[Figure labels, in source order: NDLTD Site / Member; Local DB; OAI Server; Local Search / Browse; Student Entry; NDLTD Central; OAI Harvester; Name Authority Service (e.g. OCLC); MARIAN Union Catalog; VTLS Union Catalog; MARC DB; Virtua; Conversion; Alternate MARC Transport (ftp? tapes?); Librarian Verification / Validation / Enrichment / Maintenance]
Example 2: NSDL Information Architecture
Essentially as developed by the Technical Infrastructure Workgroup
[Figure: layers labeled User Interfaces (Portals & Clients), Usage Enhancement (CI Services: annotation, discussion, personalization, authentication, browsing; Other NSDL Services), and Collection Building (Core Collection-Building Services: harvesting, protocols), arranged around a Core NSDL “Bus” with Core Services (information retrieval, metadata gathering); other labels: NSDL Collections, Special Databases, referenced items & collections]
Example 2: CITIDEL -> NSDL
• Computing and Information Technology Interactive Digital Education Library
• A collection project in the National STEM (science, technology, engineering, and mathematics) education Digital Library – NSDL
• www.nsdl.nsf.gov
• www.nsdl.org
Example 2: CITIDEL Distributed repository structure
[Figure labels: Union Metadata Repository; OAI Data Provider; Laboratories Repository, Applets Repository, Papers Repository, Syllabi Repository, . . .; Digital Library Services; OAI Data Harvester]
Example 2: NSDL Collections (themes relevant to our projects)
• Discovery of content
• Classification and cataloguing
• Acquisition and/or linking; referencing
• Disciplinary-based themes define a natural body of content, but other possibilities are also encouraged
• Software tool suites for analysis, modeling, simulation, or visualization
• Reviewed commentary on pedagogy
Open Digital Libraries: XOAI-PMH
• Dissertation work of Hussein Suleman (member of OAI technical committee)
• Extending the OAI protocol
• Supporting rapid development of DLs using networks of components
• Demonstrated with NDLTD, CSTC
• Described in Dec. 2001 D-Lib Magazine article, and article scheduled for publication
Open Digital Libraries: Components
• Running now: XML-File (data provider from file system); Union, search, browse, recent, filter; E-journal support system
• Class projects: High performance multilingual search; Recommender; User rating
• Others discussed: Classification/categorization and browsing
Component System Approach
• (Open) DL = Network of Extended OAs
[Figure labels: Local Archive, Data Input, Remote Archive, Metadata Repository, Browse, Search, Recommend, Resource Discovery, User Interface; legend: OAI/ODL archive, OAI/ODL protocol]
Example Architecture (NDLTD)
[Figure labels: Virginia Tech, Humboldt, Duisburg, MIT, MIT Filter, PhysNet, CalTech, Dresden; Union Catalog; Search, Browse, Recent; User Interface; legend: OAI/ODL archive, OAI/ODL protocol]
OAI Tools
• Related resources, e.g., XML, Unicode
• Submission / author support
• XML Schema Validator
• Servers and utilities, e.g., ARC, Kepler, EPrints
• Repository Explorer
• Interactive Browsing
• Testing of parameters
• Multiple views of data
• Multilingual support
• Automatic test suite
Author's tools: www.physik.uni-oldenburg.de/EPS/mmm
XSV Schema Validator
ARC (arc.cs.odu.edu)
VT Tool: Repository Explorer
• The Repository Explorer is a tool for browsing and testing Open Archives, by Hussein Suleman
• You issue commands and see the results
• You also can perform a sequence of automatic tests
• http://purl.org/net/oai_explorer
VT Tool: RE 1.3
VT Tool: Request, Response
http://scholar.lib.vt.edu/theses/OAI/cgi-bin/index.pl?verb=GetRecord&metadataPrefix=oai_etdms&identifier=oai:VTETD:etd-520112859651791
Request
Response
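The GetRecord request on this slide is an ordinary URL: a base address followed by protocol arguments. A small Python sketch using only the standard library shows how it breaks apart; the function name `split_oai_request` is invented for this illustration.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qs, urlsplit


def split_oai_request(url: str) -> Tuple[str, Dict[str, str]]:
    """Separate a request URL into its base URL and its protocol arguments."""
    parts = urlsplit(url)
    base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qs returns a list per key; each argument appears once here.
    args = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return base_url, args


if __name__ == "__main__":
    base, args = split_oai_request(
        "http://scholar.lib.vt.edu/theses/OAI/cgi-bin/index.pl"
        "?verb=GetRecord&metadataPrefix=oai_etdms"
        "&identifier=oai:VTETD:etd-520112859651791")
    print(base)
    for name, value in args.items():
        print(f"{name} = {value}")
```

The three arguments name the service request (`verb`), the metadata format wanted (`metadataPrefix`), and the record to fetch (`identifier`).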
What will central service look like? (1 of 2)
• Harvesting from local sites
• Rich content, drawn from all participating sites
• Data management
• Logging and reporting
• Repository/preservation/mirroring
• Adding/updating/deleting
• User interface and support for digital librarians and data providers
What will central service look like? (2 of 2)
• Adding value
• De-duping
• Categorization/classification -> browsing
• Normalization/standardization -> authority control
• Tools for communication/collaboration/annotation -> security/privacy
• User interface for both general users and scholars
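As one possible reading of the de-duping and normalization items, a central service could compare harvested records on a normalized key. The Python sketch below is an assumption-laden illustration, not a description of the planned system: the record fields `title` and `creator`, the normalization rules, and the function names are all invented here.

```python
import re
from typing import Dict, Iterable, List

Record = Dict[str, str]  # hypothetical flat metadata record


def normalize(text: str) -> str:
    """Lower-case, drop punctuation, and collapse runs of whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def dedupe(records: Iterable[Record]) -> List[Record]:
    """Keep the first record seen for each normalized (title, creator) pair."""
    seen = set()
    unique: List[Record] = []
    for rec in records:
        key = (normalize(rec.get("title", "")),
               normalize(rec.get("creator", "")))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


if __name__ == "__main__":
    harvested = [
        {"title": "Digital Libraries!", "creator": "Fox, E."},
        {"title": "digital   libraries", "creator": "fox e"},
        {"title": "Open Archives", "creator": "Suleman, H."},
    ]
    for rec in dedupe(harvested):
        print(rec["title"], "/", rec["creator"])
```

Exact matching on a normalized key is the simplest choice; real union catalogues usually need fuzzier matching and authority control, as the slide's "authority control" item suggests.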
What are needs at local sites?
• Increasing OAI expertise
• Connecting OAI with local systems
• Supporting standards, normalization
• Supporting continual updating
• Passing enhancements upstream
How can VT help? (1 of 2)
• Usability studies for central site
• Help develop consensus
• Help plan system architecture & services
• Education/training
• Provide and support tools/systems
• Help sites engage, become OAI compliant
How can VT help? (2 of 2)
• Standards: MARC-XML
• ODL Suite: download and configure; use in packaged forms, or re-architected
• Support: connecting your system into OAI; help with OAI Tools
MARC XML-DTD
• XML Transport format for US-MARC records
• Standardized metadata exchange format for traditional library services joining OAI
Conclusion
• Rethink your efforts in terms of providers of data and of services
• Reduced work for data providers: tools available; don’t need to offer services
• Reduced work for service providers: others provide the data; can use tools and systems for OAI, XOAI
• Results: more data becoming available, to more people, supported by improved services
• MetaScholar can be a win-win-win project!
Links
• Open Archives Initiative: http://www.openarchives.org
• OAI Metadata Harvesting Protocol: http://www.openarchives.org/OAI/openarchivesprotocol.htm
• Virginia Tech DLRL OAI Projects: http://www.dlib.vt.edu/projects/OAI/ and http://oai.dlib.vt.edu/odl
• Repository Explorer: http://purl.org/net/oai_explorer
• NDLTD: http://www.ndltd.org
More Links
• ARC Cross-Archive Search Service: http://arc.cs.odu.edu/
• XML Schema Validator: http://www.w3.org/2001/03/webdata/xsv
• Dublin Core Metadata Initiative: http://www.dublincore.org
• E-Prints DL-in-a-box: http://www.eprints.org
• XML Tools at W3C: http://www.w3.org/XML/#software