herbert van de sompel & carl lagoze
Herbert Van de Sompel Los Alamos National Laboratory – Research Library
Carl Lagoze, Cornell University – Computer Science
the OAI Protocol for Metadata Harvesting
an update
origins & evolution of OAI-PMH
process leading to OAI-PMH v.2.0
what’s new in OAI-PMH v.2.0?
what’s next?
evolution towards OAI-PMH v.2.0
Santa Fe Convention [02/2000]
OAI-PMH 1.0 [01/2001]
OAI-PMH 2.0 [06/2002]
          | Santa Fe convention | OAI-PMH v.1.0/1.1       | OAI-PMH v.2.0
about     | eprints             | document-like objects   | resources
metadata  | OAMS                | unqualified Dublin Core | unqualified Dublin Core
transport | HTTP                | HTTP                    | HTTP
responses | XML                 | XML                     | XML
requests  | HTTP GET/POST       | HTTP GET/POST           | HTTP GET/POST
verbs     | Dienst              | OAI-PMH                 | OAI-PMH
nature    | experimental        | experimental            | stable
model     | metadata harvesting | metadata harvesting     | metadata harvesting
Santa Fe Convention [02/2000]
• goal: optimize discovery of e-prints
• input:
• the UPS prototype
• RePEc data provider / service provider model
• Dienst protocol
• deliberations at Santa Fe meeting [10/99]
Santa Fe Convention [02/2000]
• low-barrier interoperability specification
• metadata harvesting model: data provider / service provider
• focus on eprints (e.g. OAMS format)
• Dienst subset
• HTTP based
• XML responses
• experimental
OAI-PMH v.1.0 [01/2001]
• goal: optimize discovery of document-like objects
• input:
• SFC
• DLF meetings on metadata harvesting
• deliberations at Cornell meeting [09/00]
• alpha test group of OAI-PMH v.1.0
OAI-PMH v.1.0 [01/2001]
• low-barrier interoperability specification
• metadata harvesting model: data provider / service provider
• focus on document-like objects
• autonomous protocol
• HTTP based
• XML responses
• unqualified Dublin Core
• experimental: 12-18 months
OAI-PMH v.2.0 [06/2002]
• goal: recurrent exchange of metadata about resources between systems
• input:
• OAI-PMH v.1.0
• feedback on OAI-implementers
• deliberations by OAI-tech [09/01 -]
• alpha test group of OAI-PMH v.2.0 [03/02 -]
OAI-PMH v.2.0 [06/2002]
• low-barrier interoperability specification
• metadata harvesting model: data provider / service provider
• metadata about resources
• autonomous protocol
• HTTP based
• XML responses
• unqualified Dublin Core
• stable
process leading to OAI-PMH v.2.0
creation of OAI-tech
pre-alpha phase
alpha phase
beta phase
creation of OAI-tech [06/01]
• created for a one-year period
• charge:
• review functionality and nature of OAI-PMH v.1.0
• investigate extensions
• release stable version of OAI-PMH by 05/02
• determine need for infrastructure to support broad adoption of the protocol
• communication: listserv, SourceForge, conference calls
OAI-tech
US representatives
Thomas Krichel (Long Island U) - Jeff Young (OCLC) - Tim Cole (U of Illinois at Urbana-Champaign) - Hussein Suleman (Virginia Tech) - Simeon Warner (Cornell U) - Michael Nelson (NASA) - Caroline Arms (LoC) - Muhammad Zubair (Old Dominion U) - Steven Bird (U Penn.)
European representatives
Andy Powell (Bath U. & UKOLN) - Mogens Sandfaer (DTV) - Thomas Baron (CERN) - Les Carr (U of Southampton)
pre-alpha phase [09/01 – 02/02]
• review process by OAI-tech:
• identification of issues
• conference call to filter/combine issues
• white paper per issue
• on-line discussion per white paper
• proposal for resolution of issue by OAI-exec
• discussion of proposal & closure of issue
• conference call to resolve open issues
pre-alpha phase [02/02]
• creation of revised protocol document
• in-person meeting Lagoze - Van de Sompel - Nelson - Warner
• autonomous decisions
• internal vetting of protocol document
alpha phase [02/02 – 05/02]
• alpha-1 release to OAI-tech March 1st 2002
• OAI-tech extended with alpha testers
• discussions/implementations by OAI-tech
• ongoing revision of protocol document
OAI-PMH 2.0 alpha testers (1/2)
• The British Library
• Cornell U. -- NSDL project & e-print arXiv
• Ex Libris
• FS Consulting Inc -- harvester for my.OAI
• Humboldt-Universität zu Berlin
• InQuirion Pty Ltd, RMIT University
• Library of Congress
• NASA
• OCLC
OAI-PMH 2.0 alpha testers (2/2)
• Old Dominion U. -- ARC, DP9
• U. of Illinois at Urbana-Champaign
• U. of Southampton -- OAIA, CiteBase, eprints.org
• UCLA, Johns Hopkins U., Indiana U., NYU -- sheet music collection
• UKOLN, U. of Bath -- RDN
• Virginia Tech -- repository explorer
beta phase [05/02]
• beta release on May 1st 2002 to:
• registered data providers and service providers
• interested parties
• fine tuning of protocol document
• preparation for the release of 2.0 conformant tools by alpha testers
what’s new in OAI-PMH v.2.0?
quick recap
general changes to improve solidity of protocol
corrections
new functionality
[figure: the harvester (service provider) sends requests to the repository (data provider), which sends replies; caption: 6 OAI-PMH]
Supporting protocol requests:
• Identify
• ListMetadataFormats
• ListSets
Harvesting protocol requests:
• ListRecords
• ListIdentifiers
• GetRecord
[figure: harvester (service provider) and repository (data provider)]
[figure: harvester (service provider) and repository (data provider); labels: Datestamp, Identifier, Set, Records]
general changes
• clear distinction between protocol and periphery
• fixed protocol document
• extensible implementation guidelines:
• e.g. sample metadata formats, description containers, about containers
• allows for OAI guidelines and community guidelines
general changes
• clear separation of OAI-PMH and HTTP
• OAI-PMH error handling
• all OK at HTTP level? => 200 OK
• something wrong at OAI-PMH level? => OAI-PMH error (e.g. badVerb)
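The two-layer error model above can be sketched in Python. This is a minimal illustration, not a reference client: the names `OAIError`, `local_name`, and `check_response` are hypothetical, and the sample document mirrors the badVerb response shown later in this talk.

```python
import xml.etree.ElementTree as ET


class OAIError(Exception):
    """Raised when a response carries an OAI-PMH level <error> element."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def local_name(tag):
    # Drop an XML namespace prefix such as "{http://...}error" -> "error".
    return tag.rsplit("}", 1)[-1]


def check_response(http_status, body):
    """Return the parsed root, or raise if either layer reports a problem."""
    if http_status != 200:
        # Transport-level failure: this is an HTTP matter, not an OAI-PMH one.
        raise RuntimeError(f"HTTP status {http_status}")
    root = ET.fromstring(body)
    for child in root:
        if local_name(child.tag) == "error":
            # HTTP was fine, but the protocol layer reports an error.
            raise OAIError(child.get("code"), (child.text or "").strip())
    return root


sample = (
    "<OAI-PMH>"
    "<responseDate>2002-02-08T08:55:46Z</responseDate>"
    "<request>http://arXiv.org/oai2</request>"
    '<error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>'
    "</OAI-PMH>"
)

try:
    check_response(200, sample)
except OAIError as exc:
    caught = exc
```

A harvester built this way treats a 200 status as "the transport worked" and looks inside the XML for protocol errors.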
general changes
• notion of item has become prominent
• resource / item / record
• metadata can be disseminated from item
• item == identifier
• record == identifier, datestamp, metadataPrefix
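As a rough illustration of the item/record distinction, the sketch below models a record as the triple named on the slide. The class name `Record` and the field names are assumptions made for this example; the identifier and prefixes are taken from samples elsewhere in this talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    identifier: str        # identifies the item the record was disseminated from
    datestamp: str
    metadata_prefix: str   # the metadata format of this dissemination


# Two records disseminated from the same item, in different metadata formats.
dc = Record("oai:arXiv:cs/0112017", "2001-12-14", "oai_dc")
other = Record("oai:arXiv:cs/0112017", "2001-12-14", "olac")

same_item = dc.identifier == other.identifier
same_record = dc == other
```

Here `same_item` is true while `same_record` is false: one item, two records.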
general changes
• better definitions of harvester, repository, item, unique identifier, record, datestamp, set
• oai_dc schema builds on DCMI XML Schema for unqualified Dublin Core
• usage of must, must not etc. as in RFC2119
• wording on response compression
general changes
• all protocol responses can be validated with a single XML Schema
• easier for data providers
• no redundancy in type definitions
• SOAP-ready
• clean for error handling
response with no errors
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord"… …>http://arXiv.org/oai2</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv:cs/0112017</identifier>
        <datestamp>2001-12-14</datestamp>
        <setSpec>cs</setSpec>
        <setSpec>math</setSpec>
      </header>
      <metadata> ….. </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
response with error
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request>http://arXiv.org/oai2</request>
  <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
</OAI-PMH>
corrections
• all dates/times are UTC, encoded in ISO8601, Z-notation
1957-03-20T20:30:00.00Z
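A small sketch of producing such a datestamp in Python follows. It assumes seconds granularity (YYYY-MM-DDThh:mm:ssZ, without the fractional seconds shown in the example above); the helper name `to_utc_datestamp` and the sample offset are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def to_utc_datestamp(moment):
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ssZ in UTC."""
    if moment.tzinfo is None:
        # Without a time zone the UTC equivalent cannot be determined.
        raise ValueError("naive datetime: the time zone is unknown")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# A local time five hours behind UTC (the offset is an arbitrary example).
local = datetime(2002, 2, 8, 3, 55, 46, tzinfo=timezone(timedelta(hours=-5)))
stamp = to_utc_datestamp(local)
```

Converting before formatting keeps the trailing Z honest: the value really is UTC, not local time with a Z appended.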
corrections
• idempotency of resumptionToken: return same incomplete list when the resumptionToken is reissued
• while no changes occur in the repository: strict
• while changes occur in the repository: all items with unchanged datestamp
• expirationDate attribute for resumptionToken
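From the harvester's side, resumptionToken handling can be sketched as a loop that keeps requesting until no token comes back. Everything here is a stand-in: `harvest`, `fake_fetch`, and the token value `page2` are hypothetical, and the in-memory `PAGES` table replaces a real repository.

```python
def harvest(fetch, first_request):
    """Yield every header, following resumptionTokens until none is returned.

    `fetch` is a hypothetical callable: it takes a dict of request arguments
    and returns a pair (headers, resumption_token_or_None).
    """
    args = dict(first_request)
    while True:
        headers, token = fetch(args)
        yield from headers
        if not token:
            break
        # A follow-up request carries only the verb and the token.
        args = {"verb": first_request["verb"], "resumptionToken": token}


# In-memory stand-in for a repository that splits its list into two chunks.
PAGES = {
    None: (["oai:arXiv:hep-th/9801001"], "page2"),
    "page2": (["oai:arXiv:hep-th/9801002"], None),
}


def fake_fetch(args):
    return PAGES[args.get("resumptionToken")]


first = {"verb": "ListIdentifiers", "metadataPrefix": "oai_dc"}
ids = list(harvest(fake_fetch, first))
```

Because reissuing a token returns the same incomplete list, a harvester that loses a response can simply repeat the last request instead of restarting the whole harvest.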
new functionality
• harvesting granularity
• mandatory support of YYYY-MM-DD
• optional support of YYYY-MM-DDThh:mm:ssZ
• granularity of from and until must be the same
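The granularity rule can be checked before a request is sent. The sketch below is illustrative: the function names are hypothetical, and the patterns test only the shape of a datestamp, not whether it is a valid calendar date.

```python
import re

DAY = re.compile(r"\d{4}-\d{2}-\d{2}")
SECOND = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def granularity(stamp):
    """Name the granularity of a datestamp, judged by its shape alone."""
    if DAY.fullmatch(stamp):
        return "YYYY-MM-DD"
    if SECOND.fullmatch(stamp):
        return "YYYY-MM-DDThh:mm:ssZ"
    raise ValueError(f"unrecognised datestamp: {stamp!r}")


def same_granularity(from_, until):
    """True when the from and until arguments use the same granularity."""
    return granularity(from_) == granularity(until)
```

For example, `same_granularity("2001-01-01", "2001-12-31")` holds, while mixing a day-level `from` with a seconds-level `until` does not.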
new functionality
• Identify more expressive
<Identify>
<repositoryName>Library of Congress 1</repositoryName>
<baseURL>http://memory.loc.gov/cgi-bin/oai</baseURL>
<protocolVersion>2.0</protocolVersion>
<adminEmail>[email protected]</adminEmail>
<adminEmail>[email protected]</adminEmail>
<deletedRecord>transient</deletedRecord>
<earliestDatestamp>1990-02-01T00:00:00Z</earliestDatestamp>
<granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
<compression>deflate</compression>
new functionality
• header contains set membership of item
<record>
  <header>
    <identifier>oai:arXiv:cs/0112017</identifier>
    <datestamp>2001-12-14</datestamp>
    <setSpec>cs</setSpec>
    <setSpec>math</setSpec>
  </header>
  <metadata> ….. </metadata>
</record>
new functionality
• ListIdentifiers returns headers
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="…" …>http://arXiv.org/oai2</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:arXiv:hep-th/9801001</identifier>
      <datestamp>1999-02-23</datestamp>
      <setSpec>physic:hep</setSpec>
    </header>
    <header>
      <identifier>oai:arXiv:hep-th/9801002</identifier>
      <datestamp>1999-03-20</datestamp>
      <setSpec>physic:hep</setSpec>
      <setSpec>physic:exp</setSpec>
    </header>
    ……
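Reading headers out of such a response might look like the sketch below. It is an assumption-laden example: `parse_headers` and `local_name` are hypothetical helpers, and `SAMPLE` is a completed, well-formed version of the truncated response above with the elided parts left out.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<OAI-PMH>
  <ListIdentifiers>
    <header>
      <identifier>oai:arXiv:hep-th/9801001</identifier>
      <datestamp>1999-02-23</datestamp>
      <setSpec>physic:hep</setSpec>
    </header>
    <header>
      <identifier>oai:arXiv:hep-th/9801002</identifier>
      <datestamp>1999-03-20</datestamp>
      <setSpec>physic:hep</setSpec>
      <setSpec>physic:exp</setSpec>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""


def local_name(tag):
    # Drop an XML namespace prefix such as "{http://...}header" -> "header".
    return tag.rsplit("}", 1)[-1]


def parse_headers(xml_text):
    """Collect identifier, datestamp and set membership from each header."""
    headers = []
    for element in ET.fromstring(xml_text).iter():
        if local_name(element.tag) != "header":
            continue
        fields = {"identifier": None, "datestamp": None, "setSpecs": []}
        for child in element:
            name = local_name(child.tag)
            if name == "setSpec":
                fields["setSpecs"].append(child.text)
            elif name in ("identifier", "datestamp"):
                fields[name] = child.text
        headers.append(fields)
    return headers


headers = parse_headers(SAMPLE)
```

Since each header carries its setSpec elements, a harvester can learn set membership without issuing a separate request per set.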
new functionality
• ListIdentifiers mandates metadataPrefix as argument
http://www.perseus.tufts.edu/cgi-bin/pdataprov?
verb=ListIdentifiers
&metadataPrefix=olac
&from=2001-01-01
&until=2001-01-01
&set=Perseus:collection:PersInfo
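A request like the one above could be assembled as follows. This is a sketch using Python's standard `urllib.parse.urlencode`; the function name `list_identifiers_url` and its parameter names are made up for the example, while the base URL and argument values come from the slide.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.perseus.tufts.edu/cgi-bin/pdataprov"


def list_identifiers_url(base_url, metadata_prefix, from_=None, until=None, set_spec=None):
    """Build a ListIdentifiers request URL; metadataPrefix is mandatory."""
    if not metadata_prefix:
        raise ValueError("metadataPrefix is required for ListIdentifiers")
    params = [("verb", "ListIdentifiers"), ("metadataPrefix", metadata_prefix)]
    optional = [("from", from_), ("until", until), ("set", set_spec)]
    # Only send the optional arguments that were actually supplied.
    params += [(key, value) for key, value in optional if value is not None]
    return base_url + "?" + urlencode(params)


url = list_identifiers_url(
    BASE_URL, "olac", "2001-01-01", "2001-01-01", "Perseus:collection:PersInfo"
)
```

Note that `urlencode` percent-encodes the colons in the set argument as `%3A`, so the generated URL differs textually from the slide while carrying the same values.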
new functionality
• character set for metadataPrefix and setSpec extended to URL-safe characters
A-Z a-z 0-9 _ ! ' $ ( ) + - . *
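The character list can be turned into a quick validity check. The sketch below uses exactly the characters listed on the slide; treating ":" as a separator between setSpec levels is an assumption drawn from examples in this talk such as physic:hep, and the function names are hypothetical.

```python
import re

# Characters as listed on the slide. The ":" hierarchy separator for setSpec
# is an assumption based on examples such as "physic:hep" in this talk.
TOKEN = r"[A-Za-z0-9_!'$()+\-.*]+"
PREFIX_RE = re.compile(TOKEN)
SETSPEC_RE = re.compile(f"{TOKEN}(:{TOKEN})*")


def valid_metadata_prefix(value):
    """True when value uses only the listed characters."""
    return PREFIX_RE.fullmatch(value) is not None


def valid_set_spec(value):
    """True when value is one or more listed-character tokens joined by ':'."""
    return SETSPEC_RE.fullmatch(value) is not None
```

Under these assumptions `oai_dc` and `Perseus:collection:PersInfo` pass, while a value containing a space or an empty level such as `physic::hep` is rejected.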
in the periphery
• introduction of provenance container to facilitate tracing of harvesting history
<about>
  <provenance>
    <originDescription>
      <baseURL>http://an.oa.org</baseURL>
      <identifier>oai:r1:plog/9801001</identifier>
      <datestamp>2001-08-13T13:00:02Z</datestamp>
      <metadataPrefix>oai_dc</metadataPrefix>
      <harvestDate>2001-08-15T12:01:30Z</harvestDate>
    </originDescription>
    <originDescription> … … … </originDescription>
  </provenance>
</about>
in the periphery
• introduction of friends container to facilitate discovery of repositories
<description>
<Friends>
<baseURL>http://cav2001.library.caltech.edu/perl/oai</baseURL>
<baseURL>http://formations2.ulst.ac.uk/perl/oai</baseURL>
<baseURL>http://cogprints.soton.ac.uk/perl/oai</baseURL>
<baseURL>http://wave.ldc.upenn.edu/OLAC/dp/aps.php4</baseURL>
</Friends>
</description>
in the periphery
• revision of oai-identifier
• guidelines for collection-level and set-level metadata
future
OAI-PMH
communities
adoption
the OAI-PMH
• release of OAI-PMH v.2.0 [06/2002]
• no backwards compatibility with v.1.0/1.1
• stable
• migration process for registered repos
• ? formal standardization ?
• ? SOAP version ~ web services framework [SOAP, WSDL, UDDI] ?
communities
• proliferation of community-specific add-ons for:
• collection & set level metadata
• expressive metadata formats (e.g. qualified DC XML Schema)
• shared set-structures
• machine readable rights (about the metadata)
adoption
• evolution
• from talking about OAI-PMH
• to talking about projects that use OAI-PMH
• to talking about projects and failing to mention they use OAI-PMH
=> OAI-PMH becomes part of the infrastructure
I just wanted to report what I consider an OAI success. I discovered that RLG had harvested records for two of the American Memory collections I had made available and integrated them into their Cultural Materials Initiative service without the need for a single e-mail or phone call. They reported that it was working very well for them.
[Caroline Arms, Library of Congress]
indicators of adoption of OAI-PMH
data providers
service providers
tools
structural support
data providers
• 49 registered repositories [11/2001]
• 65 registered repositories [03/2002]
• 5+ million records
• many unregistered repositories
service providers
• Arc: cross-searching of registered repositories [Old Dominion U]
[ http://arc.cs.odu.edu ]
• OLAC: cross-searching of Language Archive Community repositories
[ http://www.language-archives.org/index.html ]
service providers
• Scirus scientific search engine [Elsevier]
[ http://www.scirus.com ]
• my.OAI: user-tailorable cross-searching of registered repositories [FS Consulting, Inc.]
[ http://www.myoai.com ]
• growing interest from web search engines
OAI-PMH tools
• Repository Explorer: interactive exploration of repositories [Virginia Tech]
[ http://www.purl.org/NET/oai_explorer ]
• eprints.org: generic OAI-PMH compliant repository software [U of Southampton]
[ http://www.eprints.org ]
• ALCME repository and harvester software [OCLC]
[ http://alcme.oclc.org/index.html ]
exploration
• Kepler [Old Dominion U]
• your personal OAI data provider: Kepler archivelet
• the Kepler service provider harvests from archivelets that register
• archivelet downloadable
• http://www.dlib.org/dlib/april01/maly/04maly.html
exploration
• DP9 [Old Dominion U]
• provides entry page to repositories for web-crawlers
• provides bookmarkable URL for OAI record
• provides resolution of OAI identifier into metadata
• software downloadable
structural support
• CNI & DLF support the day-to-day operation of the OAI Executive
structural support
• Metadata Harvesting Initiative of the Mellon Foundation
• NSF funded NSDL
• UK FAIR call for proposals to support disclosure of institutional assets (papers, learning materials, etc.)
• several EC projects exploring/supporting usage of OAI-PMH: TEL, Leaf, Cyclades, OA Forum, Figaro