The Open Archives Initiative Story Thomas Krichel Uni. of Surrey, Hitotsubashi Uni., Long Island...

The Open Archives Initiative Story

Thomas Krichel
http://openlib.org/home/krichel

Uni. of Surrey, Hitotsubashi Uni., Long Island Uni.

About this talk

• Follows essentially a historical approach

• mixes in a few digital library concepts, interrupt me if you do not get some of them

• does not represent an official statement

• botches together various ideas from different people

• benefited from funding by DLF, LANL, CLIR, JISC, DINI

UPS call 1999-07

• Ginsparg, Luce and Van de Sompel

“The purpose of this call is the mobilisation of a core group to work towards achieving a universal service for author-archived literature”

• emphasis on a pragmatic level of interoperability

UPS protoproto

By Krichel, Nelson and Van de Sompel

found that the main problems of interoperability between eprint initiatives are
– poor metadata
– no uniform identifier structure
– unclear legal terms and conditions
– lack of selective harvesting

Santa Fe meeting 1999-10

• Representatives of arXiv, cogprints, Highwire, NCSTRL, NDLTD, RePEc, SLAC/SPIRES and others

• chaired by Lynch and Waters

• sponsored by CLIR, LANL and SPARC

basic concepts

• “Managed” or formal e-print archive; not papers on the web

• Open e-print archive means that there is a machine interface

• “record” can be metadata or metadata & full text

• archive may be partitioned

business model

• Inspired by RePEc initiative

• Separation between data providers and service providers

Many archives

Many metadata collections

Many services

requirements & realisations

Requirements

• Metadata harvesting (not distributed database)

• Namespace

• mandatory metadata & parallel sets

• acceptable use

• registration

Realisations

• OA Dienst subset

• full id=archive|record

• OAMS and XML transport

• gentleperson’s agreement in a provider statement

• primitive templates

technical model

• Subset of Dienst protocol used by NCSTRL

• A compatible archive responds to 4 requests

– List-Partitions

– List-Meta-Formats

– List-Contents (partitionspec, file-after, meta-format)

– Disseminate (fullID, meta-format, content-type)

Dublin Core-ish Minimal Metadata for selective harvesting

mandatory

• Title

• Date of Accession

• Full ID

• Author [R]

optional

• Display ID [R]

• Abstract

• Subject [R]

• Comment [R]

• Date for Discovery [R]
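
A minimal sketch of how a harvester might check a record against the field lists on this slide. Only the field names come from the slide; the dictionary layout, the function names and the sample record are hypothetical.

```python
# Field names are taken from the slide; the dict layout is an assumption.
MANDATORY = ("Title", "Date of Accession", "Full ID", "Author")
OPTIONAL = ("Display ID", "Abstract", "Subject", "Comment", "Date for Discovery")


def missing_mandatory(record):
    """Return the mandatory fields that are absent or empty in `record`."""
    return [name for name in MANDATORY if not record.get(name)]


def unknown_fields(record):
    """Return keys that are neither mandatory nor optional on the slide."""
    known = set(MANDATORY) | set(OPTIONAL)
    return sorted(key for key in record if key not in known)


# Invented sample record, for illustration only.
record = {
    "Title": "Someone's paper",
    "Full ID": "dini:01",
    "Author": ["A. Author"],
    "Abstract": "An invented abstract.",
}
print(missing_mandatory(record))  # the accession date is missing here
```

A service provider could use such a check to decide whether a harvested record is usable before indexing it.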

Implementation efforts

• Implementation of Dienst subset
– arXiv.org done
– Cornell NCSTRL server done
– WCR done
– RePEc fails

• Harvesting arXiv NCSTRL for a test library

Critique

• Why OAMS, not Dublin Core?

• Dienst subset carries a lot of legacy from the full Dienst protocol

development in DL community

• Interest in interoperability for a long time, a stated interest of the Digital Library Federation

• trouble: two approaches
– union catalogue
  • causes friction
– distributed search
  • high entry requirement
  • problematic to implement

Harvard meeting 2000-05

• Vision statement: SFc a new way forward for interoperability

• could the OAi develop in a more general fashion such that it can be used by different communities?

• political agenda of OAi (free access) perceived as problem

San Antonio meeting 2000-06

• 45 people show a broad range of interest, which leads to the problem of not getting lost.

• View that SFc is a technical support infrastructure

• Communities with different business and content models can adopt the framework for interoperability

San Antonio meeting 2000-06

• Carl’s reverse bubble
– First there was the OAi that made the SFc.
– Now there is the SFc that is implemented by more than the original OAi

• discussion of what changes are required to the OAi
– steering committee
– attract funding to develop other application domains

Ithaca meeting 2000-09

• Experience gained with implementing & discussing the current SFc specs

• aim: new spec by the end of 2000

• stable for experimentation but not definite

• hope to minimise risks for implementors and maximise chances for interoperability

• SFc+ to translate from eprint domain interoperability towards general domain interoperability

Abstract concepts to keep

• open eprint archive --> open archive

• data provider / service provider

• archive management

• issue of records needs to be discussed: OAMS confuses metadata and full text

Implementation features to keep

• Metadata harvesting

• OAi namespace

• shared metadata and parallel metadata

• acceptable use

• registration of data and service providers

All change please, all change...

• OAi DIENST replaced by OA protocol

• OAi ID revised

• OAMS replaced by wrapped DC

• introduction of the concept of native metadata

• generalised and marginalised partitions

• revisited registrations

New OAi metadata

• Accession date to be renamed datestamp and stripped of semantic link to the records

• Full ID kept, colon used as canonical separator

• unqualified DC is mandatory, but empty DC may be returned

• introduction of the idea of native metadata

• OAMS scrapped, Krichel and Warner to lead an EPMS discussion

Solution: encapsulate metadata

<oai>
  <oai.fullid>dini:01</oai.fullid>
  <oai.datestamp>2000-09-21</oai.datestamp>
  <dc xmlns:dc="…">
    <dc.title>Someone’s paper</dc.title>
  </dc>
</oai>
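
A minimal sketch of the encapsulation idea, assuming the dotted element names shown on the slide. The function name is hypothetical, and the namespace declaration is left out because its value is elided on the slide.

```python
import xml.etree.ElementTree as ET


def wrap_record(full_id, datestamp, title):
    """Build an <oai> wrapper around a small Dublin Core-style block."""
    oai = ET.Element("oai")
    ET.SubElement(oai, "oai.fullid").text = full_id
    ET.SubElement(oai, "oai.datestamp").text = datestamp
    dc = ET.SubElement(oai, "dc")  # the encapsulated metadata
    ET.SubElement(dc, "dc.title").text = title
    return oai


record = wrap_record("dini:01", "2000-09-21", "Someone's paper")
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)

# Round trip: the wrapper fields can be read without looking inside <dc>.
parsed = ET.fromstring(xml_text)
print(parsed.findtext("oai.fullid"), parsed.findtext("oai.datestamp"))
```

The point of the design is that the outer fields stay the same whatever metadata format sits inside the wrapper.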

Identifier

• Identifiers point to metadata records

• Concatenate
– Case sensitive archive name
– delimiter is a colon
– anything internal to the archive appearing after that

• prefixed by OAI as a pointer to a resolution mechanism
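
A minimal sketch of the concatenation rule above. The function names are hypothetical, and writing the prefix as lower-case "oai" followed by a colon is an assumption; the slide only says that the identifier is prefixed by OAI.

```python
def make_identifier(archive, local, prefix="oai"):
    """Join prefix, archive name and archive-internal part with colons."""
    if ":" in archive:
        raise ValueError("archive name must not contain the delimiter")
    return f"{prefix}:{archive}:{local}"


def split_full_id(full_id):
    """Split 'archive:local' at the first colon only.

    Everything after the first colon is internal to the archive and may
    itself contain colons, so it is left untouched.
    """
    archive, sep, local = full_id.partition(":")
    if not sep or not archive or not local:
        raise ValueError(f"not a full id: {full_id!r}")
    return archive, local


print(make_identifier("dini", "01"))
print(split_full_id("dini:01"))
# Archive names are case sensitive, so "Dini" and "dini" stay distinct.
print(split_full_id("Dini:01")[0] == split_full_id("dini:01")[0])
```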

Sets

• replace partitions

• ONLY for a local community to implement selective harvesting

• there can be zero or more sets in an archive

• records can exist at interior nodes in the set hierarchy

• asking for records in a set returns records in the set and in all its subsets.
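
A minimal sketch of the subset rule above, assuming that a set hierarchy is written with a colon between levels. The set names, the record table and the function names are invented for illustration.

```python
def in_set(record_set, requested):
    """True if record_set is the requested set or one of its subsets."""
    return record_set == requested or record_set.startswith(requested + ":")


def list_records(records, requested=None):
    """Return ids of records in `requested` and in all its subsets.

    `records` maps a record id to the set specs it belongs to; an archive
    with zero sets is simply one where every list is empty.
    """
    if requested is None:
        return sorted(records)
    return sorted(
        rid for rid, specs in records.items()
        if any(in_set(spec, requested) for spec in specs)
    )


records = {
    "dini:01": ["physics"],          # record at an interior node
    "dini:02": ["physics:optics"],   # record in a subset
    "dini:03": ["economics"],
    "dini:04": [],                   # record in no set at all
}
print(list_records(records, "physics"))
```

Matching on the requested name plus the separator, rather than on a bare prefix, keeps a request for "phys" from returning the records of "physics".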

OA protocol

• Identify (no arguments, no exceptions)

• ListMetadataFormats ([fullId]), response is the same as for the SFc

• ListSets (no arguments, empty response ok)

• ListRecord ([Sets] colon as separator)

OA protocol

• ListContents ([sets][recordbefore] [recordafter][metaformat])
– response as before but may contain
– resumption token (set,recordbefore,recordafter)
– errors 206,503,302

• GetRecord (fullId)
– response as before
– error 404
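
A minimal sketch of how a harvester might follow resumption tokens until a listing is complete. It runs against an in-memory list rather than a real archive, and it simplifies the token to a start index; the slide describes the token in terms of set, recordbefore and recordafter. All names here are hypothetical.

```python
def list_contents(archive, page_size=2, token=None):
    """Return (record ids, resumption token or None) for one response.

    `archive` is a list of ids; the token is modelled as the index at
    which the next response should start.
    """
    start = token or 0
    chunk = archive[start:start + page_size]
    nxt = start + page_size
    return chunk, (nxt if nxt < len(archive) else None)


def harvest(archive, page_size=2):
    """Repeat the request until no resumption token is returned."""
    harvested, token = [], None
    while True:
        chunk, token = list_contents(archive, page_size, token)
        harvested.extend(chunk)
        if token is None:
            return harvested


archive = ["dini:01", "dini:02", "dini:03", "dini:04", "dini:05"]
print(harvest(archive))
```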

Encoding via cgi

• General syntax

baseurl?verb=verbname&argname=argval...

• baseurl is the location of the OA v1 protocol as registered at openarchives.org

• verbname is the name of the verb

• argname is the name of the attribute

• argval is the value of the attribute
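
A minimal sketch of the general syntax above, using Python's standard urllib.parse. The base URL is a placeholder rather than a registered archive, and the function name is hypothetical; the verb and argument name are the ones used for GetRecord in this talk.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_request(baseurl, verb, **args):
    """Return baseurl?verb=verbname&argname=argval... with values escaped."""
    query = urlencode([("verb", verb)] + sorted(args.items()))
    return f"{baseurl}?{query}"


# example.org is a placeholder, not a registered base URL.
url = build_request("http://example.org/oa", "GetRecord", fullId="dini:01")
print(url)

# A data provider can recover the verb and arguments from the query string.
params = parse_qs(urlsplit(url).query)
print(params["verb"][0], params["fullId"][0])
```

Escaping matters here because identifiers contain colons, which are percent-encoded in the query string and decoded again on the provider side.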

Registration of archives

• Metadata format registration as now, names alphanumeric and underscore

• Self-description introduced in the OA protocol through the identify verb

• Fields of data provider templates
– Natural language name
– description url
– archive id
– maintainer (of OA interface) email
– version of OA protocol used
– OA base url

Conclusion

• After the Ithaca work, the OAi is set for another period of testing, with a broader set of tests than the first time.

• Many idiosyncrasies of the old SFc have been removed, and that will increase the overall acceptability.

• The new version one of the OAi protocol may be a bit more complicated than the SFc, but it is a lot more sound.

• It still is not definite.