The Basics of OAI An Introduction to the Protocol for Metadata Harvesting Sarah Shreeves University of Illinois at Urbana-Champaign Basics and Beyond July 27, 2004

The Basics of OAI

An Introduction to the Protocol for Metadata Harvesting

Sarah Shreeves

University of Illinois at Urbana-Champaign

Basics and Beyond July 27, 2004

Outline

What the OAI protocol is & what it is not

Place in digital library infrastructure

How it works (basically)

Challenges for data / service providers

OAI-PMH is a tool

Moves metadata (not content) from a data provider to a service provider (or harvester)

A set of rules that defines the communication between two systems (like FTP and HTTP)

Build once, use for many applications – a building block for digital library services

Facilitates the federation of metadata

OAI-PMH is not….

Metadata

A search tool

A database

Open Access

Who uses OAI?

Approximately 400 data providers

Basic building block of the National Science Digital Library (NSDL); OAIster

Incorporated into DSpace and Eprints.org

Part of CONTENTdm, Michigan’s DLXS, and other products

International use

Basic OAI-PMH Concepts

“Aggregated search” rather than “Federated search”

Data providers – support OAI-PMH as a means to expose metadata

Service providers – ‘harvest’ metadata from data providers via the OAI-PMH

OAI-PMH based upon HTTP and XML

OAI-PMH requires use of simple Dublin Core BUT supports and encourages use of other metadata schemas

Unique and Persistent Identifiers and a Datestamp for each OAI record

[Diagram: an OAI harvester exchanges OAI requests and OAI responses with several OAI data providers, which sit in front of a digital management system, a database, and XML files. The harvested records form the aggregated metadata on which services are built.]

Examples of OAI Service Providers

OAIster: http://oaister.umdl.umich.edu/o/oaister/

Engineering, Computer Science, and Physics: http://g118.grainger.uiuc.edu/engroai/

Open Language Archives Community: http://www.language-archives.org/

How OAI Works (Technically)

6 distinct ‘verbs’ or requests

OAI requests are sent via HTTP

Responses are sent in valid XML

[Diagram: the service provider (an OAI harvester holding aggregated metadata) sends an HTTP request (OAI verb) to the data provider (an OAI data provider in front of a digital management system), which returns an HTTP response (valid XML).]

An OAI Record

<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:docsouth.unc.edu:12</identifier>
    <datestamp>2003-04-24T13:15:52Z</datestamp>
    <setSpec>4</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd" xmlns="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
      <title>Advice to Soldiers</title>
      <creator>William Royal</creator>
      <subject>United States -- History -- Civil War, 1861-1865 -- Religious aspects.</subject>
      <subject>Confederate States of America -- Religion.</subject>
      <subject>Soldiers -- Religious life -- Confederate States of America.</subject>
      <subject>Soldiers -- Confederate States of America -- Conduct of life.</subject>
      <subject>Confederate States of America -- Church history.</subject>
      <subject>Sin.</subject>
      <publisher>[Raleigh, N. C.: s. n., between 1861 and 1865]</publisher>
      <date>2003-04-24T13:15:52Z</date>
      <type>Text</type>
      <format>text/html</format>
      <identifier>http://docsouth.unc.edu/royal/royal.html</identifier>
      <language>en-us</language>
    </oai_dc:dc>
  </metadata>
</record>
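
A harvester has to pull the header fields and the Dublin Core elements out of a record like this one. The sketch below shows one possible way to do that with Python's standard library. The function name `parse_record` is an assumption made for this example, and `RECORD_XML` is an abbreviated copy of the record above (most subjects and the schema location are left out).

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the search paths below.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
}

# Abbreviated copy of the record shown on the slide.
RECORD_XML = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:docsouth.unc.edu:12</identifier>
    <datestamp>2003-04-24T13:15:52Z</datestamp>
    <setSpec>4</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns="http://purl.org/dc/elements/1.1/"
               xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
      <title>Advice to Soldiers</title>
      <creator>William Royal</creator>
      <subject>Confederate States of America -- Religion.</subject>
      <subject>Sin.</subject>
      <identifier>http://docsouth.unc.edu/royal/royal.html</identifier>
    </oai_dc:dc>
  </metadata>
</record>"""


def parse_record(xml_text):
    """Split one OAI record into header fields and repeatable DC elements."""
    record = ET.fromstring(xml_text)
    header = record.find("oai:header", NS)
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    fields = {}
    for child in dc:
        # Tags look like "{namespace}title"; keep only the local name.
        name = child.tag.split("}", 1)[1]
        fields.setdefault(name, []).append(child.text)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "sets": [s.text for s in header.findall("oai:setSpec", NS)],
        "dc": fields,
    }


parsed = parse_record(RECORD_XML)
print(parsed["identifier"], parsed["dc"]["title"])
```

Note that the record carries two different identifiers: the OAI identifier in the header names the metadata record, while the Dublin Core identifier points at the resource itself.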

OAI “VERBS”

Identify

ListMetadataFormats

ListSets

ListIdentifiers

ListRecords

GetRecord
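
As a rough illustration of how the six verbs turn into HTTP requests, the sketch below assembles a request URL from a base URL, a verb, and optional parameters. The helper name `build_request_url` is an assumption made for this example and is not part of the protocol or of any library.

```python
from urllib.parse import urlencode

# The six request types (verbs) listed above.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def build_request_url(base_url, verb, **params):
    """Return the GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI verb: {verb}")
    query = {"verb": verb}
    # Skip parameters the caller left as None.
    query.update({k: v for k, v in params.items() if v is not None})
    return base_url + "?" + urlencode(query)


url = build_request_url(
    "http://aerialphotos.grainger.uiuc.edu/oai.asp",
    "ListRecords",
    metadataPrefix="oai_dc",
)
print(url)
```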

Identify

Purpose: Return general information about the archive and its policies (e.g., datestamp granularity)

Parameters: None

Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=Identify
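
The datestamp granularity reported by Identify matters because it determines how a harvester should write dates in later requests. The sketch below reads a few fields from an Identify response and formats a date to match. The response text, the repository name, and the function names are all made up for illustration; the element names and the two granularity values come from OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical Identify response; the values are illustrative only.
IDENTIFY_XML = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2004-07-27T12:00:00Z</responseDate>
  <request verb="Identify">http://example.org/oai</request>
  <Identify>
    <repositoryName>Example Archive</repositoryName>
    <baseURL>http://example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>admin@example.org</adminEmail>
    <earliestDatestamp>2001-01-01</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DD</granularity>
  </Identify>
</OAI-PMH>"""

# strftime patterns for the two granularities the protocol allows.
GRANULARITY_FORMATS = {
    "YYYY-MM-DD": "%Y-%m-%d",
    "YYYY-MM-DDThh:mm:ssZ": "%Y-%m-%dT%H:%M:%SZ",
}


def parse_identify(xml_text):
    """Return selected fields of an Identify response as a dict."""
    root = ET.fromstring(xml_text)
    ident = root.find("oai:Identify", OAI_NS)
    wanted = ("repositoryName", "baseURL", "protocolVersion",
              "earliestDatestamp", "deletedRecord", "granularity")
    return {name: ident.findtext(f"oai:{name}", namespaces=OAI_NS)
            for name in wanted}


def format_datestamp(moment, granularity):
    """Format a timezone-aware datetime as a UTC datestamp of the given granularity."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime(GRANULARITY_FORMATS[granularity])


info = parse_identify(IDENTIFY_XML)
stamp = format_datestamp(
    datetime(2004, 7, 27, 15, 30, tzinfo=timezone.utc), info["granularity"]
)
print(info["repositoryName"], stamp)
```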

ListSets

Purpose: Provide a listing of sets in which records may be organized (may be hierarchical, overlapping, or flat)

Parameters: resumptionToken – flow control mechanism (X); otherwise none

Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListSets
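
In OAI-PMH 2.0 a hierarchical set is written as a colon-separated path in its setSpec, so a harvester can rebuild the hierarchy from the flat ListSets listing. The sketch below shows one way to do that. The response text, the set names, and the function names are invented for this example.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical ListSets response; the sets are illustrative only.
LIST_SETS_XML = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>photos</setSpec><setName>Photographs</setName></set>
    <set><setSpec>photos:aerial</setSpec><setName>Aerial photographs</setName></set>
    <set><setSpec>maps</setSpec><setName>Maps</setName></set>
  </ListSets>
</OAI-PMH>"""


def parse_sets(xml_text):
    """Map each setSpec in a ListSets response to its setName."""
    root = ET.fromstring(xml_text)
    sets = {}
    for node in root.findall("oai:ListSets/oai:set", OAI_NS):
        spec = node.findtext("oai:setSpec", namespaces=OAI_NS)
        sets[spec] = node.findtext("oai:setName", namespaces=OAI_NS)
    return sets


def parent_of(set_spec):
    """Return the parent setSpec, or None for a top-level set."""
    head, sep, _ = set_spec.rpartition(":")
    return head if sep else None


sets = parse_sets(LIST_SETS_XML)
for spec, name in sets.items():
    print(spec, "|", name, "| parent:", parent_of(spec))
```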

ListMetadataFormats

Purpose: List metadata formats supported by the archive as well as their schema locations and namespaces

Parameters: identifier – for a specific record (O)

Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListMetadataFormats
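
A service provider that wants richer metadata than simple Dublin Core can check this listing before harvesting. The sketch below picks the first available prefix from a preference list. The response text, the preference order, and the function names are assumptions for this example; only `oai_dc` is guaranteed to be offered by every repository.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical ListMetadataFormats response.
FORMATS_XML = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>marc21</metadataPrefix>
      <schema>http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd</schema>
      <metadataNamespace>http://www.loc.gov/MARC21/slim</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def list_formats(xml_text):
    """Map each metadataPrefix to its metadata namespace."""
    root = ET.fromstring(xml_text)
    formats = {}
    for node in root.findall("oai:ListMetadataFormats/oai:metadataFormat", OAI_NS):
        prefix = node.findtext("oai:metadataPrefix", namespaces=OAI_NS)
        formats[prefix] = node.findtext("oai:metadataNamespace", namespaces=OAI_NS)
    return formats


def choose_prefix(available, preferred=("marc21", "oai_dc")):
    """Return the first preferred prefix the repository offers."""
    for prefix in preferred:
        if prefix in available:
            return prefix
    raise LookupError("none of the preferred formats is available")


formats = list_formats(FORMATS_XML)
print(choose_prefix(formats))
```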

ListIdentifiers

Purpose: List headers for all items corresponding to the specified parameters

Parameters (O = optional, R = required, X = exclusive):

from – start date (O) and/or until – end date (O)

set – set to harvest from (O)

metadataPrefix – metadata format to list identifiers for (R)

resumptionToken – flow control mechanism (X)

Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListIdentifiers&metadataPrefix=oai_dc
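
The from parameter lets a harvester ask only for what changed since its last visit. One possible approach, sketched below, is to remember the latest datestamp seen in the returned headers and send it as from next time. The response text and the function names are invented for this example; the `status="deleted"` attribute is how OAI-PMH 2.0 marks a deleted record in a header.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical ListIdentifiers response with one live and one deleted record.
LIST_IDENTIFIERS_XML = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2004-07-01</datestamp>
      <setSpec>photos</setSpec>
    </header>
    <header status="deleted">
      <identifier>oai:example.org:2</identifier>
      <datestamp>2004-07-15</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""


def parse_headers(xml_text):
    """Return one dict per header in a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    headers = []
    for node in root.findall("oai:ListIdentifiers/oai:header", OAI_NS):
        headers.append({
            "identifier": node.findtext("oai:identifier", namespaces=OAI_NS),
            "datestamp": node.findtext("oai:datestamp", namespaces=OAI_NS),
            "deleted": node.get("status") == "deleted",
        })
    return headers


def latest_datestamp(headers):
    """Latest datestamp seen; datestamps of one granularity sort as plain strings."""
    return max(h["datestamp"] for h in headers)


headers = parse_headers(LIST_IDENTIFIERS_XML)
live = [h["identifier"] for h in headers if not h["deleted"]]
print(live, latest_datestamp(headers))
```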

GetRecord

Purpose: Returns the metadata for a single item in the form of an OAI record

Parameters:

identifier – unique id for item (R)

metadataPrefix – metadata format for the record (R)

Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=GetRecord&identifier=oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940&metadataPrefix=oai_dc
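
A GetRecord request can fail, for example when the identifier is unknown, and the repository then answers with an error element instead of a record. The sketch below builds the request URL with the identifier percent-encoded and turns an error response into a Python exception. The class and function names and the error response text are assumptions for this example; `idDoesNotExist` is one of the error codes defined by OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


class OAIError(Exception):
    """Raised when a response carries an OAI-PMH error element."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def get_record_url(base_url, identifier, metadata_prefix):
    """Build a GetRecord URL; urlencode escapes the colons in the identifier."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


def raise_on_error(xml_text):
    """Return the parsed response, or raise OAIError if it reports an error."""
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", OAI_NS)
    if error is not None:
        raise OAIError(error.get("code"), (error.text or "").strip())
    return root


# Hypothetical error response for an unknown identifier.
ERROR_XML = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <error code="idDoesNotExist">No matching identifier</error>
</OAI-PMH>"""

record_url = get_record_url(
    "http://aerialphotos.grainger.uiuc.edu/oai.asp",
    "oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940",
    "oai_dc",
)
print(record_url)
```

The encoded URL differs from the sample URL above only in writing each colon as %3A; both forms carry the same identifier.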

ListRecords

Purpose: Retrieves metadata records for multiple items

Parameters:

from – start date (O)

until – end date (O)

set – set to harvest from (O)

resumptionToken – flow control mechanism (X)

metadataPrefix – metadata format (R)

Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListRecords&metadataPrefix=oai_dc
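
Large result lists arrive in pieces: each partial response ends with a resumptionToken, which the harvester sends back (on its own, without the other parameters) to get the next piece. The sketch below shows a minimal harvesting loop under that rule. The function name `harvest`, the `fetch` callable, and the two canned pages are assumptions for this example, so that it runs without a network connection.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(base_url, metadata_prefix, fetch):
    """Yield every record element, following resumption tokens.

    `fetch` is any callable that takes a URL and returns the response XML.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        yield from root.iter(OAI + "record")
        token = root.find(f".//{OAI}resumptionToken")
        # A missing or empty token means the list is complete.
        if token is None or not (token.text or "").strip():
            break
        # The token is exclusive: send it with the verb and nothing else.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


# Two canned pages standing in for a repository (illustrative only).
PAGES = {
    "http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc":
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        "<record><header><identifier>oai:example.org:1</identifier></header></record>"
        "<resumptionToken>page2</resumptionToken>"
        "</ListRecords></OAI-PMH>",
    "http://example.org/oai?verb=ListRecords&resumptionToken=page2":
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        "<record><header><identifier>oai:example.org:2</identifier></header></record>"
        "<resumptionToken/>"
        "</ListRecords></OAI-PMH>",
}

ids = [
    record.findtext(f"{OAI}header/{OAI}identifier")
    for record in harvest("http://example.org/oai", "oai_dc", PAGES.__getitem__)
]
print(ids)
```

Passing the fetch step in as a parameter keeps the loop testable; a real harvester would supply a function that performs the HTTP GET and would also need to handle errors and retries.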

Other Pieces of OAI

Flow Control

Sets

Multiple metadata schemas

Challenges for the OAI Community

Relatively recent protocol but no best practices (yet)

‘Shareability of metadata’

Heterogeneity of items described

Loss of context / information loss

Knowledge structures differ, so…

Native metadata schemas differ

Controlled vocabularies differ

Use and presentation of items differ

Metadata for different communities

http://digital.lib.umn.edu/IMAGES/reference/mswp/MPW00476.jpg

Metadata for different communities

http://images.library.uiuc.edu:8081/cgi-bin/viewer.exe?CISOROOT=/tdc&CISOPTR=746

Loss of Context: Record in OAI aggregation

Context: Record in native database

Loss of context / data

Loss of context / data

Sense / Completeness of Metadata

identifier: http://images.umdl.umich.edu/cgi/i/image/image-idx?view=entry;subview=detail;cc=fish3ic;entryid=X-0802;viewid=1004_112

publisher: UMMZ Fish Division

format: jpeg

type: image

subject: 1926-05-18

subject: 1926;0812;18;Trib. to Sixteen Cr. Trib. Pine River, Manistee R.;R10W;S26; S27;JAM26-460;05;T21N;1926/05/18

language: UND

description: Flora and Fauna of the Great Lakes Region;

Granularity of Description: Excerpt of Metadata Record Describing "Cotton coverlet with embroidered butterfly design"

Digital Image of "Cotton Coverlet with Embroidered Butterfly Design"

Description: Digital image of a single-sized cotton coverlet for a bed with embroidered butterfly design. Handmade by Anna F. Ginsberg Hayutin.

Source: Materials: cotton and embroidery floss. Dimensions: 71 in. x 86 in. Markings: top right hand corner has 1 1/2 in. x 1/2 in. label cut outs at upper left and right hand side for head board; fabric is woven in a variation of a rib weave; color each of yellow and gray; hand-embroidered cotton butterflies and flowers from two shades of each color of embroidery floss - blue, pink, green and purple and single top 20 in. bordered with blue and black cotton embroidery thread; stitches used for embroidery: running stitch, chain stitch, French knot and back stitches; selvage edges left unfinished; lower edges turned under and finished with large gray running stitches made with embroidery floss.

Format: Epson Expression 836 XL Scanner with Adobe Photoshop version 5.5; 300 dpi; 21-53K bytes. Available via the World Wide Web.

Coverage: —

Date Created: 2001-09-19 09:45:18; Updated: 20011107162451; Created: 2001-04-05; Created: 1912-1920?

Type: Image

Granularity of Description: Excerpt of Metadata Record Describing “American Woven Coverlet”

Digital Image of "American Woven Coverlet"

Description: Materials: Textile--Multi, Pigment--Dye; Manufacturing Process: Weaving--Hand, Spinning, Dyeing, Hand-loomed blue wool and white linen coverlet, worked in overshot weave in plain geometric variant of a checkerboard pattern. Coverlet is constructed from finely spun, indigo-dyed wool and undyed linen, woven with considerable skill. Although the pattern is simpler, the overall craftsmanship is higher than 1934.01.0094A. - D. Schrishuhn, 11/19/99 This coverlet is an example of early "overshot" weaving construction, probably dating to the 1820's and is not attributable to any particular weaver. -- Georgette Meredith, 10/9/1973

Source: —

Format: 228 x 169 x 1.2 cm (1,629 g)

Coverage: Euro-American; America, North; United States; Indiana? Illinois?

Date: Early 19th c. CE

Type: cultural; physical object; original

Range of vocabularies in use

Element: Top three controlled vocabularies used (% of respondents who identified a C.V.)

Subject: LCSH (73%); LC TGM I (27%); AAT (17%)

Format: LC TGM II (17%); AAT (10%); MIME types (8%); AACR2 (8%)

Type: LC TGM II (21%); DCMI Type (13%); AACR2 (10%)

Personal names: LC Name Authority File (67%)

Geographic names: LCSH (27%); LC Name Authority File (25%); Getty Thesaurus of Geographic Names (15%)

Data providers can:

Create metadata for interoperability

Reusable metadata - think beyond your local users and environment

Use well structured and defined schemas; move beyond simple DC

Use and identify controlled vocabularies

Service Providers can…

Analyze metadata and cluster and normalize some aspects

Communicate with data providers about their metadata

Custom interfaces and selective views for target audiences / domains
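
As one small example of the normalization a service provider might attempt, the sketch below pulls four-digit years out of free-text date values like those in the coverlet records shown earlier. The function name `extract_years` and the rule itself (any token starting with a year between 1000 and 2099) are assumptions made for illustration. It is a crude heuristic, not an established practice, and it simply returns nothing for values such as "Early 19th c. CE".

```python
import re

# A year from 1000 to 2099 at the start of a run of letters or digits.
YEAR_PATTERN = re.compile(r"\b(1\d{3}|20\d{2})")


def extract_years(value):
    """Return the four-digit years found in a free-text date value."""
    return [int(match) for match in YEAR_PATTERN.findall(value)]


samples = [
    "2001-09-19 09:45:18",
    "20011107162451",
    "1912-1920?",
    "Early 19th c. CE",
]
for value in samples:
    print(repr(value), "->", extract_years(value))
```

Values that yield no year would still need human review or a richer rule set, which is part of why consistent metadata from data providers matters.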

Resources

OAI for beginners tutorial: http://www.oaforum.org/tutorial/

OAI Frequently Asked Questions: http://www.openarchives.org/documents/FAQ.html

IMLS Digital Collections and Content Project: http://imlsdcc.grainger.uiuc.edu/

Recap

OAI protocol is a tool

OAI is easy - metadata is hard

Better metadata = better interoperability

Contact Information

Sarah Shreeves

Project Coordinator, IMLS Digital Collections and Content

University of Illinois Library at Urbana-Champaign

Email: [email protected]

Phone: 217-244-7809

Website: http://imlsdcc.grainger.uiuc.edu/