Merritt Repository: Depositing Content and Providing Access
University of California Curation Center Team, California Digital Library
July 28, 2011
UC3 Summer Webinar Series
Merritt summary
• Curation repository
– Supporting long-term preservation and access
– Publish, share, preserve, discover, (re-)use
• “Model free”
– There are no prescriptive requirements for content genre, format, structure, or accompanying metadata
• No service fee (for UC affiliates)
– Contributors are billed only for storage, $1.04/GB/year
Cost of a physical book in offsite storage: $4.62/year
Cost of a digital book in HathiTrust: $0.15/year
Cost of a digital book in Merritt: $0.06/year
Cost of a dataset in Merritt: $1.00/year
For more information, review the June 9 webinar: http://www.cdlib.org/uc3/uc3webinars.html
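The storage rate above can be turned into a quick estimate. The following is a minimal sketch that assumes the charge scales linearly with size at the quoted $1.04/GB/year; the function names are illustrative and not part of any Merritt tooling.

```python
# Storage rate quoted on the slide: $1.04 per GB per year.
RATE_USD_PER_GB_YEAR = 1.04


def annual_storage_cost(size_gb: float, rate: float = RATE_USD_PER_GB_YEAR) -> float:
    """Return the yearly storage charge in US dollars for size_gb gigabytes."""
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    return size_gb * rate


def implied_size_gb(annual_cost: float, rate: float = RATE_USD_PER_GB_YEAR) -> float:
    """Invert the rate: the object size that a given yearly cost corresponds to."""
    return annual_cost / rate
```

Under this assumption, `implied_size_gb(0.06)` gives the size that the $0.06/year digital book figure would correspond to, and `annual_storage_cost(10)` gives the yearly charge for a 10 GB collection.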
Master recipe
• Registration (one time) [contributor → UC3, [email protected]]
• Submission [contributor → Merritt]
• Ingest [Merritt]
• Notification [Merritt → contributor]
• Discovery/delivery [consumer → Merritt → consumer]
Registration
• Contact Perry Willett, Merritt service manager [email protected]
Submission
• User interface: manual deposits
• METS feeder: existing DPR workflows
• API: automated deposits
UI submission
• The submission package is always a single file, which may be:
– For a single object
• The complete object
• A multi-file object in a container (zip, gzip, tar.gz)
• A multi-file object defined by a manifest
– For a batch of objects
• A manifest referring to single file objects
• A manifest referring to objects in containers
• A manifest referring to objects defined by manifests
• An opportunity to supply descriptive metadata
Manifest
• A “packing slip” for an object, providing URLs for all of the object’s file components
– Object manifest
• Algorithm = adler32, crc32, md2, md5, sha1, sha256, sha384, sha512
• See User’s Guide and online help for more information: http://merritt.cdlib.org/
fileURL | hashAlgorithm | hashValue | fileSize | fileName | mimeType...
#%checkm_0.7
#%profile | http://uc3.cdlib.org/registry/ingest/manifest/mrt-ingest-manifest
#%prefix | mrt: | http://merritt.cdlib.org/terms#
#%prefix | nfo: | http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#
#%fields | nfo:fileUrl | nfo:hashAlgorithm | nfo:hashValue | nfo:fileSize | nfo:fileLastModified | nfo:fileName | mrt:mimeType
http://merritt.cdlib.org/samples/call911.jpg | md5 | 47d321056e60944a06973...
http://merritt.cdlib.org/samples/call911.txt | md5 | 77fe42b1055bbabe51648...
#%eof
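An object manifest like the sample above can be generated with a short script. This is a sketch only: the header lines are copied from the slide, the helper names are hypothetical, md5 is used because the sample rows use it, and leaving unused columns blank is an assumption that should be checked against the User’s Guide.

```python
import hashlib

# Header lines copied from the sample manifest on the slide.
HEADER = [
    "#%checkm_0.7",
    "#%profile | http://uc3.cdlib.org/registry/ingest/manifest/mrt-ingest-manifest",
    "#%prefix | mrt: | http://merritt.cdlib.org/terms#",
    "#%prefix | nfo: | http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#",
    "#%fields | nfo:fileUrl | nfo:hashAlgorithm | nfo:hashValue | nfo:fileSize"
    " | nfo:fileLastModified | nfo:fileName | mrt:mimeType",
]


def manifest_row(url, data, file_name="", mime_type="", last_modified=""):
    """Build one pipe-delimited row for a file whose bytes are `data`.

    Column order follows the #%fields line. Blank columns are an assumption.
    """
    digest = hashlib.md5(data).hexdigest()
    fields = [url, "md5", digest, str(len(data)), last_modified, file_name, mime_type]
    return " | ".join(fields)


def object_manifest(rows):
    """Wrap rows in the header lines and the closing #%eof marker."""
    return "\n".join(HEADER + list(rows) + ["#%eof"]) + "\n"
```

The resulting text would be saved as a single file and submitted as the package; every URL in it has to be reachable by Merritt.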
Manifest
• A “packing slip” for a batch, providing URLs for all objects’ file components
– Batch manifest
• Batch of single file objects
• Batch of container objects
• Batch of manifest objects
• An Excel macro is available for automatically generating manifests from spreadsheets: http://merritt.cdlib.org/docs/merrittManifest.xls
• See User’s Guide and online help for more information: http://merritt.cdlib.org/
fileURL | hashAlgorithm | hashValue | fileSize | fileName | primaryID | localID | creator | title | date...
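As an alternative to the Excel macro, batch manifest rows can be assembled programmatically. The sketch below covers only the data rows, using the column names shown on the slide; the header lines for a batch manifest are not shown in this presentation and are deliberately left out. The function name and the blank-column convention are assumptions.

```python
# Column order as listed on the slide (the trailing "..." there suggests more columns may exist).
BATCH_FIELDS = [
    "fileURL", "hashAlgorithm", "hashValue", "fileSize", "fileName",
    "primaryID", "localID", "creator", "title", "date",
]


def batch_row(entry: dict) -> str:
    """Format one batch manifest row from a dict keyed by column name."""
    unknown = set(entry) - set(BATCH_FIELDS)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    if not entry.get("fileURL"):
        raise ValueError("fileURL is needed to locate the object")
    values = [str(entry.get(name, "")) for name in BATCH_FIELDS]
    # A literal pipe inside a value would be read as a column separator.
    if any("|" in value for value in values):
        raise ValueError("values must not contain '|'")
    return " | ".join(values)
```

Because the descriptive columns (creator, title, date) travel with each row, a batch manifest is one of the three places where metadata can be supplied.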
Metadata
• Submission form
• Batch manifest
• Object component: mrt-erc.txt
erc:
who: Blaine, Tegan Woodward
what: Continuous measurements of atmospheric argon/nitrogen ...
when: 2005
where: ark:/20775/bb21509964
Dublin Kernel | Dublin Core Element | Definition
who | creator | Responsible person or party
what | title | Content description
when | date | Lifecycle-meaningful date
where | identifier | Locally-meaningful identifier
http://dublincore.org/groups/kernel/spec/
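The kernel mapping can be applied mechanically when descriptive metadata already exists as Dublin Core. This is a minimal sketch, assuming the mrt-erc.txt layout is exactly the "erc:" line followed by one "label: value" line per element, as in the example above; the function name is illustrative.

```python
# Dublin Core element -> Dublin Kernel label, in the order used on the slide.
DC_TO_KERNEL = {
    "creator": "who",
    "title": "what",
    "date": "when",
    "identifier": "where",
}


def erc_record(dc: dict) -> str:
    """Render a Dublin Core dict as the text of an mrt-erc.txt component."""
    lines = ["erc:"]
    for dc_name, kernel_name in DC_TO_KERNEL.items():
        if dc_name in dc:
            lines.append(f"{kernel_name}: {dc[dc_name]}")
    return "\n".join(lines) + "\n"
```

Elements missing from the input are simply omitted, and output order always follows who, what, when, where regardless of the order of the input dict.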
METS feeder
• METS must conform to a profile documented in the CDL Guidelines for Digital Objects: http://www.cdlib.org/services/dsc/contribute/docs/GDO.pdf
– METS, all referenced file components, and manifest must be web accessible
– The Merritt IP address can be provided for configuring firewall rules
• Feeder manifest
http://url/path/mets.xml
http://url/path/mets.xml
...
• Submission
http://feeder.cdlib.org/?userID=id&authCode=passwd&accessGroupID=collection&manifestURL=manifest
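The feeder submission URL is easiest to build with a URL encoder, since the manifest URL itself contains characters that must be escaped inside a query string. A sketch, using the base address and parameter names from the slide; the function name and example values are placeholders.

```python
from urllib.parse import urlencode

# Base address as shown on the slide.
FEEDER_BASE = "http://feeder.cdlib.org/"


def feeder_url(user_id: str, auth_code: str, access_group_id: str, manifest_url: str) -> str:
    """Assemble the METS feeder submission URL with properly escaped parameters."""
    query = urlencode({
        "userID": user_id,
        "authCode": auth_code,
        "accessGroupID": access_group_id,
        "manifestURL": manifest_url,
    })
    return f"{FEEDER_BASE}?{query}"
```

For example, `feeder_url("id", "passwd", "collection", "http://url/path/manifest.txt")` yields a URL whose `manifestURL` value is percent-encoded. Note that the credentials travel in the query string, so the URL should be treated as sensitive.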
API submission
Field | Required? | Value
filename | optional | File name
file | required | File contents
type | optional | File type: file, container, object-manifest, batch-manifest, container-batch-manifest, single-file-batch-manifest
profile | required | Profile (supplied by UC3)
primaryIdentifier | optional | Primary identifier (ARK)
localIdentifier | optional | Local identifier
digestType | optional | Message digest type: adler-32, crc-32, md2, md5, sha-1, sha-256, sha-384, sha-512
digestValue | optional | Message digest value (hexadecimal encoded)
creator | optional | Creator
title | optional | Title
date | optional | Date
note | optional | Descriptive note
responseForm | optional | Response form: anvl, json, xhtml, xml
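Before posting, a client can check its form fields against the table above. The following sketch encodes the field names and enumerated values from the slides; the function itself is hypothetical client-side validation, not something Merritt provides, and the server may apply additional rules.

```python
import string

REQUIRED = {"file", "profile"}
OPTIONAL = {
    "filename", "type", "primaryIdentifier", "localIdentifier", "digestType",
    "digestValue", "creator", "title", "date", "note", "responseForm",
}
FILE_TYPES = {
    "file", "container", "object-manifest", "batch-manifest",
    "container-batch-manifest", "single-file-batch-manifest",
}
DIGEST_TYPES = {"adler-32", "crc-32", "md2", "md5", "sha-1", "sha-256", "sha-384", "sha-512"}
RESPONSE_FORMS = {"anvl", "json", "xhtml", "xml"}


def validate_fields(fields: dict) -> list:
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []
    for name in sorted(REQUIRED - set(fields)):
        problems.append(f"missing required field: {name}")
    for name in sorted(set(fields) - REQUIRED - OPTIONAL):
        problems.append(f"unknown field: {name}")
    checks = (("type", FILE_TYPES), ("digestType", DIGEST_TYPES), ("responseForm", RESPONSE_FORMS))
    for name, allowed in checks:
        if name in fields and fields[name] not in allowed:
            problems.append(f"invalid {name}: {fields[name]!r}")
    value = fields.get("digestValue")
    if value is not None and (not value or any(c not in string.hexdigits for c in value)):
        problems.append("digestValue is not hexadecimal")
    return problems
```

Returning a list rather than raising on the first error lets a batch tool report every problem with a submission in one pass.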
API submission
POST /object/ingest HTTP/1.1
Host: merritt.cdlib.org
Content-type: multipart/form-data; boundary=boundary

--boundary
Content-disposition: form-data; name="file"; filename="filename"

file
--boundary
Content-disposition: form-data; name="type"

type
--boundary
Content-disposition: form-data; name="profile"

profile
--boundary
...
API submission
• cURL: http://curl.haxx.se/
% curl -s -u user:password -F "file=@manifest" -F "type=manifest-type" -F "profile=profile" -F "localIdentifier=identifier" -F "creator=creator" -F "title=title" http://merritt.cdlib.org/object/ingest
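The same request can be prepared from a script using only the Python standard library. This sketch builds the multipart body and a request object but does not send it; the endpoint and field names come from the slides, HTTP Basic authentication is assumed because the cURL example uses `-u`, and the helper names are illustrative.

```python
import base64
import uuid
from urllib.request import Request

# Endpoint as shown on the slide.
INGEST_URL = "http://merritt.cdlib.org/object/ingest"


def encode_multipart(fields, file_name, file_bytes, boundary=None):
    """Encode text fields plus one file part as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{file_name}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_ingest_request(user, password, fields, file_name, file_bytes):
    """Prepare (but do not send) a POST to the ingest endpoint with Basic auth."""
    body, content_type = encode_multipart(fields, file_name, file_bytes)
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Content-Type": content_type, "Authorization": f"Basic {token}"}
    return Request(INGEST_URL, data=body, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would perform the submission; keeping construction separate from sending makes the request easy to inspect or test first.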
Ingest
• Primary identifier
– ARK (required; auto-generated if not supplied)
– DOI (can optionally be requested)
• Validation
• Characterization
• SIP → AIP
ISO 14721, Open Archival Information System (OAIS)
Notification
• You will receive two separate email notifications
– Initial notification that we have received your submission, and that it is queued for subsequent processing
– Final notification that we have fully processed your submission
• UC3’s preservation commitment starts at the time of final notification
Initial notification
From: UC3 Merritt Support [mailto:[email protected]]
Sent: Thursday, July 14, 2011 3:28 PM
To: Stephen Abrams
Subject: Completion of submission
Completion of submission - Notification
- Submission ID: bid-4ed4bf45-aa78-4da7-bb65-63b125d88150
- Job(s):
Number of pending job(s): 1
Number of completed job(s): 0
Number of failed job(s): 0
- User agent: slabrams
- Submission date: 2011-07-14T15:27:41-07:00
- Status: QUEUED
Completion of submission - Notification Report
- Submission ID: bid-4ed4bf45-aa78-4da7-bb65-63b125d88150
- Job(s):
- Job ID: jid-3498bef6-e296-429d-b652-da1f35f8bc04
- Primary ID: ark:/20775/bb21509964
- Local ID: http://libraries.ucsd.edu/ark:/20775/bb21509964;b4946677;umi-ucsd-1040
- Filename: manifest2.txt
- Object title: Continuous measurements of atmospheric argon/nitrogen as a tracer of air-sea heat flux : models, methods, and data
- Object creator: Blaine, Tegan Woodward
- Object date: 2005
- Status: PENDING
- User agent: slabrams
- Submission date: 2011-07-14T15:27:41-07:00
- Status: QUEUED
With attachment, bid-4ed4bf45-aa78-4da7-bb65-63b125d88150.txt
Final notification
From: UC3 Merritt Support [mailto:[email protected]]
Sent: Thursday, July 14, 2011 3:28 PM
To: Stephen Abrams
Subject: Completion of ingest
Notification Summary
- Submission ID: bid-4ed4bf45-aa78-4da7-bb65-63b125d88150
- Job(s):
Number of pending job(s): 0
Number of completed job(s): 1
Number of failed job(s): 0
- User agent: slabrams
- Queue Priority: 06
- Submission date: 2011-07-14T15:27:41-07:00
- Completion date: 2011-07-14T15:27:53-07:00
- Status: COMPLETED
With attachment, bid-4ed4bf45-aa78-4da7-bb65-63b125d88150.txt
Completion of ingest - Notification Report
- Submission ID: bid-4ed4bf45-aa78-4da7-bb65-63b125d88150
- Job(s):
- Job ID: jid-3498bef6-e296-429d-b652-da1f35f8bc04
- Primary ID: ark:/99999/fk4vm4kg6
- Local ID: ark:/20775/bb21509964
- Version: 3
- Filename: manifest2.txt
- Object title: Continuous measurements of atmospheric argon/nitrogen as a tracer of air-sea heat flux : models, methods, and data
- Object creator: Blaine, Tegan Woodward
- Object date: 2005
- Object state: http://store-stage.cdlib.org:35121/state/2111/ark%3A%2F99999%2Ffk4vm4kg6?t=xhtml
- Submission date: 2011-07-14T15:27:46-07:00
- Completion date: 2011-07-14T15:27:53-07:00
- Status: COMPLETED
- User agent: slabrams
- Queue Priority: 06
- Submission date: 2011-07-14T15:27:41-07:00
- Completion date: 2011-07-14T15:27:53-07:00
- Status: COMPLETED
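An automated depositor may want to read these reports rather than rely on a person checking email. The sketch below assumes the report text has one "- Label: value" item per line, which is how the examples here appear to be laid out; the actual attachment format should be verified before relying on it, and both function names are hypothetical.

```python
from collections import defaultdict


def parse_report(text: str) -> dict:
    """Collect "- Label: value" lines into a dict mapping label -> list of values.

    Labels such as Status can occur more than once (per job and per submission),
    so every value is kept in order of appearance.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("- ") or ":" not in line:
            continue
        # Split on the first colon only, so values like ark:/... stay intact.
        key, _, value = line[2:].partition(":")
        fields[key.strip()].append(value.strip())
    return dict(fields)


def is_complete(report: dict) -> bool:
    """True when at least one Status is present and all of them are COMPLETED."""
    statuses = report.get("Status", [])
    return bool(statuses) and all(status == "COMPLETED" for status in statuses)
```

Since the preservation commitment starts at the final notification, a check like `is_complete` is a natural gate before a depositor deletes or archives its local copy.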
Discovery/delivery
• Search
• Browse
Coming soon …
• Enhanced characterization
– JHOVE2
http://jhove2.org/
• Faceted search/browse
– XTF (the technology behind )
http://xtf.cdlib.org/
• Investigation of CMS/DAMS-like function through integration with …
– Islandora/Drupal (in cooperation with UCLA)
– Alfresco (in cooperation with UCB)
– Omeka (in cooperation with UCSC)
Questions?
Upcoming webinars
Date/time | Topic
Thursday, August 11, 2:00 pm | EZID: Create and Manage Persistent Identifiers (Joan Starr, UC3/CDL)
Thursday, August 25, 2:00 pm | DCXL (Data Curation Excel) (Carly Strasser, UC3/CDL)
Thursday, Sept. 22, 2:00 pm | Data Management Planning Tool (Patricia Cruse/Tracy Seneca, UC3/CDL)
http://www.cdlib.org/uc3/uc3webinars.html
For more information
UC Curation Center
http://www.cdlib.org/uc3
http://www.cdlib.org/uc3/[email protected]
Stephen Abrams, Lisa Colvin, Patricia Cruse, Scott Fisher, Erik Hetzner, Greg Janée, John Kunze, Margaret Low, David Loy, Mark Reyes, Abhishek Salve, Tracy Seneca, Joan Starr, Carly Strasser, Marisa Strong, Perry Willett
UC3 webinar series
http://www.cdlib.org/uc3/uc3webinars.html
Merritt repository
http://merritt.cdlib.org/
http://merritt.cdlib.org/help
http://merritt.cdlib.org/docs/merritt_handout.pdf
http://merritt.cdlib.org/docs/merritt_user_guide.pdf