DCAPE Project Update

Richard Marciano and Chien-Yi Hou, University of North Carolina, SALT; Caryn Wojcik, State of Michigan, Records Management Services
  • date post

    15-Jan-2016

Page 1:

DCAPE Project Update

Richard Marciano, University of North Carolina, SALT

Chien-Yi Hou, University of North Carolina, SALT

Caryn Wojcik, State of Michigan, Records Management Services

Page 2:

NHPRC Issued a Call…

Design a digital preservation service with a business model for the archival community

Fill the needs of archival repositories that cannot build and sustain their own electronic records archive

Page 3:

DCAPE Project

Distributed Custodial Archival Preservation Environments Project was funded by NHPRC in 2008 (RE10010-08)

Officially started in December 2008

Project extended through April 2012

http://www.dcape.org/

Page 4:

What is Distributed Custodial Preservation?

Physical custody of archival collections is distributed outside of the archival repository to a trusted preservation service

Archival repository retains legal custody

Archival repository remains responsible for archival functions, including preservation and access

Access to collections is controlled by archival repository

Page 5:

DCAPE Partners

28 people across 9 institutions and 2 staff at UNC, for a total of 32 participants

Cultural Entity: Getty Research Institute

Cyberinfrastructure: West Virginia University, Carleton University (Canada)

State Archives: California, Kansas, Michigan, Kentucky, North Carolina, New York

State Library: North Carolina

University Archives: Tufts

UNC: School of Information and Library Science (SILS), Sustainable Archives and Leveraging Technologies (SALT)

Page 6:

DCAPE Goals

Build a preservation environment that meets the needs of archival repositories for trusted archival preservation services.

Services are based on policies (rules) that are defined by the archivist

Over 250 rules have been developed for the iRODS library that can be leveraged for DCAPE

A series of rules might “look” like this:

When files are ingested, replicate them in three different locations and run a checksum on each file.

Bit-check files every month.

Send an alert about any changes to the files.
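The series of rules above can be illustrated outside iRODS with a small Python sketch. It is not DCAPE or iRODS code: the function names, the use of SHA-256, and the local directories standing in for three storage locations are all assumptions made for illustration. Scheduling the monthly bit-check (for example with cron) is left out.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def ingest(source: Path, replica_dirs: list[Path]) -> dict[Path, str]:
    """Copy the file to every replica location and record a checksum per copy."""
    manifest = {}
    for directory in replica_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        copy = directory / source.name
        shutil.copy2(source, copy)
        manifest[copy] = sha256_of(copy)
    return manifest


def bit_check(manifest: dict[Path, str]) -> list[str]:
    """Re-hash every replica; return one alert per missing or changed copy."""
    alerts = []
    for copy, expected in manifest.items():
        if not copy.exists():
            alerts.append(f"MISSING: {copy}")
        elif sha256_of(copy) != expected:
            alerts.append(f"CHANGED: {copy}")
    return alerts


def demo() -> list[str]:
    """Ingest one file into three locations, alter one replica, run a bit-check."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "record.txt"
        source.write_text("minutes of the board meeting")
        replicas = [root / f"site{i}" for i in (1, 2, 3)]  # hypothetical locations
        manifest = ingest(source, replicas)
        (replicas[1] / "record.txt").write_text("tampered")  # simulate bit rot
        return bit_check(manifest)
```

Calling demo() returns a single alert for the altered copy in the second location; the two untouched replicas pass the check.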

Page 7:

DCAPE Goals

The trusted digital repository infrastructure will be assembled from state-of-the-art rule-based data management systems, commodity storage systems, and sustainable preservation services.

The software infrastructure will automate many of the administrative tasks associated with the management of archival repositories.

Tasks will include: authentication, replication, migration, obsolete file management, preservation metadata management, etc.

Page 8:

Project Tasks

Execute service agreements between UNC and partners to govern use of the test collections.

Define rules and services (organized according to the OAIS framework) for iRODS to perform on test collections.

Ingest test collections into iRODS and validate the rules and services.

Develop business model (including costs) for sustaining a repository service based on iRODS.

Develop model service agreements that define the standard and optional services of the repository.

Page 13:

Role of iRODS

Preservation environment provides rule-based automation of archival functions (repeatable services)

Standard and optional services will be available

Shared service should reduce costs for each archival repository compared to the cost of building in-house preservation capabilities

Page 14:

Life Cycle of Data

[Diagram. Labels: Virtual Loading Dock; Preservation Area; Reference Room; SIP, AIP, DIP]

Page 15:

DCAPE Framework

[Diagram. Labels: iRODS; Virtual Loading Dock (V1, V2, V3); Preservation Area (P1, P2, P3); Reference Room (R1, R2); SIP, AIP, DIP]

Page 16:

DCAPE Capabilities

[Diagram. Labels: iRODS; Virtual Loading Dock (V1, V2, V3); Preservation Area (P1, P2, P3); Reference Room (R1, R2); SIP, AIP, DIP; numbered capability markers (numbers up to 26) placed across the areas]

Page 17:

DCAPE Capabilities

[Diagram. Labels: iRODS; Virtual Loading Dock (V1, V2, V3); Preservation Area (P1, P2, P3); Reference Room (R1, R2); SIP, AIP, DIP; numbered capability markers (numbers up to 26) placed across the areas]

Replication

Page 18:

Sample Rule

sampleRule||delayExec(<PLUSET>1m</PLUSET><EF>2m</EF>,assign(*path,/samplePath)##msiMakeGenQuery("COLL_NAME","COLL_PARENT_NAME = '*path' AND META_COLL_ATTR_NAME = 'DCAPE_COLL_TYPE' AND META_COLL_ATTR_VALUE = 'AIP'",*GenQInp)##msiExecGenQuery(*GenQInp, *GenQOut)##forEachExec(*GenQOut,msiGetValByKey(*GenQOut, "COLL_NAME",*DataObj)##msiSplitPath(*DataObj,*p,*c)##assign(*newpath,SamplePath2*c)##msiDataObjRename(*DataObj,*newpath,1,*result)##acAddLog(Move_Collection,"*DataObj")##acCheckPolicy(*newpath,DCAPE_POLICY_REPLICA,*pResult)##ifExec((*pResult == Yes),msiCollRepl(*newpath,destRescName=resource,*status)##acAddLog(Replicate_Coll,"*newpath"),nop,nop,nop),nop),nop)|nop
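One way to read the sample rule is as the following steps: find the collections directly under /samplePath that carry the metadata DCAPE_COLL_TYPE = AIP, move each one under SamplePath2, log the move, check the replica policy, and replicate and log again only when the policy answer is Yes. The delayExec hints appear to schedule this to start after one minute (PLUSET) and repeat every two minutes (EF). The Python sketch below mimics that control flow on an in-memory dictionary. It is an illustration rather than a translation: the function name is hypothetical, storing the policy answer as a metadata attribute is an assumption about what acCheckPolicy consults, and the "/SamplePath2/<name>" destination form is a guess at what the flattened slide text intended.

```python
import posixpath


def run_sample_rule(collections, src="/samplePath", dst="/SamplePath2"):
    """Move AIP collections directly under src to dst; replicate if policy says Yes.

    collections: dict mapping collection path -> metadata dict (mutated in place).
    Returns the audit log as a list of (event, path) tuples.
    """
    log = []
    # Stand-in for msiMakeGenQuery/msiExecGenQuery: children of src tagged as AIP.
    matches = [
        path
        for path, meta in collections.items()
        if posixpath.dirname(path) == src and meta.get("DCAPE_COLL_TYPE") == "AIP"
    ]
    for path in matches:
        child = posixpath.basename(path)                # msiSplitPath
        new_path = posixpath.join(dst, child)           # assign(*newpath, ...)
        collections[new_path] = collections.pop(path)   # msiDataObjRename
        log.append(("Move_Collection", path))           # acAddLog
        # Assumption: the replica policy is readable from the collection metadata.
        if collections[new_path].get("DCAPE_POLICY_REPLICA") == "Yes":
            collections[new_path]["replicated"] = True  # stands in for msiCollRepl
            log.append(("Replicate_Coll", new_path))    # acAddLog
    return log
```

With three collections under /samplePath, of which two are AIPs and only one has the replica policy set to Yes, the log would contain two Move_Collection entries and one Replicate_Coll entry, and the non-AIP collection stays where it was.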

Page 19:

[Diagram. Labels: iRODS; Virtual Loading Dock (V1, V2, V3); Preservation Area (P1, P2, P3); Reference Room (R1, R2); SIP, AIP, DIP; numbered capability markers (numbers up to 26) placed across the areas]

An interface that makes the policies easy to manage!

Page 20:

Interface - Requirements

Hide the technical details

Show the information that archivists want to know

Be able to customize policies easily

Web-based, no installation required

Page 21:

Demo I

[Diagram. Labels: iRODS; Virtual Loading Dock (V1, V2, V3); Preservation Area (P1, P2, P3); Reference Room (R1, R2); SIP, AIP, DIP; numbered capability markers (numbers up to 26) placed across the areas]

Checksum

Replication

Page 32:

Demo II

[Diagram. Labels: iRODS; Virtual Loading Dock (V1, V2, V3); Preservation Area (P1, P2, P3); Reference Room (R1, R2); SIP, AIP, DIP; numbered capability markers (numbers up to 26) placed across the areas]

Checksum & Virus Check

No Replication

Page 42:

DCAPE is More

More than a storage service or environment

More than a reference tool

DCAPE will provide the capability for all archival repositories to fulfill their responsibility to preserve electronic records

Page 43:

DCAPE Interface

Page 44:

DCAPE Metadata

Follow Dublin Core model

Allow customization

Encourage standardization

Define:

Source: creator, system, archivist

Level: collection, accretion, item

Accessibility: internal vs. public

Fields: required vs. optional
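The four properties defined for each metadata field (source, level, accessibility, required or optional) map naturally onto a small field-definition record. The sketch below is only an illustration of that idea: the class and function names are hypothetical, and the sample schema (which Dublin Core elements are required, public, or archivist-supplied) is invented for the example and does not come from DCAPE.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    CREATOR = "creator"
    SYSTEM = "system"
    ARCHIVIST = "archivist"


class Level(Enum):
    COLLECTION = "collection"
    ACCRETION = "accretion"
    ITEM = "item"


@dataclass(frozen=True)
class FieldDef:
    name: str        # Dublin Core element name
    source: Source   # who or what supplies the value
    level: Level     # where in the hierarchy the field applies
    public: bool     # False means internal-only
    required: bool   # False means optional


# Hypothetical schema; the choices below are assumptions for illustration.
SCHEMA = [
    FieldDef("title", Source.CREATOR, Level.ITEM, public=True, required=True),
    FieldDef("format", Source.SYSTEM, Level.ITEM, public=True, required=True),
    FieldDef("rights", Source.ARCHIVIST, Level.COLLECTION, public=False, required=True),
    FieldDef("description", Source.ARCHIVIST, Level.ITEM, public=True, required=False),
]


def missing_required(schema, record, level):
    """Names of required fields at the given level that the record lacks."""
    return [
        f.name
        for f in schema
        if f.level is level and f.required and f.name not in record
    ]


def public_view(schema, record):
    """Keep only the fields marked public in the schema."""
    public_names = {f.name for f in schema if f.public}
    return {k: v for k, v in record.items() if k in public_names}
```

A repository could customize the schema list while still sharing the same element names, which is one way to reconcile "allow customization" with "encourage standardization".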

Page 45:

DCAPE Workflow

Define functionality at each stage

Virtual Loading Dock: Pre-accessioning, Ingestion

Preservation Area: Archival storage, Data management, Administration, Preservation planning

Reference Room: Access

Common services: Management

Page 47:

DCAPE Business Model

Non-profit

Fees for services

Fees for storage

Storage and disaster prevention services

Software maintenance

Access and connectivity

Page 48:

MetaArchive Cooperative

Encourage organizations to build their own preservation infrastructures rather than outsourcing to external vendors

3 levels of membership: 3 yr commitment

Basic costs:

Equipment: 1st year, $4.6K server purchase

Staffing: 2% of a sys. admin’s time + POC admin + software eng. for content ingestion preparation

Storage: $1.00 / GB / year for content stored in the network

Yearly dues:

Sustaining Members: $5.5K / yr

Preservation Members: $3K / yr

Collaborative Members: varies

Cost scenarios: 2 TB of content

Sustaining Member: $27.1K / 3 yrs = ($5.5K (membership) + $2K (space)) x 3 yrs + $4.6K (server)

Preservation Member: $19.6K / 3 yrs = ($3K (membership) + $2K (space)) x 3 yrs + $4.6K (server)

Collaborative Member: $22.6K / 3 yrs = ($4K (membership) + $2K (space)) x 3 yrs + $4.6K (server)
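The three cost scenarios follow one formula: (yearly dues + yearly storage) x years + one-time server purchase. A short Python sketch reproduces the slide's totals. The function name is hypothetical, 1 TB is taken as 1,000 GB (which is what makes 2 TB cost $2K per year at $1.00 / GB / year), and the $4K Collaborative dues figure is the value used in the slide's scenario, since those dues are listed as varying.

```python
def three_year_cost_k(yearly_dues_k, content_tb, years=3,
                      server_k=4.6, storage_per_gb_year=1.00):
    """Total cost in thousands of dollars over the commitment period."""
    # Storage cost per year in $K, treating 1 TB as 1,000 GB.
    storage_k_per_year = content_tb * 1000 * storage_per_gb_year / 1000
    return round((yearly_dues_k + storage_k_per_year) * years + server_k, 1)


sustaining = three_year_cost_k(5.5, 2)      # Sustaining Member
preservation = three_year_cost_k(3.0, 2)    # Preservation Member
collaborative = three_year_cost_k(4.0, 2)   # Collaborative Member, $4K assumed
```

Staffing costs are not part of the slide's totals, so they are omitted here as well.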

Page 49:

Archive-It

Subscription service from the Internet Archive, allowing institutions to build and preserve collections of born digital content

Allows users to crawl, scope, catalog, manage, and browse their archived collections

Collections are hosted at the IA data center and are available through URL and full-text search

A minimum of 2 copies of each collection are kept online

Cost Scenarios

Page 50:

Storage Cost Model Scenarios

1. Question: What is the yearly charge for a customer with 4,000 files and 1.5 TB of storage, assuming the need for two copies, one on disk and one on tape (iRODS)?

Answer: $2,900 + $1,400 x 1.5 = $5,000

2. Question: What is the yearly cost of 6 million files (web crawl scenario) and 1 TB of storage, assuming the need for two tape copies (using iRODS)?

Answer: $2,900 + $550 + 6 x $870 + $5,165 = $13,835

3. Question: What is the yearly cost of 100,000 files and 20 TB of storage with two tape copies (using iRODS)?

Answer: $2,900 + 20 x $550 + 0.1 x $870 + $5,165 = $19,152
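The three answers can be checked with a few lines of Python. The slide does not label the terms, so the interpretation below is inferred from the arithmetic and should be treated as an assumption: $2,900 as a base fee, $1,400 per TB for a disk-plus-tape pair, $550 per TB and $870 per million files for two tape copies, and $5,165 as a fixed charge in the two-tape scenarios. The constant and function names are hypothetical.

```python
BASE_FEE = 2900               # appears in all three answers
DISK_PLUS_TAPE_PER_TB = 1400  # inferred: one disk copy plus one tape copy
TWO_TAPE_PER_TB = 550         # inferred: two tape copies
PER_MILLION_FILES = 870       # inferred: per-file overhead, per million files
TWO_TAPE_FIXED = 5165         # inferred: fixed charge in the two-tape scenarios


def disk_and_tape_cost(terabytes):
    """Yearly cost for one disk copy and one tape copy (scenario 1)."""
    return round(BASE_FEE + DISK_PLUS_TAPE_PER_TB * terabytes)


def two_tape_cost(terabytes, files):
    """Yearly cost for two tape copies (scenarios 2 and 3)."""
    file_charge = PER_MILLION_FILES * files / 1_000_000
    return round(BASE_FEE + TWO_TAPE_PER_TB * terabytes + file_charge + TWO_TAPE_FIXED)


scenario_1 = disk_and_tape_cost(1.5)
scenario_2 = two_tape_cost(1, 6_000_000)
scenario_3 = two_tape_cost(20, 100_000)
```

Under this reading, the file count drives the web crawl scenario (6 million files on 1 TB), while volume drives the 20 TB scenario.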

Page 51:

DCAPE Project

http://dcape.org