Preserving the scholarly record

Preserving the scholarly record. Michele Kimpton, CEO, DuraSpace; Stu Baker, Northwestern University. Educause Conference, October 19, 2011.


Page 1: Preserving the scholarly record

Preserving the scholarly record

Michele Kimpton
CEO, DuraSpace

Stu Baker
Northwestern University

Educause Conference
October 19, 2011

Page 2: Preserving the scholarly record

Who are we?

Page 3: Preserving the scholarly record

Mission

We are committed to providing open source technologies and services that promote durable, persistent access to the scholarly record.

Page 4: Preserving the scholarly record

Preservation challenges

• Ability to readily provision online storage (ideally in another geographic area, under another administration)
• Synchronize content across storage systems
• Audit integrity of content
• Technical resources
• Internal policies

Page 5: Preserving the scholarly record

Why cloud?

Massively scalable compute and storage offered as a web based service

Page 6: Preserving the scholarly record

Digital archiving by media type

ESG white paper, Feb 2011

Page 7: Preserving the scholarly record

Survey of higher ed, 145 responses

Key Benefits

[Bar chart; horizontal axis: responses, 0 to 90. Categories in the order they appear on the slide:]
• Pay for use
• Elasticity
• Cost
• Lack of local staff
• Flexibility
• Ease of implementation
• Remote off-campus storage
• Scalability

Page 8: Preserving the scholarly record

Survey of higher ed, 145 responses

Key Challenges

[Bar chart; horizontal axis: responses, 0 to 70. Categories in the order they appear on the slide:]
• Transparency
• Data lock-in
• Admin burden SLAs
• Loss of control
• Performance
• Data security
• Long-term reliability
• Trust in third party

Page 9: Preserving the scholarly record

What is DuraCloud?

Digital archiving solution based on cloud infrastructure

Across multiple cloud providers

Page 10: Preserving the scholarly record

DuraCloud apps

Online Backup(s)

Image Transformation

File health check

Advanced Image Viewing

Media Streaming

Synchronization to multiple clouds

…more on the roadmap

File Format Identification

Archiving and Preservation

Multimedia Access

Page 11: Preserving the scholarly record

How does DuraCloud work?

Transfer content via:
- Web User Interface
- Sync Utility
- REST API

YourDuracloud.org
Virtual server
Content

Storage providers:
- Amazon S3 (primary)
- Rackspace
- Azure (beta)

Page 12: Preserving the scholarly record

DuraCloud security Architecture

Page 13: Preserving the scholarly record

DuraCloud Project History

• 10/2009: Initial pilot program begins
• 11/2009: Release 0.1, cloud storage mediation
• 02/2010: Release 0.2, service infrastructure
• 05/2010: Release 0.3
• 06/2010: Release 0.4, media streaming
• 07/2010: Release 0.5, open source release
• 09/2010: Second pilot program starts
• 12/2010: Release 0.7, bulk services
• 06/2011: Release 0.9, beta production
• 08/2011: Release 1.0
• 10/2011: Service launch

Page 14: Preserving the scholarly record

Pilot Partners

University              Use Case                            Repository
Rice U                  Preservation                        DSpace, MetaArchive
Hamilton College        Access/international collaboration  Fedora
Northwestern U          Preservation (books, audio, image)  Fedora
U of PEI                Image access                        Fedora/Islandora
ICPSR                   Preservation                        Fedora
IUPUI                   Preservation                        DSpace, CONTENTdm
Rhodes College          Image access                        DSpace
North Carolina State U  Preservation                        DSpace
CARL                    Preservation and services           Fedora
MIT                     Preservation                        DSpace
Columbia                Preservation and services           IA, Fedora

Page 15: Preserving the scholarly record

Use cases explored

• Digital archiving
• Video streaming
• Data management
• Image serving
• Collaboration

Page 16: Preserving the scholarly record

Archiving and Preservation support

• Requirements met
– Easy backup to primary and secondary cloud
– Keep backups in sync
– Check health of backups
– Ability to view and download files
– Retrieve and restore files
– Web accessible
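The backup, sync, and health-check requirements above can be illustrated with a minimal sketch. It is not DuraCloud's implementation: the two stores are plain in-memory dictionaries standing in for a primary and a secondary cloud, MD5 is an assumed checksum choice, and the names `manifest`, `sync`, and `health_check` are hypothetical.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def manifest(store: dict) -> dict:
    """Map each content name in a store to its checksum."""
    return {name: md5_hex(data) for name, data in store.items()}


def sync(primary: dict, secondary: dict) -> list:
    """Bring the secondary store in line with the primary.

    Missing or changed items are copied over. Items that exist only in
    the secondary are reported but not deleted, a conservative choice
    for a preservation copy (an assumption of this sketch).
    """
    actions = []
    p, s = manifest(primary), manifest(secondary)
    for name in sorted(p):
        if name not in s:
            secondary[name] = primary[name]
            actions.append(("copy", name))
        elif s[name] != p[name]:
            secondary[name] = primary[name]
            actions.append(("update", name))
    for name in sorted(set(s) - set(p)):
        actions.append(("extra", name))
    return actions


def health_check(store: dict, expected: dict) -> list:
    """List names that are missing or whose checksum no longer matches."""
    return sorted(
        name
        for name, digest in expected.items()
        if name not in store or md5_hex(store[name]) != digest
    )
```

Running `sync` first and then `health_check` against a manifest recorded at ingest time mirrors the order of the requirements on the slide: back up, keep in sync, then check health.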

Page 17: Preserving the scholarly record

On the roadmap

• Services/Apps
– Improved video handling
– File validation
– Encryption
• Storage providers
– SDSC Cloud
– Azure
– Local Eucalyptus integration
• Simplification

Page 18: Preserving the scholarly record

Key Findings from Pilots

• Biggest challenge was transferring data from content owner
• Extremely difficult to compare costs
• Integration with local systems
• Biggest value of DuraCloud was simplicity and abstraction of individual cloud stores

Page 19: Preserving the scholarly record

Pilot partner comments

• “The ease-of-use of DuraCloud is its most impressive feature.”~Columbia

• “DuraCloud delivers the benefits of a diverse network of storage locations, but without the overhead of managing different vendors.”~ICPSR

• “One of the best things that DuraCloud does is bridge the complex gap between normal users and cloud-based services, and it does this job very well.”~Rhodes College

Page 20: Preserving the scholarly record

What we learned about the cloud

• Durability: no loss of data over a 2-year period (15+ accounts, 15+ TB), BUT no published SLA or policy
• Eventual consistency
• Data transfer rates over HTTP: 1 GB realistic
• Large file handling is evolving: 5 GB chunk size
• Verification of content via checksum depends on provider
• No standard APIs yet across providers
• Instability of cloud providers (Atmos, Sun)
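The points above on large file handling and checksum verification can be sketched together. This is an illustration under stated assumptions, not a description of any provider's or DuraCloud's behavior: MD5 is an assumed algorithm (the slide does not name one), the 5 GB figure is read as 5 × 2^30 bytes, and `chunk_checksums` and `block_size` are hypothetical names.

```python
import hashlib
import io
from typing import BinaryIO, List, Tuple

# 5 GB chunk size from the slide; the binary interpretation is an assumption.
CHUNK_SIZE = 5 * 1024**3


def chunk_checksums(
    stream: BinaryIO,
    chunk_size: int = CHUNK_SIZE,
    block_size: int = 1024 * 1024,
) -> Tuple[str, List[str]]:
    """Return (whole-file MD5, list of per-chunk MD5s) for a stream.

    The stream is read in small blocks so that a multi-gigabyte chunk
    never has to be held in memory at once.
    """
    whole = hashlib.md5()
    current = hashlib.md5()
    chunks: List[str] = []
    filled = 0  # bytes accumulated in the current chunk
    while True:
        block = stream.read(min(block_size, chunk_size - filled))
        if not block:
            break
        whole.update(block)
        current.update(block)
        filled += len(block)
        if filled == chunk_size:
            chunks.append(current.hexdigest())
            current = hashlib.md5()
            filled = 0
    if filled:
        # Final, partially filled chunk.
        chunks.append(current.hexdigest())
    return whole.hexdigest(), chunks


# Tiny demonstration with a 4-byte chunk size instead of 5 GB.
digest, parts = chunk_checksums(io.BytesIO(b"abcdefghij"), chunk_size=4, block_size=3)
print(len(parts))  # 3 chunks: "abcd", "efgh", "ij"
```

Keeping both the whole-file digest and the per-chunk digests lets a client verify each uploaded piece where the provider supports it, and still confirm the reassembled file when it does not.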

Page 21: Preserving the scholarly record

Reaching for the Cloud

EDUCAUSE 2011

Stu Baker

[email protected]

Page 22: Preserving the scholarly record

On the docket

• Northwestern’s computing environment
• Digital preservation strategies
• DuraCloud pilot
• Cloud challenges

Page 23: Preserving the scholarly record

NORTHWESTERN UNIVERSITY LIBRARY

• 4,930,613 Books
• 208,757 Maps
• 601,153 Images
• 66,977 Audio
• 33,882 Film and video
• 98,844 Journals and serials
• 22 TB Unique digital content
• 84 Web sites
• 750 Workstations
• 59 Servers

Page 24: Preserving the scholarly record
Page 25: Preserving the scholarly record

Moving towards a strategy

• Preservation risk assessment, metadata extraction, format migration planning, object identification and validation
• Digital preservation services and workflow
• Scalable architecture
• Hybrid storage strategy
• Re-allocation of staff expertise and local technology

Page 26: Preserving the scholarly record

DuraCloud pilot

• Dark archive
• Locally digitized books
• Fedora repository
• For a 300-page book:
– 300 camera JPEGs
– 300 edited JPEGs
– 300 JP2s
– 300 PDFs
– 300 OCR text files
– 300 OCR XML files
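The per-book breakdown above implies a simple object count, worked through here as a small sketch. The derivative list comes from the slide; the function name `files_per_book` is hypothetical.

```python
# One file of each kind is produced per page, per the slide.
DERIVATIVES_PER_PAGE = [
    "camera JPEG",
    "edited JPEG",
    "JP2",
    "PDF",
    "OCR text",
    "OCR XML",
]


def files_per_book(pages: int) -> int:
    """Number of files stored for a book with the given page count."""
    return pages * len(DERIVATIVES_PER_PAGE)


print(files_per_book(300))  # 1800
```

A single 300-page book therefore contributes 1,800 files to the dark archive, which is why per-object transfer and verification overhead matters as much as total bytes.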

Page 27: Preserving the scholarly record

Cloud Challenges

• Service provider security assessment
• Addressing PII, PHI, or other sensitive data concerns
• More sophisticated access control provisions
• Determining lifecycle costs for preservation; consortial deals?
• Tiered mass storage strategy; how far do we go?

Page 28: Preserving the scholarly record

Upcoming Webinars

• Using DuraCloud for Archiving and Preservation: November 2nd at 1pm ET
• Technical Overview of DuraCloud: November 16th at 1pm ET
• DSpace and DuraCloud integration: November/December, days/times TBA
• Fedora and DuraCloud integration: November/December, days/times TBA

Page 29: Preserving the scholarly record

Free DuraCloud Trial Accounts

Page 30: Preserving the scholarly record

Where can I find out more?

• Web site: www.duracloud.org
• Email: [email protected]

Page 31: Preserving the scholarly record

The open source platform

• Available for organizations to run locally
• Requires cloud (commercial or private) infrastructure
• Being tested by several organizations now
• Looking for organizations that want to enable their “software as a service” in DuraCloud
• For more details visit the wiki at: https://wiki.duraspace.org/display/duracloud/