When the Rubber Hits the Road: Real-World Digital Preservation
Midwest Archives Conference, March 23, 2018
#MAC2018 #s302
Laura Alagna, Nat Wilson, Sarah Dorpinghaus, Doug Boyd, Michael Shallcross, Dan Noonan, Cinda May


When the Rubber Hits the Road:

Real-World Digital Preservation

Midwest Archives Conference

March 23, 2018

#MAC2018 #s302

Laura Alagna, Nat Wilson, Sarah Dorpinghaus, Doug Boyd, Michael Shallcross, Dan Noonan, Cinda May


Digital Preservation (in the real world)

Laura Alagna, Northwestern University

@Digitized_Laura


Results from the 2nd NDSA Storage Survey

• The NDSA Infrastructure Working Group surveyed NDSA membership in 2011 and 2013 on digital preservation practices

• Trends revealed:
  • Memory institutions are continually increasing the amount of data preserved: NDSA members surveyed nearly doubled content preserved between 2011 and 2013

• Organizations generally underestimate growth in digital content

• Surveys indicated that respondents have a strong record of mitigating against risk of disasters, but mitigating against everyday threats such as bit rot is “an opportunity for improvement”

• “General preservation practices are not always reflective of community best practice standards” (https://doi.org/10.1045/july2017-gallinger)
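The "nearly doubled between 2011 and 2013" finding implies a compound annual growth rate that is easy to underestimate. A minimal sketch of that arithmetic follows; the function names and the 10 TB starting size are illustrative assumptions, not figures from the survey.

```python
def annual_growth_rate(growth_factor: float, years: float) -> float:
    """Compound annual growth rate implied by an overall growth factor."""
    return growth_factor ** (1 / years) - 1


def projected_size(current_tb: float, rate: float, years: int) -> float:
    """Project holdings forward, assuming the same rate continues (an assumption)."""
    return current_tb * (1 + rate) ** years


# Doubling over two years corresponds to roughly 41% growth per year.
rate = annual_growth_rate(2, 2)
print(f"{rate:.1%}")

# A hypothetical 10 TB collection growing at that rate for four more years.
print(f"{projected_size(10, rate, 4):.1f} TB")
```

Planning on linear growth instead would put the same hypothetical collection well below this projection, which is one way growth ends up underestimated.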


“Beyond the Repository”

• Team from Northwestern and UC-San Diego conducted a survey on digital preservation in 2017 as part of an IMLS grant

• We widely distributed the survey and received 170 complete responses

• Trends revealed:
  • Significant variety in how digital preservation is conducted
  • More than 90% reported preserving a terabyte or more
  • No one digital preservation system was used by a majority of respondents
  • Survey responses cite a number of barriers to achieving better digital preservation practices, including limited funding, staff, or expertise, as well as lack of buy-in from administrators and limitations in technology

• “This is so messy…” (https://doi.org/10.21985/N28M2Z/)


If only…


When the Rubber Hits the Road: Real-World Digital Preservation

Nat Wilson, Carleton College

#MAC2018 #s302


Digital Preservation Planning - 5 years later

Ideal vs realistic goals

Started work at Carleton in 2010


Digital Preservation Planning - 5 years later

No program in place for digital archives at Carleton

Framework for preservation planning in 2012

Implementing the plan for a little over 5 years


First Things First - Policy Planning in 2012

Starting from scratch

Strong support from institution

Worked closely with Library Director from the beginning

● Director was very invested in the process and supportive of the Archives

● Extensive experience with strategic planning


First Things First - Policy Planning in 2012

Goals:

● Widely applicable
● Scalable
● Prioritizes records and adjusts practices accordingly
  ○ Efficient allocation of resources


Tiered approach / Triage

● 3 Levels for records in the archive
● Higher tiered items received more care
● Placement determined by numerous factors


Tiered approach / Triage

Factors that increased a record’s ranking.

● High value to institutional record
● Cost of replacement in the event of loss
● Heavily used


Tiered approach / Triage

Factors that lowered a record’s ranking.

● Cost of preservation
● Low to moderate value to institution
● Hardware or software dependencies
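The raising and lowering factors on the triage slides can be read as a simple scoring rule. The sketch below is only an illustration of that idea: the equal weights, the cut-offs, and the field names are hypothetical and are not taken from the Carleton framework.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical description of one record group; every field is an assumption."""
    high_institutional_value: bool
    costly_to_replace: bool
    heavily_used: bool
    costly_to_preserve: bool
    has_hw_sw_dependencies: bool


def triage_tier(record: Record) -> int:
    """Return 1 (most care), 2, or 3 (least care) from an illustrative score."""
    score = 0
    # Factors that raise a record's ranking.
    score += 1 if record.high_institutional_value else -1  # low/moderate value lowers it
    score += 1 if record.costly_to_replace else 0
    score += 1 if record.heavily_used else 0
    # Factors that lower a record's ranking.
    score -= 1 if record.costly_to_preserve else 0
    score -= 1 if record.has_hw_sw_dependencies else 0
    # Hypothetical cut-offs; a real policy would set these deliberately.
    if score >= 2:
        return 1
    if score >= 0:
        return 2
    return 3
```

In practice placement was "determined by numerous factors", so a score like this would at most be a starting point for an archivist's judgment.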


Framework from 2012


Progress as of 2018


Points of failure

● Some things are beyond our control
  ○ Storage method and backup cycle provided by IT

● Some things are beyond our skill
  ○ Checksum validation across entire archive


Compromise

Backup cycle and medium - whatever is provided by our IT department.

Checksum validation on 10% of holdings every 1-2 years.

● Not dependent on value.
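One way to carry out the sampling compromise above is to pick a random 10% of files from a checksum manifest and re-hash only those. This is a minimal sketch, assuming a manifest that maps relative file paths to SHA-256 digests; the manifest format, the function names, and the choice of hash algorithm are assumptions, not details from the slides.

```python
import hashlib
import random
from pathlib import Path
from typing import Dict, List, Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_sample(root: Path, manifest: Dict[str, str],
                    fraction: float = 0.10,
                    seed: Optional[int] = None) -> List[str]:
    """Re-hash a random fraction of the manifest and return the paths that fail.

    Sampling is uniform, so selection does not depend on the value of a record.
    """
    if not manifest:
        return []
    rng = random.Random(seed)
    sample_size = max(1, round(len(manifest) * fraction))
    sampled = rng.sample(sorted(manifest), sample_size)
    failures = []
    for rel_path in sampled:
        target = root / rel_path
        # A missing file counts as a failure, just like a digest mismatch.
        if not target.is_file() or sha256_of(target) != manifest[rel_path]:
            failures.append(rel_path)
    return failures
```

Running this with a different seed on each 1-2 year cycle spreads the checks across the holdings over time, although a uniform sample does not guarantee that every file is eventually visited.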


Progress as of 2018


When the Rubber Hits the Road: Real-World Digital Preservation

Nat Wilson, Carleton College

[email protected]

Conceptual Framework for Digital Preservation: https://goo.gl/Gmrjxv

#MAC2018 #s302


Sarah Dorpinghaus

University of Kentucky

What Goes Where?: Measures to Decrease the Costs of Digital Storage


Storage options

• Campus storage
• LTO tape
• AWS
• Google Drive
• DPN
• Internet Archive


Partnerships & discounts

• Campus IT
• AWS
• Google Drive
• Internet Archive


Preservation

A system based on decisions of what, where, and why?


What goes where?

• Digitized for access or preservation?
• How many resources went into creating the digital copy?
• What quality should be kept?
• How will the materials be used?
• What are the risks?


What goes where?

• Campus storage
• LTO tape
• AWS
• Google Drive
• DPN
• Internet Archive


What goes where?

• AIP + DIP
• Complete backups
• Partial backups
• Complete backups
• Partial backups
• Newspaper + media


Document

• Document policies, workflows, locations
• Develop growth estimates
• Build storage costs into annual budget


[email protected]

@SMDorpinghaus


DOUG BOYD

Louie B. Nunn Center for Oral History

University of Kentucky Libraries

[email protected]

Twitter: douglasaboyd

Preserving Digital Oral Histories


Louie B. Nunn Center for Oral History

In 2008
• 6000 interviews
• Majority still analog

PRESERVING DIGITAL ORAL HISTORY

In 2018
• 11,000+ interviews
• 100% digitized

In 2017
• 36 concurrent interviewing projects
• 917 new interviews accessioned in 2017
  • 733 Audio
  • 176 Video
• 116 interviews conducted in the Nunn Center studio

Audio AND Video

Challenges

• Storage
  • Rapid Growth
  • 100% digitized
  • Video

Audio AND Video

• Master: AVCHD, 30 GB / hour
• Preservation Master: Apple ProRes HQ, 100 GB / hour
• SIP size for 1 interview could be 250 GB
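The per-hour figures above translate directly into storage estimates. The worked sketch below uses the slide's rates (30 GB/hour for the AVCHD master, 100 GB/hour for the Apple ProRes HQ preservation master); the function name and the 2-hour interview length are assumptions for illustration.

```python
AVCHD_GB_PER_HOUR = 30         # master, rate taken from the slide
PRORES_HQ_GB_PER_HOUR = 100    # preservation master, rate taken from the slide


def video_interview_gb(hours: float) -> float:
    """Estimate storage for one video interview kept as master + preservation master."""
    return hours * (AVCHD_GB_PER_HOUR + PRORES_HQ_GB_PER_HOUR)


# A hypothetical 2-hour interview: 2 * (30 + 100) = 260 GB, which is in line
# with the "could be 250 GB" SIP size quoted on the slide.
print(video_interview_gb(2))  # 260
```

The estimate ignores access derivatives and metadata, so an actual SIP could be somewhat larger.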

Locations

• Storage
  • Servers
  • Preservation Repository
  • Campus Tape System (HSM)


LTO Tape Storage
• Snapshot of entire collection (2 sets as of 12/2017)
• Each new accession
• 2 Simultaneous Versions Created
  • 1 stored in archival storage
  • 1 stored offsite

LTO Tape Storage
• Metadata: Tracking what & where
• Refresh schedule
• LTO-8 (obsolescence cycle)
• $150 / 6 TB x 2
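The "$150 / 6 TB x 2" figure can be turned into a rough media budget. The sketch below reads it as $150 per 6 TB tape, written in two copies (one archival, one offsite); that reading, the function name, and the 20 TB example are assumptions, and drives, software, and staff time are left out.

```python
import math

TAPE_PRICE_USD = 150    # per tape, from the slide
TAPE_CAPACITY_TB = 6    # per tape, from the slide
COPIES = 2              # assumed: one archival copy and one offsite copy


def tape_media_cost(collection_tb: float) -> int:
    """Media cost in USD to write `collection_tb` to tape in two copies."""
    # Tapes are bought whole, so round up for each copy.
    tapes_per_copy = math.ceil(collection_tb / TAPE_CAPACITY_TB)
    return tapes_per_copy * COPIES * TAPE_PRICE_USD


# A hypothetical 20 TB snapshot needs 4 tapes per copy, 8 tapes in total.
print(tape_media_cost(20))  # 1200
```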

LTO Tape Storage

Software
• PreRoll Post
• LTFS
• Thunderbolt
• inventory + exported CSV
• fixity
• tapes not software dependent



From Here to There: Integrating Open Source Tools in Digital Preservation Workflows

Mike Shallcross
http://bentley.umich.edu/

[email protected]

Midwest Archives Conference Annual Meeting

March 23, 2018


Overview

• Key assumptions for BHL digital preservation workflow development.

• Iterations of workflow development, 1997-
  – Staffing
  – Processes and workflows
  – Infrastructure

• Some lessons learned


Key Assumptions in BHL Workflow Development

• The most pressing/immediate issues should be addressed first.

• Workflows should be informed by standards (esp. OAIS) if/when possible.

• Communities of practice provide value and support.

• Solutions should allow for alternative options in the future.

• We are going to make mistakes, but will hopefully improve!


1997-2009: First Steps

• Staffing:
  – .5 FTE, limited external IT support
  – Limited expertise: GUI applications

• Processes and workflows:
  – Boutique projects
  – Highly specialized/manual processes

• Infrastructure:
  – Workspace: personal computer
  – Storage: optical media, website

Integrations, Pt. I (2010-2011)

• Staffing:

– 2 FTE filled by grant funding (Mellon Foundation)

– Developing skills; explored 60+ tools and reviewed peer institutions’ workflows

• Processes and workflows:

– Standardize and scale digital preservation strategies (format characterization, defined migration pathways, PII detection, etc.)

– Developed a more robust—but still manual—workflow

• Infrastructure

– Workspace: Personal computers

– Storage: UM network drive (short term) & DSpace (long term)


Integrations, Pt. II (2011-2013)

• Staffing:
  – New Digital Curation Division (formed April 2011); 2 dedicated FTE.
  – Limited tech skills: GUI/CLI applications & basic shell scripting.

• Processes and workflows:
  – Automated key steps with Windows CMD.EXE shell scripts.
  – Microservice design (influenced by Archivematica).
    • Easy to adapt scripts or introduce custom workarounds.

• Infrastructure:
  – Processing and backlog storage: NAS box.
  – Homegrown systems (EAD creation, FileMaker database for accessions/locations).

Integrate This!


Integrations, Pt. III (2014-2016)

• Staffing
  – 3 FTE for Mellon grant (and additional support for ASpace implementation).
  – Development of technical skills (Python/Ruby, GitHub, XML parsing, etc.).
  – Increased collaboration and improved communication with IT staff.

• Processes and workflows:
  – Facilitate creation/reuse of metadata.
  – Streamline the ingest and deposit of content into a repository.
  – Introduce new appraisal and review functionality in Archivematica.

• Infrastructure
  – Storage (short/long term): UM network drives (with tape backup)
  – Web-based interfaces: ArchivesSpace (collections management, authoritative metadata), Archivematica (ingest and packaging of content), DSpace (preservation/access).
  – Exploit native APIs and SWORD protocol to transfer data/metadata among systems.

Hey! You got appraisal functionality in my Archivematica!


Integrations, Pt. IV (2017-)

• Staffing
  – Archivist for Digital Curation, Archivist for Metadata and Digital Projects, and two project archivists.
  – Increasing expertise with Python (https://github.com/bentley-historical-library); biweekly, .5 hr technical skills workshops for Curation staff.

• Processes and Workflows:
  – Developing alternative workflows for legacy removable media, A/V materials, and donor-digitized materials as well as accessions with donor-supplied metadata.
  – Using ASpace as system of record for Archive-It metadata.

• Infrastructure
  – New ASpace API endpoints for use in digitization and location management.
  – Select backups in DPN.

Lessons Learned

• Staffing
  – Find opportunities for staff to grow and develop.
  – “No one cooks the bacon alone” (Erin O’Meara)

• Processes and workflows
  – Iterative approach
  – Scalability and flexibility
  – Avoid unnecessary complexity: loosely-coupled, fine-grained services.

• Infrastructure
  – What do you and users need? What can you sustain?
  – Foster relationships with IT support (internal/external and formal/informal)
  – Open Source Software: interoperability, customizability, community support, flexibility
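The "loosely-coupled, fine-grained services" lesson can be illustrated with a tiny pipeline in which each step is an independent function that can be swapped out or reordered. This is a generic sketch and not the Bentley's code: the step names, the allow-list, and the dictionary passed between steps are all hypothetical.

```python
from typing import Callable, Dict, List

Transfer = Dict[str, object]            # hypothetical record passed between steps
Step = Callable[[Transfer], Transfer]


def characterize_formats(transfer: Transfer) -> Transfer:
    """Placeholder step: use file extensions as a stand-in for format identification."""
    files = transfer["files"]
    transfer["formats"] = sorted({name.rsplit(".", 1)[-1].lower() for name in files})
    return transfer


def flag_for_review(transfer: Transfer) -> Transfer:
    """Placeholder step: flag transfers with formats outside an illustrative allow-list."""
    allowed = {"pdf", "tif", "wav"}
    transfer["needs_review"] = any(fmt not in allowed for fmt in transfer["formats"])
    return transfer


def run_pipeline(transfer: Transfer, steps: List[Step]) -> Transfer:
    """Run each step in order; steps communicate only through the transfer record."""
    for step in steps:
        transfer = step(transfer)
    return transfer


result = run_pipeline({"files": ["report.PDF", "letter.wpd"]},
                      [characterize_formats, flag_for_review])
print(result["formats"], result["needs_review"])
```

Because each step only reads and writes the shared record, a custom workaround can be added as one more function in the list without touching the others.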


InDiPres: Distributed Digital Preservation for the Hoosier State

“When the Rubber Hits the Road”
MAC s302 / Chicago / March 23, 2018

Cinda May, Indiana State University Library


The Problem

• A plethora of digitization projects supported by LSTA grants, 2006-2017
  • Indiana Memory: 484,736 items
  • Hoosier State Chronicles: 500,000+ newspaper pages

• The need for an affordable, cost-effective digital preservation solution for small to mid-sized under-resourced cultural memory organizations

Henry J. Schroder, 1914. Courtesy of Helen (Fox) Julian, Vincennes, IN

CMay / s302


The Solution, Part 1: The MetaArchive Cooperative

What is the MetaArchive Cooperative?

• A digital preservation network created and hosted by and for memory organizations

• Established in 2004 by 6 academic libraries in cooperation with the Library of Congress NDIIPP Program

• Currently incorporates 15 secure, closed-access preservation nodes and preserves more than 200 TB of content for 60+ members

Collaborative Membership Level

• Allows new or existing consortia to join as one entity operating a single LOCKSS server

• Cost based on 20 participating institutions

• Annual fee $2,500 + $100 per institution + storage ($0.59/GB; $585/TB)

https://metaarchive.org


The Solution, Part 2: Hoosiers for Distributed Digital Preservation

• Form a membership based group of cultural memory organizations

• The group joins the MetaArchive Cooperative as a Collaborative Member

• Indiana State Library serves as the lead institution

• Indiana State Library Foundation serves as the fiscal agent

• Indiana State University Library serves as the site of LOCKSS box (i.e. network cache)

Courtesy of the Sullivan County Public Library


Creating Indiana Digital Preservation (InDiPres)

http://www.indipres.org

• Initial funding through IMLS/LSTA grants

• Working group to develop membership agreement & outline governance elements

• 8 Open Forums on digital preservation

• Foundational governance meeting

• MetaArchive membership & server purchase

• Inaugural membership meeting

• Ingest Pathways Working Group

• InDiPres Guidance Document & Technical Appendix

• Digital Preservation Policy Creation Workshop

• Ingest of members’ content into the MetaArchive Digital Preservation Network


InDiPres Membership Fee Schedule
Based on a Minimum of 20 Participants

Individual Participation Fee: $100.00/year

Share of Server Cost: $100.00/year (3 year replacement cycle, $6,000/3yr/20 = $100)

Share of MetaArchive Collaborative Membership Fee: $125.00/year ($2,500/yr/20 participants = $125)

Total: $325.00/year

+ Individual storage fee = $0.59/GB/year
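The fee schedule above is plain arithmetic, so a member's annual bill is easy to estimate. The sketch below uses the dollar figures from the slide; the function name and the 500 GB example are assumptions for illustration.

```python
PARTICIPATION_FEE = 100.00     # per member per year
SERVER_COST = 6000.00          # server replaced on a 3-year cycle
SERVER_CYCLE_YEARS = 3
METAARCHIVE_FEE = 2500.00      # collaborative membership fee per year
STORAGE_PER_GB = 0.59          # per GB per year


def annual_member_cost(storage_gb: float, participants: int = 20) -> float:
    """Annual cost in USD for one member storing `storage_gb` gigabytes."""
    server_share = SERVER_COST / SERVER_CYCLE_YEARS / participants
    membership_share = METAARCHIVE_FEE / participants
    base = PARTICIPATION_FEE + server_share + membership_share
    return round(base + storage_gb * STORAGE_PER_GB, 2)


print(annual_member_cost(0))    # 325.0, the base fee with 20 participants
print(annual_member_cost(500))  # 620.0 for a hypothetical 500 GB of content
```

With fewer than 20 participants the shared server and membership costs per member rise, which is why the schedule is stated against a minimum of 20.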


Current InDiPres Membership

• American Legion Auxiliary
• Bartholomew County Public Library
• Butler University Libraries
• DePauw University Libraries
• Indiana State Library (Host)
• Indiana State University Library (Host)
• Indianapolis Public Library
• Knox County Public Library
• Lebanon Public Library
• Private Academic Library Network of Indiana
• Parke County Public Library
• Logan Library, Rose Hulman Institute of Technology
• Sisters of Providence of Saint Mary-of-the-Woods Archives
• Sullivan County Public Library
• Vigo County Historical Society
• Vigo County Public Library


InDiPres Members’ Content Ingest Pathways & Workflow


Challenges

• Raising awareness of the difference between preservation for access and digital preservation

• Forming a self-sustaining, collaborative organization with a governance structure

• Building the membership

• Involving members in governance committees and activities

• Providing timely ingests

• Providing skill development training

• Creating local digital preservation policies and plans

Courtesy of the Vigo County Historical Society


Thank You for Your Interest!

Iva Allison, ca1915, courtesy of Willoughby Steckley, Vincennes, IN

Cinda May
Chair, Special Collections
Indiana State University
[email protected]


Thank you!

Please fill out the survey feedback form:

Laura Alagna
Nat Wilson
Sarah Dorpinghaus
Doug Boyd
Michael Shallcross
Dan Noonan
Cinda May

Good morning! We have been in a prolonged process at THE OSU to create a preservation and access environment for our University Libraries' born-digital and converted content.

"Balancing the ideal vs. the real" from "When the rubber hits the road in real-world digital preservation"

2018.03.23

Daniel W. Noonan @ the Midwest Archives Conference Annual Meeting 2018 1

For years we had been reliant upon our DSpace instance, known as the Knowledge Bank or KB; a home-grown OSU platform (albeit one owned by the College of Arts and Sciences), the Media Manager; and what we lovingly refer to as our “Dark Archive”.

However, there were (are) several issues with each of these solutions:

• The KB was designed more as an institutional repository for the university's scholarly output, for which we guarantee only bit-level preservation. Further, it houses derivatives, and the archival and curatorial staff never warmed to the way it stored and rendered their content.

• The archival and curatorial staff liked the Media Manager and the ease with which they could upload and manage content. But the College of Arts & Sciences decided to no longer support the system and, just about three years ago, turned it off. Additionally, the Media Manager was constructed at a time when web browsers were not rendering TIF images; therefore all the content was once again derivatives, and there was ZERO preservation activity going on.

• And then there is the infamous “Dark Archive”, originally located on a server named DSpace4, which led to the myth that some preservation activity was happening, when in reality it was just a Secure FTP server where a staging instance of DSpace had once been located.

• And this image of the “Dark Archive” is a very apt analogy, as it was created by the dumping of a failing special collections projects share-drive into this environment. It being an sFTP server eventually led to more controlled access and its use for preservation masters, but by then it was filled with upwards of 2M items, of which a significant portion could at best be described as “provisional masters” with no real preservation activity happening.

Now nearly half of these items are “masters” to things that are in the KB, and we have yet to decide how to deal with them.

In the meantime, we needed to decide on a preservation and access environment, especially to consume the content that was going to go away once the Media Manager was turned off.

This all occurred at about the same time we developed our Digital Preservation Policy Framework, which led to a Master Objects Repository Task Force, which in turn led to our decision to build our preservation and access platform using Fedora and Hydra (specifically Sufia), a platform that we will be migrating this year to a Fedora-Hyrax environment.

By the summer of 2015, we had the first iteration of the system, the Image Management System, which had been populated solely with content from the now defunct Media Manager. While I could spend the whole session and then some discussing the challenges that arose with this first attempt, the lingering pain points come from decisions made to narrowly address the needs and desires of the two collecting units whose content was being migrated.

Over the past couple of years I have reported, at various conferences and in writing, on some of the processes we have gone through starting in 2013, including:

• The de-duping of content in the “Dark Archive”

• A quantification of the “Dark Archive” content by format

• The development of an Access database to analyze the content, engaging the archivists and curators in determining:

  – What to keep and what to toss

  – Where the metadata is, or if it exists

  – Intellectual property rights, and

  – What type of access we are allowed to provide

• Finally, we have been through several iterations of a migration prioritization that has changed based on issues that arose from the Digital Collections, or DC, system itself.
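As a rough illustration of the first two processes listed above (de-duping and quantifying content by format), here is a minimal Python sketch. It is not the tooling actually used at OSU. It assumes that "duplicate" means byte-identical (same SHA-256 checksum) and that format can be approximated by file extension; the function names `sha256_of` and `survey` are hypothetical.

```python
import hashlib
from collections import Counter, defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def survey(root):
    """Group files under root by checksum and count them by extension.

    Assumption: byte-identical files are duplicates, and the lowercased
    file extension stands in for the format.
    """
    by_checksum = defaultdict(list)
    by_format = Counter()
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_checksum[sha256_of(path)].append(path)
            by_format[path.suffix.lower() or "(none)"] += 1
    # Keep only checksums shared by more than one file.
    duplicates = {c: p for c, p in by_checksum.items() if len(p) > 1}
    return duplicates, by_format
```

Calling `survey("/path/to/share")` returns the duplicate groups and a per-extension tally. Deciding which copy to keep, and identifying formats more reliably than by extension, would still need curatorial review or a dedicated format-identification tool.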

We developed a robust metadata application profile that serves as an example of that ideal vs real world balancing act.

In the ideal world we would include all 38 of these metadata elements, plus, as the last entry says, “Other”.

Note I do not include the pid & cid (parent and child ID) fields in this count, as they are required only for complex objects.

(Wait a minute, with “Other” that is 39 and pid and cid make it 41; if we only had 1 more it would truly be ideal and answer all the questions in the universe – any Doug Adams fans out there?)

In reality, all we require is 8 metadata fields that the archivists and curators need to identify (along with pid and cid for a complex object).

One way to look at this is that it is both ideal and real-world for the archivists and curators, who have to worry about how to find time to get 1M objects moved into the DC preservation and access environment.

However, this may not be so ideal for patrons and users trying to access the information without the richer metadata context.

The balancing acts that we continue to perform on the “digital preservation and access” high-wire include:

• Lack of staffing devoted to archival and special collections metadata, although we are in the process of hiring a metadata strategist for the Libraries

• Competing priorities for archivists’, curators’ and system developers’ time

• Wading through the detritus in the “Dark Archive” so that we are truly migrating only preservation or provisional masters. Even with all the de-duping and prioritization exercises, we still end up IDing stuff that should just be thrown away when we sit down and “dig in” to a collection.

• Ongoing performance issues with the way Fedora works and how our layers of software interact with it.

• We see this specifically with complex objects, where the number of items within an object seems to max out around 10 before we cannot access the item even to QC it. This is further exacerbated by using a data model that currently does not allow us to construct ordered complex objects. We are hoping to ameliorate these issues with the upcoming migration to Hyrax and other system modifications.

• A library-wide server infrastructure migration over the holidays had us locked out of uploading content to the system for more than 2 months, while our systems folks dug through three different issues that were blocking permissions. In an ideal world, we would not have waited that long, but if we loop back to the first and second bullets we were dealing with competing priorities for the time of a limited staff pool.

• As it is a somewhat homegrown system, our user documentation is minimal, and as my metadata assistant and I wade through all of this, we are discovering a variety of issues that have never been documented, but are using it as an opportunity to develop more robust user documentation.

• Finally, as we have been so focused on the “Dark Archive” migration, we have yet to develop born-digital accessioning workflows to get that type of content into the DC.

I’d like to finish with the following quote that I think sums up the idea of “Balancing the ideal vs. the real when the rubber hits the road in real-world digital preservation”:

A colleague of ours, Sofia Becerra (from the Berklee College of Music), posted a link on Facebook yesterday from Sara Allain’s blog, “Letters to a Young Librarian.” The post, “10 Things I Didn't Learn in Archives School,” included this at #9:

“Email, social media, digital preservation - we're still figuring it out.

I regularly feel lost when it comes to these topics, but I’ve realized over time that it's okay to feel lost because we're all lost, as a profession.

It's easy to focus on the small majority of people and institutions that are making headway - they're the folks who present at conferences and write papers and tweet about their amazing work.

They’re wonderful! They're truly doing some exceptional work. But it's also okay to be the person who is doing the little things.

You want to be ahead of the game on digital preservation? Make sure that your content isn't stored on a hard-drive and you'll be doing more than many.

As we continue to push the boundaries of what archiving comprises in the 21st Century, it's okay to take an inch rather than a mile. Positive incremental change can be as powerful as the big leaps.”

(https://letterstoayounglibrarian.blogspot.com/2018/03/10-things-i-didnt-learn-in-archives.html)

Thank you!