Archiving and Preservation
![Page 1: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/1.jpg)
Archiving and Preservation
Michele Kimpton, CEO, DuraSpace
Bryan Beecher, Director, ICPSR
DuraSpace Webinar, November 2, 2011
![Page 2: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/2.jpg)
DuraSpace Mission
We are committed to providing open source technologies and services that promote durable, persistent access to the scholarly record.
![Page 3: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/3.jpg)
Preservation challenges
• Ability to readily provision online storage (ideally in another geographic area, another administration)
• Synchronize content across storage systems
• Audit integrity of content
• Technical resources required
• Internal policies
• Sustainability over time
![Page 4: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/4.jpg)
Why cloud?
Massively scalable compute and storage offered as a web-based service
![Page 5: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/5.jpg)
Higher Ed survey, 211 responses
![Page 6: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/6.jpg)
Digital archiving by media type
ESG white paper, Feb 2011
![Page 7: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/7.jpg)
What is DuraCloud?
Platform and service based on cloud infrastructure
Across multiple cloud providers
![Page 8: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/8.jpg)
DuraCloud apps
Archiving and Preservation focused:
• Online backup(s)
• File health check
• Synchronization of content to multiple clouds
• File format identification
• …more on the roadmap
![Page 9: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/9.jpg)
Archiving and Preservation support
• DuraCloud provides
– Easy backup to multiple cloud providers
– Keep backups in sync
– Check health of backups
– Ability to view and download files
– Retrieve and restore files
– Web accessible
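Keeping backups in sync comes down to comparing what the source holds with what each copy holds, then copying or removing the differences. The sketch below shows that comparison in Python. The names `SyncPlan` and `plan_sync`, and the idea of describing each side as a mapping from file name to checksum, are assumptions made for illustration; this is not DuraCloud's sync tool or its API.

```python
from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    upload: list[str] = field(default_factory=list)  # new or changed at the source
    delete: list[str] = field(default_factory=list)  # no longer present at the source


def plan_sync(source: dict[str, str], replica: dict[str, str]) -> SyncPlan:
    """Decide what to copy or remove so the replica matches the source.

    Both arguments map a file name to a checksum string (hypothetical layout).
    """
    plan = SyncPlan()
    for name, checksum in sorted(source.items()):
        # Missing on the replica, or present with different content.
        if replica.get(name) != checksum:
            plan.upload.append(name)
    plan.delete = sorted(set(replica) - set(source))
    return plan
```

Running the same comparison against each provider's listing would give one plan per copy.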
![Page 10: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/10.jpg)
Using DuraCloud for Archiving & Preservation
Bryan Beecher
Director, Computer & Network Services
ICPSR
![Page 11: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/11.jpg)
About ICPSR
• Inter-university Consortium for Political and Social Research
• Located at the University of Michigan
• World’s largest archive of social science research data
• In operation for 50 years
• About $15m in revenues
![Page 12: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/12.jpg)
Archival holdings
• Lots of little files
– text/plain
– application/pdf
– text/xml
– other stuff
• 2m files; 6TB of storage
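A breakdown like the one on this slide can be approximated by guessing a media type from each file name. The Python sketch below uses the standard library's `mimetypes` module; the function name `tally_by_media_type` and the sample file names are hypothetical, and guessing from extensions is an assumption, not a description of how ICPSR inventories its holdings. The last line works out the average file size implied by the slide's own figures.

```python
import mimetypes
from collections import Counter


def tally_by_media_type(filenames):
    """Count file names by guessed MIME type; unknown types become 'other'."""
    counts = Counter()
    for name in filenames:
        guessed, _encoding = mimetypes.guess_type(name)
        counts[guessed or "other"] += 1
    return counts


# Hypothetical file names, for illustration only.
sample = [
    "study1/data.txt",
    "study1/codebook.pdf",
    "study2/codebook.pdf",
    "study2/notes.unknownext",
]
print(tally_by_media_type(sample).most_common())

# Average size implied by the slide: 6 TB over 2 million files (decimal units).
print(6e12 / 2e6 / 1e6, "MB per file")
```

At about 3 MB per file on average, the "lots of little files" description fits the totals.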
![Page 13: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/13.jpg)
Strategy
• Bit-level for original (SPSS + Word)
• Normalize into more durable formats (plain text data + XML metadata + PDF/A documentation)
• Transform for better delivery
• Retain transform and derivatives
• Lots of copies
![Page 14: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/14.jpg)
Data archiving, 1 BC
![Page 15: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/15.jpg)
Geographic Diversity, 1 BC
![Page 16: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/16.jpg)
Geographic Diversity, 1 BC
![Page 17: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/17.jpg)
Geographic Diversity, 1 BC
![Page 18: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/18.jpg)
Maybe disk instead of tape?
• Synchronize content to other locations
• Fixity checking lets us know when we need to “fix” something
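Fixity checking generally means recomputing a checksum for each file and comparing it with a value recorded earlier. A minimal Python sketch follows. The function names, the choice of SHA-256, and the manifest layout (a mapping from relative path to hex digest) are assumptions for illustration, not ICPSR's or DuraCloud's actual implementation.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (as a POSIX-style relative path) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under root with a previously recorded manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "unexpected": sorted(set(current) - set(manifest)),
        "changed": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

Anything reported as missing or changed is a candidate for repair from one of the other copies.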
![Page 19: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/19.jpg)
Get by with a little help from our friends
![Page 20: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/20.jpg)
And they are friends
• Based on relationships
• No SLA
• No scale up/down
• Idiosyncratic interface
• Contracts? We don’t need no stinkin’ contracts!
![Page 21: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/21.jpg)
A copy in the cloud
![Page 22: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/22.jpg)
Are you crazy?
• FISMA Low
• Not encrypted
• Machine room open access
• Firewalled
• Professional IT staff + others

• FISMA Medium
• Encrypted
• Machine room controlled access
• Firewalled
• Professional IT staff
![Page 23: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/23.jpg)
Honeymoon period
• Automated monthly billing for usage (storage, computer, network I/O)
– Small EC2 instance + 6 x 1TB EBS volumes bound together as a RAID
• Easy to scale up and down
• Easy to synchronize
![Page 24: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/24.jpg)
And best of all…
![Page 25: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/25.jpg)
So what’s not to like?
• Cloud diversity
– Location
– Technology platform
– Operational processes
– Business viability
• Vendor lock-in
![Page 26: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/26.jpg)
Who can save us?
![Page 27: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/27.jpg)
What we like
• Single interface to “the cloud”
• Single billing contact
– Single relationship
• Value-added services
– Fixity checking
![Page 28: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/28.jpg)
What we would change
• Filesystem semantics would work better for us
– rsync v. synctool
– files v. objects
• Support for big files/objects
• Tools suitable for automated batch use (i.e., out of cron)
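The "files v. objects" point can be made concrete: a file lives in a directory tree and carries attributes such as modification time and permissions, while an object is typically a flat key plus whatever metadata is attached to it explicitly. The Python sketch below shows one way to describe a file as an object. The function name `to_object` and the dictionary layout are hypothetical and describe the general model only, not DuraCloud's API.

```python
from pathlib import Path


def to_object(root: Path, path: Path) -> dict:
    """Describe a file as a flat-keyed object with explicit metadata."""
    info = path.stat()
    return {
        # The directory structure survives only as part of the key string.
        "key": path.relative_to(root).as_posix(),
        "size": info.st_size,
        "metadata": {
            # Filesystem attributes that would have to travel as
            # user-supplied metadata in this model (an assumption).
            "mtime": str(int(info.st_mtime)),
            "mode": oct(info.st_mode & 0o777),
        },
    }
```

A tool with filesystem semantics handles these attributes implicitly, which is part of why the two kinds of tool feel different in batch use.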
![Page 29: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/29.jpg)
Takeaways
• Cloud is a viable option for additional archival copies
• Physical infrastructure may be at least as good as your own
• Encrypt the sensitive stuff
• Not the low-cost solution, but may be the low-hassle solution
![Page 30: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/30.jpg)
More info
• Bryan Beecher
– [email protected]
– http://techaticpsr.blogspot.com/
Thank you for attending this talk
![Page 31: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/31.jpg)
Upcoming DuraCloud Webinars
Technical Overview of DuraCloud: November 16 at 1pm ET
DSpace and DuraCloud: November 30 at 1pm ET
Fedora and DuraCloud: January 11 at 1pm ET
![Page 32: Archiving and Preservation](https://reader036.fdocuments.in/reader036/viewer/2022062410/568164e5550346895dd74b44/html5/thumbnails/32.jpg)
Try DuraCloud Free for One Month: Trial or Subscription