Your Digital Preservation Cookbook
Sara Allain, Dan Gillean, and Sarah Romkey, Artefactual Systems
Archives Association of B.C. Annual Conference,
April 15, 2016
https://www.pinterest.com/pin/455145106065308238/
Today’s offerings:
1. Ingredient preparation (digital preservation actions)
2. Cooking (preservation storage)
3. Serving (providing access to digital content)
4. Kitchen management (policies and procedures)
http://www.sampletemplates.com/menu-templates/blank-menu-template.html
Ingredient Preparation
Digital preservation actions
https://www.atlaswearables.com/blog/2015/05/we-love-vegetables/
Preparation: Digital preservation actions
By taking on digital preservation prep, you help ensure that your files will be better understood in the future.
Like properly prepped ingredients, prepared digital content is better cooked (preserved).
Unlike prepping the ingredients in your favourite recipe, these activities don’t transform your files into something new; instead, they help establish and document their authenticity.
http://www.blogher.com/women-and-food-will-win-war-wwi
Preparation: Fixity
Are those ingredients what they say they are on the box?
Fixity information, usually a checksum, is a short value calculated from a file’s bits; recalculating it later shows whether even a single bit has changed.
Capture fixity as early as possible in the accessioning process - don’t move the files several times before creating a checksum.
Checksums pair nicely with other functions, e.g. packaging (BagIt).
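A minimal Python sketch of creating fixity information, assuming SHA-256 and a plain-text manifest loosely modeled on BagIt’s `manifest-<algorithm>.txt` files (one `checksum  path` line per file). A real bag also needs a `bagit.txt` declaration and a `data/` payload directory; the function names here are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root: Path, manifest_name: str = "manifest-sha256.txt") -> Path:
    """Checksum every file under root and write one '<checksum>  <relative path>' line per file."""
    manifest = root / manifest_name
    files = sorted(p for p in root.rglob("*") if p.is_file() and p != manifest)
    lines = [f"{sha256_of(p)}  {p.relative_to(root).as_posix()}" for p in files]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

Running this right after a transfer arrives, before files are moved around, matches the advice above to capture fixity as early as possible.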
http://www.buzzfeed.com/leonoraepstein/16-fascinating-facts-about-jell-o#.uqrYYQqw7
Preparation: Virus scan
Keep pests out of the kitchen!
Scan for viruses so you don’t ingest them into your preservation environment!
Quarantine functionality in a preservation system holds new files for a set period, giving virus definitions time to catch up with newly discovered threats before the files are scanned.
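A minimal sketch of a quarantine-then-scan step, assuming ClamAV’s `clamscan` command is installed and on the PATH. The quarantine length is a hypothetical local policy value; the exit-code mapping follows clamscan’s documented codes (0 = no virus found, 1 = virus found, 2 = error).

```python
import subprocess
from datetime import datetime, timedelta


def quarantine_over(entered: datetime, now: datetime, days: int = 28) -> bool:
    """True once a transfer has sat in quarantine for at least `days` days (28 is an assumed policy)."""
    return now - entered >= timedelta(days=days)


def interpret_clamscan(returncode: int) -> str:
    """Map clamscan exit codes to an outcome: 0 = clean, 1 = infected, anything else = error."""
    return {0: "clean", 1: "infected"}.get(returncode, "error")


def scan(path: str) -> str:
    """Recursively scan a file or directory with clamscan and return the outcome."""
    result = subprocess.run(
        ["clamscan", "--recursive", "--no-summary", path],
        capture_output=True,
        text=True,
    )
    return interpret_clamscan(result.returncode)
```

In a workflow, `scan` would only be called once `quarantine_over` returns True for the transfer.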
City of Vancouver Archives, Deer in Malahat Lookout Kitchen, William Bros. Photographers Collection AM1545-S3-: CVA 586-497
Preparation: File identification
Know your ingredients!
Know what you’re cooking with: identify file formats, ideally using format signatures (characteristic byte patterns inside the file) rather than file extensions, for greater precision.
Identify not just the format, but also its version.
Accurate format identification increases the likelihood of extracting more complete and accurate technical metadata.
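To illustrate signature-based identification, here is a toy Python sketch that checks a handful of well-known byte signatures at the start of a file and, for PDF and GIF, reports the version carried in the header. It is a simplification under stated assumptions: real tools such as DROID, Siegfried, and FIDO match against the far larger PRONOM signature set, including signatures at other offsets.

```python
from pathlib import Path
from typing import Optional, Tuple

# (signature bytes at offset 0, format name, version or None)
SIGNATURES = [
    (b"%PDF-", "PDF", None),             # version is read from the header below
    (b"\x89PNG\r\n\x1a\n", "PNG", None),
    (b"\xff\xd8\xff", "JPEG", None),
    (b"II*\x00", "TIFF", None),          # little-endian byte order
    (b"MM\x00*", "TIFF", None),          # big-endian byte order
    (b"GIF87a", "GIF", "87a"),
    (b"GIF89a", "GIF", "89a"),
]


def identify(header: bytes) -> Tuple[str, Optional[str]]:
    """Return (format, version) for the first matching signature, or ('unknown', None)."""
    for magic, name, version in SIGNATURES:
        if header.startswith(magic):
            if name == "PDF":
                # A PDF header looks like b"%PDF-1.4"; bytes 5-7 hold the version.
                version = header[5:8].decode("ascii", errors="replace")
            return name, version
    return "unknown", None


def identify_file(path: Path) -> Tuple[str, Optional[str]]:
    """Read the first 16 bytes of a file and identify it, ignoring the file extension."""
    with path.open("rb") as f:
        return identify(f.read(16))
```

Because the extension is never consulted, a PDF renamed to `.txt` is still identified as PDF.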
City of Vancouver Archives, [Woman mixing ingredients at] Dale's [Roast Chicken] kitchen on Granville Street
William Bros. Photographers Collection AM1545-S3-: CVA 586-4012
Preparation: Validation, characterization, metadata extraction
Are those noodles real?
Validation: is it a well-formed example of that particular file format?
Characterization: what are the particulars of this specific file? (e.g. size, codec, bitrate)
Extracting this technical metadata from the files and storing it in a standardized way helps ensure their longevity.
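A lightweight Python stand-in for validation and characterization of WAV audio, using only the standard-library `wave` module. It treats "parses without error" as a rough proxy for well-formedness; it is an assumption-laden sketch, not a full validator like JHOVE or a characterization tool like MediaInfo (the stdlib reader, for instance, rejects some non-PCM WAV variants).

```python
import wave
from typing import BinaryIO, Union


def characterize_wav(source: Union[str, BinaryIO]) -> dict:
    """Check that `source` parses as a WAV file and extract basic technical metadata."""
    try:
        with wave.open(source, "rb") as w:
            frames = w.getnframes()
            rate = w.getframerate()
            return {
                "valid": True,
                "channels": w.getnchannels(),
                "bits_per_sample": w.getsampwidth() * 8,
                "sample_rate_hz": rate,
                "frames": frames,
                "duration_s": frames / rate if rate else 0.0,
            }
    except (wave.Error, EOFError) as exc:
        # Not a RIFF/WAVE file, truncated, or an encoding the stdlib reader can't parse.
        return {"valid": False, "error": str(exc)}
```

The returned dictionary is the kind of technical metadata that would then be stored in a standardized form (e.g. PREMIS, covered later).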
http://travelwireasia.com/2013/08/fake-food-japanese-style-that-looks-good-enough-to-eat/
Preparation: PII and sensitive information
As in the analogue world, you may need to flag files that contain personally identifiable information (PII) and restrict access to the originals.
Unlike in the analogue world, there are tools that can scan for this information automatically!
This task can be performed during processing, or after access is requested.
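A minimal Python sketch of automated PII flagging in extracted text, assuming two example patterns: email addresses, and nine-digit numbers laid out like a Canadian Social Insurance Number (SIN). Since valid SINs satisfy the Luhn checksum, candidates are filtered with a Luhn check to reduce false positives. The patterns are illustrative only; anything flagged still needs human review.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Nine digits, optionally grouped 3-3-3 with spaces or hyphens.
SIN_LIKE = re.compile(r"\b\d{3}[- ]?\d{3}[- ]?\d{3}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, subtract 9 if over 9, sum mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_for_pii(text: str) -> dict:
    """Return candidate emails and Luhn-valid SIN-like numbers found in the text."""
    sins = [m for m in SIN_LIKE.findall(text) if luhn_valid(re.sub(r"\D", "", m))]
    return {"email": EMAIL.findall(text), "sin": sins}
```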
http://www.amazon.com/White-Horse-Whisky-Blindfolded-Taste/dp/B0159EOIXQ
Preparation: Normalization, migration, emulation
Strategies for dealing with software obsolescence:
Normalization converts files into a more preservation-friendly format while retaining the originals.
Migration converts files to newer formats over time, as older formats become obsolete.
Emulation keeps the files as they are and recreates the original software and operating system environment needed to render them.
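One way to make these strategies operational is a format policy: once a file is identified, look up what to do with it. The Python sketch below assumes a hypothetical institutional policy table; the entries are illustrative, not recommendations from any particular system. Unlisted formats fall through to manual review, and the original is always retained.

```python
from typing import Dict, Optional, Tuple

# Hypothetical policy: identified format -> (strategy, target format or None).
POLICY: Dict[str, Tuple[str, Optional[str]]] = {
    "TIFF": ("keep", None),                 # already treated as preservation-friendly
    "WAV": ("keep", None),
    "BMP": ("normalize", "TIFF"),
    "WordPerfect 5.1": ("emulate", None),   # assumes no acceptable conversion path
}


def plan(identified_format: str) -> dict:
    """Decide how to treat a file; formats missing from the policy go to a person for review."""
    strategy, target = POLICY.get(identified_format, ("review", None))
    return {
        "format": identified_format,
        "strategy": strategy,
        "target": target,
        "keep_original": True,  # normalization never replaces the original
    }
```

Migration fits the same model: when a "keep" format becomes at risk, its policy entry changes and affected files are re-planned.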
UBC Archives, Two Students in Cooking Class in Home Economics, School of Family and Nutritional Science fonds, UBC 101.1/15
Preparation: Putting it all together
If that all sounded like more kitchen prep than Thanksgiving dinner, luckily there’s an easier way!
Digital preservation systems can tie much of this functionality together into a single workflow.
Some of these functions are also handled by repository systems (coming up next).
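A minimal Python sketch of tying preparation steps into one workflow: each step runs in order, one event is logged per step, and processing halts at the first failure (for example, a virus found). The step names and the event structure are hypothetical, only loosely echoing the idea of preservation events.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Tuple

# A step is (name, function); the function takes a file path and returns (succeeded, detail).
Step = Tuple[str, Callable[[str], Tuple[bool, str]]]


@dataclass
class Event:
    step: str
    outcome: str
    detail: str
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def run_workflow(path: str, steps: List[Step]) -> List[Event]:
    """Run steps in order, recording one event per step and stopping at the first failure."""
    events: List[Event] = []
    for name, func in steps:
        ok, detail = func(path)
        events.append(Event(name, "success" if ok else "failure", detail))
        if not ok:
            break
    return events
```

In practice, each step would wrap one of the tools from the preparation slides (checksumming, virus scanning, identification, normalization).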
http://freshome.com/2013/03/22/what-you-can-learn-from-the-jetsons-about-home-automation/
Cooking
Preservation storage
http://www.colonelsretreat.com/home/cooking.php
Cooking: Preservation storage
Prep is critical, but it’s only the first step!
Like cooking a meal, preserving your content for the long term requires specific tools and methods.
As with food, the best way to preserve your digital content is to store it in an appropriate container, so that it stays safe and usable for the long term.
https://www.flickr.com/photos/29069717@N02/10111289655/
Cooking: Preservation storage
Preservatives and an airtight seal
Your storage container for digital content is a repository. Repositories come in many flavours:
• Can have a public interface or be closed off (“dark archive”)
• Can be a simple data store or something really complex
• May come with built-in tools to help you ensure that your data is valid for the long-term
https://www.flickr.com/photos/29069717@N02/10111289655/
Cooking: Fixity checking
Simpler - faster - better - surer!
Fixity checking confirms that your content hasn’t changed since you first recorded its fixity.
By re-running the tool you used during preparation (with the same checksum algorithm) and comparing the new values with the fixity record you created then, you can tell whether your content is still intact - all the bits are still present and accounted for.
Your repository system should do this automatically - no human intervention needed, unless the fixity checks don’t match!
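A minimal Python sketch of automated fixity checking, assuming the fixity record is a BagIt-style manifest (`checksum  path` per line, SHA-256) stored alongside the files. Every listed file is re-hashed and compared; only problems are returned, so an empty result means nothing needs human attention. The function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root: Path, manifest_name: str = "manifest-sha256.txt") -> Dict[str, str]:
    """Re-hash every file listed in the manifest and return {relative path: problem} for failures."""
    problems: Dict[str, str] = {}
    for line in (root / manifest_name).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel = line.split(maxsplit=1)
        target = root / rel
        if not target.is_file():
            problems[rel] = "missing"
        elif sha256_of(target) != expected.lower():
            problems[rel] = "checksum mismatch"
    return problems
```

A scheduler (cron or the repository’s own job runner) could call `verify_manifest` periodically and alert staff only when the result is non-empty.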
http://s.ecrater.com/stores/108769/55f584a6249cf_108769b.jpg
Cooking: Redundancy
Make sure there’s enough for seconds. And thirds!
Making multiple copies of your digital content is critical to ensuring that you have a back-up if something goes wrong. Two common kinds of redundancy are:
• Back-up copies of your data stored on different servers
• Geo-redundancy (copies kept in different geographic locations), often provided by a server hosting provider
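Redundancy pairs well with fixity: if each copy of a file is checksummed, the copies can be compared to find one that has gone bad. The Python sketch below assumes you already have one checksum per storage location (location names are hypothetical) and uses a simple majority vote; with no clear majority, it refuses to guess.

```python
from collections import Counter
from typing import Dict, List, Tuple


def compare_replicas(checksums: Dict[str, str]) -> Tuple[str, List[str]]:
    """Given {location: checksum} for copies of one file, return the majority checksum
    and the sorted locations whose copy disagrees with it (candidates for repair)."""
    counts = Counter(checksums.values())
    majority, votes = counts.most_common(1)[0]
    if votes <= len(checksums) // 2:
        raise ValueError("no majority among copies; investigate manually")
    bad = sorted(loc for loc, value in checksums.items() if value != majority)
    return majority, bad
```

This is one reason three copies are often preferred over two: with only two disagreeing copies, there is no majority to tell you which one is correct.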
https://c1.staticflickr.com/3/2096/5794109510_a4f966a812.jpg
Cooking: Technical metadata
The recipe for your digipres casserole
Technical metadata tells you what comprises the digital content as well as how it’s put together.
There are different standards depending on the type of technical metadata that you’re recording. PREMIS is widely used to capture metadata specifically relating to preservation; there are many others as well.
Following a standard means that your metadata will be consistent both within your repository and over time.
http://www.midcenturymenu.com/2010/06/the-mid-century-menu-ham-banana-casserole/
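To show what "following a standard" looks like in practice, here is a Python sketch that writes a partial PREMIS 3 `object` record for one file (identifier, fixity, size, and format) with the standard-library ElementTree. It is a hedged illustration: it covers only a few PREMIS semantic units and has not been validated against the PREMIS schema.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("premis", PREMIS_NS)
ET.register_namespace("xsi", XSI_NS)


def p(tag: str) -> str:
    """Qualify a tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def premis_file_object(uuid: str, sha256: str, size: int, fmt: str, fmt_version: str) -> str:
    """Build a partial PREMIS 3 <object> for one file and return it as an XML string."""
    obj = ET.Element(p("object"), {f"{{{XSI_NS}}}type": "premis:file"})
    ident = ET.SubElement(obj, p("objectIdentifier"))
    ET.SubElement(ident, p("objectIdentifierType")).text = "UUID"
    ET.SubElement(ident, p("objectIdentifierValue")).text = uuid
    chars = ET.SubElement(obj, p("objectCharacteristics"))
    ET.SubElement(chars, p("compositionLevel")).text = "0"
    fixity = ET.SubElement(chars, p("fixity"))
    ET.SubElement(fixity, p("messageDigestAlgorithm")).text = "SHA-256"
    ET.SubElement(fixity, p("messageDigest")).text = sha256
    ET.SubElement(chars, p("size")).text = str(size)
    designation = ET.SubElement(ET.SubElement(chars, p("format")), p("formatDesignation"))
    ET.SubElement(designation, p("formatName")).text = fmt
    ET.SubElement(designation, p("formatVersion")).text = fmt_version
    return ET.tostring(obj, encoding="unicode")
```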
Cooking: Audit and control
Don’t let strangers mess around in your kitchen!
Regular, holistic audits of your files’ integrity are the best way to ensure that they’re not degrading over time.
Only authorized users should have access to your repository. Controlling who can edit your digital content - including metadata - is crucial to storing it safely and securely.
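A minimal Python sketch of access control with an audit trail: a role-based permission check that records every attempt, whether allowed or not, so later audits can see who tried to do what. The roles, actions, and in-memory log are hypothetical; a real repository would use its own user management and persistent logging.

```python
from datetime import datetime, timezone
from typing import Dict, List, Set

# Hypothetical roles and the actions each may perform.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "researcher": {"read"},
    "archivist": {"read", "edit_metadata"},
    "preservation_admin": {"read", "edit_metadata", "delete"},
}

AUDIT_LOG: List[dict] = []


def authorize(user: str, role: str, action: str, object_id: str) -> bool:
    """Allow the action only if the role grants it, and log the attempt either way."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    AUDIT_LOG.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "object": object_id,
        "allowed": allowed,
    })
    return allowed
```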
http://land.allears.net/blogs/jackspence/21%20Yak%20%26%20Yeti%2001.jpg
Cooking: Future proofing
If you start with the basics, you’ll be able to cook anything
Choosing the best repository system isn’t just about your present needs - it’s also about the future.
Ensuring that your repository is open and built around standards and best practices means that, if you need to, you can migrate to a new system.
Adhering to standards and best practices is like learning to chop an onion - it’s the foundation on which your collections rely.
http://ecx.images-amazon.com/images/I/81DGvz%2BcNZL.jpg
Serving
Managing Access
https://www.sclv.com/Dining/Buffets.aspx
Serving: Know your designated community!
Who’s coming to dinner?
The OAIS reference model defines a designated community as:
“An identified group of potential Consumers who should be able to understand a particular set of information. The Designated Community may be composed of multiple user communities. A Designated Community is defined by the Archive and this definition may change over time.”
This means understanding that your end users might have different needs than the institutional actors responsible for ongoing preservation.
http://hahasforhoohas.com/stories/ten-things-you-never-want-say-dinner-guests-arrive
Serving: Applying access restrictions
Knowing what not to serve is just as important as knowing what to serve!
You will need to make sure that you are applying appropriate access restrictions. These might be based on copyright, local statutes, donor restrictions, licenses, etc. You’ll need clear policies on who can access what, and when.
PREMIS Rights: http://www.loc.gov/standards/premis/
Coyle, Karen. “Rights in the PREMIS Data Model.” A report for the Library of Congress, December 2006. http://www.loc.gov/standards/premis/Rights-in-the-PREMIS-Data-Model.pdf
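Policies like these can be encoded so the access system enforces them. The Python sketch below is loosely inspired by the way PREMIS rights statements record a rights basis, an act, and a term with start and end dates; the field names and the exemption mechanism are hypothetical, not PREMIS element names.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Restriction:
    basis: str                          # e.g. "copyright", "statute", "donor", "license"
    act: str                            # e.g. "disseminate"
    start: date
    end: Optional[date] = None          # None means open-ended
    exempt_roles: Tuple[str, ...] = ()  # roles the restriction does not apply to


def may_perform(act: str, role: str, on: date, restrictions: List[Restriction]) -> bool:
    """Deny if any restriction on this act is in force on `on` and the role is not exempt."""
    for r in restrictions:
        in_force = r.start <= on and (r.end is None or on <= r.end)
        if r.act == act and in_force and role not in r.exempt_roles:
            return False
    return True
```

For example, a donor restriction on dissemination until 2041 that exempts archivists blocks researchers until the end date passes.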
https://makeameme.org/meme/no-dinner-for-pczmhb
Serving: Creating access derivatives (DIPs)
Or, don’t serve a whole chicken on wing night!
Preservation masters ≠ access copies!
For access, you want:
Smaller file sizes
In common formats
Supported by many web browsers and OSes
TIFF → JPG
WAV → MP3
http://vancouverfoodster.com/2012/11/27/tasting-plates-chinatown-strathcona/
Dissemination Information Package (DIP): An Information Package, derived from one or more AIPs, and sent by Archives to the Consumer in response to a request to the OAIS.
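A minimal Python sketch that builds the commands for the two example conversions above (TIFF → JPG, WAV → MP3), assuming ImageMagick’s `convert` and FFmpeg with the LAME encoder are installed. The size and quality settings are illustrative choices, not recommendations. The master is only read; the access copy is written alongside it under a new extension.

```python
from pathlib import Path
from typing import List


def access_derivative_command(master: str) -> List[str]:
    """Return a command that writes a smaller, widely supported access copy of `master`."""
    src = Path(master)
    suffix = src.suffix.lower()
    if suffix in (".tif", ".tiff"):
        out = src.with_suffix(".jpg")
        # ImageMagick: first page only ([0]), shrink to fit 1600x1600 if larger (">"), JPEG quality 85.
        return ["convert", f"{src}[0]", "-resize", "1600x1600>", "-quality", "85", str(out)]
    if suffix == ".wav":
        out = src.with_suffix(".mp3")
        # FFmpeg with the LAME MP3 encoder, variable bitrate quality level 2.
        return ["ffmpeg", "-i", str(src), "-codec:a", "libmp3lame", "-qscale:a", "2", str(out)]
    raise ValueError(f"no access-derivative rule for {suffix!r}")
```

The returned list can be passed to `subprocess.run`; keeping command construction separate makes the rules easy to review and test.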
Serving: Adding descriptive metadata
Let your dinner guests know what’s on the menu
Use existing content standards: Dublin Core, ISAD(G), RAD (Canada), MODS, etc.
This can be done in a database or content management system (e.g. AtoM, ArchivesSpace, CollectiveAccess, or a custom database), or in locally created finding aids.
However you choose to do it, you will also need to think about how users are eventually going to access this information...
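As a small example of using an existing content standard, this Python sketch serializes a record using the 15 simple (unqualified) Dublin Core elements, rejecting any field name outside that set. The `record` wrapper element is an assumption; systems such as AtoM or OAI-PMH feeds use their own wrappers.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor", "date",
    "type", "format", "identifier", "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields: Dict[str, List[str]]) -> str:
    """Serialize simple Dublin Core; each repeated value becomes a repeated element."""
    root = ET.Element("record")  # wrapper element name is an assumption
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a simple Dublin Core element")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```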
http://www.flavourbistro.co.nz/bistro-menu-g-173.html
Serving: Indexing your content and making it discoverable
Send out the dinner invitations!
Your end users (or consumers) will need a way to explore and understand the content you are making available.
Some facility for searching and browsing will greatly ease this.
If your resources are web-accessible, they can be indexed by search engines and become more broadly discoverable.
Indexing also includes adding access points - give your users a way into the content!
Access Software: A type of software that presents part of or all of the information content of an Information Object in forms understandable to humans or systems.
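To show what a search index does under the hood, here is a toy inverted index in Python: each term maps to the IDs of records that contain it, whether in free-text description or in access points, and a query returns records matching every term. It is a sketch only; production systems would rely on Solr or Elasticsearch for ranking, stemming, and scale, and the record fields here are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(records: Dict[str, dict]) -> Dict[str, Set[str]]:
    """Map each term to the IDs of records containing it in the description or access points."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for record_id, rec in records.items():
        for term in tokenize(rec.get("description", "")):
            index[term].add(record_id)
        for point in rec.get("access_points", []):
            for term in tokenize(point):
                index[term].add(record_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return IDs of records matching every query term (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Indexing access points alongside free text is what lets a user find a record by a subject heading that never appears in its description.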
http://www.sandyloujohnson.com/974-2/
Serving: Maintaining a relationship with the master
You need to know where your hors d’oeuvres came from if you want to be able to serve them again in the future
Additional descriptive metadata created outside of the preservation workflow should remain linked to the AIP / digital object master.
Links to your rights statements are crucial for monitoring compliance!
Mutts comic strip, by Patrick McDonnell.
http://farmtotablela.com/farm-table-humor/
Provenance: maintaining the digital chain of custody
If you need to generate updated DIPs in the future, you want to be able to re-trace that chain
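A minimal Python sketch of keeping that chain of custody: each access copy carries a record pointing back to its master and AIP, plus the master’s checksum at the time the derivative was made. If the master’s current checksum differs (e.g. after it was replaced by a re-normalized copy), the derivative is stale and should be regenerated. All field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DerivativeRecord:
    """Links an access copy back to the master and AIP it was made from."""
    derivative_path: str
    aip_uuid: str
    master_path: str
    master_sha256: str  # checksum of the master when the derivative was created
    tool: str           # software or command that generated the derivative


def needs_regeneration(record: DerivativeRecord, current_master_sha256: str) -> bool:
    """True if the master no longer matches the checksum recorded for this derivative."""
    return record.master_sha256 != current_master_sha256


def provenance_chain(record: DerivativeRecord) -> str:
    """Human-readable trail from the access copy back to its AIP."""
    return f"{record.derivative_path} <- {record.master_path} <- AIP {record.aip_uuid} (via {record.tool})"
```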
Serving: Evaluating Access Systems for DigiPres
Channeling your inner food critic
If you are looking to implement an existing access system as part of your digital preservation environment, here’s a summary of some of the factors to consider:
• Search and retrieval
• Digital object display
• Hierarchies and context
• Access restrictions / rights management
• Standards adherence
• Data exchange and interoperability
• Digital provenance (relationship to preservation masters)
https://www.pinterest.com/pin/73253931414036246/
Kitchen Management
Policies and Procedures
http://www.kitchenkitties.com/service-archive/kitchen-boot-camp/
Kitchen Management: The importance of policy
Digital preservation is not all about tools and technology:
In standards like ISO 16363 (2012), policies and organizational infrastructure account for between ⅓ and ½ of the entire standard!
You need to ensure that your organization has the will, the capacity, and the vision to undertake digital preservation over the long-term.
http://recruitloop.com/blog/who-really-needs-to-get-involved-in-the-recruitment-process/
Kitchen Management: The importance of policy
Example factors to consider:
• Does your organization’s mission statement explicitly cover a commitment to digital preservation?
• Do you have succession, contingency, and/or escrow plans in place?
• Do you have training policies around digital preservation?
• Are the duties of the staff responsible for each link in the chain documented?
• Do you have an internal auditing mechanism?
• Do you have a long-term financial plan for your preservation program?
http://liaisoncollegeoakville.com/chef-diploma-programs/specialist-chef/
Kitchen Management: The value of collaboration
This ain’t Iron Chef!!!
• Digital preservation is hard - and ongoing
• Archives are underfunded - especially in Canada
• There’s a lot to learn…
But we can learn together, and share resources.
To be successful, we’ll need to collaborate, not compete - like a REAL professional kitchen!
http://www.popsugar.com/food/Interview-Next-Iron-Chef-Geoffrey-Zakarian-20967020
Shopping List
Tools and resources
http://www.middlevillemarketplace.com/shopping-list.php
Fixity
Tools to create checksums:
md5deep: http://md5deep.sourceforge.net/
md5summer: http://www.md5summer.org/
Built into various preservation systems/tools: Archivematica, Preservica, Bagger, DuraCloud, etc.
Tools to verify checksums:
Fixity: https://github.com/avpreserve/fixity
Built into various tools/systems as above
Tools to scan for viruses
ClamAV: http://www.clamav.net/
Format identification
PRONOM database: http://www.nationalarchives.gov.uk/PRONOM/Default.aspx
Tools:
Format Identifier for Digital Objects (FIDO): https://github.com/openplanets/fido
Siegfried: https://github.com/richardlehane/siegfried
File Information Tool Set (FITS): http://projects.iq.harvard.edu/fit
DROID: https://github.com/digital-preservation/droid
Characterization, validation, and metadata extraction
File Information Tool Set (FITS): http://projects.iq.harvard.edu/fits
Metadata extraction tool: http://meta-extractor.sourceforge.net
ffprobe: https://ffmpeg.org/ffprobe.html
Exiftool: http://www.sno.phy.queensu.ca/~phil/exiftool/
MediaInfo: https://mediaarea.net/en/MediaInfo
JHOVE: https://github.com/openpreserve/jhove
veraPDF: http://verapdf.org/
Normalization, migration, and emulation
ImageMagick: http://www.imagemagick.org/script/index.php
Inkscape: http://www.inkscape.org/
FFmpeg: http://ffmpeg.org/ffmpeg.html
Ghostscript: http://www.ghostscript.com/
KEEP solutions Emulation Framework: http://emuframework.sourceforge.net/
bwFLA Emulation as a Service: http://bw-fla.uni-freiburg.de/
Digital preservation systems
Archivematica: http://www.archivematica.org
Preservica: www.preservica.com
Rosetta: http://www.exlibrisgroup.com/category/RosettaOverview
Technical metadata standards
PREMIS: http://www.loc.gov/standards/premis/
Coyle, Karen. “Rights in the PREMIS Data Model.” A report for the Library of Congress, December 2006. http://www.loc.gov/standards/premis/Rights-in-the-PREMIS-Data-Model.pdf
METS: http://www.loc.gov/standards/mets/
PBCore: http://pbcore.org/schema/
NISO Metadata for Images in XML: http://www.loc.gov/standards/mix/
And many more, depending on the file types you’re working with!
Descriptive metadata standards
Dublin Core: http://dublincore.org/documents/dcmi-terms/
Rules for Archival Description (Canada): http://www.cdncouncilarchives.ca/archdesrules.html
General International Standard for Archival Description - ISAD(G): http://ica.org/en/isadg-general-international-standard-archival-description-second-edition
MODS: http://www.loc.gov/standards/mods/
Derivatives and indexing 1/2
For derivatives: see the resources above for normalization. The same tools that are used for preservation normalization can also be used for creating access derivatives!
For search indexing: This will depend on how you are making your resources available. A search index generally needs to be one component of an application stack. Here are a few resources to look into:
Elasticsearch: https://www.elastic.co/products/elasticsearch
Solr: https://lucene.apache.org/solr/
Blacklight: http://projectblacklight.org/
Derivatives and indexing 2/2
For adding indexing terms for discovery: Use existing controlled vocabularies whenever possible!
• Library of Congress vocabularies: http://loc.gov/library/libarch-thesauri.html
• Getty Vocabularies: http://www.getty.edu/research/tools/vocabularies/index.html
• Library and Archives Canada controlled vocabularies: http://www.bac-lac.gc.ca/eng/services/government-information-resources/controlled-vocabularies/Pages/controlled-vocabularies.aspx
• UNESCO thesaurus: http://databases.unesco.org/thesaurus/
• JISC Directory of Metadata Vocabularies: http://www.jiscdigitalmedia.ac.uk/guide/controlling-your-language-links-to-metadata-vocabularies/
• RBMS Controlled Vocabularies for Use in Rare Book and Special Collections Cataloging: http://rbms.info/vocabularies/
Description, repository, and access systems
• Access to Memory: https://www.accesstomemory.org
• ArchivesSpace: http://www.archivesspace.org/
• CollectiveAccess: http://collectiveaccess.org/
• Omeka: http://omeka.org/
• Islandora: http://islandora.ca/
• Hydra: https://projecthydra.org/
• Avalon: http://www.avalonmediasystem.org/
• ResCarta Toolkit: http://www.rescarta.org/
Note that many of these systems incorporate the tools and standards described in the previous slides.
General & policy resources
• CCSDS - Reference Model for an Open Archival Information System (OAIS): http://public.ccsds.org/publications/archive/650x0m2.pdf
• CCSDS - Audit and Certification of Trustworthy Digital Repositories: http://public.ccsds.org/publications/archive/652x0m1.pdf
• TRAC review tool, developed by MIT in a project led by Nancy McGovern, Head of Curation and Preservation Services at MIT Libraries: https://wiki.archivematica.org/Internal_audit_tool
• COPTR - Community Owned digital Preservation Tool Registry: http://coptr.digipres.org/Main_Page
• Open Preservation Foundation: http://openpreservation.org/
• POWRR Project - Preserving (Digital) Objects With Restricted Resources: http://digitalpowrr.niu.edu/
• DigiPres Commons: http://www.digipres.org/
• Digital Preservation Q & A: http://qanda.digipres.org/
• National Digital Stewardship Alliance - Levels of Preservation: http://ndsa.diglib.org/activities/levels-of-digital-preservation/
• NDSA Digital Preservation in a Box: http://dpoutreach.net/
• AVPreserve’s open source tools: https://www.avpreserve.com/avpsresources/tools/
• AVPreserve’s papers and presentations: https://www.avpreserve.com/avpsresources/papers-and-presentations/