Approaches to preserving digitized taxonomic data
Approaches to preserving digitized taxonomic data:
Prints, manuscripts & specimens
Chris Freeland
Director, Center for Biodiversity Informatics
Technical Director, Biodiversity Heritage Library
28 October 2011
@chrisfreeland
Prints / Manuscripts / Specimens
Different objects, similar management
http://www.flickr.com/photos/biodivlibrary/6257859557
http://www.flickr.com/photos/chrisfreeland/6018724034
http://www.biodiversitylibrary.org/page/34045915
Overview of Talk
• Why worry about digital preservation?
• Considerations for preservation
– Collaboration
– File formats
– Metadata standards
• Views to the future
Preservation Panic!
WHY WORRY?
http://www.flickr.com/photos/biodivlibrary/6008902662
Do it once, do it right
Costs more to get object to scanner than to scan
• Conversion / Compost / Corruption
• Longevity of digital objects
• File changes
• Media obsolescence
Cautionary Tales
CONSIDERATION: COLLABORATION
LOCKSS
Lots Of Copies Keeps Stuff Safe
• LOCKSS is both a software platform & a concept
– Software: http://www.lockss.org
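The "lots of copies" concept only helps if damage to one copy can be noticed. The sketch below illustrates that idea by comparing SHA-256 checksums across copies of the same object and flagging any copy that disagrees with the majority. It is not the LOCKSS software's own protocol; the function names and the majority rule are assumptions made for illustration.

```python
import hashlib
from collections import Counter
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_damaged_copies(copies: List[Path]) -> List[Path]:
    """Return the copies whose checksum disagrees with the majority.

    Hypothetical helper: raises ValueError when there are no copies or
    when no checksum has a strict majority, because then there is no
    basis for deciding which copy is the good one.
    """
    if not copies:
        raise ValueError("no copies to compare")
    digests = {path: sha256_of(path) for path in copies}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        raise ValueError("no majority among copies; manual review needed")
    return [path for path, digest in digests.items() if digest != winner]
```

With three copies, a single damaged file is outvoted by the other two and can be replaced from either of them; with only two copies, a mismatch cannot be resolved automatically.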
Rule of 3
Museum X / Library Y / Archive Z
1. Geographic Locations
2. Administrations
3. Technology Platforms
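One reading of the Rule of 3 is that the copies of an object should span at least three geographic locations, three administrations, and three technology platforms. The sketch below checks a list of copies against that reading. The `Copy` record, its field names, and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Copy:
    """One stored copy of a digital object (illustrative fields only)."""
    administration: str  # organization responsible for the copy
    location: str        # geographic location of the storage
    platform: str        # storage technology in use


def rule_of_three_gaps(copies: List[Copy], minimum: int = 3) -> Dict[str, int]:
    """Return each dimension that has fewer than `minimum` distinct values."""
    counts = {
        "location": len({c.location for c in copies}),
        "administration": len({c.administration for c in copies}),
        "platform": len({c.platform for c in copies}),
    }
    return {name: count for name, count in counts.items() if count < minimum}


copies = [
    Copy("Museum X", "City A", "disk array"),
    Copy("Library Y", "City B", "tape"),
    Copy("Archive Z", "City C", "disk array"),
]
# Two copies share a platform, so only that dimension is reported.
print(rule_of_three_gaps(copies))  # {'platform': 2}
```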
CONSIDERATION: FILE FORMATS
JPEG2000
• Wavelet compression, lossless encoding
• 12 Parts
• Of particular interest to documents & specimens:
– Part 1: Core Coding System, ISO/IEC 15444-1
– Part 6: Compound image file format
– Part 10: JP3D, Volumetric images
http://www.jpeg.org/jpeg2000/
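Because the JPEG2000 family covers several file formats, it can help to confirm what a file actually is before trusting its extension. The sketch below inspects the first bytes of a file: JP2-family files open with a fixed 12-byte signature box, normally followed by a file type box whose brand names the format (for example `jp2 ` for Part 1 files and `jpm ` for Part 6 compound images), while a bare codestream opens with the SOC and SIZ markers. The function name and return strings are assumptions for illustration, and the check identifies a file without validating it.

```python
from typing import Optional

# 12-byte JPEG 2000 signature box shared by JP2-family files.
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")
# A raw codestream starts with the SOC marker followed by the SIZ marker.
J2K_CODESTREAM_START = bytes.fromhex("FF4FFF51")


def identify_jpeg2000(header: bytes) -> Optional[str]:
    """Classify a file from its first 24 bytes.

    Returns the file type brand (such as "jp2" or "jpm"), the string
    "raw codestream", "unknown brand", or None for non-JPEG2000 data.
    """
    if header.startswith(J2K_CODESTREAM_START):
        return "raw codestream"
    if not header.startswith(JP2_SIGNATURE):
        return None
    # The file type box is expected right after the signature box:
    # 4-byte length, the type "ftyp", then the 4-byte brand.
    if len(header) >= 24 and header[16:20] == b"ftyp":
        return header[20:24].decode("ascii", errors="replace").strip()
    return "unknown brand"
```

In use, only the start of the file needs to be read, for example `identify_jpeg2000(open(path, "rb").read(24))`.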
http://www.tropicos.org/ImageFullView.aspx?imageid=62182
JPEG2000 (Hurrahs & Hisses)
• Advantages
– Store a single file for access & preservation
– Standards-based
– Saves drive space (important at museum scale)
• Disadvantages
– Doesn’t have wide native support in many apps
– Requires an intermediary app to decode & serve
• But, there’s an open source option: djatoka http://djatoka.sourceforge.net
– Reports of data loss
PDF/A
• ISO-standardized version of PDF suitable for long-term preservation
• Identifies a "profile" for electronic documents that ensures the documents can be reproduced exactly the same way in years to come.*
• Makes the file self-contained (and therefore larger)
– Embeds fonts
– Graphics
* http://en.wikipedia.org/wiki/PDF/A
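A PDF/A file declares its profile in its embedded XMP metadata, through the `pdfaid:part` and `pdfaid:conformance` properties. The sketch below looks for that declaration in the raw bytes of a file. It only reports what the file claims: it does not validate conformance, and it may miss a declaration stored in a compressed metadata stream. The function name is hypothetical.

```python
import re
from typing import Optional

# The properties may appear as XML attributes (pdfaid:part="1")
# or as elements (<pdfaid:part>1</pdfaid:part>).
_PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")
_CONFORMANCE = re.compile(rb"pdfaid:conformance\s*(?:=\s*[\"']|>)\s*([A-Za-z])")


def claimed_pdfa_level(pdf_bytes: bytes) -> Optional[str]:
    """Return the declared profile, e.g. "PDF/A-1b", or None if absent."""
    part = _PART.search(pdf_bytes)
    if part is None:
        return None
    level = part.group(1).decode("ascii")
    conformance = _CONFORMANCE.search(pdf_bytes)
    if conformance is not None:
        level += conformance.group(1).decode("ascii").lower()
    return "PDF/A-" + level
```

A dedicated validator is still needed to confirm that fonts are embedded and the other requirements of the profile are met; this check is only a quick triage step.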
CONSIDERATION: METADATA
The Great Thing About STANDARDS
Is That There Are SO MANY
To Choose From
Filesystem
Metadata Preservation
• Descriptive information (metadata) provides content & context for indexing, reuse
• Can bundle metadata within files
– EXIF: images, common in digital cameras
– Adobe XMP: docs, images
• Should commit metadata to file system
– Should not manage just in DB or other management system
<DwC> XML
JP2
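Committing metadata to the file system can be as simple as writing a sidecar file next to each image, so the description travels with the file even if the database is lost. The sketch below writes a small XML file using Darwin Core term names beside a JP2 file. The `record` wrapper element, the function name, and the sample values are assumptions for illustration, not a prescribed Darwin Core serialization.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict

# Namespace for Darwin Core terms.
DWC_NS = "http://rs.tdwg.org/dwc/terms/"


def write_sidecar(image_path: Path, terms: Dict[str, str]) -> Path:
    """Write Darwin Core terms to an XML file beside the image.

    For "specimen.jp2" the sidecar is "specimen.xml" in the same folder.
    """
    ET.register_namespace("dwc", DWC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in terms.items():
        ET.SubElement(root, "{%s}%s" % (DWC_NS, name)).text = value
    sidecar = image_path.with_suffix(".xml")
    ET.ElementTree(root).write(str(sidecar), encoding="utf-8", xml_declaration=True)
    return sidecar
```

A call such as `write_sidecar(Path("specimen.jp2"), {"catalogNumber": "12345", "scientificName": "Quercus alba"})` leaves `specimen.xml` beside the image, where any backup or replication of the directory will pick it up.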
THE FUTURE
Electronic Publications
• Happening now, has been for years
• Should take same care in ensuring heterogeneity & diversity in digital management systems as with printed, bound books
– Monolithic libraries have failed over time
– Monolithic electronic archives will, too
http://www.biodiversitylibrary.org/page/22681143
Need a meadow…
…not a monoculture.
There is no silver bullet
• Make best decision today
• Stay up with technology changes & best practices
– <insert library & archive professionals here>
• Evaluate, experiment, document, lead
• Move to stable new technologies when necessary
Questions?
Chris Freeland
Director, Center for Biodiversity Informatics
Technical Director, Biodiversity Heritage Library
28 October 2011
Email: [email protected]
Twitter: @chrisfreeland