Approaches to preserving digitized taxonomic data

Post on 18-Nov-2014


Sherborn Symposium. Natural History Museum, London. 28 October 2011.

Approaches to preserving digitized taxonomic data:

Prints, manuscripts & specimens

Chris Freeland

Director, Center for Biodiversity Informatics

Technical Director, Biodiversity Heritage Library

28 October 2011

@chrisfreeland

Prints / Manuscripts / Specimens

Different objects, similar management

http://www.flickr.com/photos/biodivlibrary/6257859557
http://www.flickr.com/photos/chrisfreeland/6018724034
http://www.biodiversitylibrary.org/page/34045915

Overview of Talk

• Why worry about digital preservation?

• Considerations for preservation
  – Collaboration
  – File formats
  – Metadata standards

• Views to the future

Preservation Panic!

WHY WORRY?

http://www.flickr.com/photos/biodivlibrary/6008902662

Do it once, do it right

Costs more to get object to scanner than to scan

• Conversion / Compost / Corruption
• Longevity of digital objects
• File changes
• Media obsolescence
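One common guard against silent corruption and unnoticed file changes is a fixity check: record a checksum for every file, recompute it later, and compare. The sketch below is illustrative only and is not taken from the talk; it assumes SHA-256 as the checksum, and the helper names (`sha256_of`, `build_manifest`, `changed_files`) are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map every file under root (by relative POSIX path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict, new: dict) -> list:
    """List paths from the old manifest that are now missing or differ."""
    return sorted(name for name, digest in old.items() if new.get(name) != digest)
```

Storing the manifest alongside the files, and rerunning the comparison on a schedule, is one way to notice damage while an intact copy still exists elsewhere.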

Cautionary Tales

CONSIDERATION: COLLABORATION

LOCKSS

Lots Of Copies Keeps Stuff Safe

• LOCKSS is both a software platform & a concept
  – Software: http://www.lockss.org

Rule of 3

Museum X / Library Y / Archive Z

1. Geographic Locations
2. Administrations
3. Technology Platforms
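The Rule of 3 can be checked mechanically once each copy is described by where it lives, who runs it, and what it runs on. The sketch below is a simplified illustration, not the LOCKSS software or its polling protocol: the `Replica` record, its field names, and both helper functions are assumptions made for this example. It tests for three distinct values on each axis and uses a simple majority vote on checksums to flag a copy that disagrees with the others.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One stored copy of an object (hypothetical record for illustration)."""
    name: str
    location: str
    administration: str
    platform: str
    digest: str  # checksum of the stored copy, e.g. a SHA-256 hex string


def meets_rule_of_three(replicas: list) -> bool:
    """True if the copies span at least 3 locations, administrations, and platforms."""
    return all(
        len({getattr(r, field) for r in replicas}) >= 3
        for field in ("location", "administration", "platform")
    )


def damaged_copies(replicas: list) -> list:
    """Return names of replicas whose digest disagrees with a strict majority."""
    if not replicas:
        raise ValueError("no replicas to compare")
    digest, votes = Counter(r.digest for r in replicas).most_common(1)[0]
    if votes * 2 <= len(replicas):
        raise ValueError("no majority digest; manual review needed")
    return [r.name for r in replicas if r.digest != digest]
```

With three copies, one damaged file is outvoted by the other two; with only two copies, a mismatch cannot say which one is wrong, which is why the function refuses to guess.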

CONSIDERATION: FILE FORMATS

JPEG2000

• Wavelet compression, lossless encoding
• 12 Parts
• Of particular interest to documents & specimens:
  – Part 1: Core Coding System, ISO/IEC 15444-1
  – Part 6: Compound image file format
  – Part 10: JP3D, Volumetric images

http://www.jpeg.org/jpeg2000/

http://www.tropicos.org/ImageFullView.aspx?imageid=62182
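When auditing a collection it helps to confirm that a file really is JPEG 2000 rather than trusting its extension. The sketch below is a minimal header check, assuming the 12-byte signature box that opens JP2-family files and the SOC + SIZ marker pair that opens a bare codestream; the function name `sniff_jpeg2000` is hypothetical, and the check says nothing about whether the rest of the file is valid.

```python
from typing import Optional

# 12-byte signature box that opens JP2-family files.
JP2_SIGNATURE_BOX = bytes.fromhex("0000000c6a5020200d0a870a")
# A bare codestream starts with the SOC marker followed by the SIZ marker.
J2K_SOC_SIZ = bytes.fromhex("ff4fff51")


def sniff_jpeg2000(header: bytes) -> Optional[str]:
    """Classify the first bytes of a file.

    Returns "codestream" for a bare codestream, the brand from the
    File Type box (for example "jp2") for a boxed file, "unknown-brand"
    if the signature is present but no File Type box follows, and
    None if the bytes do not look like JPEG 2000 at all.
    """
    if header.startswith(J2K_SOC_SIZ):
        return "codestream"
    if header.startswith(JP2_SIGNATURE_BOX):
        # The File Type box follows the signature: length, "ftyp", brand.
        if len(header) >= 24 and header[16:20] == b"ftyp":
            return header[20:24].decode("ascii", "replace").strip()
        return "unknown-brand"
    return None
```

In practice the first 24 bytes of each file are enough input, for example `sniff_jpeg2000(open(path, "rb").read(24))`.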

JPEG2000 (Hurrahs & Hisses)

• Advantages
  – Store a single file for access & preservation
  – Standards-based
  – Saves drive space (important at museum scale)

• Disadvantages
  – Lacks native support in many apps
  – Requires an intermediary app to decode & serve
    • But, there’s an open source option: djatoka http://djatoka.sourceforge.net
  – Reports of data loss

PDF/A

• ISO-standardized version of PDF suitable for long-term preservation

• Identifies a "profile" for electronic documents that ensures the documents can be reproduced exactly the same way in years to come.*

• Makes the file self-contained (and therefore larger)
  – Embeds fonts
  – Graphics

* http://en.wikipedia.org/wiki/PDF/A
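A PDF/A file declares its conformance level in its XMP metadata through the `pdfaid:part` and `pdfaid:conformance` properties. The sketch below is a rough heuristic for reading that declaration, assuming the XMP packet is stored uncompressed in the file; the function name `claimed_pdfa_level` is hypothetical. It reports only what the file claims and does not validate the file against the standard, which requires a dedicated validator.

```python
import re
from typing import Optional

# Match both XMP spellings: attribute form (pdfaid:part="1")
# and element form (<pdfaid:part>1</pdfaid:part>).
_PART = re.compile(rb'''pdfaid:part\s*(?:=\s*["']|>)\s*(\d)''')
_CONFORMANCE = re.compile(rb'''pdfaid:conformance\s*(?:=\s*["']|>)\s*([A-Za-z])''')


def claimed_pdfa_level(data: bytes) -> Optional[str]:
    """Return the PDF/A level a file claims (e.g. "PDF/A-1b"), or None."""
    part = _PART.search(data)
    if part is None:
        return None
    level = part.group(1).decode("ascii")
    conformance = _CONFORMANCE.search(data)
    if conformance is not None:
        level += conformance.group(1).decode("ascii").lower()
    return "PDF/A-" + level
```

A `None` result means no declaration was found in the raw bytes, which is a reason to inspect the file further rather than proof that it is not PDF/A.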

CONSIDERATION: METADATA

The Great Thing About STANDARDS Is That There Are SO MANY To Choose From

Filesystem

Metadata Preservation

• Descriptive information (metadata) provides content & context for indexing, reuse

• Can bundle metadata within files
  – EXIF: images, common in digital cameras
  – Adobe XMP: docs, images

• Should commit metadata to file system
  – Should not manage it only in a DB or other management system

<DwC> XML

JP2
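Committing metadata to the file system can be as simple as writing a small XML file next to each image, so the description travels with the file even if the database is lost. The sketch below is one possible layout, not the one used in the talk: it assumes Darwin Core terms in the `http://rs.tdwg.org/dwc/terms/` namespace, a hypothetical `record` wrapper element, a hypothetical `write_sidecar` helper, and a sidecar named after the image with an `.xml` extension.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

DWC_NS = "http://rs.tdwg.org/dwc/terms/"  # Darwin Core terms namespace


def write_sidecar(image_path: Path, terms: dict) -> Path:
    """Write Darwin Core terms to an XML file beside the image.

    `terms` maps Darwin Core term names (e.g. "scientificName") to values.
    The "record" wrapper element is an assumption made for this sketch.
    """
    ET.register_namespace("dwc", DWC_NS)
    root = ET.Element("record")
    for name, value in terms.items():
        ET.SubElement(root, "{%s}%s" % (DWC_NS, name)).text = value
    sidecar = image_path.with_suffix(".xml")
    ET.ElementTree(root).write(str(sidecar), encoding="utf-8", xml_declaration=True)
    return sidecar
```

For example, `write_sidecar(Path("specimen_0001.jp2"), {"scientificName": "Quercus alba"})` would produce `specimen_0001.xml` in the same directory; the file name and values here are invented for illustration.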

THE FUTURE

Electronic Publications

• Happening now, has been for years
• Should take same care in ensuring heterogeneity & diversity in digital management systems as with printed, bound books
  – Monolithic libraries have failed over time
  – Monolithic electronic archives will, too

http://www.biodiversitylibrary.org/page/22681143

Need a meadow…

…not a monoculture.

There is no silver bullet

• Make best decision today

• Stay up with technology changes & best practices
  – <insert library & archive professionals here>

• Evaluate, experiment, document, lead

• Move to stable new technologies when necessary

Questions?

Chris Freeland

Director, Center for Biodiversity Informatics

Technical Director, Biodiversity Heritage Library

28 October 2011

Email: chris.freeland@mobot.org

Twitter: @chrisfreeland