J1021 110516(bit rotblues)
Bit rot blues: emerging digital preservation
Emerging Technology Forum
Gordon Institute of TAFE
Tuesday 17 May 2011
Alan [email protected]
Background
• Digital technologies have brought about the greatest increase in information since the invention of moveable type
• Need to keep some of it - but that’s problematic!
• Higher education sector and memory domains working on the issues and solutions for over a decade
• Status report on current thinking
Why preserve digital?
• ‘because good research needs good data’
• to continue in ‘business’
• to protect rights and entitlements
• to keep government accountable
• to meet the need for information and creative expression
• so that a balanced record of our society endures
Managing massive quantities of data
Digital is measured in Bytes (B)
1 Kilobyte (kB) = 1 thousand bytes
1 Megabyte (MB) = 1 thousand kB
1 Gigabyte (GB) = 1 thousand MB, or 1.5
1 Terabyte (TB) = 1 thousand GB, or 1,500
1 Petabyte (PB) = 1 thousand TB, or 1.5 million
1 Exabyte (EB) = 1 thousand PB, or 1.5 billion
1 Zettabyte (ZB) = 1 thousand EB, or 1.5 trillion
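The unit ladder above can be sketched as a small conversion routine. This is only an illustration, assuming the decimal (powers of one thousand) definitions given on the slide; the function name `format_bytes` is hypothetical and not from the talk.

```python
# Decimal byte units, each one thousand times the previous one,
# following the definitions on the slide (assumption: no binary KiB/MiB units).
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def format_bytes(n: int) -> str:
    """Express a byte count in the largest unit that keeps the value at 1 or more."""
    value = float(n)
    for unit in UNITS:
        # Stop once the value drops below one thousand, or at the largest unit.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


if __name__ == "__main__":
    print(format_bytes(1_500_000_000_000))
```

Under these assumptions, a collection of 1,500,000,000,000 bytes is reported as 1.5 TB.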
Obsolete technology is the most pressing technical issue, but also...
Analytical Engine (1837), ENIAC (1946), CD-ROM (1980 - ?)
Why is digital preservation an issue?
• Change from preservation at some time in the future to preservation as close to the point of creation as possible (preferably before!)
• Higher costs (than paper-based preservation) and business-level ICT
• Solutions are partly technological (addressing carrier instability and technological obsolescence), partly organisational (addressing risk, acceptable loss and change) and partly cultural: cuts across boundaries
Most data will be lost,
either intentionally or accidentally.
Will this be planned?
What we hope for …
What we get…
With thanks to the NLA
Some principles
• Digital preservation principles are essentially the same as the physical custody and collection management principles that the archive, library and museum professions have established for paper-based resources.
So, what is digital preservation?
• Technical: keeping same # bits in the same order + creating metadata to manage and audit progress
• Intellectual: keeping significance, intellectual content or essence or semantic meaning, observational data, format as an exemplar of a type, etc
• Archival: keeping context, provenance and evidential status
• User: keeping the ability to use digital resources for as long as required in the way that the designated community requires
• Managerial: policies and strategies to mitigate risk, ensure business continuity, increase competitive advantage
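The technical strand (same bits, same order, plus metadata for auditing) is commonly checked with stored checksums. The sketch below is one possible illustration, not a method described in the talk: the choice of SHA-256, the manifest layout, and the names `build_manifest` and `audit` are all assumptions.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Record a checksum and size for every file under root (hypothetical layout)."""
    files = {
        p.relative_to(root).as_posix(): {
            "sha256": sha256_of(p),
            "bytes": p.stat().st_size,
        }
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    return {"created": datetime.now(timezone.utc).isoformat(), "files": files}


def audit(root: Path, manifest: dict) -> list[str]:
    """Return relative paths that are missing or no longer match the manifest."""
    problems = []
    for name, entry in manifest["files"].items():
        target = root / name
        if not target.is_file() or sha256_of(target) != entry["sha256"]:
            problems.append(name)
    return problems
```

A manifest would be built once at ingest and `audit` run periodically; an empty list means every recorded file still has the bits it had when the manifest was made. Note that the manifest itself also needs to be kept safe, ideally apart from the files it describes.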
What we are looking for …
How to do it
• Raise awareness - get informed – all on same page
• Audit digital resources: is preservation needed?
• Identify + categorise significant digital resources
• Develop dp policy (by category + format)
• Assess risks: start with high-risk/high-significance
• Get the legals sorted out
• Build lifecycle model: assume obsolescence
• Establish metadata to manage resources over time
• Secure the bitstream: use robust, redundant, storage
• Do something rather than wait for perfect solutions
• Collaborate, share results, stay informed…
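To make the "robust, redundant storage" step concrete, here is a minimal sketch of how several copies of the same object might be compared so that a damaged copy can be spotted and replaced. The majority-vote rule and the name `repair_from_replicas` are assumptions for illustration; the talk does not prescribe a particular scheme.

```python
import hashlib
from collections import Counter


def checksum(data: bytes) -> str:
    """SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def repair_from_replicas(replicas: list[bytes]) -> tuple[bytes, list[int]]:
    """Pick the version held by a strict majority of copies.

    Returns the agreed content and the indexes of copies that differ from it.
    Raises ValueError when there are no copies or no strict majority.
    """
    if not replicas:
        raise ValueError("no replicas supplied")
    digests = [checksum(r) for r in replicas]
    winner, votes = Counter(digests).most_common(1)[0]
    if votes * 2 <= len(replicas):
        raise ValueError("no majority among replicas; cannot tell which copy is intact")
    good = replicas[digests.index(winner)]
    damaged = [i for i, d in enumerate(digests) if d != winner]
    return good, damaged


if __name__ == "__main__":
    copies = [b"record", b"record", b"recorc"]  # third copy has one changed byte
    content, damaged = repair_from_replicas(copies)
    print(content, damaged)
```

With three copies, one silently corrupted copy can be detected and overwritten from the other two. With only two copies a mismatch can be detected but not resolved, which is one reason to pair redundancy with a checksum recorded at ingest.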
Models: OAIS
National Archives of Australia
• ‘Downstream’ approach (too many agencies)
• Keeping original bit-stream
• Using Xena software to “normalise” documents into selected open, fully documented, xml archival formats (including png, html, ooxml, pdf). Not all conversions are perfect.
• Lots of redundancy
• Physically separate transformation, access and long-term storage zones built with “commodity-level” ICT components
Transferred item in any format
Access copy in format of the day
Preservation master in XML
Digital Repository
Normalisation
Transformation
Preservation/Accessibility Master
Source Master
Emerging digital issues
• Managing massive amounts of data
• Choosing what to keep - most data will be lost
• Recognising that the principles already exist - no need to reinvent the wheel
• Understanding that significant digital objects will outlast current systems
• Not primarily about technology but about policy and institutional and individual commitment
• Required skills (problem solving, management, communications, planning and profound understanding of technologies) are not being taught
• Challenge is too large and too complex for any one institution, sector or country
‘The Internet is getting big and it’s happening fast’
Robert Orenstein
Author, The irresponsible Internet statistics generator
Source: http://www.anamorph.com/docs/stats/stats.html