J1021 110516(bit rotblues)

Bit rot blues: emerging digital preservation. Emerging Technology Forum, Gordon Institute of TAFE, Tuesday 17 May 2011. Alan Howell, [email protected], www.alanhowell.com.au
Posted: 22-Oct-2014

Alan Howell discusses the challenges of preserving massive amounts of digital information.

Transcript of J1021 110516(bit rotblues)

Page 1: J1021 110516(bit rotblues)

Bit rot blues: emerging digital preservation

Emerging Technology Forum, Gordon Institute of TAFE, Tuesday 17 May 2011

Alan Howell, [email protected]

1946

Background

• Digital technologies have brought about the greatest increase in information since the invention of moveable type

• We need to keep some of it, but that’s problematic!

• The higher education sector and the memory domains have been working on the issues and solutions for over a decade

• Status report on current thinking

Why preserve digital?

• ‘because good research needs good data’

• to continue in ‘business’

• to protect rights and entitlements

• to keep government accountable

• to meet the need for information and creative expression

• so that a balanced record of our society endures

Managing massive quantities of data

Digital is measured in Bytes (B)

1 Kilobyte (kB) = 1 thousand bytes
1 Megabyte (MB) = 1 thousand kB
1 Gigabyte (GB) = 1 thousand MB (1.5)
1 Terabyte (TB) = 1 thousand GB (1,500)
1 Petabyte (PB) = 1 thousand TB (1.5 million)
1 Exabyte (EB) = 1 thousand PB (1.5 billion)
1 Zettabyte (ZB) = 1 thousand EB (1.5 trillion)
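The unit ladder above steps up by a factor of one thousand at each level (decimal prefixes, rather than binary steps of 1,024). A minimal Python sketch of that arithmetic; `human_readable` and `gb_equivalent` are hypothetical helper names, not part of any library.

```python
# Decimal unit ladder: each unit is 1,000 times the one before it.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def human_readable(num_bytes: float) -> str:
    """Hypothetical helper: express a byte count in the largest unit keeping the value >= 1."""
    if num_bytes < 0:
        raise ValueError("byte count must be non-negative")
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if value < 1000:
            return f"{value:g} {unit}"
        value /= 1000
    return f"{value:g} {UNITS[-1]}"  # anything larger is still reported in ZB


def gb_equivalent(amount: float, unit: str) -> float:
    """Hypothetical helper: convert an amount in any unit of the ladder to gigabytes."""
    return amount * 1000 ** (UNITS.index(unit) - UNITS.index("GB"))


if __name__ == "__main__":
    print(human_readable(1_500_000_000_000))  # 1.5 TB
    print(gb_equivalent(1.5, "ZB"))  # 1500000000000.0, i.e. 1.5 trillion GB
```

Each step up the ladder is simply one more division (or multiplication) by 1,000, which is why a zettabyte runs to a trillion gigabytes.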

Obsolete technology is the most pressing technical issue, but also...

Analytical Engine (1837), ENIAC (1946), CD-ROM (1980 - ?)

Why is digital preservation an issue?

• Change from preservation at some time in the future to preservation as close to the point of creation as possible (preferably before!)

• Higher costs than paper-based preservation, and the need for business-level ICT

• Solutions are partly technological (addressing carrier instability and technological obsolescence), partly organisational (addressing risk, acceptable loss and change) and partly cultural: the issue cuts across boundaries

Most data will be lost, either intentionally or accidentally. Will this be planned?

What we hope for …

What we get…

With thanks to the NLA

Some principles

• Digital preservation principles are essentially the same physical custody and collection management principles that the archive, library and museum professions have established for paper-based resources.

So, what is digital preservation?

• Technical: keeping the same number of bits in the same order, and creating metadata to manage and audit progress

• Intellectual: keeping significance, intellectual content or essence, semantic meaning, observational data, format as an exemplar of a type, etc.

• Archival: keeping context, provenance and evidential status

• User: keeping the ability to use digital resources for as long as required in the way that the designated community requires

• Managerial: policies and strategies to mitigate risk, ensure business continuity, increase competitive advantage
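In practice, the ‘Technical’ strand (the same bits in the same order, plus metadata to audit progress) is commonly implemented with checksums, often called fixity information. A minimal sketch using SHA-256 from Python’s standard `hashlib`; the record fields and function names are hypothetical choices for illustration, not a standard preservation metadata schema such as PREMIS.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_fixity_record(path: Path) -> dict:
    """Hypothetical fixity record: which file, its size, its hash, and when it was checked."""
    return {
        "file": path.name,
        "size_bytes": path.stat().st_size,
        "sha256": sha256_of(path),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def verify_fixity(path: Path, record: dict) -> bool:
    """True if the file still holds the same bits, in the same order, as when recorded."""
    return (path.stat().st_size == record["size_bytes"]
            and sha256_of(path) == record["sha256"])


if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as tmp:
        doc = Path(tmp) / "report.txt"
        doc.write_bytes(b"Bit rot blues")
        record = make_fixity_record(doc)
        print(json.dumps(record, indent=2))
        print(verify_fixity(doc, record))  # True
        doc.write_bytes(b"Bit rot blue5")  # simulate a single corrupted byte
        print(verify_fixity(doc, record))  # False
```

Re-running `verify_fixity` on a schedule, and logging each result alongside the record, is one way to produce the audit trail the ‘Technical’ point calls for.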

What we are looking for …

How to do it

• Raise awareness: get informed and get everyone on the same page

• Audit digital resources: is preservation needed?

• Identify and categorise significant digital resources

• Develop a digital preservation policy (by category and format)

• Assess risks: start with high-risk/high-significance resources

• Get the legal issues sorted out

• Build a lifecycle model: assume obsolescence

• Establish metadata to manage resources over time

• Secure the bitstream: use robust, redundant storage

• Do something rather than wait for perfect solutions

• Collaborate, share results, stay informed…
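Auditing and categorising resources by format usually begins with identifying what each file actually is from its leading bytes (‘magic numbers’) rather than from its extension. A minimal sketch, assuming a tiny hypothetical signature table; dedicated tools such as DROID, which draws on the PRONOM format registry, cover far more formats and versions. Requires Python 3.9+.

```python
from pathlib import Path

# Hypothetical, deliberately tiny signature table: (leading bytes, format name).
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container (e.g. OOXML, ODF)"),
]


def identify(header: bytes) -> str:
    """Match a file's first bytes against the signature table."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def audit(folder: Path) -> dict[str, list[str]]:
    """Group every file under `folder` by identified format, ignoring extensions."""
    inventory: dict[str, list[str]] = {}
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            with path.open("rb") as f:
                fmt = identify(f.read(16))
            inventory.setdefault(fmt, []).append(str(path.relative_to(folder)))
    return inventory


if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "report.doc").write_bytes(b"%PDF-1.4 misnamed")  # extension says Word, bytes say PDF
        (root / "notes.txt").write_bytes(b"plain text")
        print(audit(root))  # {'unknown': ['notes.txt'], 'PDF': ['report.doc']}
```

The resulting inventory is a starting point for categorising resources and for writing policy by format; the ‘unknown’ group is often where the highest preservation risk sits.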

Models: OAIS

National Archives of Australia

• ‘Downstream’ approach (too many agencies)

• Keeping the original bitstream

• Using Xena software to “normalise” documents into selected open, fully documented, XML archival formats (including PNG, HTML, OOXML, PDF). Not all conversions are perfect.

• Lots of redundancy

• Physically separate transformation, access and long-term storage zones built with “commodity-level” ICT components
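Redundancy only protects the bitstream if the copies are compared regularly. A minimal sketch of one common technique, assuming three or more replicas of the same object held in separate storage zones (the file paths are hypothetical): hash each copy, treat the majority digest as authoritative, and overwrite any copy that disagrees. This is a generic illustration, not the National Archives’ actual system. Requires Python 3.9+.

```python
import hashlib
from collections import Counter
from pathlib import Path


def digest(path: Path) -> str:
    """SHA-256 of one replica's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def audit_replicas(replicas: list[Path]) -> tuple[str, list[Path]]:
    """Return the majority digest and the replicas that disagree with it."""
    digests = {p: digest(p) for p in replicas}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        # No strict majority (e.g. a 2-2 split): do not guess which copy is good.
        raise RuntimeError("no majority among replicas; investigate manually")
    bad = [p for p, d in digests.items() if d != winner]
    return winner, bad


def repair(replicas: list[Path]) -> list[Path]:
    """Overwrite disagreeing replicas with a majority copy; return the repaired paths."""
    _, bad = audit_replicas(replicas)
    good = next(p for p in replicas if p not in bad)
    good_bytes = good.read_bytes()
    for p in bad:
        p.write_bytes(good_bytes)
    return bad


if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical replicas standing in for copies held in separate storage zones.
        copies = [Path(tmp) / f"zone{i}.bin" for i in range(3)]
        for c in copies:
            c.write_bytes(b"original bitstream")
        copies[1].write_bytes(b"originaI bitstream")  # simulated bit rot in one zone
        print([c.name for c in repair(copies)])  # ['zone1.bin']
        print(audit_replicas(copies)[1])  # []
```

Requiring a strict majority is a deliberate choice: with only two copies, or an even split, a checksum mismatch tells you something is wrong but not which copy to trust.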

Diagram: Digital Repository workflow. Transferred item in any format (Source Master) → Normalisation → Preservation master in XML (Preservation/Accessibility Master) → Transformation → Access copy in format of the day
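The repository workflow on this slide keeps the transferred item untouched as a Source Master, normalises it into an XML Preservation Master, and transforms that into an access copy in the format of the day. A minimal sketch of the same three steps for UTF-8 plain-text files only, using Python’s standard `xml.etree.ElementTree`; the XML element names, the file-naming scheme and the HTML access copy are hypothetical choices for illustration, not Xena’s actual output. Requires Python 3.9+.

```python
import hashlib
import html
import xml.etree.ElementTree as ET
from pathlib import Path


def ingest(source: Path, repo: Path) -> dict[str, Path]:
    """Store a source master, an XML preservation master and an access copy."""
    repo.mkdir(parents=True, exist_ok=True)
    original = source.read_bytes()

    # 1. Source master: keep the transferred bitstream exactly as received.
    source_master = repo / f"{source.name}.source"
    source_master.write_bytes(original)

    # 2. Normalisation: wrap the decoded text in a simple XML layout
    #    (hypothetical element names; not Xena's format).
    root = ET.Element("preservationMaster")
    ET.SubElement(root, "sourceFile").text = source.name
    ET.SubElement(root, "sourceSha256").text = hashlib.sha256(original).hexdigest()
    ET.SubElement(root, "content").text = original.decode("utf-8")
    preservation_master = repo / f"{source.name}.xml"
    ET.ElementTree(root).write(preservation_master, encoding="utf-8", xml_declaration=True)

    # 3. Transformation: derive the access copy (here HTML) from the preservation
    #    master, not from the source, so it can be regenerated as formats age.
    text = ET.parse(preservation_master).getroot().findtext("content", default="")
    access_copy = repo / f"{source.name}.html"
    access_copy.write_text(
        f"<html><body><pre>{html.escape(text)}</pre></body></html>", encoding="utf-8"
    )

    return {"source": source_master, "preservation": preservation_master, "access": access_copy}
```

Deriving the access copy from the preservation master means that when today’s access format becomes obsolete, a new one can be generated without going back to the original transfer.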

Emerging digital issues

• Managing massive amounts of data

• Choosing what to keep: most data will be lost

• Recognising that the principles already exist: no need to reinvent the wheel

• Understanding that significant digital objects will outlast current systems

• Not primarily about technology but about policy and institutional and individual commitment

• Required skills (problem solving, management, communications, planning and a profound understanding of technologies) are not being taught

• Challenge is too large and too complex for any one institution, sector or country

‘The Internet is getting big and it’s happening fast’

Robert Orenstein, Author, The irresponsible Internet statistics generator

Source: http://www.anamorph.com/docs/stats/stats.html