2013 11-27 sustainable-history_slides
Preserving research data for
the future
Dr James Baker, Digital Curator
@j_w_baker
www.bl.uk 2
Some admin…
You are free to:
– Copy, share, adapt, or re-mix
– Photograph, film, or broadcast
– Blog, live-blog, or post video of
this presentation, provided that:
– You attribute the work to its author
and respect the rights and licences
associated with its components
– You distribute the resulting work only
under the same or similar license to
this one
Text attribution: Greg Wilson, Two Solitudes, SPLASH 2013 (29 October 2013)
http://www.slideshare.net/gvwilson/splash-2013
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0
Unported License unless stated otherwise.
‘the fragility of evidence in the digital era’
‘[the digital] archive is considerably more fragile than one
would like’
‘The simultaneous fragility and promiscuity of digital data’
Roy Rosenzweig, Scarcity or Abundance? Preserving the Past in a Digital Era,
The American Historical Review 108:3 (2003), 736, 737, 739.
‘The core guiding principle is simple: Someone unfamiliar with
your project should be able to look at your computer
files and understand in detail what you did and why […]
Most commonly, however, that “someone” is you. A few
months from now, you may not remember what you were up to when you
created a particular set of files, or you may not remember what
conclusions you drew. You will either have to then spend time
reconstructing your previous experiments or lose whatever insights you
gained from those experiments.’
William Stafford Noble (2009) A Quick Guide to Organizing Computational Biology
Projects. PLoS Comput Biol 5(7): e1000424. doi:10.1371/journal.pcbi.1000424
Preservation and use
‘What is often said of military strategy seems to apply to digital
preservation: "the greatest enemy of a good plan is the
dream of a perfect plan." We have never preserved everything; we
need to start preserving something.’
Roy Rosenzweig, Scarcity or Abundance? Preserving the Past in a Digital Era,
The American Historical Review 108:3 (2003), 754.
and, how do we plan for the future - the unknown future of
digital space, digital dissemination, and digital information?
Heather Froehlich (heatherfro). “and, how do we plan for the future - the unknown
future of digital space, digital dissemination, and digital information?” 4 November
2013, 5:15 a.m. Tweet.
What demands the closest attention?
Victory is mine: while ago I worked out some Clever Stuff
(tm) in Excel. And I MADE NOTES ON IT. And those notes
ENABLED ME TO DO IT AGAIN.
Katie Birkwood (girlinthe). “Victory is mine: while ago I worked out some Clever
Stuff (tm) in Excel. And I MADE NOTES ON IT. And those notes ENABLED ME
TO DO IT AGAIN.” 7 October 2013, 3:46 a.m. Tweet.
Documentation
Good documentation must:
– Include ‘the archive references for the originals!’
– Explain the [source or] ‘dataset (and its limitations)
accurately’.
– Be ‘clear about what it represents (eg full transcriptions,
partial transcriptions, just summaries, changes, iterations)’.
– Be written ‘in a structured data format to make it
machine-readable […] Plain text files (.txt) are preferable to Word
docs’.
Documentation
Sharon Howard, ‘Unclean, unclean! What historians can do about sharing
our messy research data’, Early Modern Notes (18 May 2013)
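As a minimal sketch of the points above, documentation can be kept as a plain text file with one labelled line per point. The field names and the example values here are hypothetical, chosen only to mirror the four points on the slide; they are not taken from Howard's post.

```python
from pathlib import Path

# Hypothetical field names, one per point on the slide.
FIELDS = ("archive_reference", "description", "limitations", "represents")


def render_documentation(record: dict) -> str:
    """Return one 'field: value' line per field, always in the same order."""
    missing = [name for name in FIELDS if not record.get(name)]
    if missing:
        raise ValueError(f"missing documentation fields: {', '.join(missing)}")
    return "".join(f"{name}: {record[name]}\n" for name in FIELDS)


def write_documentation(path: Path, record: dict) -> None:
    """Write the documentation as a UTF-8 plain text (.txt) file."""
    path.write_text(render_documentation(record), encoding="utf-8")


# Placeholder values only; replace with the real details of a dataset.
example = {
    "archive_reference": "EXAMPLE/REF/1",
    "description": "placeholder description of the dataset",
    "limitations": "placeholder note on what is missing or uncertain",
    "represents": "partial transcriptions",
}
print(render_documentation(example))
```

Refusing to write a file with an empty field is a design choice of this sketch: it makes a missing archive reference visible at the moment of writing rather than months later.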
"Word is not a digital preservation standard" -
understatement of the day #SearchSolutions2013
Helen Lippell (octodude). “"Word is not a digital preservation standard" -
understatement of the day #SearchSolutions2013” 27 November 2013, 12:36
a.m. Tweet.
Documentation
Notes on digital books.docx NO!
2013-11-18_MS_books_documentation.txt YES!
Extensible, scalable, reusable
\root\
    Admin
    Attic
    BL
    Notes
    Research
    Teaching
\root\BL\
    Admin
    Attic
    Data
    Events
    Projects
    Research
    Talks
    Teaching
\root\BL\Talks\
    2013-11_Sustainable_History
    2013-11_Liverpool_John_Moores
    2013-05_Going_Digital
(Extensible, scalable, reusable) Structure
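The folder layout on this slide could be created in one step, so that every new project starts from the same skeleton. This is only a sketch: the folder names come from the slide, while the function name `create_skeleton` and the idea of passing in a root folder are assumptions made for illustration.

```python
from pathlib import Path

# Folder names as listed on the slide.
TOP_LEVEL = ["Admin", "Attic", "BL", "Notes", "Research", "Teaching"]
BL_LEVEL = ["Admin", "Attic", "Data", "Events", "Projects",
            "Research", "Talks", "Teaching"]


def create_skeleton(root: Path) -> list[Path]:
    """Create the top-level folders and the folders under BL.

    Existing folders are left untouched, so the function is safe to re-run.
    """
    folders = [root / name for name in TOP_LEVEL]
    folders += [root / "BL" / name for name in BL_LEVEL]
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

Calling `create_skeleton(Path("root"))` would create `root/BL/Talks` and the other folders beneath the current working directory.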
2013-08-11_History_Journal_Articles.tsv
2013-08-11_History_Journal_Articles.txt
(Extensible, scalable, reusable) Naming
2013-08-11_History_Journal_Articles_africa.tsv
2013-08-11_History_Journal_Articles_america.tsv
2013-09-11_History_Journal_Articles_art.tsv
2013-09-11_History_Journal_Articles_britain.tsv
2013_History_Journal_Articles.txt
copy *.tsv newfile.tsv
copy 2013-08*.tsv newfile.tsv
copy 2013-0*-11_History_Journal_Articles_a*.tsv newfile.tsv
(Extensible, scalable, reusable) Naming
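The `copy` commands above work because consistent names can be selected with wildcards. A rough Python equivalent is sketched below, assuming the five filenames from the slide; `select` and `concatenate` are hypothetical helper names, and `concatenate` only approximates what `copy` does when several files are copied into one.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# The filenames shown on the slide.
FILES = [
    "2013-08-11_History_Journal_Articles_africa.tsv",
    "2013-08-11_History_Journal_Articles_america.tsv",
    "2013-09-11_History_Journal_Articles_art.tsv",
    "2013-09-11_History_Journal_Articles_britain.tsv",
    "2013_History_Journal_Articles.txt",
]


def select(names, pattern):
    """Return the names that match a wildcard pattern, in their given order."""
    return [name for name in names if fnmatchcase(name, pattern)]


def concatenate(folder: Path, pattern: str, destination: Path) -> int:
    """Join every matching file in `folder` into `destination`, in name order.

    The destination itself is skipped in case it also matches the pattern.
    Returns the number of files that were joined.
    """
    sources = sorted(p for p in folder.glob(pattern) if p != destination)
    with destination.open("wb") as out:
        for source in sources:
            out.write(source.read_bytes())
    return len(sources)


for pattern in ("*.tsv",
                "2013-08*.tsv",
                "2013-0*-11_History_Journal_Articles_a*.tsv"):
    print(pattern, "matches", len(select(FILES, pattern)), "files")
```

The three patterns select four, two, and three of the files respectively, and the `.txt` documentation file is never swept up with the data.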
DATE_ARTIST_TITLE.FORMAT
1804-02-10_Gillray_TheKingofBrobdignagandGulliver.png
1653_Rembrandt_TheThreeCrosses.png
(Extensible, scalable, reusable) Naming
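A name that follows DATE_ARTIST_TITLE.FORMAT can be split back into its parts by a program, which is one way to check that a folder of images follows the convention. The sketch below assumes exactly three underscore-separated parts and a date written as YYYY, YYYY-MM, or YYYY-MM-DD; both assumptions go beyond what the slide states, and `parse_name` is a hypothetical helper.

```python
import re

# Assumed shape: DATE_ARTIST_TITLE.FORMAT, with no underscores inside a part.
NAME_PATTERN = re.compile(
    r"^(?P<date>\d{4}(?:-\d{2}(?:-\d{2})?)?)"
    r"_(?P<artist>[^_]+)"
    r"_(?P<title>[^_.]+)"
    r"\.(?P<format>[A-Za-z0-9]+)$"
)


def parse_name(filename: str) -> dict:
    """Split a filename into date, artist, title, and format."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"does not follow DATE_ARTIST_TITLE.FORMAT: {filename}")
    return match.groupdict()


names = [
    "1804-02-10_Gillray_TheKingofBrobdignagandGulliver.png",
    "1653_Rembrandt_TheThreeCrosses.png",
]
for name in sorted(names):  # Leading dates make a plain sort chronological.
    print(parse_name(name))
```

Because the date comes first and is written year-first, an ordinary alphabetical sort lists the Rembrandt (1653) before the Gillray (1804) with no extra code.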
1653_Rembrandt_TheThreeCrosses.png
1653_Rembrandt_TheThreeCrosses_edited.png
2013-08-11_History_Journal_Articles_africa.tsv
2013-11-18_History_Journal_Articles_africa_3column.tsv
2013-11-18_Sustainable_History_talk.docx
2013-11-18a_Sustainable_History_talk.docx
…
2013-11-19_Sustainable_History_talk.docx
(Extensible, scalable, reusable) Version Control-lite
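The dated and lettered filenames above suggest a simple rule: save each new version under today's date, and add a letter when that date is already taken. A minimal sketch of that rule follows. The slide shows only the suffix "a" followed by "…", so continuing with "b", "c", and so on is an assumption, as is the helper name `next_version_name`.

```python
import string


def next_version_name(existing, date: str, rest: str) -> str:
    """Return the first unused name of the form DATE[letter]_REST.

    `existing` is any collection of filenames already in the folder.
    The plain date is tried first, then the date followed by a, b, c, ...
    """
    for suffix in [""] + list(string.ascii_lowercase):
        candidate = f"{date}{suffix}_{rest}"
        if candidate not in existing:
            return candidate
    raise ValueError(f"no free version name left for {date}_{rest}")


saved = {"2013-11-18_Sustainable_History_talk.docx"}
print(next_version_name(saved, "2013-11-18", "Sustainable_History_talk.docx"))
print(next_version_name(saved, "2013-11-19", "Sustainable_History_talk.docx"))
```

Nothing is overwritten under this scheme, so earlier versions remain available to return to.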
1. We integrate curatorial assessments of our digital collection content
into preservation decisions, so that technical activities support
curatorial requirements for the collections
2. We preserve metadata about our digital collections, so that we may
understand and preserve the collections over time
3. We preserve the provenance of our digital collection content, so that
we understand and can demonstrate its authenticity over time
4. We record any modifications to digital collection content (e.g.
preservation action, normalisation) during the lifecycle, so that we can
understand and demonstrate its integrity over time
5. We consistently apply and document our application of metadata
standards, so that future generations can understand our collections
6. We maintain file-level integrity of our digital collections, so that
we can protect against loss and damage
7. We preserve original files in our long term repository, alongside any
other required representations of the content, so that we maintain the
original artefacts acquired or deposited into our care as a ground truth
representation of the content for future, currently unknown, preservation
and access scenarios
8. We maintain Preservation Master copies of collection content in our
long term repository, so that the format-based risks of preservation over
time are minimised
9. We maintain and implement preservation plans for our digital
collections, so that preservation actions are reliable and based on a
holistic understanding of the collections and their context
10. We implement comprehensive end-to-end workflows, so that we may
consistently manage and preserve our digital collections across the
entire lifecycle
11. We regularly monitor our digital collection content for emergent
preservation risks, so that we may mitigate against them
12. We integrate quality assurance checks into the lifecycle where
appropriate, so that the authenticity and integrity of the content is
maintained
Maureen Pennock, 'The Twelve Principles of Digital Preservation (and a
cartridge in a repository…)', British Library Collection Care blog
(3 September 2013)
(Extensible, scalable, reusable) Review
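Principle 6, file-level integrity, is one that an individual researcher can approximate on a small scale by recording a checksum for every file and checking it again later. The sketch below uses SHA-256 from Python's standard library. It illustrates the general technique only and is not a description of the British Library's own systems; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under `root` (by relative path) to its checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files from the manifest that are now missing or altered."""
    current = build_manifest(root)
    return sorted(
        name for name in manifest if current.get(name) != manifest[name]
    )
```

A manifest built when the data is first stored, kept as a plain text file alongside it, lets a later run of `changed_files` show whether anything has been lost or damaged.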