… because good research needs good data
Tools of the Trade Workshop, Manchester, 19th May 2010
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
DAF methodology & Glasgow Uni scoping study
Sarah Jones
DCC, University of Glasgow
www.data-audit.eu/
Background to DAF project
“JISC should develop a Data Audit Framework to enable all universities and colleges to carry out an audit of departmental data collections, awareness, policies and practice for data curation and preservation”
Liz Lyon, Dealing with Data: Roles, Rights, Responsibilities and Relationships, (2007)
The methodology
http://www.data-audit.eu/DAF_Methodology.pdf
Stage 1: planning
Objective
Determine what you want to find out and prepare work in advance
Process
- Define scope / expected outcomes
- Research organisational context
- Set up survey, interviews, meetings…
Stage 2: identifying data
Objective
Create inventory to understand scale of data
Process
Engage researchers to:
- Identify key data assets
- Classify data to restrict scope
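The inventory built at this stage can be pictured as a list of simple records. The sketch below is illustrative only: the field names, the classification labels ("vital", "important", "minor") and the sample entries are assumptions made for the example, and the DAF methodology document defines the actual scheme to use.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class DataAsset:
    """One row of a hypothetical data asset inventory."""
    name: str
    owner: str
    description: str
    classification: str  # assumed labels: "vital", "important", "minor"


def count_by_class(assets):
    """Summarise the scale of the inventory per classification."""
    return dict(Counter(a.classification for a in assets))


def restrict_scope(assets, keep=("vital",)):
    """Keep only the assets whose classification is in scope for the audit."""
    return [a for a in assets if a.classification in keep]


# Invented sample entries, for illustration only.
inventory = [
    DataAsset("survey responses", "Researcher A", "Questionnaire data", "vital"),
    DataAsset("field photos", "Researcher B", "Site images", "important"),
    DataAsset("draft figures", "Researcher A", "Working files", "minor"),
]

in_scope = restrict_scope(inventory)
```

Classifying first and filtering second keeps the full inventory intact, so the scope can be widened later without repeating the survey.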
Stage 3: assessing data management
Objective
Identify weaknesses in data management and potential risks
Process
- In-depth assessment of most crucial assets, given purpose of audit
- Discussion on lifecycle of data to assess data management
Stage 4: recommendations
Objective
Recommend changes to improve data management
Process
- Collate audit results
- Analyse data
- Suggest changes to mitigate weaknesses
DAF pilot implementations
• Early test cases: GeoSciences; Archaeology; Mechanical Engineering; Humanities
• University of Edinburgh: Physiology; Divinity; History; Brain Imaging; Astronomy
• University College London: Archaeology; Scandinavian Studies; Physics & Astronomy; Life & Medical Sciences
• Imperial College London: Chemical Engineering; Physics; Business School
• King’s College London: Geography; Psychiatry; Environmental Research; Biomedical and Health Sciences
• DataShare examples: Cardiac group; Dept of International Development; Social Sciences
Workshop on next steps for DAF
• Many of the pilots found the actual process of gathering information on data management was more valuable than the asset register. The DAF approach was felt to be useful for defining requirements to improve data management. (JISC funded RDMI projects)
• A suggestion was made to enhance DAF with practical examples / guidance from the pilot studies. (Implementation Guide)
• Align the DAF process with other data management planning tools. (IDMP project between AIDA, DAF, DRAMBORA, LIFE)
GU scoping studies
• Digital Preservation Advisory Board established at GU in 2008
• Keen to identify scale of digital preservation needs across the university
• Scoping studies ran in 2009 in:
  - Archaeology
  - Chemistry
  - Corporate Communications
  - Court Office
  - English Language
  - Electronics and Electrical Engineering
  - Evolutionary Ecology and Biology
  - MRC Social and Public Health Sciences Unit
Methodology
• Semi-structured interviews
  - interview framework sent in advance
  - some background research done before interview, e.g. reading staff profile
  - recorded (with permission), then transcribed and sent for comments
• Spoke with HoDs, researchers, teaching, admin and support staff
• Reviewed preliminary findings and increased scope
  - added more PhDs and ECRs, as most researchers we’d spoken to were senior
  - added Corporate Communications for ‘web’ perspective
• Spoke to additional key people at the Uni, e.g. William Nixon, repository manager; James Currall, security expert
Interview framework
1. what digital material is being created
2. how this is being created and maintained
3. any issues that have been encountered
4. plans for the long-term e.g. preservation, reuse
5. requirements for support and services.
http://www.gla.ac.uk/media/media_126658_en.pdf
What did we find?
Pockets of good practice…
… but a lot of confusion and need for support
• To connect data with documentation, we name files using a code number which is the person’s initials, the lab book number in roman numerals and then the experiment number
• We produced documentation workflows on how to take material from the DAT machines, how to transfer these into computer files, guidelines on transcription and anonymisation, and making derivatives. It’s all very well documented, which means there is consistency across the team, which is vitally important.
• It makes a huge difference if somebody can come and talk through problems and solutions with you. A personal contact like the RDOs is helpful.
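The file-naming convention quoted above (initials, lab book number in roman numerals, experiment number) is easy to generate consistently in code. This is a minimal sketch, not the group's actual scheme: the hyphen separators, the upper-casing of initials and the three-digit zero padding are assumptions, since the quote does not say how the parts are joined.

```python
# Value/numeral pairs in descending order, including subtractive forms.
_ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(number: int) -> str:
    """Convert an integer in the range 1..3999 to a roman numeral."""
    if not 0 < number < 4000:
        raise ValueError("roman numerals are only defined here for 1..3999")
    parts = []
    for value, numeral in _ROMAN:
        count, number = divmod(number, value)
        parts.append(numeral * count)
    return "".join(parts)


def data_file_code(initials: str, lab_book: int, experiment: int) -> str:
    """Build a code linking a data file to its lab book entry.

    Separator and padding are assumptions made for this sketch.
    """
    return f"{initials.upper()}-{to_roman(lab_book)}-{experiment:03d}"


example_code = data_file_code("sj", 4, 12)
```

Generating the code from the three parts, rather than typing it by hand, keeps file names and lab book entries in step across a team.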
Procedures for creation & management
• The network has always been the bane of everyone’s lives to find stuff on - you end up opening umpteen files to see if it’s the one you’re after
• Digital images are a classic case in point as many still have the numerical ordering and cryptic letter sequence auto-generated by the camera.
• The paper records system hasn’t transferred easily to the digital
• The volume of data produced makes maintenance a bit like drinking from a fire hose.
• The licence is very expensive and if this weren’t renewed it wouldn’t be possible to continue to access the data
• They had major problems last year moving from ArcGIS 9.1 to 9.3 – everything stopped working as they’d changed the geo-database format. It was not straightforward to fix…
Storage and backup
• Research groups tend to run their own little fiefdom. The correlation seems to be the more computers they have, the less IT expertise there is.
• If they throw some money at the problem they can install another networked drive and the problem goes away for a while
• Insufficient backup space is a recurring problem, but it’s not really a lack of space, it’s more an issue of not being able to control what people store on their hard drives.
• People bring in sticks with 4GB of data on that simply no longer work and nothing can be done to retrieve it.
• Large and reliable storage is expensive. You need this for home directories but things that are to be archived or backed up could be punted out of the way to cheaper storage.
Selection / long-term preservation
• It’s one thing to keep something going, but are people still able to use it in the same way?
• If the website comes to an end, the data could still be preserved, but you lose the richness of being able to search that, or see it on a map, or have them synchronised.
• It’s like giving your baby away
• If I know the code will be public I’ll pay more attention to properly annotating it with comments so other people can understand it.
• Probably only one tenth of what’s currently held should be retained.
• How do you decide what can be deleted? I’m not confident to make that decision.
• Archiving is to allow someone else to reuse it
What next…
• DPAB continues to address this at senior management level
• JISC-funded Incremental project (part of MRD programme)
• Ensuring researchers can find guidance and support when needed
• Making data training and guidance more understandable to researchers
• Offering tailored support and partnering
http://www.lib.cam.ac.uk/preservation/incremental/index.html