RDMRose 2.2 Practical data management

Practical Data Management 8/28/22 Learning material produced by RDMRose http://www.sheffield.ac.uk/is/research/projects/rdmrose Research Data Management Workshop 2.2

Transcript of RDMRose 2.2 Practical data management

Page 1: RDMRose 2.2 Practical data management

Apr 15, 2023

Practical Data Management

Learning material produced by RDMRose http://www.sheffield.ac.uk/is/research/projects/rdmrose

Research Data Management Workshop 2.2


Learning outcomes

• By the end of this session you will be able to:
  – Describe and apply practical principles of data management
  – Select appropriate messages about practical data management for particular audiences (in terms of discipline and seniority)



Session 2.2 overview

• The importance of good data management
• Risk assessment
• Data quality
• Data security
• Teaching data management


Practical data management

• One of the key ways to motivate researchers for RDM is to consider the inherent importance of data quality management and the consequences of bad management

• Practical data management includes:
  – Data quality
  – Metadata quality, e.g. file naming
  – Backing up data / data security


The importance of good data management

• The range of arguments for RDM includes data quality issues

• A good “starting point” for libraries in engaging with RDM is raising PhD researchers’ awareness

• Information professionals should already understand the principles of good data management


Activity: Practical messages for PhD students

• Brainstorm what you think might be key practical messages for PhD students about how to manage their data.


Activity: Using stories about what can go wrong

• If you wanted to use one of these stories to inform an audience of early career researchers, which would you pick and why?


Risk assessment

• Data practices should undergo risk assessment
• This implies categorising risks in terms of their severity and their likelihood, then determining possible stances (from toleration to terminating the activity)
• A risk log is a project management tool for monitoring risks
• The key to managing risk is often said to be:
  – Planning early and continuing to update the plan
  – Apportioning responsibility clearly


Risk assessment

Based on DATUM in Action (2012b) and JISC project guidelines.

                     Low severity       Medium severity     High severity
Low probability      Tolerate           Tolerate / Treat    Treat
Medium probability   Tolerate / Treat   Treat               Treat / Transfer
High probability     Treat              Treat / Transfer    Treat / Transfer / Terminate
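The risk matrix above can be written down as a simple lookup table. The following Python sketch is only an illustration: the level names ("low", "medium", "high") and the function name `possible_stances` are assumptions made for this example, not part of the DATUM or JISC guidance.

```python
# Possible stances from the risk matrix, keyed by (probability, severity).
# The key spelling and the function name are illustrative assumptions.
RISK_MATRIX = {
    ("low", "low"): ["Tolerate"],
    ("low", "medium"): ["Tolerate", "Treat"],
    ("low", "high"): ["Treat"],
    ("medium", "low"): ["Tolerate", "Treat"],
    ("medium", "medium"): ["Treat"],
    ("medium", "high"): ["Treat", "Transfer"],
    ("high", "low"): ["Treat"],
    ("high", "medium"): ["Treat", "Transfer"],
    ("high", "high"): ["Treat", "Transfer", "Terminate"],
}


def possible_stances(probability, severity):
    """Return the stances the matrix suggests for a given risk."""
    return RISK_MATRIX[(probability.lower(), severity.lower())]


print(possible_stances("High", "Medium"))  # ['Treat', 'Transfer']
```

Keeping the matrix as data rather than as nested conditions makes it easy to compare against the slide and to adjust if an institution uses different stances.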


Risk logs

Columns of a risk log:

• Risk description (condition / cause / consequence)
• Probability (P): 1–5 (1 = low, 5 = high)
• Severity (S): 1–5 (1 = low, 5 = high)
• Risk score (P × S)
• Timescale
• Owner
• Detail of action to be taken

Based on JISC infoNet (2012).
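To make the risk score concrete, here is a minimal Python sketch of a risk log in which the score is computed as probability times severity, both on the 1 to 5 scale described above. The class name `RiskLogEntry`, the field names, and the two sample risks are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RiskLogEntry:
    """One row of a risk log; field names are illustrative assumptions."""
    description: str
    probability: int  # 1 = low, 5 = high
    severity: int     # 1 = low, 5 = high
    timescale: str
    owner: str
    action: str

    def __post_init__(self):
        # Reject values outside the 1-5 scale used by the risk log.
        for name in ("probability", "severity"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def score(self):
        """Risk score = probability x severity."""
        return self.probability * self.severity


# Hypothetical entries, invented for this example.
risk_log = [
    RiskLogEntry("Laptop holding interview recordings is stolen", 2, 5,
                 "Whole project", "PhD student", "Encrypt disk; keep a second copy"),
    RiskLogEntry("Shared files are overwritten by a colleague", 4, 3,
                 "Analysis phase", "Project lead", "Agree version naming rules"),
]

# List the highest-scoring risks first.
for entry in sorted(risk_log, key=lambda e: e.score, reverse=True):
    print(f"{entry.score:>2}  {entry.description} (owner: {entry.owner})")
```

Sorting by score is one simple way to decide which risks to review first; the log itself still needs a named owner and a planned action for each row.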


Data threats

• Theft or loss of device
• Corruption of back up material
• Hard drive failures
• Difficulty locating data files
  – Difficulty finding relevant version
• Colleagues move on, taking files with them so they cannot be consulted or leaving data without explanations of their source
• Files over-written
• Poor metadata
• Not enough information about context is supplied to understand the data
• Obsolescence of file types


High risk data (DATUM, 2012b)

• “Details relating to identifiable individuals the contents of which, if compromised, have the potential to cause damage or distress
• Any set of data relating to an identifiable individual’s sensitive personal details
• Data concerning any vulnerable individual
• Large data sets relating to 1,000 or more identifiable individuals
• Research recommendations, before the decision was officially announced
• Data that, if compromised, would affect contracts with commercial or other partners, or confidentiality and non-disclosure agreements
• Information that would compromise patent applications
• Any data that is the result of an un-repeatable study”


Data quality (Gordon, 2007)

• Completeness
• Correctness
• Enterprise awareness
• Input validation
• Integrity
• Currency
• Duplication
• Inconsistency


Quality controls in the research context

• Instrument calibration
• Taking multiple measurements
• Following protocols in taking measurements
• Validation rules
• Using controlled vocabularies
• Expert validation
• Statistical tests to identify anomalous values
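Three of the controls listed above (validation rules, controlled vocabularies, and statistical tests for anomalous values) lend themselves to a short Python sketch. Everything specific here is an assumption for illustration: the site vocabulary, the temperature range, the field names, and the choice of an interquartile-range rule as the statistical test.

```python
import statistics

# Hypothetical controlled vocabulary and plausible range, for illustration only.
ALLOWED_SITES = {"north", "south", "east", "west"}
TEMPERATURE_RANGE_C = (-50.0, 60.0)


def validate_record(record):
    """Return a list of rule violations for one record (empty if valid)."""
    errors = []
    if record.get("site") not in ALLOWED_SITES:
        errors.append(f"site {record.get('site')!r} is not in the controlled vocabulary")
    temperature = record.get("temperature_c")
    low, high = TEMPERATURE_RANGE_C
    if not isinstance(temperature, (int, float)) or not low <= temperature <= high:
        errors.append(f"temperature_c {temperature!r} is missing or out of range")
    return errors


def flag_anomalies(values, k=1.5):
    """Return values lying more than k interquartile ranges outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < lower or v > upper]


# A misspelt site name fails the vocabulary check.
print(validate_record({"site": "nroth", "temperature_c": 21.5}))
# One reading is far from the others and is flagged for expert review.
print(flag_anomalies([20.1, 19.8, 20.3, 20.0, 19.9, 35.2, 20.2]))
```

A flagged value is not necessarily wrong; the point of the check is to prompt a second look, which is where expert validation comes in.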


Metadata quality: file naming conventions

• “Data files are distinguishable from each other within their containing folder
• Data file naming prevents confusion when multiple people are working on shared files
• Data files are easier to locate and browse
• Data files can be retrieved not only by the creator but by other users
• Data files can be sorted in logical sequence
• Data files are not accidentally overwritten or deleted
• Different versions of data files can be identified
• If data files are moved to other storage platform their names will retain useful context”

(EDINA and Data Library, n.d.)

• Simplicity
• Avoid special characters, spaces
• Appropriate word order
• Rules about version control

(DATUM in Action, 2012a)
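The naming principles above (simplicity, no spaces or special characters, a consistent word order, and a version rule) can be enforced with a small helper. The pattern used in this Python sketch, date then project then description then version, is an assumed convention chosen for illustration; it is not taken from the DATUM or MANTRA guidance, and the function names are hypothetical.

```python
import re
from datetime import date

# Assumed convention: YYYYMMDD_project_description_vNN.ext
NAME_PATTERN = re.compile(r"^\d{8}_[A-Za-z0-9-]+_[A-Za-z0-9-]+_v\d{2}\.[a-z0-9]+$")


def clean(text):
    """Replace runs of spaces and special characters with a single hyphen."""
    return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-")


def make_file_name(project, description, version, extension, when):
    """Build a file name in a fixed word order with a two-digit version."""
    return f"{when:%Y%m%d}_{clean(project)}_{clean(description)}_v{version:02d}.{extension}"


def follows_convention(file_name):
    """Check whether an existing file name matches the assumed convention."""
    return NAME_PATTERN.match(file_name) is not None


name = make_file_name("Soil survey", "pH readings (site A)", 3, "csv", date(2013, 1, 15))
print(name)                                        # 20130115_Soil-survey_pH-readings-site-A_v03.csv
print(follows_convention(name))                    # True
print(follows_convention("final version (2).csv"))  # False
```

Putting the date first in year-month-day order means files sort chronologically, and a zero-padded version number keeps v02 ahead of v10.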


Data security: storage and back up

• Choice of media
• Frequency of back up
• How long are back ups stored?
• Security, if sensitive data
• Issues with cloud based services, such as Dropbox, e.g. procedures for restoring files, reliability
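Whatever media and frequency are chosen, a back up is only useful if the copy is intact. One common way to check this, offered here as an assumption rather than something the slides prescribe, is to compare checksums of the original and the copy. The function names and the temporary demo files in this Python sketch are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original, backup):
    """True if the back up copy has the same content as the original."""
    return sha256_of(original) == sha256_of(backup)


# Demonstration with throwaway files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    original = Path(folder) / "data.csv"
    backup = Path(folder) / "data_backup.csv"
    original.write_text("id,value\n1,20.1\n")
    shutil.copy2(original, backup)
    print(backup_matches(original, backup))  # True

    # Simulate a corrupted back up by changing its content.
    backup.write_text("id,value\n1,99.9\n")
    print(backup_matches(original, backup))  # False
```

A check like this can be run after each back up and again before relying on a restore, which addresses the "corruption of back up material" threat listed earlier.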



References

• DATUM in Action. (2012a). Folders and files – guidance. Newcastle: Northumbria University School of Computing, Engineering & Information Sciences. Retrieved from http://www.northumbria.ac.uk/static/5007/ceispdf/filenameguide.pdf
• DATUM in Action. (2012b). Information security guidance. Newcastle: Northumbria University School of Computing, Engineering & Information Sciences. Retrieved from http://www.northumbria.ac.uk/static/5007/ceispdf/infosecurity.pdf
• EDINA and Data Library, University of Edinburgh. (n.d.). Research Data MANTRA. Retrieved from http://datalib.edina.ac.uk/mantra/


• Gordon, K. (2007). Principles of Data Management: Facilitating Information Sharing. Swindon: British Computer Society.
• JISC infoNet. (2012). The Risk Log. Newcastle upon Tyne. Retrieved from http://www.jiscinfonet.ac.uk/infokits/risk-management/identifying-risk/risk-log
• UK Data Archive. (2011). Managing and Sharing Data: Best Practice for Researchers (3rd ed., fully revised). Colchester: University of Essex. Retrieved from http://www.data-archive.ac.uk/media/2894/managingsharing.pdf [This includes a useful checklist as an appendix.]