RDMRose 2.2 Practical data management
Apr 15, 2023
Practical Data Management
Learning material produced by RDMRose http://www.sheffield.ac.uk/is/research/projects/rdmrose
Research Data Management Workshop 2.2
Learning outcomes
• By the end of this session you will be able to:
– Describe and apply practical principles of data management
– Select appropriate messages about practical data management for particular audiences (in terms of discipline and seniority)
Session 2.2 overview
• The importance of good data management
• Risk assessment
• Data quality
• Data security
• Teaching data management
Practical data management
• One of the key ways to motivate researchers for RDM is to consider the inherent importance of data quality management and the consequences of bad management
• Practical data management includes:
– Data quality
– Metadata quality, e.g. file naming
– Backing up data / data security
The importance of good data management
• The range of arguments for RDM includes data quality issues
• A good “starting point” for libraries in engaging with RDM is raising PhD researchers’ awareness
• Information professionals should already understand the principles of good data management
Activity: Practical messages for PhD students
• Brainstorm what you think might be key practical messages for PhD students about how to manage their data.
Activity: Using stories about what can go wrong
• If you wanted to use one of these stories to inform an audience of early career researchers, which would you pick and why?
Risk assessment
• Data practices should undergo risk assessment
• This implies categorising risks in terms of their severity and their likelihood, then determining possible stances (from toleration to terminating the activity)
• A risk log is a project management tool for monitoring risks
• The key to managing risk is often said to be:
– Planning early and continuing to update the plan
– Apportioning responsibility clearly
Risk assessment
Based on DATUM in Action (2012b) and JISC project guidelines.

| | Low severity | Medium severity | High severity |
| --- | --- | --- | --- |
| Low probability | Tolerate | Tolerate / Treat | Treat |
| Medium probability | Tolerate / Treat | Treat | Treat / Transfer |
| High probability | Treat | Treat / Transfer | Treat / Transfer / Terminate |
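
The matrix above can be read as a simple lookup from a (probability, severity) pair to the stances it suggests. The following is a minimal sketch of that lookup; the names `RISK_MATRIX` and `possible_stances` are illustrative assumptions and do not come from DATUM or JISC.

```python
# Stances from the risk matrix, keyed by (probability, severity).
# The dictionary and function names are hypothetical, chosen for this sketch.
RISK_MATRIX = {
    ("low", "low"): ["Tolerate"],
    ("low", "medium"): ["Tolerate", "Treat"],
    ("low", "high"): ["Treat"],
    ("medium", "low"): ["Tolerate", "Treat"],
    ("medium", "medium"): ["Treat"],
    ("medium", "high"): ["Treat", "Transfer"],
    ("high", "low"): ["Treat"],
    ("high", "medium"): ["Treat", "Transfer"],
    ("high", "high"): ["Treat", "Transfer", "Terminate"],
}


def possible_stances(probability: str, severity: str) -> list[str]:
    """Return the stances the matrix suggests for one risk."""
    key = (probability.strip().lower(), severity.strip().lower())
    if key not in RISK_MATRIX:
        raise ValueError(f"Expected low/medium/high levels, got {key}")
    return RISK_MATRIX[key]


print(possible_stances("High", "Medium"))
```

The lookup only narrows the options; choosing between, say, treating and transferring a risk remains a judgement for the project team.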
Risk logs

| Risk description (Condition/Cause/Consequence) | Probability (P), 1–5 (1 = low, 5 = high) | Severity (S), 1–5 (1 = low, 5 = high) | Risk score (P × S) | Timescale | Owner | Detail of action to be taken |
| --- | --- | --- | --- | --- | --- | --- |

Based on JISC infoNet (2012).
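
As a rough illustration of how the risk score column works, the sketch below models one row of a risk log and computes the score as probability multiplied by severity. The class name, field names and the two example risks are assumptions made up for this sketch, not entries from the JISC infoNet template.

```python
from dataclasses import dataclass


@dataclass
class RiskLogEntry:
    """One row of a risk log (field names are illustrative)."""
    description: str
    probability: int  # 1-5, where 1 = low and 5 = high
    severity: int     # 1-5, where 1 = low and 5 = high
    timescale: str
    owner: str
    action: str

    def __post_init__(self) -> None:
        # Reject scores outside the 1-5 scale used by the log.
        for field_name in ("probability", "severity"):
            value = getattr(self, field_name)
            if not 1 <= value <= 5:
                raise ValueError(f"{field_name} must be between 1 and 5, got {value}")

    @property
    def risk_score(self) -> int:
        # Risk score = P x S, as in the log's column heading.
        return self.probability * self.severity


# Hypothetical entries, for illustration only.
risk_log = [
    RiskLogEntry("Laptop holding interview recordings is stolen", 2, 5,
                 "Before fieldwork", "PhD student",
                 "Encrypt the disk and keep a copy on university storage"),
    RiskLogEntry("Backup drive fails unnoticed", 3, 4,
                 "Monthly", "Lab manager",
                 "Test-restore a sample of files"),
]

# Review the highest-scoring risks first.
ranked = sorted(risk_log, key=lambda entry: entry.risk_score, reverse=True)
for entry in ranked:
    print(entry.risk_score, entry.description)
```

Sorting by score gives a first ordering for review; the owner and timescale columns are what turn the log into a monitoring tool.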
Data threats
• Theft or loss of device
• Corruption of back up material
• Hard drive failures
• Difficulty locating data files
– Difficulty finding relevant version
• Colleagues move on, taking files with them so they cannot be consulted or leaving data without explanations of their source
• Files over-written
• Poor metadata
• Not enough information about context is supplied to understand the data
• Obsolescence of file types
High risk data (DATUM, 2012b)
• “Details relating to identifiable individuals the contents of which, if compromised, have the potential to cause damage or distress
• Any set of data relating to an identifiable individual’s sensitive personal details
• Data concerning any vulnerable individual
• Large data sets relating to 1,000 or more identifiable individuals
• Research recommendations, before the decision was officially announced
• Data that, if compromised, would affect contracts with commercial or other partners, or confidentiality and non-disclosure agreements
• Information that would compromise patent applications
• Any data that is the result of an un-repeatable study”
Data quality (Gordon, 2007)
• Completeness
• Correctness
• Enterprise awareness
• Input validation
• Integrity
• Currency
• Duplication
• Inconsistency
Quality controls in the research context
• Instrument calibration
• Taking multiple measurements
• Following protocols in taking measurements
• Validation rules
• Using controlled vocabularies
• Expert validation
• Statistical tests to identify anomalous values
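
Three of the controls listed above (validation rules, controlled vocabularies and statistical tests for anomalous values) lend themselves to a short sketch. Everything specific here is an assumption for illustration: the species vocabulary, the `height_m` field, the 0 to 60 range, and the choice of a median-based (MAD) modified z-score with a 3.5 cut-off, which is one common outlier test and not one prescribed by the slides.

```python
from statistics import median

# Hypothetical controlled vocabulary for a "species" field.
ALLOWED_SPECIES = {"oak", "beech", "birch"}


def validate_record(record: dict) -> list[str]:
    """Apply simple validation rules and return a list of problems found."""
    problems = []
    species = record.get("species")
    if species not in ALLOWED_SPECIES:
        problems.append(f"species {species!r} is not in the controlled vocabulary")
    height = record.get("height_m")
    if not isinstance(height, (int, float)) or not 0 < height <= 60:
        problems.append(f"height_m {height!r} is outside the allowed range 0-60")
    return problems


def flag_anomalies(values: list[float], threshold: float = 3.5) -> list[float]:
    """Flag values whose modified z-score (based on the median absolute
    deviation) exceeds the threshold."""
    centre = median(values)
    mad = median([abs(value - centre) for value in values])
    if mad == 0:
        return []  # No spread to measure against, so nothing is flagged.
    return [v for v in values if abs(0.6745 * (v - centre) / mad) > threshold]


print(validate_record({"species": "Oak tree", "height_m": -1}))
print(flag_anomalies([10.1, 10.3, 9.9, 10.0, 10.2, 25.0]))
```

A flagged value is a prompt for expert validation, not proof of an error; a genuine extreme measurement should be kept and documented.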
Metadata quality: file naming conventions
• “Data files are distinguishable from each other within their containing folder
• Data file naming prevents confusion when multiple people are working on shared files
• Data files are easier to locate and browse
• Data files can be retrieved not only by the creator but by other users
• Data files can be sorted in logical sequence
• Data files are not accidentally overwritten or deleted
• Different versions of data files can be identified
• If data files are moved to other storage platform their names will retain useful context”
(EDINA and Data Library, n.d.)
• Simplicity
• Avoid special characters, spaces
• Appropriate word order
• Rules about version control
(DATUM in Action, 2012a)
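
To make the naming principles above concrete, here is a minimal sketch of one possible convention: `YYYY-MM-DD_project_description_vNN.ext`. The pattern itself, and the function names `build_filename` and `follows_convention`, are assumptions for illustration; neither EDINA nor DATUM prescribes this exact format.

```python
import re
from datetime import date


def _clean(text: str) -> str:
    # Lower-case the text and replace runs of spaces or special
    # characters with a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project: str, description: str, day: date,
                   version: int, extension: str) -> str:
    """Build a name that sorts by date and carries an explicit version."""
    ext = extension.lstrip(".").lower()
    return f"{day.isoformat()}_{_clean(project)}_{_clean(description)}_v{version:02d}.{ext}"


# Pattern for checking whether an existing name follows the convention.
NAME_PATTERN = re.compile(
    r"^\d{4}-\d{2}-\d{2}_[a-z0-9-]+_[a-z0-9-]+_v\d{2}\.[a-z0-9]+$"
)


def follows_convention(name: str) -> bool:
    return bool(NAME_PATTERN.match(name))


name = build_filename("Soil Survey", "pH readings (site A)", date(2012, 3, 5), 2, "csv")
print(name)
print(follows_convention(name), follows_convention("final version 2.csv"))
```

Putting the ISO date first means an alphabetical sort is also a chronological one, and the zero-padded version number keeps `v02` ahead of `v10`.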
Data security: storage and back up
• Choice of media
• Frequency of back up
• How long are back ups stored?
• Security, if sensitive data
• Issues with cloud based services, such as Dropbox, e.g. procedures for restoring files, reliability
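
One way to guard against a corrupted back up, whatever the media, is to compare checksums of the original and the copy. The sketch below copies a file and verifies it with SHA-256; the function names and the single-folder layout are assumptions for illustration, and a real back up routine would also need to cover frequency, retention and access control.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> Path:
    """Copy one file into backup_dir and confirm the copy matches the original."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Backup of {source} does not match the original")
    return target
```

Storing the checksum alongside the back up makes it possible to re-check the copy later, which is also a cheap way to rehearse the restore procedure.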
REFERENCES
• DATUM in Action. (2012a). Folders and files – guidance. Newcastle: Northumbria University School of Computing, Engineering & Information Sciences. Retrieved from http://www.northumbria.ac.uk/static/5007/ceispdf/filenameguide.pdf
• DATUM in Action. (2012b). Information security guidance. Newcastle: Northumbria University School of Computing, Engineering & Information Sciences. Retrieved from http://www.northumbria.ac.uk/static/5007/ceispdf/infosecurity.pdf
• EDINA and Data Library, University of Edinburgh. (n.d.). Research Data MANTRA. Retrieved from http://datalib.edina.ac.uk/mantra/
• Gordon, K. (2007). Principles of Data Management: Facilitating Information Sharing. Swindon: British Computer Society.
• JISC infoNet. (2012). The Risk Log. Newcastle upon Tyne. Retrieved from http://www.jiscinfonet.ac.uk/infokits/risk-management/identifying-risk/risk-log
• UK Data Archive. (2011). Managing and Sharing Data: Best Practice for Researchers (3rd ed., fully revised). Colchester: University of Essex. Retrieved from http://www.data-archive.ac.uk/media/2894/managingsharing.pdf [This includes a useful checklist as an appendix.]