What is a Data Management Plan?
DMPs explained - or how to start making your research data FAIR & open
Myriam Mertens | Ghent University Library
They’re about making data available for reuse
Shift from traditional model of scholarly communication, where research data are undervalued & neglected
Degrees of data sharing
OPEN
“Can be freely used, modified & shared by anyone for any purpose” (http://opendefinition.org)
RESTRICTED
Limits on who can access & use data, how, or for what purpose
• only certain (types of) users
• only certain types of use
• …
CLOSED
• Under embargo
• Unable to share
“As open as possible, as closed as necessary”
Adapted from ‘Managing and sharing research data’ by S. Jones, CC-BY
FAIR data principles
• Describe attributes that enable & enhance data re-use by humans and machines
• Originated in the life sciences, but gaining much traction beyond
• Spectrum: data can be FAIR to a greater or lesser degree
https://www.nature.com/articles/sdata201618
Findable: It should be possible for others to discover your data. Rich metadata should be available online in a searchable resource, and the data should be assigned a persistent identifier (e.g. DOI, Handle…).
Accessible: It should be possible for humans and machines to gain access to your data (retrievable by their PID using a standard protocol such as HTTP), under specific conditions or restrictions where appropriate (authentication and authorization steps if necessary). There should be metadata, even if the data aren’t accessible.
Interoperable: Data and metadata should conform to recognized formats and standards to allow them to be combined & exchanged (file formats, metadata schemas, controlled vocabularies, keywords, ontologies, qualified references & links to other related data).
Reusable: Lots of documentation is needed to support data interpretation and reuse. It should be clear how, why & by whom data were created & processed (provenance). The data should conform to community norms and be clearly licensed so others know what kinds of reuse are permitted.
Adapted from ‘How FAIR are your data?’ checklist, CC-BY by Sarah Jones & Marjan Grootveld, EUDAT. Image CC-BY-SA by SangyaPundir
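As a minimal sketch of what "retrievable by their PID using a standard protocol" can look like in practice, the Python snippet below turns a DOI into an HTTPS address on the public doi.org resolver. The helper name doi_to_url is hypothetical and only for illustration; the DOI used is the one given further on in these slides for the Re3data.org article.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def doi_to_url(doi: str) -> str:
    """Return the resolver URL for a DOI (hypothetical helper, illustration only)."""
    doi = doi.strip()
    # Accept common prefixed spellings as well as the bare identifier.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with "10." and has a slash between prefix and suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the slashes.
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_url("10.1371/journal.pone.0078080"))
# prints https://doi.org/10.1371/journal.pone.0078080
```

Opening the resulting address in a browser, or requesting it with any HTTP client, leads to the landing page of the identified object.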
FAIR vs. Open?
Not synonyms - FAIR does not mean that data need to be open!
OPEN DATA
FAIR DATA
Data can be both, one, or neither
Also see the ARDC FAIR self-assessment tool
Adapted from ‘FAIR data: what it means, how we achieve it, and the role of RDA’ by S. Jones, CC-BY
Funder data sharing mandates
Increasingly Open & FAIR rhetoric
• e.g. EC: Horizon 2020, ERC…
http://ec.europa.eu/research/images/infographics/policy/open-data-2016-w920.png
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
Journal data availability policies
“PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. (…) Refusal to share data (…) will be grounds for rejection.”
https://journals.plos.org/plosone/s/data-availability
Institutional policy on research data
UGent adopted an RD policy (2016) + formally subscribed to the European Code of Conduct for Research Integrity (2018)
https://ec.europa.eu/research/participants/data/ref/h2020/other/hi/h2020-ethics_code-of-conduct_en.pdf
Expectations regarding research data include:
• secure preservation for a reasonable period
• access: as open as possible, as closed as necessary
• access in line with FAIR principles
• recognized as legitimate & citable products of research
It requires good research data management
The active management of data throughout their lifecycle
Planning for data management
Collecting or creating data
Processing & analyzing data
Preserving data
Giving access to data
Discovering & re-using data
RDM starts with planning
Decisions made early on affect what you can do later!
• Do your consent forms preclude later data sharing?
• Did you budget for RDM activities?
• Did you consider data issues in negotiations with your research partners?
• …
What is a DMP?
• Document describing how data will be handled during & after a project
• Increasingly required by research funders/institutions
• Good practice for any project using data!
“[DMPs] typically state[s] what data will be created and how, and outline[s] the plans for sharing and preservation, noting what is appropriate given the nature of the data and any restrictions that may need to be applied.” (DCC website)
Common topics in DMPs
1. What data will be collected/generated and how
  • content, type, format, volume, data capture methods…
2. How data will be documented
  • including metadata
3. Ethics & legal issues
  • informed consent, privacy & confidentiality, IP…
4. Strategy for short-term storage & backup
  • including data security
5. Strategy for long-term preservation beyond project end
  • what to keep, for how long, where…
6. Plans for access & sharing
  • how, when, restrictions, licenses…
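The six topics above can double as a simple completeness check on a draft plan. The following Python sketch is only an illustration: the topic labels are shortened versions of the list items, and the function name missing_topics and the draft structure are assumptions, not part of any DMP template or tool.

```python
# Shortened labels for the six common DMP topics listed above.
DMP_TOPICS = [
    "data description",
    "documentation and metadata",
    "ethics and legal issues",
    "storage and backup",
    "long-term preservation",
    "access and sharing",
]


def missing_topics(draft: dict[str, str]) -> list[str]:
    """Return the topics that have no (non-blank) answer in the draft."""
    return [topic for topic in DMP_TOPICS if not draft.get(topic, "").strip()]


# Hypothetical draft in which only two topics have been answered so far.
draft = {
    "data description": "Interview recordings (.flac) and transcripts (.xml).",
    "storage and backup": "University network drive plus daily external backup.",
    "access and sharing": "   ",  # blank answers count as missing
}
print(missing_topics(draft))
```

Running the check whenever the plan is revised fits the idea of treating the plan as a document that evolves during the project.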
Example – data description
“This project will produce qualitative observational data from interviews and fieldwork conducted at various locations across Finland between January and June 2017. Raw data will comprise digital audio recordings of interviews (stored in .flac format), digital images (.tiff format) and hand-written field notes. Audio files will be transcribed into digital text documents (.xml files) and notes will be digitized (via manual transcription to .txt files) to prepare them for analysis.”
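A data description like this one mentions formats and, ideally, volume. One way to get those figures for data that already exist is to tally a project folder by file extension. The Python sketch below assumes all files sit under a single root directory; the function name summarize_formats is hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def summarize_formats(root: Path) -> dict[str, tuple[int, int]]:
    """Map each file extension to (number of files, total size in bytes)."""
    counts: dict[str, int] = defaultdict(int)
    sizes: dict[str, int] = defaultdict(int)
    for path in root.rglob("*"):  # walk the folder recursively
        if path.is_file():
            # Treat ".FLAC" and ".flac" as the same format.
            ext = path.suffix.lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return {ext: (counts[ext], sizes[ext]) for ext in sorted(counts)}
```

Calling summarize_formats(Path("project_data")) on a (hypothetical) project folder returns one entry per format, which can be copied into the data description section of the plan.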
Example – data documentation, metadata
“Descriptive metadata of data items will be captured in XML files in accordance with the Darwin Core schema, which is an international metadata standard for biodiversity data. In addition, datasets will be accompanied by a separate readme.txt file providing study-level documentation including the field methods used for data collection.”
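To make the readme.txt part of this example concrete, here is a minimal Python sketch that assembles study-level documentation as plain text. The field names (Title, Creators, Field methods, Files) and all sample values are assumptions chosen for illustration; they do not follow any particular standard, and the sketch does not produce Darwin Core metadata.

```python
def build_readme(title: str, creators: list[str], methods: str,
                 files: dict[str, str]) -> str:
    """Assemble a plain-text readme with study-level documentation."""
    lines = [
        f"Title: {title}",
        "Creators: " + "; ".join(creators),
        "",
        "Field methods:",
        methods,
        "",
        "Files:",
    ]
    # One line per file so a reader can tell what each file contains.
    lines += [f"  {name}: {description}" for name, description in files.items()]
    return "\n".join(lines) + "\n"


# Hypothetical sample values.
readme = build_readme(
    title="Example field survey (hypothetical)",
    creators=["A. Researcher"],
    methods="Describe sampling sites, dates and instruments here.",
    files={
        "observations.xml": "item-level metadata records",
        "readme.txt": "this file",
    },
)
print(readme)
```

The resulting string can be saved next to the dataset, for example with Path("readme.txt").write_text(readme, encoding="utf-8") from the pathlib module.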
Metadata (“data about data”)
• Needed to find research data, and to get a first idea of the content
• Also used to further describe and annotate research data
• Don’t reinvent the wheel, use existing standards
  • some domains have their own standards
http://rd-alliance.github.io/metadata-directory
http://www.dcc.ac.uk/resources/metadata-standards
https://fairsharing.org
Documentation
Any information needed to fully assess, understand & properly re-use data
• codebook explaining variables
• study design
• (lab) journal or notebook
• code
• machine configurations
• informed consent templates
…
Example – data storage & backup
“A primary copy of digital files will be stored on a shared university network drive. DICT is in charge of backing up this network drive. In addition, I will make my own daily backups of important files on an external hard drive. Paper-based data and documentation files will be stored in clearly labeled folders in my office cabinet. In addition, they will be digitally scanned as a backup.”
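A backup only helps if the copy is intact. As a hedged illustration, the Python sketch below compares a primary file with its backup by SHA-256 checksum. The choice of SHA-256, the helper names and the file names are assumptions for this example; they are not part of the plan quoted above.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(primary: Path, backup: Path) -> bool:
    """True if both files have identical content according to SHA-256."""
    return sha256_of(primary) == sha256_of(backup)


# Small self-contained demo using temporary files with hypothetical names.
with tempfile.TemporaryDirectory() as tmp:
    primary = Path(tmp) / "interview_01.txt"
    backup = Path(tmp) / "interview_01_backup.txt"
    primary.write_text("transcript text\n", encoding="utf-8")
    shutil.copyfile(primary, backup)
    print(backup_matches(primary, backup))  # prints True
```

Reading in chunks keeps memory use low for large files such as audio recordings.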
Storage & backup
• 3-2-1 backup rule
  • have at least 3 copies of important files, on at least 2 different types of storage media, with at least 1 offsite copy
• Use university-managed services, not (just) local storage devices
  • regular, automated back-ups
• Check out UGent Information Security policy & guidelines
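The 3-2-1 rule can be written down as a small check. The Python sketch below is an illustration under stated assumptions: the Copy record, its fields and the sample storage plans are hypothetical, and deciding which locations count as "offsite" is left to the person filling them in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str  # where the copy lives, e.g. "university network drive"
    medium: str    # type of storage medium, e.g. "external hard drive"
    offsite: bool  # True if kept away from the primary work location


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """Check: at least 3 copies, on at least 2 media types, at least 1 offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({copy.medium for copy in copies}) >= 2
    has_offsite = any(copy.offsite for copy in copies)
    return enough_copies and enough_media and has_offsite


# Hypothetical plan with only two copies, both kept in the office.
office_only = [
    Copy("office PC", "internal disk", offsite=False),
    Copy("desk drawer", "external hard drive", offsite=False),
]
# Hypothetical plan with three copies on three media, one of them offsite.
full_plan = [
    Copy("office PC", "internal disk", offsite=False),
    Copy("university network drive", "network storage", offsite=False),
    Copy("external drive kept at home", "external hard drive", offsite=True),
]
print(satisfies_3_2_1(office_only))  # prints False
print(satisfies_3_2_1(full_plan))    # prints True
```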
Example – data preservation
“Raw data and documentation files will be offered for deposit to 4TU.ResearchData, which is a certified data repository accepting research data in the field of engineering and preserves them for a minimum of 15 years. Files will be offered in the repository’s preferred formats (.txt, .xml and JCAMP), and as the volume of data does not exceed 10GB the repository will not charge for the deposit.”
Data repositories for archiving & sharing data
Where to find one?
General purpose Domain-specific Institutional
http://www.re3data.org
Watch a Re3data demo: https://www.fosteropenscience.eu/content/re3data-demo
How to select a data repository?
• Does your funder or publisher recommend a repository?
• Does your domain have an established repository?
• Does it
  • provide a landing page for each dataset, with metadata?
  • provide a persistent & unique identifier?
  • have a certificate to indicate trustworthiness?
  • have an explicit commitment to keep data available long-term?
  • match your data needs regarding formats, access, licenses (including legal requirements for data protection)?
  • provide guidance on how to cite data?
  • charge for its services?
Icons representing attributes of data repositories in Re3data.org
10.1371/journal.pone.0078080
Example – data sharing
“My dataset will be made available upon publication of the associated journal article via the Zenodo repository. The data in Zenodo will be made open access and licensed under CC-BY.”
Licensing
Standard licenses include Creative Commons licenses
https://community.globalvoices.org/guide/editorial-guides/toolbox-for-authors/multimedia-copyright-and-attribution
http://www.dcc.ac.uk/resources/how-guides/license-research-data
Check out the EUDAT license selector
https://ufal.github.io/public-license-selector/
Example – restrictions on sharing
“Because my research data are part of a potentially patentable invention, sharing will be delayed to investigate patent protection first. Before any disclosure can take place, my research results will be reported to my university’s TechTransfer office, which will determine whether releasing data will need to be embargoed until after a patent application has been filed.”
Restrictions on sharing
Things to consider:
• Do the research data constitute personal data?
• Are they otherwise confidential?
• Are they otherwise sensitive?
• Do they constitute third-party data?
• Are you the sole owner of any copyright or database right in the data?
• Do they have economic valorization potential?
Ethical & legal issues can be complex!
• Check out FOSTER’s course on Data protection & ethics: https://www.fosteropenscience.eu/learning/data-protection-and-ethics/#/id/5ace27ca8ee5d6920ab94c13
• See the EUDAT legal guide: http://wiz.eudat.eu/#/app/home?lang=en&code=en
• Check applicable data policies & legislation
• Keep it simple, but be as specific as possible
• Justify your decisions
• Familiarize yourself with RDM terminology & best practices (for your field)
http://data-archive.ac.uk/media/2894/managingsharing.pdf
Online RDM training resources
• FOSTER training portal (incl. Managing & sharing research data course!)
• OpenAIRE webinars
• EUDAT training materials
• Digital Curation Centre How-to Guides & Checklists
• UK Data Archive ‘Create & Manage Data’ webpages
• MANTRA – Research Data Management Training
• ‘Research Data Management and Sharing’ MOOC on Coursera
• Data Management Training Clearinghouse
• CESSDA Expert Tour Guide on Data Management
• Consider it a ‘living’ document
• Use a DMP template
  • e.g. Horizon 2020 FAIR DMP, FWO DMP…
• Have a look at example DMPs
• Use an online planning tool
Example plans
• Examples on the Digital Curation Centre (DCC) website: http://www.dcc.ac.uk/resources/data-management-plans/guidance-examples
• LIBER DMP Catalogue: https://libereurope.eu/dmpcatalogue/
• Public DMPs on the DMPTool website: https://dmptool.org/public_dmps
• DMPs published in RIO (Research Ideas and Outcomes OA journal): http://riojournal.com/browse_user_collection_documents?collection_id=3
DMPonline.be
• Local instance of open source software developed by Digital Curation Centre (UK)
• Launched in 2015 for Ghent University researchers; since 2017: shared tool for users from DMPbelgium member institutions
• Contains DMP templates + tailored guidance (funders, institutions…)
How the tool works
Log in with your UGent credentials
Or sign in with ORCID
https://dmponline.be
How the tool works
Progress indicator
Section
Question
Write down your answer here
Leave a comment for collaborators
Custom guidance from funder, university, group…
How the tool works
• Create plans based on funder or institutional template & guidance
  • Funders: FWO, EC - Horizon 2020, ERC
  • UGent: generic or faculty-specific
• Share plans with others (various permission levels possible)
• Export plans in format of your choice (txt, docx, PDF)
• More details: https://researchtips.ugent.be/en/tips/00001281/
Credits - Slides
Slides draw heavily on/adapt materials from:
• S. Jones (2015), ‘Managing and Sharing Research Data’, licensed under CC-BY 4.0
• S. Jones (2016), ‘What is a Data Management Plan?’, licensed under CC-BY 4.0
• S. Jones & M. Grootveld, ‘How to write a Data Management Plan’, licensed under CC-BY 4.0
• S. Jones (2018), ‘The FAIR data concept’, licensed under CC-BY 4.0
• S. Jones (2018), ‘FAIR data: what it means, how we achieve it, and the role of RDA’, licensed under CC-BY 4.0
• M. Grootveld & E. Leenaerts, ‘How to manage your data to make them open and FAIR’, licensed under CC-BY 4.0
Credits - Images
• [slide 1]: CC0 image (https://pixabay.com/en/plan-do-act-check-system-workflow-1725510/)
• [slide 3]: ‘Publications and Data’ by Auke Herrema, licensed under CC-BY 4.0
• [slide 5-6]: FAIR data
• [slide 9]: From ‘Overcoming obstacles to sharing research data’ by Brian Hole, licensed under CC-BY
• [slide 10]: From ‘Open Research Data in Horizon 2020’, by European Commission
• [slide 14]: From ‘RDM: An Overview’ by Research Support Team, IT Services (University of Oxford), licensed under CC-BY-NC-SA 4.0
• [slide 15]: ‘Planning’ by Jørgen Stamp, attribution: digitalbevaring.dk, licensed under CC-BY 2.5 DK
• [slide 17]: ‘Writing’ by Aiconica, licensed under CC0 1.0
• [slide 20]: ‘Metadata’ by Jørgen Stamp, attribution: digitalbevaring.dk, licensed under CC-BY 2.5 DK
• [slide 22]: From ‘RDM: An Overview’ by Research Support Team, IT Services (University of Oxford), licensed under CC-BY-NC-SA 4.0
• [slide 24]: From ‘RDM: An Overview’ by Research Support Team, IT Services (University of Oxford), licensed under CC-BY-NC-SA 4.0
• [slide 27]: From ‘Making Research Data Repositories Visible: The Re3data.org Registry’ by H. Pampel et al., licensed under CC-BY 4.0
• [slide 28]: From ‘So you need an image’ by Global Voices Community Blog, licensed under CC-BY 3.0
• [slide 36]: ‘Preservation plan’ by Jørgen Stamp, attribution: digitalbevaring.dk, licensed under CC-BY 2.5 DK
• [slide 38]: ‘Tools’ by Jørgen Stamp, attribution: digitalbevaring.dk, licensed under CC-BY 2.5 DK
• [slide 43]: ‘Knowledge’ by Jørgen Stamp, attribution: digitalbevaring.dk, licensed under CC-BY 2.5 DK