What is a Data Management Plan?

DMPs explained - or how to start making your research data FAIR & open Myriam Mertens | Ghent University Library


Open data, FAIR data? Some clarifications

2

They’re about making data available for reuse

3

Shift from traditional model of scholarly communication, where research data are undervalued & neglected

Degrees of data sharing

4

• OPEN: “Can be freely used, modified & shared by anyone for any purpose” (http://opendefinition.org)

• RESTRICTED: limits on who can access & use data, how, or for what purpose
  • only certain (types of) users
  • only certain types of use
  • …

• CLOSED: under embargo, or unable to share

“As open as possible, as closed as necessary”

Adapted from ‘Managing and sharing research data’ by S. Jones, CC-BY

FAIR data principles

• Describe attributes that enable & enhance data re-use by humans and machines

• Originated in the life sciences, but gaining much traction beyond

• Spectrum: data can be FAIR to a greater or lesser degree

5

https://www.nature.com/articles/sdata201618

6

Findable: It should be possible for others to discover your data. Rich metadata should be available online in a searchable resource, and the data should be assigned a persistent identifier (e.g. DOI, Handle…).

Accessible: It should be possible for humans and machines to gain access to your data (retrievable by their PID using a standard protocol such as HTTP), under specific conditions or restrictions where appropriate (authentication and authorization steps if necessary). There should be metadata, even if the data aren’t accessible.

Interoperable: Data and metadata should conform to recognized formats and standards to allow them to be combined & exchanged (file formats, metadata schemas, controlled vocabularies, keywords, ontologies, qualified references & links to other related data).

Reusable: Lots of documentation is needed to support data interpretation and reuse. It should be clear how, why & by whom data were created & processed (provenance). The data should conform to community norms and be clearly licensed so others know what kinds of reuse are permitted.

Adapted from ‘How FAIR are your data?’ checklist, CC-BY by Sarah Jones & Marjan Grootveld, EUDAT. Image CC-BY-SA by SangyaPundir
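
The four descriptions above can be turned into a rough self-check. The sketch below is illustrative only: the record fields (`identifier`, `metadata_url`, `access_protocol`, `file_formats`, `license`, `provenance`) and the `RECOGNIZED_FORMATS` set are assumptions made up for this example. They are not part of the FAIR principles or of any repository's schema, and passing these checks does not make a dataset FAIR.

```python
# Rough FAIR self-check over a hypothetical metadata record (a plain dict).
# All field names and the RECOGNIZED_FORMATS set are assumptions for illustration.

RECOGNIZED_FORMATS = {".csv", ".txt", ".xml", ".tiff", ".flac"}


def fair_gaps(record):
    """Return a dict mapping each FAIR letter to a list of unmet checks."""
    gaps = {"F": [], "A": [], "I": [], "R": []}

    # Findable: a persistent identifier plus metadata published online.
    if not record.get("identifier"):
        gaps["F"].append("no persistent identifier (e.g. DOI, Handle)")
    if not record.get("metadata_url"):
        gaps["F"].append("metadata not available online")

    # Accessible: retrievable over a standard protocol.
    if record.get("access_protocol") not in {"http", "https"}:
        gaps["A"].append("no standard access protocol")

    # Interoperable: every file format is in the recognized set.
    formats = set(record.get("file_formats", []))
    if not formats or not formats <= RECOGNIZED_FORMATS:
        gaps["I"].append("file formats missing or not recognized")

    # Reusable: a clear license and provenance information.
    if not record.get("license"):
        gaps["R"].append("no license")
    if not record.get("provenance"):
        gaps["R"].append("no provenance (how, why & by whom)")

    return gaps


example_record = {
    "identifier": "10.1234/made-up-doi",          # made-up DOI
    "metadata_url": "https://example.org/ds/1",   # placeholder URL
    "access_protocol": "https",
    "file_formats": [".csv", ".txt"],
    "license": "CC-BY-4.0",
    "provenance": "",                             # deliberately left empty
}

for letter, problems in fair_gaps(example_record).items():
    print(letter, problems if problems else "ok")
```

Because FAIR is a spectrum, the function returns a list of gaps per principle rather than a single yes/no verdict.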

FAIR vs. Open?

Not synonyms - FAIR does not mean that data need to be open!

7

OPEN DATA

FAIR DATA

Data can be both, one, or neither

Also see the ARDC FAIR self-assessment tool

Adapted from ‘FAIR data: what it means, how we achieve it, and the role of RDA’ by S. Jones, CC-BY

Why share data? Benefits & drivers

8

Selfish and more altruistic reasons

9

Funder data sharing mandates

Increasingly Open & FAIR rhetoric

• e.g. EC: Horizon 2020, ERC…

10

http://ec.europa.eu/research/images/infographics/policy/open-data-2016-w920.png

http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf

Journal data availability policies

11

“PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. (…) Refusal to share data (…) will be grounds for rejection.”

https://journals.plos.org/plosone/s/data-availability

Institutional policy on research data

UGent adopted a RD policy (2016) + formally subscribed to European Code of Conduct for Research Integrity (2018)

12

https://ec.europa.eu/research/participants/data/ref/h2020/other/hi/h2020-ethics_code-of-conduct_en.pdf

Expectations regarding research data include:

• secure preservation for a reasonable period

• access: as open as possible, as closed as necessary

• access in line with FAIR principles

• recognized as legitimate & citable products of research

How to share research data, how to make them open/FAIR? Don’t treat it as an afterthought

13

It requires good research data management

The active management of data throughout their lifecycle

14

Planning for data management

Collecting or creating data

Processing & analyzing data

Preserving data

Giving access to data

Discovering & re-using data

RDM starts with planning

Decisions made early on affect what you can do later!

15

• do your consent forms preclude later data sharing?

• did you budget for RDM activities?

• did you consider data issues in negotiations with your research partners?

• …

How to plan for data management & sharing? Draft a Data Management Plan!

16

What is a DMP?

• Document describing how data will be handled during & after a project

• Increasingly required by research funders/institutions

• Good practice for any project using data!

17

“[DMPs] typically state[s] what data will be created and how, and outline[s] the plans for sharing and preservation, noting what is appropriate given the nature of the data and any restrictions that may need to be applied.” (DCC website)

Common topics in DMPs

1. What data will be collected/generated and how • content, type, format, volume, data capture methods…

2. How data will be documented • including metadata

3. Ethics & legal issues • informed consent, privacy & confidentiality, IP…

4. Strategy for short-term storage & backup • including data security

5. Strategy for long-term preservation beyond project end • what to keep, for how long, where…

6. Plans for access & sharing • how, when, restrictions, licenses…
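
The six topics above can serve as a skeleton while drafting. The sketch below is only an illustration: the topic labels are shortened from the list above, and the function name `render_dmp` and the `[TO DO]` placeholder are invented for this example. It does not reproduce any funder template.

```python
# Minimal DMP skeleton: render an outline and report unanswered topics.
# Topic labels are shortened from the list above; names are illustrative.

DMP_TOPICS = [
    "Data collection",
    "Documentation & metadata",
    "Ethics & legal issues",
    "Storage & backup",
    "Long-term preservation",
    "Access & sharing",
]


def render_dmp(answers):
    """Return (outline_text, missing_topics) for a dict of topic -> answer."""
    lines = []
    missing = []
    for number, topic in enumerate(DMP_TOPICS, start=1):
        text = answers.get(topic, "").strip()
        if not text:
            missing.append(topic)
            text = "[TO DO]"
        lines.append(f"{number}. {topic}")
        lines.append(f"   {text}")
    return "\n".join(lines), missing


draft = {
    "Data collection": "Interviews recorded as .flac audio, transcribed to .xml.",
    "Access & sharing": "Deposit in a repository on publication, CC-BY.",
}

outline, missing = render_dmp(draft)
print(outline)
print("Still to write:", ", ".join(missing))
```

Listing the unanswered topics makes it easy to treat the plan as a document that is filled in and revised over the course of the project.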

18

Example – data description

“This project will produce qualitative observational data from interviews and fieldwork conducted at various locations across Finland between January and June 2017. Raw data will comprise digital audio recordings of interviews (stored in .flac format), digital images (.tiff format) and hand-written field notes. Audio files will be transcribed into digital text documents (.xml files) and notes will be digitized (via manual transcription to .txt files) to prepare them for analysis.”
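
A data description like this one needs the formats and volume of the files involved. As a rough aid, the sketch below counts files and bytes per file extension in a project folder. The function name `inventory` and the folder name in the usage lines are placeholders chosen for this example.

```python
# Summarize a data folder by file extension: number of files and total bytes.
from collections import Counter
from pathlib import Path


def inventory(root):
    """Return {extension: (file_count, total_bytes)} for all files under root."""
    counts = Counter()
    sizes = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Normalize case so ".FLAC" and ".flac" are counted together.
            ext = path.suffix.lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return {ext: (counts[ext], sizes[ext]) for ext in sorted(counts)}


if __name__ == "__main__":
    # "project_data" is a placeholder folder name.
    for ext, (n_files, n_bytes) in inventory("project_data").items():
        print(f"{ext}: {n_files} file(s), {n_bytes} bytes")
```

Running this at intervals during a project gives up-to-date figures for the “type, format, volume” part of the plan.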

19

Example – data documentation, metadata

“Descriptive metadata of data items will be captured in XML files in accordance with the Darwin Core schema, which is an international metadata standard for biodiversity data. In addition, datasets will be accompanied by a separate readme.txt file providing study-level documentation including the field methods used for data collection.”
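
To give an idea of what such an XML metadata file might look like, the sketch below writes one record using Python's standard library. The terms used (`scientificName`, `eventDate`, `country`, `basisOfRecord`) are Darwin Core terms, but the wrapper elements and namespace URIs are written from memory of the Simple Darwin Core XML guide and should be verified against the current TDWG documentation. The record values are made up.

```python
# Sketch: serialize one record with Darwin Core terms to XML.
# Wrapper element names and namespace URIs are assumptions to verify
# against the TDWG Darwin Core documentation; the values are made up.
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"
DWR = "http://rs.tdwg.org/dwc/xsd/simpledarwincore/"
ET.register_namespace("dwc", DWC)
ET.register_namespace("dwr", DWR)


def darwin_core_record(fields):
    """Return an XML string with one record built from term -> value pairs."""
    root = ET.Element(f"{{{DWR}}}SimpleDarwinRecordSet")
    record = ET.SubElement(root, f"{{{DWR}}}SimpleDarwinRecord")
    for term, value in fields.items():
        ET.SubElement(record, f"{{{DWC}}}{term}").text = value
    return ET.tostring(root, encoding="unicode")


example_xml = darwin_core_record({
    "basisOfRecord": "HumanObservation",
    "scientificName": "Parus major",
    "eventDate": "2017-03-14",
    "country": "Finland",
})
print(example_xml)
```

Using the terms of an existing standard, rather than inventing field names, is what allows others to combine the records with their own data.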

20

Metadata (“data about data”)

• Needed to find research data, and to get a first idea of the content

• Also used to further describe and annotate research data

• Don’t reinvent the wheel, use existing standards
  • some domains have their own standards

21

http://rd-alliance.github.io/metadata-directory

http://www.dcc.ac.uk/resources/metadata-standards

https://fairsharing.org

Documentation

Any information needed to fully assess, understand & properly re-use data

• codebook explaining variables

• study design

• (lab) journal or notebook

• code

• machine configurations

• informed consent templates

22

Example – data storage & backup

23

“A primary copy of digital files will be stored on a shared university network drive. DICT is in charge of backing up this network drive. In addition, I will make my own daily backups of important files on an external hard drive. Paper-based data and documentation files will be stored in clearly labeled folders in my office cabinet. In addition, they will be digitally scanned as a backup.”

Storage & backup

• 3-2-1 backup rule
  • have at least 3 copies of important files, on at least 2 different types of storage media, with at least 1 offsite copy

• Use university-managed services, not (just) local storage devices
  • regular, automated back-ups

• Check out UGent Information Security policy & guidelines
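
The 3-2-1 rule can be written down as a simple check. The sketch below is illustrative: the `Copy` class, its fields, and the example plan are invented here, and which locations count as “offsite” or as a distinct storage medium is an assumption to settle with local IT support.

```python
# Check a list of data copies against the 3-2-1 backup rule.
# The Copy class and the example plan are illustrative assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str
    medium: str
    offsite: bool


def check_3_2_1(copies):
    """Return a list of rule violations; an empty list means the rule is met."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than 3 copies")
    if len({c.medium for c in copies}) < 2:
        problems.append("fewer than 2 types of storage media")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    return problems


# Assumed plan: a working copy, a university network drive, an external disk.
plan = [
    Copy("laptop", "internal disk", offsite=False),
    Copy("university network drive", "network storage", offsite=True),
    Copy("external hard drive", "external disk", offsite=False),
]

print(check_3_2_1(plan) or "3-2-1 rule satisfied")
```

Dropping any one of the three copies from `plan` makes the check report at least one violation, which is the point of the rule: no single loss should leave the data exposed.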

24

Example – data preservation

“Raw data and documentation files will be offered for deposit to 4TU.ResearchData, which is a certified data repository accepting research data in the field of engineering and preserves them for a minimum of 15 years. Files will be offered in the repository’s preferred formats (.txt, .xml and JCAMP), and as the volume of data does not exceed 10GB the repository will not charge for the deposit.”

25

Data repositories for archiving & sharing data

26

Where to find one?

General purpose Domain-specific Institutional

http://www.re3data.org

Watch a Re3data demo: https://www.fosteropenscience.eu/content/re3data-demo

How to select a data repository?

• Does your funder or publisher recommend a repository?

• Does your domain have an established repository?

• Does it:
  • provide a landing page for each dataset, with metadata?
  • provide a persistent & unique identifier?
  • have a certificate to indicate trustworthiness?
  • have an explicit commitment to keep data available long-term?
  • match your data needs regarding formats, access, licenses (including legal requirements for data protection)?
  • provide guidance on how to cite data?
  • charge for its services?

27

Icons representing attributes of data repositories in Re3data.org

10.1371/journal.pone.0078080

Example – data sharing

“My dataset will be made available upon publication of the associated journal article via the Zenodo repository. The data in Zenodo will be made open access and licensed under CC-BY.”

28

Licensing

Standard licenses include Creative Commons licenses

https://community.globalvoices.org/guide/editorial-guides/toolbox-for-authors/multimedia-copyright-and-attribution

29

http://www.dcc.ac.uk/resources/how-guides/license-research-data

Check out the EUDAT license selector

https://ufal.github.io/public-license-selector/

Example – restrictions on sharing

“Because my research data are part of a potentially patentable invention, sharing will be delayed to investigate patent protection first. Before any disclosure can take place, my research results will be reported to my university’s TechTransfer office, which will determine whether releasing data will need to be embargoed until after a patent application has been filed.”

30

Restrictions on sharing

Things to consider:

• Do the research data constitute personal data?

• Are they otherwise confidential?

• Are they otherwise sensitive?

• Do they constitute third-party data?

• Are you the sole owner of any copyright or database right in the data?

• Do they have economic valorization potential?

31

Ethical & legal issues can be complex!

http://wiz.eudat.eu/#/app/home?lang=en&code=en

https://www.fosteropenscience.eu/learning/data-protection-and-ethics/#/id/5ace27ca8ee5d6920ab94c13

Check out FOSTER’s course on Data protection & ethics

See the EUDAT legal guide

Further tips for writing a DMP

33

• Check applicable data policies & legislation

• Keep it simple, but be as specific as possible

• Justify your decisions

• Familiarize yourself with RDM terminology & best practices (for your field)

34

http://data-archive.ac.uk/media/2894/managingsharing.pdf

• Consider it a ‘living’ document

• Use a DMP template
  • e.g. Horizon 2020 FAIR DMP, FWO DMP…

• Have a look at example DMPs

• Use an online planning tool

36

Example plans

• Examples on the Digital Curation Centre (DCC) website
  http://www.dcc.ac.uk/resources/data-management-plans/guidance-examples

• LIBER DMP Catalogue
  https://libereurope.eu/dmpcatalogue/

• Public DMPs on the DMPTool website
  https://dmptool.org/public_dmps

• DMPs published in RIO (Research Ideas and Outcomes OA journal)
  http://riojournal.com/browse_user_collection_documents?collection_id=3

37

Web-based planning tool: DMPonline.be

DMPonline.be

• Local instance of open source software developed by Digital Curation Centre (UK)

• Launched in 2015 for Ghent University researchers; since 2017: shared tool for users from DMPbelgium member institutions

• Contains DMP templates + tailored guidance (funders, institutions…)

39

How the tool works

40

Log in with your UGent credentials

Or sign in with ORCID

https://dmponline.be

How the tool works

Progress indicator

Section

Question

Write down your answer here

Leave a comment for collaborators

Custom guidance from funder, university, group…

How the tool works

• Create plans based on funder or institutional template & guidance
  • Funders: FWO, EC - Horizon 2020, ERC
  • UGent: generic or faculty-specific

• Share plans with others (various permission levels possible)

• Export plans in format of your choice (txt, docx, PDF)

• More details: https://researchtips.ugent.be/en/tips/00001281/

42

Thank you for listening!

43

Any questions?

Credits - Slides

Slides draw heavily on/adapt materials from:

• S. Jones (2015), ‘Managing and Sharing Research Data’, licensed under CC-BY 4.0

• S. Jones (2016), ‘What is a Data Management Plan?’, licensed under CC-BY 4.0

• S. Jones & M. Grootveld, ‘How to write a Data Management Plan’, licensed under CC-BY 4.0

• S. Jones (2018), ‘The FAIR data concept’, licensed under CC-BY 4.0

• S. Jones (2018), ‘FAIR data: what it means, how we achieve it, and the role of RDA’, licensed under CC-BY 4.0

• M. Grootveld & E. Leenaerts, ‘How to manage your data to make them open and FAIR’, licensed under CC-BY 4.0

44

Credits - Images

• [slide 1]: CC0 image (https://pixabay.com/en/plan-do-act-check-system-workflow-1725510/)

• [slide 3]: ‘Publications and Data’ by Auke Herrema, licensed under CC-BY 4.0

• [slide 5-6]: FAIR data

• [slide 9]: From ‘Overcoming obstacles to sharing research data’ by Brian Hole, licensed under CC-BY

• [slide 10]: From ‘Open Research Data in Horizon 2020’, by European Commission

• [slide 14]: From ‘RDM: An Overview’ by Research Support Team, IT Services (University of Oxford), licensed under CC-BY-NC-SA 4.0

• [slide 15]: ‘Planning’ by Jørgen Stamp, attribution: digitalbevaring.dk, licensed under CC-BY 2.5 DK

• [slide 17]: ‘Writing’ by Aiconica, licensed under CC0 1.0

• [slide 20]: ‘Metadata’ by Jørgen Stamp, attribution: digitalbevaring.dk, licensed under CC-BY 2.5 DK

• [slide 22]: From ‘RDM: An Overview’ by Research Support Team, IT Services (University of Oxford), licensed under CC-BY-NC-SA 4.0

• [slide 24]: From ‘RDM: An Overview’ by Research Support Team, IT Services (University of Oxford), licensed under CC-BY-NC-SA 4.0

• [slide 27]: From ‘Making Research Data Repositories Visible: The Re3data.org Registry’ by H. Pampel et al., licensed under CC-BY 4.0

• [slide 28]: From ‘So you need an image’ by Global Voices Community Blog, licensed under CC-BY 3.0

• [slide 36]: ‘Preservation plan’ by Jørgen Stamp, attribution: digitalbevaring.dk, licensed under CC-BY 2.5 DK

• [slide 38]: ‘Tools’ by Jørgen Stamp, attribution: digitalbevaring.dk, licensed under CC-BY 2.5 DK

• [slide 43]: ‘Knowledge’ by Jørgen Stamp, attribution: digitalbevaring.dk, licensed under CC-BY 2.5 DK

45