Research Data Management: Part 1, Principles & Responsibilities


Managing Research Data Part 1

Planning – Working – Finalizing – Sharing Data

This work is licensed under a Creative Commons Attribution 4.0 International License.

WHY – WHAT – WHO – WHEN & HOW

WHY manage data

WHAT research data are

WHO manages research data

WHEN & HOW data management is done


This two-part course is a collaboration between CU Libraries/Information Services and the Office of Research Compliance & Training. The purpose of this course is to familiarize you with the various aspects of research data management (RDM) by taking


This course will guide you through these areas, offering in-depth details on each of them. Please refer to the top navigation to keep track of which area you are currently exploring.

Part 1:

•  Why RDM is both recommended and required

•  What research data are

•  Who is responsible for RDM

Part 2:

•  When RDM activities occur

•  How you can carry out RDM activities

Learning objectives: At the end of this training you will be able to:

•  Define & identify research data

•  Understand the demands of responsible conduct of research with regard to research data management

•  Understand the reasons behind the federal mandates of research data management


Links to many of the references and policies referred to in this course can be found on the final slides. Have Fun!


Why should you care about Research Data Management?


Managing research data:

SAVES TIME
Taking time to plan for your expected data, back them up, and document them in detail saves time otherwise lost in searching for, recovering, and deciphering data in the future.

SIMPLIFIES YOUR LIFE
Managing your data by adopting an organization scheme, developing a description standard, and creating a preservation plan avoids future confusion and turmoil.

INCREASES RESEARCH EFFICIENCY
By saving time and avoiding confusion you will be more efficient! Manage your data for the future and you will be able to more easily find, access, understand, and use your data.

ENSURES RESEARCH INTEGRITY
Good research data management makes it more feasible to fulfill the commitments of responsible research.


Up to 80% of data are lost within 20 years of publication:

In a study of 516 ecology papers published between 1991 and 2011, the chance of data being accessible fell by 17% per year.

Vines, T. H. et al. Curr. Biol. http://dx.doi.org/10.1016/j.cub.2013.11.014 (2013)

http://www.nature.com/news/scientists-losing-data-at-a-rapid-rate-1.14416
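To get a feel for what a 17% annual decline means over two decades, here is a rough sketch of the arithmetic. It assumes, as one possible reading of the figure, that the 17% applies to the odds of a data set being accessible and compounds every year. The starting probability of 0.9 is a hypothetical value chosen only for illustration, not a number taken from the study.

```python
def accessible_probability(initial_probability: float, years: int,
                           annual_odds_decline: float = 0.17) -> float:
    """Probability that data are still accessible after `years`.

    Assumption: the odds of accessibility shrink by the same fraction each year.
    """
    if not 0.0 < initial_probability < 1.0:
        raise ValueError("initial_probability must be strictly between 0 and 1")
    # Convert the starting probability to odds, apply the compounding decline,
    # then convert the resulting odds back to a probability.
    initial_odds = initial_probability / (1.0 - initial_probability)
    odds = initial_odds * (1.0 - annual_odds_decline) ** years
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # 0.9 is a hypothetical starting point, not a figure from Vines et al.
    for years in (0, 5, 10, 20):
        p = accessible_probability(0.9, years)
        print(f"after {years:2d} years: {p:.0%} chance the data are accessible")
```

Under these assumptions the chance of access drops below one in five after 20 years, which is roughly in line with the headline figure. A different starting value or a different reading of the 17% would change the numbers.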


When you engage in research at Columbia University you must:

•  Be ethical in the conduct of the research

•  Abide by regulations and policies

•  Be responsible stewards of the research dollars and other resources

•  Share the results of your research for the good of society

Managing data is a critical responsibility for all researchers


Managing research data enables sharing, which:

•  Increases visibility

•  Facilitates discovery

•  Satisfies funder & journal requirements

•  Reinforces open scientific inquiry

•  Establishes priority & enables citation

•  Speeds research

Adapted from: https://libraries.mit.edu/guides/subjects/data-management/why.html & http://researchdata.wisc.edu/share-your-data/data-access-2/

Sharing enables breakthroughs that lead to economic development:

“Scientific research supported by the Federal Government catalyzes innovative breakthroughs that drive our economy. The results of that research become the grist for new insights and are assets for progress in areas such as health, energy, the environment, agriculture, and national security.”

http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf

“…a research project's success is measured … also by the data it makes available to the wider community.”

“It is obvious that making data widely available is an essential element of scientific research.”


The directive to make federally funded research data openly accessible “is integrally tied to and supports the mission of higher education to produce, preserve, and share scholarship. It therefore provides the community with an opportunity to marshal its resources to improve the interoperability of research support systems and maximize the value of research funding.”

Association of Research Libraries (ARL) on the Office of Science & Technology Policy memorandum “Increasing Access to the Results of Federally Funded Scientific Research”


Data sharing is required by:

•  Funders
   –  Federal agencies
   –  Foundations

•  Journals

National Science Foundation (NSF):

“Beginning January 18, 2011, proposals submitted to NSF must include a supplementary document of no more than two pages labeled "Data Management Plan" (DMP). This supplementary document should describe how the proposal will conform to NSF policy on the dissemination and sharing of research results. Proposals that do not include a DMP will not be able to be submitted.”

https://www.nsf.gov/eng/general/dmp.jsp

National Institutes of Health (NIH):

“Data sharing is essential for expedited translation of research results into knowledge, products and procedures to improve human health.”1

“…all investigator-initiated applications with direct costs greater than $500,000 in any single year will be expected to address data sharing in their application”2

1 http://grants.nih.gov/grants/policy/data_sharing/

2 http://grants.nih.gov/grants/guide/notice-files/NOT-OD-03-032.html

…and not just them! More federal agencies will be requiring public access to data:

“The Office of Science and Technology Policy (OSTP) hereby directs each Federal agency with over $100 million in annual conduct of research and development expenditures to develop a plan to support increased public access to the results of research funded by the Federal Government.”

“…digitally formatted scientific data resulting from unclassified research supported wholly or in part by Federal funding should be stored and publicly accessible to search, retrieve, and analyze.” (2013)

http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf

Journal Sharing Policies:

“It is the policy of the American Economic Review to publish papers only if the data used in the analysis are clearly and precisely documented and are readily available to any researcher for purposes of replication. Authors of accepted papers that contain empirical work, simulations, or experimental work must provide to the Review, prior to publication, the data, programs, and other details of the computations sufficient to permit replication.”

http://www.aeaweb.org/aer/data.php

“…authors are required to make materials, data and associated protocols promptly available to readers.”

http://www.nature.com/srep/policies/index.html


“…trials of drugs and medical devices will be considered for publication only if the authors commit to making the relevant anonymised patient level data available on reasonable request”

http://www.bmj.com/about-bmj/resources-authors/article-types/research

“All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science. All computer codes involved in the creation or analysis of data must also be available...”

http://www.sciencemag.org/site/feature/contribinfo/prep/gen_info.xhtml#dataavail


Benefits of good data management & sharing practices:

•  Increase citations

•  Avoid retractions (& potential misconduct questions)

•  Advance knowledge

•  Enable reproducibility

Increase citations:

Publicly available data was significantly associated with a 69% increase in citations, independent of journal impact factor, date of publication, and author country of origin using linear regression.

Piwowar HA, Day RS, Fridsma DB (2007). Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308

Avoid retractions:

http://retractionwatch.wordpress.com/2013/10/30/nejm-paper-on-sleep-apnea-retracted-when-original-data-cant-be-found/

Advance knowledge:

“70 percent of published genetic sequence comparisons are not publicly accessible, leaving researchers worldwide unable to get to critical data they may need to tackle a host of problems ranging from climate change to disease control.”

http://www.sciencedaily.com/releases/2013/09/130903194155.htm

Enable reproducibility:

Looking at 238 recently published papers, pulled from five fields of biomedicine, a team of scientists found that just under 50 percent of the research materials, from lab mice to antibodies, used in the work could not be identified. This phenomenon impedes the ability of scientists to reproduce & extend published studies.

Vasilevsky NA, Brush MH, Paddock H, Ponting L, Tripathy SJ et al. (2013) On the reproducibility of science: unique identification of research resources in the biomedical literature. PeerJ 1:e148 http://dx.doi.org/10.7717/peerj.148

TAKE-AWAYS

•  Researchers share the products of their research (e.g., publications, data) for the good of:
   –  Society
   –  Advancement of science
   –  Themselves

•  Data management is required by:
   –  Funding bodies
   –  Institutions

•  Data sharing is a requirement of:
   –  Funding bodies
   –  Publishers

What are the research data that need to be managed?


Defining Research Data:

“…information created [or discovered] in the course of research”1

Material or information “on which an argument, theory, test or hypothesis, or another research output is based.”2

“(i) Research data is defined as the recorded factual material commonly accepted in the [research] community as necessary to validate research findings…”3

1 Marieke Guy. http://www.slideshare.net/MariekeGuy/bridging-the-gap-between-researchers-and-research-data-management , #2

2 Queensland University of Technology. Manual of Procedures and Policies. Section 2.8.3. http://www.mopp.qut.edu.au/D/D_02_08.jsp

3 http://www.whitehouse.gov/omb/circulars_a110#36

Data may be collected in many ways:

Through real-time & unique observations, from repeatable experiments or simulations, or through derivations from unique collections of data, as a few examples.

Data collection method costs and risks: Some data may be impossible to replace; other data may merely be very expensive to replace. Alternatively, some data are so cheap and quick to acquire that it is less expensive to repeat the collection process than to store the data, e.g., some gene sequences.

Data may be classified by collection method, including:

OBSERVATIONS – collected in real time / irreplaceable e.g. survey results, images, telemetry, sensor readings, some literary/historical sources, recordings

EXPERIMENTS – reproducible/ variable expense e.g. chromatograms, antenna mappings, word frequency

SIMULATIONS – Models & inputs used to create datasets e.g. economic models, climate models

DERIVATIONS/COMPILATIONS – reproducible/expensive e.g. text or data mining, 3D models, compiled database

RESEARCH PROCESS DATA – real time / irreplaceable e.g. survey instruments, data description/documentation, developed software, algorithms, code/script, instrument settings

TRUE OR FALSE:


In scientific research, only the information and observations that are collected as part of your research are considered data.

FALSE


Data are not only the information and observations made as part of scientific research but also the materials, the means, and the products of that research. Examples:

•  Survey instruments

•  Associated software

•  Cell lines

•  Specimens

Information exists in different forms during the research process:


RAW OR PRIMARY – Lab notebooks, observational notes, instrument readings, images, footage, individual survey responses, historical sources, textual analysis, etc.

PROCESSED – Statistical analyses, sources organized as evidence, rich descriptions, aggregated survey responses, etc.

PUBLISHED – Distribution in some finalized format to those outside of the project. Distribution may occur in both static and dynamic instances (e.g. longitudinal data sets with annual reporting).

The federal Office of Management and Budget offers the following explanation (summarized):

“Research data means the recorded factual material commonly accepted in the scientific community as necessary to validate research findings”

Exclusions:

•  Preliminary analyses

•  Drafts of scientific papers

•  Plans for future research

•  Peer reviews

•  Communications with colleagues

•  Trade secrets

•  Commercial information

•  Materials necessary to be held confidential by a researcher until they are published

•  Information which is protected under law

•  Personnel and medical information

•  Information the disclosure of which would constitute a clearly unwarranted invasion of personal privacy, such as information that could be used to identify a particular person in a research study

2 CFR § 200.315, Intangible property, (e)(3)

http://federalregister.gov/a/2013-30465

Sensitive data:

Some data on research subjects may require special protections because they are highly sensitive and highly regulated. These sensitive data may require encryption and other security measures:

•  Protected Health Information (PHI) e.g. insurance information, health conditions, etc.

•  Personally Identifying Information (PII) e.g. financial information, social security numbers, etc.

There are a number of university policies that govern handling information of these types. Special training is required for researchers and others handling PHI.

Release of sensitive data can damage:

-  Individuals whose data were released: identity theft, financial loss, privacy violations, etc.

-  Research team members: loss of reputation, loss of position

-  Research institution: financial liability

UNIVERSITY RESOURCES:

-  Office of HIPAA Compliance website

-  HIPAA training

-  IRB website and training

-  Data Classification Policy

-  Policy on Electronic Data Security Breach Reporting and Response

-  Other IT Security Policies

Take-aways:

•  Definitions of data are varied
   –  Use the one(s) appropriate to your research community

•  Some data are sensitive
   –  Know which data they are
   –  Know and take the proper precautions to protect these data

Who is responsible for Research Data Management?


Who is responsible for research data?


The PI is ultimately responsible for the data, and is the steward of the data (more on this later).

It is incumbent upon every member of the research team to safeguard research products (more on this later, too).

PI responsibilities:


“The full administrative, fiscal and scientific responsibility for the management of a sponsored project resides with the principal investigator named in the award”

Faculty Handbook 2008

As with all aspects of a proposal submission, the PI must be involved with establishing and describing an appropriate data management plan, as required.

PI responsibilities:


The PI is responsible for the collection, management, maintenance and retention of research data accumulated during a research project. It is the PI’s responsibility to:

•  Determine what records need to be retained to comply with sponsor requirements

•  Adopt an orderly system of data organization

•  Communicate the chosen system to all members of a research group & to the appropriate administrative personnel

•  Establish & maintain procedures for protection of essential records in the event of a natural disaster or other emergency

Sponsored Projects Handbook

Research team member responsibilities:


Everyone involved in the research project is responsible for adhering to the statements and requirements presented in the data management plan, and all other data management practices related to the research project.

These may include practices of handling:

•  Physical data e.g., lab notebooks, samples, data documentation (aka metadata), etc.

•  Electronic data e.g., file naming conventions, generating metadata, keeping an e-lab notebook, data storage, data back-ups, annotating findings
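As a concrete illustration of the electronic practices listed above (file naming conventions, generating metadata, and data back-ups), the following is a minimal sketch. The naming pattern `project_description_YYYYMMDD_vNN.ext`, the function names, and the metadata fields are all hypothetical choices made for this example. A research team would substitute whatever convention its own data management plan specifies.

```python
import hashlib
import json
import re
from datetime import date


def build_file_name(project: str, description: str, collected: date,
                    version: int, extension: str) -> str:
    """Build a name like 'soil-survey_site-a-ph_20150512_v02.csv'."""
    def slug(text: str) -> str:
        # Lowercase, and replace any run of non-alphanumeric characters with '-'.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (f"{slug(project)}_{slug(description)}_"
            f"{collected:%Y%m%d}_v{version:02d}.{extension.lstrip('.')}")


def describe_file(file_name: str, content: bytes, creator: str) -> dict:
    """Return a minimal metadata record, with a checksum for verifying back-ups."""
    return {
        "file_name": file_name,
        "creator": creator,
        "size_bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),
    }


if __name__ == "__main__":
    name = build_file_name("Soil Survey", "Site A pH", date(2015, 5, 12), 2, "csv")
    record = describe_file(name, b"site,ph\nA,6.8\n", creator="Example Researcher")
    print(json.dumps(record, indent=2))
```

Recording a checksum alongside each file gives a simple way to confirm later that a back-up copy is identical to the original: recompute the SHA-256 value of the copy and compare it with the stored one.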

Take-aways:

•  The PI is responsible for all aspects of the grant, including data management

•  All members of the research team are responsible for adhering to the data management plan

Research data management can be complex, but there are resources available.

See Part 2 of this course for details on WHEN & HOW to practice Research Data Management

→ SEE NEXT PAGE!

Resources for Research Data Management:

Title   URL

Scholarly Communications Program, Data Management   http://scholcomm.columbia.edu/data-management/

Research and Data Integrity Program (ReaDI)   http://www.columbia.edu/cu/compliance/docs/ReaDI_Program/index.html

Data Management Plan Templates   http://scholcomm.columbia.edu/data-management/data-management-plan-templates/

CUIT Research Computing Services   http://rcs.columbia.edu

Academic Commons Archival Storage   http://academiccommons.columbia.edu/about

Citation Management   http://library.columbia.edu/research/citation-management.html

Managing Secure Information - Training   http://columbia.sighttraining.com

Data Security Policies   http://policylibrary.columbia.edu/category/computingtechnology


REFERENCES

•  Scott, Mark, Boardman, Richard P., Reed, Philippa A.S. and Cox, Simon J. (2012) Introducing research data. Southampton, GB, University of Southampton, 29pp. http://eprints.soton.ac.uk/338816/

•  Responsible research data management and the prevention of scientific misconduct. www.knaw.nl/Content/Internet_KNAW/publicaties/pdf/2013449.pdf

•  http://dmconsult.library.virginia.edu/

Created by: Amy Nurnberger, 2015-05-12
