Research Data Management: Part 1, Principles & Responsibilities
Transcript of Research Data Management: Part 1, Principles & Responsibilities
Managing Research Data Part 1
Planning Working
Finalizing Sharing Data
This work is licensed under a Creative Commons Attribution 4.0 International License.
WHY – WHAT – WHO – WHEN & HOW
WHY manage data
WHAT research data are
WHO manages research data
WHEN & HOW data management is done
This two-part course is a collaboration between CU Libraries/Information Services and the Office of Research Compliance & Training. The purpose of this course is to familiarize you with the various aspects of research data management (RDM) by taking
44/ Managing Research Data
This course will guide you through these areas, offering in-depth details on each of them. Please refer to the top navigation to keep track of which area you are currently exploring.
Part 1:
• Why RDM is both recommended and required
• What research data are
• Who is responsible for RDM
Part 2:
• When RDM activities occur
• How you can carry out RDM activities
Learning objectives: At the end of this training you will be able to:
• Define & identify research data
• Understand the demands of responsible conduct of research with regard to research data management
• Understand the reasons behind the federal mandates of research data management
Links to many of the references and policies referred to in this course can be found on the final slides. Have Fun!
Why should you care about Research Data Management?
Managing research data:
SAVES TIME – Taking time to plan for your expected data, back them up, and document them in detail saves time otherwise lost in searching for, recovering, and deciphering data in the future.
SIMPLIFIES YOUR LIFE – Managing your data by adopting an organization scheme, developing a description standard, and creating a preservation plan avoids future confusion and turmoil.
INCREASES RESEARCH EFFICIENCY – By saving time and avoiding confusion you will be more efficient! Manage your data for the future and you will be able to more easily find, access, understand, and use your data.
ENSURES RESEARCH INTEGRITY – Good research data management makes it more feasible to fulfill the commitments of responsible research.
Up to 80% of data lost within 20 years of publication:
http://www.nature.com/news/scientists-losing-data-at-a-rapid-rate-1.14416
516 ecology papers published between 1991 and 2011.
The chance of data being accessible fell by 17% per year.
Vines, T. H. et al. Curr. Biol. http://dx.doi.org/10.1016/j.cub.2013.11.014 (2013)
When you engage in research at Columbia University you must:
• Be ethical in the conduct of the research
• Abide by regulations and policies
• Be responsible stewards of the research dollars and other resources
• Share the results of your research for the good of society
Managing data is a critical responsibility for all researchers
Managing research data enables sharing, which:
• Increases visibility
• Facilitates discovery
• Satisfies funder & journal requirements
• Reinforces open scientific inquiry
• Establishes priority & enables citation
• Speeds research
Adapted from: https://libraries.mit.edu/guides/subjects/data-management/why.html & http://researchdata.wisc.edu/share-your-data/data-access-2/
Sharing enables breakthroughs that lead to economic development:
http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
“Scientific research supported by the Federal Government catalyzes innovative breakthroughs that drive our economy. The results of that research become the grist for new insights and are assets for progress in areas such as health, energy, the environment, agriculture, and national security.”
“…a research project's success is measured … also by the data it makes available to the wider community.”
“It is obvious that making data widely available is an essential element of scientific research.”
The directive to make federally funded research data openly accessible “is integrally tied to and supports the mission of higher education to produce, preserve, and share scholarship. It therefore provides the community with an opportunity to marshal its resources to improve the interoperability of research support systems and maximize the value of research funding.”
Association of Research Libraries (ARL) on the Office of Science & Technology Policy memorandum “Increasing Access to the Results of Federally Funded Scientific Research”
Data sharing is required by:
• Funders
– Federal agencies
– Foundations
• Journals
National Science Foundation (NSF):
https://www.nsf.gov/eng/general/dmp.jsp
“Beginning January 18, 2011, proposals submitted to NSF must include a supplementary document of no more than two pages labeled "Data Management Plan" (DMP). This supplementary document should describe how the proposal will conform to NSF policy on the dissemination and sharing of research results. Proposals that do not include a DMP will not be able to be submitted.”
National Institutes of Health (NIH):
“Data sharing is essential for expedited translation of research results into knowledge, products and procedures to improve human health.” 1
“…all investigator-initiated applications with direct costs greater than $500,000 in any single year will be expected to address data sharing in their application” 2
1 http://grants.nih.gov/grants/policy/data_sharing/
2 http://grants.nih.gov/grants/guide/notice-files/NOT-OD-03-032.html
…and not just them! More federal agencies will be requiring public access to data:
“The Office of Science and Technology Policy (OSTP) hereby directs each Federal agency with over $100 million in annual conduct of research and development expenditures to develop a plan to support increased public access to the results of research funded by the Federal Government.”
“…digitally formatted scientific data resulting from unclassified research supported wholly or in part by Federal funding should be stored and publicly accessible to search, retrieve, and analyze.” (2013)
http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
Journal Sharing Policies:
“It is the policy of the American Economic Review to publish papers only if the data used in the analysis are clearly and precisely documented and are readily available to any researcher for purposes of replication. Authors of accepted papers that contain empirical work, simulations, or experimental work must provide to the Review, prior to publication, the data, programs, and other details of the computations sufficient to permit replication.”
http://www.aeaweb.org/aer/data.php
“…authors are required to make materials, data and associated protocols promptly available to readers.”
http://www.nature.com/srep/policies/index.html
“…trials of drugs and medical devices will be considered for publication only if the authors commit to making the relevant anonymised patient level data available on reasonable request”
http://www.bmj.com/about-bmj/resources-authors/article-types/research
“All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science. All computer codes involved in the creation or analysis of data must also be available...”
http://www.sciencemag.org/site/feature/contribinfo/prep/gen_info.xhtml#dataavail
Benefits of good data management & sharing practices:
• Increase citations
• Avoid retractions (& potential misconduct questions)
• Advance knowledge
• Enable reproducibility
Increase citations:
Piwowar HA, Day RS, Fridsma DB (2007). Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308
Publicly available data was significantly associated with a 69% increase in citations, independent of journal impact factor, date of publication, and author country of origin using linear regression.
Avoid retractions:
http://retractionwatch.wordpress.com/2013/10/30/nejm-paper-on-sleep-apnea-retracted-when-original-data-cant-be-found/
Advance knowledge:
http://www.sciencedaily.com/releases/2013/09/130903194155.htm
“70 percent of published genetic sequence comparisons are not publicly accessible, leaving researchers worldwide unable to get to critical data they may need to tackle a host of problems ranging from climate change to disease control.”
Enable reproducibility:
Looking at 238 recently published papers, pulled from five fields of biomedicine, a team of scientists found that just under 50 percent of the research materials, from lab mice to antibodies, used in the work could not be identified. This phenomenon impedes the ability of scientists to reproduce & extend published studies.
Vasilevsky NA, Brush MH, Paddock H, Ponting L, Tripathy SJ et al. (2013) On the reproducibility of science: unique identification of research resources in the biomedical literature. PeerJ 1:e148 http://dx.doi.org/10.7717/peerj.148
TAKE-AWAYS
• Researchers share the products of their research (e.g., publications, data) for the good of:
– Society
– Advancement of science
– Themselves
• Data management is required by:
– Funding bodies
– Institutions
• Data sharing is a requirement of:
– Funding bodies
– Publishers
What are the research data that need to be managed?
Defining Research Data:
“…information created [or discovered] in the course of research” 1
Material or information “on which an argument, theory, test or hypothesis, or another research output is based.” 2
“(i) Research data is defined as the recorded factual material commonly accepted in the [research] community as necessary to validate research findings…” 3
1 Marieke Guy. http://www.slideshare.net/MariekeGuy/bridging-the-gap-between-researchers-and-research-data-management , #2
2 Queensland University of Technology. Manual of Procedures and Policies. Section 2.8.3. http://www.mopp.qut.edu.au/D/D_02_08.jsp
3 http://www.whitehouse.gov/omb/circulars_a110#36
Data may be collected in many ways:
Through real-time & unique observations, from repeatable experiments or simulations, or through derivations from unique collections of data, as a few examples.
Data collection method costs and risks: Some data may be impossible to replace; some data may merely be very expensive to replace. Alternatively, some data are so cheap and quick to acquire that it is less expensive to repeat the collection process than to store the data, e.g., some gene sequences.
Data may be classified by collection method, including:
OBSERVATIONS – collected in real time / irreplaceable e.g. survey results, images, telemetry, sensor readings, some literary/historical sources, recordings
EXPERIMENTS – reproducible/ variable expense e.g. chromatograms, antenna mappings, word frequency
SIMULATIONS – Models & inputs used to create datasets e.g. economic models, climate models
DERIVATIONS/COMPILATIONS – reproducible/expensive e.g. text or data mining, 3D models, compiled database
RESEARCH PROCESS DATA – real time / irreplaceable e.g. survey instruments, data description/documentation, developed software, algorithms, code/script, instrument settings
TRUE OR FALSE:
In scientific research, only the information and observations that are collected as part of your research are considered data.
FALSE
Data are not only the information and observations made as part of scientific research but also the materials, the means, and the products of that research.
Examples:
• Survey instruments
• Associated software
• Cell lines
• Specimens
Information exists in different forms during the research process:
RAW OR PRIMARY – Lab notebooks, observational notes, instrument readings, images, footage, individual survey responses, historical sources, textual analysis, etc.
PROCESSED – Statistical analyses, sources organized as evidence, rich descriptions, aggregated survey responses, etc.
PUBLISHED – Distribution in some finalized format to those outside of the project. Distribution may occur in both static and dynamic instances (e.g., longitudinal data sets with annual reporting).
The federal Office of Management and Budget offers the following explanation (summarized):
“Research data means the recorded factual material commonly accepted in the scientific community as necessary to validate research findings”
Exclusions:
• Preliminary analyses
• Drafts of scientific papers
• Plans for future research
• Peer reviews
• Communications with colleagues
• Trade secrets
• Commercial information
• Materials necessary to be held confidential by a researcher until they are published
• Information which is protected under law
• Personnel and medical information
• Information the disclosure of which would constitute a clearly unwarranted invasion of personal privacy, such as information that could be used to identify a particular person in a research study
2 CFR § 200.315, Intangible property, (e)(3)
http://federalregister.gov/a/2013-30465
Sensitive data:
Some data on research subjects may require special protections because they are highly sensitive and highly regulated. These sensitive data may require encryption and other security measures:
• Protected Health Information (PHI), e.g., insurance information, health conditions, etc.
• Personally Identifiable Information (PII), e.g., financial information, social security numbers, etc.
There are a number of university policies that govern handling information of these types. Special training is required for researchers and others handling PHI.
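As one illustration of the "other security measures" mentioned for sensitive data, a direct identifier can be replaced with a keyed hash before a working file is passed around a research team. The following is a minimal Python sketch under stated assumptions: the function name `pseudonymize`, the field names, the sample record, and the placeholder key are all hypothetical and are not taken from any university policy. It is not a substitute for encryption, IRB requirements, or HIPAA de-identification rules.

```python
import hashlib
import hmac

def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for an identifier using HMAC-SHA256.

    The same identifier and key always give the same pseudonym, so records
    can still be linked, but the original value is not kept in the output.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability (an assumption)

# Hypothetical record; the field names are illustrative only.
record = {"ssn": "000-00-0000", "age": 42}
key = b"store-this-key-separately-from-the-data"  # placeholder key

safe_record = {
    "subject_id": pseudonymize(record["ssn"], key),
    "age": record["age"],
}
print(safe_record)
```

The key has to be stored apart from the data; anyone holding both could recompute pseudonyms for guessed identifiers.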
Release of sensitive data can damage:
- Individuals whose data were released: identity theft, financial loss, privacy violations, etc.
- Research team members: loss of reputation, loss of position
- Research institution: financial liability
UNIVERSITY RESOURCES:
- Office of HIPAA Compliance website
- HIPAA training
- IRB website and training
- Data Classification Policy
- Policy on Electronic Data Security Breach Reporting and Response
- Other IT Security Policies
Take-aways:
• Definitions of data are varied
– Use the one(s) appropriate to your research community
• Some data are sensitive
– Know which data they are
– Know and take the proper precautions to protect these data
Who is responsible for Research Data Management?
Who is responsible for research data?
The PI is ultimately responsible for the data, and is the steward of the data (more on this later).
It is incumbent upon every member of the research team to safeguard research products (more on this later, too).
PI responsibilities:
“The full administrative, fiscal and scientific responsibility for the management of a sponsored project resides with the principal investigator named in the award”
Faculty Handbook 2008
As with all aspects of a proposal submission, the PI must be involved with establishing and describing an appropriate data management plan, as required.
PI responsibilities:
The PI is responsible for the collection, management, maintenance and retention of research data accumulated during a research project. It is the PI’s responsibility to:
• Determine what records need to be retained to comply with sponsor requirements
• Adopt an orderly system of data organization
• Communicate the chosen system to all members of a research group & to the appropriate administrative personnel
• Establish & maintain procedures for protection of essential records in the event of a natural disaster or other emergency
Sponsored Projects Handbook
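One possible building block for a procedure that protects essential records is a checksum comparison between a working folder and its backup copy. The sketch below assumes Python 3.9 or later and uses only the standard library; the function names (`sha256_of`, `build_manifest`, `changed_or_missing`) and the sample files are hypothetical and are not prescribed by the Sponsored Projects Handbook.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to its checksum."""
    return {
        str(p.relative_to(folder)): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }

def changed_or_missing(original: dict[str, str], backup: dict[str, str]) -> list[str]:
    """List files whose backup copy is absent or differs from the original."""
    return [name for name, digest in original.items() if backup.get(name) != digest]

# Demonstration with throwaway folders and made-up file contents.
with tempfile.TemporaryDirectory() as tmp:
    data, backup = Path(tmp, "data"), Path(tmp, "backup")
    data.mkdir()
    backup.mkdir()
    (data / "readings.csv").write_text("t,value\n0,1.5\n")
    (backup / "readings.csv").write_text("t,value\n0,1.5\n")
    (data / "notes.txt").write_text("example note\n")  # never backed up
    problems = changed_or_missing(build_manifest(data), build_manifest(backup))

print(problems)
```

Here `problems` names the file that exists only in the working folder. In practice the two manifests would be built from the real data folder and wherever the research group keeps its backups.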
Research team member responsibilities:
Everyone involved in the research project is responsible for adhering to the statements and requirements presented in the data management plan, and all other data management practices related to the research project.
These may include practices of handling:
• Physical data e.g., lab notebooks, samples, data documentation (aka metadata), etc.
• Electronic data e.g., file naming conventions, generating metadata, keeping an e-lab notebook, data storage, data back-ups, annotating findings
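A file naming convention is easier for a whole team to follow when it is generated instead of typed by hand. The sketch below shows one hypothetical convention (project, date, description, and version, joined by underscores). The pattern and the function name `make_filename` are assumptions for illustration, not a convention this course prescribes.

```python
import re
from datetime import date

def make_filename(project: str, description: str, when: date, version: int, ext: str) -> str:
    """Build a name of the form project_YYYYMMDD_description_vNN.ext."""
    def slug(text: str) -> str:
        # Lowercase, then replace each run of non-alphanumerics with one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return f"{slug(project)}_{when:%Y%m%d}_{slug(description)}_v{version:02d}.{ext.lstrip('.')}"

# Made-up project and description, for illustration only.
name = make_filename("Lake Study", "Sensor Readings (raw)", date(2015, 5, 12), 2, ".csv")
print(name)  # lake-study_20150512_sensor-readings-raw_v02.csv
```

Putting the date in year-month-day order and zero-padding the version number keeps files in chronological order when sorted by name.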
Take-aways:
• PI is responsible for all aspects of the grant, including data management
• All members of the research team are responsible for adhering to the data management plan
Research data management can be complex, but there are resources available.
See Part 2 of this course for details on WHEN & HOW to practice Research Data Management.
→ SEE NEXT PAGE!
Resources for Research Data Management:
Title URL
Scholarly Communications Program, Data Management http://scholcomm.columbia.edu/data-management/
Research and Data Integrity Program (ReaDI) http://www.columbia.edu/cu/compliance/docs/ReaDI_Program/index.html
Data Management Plan Templates http://scholcomm.columbia.edu/data-management/data-management-plan-templates/
CUIT Research Computing Services http://rcs.columbia.edu
Academic Commons Archival Storage http://academiccommons.columbia.edu/about
Citation Management http://library.columbia.edu/research/citation-management.html
Managing Secure Information - Training http://columbia.sighttraining.com
Data Security Policies http://policylibrary.columbia.edu/category/computingtechnology
REFERENCES
• Scott, Mark, Boardman, Richard P., Reed, Philippa A.S. and Cox, Simon J. (2012) Introducing research data. Southampton, GB, University of Southampton, 29pp. http://eprints.soton.ac.uk/338816/
• Responsible research data management and the prevention of scientific misconduct www.knaw.nl/Content/Internet_KNAW/publicaties/pdf/2013449.pdf
• http://dmconsult.library.virginia.edu/
Created by: Amy Nurnberger, 2015-05-12