Research Data Management: Part 2, Practices
Transcript of Research Data Management: Part 2, Practices
Managing Research Data Part 2
WHY – WHAT– WHO – WHEN & HOW
Planning Working
Finalizing Sharing Data
This work is licensed under a Creative Commons Attribution 4.0 International License.
WHY manage data -
WHAT research data are-
WHO manages research data -
WHEN & HOW data management is done -
This two-part course is a collaboration between CU Libraries/Information Services and the Office of Research Compliance & Training. The purpose of this course is to familiarize you with the various aspects of research data management (RDM) by taking
This course will guide you through these areas, offering in-depth details on each of them. Please refer to the top navigation to keep track of which area you are currently exploring.
• Why RDM is both recommended and required
• What research data are
• Who is responsible for RDM
• When RDM activities occur
• How you can carry out RDM activities
Part 1:
Part 2:
Learning objectives: At the end of this training you will be able to:
• Identify at which research stages data management activities occur
• Understand practical details of research data management such as:
– File naming
– File formats
– Spreadsheet structure
– Data preservation
Links to many of the references and policies referred to in this course can be found on the final slides. Have Fun!
When does Research Data Management happen? How is it done?
Planning
When planning to manage data or writing a data management plan consider:
• What data will be shared?
• Who will have access to the data?
• Where will the data to be shared be located?
• When will the data be shared?
• How will researchers locate and access the data?
CONSIDER:
• File format
• File sizes
• Changing rates of data production
• Anticipated size of project data
• Storage & Back-up
• Privacy / security requirements
• Data description
• Retention period
• Sharing requirements
Plan for the entire data life-cycle.
Storage file formats should be:
• Non-proprietary
• Open, documented standard
• Standard representation (e.g., ASCII, Unicode)
• Common, or commonly used by the research community (e.g., FITS, CIF)
• Unencrypted
• Uncompressed
✓ Some commonly recognized formats meeting these criteria: ASCII [e.g., .csv, .txt], PDF [.pdf], FLAC, TIFF, JPEG2000 [.jp2], MPEG-4 [.mp4], XML [.xml, .odf, .rdf], R [.r]
X Some commonly recognized formats to avoid for storage include: Word [.doc(x)], SPSS [.sav], Excel [.xls(x)], STATA [.dta], Access [.mdb, .accdb], JPEG [.jpg], .gif, QuickTime [.mov], SAS [.sas]
Not sure about the extension? Check https://www.nationalarchives.gov.uk/PRONOM/default.htm
Sources:
http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
http://www.digitalpreservation.gov/formats/index.shtml?PHPSESSID=c26c5e5101396d5f5ebacedb13cae6e3
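The two format lists on this slide could be turned into a quick check before files go into storage. The following is only a sketch: the function name `storage_format_advice` is hypothetical, the extension sets contain just the formats named on the slide, and the extensions `.flac`, `.tif` and `.tiff` are assumptions because the slide names FLAC and TIFF without giving extensions.

```python
from pathlib import Path

# Extensions copied from the slide's two lists; this is not an exhaustive registry.
AVOID = {".doc", ".docx", ".sav", ".xls", ".xlsx", ".dta", ".mdb",
         ".accdb", ".jpg", ".gif", ".mov", ".sas"}
# ".flac", ".tif" and ".tiff" are assumed extensions for FLAC and TIFF.
PREFERRED = {".csv", ".txt", ".pdf", ".flac", ".tif", ".tiff", ".jp2",
             ".mp4", ".xml", ".odf", ".rdf", ".r"}


def storage_format_advice(filename: str) -> str:
    """Return 'avoid', 'preferred' or 'unknown' based on the file extension."""
    extension = Path(filename).suffix.lower()
    if extension in AVOID:
        return "avoid"
    if extension in PREFERRED:
        return "preferred"
    return "unknown"  # look the extension up in PRONOM before deciding
```

An "unknown" result is deliberately not treated as safe; it only means the extension is on neither list.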
Storage / Back-ups
No storage medium lasts forever. Consider the following media life-spans:
Lifespan of Storage Media: http://www.crashplan.com/medialifespan/
Storage / Back-ups
When choosing storage and back-up options you should:
• Reduce the risk of damage or loss
• Use multiple locations (here, near, far)
• Create a back-up schedule
• Use reliable back-up media
• Test your back-up system (i.e., test file recovery, checksums)
Storage / Back-ups
Remember to:
• Back up data frequently
• Make 3 copies
– Original (here)
– External/local (near)
– External/remote, in a different geographic area (far)
• Verify recovery is possible
– Confirm that the file has not been corrupted, e.g., by checksum validation
– Make sure you can reload the file, i.e., test file restore after initial set-up
– Check file recovery periodically & systematically thereafter
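Checksum validation, mentioned above, can be sketched in a few lines. This example assumes SHA-256 as the checksum and uses ordinary folders to stand in for the "near" and "far" locations; the function names `sha256_of` and `back_up_and_verify` are hypothetical, and a real set-up would normally rely on dedicated back-up software.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up_and_verify(original: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy the original into each destination folder and compare checksums."""
    expected = sha256_of(original)
    results = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = folder / original.name
        shutil.copy2(original, copy)
        # True means the copy is byte-for-byte identical to the original.
        results[copy] = sha256_of(copy) == expected
    return results
```

Re-running `sha256_of` on each copy at intervals, and comparing against the stored value, is one way to carry out the periodic recovery check.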
Privacy & Security
Consider physical, network, computer system and file security for:
• Intellectual property – trade secrets, commercial information, confidential materials, restricted data
• Personally identifying information (PII)
• Personal health information (PHI)
• High-security data
Privacy & Security
CU Information Security Charter: “Users are persons who use Information Resources. Users are responsible for ensuring that such Resources are used properly in compliance with the Columbia University Acceptable Usage of Information Resources Policy [http://policylibrary.columbia.edu/acceptable-usage-information-resources-policy], information is not made available to unauthorized persons, and appropriate security controls are in place.”
Source: http://policylibrary.columbia.edu/information-security-charter
Privacy & Security
(Some) Best practices for handling sensitive data:
• Restrict physical access to computers, offices and storage media
• Encrypt any device (mobile, laptop, desktop, tablet, removable media [e.g., USB flash drives, CDs, hard drives]) containing sensitive data
• Store lab notebooks and research records in locked cabinets
• Keep confidential and sensitive data on computers not connected to the Internet
• Don't send confidential data via e-mail or FTP (use encryption, if you must)
• Use strong passwords on files and computers
• Sanitize all systems before reusing, disposing, or donating
Data Description / Documentation
• Lab notebooks
• Data descriptions / code book
• File naming
– Consistency: Pick a system, write it down, & stick with it
– Identify necessary elements
– Create brief, understandable names
– Date: YYYY-MM-DD
– Version: v01, v02, … FINAL
In general, try to stay away from spaces in filenames as well as the following characters: . \ / : * ? " < > | [ ] & $
• File / directory structure
• Sometimes there is a community standard for data formatting & description for sharing/integration (aka metadata schema) – Find yours!
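One possible way to apply the file naming advice above is to generate names from their elements rather than typing them by hand. This is a sketch under assumptions: the element order (project, description, date, version), the underscore separator, and the function name `make_filename` are all illustrative choices, not a convention prescribed by the course.

```python
import re
from datetime import date

# Whitespace plus the characters the slide recommends avoiding.
_UNSAFE = re.compile(r'[\s.\\/:*?"<>|\[\]&$]+')


def make_filename(project: str, description: str, when: date,
                  version: int, ext: str) -> str:
    """Build a name like Project_description_YYYY-MM-DD_v01.ext."""
    # Replace each run of unsafe characters with a single hyphen.
    parts = [_UNSAFE.sub("-", part).strip("-") for part in (project, description)]
    return f"{parts[0]}_{parts[1]}_{when.isoformat()}_v{version:02d}.{ext.lstrip('.')}"


print(make_filename("Amp Study", "sound levels: right/left",
                    date(2013, 12, 22), 1, ".csv"))
# prints: Amp-Study_sound-levels-right-left_2013-12-22_v01.csv
```

Whatever pattern a project settles on, the slide's first point still applies: write the system down and stick with it.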
Retention Period
Plan to keep your data according to:
• CU Data Retention Policy: at least 3 years
• Funder requirements: It varies – check them!
• Regulations
• Contract terms, for industry sponsored research
• The importance of the data, regardless of external requirements
Planning to share
Do you plan to share your data? Prepare to follow the requirements of your:
• Funder
• Journal
• Discipline
• Data repository
Working
Data Collection
Review the data management plan:
• Are you following it?
• Did it survive first contact with the research? If not,
– Does it need to be revised?
– Take the opportunity to change it as necessary, documenting the changes
Data Collection
Revisit your:
• File naming conventions
– Are they written down?
– Does everyone on the project know & follow them?
• File structure / organization / tagging
– Is it easy to understand / logical?
– Is everyone on the project familiar with the organizational practices so they can store and find files efficiently?
• Back-up processes
– Are they working?
– Are they being followed?
Data Collection
Are you using someone else’s data as part of your research? You should probably cite it…
Consider citation management software to keep track of it:
http://library.columbia.edu/research/citation-management.html
Data Collection
Using a spreadsheet for your data? Structure your data so that it’s easily sortable & usable by other software/machines. Be consistent with your:
• Labels
• Types
• Formats
• Layout
(Alternatively, consider using a database for easier data management)
Data Collection
Spreadsheet labels: Adopt a consistent style that indicates a cell contains a label rather than a value
☺ (consistent)
Date        Instrument  SoundLevel_R  SoundLevel_L  Amp_Setting
2013-12-22  BK-732A     84.6          86.0          3
2013-12-23  BK-732A     115.2         116.4         9
2013-12-24  BK-732A     128.7         130.0         11
☹ (inconsistent)
Date: 12/22/2013    Instrument  BK732A
Sound lev Right     <85         Amplifier    3 (27%)
Left                86.0
Date: Dec 23, 2013  Instrument  Amp_Setting
SoundLevel-R        115         BK_732-A     9
SoundLevel_L        116.4
Data Collection
Spreadsheet types: Don’t mix text & number types in the same column
Data Collection
Spreadsheet formats: Do all of your dates or other variable values look the same?
Data Collection
Spreadsheet layout: Tables of similar data should be structured similarly
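The spreadsheet advice on types and formats lends itself to an automated check once the sheet is exported to CSV. The sketch below assumes a header row, a date column named `Date`, and year-month-day dates as in the example table; the function name `column_problems` is hypothetical.

```python
import csv
import io
from datetime import datetime


def _is_number(value: str) -> bool:
    """True if the cell text can be read as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def column_problems(csv_text: str, date_columns=("Date",)) -> list[str]:
    """List columns that mix numbers with text or hold non-ISO dates."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    problems = []
    for column in reader.fieldnames or []:
        values = [row[column] or "" for row in rows]
        if column in date_columns:
            for value in values:
                try:
                    datetime.strptime(value, "%Y-%m-%d")
                except ValueError:
                    problems.append(f"{column}: '{value}' is not YYYY-MM-DD")
            continue
        numeric = [_is_number(value) for value in values]
        if any(numeric) and not all(numeric):
            problems.append(f"{column}: mixes numbers and text")
    return problems
```

Run against the inconsistent example, a check like this would flag entries such as `12/22/2013` in the date column and `<85` in a sound-level column.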
Data Collection
When it’s not a spreadsheet (it may be a database): Be consistent!
• Consistent process
• Consistent organization
• Consistent descriptions
AND
• Consistently documenting everything that’s done
Quality Assurance / Control
(Some) Best practices for assuring quality data entry:
• System-limited value entry, i.e., hard code controlled lists of values
• Check 5-10% of data records manually
• Check out-of-range values
• Check empty values / blank fields
• Consider using a data entry program or double entry keying
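The quality checks listed above could look roughly like this in code. Everything specific here is an assumption made for illustration: the field names follow the example spreadsheet, the controlled list contains only the instrument shown there, the 0 to 11 range comes from the amplifier example, and the function names are hypothetical.

```python
import random

ALLOWED_INSTRUMENTS = {"BK-732A"}  # controlled list of values (illustrative)
AMP_RANGE = (0, 11)                # permitted range for Amp_Setting


def check_record(record: dict) -> list[str]:
    """Report empty fields, values off the controlled list, and out-of-range values."""
    issues = [f"{field}: empty" for field, value in record.items()
              if value is None or str(value).strip() == ""]
    instrument = record.get("Instrument")
    if instrument not in (None, "") and instrument not in ALLOWED_INSTRUMENTS:
        issues.append(f"Instrument: '{instrument}' not in controlled list")
    amp = record.get("Amp_Setting")
    if amp not in (None, ""):
        try:
            in_range = AMP_RANGE[0] <= float(amp) <= AMP_RANGE[1]
        except (TypeError, ValueError):
            in_range = False
        if not in_range:
            issues.append(f"Amp_Setting: '{amp}' out of range")
    return issues


def sample_for_manual_check(records: list, fraction: float = 0.1, seed=None) -> list:
    """Pick a random share of records (10% by default) to check by hand."""
    if not records:
        return []
    count = max(1, round(len(records) * fraction))
    return random.Random(seed).sample(records, count)


def double_entry_mismatches(first: list, second: list) -> list[int]:
    """Positions where two independent keyings of the same data disagree."""
    return [i for i, (a, b) in enumerate(zip(first, second)) if a != b]
```

In a data entry program the controlled list and range would ideally be enforced at entry time, so checks like these act as a second line of defense.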
Analysis
• Keep an untouched, “raw” copy of the data file
– Make it Read Only
• Save cleaned or analyzed data as new files (with good file names, as previously described)
– Take extensive notes of the actions taken or scripts used to “clean” the data
• Use a scripting language (e.g., R, SAS, SPSS) to consistently process data and create a record of data processing & analysis
• Document scripts / code with comments!
• Write a ReadMe.txt file as you go, rather than trying to remember what you did later
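As a rough illustration of the raw-copy and ReadMe advice above, the following sketch marks the raw file read-only, starts a separate working copy, and appends a note to ReadMe.txt. The function names `protect_raw` and `start_cleaning` and the wording of the note are hypothetical, and file permissions behave somewhat differently across operating systems.

```python
import shutil
import stat
from pathlib import Path


def protect_raw(raw: Path) -> None:
    """Make the raw data file read-only so it cannot be edited by accident."""
    raw.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)


def start_cleaning(raw: Path, cleaned_name: str, note: str) -> Path:
    """Copy the raw file to a new working file and record the step in ReadMe.txt."""
    cleaned = raw.with_name(cleaned_name)
    # copyfile copies only the contents, so the new file is not read-only.
    shutil.copyfile(raw, cleaned)
    with open(raw.parent / "ReadMe.txt", "a", encoding="utf-8") as readme:
        readme.write(f"{cleaned.name}: derived from {raw.name}. {note}\n")
    return cleaned
```

All cleaning then happens on the returned working file, ideally through a commented script, while the raw file stays untouched.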
Analysis - documenting
Create a file to document your project. Include:
• What data are being collected & why
• Names of project files (data & analysis)
• Project file naming and file organization conventions
• Data definitions (aka Code Book or Data Dictionary) – more on the next slide
• Project standards
• Calibration, precision, accuracy & units of instruments or measurements
Analysis - documenting
Code books / data dictionaries should include:
• Data codes or coding keys
• Missing value codes
• Field name / Column header / Data label
– Definition, e.g., Amp_setting | Dial setting of guitar amplifier
– Values – Possible values, e.g., from 0 to 11, whole numbers
– Units – may be included in either Definitions or Values
– Type, e.g., string, float, char, date [YYYYMMDD]
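A code book can also be kept in a machine-readable form so that it doubles as a validation rule. The sketch below encodes the slide's Amp_setting example; the dictionary layout, the missing-value code `-99`, and the function name `validate` are assumptions made for illustration.

```python
# One entry per field; the definition and value range come from the slide's example.
CODE_BOOK = {
    "Amp_Setting": {
        "definition": "Dial setting of guitar amplifier",
        "type": int,      # whole numbers
        "min": 0,
        "max": 11,
        "missing": -99,   # hypothetical missing-value code
    },
}


def validate(field: str, value) -> bool:
    """Check a value against the code book entry for its field."""
    entry = CODE_BOOK[field]
    if value == entry["missing"]:
        return True
    return isinstance(value, entry["type"]) and entry["min"] <= value <= entry["max"]
```

Keeping definitions and checks in one place means the documentation cannot silently drift away from what the analysis actually accepts.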
Finalizing
Preparing data for publication, sharing, storage, preservation:
• Check requirements
• Are data useful/usable?
• Select data for preservation
• Choose publication path
• Consider publishing negative data – others may find it useful
• Repositories
Check requirements
Have you fulfilled the expectations of your:
• Funder
• Journal
• Discipline
• Repository
• Institution?
Usability of data
Are data consistently:
• Formatted
• Named
• Organized
• Described / Documented
Are they in a file format that may be easily accessed and reused?
Data preservation: Selection
You might consider long-term preservation if the answer to any of these questions is “Yes”:
• Do the data support published research?
• Are the data difficult or expensive to regenerate?
• Are the data required for your research but from another source (i.e., not your original research data)?
– If so, is the future availability of that data from the original source uncertain?
• Do you plan to share your data, or are you required to per funder agreement?
• Are the data historically significant?
• Are the data vulnerable to loss, corruption, endangerment, etc.?
Source: https://lib.stanford.edu/data-services/preserve
Sharing
Who will you share with?
Honestly, the party most interested in the data you are producing today is probably: Your future self
But there are others to consider, too, so…
What to share
“By ‘final research data,’ we mean recorded factual material commonly accepted in the scientific community as necessary to validate research findings.”
NIH FAQ Data Sharing (3/03)
Guidelines will “be determined by the community of interest” and “may include…data, publications, samples, physical collections, software and models.”
Data Management and Sharing FAQ (11/10)
When to share
“‘Timely release and sharing’ is defined as no later than the acceptance for publication of the main findings from the final data set.”
Data Sharing Policy, Section II.8.2.3.1, NIH Grants Policy Statement (10/12)
“The expectation is that all data will be made available after a reasonable length of time….[which] will be determined by the community of interest…”
Data Management and Sharing FAQ (11/10)
Publishing data
There are many paths to publishing data:
• Data paper / Data journal
• Supplementary material
• Data repositories
Wherever you publish, make sure people can find it, use it, and give you credit for it! (This usually requires a permanent identifier, e.g., DOI.)
Do you have negative data? Others may find it useful – consider making it available!
Repositories
Institutional repository
Columbia’s repository accepts materials from faculty, students, and staff. It offers:
• Long-term preservation strategy
• Multiple back-ups (including off site)
• Quality content descriptions for increased discoverability
• Monthly usage reports
• Permanent URL & DOI
Repositories
Disciplinary repository, e.g.,
• GenBank
• RCSB Protein Data Bank
• ICPSR
Repositories
Public access repository, e.g.,
• Figshare.com
• DataDryad.org
• ResearchCompendia
• Academic Commons
When you share your data, consider:
• How you would like your data cited
• Licensing
• Privacy/confidentiality/anonymization – Revisit IRB commitments
• What to share
• When to share
Data citation
• Publish your data, and make sure to cite it in your journal publication
• When publishing your data, provide a preferred citation
• Did you use someone else’s data?
– Check the license for restrictions
– Provide the following minimum in your work’s citations:
  • Title
  • Author/Creator name
  • Publisher
  • Publication year
  • Unique identifier, e.g., DOI
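The minimum citation elements listed above can be captured in a small record so that none is forgotten. This is a sketch only: the class name `DatasetCitation`, the element order in the formatted string, and all example values (including the placeholder DOI) are made up for illustration and do not follow any particular citation style.

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    """The five minimum elements named on the slide."""
    creator: str
    title: str
    publisher: str
    year: int
    identifier: str  # e.g., a DOI

    def format(self) -> str:
        # Element order here is an illustrative choice, not a formal style.
        return (f"{self.creator} ({self.year}). {self.title}. "
                f"{self.publisher}. {self.identifier}")


example = DatasetCitation(
    creator="Researcher, A.",                      # placeholder values throughout
    title="Example sound-level measurements",
    publisher="Example Repository",
    year=2015,
    identifier="https://doi.org/10.xxxx/example",  # not a real DOI
)
print(example.format())
```

If the repository or journal specifies a preferred citation format, that format takes precedence over a home-made one like this.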
Licensing
In the USA, facts (which means most datasets) are outside the scope of copyright protection. Some researchers have adopted the practice of data licensing because of this. There are many different license types, with varied provisions for reuse and attribution.
When thinking about licenses keep in mind:
• Funder requirements
• Institutional requirements
• Scientific and scholarly ethos of extending knowledge
Privacy / Anonymization
Revisit:
• IRB commitments
• Privacy Board requirements (HIPAA)
• Institutional requirements
• Ethical considerations
Consider:
• Have direct identifiers been removed?
• Have indirect identifiers that could reveal identity when combined been managed?
• Does relational or spatial data have the possibility of identifying participants?
Maintain the maximum amount of detail possible without compromising participants’ confidentiality.
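The first two "Consider" questions above can be approximated in code, with the strong caveat that this is a simplified sketch and no substitute for IRB or Privacy Board review. The field names, the choice of which fields count as direct or indirect identifiers, and the threshold `k` are all assumptions made for illustration.

```python
from collections import Counter

DIRECT_IDENTIFIERS = {"name", "email"}        # hypothetical field names
INDIRECT_IDENTIFIERS = ("zip", "birth_year")  # identifying only in combination


def drop_direct_identifiers(records: list[dict]) -> list[dict]:
    """Return copies of the records without the direct-identifier fields."""
    return [{key: value for key, value in record.items()
             if key not in DIRECT_IDENTIFIERS} for record in records]


def risky_combinations(records: list[dict], k: int = 2) -> list[tuple]:
    """Combinations of indirect identifiers shared by fewer than k records."""
    counts = Counter(tuple(record.get(field) for field in INDIRECT_IDENTIFIERS)
                     for record in records)
    return [combo for combo, count in counts.items() if count < k]
```

Any combination this reports points at records that might be singled out, which is where coarsening a value (for example, a broader age band) could be weighed against the loss of detail.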
Take-aways
• Managing research data takes place at every stage of the research and scholarly process
– Planning
– Working, where you follow the plan, collecting and analyzing data
– Finalizing, where you make sure you followed the plan
– Sharing, where you sigh with relief; it’s so simple, because you followed your plan!
• Research data management can be complex, but there are resources available
Resources for Research Data Management:
• Scholarly Communications Program, Data Management: http://scholcomm.columbia.edu/data-management/
• Research and Data Integrity Program (ReaDI): http://www.columbia.edu/cu/compliance/docs/ReaDI_Program/index.html
• Data Management Plan Templates: http://scholcomm.columbia.edu/data-management/data-management-plan-templates/
• CUIT Research Computing Services: http://rcs.columbia.edu
• Academic Commons Archival Storage: http://academiccommons.columbia.edu/about
• Citation Management: http://library.columbia.edu/research/citation-management.html
• Managing Secure Information - Training: http://columbia.sighttraining.com
• Data Security Policies: http://policylibrary.columbia.edu/category/computingtechnology
RESOURCES
• CU Data Policies & Procedures:
– Faculty Handbook
– Sponsored Projects Handbook
– Clinical Research Handbook
– Administrative Policy Library, Security Policies, e.g., Electronic Information Resources Security, Data Classification Policy, Policy on Electronic Data Security Breach Reporting and Response
• Scholarly Communications Program
• Office of Research Compliance and Training
RESOURCES
• Data Management Plans
• CUIT Active Storage options
• Academic Commons archival storage
• Citation management
• Executive Vice President’s Office of Research (EVPR)
• Training on managing Personal Health Information (PHI)
• Research and Data Integrity Program (ReaDI)
REFERENCES
• Scott, Mark, Boardman, Richard P., Reed, Philippa A.S. and Cox, Simon J. (2012) Introducing research data. Southampton, GB, University of Southampton, 29pp. http://eprints.soton.ac.uk/338816/
• Responsible research data management and the prevention of scientific misconduct. www.knaw.nl/Content/Internet_KNAW/publicaties/pdf/2013569.pdf
• http://dmconsult.library.virginia.edu/
Created by: Amy Nurnberger, 2015-05-12