ResBaz Data Management Presentation

Introductory Data Management – Developing, Archiving, Sharing Data for Current/Future Use

ResBaz Gainesville

1:55 pm – 2:45 pm

Friday, September 13, 2019

Health Sciences Center Library Computer Lab

Table of Contents

1. Basic data management concepts/terms

2. Fundamental data management plan components

3. UF resources for data management and archiving

4. External resources for data management and archiving

5. Data repositories

• Choosing a data repository

6. The importance of open access to data

7. DMPTool hands-on training

8. References

Basic data management concepts/terms

• Data Lifecycle

• Metadata

• Data Curation

• Data Archiving

• Data Preservation

• Data Repository

• Digital Object Identifier (DOI)

• Open Access

• Open Science - https://bit.ly/2lJEnTO

• ORCID - https://orcid.org/

Figure 1: Digital Curation Centre (DCC) Curation Lifecycle Model (DCC, 2007) - http://www.dcc.ac.uk/resources/curation-lifecycle-model

Fundamental data management plan components (DCC, 2013)

Administrative Data

• ID (funder or institution)

• Funder (e.g. NSF, USDA-NIFA)

• Grant Reference #

• Project Name

• Project Description

• PI/Researcher

• Researcher ID (e.g. ORCID)

• Dates of first version and last update
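
The administrative fields above could be kept as a small structured record so they travel with the plan. This is a minimal Python sketch, not part of the DCC checklist: the class name `AdminData`, its field names, and every sample value are hypothetical placeholders. The researcher ID shown is the sample iD used in ORCID's own documentation.

```python
from dataclasses import dataclass, asdict
from datetime import date
import json


@dataclass
class AdminData:
    """Hypothetical container for the administrative section of a DMP."""
    plan_id: str
    funder: str
    grant_reference: str
    project_name: str
    project_description: str
    researcher: str
    researcher_id: str  # e.g. an ORCID iD
    first_version: date
    last_update: date

    def to_json(self) -> str:
        # Dates are written as ISO 8601 strings so the record stays plain text.
        record = asdict(self)
        record["first_version"] = self.first_version.isoformat()
        record["last_update"] = self.last_update.isoformat()
        return json.dumps(record, indent=2)


# All values below are placeholders for illustration only.
example = AdminData(
    plan_id="DMP-0001",
    funder="NSF",
    grant_reference="GRANT-PLACEHOLDER",
    project_name="Example project",
    project_description="Placeholder description.",
    researcher="A. Researcher",
    researcher_id="0000-0002-1825-0097",
    first_version=date(2019, 9, 13),
    last_update=date(2019, 9, 13),
)
print(example.to_json())
```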

Data Collection

• What data will you create or collect?

• What type, format, and volume?

• Text, variant call format (VCF), >20 GB

• Quantitative, qualitative

• How will the data be collected or created?

• What standards or methodologies will you use?

• How will you name and structure your files and folders?

Fundamental data management plan components (1 of 2) – Name/Structure Files and Folders

Name/Structure Files and Folders

• Develop naming conventions

• Always include same information (e.g. date, time, location)

• Retain order of information (YYYYMMDD, not MMDDYYYY)

• Document standard file naming (e.g. codebook)

• Be descriptive so others can understand meaning

• Unique identifiers (e.g. Project Name or Grant Number in folder name)

• Date (embedded in file properties also)

• Use application-specific codes in three-letter file extensions (e.g. MOV, TIF, XML)

• Limit the depth of sub-folders to no more than two sub-folders
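
A naming convention is easier to keep when file names are generated rather than typed. The sketch below is one possible approach, assuming a convention of project, location, and date joined by underscores. The function name `build_filename` and the sample project and location values are hypothetical.

```python
from datetime import date


def build_filename(project: str, location: str, collected: date, ext: str) -> str:
    """Join the same fields, in the same order, for every file."""
    # YYYYMMDD sorts chronologically when file names are sorted as text.
    stamp = collected.strftime("%Y%m%d")
    return f"{project}_{location}_{stamp}.{ext}"


name = build_filename("reddrum", "gulf", date(2019, 9, 13), "csv")
print(name)  # reddrum_gulf_20190913.csv
```

Because the date is written year first, an alphabetical file listing is also a chronological one, which is the reason to prefer YYYYMMDD over MMDDYYYY.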

Fundamental data management plan components (2 of 2) – Name/Structure Files and Folders

Name/Structure Files and Folders

• Use a sequential numbering system (e.g. v1, v2, v3, etc.)

• DO NOT use confusing labels (e.g. revision, final, final2, etc.)

• Avoid spaces (use underscore)

• Use ASCII Characters only

• Document, share, evaluate

• Separate classes of products: raw data, derived data, graphics, code, documents, etc.

• Consider version control software (e.g. Git, GitLab, GitHub, etc.)

• Record all changes

• Discard obsolete versions (but never the raw copy)

• Make backups (store in three locations)

Figure 2: Data organization (Benedict, 2019)
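
Two of the rules above, avoiding spaces and non-ASCII characters and numbering versions sequentially, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed tool: the helper names `sanitize` and `next_version` and the `<stem>_v<N>.<ext>` pattern are hypothetical choices.

```python
import re
import unicodedata


def sanitize(name: str) -> str:
    """Replace runs of whitespace with underscores and drop non-ASCII characters."""
    # NFKD splits accented letters into a base letter plus an accent mark;
    # encoding to ASCII with "ignore" then drops the accent mark.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"\s+", "_", ascii_only.strip())


def next_version(existing: list, stem: str, ext: str) -> str:
    """Return the next file name in a v1, v2, v3 ... sequence."""
    pattern = re.compile(rf"^{re.escape(stem)}_v(\d+)\.{re.escape(ext)}$")
    versions = []
    for filename in existing:
        match = pattern.match(filename)
        if match:
            versions.append(int(match.group(1)))
    # Files with labels such as "final" do not match and are ignored.
    return f"{stem}_v{max(versions, default=0) + 1}.{ext}"


print(sanitize("field notes café.txt"))
print(next_version(["survey_v1.csv", "survey_v2.csv", "survey_final.csv"], "survey", "csv"))
```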

Fundamental data management plan components (DCC, 2013)

Documentation and Metadata

• What documentation and metadata will accompany the data?

• What information is needed for the data to be read and interpreted in the future?

• How will you capture/create the documentation and metadata?

• What metadata standards will you use and why?
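
One way to capture metadata is a small "sidecar" file saved next to each data file. The Python sketch below assumes JSON as the format and borrows a few element names from Dublin Core (title, creator, date, description, format, rights). The choice of standard, the helper name `write_sidecar`, the `.metadata.json` suffix, and all sample values are assumptions for illustration only.

```python
import json
import tempfile
from pathlib import Path


def write_sidecar(data_file: Path, metadata: dict) -> Path:
    """Write metadata next to the data file as <name>.metadata.json."""
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar


# Demonstration in a temporary folder so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "reddrum_gulf_20190913.csv"
    data_file.write_text("site,count\nA,3\n", encoding="utf-8")
    sidecar = write_sidecar(
        data_file,
        {
            "title": "Example survey counts",       # placeholder values
            "creator": "A. Researcher",
            "date": "2019-09-13",
            "description": "Counts per site; one row per site.",
            "format": "text/csv",
            "rights": "CC BY 4.0",
        },
    )
    loaded = json.loads(sidecar.read_text(encoding="utf-8"))
    print(sidecar.name)
```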

Ethical, Legal, and Regulatory Compliance

• How will you manage any ethical issues?

• Have you gained consent for data preservation and sharing?

• How will you manage copyright and intellectual property rights (IPR) issues?

• Who owns the data?

• How will the data be licensed for reuse?

Fundamental data management plan components (DCC, 2013)

Storage and Backup

• How will data be stored and backed up during research (e.g. HiPerGator)?

• Do you have sufficient storage or will you need to include charges for additional services?

• How will you manage access and security?

• What are the risks to data security and who will manage data security risks?
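
A backup is only useful if the copy matches the original. The following Python sketch copies a file to several folders and compares SHA-256 checksums. The local folders stand in for real storage locations, and the function names `sha256_of` and `back_up` are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list) -> dict:
    """Copy source into each folder and report whether each copy matches."""
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, folder / source.name))
        results[str(copy)] = sha256_of(copy) == expected
    return results


# Demonstration with three temporary folders in place of real storage.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    raw = root / "raw.csv"
    raw.write_text("site,count\nA,3\n", encoding="utf-8")
    report = back_up(raw, [root / "copy_a", root / "copy_b", root / "copy_c"])
    all_verified = all(report.values())
print(all_verified)
```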

Selection & Preservation

• Which data should be retained, shared, and/or preserved?

• What data must be retained/destroyed for contractual, legal, or regulatory purposes?

• What is the long-term preservation plan for the dataset?

• Where and in which repository or archive will the data be held?

Fundamental data management plan components (DCC, 2013)

Data Sharing

• How will you share the data?

• How will potential users find out about your data?

• Are there any required data sharing restrictions?

• What action will you take to overcome or minimize restrictions (e.g. anonymize, de-identify)?
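
As a rough illustration of de-identification, the Python sketch below drops direct identifier columns and replaces a participant ID with a keyed hash. The column names, the secret key, and the helper names are hypothetical. Removing direct identifiers alone is generally not enough to anonymize a dataset, because combinations of the remaining fields can still identify people, so treat this as a first step only.

```python
import csv
import hashlib
import hmac
import io

DIRECT_IDENTIFIERS = {"name", "email"}  # hypothetical column names


def pseudonym(value: str, secret: str) -> str:
    """Return a short keyed hash so the same ID always maps to the same code."""
    digest = hmac.new(secret.encode("utf-8"), value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def deidentify(rows, id_column: str, secret: str):
    """Drop direct identifiers and replace the ID column with a pseudonym."""
    cleaned = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        kept[id_column] = pseudonym(row[id_column], secret)
        cleaned.append(kept)
    return cleaned


# Sample data is invented for the example.
raw = io.StringIO("participant_id,name,email,score\n17,Ada,ada@example.org,0.9\n")
rows = list(csv.DictReader(raw))
result = deidentify(rows, "participant_id", secret="keep-this-secret")
print(sorted(result[0].keys()))
```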

Responsibilities & Resources

• Who will be responsible for data management?

• What resources will you require to deliver your plan?

• Is additional specialist expertise (or training for existing staff) required?

Fundamental data management plan components (Lorenzen et al., 2016)

| Objective | Output name | Output description | Output (type, format) |
|---|---|---|---|
| Obj. 1 | Synthesized datasets | Habitat; Fisheries independent; Fisheries dependent | Habitat (derived, geospatial); Fisheries (derived, tabular) |
| Obj. 2 | Hierarchical analyses of spatial recruitment and angler effort | Reports; Instructions for analyses; Data analyses code; Geospatial images | Reports and Instructions (text, PDF/XML); Code (text, .txt); Geospatial (TIFF and GIS) |
| Obj. 3 | Socio-ecological regional system model analyses | Reports; Instructions for analyses; Data analyses code | Reports and Instructions (text, PDF/XML); Code (text, .txt) |
| Obj. 4 | Restoration management strategy evaluation (MSE) | Simulation results; Reports; Instructions for analyses; Data analyses code | Simulation (simulated data, CSV); Reports and Instructions (text, PDF/XML); Code (text, .txt) |

Table 1: Description of project data output and products from revised DMP - https://ufdc.ufl.edu/AA00014835/00088

Fundamental data management plan components

Figure 3: Description of key components and processes in a fundamental data management plan

Fundamental data management plan components – Reproducible template (1 of 2)

A. Hardware specification

1. Processor (architecture, type, and number of processors/sockets)

2. Caches (number of levels, size of each level)

3. Memory (size and speed)

4. Secondary storage (type: SSD/HDD/other, size, performance: random/sequential read or write)

5. Network (if applicable: type and bandwidth)

B. System and Environment Setup

1. Operating system (e.g., the required compiler must be run with a specific version of the OS)

2. Configuration for the environment if needed (e.g., environment variables, paths)

3. Programming Language: [C/C++/Java/…]

4. Additional Programming Language info: [optional, e.g., Java version]

5. Packages/Libraries needed: [as thorough a list as possible of the software packages needed]

6. Compiler info: [full details of compiler and version]

7. Procedures to test if system is configured correctly: [Ideally, there is a script called: ./prepareSoftware.sh]
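
Part of sections A and B can be recorded automatically rather than by hand. The Python sketch below collects what the standard library exposes: architecture, processor string, logical CPU count, operating system, and the `PATH` variable. Cache sizes, memory speed, and storage performance are not available this way and would still need to be documented separately. The function name `describe_environment` and the dictionary keys are hypothetical.

```python
import json
import os
import platform


def describe_environment() -> dict:
    """Collect basic hardware and system facts available from the standard library."""
    return {
        "machine": platform.machine(),          # e.g. the CPU architecture
        "processor": platform.processor(),      # may be empty on some systems
        "logical_cpus": os.cpu_count(),
        "os": platform.system(),
        "os_release": platform.release(),
        "python_version": platform.python_version(),
        "path": os.environ.get("PATH", ""),
    }


env = describe_environment()
print(json.dumps(env, indent=2))
```

Saving this output alongside the results gives later readers a record of the machine each run came from.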

Fundamental data management plan components – Reproducible template (2 of 2)

C. Dataset Info

1. Repository: [url]

2. Data generators: [url]

D. Experimentation and Measurements

1. Scripts and how-tos to generate all necessary data or locate datasets: [Ideally, there is a script called: ./prepareData.sh]

2. Scripts and how-tos for all experiments executed and measurements taken: [Ideally, there is a script called: ./runExperiments.sh]

3. Scripts for a clean-up phase where the system is prepared to avoid interference with the next round of experiments.

E. Data Representation and Visualization

1. Tools that are used to generate the graphs (e.g., Gnuplot or Matplotlib)

2. Scripts (or spreadsheets) showing how to generate the graphs.
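
The scripts named in the template can be chained by a small driver so the whole run is repeatable with one command. This Python sketch assumes the scripts sit in the current folder and should run in the order shown; the ordering, the `run_pipeline` function, and the `dry_run` stand-in are assumptions, not part of the template.

```python
import subprocess

# Script names come from the template; the run order is an assumption.
STEPS = ["./prepareSoftware.sh", "./prepareData.sh", "./runExperiments.sh"]


def run_pipeline(steps, runner=subprocess.run):
    """Run each script in order and stop at the first failure."""
    completed = []
    for script in steps:
        # With the default runner, check=True raises CalledProcessError
        # when a script exits with a non-zero status.
        runner([script], check=True)
        completed.append(script)
    return completed


def dry_run(command, check):
    """Stand-in runner that only prints what would be executed."""
    print("would run:", " ".join(command))


done = run_pipeline(STEPS, runner=dry_run)
```

Calling `run_pipeline(STEPS)` without the `runner` argument executes the real scripts, provided they exist and are executable.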

UF resources for data management and archiving

UF DropBox

IR@UF

RedCap

ResearchVault

HiPerGator

Figure 4: Select UF Resources for Data Management and Archiving

External resources for data management and archiving

Data Management Training

• CUAHSI Data Management Plans – https://www.cuahsi.org/data-models/data-management-plans/

• ESIP Data Management Training (DMT) Clearinghouse – http://dmtclearinghouse.esipfed.org/

• ESIP Commons Data Management Short Course for Scientists – http://commons.esipfed.org/datamanagementshortcourse

• USGS Data Management – https://www.usgs.gov/products/data-and-tools/data-management

Tools

• ELN (electronic lab notebook) – https://bit.ly/2lNBJBQ

• GitHub (code repository) – https://github.com/

• Open Refine (data cleaning) – http://openrefine.org/

• Open Science Framework (research project sharing) – https://osf.io/

• Stackedit (in-browser Markdown editor) – https://stackedit.io/

• Zenodo (data repository sandbox) – https://sandbox.zenodo.org/

Data repositories

• Discipline-specific repositories

• Examples: Development Data Library (USAID), Climate Model Data Service (NASA), IFPRI E-brary

• Registry of Research Data Repositories (https://www.re3data.org) is a great resource for options

• General data repositories

• Examples: Zenodo (https://zenodo.org/), Dryad (https://datadryad.org/), Figshare (https://figshare.com/)

• Institutional repositories (IRs)

• Example: the IR@UF (https://ufdc.ufl.edu/ufir)

• Can be an additional layer of preservation, access, and discoverability for your work

• For more information and instructions on how to submit, visit http://guides.uflib.ufl.edu/ufir/home.

Data repositories – Choosing a data repository

• Technical Specs

• Are there size limits?

• What types of materials can be uploaded?

• Cost

• Are there charges to use the repository?

• Discoverability

• Are there options for access (open, closed, restricted)? What is required for your project?

• How easy is it to find items in the repository? How about outside the repository?

• Other Considerations

• Is a persistent identifier (like a DOI) needed for your materials? Is that service offered?

• Is your project collaborative? Will others need to upload files or add notes to the collection?

• Is deposit with a specific repository required (e.g., as a condition of a grant or award)?

The importance of open access to data

• Complies with the U.S. Federal Public Access Mandate

• Open and machine-readable is the new default for all government data

• Federally funded research must be made available free to readers within 12 months of publication

• To comply with the mandate, research output (publications, reports) can be published in open access (OA) journals or as OA articles in hybrid journals

• Encourages development of data that is findable, accessible, interoperable, and reusable (FAIR) - https://www.go-fair.org/fair-principles/

DMPTool – hands-on training (1 of 2)

DMPTool - https://dmptool.org/

DMPTool – Sign in instructions (FREE to the public)

1. Navigate to https://dmptool.org (recommended browsers: Chrome or Firefox; IE is not functioning)

2. Click on Sign in in the upper-right corner

3. Select the most relevant Sign in option

a. Option 1: Your institution (if your institution is affiliated with DMPTool) - enter University of Florida

b. Option 2: Email address (if your institution is not affiliated with DMPTool)

c. Option 3: Create account with email address (if not affiliated and need an account)

4. Click on the Next button

5. Log in with your GatorLink credentials

6. Click on Create New DMP

7. Enter metadata and ORCID (if you do not have an ORCID, then create an ORCID – next slide)

Figure 7: Data Management Plan Tool (DMPTool)

DMPTool – hands-on training (2 of 2)

ORCID - https://orcid.org/

ORCID – Sign in instructions (FREE to the public)

1. Navigate to https://orcid.org/

2. Click on Sign in in the upper-right corner

3. Select the most relevant Sign in option under Sign into ORCID or Register Now

a. Option 1: Personal account (current user)

b. Option 2: Institutional account (Select if affiliated with University of Florida)

c. Option 3: Register now (create new account)

4. Click on the Next button

5. Click the pencil icon to develop your profile

Figure 8: Open Researcher and Contributor ID (ORCID)
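
An ORCID iD can be sanity-checked before it is typed into a plan's metadata. ORCID iDs have 16 characters, and the last one is a check character computed with the ISO 7064 MOD 11-2 algorithm. The Python sketch below implements that check; the function names are hypothetical, and the iD used is the sample one from ORCID's documentation. Passing the check only means the iD is well formed, not that it belongs to a particular person.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True when the iD has 16 characters and a matching check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdecimal():
        return False
    return orcid_check_character(compact[:15]) == compact[15].upper()


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```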

References

• Benedict, K. (2019). Data Management Skills & Training Resources. Guest lecture presented to the INSC590 Problems: Information Science – Data Management graduate course at the University of Tennessee, Knoxville School of Information Sciences. March 27, 2019.

• DCC. (2013). Checklist for a Data Management Plan. V.4.0. Edinburgh: Digital Curation Centre. Available online: https://bit.ly/1Z2Gbqk.

• DASlab, Harvard SEAS. (2019). db reproducible: ACM SIGMOD 2019 Reproducibility. Accessed September 12, 2019 from http://db-reproducibility.seas.harvard.edu/.

• Executive Order of President Obama for open and machine-readable government data, 5/9/2013. Available from the National Archives at https://bit.ly/2n5YBLz.

• FOSTER. (2019). Open Science Definition. Accessed September 12, 2019 from https://bit.ly/2lJEnTO.

• Lorenzen, K., Camp, E., & Dutka-Gianelli, J. (2016). Synthesizing spatial dynamics of recreational fish and fisheries to inform restoration strategies: red drum in the Gulf of Mexico. Revised Data Management Plan. http://ufdc.ufl.edu/AA00014835/00088.

• RedCap. (2019). Research Electronic Data Capture (RedCap). Accessed September 12, 2019 from https://bit.ly/2MExnII.

• *NCSU Libraries. (nd). Formats & Data Organization. Adapted from Making Data Management Easier by the University of Virginia Libraries and Storing Data by the University of Minnesota Libraries. Accessed September 12, 2019 from http://www.lib.ncsu.edu/data-management/formats.

• OPEN Government Data Act signed into law by President Trump, 1/14/2019. News release from SPARC at https://bit.ly/2T6a0ZE.

• UF Research Computing. (2019). Service rates. https://www.rc.ufl.edu/services/rates/service/.

• UF Research Computing. (2019). UF Apps for Research. https://www.rc.ufl.edu/services/uf-apps-for-research/.

• Whitmire et al. (2015). A table summarizing the Federal public access policies resulting from the US Office of Science and Technology Policy Memorandum of February 2013. figshare. http://dx.doi.org/10.6084/m9.figshare.1372041. Retrieved September 12, 2019 from http://tinyurl.com/hkgqytu.

• Wilkinson, M. D, et al. (2016). The FAIR Guiding Principles for Scientific data management and stewardship. Scientific Data 3, Article number: 160018. https://www.nature.com/articles/sdata201618.

• Zenodo. (2019). Frequently Asked Questions. Accessed September 12, 2019 from http://help.zenodo.org/.

Thank you

Questions/comments?

Contact:

Chelsea Johnston, Scholarly Repository Librarian: [email protected]

Plato Smith, Data Management Librarian: [email protected]

Data Management and Curation Working Group: [email protected]

Presentation slides available at https://ufdc.ufl.edu/l/IR00010966/00001