CC&E Best Data Management Practices, April 19, 2015

Please take the Workshop Survey

https://www.surveymonkey.com/s/update 


Data Management Practices for Early Career Scientists: Closing
Robert Cook
ORNL Distributed Active Archive Center
Environmental Sciences Division
Oak Ridge National Laboratory
Oak Ridge, [email protected]

CC&E Joint Science Workshop
College Park, MD
April 19, 2015


Plan for archiving data

“Begin with the end in mind” 

• Identified the Data Center
• Collaborated with data center during project
• Communicated:
– Volume and number of files
– Special needs
– Delivery dates
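
The volume and number of files to communicate to the data center can be tallied with a short script. The following is a minimal sketch, not a data center tool: the function name `summarize_files`, the demo file names, and the choice of decimal megabytes (1 MB = 1,000,000 bytes) are assumptions made for illustration.

```python
import os
import tempfile


def summarize_files(root):
    """Return (file_count, total_megabytes) for all files under root."""
    count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            count += 1
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
    # Assumption: decimal megabytes (1 MB = 1,000,000 bytes).
    return count, total_bytes / 1_000_000


# Demo on a throwaway directory holding three hypothetical 500,000-byte files.
with tempfile.TemporaryDirectory() as demo_dir:
    for i in range(3):
        with open(os.path.join(demo_dir, f"site_{i}.csv"), "wb") as handle:
            handle.write(b"x" * 500_000)
    file_count, total_mb = summarize_files(demo_dir)

print(file_count, total_mb)
```

In practice, `root` would point at the directory holding the files you plan to deliver.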


Followed Fundamental Data Practices


• Define the contents of your data files
• Define the variables
• Use consistent data organization
• Use stable file formats
• Assign descriptive file names
• Preserve processing information
• Perform basic quality assurance
• Provide documentation
• Protect your data
• Preserve your data


What to submit to the archive?

• Well-structured data files, with variables, units, and fill values well defined
• Document that describes the data set
• Additional information
– Article written with the data set
– Files that describe project, protocols, or field sites (photographs)
– Material from Project Web site or Wiki
• Basic description of the data (15 questions)
– http://daac.ornl.gov/PI/questions.shtml
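
One way to check that variables, units, and fill values are well defined before submission is to compare each data file against a small data dictionary. The sketch below is illustrative only: the column names, the units, the `-9999` fill value, and the function `check_against_dictionary` are all assumptions, not a format the archive requires.

```python
import csv
import io

# Hypothetical data dictionary: one entry per column, with units and fill value.
DATA_DICTIONARY = {
    "site_id": {"units": "none", "fill": None},
    "date": {"units": "YYYY-MM-DD", "fill": None},
    "canopy_height": {"units": "m", "fill": "-9999"},
}

# Hypothetical data file contents.
SAMPLE_CSV = """site_id,date,canopy_height
A1,2014-06-01,12.5
A2,2014-06-01,-9999
"""


def check_against_dictionary(csv_text, dictionary):
    """Return (undefined_columns, fill_counts) for CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Columns present in the file but missing from the dictionary.
    undefined = [col for col in reader.fieldnames if col not in dictionary]
    fill_counts = {col: 0 for col in reader.fieldnames if col in dictionary}
    for row in reader:
        for col, value in row.items():
            spec = dictionary.get(col)
            if spec and spec["fill"] is not None and value == spec["fill"]:
                fill_counts[col] += 1
    return undefined, fill_counts


undefined, fill_counts = check_against_dictionary(SAMPLE_CSV, DATA_DICTIONARY)
print("Undefined columns:", undefined)
print("Fill values per column:", fill_counts)
```

An empty list of undefined columns means every variable in the file has a documented definition; the fill counts show how many missing values each column carries.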


Issues with data sets received

• Descriptive information about data files and content is incomplete
– Data description and collection method
– Field sites
– Quality / uncertainty of data
• Inconsistencies with publication
• Files uploaded are not identified / described
• Variable names are not defined or vague
– “Height” unclear, change to “canopy_height”
– Perhaps append the method/sensor for added clarity
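
The naming advice above can be applied consistently with a small helper. This is a sketch under assumptions: the function `descriptive_name` and the lowercase, underscore-separated convention are illustrative choices, and "LiDAR" is only an example of a method/sensor suffix.

```python
import re


def descriptive_name(quantity, method=None):
    """Build a lowercase, underscore-separated variable name.

    The optional method/sensor is appended for added clarity.
    """
    tokens = []
    for part in (quantity, method):
        if part:
            # Replace any run of non-alphanumeric characters with one underscore.
            token = re.sub(r"[^0-9a-z]+", "_", part.strip().lower()).strip("_")
            if token:
                tokens.append(token)
    return "_".join(tokens)


print(descriptive_name("Canopy Height"))                  # canopy_height
print(descriptive_name("Canopy Height", method="LiDAR"))  # canopy_height_lidar
```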


Information about Data (15 questions)

Information About Your Data Set
1. Have you looked at our Best Data Management Practices?
2. Who produced this data set?
3. What agency and program funded the project?
   What awards funded this project? (comma separate multiple awards)

Data Set Description
4. Provide a title for your data set. (maximum 84 characters)
   What type of data does your data set contain?
   What does the data set describe? (2-3 sentences)
5. What parameters did you measure, derive, or generate? (comma separated, limit to ten)
6. Have you analyzed the uncertainty in your data?
   Briefly describe your uncertainty analysis. (2-3 sentences)
   Will the uncertainty estimates be included with your data set?


Information about Data (cont)

Temporal and Spatial Characteristics
7. What date range does the data cover? (YYYY-MM-DD)
   What is a representative sampling frequency or temporal resolution for your data?
8. Where were the data collected/generated?
9. Which of the following best describes the spatial nature of your data? (single point, multiple points, transect, grid, polygon, n/a)
10. What is a representative spatial resolution for these data?
11. Provide a bounding box around your data.

Data Preparation and Delivery
12. What are the formats of your data files?
    How many data files does your product contain?
    What is the total disk volume of your data set? (MB)
13. Is this data set final, unrestricted, and available for release?
    What are the reasons to restrict access to the data set?
14. Has this data set been described and used in a published paper?
    If so, provide a DOI or upload a digital copy of the manuscript with the data set.
15. Are the data and documentation posted on a public server?
    If so, provide the URL.
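
The date range (question 7) and bounding box (question 11) can be derived directly from point records. The sketch below assumes hypothetical sample records of ISO date, latitude, and longitude in decimal degrees; the name `summarize_extent` is illustrative. A simple min/max bounding box like this one would not be appropriate for data that cross the antimeridian.

```python
from datetime import date

# Hypothetical sample records: (ISO date, latitude, longitude) in decimal degrees.
RECORDS = [
    ("2012-07-14", 35.93, -84.31),
    ("2013-01-02", 36.01, -84.25),
    ("2012-11-30", 35.88, -84.40),
]


def summarize_extent(records):
    """Return the date range (YYYY-MM-DD) and bounding box of point records."""
    dates = [date.fromisoformat(day) for day, _lat, _lon in records]
    lats = [lat for _day, lat, _lon in records]
    lons = [lon for _day, _lat, lon in records]
    return {
        "start_date": min(dates).isoformat(),
        "end_date": max(dates).isoformat(),
        "south": min(lats),
        "north": max(lats),
        "west": min(lons),
        "east": max(lons),
    }


extent = summarize_extent(RECORDS)
print(extent)
```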


Data Center: Stewardship and Archive Functions

Ingest
– perform QA checks
– compile project-provided metadata
– generate additional metadata
– convert to archival file formats

Metadata / Documentation
– prepare final metadata record and documentation

Archive / Release
– generate citation and DOI (digital object identifier)

Exploration and Distribution
– provide tools to explore, access, and extract data

Post-Project Data Support
– provide long-term secure archiving
– serve as a buffer between end users and PIs
– provide usage statistics

Stewardship
– security, disaster recovery
– migration to new computer systems


Workshop Goal

Provide fundamental data management practices that investigators should perform during the course of data collection.


To improve the usability of data sets for:
• You
• Collaborators
• People outside your project

By following the practices taught in this workshop, your data will be
• less prone to error,
• more efficiently structured for analysis, and
• more readily understandable for any future research.


Please take the Workshop Survey

• https://www.surveymonkey.com/r/72MJWGF 


Workshop Sponsors