CC&E Best Data Management Practices, April 19, 2015
Please take the Workshop Survey
https://www.surveymonkey.com/s/update
Data Management Practices for Early Career Scientists: Closing
Robert Cook
ORNL Distributed Active Archive Center
Environmental Sciences Division
Oak Ridge National Laboratory
Oak Ridge, [email protected]
CC&E Joint Science Workshop
College Park, MD
April 19, 2015
Plan for archiving data
“Begin with the end in mind”
• Identified the Data Center
• Collaborated with data center during project
• Communicated:
  – Volume and Number of Files
  – Special needs
  – Delivery dates
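The volume and number of files can be computed directly from the delivery directory before contacting the data center. The following is a minimal sketch, assuming all files sit under one local directory; the function name `summarize_delivery` is hypothetical and is not a tool provided by the data center.

```python
from pathlib import Path


def summarize_delivery(root):
    """Return (number_of_files, total_volume_in_MB) for all files under root."""
    # Walk the directory tree and keep regular files only (skip directories).
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    # Report volume in megabytes (10**6 bytes), rounded to two decimals.
    return len(files), round(total_bytes / 1e6, 2)
```

Calling `summarize_delivery("my_dataset")` on a hypothetical directory returns the two numbers to report to the data center.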
Followed Fundamental Data Practices
• Define the contents of your data files
• Define the variables
• Use consistent data organization
• Use stable file formats
• Assign descriptive file names
• Preserve processing information
• Perform basic quality assurance
• Provide documentation
• Protect your data
• Preserve your data
What to submit to the archive?
• Well-structured data files, with variables, units, and fill values well-defined
• Document that describes the data set
• Additional information
  – Article written with the data set
  – Files that describe project, protocols, or field sites (photographs)
  – Material from Project Web site or Wiki
• Basic description of the data (15 questions)
  – http://daac.ornl.gov/PI/questions.shtml
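One way to keep variables, units, and fill values well-defined is to record them in a small data dictionary and check each file against it. The sketch below is illustrative only: the variable names, the units, and the fill value of -9999 are assumptions, not requirements stated by the archive.

```python
import csv
import io

# Hypothetical data dictionary: each variable lists its units and fill value.
VARIABLES = {
    "canopy_height": {"units": "m", "fill_value": "-9999"},
    "air_temperature": {"units": "degrees C", "fill_value": "-9999"},
}


def read_rows(text):
    """Parse CSV text, converting documented fill values to None."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for name, value in raw.items():
            # Reject any column that the data dictionary does not define.
            if name not in VARIABLES:
                raise KeyError(f"variable {name!r} is not defined in the data dictionary")
            fill = VARIABLES[name]["fill_value"]
            row[name] = None if value == fill else float(value)
        rows.append(row)
    return rows
```

A file with an undefined column fails immediately, which catches undocumented variables before submission rather than after.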
Issues with data sets received
• Descriptive information about data files and content is incomplete
  – Data description and collection method
  – Field sites
  – Quality / uncertainty of data
• Inconsistencies with publication
• Files uploaded are not identified / described
• Variable names are not defined or vague
  – “Height” unclear, change to “canopy_height”
  – Perhaps append the method/sensor for added clarity
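The variable-name issue can be checked mechanically before submission. This is a minimal sketch, assuming a hand-maintained set of names considered too vague; the set `VAGUE_NAMES`, both function names, and the "lidar" method label in the usage note are hypothetical.

```python
# Hypothetical set of names considered too vague on their own.
VAGUE_NAMES = {"height", "temp", "depth", "value", "data"}


def flag_vague_names(columns):
    """Return the column names that match a vague name, ignoring case."""
    return [c for c in columns if c.strip().lower() in VAGUE_NAMES]


def describe_variable(quantity, method=None):
    """Build a descriptive name, optionally with the method/sensor appended."""
    parts = [quantity] + ([method] if method else [])
    # Lowercase each part and join words with underscores.
    return "_".join(p.strip().lower().replace(" ", "_") for p in parts)
```

For example, `flag_vague_names(["Height", "site_id"])` flags "Height", and `describe_variable("canopy height", "lidar")` builds a name with the method appended.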
Information about Data (15 questions)
Information About Your Data Set
1. Have you looked at our Best Data Management Practices?
2. Who produced this data set?
3. What agency and program funded the project?
   What awards funded this project? (comma separate multiple awards)
Data Set Description
4. Provide a title for your data set. (maximum 84 characters)
   What type of data does your data set contain?
   What does the data set describe? (2-3 sentences)
5. What parameters did you measure, derive, or generate? (comma separated, limit to ten)
6. Have you analyzed the uncertainty in your data?
   Briefly describe your uncertainty analysis. (2-3 sentences)
   Will the uncertainty estimates be included with your data set?
Information about Data (cont)
Temporal and Spatial Characteristics
7. What date range does the data cover? (YYYY-MM-DD)
   What is a representative sampling frequency or temporal resolution for your data?
8. Where were the data collected/generated?
9. Which of the following best describes the spatial nature of your data? (single point, multiple points, transect, grid, polygon, n/a)
10. What is a representative spatial resolution for these data?
11. Provide a bounding box around your data.
Data Preparation and Delivery
12. What are the formats of your data files?
    How many data files does your product contain?
    What is the total disk volume of your data set? (MB)
13. Is this data set final, unrestricted, and available for release?
    What are the reasons to restrict access to the data set?
14. Has this data set been described and used in a published paper?
    If so, provide a DOI or upload a digital copy of the manuscript with the data set.
15. Are the data and documentation posted on a public server?
    If so, provide the URL.
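Several of the 15 questions carry explicit limits (84-character title, at most ten parameters, YYYY-MM-DD dates, a fixed list of spatial types) that can be checked before filling in the form. The sketch below covers only those four limits; the dictionary keys and the function names are hypothetical and do not correspond to any field names used by the archive.

```python
import re
from datetime import datetime

# Spatial options as listed in question 9.
SPATIAL_TYPES = {"single point", "multiple points", "transect", "grid", "polygon", "n/a"}


def is_iso_date(text):
    """True if text is a real calendar date written exactly as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def check_answers(answers):
    """Return a list of problems found in a dict of questionnaire answers."""
    problems = []
    if len(answers.get("title", "")) > 84:
        problems.append("title exceeds 84 characters")
    # Parameters are given as one comma-separated string.
    params = [p.strip() for p in answers.get("parameters", "").split(",") if p.strip()]
    if len(params) > 10:
        problems.append("more than ten parameters listed")
    for key in ("start_date", "end_date"):
        if not is_iso_date(answers.get(key, "")):
            problems.append(f"{key} is not in YYYY-MM-DD form")
    if answers.get("spatial_type") not in SPATIAL_TYPES:
        problems.append("spatial_type is not one of the listed options")
    return problems
```

An empty list means the answers pass these four checks; the remaining questions are free text and still need to be reviewed by hand.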
Data Center: Stewardship and Archive Functions
Ingest
– perform QA checks
– compile project-provided metadata
– generate additional metadata
– convert to archival file formats
Metadata / Documentation
– prepare final metadata record and documentation
Archive / Release
– generate citation and DOI (digital object identifier)
Exploration and Distribution
– provide tools to explore, access, and extract data
Post-Project Data Support
– provide long-term secure archiving
– serve as a buffer between end users and PIs
– provide usage statistics
Stewardship
– security, disaster recovery
– migration to new computer systems
Workshop Goal
Provide fundamental data management practices that investigators should perform during the course of data collection.
To improve the usability of data sets for:
• You
• Collaborators
• People outside your project
By following the practices taught in this workshop, your data will be
• less prone to error,
• more efficiently structured for analysis, and
• more readily understandable for any future research.
Please take the Workshop Survey
• https://www.surveymonkey.com/r/72MJWGF