Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences...
-
Upload
dana-parrish -
Category
Documents
-
view
214 -
download
1
Transcript of Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences...
![Page 1: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/1.jpg)
Data Management 101 for Earth Scientists
Managing Your Data Robert CookEnvironmental Sciences DivisionOak Ridge National Laboratory
![Page 2: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/2.jpg)
Data Management 101 for Earth Scientists
Roadmap
•Introduction•Metadata•Best Practices for Data Management
![Page 3: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/3.jpg)
Data Management 101 for Earth Scientists
Benefits of Good Data Management Practices
Short-term•Spend less time doing data management and more
time doing research.•Easier to prepare and use data for yourself.•Collaborators can readily understand and use data
files.
Long-term (data publication)•Scientists outside your project can find, understand,
and use your data to address broad questions.•You get credit for archived data products and their
use in other papers.•Sponsors protect their investment.
3
![Page 4: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/4.jpg)
Data Management 101 for Earth Scientists
Metadata
Information to let you find, understand, and use
the data descriptors
documentation
4
![Page 5: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/5.jpg)
Data Management 101 for Earth Scientists
Metadata needed to Understand Data
The details of the data ….
Measurement
date Sample ID
Parametername
location
5
![Page 6: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/6.jpg)
Data Management 101 for Earth Scientists
Metadata Needed to Understand Data
MeasurementQA flag
media
generator
method
datesample
ID
parameter name
location
records
Units
Sample def.typedate
locationgenerator
labfield
Method def.
unitsmethod
Parameter def.
org.typename
custodianaddress, etc.
coord.elev. type
depth
Recordsystem
datewords, words.
QA def.
Units def.
GIS
6
![Page 7: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/7.jpg)
Data Management 101 for Earth Scientists
The 20-Year Rule (NRC 1991)
• The metadata accompanying a data set should be written for a user 20 years into the future--what does that investigator need to know to use the data?
• Prepare the data and metadata / documentation for a user who is unfamiliar with the details of your project, methods, and observations
7
![Page 8: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/8.jpg)
Data Management 101 for Earth Scientists
Proper Curation Enables Data Reuse
Time
Info
rmati
on
Con
ten
t
Planning
Collection
Assure
Metadata and Documentatio
n
Archive
Sufficient for Sharing and Reuse
8
![Page 9: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/9.jpg)
Data Management 101 for Earth Scientists
Proper Curation Enables Data Reuse
Time
Info
rmati
on
Con
ten
t
Planning
Collection
AssureMetadata
and Documentati
on
Archive
Sufficient for Sharingand Reuse
9
![Page 10: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/10.jpg)
Data Management 101 for Earth Scientists
Fundamental Data Practices
1. Define the contents of your data files 2. Use consistent data organization3. Use stable file formats – Curt Tilmes4. Assign descriptive file names 5. Perform basic quality assurance 6. Provide documentation7. Protect your data
10
![Page 11: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/11.jpg)
Data Management 101 for Earth Scientists
1. Define the contents of your data files
• Content flows from science plan (hypotheses) and is informed from requirements of final archive.
• Keep a set of similar measurements together in one file
• same investigator,
• methods,
• time basis, and
• instrument
• No hard and fast rules about contents of each files.
11
![Page 12: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/12.jpg)
Data Management 101 for Earth Scientists
1. Define the Contents of Your Data Files
Define the parameters
12Scholes (2005)
• Use commonly accepted parameter names and units(SI Units)
• Be consistent
• Explicitly state units
• Use ISO formats
Parameter Table
![Page 13: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/13.jpg)
Data Management 101 for Earth Scientists
2. Use consistent data organization (one good approach)
Station Date Temp Precip
Units YYYYMMDD C mm
HOGI 19961001 12 0
HOGI 19961002 14 3
HOGI 19961003 19 -9999
Note: -9999 is a missing value code for the data set
13
• Each row in a file represents a complete record, and the columns represent all the parameters that make up the record.
![Page 14: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/14.jpg)
Data Management 101 for Earth Scientists
2. Use consistent data organization (a 2nd good approach)
Station Date Parameter Value Unit
HOGI 19961001 Temp 12 C
HOGI 19961002 Temp 14 C
HOGI 19961001 Precip 0 mm
HOGI 19961002 Precip 3 mm
14
• Parameter name, value, and units are placed in individual columns.
• This approach is used in relational databases.
![Page 15: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/15.jpg)
Data Management 101 for Earth Scientists
• Use descriptive file names• Unique• Reflect contents• ASCII characters only• Avoid spaces
Bad: Mydata.xls2001_data.csvbest version.txt
Better: bigfoot_agro_2000_gpp.tiff
Site name
YearWhat was measured
(abundance)
Project Name
File Format
4. Assign descriptive file names
15
![Page 16: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/16.jpg)
Data Management 101 for Earth Scientists
16Source : PhD Comics
![Page 17: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/17.jpg)
Data Management 101 for Earth Scientists
5. Perform basic quality assurance
• Perform and review statistical summaries• Plot data and assess errors
• No better QA than to analyze data17
![Page 18: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/18.jpg)
Data Management 101 for Earth Scientists
5. Perform basic quality assurance (con’t)
Plot information to examine outliers
18
Model X uses UTC time, all
others use Eastern
TimeData from the North American Carbon Program Interim Synthesis (Courtesy of Yaxing Wei, ORNL)
Model-Observation Intercomparison
![Page 19: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/19.jpg)
Data Management 101 for Earth Scientists
5. Perform basic quality assurance (con’t)
Plot information to examine outliers
19
Data from the North American Carbon Program Interim Synthesis
(Courtesy of Yaxing Wei, ORNL)
Model-Observation Intercomparison
![Page 20: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/20.jpg)
Data Management 101 for Earth Scientists
6. Provide Documentation / Metadata• What does the data set describe?
• Why was the data set created?
• Who produced the data set and Who prepared the metadata?
• How was each parameter measured?
• What assumptions were used to create the data set?
• When and how frequently were the data collected?
• Where were the data collected and with what spatial resolution? (include coordinate reference system)
• How reliable are the data?; what is the uncertainty, measurement accuracy?; what problems remain in the data set?
• What is the use and distribution policy of the data set? How can someone get a copy of the data set?
• Provide any references to use of data in publication(s) 20
![Page 21: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/21.jpg)
Data Management 101 for Earth Scientists
7. Protect data
• Create back-up copies often• Ideally three copies
• original, one on-site (external), and one off-site
• Frequency based on need / risk
• Know that you can recover from a data loss• Periodically test your ability to restore
information
21
![Page 22: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/22.jpg)
Data Management 101 for Earth Scientists
DataONE Resources
dataone.org•Data Management Plans•Links to DMP Tool•Best Practices•Tools Database
Everything you wanted to know about data management, but were afraid to ask. Carly Strasser, et al 22
![Page 23: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/23.jpg)
Data Management 101 for Earth Scientists
Best Practices: Conclusions
•Data management is important in today’s science
•Well organized data: • enables researchers to work more efficiently• can be shared easily by collaborators• can potentially be re-used in ways not imagined when
originally collected
23
![Page 24: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/24.jpg)
Data Management 101 for Earth Scientists
Questions?
24
![Page 25: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/25.jpg)
Data Management 101 for Earth Scientists
References and Resources (continued)
• Ball, C. A., G. Sherlock, and A. Brazma. 2004. Funding high-throughput data sharing. Nature Biotechnology 22:1179-1183. doi:10.1038/nbt0904-1179.
• Borer, ET., EW. Seabloom, M.B. Jones, and M. Schildhauer. 2009. Some Simple Guidelines for Effective Data Management ,Bulletin of the Ecological Society of America. 90(2): 205- 214.
• Christensen, S. W. and L. A. Hook. 2007. NARSTO Data and Metadata Reporting Guidance. Provides reference tables of chemical, physical, and metadata variable names for atmospheric measurements. Available on-line at: http://cdiac.ornl.gov/programs/NARSTO/
• Cook, Robert B, Richard J. Olson, Paul Kanciruk, and Leslie A. Hook. 2001. Best Practices for Preparing Ecological Data Sets to Share and Archive. Bulletin of the Ecological Society of America, Vol. 82, No. 2, April 2001.
• Hook, L. A., T. W. Beaty, S. Santhana-Vannan, L. Baskaran, and R. B. Cook. June 2007. Best Practices for Preparing Environmental Data Sets to Share and Archive. http://daac.ornl.gov/PI/pi_info.shtml
• Kanciruk, P., R.J. Olson, and R.A. McCord. 1986. Quality Control in Research Databases: The US Environmental Protection Agency National Surface Water Survey Experience. In: W.K. Michener (ed.). Research Data Management in the Ecological Sciences. The Belle W. Baruch Library in Marine Science, No. 16, 193-207. 25
![Page 26: Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.](https://reader036.fdocuments.in/reader036/viewer/2022062720/56649f135503460f94c26eeb/html5/thumbnails/26.jpg)
Data Management 101 for Earth Scientists
References and Resources
• Management in the Ecological Sciences. The Belle W. Baruch Library in Marine Science, No. 16, 193-207.
• National Research Council (NRC). 1991. Solving the Global Change Puzzle: A U.S. Strategy for Managing Data and Information. Report by the Committee on Geophysical Data of the National Research Council Commission on Geosciences, Environment and Resources. National Academy Press, Washington, D.C.Michener, W K. 2006. Meta-information concepts for ecological data management. Ecological Informatics. 1:3-7.
• Michener, W.K. and J.W. Brunt (ed.). 2000. Ecological Data: Design, Management and Processing, Methods in Ecology, Blackwell Science. 180p.
• Michener, W. K., J. W. Brunt, J. Helly, T. B. Kirchner, and S. G. Stafford. 1997. Non-Geospatial Metadata for Ecology. Ecological Applications. 7:330-342.
• U.S. EPA. 2007. Environmental Protection Agency Substance Registry System (SRS). SRS provides information on substances and organisms and how they are represented in the EPA information systems. Available on-line at: http://www.epa.gov/srs/
• USGS. 2000. Metadata in plain language. Available on-line at: http://geology.usgs.gov/tools/metadata/tools/doc/ctc/
26