DataONE: Preserving Data and Enabling Data-Intensive Biological and Environmental Research
Bob Cook
Environmental Sciences Division
Oak Ridge National Laboratory
February 6, 2013
NACP All-Investigator Meeting
The DataONE Vision and Approach:
Providing universal access to data about life on earth and the environment that sustains it, as well as the tools needed by researchers.
1. Building community
2. Developing sustainable data discovery and interoperability solutions
3. Supporting researcher tools and services
The long tail of orphan data
[Figure: volume plotted against rank frequency of datatype. Labeled regions: specialized repositories (50%) and orphan data (50%). (B. Heidorn)]
Characteristics
Big Science
• Large Volume
• Automated sensors
• Well described
• Well curated
• Easily Discovered
Small Science
• Small Volume
• Poorly described
• Rarely Indexed
• Invisible to scientists
• Rarely Used
• Dark Data

• High spatial resolution
• Process based
• Theory Development
• Model Development
• Benchmarking
✔ Check for best practices
✔ Create metadata
✔ Connect to ONEShare
Data & Metadata (EML)
https://dataone.org
http://dataup.cdlib.org/
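The "Create metadata" step can be illustrated with a minimal sketch that builds an EML-style record using only the Python standard library. The function name `build_eml`, the sample values, and the choice of the EML 2.1.1 namespace are assumptions made for illustration; this is not DataUp's own code, and the output is not validated against the EML schema.

```python
import xml.etree.ElementTree as ET

# Assumed schema version; adjust to the EML release a repository expects.
EML_NS = "eml://ecoinformatics.org/eml-2.1.1"


def build_eml(package_id, title, surname, abstract):
    """Return a small EML-style metadata document as a string (hypothetical helper)."""
    ET.register_namespace("eml", EML_NS)
    root = ET.Element(
        f"{{{EML_NS}}}eml",
        {"packageId": package_id, "system": "example"},
    )
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title

    # Creator: the person responsible for the dataset.
    creator = ET.SubElement(dataset, "creator")
    creator_name = ET.SubElement(creator, "individualName")
    ET.SubElement(creator_name, "surName").text = surname

    # Abstract: a short free-text description.
    abstract_el = ET.SubElement(dataset, "abstract")
    ET.SubElement(abstract_el, "para").text = abstract

    # Contact: reuse the creator's name for this sketch.
    contact = ET.SubElement(dataset, "contact")
    contact_name = ET.SubElement(contact, "individualName")
    ET.SubElement(contact_name, "surName").text = surname

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_eml("example.1.1", "Soil respiration", "Doe", "Hourly flux."))
```

A real record would carry far more detail (coverage, methods, attribute descriptions); the point here is only that the metadata is structured, machine-readable text that travels with the data.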
• Sponsor Requirements for Data Management
• Credit for data through citation, DOI, and Data Citation Index
• Training in Data Management
• Improved tools for data preparation – DataUp
• Developing a metadata editor
Model-Data Fusion: Harnessing Observations
Model-Data Fusion: Data System Characteristics (1)
• Dedicated financial support for data management is essential
• Close coordination between the data group(s) and the producers (experimentalists) and users (modelers) of the data products
• Based on a data management plan and a data policy
• Integrated system that delivers a suite of diverse products
• Establish standards (file, workflow, network) and promote interoperability
• Processes to assure and document data quality to allow proper interpretation and use
Model-Data Fusion: Data System Characteristics (2)
• Facilitate rapid exchange of data, products, and information; rapid exchange of large volume data
• Promote the use of best practices to prepare and document data to share and archive
• Make efficient use of existing data management infrastructure and resources
• Ensure that finalized data and associated documentation are transferred to an appropriate archive
• Make numerical models (source code) and description of the models available, along with model parameters and example input and output data (Thornton et al. 2005)
Interoperability
[Figure: Member Nodes (KNB, LTER, ORNL DAAC, CDL, USGS CSAS, DRYAD, and future nodes) feed metadata extraction into an internal metadata index at the Coordinating Nodes. Metadata standards shown for the nodes: EML, FGDC, ISO, METS.]
• Virtual Portals
• Numerous search capabilities
• Metadata has link to data, which reside at Member Nodes
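The arrangement on this slide, where Coordinating Nodes index metadata and each metadata record links to data held at a Member Node, can be sketched as a toy in-memory index. The class names, identifiers, titles, URLs, and the pairing of metadata formats with nodes below are hypothetical and chosen only for illustration; this is not the DataONE API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    identifier: str
    title: str
    metadata_format: str  # e.g. "EML", "FGDC", "ISO"
    member_node: str      # node that holds the actual data
    data_url: str         # hypothetical link back to the Member Node


class MetadataIndex:
    """Toy stand-in for a Coordinating Node's metadata index."""

    def __init__(self):
        self._records = []

    def add(self, record):
        self._records.append(record)

    def search(self, keyword):
        # Case-insensitive substring match on the title only.
        needle = keyword.lower()
        return [r for r in self._records if needle in r.title.lower()]


# Illustrative records; the data themselves stay at the Member Nodes.
index = MetadataIndex()
index.add(MetadataRecord("ex.1", "Forest soil carbon plots", "EML",
                         "KNB", "https://example.org/knb/ex.1"))
index.add(MetadataRecord("ex.2", "Gridded soil moisture", "FGDC",
                         "ORNL DAAC", "https://example.org/ornl/ex.2"))

for hit in index.search("soil"):
    print(hit.member_node, hit.data_url)
```

The design point is that a search touches only the central index; the user follows the returned link to fetch data from whichever node holds it.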
The long tail of orphan data
[Figure: volume plotted against rank frequency of datatype. Labeled regions: specialized repositories (e.g. Remote Sensing, NEON) and orphan data. (B. Heidorn)]
“Most of the bytes are at the high end, but most of the datasets are at the low end” – Jim Gray
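The shape behind this quote can be illustrated with a short calculation on synthetic numbers. The sizes below follow an assumed 1/rank distribution and are not measurements of any real repository; `top_share` is a hypothetical helper name.

```python
def top_share(sizes, fraction):
    """Share of total volume held by the largest `fraction` of datasets."""
    ordered = sorted(sizes, reverse=True)
    cutoff = max(1, int(len(ordered) * fraction))
    return sum(ordered[:cutoff]) / sum(ordered)


# Synthetic long-tailed sizes: dataset at rank r has size proportional to 1/r.
synthetic_sizes = [1000.0 / rank for rank in range(1, 1001)]

share = top_share(synthetic_sizes, 0.20)
print(f"Largest 20% of datasets hold {share:.0%} of the volume")
```

Under this assumed distribution, the largest fifth of the datasets holds most of the bytes, while the remaining four fifths of the datasets form the low-volume tail.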
“Data intensive science” and the “80:20 rule”
[Figure, adapted from CENR-OSTP. Axes: decreasing spatial coverage; increasing process knowledge. Tier labels: remote sensing; intensive science sites and experiments; extensive science sites; volunteer & education networks.]