Research Data Infrastructure for Geochemistry (DFG Roundtable)
Research Data Infrastructure for Geochemistry
iedadata.org
Investment
IEDA 2016-2021: Operation of a Multi-Disciplinary Data Facility for the Earth Science Community
• Invited renewal proposal after IEDA review in 2014/15
• Next 5 years of operating IEDA
• $14.4 million
IEDA Data Systems for Geochemistry
IEDA / EarthChem
Community driven
• Community governance
• Community engagement & training
Standards compliant (accredited ‘trustworthiness’)
• Follow data curation standards
• QA/QC procedures
• Unique, persistent identification of data
• Persistent access of data holdings
• Operational procedures (risk management, IP, etc.)
Demonstrated impact on science
Scientific Justification
• Enable new data-intensive science, new cross-disciplinary studies, and new kinds of collaborations.
• Expand opportunities for scientists, educators, and the public to participate in science.
• Maximize the return on national research investments.
• Ensure reproducible science: permit verification of research results.
• Contribute to new science initiatives.
“Data collections provide more than an increase in the efficiency and accuracy of research: they enable new research opportunities.” (“Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century”, NSB Report, September 2005)
Science from EarthChem Data Systems
[Figures: Gale et al.]
Data Policies
December 11, 2013
Agencies
Societies
Journals
May 9, 2013
February 22, 2013
Concern: Reproducibility
“The field sciences (e.g., geology, ecology, and archaeology), where each study is temporally (and often spatially) unique, provide exemplars for the importance of preserving data and samples for further analysis.”
Data Policies:
COPDESS: Coalition for Publishing Data in the Earth & Space Sciences
“Connecting Earth Science publishers and Data Facilities to help translate the aspirations of open, available, and useful data from policy into practice.”
Data: Publishers’ Perspective
• Many have had supplements for some time
  - Difficult to deal with, costly
  - Mostly PDFs (not searchable, poorly indexed, variable quality)
• Require authors to comply with data availability policy; policing
• Little guidance on community standards
• Want to use and promote repositories, but these are not well integrated, with a few exceptions
• Worried about repository funding and stability
Slide courtesy of Brooks Hanson, AGU Director for Publications
Statement of Commitment
COPDESS.org
reaffirm and ensure adherence to our existing journal and publishing policies…regarding data sharing and archiving...
Signed by ~50 publishers & data facilities
“Earth and space science data should, to the greatest extent possible, be stored in appropriate domain repositories that ... follow leading practices, and can provide additional data services.”
Released 15 January. Article in Eos.org: https://eos.org/agu-news/committing-publishing-data-earth-space-sciences
https://copdessdirectory.osf.io/
To be integrated with re3data.org
Domain-specific Data Facilities
Science Community
Domain-specific data facility
Libraries, Archives
CI, Computer Science
Publishers, editors
Discipline-specific data services
• Context & provenance metadata
• Semantics
• Workflows
Funding Agencies
Data Facilities
Registries
Data curation services
CI development
findable
identification, persistence
accessible
protection, protocols
context, provenance
re-usable
harmonized, machine-readable
interoperable
BIG DATA
Adding Value
small data
ESIP Winter 2016: "Unleashing the BIG in Small Data", 1/6/16
Generic Repositories
Data Curation Standards
Community Data Collections
Domain-specific Data Standards
Domain Repositories
Unleashing the BIG in small Research Data
Kerstin Lehnert, Lamont-Doherty Earth Observatory of Columbia University, Palisades, NY 10964
http://bigdata-madesimple.com/hey-big-data-dont-forget-your-little-data-cousin/
Small Data: Pieces of a Puzzle …
… that build a picture
Small Data, Big Science: Example 1
“Understanding where the dust that's in the atmosphere and oceans comes from can help scientists estimate its impact on earth's climate system.”
Bess Koffman, Michael Kaplan, Steven Goldstein, Gisela Winckler (LDEO), Natalie Mahowald (Cornell)
http://blogs.ei.columbia.edu/2014/03/13/did-new-zealand-dust-influence-the-last-ice-age/
Science question: Did New Zealand Dust Influence the Last Ice Age?
Small Data - Big Effort or What it takes to generate a few kilobytes of data
Small Data, Big Science: Example 2
Science question: Do convergent margin volcanoes really represent continental crust?
“As it is crucial to understand the extent and origin of the compositional difference between central Aleutian lavas and plutons through time and space, this project will map and sample plutonic rocks exposed on the central Aleutians and their coeval volcanic host rocks.”
http://www.nsf.gov/discoveries/disc_summ.jsp?cntn_id=135851&org=NSF
Small Data - Big Effort or What it takes to generate a few kilobytes of data
• 4 scientists (3 institutions) traveling to Alaska
• 5 weeks on remote islands
• a boat (with crew)
• a helicopter
Anticipated Data:
• ~250 samples
• ~200 major element analyses
• ~150 trace element analyses
• 50 U/Pb zircon geochronology
• 30 Ar-Ar ages
• 80 Sr, Nd, Hf and Pb isotope analyses
EarthChem Data Systems
[Diagram labels: Investigators; EarthChem Library; PetDB, SedDB; EarthChem Portal; Data Publication & Preservation; Data Mining & Analysis; Metadata Catalog; Data & Metadata; External Systems; EarthChem Data Managers]
EarthChem Library
Data Types:
- Analytical datasets
- Experimental datasets
- Macros/tools
- Data compilations (syntheses)
- Images
- Data reports
DOI to allow proper citation of data
Link to publications
Link to funding source
Accessible in the EarthChem Library
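The three links named on the EarthChem Library slide (a DOI for citation, links to publications, links to the funding source) can be pictured as fields on one dataset record. The sketch below is only an illustration under stated assumptions: the class name, field names, citation layout, and every value (including the placeholder DOIs and award ID) are hypothetical and do not describe the EarthChem Library's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical metadata record for a dataset held in a data library."""
    title: str
    authors: List[str]
    year: int
    doi: str                                                    # identifier used to cite the data
    publication_dois: List[str] = field(default_factory=list)   # links to publications
    award_ids: List[str] = field(default_factory=list)          # links to funding source

    def doi_url(self) -> str:
        # DOIs resolve through the doi.org proxy.
        return "https://doi.org/" + self.doi

    def citation(self, publisher: str) -> str:
        # Assumed citation layout: authors (year). title. publisher. DOI link.
        names = "; ".join(self.authors)
        return f"{names} ({self.year}). {self.title}. {publisher}. {self.doi_url()}"


# Placeholder values, invented for illustration only.
record = DatasetRecord(
    title="Example major element dataset",
    authors=["Doe, J.", "Roe, R."],
    year=2016,
    doi="10.0000/example-dataset",
    publication_dois=["10.0000/example-paper"],
    award_ids=["EXAMPLE-0000000"],
)
print(record.citation("Example Data Library"))
```

Keeping the publication and funding links on the same record as the DOI is one simple way to make a dataset citable and traceable in both directions.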
Editors Roundtable Recommendations
Data need to be available in useful format
• Complete disclosure of data
• Data in tabular (usable!) format, no .PDF or .jpg
• No ratios
Sample metadata
• Locations
• Unique sample identifiers
• Object classifications
Analytical metadata
• Method
• Lab
• Data quality & reproducibility (reference material measurements)
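The recommendations above amount to a checklist that a tabular data file either meets or does not. A minimal sketch of such a check follows, assuming a CSV table. The column names, the `check_table` helper, and the example rows are hypothetical and are not taken from any EarthChem template.

```python
import csv
import io

# Hypothetical column names; a real data template defines its own.
SAMPLE_COLUMNS = ["sample_id", "latitude", "longitude", "material"]
ANALYTICAL_COLUMNS = ["method", "laboratory", "reference_material"]
REQUIRED_COLUMNS = SAMPLE_COLUMNS + ANALYTICAL_COLUMNS


def check_table(csv_text):
    """Return a list of problems found in a CSV data table (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    present = [c for c in REQUIRED_COLUMNS if c in header]
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for column in present:
            if not (row[column] or "").strip():
                problems.append(f"line {line_no}: empty {column}")
    return problems


# Invented example rows; the second data row lacks a sample identifier.
EXAMPLE = """sample_id,latitude,longitude,material,method,laboratory,reference_material,SiO2
S-001,52.1,-174.3,rock,XRF,Lab A,BHVO-2,51.2
,52.2,-174.1,rock,XRF,Lab A,BHVO-2,49.8
"""

for problem in check_table(EXAMPLE):
    print(problem)
```

A check like this only covers presence of metadata. Whether an identifier is globally unique, or whether a reference material measurement is acceptable, needs a registry or domain review.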
Data Templates
LPSC 2015 Workshop: Restoration and Synthesis of Planetary Geochemical Data
EarthChem Data Templates
Data Standards: Why?
Re-usability of data
Reproducibility of science
Integration/interoperability of data
Open Geospatial Consortium (OGC): Observations & Measurements
[Diagram labels: Observation, Result, Feature of Interest, Sampling, Sampling Feature]
“Observations commonly involve sampling of an ultimate feature of interest. This International Standard defines a common set of sampling feature types classified primarily by topological dimension, as well as samples for ex-situ observations.” (OGC O&M 2.0.0 / ISO 19156; editor: Simon Cox)
e.g. Station, Transect, Section, Specimen
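The relationship described in the quote (an observation is made on a sampling feature, which in turn samples an ultimate feature of interest) can be sketched with a few small classes. This is a simplified illustration, not the O&M schema itself: the class layout, attribute names, and all example values are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureOfInterest:
    """The ultimate real-world feature the science question is about."""
    name: str


@dataclass(frozen=True)
class SamplingFeature:
    """A station, transect, section, or specimen that samples a feature."""
    identifier: str
    kind: str                          # e.g. "Station", "Transect", "Section", "Specimen"
    sampled_feature: FeatureOfInterest


@dataclass(frozen=True)
class Observation:
    """One measured result, made on a sampling feature by some procedure."""
    feature_of_interest: SamplingFeature
    observed_property: str
    procedure: str
    result: float
    unit: str

    def ultimate_feature(self) -> FeatureOfInterest:
        # Follow the sampling link back to the feature being studied.
        return self.feature_of_interest.sampled_feature


# Invented example values for illustration only.
arc_crust = FeatureOfInterest(name="Example arc crust")
specimen = SamplingFeature(identifier="EXAMPLE-001", kind="Specimen",
                           sampled_feature=arc_crust)
obs = Observation(feature_of_interest=specimen, observed_property="SiO2",
                  procedure="XRF", result=58.4, unit="wt%")
print(obs.ultimate_feature().name)
```

Separating the specimen from the feature it samples is what lets many ex-situ analyses of different specimens be combined into statements about the same ultimate feature.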
Observation Data Model v2
ODM2 Team: J S Horsburgh, A K Aufdenkampe, L Hsu, A Jones, K Lehnert, E Mayorga, L Song, D Tarboton, I Zaslavsky
Horsburgh et al., Environmental Modelling & Software, Volume 79, 2016.
PetDB
PetDB Data Mining: Search & Filter
Filter by method or concentration
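Filtering by method or concentration can be sketched on a small in-memory table. The record layout, the `filter_analyses` function, and the values below are hypothetical; they illustrate the idea of the search filters only and do not reflect PetDB's actual interface or data.

```python
def filter_analyses(analyses, method=None, analyte=None, min_value=None, max_value=None):
    """Select analyses by analytical method and/or a concentration range."""
    selected = []
    for analysis in analyses:
        if method is not None and analysis["method"] != method:
            continue
        if analyte is not None:
            value = analysis["values"].get(analyte)
            if value is None:
                continue  # analyte was not measured in this analysis
            if min_value is not None and value < min_value:
                continue
            if max_value is not None and value > max_value:
                continue
        selected.append(analysis)
    return selected


# Invented example records (concentrations in wt%).
ANALYSES = [
    {"sample_id": "A-1", "method": "XRF", "values": {"SiO2": 49.5, "MgO": 8.1}},
    {"sample_id": "A-2", "method": "EMP", "values": {"SiO2": 50.9, "MgO": 7.2}},
    {"sample_id": "A-3", "method": "XRF", "values": {"SiO2": 52.3}},
]

xrf_only = filter_analyses(ANALYSES, method="XRF")
high_mgo = filter_analyses(ANALYSES, analyte="MgO", min_value=8.0)
print([a["sample_id"] for a in xrf_only])
print([a["sample_id"] for a in high_mgo])
```

Filters like these depend on analytical metadata (method, analyte) being recorded consistently for every value in the first place.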
EarthChem Collaborations
• External EC Portal contributors: GEOROC, USGS, MetPetDB, GANSEKI
• Critical Zone Observatories
• DiamondDB (funded by Sloan Foundation/DCO)
• DECADE Portal (funded by Sloan Foundation/DCO): collaboration with Global Volcanism Program & MAGA database (C. Cardellini)
• Layered Intrusions Database: J. van Tongeren (student engagement project)
• MoonDB (funded by NASA 2015-2017): Johnson Space Center, C. Neal,
IEDA Data Rescue Initiative
Data Rescue Mini-awards ($7,000)
• J. Delano (SUNY Albany), A. Saal, E. Hauri: Apollo samples
• J. Gill (UCSC, retired):
• P. Janney (UCT): UCT Mantle Xenolith Collection
• M. Rhodes (U Mass): Hawaiian Drilling project
• T. Fischer (UNM): Russian Volcanic Gas Data
International Data Rescue Award in the Geosciences
• Sponsored by Elsevier Research Data division
• Awarded 2013 (at AGU FM) and 2015 (at EGU GA)
• Competition for 2016 starting soon
Special Issue of GeoResJ on Data Rescue (volume 6, 2015)
EarthChem Portal
Data Analysis
Interoperability with LEPR (M. Ghiorso)
Results at LEPR
EarthCube
• Advances coordination, collaboration, and integration
  - Community governance
  - Integrative Activities
• Fosters new data communities
  - Research Coordination Networks
• Develops and adapts new technologies to structure, transform, integrate, document, harmonize data & metadata
  - Building Blocks