Data Resources US Perspective Kerstin Lehnert Suzanne Carbotte Lamont-Doherty Earth Observatory of...

Page 1: Data Resources US Perspective Kerstin Lehnert Suzanne Carbotte Lamont-Doherty Earth Observatory of Columbia University.

Data Resources

US Perspective

Kerstin Lehnert Suzanne Carbotte

Lamont-Doherty Earth Observatory of Columbia University

Page 2:

Scientific Data in the Digital Age

“It is exceedingly rare that fundamentally new approaches to research and education arise. Information technology has ushered in such a fundamental change. Digital data collections are at the heart of this change.”

US National Science Board, Report to the US National Science Foundation, 2005

Page 3:

Access to Data

“Effective access to research data, in a responsible and efficient manner, is required to take advantage of the new opportunities and benefits offered by new information and communication technologies.”

Organization for Economic Co-operation & Development: “Principles and Guidelines for Access to Research Data from Public Funding”, May 2007

Page 4:

Open Access to Data: Benefits

- Democratize access to research resources
- Ensure broad dissemination of results
- Facilitate new cross-disciplinary approaches - access for non-specialist users
- Enable verification of research results
- Provide new research opportunities
- Provide access to data from variety of sources and enable integration across fields
- Provide foundation for use of automated tools
- Facilitate more efficient use of resources
- Data are often expensive to collect (especially marine!), often/usually unique; repeat collection/analysis rare

Page 5:

Data Synthesis ‘the Old Way’

Months to Years

Page 6:

Data Synthesis Today

2 Minutes

Page 7:

Data Visualization: 2 Minutes

GeoMapApp software: www.geomapapp.org

Page 8:

Sharing Research Data: USA

“GAO recommends the agencies explore opportunities in the grants process to better ensure the availability of data to other researchers and determine if additional archiving strategies are warranted.”

GAO Report #07-1172, September 28, 2007

Page 9:

Existing US Data Resources relevant for MARGINS Science

- Marine Geoscience Data System: hosts the MARGINS Data Portal
- Geoinformatics for Geochemistry: hosts PetDB, SedDB, SESAR, EarthChem (links to GEOROC & NAVDAT)
- NGDC: Marine geoscience data - mostly legacy programs
- IRIS: Seismic network data and earthquake catalogs
- UNAVCO: GPS data
- GEON: Lidar data
- SIO-GDC: hosts marine geoscience data from Scripps expeditions
- WHOI: hosts data from vehicles of the NDSF

Page 10:

www.marine-geo.org

www.geoinfogeochem.org

Page 11:

GfG & MGDS: Collaborations & Partnerships

- PetDB
- SESAR Sample Registry
- EarthChem
- SedDB
- Antarctic Multibeam
- Seismic Reflection Field Data Center
- MARGINS
- Ridge2K
- Legacy

Boston Univ, Oregon State, Boise State, University of Kansas, WHOI, Scripps, Texas A&M, UTIG, NGDC, University of NH

Data & IT: GEON, UNAVCO, USGS, IODP, ICDP, Pangaea, CoreWall, PaleoStrat, MetPetDB, LEPR

Science

Page 12:

[Diagram labels: Development, Operation; Data, Services, Access, Education & Outreach]

- Data modeling
- Metadata standards
- QC & ingestion procedures
- Data submission tools & procedures

- Solicitation & Compilation
- Ingestion
- Quality Control
- Documentation
- Curation
- User support
- Archiving

- Web applications
- Query tools
- Download options
- Web services, XML
- Visualization & data analysis tools

- System operation
- Maintenance
- User support

- Education modules
- Presentations
- Publications
- Exhibits & demos
- Workshops & short courses
- Web sites (News etc.)

Page 13:

Scope of the MGDS

- Metadata catalog: Central cruise catalog and data repository for all MARGINS programs - important goal is to preserve full data collection context for each expedition
- Sensor Database: data documentation and access for multibeam and geophysical data from Palmer & Gould and MCS reflection data from Ewing & Langseth
- Global DEM: Synthesis of multibeam bathymetry into the Global Multi Resolution Topography - GMRT
- MG&G Legacy data and derived data
- Tools for data access: lower barrier to data access with tools tailored to science needs

October 23-24, 2007

Page 14:

MARGINS Database

- Provides access to expedition information & data for all MARGINS funded marine and some terrestrial programs
- Diverse data collected during these programs hosted within MARGINS database:
  - swath bathymetry
  - gravity and magnetics
  - MCS reflection
  - water column data (BLISP, CTD)
  - side-scan sonar mapping data
  - rock and fluid sampling information
- Database includes links to WHOI (near bottom camera), UTIG (processed MCS), IRIS (seismometer), UNAVCO (GPS)

Page 15:

MGDS Data Holdings

Page 16:

MGDS Access Interfaces

Data Link (server side)

GeoMapApp (client side)

Web services

Access data hosted at distributed data repositories

Page 17:

Access to data at distributed data repositories

Alvin and Jason2 near bottom photos

Page 18:

With bathymetry tiles exposed through a programmatic interface - can make use of GoogleEarth

Page 19:

GfG Program: Scope

- PetDB, SedDB, EarthChem data sets: Build and provide access to integrated compilations of large volumes of geochemical data
  - desktop access to the entire published geochemical literature within minutes
- EarthChem Portal: Central access point to the broadest range of geochemical data in federated databases
- SESAR Sample Registry: Provide global unique identifiers for samples; build global sample catalog

Page 20:

Database Features

- Archive & serve integrated data sets of geochemical data (each individual value searchable)
- Include complete metadata of samples and analytical procedures for searching and data evaluation
- Offer interactive, dynamic user interfaces that allow extraction of any customized subset of the data
- Support data analysis
- Tools for data quality assessment & control
- Tools for visualization (map interfaces, plotting tools)
- Integration with broader Geoscience data via interoperability & partnerships

Page 21:

EarthChem Data

Page 22:
Page 23:

EarthChem Portal


Page 24:

Access via GeoMapApp

Page 25:

Ambiguous Sample Naming

Examples from the PetDB Database

Sample names are duplicated.

Name  Location          Publication     Cruise
D3-1  SEIR              ANDERSON, 1980  VM3301 (Vema)
D3-1  North Fiji Basin  EISSEN 1994     Starmer 1 (Nadir)
D3-1  Shimada Smt       GRAHAM 1988     S1-79 (Sea Sounder)
D3-1  Gorda Ridge       CLAGUE 1984     KK2-83NP (Kana Keoki)
3-1   Lamont Smts       BATIZA 1982     RISE III (New Horizon)

Sample names are modified or changed.

Dredge sample 3, Amphitrite Cruise 1963/4

D3           Engel 1964
D-3          Scheidegger 1981, Schilling 1971
PD3          Tatsumoto 1965, 1966
PD-3         Hedge 1970, Muehlenbach 1972
PV D-3       Engel 1965
AMPH3D       Pineau 1976
AMPH-D3      MacDougall 1986
AMPH D-3     Sun 1980, Schilling 1975
AMPH 3-PD-3  Hart 1971
S-10         Subbarao 1972

DSDP Leg 46, Hole 396B, Section 22, Sample 3, 28-33cm

46396B 22 3,28-38            Dungan 1978
396B 22 3,28-38              Muehlenbach 1979
249                          Dungan 1978
DSDP046-0396B-022-003/28-38  PetDB
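The duplicate-name problem in the first PetDB example can be illustrated with a short script. This is only a minimal sketch, not how PetDB works internally: the rows are transcribed from the slide as they appear, and the function name `find_ambiguous_names` is hypothetical. It groups samples by name and reports any name used on more than one cruise.

```python
from collections import defaultdict

# Rows transcribed from the slide: (name, location, publication, cruise).
SAMPLES = [
    ("D3-1", "SEIR", "ANDERSON, 1980", "VM3301 (Vema)"),
    ("D3-1", "North Fiji Basin", "EISSEN 1994", "Starmer 1 (Nadir)"),
    ("D3-1", "Shimada Smt", "GRAHAM 1988", "S1-79 (Sea Sounder)"),
    ("D3-1", "Gorda Ridge", "CLAGUE 1984", "KK2-83NP (Kana Keoki)"),
    ("3-1", "Lamont Smts", "BATIZA 1982", "RISE III (New Horizon)"),
]


def find_ambiguous_names(rows):
    """Return {name: sorted cruises} for names that occur on more than one cruise."""
    cruises_by_name = defaultdict(set)
    for name, _location, _publication, cruise in rows:
        cruises_by_name[name].add(cruise)
    return {
        name: sorted(cruises)
        for name, cruises in cruises_by_name.items()
        if len(cruises) > 1
    }


ambiguous = find_ambiguous_names(SAMPLES)
for name, cruises in ambiguous.items():
    print(f"{name} is used on {len(cruises)} different cruises")
```

A name alone does not identify a sample here; only the combination of name and cruise does, which is the motivation for a registry-issued unique identifier.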

Page 26:

International Geo Sample Number IGSN

- SESAR serves as registry that provides & manages unique identifiers for samples
- IGSN - International Geo Sample Number
- Obtained upon submission of sample metadata (registration)
- Implementation in sample collection & curation ongoing (IODP, core repositories)
- Ca. 4 million samples registered
- System still under development
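The registration idea (an identifier is issued when sample metadata are submitted) can be sketched as a toy in-memory registry. Everything here is an assumption made for illustration: the class `ToySampleRegistry`, the required fields `name` and `cruise`, and the `TOY000001`-style identifiers are hypothetical and do not reflect the actual SESAR service or the real IGSN syntax.

```python
import itertools


class ToySampleRegistry:
    """Hypothetical in-memory registry; not the SESAR implementation."""

    REQUIRED_FIELDS = {"name", "cruise"}  # assumed minimum metadata

    def __init__(self, prefix="TOY"):
        self._prefix = prefix
        self._counter = itertools.count(1)
        self._records = {}

    def register(self, metadata):
        """Store the metadata and return a newly issued unique identifier."""
        missing = self.REQUIRED_FIELDS - set(metadata)
        if missing:
            raise ValueError(f"missing metadata fields: {sorted(missing)}")
        identifier = f"{self._prefix}{next(self._counter):06d}"
        self._records[identifier] = dict(metadata)
        return identifier

    def lookup(self, identifier):
        """Return the metadata registered under an identifier."""
        return self._records[identifier]


registry = ToySampleRegistry()
id_a = registry.register({"name": "D3-1", "cruise": "VM3301 (Vema)"})
id_b = registry.register({"name": "D3-1", "cruise": "Starmer 1 (Nadir)"})
print(id_a, id_b)
```

Two samples that share the name D3-1 receive different identifiers, so later publications can cite the identifier instead of the ambiguous name.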

Page 27:

Challenges for Open Data Access

Page 28:

Improving Global Data Access

Agreed on statements of principle and recommendations to address technical, procedural, and organizational issues of open global data sharing.

“Building a Global Data Network for Studies of Earth Processes at the World’s Plate Boundaries”
International Workshop, Kiel (Germany), May 2007.
Attended by 71 people from 14 countries.
Sponsored by the MARGINS, Ridge2000, InterMARGINS, InterRIDGE programs.

Page 29:

Workshop Recommendations

- Science User Needs
  - Access to all data needed to reproduce scientific results
  - Access to multidisciplinary & integrated marine & terrestrial data
- Data Documentation & Publication
  - Uniform best practices & standards for data acquisition, data submission to data centers & data publication
  - Easy procedures for metadata creation & data submission
- Data & Metadata Interoperability
  - Minimize proliferation of metadata standards
  - Development of a data discovery service across distributed data resources
- Opportunities & Obstacles for International Data Sharing
  - Leverage international bodies & programs (e.g. GEOSS, eGY, ICSU, IPY)
  - Establish dedicated task group & special interest groups to advance implementation of a global data network

Page 30:
Page 31:

Cyberinfrastructure

Geoinformatics = Cyberinfrastructure for the Geosciences

Goal: A genuine infrastructure of highly reliable, widely accessible capabilities and services to support the entire range of scientific work.

Page 32:

Infrastructure Components

- Technological Infrastructure
- Institutional & Management Models
- Legal & Policy Framework
- Financial Support
- Cultural & Behavioral Changes

Page 33:

MGDS

- MARGINS: TAMU*
- LEGACY: NGDC/UNH
- Ridge2000: WHOI*
- Antarctic MBS
- Seismic Reflection DMS: UTIG (Lead)