DATA SYSTEMS FOR SAMPLE-BASED OBSERVATIONS

Kerstin Lehnert



Data from Samples

Distributed data acquisition: different labs/researchers analyze the same sample or subsamples of it.

Distributed data publication: different data for the same sample are published in different papers.

Distributed data archiving: data for the same sample are kept in different data systems.

Integrated data access is required to maximize utility.


Geochemical Data

Diverse: hundreds of parameters and thousands of materials; values vary with space and time over a range of more than ten orders of magnitude.

Complex: mostly sample-based, with complex relations among samples & subsamples.

Distributed data acquisition: one sample is analyzed in different labs by different researchers at different times.

Idiosyncratic data acquisition methods.
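The "more than ten orders of magnitude" range only becomes comparable once values reported by different labs are brought to a common unit. The sketch below assumes concentrations reported as mass fractions in wt%, ppm or ppb and converts them to ppm; the unit table and the two example values are illustrative assumptions, not data from any GfG database.

```python
import math

# Assumed mass-fraction units and their factors to ppm (1 wt% = 10,000 ppm).
TO_PPM = {"wt%": 1e4, "ppm": 1.0, "ppb": 1e-3}


def to_ppm(value: float, unit: str) -> float:
    """Convert a concentration to ppm; raises KeyError for an unknown unit."""
    return value * TO_PPM[unit]


def orders_of_magnitude(values_ppm: list[float]) -> float:
    """Spread between the largest and smallest positive value, in powers of ten."""
    positive = [v for v in values_ppm if v > 0]
    return math.log10(max(positive) / min(positive))


# Illustrative values only: a major oxide at 50 wt%, a trace element at 0.005 ppb.
major = to_ppm(50, "wt%")
trace = to_ppm(0.005, "ppb")
print(round(orders_of_magnitude([major, trace]), 6))  # 11.0
```

A real integration layer would also need to distinguish oxide from element reporting and carry per-value uncertainties, which this sketch leaves out.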


Geoinformatics for Geochemistry

DATABASES: thematic geochemical databases (PetDB, SedDB, VentDB)

DATA REPOSITORY: Geochemical Resource Library

REGISTRIES: System for Earth Sample Registration (SESAR); IEDA Data Publication Agent of the STD-DOI system (DataCite®); GeoPass, a single sign-on authentication system

DATA ACCESS & ANALYSIS TOOLS: GfG user interfaces; EarthChem Data Engine (Portal)


GfG Architecture (diagram). Components shown: EarthChem XML DB; metadata catalog of datasets (original data & derived products); GCDM DB; EarthChem Portal; GfG Data Entry and User Submission; External Databases (USGS, NAVDAT, GEOROC); Topical Data Collections; Geochemical Resource Library.


GeoChemical Data Model (diagram). Entities shown: observed value; publication (data source); method/DQ; sample (feature of interest); collection, geospatial; analysis; material, preparation, obs. point.
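One way to read the data model is as a set of linked records: an observed value belongs to an analysis, which ties one (sub)sample to one method and one publication. The sketch below is a hypothetical rendering with Python dataclasses; the class names, field names, relation directions and example values are assumptions for illustration, not the actual GCDM schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Publication:          # the publication acts as the data source
    citation: str


@dataclass
class Sample:               # the sample is the feature of interest
    name: str
    latitude: Optional[float] = None    # geospatial metadata
    longitude: Optional[float] = None
    parent: Optional[Sample] = None     # sub-samples point to their parent


@dataclass
class Method:               # analytical method plus data-quality (DQ) context
    technique: str
    laboratory: str = ""


@dataclass
class Analysis:             # one analysis of one (sub)sample with one method
    sample: Sample
    method: Method
    material: str           # e.g. "whole rock", "glass"
    source: Publication


@dataclass
class ObservedValue:
    analysis: Analysis
    parameter: str          # e.g. "SiO2"
    value: float
    unit: str


def values_for_sample(values: list[ObservedValue], sample: Sample) -> list[ObservedValue]:
    """Collect every observed value reported for one sample, across publications."""
    return [v for v in values if v.analysis.sample is sample]


# Hypothetical example: two labs, two papers, one sample.
rock = Sample("Sample A")
a1 = Analysis(rock, Method("XRF", "Lab A"), "whole rock", Publication("Paper 1"))
a2 = Analysis(rock, Method("ICP-MS", "Lab B"), "whole rock", Publication("Paper 2"))
obs = [ObservedValue(a1, "SiO2", 50.1, "wt%"), ObservedValue(a2, "La", 3.2, "ppm")]
print(len(values_for_sample(obs, rock)))  # 2
```

Keeping the analysis as its own record is what lets data from different labs and papers stay attached to the same sample.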


Metadata

Geospatial: geographical coordinates, geographical names

Collection: sampling technique, field program

Description & Age: classification, texture, alteration, age

Data Quality: technique, instrument, laboratory, precision, reference material measurements, correction procedures


Standards for Data Access & Integration

WMS, WFS: for visualization tools

OAI-PMH: for joint data inventories

EarthChemML: for integration across geochemical data systems, and for interoperability with other systems
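Joint inventories built on OAI-PMH work by harvesting: a client sends a ListRecords request to each repository and collects the record identifiers it returns. The sketch below builds a ListRecords URL and parses identifiers from a response using the OAI-PMH 2.0 verb, argument names and namespace; the endpoint URL and the sample response are hypothetical, and no network call is made.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # A resumptionToken is an exclusive argument: only verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date          # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


def record_identifiers(response_xml: str) -> list[str]:
    """Pull the OAI item identifiers out of a ListRecords response."""
    root = ET.fromstring(response_xml.strip())
    return [el.text for el in root.iterfind(".//oai:header/oai:identifier", OAI_NS)]


# Hypothetical endpoint; a real harvester would fetch this URL over HTTP.
url = list_records_url("https://example.org/oai", from_date="2010-01-01")
print(url)
# https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2010-01-01

SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:dataset-1</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""
print(record_identifiers(SAMPLE_RESPONSE))  # ['oai:example.org:dataset-1']
```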


IEDA System-wide Inventory (diagram). Labels shown: Inventory (expedition metadata, reference metadata, dataset metadata, geospatial metadata); RSS feed; MGDS, SESAR, EarthChem, GRL, GeochemDBs; object registration, object metadata; chemical data, cruise info; DOI registration.


EarthChem Portal (diagram): partner databases (PetDB, USGS, GEOROC, NAVDAT, and others) send XML to the EarthChem Data Engine database, which provides search & visualization.

Partner databases encode their data & metadata in XML and send them to the EarthChem Portal database in Kansas. Queries submitted at the EarthChem Portal search the contents of the EarthChem Portal database.
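The portal pattern described above (partners push XML, and queries run against the central copy) can be sketched in a few lines. The XML layout, sample IDs and values below are hypothetical and are NOT the EarthChemML schema; the point is only that the search touches the aggregated index, never the partner systems.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical partner exports (illustrative format, not EarthChemML).
PETDB_XML = """<records source="PetDB">
  <sample id="S1"><value parameter="SiO2" unit="wt%">49.8</value></sample>
</records>"""
NAVDAT_XML = """<records source="NAVDAT">
  <sample id="S2"><value parameter="SiO2" unit="wt%">61.2</value></sample>
</records>"""


def load(xml_text: str, index: dict) -> None:
    """Add one partner's XML document to the central index, keyed by parameter."""
    root = ET.fromstring(xml_text)
    source = root.get("source")
    for sample in root.iter("sample"):
        for v in sample.iter("value"):
            index[v.get("parameter")].append({
                "source": source, "sample": sample.get("id"),
                "value": float(v.text), "unit": v.get("unit"),
            })


def search(index: dict, parameter: str, minimum: float) -> list[dict]:
    """Query the central copy only; partner databases are not contacted."""
    return [r for r in index[parameter] if r["value"] >= minimum]


index = defaultdict(list)
for doc in (PETDB_XML, NAVDAT_XML):
    load(doc, index)
print([r["source"] for r in search(index, "SiO2", 55)])  # ['NAVDAT']
```

The trade-off of this design is that query results are only as current as the last XML delivery from each partner.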


Access Levels


EarthChemML


EarthChem Repository: user submission

Needs tools that are easy to use and support the data flow from lab to publication; ideally, these represent 'pipelines' for data capture early in the data acquisition process.

Tools need to include data validation and data quality control (DQC) procedures.

Offers citable data publication; this needs data policies.
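A submission tool can catch many problems before data reach the repository. The sketch below checks one spreadsheet-style row (all values as strings) for missing fields, unknown units and non-numeric or negative concentrations; the required fields and unit list are assumptions for illustration, not EarthChem's actual submission rules.

```python
# Assumed validation rules; a real repository would define its own.
KNOWN_UNITS = {"wt%", "ppm", "ppb"}
REQUIRED_FIELDS = ("sample_id", "parameter", "value", "unit", "method")


def validate_row(row: dict) -> list[str]:
    """Return a list of problems; an empty list means the row passed."""
    problems = [f"missing {f}" for f in REQUIRED_FIELDS if not row.get(f)]
    if row.get("unit") and row["unit"] not in KNOWN_UNITS:
        problems.append(f"unknown unit {row['unit']!r}")
    if row.get("value"):
        try:
            if float(row["value"]) < 0:
                problems.append("negative concentration")
        except ValueError:
            problems.append(f"non-numeric value {row['value']!r}")
    return problems


good = {"sample_id": "IGSN:XXX000120", "parameter": "SiO2",
        "value": "49.8", "unit": "wt%", "method": "XRF"}
bad = {"sample_id": "", "parameter": "La", "value": "n.d.",
       "unit": "ppm ", "method": "ICP-MS"}
print(validate_row(good))  # []
print(validate_row(bad))
# ['missing sample_id', "unknown unit 'ppm '", "non-numeric value 'n.d.'"]
```

Returning a list of messages rather than raising on the first error lets a submitter fix a whole batch in one pass.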


IEDA data publication service


STD-DOIs

The STD-DOI metadata are mainly Dublin Core elements, plus data-specific elements.

The metadata are transmitted to the National Library via a web service (HTTP/SOAP) and incorporated into the library catalogue.

The metadata may contain references to other objects (DOI, IGSN, ...) through the element <RelatedIdentifier>, with relation types such as isCited, isParent, isChild, isDuplicate, …


STD-DOIs

The element <relatedIdentifier> can be used to point to other electronic objects:

Point to the literature where the data set is interpreted.

Point to samples from which the data were derived.

Point to other datasets that belong to the same collection of datasets.

These links can be used by machines (e.g. data portals) to make search suggestions, and thus aid discovery of data, literature, and samples, or to provide other added-value services.
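The three kinds of links listed above could be serialized as follows. This sketch assumes the attribute names and relation types of the later DataCite Metadata Schema (relatedIdentifierType, relationType, IsCitedBy, IsDerivedFrom, IsPartOf), which may differ from the STD-DOI vocabulary quoted earlier (isCited, isParent, ...); the schema namespace is omitted, and the DOIs are placeholders, not real identifiers.

```python
import xml.etree.ElementTree as ET


def related_identifiers(links):
    """links: iterable of (identifier, identifier_type, relation_type) tuples."""
    parent = ET.Element("relatedIdentifiers")
    for identifier, id_type, relation in links:
        el = ET.SubElement(parent, "relatedIdentifier",
                           relatedIdentifierType=id_type, relationType=relation)
        el.text = identifier
    return parent


block = related_identifiers([
    ("10.1234/example-paper", "DOI", "IsCitedBy"),      # literature interpreting the data
    ("XXX000120", "IGSN", "IsDerivedFrom"),             # sample the data came from
    ("10.1234/example-collection", "DOI", "IsPartOf"),  # collection of datasets
])
print(ET.tostring(block, encoding="unicode"))
```

Because each link carries a typed identifier, a portal can follow the IGSN to the sample record and the DOIs to papers or sibling datasets without parsing free text.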


STD-DOI System Architecture


Data DOIs


Information Discovery

Link to publication

Citation of data

IGSN points to sample


The International GeoSample Number


Ambiguous Sample Naming

Examples from the PetDB Database

Sample names are duplicated:

Name     Location            Publication      Cruise
D3-1     SEIR                Anderson 1980    VM3301 (Vema)
D3-1     North Fiji Basin    Eissen 1994      Starmer 1 (Nadir)
D3-1     Shimada Smt         Graham 1988      S1-79 (Sea Sounder)
D3-1     Gorda Ridge         Clague 1984      KK2-83NP (Kana Keoki)
3-1      Lamont Smts         Batiza 1982      RISE III (New Horizon)

Sample names are modified or changed. Names used for Dredge sample 3, Amphitrite Cruise 1963/4:

Name           Publication
D3             Engel 1964
D-3            Scheidegger 1981, Schilling 1971
PD3            Tatsumoto 1965, 1966
PD-3           Hedge 1970, Muehlenbach 1972
PV D-3         Engel 1965
AMPH3D         Pineau 1976
AMPH-D3        MacDougall 1986
AMPH D-3       Sun 1980, Schilling 1975
AMPH 3-PD-3    Hart 1971
S-10           Subbarao 1972
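Using the names from the two PetDB examples above, a short script shows why string cleanup cannot fix the problem: one name still maps to several distinct samples, and naive normalization (upper-casing and dropping punctuation, an assumed heuristic) still leaves seven "different" names for the single Amphitrite dredge sample.

```python
import re
from collections import defaultdict

# Rows from the duplicated-name example: (name, location, cruise).
DUPLICATED = [
    ("D3-1", "SEIR", "VM3301"),
    ("D3-1", "North Fiji Basin", "Starmer 1"),
    ("D3-1", "Shimada Smt", "S1-79"),
    ("D3-1", "Gorda Ridge", "KK2-83NP"),
    ("3-1", "Lamont Smts", "RISE III"),
]

# Names used in the literature for the same Amphitrite dredge sample.
VARIANTS = ["D3", "D-3", "PD3", "PD-3", "PV D-3", "AMPH3D",
            "AMPH-D3", "AMPH D-3", "AMPH 3-PD-3", "S-10"]


def normalize(name: str) -> str:
    """Naive cleanup: upper-case and keep only letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", name.upper())


by_name = defaultdict(set)
for name, location, cruise in DUPLICATED:
    by_name[name].add((location, cruise))

print(len(by_name["D3-1"]))                   # 4 samples share one name
print(len({normalize(n) for n in VARIANTS}))  # 7 names remain for one sample
```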


Provides & manages unique identifiers for samples: the IGSN (International Geo Sample Number), assigned upon registration of sample metadata.

Catalogs & archives sample metadata: access to sample metadata via web site & web services; long-term preservation of metadata; links to sample archives.

Facilitates links to data: the IGSN will be incorporated into persistent, resolvable GUIDs.


IGSN:SIO8JH3M4

International GeoSample Number: a Global Unique Identifier for Earth Samples

Strict syntax (9 characters, alphanumeric): the first three characters are a unique user code (registered with SESAR) that serves as the registrant's name space; the last 6 characters are random numbers + letters. This allows 36^6 = 2,176,782,336 sample identifiers per registrant.

Does not replace personal or institutional names.

Applied to samples & sub-samples; the system tracks relations.

www.geosamples.org
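The IGSN syntax described above is simple enough to check mechanically. This sketch assumes letters are upper-case (36 symbols, matching the 2,176,782,336 figure) and accepts an optional "IGSN:" prefix; the function names are hypothetical, and minting a real IGSN also requires the registry to confirm the random suffix has not already been issued.

```python
import re
import secrets
import string

# Assumption: IGSN letters are upper-case, giving 36 symbols (0-9, A-Z).
ALPHABET = string.digits + string.ascii_uppercase
IGSN_RE = re.compile(r"(?:IGSN:)?([A-Z0-9]{3})([A-Z0-9]{6})")


def parse_igsn(text: str) -> tuple[str, str]:
    """Split an IGSN into (user code, sample part); raise ValueError if malformed."""
    m = IGSN_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a 9-character IGSN: {text!r}")
    return m.group(1), m.group(2)


def new_igsn(user_code: str) -> str:
    """Draw a random 6-character suffix under a registrant's 3-character code."""
    if re.fullmatch(r"[A-Z0-9]{3}", user_code) is None:
        raise ValueError("user code must be 3 alphanumeric characters")
    # The registry must still reject a suffix that was already issued.
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(6))
    return user_code + suffix


print(parse_igsn("IGSN:SIO8JH3M4"))  # ('SIO', '8JH3M4')
print(len(ALPHABET) ** 6)            # 2176782336
```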


Sample hierarchy (diagram): a Core divided into Core Sections 1, 2, and 3; each section yields numbered samples (Sample 1, Sample 2, ...); sub-samples shown include rock powder, mineral concentrate, leachate, fossil separate, and microprobe mount. Parent/child links connect the levels. IGSNs shown: IGSN:XXX000120, IGSN:XXX0065B3, IGSN:XXX9K23G6, IGSN:XXX07ST4K, IGSN:XYZ0G693M, IGSN:ABC0L98SW, IGSN:ABC0L53NW, IGSN:ABC0L653X, IGSN:ABC078HGB.
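Tracking parent/child relations means a measurement on a sub-sample can be traced back to the core it came from, and thus to the core's collection and geospatial metadata. The sketch below stores child-to-parent links in a dictionary; the IGSNs are hypothetical, follow only the 9-character syntax, and do not correspond to the identifiers in the diagram.

```python
# Hypothetical IGSNs; each key is a child, each value its parent.
PARENT = {
    "XXX0000S1": "XXX0000C0",   # core section 1 -> core
    "XXX00S1A1": "XXX0000S1",   # sample 1 -> core section 1
    "XXX0S1A1P": "XXX00S1A1",   # rock powder -> sample 1
    "XXX0S1A1L": "XXX00S1A1",   # leachate -> sample 1
}


def lineage(igsn: str) -> list[str]:
    """Walk from a (sub)sample up to its root object, guarding against cycles."""
    chain = [igsn]
    while chain[-1] in PARENT:
        nxt = PARENT[chain[-1]]
        if nxt in chain:
            raise ValueError(f"cycle at {nxt}")
        chain.append(nxt)
    return chain


def children(igsn: str) -> list[str]:
    """Direct sub-samples of an object, sorted for stable output."""
    return sorted(c for c, p in PARENT.items() if p == igsn)


print(lineage("XXX0S1A1L"))
print(children("XXX00S1A1"))
```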


Sample Types

“Sampling events” such as holes, cores, dredges, stratigraphic sections

“Individual samples”: specimens (rocks, minerals, fossils, fluid samples, precipitates, synthetic material, etc.)

“Sub-samples” of any of above: processed samples such as mineral or fossil separates, leachates, thin sections, etc.


Sample Registration

SESAR Web Site

Spreadsheet forms for batch loading

Interoperability (web services)
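Batch loading from a spreadsheet is mostly a matter of turning rows into records and checking that parent references resolve. The sketch below reads a CSV export with a hypothetical column layout (name, sample_type, parent_name, latitude, longitude); it is not SESAR's actual template, and it only checks parents within the same batch, whereas a real registry might also accept existing IGSNs as parents.

```python
import csv
import io

# Hypothetical column layout for a batch-registration sheet saved as CSV.
SHEET = """name,sample_type,parent_name,latitude,longitude
CORE-7,Core,,-12.5,170.25
CORE-7 S1,Core Section,CORE-7,,
CORE-7 S1 #2,Individual Sample,CORE-7 S1,,
"""


def load_batch(text: str) -> tuple[list[dict], list[str]]:
    """Parse rows into records and report parents missing from the batch."""
    rows = list(csv.DictReader(io.StringIO(text)))
    names = {r["name"] for r in rows}
    records, errors = [], []
    for line_no, r in enumerate(rows, start=2):   # line 1 is the header
        if r["parent_name"] and r["parent_name"] not in names:
            errors.append(f"line {line_no}: unknown parent {r['parent_name']!r}")
        records.append({
            "name": r["name"],
            "sample_type": r["sample_type"],
            "parent": r["parent_name"] or None,
            "lat": float(r["latitude"]) if r["latitude"] else None,
            "lon": float(r["longitude"]) if r["longitude"] else None,
        })
    return records, errors


records, errors = load_batch(SHEET)
print(len(records), errors)  # 3 []
```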


Implementation Challenges

Diversity of users: large sampling campaigns (IODP, ICDP, ECS), repositories, data systems, individual investigators

Diversity of sample types

Integration into existing policies, procedures, and data systems

International scope

Connectivity in the field


Solutions

Schema improvements

Web-service based registration from client data systems

Distributed system of registration nodes (Trusted Agents)

Handle service for IGSNs (persistent, resolvable), e.g. http://dx.doi.org/18.2539/IGSN.SIO001234

Tools to facilitate registration: iSESAR (registration via iPhone), eCollections (personal sample management), webCollections (hosting services for repositories)

IGSN International Consortium