Catherine Maillard First Training Session Ostende, February 12-17, 2007

SeaDataNet: A Pan-European Infrastructure for Ocean and Marine Data Management
www.seadatanet.org

Catherine Maillard
First Training Session, Ostende, February 12-17, 2007

Introduction to Oceanographic Data Management

Data management

[Diagram with the following labels: OBSERVING SYSTEMS; ANALYSIS & MODELLING SYSTEMS; Data Compilation; Data Formatting; Quality control checks; Safeguarding; CDI - Data indexing in local archiving system; Catalogues; Data discovery; Data sets aggregation; Product generation; User’s Web browser; Analysis program]

1. Data Compilation

The data never go directly to the data centres, so the data centre needs to:
- Locate the data sets not yet archived
- Request and get a copy of the missing data sets from the source laboratory/scientist
- Check that the data sets are properly documented

COMPILATION 1.1: Locate the data sets which are not yet archived

A data set should be identified either:
- in the cruise report (CSR) catalogue
- or in the observation system catalogue (EDIOS)
- or in EDMED or EDMERP

In addition, maintain regular direct contacts
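
Locating the data sets not yet archived amounts to comparing catalogue entries with the content of the local archive. A minimal sketch, assuming hypothetical cruise identifiers held in plain Python lists; the real CSR, EDIOS, EDMED and EDMERP catalogues are queried through their own interfaces, which are not shown here.

```python
def find_unarchived(catalogue_ids, archived_ids):
    """Return catalogue entries that have no counterpart in the local archive."""
    return sorted(set(catalogue_ids) - set(archived_ids))


# Hypothetical identifiers, for illustration only.
csr_entries = ["CRUISE-A", "CRUISE-B", "CRUISE-C"]
local_archive = ["CRUISE-B"]

missing = find_unarchived(csr_entries, local_archive)
print(missing)
```

The resulting list is the starting point for the requests sent to the source laboratories.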

COMPILATION 1.2: Get a copy of the missing data sets from the source laboratory/scientist

- Request a copy of the missing data sets identified as not archived, in any format
- Emphasize the importance of:
  - long-term archiving to follow up environmental changes
  - integration in long time series of data of the same type: the availability of global/regional/thematic databases depends on all contributions
- Facilitate the use of these databases
- Get and safeguard the electronic file
- Digitization is sometimes necessary (GODAR)

COMPILATION 1.3: The mandatory meta-data

- Check that the data set is properly documented, with the mandatory fields described
- A minimum of meta-data should be included in the data files, e.g.:
  - Reference to the cruise or observation system and the source laboratory
  - Sensor type
  - Parameter names and units, etc.
- Complete the missing information by asking the originator
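
The documentation check can be partly automated. A minimal sketch, assuming the file header has already been parsed into a Python dictionary; the field names are illustrative assumptions, not the agreed list of mandatory fields.

```python
# Field names are illustrative assumptions, not the agreed mandatory list.
MANDATORY_FIELDS = ("cruise_or_system", "source_laboratory", "sensor_type", "parameters")


def missing_metadata(header):
    """Return the mandatory fields that are absent or empty in a header dict."""
    missing = [field for field in MANDATORY_FIELDS if not header.get(field)]
    # Each parameter needs both a name and a unit.
    for i, param in enumerate(header.get("parameters") or []):
        if not param.get("name") or not param.get("unit"):
            missing.append(f"parameters[{i}]")
    return missing


header = {
    "cruise_or_system": "CRUISE-A",
    "source_laboratory": "",
    "sensor_type": "CTD",
    "parameters": [{"name": "TEMP", "unit": "degC"}, {"name": "PSAL"}],
}
questions = missing_metadata(header)
print(questions)
```

Each entry in the returned list corresponds to a question for the originator.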

2 - Data Reformatting

In general the original formats of the data files cannot be used in data management:
- Incomplete/not standardized meta-data
- Incompatibility with the input format of QC and other processing
- Need for a single archiving format for safeguarding the data sets of the same type

The data management format, archiving format and dissemination/exchange format(s) may be the same, but not necessarily

2 - Different Data Formats used

- Archiving format: can be one of the current exchange formats, or a local format designed according to rules that ensure sustainability
- Exchange/Dissemination format(s): joint projects and interoperability require common exchange format(s)
- Data Management/processing

2.1: General rules for sustainability of an archiving format

The archiving format should:
- be independent of the computer (and libraries): RDBS are not appropriate
- ensure that any isolated data item includes enough meta-data to be processed (e.g. location and date)
- be compatible with, and include at least, the mandatory fields (meta-data) requested for the agreed exchange format(s)
- include additional textual or standardized “history” or “comment” fields to prevent any loss of information
- provide a similar structure and meta-data for different data types, such as vertical profiles and time series

These rules are normally followed for exchange formats as well
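
To illustrate the rules, here is a minimal sketch of a self-describing plain-text record. The layout (header lines starting with `*`, one `*HISTORY` line per entry) is an assumption made for this example only; it is not the Medatlas, ODV or any other agreed format.

```python
def write_profile(meta, columns, rows, history):
    """Serialize one profile as plain text that carries its own meta-data."""
    lines = [f"*{key}={value}" for key, value in meta.items()]
    lines += [f"*HISTORY={entry}" for entry in history]
    lines.append(" ".join(columns))
    lines += [" ".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"


meta = {
    "LATITUDE": 43.5,
    "LONGITUDE": 7.25,
    "DATE": "2007-02-12",
    "DATA_TYPE": "vertical profile",
}
record = write_profile(
    meta,
    columns=["PRES", "TEMP"],
    rows=[(0, 13.2), (10, 13.1)],
    history=["reformatted from originator file"],
)
print(record)
```

Because location, date and history travel with the values, the record can be read without any database or library, and the same header structure could serve a time series.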

2.2 - SeaDataNet Data Transport Formats

Obligatory formats:
- NetCDF (binary) for gridded data and 3D observation data such as ADCP
- (Modified) ODV spreadsheet for other data types (vertical profiles and time series)

Optional format:
- ASCII Medatlas as standard exchange format for the Mediterranean and Black Sea community

BODC leads the task to modify the present ODV and NetCDF formats for SeaDataNet use (QC flags, parameter semantics, etc., and conformity with the international standards)

Formatting exercises to assess the coherence and compatibility of exchange formats

2.3 - Processing Formats

For data management (QC, cataloguing, selection, extraction, visualisation), the data can be:
- in the archiving format
- in a relational database system (RDBS): the RDBS presently most used in the community are ORACLE and MySQL

Note: an interface is needed between the software input format and the local data management system
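
The interface mentioned in the note can be as small as one loading function. A minimal sketch, using the `sqlite3` module from the Python standard library as a stand-in for ORACLE or MySQL; the table and column names are assumptions made for this example.

```python
import sqlite3


def load_profile(conn, meta, rows):
    """Interface step: move one parsed profile into the relational tables."""
    cur = conn.execute(
        "INSERT INTO station (latitude, longitude, obs_date) VALUES (?, ?, ?)",
        (meta["latitude"], meta["longitude"], meta["date"]),
    )
    station_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO measurement (station_id, pressure, temperature) VALUES (?, ?, ?)",
        [(station_id, pressure, temperature) for pressure, temperature in rows],
    )
    return station_id


# In-memory database with a hypothetical two-table layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE station (id INTEGER PRIMARY KEY, latitude REAL, longitude REAL, obs_date TEXT)"
)
conn.execute(
    "CREATE TABLE measurement (station_id INTEGER, pressure REAL, temperature REAL)"
)

load_profile(
    conn,
    {"latitude": 43.5, "longitude": 7.25, "date": "2007-02-12"},
    [(0, 13.2), (10, 13.1)],
)
count = conn.execute("SELECT COUNT(*) FROM measurement").fetchone()[0]
print(count)
```

Once loaded, selection and extraction become ordinary SQL queries, while the archiving file remains the reference copy.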

3 - Quality Checks

What they do:
- Detect missing mandatory information
- Detect errors made during the transfer or reformatting
- Detect remaining outliers
- Detect duplicates
- Attach a quality flag to each numerical value

What they don’t do:
- The preliminary data calibration and validation made by the expert scientists
- Modify the data points

General rule:
- The tools for data QC are not unique (e.g. ODV and other local systems), but the procedures are compatible
- Any QC of a data set should be reported to the originator to give feedback and ask questions

How they are performed: next presentation by Sissy
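
The principle of flagging without modifying can be sketched with a simple range check. The flag codes and the temperature limits below are illustrative assumptions, not an agreed flag scale or an agreed set of regional limits.

```python
# Flag codes and limits are illustrative assumptions, not an agreed scale.
GOOD, OUTLIER, MISSING = 1, 4, 9


def flag_values(values, lower, upper):
    """Return one quality flag per value; the values themselves are never changed."""
    flags = []
    for value in values:
        if value is None:
            flags.append(MISSING)
        elif lower <= value <= upper:
            flags.append(GOOD)
        else:
            flags.append(OUTLIER)
    return flags


temperatures = [13.2, 13.1, 99.9, None, 12.8]
flags = flag_values(temperatures, lower=-2.5, upper=40.0)
print(list(zip(temperatures, flags)))
```

The suspicious value 99.9 stays in the data set; only its flag tells the user, and the originator, that it failed the check.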

4 - Safeguarding

The QCed data sets should be safeguarded in a perennial (long-term) system for further use:
- 2 copies
- Follow-up of the backup when the system or the technology changes
- It is recommended to use the common computer infrastructure of the institutes to make the backup regular and automatic

The original data sets, not standardized and not QCed, should also be safeguarded for possible further checks by the data manager or the source scientists, but should not be disseminated
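
A minimal sketch of the "2 copies" rule, using only the Python standard library: each copy is compared with the source through a checksum. The directory and file names are hypothetical, and the demonstration runs in a temporary directory, whereas real backups belong on separate systems.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path):
    """Checksum used to confirm that a copy is identical to its source."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def safeguard(source, backup_dirs):
    """Copy a file into each backup directory and verify every copy."""
    reference = sha256(source)
    copies = []
    for directory in backup_dirs:
        target = Path(directory) / Path(source).name
        shutil.copy2(source, target)
        if sha256(target) != reference:
            raise IOError(f"backup {target} differs from {source}")
        copies.append(target)
    return copies


# Demonstration only: both "backup" directories live in one temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    data_file = root / "profile.txt"
    data_file.write_text("PRES TEMP\n0 13.2\n")
    first, second = root / "copy1", root / "copy2"
    first.mkdir()
    second.mkdir()
    copies = safeguard(data_file, [first, second])
    verified = all(sha256(copy) == sha256(data_file) for copy in copies)

print(len(copies), verified)
```

Running the same checksum comparison after a system or technology change is one way to follow up the backup.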

5 - Data Dissemination and service

- National data sets according to the national rules
- Aggregated data sets with other data sources
- Export the data in a single exchange format
- With the appropriate documentation on:
  - the format and codes
  - the QC performed on the data
  - the source of the data and the conditions of use (license)

5 - Data aggregation

Data aggregation represents a service and a product

To answer data requests related to a geographical area or other selection criteria, independently of the source:
- Interrogate the local data centre
- Complete with other sources
- Eliminate the duplicates
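
The three steps can be sketched as follows, assuming each station is a dictionary with hypothetical `lat`, `lon`, `time` and `source` keys. Duplicates are recognised here only when position and time match exactly, which is a simplification; real duplicates are rarely that clean.

```python
def in_area(station, south, north, west, east):
    """True when the station lies inside the latitude/longitude box."""
    return south <= station["lat"] <= north and west <= station["lon"] <= east


def aggregate(local, others, area):
    """Select stations inside the area from all sources, keeping the first copy of each."""
    seen = set()
    result = []
    # Local stations come first, so they win over copies from other sources.
    for station in list(local) + list(others):
        key = (station["lat"], station["lon"], station["time"])
        if in_area(station, *area) and key not in seen:
            seen.add(key)
            result.append(station)
    return result


local = [
    {"lat": 43.5, "lon": 7.25, "time": "2007-02-12T10:00", "source": "local"},
    {"lat": 60.0, "lon": 5.0, "time": "2007-02-12T10:00", "source": "local"},
]
others = [
    {"lat": 43.5, "lon": 7.25, "time": "2007-02-12T10:00", "source": "other"},
    {"lat": 42.0, "lon": 6.0, "time": "2007-02-13T08:00", "source": "other"},
]
box = (30.0, 46.0, -6.0, 36.5)  # south, north, west, east (illustrative values)
selected = aggregate(local, others, box)
print([station["source"] for station in selected])
```

The requester receives one list regardless of where each station came from, while the `source` key keeps the origin traceable.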

Other data sources

- The other data centres of the consortium
- Regional and project databases:
  - ICES: North-East Atlantic
  - Medatlas 2002; Mater 1996-1999 (but some data included in Medatlas); MFS/MOON for RT
- The World Ocean Atlas: delayed mode data
- The Coriolis/Argo Server: real time data
- The satellite data

The consortium data

The Common Data Index (CDI) shows what is presently available in the data centres. It will be continuously updated during the project.

http://www.sea-search.net/cdi/ (also from the SeaDataNet website)

During the development phase (2006-2007) of the interoperable system by the Technical Task Team, each data centre is interrogated separately to get access to the data. Several data centres provide on line tools for data search and access, including geographical selection and web services.

Regional Databases

ICES
- http://www.ices.dk/ocean/
- ICES format

Medatlas 2002
- www.ifremer.fr/medar + CD-ROM + FTP site
- Developed in the frame of the EU Medar project (a regional DAR)
- Data selection tools according to various criteria, including geographical search, available on the CD-ROM
- Also available on line from several partner data centres
- Medatlas format

World Ocean Atlas 2005
http://www.nodc.noaa.gov/OC5/WOD05/pr_wod05.html

- Developed by US/NODC, WDC Washington, Ocean Climate Laboratory, in the frame of the IOC/GODAR project with the contribution of the other data centres
- Data, mainly delayed mode data, are available through an on line selection tool or on DVD (on request)
- All the fields can be interrogated for data selection. The possibility of selecting countries by group (for example to get all but one's own country, or all but the SDN consortium) is commonly used.

Data Types in WOA 2005

Type of observations:
- Ocean Station Data (OSD) [Bottle, low resolution CTD/XCTD, plankton data]
- High Resolution CTD/XCTD (CTD)
- Expendable (XBT) and Mechanical (MBT) Bathythermographs
- Autonomous Pinniped Bathythermographs (APB)
- Profiling Floats (PFL)
- Drifting Buoys (DRB)
- Moored Buoys (MRB) [TAO, PIRATA, others]
- Undulating Oceanographic Recorder (UOR) [Towed CTD]
- Glider data (GLD)
- Surface-Only (SUR) [Bucket, Thermosalinograph]

Parameters:
- Pressure, temperature, salinity + 23 bio-geochemical parameters + biological taxa

WOA 2005 export format

- US-NODC format
- Codes and standards different from SeaDataNet
- Tools available to process the data:
  - US/NODC tools in Fortran, C and Java to read the data
  - SeaDataNet/Ifremer converter to transcribe from WOA to Medatlas (presently available for Unix only)
  - ODV can visualise the data directly in WOA format

Coriolis/Argo Server
http://www.coriolis.eu.org/cdc/

The Coriolis/Argo server is one of the two Argo Global Data Assembly Centres (GDAC):
- synchronized on a daily basis with the US GODAE Data Centre (Monterey)
- serving daily real time data (+ gridded analyses) from:
  - national DACs including: Australian, Canadian, Chinese, French, Indian, Korean, Japanese, UK and US
  - contributors from Chile, Costa Rica, Germany, Morocco, Mexico, Norway, Netherlands, Russia and Spain
  - the GTS (sources difficult to establish)

On line selection tools allow in-situ data to be visualized and downloaded

Data Types in Coriolis/Argo

Vertical profiles mainly from:
- XBT, XCTD or XBT from research or opportunity vessels
- Argo profiling floats
- Anchored buoys or moorings
- Drifting buoys

Trajectory data mainly from:
- Drifting buoys
- Argo floats
- Vessels equipped with a thermosalinograph (GOSUD server)

Many data but few parameters: P, T, S essentially

Under development: integration in the SeaDataNet system

Export Formats from Coriolis/Argo

- Argo NetCDF: widely used in operational oceanography, designed for TS profiles
- ASCII: (quasi) Medatlas

Important: for Medatlas format extraction, do the data selection data type by data type, to avoid having all types grouped in the same file.
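
When a mixed selection has already been retrieved, the same separation can be applied afterwards. A minimal sketch, assuming each record carries a hypothetical `data_type` key; the output file names are illustrative and nothing is written to disk.

```python
from collections import defaultdict


def split_by_data_type(records):
    """Group records so that each output file holds a single data type."""
    groups = defaultdict(list)
    for record in records:
        groups[record["data_type"]].append(record)
    return dict(groups)


records = [
    {"id": 1, "data_type": "XBT"},
    {"id": 2, "data_type": "profiling float"},
    {"id": 3, "data_type": "XBT"},
]
groups = split_by_data_type(records)
for data_type, items in groups.items():
    # Hypothetical naming scheme: one file per data type.
    file_name = f"extract_{data_type.replace(' ', '_')}.txt"
    print(file_name, [record["id"] for record in items])
```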

Duplicates problem for data dissemination and products preparation

Even if the data are checked for duplicates at the national level, remaining problems may exist:
- Data insufficiently documented and attributed to two different sources
- PTS files and same station with other parameters
- RT and DM profiles
- Data declassified by the Navies with poor meta-data
- Data sets from the GTS with decimated and poorly documented profiles

What can be done?

- Selection country by country (however difficult for the RT)
- Visualising ship tracks and trajectories and superimposing the position maps of cruises made in the same region in the same period
- In case of duplicate data sets, evaluate which is the best set of observations, the most complete and best documented, etc.

Can lead to a lot of manual work in the QC
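
Part of this work can be prepared automatically by listing candidate duplicates and proposing the more complete one. A minimal sketch; the position and time tolerances, the dictionary keys and the completeness score are all assumptions made for this example, and the final decision remains with the data manager.

```python
from datetime import datetime, timedelta


def is_probable_duplicate(a, b, max_deg=0.05, max_gap=timedelta(hours=1)):
    """Two stations close in position and time are candidate duplicates."""
    return (
        abs(a["lat"] - b["lat"]) <= max_deg
        and abs(a["lon"] - b["lon"]) <= max_deg
        and abs(a["time"] - b["time"]) <= max_gap
    )


def completeness(station):
    """Crude score: number of levels times number of parameters."""
    return station["n_levels"] * len(station["parameters"])


def keep_best(stations):
    """Keep one station per group of candidate duplicates, preferring the most complete."""
    kept = []
    for candidate in stations:
        for i, existing in enumerate(kept):
            if is_probable_duplicate(candidate, existing):
                if completeness(candidate) > completeness(existing):
                    kept[i] = candidate
                break
        else:
            kept.append(candidate)
    return kept


rt = {"lat": 43.50, "lon": 7.25, "time": datetime(2007, 2, 12, 10, 0),
      "n_levels": 20, "parameters": ["TEMP"], "mode": "RT"}
dm = {"lat": 43.51, "lon": 7.26, "time": datetime(2007, 2, 12, 10, 20),
      "n_levels": 500, "parameters": ["TEMP", "PSAL"], "mode": "DM"}
far = {"lat": 40.00, "lon": 5.00, "time": datetime(2007, 2, 12, 10, 0),
       "n_levels": 100, "parameters": ["TEMP"], "mode": "DM"}

best = keep_best([rt, dm, far])
print([station["mode"] for station in best])
```

In this example the decimated RT profile gives way to the DM profile of the same station, while the distant station is kept untouched.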

Template for TA web page

All the images in the directory « Template_images »

Education and Outreach pages

SDN-EDU.html

Concluding remarks

- SeaDataNet is developing basic tools for implementing the data management activities in conformity with internationally agreed protocols.
- The NODC/DNA of the 40 TAP use either the common tools or the existing local systems, but they should be inter-comparable and compatible.
- The present infrastructure is not yet stabilized with regard to standards and available software, but the main functionalities are available to ensure data circulation from the start of the project.
- Any new information, result or software is made immediately available on the website.
- Importance of developing a local page to connect by using the ENEA template