GEOTRACES International Data Assembly Centre Edward Mawji GDAC/BODC [email protected].
GEOTRACES International Data Assembly Centre
Edward Mawji
GDAC/BODC
Discussion Points
• Important concepts in data management…
• What and who are GDAC (GEOTRACES International Data Assembly Centre)?
• Attempt to bring together data managers for WG3 meeting
• Issues with management of GEOTRACES data across Europe
• Membership of WG
• Dates for first WG3 meeting
Guiding principles of data management
• quality assurance of data
• treat all information as data
• data that lack sufficient metadata have limited value beyond the research program for which they were collected
• metadata should include sufficient information to support discovery, value assessment and accurate re-use of data
“data stewardship”
The data collection generated by a research project is a valuable component of its legacy.
The GEOTRACES International Data Assembly Centre (GDAC) was initially created in 2008 to serve the data management needs of the GEOTRACES community. At present GDAC is jointly funded by NSF and NERC.
Sole responsibility is data management and storage
Starting premise: data must be secure and readily usable by GEOTRACES participants in the short term, and usable in the long term without reference to the originator.
What and who are GDAC
GDAC is located at the British Oceanographic Data Centre, Liverpool, UK
The website is: http://www.bodc.ac.uk
GDAC data manager is Edward Mawji
GDAC role
• Main role of GDAC will be to compile global datasets for all key GEOTRACES parameters.
• Provide PIs with guidance on metadata requirements
• Capture and record supporting documentation (metadata)
• Make data easily accessible to participating scientists and the larger science community.
• Communicate with national data centres
• The policy for data release will be determined by the Scientific Steering Committee (SSC).
• Website for data delivery is under construction. The delivery aspect will be developed once data have been submitted to GDAC.
Maintain contact with national data centres
Provide advice and/or assistance for PIs
prior to planning workshops prior to cruise …
cruise metadata forms discuss data management strategy
to support research post-cruise … data publishing …
data inventory cruise documentation data contributions …
To meet the requirements of the GEOTRACES programme, GEOTRACES cruises require a high level of data management
GDAC responsibilities
GEOTRACES Data Management set up
[Diagram: Flow of Data. Labels: Ship-based measurements; Ship-based TEI measurements; National Data Centres; GDAC; "Store data and all metadata for all GEOTRACES cruises"]
Policy
Before cruise
• PI/PSO informs GDAC of intended cruise
• Download pre-cruise metadata form and scientific sampling event log forms from the website
• Identify appropriate DAC
• If there is no DAC, inform GDAC, which will act as DAC (cruise planning stage)
After cruise, the Chief Scientist
• Submits metadata forms and event logs to GDAC and DAC (1 week)
• Submits underway navigational files and data (1 week)
• Submits CTD data to GDAC (1 week)
• Submits cruise report (8-16 weeks)
• DAC carries out data tracking and submits final data to GDAC. If there is no DAC, GDAC carries out data tracking.
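The post-cruise deadlines above can be turned into calendar dates with a few lines of Python. This is only a minimal sketch: the dictionary keys and the function name are illustrative, not GDAC identifiers, and it assumes each deadline is counted in whole weeks from the day the cruise ends.

```python
from datetime import date, timedelta

# Deadline windows in weeks after the end of the cruise, as listed in the policy.
# The key names are illustrative assumptions, not official GDAC terms.
SUBMISSION_WINDOWS_WEEKS = {
    "metadata_forms_and_event_logs": (1, 1),
    "underway_navigation_files_and_data": (1, 1),
    "ctd_data": (1, 1),
    "cruise_report": (8, 16),
}


def submission_due_dates(cruise_end):
    """Return {item: (earliest_due, latest_due)} for each post-cruise submission."""
    return {
        item: (
            cruise_end + timedelta(weeks=first_week),
            cruise_end + timedelta(weeks=last_week),
        )
        for item, (first_week, last_week) in SUBMISSION_WINDOWS_WEEKS.items()
    }
```

Calling `submission_due_dates(date(2009, 1, 1))`, with a made-up cruise end date, gives one date for the one-week items and a start and end date for the cruise report window.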
Progress to date
Initial steps
Designed and published a GDAC website: www.bodc.ac.uk/geotraces/
Please have a look, feedback would be welcome
Website Progress and issues
Things to do
• Create links under the relevant section for metadata forms and example event logs (waiting for feedback from DMC)
• Link to IMBER's data management cookbook (once published).
• Add a full description of every parameter measured on each cruise. This can only be achieved when PIs or national data centres pass on this information.
A cruise report would help.
Unified description of key GEOTRACES parameters
• Add cruises to POGO
• Development of the data delivery function will be put on hold until it is necessary.
Accessing Data
GEOTRACES data specifications
Ask for user name and details when a data request is made
Provide information about the data with the data file = standard processing file OR metadata form
Link from GEOTRACES website to GDAC website
No Data will be made public without appropriate approval
Attempt to bring together data managers for WG3 meeting
Initially a meeting was planned for May 2009
Poor response, so cancelled
Why?
Lack of contact with national data centres
Too early for a data meeting?
Why a WG3 meeting is important
Centralisation of data
Version control
Communication
Metadata forms and requirements
It is advisable that all GEOTRACES cruises have an on-board data manager
This greatly helps the organisation of data. IMBER have developed an online guide, which will be made available and/or adapted for GEOTRACES once the IMBER SSC approve the draft
Data Management Issues
IMBER Data Management Cook Book
Feedback from the GEOTRACES community would be useful for our own cookbook
http://planktondata.net/imber/
Likely Problems
GEOTRACES Problems
• Lack of cooperation from scientists and data centres.
• Lead nations need to identify GEOTRACES data managers.
• Mistrust, e.g. Australia: IPY data will not be made available until data is published.
• Integrate GEOTRACES data into GDAC’s database.
• Version control of data: potential problem from holding GEOTRACES data in multiple locations.
Measures needed to ensure international data management works smoothly
Data managers should get together regularly to discuss progress.
Contacts and progress with IPY GEOTRACES data
The following contacts have been made:
Germany - PANGAEA (DE) ([email protected]). Contact via email. Not much data in PANGAEA (Sea-Bird bottle data and some radionuclide data for cruise ARK-XXII/2). Issues over version control exist; these will hopefully be resolved before data are transferred.
Netherlands - Taco de Bruin ([email protected]). IPY data manager, but will also deal with GEOTRACES IPY data. New to the job; GDAC have only just received his details (March 2009). Hein de Baar and Michiel Rutgers van der Loeff are confident they can make good progress on submitting data for the Polarstern cruises ARK-XXII/2 and ANT XXIV-3 in the next few months.
France - Marie-Paule Torre (FR) ([email protected]). Contact via email. Ed Mawji will arrange a meeting in the next few months to discuss BONUS-GOOD HOPE data. Marie-Paule Torre expects data to start arriving around May 2009.
USA - Cyndy Chandler ([email protected]). Very helpful, no data at present. Will probably start with metadata.
New Zealand - Philip Boyd ([email protected]) will handle all NZ GEOTRACES data. Ed Mawji is in the process of setting up arrangements for GDAC to act as local data centre for the GEOTRACES process study FeCycle II (March 2009).
Australia - Andy Bowie and Edward Butler are in charge of IPY-GEOTRACES data. Butler’s group is still analysing samples. Bowie has finalised data for SAZ-SENSE, but will not transfer data until it is published.
No contact with Norway and Sweden, but GDAC has been advised that the following people are in charge of IPY data:
Norway - Oystein Godoy ([email protected])
Sweden - Jan Szaron ([email protected]), SMHI oceanography
No contact with Japan or China.
Summary of progress and problems
Without a detailed cruise report or metadata forms, tracking data becomes difficult (no record of what parameters were measured).
Metadata forms have been developed; waiting for feedback from Reiner and Chris.
Distrust between some lead PIs and data centres, and between data centres.
Data retrieval methods have not been tested but are likely to vary between data centres and countries. No details at present.
No data sets have arrived at GDAC yet. It is not possible to set up BODC parameter codes until data with detailed methods are received.
Future progress
The most important task is mapping all GEOTRACES parameters measured on all IPY cruises
Membership of WG
Need to increase the data management membership
Please nominate an appropriate person from each nation
Dates for the first meeting will depend on increased membership
Feedback
Thank You
Feedback, comments and suggestions welcome and encouraged
Names of relevant data managers for GEOTRACES data
Data management support for large oceanographic research projects
• Facilitate the interchange of data within the project community
• Work up and quality assure data
• Assemble project data into a single high quality coherent data set, maintaining spatial and temporal relationships
• Ensure documentation of data sets via developed metadata forms
• Final banking and publication of the project data set
• Encourage future utilisation of data
Quality Control of GEOTRACES Data
Reformat data to BODC internal format
Check parameters, units, time zone
Visually inspect data using in-house software
Data checked for spikes, gaps, physically unreasonable values
Data compared with that previously received from a site
Adjacent stations compared to check unusual signals
Problems discussed and resolved with data suppliers
Accompanying documentation compiled
Data loaded to BODC data bank
Data made available (via the Web, or on request)
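The checks for spikes, gaps and physically unreasonable values listed above could be sketched as below. This is an illustration only, not BODC's in-house software: the function names, the thresholds and the simple neighbour-based spike test are all assumptions made for the example.

```python
from math import isnan


def find_gaps(times, max_step):
    """Indices i where the interval between sample i and sample i+1 exceeds max_step."""
    return [i for i in range(len(times) - 1) if times[i + 1] - times[i] > max_step]


def find_out_of_range(values, low, high):
    """Indices of values that are missing (NaN) or outside the plausible range [low, high]."""
    return [i for i, v in enumerate(values) if isnan(v) or v < low or v > high]


def find_spikes(values, threshold):
    """Indices of interior points that stand out from BOTH neighbours, in the
    same direction, by more than threshold (an assumed, simplistic spike test)."""
    spikes = []
    for i in range(1, len(values) - 1):
        d_prev = values[i] - values[i - 1]
        d_next = values[i] - values[i + 1]
        same_direction = d_prev * d_next > 0
        if same_direction and min(abs(d_prev), abs(d_next)) > threshold:
            spikes.append(i)
    return spikes
```

Each function only reports indices and leaves the data untouched, so the results can be reviewed and discussed with the data supplier before anything is marked.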
Mission statement of GDAC
To operate world class data management for GEOTRACES
providing data management support for all cruises
maintaining and developing a TEI database
international exchange and management of oceanographic data
making high quality TEI data readily available to research scientists in academia, government and industry
Quality control and data integration
The processing stage includes five main steps:
1) Initial quality control of metadata
Reported metadata are checked against other available sources (e.g. ship’s log, scientific log, cruise report, other data centres)
Disagreements are investigated and the originator may be contacted if any doubt remains
The originator may also be contacted if there appears to be a problem with the data or if information about methodology is insufficient to start banking the data. GEOTRACES metadata forms should eliminate this problem
2) Reformatting
Attribution of BODC parameter codes.
CTD and underway data are transferred to BODC internal binary format
Full list of IPY cruises
Cruise | Departure date | GEOTRACES PI | Region | Ship
IPY
Sea of Okhotsk | 05/08/2007 | Jun Nishioka | Arctic | R/V Professor Khromov
ATOS-1 | | Antonia Tovar-Sanchez | Arctic |
SAZ-SENSE | 17/01/2007 | Andy Bowie and Edward Butler | Antarctic | Aurora Australis
ARK-XXII/2 (IPY-Cruise) | 28/07/2007 | Michiel Rutgers van der Loeff | Arctic | Polarstern
SIPEX | 05/09/2007 | Andy Bowie | Antarctic | Aurora Australis
ANT XXIV/3 (Caso-GEOTRACES; Zero & Drake) | 06/02/2008 | Hein de Baar | Antarctic | Polarstern
BONUS/GOOD-HOPE | 07/02/2008 | Marie Boye | Antarctic | Marion Dufresne
SR3-GEOTRACES | 28/03/2008 | Andy Bowie | Antarctic | Aurora Australis
International Siberian Shelf Study (ISSS 08) | 15/08/2008 | Per Andersson | |
Future IPY cruises
Beaufort Sea (IPY) | 2009 | Roger Francois and Kristin Orians | Arctic |
ATOS-2 (IPY) | 2009 | Antonia Tovar-Sanchez | Antarctic |
GEOTRACES Cruises
Cruise | Departure date | GEOTRACES PI | Region | Ship
Process studies
DynaLife | Jan-March 2009 | Anne-Carlijn Alderkamp | Antarctic | Nathaniel Palmer
FeCycle II | 2008 (Sept) | Philip Boyd | | FRV Tangaroa
GEOTRACES cruises
Intercalibration 1st leg | 08/06/2008 | Greg Cutter | |
Intercalibration 2nd leg | 29/06/2008 | Greg Cutter | |
Future cruises
NL. Iceland to Bermuda | provisionally scheduled in 2010 (June) | | West Atlantic | RV Pelagia
NL. Bermuda to Barbados | 2010 (August) | | West Atlantic | RV Pelagia
UK 40°S cruise | 2010 (Oct) | | Atlantic |
US. Atlantic section | 2010 | | |
NL. Barbados to Recife | 2011 | | West Atlantic | RV Pelagia
NL. Recife to Buenos Aires | 2011 | | West Atlantic | RV Pelagia
Quality control at GDAC/BODC
• Quality control
  - Visualisation of data (“screening”)
  - Anomalous data points marked
  - Developing more complex automated systems
• Metadata assembly
  - Oracle tables
  - Link data to time, position, originator, restrictions, XML documentation…
• Audit
• Banking
  - Final version to a secure location
  - Visible via GDAC/BODC web software / NDG
Screening of CTD and underway data
Visual screening on high speed graphics workstation using BODC’s graphics editor Serplo
Serplo enables the display of multiple parameters and rapid zooming
A suspect value is NOT edited; instead, a 1-byte quality control flag becomes associated with it
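The idea of flagging rather than editing can be illustrated with a small sketch. The class name and the flag codes below are hypothetical (the slides do not give BODC's actual flag scheme or file format); the only points carried over from the text are that the measured values stay untouched and that each value carries one flag byte.

```python
# Hypothetical one-byte flag codes, chosen for this example only.
FLAG_GOOD = ord(" ")
FLAG_SUSPECT = ord("S")


class FlaggedSeries:
    """Measurements plus a parallel array holding one flag byte per value."""

    def __init__(self, values):
        # Stored as a tuple so the original measurements cannot be modified.
        self.values = tuple(values)
        # bytearray gives exactly one byte of flag storage per value.
        self.flags = bytearray([FLAG_GOOD] * len(self.values))

    def mark_suspect(self, index):
        """Flag a value as suspect without changing the value itself."""
        self.flags[index] = FLAG_SUSPECT

    def good_values(self):
        """Return only the values whose flag is still 'good'."""
        return [v for v, f in zip(self.values, self.flags) if f == FLAG_GOOD]
```

A user who disagrees with a flag can still see the original number, which is the main reason for keeping values and flags separate.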
GEOTRACES DAC
BODC is taking on responsibility for assembling and delivering GEOTRACES data
This includes quality control, dissemination and long term archival
Who are BODC
A national facility for storing and distributing data concerning the marine environment (started in 1969).
• BODC deals with biological, chemical, physical and geophysical data, and our databases contain measurements of over 10,000 different variables.
• BODC is involved in international projects such as Argo, CLIVAR, WOCE, GLOSS and GEBCO, and now GEOTRACES