ECMWF ODB Training 2006 slide 1 Introduction to Observational DataBase (ODB) Sami Saarinen, Paul...
-
Upload
britney-peters -
Category
Documents
-
view
238 -
download
0
Transcript of ECMWF ODB Training 2006 slide 1 Introduction to Observational DataBase (ODB) Sami Saarinen, Paul...
ECMWFODB Training 2006 slide 1
Introduction to Observational DataBase (ODB)
Sami Saarinen, Paul BurtonECMWF
22-Mar-2006
ECMWFODB Training 2006 slide 2
OverviewIntroduction to ODB
Creating a simple database
Use of simulobs2odb –program
Visualizing data using odbviewer, ODBTk
The bigger picture
ODB within IFS/4DVAR-system
A more complex database
Manipulating ODB from Fortran90
Tools: odbless, odbdiff, odbcompress, odbdup, odb2netcdf
ECMWFODB Training 2006 slide 3
OverviewIntroduction to ODB
Creating a simple database
Use of simulobs2odb –program
Visualizing data using odbviewer, ODBTk
The bigger picture
ODB within IFS/4DVAR-system
A more complex database
Manipulating ODB from Fortran90
Tools: odbless, odbdiff, odbcompress, odbdup, odb2netcdf
ECMWFODB Training 2006 slide 4
Introduction to ODBODB is a tailor made database software developed at
ECMWF to manage very large observational data volumes through the IFS/4DVAR-system, and to enable flexible post-processing of observational data
Observational database usually contains following items:
Observation identification, position and time coordinates
Observation value, pressure levels, channel numbers
Various quality control flags
Obs. departures from background and analysis fields
Satellite specific information
Other closely related information
ECMWFODB Training 2006 slide 6
Basic components of ODBODB/SQL-language
Data Definition Language: To describe what data items belong to database, what are their data types and how they are related (if any) to each other
Data Query Language: To query and return a subset of data which satisfies certain user specified conditions. This is the key feature of the ODB software !!
Fortran90 interface layer
Data manipulation : create, update & remove data
Execute ODB/SQL-queries and retrieve filtered data
To control MPI and OpenMP-parallelization
ECMWFODB Training 2006 slide 8
Typical ODB usage patternsDatabase can be created interactively or in batch mode
We usually run our in-house BUFR2ODB in batch
New observation types can also be fed in via text file
Complete database manipulation currently prefers using Fortran90-interface, but read/only database can also be accessed via rudimentary client-server –interface (C/C++)
When database has been created, the application program normally queries data and places the result (also known as view) into a data matrix allocated by the user
There can be virtually any number of active views at any given time. These can be updated and fed back to database
ECMWFODB Training 2006 slide 9
OverviewIntroduction to ODB
Creating a simple database
Use of simulobs2odb –program
Visualizing data using odbviewer, ODBTk
The bigger picture
ODB within IFS/4DVAR-system
A more complex database
Manipulating ODB from For
Tools: odbless, odbdif, odbcompress, odbdup, odb2netcdf
ECMWFODB Training 2006 slide 10
Creating a simple databaseWe will create a very simple database using text files
The 3 text files describe
Data layout i.e. what data items comprise this ODB
Location and time information of observations
Actual observation measurement information for each location at the given pressure levels
Feed these files into simulobs2odb-program
Discover the data values in database by using odbviewer
ECMWFODB Training 2006 slide 11
Data definition layout : MYDB.ddl
CREATE TABLE hdr AS (
seqno pk1int,
obstype pk1int,
codetype pk1int,
lat pk9real,
lon pk9real,
date yyyymmdd,
time hhmmss,
body @LINK,
);
CREATE TABLE body AS (
entryno pk1int,
varno pk1int,
vertco_type pk1int,
press pk9real,
obsvalue pk9real, );
ECMWFODB Training 2006 slide 12
Input file#2 : hdr.txt
#hdr
obstype = 2
codetype = 141
seqno lat lon date time body.len
1 45 -15 20041101 000000 1
ECMWFODB Training 2006 slide 13
Input file#3 : body.txt
#body
entryno varno vertco_type press obsvalue
1 2 1 50000 251.0
ECMWFODB Training 2006 slide 14
Running simulobs2odbInitialize ODB interactive environment :
use odb
Create database using the following simple command :
simulobs2odb –l MYDB –i hdr.txt –i body.txt
As a result of these commands, a small database called MYDB has been created and it contains one data pool with two tables hdr and body, which are linked (related) to each other via special @LINK data type
It is now easy to extend database by providing more data, or specifying more data items, or adding more tables, or all above at the same time
ECMWFODB Training 2006 slide 15
Visualizing with odbviewerHistory: odbviewer was originally written to be used as a
debugging tool for ODB software development
Linked with ECMWF graphics package MAGICS/MAGICS++ it displays coverage plots
Also a textual report generator
Displays output of data queries
“Sensitive” to ODB/SQL-language : tries automatically produce both coverage plot and textual report for the user
Textual report itself can be invaluable source of information for further post-processing tasks
ECMWFODB Training 2006 slide 16
Running odbviewerGo to database directory
cd MYDB
Run
odbviewer –q ‘SELECT lat,lon,press,obsvalue\
FROM hdr, body \
WHERE obstype = 2’
ECMWFODB Training 2006 slide 18
Some odbviewer [options]-h List of options (gimme some “help” !)
-q ‘SQL-stmt’ Provide ODB/SQL-statement inline
-v viewname/poolno Choose SQL name (& optionally pool number)
-p “1-10,12,15” Choose from a subset of pools
-R No radians-to-degrees conversion for (lat,lon)
-r Enforce radians-to-degrees conversion
-c Clean start (i.e. recompile all)
-e editor Choose preferred editor
-e batch Run in batch mode (same as –e pipe)
-N Do not produce a report at all
-I Do not show plot immediately
-P projection Change projection
-C file.cmap Supply a color map file
-A plot_area Choose plotting area
ECMWFODB Training 2006 slide 19
ODBTk : The ODB ToolkitGUI based ODB visualisation tool
Easy way for non-experts to build SQL
Interactive viewing of observational data
Can refine SQL “WHERE” statement as you view the data
Portable, lightweight applicationRequires ODB, perl, Fortran90 & C compilers
ECMWFODB Training 2006 slide 20
ODBTk : Building an SQL
Twin views on structure
Hierarchical structure
Allows relationship between
tables/columns to be seen
“Flat structure”
Easy to find a given column/member
or table
Allows user to sort structure
SQL library
Both local & shared
ECMWFODB Training 2006 slide 23
OverviewIntroduction to ODB
Creating a simple database
Use of simulobs2odb –program
Visualizing data using odbviewer, ODBTk
The bigger picture
ODB within IFS/4DVAR-system
A more complex database
Manipulating ODB from Fortran90
Tools: odbless, odbdiff, odbcompress, odbdup, odb2netcdf
ECMWFODB Training 2006 slide 26
A more complex databaseIn the real world a database may contain many more tables
(>>5) than in the simple example earlier
Each table can contain 10—50 data columns
There can also be a sophisticated data hierarchy (next slide) to describe potentially complex relationships between tables
In order to provide a good parallel performance on supercomputers, data tables are furthermore divided into data pools
They behave like sub-databases within a database
Allows much bigger data sets than otherwise possible
ECMWFODB Training 2006 slide 28
ECMWF BUFR to ODB conversionODBs at ECMWF are normally created by using bufr2odb
Enables MPI-parallel database creation efficient
Allows retrospective inspection of Feedback BUFR data by converting it into ODB
bufr2odb can also be used interactively, for example: bufr2odb –i bufr_input_file –I 1-20 –n 4
The preceding example creates 4 pools of ECMA database from the given BUFR input file, but includes only BUFR subtypes from 1 to 20 (inclusive)
Feedback BUFR to ODB works similarly:
fb2odb –i feedback_bufr_file –n 8 –u 2
ECMWFODB Training 2006 slide 29
OverviewIntroduction to ODB
Creating a simple database
Use of simulobs2odb –program
Visualizing data using odbviewer, ODBTk
The bigger picture
ODB within IFS/4DVAR-system
A more complex database
Manipulating ODB from Fortran90
Tools: odbless, odbdiff, odbcompress, odbdup, odb2netcdf
ECMWFODB Training 2006 slide 30
Manipulating ODB from Fortran90Currently Fortran90 is the only way to fill an ODB database
simulobs2odb is also a Fortran90-program underneath
likewise odbviewer or practically any other ODB-tool
Also: to fetch and update data, Fortran90 is necessary
ODB Fortran90 interface layer offers a comprehensive set of functions to
Open & close database
Attach to & execute precompiled ODB/SQL queries
Load, update & store queried data
ECMWFODB Training 2006 slide 31
An example ODB program program main
use odb_module
implicit none
integer(4) :: h, rc, nra, nrows, ncols, npools, j, jp
real(8), allocatable :: x(:,:)
npools = 0
h = ODB_openODB_open(‘MYDB’, ’OLD’, npools=npools)
< data manipulation loop ; see next page >
rc = ODB_closeODB_close(h, save=.TRUE.)
end program main
ECMWFODB Training 2006 slide 32
Data manipulation loop DO jp=1,npools
! Execute SQL, allocate space, get data into matrix
rc = ODB_selectODB_select(h,’sqlview’,nrows,ncols,poolno=jp)
allocate(x(nrows,0:ncols))
rc = ODB_getODB_get(h,’sqlview’,x,nrows,ncols,poolno=jp)
! Update data, put back to DB, deallocate space
call update(x,nrows,ncols) ! Not an ODB-routine
rc = ODB_putODB_put(h,’sqlview’,x,nrows,ncols,poolno=jp)
deallocate(x)
rc = ODB_cancelODB_cancel(h,’sqlview’,poolno=jp)
! Use the following only with READONLY-databases
! rc = ODB_releaseODB_release(h,poolno=jp)
ENDDO
ECMWFODB Training 2006 slide 33
Compile, link and run
(1) use odb # once per session
(2) odbcomp MYDB.ddl # once only;often from file MYDB.sch
(3) odbcomp sqlview.sql # recompile only when changed
(4) odbf90 main.F90 update.F90 –lMYDB –o main.x # link
(5) ./main.x # run
ECMWFODB Training 2006 slide 34
OverviewIntroduction to ODB
Creating a simple database
Use of simulobs2odb –program
Visualizing data using odbviewer, ODBTk
The bigger picture
ODB within IFS/4DVAR-system
A more complex database
Manipulating ODB from Fortran90
Tools: odbless, odbdiff, odbcompress, odbdup, odb2netcdf
ECMWFODB Training 2006 slide 35
odblessA textual browser that allows to look at ODB data page-by-
page –basis (a little like Unix less-command):
By default calculates statistical summary for each retrieved data column
Cheap with near-optimal ODB data access pattern
User has a choice of specifying starting row
Usage: odbless –q ‘SELECT column(s) FROM table(s) WHERE …’ \
–s starting_row –n number_of_rows_to_display \
[–b buffer_size –X]
ECMWFODB Training 2006 slide 36
odbdiffEnables to compare two ODB databases for differences
Very useful tool when trying to identify errors/differences between operational and experimental 4DVAR runs
Usage:
odbdiff –q ‘SELECT …’ DATABASE1 DATABASE2
By default brings up an xdiff-window with respect to diffs
If latitude and longitude were given in the data query, then also produces a difference plot using odbviewer-tool
ECMWFODB Training 2006 slide 37
odbcompressEnables creation of very compact database from the
existing one for
archiving purposes, or for smaller footprint
Makes post-processing considerably faster
At this point the user has choices of both
Truncating the data precision
Leaving out columns that are less of importance
Early tests show that this new tool achieves compression factors from 2.5X to 11X
the higher compression being for satellite data !!
ECMWFODB Training 2006 slide 38
odbdupDuplicates database(s) by copying metadata (low volume),
but shares the actual data (high volume)
Allows database sharing between multiple users
Over shared (e.g. NFS mounted) disk
Enables creation of time-series database, for example: odbdup –i “200601*/ECMA.conv” –o USERDB
The previous example creates a new database labelled as USERDB, which presumably spans over all the conventional observations during January 2006
The heureka is : user has now access to a whole month of data as if it was situated in one single database !!
ECMWFODB Training 2006 slide 39
odb2netcdfTranslates the given ODB-query (or whole ODB-table) into a
series of NetCDF-files, by default one file for each ODB data pool
Usage:
odb2netcdf –q ‘SELECT …’
The result files can be viewed with standard NetCDF tools like ncdump and ncview
The files can also be produced in NetCDF packed format (with a caveat of truncated precision)
ECMWFODB Training 2006 slide 40
Also … Some interesting factsWritten mainly in C-language
Except Fortran90-interface and IFS/4DVAR interface
Except BUFRODB (by Milan Dragosavac)
ODB/SQL is currently converted into C-code
10 lines of SQL generates >> 100 lines of C-code
Standalone ODB installation (w/o IFS) is also available
Can be built in about 30 minutes for Linux/laptop
Tested at least on the following machines
SGI/Altix, IBM Power3/4, Linux Intel/AMD, VPP, …
Automatic binary data conversion guarantees database portability between different machines
ECMWFODB Training 2006 slide 41
… and some ODB “limitations”ODB software is clearly meant for large scale computation
since – given lots of memory and disk space, fast CPUs:
A single program can handle up to 2^31 ODB databases
A single database can have up to 2^31 data pools
A single database can have any number of tables
A single table in a data pool can have up to 2^31 rows and (by default) 9999 columns
A single ODB/SQL-query over active data pools can retrieve up to 2^31 rows in one go
These really big numbers show that ODBs potential is on parallel computers, but we haven’t forgotten desktop PCs!
ECMWFODB Training 2006 slide 42
Finally…ODB software is developed to allow unprecedented amounts
of satellite data through the IFS/4DVAR system
Software has been operational at ECMWF since June’2000, but is still evolving
Emphasis is now on graphical post-processing and how to enable fast access to very large amounts of data
Other ECMWF member states and co-operating countries that are also using or just becoming users of ODB
MeteoFrance, DWD, Hungary, Aladin/HIRLAM-nations
MetOffice is considering via collaboration with BoM
University of Vienna via re-analysis ERA40 collaboration