
ECMWF July 2007

ECMWF – Data and Services Section

FIELDS DATABASE

D. Jokic


FDB - FIELDS DATABASE

WHAT IS IT?

HISTORY?

WHY DO WE NEED IT?

FEATURES?

WHAT DATA IS STORED?

HOW DOES IT WORK?

WHERE IS IT USED?

PERFORMANCE?

FUTURE?


What is it?

A database management system (DBMS)

System for organizing, storing and retrieving bulk data

Designed and implemented in-house

Distributed database

MPI-IO

Object-oriented


History

FORTRAN version until 1994

C “Version 0” on Cray YMP16-C90 and Cray T3E

Cray C90: new configuration and database structure, 1995

Version 1.0: distributed database, 1996

Version 2.0: multi-server, multi-client mode, asynchronous I/O, 1996/1997

Version 3.0: MPI-IO, multiple file systems, 1997/1998

Version 3.3: parallel I/O on WS

Version 4.0


Why do we need it?

We use our own data

Where does the data reside?

How is the data organised?

Other people use our data

Tight time schedule

Research experiments

Variability of the output

New projects

New data attributes

Portable


Features

Variable-length binary data in the database

Locking

Security

Data Dictionary

Client-Server Architecture

MPI-IO

Multi File System


More Features

Flexibility

Ability to handle any bulk data

Network support

No need to pre-set data positions

Easy to switch from one configuration to another

Easy to use

Simple user interface

Same interface for all configuration modes

Enables file pre-allocations
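One way to offer the same interface for all configuration modes is a table of back ends chosen once at start-up, so user code always calls a single write routine. The C sketch below is hypothetical: the mode strings, `select_backend`, `write_field` and the two back-end stubs are illustrative names, not part of the FDB.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: one user-facing write call, several back ends.
   Mode names and back-end functions are assumptions for illustration. */
typedef int (*write_fn)(const void *data, int len);

static int standalone_bytes, client_bytes;   /* what each stub received */

static int standalone_write(const void *data, int len) {
    (void)data;              /* a real back end would write to local files */
    standalone_bytes += len;
    return 0;
}

static int client_server_write(const void *data, int len) {
    (void)data;              /* a real back end would send to an FDB server */
    client_bytes += len;
    return 0;
}

typedef struct { const char *mode; write_fn write; } backend_t;

static const backend_t backends[] = {
    { "standalone",    standalone_write },
    { "client-server", client_server_write },
};

/* Pick a back end by mode name; NULL if the mode is unknown. */
const backend_t *select_backend(const char *mode) {
    for (size_t i = 0; i < sizeof backends / sizeof backends[0]; i++)
        if (strcmp(backends[i].mode, mode) == 0) return &backends[i];
    return NULL;
}

/* Calling code is identical whichever mode was selected. */
int write_field(const backend_t *b, const void *data, int len) {
    assert(b != NULL);
    return b->write(data, len);
}
```

Switching configuration then amounts to passing a different mode string (for instance one taken from FDB_CONFIG_MODE), with no change to the code that writes fields.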


How does it work?

Interface

Database Structure

Configuration

Standalone

Client-Server

Multi-Client Multi-Server

MPI

IFS (MPP)

Environment Variables


FDB Routine syntax overview

Initialisation routine

Initfdb();

Manage databases

Openfdb(log_name, desc, mode, stat);

Closefdb(desc);

Manipulate records

Writefdb(desc, data, len);

Readfdb(desc, data, len);

Set attribute values

Setvalfdb(desc, attribute_name, value);
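The call order on this slide (initialise, open, set attribute values, write or read, close) can be exercised with a small in-memory stand-in. In this hypothetical C sketch, several things are assumptions rather than the real FDB library's argument types or semantics: the `mock_` routines, the integer descriptor, the status codes, and the rule that a read matches the attribute values set in the same order as at write time.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the FDB call sequence.
   Routine names mirror the slide; types, limits and semantics are
   assumptions made for illustration, not the real FDB library. */
#define MAX_HANDLES 4
#define MAX_ATTRS   8
#define MAX_RECORDS 16
#define MAX_DATA    256
#define MAX_KEY     (MAX_ATTRS * 64 + 1)   /* room for 8 "name=value," pairs */

typedef struct { char name[32], value[32]; } attr_t;
typedef struct { attr_t attrs[MAX_ATTRS]; int nattrs, open; } handle_t;
typedef struct { char key[MAX_KEY]; unsigned char data[MAX_DATA]; int len; } record_t;

static handle_t handles[MAX_HANDLES];
static record_t records[MAX_RECORDS];
static int nrecords;

static handle_t *get_handle(int desc) {
    assert(desc >= 0 && desc < MAX_HANDLES && handles[desc].open);
    return &handles[desc];
}

/* Key such as "param=t,step=24," built in the order attributes were first set. */
static void make_key(const handle_t *h, char *key) {
    key[0] = '\0';
    for (int i = 0; i < h->nattrs; i++) {
        strcat(key, h->attrs[i].name);
        strcat(key, "=");
        strcat(key, h->attrs[i].value);
        strcat(key, ",");
    }
}

void mock_initfdb(void) {
    memset(handles, 0, sizeof handles);
    nrecords = 0;
}

void mock_openfdb(const char *log_name, int *desc, const char *mode, int *stat) {
    (void)log_name; (void)mode;          /* a real database would use these */
    *stat = -1;                          /* -1: no free descriptor */
    for (int i = 0; i < MAX_HANDLES; i++) {
        if (!handles[i].open) {
            handles[i].open = 1;
            handles[i].nattrs = 0;
            *desc = i;
            *stat = 0;
            return;
        }
    }
}

void mock_closefdb(int desc) { get_handle(desc)->open = 0; }

/* Set or overwrite one attribute value on an open descriptor. */
int mock_setvalfdb(int desc, const char *name, const char *value) {
    handle_t *h = get_handle(desc);
    int i = 0;
    while (i < h->nattrs && strcmp(h->attrs[i].name, name) != 0) i++;
    if (i == h->nattrs) {                /* new attribute */
        if (h->nattrs == MAX_ATTRS) return -1;
        snprintf(h->attrs[i].name, sizeof h->attrs[i].name, "%s", name);
        h->nattrs++;
    }
    snprintf(h->attrs[i].value, sizeof h->attrs[i].value, "%s", value);
    return 0;
}

/* Store len bytes under the current attribute values. */
int mock_writefdb(int desc, const void *data, int len) {
    if (nrecords == MAX_RECORDS || len < 0 || len > MAX_DATA) return -1;
    record_t *r = &records[nrecords];
    make_key(get_handle(desc), r->key);
    memcpy(r->data, data, (size_t)len);
    r->len = len;
    nrecords++;
    return 0;
}

/* *len is the buffer size on entry and the bytes read on success.
   The most recent record matching the current attribute values wins. */
int mock_readfdb(int desc, void *data, int *len) {
    char key[MAX_KEY];
    make_key(get_handle(desc), key);
    for (int i = nrecords - 1; i >= 0; i--) {
        if (strcmp(records[i].key, key) != 0) continue;
        if (records[i].len > *len) return -2;   /* buffer too small */
        memcpy(data, records[i].data, (size_t)records[i].len);
        *len = records[i].len;
        return 0;
    }
    return -1;                                  /* no matching field */
}
```

In this sketch the attribute values set before each write or read are what identify a field; the data itself is treated as an opaque, variable-length byte string.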


FDB I/O

Raw

Asynchronous

List I/O

Remote
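List I/O means handing several buffers to the operating system in one request instead of issuing one write per field. The slide does not show the FDB's own list I/O interface, so this C sketch substitutes the POSIX `writev` call. `write_field_list` and `open_scratch` are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_LIST 16

/* Create a scratch file for trying the sketch out; the template must
   end in "XXXXXX" and is overwritten with the actual file name. */
int open_scratch(char *path_template) { return mkstemp(path_template); }

/* Hypothetical list-I/O helper: gather n field buffers into one writev
   call. Returns the number of bytes written or -1 on error; like write,
   writev may write fewer bytes than requested, so callers must check. */
ssize_t write_field_list(int fd, const void *const fields[],
                         const size_t lens[], int n) {
    struct iovec iov[MAX_LIST];
    assert(n > 0 && n <= MAX_LIST);
    for (int i = 0; i < n; i++) {
        iov[i].iov_base = (void *)fields[i];   /* writev only reads it */
        iov[i].iov_len = lens[i];
    }
    return writev(fd, iov, n);
}
```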


Environment Variables

Configuration: FDB_CONFIG_MODE, FDB_CONFIG_FILE, FDB_SHARED_MEMORY, FDB_MPI_CACHE

File systems: FDB_ROOT, FDB_FILE_SYSTEM

Security: FDB_OWNER, FDB_GROUP, FDB_MODE

Buffers: FDB_MAX_CACHE, FDB_READ_BUFFR_SIZE, FDB_SECTOR_SIZE, FDB_NOF_BUF

Flow control: FDB_DEBUG, FDB_TRACE, FDB, FDB_ABORT

Database: FDB_SIGNALS, FDB_LOCKING

Distributed: FDB_SERVER_HOST, FDB_SERVER_PORT, FDB_SERVER_BUF_SIZE, FDB_SEARCH_SIBLING_HOSTS
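Because configuration is driven by environment variables, a program can read them once at start-up. This C sketch reads a few of the variables listed above with `getenv`. The struct, the default values and the assumption that ports and buffer counts are decimal integers are illustrative, not the FDB's documented behaviour.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical snapshot of some FDB environment variables.
   Defaults and value formats are assumptions. */
typedef struct {
    const char *config_mode;   /* FDB_CONFIG_MODE */
    const char *root;          /* FDB_ROOT */
    const char *server_host;   /* FDB_SERVER_HOST */
    long server_port;          /* FDB_SERVER_PORT */
    long nof_buf;              /* FDB_NOF_BUF */
} fdb_env_t;

/* Value of name, or fallback if unset or empty. */
static const char *env_str(const char *name, const char *fallback) {
    const char *v = getenv(name);
    return (v != NULL && *v != '\0') ? v : fallback;
}

/* Decimal integer value of name, or fallback if unset or malformed. */
static long env_long(const char *name, long fallback) {
    const char *v = getenv(name);
    if (v == NULL || *v == '\0') return fallback;
    char *end;
    long n = strtol(v, &end, 10);
    return (*end == '\0') ? n : fallback;
}

fdb_env_t read_fdb_env(void) {
    fdb_env_t e;
    e.config_mode = env_str("FDB_CONFIG_MODE", "standalone");
    e.root        = env_str("FDB_ROOT", ".");
    e.server_host = env_str("FDB_SERVER_HOST", "localhost");
    e.server_port = env_long("FDB_SERVER_PORT", 0);
    e.nof_buf     = env_long("FDB_NOF_BUF", 1);
    return e;
}
```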


Standalone mode

[Diagram: an executable (EXEC) with an FDB buffer and dynamic indexing, and three FDBs]
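In standalone mode the application writes through an FDB buffer, and the index is built as data arrives (dynamic indexing), so no data layout has to be fixed in advance. Below is a hypothetical C sketch of that idea. A byte array stands in for the data file, and the sizes and names (`buffered_write`, `indexed_read`) are invented.

```c
#include <assert.h>
#include <string.h>

#define BUF_SIZE   64
#define FILE_SIZE  1024
#define MAX_FIELDS 32

typedef struct { char key[32]; long offset; int len; } index_entry_t;

static unsigned char file_data[FILE_SIZE];   /* stand-in for the data file */
static long file_len;                        /* bytes already flushed */
static unsigned char buf[BUF_SIZE];          /* the write buffer */
static int buf_used;
static index_entry_t index_tab[MAX_FIELDS];  /* grows as fields are written */
static int nindex;
static int nflushes;

static void flush_buffer(void) {
    assert(file_len + buf_used <= FILE_SIZE);
    memcpy(file_data + file_len, buf, (size_t)buf_used);
    file_len += buf_used;
    buf_used = 0;
    nflushes++;
}

/* Append one field and index it at once: its final file offset is known
   as soon as its position in the buffer is. */
int buffered_write(const char *key, const void *data, int len) {
    if (len < 0 || len > BUF_SIZE || nindex == MAX_FIELDS) return -1;
    if (buf_used + len > BUF_SIZE) flush_buffer();
    index_entry_t *e = &index_tab[nindex++];
    strncpy(e->key, key, sizeof e->key - 1);
    e->key[sizeof e->key - 1] = '\0';
    e->offset = file_len + buf_used;
    e->len = len;
    memcpy(buf + buf_used, data, (size_t)len);
    buf_used += len;
    return 0;
}

/* Look a field up through the index; flush first so every indexed
   field is in the file. Returns the field length or -1. */
int indexed_read(const char *key, void *out, int cap) {
    if (buf_used > 0) flush_buffer();
    for (int i = nindex - 1; i >= 0; i--) {
        if (strcmp(index_tab[i].key, key) != 0) continue;
        if (index_tab[i].len > cap) return -1;
        memcpy(out, file_data + index_tab[i].offset, (size_t)index_tab[i].len);
        return index_tab[i].len;
    }
    return -1;
}
```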


Client-Server mode

[Diagram: FDB and a spooler server linked over a physical network by TCP/IP]

Buffering on client/server side; flushing
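Buffering on the client side turns many small field writes into a few large network transfers, with an explicit flush to push out whatever remains. In this hypothetical C sketch, `send_fn` stands in for the TCP/IP transfer to the server. The 100-byte buffer is deliberately tiny so the behaviour is easy to follow.

```c
#include <assert.h>
#include <string.h>

#define CLIENT_BUF 100

/* Stand-in for shipping one message to the server. */
typedef int (*send_fn)(const void *msg, int len);

typedef struct {
    unsigned char data[CLIENT_BUF];
    int used;
    send_fn send;
} client_t;

/* Send whatever is buffered, if anything. */
int client_flush(client_t *c) {
    if (c->used == 0) return 0;
    int rc = c->send(c->data, c->used);
    c->used = 0;
    return rc;
}

/* Buffer a field, flushing first if it would not fit. Fields larger than
   the buffer are sent directly, after a flush to preserve ordering. */
int client_write(client_t *c, const void *field, int len) {
    assert(len >= 0);
    if (len > CLIENT_BUF) {
        int rc = client_flush(c);
        return rc != 0 ? rc : c->send(field, len);
    }
    if (c->used + len > CLIENT_BUF) {
        int rc = client_flush(c);
        if (rc != 0) return rc;
    }
    memcpy(c->data + c->used, field, (size_t)len);
    c->used += len;
    return 0;
}

/* Mock "server" that only counts what arrives. */
static int messages, bytes_received;
static int count_send(const void *msg, int len) {
    (void)msg;
    messages++;
    bytes_received += len;
    return 0;
}
```

Five 40-byte writes cause only two transfers, and the final flush sends the last 40 bytes.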


Multi-Client Mode

[Diagram: four clients of a parallel application, an FDB server and an FDB]

Buffering on client/server side (select server)
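With several clients and several servers, each client has to pick a server. The slide does not say how the FDB does this. One simple, coordination-free policy is to hash a field's identifying key and reduce the hash modulo the number of servers. The C sketch below does this, using the FNV-1a hash as our own choice.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical policy: map a field key onto one of nservers servers. */
int select_server(const char *field_key, int nservers) {
    assert(nservers > 0);
    return (int)(fnv1a(field_key) % (uint32_t)nservers);
}
```

Any client that builds the same key reaches the same server, so all data for one field ends up in one place.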


Massively Parallel I/O

[Diagram: a parallel application, its bulk data and metadata, and a single I/O PE]
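Writing bulk data quickly from many processing elements (PEs) is easiest when each PE owns a disjoint part of the output. If every PE's byte count is known, an exclusive prefix sum gives each PE its starting offset. Only the small metadata (offset, length) then has to reach the I/O PE. This C sketch simulates the PEs with an array. In an MPI program `MPI_Exscan` computes the same offsets; its result on rank 0 is undefined, so rank 0 would use 0. It illustrates data partitioning and is not the FDB's code.

```c
#include <assert.h>

/* offsets[pe] = sum of sizes of all lower-numbered PEs. */
void exclusive_offsets(const long sizes[], long offsets[], int npes) {
    assert(npes > 0);
    long running = 0;
    for (int pe = 0; pe < npes; pe++) {
        offsets[pe] = running;   /* start of this PE's region */
        running += sizes[pe];
    }
}

/* Check that the regions [offset, offset + size) do not overlap and
   exactly tile a file of file_size bytes. */
int regions_tile(const long sizes[], const long offsets[], int npes,
                 long file_size) {
    long expect = 0;
    for (int pe = 0; pe < npes; pe++) {
        if (offsets[pe] != expect) return 0;
        expect += sizes[pe];
    }
    return expect == file_size;
}
```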


Multi File System Database

[Diagram: a parallel application, its bulk data and metadata, and four I/O PEs]
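With several file systems, output can be spread out so that the I/O PEs write to different ones. Below is a hypothetical round-robin placement in C. The root directories and the `field_%06d` file naming are invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Round-robin: field i goes to file system i mod nfs. */
int file_system_for(int field_index, int nfs) {
    assert(field_index >= 0 && nfs > 0);
    return field_index % nfs;
}

/* Build the field's path under its file system root. Returns the file
   system index, or -1 if the path did not fit in out. */
int field_path(char *out, size_t cap, const char *const roots[], int nfs,
               int field_index) {
    int fs = file_system_for(field_index, nfs);
    int n = snprintf(out, cap, "%s/field_%06d", roots[fs], field_index);
    return (n >= 0 && (size_t)n < cap) ? fs : -1;
}
```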


ECMWF Operations – ‘o’ suite

[Diagram labels: Products Generation, FDB Server, Plotting, MS Jobs, FDB, PDB, Dissemination, Requirements, Bit-Maps, Climatology; two T799 runs, marked ‘concurrently’ and ‘separately’]

256 PEs: 75330 fields, 108.3 GB

128 PEs: 5100 fields, 7.1 GB

Total: 80430 fields, 115.4 GB

13085 files, 160 GB
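The field counts and volumes for the 'o' suite are self-consistent: the two outputs add up to the stated total of 80430 fields and 115.4 GB, and dividing volume by field count gives a mean field size of roughly 1.4 MB. The small C check below uses 1 GB = 1000 MB, which is an assumption since the slide does not specify the units.

```c
#include <assert.h>

typedef struct { long fields; double gb; } output_t;

/* Combine two outputs into one total. */
output_t total_output(output_t a, output_t b) {
    output_t t = { a.fields + b.fields, a.gb + b.gb };
    return t;
}

/* Mean field size in MB, taking 1 GB = 1000 MB. */
double mean_field_mb(output_t o) {
    assert(o.fields > 0);
    return o.gb * 1000.0 / (double)o.fields;
}
```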


ECMWF Operations – ‘mc’ suite

[Diagram labels: Products Generation, FDB Server (two), Plotting, MS Jobs, FDB, PDB, Dissemination, Requirements, Bit-Maps, Means, Probabilities, Tubes, Clusters]

T511 on 288 PEs; PF + CFX in parallel, 50

53209 fields, 26.4 GB (appears on two paths in the diagram)


A Few Numbers

560 active users at ECMWF and in the Member States

100 000 retrieval requests a day, retrieving 10 000 000 fields

5 000 000 fields added daily (0.7 terabyte)

More than half a petabyte

More than 3.5 × 10⁹ meteorological fields

Analyses from 1980, forecasts from 1985

With ERA-40, analyses and observations since 1957
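Taken at face value, these numbers imply about 140 KB per newly archived field, about 100 fields per retrieval request, and roughly 143 KB per field across the whole archive. The C sketch of that arithmetic below rests on two assumptions: TB and PB are taken as 10¹² and 10¹⁵ bytes, and the "more than" figures are treated as exact.

```c
#include <assert.h>

/* Decimal units assumed; the slide does not say which are meant. */
static const double TB = 1e12, PB = 1e15;

/* 0.7 TB added per day over 5 000 000 new fields, in KB. */
double mean_new_field_kb(void) { return 0.7 * TB / 5e6 / 1e3; }

/* 10 000 000 fields retrieved by 100 000 requests per day. */
double fields_per_request(void) { return 1e7 / 1e5; }

/* Half a petabyte over 3.5e9 fields, in KB (both quoted as lower bounds). */
double mean_archived_field_kb(void) { return 0.5 * PB / 3.5e9 / 1e3; }
```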


Performance

Independent of the number of fields and the volume of data in the database

Dynamic indexing during production processes

Data partitioning

“It is easy to write data out fast from the parallel application. It is, however, not easy to make straight use of such data”
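Keeping retrieval cost independent of the number of stored fields needs an index whose lookup time does not grow with its size. A hash table gives constant average lookup time. This hypothetical C sketch maps a field key to a file offset using open addressing and linear probing. The slide does not say which index structure the FDB actually uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SLOTS 64   /* power of two, so "& (SLOTS - 1)" is a cheap modulo */

typedef struct { char key[48]; long offset; int used; } slot_t;
static slot_t table[SLOTS];
static int count;

static uint32_t hash_key(const char *s) {
    uint32_t h = 2166136261u;              /* FNV-1a */
    while (*s) { h ^= (unsigned char)*s++; h *= 16777619u; }
    return h;
}

/* Insert or update key -> offset. Returns -1 when the table would pass
   75% load, which also guarantees every probe meets an empty slot. */
int index_put(const char *key, long offset) {
    assert(strlen(key) < sizeof table[0].key);
    uint32_t i = hash_key(key) & (SLOTS - 1);
    while (table[i].used) {
        if (strcmp(table[i].key, key) == 0) {   /* field re-archived */
            table[i].offset = offset;
            return 0;
        }
        i = (i + 1) & (SLOTS - 1);              /* linear probing */
    }
    if (count >= SLOTS * 3 / 4) return -1;
    strcpy(table[i].key, key);
    table[i].offset = offset;
    table[i].used = 1;
    count++;
    return 0;
}

/* Offset stored for key, or -1 if the key is absent. */
long index_get(const char *key) {
    uint32_t i = hash_key(key) & (SLOTS - 1);
    while (table[i].used) {
        if (strcmp(table[i].key, key) == 0) return table[i].offset;
        i = (i + 1) & (SLOTS - 1);
    }
    return -1;
}
```

A production index would grow the table instead of refusing inserts, but the lookup path stays the same.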


Future

Distributed, synchronised database

New supercomputer phase

?


Thank you!!!