GRIB NetCDF Setting the scene - ECMWF · 2015. 11. 17. · PT and PV levels SCDA Forecast SCDA...
Transcript of GRIB NetCDF Setting the scene - ECMWF · 2015. 11. 17. · PT and PV levels SCDA Forecast SCDA...
-
Slide 1 © ECMWF
GRIB NetCDFSetting the scene
-
Slide 2 © ECMWF
GRIB at ECMWF
-
Slide 3 © ECMWF
ECMWF model output are encoded in GRIB
● One parameter
● One date
● One time
● One step
● One level
● One forecasting system
● …
-
Slide 4 © ECMWF
Size of the archive vs. Sustained HPC performance
-
Slide 5 © ECMWF
GRIB are stored in MARS
● 28 years in existence
● A managed archive
● MARS is not a file system
– Users are not aware of the location of the data
– Retrievals are expressed in meteorological terms
● An archive, not a database
– Metadata online
– Data offline (automated tape library)
-
Slide 6 © ECMWF
MARS in numbers
●53 Petabytes of primary data in ~ 11 million files, for more
than 170 billion (1.7 · 1011) meteorological fields
● ~ 800 Gigabytes of metadata
●200 million fields added daily (peaks at 100 Terabytes)
●650 active users/day executing 1.5 million requests/day
●~ 100 Terabytes retrieved daily
-
Slide 7 © ECMWF
Addition of data type in MARS
3DVar 4DVar 12 Hour 4DVar DCDA
EPS 15 days
Vareps/Monthy EDA
50 Members EPS
T106L16
T106L19
T213L31T319L31
T319L50
T319L60
T511L60 T799L91
T1279L91
FC Pressure levels
FC Model levels
Chernobyl
SSTs
TOGA FC
Errors in FG
Waves
EPS
Clusters
Waves FG
Probabilities
Ensemble means & stdev
Other centers
Sensitivity
NCEP EPS
OI
Errors in AN and FG
4D-Var
Tubes
Wave EPS
Errors if FG, surface
Wave proba.
SCDA Analysis
PT and PV levels
SCDA Forecast
SCDA Forecast
Wave 4V
SCDA Waves
Multi-Analysis
4D-var increments
EFIs
DCDA
DCDA Wave
SCDA 4D-Var
EPS PT levels
Overlap, CalVal
Wave EFIs
Vareps/Monthy
4d-Var Model errors
Ensemble data assimilation
X-MP/4 Y-MP/8 C90/12
C90/16
VPP700-48
VPP700-112
VPP5000 IBM-P4 IBM-P5 IBM-P5+ IBM-P6
10M
100M
1G
10G
100G
1T
10T
85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 00 01 02 03 04 05 06 07 08 09 10
Weekend EPS
Weekly Monthly
Extra fields, new gaussian grid
00Z EPS
00Z 10 day FC
00Z Run
00Z Run
End sensitivity
-
Slide 8 © ECMWF
What data?
● Operational runs
– Medium-range (15 days, twice a day, including ensemble)
– Extended-range (a month), Long-range (a year)
– Re-forecasts , Ocean waves
● Projects
– Reanalyses (15 years, 45 years, 100 years)
– WMO: TIGGE, TIGGE-LAM, S2S
– EU projects: DEMETER, ENSEMBLES, EURO4M, MACC, PROVOST, ECSN…
● Research experiments
– ECMWF, Member States
● Member States’ Projects
– COSMO-LEPS , Aladin-LEAF
-
Slide 9 © ECMWF
MARS language – Retrieve request
retrieve, action
class = od, identificationstream = oper,
expver = 1,
date = -3, date & time relatedtime = 12,
type = analysis, data relatedlevtype = model levels,
levelist = 1/to/137,
param = temperature,
grid = 2.5/2.5, post-processing
target = “analysis” storage
-
Slide 10 © ECMWF
Metview and GRIB
-
Slide 11 © ECMWF
Metview and NetCDF
-
Slide 12 © ECMWF
But we want more…
-
Slide 13 © ECMWF
We want to archive NetCDF in MARS
…and provide a service on par with what we do for GRIBs.
-
Slide 14 © ECMWF
GRIB NetCDFSetting the scene (part 2)
-
Slide 15 © ECMWF
Disclaimer
GRIB vs. NetCDF: I have no preferences
I know GRIB better, that’s all.
-
Slide 16 © ECMWF
GRIB
● Designed for telecoms
● As small as possible, table driven (must to read the doc :-)
● Record/message format, in memory
● No separation between format and data model
● Used in operational NWP, exchanged on the GTS
● Designed by committee
● No such thing as a “GRIB file”, just a file with GRIB messages
% cat file1.grib file2.grib > file3.grib
● file3.grib is a valid “GRIB file”
-
Slide 17 © ECMWF
NetCDF
● Primarily an API/library
● Self describing
● Needs a convention (CF)
● Clean separation between format and data model (CF)
● Loads of tools
● Used in academia, oceanography and climate modeling
● CF: community driven
● NetCDF is a file format:
% cat file1.nc file2.nc > file3.nc
● file3.nc is NOT a valid NetCDF file.
-
Slide 18 © ECMWF
GRIB/BUFR: data and
envelop are mixedCF: data stored
using a convention
NetCDF is an envelop
NetCDF vs. GRIB/BUFR
-
Slide 19 © ECMWF
Converting from GRIB to NetCDF
● Metadata
– Date and time
– Parameters
● Data
– Units
– Grids
● Compression
– Internal (Packing)
– External (zlib)
● File structures
-
Slide 20 © ECMWF
Metadata
-
Slide 21 © ECMWF
Why do we need metadata?
● Type 1: we cannot use the data otherwise
– E.g. description of the grid (latitudes, longitudes)
– E.g. units
● Type 2: identification (used for indexing, use for querying)
– E.g. date/time
● Type 3: Nice to have
– E.g. contact details of principal investigator
● GRIB => NetCDF
– How to map this metadata?
– How to map the data?
-
Slide 22 © ECMWF
Convention? What convention?
● Parameter names are well covered by CF.
● What about other attributes:
– lat/lon?
– lat/long?
– latitude/longitude?
– x/y?
– y/x?
● And:
– lev?
– level?
– height?
– z?
● I have seen them all. Are users supposed to inspect new files before using them?
-
Slide 23 © ECMWF
Units
-
Slide 24 © ECMWF
Should covertion modify the data?
● Example: Total Precipitations
– NetCDF: kg m-2 (e.g. mm assuming 1l of water = 1km)?
– GRIB: m m-2 ?
– Mapping implies multiplication by 1000
● Is that acceptable?
-
Slide 25 © ECMWF
Parameter names
-
Slide 26 © ECMWF
“Semantic Spectrum”M
ore
gen
era
l Mo
re s
pecific
surface_air_temperature
discipline = meteorology
parameter = temperature
level = 2
level_unit = meter
BUFR GRIB NetCDF/CF
Computer “friendly” Human friendly
-
Slide 27 © ECMWF
Two interesting examples:
tendency_of_atmosphere_mass_content_of_
particulate_organic_matter_dry_aerosol_
expressed_as_carbon_due_to_emission_from_
residential_and_commercial_combustion
surface_upward_mass_flux_of_carbon_dioxide_
expressed_as_carbon_due_to_emission_from_
fires_excluding_anthropogenic_land_use_change
-
Slide 28 © ECMWF
File structure
-
Slide 29 © ECMWF
File structures
● How to structure NetCDF files?
● ECMWF golden rule:
– File must be self describing
– File name MUST NOT carry any semantic
● Consider:
% mv ECMWF-ERA20C-geopotential-20010101.nc foo.nc
● What is in foo.nc ?
– ncdump (or grib_dump) should tell us
-
Slide 30 © ECMWF
So how do we structure a NetCDF file?
● One 2D field per file?
● One 3D field per file?
● Many 3D fields per file?
● Many 4D files per files?
=> See effect on packing
-
Slide 31 © ECMWF
File structure: the problem: how to map to NetCDF files?
GRIB file
-
Slide 32 © ECMWF
File structure: one file per field
GRIB file
file1.nc file2.nc file3.nc
file4.nc file5.nc
file7.nc
file6.nc
file11.nc
file10.ncfile8.nc
file9.nc file12.nc
-
Slide 33 © ECMWF
File structure: group by time? By level? Both?
GRIB file
file1.nc
file2.nc
file3.nc
file4.nc
-
Slide 34 © ECMWF
File structure: group by time? By level? Both?
GRIB file NetCDF file
A =
B =
C =
D =
-
Slide 35 © ECMWF
Date and time
-
Slide 36 © ECMWF
Date & time
D
D-1
D-2
D-3
T T+1 T+2 T+3 T+4
-
Slide 37 © ECMWF
Date & time
D
D-1
D-2
D-3
T T+1 T+2 T+3 T+4
-
Slide 38 © ECMWF
D
D-1
D-2
D-3
T T+1 T+2 T+3 T+4
Date & time
-
Slide 39 © ECMWF
D
D-1
D-2
D-3
T T+1 T+2 T+3 T+4
Date & time
-
Slide 40 © ECMWF
D
D-1
D-2
D-3
T T+1 T+2 T+3 T+4
Date & time
-
Slide 41 © ECMWF
Date & time (Hindcasts)
D
T T+1 T+2 T+3 T+4
D – 1Y
T T+1 T+2 T+3 T+4
T T+1 T+2 T+3 T+4
T T+1 T+2 T+3 T+4
T T+1 T+2 T+3 T+4
D – 2Y
D – 3Y
D – 4Y
-
Slide 42 © ECMWF
Date & time (long window 4D-var)
T T+1 T+2 T+3 T+4T-1T-2
Data assimilation Forecast
-
Slide 43 © ECMWF
Averaging: ensemble means
D
T T+1 T+2 T+3 T+4
M1
M2
M3
M4
M5
M5
M7
-
Slide 44 © ECMWF
Averaging: monthly means
D
T T+1 T+2 T+3 T+4
M1
M2
M3
M4
M5
M5
M7
-
Slide 45 © ECMWF
Averaging: monthly means of ensemble means
D
T T+1 T+2 T+3 T+4
M1
M2
M3
M4
M5
M5
M7
-
Slide 46 © ECMWF
Compression
-
Slide 47 © ECMWF
GRIB simple packing
● Maps floating point range [Fieldmin,Fieldmax] to integerrange [0,2n-1]
● It’s equivalent to sampling the field into 2n buckets
– Packing is lossy
● n can be anything between 1 and 32 (standard does not prevent n to be 255!)
● Most of the fields are packed with n = 16.
– GRIB fields are half the size of the equivalent single precision float (or a quarter of double precision)
● Blind conversion from GRIB to NetCDF will create files twice as large (NC_FLOAT) or four time bigger (NC_DOUBLE)
-
Slide 48 © ECMWF
NetCDF supports “simple packing”
● Using scale_factor and add_offset, and packing to NC_BYTE, NC_SHORT, NC_INT
– Please note that these are signed (unsigned version comes with NetCDF4)
● Only multiple of 8 bits are supported
– NC_BYTE = 8, NC_SHORT = 16, NC_INT = 32
● Missing values:
– GRIB uses a “bitmap” to mark the missing values
– NetCDF uses _FillValue to mark missing values
– Consequence:
● When packing NetCDF to NC_BYTE or NC_SHORT , we have 1 less value than GRIB
● We cannot have encode the same range
-
Slide 49 © ECMWF
Conversion and “simple packing”: major issue
● GRIB applies simple packing per 2D field
● NetCDF may apply packing per 3D (space and level, space and time) and even 4D fields (space, level and time)
● Consequence: mapping floating point range [Fieldmin, Fieldmax] to integer range [0, 2n-1] is done on more values in the case of NetCDF
– Higher loss of accuracy
● Conversion leads to loss of information !!!!!
– That’s not good™
-
Slide 50 © ECMWF
Difference is small, but non-zero (showing precipitations)
-
Slide 51 © ECMWF
Grids
-
Slide 52 © ECMWF
Reduced Gaussian Grid
-
Slide 53 © ECMWF
What do I want from this workshop?
● A general agreement on how to map GRIB to NetCDF (parameters, units, metadata, file structures,…)
– So no one complains that we “are not doing it right”
● A general agreement on how we deal with future requirements (new grids, new parameters, …)
– Maybe a tighter collaboration between WMO and the CF community, like we did for OGC?