Lossless Compression of Meteorological Data in GRIB Format
R. Lorentz
Fraunhofer Institute for Scientific Computation and Algorithms (SCAI)
Germany
Prof. Dr. Rudolph Lorentz, FhG-SCAI.NuSo
Data Compression
• What is it?
• What is it good for in the context of WIS?
  a) Reducing archive size
  b) Speeding up data transfer
• Who needs it?
  a) Archives larger than ~ 100 terabytes
  b) Frequent transfers of large blocks of data (~ 1 GB)
Some numbers
Lossless data compression
e.g., Zip programs: compression factors
for text: 2 – 3
for simulation data: 1 – 1.2
Lossy data compression
e.g., JPEG, MPEG: compression factors
for pictures: ~ 10 – 100
(not suitable for floating-point numbers)
Disadvantages of data compression
1. Costs resources
compression and decompression take time, say 20 MB/s on a 3 GHz Linux PC
2. Software must be integrated into the production run
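To get a feeling for the cost, the throughput figure above can be turned into a rough timing estimate. The following is only a sketch under stated assumptions: 1 GB is taken as 1000 MB, compression, transfer and decompression run one after the other, decompression is assumed to run at the same 20 MB/s as compression, and the link speeds (5 and 10 MB/s) as well as the function names are hypothetical. The compression factor 2.5 is the one quoted for the DWD example.

```python
def codec_seconds(size_mb, rate_mb_per_s=20.0):
    """Time to compress (or decompress) size_mb at the given throughput."""
    return size_mb / rate_mb_per_s


def transfer_seconds(size_mb, link_mb_per_s, factor=None, rate_mb_per_s=20.0):
    """End-to-end time for one block; factor=None means 'send uncompressed'."""
    if factor is None:
        return size_mb / link_mb_per_s
    compress = codec_seconds(size_mb, rate_mb_per_s)
    send = (size_mb / factor) / link_mb_per_s
    decompress = codec_seconds(size_mb, rate_mb_per_s)  # assumed same rate
    return compress + send + decompress


block_mb = 1000.0  # one ~1 GB block, assuming 1 GB = 1000 MB
for link in (5.0, 10.0):  # hypothetical link speeds in MB/s
    plain = transfer_seconds(block_mb, link)
    packed = transfer_seconds(block_mb, link, factor=2.5)
    print(f"link {link:4.1f} MB/s: uncompressed {plain:.0f} s, compressed {packed:.0f} s")
```

Under these assumptions compression pays off on the slower link (180 s instead of 200 s) but not on the faster one (140 s instead of 100 s), which is why the cost of compression has to be weighed against the available bandwidth.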
Example
Compression of meteorological data for the German Weather Service
1. Lossless compression of LME data in GRIB1 format
compression factor: ~ 2.5
archive size: 3.5 petabytes
2. Data on rectangular grids
3. Compression factor is most important
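What a compression factor means for storage can be checked with a few lines of arithmetic. The sketch below assumes that the compression factor K is defined as original size divided by compressed size; the slides do not spell this out. They also do not say whether the 3.5 petabytes are measured before or after compression, so the example uses a hypothetical 100 TB archive instead. The function names are illustrative only.

```python
def saved_fraction(factor):
    """Fraction of storage saved, assuming K = original size / compressed size."""
    return 1.0 - 1.0 / factor


def compressed_size(original, factor):
    """Size after compression, in the same unit as `original`."""
    return original / factor


archive_tb = 100.0  # hypothetical archive size in terabytes
for k in (1.2, 2.5, 3.0):
    print(
        f"K = {k}: {compressed_size(archive_tb, k):.1f} TB stored, "
        f"{100 * saved_fraction(k):.0f} % saved"
    )
```

With K = 2.5 the archive shrinks to 40 % of its original size, that is, 60 % of the storage is saved.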
Data Formats
Meteorological
• GRIB 1, 2: has built-in compression
• BUFR: has compression option
General purpose
• HDF5
• NetCDF
GRIB1 Grid Types
1. Function values, rectangular grid
2. Function values, global triangular grid (GRIB2)
3. Function values, global Gaussian grid (topologically equivalent to a rectangular grid)
4. Function values, thinned Gaussian grid (global)
5. Spectral coefficients, both simple and complex packing
How does lossless compression work?
For grid data:
Neighboring grid points have similar values => store only the differences
Heuristic conclusions:
• the higher the grid resolution, the better the compression
• the smoother the functions (observables) the better the compression
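The idea of storing differences between neighbors can be sketched in a few lines. This is not the algorithm used by GRIBZip, which the slides do not describe; it is a minimal illustration that uses a synthetic one-dimensional sine field (hypothetical data), plain neighbor differences, and zlib as a stand-in for the final coding stage. All function names are made up for this example.

```python
import math
import struct
import zlib


def smooth_field(n):
    """Hypothetical smooth observable, quantized to integers on n grid points."""
    return [int(round(20000 + 10000 * math.sin(2 * math.pi * i / n))) for i in range(n)]


def delta_encode(values):
    """Keep the first value, then store only differences between neighbors."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode exactly, so the scheme stays lossless."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out


def packed_size(ints):
    """Bytes needed after packing as 32-bit integers and running zlib."""
    return len(zlib.compress(struct.pack(f"<{len(ints)}i", *ints), 9))


def max_step(n):
    """Largest neighbor difference of the field sampled on n points."""
    return max(abs(d) for d in delta_encode(smooth_field(n))[1:])


field = smooth_field(4096)
deltas = delta_encode(field)
assert delta_decode(deltas) == field  # lossless round trip

print("zlib on raw values: ", packed_size(field), "bytes")
print("zlib on differences:", packed_size(deltas), "bytes")
print("largest difference at 1024 points:", max_step(1024))
print("largest difference at 4096 points:", max_step(4096))
```

The differences are much smaller numbers than the values themselves and therefore compress better. Sampling the same function on a finer grid makes the differences smaller still, which is the first heuristic conclusion above.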
Some Numbers
Computed with GRIBZip, a commercial program developed at SCAI
Average compression factors over all GRIB files of a forecast.
Rectangular grids, function values (LME model), resolution 7 km
2D: K = 2.65
Rectangular grids (LMK model), resolution 2.8 km
2D: K = 2.75
Global triangular grids (GME model), resolution 40 km
2D: K = 2.38
More Numbers
Calculated with experimental programs (work in progress):
• Gaussian grids, resolution 63 km (DKRZ, Max Planck Institute), K = 3.1
• Thinned Gaussian grids, resolution 39 km (ECMWF), K = 2.34
• Spectral data, simple packing, highest frequency 213 (DKRZ), K = 1.99
• Spectral data, complex packing (ECMWF): not possible (?)
3D Compression
Compressing several layers of data together
Typical examples
Local grid: LME data
For 2D: K = 2.7
For 3D: K = 3.17
Global grid: GME data
For 2D: K = 1.97
For 3D: K = 2.59
[Figure labels: one GRIB record; several GRIB records]
Comments:
• 3D data (in GRIB format) is relatively hard to compress => 3D compression is particularly effective
• Improvement of the compression factor by 0.5 to 1.0
• Harder to implement
• Is it worth it? => depends on the proportion of 3D data.
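Why compressing several layers together can help may be illustrated with synthetic data. The sketch below is an assumption-laden toy, not the method behind the numbers above: it reads "compressing several layers together" as storing each layer as the difference to the layer below it, uses randomly generated layers that resemble each other (hypothetical data), and uses zlib as a stand-in coder. All names are illustrative.

```python
import random
import struct
import zlib


def make_layers(n_points=2000, n_layers=8, seed=0):
    """Hypothetical stack of layers: a rough base layer plus near-copies of it."""
    rng = random.Random(seed)
    base, value = [], 1000
    for _ in range(n_points):
        value += rng.randint(-50, 50)  # rough in the horizontal direction
        base.append(value)
    layers = [base]
    for _ in range(n_layers - 1):
        prev = layers[-1]
        # each layer differs only slightly from the one below it
        layers.append([v + 3 + rng.randint(-1, 1) for v in prev])
    return layers


def horizontal_deltas(layer):
    """Differences between neighboring grid points within one layer."""
    return [layer[0]] + [b - a for a, b in zip(layer, layer[1:])]


def packed_size(ints):
    """Bytes needed after packing as 32-bit integers and running zlib."""
    return len(zlib.compress(struct.pack(f"<{len(ints)}i", *ints), 9))


def size_2d(layers):
    """Every layer is compressed on its own."""
    return sum(packed_size(horizontal_deltas(layer)) for layer in layers)


def size_3d(layers):
    """First layer as in 2D, every further layer as difference to the layer below."""
    stream = horizontal_deltas(layers[0])
    for lower, upper in zip(layers, layers[1:]):
        stream += [u - l for l, u in zip(lower, upper)]
    return packed_size(stream)


layers = make_layers()
print("layers compressed separately:", size_2d(layers), "bytes")
print("layers compressed together:  ", size_3d(layers), "bytes")
```

In this toy setting the layers are hard to compress one by one because neighboring points differ a lot, while the differences between adjacent layers are tiny. How large the gain is for real GRIB data depends on how similar the layers actually are.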
My Message
Compression is possible and saves resources
1. For archiving
2. When transferring data
Work initiated as a research cooperation between the DWD (German Weather Service) and SCAI.
Work done together with R. Iza-Teran, M. Rettenmeier.