
Error-Controlled Lossy Compression Optimized for High Compression Ratios of Scientific Datasets

Xin Liang∗, Sheng Di†, Dingwen Tao‡, Sihuan Li∗, Shaomeng Li§, Hanqi Guo†, Zizhong Chen∗, and Franck Cappello†¶

∗University of California, Riverside, CA, USA
†Argonne National Laboratory, Lemont, IL, USA
‡University of Alabama, AL, USA
§National Center for Atmospheric Research
¶University of Illinois at Urbana-Champaign, IL, USA

[email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract—Today's scientific simulations require a significant reduction of the data size because of the extremely large volumes of data they produce and the limitations of storage bandwidth and space. If the compression is set to reach a high compression ratio, however, the reconstructed data are often distorted too much to tolerate. In this paper, we explore a new compression strategy that can effectively control the data distortion while significantly reducing the data size. The contribution is threefold. (1) We propose an adaptive compression framework that dynamically selects either our improved Lorenzo prediction method or our optimized linear regression method in different regions of the dataset. (2) We explore how to select them accurately based on the data features in each block to obtain the best compression quality. (3) We analyze the effectiveness of our solution in detail using four real-world scientific datasets with 100+ fields. Evaluation results confirm that our new adaptive solution can significantly improve the rate distortion for lossy compression with fairly high compression ratios. The compression ratio of our compressor is 1.5X∼8X as high as that of two other leading lossy compressors (SZ and ZFP) with the same peak signal-to-noise ratio (PSNR) in the high-compression cases. Parallel experiments with 8,192 cores and 24 TB of data show that our solution obtains 1.86X dumping performance and 1.95X loading performance compared with the second-best lossy compressor.

I. INTRODUCTION

An efficient data compressor is increasingly critical to today's scientific research because of the extremely large volume of data produced by high performance computing (HPC) simulations. Such big data are generally stored in a parallel file system (PFS), with limited storage space and limited I/O bandwidth for access. Some climate studies, for example, need to run large ensembles of 1 km×1 km simulations, with each instance simulating 15 years of climate in 24 h of computing time. Every 16 seconds, 260 TB of data will be generated across the ensemble, even when estimating only one ensemble member per simulated day [1]. Based on our communication with researchers working on extreme-scale HPC simulations, they expect to see a compression ratio of up to dozens of times or even 100:1, meaning that the bit-rate (i.e., the number of bits used to represent one data point on average after compression) should be no greater than 2 and preferably less than 1.

Corresponding author: Sheng Di, MCS Division, Argonne National Laboratory, 9700 Cass Avenue, Lemont, IL 60439, USA

Although considerably reducing the data size can definitely improve the I/O performance as well as the post-analysis efficiency, the decompressed data may suffer from significant distortion compared with the original dataset. If the compression ratio reaches 64:1 (that is, the bit-rate is about 0.5 bits per 32-bit data point), the precision of the decompressed data generally degrades significantly, even with the best existing lossy compressors such as SZ [2], ZFP [3], and FPZIP [4]. This causes a huge distortion in the visualization (shown in Section III). The question addressed in this paper is: can we significantly improve the precision of the decompressed data for lossy compression with a fairly low bit-rate compared with state-of-the-art lossy compressors?

Designing an efficient error-bounded lossy compressor that can significantly reduce the data size while keeping a relatively high resolution of the decompressed data is very challenging. We explain this point based on the two most effective lossy compressors [5]: SZ and ZFP. SZ and ZFP adopt largely different compression models: a data prediction model and an orthogonal transform model, respectively. In the data prediction model [2], each data value is predicted by using its adjacent data points in multidimensional space according to the order in which data points are scanned. Moreover, the data used in the prediction during compression have to be the decompressed values, in order to guarantee that the error bound is respected during decompression; this is a strict limitation in the design of SZ [2]. Such a limitation may significantly degrade the prediction accuracy, especially for lossy compression with a relatively large error bound, leading to limited compression quality. On the other hand, in an orthogonal transform-based compressor such as ZFP [3], the entire dataset has to be split into many small blocks (e.g., 4×4×4 for 3D data), each of which is transformed to another, decorrelated domain individually based on a fixed coefficient matrix. A relatively large error bound setting will introduce significant loss to the coefficients, in turn leading to over-distortion of the decompressed data.

In this paper, we propose an adaptive lossy compression framework based on the data prediction compression model that can obtain a stable, high compression quality when the error bound is set to a large value for reaching a high compression ratio. Specifically, our contributions are as follows:


• We propose an adaptive lossy compression framework that is more effective in compressing scientific datasets with relatively large error bounds. In particular, we split the whole dataset into multiple non-overlapped blocks and select the best-fit prediction method based on their data features.

• We develop new prediction methods that are particularly effective for lossy compression with relatively large error bounds. On the one hand, we develop a hybrid Lorenzo prediction method by combining the classic Lorenzo predictor [6] with a densest-interval, mean-value-based data approximation method (also called mean-integrated Lorenzo prediction). On the other hand, we develop a linear regression method that can obtain much higher prediction accuracy in this case, since its design is not subject to the limitation that decompressed values have to be used in the prediction.

• We explore how to adaptively and efficiently select the best-fit prediction method based on the data features across blocks during the compression. We also propose two optimization strategies that can further improve the prediction accuracy and quantization efficiency.

• We evaluate our proposed compression method against six other state-of-the-art lossy compressors, based on four well-known large datasets produced by real-world large-scale HPC simulations with a total of 104 fields. Evaluation results demonstrate that the compression ratio of our compressor is 1.5X∼8X as high as that of the two best existing lossy compressors (SZ and ZFP) with the same PSNR in the high-compression-ratio cases. Parallel experiments with 8,192 cores and 24 TB of data show that our solution obtains 1.86X dumping performance and 1.95X loading performance compared with the second-best lossy compressor.

The rest of the paper is organized as follows. In Section II, we discuss the related work. In Section III, we formulate the research problem. In Section IV, we provide an overview of the design of our new compression framework. In Section V, we present the mean-integrated Lorenzo prediction method. In Section VI, we describe the regression-based prediction method in detail. In Section VII, we discuss how to adaptively select the best-fit predictor. In Section VIII, we present the evaluation results on real-world simulation data with 100+ fields. Finally, we conclude with a vision of future work in Section IX.

II. RELATED WORK

The I/O bottleneck has become one of the most serious issues for the overall execution performance of today's extreme-scale HPC scientific simulations. A recent I/O performance study [8] shows that the parallel I/O performance may reach only several GB/s even on an optimized storage system equipped with Lustre or GPFS. As such, data compression is critical to mitigate the I/O bottleneck for extreme-scale simulations.

Lossless compressors such as Gzip [9], FPC [10], FPZIP [4], BlosC [11] and others [12] cannot significantly reduce the floating-point data size because of the largely random nature of the trailing mantissa bits. Specifically, the lossless compression ratios are generally around 2:1 or even lower, according to a recent study [13].

Traditional lossy compressors (such as [14], [15]) do not respect a specific error bound set by users, such that the decompressed data may not be usable for scientific analysis.

Error-controlled lossy compression techniques [2], [3] have been considered the best trade-off compared with lossless compression, but they still suffer from limited performance gain, especially in situations with terabytes or even petabytes of data to process, because of limited compression ratios. Tao et al. [16] reported that it took about 2,300 seconds to write 1.8 TB of cosmology simulation data to a parallel file system when running 1,024 processes, and the total I/O processing time was still up to 500 seconds using the best lossy compressor with a compression ratio of 3.6:1. Their work also concluded that the total I/O performance depends mainly on the compression ratio because of the bounded I/O bandwidth of the PFS. That is, further reducing the data size significantly will definitely improve the total performance.

The existing lossy compressors, however, cannot keep high fidelity of the reconstructed data after a significant reduction of data size with a fairly high compression ratio (i.e., 40:1 or higher). A study [17] of lossy compression of climate simulation data within a large ensemble shows that the compression ratio needs to be around 8:1∼10:1 in order to obtain a visualization of the reconstructed data that is indistinguishable from the original data. Lu et al. [5] demonstrated, in a case study of fusion blob detection, that the fusion blobs cannot be detected at all when the compression ratio increases to 30:1 for ZFP or 100:1 for SZ, because of the over-distorted visualization of the data. More examples of such over-distortion in high-ratio compression cases are presented in Section VIII of this paper. To control the distortion of the data in this situation, we develop an adaptive compression framework and explore more effective data prediction strategies. Evaluation results using four real-world HPC simulations with 100+ fields show that our solution significantly outperforms the second-best lossy compressor, from the perspective of both rate distortion and visualization.

III. PROBLEM FORMULATION

In this paper, we target a critical, challenging research issue: how to reduce the distortion of data during error-bounded lossy compression with a fairly high compression ratio. On the one hand, the decompressed dataset must respect a strict error bound (denoted by ε) on each data point compared with the original dataset. Not only do we need to guarantee that the maximum error stays within the specified error bound, but we also need to maximize the peak signal-to-noise ratio (PSNR) for the lossy compression of scientific data. PSNR is a commonly used indicator to assess the distortion of data during lossy compression. It is defined as follows:

PSNR = 20 · log10((d_max − d_min) / rmse)    (1)

where N refers to the number of data points, rmse = sqrt((1/N) · Σ_{i=1}^{N} (d_i − d'_i)²), d_i and d'_i refer to the original and decompressed data values, respectively, and d_max and d_min refer to the maximum and minimum values, respectively. The larger the PSNR, the lower the mean squared error of the reconstructed data versus the original data, meaning higher overall precision of the reconstructed data and thus more accurate post-analysis.

[Fig. 1. Data distortion of NYX (velocity_x: slice 100) with CR=156:1: (a) original raw data; (b) our sol. (PSNR=56, SSIM=0.9855); (c) SZ (PSNR=34.7, SSIM=0.7349); (d) ZFP (PSNR=35, SSIM=0.7004)]
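To make the metrics concrete, the following is a minimal sketch (assuming NumPy arrays `original` and `decompressed` of equal shape; the function names are illustrative, not part of the paper) that computes PSNR exactly as defined in Eq. (1) and checks the point-wise maximum error against a user-specified bound eps, the error-bounded requirement introduced above:

```python
import numpy as np

def psnr(original, decompressed):
    """PSNR as defined in Eq. (1): 20*log10((d_max - d_min) / rmse)."""
    rmse = np.sqrt(np.mean((original - decompressed) ** 2))
    value_range = original.max() - original.min()
    return 20.0 * np.log10(value_range / rmse)

def max_error_respected(original, decompressed, eps):
    """Check that every point-wise error stays within the error bound eps."""
    return np.max(np.abs(original - decompressed)) <= eps
```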

Because of the diverse value changes in different regions of a dataset, the reconstructed data can be distorted significantly from the original values after lossy compression with a high compression ratio such as 100:1. In Fig. 1(a), (c) and (d), we can observe that the visualization using slice 100 of the 3D field velocity_x is largely distorted by the two existing state-of-the-art lossy compressors, SZ and ZFP, with a high compression ratio of 156:1. In particular, the decompressed data under SZ suffers from an artifact issue (i.e., the large difference at the bottom), as shown in Fig. 1(c); ZFP simply flushes local regions to a single color, making the reconstructed data unusable for the user's post-analysis. As shown in Fig. 1(b), the decompressed data under our new solution exhibits very high visual quality at the same compression ratio.

In summary, our objective is to maximize the PSNR during error-bounded lossy compression with a high compression ratio such as 100:1. A satisfactory lossy compressor must conform to the following four conditions/criteria.

1) It should respect a specified maximum error: the difference between the reconstructed data and the original data must be strictly bounded, for each data point, by the user-specified limit.

2) With the maximum error controlled, the PSNR must also be as high as possible, such that the overall distortion of the data is well controlled, guaranteeing effective post-analysis and high visualization quality.

3) The developed compressor should not degrade the compression quality in cases demanding high precision (with relatively low compression ratios).

4) The new compression technique must have low computation cost, leading to compression/decompression rates comparable with those of existing state-of-the-art compressors such as SZ and ZFP.

IV. DESIGN OVERVIEW

In the following, we first present an overview of our adaptive lossy compression framework and then describe the compression techniques in detail.

The key idea of our adaptive solution is splitting the entire dataset into multiple non-overlapped, equal-sized blocks in multidimensional space and selecting the best-fit data prediction method dynamically for each block based on its data features. Algorithm 1 presents the pseudo-code of the entire design. The basic idea is to select in each block the best-fit prediction method from among three data prediction approaches: the classic Lorenzo predictor [6], the mean-integrated Lorenzo predictor, and the linear regression-based predictor. The first one has already been adopted by existing compressors such as SZ and FPZIP, while the other two are proposed as a critical contribution of this paper, because they can improve the lossy compression quality significantly.

Algorithm 1 ADAPTIVE ERROR-BOUNDED COMPRESSOR
Input: user-specified error bound ε
Output: compressed data stream in form of bytes
 1: Estimate the densest position (denoted v0) and calculate the frequency (denoted p1) of the densest error-bound-based interval surrounding it;
 2: Calculate the densest frequency of the classic Lorenzo predictor (denoted p2);
 3: if (p1 > p2) then
 4:   µ ← (Σ_{|di−v0|≤ε} di) / ‖{di | |di−v0| ≤ ε}‖;  /*compute mean value of densest interval*/
 5:   ℓ-PREDICTOR ← mean-integrated Lorenzo predictor;
 6: else
 7:   ℓ-PREDICTOR ← classic Lorenzo predictor;
 8: end if
 9: for (each block in the multi-dimensional space) do
10:   Calculate regression coefficients;  /*4 coefficients/block in a 3D dataset*/
11: end for
12: Calculate statistics of all coefficients for compression of coefficients later;
13: for (each block in the multi-dimensional space) do
14:   Create a sampling set (denoted SM) with M sampled data points;
15:   Compute cost values E_reg-predictor and E_ℓ-PREDICTOR based on SM;
16:   if (E_reg-predictor < E_ℓ-PREDICTOR) then
17:     Execute regression-based prediction and quantization;
18:   else
19:     Execute ℓ-PREDICTOR and quantization;
20:   end if
21: end for
22: Construct Huffman tree according to the quantization array;
23: Encode/compress quantization array by Huffman tree;
24: Compress regression coefficients;

We describe our algorithm in the following text. At the beginning (line 1), the algorithm searches for the densest interval based on the error bound ε and calculates its data frequency (denoted by p1), in order to estimate the prediction ability of the mean-integrated Lorenzo predictor. This part involves three steps: (1) √N data points are sampled uniformly in space; (2) the mean value of the sampled data points is calculated, and a set of consecutive intervals (each of length 2ε) is constructed surrounding the mean value; (3) we then count the number of sampled data points in each of the consecutive intervals and select the one with the highest frequency as the densest interval, whose center is called the densest position (denoted v0). Here √N is chosen heuristically because it already exhibits good accuracy. On line 2, the algorithm checks the prediction ability of the classic Lorenzo predictor by calculating the data frequency (denoted p2) of its error-bound-based prediction interval (i.e., [pred_value−ε, pred_value+ε], where pred_value refers to the predicted value), based on 1% of uniformly sampled data points. The sampling rate of 1% is a heuristic setting, similar to the configuration of SZ. Based on the sampled data points, we select the best-fit Lorenzo predictor (denoted ℓ-PREDICTOR) according to the estimated prediction abilities of the two predictors (lines 3-8). If the best-fit predictor is the mean-integrated Lorenzo predictor, we also calculate the mean value of the densest interval (line 4), which will be used later.
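The following is a minimal sketch of the densest-interval search on lines 1 and 4 of Algorithm 1, under the assumptions stated above (√N uniform samples, consecutive intervals of length 2ε around the sample mean); the function name and the choice of 32 candidate intervals are illustrative, not part of the paper:

```python
import numpy as np

def densest_interval(data, eps, num_intervals=32):
    """Estimate the densest position v0, its frequency p1, and the mean value mu
    of the densest interval [v0 - eps, v0 + eps] (Algorithm 1, lines 1 and 4)."""
    flat = data.ravel()
    n_samples = int(np.sqrt(flat.size))                  # sqrt(N) uniform samples
    samples = flat[:: max(1, flat.size // n_samples)][:n_samples]

    center = samples.mean()
    # Consecutive intervals of length 2*eps surrounding the sample mean.
    offsets = (np.arange(num_intervals) - num_intervals // 2) * 2 * eps
    candidates = center + offsets
    counts = [np.count_nonzero(np.abs(samples - v) <= eps) for v in candidates]

    best = int(np.argmax(counts))
    v0 = candidates[best]
    p1 = counts[best] / samples.size                      # frequency of the densest interval
    in_interval = samples[np.abs(samples - v0) <= eps]
    mu = in_interval.mean() if in_interval.size else v0   # mean of the densest interval
    return v0, p1, mu
```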

After determining the best-fit Lorenzo predictor, the algorithm calculates the linear regression coefficients for each block (lines 9-11), as well as the statistics (such as the value range of the coefficients) that will be used in the compression of the coefficients later (Section VI-B).

The most critical stage is scanning the entire dataset and performing the best-fit prediction and linear-scaling quantization in each block (lines 13-21). In each block, M data points (1/8 of the data points for a 2D dataset and 1/9 for a 3D dataset) are sampled uniformly in space, in order to select the best-fit prediction method as accurately as possible. The sampling method is further detailed in Section VI-A. Then, the algorithm determines which prediction method (either the regression-based predictor or the ℓ-PREDICTOR) should be used in the current block in terms of their estimated overall prediction errors (denoted by E_reg-predictor and E_ℓ-PREDICTOR, respectively). How to estimate the prediction errors for the two predictors is detailed in Section VII-B.

Our algorithm then compresses the quantization array constructed in the prediction stage by Huffman encoding (lines 22-23). It also compresses the regression coefficients for the blocks that select the regression-based prediction method, by IEEE 754 binary analysis (detailed in Section VI-B).

The time complexity of the algorithm is O(N), because the algorithm is composed of three parts, each with a time complexity no greater than O(N). Specifically, the first part estimates the densest frequency (lines 1-2) with a time complexity of O(√N); the second part calculates the regression coefficients (lines 9-11), which costs O(N) in total (for details, see Lemma 2 to be presented later); and the last part performs prediction+quantization (lines 13-21), which also costs O(N) since each data point is scanned only once.

V. MEAN-INTEGRATED LORENZO PREDICTOR

In this section, we develop a new, efficient predictor based on the classic Lorenzo predictor [6]. Fig. 2(a) illustrates the classic Lorenzo prediction method in a 3D dataset. Specifically, for a 3D dataset it predicts the current data point based on the following formula:

f^(L)_111 = f'_000 + f'_011 + f'_101 + f'_110 − f'_001 − f'_010 − f'_100    (2)

where f^(L) and f' refer to the predicted value and the decompressed value, respectively. {111} is the current data point to deal with, and the other seven data points are adjacent to it on a unit cube and have already been processed. f^(L)_111 is the predicted value for the data point {111}, and the decompressed value f'_111 can be obtained by applying the linear-scaling quantization to the difference between f^(L)_111 and the original data value at {111}. The compressor continues this procedure data point by data point until all the data are processed.
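As an illustration, here is a minimal sketch of one compression step with the classic Lorenzo predictor of Eq. (2) combined with a simplified linear-scaling quantization; the function names and the plain two-sided quantizer are assumptions for readability, not the exact SZ implementation (which additionally caps the code range and marks unpredictable points):

```python
import numpy as np

def lorenzo_predict(dec, i, j, k):
    """Classic 3D Lorenzo prediction (Eq. (2)) using already-decompressed neighbors."""
    return (dec[i-1, j, k] + dec[i, j-1, k] + dec[i, j, k-1]
            - dec[i-1, j-1, k] - dec[i-1, j, k-1] - dec[i, j-1, k-1]
            + dec[i-1, j-1, k-1])

def compress_point(value, pred, eps):
    """Linear-scaling quantization: map the prediction error to an integer code
    so that the reconstructed value stays within eps of the original."""
    code = int(np.round((value - pred) / (2 * eps)))
    reconstructed = pred + code * 2 * eps   # this value feeds later predictions
    return code, reconstructed
```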

[Fig. 2. Illustration of the mean-integrated Lorenzo predictor: (a) classic Lorenzo predictor (the current data point f^(L)_111 is predicted from the processed neighbors f'_000, f'_001, f'_010, f'_100, f'_011, f'_101, f'_110 on a unit cube); (b) data approximation by the mean value of the densest interval]

The classic Lorenzo predictor has a significant defect: many predicted values would be uniformly skewed from the original values if the error bound is relatively large, leading to an unexpected artifact issue as illustrated in Fig. 1(c). Our mean-integrated Lorenzo predictor solves this issue well.

The fundamental idea is to approximate by a fixed value those data points whose values are clustered intensively, if the majority of data values are clustered in a small interval with a pretty high density (called the densest interval). This situation appears in about one-third of the fields of the NYX cosmology simulation [27] and about half the fields of the Hurricane simulation [26]. In the dark matter density field of NYX, for instance, more than 84% of the data values are in the range [0,1], while the remaining data are in the range [1, 1.34×10^4].

Suppose we are given a dataset D = {d_i | i = 1, . . . , N} such that the interval [v0−ε, v0+ε] covers a large percentage of the data points, where ε is the compression error bound and v0 is the densest position (as illustrated in Fig. 2(b)). If this percentage is greater than some threshold, we select the mean-integrated Lorenzo predictor. In practice, we set the threshold to the sampled prediction accuracy of the classic Lorenzo predictor (line 3 in Algorithm 1). However, the classic Lorenzo predictor would be highly over-estimated when the error bound is relatively large, because the sampling stage does not take into account the impact of decompressed data, leading to a serious artifact issue. In order to mitigate this issue, we let the algorithm opt to select the mean-integrated Lorenzo predictor directly when [v0−ε, v0+ε] covers more than half of the data. Now, we need to derive an optimal value to approximate the data in this interval.

Lemma 1: The optimal value used to approximate the majority of data points is the mean value of the data in the densest interval [v0−ε, v0+ε].

Proof: If a fixed value v is used to approximate all the data points in this interval, the corresponding MSE can be represented as

MSE = ∫_{v0−ε}^{v0+ε} pd(x) (x − v)² dx

where pd(x) is the probability density function. Then the problem becomes an optimization problem aiming to minimize MSE. Setting the partial derivative ∂MSE/∂v = 0, we have −2 ∫_{v0−ε}^{v0+ε} pd(x) x dx + 2v ∫_{v0−ε}^{v0+ε} pd(x) dx = 0. Solving this equation yields the optimal value

v = (∫_{v0−ε}^{v0+ε} pd(x) x dx) / (∫_{v0−ε}^{v0+ε} pd(x) dx),

which is the mean value of all data points in this interval (as shown in Fig. 2(b)). This lemma also holds in discrete cases, where the optimal approximation value can be calculated in practice as

v = (Σ_{x∈[v0−ε, v0+ε]} x) / ‖{x | |x − v0| ≤ ε}‖,

where ‖{x | |x − v0| ≤ ε}‖ is the number of data points in the densest interval.

VI. REGRESSION-BASED PREDICTION

Although the proposed mean-integrated Lorenzo predictor alleviates the artifact and reduces prediction errors for intensively clustered data, it does not work well on data following a rather uniform distribution. To address this issue, we propose a regression-based prediction model, which can deal with generic datasets. We adopt a linear regression model instead of a higher-order regression model because of the overhead: a quadratic regression model, for example, requires 2.5X the number of coefficients of the linear model (10 versus 4 for a 3D block), with ≥3X the computation workload.

A. Derivation of Regression Coefficients

In what follows, we derive the regression coefficients from a generic perspective in terms of a dataset with m dimensions (n1×n2×···×nm), which can be easily extended to block-wise situations. The value at position x = (i1, . . . , im) is denoted by f(x) = f_{i1...im}, where ij is its index along each dimension. We also denote by f^(r) the linear regression prediction. Then the linear regression model can be built as:

f^(r)(x) = x^T β + α

where β = (β1, . . . , βm) denotes the slope coefficient vector and the constant α is the intercept coefficient. By redefining x' = (1, i1, . . . , im) and β' = (β0=α, β1, . . . , βm), the formula above can be rewritten as:

f^(r)(x') = x'^T β'

Since each data point corresponds to a position x and its value f(x), the objective of the regression model is to minimize the squared error (SE) between the predicted and original values:

SE = Σ_{x'∈{(1, i1, ..., im) | 0≤ij<nj}} (x'^T β' − f_{i1...im})²

This is a convex function, and its optimum can be obtained by differentiation. Differentiating with respect to each element of β' results in a linear system of m+1 unknowns. By solving the linear system, the optimal solution is obtained as:

β' = (X^T X)^{-1} X^T y    (3)

where X is the full permutation of {i1, i2, ···, im} with ij∈{0, 1, ···, nj−1} (each row prefixed by 1, matching x'), and y is the sequence of the corresponding data values (f_{00...0}, f_{00...1}, ···, f_{(n1−1)(n2−1)...(nm−1)}). Since X is a {Π_{1≤i≤m} ni}×(m+1) matrix, computing the closed-form solution (3) directly is very expensive. However, we can reduce the solution to a simple form, significantly lowering the computation cost. Let us take a 3D dataset as an example to describe our method, which can easily be extended to datasets with higher dimensions without loss of generality.

Lemma 2: The regression coefficients of a 3D dataset with dimensions n1, n2, n3 can be calculated as:

β1 = (6 / (n1·n2·n3·(n1+1))) · (2Vx/(n1−1) − V0)
β2 = (6 / (n1·n2·n3·(n2+1))) · (2Vy/(n2−1) − V0)
β3 = (6 / (n1·n2·n3·(n3+1))) · (2Vz/(n3−1) − V0)
β0 = V0/(n1·n2·n3) − ((n1−1)/2 · β1 + (n2−1)/2 · β2 + (n3−1)/2 · β3)    (4)

where
V0 = Σ_{i=0}^{n1−1} Σ_{j=0}^{n2−1} Σ_{k=0}^{n3−1} f_ijk,
Vx = Σ_{i=0}^{n1−1} Σ_{j=0}^{n2−1} Σ_{k=0}^{n3−1} i·f_ijk,
Vy = Σ_{i=0}^{n1−1} Σ_{j=0}^{n2−1} Σ_{k=0}^{n3−1} j·f_ijk,
Vz = Σ_{i=0}^{n1−1} Σ_{j=0}^{n2−1} Σ_{k=0}^{n3−1} k·f_ijk.

Proof: Substituting the indices as the coordinate values, the SE expression becomes:

SE = Σ_{i=0}^{n1−1} Σ_{j=0}^{n2−1} Σ_{k=0}^{n3−1} (β0 + β1·i + β2·j + β3·k − f_ijk)²

The following linear system can be derived by taking all of its partial derivatives over the coefficients and setting them to 0:

Σ_{i=0}^{n1−1} Σ_{j=0}^{n2−1} Σ_{k=0}^{n3−1}
[ 1   i    j    k  ] [β0]   [V0]
[ i   i²   ij   ik ] [β1] = [Vx]
[ j   ij   j²   jk ] [β2]   [Vy]
[ k   ik   jk   k² ] [β3]   [Vz]

Since Σ_{i=0}^{n−1} i = n(n−1)/2 and Σ_{i=0}^{n−1} i² = (2n−1)n(n−1)/6, the above linear system can be simplified to:

[ 1  (n1−1)/2   (n2−1)/2   (n3−1)/2  ] [β0]   [ V0/(n1n2n3)         ]
[ 1  (2n1−1)/3  (n2−1)/2   (n3−1)/2  ] [β1] = [ 2Vx/(n1n2n3(n1−1))  ]
[ 1  (n1−1)/2   (2n2−1)/3  (n3−1)/2  ] [β2]   [ 2Vy/(n1n2n3(n2−1))  ]
[ 1  (n1−1)/2   (n2−1)/2   (2n3−1)/3 ] [β3]   [ 2Vz/(n1n2n3(n3−1))  ]

After that, Gaussian elimination can be leveraged to transform the linear system into the following:

[ 1  (n1−1)/2  (n2−1)/2  (n3−1)/2 ] [β0]   [ V0/(n1n2n3)                          ]
[ 0  1         0         0        ] [β1] = [ (6/(n1n2n3(n1+1)))·(2Vx/(n1−1) − V0) ]
[ 0  0         1         0        ] [β2]   [ (6/(n1n2n3(n2+1)))·(2Vy/(n2−1) − V0) ]
[ 0  0         0         1        ] [β3]   [ (6/(n1n2n3(n3+1)))·(2Vz/(n3−1) − V0) ]

Then Equation (4) can be derived accordingly.

These coefficients will be used in the regression model to predict the data accurately. Each data point with index (i, j, k) will be predicted as f^(r)_ijk = β0 + i·β1 + j·β2 + k·β3. Then, we compute the difference between each predicted value f^(r)_ijk and its original value f_ijk, and perform the linear-scaling quantization [2] to convert the floating-point values to integer codes. The data size will be significantly reduced after conducting Huffman encoding on the quantization codes.
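The following sketch illustrates Lemma 2 and the block-wise prediction: it accumulates V0, Vx, Vy, Vz in a single pass over a block, evaluates the closed-form coefficients of Eq. (4), and predicts each point as f^(r)_ijk = β0 + i·β1 + j·β2 + k·β3. The function names are illustrative only:

```python
import numpy as np

def regression_coefficients(block):
    """Closed-form linear-regression coefficients (Eq. (4)) for a 3D block."""
    n1, n2, n3 = block.shape
    i, j, k = np.meshgrid(np.arange(n1), np.arange(n2), np.arange(n3), indexing="ij")
    V0 = block.sum()
    Vx = (i * block).sum()
    Vy = (j * block).sum()
    Vz = (k * block).sum()
    npts = n1 * n2 * n3
    b1 = 6.0 / (npts * (n1 + 1)) * (2 * Vx / (n1 - 1) - V0)
    b2 = 6.0 / (npts * (n2 + 1)) * (2 * Vy / (n2 - 1) - V0)
    b3 = 6.0 / (npts * (n3 + 1)) * (2 * Vz / (n3 - 1) - V0)
    b0 = V0 / npts - ((n1 - 1) / 2 * b1 + (n2 - 1) / 2 * b2 + (n3 - 1) / 2 * b3)
    return b0, b1, b2, b3

def regression_predict(block_shape, coeffs):
    """Predict every point of the block from the four coefficients."""
    n1, n2, n3 = block_shape
    b0, b1, b2, b3 = coeffs
    i, j, k = np.meshgrid(np.arange(n1), np.arange(n2), np.arange(n3), indexing="ij")
    return b0 + b1 * i + b2 * j + b3 * k
```

One can cross-check the closed form against a generic least-squares solve (e.g., numpy.linalg.lstsq) on a small block; the single-pass accumulation of the four sums is what enables the O(N) total coefficient cost claimed in Section IV.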

B. Compressing Regression Coefficients

We adopt a block size of 6×6×6 for 3D data and 12×12 for 2D data in our implementation, since such settings already lead to satisfactory compression quality. For each block that adopts the linear regression model, four coefficients have to be kept in the compressed bytes together with the encoding of the regression-based predicted values in that block. If each block is a 6×6×6 cube, the overhead of storing the four coefficients (single-precision floating-point values) would be 4/(6×6×6) = 1/54 of the original data size (i.e., a bit-rate of about 0.6). Note that we are targeting high compression ratios (such as 100:1), so we have to further compress the coefficients significantly by a lossy compression technique, while controlling the impact of the lossy coefficients on the prediction accuracy. We reorganize all the coefficients into four groups and compress each group in a similar way by the unpredictable-data compression method used in SZ [18].

VII. ADAPTIVE SELECTION OF BEST-FIT PREDICTOR

In this section, we analyze the nature of the linear regression-based predictor (proposed in Section VI) and the Lorenzo predictor (introduced in Section V). We also propose a cost function (or metric), based on which we can accurately select the best-fit predictor via a sampling approach.

A. Analysis of Linear Regression versus Lorenzo Predictor

The two predictors are each particularly suitable for data blocks with different data features.

For each data point to predict, the Lorenzo predictor [6] constructs a fixed quadratic hyperplane based on the 7 adjacent data points in a 2×2×2 cube. No coefficients need to be saved for reconstructing the hyperplane during decompression. However, the Lorenzo predictor must conform to a strict condition when used in lossy compression [2]: the prediction performed during compression must use the decompressed values instead of the original values; otherwise, unexpected errors would accumulate during decompression, eventually introducing significant data distortion. Using the decompressed values in the Lorenzo predictor can degrade the prediction accuracy significantly, especially when the error bound is relatively large [2], leading to limited compression quality.

Unlike the Lorenzo predictor, the linear regression-based predictor constructs an MSE-minimized linear hyperplane (f^(r)_111 = β0 + β1·x + β2·y + β3·z) with four extra coefficients to store for each data block. The advantage of this predictor, however, is that it does not depend on the decompressed values during the prediction stage, because the hyperplane is reconstructed from the coefficients. In contrast, linear regression may not be as accurate as the Lorenzo predictor if the data exhibit spiky changes in a small region or the compression error bound is relatively small, because the Lorenzo predictor corresponds to a quadratic hyperplane. This issue inspires us to seek an adaptive solution between the two predictors.

B. Estimate Prediction Errors and Select Best-fit Predictor

We adopt an effective, lightweight sampling method to select the better of the two predictors. In the following, we use a 3D dataset to explain our idea; cases with other dimensions can be extended similarly.

In a 3D block (6×6×6), we sample 24 points distributed along the diagonal lines. These points can also be regarded as lying on the 8 corners of the innermost 2×2×2 cube, the 4×4×4 cube, and the 6×6×6 cube, respectively, as shown in Fig. 3(a). We denote the set of the 24 sample points by {S24}. Then, the cost function for both the regression model and the Lorenzo predictor is defined as follows:

E'_predictor = Σ_{(i,j,k)∈{S24}} |f^(p)_ijk − f_ijk|    (5)

where f^(p)_ijk refers to the predicted values (f^(r)_ijk or f^(L)_ijk). If one predictor is better than the other, it tends to have a lower cost (i.e., summed error). The reason we adopt the absolute error instead of the squared error in the cost function is that squared errors may easily over-amplify the cost value if there is a single outlier, leading to skewed cost values. Formula (5) can be used directly as the cost function (denoted E_reg-predictor in Algorithm 1) for the regression-based predictor.

[Fig. 3. Sample points and decompressed noise estimation: (a) points sampled in a block; (b) decompressed noise estimation]

The Lorenzo predictor, however, would be overestimated because the formula does not take into account the influence of decompressed data, so we have to further adjust the cost function for the Lorenzo predictor. Without loss of generality, we assume that the compression errors are independent random variables following a uniform distribution in [−ε, ε], according to a recent study on the distribution of compression errors [13]. Then the perturbation in the final prediction is a random variable e following a shifted and scaled Irwin-Hall distribution with parameter n = 7, which is a piecewise function whose absolute-value expectation is hard to derive analytically. Fortunately, since the Lorenzo predictor always uses 7 points to predict the data, the corresponding value can be approximated offline. We take 1M samples from this distribution and compute the expectation of their absolute values. The fitting curve is expected to be linear, because the error bound is only a scale factor for this distribution, and we obtain a very good fit (E(|e|) = 1.22ε), as illustrated in Fig. 3(b). The cost (i.e., aggregated error) of the classic Lorenzo predictor can then be adjusted as follows:

E'_LP = Σ_{(i,j,k)∈{S24}} |f^(L)_ijk − f_ijk + e|
      ≤ Σ_{(i,j,k)∈{S24}} |f^(L)_ijk − f_ijk| + Σ_{(i,j,k)∈{S24}} |e|
      ≈ Σ_{(i,j,k)∈{S24}} |f^(L)_ijk − f_ijk| + 24 · 1.22ε    (6)

We estimate the prediction cost by this inequality because we opt to favor the regression model, considering the artifact issue that may occur with the Lorenzo predictor when the error bound is relatively large. When the error bound is small, this extra adjustment has little impact on the final selection.
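The 1.22ε penalty can be reproduced with a short Monte Carlo experiment, as a sanity check of the offline approximation described above (a sketch under the stated assumption of 7 i.i.d. uniform errors in [−ε, ε]; the sample count and seed are arbitrary choices):

```python
import numpy as np

def lorenzo_noise_penalty(eps, n_terms=7, n_samples=1_000_000, seed=0):
    """Estimate E(|e|), where e is the prediction perturbation caused by
    7 independent compression errors, each uniform in [-eps, eps]."""
    rng = np.random.default_rng(seed)
    errors = rng.uniform(-eps, eps, size=(n_samples, n_terms))
    # The Lorenzo prediction adds/subtracts 7 decompressed neighbors, so the
    # perturbation is a signed sum of 7 such errors (signs do not change |e|'s law).
    e = errors.sum(axis=1)
    return np.abs(e).mean()

# Example: lorenzo_noise_penalty(1.0) gives roughly 1.22,
# matching the paper's fitted E(|e|) = 1.22*eps.
```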

The cost value of the mean-integrated Lorenzo predictor (denoted E'_mLP) is estimated as the minimum of the classic Lorenzo prediction error (i.e., Formula (6)) and the mean prediction error, as shown below:

E'_mLP ≈ Σ_{(i,j,k)∈{S24}} min(|f^(L)_ijk − f_ijk| + 1.22ε , |µ − f_ijk|)    (7)

where µ is calculated in line 4 of Algorithm 1.

In summary, for the ℓ-PREDICTOR proposed in Algorithm 1, we estimate the error cost of the classic Lorenzo predictor and the mean-integrated Lorenzo predictor by Formula (6) and Formula (7), respectively. In each data block, we select the final predictor with the lowest cost based on these cost functions.
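Putting Formulas (5)-(7) together, a minimal sketch of the per-block selection could look as follows (the 24-point diagonal sampling is abstracted as a precomputed index list `sample_idx`, and the helper names are assumptions, not the actual implementation):

```python
import numpy as np

def select_predictor(block, sample_idx, reg_pred, lorenzo_pred, mu, eps, use_mean):
    """Choose the block predictor by comparing the sampled cost of Formula (5)
    (regression) against Formula (6) or (7) (Lorenzo variants)."""
    orig = np.array([block[idx] for idx in sample_idx])
    reg = np.array([reg_pred[idx] for idx in sample_idx])
    lor = np.array([lorenzo_pred[idx] for idx in sample_idx])

    e_reg = np.abs(reg - orig).sum()               # Formula (5)
    lorenzo_err = np.abs(lor - orig) + 1.22 * eps  # per-point penalized error
    if use_mean:                                   # mean-integrated Lorenzo, Formula (7)
        e_lorenzo = np.minimum(lorenzo_err, np.abs(mu - orig)).sum()
    else:                                          # classic Lorenzo, Formula (6)
        e_lorenzo = lorenzo_err.sum()
    return "regression" if e_reg < e_lorenzo else "lorenzo"
```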

VIII. PERFORMANCE EVALUATION

In this section, we compare our proposed compression method with three error-bounded compressors (SZ 1.4.13 [2], ZFP 0.5.2 [3], and TTHRESH [19]), which are currently among the best existing generic lossy compressors for scientific data [5]. We also compare our method with other lossy compressors that are widely used in scientific simulations, namely VAPOR [20] (a wavelet compressor), FPZIP [4], and ISABELA [21]. We tried our best to optimize the compression quality of all six lossy compressors from the perspective of PSNR. For instance, we adopt the absolute error bound mode for both SZ and ZFP because it leads to better compression quality than other modes in our experiments.

A. Experimental Setting

We conduct our experimental evaluations on a supercomputer using 8,192 cores (i.e., 256 nodes, each with two Intel Xeon E5-2695 v4 processors and 128 GB of memory, and each processor with 16 cores). The storage system uses the General Parallel File System (GPFS); the file systems are located on a RAID array and served by multiple file servers. The I/O and storage systems are typical high-end supercomputer facilities. We use the file-per-process mode with POSIX I/O [22] on each process for reading/writing data in parallel.¹ The HPC application data come from multiple domains, including the CESM-ATM climate simulation [25], the Hurricane ISABEL simulation [26], the NYX cosmology simulation [27], and the SCALE-LETKF weather simulation (called S-L Sim for short) [28]. Each application involves many simulation snapshots (or time steps). We assess only meaningful fields with relatively large data sizes (the other fields contain constant data or have too small data sizes). Table I presents all 104 fields across these simulations. The data sizes per snapshot are 1.2 GB, 3.2 GB, 3 GB, and 2 GB for the above four applications, respectively.

TABLE I
SIMULATION FIELDS USED IN THE EVALUATION

Application | # Fields | Dimensions   | Examples
CESM-ATM    | 79       | 1800×3600    | CLDHGH, CLDLOW, ...
Hurricane   | 13       | 100×500×500  | CLOUDf48, Uf48, ...
NYX         | 6        | 512×512×512  | dark matter density, v_x, ...
S-L Sim     | 6        | 98×1200×1200 | U, V, W, QC, ...

We assess the compression quality based on the four criteria proposed in Section III. Since PSNR is the most critical indicator, as discussed in Section III, we mainly adopt this metric to assess the distortion of data. In addition, we also evaluate the Pearson correlation and the structural similarity (SSIM) index for the different compressors, because they are mentioned in the literature [17], [29]. The Pearson correlation can be computed as ρ_{X,Y} = E[(X − µ_X)(Y − µ_Y)] / (σ_X σ_Y), where µ_c and σ_c (c∈{X,Y}) are the means and standard deviations of X and Y, respectively. SSIM is considered a critical metric of lossy compression by the climate community [29]. The higher the Pearson correlation or SSIM, the better the compression quality.

¹POSIX I/O performance is close to that of other parallel I/O approaches such as MPI-IO [23] when thousands of files are written/read simultaneously on GPFS, as indicated by a recent study [24].
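For reference, a minimal sketch of the Pearson correlation between an original and a decompressed field, as defined above (SSIM is omitted here because it depends on windowing choices not specified in the text):

```python
import numpy as np

def pearson_correlation(original, decompressed):
    """Pearson correlation rho_{X,Y} = E[(X - mu_X)(Y - mu_Y)] / (sigma_X * sigma_Y)."""
    x = original.ravel().astype(np.float64)
    y = decompressed.ravel().astype(np.float64)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())
```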

B. Evaluation Results

We first check the maximum compression errors of our compressor on all 104 fields and confirm that our compressor respects the error bound (denoted by ε) strictly. We present a few examples in Table II.

TABLE II
MAXIMUM COMPRESSION ERROR VS. ERROR BOUND

fields             | bound | max err     | bound | max err
Hurricane-CLOUDf48 | 0.1   | 0.099999955 | 0.01  | 0.00999998
NYX-v_x            | 0.1   | 0.0999966   | 0.01  | 0.009999995
CESM-CLDHGH        | 0.1   | 0.099949    | 0.01  | 0.0099999

In what follows, we first compare the three fundamental predictors (Lorenzo, mean-integrated Lorenzo, and linear regression) to showcase the importance of the adaptive design, in Fig. 4 and Fig. 5. After that, we compare the overall rate-distortion among all six lossy compressors in Fig. 6, which is the most critical evaluation result in terms of compression quality. We also demonstrate the difference in visual quality across the lossy compressors at the same compression ratios. Finally, we investigate the parallel I/O performance gain at different execution scales using our compressor against the two other most efficient state-of-the-art compressors.

Fig. 4 demonstrates the effectiveness of our mean-integrated Lorenzo predictor (denoted M-Lorenzo) over the original Lorenzo predictor, using the Hurricane ISABEL dataset. As shown in Fig. 4(a), the mean-integrated Lorenzo predictor always leads to a smaller bit-rate (or higher compression ratio) than the original Lorenzo predictor does at the same level of rate distortion (i.e., PSNR). Specifically, the mean-integrated Lorenzo predictor obtains a compression ratio about 3X as high as that of the original Lorenzo predictor when the PSNR is around 30. Fig. 4(b) shows the fraction of the fields adopting the mean-integrated Lorenzo predictor in the Hurricane dataset. We observe that the percentage drops as the bit-rate increases, because a higher bit-rate corresponds to a lower coverage of the densest interval, which in turn leads to a lower percentage of blocks choosing the mean-integrated Lorenzo predictor. Similar phenomena can also be observed in the other datasets, which we do not show here because of the page limitation.

In Section VII-B, we derived a penalty coefficient that is critical for determining the best-fit predictor. In Fig. 5, we present the significance of the penalty coefficient by comparing three solutions (the solutions with and without the penalty coefficient, and the solution adopting only the regression-based predictor). As shown in Fig. 5(a), the solution with our derived penalty coefficient outperforms the other two significantly.


[Fig. 4. Effectiveness of the mean-integrated Lorenzo predictor using Hurricane-ISABEL: (a) overall rate-distortion (PSNR vs. bit-rate, M-Lorenzo vs. Lorenzo); (b) percentage of fields adopting the mean-integrated Lorenzo predictor vs. bit-rate]

[Fig. 5. Significance analysis of the penalty coefficient using Hurricane-ISABEL: (a) overall rate-distortion (our sol. w/ penalty coeff., our sol. w/o penalty coeff., all regression); (b) detailed analysis using TCf48 (PSNR and percentage of regression blocks vs. bit-rate)]

As illustrated in Fig. 5(b), without the penalty coefficient the compression quality would be degraded significantly, since a large majority (99.6%) of the blocks would select the Lorenzo predictor (see the red curve in the bottom sub-figure of Fig. 5(b)) because of the over-estimation of the Lorenzo prediction ability, as discussed in Section VII-B. We also present the percentage of blocks selecting the regression-based predictor, demonstrating that our adaptive solution does select different predictors for different blocks during the compression. The percentage of regression blocks drops from 98% to 3% as the bit-rate increases, which is consistent with the compression quality of the two methods.

In Fig. 6, we present the overall rate-distortion (PSNR versus bit-rate, or compression ratio) calculated using all the fields of each application (Fig. 6(a) is missing TTHRESH because it cannot work on 2D datasets). It is observed that, at the same compression ratio, our solution leads to higher PSNR than the other compressors. Specifically, our solution, SZ, and ZFP generally exhibit better rate-distortion than the other compressors, including FPZIP, VAPOR, and TTHRESH, all of which outperform ISABELA. Among the three best compressors, the PSNR of our compressor is 10%∼100% higher than that of the other two (SZ and ZFP) at the same compression ratio. This gap increases with higher compression ratios, as shown in the four sub-figures. In particular, for the CESM-ATM data, when the PSNR is about 45, our compressor reaches a compression ratio of 200:1, while SZ and ZFP reach compression ratios of 76:1 and 25:1, respectively. Based on the four applications, we observe that the compression ratio of our compressor is 1.5X∼8X as high as that of SZ and ZFP at the same PSNR, which is a significant improvement.

[Fig. 6. Rate-distortion (PSNR versus bit-rate or compression ratio) for (a) CESM-ATM, (b) Hurricane, (c) NYX, and (d) S-L Sim, comparing OurSol, SZ, ZFP, VAPOR, FPZIP, ISABELA, and TTHRESH]

Due to the space limitation of the paper, we select three typical examples from the 104 fields across different applications to demonstrate the visual quality of the decompressed data at different compression ratios compared with the original data (also known as raw data).

As demonstrated in Section III (Fig. 1), the decompressed data under our solution has better visual quality than that of either SZ or ZFP, based on the NYX (velocity_x) dataset. Correspondingly, the SSIM index of our solution (0.9855) is higher than that of SZ (0.7349) and that of ZFP (0.7004) by 34% and 40%, respectively.

As mentioned previously, some fields have very large value ranges while most of the data are clustered in a small, close-to-zero range, so users compute the logarithm of the data before their analysis or visualization. We compress the log data instead of the original data, since this improves the compression quality for all lossy compressors, as suggested by users. As an example, we present the compression results of the dark matter density field from the NYX simulation in Fig. 7. By zooming in on a small region, we can clearly see that our solution has much higher resolution than the others. This is consistent with the SSIM measure: the SSIM index of our solution is higher than those of SZ and ZFP by 62.8% and 82.5%, respectively. Looking at the decompressed images, SZ suffers from a serious data loss, as shown in Fig. 7(c), because many data points are flushed to the same values when the error bound is relatively large. ZFP suffers from an unexpected mosaic effect because it splits the entire dataset into very small blocks (4×4×4) and the decompressed data are flushed to the same value in each block when the compression ratio is very high.

Furthermore, we show the visualization results of the CLOUDf field in Hurricane-ISABEL in Fig. 8. Similarly, the decompressed data under our solution has better visual quality than that of SZ or ZFP. Besides some inconspicuous strips (artifacts) in the blue background, SZ also has some distortion in the enlarged region. On the other hand, ZFP has little distortion in the background but exhibits block-wise effects in the enlarged region. Our solution has little distortion in both the background and the local region, which leads to very high overall visual quality (SSIM = 0.9966).

[Fig. 7. Data distortion of NYX (dark matter density: slice 100) with CR=58:1: (a) original raw data; (b) our sol. (PSNR=29, SSIM=0.6867); (c) SZ (PSNR=24.5, SSIM=0.4218); (d) ZFP (PSNR=21.3, SSIM=0.3762)]

[Fig. 8. Data distortion of Hurricane (CLOUDf: slice 50) with CR=66:1: (a) original raw data; (b) our sol. (PSNR=51, SSIM=0.9966); (c) SZ (PSNR=29.9, SSIM=0.6573); (d) ZFP (PSNR=22.5, SSIM=0.8893)]

In addition to error-bounded lossy compressors, we also present the visual quality of the down-sampling+interpolation method (widely used in the visualization community) in Fig. 9 for comparison. During the compression, one data point is sampled uniformly every 4 points along each dimension of each field, leading to a compression ratio of 64:1. During the decompression, tricubic interpolation [30] is used to reconstruct the missing points. We can clearly observe that our solution exhibits much better visual quality (see Fig. 7(b) vs. Fig. 9(a); Fig. 8(b) vs. Fig. 9(b)), in that the down-sampling+interpolation method over-smooths the regions with diverse values (Fig. 9(a)) and over-amplifies some boundary data points (Fig. 9(b)).

[Fig. 9. Data distortion of uniform down-sampling and tricubic interpolation with similar compression ratios: (a) dark matter density (CR=64:1, PSNR=18.1, SSIM=0.4345); (b) CLOUDf (CR=64:1, PSNR=17.7, SSIM=0.7681)]

The Pearson correlation coefficient has been used by the community to assess the correlation between the original dataset and the decompressed dataset [17]. We have validated that the Pearson correlation coefficients under our solution are higher than those of SZ and ZFP in a large majority of cases across all four applications. Due to the space limitation, we exemplify the correlation results in Table III using only the NYX data. We run the three compressors and tune their compression ratios to be 60:1∼70:1. The reason we cannot fix the compression ratio across fields is that ZFP exhibits piece-wise compression ratios under various error bounds. We did not use the fixed-rate mode of ZFP because it is always worse than its fixed-absolute-error mode in our experiments with respect to rate-distortion.

TABLE III
PEARSON CORRELATION COEFFICIENTS OF 6 FIELDS IN NYX

fields              | our solution | SZ          | ZFP         | CR
dark matter density | 0.959274616  | 0.939890773 | 0.763353467 | 58:1
baryon density      | 0.999537914  | 0.986076774 | 0.999529095 | 66:1
temperature         | 0.992798653  | 0.870542883 | 0.995984391 | 66:1
velocity x          | 0.999961851  | 0.996793357 | 0.999871972 | 70:1
velocity y          | 0.999944697  | 0.997787266 | 0.999912867 | 70:1
velocity z          | 0.999867946  | 0.992133964 | 0.999826937 | 63:1

We evaluate the overall data dumping/loading performance on the NYX simulation using different lossy compressors at the same level of data distortion. Specifically, we set the PSNR for its fields to 60, except for dark matter density (PSNR=30) and baryon density (PSNR=40), because such a setting already reaches high visual quality (as exemplified in Fig. 1(b) and Fig. 7(b)). The evaluation is weak-scaling: each rank processes 3 GB of data, and the total data size increases linearly with the number of cores. We assess the performance at different scales (2,048 cores ∼ 8,192 cores). The total data size is up to 24 TB when using 8,192 cores, which may take over six hours to dump. We present the breakdown of the data dumping performance (sum of compression time and data writing time) and the data loading performance (sum of data reading time and decompression time) in Fig. 10. Since the parallel performance is dominated by the data reading/writing time (to be shown later) and VAPOR, FPZIP, and ISABELA have low compression ratios, they exhibit much higher overall data dumping/loading time than the other compressors. Accordingly, we omit their results in the figure to clearly observe the performance difference among the three best solutions (our solution, SZ, and ZFP).

It is observed that the overall data dumping time under our solution takes only 24% and 54% of the time cost by SZ and ZFP, respectively, when adopting 8,192 cores, which corresponds to 4.12X and 1.86X performance. The key reason is that our compressor leads to significantly higher compression ratios than the other two compressors when the PSNR is in the range of [30, 60], as shown in Fig. 6(c). When running the simulation with 8,192 cores, our solution also obtains 1.95X higher data loading performance (49% lower time cost) than the second-best solution (ZFP). This gain is slightly higher than that of the data dumping performance (1.86X) because the decompression rate is higher than the compression rate.


Fig. 10. Performance evaluation on NYX: (a) data dumping performance (compression time + data writing time) and (b) data loading performance (data reading time + decompression time); elapsed time in seconds for SZ, ZFP, and our solution on 2,048, 4,096, and 8,192 cores

IX. CONCLUSION AND FUTURE WORK

In this paper, we propose an efficient error-bounded lossy compressor that adaptively selects the best-fit prediction approach, between our improved Lorenzo predictor and an optimized linear-regression-based predictor, according to the data features in different regions of the dataset. We evaluate our solution using 100+ fields across four well-known HPC simulations, comparing it with six existing state-of-the-art lossy compressors (SZ, ZFP, TTHRESH, VAPOR, FPZIP, and ISABELA). Experiments demonstrate that our new compressor can achieve up to 8X the compression ratio of the other compressors at the same PSNR. When running the three best lossy compressors (SZ, ZFP, and our solution) on up to 8,192 cores, our solution delivers 1.86X dumping performance and 1.95X loading performance over the second-best compressor because of its significant reduction of the data size. In the future, we plan to further improve the compression quality for cases with medium compression ratios (bit-rates).

ACKNOWLEDGMENTS

This research was supported by the Exascale Computing Project (ECP), Project Number 17-SC-20-SC, a collaborative effort of two DOE organizations, the Office of Science and the National Nuclear Security Administration, responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering, and early testbed platforms, to support the nation's exascale computing imperative. The material was supported by the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357, and by the National Science Foundation under Grant No. 1619253. This research is also supported by NSF Award No. 1513201. We acknowledge the computing resources provided on Bebop, which is operated by the Laboratory Computing Resource Center at Argonne National Laboratory.

REFERENCES

[1] I. Foster et al., "Computing Just What You Need: Online Data Analysis and Reduction at Extreme Scales," in European Conference on Parallel Processing (Euro-Par 2017), 2017, pp. 3–19.

[2] D. Tao, S. Di, Z. Chen, and F. Cappello, "Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization," in IEEE International Parallel and Distributed Processing Symposium (IPDPS 2017), Orlando, Florida, USA, 2017, pp. 1129–1139.

[3] P. Lindstrom, "Fixed-Rate Compressed Floating-Point Arrays," IEEE Transactions on Visualization and Computer Graphics, vol. 20, no. 12, pp. 2674–2683, 2014.

[4] P. Lindstrom and M. Isenburg, "Fast and Efficient Compression of Floating-Point Data," IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 5, pp. 1245–1250, 2006.

[5] T. Lu, Q. Liu, et al., "Understanding and Modeling Lossy Compression Schemes on HPC Scientific Data," in 32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS 2018), 2018.

[6] L. Ibarria, P. Lindstrom, J. Rossignac, and A. Szymczak, "Out-of-core compression and decompression of large n-dimensional scalar fields," Computer Graphics Forum, vol. 22, no. 3, pp. 343–348, 2003.

[7] Lustre file system. [Online]. Available at: http://lustre.org/.

[8] I/O Performance Evaluation by HPC at Uni.Lu. [Online]. Available at: https://hpc.uni.lu/systems/storage.html.

[9] Gzip compression. [Online]. Available at: http://www.gzip.org.

[10] M. Burtscher and P. Ratanaworabhan, "High Throughput Compression of Double-Precision Floating-Point Data," in Data Compression Conference (DCC'07), pp. 293–302, 2007.

[11] BlosC compressor. [Online]. Available at: http://blosc.org.

[12] M. Ainsworth and S. Klasky, "Compression using lossless decimation: analysis and application," SIAM Journal on Scientific Computing, vol. 39, no. 4, pp. B732–B757, 2017.

[13] P. Lindstrom, "Error Distributions of Lossy Floating-Point Compressors," in Joint Statistical Meetings, 2017.

[14] G. K. Wallace, "The JPEG still picture compression standard," IEEE Transactions on Consumer Electronics, vol. 38, no. 1, pp. xviii–xxxiv, 1992.

[15] D. Taubman and M. Marcellin, "JPEG2000 Image Compression Fundamentals, Standards and Practice," Springer Science & Business Media, vol. 642, 2012.

[16] D. Tao, S. Di, Z. Chen, and F. Cappello, "In-Depth Exploration of Single-Snapshot Lossy Compression Techniques for N-Body Simulations," in IEEE International Conference on Big Data (BigData 2017), 2017.

[17] A. H. Baker, H. Xu, J. M. Dennis, M. N. Levy, D. Nychka, S. A. Mickelson, J. Edwards, M. Vertenstein, and A. Wegener, "A methodology for evaluating the impact of data compression on climate simulation data," in HPDC'14, pp. 203–214, 2014.

[18] S. Di and F. Cappello, "Fast Error-bounded Lossy HPC Data Compression with SZ," in Proceedings of the IEEE 30th International Parallel and Distributed Processing Symposium (IPDPS 2016), pp. 730–739, 2016.

[19] R. Ballester-Ripoll, P. Lindstrom, and R. Pajarola, "TTHRESH: Tensor Compression for Multidimensional Visual Data," https://arxiv.org/abs/1806.05952, 2018.

[20] J. Clyne, P. Mininni, A. Norton, and M. Rast, "Interactive desktop analysis of high resolution simulations: application to turbulent plume dynamics and current sheet formation," New Journal of Physics, vol. 9, 301, 2007.

[21] S. Lakshminarasimhan, N. Shah, S. Ethier, S. Klasky, R. Latham, R. Ross, and N. F. Samatova, "Compressing the Incompressible with ISABELA: In-situ Reduction of Spatio-Temporal Data," in Proceedings of the European Conference on Parallel and Distributed Computing (Euro-Par), Bordeaux, France, 2011.

[22] B. Welch, "POSIX IO extensions for HPC," in Proceedings of the 4th USENIX Conference on File and Storage Technologies (FAST'05), 2005.

[23] R. Thakur, W. Gropp, and E. Lusk, "On implementing MPI-IO portably and with high performance," in Proceedings of the Sixth Workshop on I/O in Parallel and Distributed Systems, pp. 23–32, 1999.

[24] A. Turner, "Parallel I/O Performance." [Online]. Available at: https://www.archer.ac.uk/training/virtual/2017-02-08-Parallel-IO/2017 02 ParallelIO ARCHERWebinar.pdf.

[25] CESM-ATM climate model. [Online]. Available at: http://www.cesm.ucar.edu/models/.

[26] Hurricane ISABEL simulation data. [Online]. Available at: http://vis.computer.org/vis2004contest/data.html.

[27] NYX simulation. [Online]. Available at: https://amrex-astro.github.io/Nyx/.

[28] SCALE-LETKF weather model. [Online]. Available at: https://github.com/gylien/scale-letkf.

[29] A. H. Baker, H. Xu, D. M. Hammerling, S. Li, and J. P. Clyne, "Toward a Multi-method Approach: Lossy Data Compression for Climate Simulation Data," in International Conference on High Performance Computing, pp. 30–42, 2017.

[30] F. Lekien and J. Marsden, "Tricubic interpolation in three dimensions," International Journal for Numerical Methods in Engineering, vol. 63, no. 3, pp. 455–471, 2005.