
Lossless Waveform Compression

Giridhar Mandyam, Neeraj Magotra, and Samuel D. Stearns

1 Introduction

Compression of waveforms is of great interest in applications where efficiency with respect to data storage or transmission bandwidth is sought. Traditional methods for waveform compression, while effective, are lossy. In certain applications, even slight compression losses are not acceptable. For instance, real-time telemetry in space applications requires exact recovery of the compressed signal. Furthermore, in the area of biomedical signal processing, exact recovery of the signal is necessary not only for diagnostic purposes but also to reduce potential liability for litigation. As a result, interest has increased recently in the area of lossless waveform compression.

Many techniques for lossless compression exist. The most effective of these belong to a class of coders commonly called entropy coders. These methods have proven effective for text compression, but perform poorly on most kinds of waveform data, as they fail to exploit the high correlation that typically exists among the data samples. Therefore, pre-processing the data to achieve decorrelation is a desirable first step for data compression. This yields a two-stage approach to lossless waveform compression, as shown in the block diagram in Figure 1. The first stage is a "decorrelator," and the second stage is an entropy coder [1].


This general framework covers most classes of lossless waveform compression.

2 Compressibility of a Data Stream

The compressibility of a data stream is dependent on two factors: the amplitude distribution of the data stream and its power spectrum. For instance, if a single value dominates the amplitude distribution, or a single frequency dominates the power spectrum, then the data stream is highly compressible. Four sample waveforms are depicted in Figure 2 along with their respective amplitude distributions and power spectra. Their characteristics and compressibility are given in Table 1. Waveform 1 displays poor compressibility characteristics, as no one particular value dominates its amplitude distribution and no one frequency dominates its power spectrum. Waveform 4 displays high compressibility, as its amplitude distribution and its power spectrum are both nonuniform.

3 Performance Criteria for Compression Schemes

There are several criteria for quantifying the performance of a compression scheme. One such criterion is the reduction in entropy from the input data to the output of the decorrelation stage in Figure 1. The entropy of a set of K symbols {s_0, s_1, ..., s_{K-1}}, each with probability p(s_i), is defined as [2]

    H_p = -\sum_{i=0}^{K-1} p(s_i) \log_2[p(s_i)]  bits/symbol        (1)

Entropy is a means of determining the minimum number of bits required to encode a stream of data symbols, given the individual probabilities of symbol occurrence.
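To make Eq. (1) concrete, the following sketch (Python with NumPy; the function name is illustrative, not from the original text) estimates the entropy of a symbol stream from its empirical frequencies:

```python
import numpy as np

def entropy_bits_per_symbol(symbols):
    """Estimate H_p of Eq. (1) from the empirical symbol frequencies."""
    _, counts = np.unique(np.asarray(symbols), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# A stream dominated by a single value has low entropy, hence high compressibility.
print(entropy_bits_per_symbol([0, 0, 0, 0, 0, 0, 0, 1]))   # about 0.54 bits/symbol
```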


When the symbols are digitized waveform samples, another criterion is the variance (mean-squared value) of the zero-mean output of the decorrelation stage. Given an N-point zero-mean data sequence x(n), where n is the discrete time index, the variance σ_x² is calculated by the following equation:

    \sigma_x^2 = \frac{1}{N-1} \sum_{n=0}^{N-1} x^2(n)        (2)

This is a much easier quantity to calculate than entropy; however, it is not as reliable as entropy. For instance, only two values might dominate a sample sequence, yet these two values may not be close to one another. In this case, the entropy of the data stream is very low (implying good compressibility), while the variance may be high. Nevertheless, for most waveform data, using the variance of the output of the decorrelator stage to determine compressibility is acceptable, due to the approximately white Gaussian nature of the output of the decorrelation stage. This assumption of Gaussianity results from arguments based on the central limit theorem [3], which states that the distribution of the sum of independent, identically distributed random variables tends to a Gaussian distribution as the number of random variables added together approaches infinity.

The compression ratio, abbreviated as c.r., is the ratio of the length (usually measured in bits) of the input data sequence to the length of the output data sequence for a given compression method. This is the most important measure of performance for a lossless compression technique. When comparing compression ratios for different techniques, it is important to be consistent by noting information that is known globally and not included in the compressed data sequence.
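As a small illustration of these two measures (a sketch; the function names are hypothetical):

```python
import numpy as np

def residue_variance(x):
    """Sample variance of a zero-mean sequence, as in Eq. (2)."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(x * x) / (len(x) - 1))

def compression_ratio(input_bits, output_bits):
    """c.r. = (length of input in bits) / (length of output in bits)."""
    return input_bits / output_bits

# e.g., 10000 samples stored at 12 bits each, compressed to 34000 bits in total
print(compression_ratio(10000 * 12, 34000))   # about 3.5
```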


4 The Decorrelation Compression Stage

Several predictive methods exist for exploiting correlation between neighboring samples in a given data stream. These methods all follow the process shown in Figure 3: the same decorrelating function is used in compression and reconstruction, and this function must take as input a delayed version of the input sequence. Some of these techniques are described in the following sections.

4.1 Linear Prediction

A one-step linear predictor is a nonrecursive system that predicts the next value of a data stream by using a weighted sum of a pre-specified number of samples immediately preceding the sample to be predicted. The linear predictor does not contain the feedback path in Figure 3, and is thus a special case of Figure 3. Given a sample sequence ix(n) of length K (0 ≤ n ≤ K-1), one can design a predictor of order M by using M predictor coefficients {b_i} to find an estimate îx(n) for each sample in ix(n):

    \hat{ix}(n) = \sum_{i=0}^{M-1} b_i\, ix(n-i-1)        (3)

Obviously, M should be much less than K to achieve compression, because {b_i} must be included with the compressed data. The estimate îx(n) is not the same as the original value; therefore a residue sequence is formed to allow exact recovery of ix(n):

    ir(n) = ix(n) - \hat{ix}(n)        (4)


If the predictor coefficients are chosen properly, the entropy of ir(n) should be less than the entropy of ix(n). Choosing the coefficients {b_i} involves solving the Yule-Walker equations:

    R_{0,0} b_0 + R_{0,1} b_1 + \cdots + R_{0,M-1} b_{M-1} = R_{M,0}
    R_{1,0} b_0 + R_{1,1} b_1 + \cdots + R_{1,M-1} b_{M-1} = R_{M,1}        (5)
        \vdots
    R_{M-1,0} b_0 + R_{M-1,1} b_1 + \cdots + R_{M-1,M-1} b_{M-1} = R_{M,M-1}

where R_{i,j} is the average over n of the product ix(n) ix(n+(i-j)). This can be represented as the matrix-vector product

    R\,b = p        (6)

where R is the M by M matrix defined in (5), b is the M by 1 vector of predictor coefficients, and p is the M by 1 vector in (5). Equation (6) can be solved by a variety of techniques involving the inversion of symmetric matrices [4].

The original data sequence ix(n) can be exactly recovered using the predictor coefficients {b_i}, the residue stream ir(n), and the first M samples of ix(n) [1]. This is accomplished by the recursive relationship

    ix(n) = ir(n) + \sum_{i=0}^{M-1} b_i\, ix(n-i-1),   M \le n \le K-1        (7)

If the original data sequence ix(n) is an integer sequence, then the predictor output can be rounded to the nearest integer and still form an error residual sequence of comparable size. In this case, ir(n) is calculated as

    ir(n) = ix(n) - \mathrm{NINT}\left\{\sum_{i=0}^{M-1} b_i\, ix(n-i-1)\right\}        (8)

where NINT{} is the nearest-integer function. Similarly, the ix(n) data sequence is recovered from the residue sequence as

    ix(n) = ir(n) + \mathrm{NINT}\left\{\sum_{i=0}^{M-1} b_i\, ix(n-i-1)\right\},   M \le n \le K-1        (9)

where it is presumed that the NINT{} operation is performed (at the bit level) exactly as in (8).
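The sketch below (Python with NumPy; all function names are illustrative, not from the original text) builds the autocorrelation system of Eqs. (5)-(6), forms the integer residue of Eq. (8), and recovers the data exactly as in Eq. (9). It assumes an integer input sequence and M much smaller than K:

```python
import numpy as np

def design_predictor(ix, M):
    """Solve Eqs. (5)-(6) for the order-M predictor coefficients b."""
    ix = np.asarray(ix, dtype=float)
    def acorr(lag):
        # R_{i,j} depends only on i - j: average of ix(n) * ix(n + lag)
        lag = abs(lag)
        return np.mean(ix[:len(ix) - lag] * ix[lag:])
    R = np.array([[acorr(i - j) for j in range(M)] for i in range(M)])
    p = np.array([acorr(M - i) for i in range(M)])
    return np.linalg.solve(R, p)

def residues(ix, b):
    """Integer residue sequence of Eq. (8), for n = M ... K-1."""
    M = len(b)
    return np.array([int(ix[n]) - int(np.rint(np.dot(b, ix[n - M:n][::-1].astype(float))))
                     for n in range(M, len(ix))])

def recover(start, ir, b):
    """Exact recovery via Eq. (9) from the first M samples and the residues."""
    M = len(b)
    ix = [int(v) for v in start]                 # the first M samples of ix(n)
    for r in ir:
        pred = int(np.rint(np.dot(b, np.array(ix[-M:][::-1], dtype=float))))
        ix.append(int(r) + pred)
    return np.array(ix)

rng = np.random.default_rng(0)
ix = np.cumsum(rng.integers(-5, 6, 1000))        # a correlated integer test sequence
b = design_predictor(ix, M=4)
ir = residues(ix, b)
assert np.array_equal(recover(ix[:4], ir, b), ix)   # lossless round trip
```

As the text notes, the NINT{} arithmetic must be repeated identically in recovery; here both directions compute the prediction with the same dot product over the same exactly reconstructed samples.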


4.1.1 Determination of Predictor Order

Formulating an optimal predictor order M is crucial to achieving optimal compression [1], because there is a tradeoff between the lower variance of the residual sequence and the increasing overhead due to larger predictor orders. There is no single approach to the problem of finding the optimal predictor order; in fact, several methods exist. Each of the methods entails finding the sample variance of the zero-mean error residuals ir(n), as determined by (4), for an order M:

    \sigma_{ir}^2(M) = \frac{1}{K-M-1} \sum_{i=M}^{K-1} ir^2(i)        (10)

One of the easiest methods for finding an optimal predictor order is to increment M starting from M = 1 until σ²_ir(M) reaches a minimum, which may be termed the minimum variance criterion (MVC).


Another method, the Akaike Information Criterion (AIC) [5], involves minimizing the following function:

    \mathrm{AIC}(M) = K \ln \sigma_{ir}^2(M) + 2M        (11)

The 2M term in the AIC serves to "penalize" unnecessarily high predictor orders. The AIC, however, has been shown to be statistically inconsistent, so the minimum description length (MDL) criterion has been formed [5]:

    \mathrm{MDL}(M) = K \ln \sigma_{ir}^2(M) + M \ln K        (12)

A method proposed by Tan [6] involves determining the optimal number of bits necessary to code each residual, ν(M) = (1/2) log₂ σ²_ir(M). Starting with M = 1, M is increased until the following criterion is no longer true:

    (K - M)\,\Delta\nu(M) > \Delta\eta(M)        (13)

where Δν(M) = -[ν(M) - ν(M-1)] and Δη(M) represents the increase in overhead bits for each successive M (due mainly to increased startup values and predictor coefficients). There are several other methods for predictor-order determination; none has proven to be the best in all situations.
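A sketch of order selection under the AIC and MDL criteria (Python with NumPy; it assumes the hypothetical design_predictor and residues helpers from the linear-prediction sketch above are in scope, and the function names here are likewise illustrative):

```python
import numpy as np

def residue_variance_for_order(ix, M):
    """sigma_ir^2(M) of Eq. (10) for an order-M predictor."""
    K = len(ix)
    ir = residues(ix, design_predictor(ix, M))      # helpers from the sketch above
    return float(np.sum(ir.astype(float) ** 2) / (K - M - 1))

def select_order(ix, max_order=20, criterion="MDL"):
    """Pick M by minimizing AIC (Eq. 11) or MDL (Eq. 12)."""
    K = len(ix)
    best_M, best_score = None, np.inf
    for M in range(1, max_order + 1):
        v = residue_variance_for_order(ix, M)
        penalty = 2 * M if criterion == "AIC" else M * np.log(K)
        score = K * np.log(v) + penalty
        if score < best_score:
            best_M, best_score = M, score
    return best_M
```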


4.1.2 Quantization of Predictor Coefficients

Excessive quantization of the coefficients {b_i} can reduce the c.r. and affect exact recovery of the original data stream ix(n). On the other hand, unnecessary bits in the representation of {b_i} will also decrease the c.r. The representation proposed by Tan et al. [1] is briefly outlined here. Given predictor coefficients {b_i}, an integer representation is

    ib_i = \mathrm{NINT}\{2^{N_{ib}-I-1} b_i\},   0 \le i < M        (14)

where N_ib is the number of bits for coefficient quantization and I is the number of integer bits in INT{b_i}_max, where INT{} is the maximum integer less than or equal to the operand. For prediction, rather than using the calculated coefficients {b_i}, modified predictor coefficients are used:

    b_i^* = \frac{\mathrm{PREC}\{ib_i\}}{2^{N_{ib}-I-1}},   0 \le i < M        (15)

where PREC{} is a function which converts the operand to whatever maximum precision is available, depending upon the processor one is using. It is desirable to find N_ib such that σ²_ir(M) in (10) remains roughly the same for either set of coefficients, {b_i} or {b_i^*}. ib_i can be represented as

    ib_i = 2^{N_{ib}-I-1} b_i + \epsilon_i        (16)

where ε_i is an error function such that |ε_i| ≤ 1/2. The coefficient quantization error is

    \Delta b_i = b_i^* - b_i = \frac{\epsilon_i}{2^{N_{ib}-I-1}}        (17)

Using (8) to form residues, the residue error can be represented as [1]

    |\Delta ir(n)| = |ir(n) - ir^*(n)| = \left| \mathrm{NINT}\left\{ \sum_{i=0}^{M-1} \frac{\epsilon_i}{2^{N_{ib}-I-1}}\, ix(n-i-1) \right\} \right|        (18)

It can also be seen that

    \left| \sum_{i=0}^{M-1} \frac{\epsilon_i}{2^{N_{ib}-I-1}}\, ix(n-i-1) \right| < \sum_{i=0}^{M-1} \frac{|\epsilon_i|}{2^{N_{ib}-I-1}}\, |ix(n-i-1)|        (19)
        < \frac{\tfrac{1}{2} M\, |ix(n)|_{max}}{2^{N_{ib}-I-1}}        (20)


If the constraint |Δir(n)|_max ≤ 1 is imposed, then the following inequality results:

    N_{ib} \ge 1 + (I + 1) + \mathrm{INT}\left\{ \log_2 \frac{M\, |ix(n)|_{max}}{2} \right\}        (21)

As long as this minimum bound is met, residue error will be minimized.

4.1.3 Selection of Data Frame Size K

Selecting a proper frame size is very important in the process of linear prediction. Each frame contains fixed data, including {ib_i} and other coding parameters, plus an encoded version of the residues, {ir(n)}. Hence, decreasing the frame size toward zero ultimately causes the c.r. to decrease toward zero due to the fixed data. On the other hand, a very large frame size may require an unreasonably high amount of computation with little gain in compression performance; moreover, if the data is nonstationary, large frame sizes may actually provide worse compression performance than smaller frame sizes, due to highly localized statistical behaviour of the data. As an example, a 10000-point seismic waveform was compressed using linear prediction along with arithmetic coding (described in Section 5.2) [6] for different frame sizes; the results are given in Table 2. In this table, two compression ratios are given, because the original data was quantized to 32 bits but the data samples occupied at most 12 bits; the compression ratio corresponding to 32-bit quantization is CR1 and that for 12-bit quantization is CR2. As can be seen from the results, the compression ratio tends to increase with increasing frame size, but the gain becomes smaller with larger frame sizes. This indicates that one should look at the effects of different frame sizes when using linear prediction, but one should also weigh compression gains against the increasing computational complexity associated with larger frame sizes.


We also note that the monotonically increasing behaviour of the compression ratio with increasing frame size is relevant only to the seismic data sequence used in this example; other types of waveform data may display different results.

4.2 Adaptive Linear Filters

While the method of linear prediction presented in Section 4.1 is effective, it suffers from the problem of finding a solution to the Yule-Walker equations in (5), which becomes increasingly computationally expensive with larger data block sizes. Adaptive FIR filters have been proposed and used successfully as a way of solving this problem [4]. With a fixed filter size M, there is a variety of stochastic gradient methods to adapt the filter. One common method, the normalized least mean square (NLMS) algorithm [8], is discussed here. Once again, the sample sequence index is denoted by n; the set of predictor coefficients {b_i} is now time-varying and is represented by the column vector b(n) = [b_0(n) ... b_{M-1}(n)]^T, where []^T is the transpose operator. If the input to the filter is the vector ix(n) = [ix(n-1) ... ix(n-M)]^T, then a time-varying residue [9] is given by

    ir(n) = ix(n) - \mathrm{NINT}\{b^T(n)\, ix(n)\}        (22)

If two fixed parameters, a smoothing parameter α and a convergence parameter u, are specified, then b(n) is computed iteratively as follows:

    b(n+1) = b(n) + \mu(n)\, ir(n)\, ix(n)        (23)

    \mu(n) = \frac{u}{\sigma_M^2(n)}        (24)

    \sigma_M^2(n) = \alpha\, \sigma_M^2(n-1) + (1-\alpha)\, ir^2(n-1)        (25)


In order to reverse the algorithm without loss [9], the following equation may be used along with (23)-(25):

    ix(n) = ir(n) + \mathrm{NINT}\{b^T(n)\, ix(n)\}        (26)

Therefore, one needs the initial predictor coefficients b(0), the initial data vector ix(0), and the error sequence ir(n) to reconstruct the original ix(n) sequence exactly. Moreover, in this approach, the coefficients b(n) and the data vectors ix(n) do not have to be transmitted at all after start-up. Referring to Figure 3, we note that the "separable functions" used in encoding must be repeated exactly in decoding. Therefore all quantities used in (23)-(25), that is, u, α, b(0), and ix(0), must be stored in the compressed data frame exactly as they are used in the encoding process, so that they may be used in the same way in the decoding process. We also note that here it is not necessary to break the data up into frames as was the case with the linear predictor method described in Section 4.1, and that in general this method requires less overhead than the linear predictor method.
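A minimal sketch of this NLMS predictor (Python with NumPy; the function names and the particular choices of u, α, and start-up values are illustrative, not from the original text). The encoder of Eqs. (22)-(25) and the decoder of Eq. (26) run the identical update in the identical order, so the round trip is exact:

```python
import numpy as np

def nlms_encode(ix, M=4, u=0.05, alpha=0.95):
    """Residues via Eqs. (22)-(25); returns the start-up samples and ir(n)."""
    b = np.zeros(M)                     # b(0): initial coefficients
    sigma2 = 1.0                        # initial power estimate
    ir = []
    for n in range(M, len(ix)):
        x_vec = ix[n - M:n][::-1].astype(float)            # [ix(n-1) ... ix(n-M)]
        r = int(ix[n]) - int(np.rint(np.dot(b, x_vec)))    # Eq. (22)
        b = b + (u / sigma2) * r * x_vec                   # Eqs. (23)-(24)
        sigma2 = alpha * sigma2 + (1 - alpha) * r * r      # Eq. (25)
        ir.append(r)
    return ix[:M].copy(), np.array(ir)

def nlms_decode(start, ir, u=0.05, alpha=0.95):
    """Exact reconstruction via Eq. (26), repeating the encoder's updates."""
    M = len(start)
    b = np.zeros(M)
    sigma2 = 1.0
    ix = [int(v) for v in start]
    for r in ir:
        x_vec = np.array(ix[-M:][::-1], dtype=float)
        ix.append(int(r) + int(np.rint(np.dot(b, x_vec))))  # Eq. (26)
        b = b + (u / sigma2) * r * x_vec
        sigma2 = alpha * sigma2 + (1 - alpha) * r * r
    return np.array(ix)

rng = np.random.default_rng(1)
ix = np.cumsum(rng.integers(-3, 4, 5000))
start, ir = nlms_encode(ix)
assert np.array_equal(nlms_decode(start, ir), ix)
```

As the text stresses, u, α, b(0), and the start-up samples must be carried in the compressed frame so the decoder repeats the same arithmetic bit for bit.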


4.3 Lattice Filters

While the NLMS adaptive filter is effective, it sometimes displays slow convergence, resulting in a residue sequence of high variance. This has led to the use of a class of adaptive filters known as lattice filters. Although the implementation of lattice filters is more involved than that of the NLMS algorithm, faster convergence usually makes up for the increased complexity. A simple M-stage adaptive lattice filter is shown in Figure 4; this filter is known as the gradient adaptive symmetric lattice (GAL) filter. The update equations for this filter are [9]

    b_i(n) = b_{i-1}(n-1) + k_i(n)\, f_{i-1}(n)        (27)

    f_i(n) = f_{i-1}(n) + k_i(n)\, b_{i-1}(n-1)        (28)

    k_i(n) = k_i(n-1) + \beta\, \frac{f_i(n-1)\, b_{i-1}(n-2) + f_{i-1}(n-1)\, b_i(n-1)}{\sigma_{i-1}^2(n-1)}        (29)

    \sigma_{i-1}^2(n-1) = \alpha\, \sigma_{i-1}^2(n-2) + (1-\alpha)\left(f_{i-1}^2(n-1) + b_{i-1}^2(n-2)\right)        (30)

    f_M(n) = ix(n) + \mathrm{NINT}\left\{\sum_{i=0}^{M-1} b_i(n-1)\, k_i(n-1)\right\}        (31)

where β is a convergence parameter, α is a smoothing parameter, f_i(n) is the forward prediction error, and b_i(n) is the backward prediction error. The recovery equation is [9]

    ix(n) = f_M(n) - \mathrm{NINT}\left\{\sum_{i=0}^{M-1} b_i(n-1)\, k_i(n-1)\right\}        (32)

An improvement on the GAL is the recursive least-squares lattice (RLSL) filter [8].


The Mth-order forward and backward prediction error coefficients, η_M(n) and ψ_M(n) respectively, are given by:

    \eta_M(n) = \eta_{M-1}(n) + \Delta_{f,M}(n-1)\, \psi_{M-1}(n-1)        (33)

    \psi_M(n) = \psi_{M-1}(n-1) + \Delta_{b,M}(n-1)\, \eta_{M-1}(n)        (34)

where

    \Delta_{f,M}(n) = -\frac{\Delta_{M-1}(n)}{B_{M-1}(n-1)}        (35)

    \Delta_{b,M}(n) = -\frac{\Delta_{M-1}(n)}{F_{M-1}(n)}        (36)

    \Delta_{M-1}(n) = \lambda\, \Delta_{M-1}(n-1) + \gamma_{M-1}(n-1)\, \psi_{M-1}(n-1)\, \eta_{M-1}(n)        (37)

    F_{M-1}(n) = \lambda\, F_{M-1}(n-1) + \gamma_{M-1}(n-1)\, \eta_{M-1}^2(n)        (38)

    B_{M-1}(n) = \lambda\, B_{M-1}(n-1) + \gamma_{M-1}(n)\, \psi_{M-1}^2(n)        (39)

    \gamma_M(n) = \gamma_{M-1}(n) - \frac{\gamma_{M-1}^2(n)\, \psi_{M-1}^2(n)}{B_{M-1}(n)}        (40)

λ is a fixed constant arbitrarily close to, but not equaling, 1. The error residuals are computed as [9]

    ir(n) = ix(n) + \mathrm{NINT}\left\{\sum_{i=1}^{M} \Delta_{f,i}(n-1)\, \psi_{i-1}(n-1)\right\}        (41)

The RLSL usually performs better than the GAL or the NLMS algorithms. As an example, the seismic waveforms given in Figures 5-6 were each subjected to all three methods; the error variances are given in Table 3. As can be seen, the RLSL outperformed the NLMS and GAL algorithms in nearly every case.

4.4 Transform-Based Methods

Although linear predictor and lattice methods are effective, both methods require significant computation and have fairly complex implementations, due in large part to the requirements for long frame sizes for linear predictors and long predictor sizes and startup sequences for lattice filters. This has led to the consideration of transform-based methods for lossless waveform compression. Such methods have been applied to lossless image compression [10], and provide performance comparable to linear-predictor methods with relatively small frame sizes.


Given the data vector ix_i = [ix(Ni-1) \cdots ix(Ni-N)]^T, where i refers to the ith data frame of size N, an N-point transform of ix_i can be found from

    z_i = T_{N \times N}\, ix_i        (42)

where T_{N×N} is an N by N unitary transform matrix. The term unitary refers to the fact that T_{N×N}^{-1} = T_{N×N}^T. Many unitary transforms have proven to be effective in decorrelating highly correlated data, and finding the inverse transform for a unitary transform is simple.

Most transform-based waveform compression schemes achieve compression by quantization of the transform coefficients, i.e., the elements of z_i in (42). While effective, such compression schemes are lossy. One would thus like to form a lossless method based on transforms to take advantage of existing hardware for lossy methods. In order to do this, one could retain a subset of transform values and reconstruct from these. For instance, if only M real values are retained (M < N), then a new vector results:

    \tilde{z}_i = [z_i(0) \cdots z_i(M-1)\ 0 \cdots 0]^T        (43)

where z̃_i is also an N by 1 vector. An error residual vector can now be formed:

    ir_i = ix_i - T_{N \times N}^T\, \tilde{z}_i        (44)

This error residual vector should be of lower entropy than the original data vector.
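A sketch of Eqs. (42)-(44) using an orthonormal DCT as the unitary transform (Python with NumPy; the DCT matrix is built explicitly, and the function names are illustrative):

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal (unitary) N x N DCT matrix; rows are the basis vectors."""
    T = np.zeros((N, N))
    T[0, :] = 1.0 / np.sqrt(N)
    for k in range(1, N):
        T[k, :] = np.sqrt(2.0 / N) * np.cos((2 * np.arange(N) + 1) * k * np.pi / (2 * N))
    return T

def transform_residual(frame, T, M):
    """Keep the first M transform coefficients (Eq. 43) and form the residual (Eq. 44)."""
    z = T @ frame                                   # Eq. (42)
    z_trunc = np.concatenate([z[:M], np.zeros(len(frame) - M)])
    return frame - T.T @ z_trunc                    # residual to be entropy coded

# Example: an 8-sample frame of a slowly varying waveform, keeping M = 2 values
T = dct_matrix(8)
frame = np.array([10.0, 12.0, 13.0, 15.0, 16.0, 18.0, 19.0, 21.0])
print(transform_residual(frame, T, M=2))            # small residuals around zero
```

For a lossless scheme, the retained coefficients and the residual vector must both be stored; the handling of the transform matrix itself (its quantization and explicit inverse) is described next.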


While precise quantization of the transform coefficients is desirable in lossy compression schemes, the problem is not as complex in lossless schemes. A method proposed in [10] involved quantizing the entries of T_{N×N} uniformly. A consequence of this type of quantization is that the transform is no longer unitary, and in order to perform the inverse transform operation, the inverse of the quantized transform matrix must be found explicitly. However, this method is still computationally advantageous compared to linear prediction, since the inverse transform matrix needs to be calculated only once.

As an example, the speech waveform (male speaker, sampled at 8 kHz and quantized to 8 bits) in Figure 7 was compressed using a transform method and also the linear-predictor method described in Section 4.1. Both methods used the same frame size (N = 8) and the same number of coefficients for reconstruction (M = 2). The linear predictor solved the Wiener-Hopf equation by the Levinson-Durbin method [4]. The transform used was the discrete cosine transform (DCT), with the N by N DCT matrix defined as

    T_{ij} = \frac{1}{\sqrt{N}},   i = 0,\ j = 0, \ldots, N-1
    T_{ij} = \sqrt{\frac{2}{N}}\, \cos\frac{(2j+1)\, i\, \pi}{2N},   i = 1, \ldots, N-1,\ j = 0, \ldots, N-1

The entries of the DCT matrix were quantized to five digits past the decimal point. The resulting residue sequence for the linear predictor had a variance of 163.5, while for the DCT the residue variance was 122.0 (the original data had a variance of 350.8). For further comparison, an RLSL filter with 2 stages was used, which yielded a residue variance of 160.5. When an 8-stage RLSL filter was employed, the error variance was reduced to 96.8. Moreover, when a frame size of 100 and a predictor length of 8 were used, the linear predictor residue variance fell to 94.5 (the DCT's performance under this scenario worsened).


These three methods were also compared using the speech waveform in Figure 8, a female speaker sampled at 20 kHz and quantized to 16 bits. For N = 8 and M = 2, the residue variance was 380890 for the DCT method, 523620 for the linear predictor, and 355950 for the RLSL. The speech waveform had a variance of 8323200. For N = 100 and M = 8, however, the linear predictor's residue variance fell to 141080; for 8 stages, the RLSL filter yielded an error variance of 141550. The DCT's performance worsened under this scenario. One may conclude that transform-based methods can be less efficient in compressing data quantized to a high number of levels.

Transform-based methods are worth examining for lossless applications; they are simple and can use existing hardware. Unlike linear predictors, they do not require large frame sizes for adequate performance, and they do not require complex implementations and startup values.

5 The Coding Compression Stage

The second stage of waveform compression involves coding the residue sequence using an entropy coder. The term entropy coder comes from the goal of entropy coding: the length in bits of the encoded sequence should approach the number of symbols times the overall entropy of the original sequence in bits per symbol. Several entropy coders exist; the two most widely used types, Huffman and arithmetic coders, will be discussed.


5.1 Huffman Coding

Given the integer residue sequence ir(n), M ≤ n ≤ K-1, one can determine "probabilities" (that is, frequencies) of a particular integer value occurring in the residue stream. Each integer value appearing in ir(n) is called a symbol; if the set of symbols {s_i} occurs in ir(n) with corresponding probabilities {p_i}, then a code can be formed as follows [11]:

1. Order the symbols using their respective probabilities {p_i} in decreasing order, assigning each symbol probability a node (the code will be constructed as a binary tree); these nodes are called leaf nodes.

2. Combine the two nodes with the smallest probabilities into a new node whose probability is the sum of the two previous node probabilities. Each of the two branches going into the new node is assigned either a one or a zero. Repeat this process until there is only one node left, the root node.

3. The code for a particular symbol can be determined by reading all the branch values sequentially, starting from the root node and ending at the symbol's leaf node.

As an example, the Huffman code for the symbol table given in Table 4 is shown in Figure 9. Given a stream of Huffman-coded data and an associated symbol probability table, each symbol can be decoded uniquely.
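A sketch of this construction (Python, using a heap to pick the two least-probable nodes at each step; the 0/1 branch labels, and hence the exact bit patterns, may differ from Figure 9 depending on tie-breaking, but the code lengths agree):

```python
import heapq

def huffman_codes(prob):
    """Build a prefix code by repeatedly merging the two least-probable nodes."""
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(prob.items())]
    heapq.heapify(heap)
    counter = len(heap)                      # tie-breaker so dicts are never compared
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)  # two smallest probabilities
        p1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

# Symbols and probabilities from Table 4: code lengths 3, 3, 2, 2, 2 bits,
# i.e. 2.2 bits/symbol on average, matching the codes listed in Table 4.
print(huffman_codes({"a": .1, "b": .1, "c": .2, "d": .3, "e": .3}))
```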


Huffman coders are divided into two classes: fixed Huffman coders and adaptive Huffman coders. Fixed Huffman coding involves using a static symbol table based either on the entire data sequence or on global information. Adaptive Huffman coding involves forming a new code table for each data sequence and encoding the table in addition to the data sequence. Alternatively, the adaptive Huffman coder may switch at intervals between previously selected code tables, indicating at each interval the selected table. Adaptive Huffman coders generally exhibit better performance in terms of the c.r. achieved, yet suffer from increased overhead. In real-time applications, fixed Huffman coders work more quickly and have simpler hardware implementations [12].

5.2 Arithmetic Coding

Although Huffman coding is attractive because of its simplicity and efficiency, it suffers from the fact that each symbol must have a representation of at least one bit. Normally, this is not a problem; however, in the case where the probability of a symbol approaches one, Huffman coding becomes inefficient. This leads to the concept of arithmetic coding [13]-[14]. The fundamental idea behind arithmetic coding is the mapping of a string of symbols to a subinterval of the interval [0,1] on the real line. The process begins by assigning each symbol s_i to a unique interval in [0,1] of length p_i. For example, assume an alphabet of symbols {a, b, c} with respective probabilities {.3, .4, .3} and corresponding intervals assigned as in Table 5. Then the string "ab" is coded as follows: (1) The symbol "a" puts the string in the interval [0, .3). (2) The symbol "b" implies that the string occupies the middle 40% of the interval [0, .3), i.e., [.09, .21). (3) Finally, any number is selected from this interval, e.g., .09. This pedagogical example does not really demonstrate the compression abilities of arithmetic coding (a two-symbol string was simply mapped to a two-digit number); however, as the string length increases, arithmetic coding produces nearly optimal results.
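The interval narrowing in this example can be written out directly (a Python sketch; names are illustrative):

```python
def arithmetic_interval(string, intervals):
    """Narrow [0,1) symbol by symbol, as in the 'ab' example above."""
    low, high = 0.0, 1.0
    for s in string:
        s_low, s_high = intervals[s]     # the symbol's assigned sub-interval of [0,1)
        width = high - low
        low, high = low + width * s_low, low + width * s_high
    return low, high

# Symbol mapping from Table 5
table5 = {"a": (0.0, 0.3), "b": (0.3, 0.7), "c": (0.7, 1.0)}
print(arithmetic_interval("ab", table5))   # (0.09, 0.21); any number inside codes "ab"
```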


5.2.1 Different Implementations of Arithmetic Coding

As pointed out in [14], the above approach to arithmetic coding suffers from two problems: (1) incremental transmission is not possible (i.e., the entire sequence must be coded before transmission), and (2) the representation for the symbol mapping table is cumbersome and can produce significant overhead. The first problem can be alleviated by transmitting the top-most bit during coding when the top-most bits at each end of the interval are equal (i.e., the interval has become sufficiently narrowed) [14]. The second problem may be solved by maintaining a running, reversible symbol mapping table which starts with equal initial probabilities for each possible symbol. For instance, if each symbol is 8 bits long, then the symbol mapping table normally has 256 entries, each with an initial probability of 1/256.

As symbol sizes grow larger, it eventually becomes impractical to maintain a table with entries for every possible symbol [15]. For instance, if a particular waveform residue sequence ir(n) spans the range (irL, irH), then the sequence requires a minimum of B bits per sample for accurate representation, with irH - irL ≤ 2^B - 1. If B is large, say greater than 10, then maintaining a symbol table with 2^B - 1 entries would be highly inefficient. In [15], a modified version of arithmetic coding is described to address this type of situation. In this version, the interval (irL, irH) is divided into Nf successive intervals, each of length 2^{Nr} successive values. Rather than a symbol mapping table, an interval mapping table is developed from the number of symbols present in each interval, i.e., a frequency table f(1:Nf).


Each frequency in this table is represented by Nh bits, where Nh = log2[max f(n)]. Each residue ir(n) is assigned an interval number from 1 to Nf, denoted by irI(n). Then each residue can be represented as

    ir(n) = ir_L + 2^{N_r}\,(irI(n) - 1) + iro(n)        (45)

where iro(n) is the offset of ir(n) in interval irI(n), a value between 0 and 2^{N_r} - 1. Therefore, the compressed data is composed of three parts: (1) an overhead portion containing linear prediction parameters in addition to Nr, Nf, Nh, and f(1:Nf); (2) an arithmetically coded sequence of interval numbers irI(n); and (3) an offset sequence iro(n), which can be represented by a minimum of Nr bits per value.
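A sketch of the decomposition in Eq. (45) (Python with NumPy; Nr is taken here as a given parameter, chosen for example via Eq. (48) below, and the function names are illustrative):

```python
import numpy as np

def split_residues(ir, Nr):
    """Split residues into interval numbers irI(n) and offsets iro(n), Eq. (45)."""
    ir = np.asarray(ir, dtype=np.int64)
    irL = int(ir.min())
    irI = ((ir - irL) >> Nr) + 1                    # interval numbers, 1 ... Nf
    iro = (ir - irL) & ((1 << Nr) - 1)              # offsets, 0 ... 2^Nr - 1
    Nf = int(irI.max())
    freq = np.bincount(irI, minlength=Nf + 1)[1:]   # frequency table f(1:Nf)
    return irL, irI, iro, freq

def merge_residues(irL, irI, iro, Nr):
    """Invert Eq. (45): ir(n) = irL + 2^Nr * (irI(n) - 1) + iro(n)."""
    return irL + (np.asarray(irI) - 1) * (1 << Nr) + np.asarray(iro)

ir = np.array([-7, -3, 0, 2, 5, 12, -1, 3])
irL, irI, iro, freq = split_residues(ir, Nr=2)
assert np.array_equal(merge_residues(irL, irI, iro, Nr=2), ir)
```

The interval numbers irI(n) are then arithmetically coded against f(1:Nf), while the offsets iro(n) are stored uncoded at Nr bits each.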


For this modified version of arithmetic coding, an optimal value of Nr is required. It is argued (on the basis of the central limit theorem) that for most kinds of waveforms, the decorrelated residue sequence is approximately Gaussian. This assumption has been confirmed experimentally with data such as speech and seismic waveforms. With Gaussian residues, if the standard deviation of the residue sequence ir(n) is denoted by σ_ir, then it has been shown that the entropy of ir(n) is [16]

    H_{ir}(\sigma_{ir}) \approx \log_2\left[\sqrt{2\pi e\,\sigma_{ir}^2}\right] = \log_2[4.133\,\sigma_{ir}]        (46)

If one forms the entropy of the interval sequence irI(n), it will be equal to H_ir in (46) when Nr = 0. However, the variance of the intervals (with respect to the individual frequencies), σ_irI, is cut in half when Nr is incremented by 1. Therefore the entropy of irI(n) can be derived as

    H_{irI}(\sigma_{irI}, N_r) = \log_2[4.133\,\sigma_{ir}] - N_r        (47)

Obviously, since entropy is always nonnegative, Nr ≤ INT{log2[4.133 σ_ir]}. If Nr = INT{log2[4.133 σ_ir]}, then H_irI will be brought close to zero, indicating very good compressibility. Therefore this value is proposed as the optimal value for Nr. It must be noted that this optimal value only applies when the residue sequence is exactly Gaussian, which cannot be guaranteed. An empirically derived formula for the optimal Nr is [15]

    N_r = \max\left\{0,\ \mathrm{NINT}\left\{\log_2\left[\frac{R}{K^{0.3}\sqrt{2}}\right]\right\}\right\}        (48)

where K is the data frame size and R is the range of the residue sequence, i.e., irH - irL.

This modified arithmetic coding method was tested on the waveforms in Figures 5-6 using the linear prediction method given in Section 4.1. The performance of the method was determined by comparing the average number of bits needed to store the residue sequence, bps_ir, with the average number of bits per sample of the modified arithmetic coder output, bps_iy; the results are given in Table 6.

6 Conclusions

The two-stage lossless compression scheme in Figure 1 has been presented and developed. Different linear filter implementations of the first stage were presented: linear predictors, adaptive filters, and lattice filters. The lattice filters, particularly the RLSL filter, displayed fast convergence and are desirable for fixed-order predictors. Transform-based decorrelation was also described, which displayed some advantages over filter methods; in particular, ease of implementation and superior performance for small data frame sizes.


The second stage was discussed with respect to Huffman and arithmetic coding. Modifications to basic arithmetic coding, which are often necessary, were described.

While only a few decorrelation and symbol coding methods have been discussed here, many more exist. The particular implementation of two-stage compression depends on the type of data being compressed. Experimentation with different implementations is often advantageous.

References

[1] Stearns, S.D., Tan, L.-Z., and Magotra, N. "Lossless Compression of Waveform Data for Efficient Storage and Transmission." IEEE Transactions on Geoscience and Remote Sensing. Vol. 31, No. 3, May 1993, pp. 645-654.

[2] Blahut, Richard E. Principles and Practice of Information Theory. Menlo Park, CA: Addison-Wesley, 1990.

[3] Papoulis, Athanasios. Probability, Random Variables, and Stochastic Processes. New York, NY: McGraw-Hill, Inc., 1984.

[4] Widrow, Bernard and Stearns, Samuel D. Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1985.

[5] Marple, S. Lawrence. Digital Spectral Analysis with Applications. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1987.


[6] Tan, Li-Zhe. Theory and Techniques for Lossless Waveform Data Compression. Ph.D. Thesis. The University of New Mexico, 1992.

[7] Stearns, S.D., Tan, L.-Z., and Magotra, N. "A Bi-level Coding Technique for Compressing Broadband Residue Sequences." Digital Signal Processing. Vol. 2, No. 3, July 1992, pp. 146-156.

[8] Haykin, Simon. Adaptive Filter Theory. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1991.

[9] McCoy, J.W., Magotra, N., and Stearns, S. "Lossless Predictive Coding." 37th IEEE Midwest Symposium on Circuits and Systems. Lafayette, LA, August 1994.

[10] Mandyam, Giridhar, Ahmed, Nasir, and Magotra, Neeraj. "A DCT-Based Scheme for Lossless Image Compression." SPIE/IS&T Electronic Imaging Conference. San Jose, CA, February 1995.

[11] Jain, Anil K. Fundamentals of Digital Image Processing. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1989.

[12] Venbrux, Jack, Yeh, Pen-Shu, and Liu, Muye N. "A VLSI Chip Set for High-Speed Lossless Data Compression." IEEE Transactions on Circuits and Systems for Video Technology. Vol. 2, No. 4, December 1992, pp. 381-391.

[13] Rissanen, J. and Langdon, G.G. "Arithmetic Coding." IBM Journal of Research and Development. Vol. 23, No. 2, March 1979, pp. 149-162.


[14] Witten, Ian H., Neal, Radford M., and Cleary, John G. "Arithmetic Coding for Data Compression." Communications of the ACM. Vol. 30, No. 6, June 1987, pp. 520-540.

[15] Stearns, Samuel D. "Arithmetic Coding in Lossless Waveform Compression." IEEE Transactions on Signal Processing. To appear in the August 1995 issue.

[16] Woodward, P.M. Probability and Information Theory, 2nd Edition. Pergamon Press, 1964. p. 25.

7 Further Information

Information on advances in entropy coding can be found in the IEEE Transactions on Information Theory. Information on waveform coding can also be found in the IEEE Transactions on Geoscience and Remote Sensing, the IEEE Transactions on Speech and Audio Processing, and the Journal of the Acoustical Society of America. Information on statistical signal analysis and stationarity issues can be found in the text Random Signals: Detection, Estimation, and Data Analysis by K. Sam Shanmugan and Arthur M. Breipohl (Wiley, 1988).

Table 1: Compressibility of Example Waveforms

Waveform   Amplitude    Spectrum   Compressibility
1          Uniform      White      Low
2          Nonuniform   White      Some
3          Uniform      Nonwhite   More
4          Nonuniform   Nonwhite   High


Table 2: Effects of Frame Size on Compression Performance (seismic waveform)

Frame Size   CR1      CR2
50           5.8824   2.2059
100          6.8966   2.5862
500          8.2645   3.0992
1000         8.5837   3.2189
2000         8.8594   3.3223
3000         9.2094   3.4535
4000         9.2007   3.4503
5000         9.2081   3.4530
6000         9.2307   3.4615
7000         9.2317   3.4619
8000         9.2219   3.4582
9000         9.2783   3.4794
10000        9.3002   3.4876

Table 3: Residue Variances

File       Input     NLMS      GAL       RLSL
anmbhz89   1.87e9    3.85e4    6.81e1    2.07e1
anmbhz92   3.18e2    2.26e1    1.06e1    9.24e0
anmehz92   7.84e4    2.17e4    2.28e4    1.82e4
anmlhz92   4.44e5    9.40e4    7.61e4    4.54e4
kipehz13   2.45e3    6.74e0    8.22e0    5.67e0
kipehz20   1.52e4    9.56e1    7.17e1    4.45e1
rarbhz92   1.02e7    2.05e3    4.36e2    2.71e2
rarlhz92   1.76e8    1.42e7    2.21e7    2.34e7

Table 4: Symbol Probability Table with Codes from Figure 9

Symbol   Probability   Code
a        .1            000
b        .1            001
c        .2            01
d        .3            10
e        .3            11


Table 5: Symbol Mappings for Arithmetic Coding

Symbol   Probability   Interval
a        .3            [0, .3)
b        .4            [.3, .7)
c        .3            [.7, 1.0)

Table 6: Modified Arithmetic Coding Results

File       Mmin   Mmax   bpsix   bpsir   bpsiy
anmbhz89   2      8      15.75   5.42    4.26
anmbhz92   2      6      10.08   5.01    3.78
anmehz92   2      7      7.93    6.93    5.55
anmlhz92   3      3      12.70   11.97   10.56
kipehz13   1      4      8.15    4.90    3.48
kipehz20   2      8      10.03   6.03    4.87
rarbhz92   2      8      14.33   7.42    6.20
rarlhz92   2      5      17.00   15.98   14.71

Figure 1: Two-Stage Compression (block diagram: Input → Decorrelator → Coder → Output)


Figure 2: Example Waveforms (Waveform 1 is on top; Waveform 4 is on bottom)

Figure 3: Prediction and Recovery of Waveform Data (block diagram; the same repeatable function F{x(k-1), x(k-2), ...; e(k-1), e(k-2), ...} is applied in both prediction and recovery)

Figure 4: Gradient Adaptive Lattice (GAL) Filter (M-stage lattice with delays z^{-1}, reflection coefficients k_1(n) ... k_M(n), forward errors f_i(n), and backward errors b_i(n); input ix(n))

Figure 5: Seismic Database, Part A (waveforms anmbhz89, anmbhz92, anmehz92, and anmlhz92; amplitude versus sample number)

Figure 6: Seismic Database, Part B (waveforms kipehz13, kipehz20, rarbhz92, and rarlhz92; amplitude versus sample number)

Figure 7: Speech Waveform - 8 kHz, 8-bit Sampling (amplitude versus sample number)

Figure 8: Speech Waveform - 20 kHz, 16-bit Sampling (amplitude versus sample number)

Figure 9: Huffman Tree for Table 4 (leaf nodes a = .1, b = .1, c = .2, d = .3, e = .3 merged pairwise up to the root, with branch labels 0/1 giving the codes listed in Table 4)