
Context-Based Arithmetic Coding for the DCT: Achieving high compression rates with block transforms and simple context modeling

Kyle Littlefield

August 24, 2006

Abstract

Recent image compression schemes have focused primarily on wavelet transforms, culminating in the JPEG-2000 standard. Block based DCT compression, on which the older JPEG standard is based, has been largely neglected because wavelet based coding methods appear to offer better image quality. This paper presents a simple compression algorithm that uses arithmetic coding on the bit-planes of the DCT coefficients. This algorithm enables lossy compression of grayscale images with performance that is comparable to the JPEG-2000 standard and other state-of-the-art wavelet coders.

Contents

1 Introduction 3
  1.1 Image Test Set 3
  1.2 Measuring Image Compression 3
2 Overall Architecture 5
  2.1 Encoding 5
    2.1.1 Header Contents 6
    2.1.2 DCT 6
    2.1.3 Bit-Plane Production 9
    2.1.4 Entropy Coding with Context Modeling 11
  2.2 Decoding 11
3 Context Modeling in CBACD 12
  3.1 Group Testing 12
  3.2 Modeling significance bits 14
    3.2.1 Context Dilution 15


  3.3 Modeling sign bits 16
  3.4 Boundaries 18
  3.5 Modeling refinement bits 19
  3.6 Subband Scan Order 21
  3.7 Inter-plane Prediction and Context Trimming 21

4 Coefficient Extrapolation 23

5 Results 25

6 Conclusion 25
  6.1 Future Work 25

7 Acknowledgements 27

List of Figures

1 The 6 images comprising the test set for comparison of the CBACD coder 4
2 Architectural overview of the CBACD coder 5
3 Basis elements for the 8 * 8 DCT 8
4 DCT subband types 9
5 Transforming DCT coefficients into bit planes 10
6 Sending a single bit plane 10
7 Coefficients used in picking a context to encode a significance bit 14
8 Groupings used to investigate context dilution within the context modeling for significance bits 17
9 Reflections used for handling requests for information in blocks that are outside the bounds of the image 18
10 Subband scanning orders 22
11 Distribution of normalized DCT coefficients 24
12 Plots of CBACD performance versus JPEG, JPEG-2000, and SPIHT 26

List of Tables

1 Bit distribution between significance, sign, and refinement bits 12
2 Effect of group testing on image quality 13
3 Effect on image quality of grouping subbands for significance context modeling 17
4 Image quality change due to reflecting boundaries when finding contexts for sign and significance bits 19
5 Number of bits encoded in each refinement context 20
6 PSNR improvement due to context modeling of refinement bits 20
7 Effect on image quality of subband scan order 22


8 Comparison of CBACD, SPIHT, JPEG, and JPEG-2000 coders 28

1 Introduction

The Context Based Arithmetic Coding for the Discrete Cosine Transformation (CBACD) is an image compression coder for monochrome images that is based on a discrete cosine transform (DCT) backed by entropy coding of the bit planes of the DCT coefficients. It evolved out of the Priority-Based Arithmetic Encoding for Wavelets (PACW) research [1]. The PACW encoder uses a simple context-based arithmetic encoding algorithm on top of a wavelet transform to achieve compression performance that is virtually identical to that provided by the JPEG-2000 standard and other state-of-the-art compression methods. The PACW project evolved from the GTW coder [2], which is based on a wavelet transform followed by group testing. The goal of the CBACD research was to alter the PACW encoder in order to use the discrete cosine transform (DCT) instead of a wavelet transform. Although these two transforms accomplish the same task, compacting the information necessary to reconstruct an image into a small number of coefficients, they behave quite differently at a fine-grained level. This necessitated development of a new context modeling scheme to replace the one used in the PACW coder.

Most research and modern compression standards have been moving away from the DCT and towards other transforms, notably wavelet transforms. The primary reason for this has been the improved performance of wavelet based coders, such as SPIHT [6] and EZW [7], over older DCT methods such as JPEG. The success of these coders has resulted in the JPEG-2000 standard, which looks likely to make wavelet transforms the de facto technique for lossy image compression for the foreseeable future. The CBACD research was intended to explore whether a DCT based coder is capable of competing with such wavelet based coders on the basis of image distortion. The technique presented in this paper shows that a DCT based coder can achieve performance that is quite close to that achieved by wavelet based coders.

1.1 Image Test Set

The CBACD coder was designed to be effective on a wide range of natural images. During development a large number of pictures were used to tweak various aspects of the coder. However, all results in this paper are presented using a test set of 6 images, shown in Figure 1.

1.2 Measuring Image Compression

Before moving on to discussing the CBACD algorithm, it is necessary to arrive at some standard by which the distortion of a compressed image can be compared to the original. One such distortion measure is the peak signal to noise ratio (PSNR). Like all rate distortion metrics, PSNR is supposed to be an accurate reflection of the human-perceived distortion


Figure 1: The 6 images comprising the test set for comparison of the CBACD coder: (a) Barbara, 512 by 512 pixels; (b) Boat, 512 by 440 pixels; (c) Mandrill, 512 by 512 pixels; (d) Lena, 512 by 480 pixels; (e) Pentagon, 1024 by 1024 pixels; (f) Stream, 512 by 512 pixels

induced by an image alteration process. It is defined in the following manner. Let X = (x1, x2, . . . , xn) be the original image, and Y = (y1, y2, . . . , yn) be the altered image. As a partial step in defining PSNR, the mean squared error (MSE) is defined. Just as it sounds, the MSE (σ²) is simply the mean of the squared per-pixel difference between the two images:

    σ² = (1/n) · ∑_{i=1}^{n} (x_i − y_i)²

PSNR is then defined as a logarithmic function of the MSE, given by:

    PSNR = 10 log₁₀(M² / σ²)

The value M is a constant, and is simply the range of values a pixel can have. In the case of this paper, where 8-bit grayscale images are exclusively used, M will always be 255. In order to achieve a linear increase in PSNR, an exponential decrease in MSE is needed.
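As an illustration, the two formulas above can be computed directly. This is a minimal sketch, not the CBACD implementation itself, and it treats an image as a flat sequence of pixel values:

```python
import math

def psnr(original, altered, peak=255):
    """Peak signal to noise ratio between two equal-length pixel sequences."""
    # MSE: the mean of the squared per-pixel differences
    mse = sum((x - y) ** 2 for x, y in zip(original, altered)) / len(original)
    if mse == 0:
        return float("inf")  # identical images have no distortion
    return 10 * math.log10(peak ** 2 / mse)

# a uniform error of 5 gray levels gives an MSE of 25
print(round(psnr([100, 120, 140, 160], [105, 125, 145, 165]), 2))  # → 34.15
```

Note how the logarithm produces the behavior described above: halving the MSE only adds about 3 dB to the PSNR.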

PSNR has several major drawbacks. The first is that the value is not directly comparable across images, as there is no fixed scale for PSNR. Certain images lend themselves to compression and will have a higher PSNR. Secondly, PSNR measures distortion only at the per-pixel level. Structural distortion at a higher level (such as blocking artifacts) has no direct impact on PSNR values. Finally, PSNR does not correlate well with perceived distortion. This is mostly a side effect of the previous point. Certainly, other distortion


[Figure 2 is a flow diagram. Encoder: original image → DCT → subtract DC average → normalize coefficients → split into bit planes → significance and refinement passes → context modeling → arithmetic coding → encoded bit stream, with the header data (DC average and normalization constant) sent alongside the encoded bits. Decoder: the same stages in reverse (arithmetic coding, context modeling, recombine bit planes, extrapolate coefficients to bit planes that were not encoded, denormalize coefficients, add DC average, inverse DCT) to produce the reconstructed image.]

Figure 2: Architectural overview of the CBACD coder

measures have been proposed. Wang et al. (among other researchers) have proposed several image quality metrics: a simple PSNR-inspired metric [9] and a more complicated metric that takes into account both local and structural changes [10], among other aspects. The downside of such metrics is that they are often much harder to implement than the PSNR metric, and do not enjoy the widespread use that PSNR does. During the course of the CBACD research, the algorithm in [9] was implemented, but the results did not differ significantly from PSNR, at least across the test images used. Given the lack of any widely accepted alternative to PSNR, this paper sticks with PSNR for measuring image distortion.

2 Overall Architecture

2.1 Encoding

The overall plan of the CBACD algorithm is not particularly novel. It starts by breaking the input image into blocks of 8 pixels on a side, runs a discrete cosine transform (DCT) on each block, and then progressively codes the bit-planes of the produced DCT coefficients using an entropy coding scheme. The first part of this process, the DCT, is a standard in many image compression schemes. The second part, encoding the produced coefficients by bit-planes, is likewise a fairly standard technique. Figure 2 gives a compact graphical representation of the encoding and decoding process.

The novelty the CBACD algorithm adds to this standard process lies in two areas:

• The context model used in the entropy encoding step

• The extrapolation of partially-transmitted coefficients in the decoding process


The details of the novel parts of the CBACD algorithm will be presented in the following sections. This section gives an overview of the standard steps involved in the CBACD implementation and, where relevant, points out the particular approaches used.

2.1.1 Header Contents

The first step of the encoding process is to output any meta-information about the original image, as well as any meta-information needed by the encoding algorithm. Collectively, this data forms the header of the compressed stream. In the case of the CBACD implementation, this is done by sending the uncompressed header of the input file. The header contents are not of particular interest in the CBACD algorithm, but the consideration of the header is important when comparing the CBACD algorithm to other compression techniques.

The BMP files that the CBACD implementation works with contain a large amount of header data, typically on the order of 1 KB. Most of the information in the BMP header is redundant, so the actual number of bytes necessary to send this information in a fully realized encoder would be significantly fewer. The compression of this data is of no interest to the CBACD algorithm. However, in order to compare the CBACD algorithm against other algorithms, the header must be taken into account, as the size of the header reduces the number of bytes available for the compressed image data, and hence affects the quality of the reconstructed image. The simplest thing to do would be to ignore the size of the header, effectively assuming that the CBACD algorithm could be implemented so as to require no header data. This is clearly not the case; information such as the height and width of the image is contained in the header. On the other hand, counting the full size of the uncompressed BMP header against the CBACD algorithm would unfairly disadvantage comparisons of the CBACD algorithm against coders that have optimized the transmission of header data. In order to find a compromise between these two extremes, the CBACD implementation counts a constant of 200 header bytes, regardless of the size of the actual header that is sent. The size of 200 bytes was chosen because implementations of the JPEG and JPEG-2000 standards, when run on single-pixel images containing representative header contents, produce output files of around 200 bytes. For example, when run on the Barbara image reduced to 1 pixel, the JPEG algorithm produces a 332 byte output, while the JPEG-2000 implementation produces a 193 byte output. As the modern JPEG-2000 standard can compress a single-pixel image into less than 200 bytes (including the image data itself), it seems quite reasonable to assume that the CBACD algorithm could easily be made to work with a 200-byte header.

2.1.2 DCT

Once the header information is sent, the image is broken into blocks 8 pixels on a side. To each block, a discrete cosine transformation (DCT) is applied. The purpose of the DCT, like all compression transforms, is to compact most of the information necessary to reconstruct


the image into a smaller space. If each block of 64 pixels is viewed as a vector, then we can express this vector in terms of the 64 basis vectors:

    b1 = (1, 0, 0, 0, . . . , 0)
    b2 = (0, 1, 0, 0, . . . , 0)
    b3 = (0, 0, 1, 0, . . . , 0)
    ...
    b64 = (0, 0, 0, . . . , 0, 1)

This is simply the standard way of thinking about an image, with each pixel value being independent. Here though, we are focusing on blocks of 64 pixels instead of the individual pixels. Viewed this way, each block of 64 pixels is easily written as a linear sum of these basis vectors, i.e. in the form ∑_{i=1}^{64} c_i·b_i for some coefficients c1, c2, . . . , c64. The downside to this expression for the block is that almost all of the c_i will be of approximately the same magnitude. Indeed, for 8-bit grayscale images we might expect the average c_i to be 128, while the maximum c_i can only be twice this much. The DCT reexpresses each block in terms of a different set of basis vectors. The basis vectors for the DCT are wave structures generated by independent horizontal and vertical cosine functions. Figure 3 gives a visual representation of the 64 basis elements of an 8 by 8 DCT.

Computing the DCT is quite easy. First, create an 8 by 8 matrix as follows:

    T_{i,j} = √(1/8) · cos((2j+1)iπ / (2·8))   for i = 0, 0 ≤ j < 8
    T_{i,j} = √(2/8) · cos((2j+1)iπ / (2·8))   for 1 ≤ i < 8, 0 ≤ j < 8

The DCT of an image block B (viewing the image block as an 8 by 8 matrix) is then T B T⁻¹. Using matrix multiplication is a relatively expensive way to compute the DCT, but it's very simple to implement, so this is what the CBACD implementation does.
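As a sketch of this computation (assuming the orthonormal matrix above, so that T⁻¹ is simply the transpose of T), the DCT of a block can be implemented with plain matrix multiplication:

```python
import math

def dct_matrix(n=8):
    """The n-by-n DCT matrix T from the formula above."""
    return [[(math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * j + 1) * i * math.pi / (2 * n))
             for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def dct2(block):
    """2-D DCT of a square block: T B T^T (T orthonormal, so T^{-1} = T^T)."""
    T = dct_matrix(len(block))
    return matmul(matmul(T, block), transpose(T))

# a flat block compacts into a single DC coefficient; every AC term is ~0
coeffs = dct2([[128] * 8 for _ in range(8)])
print(round(coeffs[0][0], 6))  # → 1024.0
```

Because T is orthonormal, the decoder's inverse DCT is just the reverse product Tᵀ C T.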

A slight downside of using the DCT is that it restricts processing to images whose dimensions are a multiple of the block size. In the case of CBACD, input images need to be a multiple of 8 pixels high and a multiple of 8 pixels wide. Several standard techniques are available when this is not the situation. All of these involve extending the image to the next multiple of the block width. This can be done by assigning pixels outside the bounds of the original image a constant value or, as is more frequently done, reflecting the image across the edge so as to retain any frequency characteristics. As the techniques for handling this situation are fairly standardized, and have little effect on image quality, the current CBACD implementation does not handle this; it simply rejects images that do not meet the restrictions.

Once the DCT has been performed on each block, there are two ways to index the coefficients. One way is to first index by block, then by which coefficient in the block. Alternately, we can group all coefficients located in the same position in different blocks and use this as the first index and the block as the second index. The set of coefficients having the same position


Figure 3: Basis elements for the 8 * 8 DCT

in a block is called a subband. Subband grouping is useful because coefficients in adjacentblocks are often correlated within the same subband.

The subbands in the DCT can be broken down into two groups. The first is the coefficient in the top left of the matrix for a block. This is referred to as the direct coefficient (DC), because its corresponding basis element has no wave characteristic (see Figure 3). The remaining 63 coefficients in a block are generically known as alternating coefficients (AC). The AC coefficients can be broken down into further subcategories. Figure 4 shows one way that this can be done, breaking the AC coefficients into three types, based on the behavior of the corresponding basis vector, as shown in Figure 3. There is no exact science to this secondary breakup of AC subbands. Other researchers split these up differently. In any case, the CBACD coder treats all subbands identically.

After each block has been transformed with the DCT, additional post processing is performed on the coefficients of the DCT. First, the average of the DC subband coefficients is subtracted from every DC subband coefficient. Since the DC coefficient is essentially a measure of the average pixel value in a block, all DC coefficients are positive. This contrasts with the coefficients in all other subbands, which are roughly symmetrically distributed around 0. By subtracting off the DC average, the characteristics of the DC subband are made to closely resemble those of the other 63 subbands. It also reduces the maximum possible DC coefficient by approximately a factor of 2. Since the DC coefficients are often the largest in an image, this has the effect of making coefficients relatively larger, as compared


[Figure 4 labels each subband position by type: direct (DC), horizontal (H), vertical (V), low-frequency (LF), and high-frequency (HF), with separate regions marked for the low frequency and high frequency subbands.]

Figure 4: DCT subband types

to the absolute value of the largest coefficient. The DC average is sent to the decoder as part of the header to enable it to reverse this process.

2.1.3 Bit-Plane Production

The next step of the encoding process is to break the DCT coefficients into bit planes. First, the coefficients are normalized to the range (−1, 1). This is done by dividing all coefficients by the magnitude of the maximum coefficient (plus a little extra so that the endpoints of the range (−1, 1) are excluded). The value that serves as the divisor in this step is known as the normalization constant. As the normalization constant is needed by the decoder, it forms part of the header, just as is done with the DC average.

The purpose of normalization is to place the decimal point at a known position in all coefficients. By normalizing all coefficients to be less than 1, the encoder avoids having to send any explicit information about where the decimal point lies in each coefficient. All that has to be sent is the fractional part of each coefficient. There are several ways to do this; in CBACD, as in many other coders, bit plane coding is used. Bit plane coding works by making multiple passes over all coefficients, sending a little bit of information about each coefficient on every pass. The simplest amount of information to send on each pass is one bit from the binary expansion of the coefficients. So the first bit-plane corresponds to telling whether each (normalized) coefficient is ≥ 1/2. The second bit-plane corresponds to telling whether 1/4 should be added to each coefficient, the third whether 1/8 should be added, and so forth. The signs of the coefficients form a special bit-plane where − maps to 0 and + to 1. Figure 5 shows the process of transforming the DCT coefficients into bit planes.
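The normalization and bit-plane split described above can be sketched as follows. This is a simplified illustration, not the CBACD code; the small epsilon is one way to keep the normalized range exclusive of ±1:

```python
def normalize(coeffs, eps=1e-9):
    """Divide by the largest magnitude (plus a little extra) so all
    coefficients land strictly inside (-1, 1)."""
    norm = max(abs(c) for c in coeffs) + eps  # the normalization constant
    return [c / norm for c in coeffs], norm

def split_bit_planes(coeffs, n_planes):
    """Split normalized coefficients into a sign plane (- -> 0, + -> 1)
    and n_planes bit planes from the binary expansion of |c|."""
    sign_plane = [0 if c < 0 else 1 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    planes = []
    for p in range(1, n_planes + 1):
        threshold = 2.0 ** -p  # 1/2, then 1/4, then 1/8, ...
        plane = []
        for i, m in enumerate(mags):
            bit = 1 if m >= threshold else 0
            if bit:
                mags[i] -= threshold  # consume the part already described
            plane.append(bit)
        planes.append(plane)
    return sign_plane, planes

# 0.625 = 0.101 in binary, so its first three bits are 1, 0, 1
signs, planes = split_bit_planes([0.625, -0.3], 3)
print(signs, planes)  # → [1, 0] [[1, 0], [0, 1], [1, 0]]
```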

Sending a bit plane is further broken down into two passes. The first pass is the significance pass. Bits sent during this pass are called significance bits. In this pass, one bit is sent for each coefficient that has not yet become significant. A coefficient is considered significant if it has a 1 in a higher bit plane. That is, if, looking only at the bits which have already been encoded, the coefficient is known to be non-zero. For example, if we are sending the


[Figure 5 shows six example coefficients (−1245.7…, +800.2…, +345.6…, −1.4…, −100.9…, −600.1…) being normalized to binary fractions in (−1, 1) and then split into a sign plane and successive magnitude bit planes.]

Figure 5: Transforming DCT coefficients into bit planes

[Figure 6 shows nine example coefficients in the middle of encoding bit plane 4, with each transmitted bit marked as either a significance bit or a refinement bit.]

Figure 6: Sending a single bit plane

4th bit plane of the coefficients illustrated in Figure 6, then the significance pass consists of sending one bit for each of coefficients 2, 4, 5, 6, 7, and 9. Coefficients 1, 3, and 8 are skipped as they are already significant. Embedded within the significance pass is the coding of the sign plane. Only when a coefficient becomes significant is its sign sent. In the example, the sign of coefficient 5 is coded directly after the significance bit for coefficient 5. By only coding the sign plane when it is needed, the coder can avoid sending large portions of the sign plane. This is due to the DCT compacting most coefficients to be near 0, which results in many coefficients never becoming significant during the encoding process.

The second pass of each bit-plane is the refinement pass. Not surprisingly, bits sent during this pass are called refinement bits. In this pass, the bits for coefficients which were previously significant are sent. In the example, one bit is sent for each of coefficients 1, 3, and 8. The reason for splitting the encoding process into two passes is that the effect of learning the value of a refinement bit on the reconstructed image is somewhat less than the effect of learning that a coefficient is significant.
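The two-pass structure can be sketched as below. This is an illustrative simplification: each bit plane is a flat list, significance state is a parallel boolean list, and the bit and sign values are made up for the example. As described above, a coefficient's sign is emitted right after its first 1 bit:

```python
def send_bit_plane(plane, signs, significant):
    """Split one bit plane into significance, sign, and refinement bits."""
    significance_bits, sign_bits, refinement_bits = [], [], []
    for i, bit in enumerate(plane):
        if significant[i]:
            refinement_bits.append(bit)      # refinement pass
        else:
            significance_bits.append(bit)    # significance pass
            if bit:
                significant[i] = True
                sign_bits.append(signs[i])   # sign coded only on the first 1
    return significance_bits, sign_bits, refinement_bits

# coefficients 1, 3, and 8 already significant, as in the Figure 6 example
significant = [i in (0, 2, 7) for i in range(9)]
plane = [1, 0, 0, 0, 1, 0, 0, 1, 0]   # hypothetical bits for this plane
signs = [0, 1, 1, 0, 0, 0, 1, 0, 1]   # hypothetical signs (- -> 0, + -> 1)
sig, sgn, ref = send_bit_plane(plane, signs, significant)
print(sig, sgn, ref)  # → [0, 0, 1, 0, 0, 0] [0] [1, 0, 1]
```

Six significance bits (coefficients 2, 4, 5, 6, 7, 9), three refinement bits (coefficients 1, 3, 8), and one sign bit for coefficient 5, which became significant in this plane.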

2.1.4 Entropy Coding with Context Modeling

The final part of the encoding process is to send each of the bits from the bit-plane coding process through an entropy coder in order to take advantage of any skew towards either 0 or 1 within the bits being coded. For example, under the expectation that only a few DCT coefficients are large, while most are close to 0, the first several significance bit planes will be dominated by 0s. Any form of entropy coding will take advantage of this, essentially by sending less than 1 bit for each 0 and more than 1 bit for every 1. The limit to which any entropy encoder can compress a symbol is given by − log₂(p), where p is the probability of the symbol. The average number of bits sent by the encoder per input symbol is easily given by summing this up across all symbols:

    ∑_symbol − p_symbol · log₂(p_symbol)

Capturing the probability that the next bit coded will be a 0 or 1 is done through context modeling. The idea behind context modeling is quite simple. Bits to be encoded are grouped by some function. The grouping function can only make use of the information that has already been sent, as the decoder must use the same grouping function. Each group forms a context, and every context maintains frequency counts for the number of 0s and 1s sent using that context. As each bit is coded, the frequency counts for the context it belongs to are updated. The frequency counts are then used to predict the probability distribution for future bits, using the obvious formula.
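A context in this scheme reduces to a pair of frequency counts. The sketch below is an illustration, not the CBACD code; the add-one smoothing (so neither probability is ever 0) is one common choice for "the obvious formula". It also tallies the ideal entropy-coded cost − log₂(p) of each bit:

```python
import math

class Context:
    """Frequency counts of the 0s and 1s coded in one context."""
    def __init__(self):
        self.counts = [1, 1]  # add-one smoothing: never predict probability 0

    def prob(self, bit):
        return self.counts[bit] / (self.counts[0] + self.counts[1])

    def update(self, bit):
        self.counts[bit] += 1

def model_cost(bits):
    """Ideal arithmetic-coded length of a bit stream under one adaptive context."""
    ctx, total = Context(), 0.0
    for b in bits:
        total += -math.log2(ctx.prob(b))  # entropy-coder limit for this bit
        ctx.update(b)
    return total

# a heavily skewed stream (95% zeros) codes in far fewer than 100 raw bits
cost = model_cost([0] * 95 + [1] * 5)
```

An arithmetic coder driven by these probabilities approaches this total as the stream grows; the model adapts, so the first bit always costs exactly 1 bit and later bits cost less as the skew is learned.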

The specific form of entropy encoding used by CBACD is arithmetic coding. The strength of arithmetic encoding is that it approaches the theoretical entropy coding limit as the length of the encoded sequence goes to infinity. The only downside is that it is computationally expensive, which can certainly prove to be a limit in real-time or embedded situations, but is hardly a limitation for a research project.

2.2 Decoding

Decoding is essentially just the reverse of the encoding process, as Figure 2 shows. First the header is read from the compressed data stream, followed by the normalization constant and DC average. Then, the bit planes of the DCT coefficients are decoded, using the same context modeling and arithmetic coding scheme that is used in the encoding process. The bit-planes are then recombined to form the DCT coefficients. At this point, the only difference between the encoding and decoding process occurs. The difference involves the extrapolation of DCT coefficients to the low-significance bit-planes that were not present in the compressed data stream. The basic idea of coefficient extrapolation is to guess at the values in these low-significance bit-planes, so as to reduce the distortion in the reconstructed


    Bit Type       Barbara (0.25)   Barbara (1.0)   Lena (0.25)   Pentagon (0.25)
    Significance   1038752          1552934         1020167       3512034
    Sign           11632            48194           11558         51194
    Refinement     9056             44702           8521          9865

Table 1: Bit distribution between significance, sign, and refinement bits, as encountered in representative encoding runs (bit rates in bpp)

image. The manner in which this is done, and its effect on the output image quality, is discussed in detail in Section 4. Once the DCT coefficients have been extrapolated, they are re-normalized (multiplied by the normalization constant), and the DC average is added back to all DC coefficients. An inverse DCT is run on each block to reconstruct the pixel values of the image, and this is recombined with the transmitted image header to give the entire reconstructed image.

3 Context Modeling in CBACD

The main innovation of the CBACD coder is the context modeling that drives the arithmetic coder. Arithmetic coding is used to encode all parts of the bit-planed coefficients: significance bits, sign bits, and refinement bits. The extent to which arithmetic coding can affect the overall result is not equal between these types. During a normal encoding/decoding session, the number of significance bits that are encoded dwarfs the number of sign and refinement bits. Table 1 shows the number of each type of bit that is sent during typical encoding runs. In addition to dominating in the number of bits sent, the distribution of bits is also at its most skewed in significance bits, so proper context modeling of these bits can have a proportionally larger effect than the table indicates. This does not mean that it is unimportant to model the refinement or sign bits, just that the largest gains from better context modeling for these bits are likely to be canceled out by slight inefficiencies in modeling the significance bits.

3.1 Group Testing

Before arithmetic coding is applied, CBACD makes limited use of group testing. Many coders, such as EZW [7], SPIHT [6], and GTW [2], are based entirely on group testing. As observed by Hong and Ladner [2], the zerotree methods used in EZW and SPIHT are essentially special group testing algorithms. The basic idea behind group testing is to check several values at once. If all values in the group are 0, then a single 0 bit is written to the output. If one or more of the values in the group is 1, then the bit 1 is written to the output. The group is then divided into 2 or more pieces and the process is repeated recursively, or another coding process is applied to all elements of the group. Because group testing allows a large number of 0s to be encoded with a single bit, it is very efficient when
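The recursive form of group testing just described can be sketched as follows, using binary splits (one of the "2 or more pieces" options). This is an illustration of the general technique, not the CBACD code:

```python
def group_test(bits):
    """Encode a list of bits: a single 0 for an all-zero group; otherwise
    a 1 followed by the encodings of the two halves (single bits sent raw)."""
    if len(bits) == 1:
        return list(bits)
    if not any(bits):
        return [0]  # one output bit stands in for the whole group
    mid = len(bits) // 2
    return [1] + group_test(bits[:mid]) + group_test(bits[mid:])

# an all-zero group collapses to a single bit; a sparse group still shrinks
print(group_test([0, 0, 0, 0, 0, 0, 0, 0]))       # → [0]
print(len(group_test([0, 0, 0, 0, 0, 0, 0, 1])))  # → 7
```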


                Bit rate (bpp)
Image       0.0625  0.125  0.25  0.5   1.0
Barbara      0.02    0.03  0.02  0.01  0.00
Boat         0.09    0.06  0.08  0.04  0.01
Lena         0.14    0.10  0.07  0.01  0.01
Mandrill     0.04    0.03  0.03  0.01  0.00
Pentagon     0.02    0.01  0.01  0.00  0.00
Stream       0.02    0.05  0.02  0.01  0.01
Average      0.05    0.05  0.04  0.01  0.01

Table 2: PSNR improvement (dB) due to group testing all coefficients in each subband. The effect is most pronounced at low bit rates, as group testing is only really effective within the first 2 or 3 bit planes.

most of the bits to be encoded are 0. On the other hand, entropy coding methods, and in particular arithmetic coding, are equally effective regardless of the distribution of 0s and 1s in the input, and are theoretically capable of being at least as good as any group-testing scheme. However, in order to reach peak efficiency, the context modeling must provide a highly accurate prediction of the probability that a bit is a 0 or a 1. The effect of even a slightly inaccurate probability can be large, particularly when the bit distribution is highly skewed one way or the other. Initially the CBACD coder did not use any group testing. However, it was observed that it typically takes 4 to 10 bits to send a subband consisting of all 0s in a 512 by 512 pixel image (i.e. a subband with 4096 coefficients), so the group testing phase was added.

In order to take advantage of group testing, for each bit plane and subband, the coder first runs a group test to see if any of the DCT coefficients in the subband are significant. If this group test indicates that one or more is significant, the coder encodes significance information for every coefficient in the subband. Otherwise the significance pass for the subband is skipped. All group testing bits are sent using a single context. Pragmatically, this use of group testing turns out to give a sizeable increase in PSNR, as it reduces these 4-10 bits to 1 (actually slightly less on average, as the group testing bits are themselves assigned to a context and arithmetically encoded). Table 2 shows the PSNR improvement this yields. As the table shows, the effect of this use of group testing on image distortion is significantly greater at low bit rates. This is not really surprising. The group testing phase only reduces the number of bits sent when there are no significant coefficients in a subband. When 1 or more coefficients are significant, this technique actually sends one bit more than it would otherwise. As a subband is much more likely to have no significant coefficients in the first bit planes, and these bit planes make up a proportionally greater amount of the compressed bit stream when the bit rate is low, the effect is most pronounced at low bit rates.
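The per-subband group test described above can be sketched as follows. This is an illustrative helper, not the actual CBACD code; in the real coder each emitted bit would itself be assigned to a context and arithmetic coded.

```python
def group_test_subband(bits):
    """Group-test the significance bits of one subband for one bit plane.

    `bits` holds the significance bit (0 or 1) of every coefficient in
    the subband at the current bit plane.  An all-zero subband costs a
    single 0 bit; otherwise a 1 is emitted followed by every individual
    significance bit.
    """
    if any(bits):
        return [1] + list(bits)  # group test fails: send all the bits
    return [0]                   # whole subband insignificant: one bit
```

An all-zero 4096-coefficient subband thus costs one (arithmetic-coded) bit rather than 4 to 10, at the price of one extra bit whenever the subband does contain a significant coefficient.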


[Figure 7 diagram. Legend: the coefficient being encoded; intra-block/inter-subband coefficients; inter-block/intra-subband coefficients for which information from the current bit-plane is available; and inter-block/intra-subband coefficients for which only information from the previous bit-plane is available.]

Figure 7: Coefficients used in picking a context to encode a significance bit. Note that for some coefficients, significance information is already known for the current bit plane, while for others it is known only for the previous bit plane.

3.2 Modeling significance bits

At low bit rates, significance bits are by far the most numerous type of bit coded, so it makes sense to have a relatively complicated model for them, backed by a large number of contexts, and this is what CBACD does. There are essentially two areas of correlation that can be used in assigning significance bits to contexts, namely frequency correlation between basis elements and spatial correlation between blocks in the image. Frequency correlation is known as either inter-subband or intra-block correlation, while spatial correlation is known as either intra-subband or inter-block correlation. Figure 7 shows the coefficients that are used to derive the context for encoding a significance bit, in relation to where they occur within the DCT blocks of the image.

CBACD takes a very simple approach when using these two types of correlation to determine the context for a significance bit. The context for a significance bit is assigned based on the weighted sum of two counts. The first count measures intra-subband correlation; it is itself weighted by a distance function that measures how far away each block involved is from the block being coded. The second count measures intra-block correlation by simply counting how many of the other coefficients in the block are significant. The DC subband is excluded from this count, as it is almost always significant and has little intra-block correlation with the other subbands. These two counts are weighted and added together to give a function that represents (in some manner) the probability of the


significance bit being a 1. Expressed mathematically:

    f(sub, x, y) = c_spatial * Σ_{i,j} isSig(sub, x+i, y+j) / dist(i, j)
                 + c_frequency * Σ_{s=1}^{63} isSig(s, x, y)

where sub is the subband being coded, and x and y are the horizontal and vertical positions of the block that the coefficient comes from. The function isSig(sub, x, y) determines whether the coefficient in the sub subband of the DCT block at location (x, y) is known to be significant, based on the bits of it that have been previously coded. From f(sub, x, y), a context index is generated by rounding to an integer through min(⌊f(sub, x, y) + c_floor⌋, maxIndex). The maxIndex restriction is applied in order to keep the number of contexts needed fairly small, while the c_floor value can essentially be thought of as reducing the size of the first context relative to the other contexts.

This basic formula provides plenty of room for trial and error experimentation in order to find choices that optimize image quality. Four areas of the basic formula can be easily tweaked, namely:

• The values of the constants c_spatial, c_frequency, and c_floor.

• The maxIndex value.

• The distance function dist(i, j).

• The size and shape of the area over which Σ_{i,j} isSig(sub, x+i, y+j) / dist(i, j) is computed. In other words, the set of pairs (i, j) over which the sum is computed.

There is no mathematical way to determine optimal values for these choices. By a trial and error process, the following were arrived at as giving good results for the CBACD coder.

• For the constants, c_spatial = 1.5, c_frequency = 0.225, and c_floor = 0.375.

• For the maxIndex value, a value of 4 was chosen, giving 5 contexts per subband. More contexts give somewhat improved results at high bit rates (1.0 bpp or higher), but lead to slightly lower PSNR values at low bit rates (0.25 bpp or lower).

• The distance function was set to be the square of the planar distance between the two blocks, i.e. dist(i, j) = i² + j². Several other formulas were tried, such as Manhattan distance, planar distance, and maximum of vertical and horizontal distance, with varying success.

• The area over which the intra-subband correlation count is performed consists of all blocks within a Manhattan distance of 3 from the block containing the coefficient being coded.

Each subband is assigned its own set of 5 contexts for modeling significance bits. This gives a total of 320 contexts involved in modeling significance bits.
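A minimal sketch of this context selection follows, using the constants quoted above. The `is_sig` predicate is a stand-in for the coder's significance map (assumed to return False for out-of-range queries, i.e. the classic zero-extension boundary handling); all names and structure here are illustrative, not the actual implementation.

```python
# Constants quoted in the text; MAX_INDEX = 4 gives 5 contexts per subband.
C_SPATIAL, C_FREQUENCY, C_FLOOR, MAX_INDEX = 1.5, 0.225, 0.375, 4

def significance_context(is_sig, sub, x, y, radius=3):
    """Return a context index in 0..MAX_INDEX for coefficient (sub, x, y)."""
    # Intra-subband (spatial) count: the same subband in neighboring
    # blocks, weighted by squared planar distance, over blocks within a
    # Manhattan distance of `radius`.
    spatial = 0.0
    for i in range(-radius, radius + 1):
        for j in range(-radius, radius + 1):
            if (i, j) != (0, 0) and abs(i) + abs(j) <= radius:
                if is_sig(sub, x + i, y + j):
                    spatial += 1.0 / (i * i + j * j)
    # Intra-block (frequency) count: the 63 AC subbands of the same
    # block (the DC subband is excluded, as in the text).
    freq = sum(1 for s in range(1, 64) if is_sig(s, x, y))
    f = C_SPATIAL * spatial + C_FREQUENCY * freq
    return min(int(f + C_FLOOR), MAX_INDEX)  # f >= 0, so int() floors
```

With no known-significant neighbors the index is 0; a single significant same-subband neighbor in an adjacent block already pushes it to 1.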

3.2.1 Context Dilution

With 320 contexts involved in modeling significance bits, there is a definite possibility of context dilution. This issue was explored by grouping subbands together in order to reduce


the number of contexts. A variety of groupings were tried, all of which group subbands with similar frequency characteristics. The eight groupings tried are shown in Figure 8. Each of the groupings reduces the number of contexts used for encoding significance bits by a factor of 3 to 64. The groupings work as follows:

• Grouping (a) is the grouping initially used in CBACD. Each subband is assigned its own set of 5 contexts; no subbands are grouped together. A total of 320 contexts are used.

• Grouping (b) was used to establish a baseline for comparison. All subbands are grouped together to use one set of 5 contexts. A total of 5 contexts are used.

• Grouping (c) attempts to group subbands based on the overall frequencies of the basis elements of the subbands. Subbands are assigned to contexts based on their straight-line distance from the DC subband. A total of 50 contexts are used.

• Grouping (d) is the same as (c), but additionally separated on whether the higher frequency component of the basis element for the subband is vertical or horizontal. A total of 95 contexts are used.

• Grouping (e) is done by making each diagonal a group. An alternate formulation of this grouping is by the Manhattan distance from the DC subband. A total of 75 contexts are used.

• Grouping (f) is done by grouping all basis elements with the same maximum frequency (either horizontal or vertical). A total of 40 contexts are used.

• Grouping (g) is the same as (f), but additionally separated on whether the higher frequency of the basis element is vertical or horizontal. A total of 75 contexts are used.

• Grouping (h) groups based broadly on the appearance of the basis elements. The groups are taken from Tu and Tran's CEB coder [8]. A total of 35 contexts are used.

The effect on image quality due to the various groupings is shown in Table 3. While these results show that some context dilution occurs when each subband is modeled individually, the effect is very slight. None of the alternate groupings does significantly better than the original CBACD coder. Indeed, each of the groupings produces a lower-quality output for at least one of the test images. Using too few contexts, as is done in the single grouping, can have a large negative effect, but using too many contexts is not a problem. Context dilution does not appear to be a practical concern, at least with the model used in CBACD.

Since no grouping emerged as a clearly compelling alternative to the original CBACD implementation, no change was made to the CBACD coder as a result of this study.

3.3 Modeling sign bits

Sign bits are the second most frequently encountered bit type. The CBACD coder uses basically the same plan for choosing a context for sign bits as it uses for significance bits. For sign bits, however, there is no intra-block correlation; only intra-subband correlation is


[Figure 8 shows eight 8 by 8 maps assigning the 64 DCT subbands to context groups: (a) Basic (original CBACD), (b) Single group, (c) Circular, (d) Circular with horizontal/vertical split, (e) Diagonal/Manhattan distance, (f) Max frequency, (g) Max frequency with horizontal/vertical split, (h) Broad types.]

Figure 8: Groupings used to investigate context dilution within the context modeling for significance bits.

                          Grouping
                         (a)      (b)     (c)     (d)     (e)     (f)     (g)     (h)
Barbara (0.0625 bpp)   22.196  -0.043  +0.005  +0.004  +0.007  +0.005  +0.004  +0.011
Barbara (0.125 bpp)    24.460  -0.082  +0.018  +0.017  +0.019  +0.014  +0.017  +0.016
Barbara (0.25 bpp)     27.447  -0.120  +0.022  +0.024  +0.013  +0.017  +0.027  +0.002
Barbara (0.5 bpp)      31.454  -0.335  +0.019  +0.028  +0.007  +0.006  +0.032  -0.044
Barbara (1.0 bpp)      36.366  -0.353  -0.012  +0.010  -0.034  -0.028  +0.013  -0.116
Boat (0.25 bpp)        28.627  -0.136  -0.003  -0.001  +0.023  -0.007  -0.005  -0.002
Lena (0.25 bpp)        31.290  -0.100  +0.036  +0.038  +0.033  +0.034  +0.037  +0.020
Mandrill (0.25 bpp)    22.847  -0.089  -0.007  +0.000  +0.004  -0.013  -0.005  -0.009
Pentagon (0.25 bpp)    28.264  -0.180  -0.003  -0.003  -0.001  -0.008  -0.008  -0.028
Stream (0.25 bpp)      24.544  -0.143  -0.007  -0.009  +0.021  -0.011  -0.013  -0.008
Average                        -0.158  +0.007  +0.011  +0.009  +0.001  +0.010  -0.016
Average (0.25 bpp)             -0.128  +0.006  +0.008  +0.016  +0.002  +0.006  -0.004

Table 3: Effect on image quality (measured in PSNR) of grouping subbands for significance context modeling. Column (a) is the PSNR (dB) achieved by the original CBACD coder. The numbers in the other columns are the PSNR change relative to this value when the other subband groupings are used.


[Figure 9 diagrams: (a) horizontal edges, (b) vertical edges, (c) corner.]

Figure 9: Reflections used for handling requests for information in blocks that are outside the bounds of the image. For the corner case, blocks that are horizontally or vertically adjacent to some block in the image are handled just as in the horizontal and vertical situations. Blocks that are in the corner region are handled by reflecting twice, once horizontally and once vertically.

used, and the count is over a slightly smaller area: only those blocks within a Manhattan distance of 2 of the block where the coefficient being coded lies. Mathematically, the function used is:

    f(sub, x, y) = Σ_{i,j} sign(sub, x+i, y+j) / dist(i, j)

where the sign(sub, x, y) function determines the known sign of the coefficient in the sub subband of the (x, y) DCT block. If the sign is not yet known, it returns 0; otherwise it returns -1 or 1. The only difference from significance bits is that the sum can be negative, so producing a context index from the function value is done by simply truncating the part after the decimal point. Additionally, the context index produced in this way is clamped to between -4 and 4, inclusive, for a total of 9 contexts. Unlike for significance bits, only one set of contexts is used for all subbands. Modeling sign bits in this manner typically leads to a PSNR improvement of around 0.05 dB. However, it should be noted that almost all of this improvement is due to modeling signs in the DC subband. If only the DC subband is coded with this model, while all other sign bits are coded in a context fixed at a 1:1 ratio, then the PSNR improvement is still right around 0.05 dB.
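The sign-context computation can be sketched the same way. The `sign` lookup (returning -1, 0, or +1, with 0 for unknown or out-of-range) and the squared-planar-distance weighting are assumptions for illustration; the text only names a generic dist(i, j).

```python
def sign_context(sign, sub, x, y, radius=2):
    """Return a sign-bit context index clamped to -4..4 (9 contexts).

    `sign(sub, x, y)` returns -1, 0, or +1; 0 when the sign is not yet
    known or the block lies outside the image.  The same squared planar
    distance weighting as for significance bits is assumed here.
    """
    f = 0.0
    for i in range(-radius, radius + 1):
        for j in range(-radius, radius + 1):
            if (i, j) != (0, 0) and abs(i) + abs(j) <= radius:
                f += sign(sub, x + i, y + j) / (i * i + j * j)
    ctx = int(f)                  # truncate the fractional part
    return max(-4, min(4, ctx))   # clamp to the 9 available contexts
```

Positive neighbors push the index up, negative neighbors push it down, and a context of 0 corresponds to no useful prediction.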

3.4 Boundaries

An inherent condition of both the significance and sign context modeling is the need to deal with the boundaries of the image. This issue arises because these two models rely on coefficients in neighboring blocks. For blocks near the edge of the image, these neighboring blocks are non-existent. The classic way to deal with this is simply to assume that all coefficients in blocks outside the boundaries of the image are 0: they are never considered to be significant, and their sign is neither positive nor negative. Another possible way to deal with boundaries is to reflect the image across the border. By reflecting blocks across the boundaries of the image, the context modeling for coefficients in blocks at the edges of the


                 Bit rate (bpp)
Image       0.0625    0.125    0.25     0.5      1.0
Barbara    +0.0018  -0.0018  -0.0011  -0.0008  -0.0005
Boat       +0.0024  -0.0016  -0.0005  +0.0000  +0.0003
Lena       -0.0102  -0.0040  -0.0055  -0.0007  -0.0009
Mandrill   -0.0012  -0.0008  -0.0033  -0.0008  -0.0007
Pentagon   -0.0009  -0.0005  -0.0002  -0.0002  -0.0002
Stream     -0.0018  -0.0017  -0.0012  -0.0005  -0.0003
Average    -0.0016  -0.0017  -0.0020  -0.0005  -0.0004

Table 4: PSNR change (dB) due to reflecting across boundaries when finding a context for sign and significance bits. Although there are rare instances when this technique improves the image quality, the majority of the time it results in a slight decrease.

image should be slightly improved. The CBACD coder was modified to try this out, with the image being reflected at the middle of the edge blocks, as shown in Figure 9. The image is not mirrored at the exact edge because, while this makes sense for blocks not on the edge, it does not make sense for the edge blocks themselves, and these are the blocks most impacted by the boundary handling. Although there is sound theoretical reasoning for why this idea might positively impact image quality, in practice there is almost no effect on the quality of the reconstructed image. Indeed, the image quality is, on average, very slightly degraded by reflecting across the boundaries, as Table 4 shows.

3.5 Modeling refinement bits

Refinement bits are the least common of the three types encoded. Furthermore, they are very close to being evenly split between 0s and 1s, and have the least correlation with neighboring coefficients with which to predict their value. However, there is a slight preference towards refinement bits being 0, and this can be exploited by the arithmetic coder if appropriate contexts are used. This preference towards 0 is due to the fact that the distribution of DCT coefficients is skewed towards coefficients near 0.

This skew can be used to predict that a 0 is slightly more likely than a 1 for any refinement bit. The question is how to group refinement bits in order to obtain good predictions of how much more likely a 0 is than a 1. The CBACD coder takes a fairly simple approach to solving this problem. It assumes that the skew between 0s and 1s is solely a function of the number of bits sent since the coefficient became significant. If there were no correlation among refinement bits in neighboring blocks and the distribution of DCT coefficients were exponential, such an approach would be optimal. Neither of these assumptions is exactly true. As the charts in section 4 show, DCT coefficients are not distributed according to an exponential function; however, over small portions of the distribution, an exponential function fits quite well. Secondly, refinement bits do have inter-block correlation for low-frequency AC subbands and particularly the DC subband.
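A small worked example shows why the depth-only assumption is reasonable under an exponential model: if the coefficient magnitude is known to lie in an interval of width w and the density is proportional to exp(-λx), the probability that the next refinement bit is 0 depends only on w (which halves with each refinement bit), not on where the interval starts. The function below is an illustration of this reasoning, not part of the coder.

```python
import math

def p_refinement_zero(lam, width):
    """P(next refinement bit = 0) when the coefficient magnitude is known
    to lie in [a, a + width) under a density proportional to exp(-lam*x).

    Integrating the density over the lower and upper halves of the
    interval, the common factor exp(-lam*a) cancels, leaving a value
    that depends only on the interval width.
    """
    half = width / 2.0
    return (1.0 - math.exp(-lam * half)) / (1.0 - math.exp(-lam * width))
```

The result is always above 1/2 and shrinks toward 1/2 as the interval width halves, matching the falling "% 0s" column in Table 5.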


Barbara encoded at 0.25 bpp:
Refinement bit depth     # 0s    # 1s    % 0s
1                        3732    1872    66.60
2                        1706    1221    58.28
3                         731     568    56.74
4                           0       0
5                           0       0
6                           0       0

Barbara encoded at 2.0 bpp:
Refinement bit depth     # 0s    # 1s    % 0s
1                       26758   14575    64.74
2                       13737    9511    59.09
3                        3610    5324    54.23
4                        2927    2677    52.23
5                        1491    1436    50.93
6                         649     650    49.96

Table 5: Number of bits encoded in each refinement context. For Barbara encoded at 0.25 bpp (first table), there are no counts for refinement bits at depths 4, 5, and 6 because only 4 bit-planes were encoded at this bit rate. The second table shows Barbara encoded at 2.0 bpp.

                Bit rate (bpp)
Image       0.0625  0.125  0.25   0.5    1.0
Barbara      0.006  0.017  0.026  0.051  0.056
Boat         0.017  0.017  0.030  0.031  0.023
Lena         0.016  0.020  0.023  0.011  0.020
Mandrill     0.022  0.015  0.019  0.026  0.051
Pentagon     0.023  0.019  0.016  0.019  0.031
Stream       0.004  0.022  0.011  0.010  0.025
Average      0.015  0.018  0.021  0.025  0.034

Table 6: PSNR improvement (dB) due to context modeling of refinement bits. The effect increases as the bit rate increases, due to the larger percentage of the total bit stream that is composed of refinement bits.

In the CBACD coder, a different context is used for each number of bits sent since the coefficient became significant. Table 5 shows the number of 0s and 1s encoded in each of these refinement contexts for a typical run. All subbands are grouped together for this context modeling, although the skew may be different in different subbands. Grouping all subbands together was chosen in order to prevent context dilution: given the relatively small number of refinement bits that are coded, a dilution factor of 64 in order to break up coefficients by subband would be counterproductive. This scheme has the potential to use up to as many contexts as there are bit planes, which in the case of the CBACD implementation is 52. In practice, however, it is very rare to code beyond the 8th bit plane, even at very high bit rates of 2 bpp or higher. So the actual number of refinement bit contexts used to encode any image is normally between 2 and 8, depending on the image and the compression rate. Overall, the effect of modeling refinement bits on image quality is minimal, typically only a couple of hundredths of a dB of PSNR. However, the effect is always a positive one. Table 6 shows the PSNR improvement due to encoding refinement bits using this context model.


3.6 Subband Scan Order

The entire bit-planing process depends on coefficients being coded according to some ordering function. This order may either be computed dynamically based on previously encoded information, as is done in PACW, or it may follow a static ordering. Many compression algorithms that use the DCT make use of a static scan order over the DCT subbands, coding all coefficients in a subband (for each bit plane) before moving on to the next subband. Subbands are typically ordered starting with the DC subband and progressing to higher frequency subbands. Most coders make use of some sort of diagonal scan order across the subbands. A diagonal scan order ensures that lower frequency subbands are encoded before higher frequency subbands. Some typical examples are the zig-zag ordering used in JPEG and a straight diagonal ordering in which all diagonals are coded in the same direction.

The CBACD coder initially used a slightly modified diagonal scan order, based on the observation that subbands corresponding to primarily horizontal and vertical basis elements are slightly more likely to contain significant coefficients than subbands with high-frequency components in both directions. To take advantage of this, the initial CBACD coder encoded subbands in an inward diagonal order: each diagonal was done before the next, but working inward from both ends.

Later research focused on optimizing this subband order. In order to benefit the most from entropy coding, subbands should be scanned in an order that maximizes the number of 1s sent. This is simply a reflection of the fact that in the progressive coding scheme, the decoder will assume 0s for any data which is not sent; from the point of view of improving image quality, sending a 0 adds no information that changes the decoded image. Therefore, an optimal ordering for subbands is simply the one that orders them by decreasing number of coefficients which become significant within the bit planes coded. In order to determine this ordering, a sample of 230 images was taken from the CBIRT Ground Truth [5] database at the University of Washington. The number of significant coefficients (within the first 5 bit-planes) was computed, and this was used to create an optimal subband scanning order. The downside to using the CBIRT Ground Truth database is that the images in it have been previously compressed with JPEG, so the order obtained in this way may reflect the JPEG quantization tables more than anything else.
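Deriving the ordering then amounts to a sort by significance counts. A sketch follows; the DC-first convention and the tie-breaking rule are assumptions made for illustration, not details given in the text.

```python
def optimal_scan_order(sig_counts):
    """Derive a static subband scan order from training statistics.

    sig_counts[s] is the number of coefficients in subband s (0..63)
    that became significant within the bit planes coded, summed over a
    set of training images.  The DC subband is kept first; the 63 AC
    subbands follow in decreasing count order, ties broken by subband
    index for determinism.
    """
    ac = sorted(range(1, 64), key=lambda s: (-sig_counts[s], s))
    return [0] + ac
```

Running this over the training set's counts yields a permutation of the 64 subbands like the one shown in Figure 10(d).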

This optimized ordering was then compared to the other orderings. Figure 10 shows all 4 subband scanning orders that were tested. Table 7 shows the effect that using each of the subband scan orders has on image quality. None of the subband scan orders has any particular advantage or disadvantage over the others, except on the Lena image, where the inward diagonal order does significantly worse than any of the others.

3.7 Inter-plane Prediction and Context Trimming

The contexts used in CBACD each maintain a frequency count of the 0s and 1s previously encoded using that context. These frequency counts are what allow each context to respond to any skew in the bits being compressed, which in turn allows the arithmetic coder to function.

[Figure 10 shows four 8 by 8 maps giving the position of each subband in the scan: (a) zig-zag scan order, (b) straight diagonal scan order, (c) inward diagonal scan order, (d) optimal order based on decreasing significance counts.]

Figure 10: Four subband scanning orders. The first two are standard scan orders, while the last two were developed for the CBACD coder.

                       Subband scan order
                     Optimized  Straight Diagonal  Zig-Zag  Inward Diagonal
Barbara (0.0625)       22.206        +0.000        +0.011       +0.004
Barbara (0.125)        24.476        +0.001        +0.003       +0.001
Barbara (0.25)         27.472        -0.001        +0.005       -0.001
Barbara (0.5)          31.483        -0.003        -0.016       -0.004
Barbara (1.0)          36.376        -0.012        -0.008       -0.012
Boat (0.25)            28.627        -0.001        +0.020       +0.004
Lena (0.25)            31.329        -0.001        +0.008       -0.026
Mandrill (0.25)        22.845        +0.004        -0.021       +0.002
Pentagon (0.25)        28.260        -0.001        -0.007       -0.010
Stream (0.25)          24.535        -0.000        -0.007       +0.002
Average                27.761        -0.001        -0.001       -0.004
Average (0.25 bpp)     27.178         0.000         0.000       -0.005

Table 7: Effect on image quality of subband scan order. The first column is the PSNR achieved by the final CBACD coder. The numbers in the other columns are the PSNR change relative to this value when the other subband scan orders are used.

Since CBACD encodes information on a bit plane by bit plane basis, the question of how to preserve frequency counts across bit planes arises. At one extreme, all frequency information can be preserved across bit planes. At the other extreme, no frequency information is preserved, resetting each context to an initial frequency of a single 0 bit and a single 1 bit. Not surprisingly, the best answer lies somewhere between these extremes: throw out most of the frequency information, but keep enough to seed the context for the next bit plane. This is done by simply multiplying the frequencies by a constant 0 < α ≤ 1. The frequency of each symbol p for the following bit plane is set to f_{i+1}(p) = ⌈α f_i(p)⌉. By trimming contexts, the frequency information for the following bit plane is seeded with enough information that coding the first few bits is not needlessly expensive, while still allowing the frequency information to quickly converge to a new skew if it differs from the skew in the previous bit plane. In the PACW coder from which CBACD evolved, a value of 0.15 for α was found to be best. Limited testing confirmed that 0.15 was also at or very near optimal in the CBACD coder, with a typical improvement of 0.1-0.2 dB over not trimming contexts.
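The trimming rule itself is a one-liner; the sketch below assumes contexts stored as symbol-to-count maps, a representation chosen for illustration.

```python
import math

def trim_context(freqs, alpha=0.15):
    """Scale a context's frequency counts between bit planes:
    f_{i+1}(p) = ceil(alpha * f_i(p)).

    The ceiling keeps every count at least 1, so the old skew survives
    only as a small seed and the context can re-adapt quickly if the
    next bit plane's statistics differ.
    """
    return {sym: math.ceil(alpha * count) for sym, count in freqs.items()}
```

For example, a context that saw 10 zeros and 2 ones carries forward counts of 2 and 1, preserving a mild bias toward 0 without locking it in.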

The CBACD coder further attempted to improve on this process by only trimming contexts down to a minimum frequency count. This was supposed to ensure that a minimal amount of frequency information was retained across bit planes. However, experimentation revealed that any minimum frequency requirement actually adversely affected the coder.

4 Coefficient Extrapolation

A necessary part of the decoding process is coefficient extrapolation. Once the decoder has received the first few bit planes of every coefficient, it needs to make a guess at the remaining bits of each coefficient: the bits that could not be encoded because the transmission rate limit was reached. Strictly speaking, this is not a necessary part of the process; the decoder could simply assume that all unsent bits have a value of 0. This, however, is a poor choice. What we want the decoder to do is choose a value that minimizes the average difference between the chosen value and the distribution of all possible values the coefficient could have (i.e. the distribution of all coefficients that have the same first few bits as the coefficient in question). The most natural choice following from this description is to place the coefficient at the midpoint of the range it could possibly fall in. This would be the optimal choice if the distribution of DCT coefficients were uniform. However, the DCT actually makes most coefficients close to 0, so the distribution that the minimization should be done against is a distribution skewed towards 0. In order to find this minimizing value, the decoder needs to know something about the coefficient distribution.

There are essentially two ways that the decoder can model the DCT coefficient distribution:

• The encoder sends some small number of parameters that describe the coefficient distribution, and the decoder then recomputes the distribution from these parameters.

• The decoder assumes that all images have a similar coefficient distribution, and knowledge of this distribution is hard-wired into the decoder.

[Figure 11: four log-scale histograms, one each for (a) Barbara, (b) Lena, (c) Pentagon, and (d) Stream, plotting number of coefficients against normalized coefficient magnitude from 0 to 0.9.]

Figure 11: Distribution of normalized DCT coefficients within four of the test images. Note that the coefficient counts are on a log scale; the distribution is weighted towards coefficients near 0 much more than these figures indicate.

The downside of the first method is twofold. First, the distribution must be expressible with a small number of parameters. Second, these model parameters must be sent to the decoder, which takes up valuable bits in the compressed stream. The downside of the second method is that it cannot adapt to image-specific coefficient distribution characteristics. By plotting histograms of the DCT coefficients from several images, as is done in Figure 11, we can see that the distribution of DCT coefficients is very similar across most images. Since this is the case, the static model works quite well, and it is what CBACD uses.

The model used in CBACD was gathered by processing the images that make up the CBIRT Ground Truth [5] database. For every possible pairing of subband, bit-plane at which the coefficient becomes significant, and last bit-plane received, the value was found that minimizes the average distance against the distribution consisting of all coefficients from the database images (that meet the three restrictions). The values found in this manner are then built into the decoder and used as the extrapolation bits. This method ensures that at decoding time, no calculations are performed in order to determine the values of the extrapolated bits. The stored values are simply merged into the decoded coefficients.


(This merging can be done with a simple bit-mask implementation, although the current CBACD implementation actually does it with a much more expensive multiply-by-2, add 0 or 1, and repeat process.)
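The two merging strategies can be sketched as follows. Function names and the bit layout are hypothetical, not CBACD's actual interface; the point is that one shift and one OR replace a per-bit loop.

```python
def merge_fast(received_bits, n_received, total_planes, extrap_bits):
    """Append the stored extrapolation bits to a partially decoded
    coefficient magnitude with one shift and one bitwise OR.

    received_bits: the n_received most significant magnitude bits, as an int.
    extrap_bits:   the table value for the remaining bit-planes, as an int.
    """
    n_missing = total_planes - n_received
    return (received_bits << n_missing) | extrap_bits

def merge_slow(received_bits, n_received, total_planes, extrap_bits):
    """Equivalent per-bit process: multiply by 2, add 0 or 1, repeat."""
    value = received_bits
    for i in range(total_planes - n_received - 1, -1, -1):
        value = value * 2 + ((extrap_bits >> i) & 1)
    return value

# Both produce the same coefficient magnitude:
assert merge_fast(0b101, 3, 8, 0b01101) == merge_slow(0b101, 3, 8, 0b01101)
```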

5 Results

The final CBACD coder was compared to implementations of the JPEG and JPEG-2000 standards and to the SPIHT [6] coder, a wavelet-based research coder. The JPEG and JPEG-2000 implementations used were part of the Java Advanced Imaging API [4]. The comparison to JPEG was made in order to see how far a simple DCT-based coder can improve over the established DCT technique, while the JPEG-2000 and SPIHT comparisons were made in order to compare the CBACD implementation to state-of-the-art techniques.

Encoding was run at a variety of bit rates, from a quite low 0.0625 bits per pixel up to 1.0 bits per pixel. The PSNR values for all four algorithms are shown in Table 8, and the results are plotted in Figure 12. The CBACD coder is significantly better than the JPEG standard, both in terms of image quality and in its ability to compress at low bit rates. However, it falls short of modern wavelet-based coders such as SPIHT and JPEG-2000, especially for images with few high-frequency components. For images featuring high-frequency intensity changes, such as Barbara, Mandrill, and Stream, the CBACD coder comes quite close to these wavelet coders.
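For reference, PSNR for 8-bit grayscale images is conventionally computed as 10·log10(255²/MSE); a minimal sketch (the flat-list representation is for illustration only):

```python
import math

def psnr(original, decoded, peak=255):
    """PSNR in dB between two equal-size 8-bit grayscale images,
    given here as flat lists of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    return 10 * math.log10(peak ** 2 / mse)

# Toy example: a constant offset of 4 gray levels gives MSE = 16
orig = [100, 120, 140, 160]
dec  = [104, 124, 144, 164]
print(round(psnr(orig, dec), 2))   # prints 36.09
```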

6 Conclusion

The CBACD coder demonstrates that a well-implemented DCT-based image coder can achieve performance near that of modern wavelet-based coders. Additionally, the machinery necessary to achieve such performance can be quite simple. Indeed, the CBACD coder is significantly simpler than many comparable wavelet-based coders, and certainly much less complicated than even the minimal feature set required by JPEG-2000.

6.1 Future Work

One major area for future change in the CBACD coder is the transform used. Although the DCT has many desirable features, it has a serious shortcoming: adjacent blocks are transformed individually, so no correlation across block boundaries is preserved. Hong, Ladner, and Riskin [3] show that for a coder based on group testing, the DCT can be replaced with various lapped transforms that inherently consider inter-block correlation, and that this results in PSNR improvements of up to 1 dB. In a similar technique, Tu and Tran [8] use a preprocessing step to decorrelate neighboring DCT blocks in the encoder, and a postprocessing step to recorrelate neighboring blocks in the decoder, to achieve similar PSNR improvement. Presumably, either of these methods could be successfully incorporated into CBACD with similar results.


[Six rate-distortion plots: panels (a) Barbara, (b) Boat, (c) Lena, (d) Mandrill, (e) Pentagon, and (f) Stream, each plotting PSNR (dB) against bit rate (bpp) from 0 to 1 for the CBACD, JPEG, JPEG-2000, and SPIHT coders.]

Figure 12: Plots of CBACD performance versus JPEG, JPEG-2000, and SPIHT.


7 Acknowledgements

I’d like to thank Richard Ladner for teaching the compression class out of which this research evolved, and for pushing me to expand upon the work done in the class and write it up in this paper. I’d also like to thank myself for paying a whole truckload of money to UW for the opportunity to do this research.

References

[1] Dane K. Barney. Priority-Based Arithmetic Coding for Wavelets. Senior thesis, Department of Computer Science and Engineering, University of Washington, 2005.

[2] Edwin S. Hong and Richard E. Ladner. Group testing for image compression. IEEE Transactions on Image Processing, 11(8):901–911, August 2002.

[3] E. S. Hong, R. E. Ladner, and E. A. Riskin. Group testing for image compression using alternate transforms. Signal Processing: Image Communication, 18:561–574, 2003.

[4] Sun Developer Network. Java Advanced Imaging (JAI) API. http://java.sun.com/products/java-media/jai/.

[5] Department of Computer Science and Engineering, University of Washington. Ground Truth Database. http://www.cs.washington.edu/research/imagedatabase/groundtruth/.

[6] Amir Said and William A. Pearlman. A new fast and efficient image codec based on set partitioning in hierarchical trees. IEEE Transactions on Circuits and Systems for Video Technology, 6(3):243–250, June 1996.

[7] Jerome M. Shapiro. Embedded image coding using zerotrees of wavelet coefficients. IEEE Transactions on Signal Processing, 41(12):3445–3462, December 1993.

[8] Chengjie Tu and Trac D. Tran. Context-based entropy coding of block transform coefficients for image compression. IEEE Transactions on Image Processing, 11(11):1271–1283, November 2002.

[9] Zhou Wang and Alan Bovik. A universal image quality index. IEEE Signal Processing Letters, 9(3):81–84, March 2002.

[10] Zhou Wang, Alan Bovik, H. Sheikh, and E. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, April 2004.


Barbara
Bit rate (bpp)   CBACD   JPEG    JPEG-2000   SPIHT
0.0625           22.22   N/A     23.36       23.35
0.125            24.48   N/A     25.35       24.84
0.25             27.48   25.31   28.34       27.57
0.5              31.47   28.44   32.09       31.37
1.0              36.37   33.11   36.82       36.35

Boat
Bit rate (bpp)   CBACD   JPEG    JPEG-2000   SPIHT
0.0625           23.30   N/A     24.68       24.95
0.125            25.80   N/A     26.96       27.09
0.25             28.65   27.38   29.72       29.72
0.5              32.04   30.68   32.09       32.90
1.0              35.82   34.07   36.45       36.37

Lena
Bit rate (bpp)   CBACD   JPEG    JPEG-2000   SPIHT
0.0625           25.81   N/A     27.61       27.71
0.125            28.60   N/A     30.01       30.05
0.25             31.34   29.88   32.16       32.27
0.5              33.68   32.71   34.19       34.30
1.0              36.16   34.61   36.65       36.75

Mandrill
Bit rate (bpp)   CBACD   JPEG    JPEG-2000   SPIHT
0.0625           20.22   N/A     20.64       20.74
0.125            21.31   N/A     21.62       21.72
0.25             22.82   N/A     23.17       23.27
0.5              25.15   23.58   25.52       25.65
1.0              28.87   26.40   29.00       29.17

Pentagon
Bit rate (bpp)   CBACD   JPEG    JPEG-2000   SPIHT
0.0625           24.90   N/A     25.69       25.72
0.125            26.39   N/A     27.07       27.16
0.25             28.25   26.71   28.94       28.99
0.5              30.47   29.28   31.24       31.31
1.0              33.48   31.75   34.14       34.26

Stream
Bit rate (bpp)   CBACD   JPEG    JPEG-2000   SPIHT
0.0625           21.23   N/A     22.05       22.13
0.125            22.82   N/A     23.30       23.33
0.25             24.53   23.37   24.89       24.94
0.5              26.86   25.67   27.21       27.24
1.0              30.26   28.57   30.54       30.68

Table 8: Comparison of the CBACD, SPIHT, JPEG, and JPEG-2000 coders, measured by PSNR (dB). JPEG values are not available for low bit rates, as the JPEG coder will not work at such low bit rates.
