
Comparison of Different Methods of Classification in Subband Coding of Images

R. L. Joshi, H. Jafarkhani, J. H. Kasner, T. R. Fischer, N. Farvardin, M. W. Marcellin, and R. H. Bamberger

November 11, 1996

R. L. Joshi was with the School of Electrical Engineering and Computer Science, Washington State University and is currently with Eastman Kodak Company. T. R. Fischer and R. H. Bamberger are with the School of Electrical Engineering and Computer Science, Washington State University. Their work was supported by the National Science Foundation Grants MIP-9116683 and NCR-9303868. H. Jafarkhani and N. Farvardin are with the Department of Electrical Engineering and the Institute for Systems Research, University of Maryland at College Park. Their work was supported in part by National Science Foundation grants NSFD MIP-91-09109 and NSFD CD-88-03012. J. H. Kasner and M. W. Marcellin are with the Department of Electrical and Computer Engineering, University of Arizona. Their work was supported by Intel Corp. and the National Science Foundation Grant 9258374.


Abstract

This paper investigates various classification techniques, applied to subband coding of images, as a way of exploiting the non-stationary nature of image subbands. The advantages of subband classification are characterized in a rate-distortion framework in terms of "classification gain" and overall "subband classification gain." Two algorithms, maximum classification gain and equal mean-normalized standard deviation classification, which allow an unequal number of blocks in each class, are presented. The dependence between the classification maps from different subbands is exploited either directly, while encoding the classification maps, or indirectly, by constraining the classification maps. The trade-off between the classification gain and the amount of side information is explored. Coding results for a subband image coder based on classification are presented. The simulation results demonstrate the value of classification in subband coding.

1 Preface

This paper is the result of a collaboration initiated at the 1994 International Conference on Image Processing, where authors of the three papers [13], [10], and [16] recognized that they had independently developed very similar image coding methods and achieved similar performance.
The main ingredients of these image coding algorithms are subband decomposition to remove linear dependence in the image data, classification and optimum rate allocation to effectively model and allocate encoding rate to different subsets of the subband data, and entropy-constrained trellis-coded quantization (ECTCQ) [6], [19] to encode the subbands. The particulars of the algorithms in [13], [10], and [16] vary in the choice of subband filters, the method of implementing the ECTCQ, and the approach to classification of subband data. The goal of this collaboration is to provide a thorough examination of the image coding principles underlying [13], [10], and [16], to compare the approaches in a common framework, and to provide summary conclusions about the relative gains that can be achieved through the use of classification in subband image coding.


2 Introduction

Image coding based on subband decomposition or discrete wavelet transform (DWT) ideas has received much attention in recent years [36], [17], [1]. In addition to giving good compression results (in a rate-distortion sense), these systems are suitable for progressive transmission and provide a multi-resolution capability, a feature that is desirable in some practical situations. The main issues in subband coding are the design of the analysis and synthesis filters, optimum rate allocation, and quantization of the subbands. In this paper, we concentrate on the quantization aspect.

It has been shown in [28] that after two levels of decomposition of an image (16-band uniform), the statistical properties of the lowest frequency subband (LFS) are similar to those of the original image; therefore, well-established techniques for image compression are suitable for encoding the LFS. Also, all other subbands, hereafter referred to as high frequency subbands (HFS's), have small intra-band correlation coefficients. Thus, subband coding systems which use some decorrelating technique, such as transform or prediction, for the LFS and treat the HFS's as memoryless sources can achieve reasonable performance, as demonstrated in [28], [26], and [14]. However, a simple inspection of the 16 subbands reveals that the bulk of the energy in the HFS's is concentrated in the vicinity of areas which correspond to edge activity in the original image. One of the open problems in image coding is to take advantage of this non-stationary nature of image subbands. Previous attempts to exploit this non-stationarity include spatially adapting the subband filters, or spatially adapting the quantizer, to track the local characteristics of the imagery. Examples of the first class include the work on spatially-varying filter banks [5], [2] and the time-varying wavelet packet approaches [9].

Chen and Smith [4] propose the use of spatially adaptive quantization in discrete cosine transform (DCT) coding of images.
Their approach is to classify the DCT blocks according to their ac energy and adapt the quantizer to the class being encoded. Woods and O'Neil [36] use a similar classification method in subband coding. Tse [30] characterizes the benefits


of classification in terms of coding gain, which we shall denote as the "classification gain." Tse also proposes maximizing the classification gain by having a different number of blocks in each class. Naveen and Woods [20] also investigate the problem of classification in subband coding, but their approach is to avoid sending any side information and to rely entirely on the inter-subband dependence for classification purposes. Shapiro [24] proposes predicting the significant information across subbands using zerotrees. Taubman and Zakhor [29] use conditional arithmetic coding of insignificant information to decrease the amount of side information.

This paper investigates classification as applied to subband coding of images. In Section 3, a justification for classification is provided from a rate-distortion point of view. The benefits of classification of subbands are characterized in terms of the "classification gain" and the overall "subband classification gain." In Section 4, it is argued that it is not optimal to have equally populated classes. Two different approaches which allow an unequal number of blocks in each class are presented. In the first approach, each subband is optimally classified by maximizing the coding gain for a fixed number of classes. In the second approach, blocks from a subband are classified into a prescribed number of classes such that the mean-normalized standard deviation of ac-energies in the resulting classes is the same.

It is observed that there is some dependence between the energies of blocks in different subbands, due to the fact that high energy blocks in HFS's are usually concentrated around the edge regions in the original image. Section 5 describes attempts to exploit this dependence, either directly, while encoding the classification maps, or indirectly, by constraining the classification maps. The trade-off between the classification gain and the amount of side information is explored.
Section 6 describes an arithmetic-coded trellis-coded quantization (ACTCQ) system which is subsequently used to encode each class. In Section 7, coding results for a subband image coder based on classification are presented. Comparisons with other approaches proposed in the literature are also presented.

For simulation purposes, the following two subband decompositions are used:


1. 16-band uniform, with a 4 × 4 DCT of the LFS.

2. 22-band decomposition.

The frequency partitions induced by the two decompositions are shown in Figures 1 and 2, respectively. The 22-band decomposition can be thought of as a 16-band uniform decomposition with a further 7-band octave split of the LFS. The decompositions were generated using a 2-D separable tree-structured filter bank. In all the simulations, the 32-tap quadrature mirror filter (QMF) of [12] or the 9-7 tap biorthogonal spline filters of [1] were used.

3 Rate-distortion formulation for classification gain

3.1 Classification gain for a single non-stationary source

Consider NL samples of a zero mean non-stationary source X having sample variance σ_x². Let the source be divided into N blocks of L consecutive samples. Let each block be assigned to one of J classes. Group the samples from all the blocks assigned to class i, 1 ≤ i ≤ J, into source X_i. Let the total number of blocks assigned to source X_i be N_i. Let σ_i² be the sample variance of source X_i and p_i be the probability that any given sample belongs to source X_i. Then p_i = N_i/N, 1 ≤ i ≤ J.

Assume that encoding source X_i at a rate R_i provides a mean-squared error distortion of the form [11]

    D_i(R_i) = \epsilon_i^2 \sigma_i^2 \, 2^{-2R_i}.   (1)

The distortion-rate performance for the encoding of source X is assumed to have a similar general form. The factor ε_i² depends on the density of the source as well as the type of encoding used (for example, whether or not entropy coding is being used with the quantization), but, in this simple analysis, it is assumed to be independent of the encoding rate.

Let the side rate for encoding the classification information be R_s. If R_T bits per sample are available to encode source X, then only R bits per sample can be used for encoding sources


X_i, 1 ≤ i ≤ J, where R = R_T − R_s.

The rate allocation problem for classification can be defined as

    \min \sum_{i=1}^{J} p_i D_i(R_i),   (2)

subject to the constraint

    \sum_{i=1}^{J} p_i R_i = R.   (3)

Using Lagrangian multiplier techniques and assuming R_i > 0 for 1 ≤ i ≤ J, the solution of the above problem is

    R_i = R + 0.5 \log_2 \frac{\epsilon_i^2 \sigma_i^2}{\prod_{j=1}^{J} (\epsilon_j^2 \sigma_j^2)^{p_j}},   (4)

and

    D_{opt}(R) = \prod_{j=1}^{J} (\epsilon_j^2 \sigma_j^2)^{p_j} \, 2^{-2R}.   (5)

Thus, the classification gain for a non-stationary source is

    G_c = \frac{\epsilon_x^2 \sigma_x^2 \, 2^{-2R_T}}{\prod_{j=1}^{J} (\epsilon_j^2 \sigma_j^2)^{p_j} \, 2^{-2R}}.   (6)

3.2 Subband classification gain

Equation (6) quantifies the classification gain that can be obtained by classifying a single subband. Now, suppose that each subband has been classified. In this subsection, we examine the overall coding gain, referred to as the `subband classification gain', due to the classification of each subband and optimal rate allocation between the classes from all the subbands. Let the total number of subbands be M and let each subband be divided into J classes. Let f_i, referred to as the decimation factor, be the ratio of the number of samples in subband i to the number of samples in the original image. The notation is as before, with the addition that a subscript (ij) refers to the jth class from the ith subband. Assuming the analysis and synthesis filters satisfy the paraunitary condition [32] (only approximately satisfied by the 32-tap QMF's), we have

    \sigma_x^2 = \sum_{i=1}^{M} \sum_{j=1}^{J} p_{ij} \sigma_{ij}^2,   (7)


and

    \sum_{i=1}^{M} \sum_{j=1}^{J} p_{ij} f_i R_{ij} = R.   (8)

Then the optimum rate allocation problem can be stated as

    \min \sum_{i=1}^{M} \sum_{j=1}^{J} p_{ij} D_{ij},   (9)

subject to the constraint

    \sum_{i=1}^{M} \sum_{j=1}^{J} p_{ij} f_i R_{ij} = R.   (10)

Assuming all rates are positive, the solution to the above problem is given by

    R_{ij} = R + 0.5 \log_2 \frac{\epsilon_{ij}^2 \sigma_{ij}^2}{f_i A},   (11)

and

    D_{opt}(R) = 2^{-2R} A,   (12)

where

    A = \prod_{i=1}^{M} \left(\frac{1}{f_i}\right)^{f_i} \prod_{j=1}^{J} (\epsilon_{ij}^2 \sigma_{ij}^2)^{p_{ij} f_i}.   (13)

Thus, the overall subband classification gain is

    G_{sc} = \frac{\epsilon_x^2 \sigma_x^2}{A} \, 2^{-2R_s}.   (14)

The subband classification gain is thus increased by a reduction in the geometric mean, A, but penalized by an increase in the rate of side information, R_s.

If biorthogonal filters are being used, the distortion from each subband has to be weighted differently. Thus, in that case, the quantity to be minimized is

    \sum_{i=1}^{M} \sum_{j=1}^{J} w_i p_{ij} D_{ij},   (15)

where the weights can be calculated as specified in [37]. It should be noted that the classification gain in Equation (14) is implicitly based on the assumption of a large encoding rate. If the encoding rate is small, then the term ε_i² in Equation (1) should be modified to the form ε_i²(R_i) and the optimum rate allocation should be re-derived as in [3]. However, our observation has been that, regardless of the rate, a higher value of G_c corresponds to a higher classification gain in practice.
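As a numerical check of the single-source formulas, the sketch below (hypothetical two-class data, all ε² factors set to 1, and zero side rate, so it is a simplification of Equations (4) and (6), not the paper's full system) allocates rates with Equation (4) and evaluates the classification gain of Equation (6):

```python
import numpy as np

def rate_allocation_and_gain(variances, probs, R):
    """Single-source classification sketch: Eq. (4) with all epsilon^2
    factors set to 1 gives R_i = R + 0.5*log2(var_i / G), where G is the
    weighted geometric mean prod_j var_j^{p_j}; Eq. (6) with Rs = 0 then
    reduces to Gc = var_x / G."""
    variances = np.asarray(variances, dtype=float)
    probs = np.asarray(probs, dtype=float)
    G = np.prod(variances ** probs)           # weighted geometric mean
    rates = R + 0.5 * np.log2(variances / G)  # Eq. (4)
    var_x = np.sum(probs * variances)         # variance of the zero-mean mixture
    return rates, var_x / G                   # (per-class rates, Gc)

# Hypothetical subband: 80% low-variance blocks, 20% high-variance blocks.
rates, Gc = rate_allocation_and_gain([1.0, 100.0], [0.8, 0.2], R=1.0)
print(rates)  # the high-variance class receives more rate
print(Gc)     # > 1: classification beats encoding X as a single source
```

The per-class rates average back to the budget R, and for these numbers Gc is about 8.3, i.e. roughly a 9 dB distortion reduction at the same rate.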


4 Non-uniform classification

Chen and Smith [4] divided the DCT blocks into equally populated classes. It is easy to identify situations in which this simple approach does not work well for DCT blocks or for a subband. For example, consider a subband where 80% of the blocks have low energy and 20% of the blocks have high energy. If the blocks are divided into two equally populated classes, then one of the classes will contain a mixture of high and low energy blocks in almost equal proportion. This is clearly not desirable. Classifying the subband into 5 equally populated classes will solve this problem (all the high activity blocks will be assigned to one class and the low activity blocks will be equally divided among the other four classes), but at the cost of increased side information (log2 5 bits/block instead of 1 bit/block). Also, from a rate-distortion point of view, the choice of equally populated classes does not necessarily maximize the classification gain.

The capability of allowing a different number of blocks in each class can potentially improve classification. The problem is that of determining the number of blocks in each class. We propose two different approaches to solving this problem. The first is based on maximizing the classification gain for each subband, whereas the second is based on classifying blocks such that the resulting classes have similar statistical properties, so that the representation of the blocks as one class is meaningful from a coding point of view.

At this point, we make the following observation: for equally populated classes, assuming nonzero rates for every class, the first-order entropy of the side information is maximum. Thus, assuming entropy coding of the side information, an additional benefit of maximizing the classification gain is that the side information is reduced.

4.1 Maximum classification gain

The idea behind this classification method is as follows.
For each subband, we find the classification scheme which maximizes the classification gain G_c in (6). The classification map


is sent as side information. We make some simplifying assumptions to make the problem of maximizing the coding gain tractable. They are:

- Sources X and X_i, 1 ≤ i ≤ J, are zero mean, so that their sample variances are equal to the mean squared energies.
- ε_x² = ε_i², 1 ≤ i ≤ J.
- For a fixed number of classes, the side information remains constant.

Under these assumptions, the objective is to classify the blocks into one of J classes such that

    \frac{\sigma_x^2}{\prod_{j=1}^{J} (\sigma_j^2)^{p_j}}

is maximized. Let the average mean squared energy of a block, E = (\sum_{i=1}^{L} x_i^2)/L, be the criterion for classification. Arrange the blocks in increasing order of energy and let the first N_1 blocks be assigned to class 1, the next N_2 blocks to class 2, and so on. Then the problem of maximizing the classification gain can be stated as follows:

    \min_{\{N_1, N_2, \ldots, N_J\}} \prod_{j=1}^{J} (\sigma_j^2)^{p_j},   (16)

subject to

    \sum_{i=1}^{J} N_i = N, \quad N_i > 0, \; 1 \le i \le J.   (17)

The main idea of the algorithm is pairwise maximization of the coding gain. Consider the blocks belonging to two adjacent classes. Divide these blocks into two new classes in such a manner that the coding gain is maximized. Repeat this process until convergence is achieved.

Algorithm:

1. Initialize N_1, N_2, ..., N_J to satisfy \sum_{i=1}^{J} N_i = N, N_i > 0 for 1 ≤ i ≤ J. Let j = 1 and N_prev = [N_1, N_2, ..., N_J]^T.

2. Find N′_j and N′_{j+1} such that N′_j + N′_{j+1} = N_j + N_{j+1} and (σ_j²)^{p′_j} (σ_{j+1}²)^{p′_{j+1}} is minimized. This is accomplished by using golden section search [21].


3. Set N_j = N′_j and N_{j+1} = N′_{j+1}.

4. Set j = j + 1. If j < J, go to step 2.

5. Let N = [N_1, N_2, ..., N_J]^T. If N is equal to N_prev, STOP. Otherwise, set j = 1 and N_prev = N, and go to step 2.

Proof of convergence: Consider step 2 of the algorithm. We have

    (\sigma_j^2)^{p'_j} (\sigma_{j+1}^2)^{p'_{j+1}} \le (\sigma_j^2)^{p_j} (\sigma_{j+1}^2)^{p_{j+1}}.   (18)

This implies that

    (\sigma_1^2)^{p_1} \cdots (\sigma_j^2)^{p'_j} (\sigma_{j+1}^2)^{p'_{j+1}} \cdots (\sigma_J^2)^{p_J} \le \prod_{i=1}^{J} (\sigma_i^2)^{p_i}.   (19)

Thus, the coding gain for N = [N_1, ..., N′_j, N′_{j+1}, ..., N_J]^T is greater than or equal to that for N = [N_1, ..., N_j, N_{j+1}, ..., N_J]^T. Furthermore, there are only C(N−1, J−1) combinations of N which satisfy Equation (17). Hence, the algorithm is guaranteed to converge, at least to a local maximum of the classification gain. In practice, a different stopping criterion, such as the relative increase in the coding gain, can be used.

4.2 Equal mean-normalized standard deviation (EMNSD) classification

In this method, the criterion for classification is the average energy of a block, the square root of which is referred to as the block gain. The blocks are arranged in increasing order of gain. Then, the first N_1 blocks are assigned to class 1, the next N_2 blocks to class 2, and so on. The method finds N_1, N_2, ... such that the mean-normalized standard deviation of the gains in the


resulting classes is equal. The idea behind this approach is to allow the possibility of having a different number of blocks in each class while maintaining similar statistical properties within each class. For a stationary source, the standard deviation is a measure of the dispersion of the samples: the smaller the standard deviation, the denser the samples about the mean. When one of the classes has a higher dispersion than the others, the blocks in that particular class do not have the same level of activity. It is difficult to compare dispersions in sets with different means. The `coefficient of variation' (defined as the standard deviation divided by the mean) is a good measure of dispersion [27].

To describe the algorithm, consider the case with J = 2 and N blocks organized in increasing order of their gain values g_i, i = 1, 2, ..., N. We seek an integer N_1 such that blocks indexed 1 to N_1 belong to the first class and the remaining blocks belong to the second class. The mean m_i and standard deviation σ_i of class i, i = 1, 2, are defined by

    m_1 = \frac{1}{N_1} \sum_{n=1}^{N_1} g_n, \qquad
    m_2 = \frac{1}{N - N_1} \sum_{n=N_1+1}^{N} g_n,
    \sigma_1^2 = \frac{1}{N_1} \sum_{n=1}^{N_1} (g_n - m_1)^2, \qquad
    \sigma_2^2 = \frac{1}{N - N_1} \sum_{n=N_1+1}^{N} (g_n - m_2)^2.   (20)

Here, N_1 is chosen such that

    q_1 = q_2,   (21)

where

    q_i = \frac{\sigma_i}{m_i}.   (22)

An iterative algorithm to find N_1 satisfying (21) is provided below. If there is no integer N_1 which solves (21), the algorithm finds the N_1 which minimizes |q_1 − q_2|.

Algorithm:

1. Choose an initial value for N_1 (e.g., N_1 = N/2) and set the iteration number i = 0. Also choose i_max as an upper limit on the number of iterations.

2. Compute q_1 and q_2 using (22) and set i = i + 1.


3. If |q_1 − q_2|/q_1 < ε or i > i_max, stop. Otherwise,
   if q_1 < q_2, set N_1 = N_1 + ΔN_1;
   if q_1 > q_2, set N_1 = N_1 − ΔN_1;
   and go to (2).

For fast convergence, a large ΔN_1 can be chosen at the beginning of the iteration; as the iteration number increases, ΔN_1 must be gradually decreased to one. For the case of J > 2, J ratios q_i = σ_i/m_i, i = 1, 2, ..., J, and (J − 1) thresholds are needed (finding N_1, N_2, ..., N_J is equivalent to finding the (J − 1) thresholds). The algorithm is stopped when (max_i q_i − min_i q_i)/min_i q_i < ε or when the number of iterations exceeds i_max. At each step of the algorithm, the thresholds corresponding to the classes with the maximum and minimum q_i are alternately adjusted so as to make the q_i's as close to one another as possible.

5 The trade-off between the side rate and classification gain

In a classification-based subband image coder, the classification map for each subband has to be sent as side information. Thus, in this method, the side rate can be as high as 20% of the overall encoding rate at low bit rates such as r ≤ 0.25 bits/pixel. Figure 3 plots the maximum classification gain maps of the subbands for a 16-band octave split of the luminance component of the 512 × 512 `Lenna' image using 32-tap QMF's for decomposition (J = 4). Figure 4 shows the corresponding maps for EMNSD classification using 2 classes for each subband and the 9-7 tap biorthogonal spline filters. These images are constructed by assigning a different gray level to each class in order to visualize the classification table. In Figure 3, for J = 4, class 0 (the low activity class) is represented by a gray value equal to 0; classes 1, 2, and 3 are represented by gray levels 85, 170, and 255, respectively. In Figure 4, for J = 2, the gray levels are 0 and 255 for the low and high activity classes, respectively. From the figures it can be seen that the classification indices of blocks from different subbands, which correspond


to the same spatial location in the original image, are dependent. This dependence can be exploited to reduce the side rate. Another approach is to constrain the classification map in some manner. This has the effect of reducing the side rate and reducing the complexity of the classification procedure, albeit at the cost of some decrease in classification gain. It is important to investigate whether the reduction in the side rate is enough to compensate for the decrease in classification gain. We describe three different methods of constraining the classification maps, along with a method which exploits the dependence between classification maps directly.

5.1 A single classification map for subbands having the same frequency orientation

This method is a modified version of the method proposed in [16]. Consider the 22-band decomposition (Figure 2). Suppose that the block size for classification in the HFS's is adjusted according to the decimation factor, so that each block corresponds to a 16 × 16 block in the original image. For example, in Figure 2, subbands 1-3 have a block size of 1 × 1, subbands 4-6 have a block size of 2 × 2, and the rest of the subbands have a block size of 4 × 4. Now group together all the HFS's having horizontal frequency orientation, i.e., subbands 2, 5, 8, 14, 15, 16, and 17. A single classification map is sent for these subbands, so that the blocks corresponding to the same spatial location have the same classification index. Similar strategies are adopted for subbands having vertical and diagonal orientations. The classification map for each orientation is determined as follows. A 4-band uniform split is performed on the original image. The three high frequency subbands are classified into J equally likely classes. These classification maps are used for the HFS's having the appropriate frequency orientation.
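The per-orientation map construction above reduces to classifying block energies into equally likely classes. A minimal sketch (`equally_likely_map` is a hypothetical helper; quantiles of the mean-squared block energy give the J equally likely classes, and the toy band stands in for one high-frequency subband of a 4-band split):

```python
import numpy as np

def equally_likely_map(band, block, J):
    """Classify block x block blocks of a subband into J (approximately)
    equally likely classes by quantiles of the mean-squared block energy,
    as in the per-orientation maps of Section 5.1."""
    h, w = band.shape
    blocks = band.reshape(h // block, block, w // block, block) \
                 .swapaxes(1, 2).reshape(-1, block * block)
    e = (blocks ** 2).mean(axis=1)                   # mean-squared energy per block
    edges = np.quantile(e, np.linspace(0, 1, J + 1)[1:-1])
    return np.searchsorted(edges, e).reshape(h // block, w // block)

# Toy 16x16 band; the top half is high-activity (scaled-up noise).
rng = np.random.default_rng(5)
band = rng.normal(0.0, 1.0, (16, 16))
band[:8, :] *= 5.0
cmap = equally_likely_map(band, block=4, J=2)
print(cmap)  # top two block rows in the high class, bottom two in the low class
```

The returned map has one index per 4 × 4 block; in the coder this map would be shared by all subbands of the same orientation.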


5.2 A single classification map for all HFS's

For the 16-band uniform decomposition (Figure 1), the two HFS's with the highest variances are usually subbands 1 and 2. There is energy dependence between subband 1 and subbands 4, 5, 6, and 7, all of which have large intra-band column correlations. Similarly, there is energy dependence between subband 2 and subbands 8, 9, 10, and 11, all of which have large intra-band row correlations. Thus, 4 × 4 blocks of subbands 1 and 2 are classified (J = 2), and a single classification map, which is the logical OR of the classification tables for the above two subbands, is transmitted to the receiver as side information. For J = 4, the highest class index for each block, corresponding to the highest activity class in the classification tables, is transmitted to the receiver. The single classification map idea can be used for the 22-band decomposition by considering the decimation factor in subbands 1-6, as was described in Section 5.1.

5.3 The VQ-classification method

This method is proposed in [15]. Let us assume that there are K subbands having the same frequency orientation and index them from lowest to highest frequency by 1, 2, ..., K. Assume that the block size for classification in each subband is adjusted so that each block corresponds to a 16 × 16 block in the original image. Group all the blocks corresponding to the same spatial location and assign them to one of J classes. Then, due to the constraint on the classification map, we have

    p_{1j} = p_{2j} = \cdots = p_{Kj} = p_j, \qquad 1 \le j \le J.   (23)

The main idea behind the VQ-classification method is as follows. Consider a vector of blocks which has been assigned to class p. This vector should be assigned to another class q only if the reassignment results in a higher overall subband classification gain. Let a prime (′) denote quantities resulting from assigning the vector of blocks to class q. Then this decision should be taken


only if

    \prod_{i=1}^{K} \left(\frac{1}{f_i}\right)^{f_i} \prod_{j=1}^{J} (\sigma_{ij}^2)^{p_j f_i} > \prod_{i=1}^{K} \left(\frac{1}{f_i}\right)^{f_i} \prod_{j=1}^{J} (\sigma'^2_{ij})^{p'_j f_i}.   (24)

The algorithm for determining the classification map is as follows.

Algorithm:

1. Classify subband 1 into J equally likely classes. Using this classification map for initialization, calculate σ_ij² and p_j for 1 ≤ i ≤ K and 1 ≤ j ≤ J.

2. For each vector of blocks, determine the class to which the vector should be assigned in order to maximize the classification gain. Update σ_ij² and p_j for 1 ≤ i ≤ K and 1 ≤ j ≤ J. Note that every time a vector of blocks is assigned to a class different from that of the previous iteration, the overall subband classification gain increases.

3. Repeat this process until there is no change in the classification map for a full iteration.

It should be noted that a different starting point or a different ordering of the vectors can result in a different convergence point for the algorithm. In our simulations, the algorithm typically converged within 5 iterations.

5.4 Exploiting the dependence between classification indices

In addition to the dependence between the classification maps of different subbands, there is also some dependence between the classification indices of spatially adjacent blocks from the same subband. Either or both of these dependencies can be exploited to reduce the side information required for sending the classification maps.

Consider a subband coding system for images where each subband is classified into J classes and the classification map for each subband is sent as side information. Assume that the side information for a subband is being arithmetic coded [22]. In the absence of any dependencies, a single probability table, where each entry corresponds to the probability of a class, is adequate. The intra-band and inter-band dependence of classification maps can


be exploited as follows. Multiple conditional probability tables are maintained, one for each state. The state depends on the classification index of the previous block as well as on the block from the lower frequency subband corresponding to the same spatial location.

For example, let C_8(i, j) denote the classification index for block (i, j) from subband 8 in Figure 2. Then the choice of the probability table for encoding C_8(i, j) depends on C_8(i, j − 1) as well as on C_5(i, j). The conditional probability tables have to be known at the decoder. Thus, if subbands 5 and 8 have 4 classes, then 16 tables have to be sent to the decoder. If all classes have nonzero rates, the side information for sending all the probability tables can be prohibitive. But if some classes have zero rates, all the classes having zero rates can be combined into a single class. For example, suppose that subbands 5 and 8 have only one class with a nonzero rate; then we can modify the classification map for each subband so that it contains 2 classes. In that case, the number of probability tables is reduced to 4. Since higher frequency subbands typically tend to have many classes with zero rates, it is possible to exploit both intra-band and inter-band dependencies for them. For low frequency bands, it is advantageous to exploit only one of the dependencies. Thus, for the 22-band decomposition, both dependencies are exploited for bands 10-21. For bands 1-9, only the inter-band dependence is exploited, whereas for band 0 only the intra-band dependence is exploited.

Table 1 compares the side information for the above method with the side information required for simple arithmetic coding of the classification maps for the `Lenna' image with J = 4. The block sizes are 2 × 2 for bands 0-6 and 4 × 4 for bands 7-21. The entries in the probability tables are quantized to 5 bits, and adaptive arithmetic coding [35] is used for encoding the classification maps.
It can be seen from the table that a side rate reduction of 15-20% can be obtained using this method.
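The saving from context modeling can be illustrated by comparing the first-order entropy of a map with its entropy conditioned on a context. The sketch below uses a hypothetical synthetic map whose class repeats the previous block's class 80% of the time as a stand-in for a real classification map; the entropies approximate the rates achievable by the (adaptive) arithmetic coder with one versus several probability tables:

```python
import numpy as np
from collections import Counter

def entropy_bits(symbols):
    """First-order entropy (bits/block) of a classification map."""
    n = len(symbols)
    return -sum(c / n * np.log2(c / n) for c in Counter(symbols).values())

def conditional_entropy_bits(symbols, contexts):
    """H(symbol | context): the rate achievable with one probability
    table per context state, as in Section 5.4."""
    n = len(symbols)
    total = 0.0
    for ctx in set(contexts):
        idx = [i for i, c in enumerate(contexts) if c == ctx]
        total += len(idx) / n * entropy_bits([symbols[i] for i in idx])
    return total

# Hypothetical binary map: the class repeats the previous block's class
# 80% of the time, otherwise it is drawn uniformly at random.
rng = np.random.default_rng(4)
m = [0]
for _ in range(4999):
    m.append(m[-1] if rng.random() < 0.8 else int(rng.integers(0, 2)))
symbols, contexts = m[1:], m[:-1]       # context = previous block's index
print(entropy_bits(symbols))                        # close to 1 bit/block
print(conditional_entropy_bits(symbols, contexts))  # well below 1 bit/block
```

In the paper's scheme the context also includes the co-located block from the lower frequency subband; the same conditional-entropy comparison applies with a larger context alphabet.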


Table 1: Comparison of classification side information for `Lenna', J = 4 (all rates in bits/pixel).

    overall rate   side rate   reduced side rate
    0.25           0.051       0.040
    0.50           0.068       0.054
    1.00           0.122       0.103

6 Arithmetic-coded trellis-coded quantization

Trellis-coded quantization (TCQ) has been shown to be an effective technique with low to moderate complexity for quantizing memoryless sources [18, 6]. The ACTCQ system with uniform thresholds, described in [14], achieves mean-square error (MSE) performance within 0.5 dB of the rate-distortion bound for generalized Gaussian (GG) sources. This system was used as the basic quantization system for encoding the samples from a subband belonging to a particular class. We describe the ACTCQ system briefly.

The block diagram of the ACTCQ system is shown in Figure 5. The TCQ encoder uses an N-state trellis defined by a rate-1/2 convolutional encoder. The reproduction alphabet is partitioned into four subsets (D0, D1, D2, D3) and the codewords lie on a scaled integer lattice (scale factor Δ). The zero codeword is labeled with the D0 subset and, from left to right, the codewords are labeled ... D0, D1, D2, D3, D0, D1, .... For low bit rates, the reproduction alphabet is modified by shifting all the positive codewords to the left by Δ. For a given sequence of data to be quantized, the TCQ encoder uses the Viterbi algorithm [7] to pick the allowed trellis path that minimizes the MSE between the input data and the output codewords.

It was observed in [19] that in any given trellis state, the next codeword must come from one of the two union codebooks A0 = D0 ∪ D2 or A1 = D1 ∪ D3. The arithmetic coder is based on this observation. Two probability tables are maintained, one for each union codebook. The arithmetic encoder switches between the two probability tables depending on the current state of the trellis.

In practice, the reproduction alphabet cannot be infinite. A straight truncation dependent

Page 18: Comparison Di eren t Metho ds of Classi cation inpdfs.semanticscholar.org/7add/a2e366779756fe0b... · Comparison of Di eren t Metho ds Classi cation in Subband Co ding of Images R.

on the probability density was found to be adequate for typical images. For synthetic data,the use of an escape mechanism as described in [16] can be useful. Throughout this work,Ungerboeck's [31] 8-state trellis was used.7 A subband coder based on classi�cationFor a given subband decomposition, the classi�cation map for each subband is determined us-ing one of the methods described in Section 4. For each subband, samples from blocks havingthe same classi�cation index are grouped together into one class. Each class is modeled as hav-ing a GG density, with the shape parameter chosen from the set f0:5; 0:6; 0:7; 0:8; 0:9; 1:0; 2:0g.For targeted rates of 1:0 bit/pixel or below, the high rate assumption of Section 3 for rateallocation is clearly not valid. Also after classi�cation, the classes have higher generalizedGaussian shape parameters compared to the original subband. Thus, the assumption that�2x = �2i , 1 � i � J , is also not valid. Hence, instead of using Equation (11), Shoham andGersho's optimum bit allocation algorithm [25] or one of its variants should be used. We usedWesterink's [34] algorithm to do optimal rate allocation among classes from all the subbands.The rate-distortion curves used for bit allocation are operational rate distortion curves forGG sources encoded using the ACTCQ system. After optimum rate allocation, each class isencoded using the ACTCQ system.The above system was used to encode the `goldhill' image and the luminance componentof the `Lenna' image from the USC database (both of size 512�512). Actual bit streams weregenerated and used to determine the encoding rate. 
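The subset labeling and union-codebook structure described in Section 6 can be sketched as follows. This is a minimal illustration, not the system used in the simulations; `label_alphabet` and `union_codebook` are hypothetical helper names, and the scale factor Δ is left as a parameter:

```python
# Sketch of the TCQ reproduction-alphabet labeling from Section 6.
# Codewords lie on a scaled integer lattice; the zero codeword is in D0,
# and subsets repeat cyclically D0, D1, D2, D3 from left to right.

def label_alphabet(num_codewords, delta):
    """Return (codeword, subset) pairs for an odd-sized alphabet."""
    half = num_codewords // 2
    # In Python, k % 4 is non-negative even for negative k, so the cyclic
    # labeling extends correctly to the left of zero.
    return [(k * delta, f"D{k % 4}") for k in range(-half, half + 1)]

def union_codebook(subset):
    """A0 = D0 u D2 and A1 = D1 u D3; the arithmetic coder keeps one
    probability table per union codebook."""
    return "A0" if subset in ("D0", "D2") else "A1"
```

For example, a 5-codeword alphabet with Δ = 1.0 is labeled D2, D3, D0, D1, D2 from left to right, so every codeword falls in exactly one of the two union codebooks.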
Apart from the side rate for encoding the classification maps and the image size, for each class assigned a nonzero rate, the variance, the GG shape parameter, and the index of the step size being used need to be sent to the decoder. This side rate can vary with the encoding rate but was never greater than 0.005 bits/pixel in our simulations.

The different classification methods discussed in Section 5 are summarized as follows.

Table 2: PSNR simulation results for the 512 × 512 `Lenna'.

              Method 1          Method 2a         Method 2b
    Design    Actual   PSNR     Actual   PSNR     Actual   PSNR
    Rate      Rate     (dB)     Rate     (dB)     Rate     (dB)
    0.25      0.261    33.97    0.253    33.75    0.250    33.84
    0.50      0.498    37.30    0.504    37.20    0.503    37.22
    1.00      1.012    41.14    1.007    40.72    0.998    40.71

              Method 3          Method 4a         Method 4b
    Design    Actual   PSNR     Actual   PSNR     Actual   PSNR
    Rate      Rate     (dB)     Rate     (dB)     Rate     (dB)
    0.25      0.265    34.43    0.251    34.31    0.246    34.29
    0.50      0.504    37.69    0.491    37.63    0.510    37.88
    1.00      1.043    41.47    1.024    41.30    1.046    41.42

• Method 1: A single classification map for subbands having the same frequency orientation. 22-band decomposition, no classification for subband 0, and J = 4. Block sizes for the other subbands are as discussed in Section 5.1.

• Method 2: 16-band uniform decomposition with a 4 × 4 DCT of the LFS. 4 classes in the LFS and 2 classes in the HFS's. Block size 4 × 4 in all the subbands. Only one classification map is sent for all the HFS's, as discussed in Section 5.2:
  (a) Maximum classification gain.
  (b) EMNSD classification.

• Method 3: VQ-classification method. 22-band decomposition. 2 × 2 blocks in band 0 and block sizes as discussed in Section 5.3 for the other subbands.

• Method 4: The approach of Section 5.4 is used to reduce the side information, with J = 4 in each subband. 22-band decomposition. Block size of 2 × 2 in subbands 0-6, 4 × 4 in the remaining subbands:
  (a) Maximum classification gain.

  (b) EMNSD classification.

7.1 Simulation results

Figure 6 and Table 2 compare the performance, in peak signal-to-noise ratio (PSNR), of the different methods of classification. All the coding results are for the `Lenna' image and 32-tap QMF's.

Methods 4a and 4b provide almost the same performance, which is better than that of the other methods. Method 2b performs marginally better than Method 2a at low rates. Method 1 can be thought of as a special case of Method 3. This is reflected in the coding results, as Method 3 performs better than Method 1 by about 0.3 dB at all rates. Method 4 performs marginally better than Method 3 at low rates, whereas the roles are reversed at high rates. This can be explained as follows. At low bit rates, all the classes in some of the higher-frequency subbands are allocated zero rate. For example, suppose all classes in subband 21 are allocated zero rate. Then subband 21 should have no influence in deciding the classification map for the subbands with diagonal frequency orientation. The VQ-classification algorithm does not take this into account. Method 2 performs worse than Method 4 by about 0.4 dB. This is expected, since Method 2 uses only two classes in the HFS's.

Figure 7 compares the performance of the maximum classification gain and EMNSD classification algorithms for J = 1, 2, and 4, using the approach described in Section 5.4 to reduce the side information. The case J = 1 corresponds to no classification. Figure 7 also shows the performance of the subband coding system with equally populated classes, for J = 2 and 4. These results are for the 22-band decomposition of the `Lenna' image using 32-tap QMF's. The block sizes used are 2 × 2 for bands 0-6 and 4 × 4 for the remaining subbands (as in Method 4). As can be seen from the figure, non-uniform classification with 4 classes gains about 1.25 dB in PSNR compared to subband coding without classification.
The maximum classification gain and EMNSD classification schemes provide almost the same performance for J = 4. For J = 2, EMNSD classification is a little better than the maximum classification gain method at low bit rates; however, the maximum classification gain method outperforms EMNSD classification at high bit rates (by up to about 0.2 dB at rate r = 1.0 bit/pixel). This is consistent with the high-bit-rate assumption used in the derivation of the maximum classification gain method. Using the maximum classification gain method with J = 4 gains about 0.25 dB compared to using J = 2. Our simulations indicate that using more than 4 classes does not result in any noticeable gain in PSNR and, in fact, can degrade performance. From this we conclude that, if more than 4 classes are used, the gain in PSNR is offset by the increase in the classification side rate.

For J = 2, non-uniform classification gains about 0.3 dB over classification with equally populated classes. For J = 4, this difference narrows to 0.1 dB or less. This can be explained as follows. Consider the case of J = 4 with uniform classification. After classification, during rate allocation, suppose classes 0 and 1 are assigned zero rate and classes 2 and 3 are assigned non-zero rates. This is equivalent to a non-uniform classification with J = 3 and class probabilities 0.5, 0.25, and 0.25. Thus, due to the rate allocation step following classification, some benefits of non-uniform classification are exploited even in the case of uniform classification.

In general, the classification gain can vary depending on the image being encoded. In our simulations, the use of classification generally resulted in a gain of more than 1.0 dB in PSNR for the images `Lenna' and `baboon'. However, the gain for the images `goldhill' and `little girl' is only about 0.3 to 0.5 dB.

Figure 8 compares the performance of Method 4a for two different choices of filters, namely, 32-tap QMF's and 9-7 tap biorthogonal spline filters. The subband coder using the 9-7 tap spline filters performs worse by about 0.2 dB at high bit rates.
However, the complexity is substantially lower than that of the 32-tap QMF's. Also, at low encoding rates, images coded using the 9-7 tap spline filters have less ringing distortion and better perceptual quality than those coded using the 32-tap QMF's.

Figure 9 compares the coding results of various classification methods at a target rate of 0.25 bits/pixel for the `goldhill' image using 9-7 tap spline filters. Only a 128 × 128 segment of the image is displayed. The segments have been resized using pixel replication. It can be observed that the segments encoded using Methods 1 and 4 look very similar. The segment encoded using Method 2, although close in PSNR, displays very different coding artifacts. The pattern on the roof is smeared and some blockiness is visible. This is due to the lack of bit budget to encode the low-frequency DCT coefficients. Figure 10 compares the performance of Methods 4a and 4b with various other coding results in the literature for the `Lenna' image using 32-tap QMF's.

7.2 Comparison of the complexity of different classification methods

In this subsection, we compare the complexities of the different classification methods described in Section 5. Since all methods except Method 1 use iterative algorithms for classification, it is difficult to analyze their complexities accurately. Our simulation results indicate that, for images of size 512 × 512, the time required for classification is only about 10% of the total execution time at the encoder. Thus the choice of classification method does not significantly affect the overall encoder complexity. The classification methods can be approximately graded in terms of complexity (from lower to higher) as follows.

1. Method 1, Method 2.
2. Equally likely classification (of all subbands).
3. Method 4a, Method 4b.
4. Method 3.

This assumes that the number of classes is the same for Methods 3 and 4. The simulation results presented above indicate that, in general, more complex methods of classification lead to better performance. Thus, there is also a trade-off between computational complexity and classification gain.
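The rate allocation step of Section 7 searches operational rate-distortion curves in the style of Shoham and Gersho [25]. As a toy sketch of that idea only, not the Westerink-based implementation actually used, the following allocator bisects on a Lagrange multiplier λ, picking for each class the R-D point minimizing D + λR until the weighted rate fits the budget. The names `allocate`, `curves`, and `weights` are hypothetical; `weights` stands in for per-class sample fractions:

```python
# Toy Lagrangian bit allocation over per-class operational R-D curves.
# curves[i] is a list of (rate, distortion) points for class i.

def allocate(curves, weights, budget, iters=50):
    """Bisect on the Lagrange multiplier until the weighted rate fits the budget."""
    def pick(lmbda):
        # Each class independently minimizes distortion + lambda * rate.
        choice = [min(c, key=lambda p: p[1] + lmbda * p[0]) for c in curves]
        rate = sum(w * p[0] for w, p in zip(weights, choice))
        return choice, rate

    lo, hi = 0.0, 1e6
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if pick(mid)[1] > budget:
            lo = mid  # rate too high: penalize rate more
        else:
            hi = mid
    # The hi side meets the budget whenever the minimum-rate choice does.
    return pick(hi)[0]
```

A class assigned zero rate simply selects the (0, D) point of its curve, which is exactly the behavior discussed above for the high-frequency subbands at low rates.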

The complexity analyses of the subband split, TCQ, and rate allocation using Westerink's method can be found in [33], [18], and [8], respectively. It should be noted that the complexity of the ACTCQ decoder is only marginally higher than that of entropy-coded uniform quantization. The additional complexity is due to keeping track of the trellis transitions.

8 Conclusions

We have investigated various classification techniques, applied to subband coding of images, as a way of exploiting the non-stationary nature of image subbands. The advantages of subband classification have been characterized in a rate-distortion framework in terms of the resulting classification gain. Two new algorithms, maximum classification gain and EMNSD classification, which allow an unequal number of blocks in each class, have been presented. The dependence between the classification maps from different subbands has been exploited either directly, while encoding the classification maps, or indirectly, by constraining the classification maps. The trade-off between the classification gain and the amount of side information has been explored. Coding results for a subband coder based on classification have been presented. The simulation results demonstrate the value of classification in subband coding: a gain of up to 1.25 dB for the 512 × 512 `Lenna' image. At encoding rates above about 0.5 bits/pixel, Method 4 performs as well as or better than all other results described in the literature. At lower encoding rates, the method is competitive with the best previously reported results.

References

[1] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, "Image coding using wavelet transform," IEEE Trans. Image Proc., vol. IP-1, pp. 205-220, April 1992.

[2] J. L. Arrowwood and M. J. T. Smith, "Exact reconstruction analysis/synthesis filter banks with time varying filters," Proceedings, IEEE Int. Conf. Acoust., Speech and Signal

Proc., pp. 233-236, vol. 3, April 1993.

[3] K. A. Birney and T. R. Fischer, "On the modeling of DCT and subband image data for compression," IEEE Trans. Image Proc., vol. IP-4, pp. 186-193, February 1995.

[4] W. H. Chen and C. H. Smith, "Adaptive coding of monochrome and color images," IEEE Trans. Commun., vol. COM-25, pp. 1285-1292, November 1977.

[5] W. C. Chung and M. J. T. Smith, "Spatially-varying IIR filter banks for image coding," Proceedings, IEEE Int. Conf. Acoust., Speech and Signal Proc., pp. 570-573, vol. 5, April 1993.

[6] T. R. Fischer and M. Wang, "Entropy-constrained trellis coded quantization," IEEE Trans. Inform. Th., vol. 38, no. 2, pp. 415-426, March 1992.

[7] G. D. Forney, Jr., "The Viterbi algorithm," Proc. IEEE, vol. 61, pp. 268-278, March 1973.

[8] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1992.

[9] C. Herley, J. Kovačević, K. Ramchandran, and M. Vetterli, "Time-varying orthonormal tilings of the time-frequency plane," Proceedings, IEEE Int. Conf. Acoust., Speech and Signal Proc., pp. 205-208, vol. 3, April 1993.

[10] H. Jafarkhani, N. Farvardin, and C.-C. Lee, "Adaptive image coding based on the discrete wavelet transform," Proceedings, Int. Conf. Image Proc., vol. 3, pp. 343-347, November 1994.

[11] N. S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice-Hall, Englewood Cliffs, 1984.

[12] J. D. Johnston, "A filter family designed for use in quadrature mirror filter banks," Proceedings, IEEE Int. Conf. Acoust., Speech and Signal Proc., pp. 291-294, April 1980.

[13] R. L. Joshi, T. R. Fischer, and R. H. Bamberger, "Optimum classification in subband coding of images," Proceedings, Int. Conf. Image Proc., vol. 2, pp. 883-887, November 1994.

[14] R. L. Joshi, V. J. Crump, and T. R. Fischer, "Image subband coding using arithmetic coded trellis coded quantization," IEEE Trans. Circuits and Systems for Video Technology, vol. 5, no. 6, pp. 515-523, December 1995.

[15] R. L. Joshi, T. R. Fischer, and R. H. Bamberger, "Comparison of different methods of classification in subband coding of images," IS&T/SPIE Symposium on Electronic Imaging: Science & Technology, Technical Conference 2418, Still Image Compression, February 1995.

[16] J. H. Kasner and M. W. Marcellin, "Adaptive wavelet coding of images," Proceedings, Int. Conf. Image Proc., vol. 3, pp. 358-362, November 1994.

[17] S. G. Mallat, "A theory for multiresolution signal decomposition: the wavelet representation," IEEE Trans. Pattern Anal. and Mach. Intel., vol. 11, pp. 674-693, July 1989.

[18] M. W. Marcellin and T. R. Fischer, "Trellis coded quantization of memoryless and Gauss-Markov sources," IEEE Trans. Commun., vol. 38, no. 1, pp. 82-93, January 1990.

[19] M. W. Marcellin, "On entropy-constrained trellis coded quantization," IEEE Trans. Commun., pp. 14-16, January 1994.

[20] T. Naveen and J. W. Woods, "Subband finite state scalar quantization," Proceedings, IEEE Int. Conf. Acoust., Speech and Signal Proc., pp. 613-616, April 1993.

[21] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, second edition, Cambridge University Press, 1992.

[22] J. Rissanen and G. G. Langdon, "Arithmetic coding," IBM J. Res. Develop., vol. 23, pp. 149-162, March 1984.

[23] A. Said and W. A. Pearlman, "A new fast and efficient image codec based on set partitioning in hierarchical trees," IEEE Trans. Circ. & Syst. Video Tech., vol. 6, no. 3, pp. 243-250, June 1996.

[24] J. M. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE Trans. Signal Proc., vol. 41, pp. 3445-3462, December 1993.

[25] Y. Shoham and A. Gersho, "Efficient bit allocation for an arbitrary set of quantizers," IEEE Trans. Acoust., Speech and Signal Proc., vol. ASSP-36, pp. 1445-1453, September 1988.

[26] P. Sriram and M. W. Marcellin, "Image coding using wavelet transforms and entropy-constrained trellis-coded quantization," IEEE Trans. Image Proc., vol. IP-4, pp. 725-733, June 1995.

[27] A. Stuart and J. K. Ord, Kendall's Advanced Theory of Statistics, vol. 1: Distribution Theory, fifth edition, Charles Griffin, London, 1987.

[28] N. Tanabe and N. Farvardin, "Subband image coding using entropy-coded quantization over noisy channels," IEEE J. Select. Areas in Com., vol. 10, pp. 926-943, June 1992.

[29] D. Taubman and A. Zakhor, "Multirate 3-D subband coding of video," IEEE Trans. Image Proc., vol. IP-3, pp. 572-588, September 1994.

[30] Y.-T. Tse, "Video coding using global/local motion compensation, classified subband coding, uniform threshold quantization and arithmetic coding," Ph.D. thesis, University of California, Los Angeles, 1992.

[31] G. Ungerboeck, "Channel coding with multilevel/phase signals," IEEE Trans. Inform. Th., vol. IT-28, pp. 55-67, January 1982.

[32] P. P. Vaidyanathan, Multirate Systems and Filter Banks, Prentice Hall PTR, Englewood Cliffs, New Jersey, 1993.

[33] M. Vetterli and J. Kovačević, Wavelets and Subband Coding, Prentice Hall PTR, Englewood Cliffs, New Jersey, 1995.

[34] P. H. Westerink, J. Biemond, and D. E. Boekee, "An optimal bit allocation algorithm for subband coding," Proceedings, IEEE Int. Conf. Acoust., Speech and Signal Proc., pp. 757-760, April 1988.

[35] I. H. Witten, R. M. Neal, and J. G. Cleary, "Arithmetic coding for data compression," Communications of the ACM, vol. 30, no. 6, pp. 520-540, June 1987.

[36] J. W. Woods and S. D. O'Neil, "Subband coding of images," IEEE Trans. Acoust., Speech and Signal Proc., vol. ASSP-34, pp. 1278-1288, October 1986.

[37] J. W. Woods and T. Naveen, "A filter based allocation scheme for subband compression of HDTV," IEEE Trans. Image Proc., vol. IP-1, pp. 436-440, July 1992.

Figure 1: Frequency partition for a 16-band uniform decomposition. (Diagram not reproduced: subbands 0-15 on the horizontal/vertical frequency plane, with the LFS at the origin.)

Figure 2: Frequency partition for a 22-band decomposition. (Diagram not reproduced: subbands 0-21 on the horizontal/vertical frequency plane.)

Figure 3: Classification map for `Lenna' for a 16-band uniform decomposition (maximum classification gain method; J = 4). (Image not reproduced.)

Figure 4: Classification map for `Lenna' for a 16-band uniform decomposition (EMNSD classification method; J = 2). (Image not reproduced.)

Figure 5: Block diagram of the ACTCQ system. (Diagram not reproduced: X(i,j) → TCQ encoder → arithmetic encoder → compressed data → arithmetic decoder → TCQ decoder → Y(i,j); the arithmetic coding stage is noiseless.)

Figure 6: Comparison of different methods of classification for the `Lenna' image. (Plot not reproduced: PSNR in dB versus rate in bits/sample for Methods 1, 2a, 2b, 3, 4a, and 4b.)

Figure 7: Comparison of non-uniform and equally likely classification for the `Lenna' image, 22-band decomposition. (Plot not reproduced: PSNR in dB versus rate in bits/sample for J = 4 and J = 2 with Methods 4a, 4b, and uniform classification, and for no classification.)

Figure 8: Comparison of the encoding performance of different filters for the `Lenna' image, 22-band decomposition, maximum classification gain method. (Plot not reproduced: PSNR in dB versus rate in bits/sample for 32-tap QMF filters and 9-7 tap spline filters.)

Figure 9: Comparison of encoded images using different classification methods. 128 × 128 segment of the `goldhill' image, resized using pixel replication. Top-left: original. Top-right: Method 4a (0.260 bits/pixel, 30.88 dB). Bottom-left: Method 1 (0.247 bits/pixel, 30.62 dB). Bottom-right: Method 2b (0.251 bits/pixel, 30.51 dB). (Images not reproduced.)

Figure 10: Comparison of encoding performance for the `Lenna' image. (Plot not reproduced: PSNR in dB versus rate in bits/sample for Methods 4a and 4b, Joshi, Crump and Fischer [14], Naveen and Woods [20], Said and Pearlman [23], Sriram and Marcellin [26], and Taubman and Zakhor [29].)