IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 8, AUGUST 2012 3801

Generalized PCM Coding of Images

José Prades Nebot, Member, IEEE, Marleen Morbee, and Edward J. Delp, Fellow, IEEE

Abstract— Pulse-code modulation (PCM) with embedded quantization allows the rate of the PCM bitstream to be reduced by simply removing a fixed number of least significant bits from each codeword. Although this source coding technique is extremely simple, it has poor coding efficiency. In this paper, we present a generalized PCM (GPCM) algorithm for images that simply removes bits from each codeword. In contrast to PCM, however, the number and the specific bits that a GPCM encoder removes in each codeword depend on its position in the bitstream and the statistics of the image. Since GPCM allows the encoding to be performed with different degrees of computational complexity, it can adapt to the computational resources that are available in each application. Experimental results show that GPCM outperforms PCM with a gain that depends on the rate, the computational complexity of the encoding, and the degree of inter-pixel correlation of the image.

Index Terms— Binning, interpolative coding, pulse-code modulation, quantization.

I. INTRODUCTION

Today, most signals of interest (e.g., voice, audio, image, video) are digitally acquired (digitized) using analog-to-digital (A/D) converters. A/D converters perform pulse-code modulation (PCM) with uniform quantization and fixed-length binary coding. This type of coding has several advantages. First, PCM is the simplest source coding technique [1]. Second, it is trivial to locate (and decode) any codeword in the PCM bitstream (random access). Third, when embedded quantization is used, the rate of the PCM bitstream can be easily reduced by discarding a fixed number of bits in each codeword (scalability). Despite all these advantages, PCM is not usually used in the storing or transmission of signals due to its poor coding efficiency. For this reason, the PCM bitstream is usually compressed using a sophisticated and efficient coding algorithm [1].

Apart from coding efficiency, encoding simplicity is also an important factor in applications with very limiting constraints on computational power and/or energy [2]. A simple encoder is also necessary when the signal must be acquired at a very high sampling rate. This happens, for instance, in applications that require capturing

Manuscript received October 6, 2011; revised February 14, 2012; accepted April 5, 2012. Date of publication May 1, 2012; date of current version July 18, 2012. This work was supported by the Spanish Government CYCIT under Grant TEC2009-09146. The work of M. Morbee was supported by the Special Research Fund of Ghent University. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Zhou Wang.

J. Prades Nebot is with the Institute of Telecommunications and Multimedia Applications, Universitat Politècnica de València, Valencia 46022, Spain (e-mail: [email protected]).

M. Morbee is with the Department of Telecommunications and Information Processing, Ghent University, Ghent 9000, Belgium (e-mail: [email protected]).

E. J. Delp is with the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907-2035 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2012.2197015

1057–7149/$31.00 © 2012 IEEE

video at very high frame rates [3]–[6]. In some ultrahigh-speed applications, the frame rate is so high that not even PCM can be used and only a few analog frames are captured and stored [6]. High-speed video applications require source coding algorithms that are extremely simple and, consequently, many times the PCM signal (raw video) is directly transferred and stored. However, the high bit rate of raw video can make its transmission or storage difficult. Thus, the bus may not be fast enough to transfer the video out of the sensor, or the writing speed of the storage device may not be high enough to save the raw video [5].

In all these applications, a lossy coding algorithm that has properties similar to those of PCM but with a better coding efficiency would be of interest. These constraints discard most of the image coding algorithms because their computational complexity is too high for very high-speed video applications. Indeed, the high coding efficiency of transform-based algorithms such as JPEG and JPEG2000 comes at the expense of using 2-D transforms, quantization, and lossless coding techniques (e.g., Huffman or arithmetic coding). Hence, although today’s hardware allows the use of transform coders in many applications, they are not appropriate when an extremely low complexity coding is required [7], [8]. JPEG-LS is less complex than other image coding algorithms (e.g., JPEG, JPEG-2000, CALIC) and allows a controlled maximum coding error [9], but it is still too complex for ultrahigh-speed video applications. Recently, very simple DPCM-based image coders have been proposed for applications with high constraints in energy and/or computational resources [7], [8]. These algorithms have small memory requirements and coding latency, and perform much better than PCM thanks to the use of linear prediction and entropy coding. These features, however, make the compressed bitstream highly sensitive to transmission or storage errors and complicate random access. Moreover, part of the simplicity of these algorithms is due to the specific properties of the video of the targeted applications [7], [8].

In this paper, we present a generalized-PCM (GPCM) image coding algorithm for applications that require an extremely low computational complexity. The GPCM encoding is divided into two stages: the analysis and the encoding itself. The encoding is done by simply discarding bits of each pixel value (as PCM with embedded quantization does).1 The type [least significant bits (LSBs) or most significant bits (MSBs)] and number of bits discarded in each pixel are determined by using the information provided by the analysis stage. Although the analysis stage involves some numerical processing, its complexity can be reduced by using statistical sampling [11], [12]. Moreover, GPCM can operate at different degrees of computational complexity, which allows it to adapt to the computational resources that are available in each application. Although GPCM has worse coding performance than most image coding algorithms, it is extremely simple, it facilitates random access to the data, it is robust to transmission errors, it is scalable in complexity, and it outperforms PCM in coding efficiency. These features make the algorithm useful for the recording of video at extremely high frame rates.

The rest of this paper is organized as follows. In Section II, we describe the GPCM coding of images. In Section III, we experimentally test the coding efficiency of GPCM and compare it to other image coding algorithms. Finally, in Section IV, we summarize our results.

1 In preliminary versions of this paper (i.e., [10], [11]), our algorithm was named Modulo-PCM; however, in this paper, we use the name GPCM in order to avoid confusion with the Modulo-PCM technique. In fact, our algorithm combines three coding techniques (one of which is very similar to Modulo-PCM).


Fig. 1. Block diagram of the GPCM coding of an image block.

II. GPCM CODING OF IMAGES

Let us consider a monochromatic digital image encoded with PCM at R0 b/pixel. Let Δ0 be the quantization step size of the digital image. Let us also assume that uniform embedded quantization is used, i.e., the removal of the l LSBs of each codeword of the image provides the same bitstream as the PCM encoding of the analog image with R0 − l b/pixel. Our GPCM algorithm divides the input image into blocks of b × b pixels and encodes each block at the target rate R using appropriate parameter values. In Section II-A, we describe the GPCM encoding and decoding of each image block. In Section II-B, we describe the different coding techniques used in GPCM. In Section II-C, we show how the GPCM encoder assigns values to the coding parameters of each block. Section II-D is devoted to evaluating the computational complexity of GPCM. Finally, in Section II-E, we show how GPCM can achieve different degrees of coding complexity and efficiency.

A. GPCM Coding

Let x[n1, n2] be a block of b × b pixels of the input image. For the sake of brevity, we will drop variables n1 and n2 when this does not jeopardize clarity. The GPCM encoder first divides x into four decimated subblocks (Fig. 1)

x0[n1, n2] = x[2n1, 2n2]
x1[n1, n2] = x[2n1 + 1, 2n2]
x2[n1, n2] = x[2n1, 2n2 + 1]
x3[n1, n2] = x[2n1 + 1, 2n2 + 1].
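As a concrete illustration, this decimation is just polyphase slicing of the block; a minimal sketch with NumPy (the array layout and the helper name `split_subblocks` are our assumptions, not part of the paper):

```python
import numpy as np

def split_subblocks(x):
    """Split a b x b block into the four decimated subblocks.

    x0 is the PCM subblock; x1, x2, and x3 are the GPCM subblocks.
    """
    x0 = x[0::2, 0::2]   # x[2n1, 2n2]
    x1 = x[1::2, 0::2]   # x[2n1 + 1, 2n2]
    x2 = x[0::2, 1::2]   # x[2n1, 2n2 + 1]
    x3 = x[1::2, 1::2]   # x[2n1 + 1, 2n2 + 1]
    return x0, x1, x2, x3

block = np.arange(16).reshape(4, 4)
x0, x1, x2, x3 = split_subblocks(block)
```

Multiplexing the four subblocks back together at the decoder simply reverses this slicing.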

We refer to x0 as the PCM subblock, and x1, x2, and x3 as the GPCM subblocks. In each subblock xk, the encoder simply removes the lk LSBs and the mk MSBs from each pixel value. Parameter m0 is always set to zero. To simplify the assignment of values to the coding parameters, we set l1 = l2 = l3 and m1 = m2 = m3. In this way, the encoder only has to assign values to three parameters: l0, l1, and m1. The resulting codewords x̃k are transmitted to the decoder (Fig. 1). Each codeword of x̃0 represents a quantization interval of length 2^l0 Δ0. Similarly, each codeword of x̄k (k ∈ {1, 2, 3}) represents an interval of size 2^lk Δ0. If mk > 0, the encoder performs binning over the codewords of x̄k [13], i.e., each codeword of x̃k represents a bin of 2^mk codewords x̄k.
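Removing the lk LSBs and the mk MSBs of an R0-bit codeword amounts to a right shift followed by a modulo; a minimal sketch, assuming R0 = 8 b/pixel (the function name is ours):

```python
R0 = 8  # bits per pixel of the input PCM image (assumed)

def gpcm_encode_pixel(x, l, m):
    """Remove the l LSBs (embedded quantization) and then the m MSBs
    (binning), keeping the R0 - l - m middle bits of the codeword."""
    q = x >> l                       # drop the l least significant bits
    return q % (1 << (R0 - l - m))   # drop the m most significant bits

# x = 0b10110101 = 181 with l = 2, m = 1:
# 181 >> 2 = 45 (0b101101); keeping 5 bits gives 45 % 32 = 13 (0b01101)
codeword = gpcm_encode_pixel(0b10110101, 2, 1)
```

With l = m = 0 the pixel passes through unchanged, which is plain PCM.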

At the decoder, x0 is first decoded using midpoint reconstruction (i.e., using the midpoint of each quantization interval). Then, the decoder obtains an approximation of x1, x2, and x3 by performing interpolation over the reconstructed PCM subblock x̂0. Bilinear interpolation is used since it represents a good tradeoff between complexity and interpolation accuracy [14]. Note that to interpolate the pixels of the GPCM subblocks that are placed in the last column and row of a block x, the first column of the PCM subblock that is placed to the right of x and the first row of the PCM subblock that is placed below x must be previously decoded. The interpolated subblocks y1, y2, and y3 act as side information (SI) for the decoding of x1, x2, and x3, respectively (Fig. 1).

The decoding of each codeword of x̃k (k ∈ {1, 2, 3}) is divided into two steps: decision and reconstruction. In the decision step, the decoder selects one of the 2^m1 quantization intervals represented by the codeword (no decision is performed when m1 = 0). Among all the potential intervals of the codeword, the decoder selects the one that is closest to its SI. If the decision is correct, the m1 MSBs that were removed by the encoder are correctly recovered. Otherwise, the decoder incurs a decision error, which may generate a large-amplitude decoding error. After the decision, each codeword represents a single interval.
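The decision step can be sketched as follows (a sketch only: R0 = 8, a unit step size Δ0 = 1, and the helper name `decide` are our assumptions). Among the 2^m candidate intervals that share the received codeword, the decoder keeps the one whose midpoint is closest to the SI y:

```python
R0 = 8  # bits per pixel of the input PCM image (assumed)

def decide(codeword, l, m, y):
    """Decision step: recover the m removed MSBs by choosing, among the
    2**m quantization intervals sharing this codeword, the one whose
    midpoint is closest to the side information y. Returns the full
    quantization index (the codeword with its MSBs restored)."""
    step = 1 << l                 # interval length, in input levels
    n_codes = 1 << (R0 - l - m)   # number of codewords per bin
    candidates = [msb * n_codes + codeword for msb in range(1 << m)]
    return min(candidates, key=lambda q: abs(y - (q * step + step / 2)))
```

For example, pixel 181 encoded with l = 2 and m = 1 yields codeword 13; with SI y = 178 the decoder picks the candidate interval around 182 and recovers the full index 45, so the removed MSB is restored.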

In the reconstruction step, the aim is to recover the l1 LSBs that were removed from each pixel codeword by the encoder. The decoder first estimates the value of each pixel using its SI and its quantization interval. If we assume that the interpolation error e between a pixel value x and its SI y (e ≜ y − x) follows a unimodal and symmetric probability distribution, then the maximum a posteriori (MAP) estimation of x when x belongs to a quantization interval [a, b] is given by the clipping function

x̌ = ⎧ a,  y < a
    ⎨ y,  a ≤ y ≤ b
    ⎩ b,  y > b.
(1)

A better but more complex reconstruction can be performed if the minimum mean squared error (MMSE) estimation is used [10].
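In code, the clipping function (1) is simply a clamp of the SI to the selected interval; a minimal sketch in integer pixel levels, assuming Δ0 = 1 (the helper name is ours):

```python
def map_reconstruct(q, l, y):
    """MAP reconstruction of eq. (1): clip the side information y to the
    quantization interval [a, b] selected in the decision step, where q
    is the full quantization index and l the number of removed LSBs."""
    a = q << l                      # lower edge of the interval
    b = (q << l) + (1 << l) - 1     # upper edge (inclusive integer level)
    return min(max(y, a), b)        # the clipping function (1)
```

Continuing the earlier example, index 45 with l = 2 covers levels [180, 183]; an SI of 178 is clipped up to 180, while an SI of 181 is returned unchanged.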

When m1 = 0 and l0 = l1, midpoint reconstruction instead of MAP reconstruction is used since, in this case, the SI does not offer any significant help for the decoding of the GPCM subblocks. Each estimated value x̌ is then quantized with R0 bits. Finally, the four


TABLE I

MAXIMUM NUMBER OF OPERATIONS PER PIXEL REQUIRED IN THE GPCM ENCODING AND DECODING OF AN IMAGE
(All entries are the maximum average number of integer operations per pixel.)

Process  | Additions | Subtractions | Multiplications | Shifts       | Comparisons | Total
Encoding | 2/f       | 3/(4f)       | 3/(4f)          | 3/(4f) + 7/4 | 0           | 17/(4f) + 7/4
Decoding | 9/4       | 0            | 0               | 10/4         | 12/4        | 31/4

decoded subblocks are multiplexed (Fig. 1). Since all the blocks are encoded at the same rate and bilinear interpolation only operates over a small number of pixel values (2 or 4), random access is facilitated (i.e., locating the codeword of any pixel and decoding it is simple).

B. Coding Techniques

GPCM combines three simple techniques: PCM, interpolative coding, and binning. GPCM behaves like PCM when m1 = 0 and l0 = l1. PCM is optimum at high rates or for blocks with very poor spatial correlation (i.e., when no benefit is obtained from exploiting the similarity among pixels). When m1 = 0 and l0 < l1, a GPCM encoder behaves like a PCM encoder that uses two quantization step sizes. At the decoder, the pixels of x̂0 are interpolated to generate SI, which helps in the reconstruction of the three GPCM subblocks. This technique can outperform conventional PCM at low rates.

When l1 + m1 = R0, interpolative coding is used [15]: the GPCM encoder only encodes and transmits x0, which is reconstructed and interpolated at the decoder. This technique provides good results at low rates since, at these rates, only transmitting x0 and interpolating x̂0 provides better results than spending the available bits in the encoding of the four subblocks.

When m1 > 0 and l1 + m1 < R0, binning is performed over the codewords of the GPCM subblocks. The use of binning is useful in blocks with a high spatial correlation since most of the removed MSBs can be correctly recovered by the decoder by exploiting the similarity among the pixels.

C. Assignment of Values to the Coding Parameters

The encoder has to assign values to the parameters l0, l1, and m1 of each block. For a target rate R, the optimum (l0, l1, m1) is the one that minimizes the distortion D and fulfills the rate constraints

minimize  D(l0, l1, m1)
s.t.      R = R0 − (1/4)[l0 + 3(l1 + m1)]
          0 ≤ l0, l1, m1 ≤ R0
          0 ≤ l1 + m1 ≤ R0.
(2)
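Because the rate constraint in (2) is an equality over small integers, the feasible set can be enumerated by brute force; a sketch assuming R0 = 8 (an encoder could then pick, among these triplets, the one minimizing a distortion model):

```python
R0 = 8  # bits per pixel of the input PCM image (assumed)

def feasible_triplets(R):
    """All (l0, l1, m1) satisfying the constraints of problem (2):
    R = R0 - (1/4)[l0 + 3(l1 + m1)], with the box constraints."""
    budget = 4 * (R0 - R)               # l0 + 3*(l1 + m1) must equal this
    return [
        (l0, l1, m1)
        for l0 in range(R0 + 1)
        for l1 in range(R0 + 1)
        for m1 in range(R0 + 1 - l1)    # enforce l1 + m1 <= R0
        if l0 + 3 * (l1 + m1) == budget
    ]

triplets = feasible_triplets(1)
```

For R = 1 b/pixel the feasible set includes, for instance, (4, 8, 0) and (7, 7, 0), the two assignments reported for that rate in Section III.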

In [16], we developed a model for D when the interpolation error follows a Laplacian distribution. Hence, by solving (2) using the distortion model in [16], a GPCM encoder could obtain the optimum assignment for each block. Nevertheless, since solving (2) is a complex task and encoding complexity is our main concern, the encoder uses a series of assignment rules (ARs) to perform the assignment. Thus, for each possible R and b, there is an AR that provides an appropriate triplet (l0, l1, m1) for each block as a function

of the peak signal-to-noise ratio of its SI (PSNRSI) defined as

PSNRSI = 10 log[(2^R0 − 1)² / MSESI]  (dB)
(3)

where MSESI is the mean squared interpolation error of the block when l0 = 0 (i.e., MSESI can be computed from pixels of the original image). We experimentally obtained the ARs for a set of values of b (b ∈ {4, 8, 16, 32, 64, 128, 256, 512}) and R (R ∈ {1, 2, 3, 4, 5, 6}) using 50 gray-scale digital images. To obtain the AR for a pair of b and R values, we classified each consecutive block of b × b pixels of the 50 images in accordance with its PSNRSI value when l0 = 0. In this classification, intervals with a width of 1 dB were used. Then, we encoded the blocks with all the possible triplets (l0, l1, m1) that corresponded to each target rate R. Finally, we selected the (l0, l1, m1) that provided the minimum mean squared decoding error of the blocks in each PSNRSI interval.2
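Equation (3) is a base-10 logarithm of the ratio between the squared peak value and the mean squared interpolation error; a minimal sketch, assuming R0 = 8:

```python
import math

R0 = 8  # bits per pixel of the input PCM image (assumed)

def psnr_si(mse_si):
    """PSNR of the side information, eq. (3), in dB."""
    return 10 * math.log10((2 ** R0 - 1) ** 2 / mse_si)
```

For instance, an MSESI equal to the squared peak (255² = 65025) gives 0 dB, and dividing the MSESI by 100 raises the PSNRSI by 20 dB.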

D. Computational Complexity

The GPCM encoding of a block essentially involves two processes: the computation of its MSESI (the PSNRSI can be obtained from the MSESI through a lookup table) and the encoding itself. Computing the MSESI of a block of b × b pixels involves the bilinear interpolation of x0 (5b²/4 additions and 3b²/4 shifts), the computation of the interpolation errors (3b²/4 subtractions), the squaring of the errors (3b²/4 multiplications), and the summation of all the errors (3b²/4 additions). The multiplications and divisions by integer powers of 2 and the removal of m1 MSBs and l1 LSBs can be implemented through shift operations. Therefore, on average, 4.25 integer operations per pixel (iopp) are necessary to compute the MSESI. The complexity of this computation can be reduced by computing the MSESI using only one out of f pixels that must be interpolated. In this way, MSESI is computed with 4.25/f iopp. The higher the sampling factor f, the lower the number of operations to be computed but the less accurate the assignment will be (since MSESI is less accurate). Since the MSESI is the sample variance of a finite population, the error in estimating MSESI depends on both the sampling factor and the size of the population (i.e., the size of the block). To achieve a target error variance in estimating the MSESI, the larger the size of the population, the larger the sampling factor to use [17]. Hence, for a given block size, there is a maximum factor fmax such that assignments that are different to the optimum one rarely occur if f ≤ fmax. Therefore, if f = fmax, the complexity is reduced without incurring any appreciable loss in coding efficiency.
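The sampled MSESI computation can be sketched as follows. This is a simplified stand-in: the paper takes the border neighbors from the adjacent PCM subblocks, whereas here edge replication inside the block is assumed instead, and the function name is ours:

```python
import numpy as np

def mse_si_sampled(x, f=1):
    """Estimate the MSESI of a block: build the bilinear side information
    for the three GPCM subblocks from the PCM subblock and average the
    squared interpolation errors over one out of every f of them."""
    x = x.astype(np.float64)
    x0 = x[0::2, 0::2]
    # Edge replication stands in for the neighboring blocks' PCM pixels.
    p = np.pad(x0, ((0, 1), (0, 1)), mode="edge")
    y1 = (p[:-1, :-1] + p[1:, :-1]) / 2    # SI for x1: vertical average
    y2 = (p[:-1, :-1] + p[:-1, 1:]) / 2    # SI for x2: horizontal average
    y3 = (p[:-1, :-1] + p[1:, :-1] + p[:-1, 1:] + p[1:, 1:]) / 4  # SI for x3
    err = np.concatenate([
        (x[1::2, 0::2] - y1).ravel(),
        (x[0::2, 1::2] - y2).ravel(),
        (x[1::2, 1::2] - y3).ravel(),
    ])
    return float(np.mean(err[::f] ** 2))   # statistical sampling: 1 of f pixels
```

A flat block interpolates exactly (MSESI = 0), while any texture yields a positive error; increasing f trades estimation accuracy for fewer operations.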

The encoding of each pixel of a PCM subblock requires zero shifts when l0 = 0 and one shift when l0 > 0. The encoding of each pixel

2 The ARs are available at: http://personales.upv.es/jprades/publications.html.



Fig. 2. PSNR as a function of the rate of six digital images coded using GPCM, PCM, JPEG, and JPEG-LS. (a) Zelda. (b) Lena. (c) Peppers. (d) Man. (e) Barbara. (f) Baboon.

Algorithm 2 GPCM coding

1: Select f depending on the attainable complexity
2: Select b depending on f using the block-size table
3: Select an AR depending on R and b
4: for "each block of b × b pixels" do
5:   Compute the PSNRSI using a sampling factor f
6:   Obtain l0, l1, and m1 from the AR
7:   Encode the block
8: end for
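Steps 4–8 of Algorithm 2 map directly onto a block loop; a structural sketch in which the assignment rule, the PSNRSI measurement, and the block encoder are passed in as callables (all hypothetical stand-ins for the trained tables and routines of Sections II-A, II-C, and II-E):

```python
import numpy as np

def gpcm_encode_image(img, b, assignment_rule, psnr_si_of, encode_block):
    """Steps 4-8 of Algorithm 2: for each b x b block, measure its PSNRSI,
    obtain (l0, l1, m1) from the assignment rule, and encode the block."""
    coded = []
    for i in range(0, img.shape[0], b):
        for j in range(0, img.shape[1], b):
            blk = img[i:i + b, j:j + b]
            l0, l1, m1 = assignment_rule(psnr_si_of(blk))
            coded.append(encode_block(blk, l0, l1, m1))
    return coded

# Dummy stand-ins, just to exercise the control flow:
img = np.zeros((8, 8), dtype=np.uint8)
coded = gpcm_encode_image(
    img, b=4,
    assignment_rule=lambda psnr: (4, 8, 0),   # fixed AR (hypothetical)
    psnr_si_of=lambda blk: 30.0,              # fixed PSNRSI (hypothetical)
    encode_block=lambda blk, l0, l1, m1: (l0, l1, m1),
)
```

In a real encoder, `assignment_rule` would be the AR selected in step 3 and `psnr_si_of` would use the sampling factor f chosen in step 1.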

in a GPCM subblock requires zero shifts when l1 + m1 = 8, one shift when l1 > 0 and m1 = 0, and two shifts in the rest of the cases.

Let us assume that the pixel values are accessed line by line in raster order and that w is the width of the image in pixels. Then, the encoding has a maximum latency of b(w + 1) pixels and requires at most b(w + 1) + 1 memory cells. When b is large, the memory usage and the encoding latency can be drastically reduced by encoding each block of a frame using the coding parameters that were computed for its colocated block in the previous frame. This is because, if b is large, most of the texture content of two colocated blocks will not vary significantly from frame to frame.

The decoding of a block has three stages: the reconstruction of the PCM subblock, its bilinear interpolation, and the decoding of the three GPCM subblocks. The decoding of x̂0 involves multiplying each codeword by 2^l0 Δ0 and adding 2^(l0−1) Δ0 to the result (i.e., b²/4 shifts and b²/4 additions). The bilinear interpolation of x̂0 requires 5b²/4 additions and 3b²/4 shifts. The decoding of each pixel of the GPCM subblocks has two steps: the decision (only if m1 > 0) and the reconstruction. The decision is equivalent to a uniform quantization which involves two shifts, one addition, and two comparisons (to clip the result of the division) per pixel of the GPCM subblocks. Finally, the MAP reconstruction of each pixel involves two comparisons. Table I shows the maximum average number of operations per pixel

that is necessary to encode and decode an image. The maximum decoding latency is 2w pixels, and 2w + 1 memory cells are required at most.

E. Complexity Scalability

The smaller the block size b, the higher the adaptation of GPCM to the local statistics of the image, and, consequently, the higher its coding efficiency. However, the smaller the block size b, the smaller the maximum value of the sampling factor (fmax) that can be used in the computation of MSESI, and, hence, the higher the computational complexity of the encoding. As a consequence, the block size b trades off coding efficiency and computational complexity. This scalability in complexity allows our algorithm to adapt to the computational resources that are available in each application. In the following, we describe how this adaptation can be made (Algorithm 2).

First, the value of the sampling factor f is determined by the computational complexity that is attainable by the application. For each value of f, there is a minimum block size bmin such that the assignments that are different to the optimum ones rarely occur if b ≥ bmin, and, hence, using this bmin provides both the highest spatial adaptation and accurate assignments in most cases. Appropriate values of b for each value of f are stored in a block-size table, which is used by the GPCM encoder to determine the value of b. To build a block-size table for our GPCM encoder, we experimentally encoded 50 images at different rates (R ∈ {1, 2, 3, 4, 5}) with different values of b and f. Then, for each value of f, we selected the minimum value of b that provides good encoding results (i.e., the minimum value of b such that the loss in quality with this f value is negligible with respect to f = 1 in most of the encodings). These values are shown in Table II.

Once the values of f and b are known, the algorithm selects an AR according to the values of R and b. Finally, for each block in the image, the algorithm first computes its PSNRSI, obtains the optimum (l0, l1, m1) from the AR, and encodes the block using the selected parameter values.


TABLE II

BLOCK-SIZE b FOR DIFFERENT VALUES OF f

f    1    2    4    8    16   32   64   256
b    4    8    16   32   64   128  256  512

III. EXPERIMENTAL RESULTS

In this section, we experimentally analyze the performance of GPCM and compare it with other image coding algorithms. We encoded six gray-scale images of 512 × 512 pixels and 8 b/pixel. Each image was encoded with GPCM using six different values of f (1, 2, 4, 8, 16, and 256) and their corresponding block sizes (4, 8, 16, 32, 64, and 512, respectively). Fig. 2 shows the PSNR of the six images as a function of the rate R and b. In the figure, the images are ordered according to their degree of spatial correlation (Zelda is the most correlated image and Baboon is the least correlated one). The results of encoding each image using PCM, JPEG, and JPEG-LS (near-lossless mode) [9] are also shown for comparison.

Note in Fig. 2 that the higher the computational complexity (or equivalently, the smaller f or b), the higher the coding efficiency of GPCM. Thus, the best coding efficiency is obtained when b = 4 and f = 1, while the worst coding efficiency is obtained when b = 512 and f = 256 (i.e., each image is encoded using a single block). At 1 and 2 b/pixel, the use of small values of b does not generally provide significant improvements. The reason is that most blocks are encoded using the same assignment at these rates. Thus, at 1 b/pixel, most of the blocks are encoded using the assignment (4, 8, 0) [only those blocks that have a very small PSNRSI are encoded using the assignment (7, 7, 0)]. In the rest of the rates, decreasing b improves the coding efficiency significantly (the greater the degree of spatial correlation, the greater the improvement).

Decision errors may occur in those pixels that belong to edges or textured regions (i.e., when the interpolation error may be large). Optimal parameter assignments provide decodings with a very small number of decision errors. Thus, the maximum percentage of decision errors among all the encodings of Fig. 2 was 0.42% (Peppers at 3 b/pixel with b = 4). Sometimes, an inaccurate assignment provides a value of m1 that is too large. As a consequence, a large number of decision errors occur in the decoding, which can significantly increase the distortion (see Zelda encoded at 6 b/pixel with b = 4 in Fig. 2). Nevertheless, the visibility of the distortion in those pixels that are affected by decision errors is reduced thanks to the limited sensitivity of the human visual system to amplitude errors in edges and high-activity regions.

The gain in coding efficiency of GPCM with respect to PCM is large at 1 and 2 b/pixel and generally decreases with R. When b = 512, this gain is zero above a certain rate (GPCM behaves like PCM) except for highly correlated images such as Zelda. Fig. 3 shows a portion of the Barbara image encoded at 2 and 3 b/pixel using PCM, and at 2.003 and 3.003 b/pixel using GPCM with b = 32. In the images encoded with PCM, luminance is poorly represented and there is false contouring [18], which is even more visible at 1 b/pixel. In the images encoded using GPCM, the degradation comes mostly from the undersampling and interpolation performed, which reduces the edge sharpness and introduces a staircase effect in slanted edges [see, for instance, the back of the chair and the right part of the scarf in image (b)]. False contouring is also visible in the GPCM decoded images in those blocks that are encoded using PCM [see, for instance, the left part of the chin in images (b) and (d)].

Fig. 3. Barbara image encoded using PCM and GPCM (with b = 32). (a) PCM (2 b/pixel). (b) GPCM (2.003 b/pixel, b = 32). (c) PCM (3 b/pixel). (d) GPCM (3.003 b/pixel, b = 32).

Note in Fig. 2 that JPEG and JPEG-LS in near-lossless mode perform much better than GPCM at all rates. However, this better performance comes at the expense of a much higher computational complexity. In particular, the DCT stage of JPEG alone requires more operations than the whole GPCM encoding for any value of b (the Feig–Winograd fast-DCT algorithm requires 462 additions, 54 products, and 6 divisions for each 8×8 block, which involves 8.15 iopp on average [19]). With respect to JPEG-LS, in order to decide the encoding mode (regular or run) for each pixel, the JPEG-LS encoder must compute three gradients gi (i = 1, 2, 3) and check that |gi| ≤ δ (i.e., in each pixel, three subtractions, three absolute values, and three comparisons must be computed) [9]. Therefore, just the decision of the coding mode in JPEG-LS requires more operations than the entire GPCM encoding, even for b = 4.
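The operation counts cited here are easy to verify, and the JPEG-LS mode decision can be sketched from the causal template of LOCO-I [9]. A minimal illustration (function and variable names are ours, not from the standard's reference code):

```python
# Per-pixel cost of the Feig-Winograd fast DCT [19]:
# 462 additions + 54 products + 6 divisions per 8x8 block.
dct_ops_per_block = 462 + 54 + 6
iopp = dct_ops_per_block / 64  # 522 / 64 ~= 8.15 operations per pixel

def jpegls_run_mode(a, b, c, d, near=0):
    """Sketch of the JPEG-LS mode decision: run mode is chosen when the
    three local gradients of the causal neighbors (a = left, b = above,
    c = above-left, d = above-right) all fall within the near-lossless
    threshold `near` -- i.e., three subtractions, three absolute values,
    and three comparisons per pixel."""
    g1, g2, g3 = d - b, b - c, c - a
    return abs(g1) <= near and abs(g2) <= near and abs(g3) <= near
```

A flat neighborhood triggers run mode, while any gradient above the threshold keeps the encoder in regular mode; either way, the nine operations above are spent on every pixel before any actual coding happens.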

Recently, very simple DPCM-based image coders have been proposed for applications with severe constraints on energy and/or computational resources [7], [8]. These algorithms perform better than PCM and GPCM thanks to the use of prediction and entropy coding. These features, however, make the compressed bitstream highly sensitive to transmission/storage errors and complicate random access. Moreover, even the simplest DPCM encoders (i.e., those that use low-order prediction with fixed coefficients, scalar uniform quantization, and simple entropy coding) require more computing operations than the GPCM encoder, even when b = 4. Additionally, these DPCM coders are not scalable in complexity and, in some cases, part of their simplicity is due to the specific properties of the video in the targeted applications [7], [8]. One of the advantages of DPCM image coders (and JPEG-LS) is their small memory requirements and low coding latency. When b is large, GPCM encoders can obtain these features by using in each block the coding parameters of the colocated block in the previous frame (see Section II-D).
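For context, the simplest members of this class of coders can be sketched as follows. This is our own minimal illustration of low-order, fixed-coefficient DPCM with uniform quantization of the residual (entropy coding omitted), not the algorithms of [7], [8]:

```python
def dpcm_encode(pixels, step=4):
    """DPCM with a fixed previous-pixel predictor and uniform
    quantization of the prediction residual."""
    recon_prev = pixels[0]        # first sample is sent as-is
    codes = [pixels[0]]
    for x in pixels[1:]:
        residual = x - recon_prev
        q = round(residual / step)  # quantized prediction error
        codes.append(q)
        recon_prev += q * step      # track the decoder's reconstruction
    return codes

def dpcm_decode(codes, step=4):
    recon = [codes[0]]
    for q in codes[1:]:
        recon.append(recon[-1] + q * step)
    return recon
```

Even this stripped-down loop needs a subtraction, a scaled rounding, and a reconstruction update per pixel before entropy coding, which is consistent with the claim that such encoders cost more than GPCM at b = 4; note also that the predictor makes every codeword depend on its predecessors, which is what breaks random access and error resilience.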

IV. CONCLUSION

We have presented a GPCM coding algorithm for images which, after properly assigning values to the coding parameters, simply discards bits from each codeword of the PCM bitstream. Our algorithm is computationally very simple, is robust to transmission or storage errors, facilitates random access, and is scalable in complexity. These features make GPCM coding useful in those applications that need to reduce the bitrate of raw video in an extremely simple way. Experimental results show that GPCM provides improvements with respect to PCM. The magnitude of these improvements depends on the rate (the smaller the rate, the larger the gain), the encoding complexity (the greater the complexity, the greater the gain), and the degree of inter-pixel correlation (the greater the correlation, the greater the gain). Standard coding algorithms such as JPEG and JPEG-LS perform much better than GPCM at the expense of a significantly larger computational complexity.

ACKNOWLEDGMENT

The authors would like to thank S.-G. Park and A. Roca for helpful discussions.

REFERENCES

[1] N. S. Jayant and P. Noll, Digital Coding of Waveforms. Englewood Cliffs, NJ: Prentice-Hall, 1984.

[2] Z. Xiong, A. Liveris, and S. Cheng, “Distributed source coding for sensor networks,” IEEE Signal Process. Mag., vol. 21, no. 5, pp. 80–94, Sep. 2004.

[3] H. Shum and T. Komura, “Tracking the translational and rotational movement of the ball using high-speed camera movies,” in Proc. IEEE Int. Conf. Image Process., Genoa, Italy, Sep. 2005, pp. 1084–1087.

[4] J. Sekikawa and T. Kubono, “Spectroscopic imaging observation of break arcs using a high-speed camera,” in Proc. IEEE Conf. Electr. Contacts, Sep. 2007, pp. 275–279.

[5] P. Gemeiner, W. Ponweiser, P. Einramhof, and M. Vincze, “Real-time SLAM with high-speed CMOS camera,” in Proc. IEEE Int. Conf. Image Anal. Process., Sep. 2007, pp. 297–302.

[6] M. El-Desouki, M. J. Deen, Q. Fang, L. Liu, F. Tse, and D. Armstrong, “CMOS image sensors for high speed applications,” Sensors, vol. 9, no. 1, pp. 430–444, 2009.

[7] T. H. Khan and K. A. Wahid, “Low power and low complexity compressor for video capsule endoscopy,” IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 10, pp. 1534–1546, Oct. 2011.

[8] A. N. Kim, T. A. Ramstad, and I. Balasingham, “Very low complexity low rate image coding for the wireless endoscope,” in Proc. Int. Symp. Appl. Sci. Biomed. Commun. Technol., Barcelona, Spain, Oct. 2011, pp. 90-1–90-5.

[9] M. J. Weinberger, G. Seroussi, and G. Sapiro, “The LOCO-I lossless image compression algorithm: Principles and standardization into JPEG-LS,” IEEE Trans. Image Process., vol. 9, no. 8, pp. 1309–1324, Aug. 2000.

[10] J. Prades Nebot, A. Roca, and E. Delp, “Modulo-PCM based encoding for high speed video cameras,” in Proc. IEEE Int. Conf. Image Process., San Diego, CA, Oct. 2008, pp. 153–156.

[11] J. Prades Nebot, “Very low-complexity coding of images using adaptive Modulo-PCM,” in Proc. IEEE Int. Conf. Image Process., Brussels, Belgium, Sep. 2011, pp. 313–316.

[12] N.-M. Cheung, H. Wang, and A. Ortega, “Sampling-based correlation estimation for distributed source coding under rate and complexity constraints,” IEEE Trans. Image Process., vol. 17, no. 11, pp. 2122–2137, Nov. 2008.

[13] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.

[14] G. Wolberg, Digital Image Warping. New York: Wiley, 1990.

[15] B. Zeng and A. Venetsanopoulos, “A JPEG-based interpolative image coding scheme,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Minneapolis, MN, Apr. 1993, pp. 393–396.

[16] M. Morbee, “Optimized information processing in resource-constrained vision systems: From low-complexity coding to smart sensor networks,” Ph.D. dissertation, Faculty Eng. Archit., Ghent Univ., Ghent, Belgium, Mar. 2011.

[17] E. Cho, M. J. Cho, and J. Eltinge, “The variance of sample variance from a finite population,” Int. J. Pure Appl. Math., vol. 21, pp. 387–394, May 2005.

[18] A. K. Jain, Fundamentals of Digital Image Processing. Englewood Cliffs, NJ: Prentice-Hall, 1989.

[19] E. Feig and S. Winograd, “Fast algorithms for the discrete cosine transform,” IEEE Trans. Signal Process., vol. 40, no. 9, pp. 2174–2193, Sep. 1992.

Weighted Similarity-Invariant Linear Algorithm for Camera Calibration With Rotating 1-D Objects

Kunfeng Shi, Qiulei Dong, and Fuchao Wu

Abstract— In this paper, a weighted similarity-invariant linear algorithm for camera calibration with rotating 1-D objects is proposed. First, we propose a new estimation method for computing the relative depth of the free endpoint on the 1-D object and prove its robustness against noise compared with those used in previous literature. The introduced estimator is invariant to image similarity transforms, resulting in a similarity-invariant linear calibration algorithm which is slightly more accurate than the well-known normalized linear algorithm. Then, we use the reciprocals of the standard deviations of the estimated relative depths from different images as the weights on the constraint equations of the similarity-invariant linear calibration algorithm, and propose a weighted similarity-invariant linear calibration algorithm with higher accuracy. Experimental results on synthetic data as well as on real image data show the effectiveness of our proposed algorithm.

Index Terms— 1-D calibration object, camera calibration, weighted similarity-invariant linear algorithm (WSILA).

I. INTRODUCTION

One-dimensional calibration techniques are an important class of techniques that generally use a stick consisting of at least three points to calibrate camera parameters [1]–[11]. Since the 1-D object can be observed without occlusion by all the cameras involved simultaneously, 1-D calibration techniques are more applicable in multicamera systems [12], [13] than 2-D and 3-D calibration techniques [14]–[16].

In recent years, 1-D calibration techniques have been studied extensively. Zhang [1] was the first to propose a linear calibration algorithm using a 1-D object that rotated around a fixed point. Hammarstedt et al. [3] investigated in detail the degenerate configurations of [1]. Wu et al. [2] reformulated the calibration equation of [1] in a geometric method and obtained a linear algorithm with similar accuracy. França et al. [6] proposed a linear algorithm with normalized image points, which significantly improved the calibration accuracy of [1]. To avoid possible nonpositive-definite estimations of the image of the absolute conic (IAC) in the linear calibration algorithms, Wang et al. [7] minimized the norm of the algebraic residuals subject to the constraint that the solution was positive definite. Miyagawa et al. [8] calibrated the camera's focal length from a single image of two orthogonal 1-D objects that shared one point. In addition, when multiple cameras need to be globally calibrated, single-camera calibration algorithms can be extended by combining the intrinsic and extrinsic parameters of all the cameras. Moreover, Wang et al. [9] showed that multiple cameras can be simultaneously calibrated with a 1-D object that underwent general rigid motions.

Manuscript received August 26, 2011; revised January 18, 2012; accepted March 26, 2012. Date of publication April 17, 2012; date of current version July 18, 2012. This work was supported in part by the National Natural Science Foundation of China under Grant 60835003 and Grant 60905020. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Adrian G. Bors.

The authors are with the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China (e-mail: [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2012.2195013

1057–7149/$31.00 © 2012 IEEE