Visually Lossless Encoding for JPEG2000

Han Oh, Ali Bilgin, Senior Member, IEEE, and Michael W. Marcellin, Fellow, IEEE

Abstract—Due to exponential growth in image sizes, visually lossless coding is increasingly considered as an alternative to numerically lossless coding, which has limited compression ratios. This paper presents a method of encoding color images in a visually lossless manner using JPEG2000. In order to hide coding artifacts caused by quantization, visibility thresholds (VTs) are measured and used for quantization of subbands in JPEG2000. The VTs are experimentally determined from statistically modeled quantization distortion, which is based on the distribution of wavelet coefficients and the dead-zone quantizer of JPEG2000. The resulting VTs are adjusted for locally changing backgrounds through a visual masking model, and then used to determine the minimum number of coding passes to be included in the final codestream for visually lossless quality under the desired viewing conditions. Codestreams produced by this scheme are fully JPEG2000 Part-I compliant.

Index Terms—human visual system, contrast sensitivity function, visibility threshold, visually lossless coding, JPEG2000

I. INTRODUCTION

THE contrast sensitivity function (CSF), which models the varying sensitivity of the human eye as a function of spatial frequency, has been widely exploited in many applications such as image/video quality assessment, perceptual image/video compression, and watermarking. The CSF shows some variation according to the age and visual acuity of the subject, the stimulus used in the experiment, and the viewing conditions [1], [2]. However, it is generally known that the sensitivity has a maximum at 2-6 cycles/degree and is reduced at both high and very low frequencies. This is mainly due to the frequency-selective responses of the neurons in the primary visual cortex (also known as V1) and to low-pass optical filtering [3].

The sensitivity profile can be measured through psychophysical experiments by finding the just-noticeable points (i.e., visibility thresholds) of a stimulus in a pre-defined contrast unit. A simple way of modeling contrast sensitivity is to use a sinusoidal grating such as the Campbell-Robson chart [4], while a more sophisticated two-dimensional CSF model can be obtained using stimuli that have specific ranges of frequency and orientation. Such stimuli are typically generated by transforms that simulate the V1 receptive field of the human visual system (HVS). The frequency bandwidth to which a V1 neuron responds is known to be proportional to its center frequency, and its orientation bandwidth is distributed broadly around 40°. The V1 receptive fields are well described by

H. Oh and M. W. Marcellin are with the Department of Electrical and Computer Engineering, The University of Arizona, Tucson, AZ 85721 USA (e-mail: [email protected]; [email protected]).

A. Bilgin is with the Department of Biomedical Engineering and the Department of Electrical and Computer Engineering, The University of Arizona, Tucson, AZ 85721 USA (e-mail: [email protected]).

the Gabor filter. The Gabor filter is ideal for space-frequency localization; however, it is difficult to compute. Alternatively, Watson proposed the cortex transform, which is invertible, easy to implement, and able to precisely model V1 receptive fields with adjustable parameter values [5].

Over-complete transforms such as the Gabor filter or the cortex transform are good tools for modeling the HVS, but they are not appropriate for image/video compression, since they increase the number of coefficients to be encoded [6]. As a result, in perceptual image/video compression, complete (critically-sampled) transforms such as the discrete cosine transform (DCT) or the discrete wavelet transform (DWT), which are already embedded in the codec, are typically used for both modeling and coding.

JPEG2000 employs the separable DWT to decompose an image into several subbands, each having a different spatial frequency and orientation [7]. As a decomposition method for simulating the HVS, the DWT has many desirable properties: linearity, invertibility, logarithmically spaced spatial frequencies, and four orientations of 0°, 90°, 45°, and 135°. However, two of the orientations, 45° and 135°, overlap in the HH subband, and the transform is shift-variant. Despite these shortcomings, the separable DWT is widely used in various vision models [8]–[11], as well as in JPEG2000.

In JPEG2000, a color image with three color components (red, green, and blue) is typically transformed to an image with one luminance component (Y) and two chrominance components (Cb and Cr) for improved compression performance. Each component is then independently transformed to the wavelet domain by the DWT, and compression is achieved via bit-plane coding of wavelet coefficients quantized by a dead-zone quantizer. Coding artifacts in JPEG2000 are caused by errors in the reconstructed wavelet coefficients, induced by quantization. These coding artifacts are perceived differently depending on the sensitivity of the HVS to the subband where the quantization distortion occurs.

In [12], visual weighting factors for wavelet subbands were derived from contrast sensitivity curves obtained from sinusoidal gratings. These weighting factors are included in JPEG2000 Part I as examples that can be used to aid in perceptual coding of images. The Kakadu implementation of JPEG2000 [13] can (optionally) use weighting factors to perform rate allocation in its post-compression rate-distortion optimization process. A visual masking process is described in [7] that can be implemented in a JPEG2000 Part I compatible manner, while JPEG2000 Part II includes visual masking tools based on the work reported in [14]. These tools have all proven useful for improving the visual quality of an image compressed at a given bitrate, but they do not control the quality directly.

Work on direct quality control in wavelet image coding includes the work of Watson et al., where visibility thresholds (VTs) were measured for individual wavelet subbands using randomly generated uniform noise as a surrogate for quantization distortion [15]. The resulting VTs have since been employed in wavelet-based codecs (e.g., [16]). These codecs produce superior results when compared to codecs based on conventional MSE/PSNR metrics. However, as shown below, uniform noise does not well model the distortion produced by the dead-zone quantizer of JPEG2000. Thus, a direct use of these VTs in JPEG2000 may fail to produce a truly visually lossless encoded image. Indeed, rather than "visually lossless coding," the work of [16] refers to "almost transparent coding." For example, a mean opinion score of about 4.6 (out of 5) is reported for the Goldhill image in that work. Other efforts to measure VTs [17]–[19] have quantized actual wavelet coefficients obtained from natural images, rather than the synthetic distortions used by Watson. These natural VTs may be more accurate because they take into account spatial correlations between wavelet coefficients. However, these VTs were developed using a uniform quantizer. This paper shows below that, for use in JPEG2000, a distortion model based on a dead-zone quantizer yields superior results.

Other approaches to controlling JPEG2000 image quality directly (rather than controlling bit-rate) exist in the literature. For example, the use of post-compression rate-distortion optimization to achieve a target MSE or PSNR is discussed in [7] and in [20]. Predicting MSE/PSNR prior to compression is discussed in [21]. In that work, predictions of MSE are based on target compression ratios and image activity measures which require significantly lower complexity than compression itself. This prediction could be used to choose a compression ratio (a priori) to achieve a desired MSE/PSNR. However, for the purpose of visually lossless compression, methods that target a given MSE/PSNR will not generally be optimal. This follows from the fact that the MSE/PSNR needed to achieve visually lossless coding is not known, and varies from image to image.

Building on our previous work [22]–[24], this paper proposes a visually lossless JPEG2000 encoder for color images. In the YCbCr color space, VTs are measured for a realistic quantization distortion model, which takes into account the statistical characteristics of the wavelet coefficients as well as the dead-zone quantization of JPEG2000. The measured VTs are then adjusted to account for the visual masking effects of the underlying background image. The proposed coding scheme is implemented without violating the JPEG2000 Part I standard and automatically produces visually lossless imagery at lower bitrates than other visually lossless schemes from the literature.

This paper is organized as follows. The quantization distortions that occur in JPEG2000 are modeled in Section II. Section III describes a psychophysical experiment that employs the quantization model to determine VTs. Section IV describes a model for visual masking effects. The proposed visually lossless coding method and results are presented in Section V. Section VI summarizes the work.

II. QUANTIZATION DISTORTION MODELING

Wavelet coefficients are often modeled by a generalized Gaussian distribution [25] with probability density function (PDF)

f(y) = \frac{\alpha\,A(\alpha,\sigma)}{2\,\Gamma(1/\alpha)} \exp\left( -\left[ A(\alpha,\sigma)\,|y-\mu| \right]^{\alpha} \right)    (1)

where

A(\alpha,\sigma) = \sigma^{-1} \left( \frac{\Gamma(3/\alpha)}{\Gamma(1/\alpha)} \right)^{1/2}

and Γ(·) is the Gamma function. The parameters µ and σ are the mean and standard deviation, respectively, and α is called the shape parameter. Wavelet coefficients in the HL, LH, and HH subbands, whose distributions are high-kurtosis, heavy-tailed, and symmetric, are well modeled by the Laplacian distribution with µ = 0 and α = 1, as shown in Fig. 1(a) for a variance of σ² = 50.

Fig. 1. Probability density functions of: (a) wavelet coefficients in the HL, LH, and HH subbands (σ² = 50); (b) quantization distortions in the HL, LH, and HH subbands (σ² = 50, Δ = 5); and (c) quantization distortions in the LL subband (σ² = 2000, Δ = 5). Dashed lines represent the commonly assumed uniform distribution.
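For concreteness, (1) can be evaluated with a few lines of NumPy/SciPy. The sketch below is illustrative (the function name is ours, not part of the paper) and reproduces the Laplacian (α = 1) and Gaussian (α = 2) special cases used in this section.

    import numpy as np
    from scipy.special import gamma

    def ggd_pdf(y, sigma, alpha, mu=0.0):
        # Generalized Gaussian PDF of (1); alpha = 1: Laplacian, alpha = 2: Gaussian
        A = (1.0 / sigma) * np.sqrt(gamma(3.0 / alpha) / gamma(1.0 / alpha))
        return alpha * A / (2.0 * gamma(1.0 / alpha)) * np.exp(-(A * np.abs(y - mu)) ** alpha)

    y = np.linspace(-6.0, 6.0, 7)
    print(ggd_pdf(y, sigma=np.sqrt(50.0), alpha=1.0))  # Laplacian of Fig. 1(a)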

JPEG2000 quantizes these wavelet coefficients using the following dead-zone quantizer:

q = Q(y) = \mathrm{sign}(y) \left\lfloor \frac{|y|}{\Delta} \right\rfloor.    (2)

Here, q is the quantization index, which is subsequently encoded using embedded bit-plane coding. The dequantization procedure in the decoder is expressed by

\hat{y} = Q^{-1}(q) = \begin{cases} 0 & q = 0 \\ \mathrm{sign}(q)\,(|q| + \delta)\,\Delta & q \neq 0 \end{cases}    (3)

where δ = 1/2 corresponds to mid-point reconstruction [26].

The resulting quantization distortions in the HL, LH, and HH subbands are not uniformly distributed over the interval (−Δ/2, Δ/2), as is commonly assumed. A more appropriate model for the quantization distortions produced by the dead-zone quantizer and mid-point reconstruction is given by the PDF

f(d) = \begin{cases} \frac{1}{\sqrt{2}\,\sigma} e^{-\sqrt{2}|d|/\sigma} + \frac{1-p_1}{\Delta} & 0 \le |d| \le \Delta/2 \\ \frac{1}{\sqrt{2}\,\sigma} e^{-\sqrt{2}|d|/\sigma} & \Delta/2 < |d| \le \Delta \\ 0 & \text{otherwise} \end{cases}    (4)

where p_1 = \int_{-\Delta}^{\Delta} \frac{1}{\sqrt{2}\,\sigma} e^{-\sqrt{2}|y|/\sigma}\,dy = 1 - e^{-\sqrt{2}\Delta/\sigma}.

The second term of the first line in (4) follows from assuming that the quantization distortion is uniform only for wavelet coefficients whose magnitudes are larger than Δ (i.e., coefficients not in the dead-zone). The first term (in the first two lines of (4)) follows from the observation that the quantization errors of the remaining wavelet coefficients (i.e., those in the dead-zone interval (−Δ, Δ)) are equal to the coefficients themselves, since the dead-zone quantizer maps these coefficients to zero. Fig. 1(b) shows the model of (4) corresponding to Δ = 5 for the coefficient distribution shown in Fig. 1(a).
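The quantizer (2), the dequantizer (3), and the distortion model (4) are straightforward to verify numerically. The following sketch (illustrative; helper names are ours) quantizes a synthetic Laplacian subband and compares the empirical probability of "large" distortions, |d| > Δ/2, with the closed form derived later in Section III-A.

    import numpy as np

    def deadzone_quantize(y, delta):
        # Dead-zone quantizer of (2)
        return np.sign(y) * np.floor(np.abs(y) / delta)

    def dequantize(q, delta, recon=0.5):
        # Dequantizer of (3); recon = 1/2 gives mid-point reconstruction
        return np.where(q == 0, 0.0, np.sign(q) * (np.abs(q) + recon) * delta)

    rng = np.random.default_rng(0)
    sigma, delta = np.sqrt(50.0), 5.0                        # matches Fig. 1(b)
    y = rng.laplace(scale=sigma / np.sqrt(2.0), size=10**6)  # Laplacian, std sigma
    d = y - dequantize(deadzone_quantize(y, delta), delta)
    empirical = np.mean(np.abs(d) > delta / 2.0)
    model = np.exp(-np.sqrt(2.0) * delta / (2.0 * sigma)) - np.exp(-np.sqrt(2.0) * delta / sigma)
    print(empirical, model)                                  # both near 0.239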

Wavelet coefficients in the LL subband are often modeled by the Gaussian distribution, i.e., µ = 0 and α = 2 in (1). Assuming the standard deviation of the LL subband is large compared to the quantization step size, the quantization distortion of the LL subband is modeled by

f(d) = \begin{cases} \frac{1}{\sqrt{12}\,\sigma} + \frac{1-p_2}{\Delta} & 0 \le |d| \le \Delta/2 \\ \frac{1}{\sqrt{12}\,\sigma} & \Delta/2 < |d| \le \Delta \\ 0 & \text{otherwise} \end{cases}    (5)

where p_2 = \Delta / (\sqrt{3}\,\sigma). This model is shown in Fig. 1(c) for Δ = 5 and σ² = 2000.

For comparison, Fig. 1(b) and (c) also show the commonly used uniform distortion model. It is worth noting that the uniform model (incorrectly) implies that quantization errors are bounded by ±Δ/2, while the model of (4) correctly indicates that errors can be as large as ±Δ. It is also important to note that the uniform model depends only on the parameter Δ, while the model of (4) depends on Δ as well as on the coefficient variance σ².

III. VISIBILITY THRESHOLDS FOR QUANTIZATION DISTORTIONS

For irreversible compression in Part I of the JPEG2000 standard, a color image having RGB components is first converted into an image with one luminance component (x_Y) and two chrominance components (x_Cb and x_Cr) by the irreversible color transform (ICT) [7]. Each component is then transformed using the dyadic Cohen-Daubechies-Feauveau (CDF) 9/7 DWT [27] and quantized independently. In what follows, a normalized CDF 9/7 DWT designed for efficient implementation of JPEG2000 is assumed. This transform preserves the nominal range of input values, since the nominal gains of the analysis kernels are one (i.e., h_L^{dc} = \sum_{n=-4}^{4} h_L[n] = 1 and h_H^{nyq} = \sum_{n=-3}^{3} (-1)^n h_H[n] = 1). The normalization used in [15] produces wavelet coefficients 2^k times larger for level k.

Fig. 2. (a) Example quantization distortion in (HH,2) and (b) the corresponding representation in the image domain. The reader is invited to zoom in if the stimulus is not visible in (b).

A K-level dyadic wavelet decomposition has 3K + 1 subbands. Since K = 5 is usually sufficient to obtain near-optimal compression performance [7], VTs are estimated for 16 subbands in each of the three color components in this work. The center spatial frequency f_k of level k is defined as

f_k = r \cdot 2^{-k} = d\,v \tan\left(\frac{\pi}{180}\right) \cdot 2^{-k}    (6)

where r is the visual resolution in pixels/degree, d is the display resolution in pixels/cm, and v is the viewing distance in cm [15].
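A short sketch of (6) (illustrative; helper names are ours) reproduces the visual resolution reported below for the experimental setup:

    import math

    def pixels_per_degree(dot_pitch_mm, viewing_cm):
        # r = d * v * tan(pi/180), with d in pixels/cm and v in cm
        d = 10.0 / dot_pitch_mm
        return d * viewing_cm * math.tan(math.pi / 180.0)

    def center_frequency(k, r):
        # f_k = r * 2**(-k), in cycles/degree
        return r * 2.0 ** (-k)

    r = pixels_per_degree(0.294, 60.0)   # Dell 1901FP dot pitch, 60 cm viewing distance
    print(round(r, 2), [round(center_frequency(k, r), 2) for k in range(1, 6)])
    # 35.62 and [17.81, 8.91, 4.45, 2.23, 1.11]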

A stimulus is an RGB image obtained by applying the inverse wavelet transform and the inverse ICT to wavelet data containing quantization distortions. In this work, stimulus generation begins with wavelet subband data corresponding to a 512 × 512 color image. These initial data have all coefficients of all subbands (luminance and chrominance) set to 0. One subband of one component is then selected. Quantization distortion is randomly generated for this subband of interest according to either (4) or (5), depending on the subband. This distortion is added to a region of the subband. An example is shown in Fig. 2(a) for a subband in the luminance component. In this figure, 0 is represented by mid-gray, while positive and negative values are lighter and darker, respectively. The inverse DWT of each component is followed by the inverse ICT. The distortion depicted in Fig. 2(a) results in the example stimulus of Fig. 2(b). In the wavelet domain, the size of the region to which distortion is added is N × N with N = min{64, 512 · 2^{-k}}, which corresponds to a JPEG2000 codeblock of nominal size 64 × 64. This covers a supporting region, in the image domain, of nominal size M × M with M = N · 2^k, as computed in the sketch below.
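    def stimulus_region(k, image_size=512, codeblock=64):
        # N = min{64, 512 * 2**-k} in the wavelet domain; image support M = N * 2**k
        N = min(codeblock, image_size >> k)
        return N, N << k

    print([stimulus_region(k) for k in range(1, 6)])
    # [(64, 128), (64, 256), (64, 512), (32, 512), (16, 512)]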

To measure the visibility threshold for a given subband with assumed subband variance σ_b², a two-alternative forced-choice (2AFC) method is used. In this method, an image that contains a stimulus and an image that does not are displayed sequentially (in random order), and a human subject is asked to decide which image contains the stimulus. The display time for each image is 2 seconds, with an interval of 2 seconds between subsequent images. The subject is then given an unlimited amount of time to select which image contains the stimulus. The experiment is iterated while varying Δ in order to find the largest value of Δ for which the stimulus remains invisible. Specifically, 32 iterations of the QUEST staircase procedure in the Psychophysics Toolbox [28] are used to determine the value of Δ corresponding to the 82% correct point of a fitted Weibull function [15]. The obtained value of Δ is then the VT of the subband for the assumed coefficient variance σ_b². Different values of σ_b² generally result in different VTs, as discussed in the next subsection.

The experimental environment was arranged similarly to a typical office environment. Stimuli were displayed on an LCD monitor in ambient light. Two LCD monitors, a Dell 1901FP and a Dell U2410, were used in this threshold experiment. The Dell 1901FP 19-in LCD monitor has a dot pitch of 0.294 mm, a resolution of 1280 × 1024, an image brightness of 250 cd/m², and a contrast ratio of 800:1. The Dell U2410 24-in LCD monitor has an In-Plane Switching (IPS) panel, a dot pitch of 0.27 mm, a resolution of 1920 × 1200, an image brightness of 400 cd/m², and a contrast ratio of 1000:1. Each monitor was connected to the PC through a Digital Visual Interface (DVI) cable. The reasons for using LCDs, unlike previous studies, are 1) the MTF (Modulation Transfer Function) of the rendering device is an important factor in determining thresholds, and LCDs have recently become the most common display device; and 2) a previous study revealed that stimuli are more detectable on LCDs than on CRTs [29]. The viewing distance was 60 cm (23.6 inches), with a resulting visual resolution of 35.62 pixels/degree and 38.72 pixels/degree for the Dell 1901FP and Dell U2410, respectively. This viewing distance was selected because it has been commonly assumed for typical viewing conditions in previous studies [15], [16], [30]. Based on this selection, the goal of this study is to achieve visually lossless performance under these conditions. However, the proposed methodology is general and can be adapted to yield visually lossless compression under other desired conditions.

Fig. 3. Measured visibility thresholds as a function of transform level (σ_b² = 50). Threshold values in the HH subband are larger than those in the HL and LH subbands at the same level; thresholds in HL and LH are very similar. The measured values are connected by straight lines solely to aid in viewing trends.

A. Visibility Thresholds for the Luminance Component

Fig. 3 shows the visibility thresholds (i.e., the maximum quantization step sizes at which quantization distortions remain invisible) obtained for 5 levels of the HL, LH, and HH subbands. The thresholds of Fig. 3 correspond to the case where σ_b² is fixed at 50. The x-axis and y-axis indicate the transform level and the obtained threshold, respectively. The points shown in Fig. 3 are the values determined by the QUEST procedure for individual subbands ((HH,1), (HH,2), and so on). Vertical bars indicate ±1 standard deviation. Two subjects, denoted by A and B, conducted the experiments using the Dell 1901FP monitor. Subject A also conducted the experiments using the Dell U2410 monitor. Both subjects are familiar with wavelet quantization distortion and have normal visual acuity. The results in Fig. 3 are labeled by the monitor and observer to which they correspond (e.g., U2410 A).

As can be seen in Fig. 3, all three monitor/observer combinations produced similar results. In most cases, intersecting the error intervals for the three measurements yields a non-empty set. To ensure a conservative design, the smallest of the three measured values is used as the visibility threshold for each subband in all subsequent discussions. As expected, threshold values in the HH subband are larger than those in the HL and LH subbands at the same level; this is known as the oblique effect [31]. Also, changes in threshold values as a function of spatial frequency are more pronounced in the HH subband than in the other subbands. Thresholds in the HL and LH subbands are very similar. Hereafter, the HL and LH subbands are regarded as equivalent, and one set of visibility thresholds is reported and used for both.

Changing the assumed variance of the wavelet coefficients can have a dramatic effect on the visibility thresholds. For example, Fig. 4 shows the measured thresholds as a function of σ_b² for the (HL/LH,3) subband. The four data points denoted by asterisks represent thresholds measured for four different values of σ_b². The error bars represent ±1 standard deviation. From Fig. 4, it can be seen that the value of the threshold increases as the coefficient variance increases. This trend can be explained by a simple analysis of the quantization error.

Integrating the second line of (4), the probability of "large" distortions, which lie in (−Δ, −Δ/2) ∪ (Δ/2, Δ), is

e^{-\sqrt{2}\Delta/(2\sigma_b)} - e^{-\sqrt{2}\Delta/\sigma_b}

for the HL, LH, and HH subbands. These large distortions are caused by the dead-zone quantizer and tend to be more visible than the smaller distortions in [−Δ/2, Δ/2]. For a fixed value of Δ, the probability of large distortions decreases as the variance σ_b² increases, provided that the step size Δ is sufficiently small (i.e., Δ < √2 σ_b ln 2). Furthermore, the variance of the quantization distortion generated according to (4) is

P_y = \sigma_b^2 - e^{-\sqrt{2}\Delta/\sigma_b} \left( \frac{11}{12}\Delta^2 + \sigma_b^2 + \sqrt{2}\,\sigma_b \Delta \right).

This in turn yields a signal of variance G_b · P_y in the stimulus image, where G_b is the synthesis gain of the 9/7 DWT for subband b [7]. Fig. 5 shows P_y as a function of the coefficient variance σ_b² for a fixed quantization step size. The variance of the quantization distortion decreases as the coefficient variance increases, except for very small variances. This implies that the lower-variance distortion generated by a larger coefficient variance may elevate the threshold value.

Fig. 4. Threshold values and linear model for (HL/LH,3) as a function of coefficient variance σ_b².

Fig. 5. Average power of the quantization distortion as a function of coefficient variance σ_b² for Δ = 0.55.
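Both quantities are easy to tabulate; a minimal sketch (ours) of the large-distortion probability and of P_y:

    import numpy as np

    def prob_large_distortion(sigma_b, delta):
        # P(Delta/2 < |d| <= Delta), from integrating the second line of (4)
        x = np.sqrt(2.0) * delta / sigma_b
        return np.exp(-x / 2.0) - np.exp(-x)

    def distortion_power(sigma_b, delta):
        # P_y for a Laplacian subband quantized with dead-zone step delta
        x = np.sqrt(2.0) * delta / sigma_b
        return sigma_b**2 - np.exp(-x) * (11.0 / 12.0 * delta**2 + sigma_b**2
                                          + np.sqrt(2.0) * sigma_b * delta)

    sig = np.sqrt(np.array([10.0, 50.0, 100.0, 200.0]))
    print(distortion_power(sig, 0.55))   # decreasing with variance, as in Fig. 5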

The form of P_y and the measured thresholds of Fig. 4 suggest a complicated relationship between the thresholds and σ_b². However, as shown in Fig. 4, a simple linear function works well enough when error bars are considered. In this manner, the threshold t_b for a given subband b is determined by

t_b = a_b \cdot \sigma_b^2 + r_b    (7)

where σ_b² is the variance of the coefficients in subband b. The linear parameters a_b and r_b are summarized in Table I, and are obtained by least-squares fitting of the threshold values measured for different assumed values of σ_b² for each subband.

TABLE I
LINEAR PARAMETERS a_b AND r_b

subband     a_b             r_b     subband     a_b            r_b
HH,1        105.67 × 10⁻⁴   4.85    HH,4        10.16 × 10⁻⁴   0.47
HL/LH,1      46.03 × 10⁻⁴   1.98    HL/LH,4      7.75 × 10⁻⁴   0.36
HH,2         19.94 × 10⁻⁴   0.92    HH,5         7.91 × 10⁻⁴   0.36
HL/LH,2      13.84 × 10⁻⁴   0.64    HL/LH,5      7.16 × 10⁻⁴   0.33
HH,3         11.04 × 10⁻⁴   0.51
HL/LH,3      10.83 × 10⁻⁴   0.50
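Table I turns (7) into a simple lookup. A sketch (ours; the fixed (LL,5) value of 0.63 is taken from the end of this subsection):

    # (a_b, r_b) pairs from Table I, indexed by (orientation, level)
    TABLE_I = {
        ("HH", 1): (105.67e-4, 4.85), ("HL/LH", 1): (46.03e-4, 1.98),
        ("HH", 2): (19.94e-4, 0.92),  ("HL/LH", 2): (13.84e-4, 0.64),
        ("HH", 3): (11.04e-4, 0.51),  ("HL/LH", 3): (10.83e-4, 0.50),
        ("HH", 4): (10.16e-4, 0.47),  ("HL/LH", 4): (7.75e-4, 0.36),
        ("HH", 5): (7.91e-4, 0.36),   ("HL/LH", 5): (7.16e-4, 0.33),
    }

    def base_threshold(orientation, level, variance):
        # t_b = a_b * sigma_b**2 + r_b of (7); (LL,5) is fixed at 0.63
        if orientation == "LL":
            return 0.63
        a_b, r_b = TABLE_I[(orientation, level)]
        return a_b * variance + r_b

    print(base_threshold("HL/LH", 3, 50.0))   # about 0.55, consistent with Fig. 4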

TABLE II
AVERAGE VARIANCES OF WAVELET COEFFICIENTS FOR CHROMINANCE COMPONENTS

subband    Cb      Cr      subband    Cb       Cr
HH,1       0.18    0.17    HH,4       1.43     1.37
HL/LH,1    1.33    1.43    HL/LH,4    5.34     7.85
HH,2       0.72    0.74    HH,5       1.52     1.45
HL/LH,2    3.06    3.52    HL/LH,5    5.34     8.93
HH,3       1.16    1.20    LL,5     150.08   109.51
HL/LH,3    4.26    5.14

The variance of the (LL,5) subband is usually much larger than that of the other subbands, and the distortion is not significantly affected by variance changes. This results in a fixed threshold of t_{LL,5} = 0.63 for the LL subband.

B. Visibility Thresholds for Chrominance Components

It is well known that the human visual system is less sensitive to quantization distortion in the chrominance components than in the luminance component. Also, the variance of the chrominance components is much smaller than that of the luminance component. The average variances calculated from 10 natural images (not used in subsequent calibration or evaluation) are listed in Table II. These images are available for viewing at [32]. Table III contains measured threshold values for the chrominance components, determined in the same fashion as for the luminance component, using the values from Table II for σ_b² when generating the stimulus images.¹ Experimental results indicate that the chrominance thresholds are insensitive to variance changes. Thus, the thresholds for the chrominance components are fixed to the values given in Table III. From this table, it can be seen that threshold values for the chrominance components are larger than those for the luminance component, and that the Cb component is the least sensitive component.

TABLE III
VISIBILITY THRESHOLDS FOR CHROMINANCE COMPONENTS

subband    Cb      Cr      subband    Cb      Cr
HH,1       24.72   15.49   HH,4       4.95    1.25
HL/LH,1    14.50    6.35   HL/LH,4    3.26    0.69
HH,2       14.77    7.40   HH,5       1.05    0.56
HL/LH,2     6.36    2.60   HL/LH,5    1.05    0.58
HH,3       11.55    2.57   LL,5       1.32    0.66
HL/LH,3     4.03    1.23

¹In the case of (HH,1), (HL/LH,1) and (HH,2), the measured thresholds were infinite. Adopting this result would effectively result in chrominance subsampling. Instead, large but finite values were chosen in Table III to provide a more conservative approach to these subbands, resulting in negligible file size increases.

IV. VISIBILITY THRESHOLD ADJUSTMENT BASED ON VISUAL MASKING EFFECTS

In actual image coding, unlike the psychophysical experiments described above, all subbands are quantized simultaneously and quantization distortion is superimposed on a "background" image. Threshold changes for the compound distortions caused by quantizing subbands simultaneously have been studied in [18]. Visibility of quantization distortions from similar subbands is known to increase linearly when the distortion is highly visible. However, such compound distortions are negligible at the thresholds chosen for this work, and therefore threshold adjustments for compound distortion are not needed. This result is consistent with [15], which provides individual quantization step sizes for visually lossless compression without considering other subbands.

On the other hand, VTs can vary significantly with imagebackground. In this context, the image background is calledthe masker, while the distortion is referred to as the target.The change of threshold values according to the contrast ofthe image background is represented by the target thresholdversus masker contrast (TvC) function [33]. As the contrastof the masker increases, the threshold decreases slightly andthen begins to increase. The target becoming less visibledue to a high-contrast masker is called masking, while theopposite effect, induced by a low-contrast background is calledfacilitation. The effect of facilitation is generally insignificant,and only the masking effect is considered in this work. Inwhat follows, visibility thresholds are modified to exploit themasking effect. In particular, thresholds are increased withinspatial regions depending on the relevant image background.These threshold increases decrease file size without introduc-ing visible distortion increases.

In coding of color images, the chrominance components typically occupy a much smaller portion of the overall file than the luminance component. Thus, for simplicity, threshold adjustments are made only for the luminance component in this work.

The contrast of a luminance masker within a spatial region is determined by the luminance wavelet coefficients in the corresponding spatial region of all subbands. However, since masking occurs most strongly when the target and masker have the same frequency and orientation, this work considers only wavelet coefficients in the subband to be encoded, and the visually lossless masked threshold t̃_b is determined as

\tilde{t}_b = t_b \cdot m_b    (8)

where t_b is the base threshold value calculated from (7) and m_b is a masking factor calculated from the magnitudes of the wavelet coefficients in subband b. This intra-band masking model may provide slightly lower accuracy than models that consider both intra- and inter-band masking effects, but it has the advantage of simplicity, as well as enabling parallel processing, because subbands can be encoded independently.

Fig. 6. Log-log plot of self-masking factor vs. wavelet coefficient magnitude for σ_b² = 50, w₁ = 1 and ρ₁ = 0.4.

The masking factor m_b is calculated using two visual masking models: the self-contrast masking model and the texture masking model. The self-contrast masking model approximates the change of threshold in the TvC function according to the magnitude of the wavelet coefficients. The corresponding self-contrast masking factor, s_b[n], at two-dimensional location n in subband b, is defined by

s_b[n] = \max\left\{ 1,\; w_1 \left( \frac{|y[n]|}{y_b^+ + \varepsilon} \right)^{\rho_1} \right\}    (9)

where y[n] is the wavelet coefficient at location n and y_b^+ is the conditional expectation of y given y ≥ 0, calculated from (4) as E[y | y ≥ 0] = √2 σ_b / 2. The small constant ε > 0 is included for stability of the equation. The parameter ρ_1 reflects the nonlinearity of self-contrast masking and has a value between 0 and 1. The parameter w_1 adjusts the degree of self-contrast masking and, together with ρ_1, prevents over-masking.

The self-contrast masking factor, as shown in Fig. 6, takes a value of 1 (no masking) for small values of |y[n]|. On a log-log scale, it increases linearly with slope ρ_1 once log |y[n]| exceeds log y_b^+ by more than log w_1^{-1/ρ_1}. This self-contrast masking model can have problems at edges, because edges generate large coefficients but are perceptually more sensitive [14]. Therefore, care is needed in choosing the parameters w_1 and ρ_1 to ensure visually lossless quality in all regions.

In addition to self-contrast, texture activity can significantly affect distortion visibility. Specifically, VTs increase as the texture beneath the distortion becomes more difficult to predict. In this work, the texture masking factor, τ_b[n], for the texture activity of a small local texture-block j is given by

τ_b[n] = \max\left\{ 1,\; w_2 \left( \sigma_j^2 \right)^{\rho_2} \right\}    (10)

where σ_j² is the variance of the reconstructed wavelet coefficients in texture-block j. The parameters w_2 and ρ_2 play roles similar to those of w_1 and ρ_1 in (9).
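A sketch of the two masking factors (ours; the parameter defaults follow Table IV, and the texture-block size n is illustrative, since N = W · 2^{-k} varies by level):

    import numpy as np

    def self_contrast_masking(y, sigma_b, w1=0.8, rho1=0.35, eps=1e-6):
        # s_b[n] of (9), with y_b+ = E[y | y >= 0] = sqrt(2) * sigma_b / 2
        y_plus = np.sqrt(2.0) * sigma_b / 2.0
        return np.maximum(1.0, w1 * (np.abs(y) / (y_plus + eps)) ** rho1)

    def texture_masking(y_rec, n=8, w2=0.35, rho2=0.4):
        # tau_b[n] of (10): one variance per n x n texture-block of reconstructed
        # coefficients, replicated over the whole block
        tau = np.ones_like(y_rec, dtype=float)
        H, W = y_rec.shape
        for i in range(0, H, n):
            for j in range(0, W, n):
                var_j = np.var(y_rec[i:i + n, j:j + n])
                tau[i:i + n, j:j + n] = max(1.0, w2 * var_j ** rho2)
        return tau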

The texture masking model is similar to the neighbor masking models in [16] and [34], but computes the masking factor only once per texture-block for computational efficiency. Every location in the texture-block is then assigned the same value. A texture-block of size W × W in the spatial domain implies a texture-block of size N × N in the wavelet domain, with N = W · 2^{-k} for DWT level k. Since N is usually smaller than the size of a JPEG2000 codeblock, each codeblock can contain several texture-blocks. Note that although the texture masking factor is constant over a texture-block, it is not generally constant over a codeblock. Texture masking is applied only to the high-frequency subbands (k ≤ 3), because lower-frequency subbands have insufficiently large texture-blocks to calculate a variance. Because textures can be affected by quantization, reconstructed wavelet coefficients are used when calculating the variance σ_j² [35].

Fig. 7. (a) 512 × 512 reference image horse and (b) the product of its two masking factors, s_b[n] · τ_b[n], in the wavelet domain for W = 32. Brighter intensities represent higher masking effects.

Fig. 7 depicts the image horse and the product of the two masking factors, s_b[n] · τ_b[n], in the wavelet domain. Brighter intensities represent higher masking factors. The self-masking s_b[n] is strong along prominent edges, and the texture masking τ_b[n] is more pronounced in complex textures. The value of s_b[n] · τ_b[n] is 1.0 (i.e., no masking) in flat background areas, such as the sky and the horse's body. Also, it can be seen that masking values in a subband are influenced by the orientation of the maskers.

In JPEG2000, thresholds may be applied at the level of each codeblock via codestream truncation, without requiring modification of the decoder. To this end, the masking factors for all wavelet coefficients in a codeblock are combined to yield a single masking factor via the Minkowski mean

m_b = \left( \frac{1}{\|B\|} \sum_{n \in B} \left( s_b[n] \cdot \tau_b[n] \right)^{\beta} \right)^{1/\beta}    (11)

where ‖B‖ is the number of coefficients in codeblock B. The parameter β lies between 0 and 1 and controls the degree of overall masking. If β = 1, (11) computes the average value of s_b[n] · τ_b[n] without any weighting. As β is decreased, less weight is placed on areas with large masking factors. In this way, the overall masking factor for the block becomes more conservative.
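A sketch of (11) and of the resulting masked threshold (8) (ours; the β default follows Table IV):

    import numpy as np

    def codeblock_masking_factor(s, tau, beta=0.8):
        # Minkowski mean (11) over all coefficients of a codeblock
        return np.mean((s * tau) ** beta) ** (1.0 / beta)

    def masked_threshold(t_base, s, tau, beta=0.8):
        # Masked visibility threshold of (8)
        return t_base * codeblock_masking_factor(s, tau, beta)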

V. VISUALLY LOSSLESS CODING AND EXPERIMENTAL RESULTS

A. Visually Lossless Coding

In JPEG2000, wavelet coefficients in subband b are initially quantized with a quantization step size Δ_b. The subband is partitioned into codeblocks, and the quantization indices of each codeblock are bit-plane coded. Each bit-plane is coded in three coding passes, except the most significant bit-plane (MSB), which is coded in one coding pass. Thus, a codeblock with M bit-planes has 3M − 2 coding passes. One or more least significant coding passes may optionally be omitted from the final codestream to increase the effective quantization step size within a codeblock. Omitting the p least significant bits of the quantization index of a coefficient results in an effective quantization step size of 2^p Δ_b for that coefficient.
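In code, this bookkeeping is trivial (a sketch, ours):

    def num_coding_passes(M):
        # Three passes per bit-plane, except the MSB plane, which has one
        return 3 * M - 2

    def effective_step(delta_b, p):
        # Omitting the p least significant bits scales the step to 2**p * delta_b
        return 2 ** p * delta_b

    print(num_coding_passes(10), effective_step(0.5, 3))   # 28 passes; step 4.0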

To ensure that the effective quantization step sizes are less than the thresholds derived previously, the following procedure is applied to all luminance subbands except (LL,5). First, the variance σ_{b,i}² for codeblock i in subband b is calculated. Then a base threshold t_{b,i} for that codeblock is determined using (7). The self-masking factor s_{b,i}[n] is calculated for each wavelet coefficient in the codeblock according to (9). During bit-plane coding, the texture masking factor τ_{b,i}[n] is calculated for each coefficient in the codeblock according to (10), followed by m_{b,i} via (11) and the masked threshold t̃_{b,i} via (8). The maximum absolute error for the codeblock is calculated at the end of each coding pass z as

D^{(z)} = \max_{n \in B} \left| y[n] - \hat{y}^{(z)}[n] \right|    (12)

where ŷ^{(z)}[n] denotes the reconstructed value of y[n] using the quantization index q^{(z)}[n], which has been encoded only up to coding pass z. Coding is terminated when D^{(z)} falls below the masked threshold t̃_{b,i}.
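The termination rule can be sketched per bit-plane rather than per coding pass (a simplification of the actual three-passes-per-plane schedule; names and structure are ours):

    import numpy as np

    def max_abs_error(y, delta_b, p):
        # D of (12) when the p least significant magnitude bit-planes are dropped,
        # i.e., effective dead-zone step 2**p * delta_b with mid-point reconstruction
        q = np.sign(y) * np.floor(np.abs(y) / delta_b)
        q_p = np.sign(q) * np.floor(np.abs(q) / 2 ** p)
        step = 2 ** p * delta_b
        y_rec = np.where(q_p == 0, 0.0, np.sign(q_p) * (np.abs(q_p) + 0.5) * step)
        return np.max(np.abs(y - y_rec))

    def planes_to_drop(y, delta_b, t_masked, M):
        # Largest p whose maximum error stays below the masked threshold
        for p in range(M, -1, -1):
            if max_abs_error(y, delta_b, p) < t_masked:
                return p
        return 0   # keep all bit-planes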

In principle, τ_{b,i}[n] must be recalculated at the end of each coding pass, since it is computed using reconstructed coefficients. In practice, to reduce the computational load, the texture masking factors are not evaluated (and are set to 1.0) until D^{(z)} falls below a prescribed maximum masked threshold, t_b^{max} = 6 r_b. In addition to reducing complexity, t_b^{max} sets a conservative upper bound on the effective quantization step size that can be applied to any given codeblock, possibly at the expense of a small increase in file size.

Pseudo-code for the proposed algorithm is given in Fig. 8. It is worth emphasizing that the base VTs computed in the second line of Fig. 8 are adaptive codeblock-by-codeblock and image-by-image. This is in contrast to [16], where a fixed set of base VTs (one per subband) is employed, independent of the image to be encoded. The adaptivity of the base VTs proposed here stems from the dependence of the quantization distortion model on the coefficient variance.

For (LL,5) of the luminance component, the visibility threshold t_{LL,5} = 0.63 is used directly as the quantization step size Δ_{LL,5}, and all bit-planes are included in the codestream. The same procedure is applied to all chrominance subbands. That is, the thresholds of Table III are used as quantization step sizes, and all bit-planes are included in the codestream.

compute the variance σ_{b,i}² for codeblock i in subband b
determine the base threshold t_{b,i} from (7)
compute the self-masking factors s_{b,i}[n]
for coding pass z = 0 to 3(M − 1)
{
    perform bit-plane coding for coding pass z
    compute the maximum distortion D^{(z)}
    if D^{(z)} < t_b^{max}, then
    {
        compute the texture masking factors τ_{b,i}[n]
        determine the masked threshold t̃_{b,i} from (8)
        if D^{(z)} ≤ t̃_{b,i}, then coding is terminated
    }
}

Fig. 8. Pseudo-code of the proposed encoder for codeblock i of subband b, for each luminance subband except (LL,5).

The proposed algorithm does not require post-compression rate-distortion (PCRD) optimization [7] to select which coding passes are included in the final codestream. The algorithm automatically computes only the compressed coding passes that will be included in the final codestream. Thus, as in [16], the "overcoding" that is typical of many JPEG2000 implementations is avoided, which can significantly reduce computational complexity. It is important to note that all codestreams produced by the proposed algorithm are compatible with Part I of the JPEG2000 standard, and can be decoded by any JPEG2000 decoder.

As mentioned previously, prior to bit-plane coding, all wavelet coefficients in luminance subband b are quantized using a step size of Δ_b. An ideal value of Δ_b would be the final masked threshold. However, this value is unknown before encoding and varies from codeblock to codeblock. A simple strategy would be to select Δ_b as the minimum possible value of the final masked threshold, i.e., r_b in Table I. In this work, a slightly larger value of Δ_b = 1.03 r_b was chosen. This choice yields slightly better compression performance with no impact on visual quality. Indeed, this value is well within the error bars depicted in Fig. 4 for all subbands.

Calibration of the masking models was performed using several 8-bit monochrome images. The purpose of this calibration was to find a set of model parameters which yields the largest masking factors (i.e., the smallest file sizes) while maintaining visually lossless quality, via compression as described above followed by decompression using an unmodified JPEG2000 Part I decoder. Table IV shows the parameter values obtained by visual inspection using numerous combinations of values. As verified by the validation experiments described below, these parameter values provide visually lossless quality for a wide range of natural images. Calibration by image type is possible, and different parameter values for different image types may provide better compression performance, but at the expense of the compressor becoming image-type specific.

TABLE IV
MASKING MODEL PARAMETERS

w1     ρ1     w2     ρ2    β
0.8    0.35   0.35   0.4   0.8

TABLE V
BITRATES AND PSNRS FOR THE PROPOSED VISUALLY LOSSLESS JPEG2000 ENCODER FOR 8-BIT MONOCHROME IMAGES (IMAGES DENOTED BY * WERE USED DURING MASKING MODEL CALIBRATION)

     Image      Dimension (W×H)   Lossless (bpp)   Proposed (bpp)   Ratio   Y PSNR (dB)
 1   airplane*  512 × 512         3.99             1.37             2.91    42.81
 2   baboon*    512 × 512         6.11             2.70             2.26    36.64
 3   balloon    512 × 512         4.32             1.62             2.67    40.76
 4   barbara*   512 × 512         4.78             1.73             2.76    39.67
 5   barnlake   512 × 512         5.59             2.68             2.09    38.75
 6   bike       2048 × 2560       4.52             1.62             2.79    40.44
 7   blastoff   512 × 512         5.39             2.29             2.35    38.38
 8   boats      720 × 576         4.06             1.27             3.20    42.52
 9   cameramn   256 × 256         4.99             1.92             2.60    39.92
10   goldhill   720 × 576         4.61             1.79             2.58    40.99
11   hatgirl    512 × 512         5.27             2.34             2.25    39.08
12   horse      512 × 512         5.25             2.26             2.32    39.33
13   jill*      512 × 512         3.14             0.91             3.45    44.60
14   lena*      512 × 512         4.30             1.43             3.01    41.57
15   man        1024 × 1024       4.82             1.88             2.56    40.52
16   monarch    768 × 512         3.81             1.08             3.53    42.78
17   onthepad   512 × 512         6.50             2.97             2.19    35.41
18   peppers    512 × 512         4.62             1.61             2.87    39.70
19   thecook    512 × 512         5.48             2.57             2.13    38.83
20   woman      2048 × 2560       4.51             1.61             2.80    40.57
     Average    -                 4.80             1.91             2.51    40.23

B. Coding Results

The proposed encoder was implemented in Kakadu V6.1 [13]. All reported results were generated with this encoder and the unmodified decoder, kdu_expand. Tables V and VI, respectively, show the bitrates obtained for encoding 8-bit monochrome and 24-bit color images using the proposed JPEG2000 coding scheme. For comparison, bitrates obtained for numerically lossless JPEG2000 compression are also included. Images used to generate these results were obtained from the USC database [36], the LIVE database [37], the Kodak PhotoCD [38], and the ISO JPEG2000 test suite. Images marked with an asterisk in Table V were used during masking model calibration.

For monochrome images, the numerically lossless coding method of JPEG2000 yields an average bitrate of 4.80 bits per pixel (bpp), while the proposed visually lossless coding method achieves an average bitrate of 1.91 bpp, an improvement in compression ratio of 2.5 to 1, without perceivable quality degradation. In the case of color images, the numerically lossless coding method and the proposed visually lossless coding method achieve, respectively, 10.50 bpp and 1.91 bpp on average, for an improvement in compression ratio of 5.5 to 1.

As can be seen in the tables, different images encoded in the proposed visually lossless manner have significantly different bitrates. Also, peak signal-to-noise ratios for the luminance component (Y PSNRs) vary widely depending on the image.


TABLE VI
BITRATES AND PSNRS FOR THE PROPOSED VISUALLY LOSSLESS JPEG2000 ENCODER FOR 24-BIT COLOR IMAGES

     Image      Dimension (W×H)   Lossless (bpp)   Proposed (bpp)   Ratio   Y PSNR (dB)
 1   bike       2048 × 2560       11.96            1.72             6.95    40.44
 2   bikes      768 × 512         10.81            2.40             4.51    39.53
 3   building2  640 × 512         13.90            2.79             4.99    36.06
 4   buildings  768 × 512         11.13            2.30             4.84    37.96
 5   caps       768 × 512          8.09            1.10             7.35    43.52
 6   carnival   610 × 488          9.67            1.75             5.53    41.40
 7   cemetery   627 × 482         12.69            2.37             5.35    38.88
 8   church     634 × 505         10.98            1.85             5.93    39.96
 9   coins      640 × 512         11.88            2.35             5.05    40.39
10   dancers    618 × 453         11.98            2.28             5.26    38.82
11   flowers    640 × 512         12.36            2.36             5.23    36.47
12   house      768 × 512         10.08            1.99             5.06    41.22
13   lighthse   480 × 720         10.19            1.73             5.88    40.11
14   lighthse2  768 × 512          9.75            1.71             5.72    40.26
15   manfish    634 × 438         10.06            1.79             5.61    40.08
16   monarch    768 × 512          8.98            1.21             7.41    42.83
17   ocean      768 × 512          8.77            1.50             5.83    41.71
18   painted    768 × 512         10.16            2.14             4.74    40.43
19   parrots    768 × 512          8.48            0.94             9.06    43.46
20   plane      768 × 512          8.06            1.21             6.64    43.09
21   rapids     768 × 512         10.16            2.35             4.33    40.54
22   sailing1   768 × 512          9.59            1.91             5.02    39.98
23   sailing2   480 × 720          9.39            1.22             7.67    41.85
24   sailing3   480 × 720          9.57            1.39             6.87    41.96
25   sailing4   768 × 512          9.29            1.91             4.86    41.26
26   statue     480 × 720          9.55            1.52             6.27    41.70
27   stream     768 × 512         11.86            2.84             4.17    36.39
28   sculpture  632 × 505         13.31            2.83             4.71    37.87
29   woman      2048 × 2560       11.50            1.67             6.90    40.57
30   woman2     480 × 720         11.61            2.38             4.87    39.88
31   womanhat   480 × 720          9.75            1.67             5.82    41.76
     Average    -                 10.50            1.91             5.50    40.34

In particular, the resulting bitrates range from 0.91 bpp to 2.97 bpp for monochrome images, and from 0.94 bpp to 2.84 bpp for color images. The monochrome images have a minimum PSNR of 35.41 dB and a maximum PSNR of 44.60 dB. The luminance PSNR values for color images are similar. These results demonstrate that bitrate and PSNR are not effective criteria for determining visually lossless quality.

Table VII compares the proposed coding method with the visually lossless method proposed in [30]. This table reports bitrates obtained by each method for encoding five 512 × 512 8-bit digitized radiographs used in [30]. As described in that work, the images were obtained by cropping 512 × 512 diagnostically relevant regions of 2048 × 2056 radiographs acquired using a Lumisys 75 laser film scanner (8-bits/pixel grayscale, 146 dots/inch). The five images are images of the hip, chest, neck, leg, and abdomen. As can be seen from the table, the method proposed here provides significantly lower bitrates. These lower bitrates result in a lower average PSNR, but do not result in visual artifacts.

Comparison with other methods from the literature is more difficult. As compared to [16], the bitrates obtained here for monochrome images are considerably higher. However, as discussed in the introduction, the encoder of [16] does not provide truly visually lossless quality, as evidenced by their use of the term "almost transparent coding" rather than "visually lossless coding," and by their reported mean opinion scores.

TABLE VII
BITRATE AND PSNR COMPARISON WITH [30] FOR VISUALLY LOSSLESS 8-BIT 512 × 512 DIGITIZED RADIOGRAPHS

           [30]                    Proposed
Image      (bpp)    PSNR (dB)      (bpp)    PSNR (dB)    Ratio
158630     1.47     43.16          1.06     42.49        1.39
157952     1.96     44.27          1.26     42.19        1.56
158180     1.62     44.72          0.96     42.46        1.69
158208     1.35     45.97          0.56     43.16        2.41
157514     1.68     44.12          0.94     42.38        1.79
Average    1.62     44.45          0.96     42.53        1.69

This is likely caused by the use of a uniform probability model for quantization distortion, as well as the use of a CRT for the psychovisual experiments in [15], from which the VTs used in [16] are drawn.

As also mentioned in the introduction, the Kakadu implementation of JPEG2000 provides a CSF-based visual weighting option that can enhance the visual quality of compressed imagery. Command line options for the application kdu_compress allow the user to select a target bitrate, or alternatively, a target distortion-rate slope. But no guidance is given as to how a target bitrate or distortion-rate slope should be chosen to yield visually lossless quality. If an "Oracle" provides the bitrates from the proposed method (from Tables V and VI) as target bitrates to kdu_compress (with visual weighting), very high quality imagery results. In fact, the quality rivals that of the proposed method. However, it must be emphasized that kdu_compress has no such Oracle, and that a major contribution of the proposed method is that it automatically adjusts the compression on an image-by-image basis to achieve low bitrates while preserving visually lossless quality.
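To make the preceding point concrete, the following sketch drives kdu_compress with an externally supplied ("Oracle") target bitrate. The -rate and -slope options are standard kdu_compress parameters, but exact syntax and defaults may vary across Kakadu versions, and the file names here are placeholders:

    import subprocess

    def compress_with_target(src, dst, target_bpp=None, target_slope=None):
        # Build a kdu_compress command line; the target bitrate (or
        # distortion-rate slope) must be supplied by the caller, since
        # kdu_compress itself gives no guidance for visually lossless quality.
        cmd = ["kdu_compress", "-i", src, "-o", dst]
        if target_bpp is not None:
            cmd += ["-rate", str(target_bpp)]      # target bitrate in bpp
        elif target_slope is not None:
            cmd += ["-slope", str(target_slope)]   # target distortion-rate slope
        subprocess.run(cmd, check=True)

    # e.g., using the "Oracle" rate for image 1 (bike) from Table VI:
    compress_with_target("bike.ppm", "bike.jp2", target_bpp=1.72)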

It is worth noting that the base visibility thresholds can be used alone (without masking) to achieve visually lossless encoding. Indeed, results for this case were reported in [22]. Specifically, for a given codeblock, the coefficient variance is computed and used to compute the base threshold tb from (7). Bitplane coding is then carried out until the maximum absolute coefficient error in the codeblock falls below tb. Visually lossless encoding is achieved, but with about 10% larger file sizes than in the masked case. The masked case can be understood to be uniformly more aggressive than the unmasked case. This is seen by noting that the masking model can only increase the base threshold. That is, from (8) and (11), it follows that the adjusted threshold is always greater than or equal to the base threshold tb. Thus, for each codeblock, the number of coding passes included in the masked case is no more than in the unmasked case. In turn, the quantization error of each coefficient in the masked case is lower bounded by that in the unmasked case. Accordingly, only the masked version of the encoder is considered in the validation experiments below.
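A minimal sketch of the unmasked stopping rule just described, assuming a placeholder base_threshold() standing in for (7) (not reproduced here) and per-pass reconstructions of the codeblock supplied by the bitplane coder:

    import numpy as np

    def base_threshold(variance):
        # Placeholder for the base VT t_b of (7); the actual mapping from
        # codeblock variance to threshold is defined earlier in the paper.
        raise NotImplementedError

    def passes_for_codeblock(coeffs, reconstructions):
        # coeffs:          original wavelet coefficients of one codeblock
        # reconstructions: reconstructions[k] holds the coefficients decoded
        #                  from the first k coding passes (k = 0, 1, ...)
        t_b = base_threshold(np.var(coeffs))
        for k, rec in enumerate(reconstructions):
            if np.max(np.abs(coeffs - rec)) < t_b:   # max abs error below VT
                return k            # include only the first k coding passes
        return len(reconstructions)  # otherwise include all passes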

C. Validation Experiments

To verify that the images encoded with the proposed scheme are visually lossless, a three-alternative forced-choice (3AFC) method was used.


Fig. 9. (a) Original monochrome image horse and (b) visually lossless image encoded by the proposed method. Images are cropped to 512 × 350 after decompression to avoid rescaling during display. The images should be viewed with viewer scaling set to 100%. Image details can be found in Table V.

For each test image, two original copies and one compressed copy were displayed side by side, with the position of the compressed copy chosen randomly. The test images were presented in a fixed order. For each test image, the subject was given an unlimited amount of time to choose the copy that looked different from the other two copies. If the images are compressed in a visually lossless manner, the rate at which the subject correctly chooses the compressed copy should be 1/3.

3AFC testing has been used previously in [30]. Evidently, 3AFC testing can provide a more rigorous validation than 2AFC. This can be seen by considering the 2AFC procedure.

In 2AFC testing, an original copy is displayed together with only one compressed copy. The subject is asked to choose the copy which has been compressed. If the images are compressed losslessly, the rate at which the subject chooses the correct copy should be 1/2. However, consider the case when the images are compressed with very high quality, but are not visually lossless. In the case of 2AFC testing, the subject would be able to perceive that differences exist between the two copies, but might not be able to tell which copy has been compressed. In this situation, the rate of correct response might still be 1/2, even though the images are not visually lossless. On the other hand, for 3AFC testing, the subject can identify


TABLE VIII
T-TEST RESULTS (TEST VALUE = 0.3333)

t-score   Degrees of   Significance   Mean         95% confidence interval of the difference
          freedom      (2-tailed)     difference   Lower        Upper
1.798     20           0.087          0.0167       -0.0026      0.0347

the compressed image as the one that differs from the other two, resulting in a correct response rate of greater than 1/3, indicating that the imagery is not visually lossless.
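This distinction is easy to check numerically; a small Monte Carlo sketch, under the simplifying assumption that a subject who detects a visible difference can single out the odd copy in 3AFC but must still guess between the two copies in 2AFC:

    import random

    def correct_rate(p_detect, n_alternatives, trials=100_000):
        correct = 0
        for _ in range(trials):
            if n_alternatives == 3 and random.random() < p_detect:
                correct += 1     # the odd-one-out is identifiable
            else:
                # difference not detected (or 2AFC): pure guessing
                correct += random.random() < 1.0 / n_alternatives
        return correct / trials

    # Distortion visible half the time, but not attributable to either copy:
    print(correct_rate(0.5, 2))  # stays near 1/2 -> 2AFC cannot expose it
    print(correct_rate(0.5, 3))  # rises to ~2/3  -> 3AFC exposes it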

As test images, 8 monochrome and 11 color images were selected from the images in Tables V and VI. Each has a low luminance PSNR (less than 40 dB) and none were used during algorithm design/calibration. Additionally, the two radiographs from Table VII having the lowest PSNRs were included in the validation test set. In the validation experiment, a Dell U2410 LCD monitor with a resolution of 1920 × 1200 was used. The maximum dimension of images displayed during testing was 570 × 975 so that three copies fit side by side across the screen. Test images exceeding these maximum dimensions were cropped after decompression. Sixteen observers participated in the experiment. All were familiar with compression artifacts that occur in JPEG2000 and have normal or corrected-to-normal vision. Each subject performed five trials for each of the 21 images, for a total of 1680 evaluations. During the experiment, no feedback was provided on the correctness of choices. Although the proposed encoder was designed to achieve visually lossless quality at a typical viewing distance of 60 cm, observers were allowed to view the images as closely as desired.

Under the null hypothesis that the three images are indistinguishable, the compressed image should be chosen correctly with a frequency of 1/3. Images compressed using the proposed method were selected with a mean frequency of 0.3494 and a standard deviation of 0.041. Table VIII shows the results of a one-sample t-test that measures whether the mean frequency for images encoded using the proposed method significantly differs from the hypothesized frequency of 1/3. In this test, the hypothesis that the responses were randomly chosen could not be rejected at the 5% significance level. Based on these results, it is claimed that the proposed coding method is visually lossless under the viewing conditions set during the design and validation studies.
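The entries of Table VIII can be reproduced from the summary statistics alone; a sketch using scipy (n = 21 test images, hence 20 degrees of freedom):

    from math import sqrt
    from scipy import stats

    n, mean, sd = 21, 0.3494, 0.041        # values reported above
    h0 = 1.0 / 3.0                         # null: random choice among 3 copies

    t = (mean - h0) / (sd / sqrt(n))       # one-sample t-statistic
    p = 2 * stats.t.sf(abs(t), df=n - 1)   # two-tailed significance

    print(f"t = {t:.3f}, df = {n - 1}, p = {p:.3f}")
    # t ≈ 1.80, df = 20, p ≈ 0.087: the null cannot be rejected at the 5% level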

Fig. 9 shows an original monochrome image together with a version that has been encoded by the proposed method. The encoded image exhibits a PSNR of 39.33 dB, and no differences are visible when the images are displayed at a 1:1 scale. Because PDF viewers may inappropriately scale images according to window size or user settings, and unintended distortions may be introduced during PDF creation, more examples of encoded images (both monochrome and color) are provided online [32] for proper comparison.

VI. CONCLUSIONS

The HVS has varying sensitivity to different color components, spatial frequencies, orientations, and underlying background images. Using this fact, a visually lossless coding algorithm has been presented for 8-bit monochrome and 24-bit color images. Visibility thresholds were measured for statistically modeled quantization distortion, and have different values depending on the local variances within each subband. Since quantization distortion appears on various background images, the threshold values are adjusted using a self-masking model and a texture masking model to cope with spatially changing visual masking effects. The resulting thresholds are used to determine the maximum quantization for visually lossless coding. The proposed JPEG2000 Part-I compliant coding scheme successfully yields compressed images, whose quality is indistinguishable from that of the original images, at significantly lower bitrates than those of numerically lossless coding and other visually lossless algorithms in the literature.

To improve the performance of the proposed algorithm, the distribution model for wavelet coefficients could be replaced by a more sophisticated model such as the Gaussian Scale Mixture (GSM) [39], which takes into account spatial correlation between wavelet coefficients. However, a generalized Gaussian model that uses only variance as a parameter was used here to avoid the intricacy of a multi-variable experiment. Also, to preserve independent coding of codeblocks, light adaptation using the LL subband [16] and inter-band masking effects were excluded. Nevertheless, the proposed coding scheme predicts well the appropriate compression parameters required for visually lossless coding.

REFERENCES

[1] N. V. Graham, Visual Pattern Analyzers. New York: Oxford University Press, 1989.

[2] S. Daly, "Application of a noise-adaptive contrast sensitivity function to image data compression," Optical Engineering, vol. 29, no. 8, pp. 977–987, 1990.

[3] C. Blakemore and F. Campbell, "On the existence of neurones in the human visual system selectively sensitive to the orientation and size of retinal images," Journal of Physiology, vol. 203, pp. 237–260, July 1969.

[4] F. W. Campbell and J. G. Robson, "Application of Fourier analysis to the visibility of gratings," Journal of Physiology, vol. 197, pp. 551–566, 1968.

[5] A. B. Watson, "The cortex transform: rapid computation of simulated neural images," Computer Vision, Graphics, and Image Processing, vol. 39, no. 3, pp. 311–327, 1987.

[6] D. Wu, D. M. Tan, M. Baird, J. DeCampo, C. White, and H. R. Wu, "Perceptually lossless medical image coding," IEEE Transactions on Medical Imaging, vol. 25, no. 3, pp. 335–344, 2006.

[7] D. S. Taubman and M. W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards, and Practice. Boston: Kluwer Academic Publishers, 2002.

[8] M. Bolin and G. Meyer, "A perceptually based adaptive sampling algorithm," in SIGGRAPH 98 Conference Proceedings, 1998, pp. 299–309.

[9] Y. Lai and C. Kuo, "Image quality measurement using the Haar wavelet," in Proceedings of the SPIE, vol. 3169, 1997, pp. 127–138.

[10] A. P. Bradley, "A wavelet visible difference predictor," IEEE Transactions on Image Processing, vol. 8, no. 5, pp. 717–730, 1999.

[11] M. Masry, S. S. Hemami, and Y. Sermadevi, "A scalable wavelet-based video distortion metric and applications," IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 2, pp. 260–273, 2006.

[12] M. Nadenau and J. Reichel, "Opponent color, human vision and wavelets for image compression," in Proceedings of the Seventh Color Imaging Conference, 1999, pp. 237–242.


[13] Kakadu software. [Online]. Available: http://www.kakadusoftware.com

[14] W. Zeng, S. Daly, and S. Lei, "An overview of the visual optimization tools in JPEG 2000," Signal Processing: Image Communication, vol. 17, pp. 85–104, 2002.

[15] A. B. Watson, G. Y. Yang, and J. A. Solomon, "Visibility of wavelet quantization noise," IEEE Transactions on Image Processing, vol. 6, no. 8, pp. 1164–1175, 1997.

[16] Z. Liu, L. J. Karam, and A. B. Watson, "JPEG2000 encoding with perceptual distortion control," IEEE Transactions on Image Processing, vol. 15, pp. 1763–1778, 2006.

[17] M. G. Ramos and S. S. Hemami, "Suprathreshold wavelet coefficient quantization in complex stimuli: psychophysical evaluation and analysis," Journal of the Optical Society of America A, vol. 18, pp. 2385–2397, 2001.

[18] D. M. Chandler and S. S. Hemami, "Effects of natural images on the detectability of simple and compound wavelet subband quantization distortions," Journal of the Optical Society of America A, vol. 20, no. 7, pp. 1164–1180, 2003.

[19] ——, "Dynamic contrast-based quantization for lossy wavelet image compression," IEEE Transactions on Image Processing, vol. 14, no. 4, pp. 397–410, 2005.

[20] M. D. Smith and J. Villasenor, "JPEG-2000 rate control for digital cinema," SMPTE Motion Imaging Journal, vol. 115, no. 10, pp. 394–399, 2006.

[21] L. Li and Z.-S. Wang, "Compression quality prediction model for JPEG2000," IEEE Transactions on Image Processing, vol. 19, no. 2, pp. 384–398, 2010.

[22] H. Oh, A. Bilgin, and M. W. Marcellin, "Visibility thresholds for quantization distortion in JPEG2000," in Proceedings of QoMEX, San Diego, July 2009.

[23] H. Oh, Y. Kim, M. W. Marcellin, and A. Bilgin, "Visually lossless coding for color aerial images using JPEG2000," in Proceedings of the International Telemetering Conference, Las Vegas, Oct. 2009.

[24] H. Oh, A. Bilgin, and M. W. Marcellin, "Visually lossless JPEG2000 using adaptive visibility thresholds and visual masking effects," in Proceedings of the Asilomar Conference on Signals, Systems and Computers, Pacific Grove, Nov. 2009.

[25] S. G. Mallat, "A theory for multiresolution signal decomposition: the wavelet representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674–693, 1989.

[26] M. W. Marcellin, M. A. Lepley, A. Bilgin, T. J. Flohr, T. T. Chinen, and J. H. Kasner, "An overview of quantization in JPEG2000," Signal Processing: Image Communication, vol. 17, pp. 73–84, 2002.

[27] A. Cohen, I. Daubechies, and J. C. Feauveau, "Biorthogonal bases of compactly supported wavelets," Communications on Pure and Applied Mathematics, vol. 45, no. 5, pp. 485–560, 1992.

[28] D. H. Brainard, "The psychophysics toolbox," Spatial Vision, vol. 10, no. 4, pp. 433–436, 1997.

[29] M. Menozzi, U. Napflin, and H. Krueger, "CRT versus LCD: A pilot study on visual performance and suitability of two display technologies for use in office work," Displays, vol. 20, no. 1, pp. 3–10, 1999.

[30] D. M. Chandler, N. L. Dykes, and S. S. Hemami, "Visually lossless compression of digitized radiographs based on contrast sensitivity and visual masking," in Proceedings of the SPIE, vol. 5749, 2005, pp. 359–372.

[31] C. S. Furmanski and S. A. Engel, "An oblique effect in human primary visual cortex," Nature Neuroscience, vol. 3, pp. 535–536, 2000.

[32] Supplemental images. [Online]. Available: http://www.spacl.ece.arizona.edu/ohhan/visually_lossless/

[33] G. E. Legge and J. M. Foley, "Contrast masking in human vision," Journal of the Optical Society of America, vol. 70, no. 12, pp. 1458–1471, 1980.

[34] D. S. Taubman, "High performance scalable image compression with EBCOT," IEEE Transactions on Image Processing, vol. 9, no. 7, pp. 1158–1170, 2000.

[35] S. Daly, "The visible differences predictor: an algorithm for the assessment of image fidelity," in Digital Images and Human Vision, A. B. Watson, Ed. Cambridge, MA: The MIT Press, 1993, pp. 179–206.

[36] USC-SIPI image database. [Online]. Available: http://sipi.usc.edu/database/

[37] H. R. Sheikh, Z. Wang, L. Cormack, and A. C. Bovik. LIVE image quality assessment database release 2. [Online]. Available: http://live.ece.utexas.edu/research/quality

[38] Kodak lossless true color image suite. [Online]. Available: http://r0k.us/graphics/kodak/

[39] M. J. Wainwright and E. P. Simoncelli, "Scale mixtures of Gaussians and the statistics of natural images," in Advances in Neural Information Processing Systems, S. A. Solla, T. K. Leen, and K. R. Muller, Eds. Cambridge, MA: The MIT Press, 2000, pp. 855–861.


Han Oh Biography text here.

Ali Bilgin Biography text here.

Michael W. Marcellin Biography text here.