WAVELET-BASED FOVEATED IMAGE QUALITY...

WAVELET-BASED FOVEATED IMAGE QUALITY MEASUREMENT FORREGION OF INTEREST IMAGE CODING

Zhou Wang1, Alan C. Bovik1, and Ligang Lu2

1Laboratory for Image and Video Engineering (LIVE), Dept. of Electrical and Computer Engineering,The University of Texas at Austin, Austin, TX 78712-1084, USA

2Video and Image Systems, IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA([email protected], [email protected], [email protected])

ABSTRACT

Region of interest (ROI) image and video compressiontechniques have been widely used in visualcommunication applications in an effort to deliver goodquality images and videos at limited bandwidths. Mostimage quality metrics have been developed for uniformresolution images. These metrics are not appropriate forthe assessment of ROI coded images, where space-variantresolution is necessary. The spatial resolution of thehuman visual system (HVS) is highest around the point offixation and decreases rapidly with increasing eccentricity.Since the ROIs are usually the regions “fixated” by humaneyes, the foveation property of the HVS supplies a naturalapproach for guiding the design of ROI image qualitymeasurement algorithms. We have developed an objectivequality metric for ROI coded images in the wavelettransform domain. This metric can serve to mediate thecompression and enhancement of ROI coded images andvideos. We show its effectiveness by applying it to anembedded foveated image coding system.

1. INTRODUCTION

Region of interest (ROI) image coding allows theassignment of more bits to the ROIs than other parts of theimage. It is a useful tool for visual communicationapplications where the available bandwidth is limited.While there has been a large amount of work in uniformresolution image quality measurement, little has beendone in the assessment of ROI coded images. Qualityassessment method plays an important role in ROI imagecoding, because image coding is essentially anoptimization procedure that maximizes the image qualitywith a limited number of bits, where the quality metricserves as a guide for bit assignment. The development of

This research is supported in part by IBM Corporation, TexasInstrument, Inc., and Texas Advanced Technology Program.

ROI image quality metrics is also very important for thepostprocessing or quality enhancement of ROI codedimages. However, uniform resolution image qualitymeasurement approaches such as peak signal-to-noiseratio (PSNR) are still inappropriately used for theevaluation of ROI image coding and postprocessing [1, 2].

The motivation of this work is that the human visualsystem (HVS) is highly space-variant in sampling, coding,processing and understanding. The spatial resolution ofthe HVS is the highest around the point of fixation(foveation point) and decreases rapidly with increasingeccentricity. This feature delivers a natural way to definean image quality measure for the case that the human eyesare fixating at a given point in the image. For example, afoveated PSNR (F-PSNR) metric was proposed in [3] forfoveated video compression. By thinking of the ROIs ascollections of pixels that are possibly “fixated” by humaneyes, a natural ROI image quality metric can be designedthat utilizes a foveation model of the HVS.

In this paper, we develop a foveation-based HVSmodel in the discrete wavelet transform (DWT) domainbecause wavelet analysis supplies a convenient way tosimultaneously examine localized spatial as well asfrequency information. A new image quality metric calledthe foveated wavelet image quality index (FWQI) is thendefined for ROI coded images.

2. FOVEATED WAVELET IMAGE QUALITYMEASUREMENT

The photoreceptors (cones and rods) and ganglion cellsare non-uniformly distributed in the retina in the humaneye [4]. The density and sensitivity of cone receptors andganglion cells play important roles in determining theability of our eyes to resolve what we see. The resolutionis the highest around the foveation point and decreasesdramatically with increasing eccentricity. Psychologicalexperiments had been conducted to measure the contrastsensitivity as a function of retinal eccentricity [5-7]. In [5],a model that fits the experimental data was given by

+=

2

20 exp),(

e

eefCTefCT α , (1)

where f is the spatial frequency (cycles/degree), e is theretinal eccentricity (degrees), CT0 is a constant minimalcontrast threshold, α is the spatial frequency decayconstant, e2 is the half-resolution eccentricity, and CT(f, e)is the visible contrast threshold as a function of f and e.The best fitting parameter values given in [5] are α =0.106, e2 = 2.3, and CT0 = 1/64. It was also reported in [5]that the same values of α and e2 provide a good fit to thedata in [6] with CT0 = 1/75, and an adequate fit to the datain [7] with CT0 = 1/76, respectively. We use the parameterselections as in [5]. The contrast sensitivity is defined as:

),(

1),(

efCTefCS = . (2)

For a given e, equation (1) can be used to find its criticalfrequency or so called cutoff frequency fc by setting CT to1.0 (the maximum possible contrast) and solving for e

α)(

)1ln(

2

02

ee

CTefc +

= (cycles/degree). (3)

Given a pixel x in an N pixels wide image, its distancefrom the foveation point xf is

2)( fxxx −=d (pixels)

and its eccentricity is given by ( )Nvdve /)(tan),( 1 xx −= ,

where v is the viewing distance in image width. In Fig. 1,we show the normalized contrast sensitivity as a functionof pixel position for N = 512 and v = 1 and 6, respectively.fc as a function of pixel position is also given. The CS isnormalized so that the highest value is 1.0 at 0eccentricity. The maximum perceived resolution is alsolimited by the display resolution 180/Nvr π≈(pixels/degree). The Nyquist display frequency is given by

2/rfd = (cycles/degree). Combining this with (3), the

cutoff frequency for x is ))),((min()( dcm fdff xx = . We

define the foveation-based error sensitivity as:

>

≤=)(0

)()0,(

))(,(),,(

x

xx,

x

m

mf

fffor

ffforfCS

vefCSfvS . (4)

Now let us consider the wavelet transforms. In the 1-DDWT, the input discrete signal s is convolved withhighpass and lowpass analysis filters and downsampled bytwo, resulting in transformed signals sH and sL. The signalsL can be further decomposed and the process may berepeated multiple times. The number of repetitions definesthe wavelet decomposition level λ. For image processing,the horizontal and vertical wavelet decompositions areapplied alternatively, yielding LL, HL, LH and HHsubbands. The LL subband may be further decomposedand the process repeated multiple times. Let ),( θλrepresent the subband of level λ and orientation θ , whereθ is an index representing the LL, LH, HH or HLsubband. The wavelet coefficients at different subbands

supply information of variable perceptual importance. In[8], psychovisual measurement results were given for thevisual sensitivity in wavelet decompositions. A model thatfits the experimental data is Ylog = alog + fk(log –

20 )log fgθ [8], where Y is the visually detectable noise

threshold and λ−= 2rf [8] is the spatial frequency. The

visual sensitivity in subband ),( θλ is given by:

20 )/2(log(

,

,

,

,

10),(

rgfkWaA

A

Y

AS

θλ

θλ

θλ

θλ

θλθλ == , (5)

where θλ ,A is the basis function amplitude given in [8].

Let θλ ,B denote the set of wavelet coefficient positions

residing in subband ),( θλ . For each θλ ,B , we calculate

the corresponding foveation point fθλ ,x in it. Given a

wavelet coefficient at θλ ,Bx ∈ , its equivalent distance

from the foveation point in the spatial domain is given by

2,, 2)( fd θλλ

θλ xxx −= . With this distance, we have

))(,2,(),,( , xx θλλ drvSfvS ff

−= for θλ ,Bx ∈ . (6)

A foveation-based visual sensitivity model in the DWTdomain is obtained by combining (5) and (6):

[ ] [ ] 21 ))(,2,(),(),( ,

βθλ

λβθλ xx drvSSvS fw−⋅=

for θλ ,Bx ∈ , (7)

where β1 and β2 are parameters used to control themagnitudes of wS and fS , respectively. We use β1 = 1

and β2 = 2.5. Fig. 2 shows ),( xvS for v = 1, 3, 6 and 10,

respectively. For the evaluation of image quality, insteadof using the traditional error summation methods, wedesigned a new quality index [9] by modeling any signaldistortion as a combination of three factors: loss ofcorrelation, mean distortion and variance distortion. Forany 2-D signal, the measurement results are a 2-D qualitymap as well as an overall quality index. Readers can referto [9] and http://anchovy.ece.utexas.edu/~zwang/research/quality_index/demo.html for more details anddemonstrative images of the new quality index. In thispaper, we adapt the index into the DWT domain anddefine a foveated wavelet image quality index (FWQI) as:

∑

∑

=

=

⋅

⋅=

M

n

M

n

cvS

QcvSFWQI

1

1

)(),(

)()(),(

nn

nnn

xx

xxx, (8)

where M is the number of the wavelet coefficients, )( nxcis the wavelet coefficient of the original image at location

nx , and )( nxQ is the quality value at location nx in the

quality index map. Since ),( nxvS varies with v, FWQI of

an test image is a function of v, instead of a single value.The above model is developed for the case of a single

foveation point. We consider the ROIs as the groups ofpossibly fixated pixels. This corresponds to the case ofmultiple foveation points. Our model can easily adapt tothis case. Suppose that there are P foveation points in theimage, with ),( xvSi for Pi ,,2,1 L= , then the overall

error sensitivity should be given by the maximum value ofthem: )),((max),(

1xx vSvS i

Pi L== .

3. IMAGE CODING USING THE FOVEATEDQUALITY METRIC

SPIHT [10] is a very efficient progressive wavelet imagecoding algorithm. We designed a modified SPIHTalgorithm and tuned it using the above FWQI model tooptimize the foveated visual quality at any given bit rate.We call the new coding algorithm the embedded waveletimage coding (EFIC) algorithm [11]. The encodedbitstream can be truncated at arbitrary places to createreconstructed images with different quality and depth offoveation. Fig. 3 gives the FWQI comparison of the EFICand SPIHT compressed 8bits/pixel (bpp) “Zelda” imagesat 0.015265, 0.0625 and 0.25bpp, respectively. FWQI foreach image is given as a function of the viewing distance.It can be observed that significant quality gain is achievedthroughout the whole range of the viewing distances. Fig.4 shows the SPIHT and EFIC decoded images. Comparedwith SPIHT, EFIC provides better foveated visual quality.When sufficient bit rate is available, the EFIC codedimage approaches uniform resolution.

4. CONCLUSIONS

We propose a foveation-based sophisticated ROI imagequality metric in the wavelet transform domain. Thismetric can serve as a very useful tool for foveated ROIimage coding and quality enhancement.

REFERENCES[1] E. Atsumi, and N. Farvardin, “Lossy/lossless region-

of-interest image coding based on set partitioning inhierarchical trees,” IEEE International Conference onImage Processing, vol. 1, pp. 87-91, 1998.

[2] J. Jung, S. Joung, Y. Jang, and J. Paik, “Enhancementof region-of-interest coded images by using adaptiveregularization,” IEEE International Conference onConsumer Electronics, pp. 62-63, 2000.

[3] S. Lee, M. S. Pattichis, and A. C. Bovik, “Foveatedvideo quality assessment,” IEEE Trans. Multimedia,to appear, 2001.

[4] W. S. Geisler, Vision Notes, The University of Texasat Austin, 1999.

[5] W. S. Geisler, and J. S. Perry, “A real-time foveatedmultiresolution system for low-bandwidth videocommunication,” Proc. of SPIE, vol. 3299, 1998.

[6] T. L. Arnow, and W. S. Geisler, “Visual detectionfollowing retinal damage: predictions of aninhomogeneous retino-cortical model,” Proceedingsof SPIE, vol. 2674, pp. 119-130, 1996.

[7] M. S. Banks, A. B. Sekuler, and S. J. Anderson,“Peripheral spatial vision: limits imposed by optics,photoreceptors, and receptor pooling,” Journal of theOptic. Soc. of America, vol. 8, pp. 1775-1787, 1991.

[8] A. B. Watson, G. Y. Yang, J. A. Solomon, and J.Villasenor, “Visibility of wavelet quantization noise,”IEEE Trans. Image Processing, vol. 6, no. 8, pp.1164-1175, Aug. 1997.

[9] Z. Wang and A. C. Bovik, “A universal image qualityindex,” submitted to IEEE Signal Proc. Letters, 2001.

[10]A. Said, and W. A. Pearlman, “A new, fast, andefficient image codec based on set partitioning inhierarchical trees,” IEEE Trans. Circuits & Systemsfor Video Tech., vol. 6, no. 3, pp. 243-250, June 1996.

[11] Z. Wang and A. C. Bovik, “Embedded foveationimage coding,” submitted to IEEE Trans. ImageProcessing, 2001.

Pixel Position (pixels)

Spa

tial F

requ

ency

(cy

cles

/deg

ree)

−250 −200 −150 −100 −50 0 50 100 150 200 2500

5

10

15

20

25

30

35

40

Pixel Position (pixels)

Spa

tial F

requ

ency

(cy

cles

/deg

ree)

−250 −200 −150 −100 −50 0 50 100 150 200 2500

5

10

15

20

25

30

35

40

Fig. 1 Normalized contrast sensitivity (Brightness indicates the strength of contrast sensitivity) for N = 512 and v= 1 (Left) and v = 6 (Right) times of the image width, respectively. The white curves show the cutoff frequencies.

Fig. 4 “Zelda” image (512×512, 8bpp) compression result comparison at 0.015625 (CR=512:1), 0.0625 (CR=128:1)and 0.25bpp (CR=32:1). Upper: SPIHT coded images; Bottom: EFIC coded images.

Fig. 2 Foveation-based error sensitivity mask in theDWT domain. The top-left, top-right, bottom-left,and bottom-right are for v = 1, 3, 6 and 10 times ofthe image width, respectively. (Brightnesslogarithmically enhanced for display purpose)

0 2 4 6 8 10 120

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Viewing Distance (image width)

FW

QI

SPIHT, 0.25bppEFIC, 0.25bppSPIHT, 0.0625bppEFIC, 0.0625bppSPIHT, 0.015625bppEFIC, 0.015625bpp

Fig. 3 FWQI comparison of EFIC and SPIHT compressed“Zelda” image (512×512, 8bpp) at 0.015625bpp,0.0625bpp and 0.25bpp.

WAVELET-BASED FOVEATED IMAGE QUALITY...

Documents

Transcript of WAVELET-BASED FOVEATED IMAGE QUALITY...