
Color Correction and Compression for Multi-view Video using H.264 Features

Boxin Shi, Yangxi Li, Lin Liu, and Chao Xu

Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, 100871, China

{shiboxin,liyangxi,liulin,xuchao}@cis.pku.edu.cn

Abstract. Multi-view video is a new video application that requires efficient coding algorithms to compress its huge amount of data, while color variations among different viewpoints deteriorate its visual quality. This paper deals with both problems simultaneously, focusing on color. Using the spatial block-matching information from multi-view video coding, color-annotated images are produced from gray input in one view and color input in another view. Color images are then rendered by a colorization process. By discarding most of the chrominance information before encoding and restoring it when decoding, this novel scheme can greatly improve the compression rate without much loss in visual quality, while at the same time producing color images similar in appearance to the reference view without explicit correction.

1 Introduction

With the improvement of technologies in image processing and computer vision, “second-generation image coding” [1] has raised great interest because of its higher potential coding efficiency and its closer relationship with perceptual quality. Different from traditional “transform + entropy coding” compression schemes that rely on statistical redundancy, second-generation image coding focuses on visual redundancy by utilizing features within images and videos. One good example of such a system can be found in [2]: coding efficiency was improved by intentionally removing some parts of an image, transferring them in a compressed manner, and finally restoring the whole image at the decoder side. This idea can be summarized as “encoder removes whereas decoder restores” [3], which motivates us to incorporate this next-generation compression idea into the next-generation video application: Multi-view Video Coding (MVC).

Multi-view video, captured by a set of synchronized cameras at different positions, can provide viewers with a more realistic experience. Several research groups have proposed multi-view video systems such as free viewpoint television (FTV) [4] and 3DTV [5]. Although multi-view video systems own many advantages over the current mono-view video systems, some problems have restricted the widespread use of this technology. First, since the multi-view video sequences are captured by many cameras at the same time, there are


2 ACCV2009, Paper ID: 181

huge amounts of data that must be captured, processed and transferred efficiently. Second, although the set of cameras is adjusted to the same configuration as precisely as possible, it is still difficult to avoid chrominance discrepancies among different viewpoints due to scene illumination, camera calibration and shutter speed. In many cases, color correction has to be used as pre-processing to deal with this problem [6].

Before we introduce our improvements to the two problems above, we need to define the reference view and the target view according to their different functions in our scheme. The reference view is one designated camera view that is regarded as having correct color and used as a reference by other views during coding; the target views are all the other views, which are encoded with reference to the reference view. A special point of our scheme is that the reference view is a color sequence while the target views are all gray sequences. Given these basic definitions, we now summarize the solutions. The first problem is alleviated because we combine the “encoder removes whereas decoder restores” idea with the existing MVC scheme and exploit color redundancy by discarding all the chrominance information in target views before encoding. The color is then restored through our proposed color annotation and colorization strategy when decoding. Thus, coding efficiency is further improved because of the discarded chrominance. Since the color is restored to be as similar to the reference as possible, the color similarity between different views is also guaranteed without color correction, which addresses the second problem.

In order to implement such a scheme, two critical questions have to be answered in this paper: 1) How can the codec produce side information that provides sufficient cues for colorization? 2) How should the colorization method use this side information to reconstruct the color image without deteriorating the visual quality? In the following, Section 2 describes our complete framework; the first question is explained in detail in Section 3 and the second in Section 4. Section 5 covers implementation issues and experiment results. Finally, Section 6 concludes the paper.

2 Framework of Proposed Scheme

The complete framework of our scheme is shown in Fig. 1. As illustrated in the legend in the upper-left corner, blocks in different gray levels are used to distinguish different views.

The state-of-the-art MVC scheme is based on H.264/AVC (H.264 for short in the following), and coding efficiency is improved by exploiting not only the temporal motion redundancy between subsequent frames but also the spatial motion redundancy between neighboring viewpoints. This solution can be seen as an extension of traditional motion compensation to different viewpoints. Implementations of MVC schemes with spatial prediction are introduced in [7] and [8]. In order to emphasize the color redundancy, we use a simplified prediction structure similar to [7] and [8], as shown in the upper-right corner of Fig. 1. The reference view in our system not only serves as


Fig. 1. The framework of our color correction and compression scheme.

the temporal/spatial reference for motion prediction, but also as the spatial reference for color restoration. Meanwhile, in our scheme, Motion Vectors (MVs) not only take responsibility for motion compensation for coding purposes, but also serve as side information for colorization. The corresponding blocks from the reference view are used to indicate the chrominance values for target views according to their MVs into the reference view; this process is called color annotation. Finally, the colorization technique renders the color-annotated frame, covered with partial color, into one with a complete color appearance.

3 Color Annotation Using H.264 Features

The emerging technology of adding color to gray images makes it possible to exploit color redundancy in video coding. In [9] the author succeeded in integrating color transfer [10] into the coding pipeline and improved coding efficiency, but that work targeted mono-view video coding, and until recently little research had been done for MVC. On the other hand, current research has not considered utilizing the latest color restoration techniques or the advanced features in codecs such as H.264. By exploring these achievements, we find that the H.264 codec can provide a lot of useful information to enhance the color annotation process, which helps restore better color quality.

3.1 Scribble-based Colorization

The concept of colorization is often used in computer graphics to refer to adding color to a gray image. According to [11], existing colorization methods are roughly


divided into two categories: example-based and scribble-based. Example-based methods require the user to provide an example image as a reference for color tones. For example, in [12] the author proposed a pixel-based approach that colorizes an image by matching swatches between the grayscale target and a color reference image. Scribble-based colorization depends on scribbles placed by users onto the target image and performs color expansion to produce the color image. Levin et al. [13] proposed a simple but effective colorization algorithm with convincing results. As long as the color was properly annotated, their method rendered a delicate color picture without requiring image segmentation.

In our application for MVC, we neither want to render a color picture from an artist’s point of view as in graphics applications, nor can we provide an example image for reference. However, we have spatial reference color frames and MV-based color annotation as prerequisites for automatic scribble generation. We can use the prediction blocks from the coding pipeline to play the role of scribbles that annotate color. The correctness of this idea rests on a simple intuition: the matching blocks from reference frames indicate the correct color values for target frames, since space-time neighboring frames are closely similar. Fortunately, the motion prediction technology in H.264 is powerful enough at block matching, which ensures the accuracy of the blocks used for color annotation.

3.2 Color Annotation from H.264 based MVC

As the latest video coding standard, H.264 offers many notable features, as described by Wiegand et al. [14]. Our task is to combine these features with our colorization scheme, because we do not want to introduce extra operations for automatic color scribble annotation. Color annotation mainly occurs during the prediction procedure, and among the various features of H.264, two contribute significantly to color annotation.

1) Directional spatial prediction for intra coding: This technique codes a block by extrapolating within the current picture, without reference to other frames. We do not exploit this technique; rather, we want to avoid it in the chrominance channels, because we only have gray target sequences and only want to search for corresponding blocks in reference views according to luminance similarity. The codec decides how many blocks are inter/intra coded under rate-distortion optimization. The intra coding result on gray blocks is still gray, so we can only obtain chrominance values from inter coding between reference and target views; that is, in the chrominance channels we assign each matching block the chrominance values pointed to by its MV into the reference. After this chrominance annotation, some intra blocks are left without color, and these blocks are exactly the targets of the subsequent colorization. One example of color annotation output from inter-coded blocks can be seen in Fig. 2(a); all the gray areas are left by intra blocks.
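As an illustration, the MV-based chrominance copy described above can be sketched as follows. This is a simplified sketch in Python/NumPy; the block list and its fields are hypothetical stand-ins for what a modified decoder could export, not the authors' actual data structures:

```python
import numpy as np

def annotate_chroma(target_y, ref_u, ref_v, blocks):
    """Color-annotate a gray target frame from a color reference view.

    blocks: list of (x, y, w, h, mvx, mvy, is_inter) tuples, one per
    prediction block, as hypothetically exported from the decoder.
    Inter blocks copy U/V from the MV-matched reference block;
    intra blocks stay at 0 (uncolored) for the colorization stage.
    """
    h, w = target_y.shape
    tgt_u = np.zeros((h, w), dtype=ref_u.dtype)
    tgt_v = np.zeros((h, w), dtype=ref_v.dtype)
    for (x, y, bw, bh, mvx, mvy, is_inter) in blocks:
        if not is_inter:
            continue  # intra block: stays gray, colorization fills it later
        sx, sy = x + mvx, y + mvy  # MV points into the reference frame
        tgt_u[y:y + bh, x:x + bw] = ref_u[sy:sy + bh, sx:sx + bw]
        tgt_v[y:y + bh, x:x + bw] = ref_v[sy:sy + bh, sx:sx + bw]
    return tgt_u, tgt_v
```

The uncolored (zero) regions returned here correspond to the gray areas visible in Fig. 2(a).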

2) Variable block-size motion compensation with small block sizes: H.264 supports more flexibility in block sizes, including 7 types from 4×4 to 16×16. Larger blocks save bits and computation in consistent textures, while smaller ones describe details accurately. Thus, the color scribbles



Fig. 2. Color annotation result: (a) variable-size block; (b) fixed-size block. (Please refer to the electronic version for color figures.)

can also be flexible according to luminance consistency, which provides more precise color annotation than using fixed-size blocks. Fig. 2 shows a comparison of color scribbles depicted using the 7 variable block sizes versus the 16×16 fixed size only. We can see that in (b) the yellow skirt cannot be depicted accurately using fixed-size blocks only.
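For intuition, the encoder's variable-block-size choice can be mimicked with a toy mode decision. The variance threshold below is a hypothetical heuristic standing in for H.264's rate-distortion optimized partitioning, which also goes further down to 4×4:

```python
import numpy as np

def choose_partition(mb_y, var_thresh=100.0):
    """Toy stand-in for the encoder's mode decision: keep a 16x16
    macroblock whole when its luminance is consistent, otherwise split
    it into four 8x8 sub-blocks. Returns (x, y, w, h) partitions."""
    if np.var(mb_y) < var_thresh:
        return [(0, 0, 16, 16)]          # flat texture: one large block
    return [(x, y, 8, 8) for y in (0, 8) for x in (0, 8)]  # detail: split
```

A flat macroblock yields one large scribble, while a detailed one (such as the yellow skirt in Fig. 2) yields several small scribbles that follow the luminance structure.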

4 Optimization based Colorization

The scribble-based colorization method in [13], based on optimization, is very simple yet performs excellently. Although newer research [15] provides more impressive results by considering edge detection or texture consistency, those methods introduce complicated operations beyond our needs. As shown in Fig. 2, our scribbles are square blocks covering the majority of the image, which makes our colorization task much easier than modern graphics problems. In our annotated image, we have matching color blocks assigned by MVs as known color areas, and in uncolored areas we have luminance intensity to guide the color expansion. This makes our application satisfy the assumption in [13]: neighboring pixels in space-time that have similar intensities should have similar colors. Therefore, we colorize the annotated images with the optimization method, according to each pixel’s luminance similarity to its neighbors.

Colorization is controlled by minimizing the difference between a pixel and its surrounding neighbors in the U and V channels of the YUV color space. During color annotation we assign color values according to MVs only in the U and V channels, so the pixels whose U and V values are zero in the target images are the ones that must be colorized according to their Y values. Taking the U channel as an example, we minimize the difference cost between a pixel U(x, y) and the weighted average of the colors in its neighboring area N (e.g. a 3×3 window), where m and n are the width and height of the picture:

$$\mathrm{cost}(U) = \sum_{x=1}^{m} \sum_{y=1}^{n} \Big( U(x, y) - \sum_{N} w_N \, U(x + \Delta x,\ y + \Delta y) \Big)^2 \qquad (1)$$


The weight factor w_N is computed from the squared difference of the luminance values in the Y channel, and σ_N is the standard deviation of the luminance around pixel (x, y) in the neighboring area N:

$$w_N = \exp\!\left( - \frac{\big( Y(x, y) - Y(x + \Delta x,\ y + \Delta y) \big)^2}{2 \sigma_N^2} \right) \qquad (2)$$

The optimization problem above can be solved by a standard method such as least squares, because the constraints are linear and the cost function is quadratic. After colorizing both the U and V channels, we obtain a complete color picture.
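The reduction to a linear system can be sketched as follows. Each unknown pixel is constrained to the weighted average of its 3×3 neighbors per Eqs. (1)-(2), while annotated pixels are fixed; a dense solver then recovers the channel. This is our own illustrative sketch for small frames, not the authors' implementation (a real codec would use a sparse solver):

```python
import numpy as np

def colorize_channel(Y, C, annotated):
    """Colorize one chrominance channel in the spirit of Eqs. (1)-(2).

    Y: luminance plane; C: channel holding the annotated values;
    annotated: boolean mask of pixels fixed by MV-based annotation.
    """
    h, w = Y.shape
    n = h * w
    idx = lambda x, y: y * w + x
    A = np.zeros((n, n))
    b = np.zeros(n)
    for y in range(h):
        for x in range(w):
            i = idx(x, y)
            A[i, i] = 1.0
            if annotated[y, x]:
                b[i] = C[y, x]          # value fixed by color annotation
                continue
            nbrs = [(x + dx, y + dy)
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dx or dy) and 0 <= x + dx < w and 0 <= y + dy < h]
            dif = np.array([Y[y, x] - Y[ny, nx] for nx, ny in nbrs])
            s2 = max(dif.var(), 1e-6)   # sigma_N^2 over the window
            wts = np.exp(-dif ** 2 / (2.0 * s2))
            wts /= wts.sum()            # normalize weights to sum to one
            for (nx, ny), wt in zip(nbrs, wts):
                A[i, idx(nx, ny)] = -wt  # row encodes U(x,y) - sum w_N U(nbr) = 0
    return np.linalg.solve(A, b).reshape(h, w)
```

Running this once for U and once for V yields the complete color picture described above.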

5 Experiment Results

5.1 Implementation Issues

Both the MVC prediction structure and the color annotation of our scheme are built on the H.264 framework, but we introduce several new features for our application. We design our new codec by revising the H.264 reference software JM v10.2 [16]: 1) In H.264, the Decoded Picture Buffer (DPB) stores all the reference frames required to encode the current frame for motion estimation in the temporal direction. For MVC, however, the spatial reference frame must also be taken into consideration when designing the DPB. Thus, we modify the original JM DPB for spatial prediction, as shown in Fig. 1. 2) To produce the color-annotated frames, the MVs of inter-frame coding between reference and target views have to be utilized. This requires a modification of the decoder to generate the color-annotated frames: when the decoder performs spatial motion compensation using luminance MVs, we attach to every MV the corresponding chrominance blocks from the reference frames.

We use two different MVC sequences to test our scheme. The first, flamenco2, is published by KDDI [17]. It is captured by 5 cross-arranged cameras with 20 cm spacing, and it is a non-rectified sequence with severe color variation among viewpoints. The second, rena, is published by Nagoya University [18]. This sequence also requires color correction, and it is captured by 100 cameras with 5 cm spacing in a 1D-parallel arrangement. Both sequences have a frame rate of 30 Hz and a resolution of 640×480.

In our experiments, flamenco2 is cropped to 320×240, and we extract the first 100 frames from each sequence. To present the results simply but without losing generality, we choose two neighboring views in each sequence, one as the reference and the other as the target view. For flamenco2 we use viewpoint 0 as reference and viewpoint 1 as target; for rena the reference and target views are viewpoints 51 and 52, respectively. We test visual quality and coding bit rate under 4 different Quantization Parameter (QP) values: 22, 27, 32 and 37.



Fig. 3. Visual quality comparison: (a1)-(a5): the first frame from flamenco2: (a1) original reference viewpoint, (a2) original target viewpoint, (a3) color correction result from histogram matching, (a4) color annotation result from the proposed method, (a5) colorization result from the proposed method; (b1)-(b5) are the corresponding results for the 40th frame of flamenco2; (c1)-(c5) for the first frame of rena; (d1)-(d5) for the 40th frame of rena. (Please refer to the electronic version for color figures.)

5.2 Experiment Results for Color Correction

Since our color sequence is generated from color-annotated frames that refer to the reference view, and we assume the matching blocks from the reference indicate the correct color values, the colorized results can also be regarded as color-corrected results. For the visual quality comparison, we show the first and 40th frames of each sequence with the original picture, the color-corrected result and our colorized output. We use the histogram matching method of [6] as the control group for the color correction comparison. For fairness, the time-constant mapping function of [6] is not considered, because we do not introduce any time constraint in our method. The visual quality comparison is shown in Fig. 3.
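For reference, the baseline can be sketched as classic per-channel histogram matching. This is a generic reimplementation of the idea, not Fecker et al.'s exact prefilter, which includes refinements we omit here:

```python
import numpy as np

def histogram_match(source, reference):
    """Remap source pixel values so their cumulative histogram matches
    the reference channel's (applied per color channel in practice)."""
    s_vals, s_idx, s_cnt = np.unique(source.ravel(),
                                     return_inverse=True, return_counts=True)
    r_vals, r_cnt = np.unique(reference.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_cnt).astype(np.float64) / source.size
    r_cdf = np.cumsum(r_cnt).astype(np.float64) / reference.size
    # For each source quantile, pick the reference value at that quantile
    mapped = np.interp(s_cdf, r_cdf, r_vals)
    return mapped[s_idx].reshape(source.shape)
```

Applying this to each channel of a target frame, with the reference view's frame as `reference`, produces the kind of corrected output shown in Fig. 3(a3).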

The goal of color correction is to make the color appearance of the target view as close to the reference view as possible. Taking (a1) and (a2) in Fig. 3 as an example, there is a severe red color cast in (a2), and we want to make it as blue as (a1). The histogram matching output in (a3) solves the majority of


the problem, but some red residue can still be observed. In (a5), colorized from the color annotation in (a4), we restore the reference color appearance on the target view with better similarity. The same benefits of the proposed method can also be observed in the other three groups of results. Some blocking artifacts can be found in our colorized results. This is because our method operates on the decoded output, and the block-based color annotation may introduce inconsistent boundaries.

5.3 Experiment Results for Color Compression

The chrominance information in the target views is restored by color annotation and colorization of the gray sequence when decoding; that means during the encoding and transfer stages only the luminance sequence is processed in the target views, while the reference view is processed with the common MVC method. Because the color sequences used in our test have severe color variations and the final output aims at correcting them towards the reference, computing the PSNR of the reconstructed image against the original image is not reasonable: obviously, we cannot judge that a reconstructed image with more precisely corrected color has lower objective quality merely because of its corrected color difference from the original. This differs from the PSNR evaluation in color correction methods like [6]. Those methods use color correction as pre-processing before encoding, so PSNR can be computed on the corrected sequence, while our method is post-processing and the colorized sequence does not exist until decoding is finished. Therefore, we only give the bit rate comparison under different QP values in Table 1.

The bit rate saving of the proposed method goes a step beyond color-correction-based pre-processing. In the flamenco2 test, we lower the bit rate by about 20% on average compared with the others. All the saved bits come from the chrominance coefficients. The detailed inter-frame bit distribution can be read from Table 2, including the bit costs for mode, motion, luminance coefficients (Coeffs. Y) and chrominance coefficients (Coeffs. C). As for the rena results, the bit rate saving may seem too large to be plausible, but it is indeed the case. The first reason is that rena’s small camera spacing benefits our spatial-prediction-based color annotation. The second is that our scheme does not rely on motion redundancy: the inter-frame motion in rena is not violent, so traditional motion-based coding cannot exert its full power. The third is that this sequence is noisy in chrominance, and the poor image quality hampers transform-based coding. Color compression does not have these limitations; on the contrary, we can further improve coding efficiency by saving more bits on chrominance coefficients. From Table 2 we can conclude that the ability of color correction to reduce chrominance coefficients is very limited, whereas compression through color redundancy maximizes this ability, since the bit cost of the chrominance coefficients tends to zero.

However, our scheme also has several limitations: 1) The tradeoff between lower bit rate and better visual quality should be weighed according to the application, because the colorized result may sometimes not perform well in visual


quality; 2) Our color annotation is based on spatial prediction, and the spatial matching deteriorates greatly when severe occlusions or large variations exist between neighboring viewpoints. But when compressing multi-view sequences with small camera spacing such as rena, our scheme can bring significant improvement in coding efficiency.

Table 1. Bit rate comparison (kbps). Method 1 = No correction; Method 2 = Histogram matching; Method 3 = Proposed method

Sequence    Method   QP=22     QP=27    QP=32    QP=37
flamenco2   1        1530.76   822.59   433.46   229.66
            2        1513.69   803.67   410.19   209.97
            3        1229.50   703.51   375.39   193.68
rena        1        1427.09   549.55   268.81   153.71
            2        1840.32   611.96   262.55   140.30
            3         673.62   306.99   158.41    94.44

Table 2. Inter frame bit rate distribution (bits/frame, QP=32). Method 1 = No correction; Method 2 = Histogram matching; Method 3 = Proposed method

Sequence    Method   Mode      Motion    Coeffs. Y   Coeffs. C
flamenco2   1        1122.80   5173.13   5522.36     1693.86
            2        1131.83   5168.40   5221.51     1224.24
            3        1114.83   5156.43   5542.56        6.89
rena        1        1744.13   2776.73    539.22     3004.55
            2        1808.17   2661.33    522.14     2760.91
            3        1684.58   2557.30    616.69       22.16

6 Conclusion

This paper introduces a new coding-plus-color-correction scheme for multi-view video that exploits color redundancy. Advanced features of the H.264 codec are utilized for automatic color annotation, and then an optimization-based colorization is performed to render the color picture. Different from motion-based coding, we focus on color, and the final output of our method benefits both the color correction and color compression problems, producing more consistent color appearance and saving more bits for MVC.

In future work, we will try to integrate colorization and the MVC pipeline more closely with more sophisticated MVC prediction structures. Furthermore, better


colorization methods should be studied to prevent the blocking artifacts and improve the colorized visual quality.

Acknowledgement

This work was supported in part by the China 973 Research Program under Grant 2009CB320900.

References

1. Reid, M., Millar, R., Black, N.: Second-generation image coding: An overview. ACM Computing Surveys, vol. 29, pp. 3–29 (1997)

2. Liu, D., et al.: Image compression with edge-based inpainting. IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, pp. 1273–1287 (2007)

3. Rane, S., Sapiro, G., Bertalmio, M.: Structure and texture filling-in of missing image blocks in wireless transmission and compression applications. IEEE Transactions on Image Processing, vol. 12, pp. 296–303 (2003)

4. Tanimoto, M.: Free viewpoint television—FTV. In: Picture Coding Symposium (PCS) (2004)

5. Matusik, W., Pfister, H.: 3DTV: A scalable system for real-time acquisition, transmission, and autostereoscopic display of dynamic scenes. In: ACM SIGGRAPH, pp. 814–824 (2004)

6. Fecker, U., Barkowsky, M., Kaup, A.: Histogram-based prefiltering for luminance and chrominance compensation of multiview video. IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, pp. 1258–1267 (2008)

7. Kimata, H., et al.: Multi-view video coding using reference picture selection for free-viewpoint video communication. In: Picture Coding Symposium (PCS) (2004)

8. Mueller, K., et al.: Multi-view video coding based on H.264/MPEG4-AVC using hierarchical B pictures. In: Picture Coding Symposium (PCS) (2006)

9. Kumar, R., Mitra, S.: Motion estimation based color transfer and its application to color video compression. Pattern Analysis and Applications, vol. 11, pp. 131–139 (2008)

10. Reinhard, E., et al.: Color transfer between images. IEEE Computer Graphics and Applications, vol. 21, pp. 34–41 (2001)

11. Liu, X., et al.: Intrinsic colorization. In: ACM SIGGRAPH Asia, pp. 151:1–152:9 (2008)

12. Welsh, T., et al.: Transferring color to grayscale images. In: ACM SIGGRAPH, pp. 277–280 (2002)

13. Levin, A., Lischinski, D., Weiss, Y.: Colorization using optimization. In: ACM SIGGRAPH, pp. 689–694 (2004)

14. Wiegand, T., et al.: Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, pp. 560–576 (2003)

15. Qu, Y., Wong, T., Heng, P.: Manga colorization. In: ACM SIGGRAPH, pp. 1214–1220 (2006)

16. H.264/AVC JM reference software, http://iphome.hhi.de/suehring/tml

17. Flamenco2 sequence download, ftp://ftp.ne.jp/kddi/multiview

18. Rena sequence download, http://www.tanimoto.nuee.nagoya-u.ac.jp