
Contents lists available at SciVerse ScienceDirect

Signal Processing: Image Communication

journal homepage: www.elsevier.com/locate/image

Signal Processing: Image Communication 28 (2013) 581–596

Mode dependent down-sampling and interpolation scheme for high efficiency video coding

Qingbo Wu, Hongliang Li *

School of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

* Corresponding author. Tel.: +86 28 61830586; fax: +86 28 61830064.
E-mail addresses: [email protected] (Q. Wu), hlli@uestc.edu.cn (H. Li).

0923-5965/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.image.2013.03.003

Article info

Article history:
Received 1 June 2012
Received in revised form 20 November 2012
Accepted 17 March 2013
Available online 27 March 2013

Keywords:
Video coding
High definition
H.264/AVC
Down-sampling

Abstract

In this paper, a mode dependent down-sampling and interpolation scheme is proposed to improve the coding efficiency of the intra prediction module. In the proposed method, we design the down-sampling structures and interpolation schemes for each directional intra prediction mode by minimizing the spatial prediction distance. The sampled pixels are predicted with a traditional directional intra prediction scheme, and the non-sampled pixels are predicted by interpolating their neighboring reconstructed sampled pixels. Finally, the residuals of both the sampled and non-sampled pixels are encoded. Experimental results show that the proposed method achieves an average 7.52% bitrate reduction relative to the KTA reference software. Since the down-sampling structure and interpolation method depend only on the intra mode, no additional overhead is introduced at the encoder.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

In the past decades, video coding technologies have greatly promoted the development of industries related to digital multimedia content. Many of these services, such as TV broadcasting, network video, DVD, and Free-viewpoint TV (FTV) [1], have deeply changed our lives. International video coding standards play an important role in promoting these advanced technologies. Recently, the state-of-the-art video coding standard H.264/AVC [2–4] has been widely used and has achieved remarkable success. To satisfy the rapidly increasing demand for high-definition (HD) and ultra-HD (UHD) video content, more efficient video coding is required.

Recently, the next generation video coding standard, named High Efficiency Video Coding (HEVC), has been stepping up in development. To better adapt to HD content, many novel coding tools are adopted in HEVC. Meanwhile, the main hybrid coding framework and some classical schemes of H.264/AVC are retained. For the intra-frame prediction module, up to 36 intra modes are available in the latest HEVC draft [5]. The novel angular intra prediction (AIP) still inherits the main idea of directional intra prediction (DIP) in H.264/AVC. Similarly, spatial correlation and structural regularity are also used to design the just-noticeable difference model [6].

Some inherent flaws in DIP have been reported that limit the intra prediction performance. As investigated in [7,8], two issues negatively affect the performance of DIP. Firstly, for HD content, smaller partition blocks are preferred due to complex texture, which requires more bits to signal the intra mode information for each block. Secondly, as the distance between the reference pixel and the pixel to be predicted increases, the prediction quality can degrade significantly for the high texture, high detail content in HD video. Two kinds of strategies are widely discussed in the literature to solve these problems. The first strategy focuses on reducing spatial redundancy by refining the conventional intra prediction structure. Choi et al. [7] modified the Intra16×16 vertical and horizontal modes into a line-by-line structure. In [8], a bi-MB level horizontal spatial prediction was proposed. Bi-directional intra prediction (BIP) was proposed in [9], where 9 extra bi-directional intra modes were introduced. These methods improve the prediction accuracy by introducing more correlated reference pixels or more elaborate modes. This inspires us to design a flexible interpolation scheme for each directional intra mode.

The second strategy tries to reduce the spatial redundancy with the down-sampling based coding (DBC) scheme [10–17]. In [11], an efficient super resolution technique is proposed to reconstruct the down-sampled image by exploiting the inter-resolution correlation and inter-frame correlation. In [12], wavelet transform based down-sampling is adaptively executed according to the smoothness of the local region. In [14–17], different sampling rates are explored to adapt to various local characteristics. A resample-based intra prediction method is further proposed in [17]. In particular, Wu et al. [13] proposed an interesting adaptive down-sampling scheme with directional prefiltering. The DBC schemes achieve good performance at low bit-rates. However, it is hard to extend the DBC methods to the medium or high bit-rate applications that are widespread for HD content. In addition, the uniform down-sampling (UDS) scheme is unsuitable for high detail regions. This leads us to consider introducing appropriate down-sampling structures for different local characteristics.

In this paper, we propose a novel mode dependent down-sampling and interpolation (MDDI) scheme which can replace the conventional intra prediction structure in H.264/AVC. In the proposed MDDI scheme, we divide the pixels in the current block into two subsets, the down-sampled and the non-sampled ones. For the down-sampled pixels, we employ the DIP scheme to compute the prediction value. Then, directional interpolation is used to predict the non-sampled pixels. An adaptive down-sampling structure is designed according to the intra mode directions. To guarantee the reconstruction quality, the residuals of both the sampled and non-sampled pixels are transmitted to the decoder. Since the down-sampling and interpolation scheme depends only on the intra mode, no additional syntax modification is needed. Experimental results show that a significant RD performance improvement can be achieved compared to both the DIP and UDS schemes.

The remainder of this paper is organized as follows. We investigate the prediction distance characteristics of the down-sampled image in Section 2, followed by a detailed description of the proposed MDDI scheme in Section 3. Experimental results are shown in Section 4. Finally, we draw the conclusion of this paper in Section 5.

2. Analysis of prediction distance characteristics for down-sampling scheme

2.1. Analysis of intra prediction accuracy

The main task of the intra prediction module is to remove the spatial redundancy between the correlated pixels in the current block. Firstly, we investigate the quantitative relation between the intra prediction error and the pixels' correlation. The prediction error between the $i$th pixel and its $j$th reference sample is measured with the square error $SE_{i,j}$, and their Pearson correlation coefficient is denoted by $\rho_{i,j}$. It should be noted that if there are multiple reference samples for the $i$th pixel, we take their weighted average value as the $j$th reference sample. The weights are determined by the intra prediction mode. To make the investigation cover all directional intra prediction modes and more spatial distances, we collect each intra mode's $SE_{i,j}$ and $\rho_{i,j}$ values for all intra 8×8 blocks. The statistical results are shown in Fig. 1.

Four CIF sequences with different local features are involved in this investigation. The sequence foreman has a simple foreground, and its background contains rich structural information. The foreground and background in mother–daughter are both simple. The sequence FOOTBALL contains a complex foreground, and stefan has a complex background. In each sequence, a similar linear relationship can always be found between $SE_{i,j}$ and $\ln(\rho_{i,j}^{-1})$ for all directional intra modes. This is consistent with our intuitive understanding that a higher correlation in terms of $\rho_{i,j}$ brings a lower prediction error in terms of $SE_{i,j}$. Without loss of generality, we can formulate the $k$th intra block's sum of prediction error $D(k)$ as

$$D(k) = \sum_{i=1}^{N} SE_{i,j}(k) = \sum_{i=1}^{N} \left[ a \cdot \ln\left(\rho_{i,j}^{-1}(k)\right) + b \right] \tag{1}$$

where $a$ and $b$ are linear fitting parameters for each sequence.

Based on the discussion in [18], we know that the autocorrelation coefficient of a 1-D stationary Markov process can be represented as a 1-D distance power of the correlation coefficient of successive random variables. Then, we can further rewrite (1) as

$$D(k) = \sum_{i=1}^{N} SE_{i,j}(k) = \sum_{i=1}^{N} \left[ a \cdot \ln\left(\rho^{-S(i)}\right) + b \right] = -a \cdot (\ln \rho) \cdot \sum_{i=1}^{N} S(i) + N \cdot b \tag{2}$$

where $\rho$, with $|\rho| < 1$, is a correlation coefficient parameter, and $S(i)$ is the $i$th pixel's 2-D prediction distance model, which is discussed in the following subsection.

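From the observation in Fig. 1, we know that $a > 0$ and $\ln \rho < 0$, so the coefficient $-a \cdot \ln \rho$ of the distance sum in (2) is positive. Then, the problem of minimizing the sum of intra prediction error $D(k)$ is equivalent to minimizing the sum of prediction distances in the current block.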
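The equivalence between the last two forms of (2) can be checked numerically. The following Python sketch is only an illustration: the function names and the values of $\rho$, $a$ and $b$ are hypothetical and are not taken from the measurements in Fig. 1.

```python
import math

def error_from_correlation(rho, distances, a, b):
    """Middle form of Eq. (2): sum over pixels of a * ln(rho ** -S(i)) + b."""
    return sum(a * math.log(rho ** (-s)) + b for s in distances)

def error_from_distance_sum(rho, distances, a, b):
    """Last form of Eq. (2): -a * ln(rho) * sum of S(i) + N * b."""
    return -a * math.log(rho) * sum(distances) + len(distances) * b

# Hypothetical parameters with a > 0 and 0 < rho < 1, so that ln(rho) < 0.
rho, a, b = 0.9, 2.0, 0.5
near = [1.0, 1.0, 2.0, 2.0]  # small prediction distances
far = [3.0, 3.0, 4.0, 4.0]   # larger prediction distances

d_near = error_from_distance_sum(rho, near, a, b)
d_far = error_from_distance_sum(rho, far, a, b)
print(d_near < d_far)  # True: a smaller distance sum gives a smaller D(k)
```

Because $a$ and $b$ are fixed per sequence, the only term that a prediction structure can influence in this sketch is the distance sum.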

2.2. Prediction distance model

Fig. 1. Relationship between intra prediction error and pixels' correlation. (a) foreman, (b) mother–daughter, (c) FOOTBALL, (d) stefan.

Fig. 2. Directional intra prediction modes.

In H.264/AVC, 8 directional intra prediction modes are designed to remove the spatial redundancy, as shown in Fig. 2. In this process, it is commonly assumed that the intra prediction residual grows larger as the spatial distance increases. Based on this hypothesis, the intra mode determination in DIP is equivalent to minimizing the sum of prediction distance (SPD) between the reference pixels and the pixels under prediction in the current block, which can be modeled as follows:

$$\tilde{m} = \arg\min_{m \in M} \sum_{i=1}^{N} S_m(i) \tag{3}$$

where $M$ is the set of all directional intra modes (excluding the DC mode), $m$ is the candidate mode, $N$ is the number of pixels under prediction in the current block, and $S_m(i)$ represents the prediction distance of the $i$th pixel under intra prediction mode $m$.

To jointly estimate the effect of the number of reference samples and the dominant direction for each intra mode, we define the prediction distance $S_m(i)$ as

$$S_m(i) = \frac{1}{L} \sum_{j=1}^{L} l_{i,j} \left( \sin \theta^m_{i,j} + \varepsilon \right) \tag{4}$$

where $L$ is the number of reference pixels, $l_{i,j}$ is the Manhattan distance between the $i$th pixel to be predicted and its $j$th reference sample, and $\theta^m_{i,j}$ represents the angle displacement between the dominant direction assigned by the optimal intra mode $\tilde{m}$ and the prediction direction corresponding to the candidate mode $m$. The prediction direction is determined by the $i$th pixel to be predicted and its $j$th reference sample assigned by intra mode $m$. $\varepsilon$ is a constant that avoids a zero value for $S_m(i)$; here we set it to 1.

For clarity, the distance estimation procedure in a 4×4 block is illustrated in Fig. 3, where the dominant direction of the current block is marked by the solid line, the dashed lines between pixel $i$ and pixels $j-1 \sim j+1$ point to the prediction directions, the white circles represent the pixels to be predicted, and the grey circles represent adjacent reconstructed pixels. The angle displacements between the dominant direction and the prediction directions are shown as $\theta^m_{i,j-1} \sim \theta^m_{i,j+1}$. In this discussion, the optimal intra mode $\tilde{m}$ is assumed to be intra mode 0 as shown in Fig. 2 and the candidate mode $m$ is intra mode 3, so three reference samples $j-1 \sim j+1$ are used according to the description in [2]. Thus, we can set the parameters in (4) as $L = 3$, $\theta^m_{i,j-1} = \arctan\frac{1}{2}$, $\theta^m_{i,j} = \pi/4$ and $\theta^m_{i,j+1} = \arctan\frac{3}{2}$. The Manhattan distances between the pixel to be predicted and the reference pixels are $l_{i,j-1} = 3$, $l_{i,j} = 4$ and $l_{i,j+1} = 5$. Finally, with $\varepsilon = 1$, the prediction distance for the $i$th pixel in Fig. 3 is obtained as $S_m(i) = 6.8$.

Fig. 3. Prediction distance estimation. The solid line points to the dominant direction and the dashed lines point to the prediction directions.

To verify that the proposed prediction distance model efficiently reflects the correlation of two neighboring pixels, we carry out an investigation on several sequences. Without loss of generality, we collect the prediction distances and correlation coefficients for all intra 8×8 blocks with intra mode 0. In this investigation, the eight pixels above the current block are used as the reference samples. The prediction distances between all of the reference samples and the pixels in the current block are computed according to (4), where the dominant direction is set to the vertical direction. The direction between the reference sample and the current pixel is used as the prediction direction. As shown in Fig. 4, for all of the sequences, an obvious linear relationship can be found between the proposed prediction distance model and the logarithm of the correlation coefficients. That is, the proposed prediction distance model is valid for measuring the intra prediction error in (2).

Fig. 4. Relationship between the prediction distance and the pixels' correlation coefficient. (a) foreman, (b) mother–daughter, (c) FOOTBALL, (d) stefan. [Axes of each panel: $S_m(i)$ and $\ln(\rho_{i,j}^{-1})$.]
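As a numerical check on the distance model, the Fig. 3 example can be reproduced with a few lines of Python. This is a minimal sketch of (4) under the stated setting $\varepsilon = 1$; the function name `prediction_distance` and its argument layout are illustrative choices, not part of the paper.

```python
import math

def prediction_distance(manhattan, angles, eps=1.0):
    """Eq. (4): mean over the L references of l_ij * (sin(theta_ij) + eps)."""
    if not manhattan or len(manhattan) != len(angles):
        raise ValueError("need one angle per reference sample")
    total = sum(l * (math.sin(t) + eps) for l, t in zip(manhattan, angles))
    return total / len(manhattan)

# Parameters of the Fig. 3 example: three references at Manhattan
# distances 3, 4, 5 with angle displacements arctan(1/2), pi/4, arctan(3/2).
fig3 = prediction_distance(
    manhattan=[3, 4, 5],
    angles=[math.atan(1 / 2), math.pi / 4, math.atan(3 / 2)],
)
print(round(fig3, 1))  # 6.8
```

The unrounded value is about 6.78, which matches the 6.8 reported in the text.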

2.3. Analysis of the down-sampling based prediction distance

Since the proposed prediction distance model is based on the general assumption that the nearest samples are most likely to be highly correlated with the samples in the current block [19], we first discuss its validity in the down-sampling based intra prediction scheme. Without loss of generality, we discuss a 1-D condition where the sampled pixels are reconstructed in successive steps, as shown in Fig. 5. The circles represent different pixels, and their relative positions are labeled by the symbols “a–c”.

For the direct prediction scheme, the pixel “a” is the available reference sample to predict the pixels “b–c”. As discussed in [19], the nearest samples usually produce the best prediction. Then, we can represent the relationship of the intensity values in Fig. 5 as

$$|\hat{I}_a - I_c|^2 > |I_b - I_c|^2 \tag{5}$$

where $\hat{I}_a$ represents the reconstruction value of “a”, and $I_b$, $I_c$ represent the original intensity values of “b–c”. Let us denote the difference of the two prediction residuals in (5) as

$$\Delta_{res} = |\hat{I}_a - I_c|^2 - |I_b - I_c|^2 = \hat{I}_a^2 - I_b^2 - 2\hat{I}_a I_c + 2 I_b I_c \tag{6}$$

In the down-sampling based prediction scheme, the sampled pixel “b” is first predicted from “a”. Then, the pixel “c” is predicted from the reconstruction value of “b”. The residual of “c” can be represented as

$$|\hat{I}_b - I_c|^2 = |I_b - I_c + \varepsilon|^2 = (I_b - I_c)^2 + \varepsilon^2 + 2\varepsilon(I_b - I_c) \tag{7}$$

where $\hat{I}_b$ represents the reconstruction value of “b” and $\varepsilon$ is the quantization error of $I_b$.

Since the relationship of $\hat{I}_a$, $I_b$ and $I_c$ has been given in (5), we can represent the difference between the prediction residuals of $I_c$ relative to $\hat{I}_a$ and $\hat{I}_b$ as a function of $\varepsilon$, that is

$$f(\varepsilon) = |\hat{I}_a - I_c|^2 - |\hat{I}_b - I_c|^2 = -\varepsilon^2 - 2\varepsilon(I_b - I_c) + \Delta_{res} \tag{8}$$

To guarantee that the nearest sample produces a smaller prediction residual in the down-sampling based prediction scheme, we only need the residual difference function to satisfy $f(\varepsilon) > 0$. Accordingly, we further analyze the roots of the quadratic equation $f(\varepsilon) = 0$, which are $\varepsilon_1 = \hat{I}_a - I_b$ and $\varepsilon_2 = 2I_c - (\hat{I}_a + I_b)$.

Fig. 5. Stepwise reconstruction and prediction for 1D condition.

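From the inequality (5), we can further deduce that

$$\varepsilon_1 \varepsilon_2 = (\hat{I}_a - I_b)\left[2I_c - (\hat{I}_a + I_b)\right] < 0 \tag{9}$$

For convenience, we set $\varepsilon_1 < 0$ in this discussion. Since $f(\varepsilon)$ is a quadratic function with a negative leading coefficient, the inequality $f(\varepsilon) > 0$ always holds as long as the quantization error satisfies $\varepsilon \in (\varepsilon_1, \varepsilon_2)$.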
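The roots and the sign of $f(\varepsilon)$ can be verified on a small example. The sketch below assumes that the reconstruction of “b” equals its original value plus $\varepsilon$, as in (7); the intensity values and function names are hypothetical and were chosen only so that inequality (5) holds.

```python
def residual_gain(eps, rec_a, orig_b, orig_c):
    """f(eps) of Eq. (8): |rec_a - c|^2 - |(b + eps) - c|^2."""
    return (rec_a - orig_c) ** 2 - (orig_b + eps - orig_c) ** 2

def gain_roots(rec_a, orig_b, orig_c):
    """Roots of f(eps) = 0: eps1 = rec_a - b and eps2 = 2c - (rec_a + b)."""
    return rec_a - orig_b, 2 * orig_c - (rec_a + orig_b)

# Hypothetical intensities satisfying (5): |100 - 112|^2 > |106 - 112|^2.
rec_a, orig_b, orig_c = 100, 106, 112
eps1, eps2 = gain_roots(rec_a, orig_b, orig_c)
print(eps1, eps2)                                # -6 18
print(residual_gain(0, rec_a, orig_b, orig_c))   # 108, positive inside (eps1, eps2)
print(residual_gain(20, rec_a, orig_b, orig_c))  # -52, negative outside the interval
```

In this example the two roots have opposite signs, as (9) requires, and the gain is positive only for quantization errors between them.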

As discussed in [20], the quantization error $\varepsilon$ of uniform scalar quantization can be represented as

$$\varepsilon = \left\lfloor \frac{u}{Q} \right\rfloor \cdot Q - u + \frac{Q}{2} \tag{10}$$

where $\lfloor \cdot \rfloor$ is the floor operator, $Q$ is the quantization step and $u$ is the input of the quantizer. Since $\lfloor u/Q \rfloor \cdot Q - u$ lies in $(-Q, 0]$, the quantization error is bounded as $\varepsilon \in (-Q/2, Q/2]$. Consequently, if the absolute values of $\varepsilon_1$ and $\varepsilon_2$ are both greater than $Q/2$, $f(\varepsilon)$ is always greater than 0.
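The bound on the quantization error can be illustrated with a short Python sketch of (10). The step size and the sweep of inputs are arbitrary illustrative choices; the point is only that the magnitude of the error never exceeds $Q/2$.

```python
import math

def quantization_error(u, q):
    """Eq. (10): floor(u / q) * q - u + q / 2 for a uniform scalar quantizer."""
    return math.floor(u / q) * q - u + q / 2

# Hypothetical step size and a sweep of inputs from -40.0 to 40.0.
q = 8.0
errors = [quantization_error(k / 10.0, q) for k in range(-400, 401)]
print(max(abs(e) for e in errors) <= q / 2)  # True
```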

In particular, since $\varepsilon_1$ represents a prediction residual that is smaller than 0, we use $E(I^-_{res})$ to approximate it in the following investigation, where $E(\cdot)$ is the expectation operator and $I^-_{res}$ represents the negative intra prediction residuals. Similarly, $\varepsilon_2$ is approximated as $2E(I^+_{res})$, since $\varepsilon_2$ satisfies $\varepsilon_2 < 2(I_c - \hat{I}_a)$. The relationship between $E(|I^-_{res}|)$ and $Q/2$ is investigated through their difference, denoted by $\Delta^-$. The difference between $2E(|I^+_{res}|)$ and $Q/2$ is denoted by $\Delta^+$. The two difference metrics are defined as

$$\Delta^- = E\left(|I^-_{res}|\right) - \frac{Q}{2}, \qquad \Delta^+ = 2E\left(|I^+_{res}|\right) - \frac{Q}{2} \tag{11}$$

For the H.264/AVC encoder, the quantization step $Q$ [21] is further represented as a function of the quantization parameter QP as

$$Q = 2^{(QP-12)/6} \tag{12}$$

Since we are mainly concerned with the intra prediction accuracy in larger blocks, all the 8×8 blocks are considered in the following investigation. The common quantization parameter setting $QP \in [22, 37]$ is employed here. The statistical results of four sequences with different features are shown in Fig. 6.
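The two difference metrics in (11) can be computed from a list of signed residuals as sketched below. The quantization step follows (12) exactly as printed in the text; the residual values are hypothetical and the function names are illustrative.

```python
def quant_step(qp):
    """Quantization step as printed in Eq. (12): Q = 2 ** ((QP - 12) / 6)."""
    return 2.0 ** ((qp - 12) / 6)

def difference_metrics(residuals, qp):
    """Return (delta_minus, delta_plus) of Eq. (11) for signed residuals."""
    negative = [abs(r) for r in residuals if r < 0]
    positive = [abs(r) for r in residuals if r > 0]
    if not negative or not positive:
        raise ValueError("need at least one negative and one positive residual")
    half_q = quant_step(qp) / 2
    delta_minus = sum(negative) / len(negative) - half_q
    delta_plus = 2 * sum(positive) / len(positive) - half_q
    return delta_minus, delta_plus

# Hypothetical residuals, not measured data.
sample_residuals = [-6, -4, 5, 7, -8, 3]
print(difference_metrics(sample_residuals, qp=24))  # (4.0, 8.0)
```

Positive values of both metrics indicate that the quantization error range stays inside the interval bounded by the approximated $\varepsilon_1$ and $\varepsilon_2$.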

It can be seen that both $\Delta^-$ and $\Delta^+$ are greater than 0 most of the time. That is, the quantization error $\varepsilon$ lies between $\varepsilon_1$ and $\varepsilon_2$. So, we can conclude that the nearest samples are still highly correlated with the pixels in the current block for the down-sampling based intra prediction scheme.

Based on the proposed prediction distance model, we can analyze the impact of the down-sampling method on the intra prediction procedure. The widely used 1/4 uniform down-sampling (1/2 sampling rate in both the vertical and horizontal directions) is discussed in the following.

Fig. 6. Comparison for the value ranges of intra prediction residuals and quantization error. (a) Comparison for the lower bound. (b) Comparison for the upper bound. [Panels plot $\Delta^-$ and $\Delta^+$ against QP for foreman, mother–daughter, FOOTBALL and stefan.]

As shown in Fig. 7, the down-sampled pixels in a 4×4 block are labeled with different shapes in accordance with the reconstruction order: the square ones are predicted and reconstructed first, followed by the triangle and hexagon ones, and the circle ones are reconstructed last. The numbers placed on top of the 4×4 block indicate the horizontal coordinate in the current block, and the ones on the left indicate the vertical coordinate.

Fig. 7. Uniform down-sampling structure in 4×4 block. The different shapes correspond to different reconstruction orders.

For clarity, a more detailed description of this process is shown in Fig. 8, where the dotted shapes represent the non-sampled pixels, the grey circles represent the neighboring reconstructed pixels, and the dashed lines point from the references to the pixels to be predicted. To verify the superiority of the down-sampling method, the SPD of the optimal intra mode $\tilde{m}$ is compared to the SPD of the down-sampling method. For each square sample in Fig. 8(a), the predicted value is still generated by the DIP method. With the same assumption as in Fig. 3, intra mode 0 is used as $\tilde{m}$, as shown in Fig. 8(a), where the square samples are first predicted from the reconstructed pixels in neighboring blocks. Then the triangle samples are predicted from the interpolation of two reconstructed pixels in the vertical direction. Since the square samples have been reconstructed earlier in the coding order, they turn grey in Fig. 8(b). A similar process can be directly extended to the hexagon and white circle samples in Fig. 8(c) and (d), where the interpolation direction is adjusted along the dashed lines. For the calculation of the prediction distance, the same procedure as in Section 2.2 is directly used in Fig. 8(a), where the DIP prediction value is employed. For each triangle sample in Fig. 8(b), two references in the vertical direction are available, so the parameters in (4) can be set as $L = 2$, $\theta^m_{i,j} = 0$, $\theta^m_{i,j+1} = 0$, and the two Manhattan distances are $l_{i,j} = 1$ and $l_{i,j+1} = 1$. Then the prediction distance for the $i$th triangle sample in Fig. 8(b) is obtained as $S_m(i) = 1$. In Fig. 8(c), although the number of reference samples and the Manhattan distances are the same, the angle displacements differ significantly from those in Fig. 8(b), that is, $\theta^m_{i,j} = \pi/2$ and $\theta^m_{i,j+1} = \pi/2$. So the prediction distance in Fig. 8(c) is $S_m(i) = 2$. In Fig. 8(d), three reconstructed pixels are used to predict the white circle sample, and the parameters in (4) are adjusted to $L = 3$, $\theta^m_{i,j-1} = \pi/2$, $\theta^m_{i,j} = \pi/4$ and $\theta^m_{i,j+1} = 0$. The Manhattan distances become $l_{i,j-1} = 1$, $l_{i,j} = 2$ and $l_{i,j+1} = 1$. Finally, the prediction distance for each white circle sample is obtained as $S_m(i) = 2.1$.

It is obvious that for the samples in Fig. 8(b)–(d), the prediction distances are the same for samples with the same shape. From the derivation above, we can conclude that the SPD of the current block with the down-sampling method is equal to 32.4. In contrast, for the conventional DIP method, the SPD is 40 under intra mode 0. That is, the SPD with the down-sampling method is significantly smaller than that of the original DIP method. It is therefore reasonable to expect that the down-sampling based prediction method can improve the intra prediction accuracy.
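The two SPD totals can be reproduced from (4) with $\varepsilon = 1$. The sketch below makes one assumption that is inferred rather than stated in the text: the four square samples sit on the second and fourth rows, two per row, so that each triangle lies between two vertically adjacent reconstructed pixels. The circle distance is rounded to 2.1 as in the text.

```python
import math

def prediction_distance(manhattan, angles, eps=1.0):
    """Eq. (4): mean over the references of l_ij * (sin(theta_ij) + eps)."""
    total = sum(l * (math.sin(t) + eps) for l, t in zip(manhattan, angles))
    return total / len(manhattan)

# Conventional DIP, mode 0: a pixel in row r (1..4) of the 4x4 block copies
# the reference pixel r rows above it, with zero angle displacement.
spd_dip = sum(prediction_distance([r], [0.0]) for r in range(1, 5) for _ in range(4))

# Uniform down-sampling, per shape as in Fig. 8(a)-(d).
square_rows = [2, 4]  # assumption: two squares on each of these rows
squares = sum(prediction_distance([r], [0.0]) for r in square_rows for _ in range(2))
triangle = prediction_distance([1, 1], [0.0, 0.0])
hexagon = prediction_distance([1, 1], [math.pi / 2, math.pi / 2])
circle = prediction_distance([1, 2, 1], [math.pi / 2, math.pi / 4, 0.0])

spd_uds = squares + 4 * (triangle + hexagon + round(circle, 1))
print(spd_dip, round(spd_uds, 1))  # 40.0 32.4
```

Without rounding the circle distance (about 2.14), the total is about 32.55, which still lies well below the DIP value of 40.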

Although the uniform down-sampling structure has been shown to be advantageous in the above discussion, its inflexible design also limits further improvement of the intra prediction precision. Due to the fixed reconstruction order and interpolation direction, the angle displacement parameter $\theta^m_{i,j}$ in (4) cannot adapt to the different dominant directions assigned by the various intra modes. So, an appropriate down-sampling and interpolation structure should be designed for each directional intra mode.

3. Proposed algorithm

As discussed in Section 2, it is important to make full use of the direction information assigned by the intra mode to optimize the down-sampling structure. Since the coding performance is evaluated by the rate-distortion (RD) measurement, an RD performance analysis is presented first, followed by a detailed description of the proposed MDDI scheme.

Fig. 8. Prediction and reconstruction structure for down-sampled pixels in different coding order. (a) First order. (b) Second order. (c) Third order. (d) Fourth order.

3.1. Problem formulation

In intra prediction coding, the RD measurement is employed to determine the optimal intra mode as follows:

$$J_m = D_m + \lambda \times (R^c_m + R^h_m) \tag{13}$$

where $J_m$ represents the RD cost under candidate intra mode $m$ in the current block, $D_m$ indicates its reconstruction distortion, $\lambda$ is the Lagrangian multiplier related to the quantization parameter (QP), and the sum of $R^c_m$ and $R^h_m$ represents the total bit rate cost. $R^c_m$ is the overhead for the transform coefficients and $R^h_m$ covers the cost of signaling the selected intra mode in the macroblock (MB) header.

For the down-sampling based intra prediction, there are two available schemes to signal the down-sampling structure. The first scheme adds the down-sampling methods as extra modes in the loop of optimal intra mode determination. However, additional overhead and coding passes are then needed to select a new optimal intra mode. The second scheme simply modifies the existing directional intra modes according to a specific down-sampling structure, so that no change in codec syntax or coding loop is required. In view of compatibility and complexity control, the latter scheme is employed in our proposed algorithm.

In keeping with the principle in (13), we can formulate the RD cost of down-sampling based intra prediction coding as

$$J_m = \lambda \times R^h_m + \sum_{k=1}^{K} \left[ D_m(k) + \lambda \times R^c_m(k) \right] \tag{14}$$

where $k$ represents the order of the down-sampled sub-block and $K$ is the number of all sub-blocks. In our proposed method, since the MB header is identical for the same intra prediction block size, the term $R^h_m$ is placed outside the summation.

To guarantee that the down-sampling based intra prediction method is superior to the DIP method, we only need the following condition to hold:

$$\sum_{k=1}^{K} \left[ D_m(k) + \lambda \times R^c_m(k) \right] \le D_m + \lambda \times R^c_m \tag{15}$$

As discussed in Section 2, the down-sampling based intra prediction method can improve the prediction accuracy, which reduces $R^c_m(k)$ in (14) and (15). In the ideal case, if the prediction residuals in the current block are all zero, the optimal distortion $D_m(k)$ is obtained, and the inequality in (15) is certainly satisfied. Otherwise, $D_m(k)$ could become larger, since more non-zero residuals are eliminated by quantization. The detailed effect of this condition is discussed in Section 4.
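The relation between (13), (14) and (15) can be made concrete with a small sketch. All distortion and rate figures below are hypothetical and serve only to show how the header term cancels when the two costs are compared; the function names are illustrative.

```python
def rd_cost_dip(distortion, coeff_bits, header_bits, lam):
    """Eq. (13): J_m = D_m + lambda * (R_c + R_h)."""
    return distortion + lam * (coeff_bits + header_bits)

def rd_cost_downsampled(sub_blocks, header_bits, lam):
    """Eq. (14): header counted once, plus D_m(k) + lambda * R_c(k) per sub-block."""
    return lam * header_bits + sum(d + lam * r for d, r in sub_blocks)

def satisfies_condition_15(sub_blocks, distortion, coeff_bits, lam):
    """Eq. (15): summed sub-block cost must not exceed the DIP cost without header."""
    return sum(d + lam * r for d, r in sub_blocks) <= distortion + lam * coeff_bits

# Hypothetical numbers: (distortion, coefficient bits) for K = 4 sub-blocks.
lam, header_bits = 10, 4
sub_blocks = [(90, 12), (100, 14), (95, 13), (105, 15)]
j_dip = rd_cost_dip(distortion=400, coeff_bits=60, header_bits=header_bits, lam=lam)
j_ds = rd_cost_downsampled(sub_blocks, header_bits, lam)
print(j_dip, j_ds)                                      # 1040 970
print(satisfies_condition_15(sub_blocks, 400, 60, lam))  # True
```

Since the header cost is the same on both sides, condition (15) holding is equivalent to the down-sampled cost in (14) not exceeding the DIP cost in (13).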

In addition, for blocks with different sizes, the total RD cost of the whole MB needs to be considered as

$$J_{BS} = \lambda \times R^H_{BS} + \sum_{u=1}^{U} \left[ D_{BS}(u) + \lambda \times R^C_{BS}(u) \right] \tag{16}$$

where $J_{BS}$ represents the RD cost of the whole MB with the selected intra prediction block size (BS), and $R^H_{BS}$ indicates the total bit overhead for the MB header, which includes the selected BS, the intra prediction modes and so on. $U$ represents the number of intra prediction blocks in the MB, which is determined by BS. $D_{BS}(u)$ and $R^C_{BS}(u)$ represent the reconstruction distortion and the transform coefficient overhead of each intra prediction block, respectively.

Apparently, the larger the selected BS, the smaller the resulting $R^H_{BS}$. That is, as long as the prediction accuracy can be improved efficiently for large intra prediction blocks, a smaller $J_{BS}$ can be obtained. Therefore, a better RD performance can be achieved for the whole MB.

In particular, another constraint on the sampling rate must be considered for compatibility with the subsequent residual transform and entropy coding. In H.264/AVC, a separable 2-D DCT is used for the residual block transform. We denote the transform size by $X_T \times X_T$ and the number of down-sampled pixels by $X_d$. To guarantee that the existing transform size is compatible with the sampled sub-block, we need to ensure that $X_d$ is divisible by $X_T \times X_T$. Then, for each specific intra mode $m$, the problem of optimizing the down-sampling structure can be expressed by rewriting (3) as

$$\tilde{W} = \arg\min_{W} \sum_{i=1}^{N} S^{w_i}_m(i) \quad \text{s.t.} \quad X_d = \|W\|_1, \quad X_d \le N, \quad X_d \,\%\, (X_T \times X_T) = 0 \tag{17}$$

where $w_i$ is the indicator that labels whether the $i$th pixel is down-sampled or not, and the symbol "%" represents the modulus operator. If the pixel is down-sampled, we set $w_i = 1$; otherwise, $w_i = 0$. $W = \{w_1, w_2, \cdots, w_N\}$ represents an available down-sampling structure. Since only "0" and "1" are valid in $W$, $X_d$ can be expressed as the $l_1$ norm of $W$, which is equal to the number of "1" entries in $W$. The solution of (17) will be discussed in the following sub-section.
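The constraints in (17) can be checked mechanically for any candidate structure. The sketch below is only an illustration under assumed names (`is_feasible`, `checkerboard`, `thirty`); the two masks are hypothetical examples and are not the structures designed in this paper.

```python
def is_feasible(w, transform_size):
    """Check the constraints of (17) for a list w of 0/1 indicators."""
    if any(v not in (0, 1) for v in w):
        raise ValueError("w may only contain 0 and 1")
    n = len(w)                   # N, the number of pixels in the block
    x_d = sum(w)                 # l1 norm of a 0/1 vector: the number of ones
    unit = transform_size ** 2   # X_T x X_T coefficients per transform block
    # Note: the all-zero mask passes formally; (17) itself does not exclude it.
    return x_d <= n and x_d % unit == 0


# Hypothetical 8x8 checkerboard mask (32 sampled pixels) in raster-scan order.
checkerboard = [(r + c) % 2 for r in range(8) for c in range(8)]
# Hypothetical mask with 30 sampled pixels, which cannot fill 4x4 transforms.
thirty = [1] * 30 + [0] * 34
print(is_feasible(checkerboard, 4), is_feasible(thirty, 4))
```

The first mask satisfies the divisibility constraint (32 is a multiple of 16), while the second does not.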

Then, substituting (4) into (3), the optimization problem of selecting the optimal intra mode based on the down-sampling method can be rewritten as

$$\tilde{m} = \arg\min_{m \in M} \frac{1}{L} \sum_{i=1}^{N} \sum_{j=1}^{L} l_{i,j} \left( \sin\theta^{m}_{i,j} + \varepsilon \right) \qquad (18)$$

where $L$, $l_{i,j}$ and $\theta_{i,j}$ are the modified parameters in (4) corresponding to the optimized down-sampling structure.
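A literal transcription of the objective in (18) may help to read the double sum. The following sketch assumes that, for each mode, the distances $l_{i,j}$ and angles $\theta_{i,j}$ are already available as nested lists (one row per pixel $i$, one entry per index $j$); the helper names, the mode labels and the value of `eps` are hypothetical.

```python
import math


def mode_cost(lengths, angles, eps=1e-3):
    """(1/L) * sum_i sum_j l_ij * (sin(theta_ij) + eps) for one candidate mode."""
    big_l = len(lengths[0])  # L, the number of terms per pixel
    total = 0.0
    for l_row, a_row in zip(lengths, angles):
        for l, a in zip(l_row, a_row):
            total += l * (math.sin(a) + eps)
    return total / big_l


def select_mode(candidates, eps=1e-3):
    """Return the key of the candidate with the smallest cost, as in (18)."""
    return min(candidates, key=lambda m: mode_cost(*candidates[m], eps=eps))


# Two hypothetical candidates with equal distances but different angles.
candidates = {
    "aligned": ([[1.0, 2.0], [1.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]]),
    "skewed": ([[1.0, 2.0], [1.0, 2.0]],
               [[math.pi / 4] * 2, [math.pi / 4] * 2]),
}
best = select_mode(candidates)
print(best)
```

In this toy case the candidate whose angle displacement is zero is selected, since only the small `eps` term contributes to its cost.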

3.2. Mode dependent down-sampling and interpolation scheme

The framework of our proposed MDDI scheme is illustrated in Fig. 9, where the forward and inverse transform/quantization are modified when the MDDI based intra prediction is enabled.

In some previous works [7,8], the 16×16 block is usually preferred for implementing the optimization algorithm. However, only three directional intra modes are available for this intra block type, which is insufficient to make full use of the directional information. In our proposed MDDI scheme, the 8×8 block is selected, for which up to 8 directional intra modes are available and which is widely used in both image and video coding.

As discussed in Section 3.1, we can obtain the optimal down-sampling structure for each intra mode by solving the problem in (17). Since the down-sampled pixels are still predicted by the DIP method, the parameters in (17) are all definite values. However, the non-sampled pixels are predicted by interpolating the reconstructed samples at some sampling locations. Although many outstanding directional interpolation methods [22,23] have been proposed recently, the adaptive down-sampling structure makes it difficult to employ these interpolation methods directly. In fact, since a unique down-sampling structure corresponds to each intra mode, the available reference samples vary from one mode to another. So, an elaborate interpolation scheme should be designed here to adapt to the different distributions of reference samples.

We define the interpolation principles for non-sampled pixels as follows:

When the nearest two reconstructed samples in the dominant direction are available, the interpolation operation is formulated as

$$I^{m}_{p} = (I^{m}_{0} + I^{m}_{1} + 1) \gg 1 \qquad (19)$$

When only one reconstructed sample is available in the dominant direction and the other two nearest samples have different angle displacements, the interpolation operation is formulated as

$$I^{m}_{p} = (3 I^{m}_{0} + I_{1} + 2) \gg 2 \qquad (20)$$

When no reconstructed sample is available in the dominant direction, or only one is available and the other two samples have the same displacement, the interpolation is formulated as

$$I^{m}_{p} = (2 I_{0} + I_{1} + I_{2} + 2) \gg 2 \qquad (21)$$

where $I^{m}_{p}$ represents the prediction value of the current non-sampled pixel under intra mode $m$, $I^{m}_{0}$ represents the reference sample which is ahead of the current non-sampled pixel in the dominant direction, and $I^{m}_{1}$ represents the one after the current pixel in the dominant direction. If the reference samples are not in the dominant direction, we denote them by $I_0 \sim I_2$ according to their distance, where $I_0$ is the one with the smallest spatial distance and angle displacement relative to the current pixel. For clarity, we further illustrate the interpolation process for different distributions of reference samples in Fig. 10. Fig. 10(a) shows the condition in (19), where two reference samples are available in the dominant direction. Fig. 10(b) shows the condition in (20), where only one reference sample is available in the dominant direction and the other samples have different prediction distances. Figs. 10(c) and (d) show the two conditions in (21). In Fig. 10(c), only one sample in the dominant direction is available but the other samples have the same prediction distances. In Fig. 10(d), no sample in the dominant direction is available.
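The three rules in (19)–(21) are integer operations with rounding offsets, and can be sketched as follows. The function names are hypothetical, and the sample values in the usage lines are arbitrary 8-bit intensities chosen for illustration.

```python
def interp_two_dominant(i0_m: int, i1_m: int) -> int:
    """(19): rounded average of the two samples in the dominant direction."""
    return (i0_m + i1_m + 1) >> 1


def interp_one_dominant(i0_m: int, i1: int) -> int:
    """(20): weight 3/4 on the dominant-direction sample, 1/4 on the other."""
    return (3 * i0_m + i1 + 2) >> 2


def interp_off_direction(i0: int, i1: int, i2: int) -> int:
    """(21): weight 1/2 on the nearest sample, 1/4 on each of the other two."""
    return (2 * i0 + i1 + i2 + 2) >> 2


print(interp_two_dominant(100, 103))        # (204) >> 1
print(interp_one_dominant(100, 120))        # (422) >> 2
print(interp_off_direction(100, 104, 108))  # (414) >> 2
```

In each rule the weights sum to a power of two, so the normalization reduces to a right shift and the added constant (1 or 2) sets the rounding offset.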

From the prediction distance definition in (4), we know that the spatial distance plays a major role in determining the prediction distance. In some special cases, such as highly textured areas with large gradient changes, the effect of the prediction angle displacement increases significantly. That is, priority should be given to avoiding the large distortion induced by deviation of the interpolation direction. Accordingly, a high sampling rate is needed to reduce the interpolation prediction in other directions which deviate from the dominant direction.


Fig. 9. Implementation of MDDI in H.264/AVC framework.

Fig. 10. Interpolation principles for different reference samples distribution. The circles with solid edges represent the available reference samples, and the circles with dashed edges represent the non-sampled pixels to be predicted. The solid lines with arrows point to the dominant directions and the dashed lines represent the interpolation from other directions.


Since we implement the MDDI scheme on the 8×8 block, the sampled sub-block should be transformed and quantized in 4×4 units. Then, we can set the parameters $N = 64$ and $X_T = 4$ in (17). Finally, we select the number of samples as $X_d = 32$, so that the sampling rate is 1/2 for each 8×8 block. Fewer pixels can be sampled by setting $X_d$ to a smaller value. However, sampling fewer pixels increases the prediction distance and thus reduces the interpolation accuracy. To trade off the number of non-sampled pixels against the interpolation accuracy, we employ the sampling ratio 1/2 in designing our mode dependent down-sampling structure.
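For the parameters chosen above, the divisibility constraint of (17) leaves only a few admissible sample counts. The short sketch below enumerates them; the function name `feasible_sample_counts` is hypothetical.

```python
def feasible_sample_counts(n: int, transform_size: int) -> list:
    """Positive values of X_d with X_d <= N and X_d % (X_T * X_T) == 0."""
    unit = transform_size ** 2
    return list(range(unit, n + 1, unit))


counts = feasible_sample_counts(n=64, transform_size=4)
rates = [x_d / 64 for x_d in counts]
print(counts)
print(rates)
```

For $N = 64$ and $X_T = 4$ the admissible counts are 16, 32, 48 and 64, i.e. sampling rates of 1/4, 1/2, 3/4 and 1; the value $X_d = 32$ used in this paper is the second of these.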

We further add a constraint that sets the sampling rate in the dominant direction to be greater than or equal to 1/2, except when only one pixel is available in the dominant direction. In this way, as many non-sampled pixels as possible have reference samples in the dominant direction. The optimized down-sampling structure for each directional intra prediction mode is shown in Fig. 11. The grey circles represent the sampled pixels predicted by DIP, the white ones represent the non-sampled pixels predicted by the proposed interpolation, and the solid lines indicate the interpolation direction and the adjacency between reference samples. The thin dotted lines surrounding the circles represent the block boundary. The thick dashed lines across each block center separate the pixels that are assigned to different 4×4 units. As shown in Fig. 11, the proposed down-sampling structures are elaborately designed for each directional intra mode of the H.264/AVC encoder. Although only eight down-sampling patterns are shown here, the design can easily be extended to more sampling structures by introducing more directional prediction modes.

In each block, all sampled pixels above the dividing line are rearranged into a regular 4×4 block in raster scan order for the subsequent transform and quantization. The same process is then applied to the sampled pixels below the dividing line. Once all sampled pixels are reconstructed, the non-sampled pixels can be predicted by interpolating the reconstructed samples, and their residuals undergo the same transform and quantization operations as those of the sampled pixels. For clarity, a detailed description of this process for the vertical intra mode is illustrated in Fig. 12, where the numbers under the circles indicate the pixels' positions in the original 8×8 block. A similar procedure can be extended to the other intra modes.
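The rearrangement step can be sketched as below. This is only an illustration under two assumptions that are not taken from Fig. 11 or Fig. 12: the dividing line is taken to be the horizontal line through the block center, and the mask that samples every second row is a hypothetical pattern. The function name `pack_sampled` is also hypothetical.

```python
def pack_sampled(block, mask):
    """Pack the sampled pixels of an 8x8 block into two 4x4 blocks.

    Pixels are collected in raster-scan order, first from rows 0-3 (above the
    assumed dividing line) and then from rows 4-7 (below it).
    """
    halves = []
    for rows in (range(0, 4), range(4, 8)):
        samples = [block[r][c] for r in rows for c in range(8) if mask[r][c]]
        if len(samples) != 16:
            raise ValueError("each half must contain exactly 16 sampled pixels")
        halves.append([samples[i:i + 4] for i in range(0, 16, 4)])
    return halves


# Each entry holds its own raster position in the original 8x8 block.
block = [[8 * r + c for c in range(8)] for r in range(8)]
# Hypothetical mask: rows 1, 3, 5 and 7 are sampled (32 pixels in total).
mask = [[r % 2] * 8 for r in range(8)]
top, bottom = pack_sampled(block, mask)
print(top)
print(bottom)
```

Because the block entries store their original positions, the printed 4×4 blocks show directly which pixel of the 8×8 block ends up at each position of the transform input.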

4. Experimental results

To verify the performance of our proposed down-sampling based intra prediction algorithm, the MDDI scheme is implemented on the VCEG KTA2.4r1 reference software [24] under the common test conditions of VCEG [25]. In this intra-only experiment, we employ the H.264/AVC High Profile, where CABAC entropy coding is adopted and the high complexity RDO process is enabled.


Fig. 11. Optimized down-sampling structure for each directional intra prediction mode. (a) Vertical. (b) Horizontal. (c) Down left. (d) Down right. (e) Vertical right. (f) Horizontal down. (g) Vertical left. (h) Horizontal up.

Fig. 12. The procedure of MDDI scheme for vertical intra mode.


The quantization parameters are set to QP = {22, 27, 32, 37}. The first 100 frames of all HEVC test sequences in [26] are encoded in our experiment. The H.264/AVC High Profile with conventional DIP ("Anchor") is utilized as the benchmark. To make a comprehensive evaluation of the proposed MDDI scheme, three different intra prediction methods are involved in our experiment for comparison: the UDS, BIP and BIP+MDDT (Mode Dependent Directional Transform) schemes. The BIP and MDDT methods are both proposed in [9], but they focus on different coding modules: BIP is related to prediction and MDDT works on the transform. So, we separately compare these two schemes with our MDDI scheme in the following simulations. It should be noted that only the 8×8 block is modified and the DC mode remains unchanged for both the UDS and MDDI schemes. As metrics to evaluate coding efficiency, BD-BR (Bjontegaard Delta Bit-Rate) and BD-PSNR (Bjontegaard Delta PSNR) [27] are used here. In addition, to further compare the computational complexity between the proposed scheme and the anchor, both a qualitative analysis and a quantitative comparison in terms of time consumption are made in this section. The percentage difference of coding time (ΔT%) is defined as

$$\Delta T = \frac{T_{pro} - T_{anc}}{T_{anc}} \times 100 \qquad (22)$$

where $T_{pro}$ and $T_{anc}$ denote the coding time of the proposed scheme and the anchor, respectively.
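The definition in (22) corresponds to the following one-line computation. The function name `delta_t_percent` and the timing values are hypothetical and serve only as an example.

```python
def delta_t_percent(t_pro: float, t_anc: float) -> float:
    """Relative coding-time difference of (22), in percent."""
    if t_anc <= 0:
        raise ValueError("anchor coding time must be positive")
    return (t_pro - t_anc) / t_anc * 100.0


# Hypothetical timings in seconds: anchor 100 s, proposed scheme 120 s.
print(delta_t_percent(t_pro=120.0, t_anc=100.0))
```

A positive value means that the proposed scheme takes longer to encode than the anchor.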

4.1. Intra prediction block size distribution

To explore the characteristics of the MBs for which MDDI is preferred, we investigate the intra prediction block size distribution in this subsection. Four sequences with different video contents are tested in this experiment. As shown in Table 1, for the sequences with complex scenes, the finer 4×4 block is preferred in the DIP method, whereas in the MDDI scheme the modified 8×8 block accounts for the majority of all available block sizes. Especially for the sequences BQSquare and PartyScene, more I8MB are selected in accordance with RD performance, where the largest difference in I8MB ratio exceeds 60 percentage points (PartyScene at QP = 22). To highlight these results, the larger I8MB ratio is labeled in bold in Table 1. However, for the sequence vidyo4, which has a simple indoor scene, the I8MB ratio even drops in the MDDI scheme. In fact, this is consistent with the principle of our MDDI scheme, which is mainly developed to adapt to high detail and high texture conditions.

In addition, another interesting characteristic can be found in the sequences BQSquare and PartyScene. For the DIP method, the I4MB ratio tends to decrease as QP increases, while the I8MB ratio tends to increase with QP. However, the opposite result arises in the MDDI scheme. Generally, for a smaller QP value, more non-zero coefficients are retained after quantization, so the prediction accuracy plays a more important role in bit saving. For small QP values, more I4MB are selected in the DIP method, since I4MB usually produces much more precise prediction than I8MB. In our MDDI scheme, the improved I8MB can achieve prediction accuracy comparable to that of I4MB, so the advantage of fewer bits for the MB header is more obvious when QP is small.

Table 1. Intra prediction block size distribution (%) for different methods.

| Sequence | QP | DIP I4MB | DIP I8MB | DIP I16MB | MDDI I4MB | MDDI I8MB | MDDI I16MB |
|---|---|---|---|---|---|---|---|
| BQTerrace (Class B) | 37 | 27.06 | 45.25 | 27.7 | 26.7 | 40.74 | 32.56 |
| | 32 | 37.82 | 42.28 | 19.9 | 31.85 | 41.74 | 26.41 |
| | 27 | 53 | 33.6 | 13.39 | 38.62 | 45.91 | 15.48 |
| | 22 | 32.62 | 56.78 | 10.6 | 28.93 | 61.76 | 9.3 |
| PartyScene (Class C) | 37 | 53.14 | 33.65 | 13.21 | 41.73 | 48.78 | 9.49 |
| | 32 | 72.63 | 23.53 | 3.85 | 44.68 | 49.1 | 6.22 |
| | 27 | 82.18 | 16.09 | 1.73 | 35.77 | 61.09 | 3.14 |
| | 22 | 88.72 | 10.77 | 0.51 | 24.49 | 73.72 | 1.79 |
| BQSquare (Class D) | 37 | 65.9 | 16.92 | 17.18 | 41.54 | 35.13 | 23.33 |
| | 32 | 66.41 | 20.51 | 13.08 | 29.74 | 56.41 | 13.85 |
| | 27 | 72.05 | 24.36 | 3.59 | 25.9 | 58.46 | 15.64 |
| | 22 | 74.87 | 17.44 | 7.69 | 18.21 | 68.46 | 13.33 |
| vidyo4 (Class E) | 37 | 5.03 | 41.06 | 53.92 | 8.28 | 23.78 | 67.94 |
| | 32 | 7.67 | 56.67 | 35.67 | 11.86 | 36 | 52.14 |
| | 27 | 12.83 | 50.86 | 36.31 | 20.06 | 40.39 | 39.56 |
| | 22 | 17 | 64.31 | 18.69 | 29.47 | 47.03 | 23.5 |

4.2. Coding performance

Some selected rate-PSNR curves are shown in Fig. 13. The detailed coding gains for all sequences are presented in Table 2. It can be found that a superior RD performance can be achieved with the proposed MDDI scheme. Except for the two sequences Kimono1 and vidyo3, the proposed MDDI scheme outperforms the anchor significantly. Up to 26.97% bitrate savings have been achieved for the sequence BQSquare. On average, 7.52% bitrate savings are achieved over the sequences of all resolutions from Class A to Class E. In addition, from Fig. 13, we can find that our proposed method works well over a wide range of bit-rates for sequences with different features. The sequence BasketballPass has medium details. The sequence vidyo4 has simple content in an indoor scene. Both the sequences BQMall and BQTerrace have complex contents in an outdoor scene. For the sequence BasketballPass shown in Fig. 13(a), the bit-rates of the MDDI scheme range from 900 kbps to 4900 kbps, and the coding gain does not decrease at the high bit-rates. In particular, as shown in Fig. 13(b), for high detail video contents like BQMall, whose bitrate can be up to 28 000 kbps, our MDDI scheme still achieves consistent coding gains. Since the residuals of both the down-sampled and the non-sampled pixels are encoded in our proposed method, we efficiently avoid the bit-rate limit which exists in most coding schemes that combine down-sampling and super-resolution.

To make a complete comparison between the MDDI scheme and the other schemes, some subjective results for the sequence BQSquare are further shown in Fig. 14. It can be seen that our proposed MDDI scheme shows a consistent superiority in this perceptual quality comparison. Compared with the BIP and BIP+MDDT schemes, the chair leg at the top is clearer and sharper in the MDDI scheme.


Fig. 13. Rate-PSNR curves for different sequences. (a) BasketballPass. (b) BQMall. (c) vidyo4. (d) BQTerrace.

Table 2. Coding gain of different schemes in terms of BDBR (%) and BDPSNR (dB).

| Class | Sequence | UDS BDBR | UDS BDPSNR | BIP BDBR | BIP BDPSNR | BIP+MDDT BDBR | BIP+MDDT BDPSNR | MDDI BDBR | MDDI BDPSNR |
|---|---|---|---|---|---|---|---|---|---|
| Class A | PeopleOnStreet | 5.07 | −0.29 | −2.81 | 0.17 | −8.50 | 0.52 | −2.66 | 0.16 |
| | Traffic | 5.90 | −0.32 | −2.49 | 0.14 | −7.81 | 0.45 | −4.01 | 0.22 |
| | Average Class A | 5.49 | −0.31 | −2.65 | 0.16 | −8.16 | 0.49 | −3.34 | 0.19 |
| Class B | BasketballDrive | −0.51 | 0.01 | −1.68 | 0.05 | −3.56 | 0.12 | −13.13 | 0.42 |
| | BQTerrace | 0.44 | −0.05 | −1.63 | 0.11 | −4.26 | 0.26 | −10.75 | 0.66 |
| | Cactus | 1.45 | −0.06 | −2.42 | 0.09 | −5.67 | 0.24 | −3.44 | 0.14 |
| | ParkScene | 3.09 | −0.14 | −2.44 | 0.11 | −6.04 | 0.29 | −2.05 | 0.09 |
| | Kimono1 | 4.95 | −0.18 | −4.67 | 0.17 | −8.30 | 0.30 | 7.09 | −0.25 |
| | Average Class B | 1.88 | −0.08 | −2.57 | 0.11 | −5.57 | 0.24 | −4.46 | 0.21 |
| Class C | BasketballDrill | 0.52 | −0.03 | −2.16 | 0.11 | −6.32 | 0.33 | −8.55 | 0.44 |
| | BQMall | −3.76 | 0.23 | −2.06 | 0.13 | −5.36 | 0.34 | −12.07 | 0.79 |
| | PartyScene | −9.28 | 0.73 | −1.80 | 0.15 | −4.15 | 0.34 | −19.02 | 1.62 |
| | RaceHorses | 0.06 | −0.01 | −1.64 | 0.11 | −3.71 | 0.26 | −4.21 | 0.29 |
| | Average Class C | −3.12 | 0.23 | −1.92 | 0.13 | −4.89 | 0.32 | −10.96 | 0.79 |
| Class D | BasketballPass | −2.08 | 0.12 | −2.06 | 0.12 | −4.13 | 0.26 | −14.26 | 0.91 |
| | BlowingBubbles | −5.11 | 0.32 | −1.81 | 0.11 | −4.65 | 0.30 | −12.64 | 0.81 |
| | BQSquare | −14.18 | 1.35 | −2.00 | 0.19 | −4.87 | 0.46 | −26.97 | 2.94 |
| | RaceHorses | 0.83 | −0.06 | −1.81 | 0.12 | −5.45 | 0.39 | −3.55 | 0.24 |
| | Average Class D | −5.14 | 0.43 | −1.92 | 0.14 | −4.78 | 0.35 | −14.36 | 1.23 |
| Class E | vidyo1 | 3.92 | −0.20 | −2.89 | 0.15 | −7.19 | 0.39 | −2.74 | 0.15 |
| | vidyo3 | 6.71 | −0.38 | −2.14 | 0.13 | −6.96 | 0.42 | 1.61 | −0.10 |
| | vidyo4 | 3.61 | −0.17 | −2.57 | 0.12 | −6.22 | 0.31 | −4.04 | 0.20 |
| | Average Class E | 4.75 | −0.25 | −2.53 | 0.13 | −6.79 | 0.37 | −1.72 | 0.08 |
| | Average Total | 0.09 | 0.05 | −2.32 | 0.13 | −6.04 | 0.35 | −7.52 | 0.54 |
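As a small consistency check on the MDDI BDBR column of Table 2, the sketch below recomputes the class averages and the overall average from the per-sequence values. The variable names and the tolerance `TOL` are choices made for this illustration only.

```python
# MDDI BDBR values (%) per sequence, copied from Table 2.
mddi_bdbr = {
    "A": [-2.66, -4.01],
    "B": [-13.13, -10.75, -3.44, -2.05, 7.09],
    "C": [-8.55, -12.07, -19.02, -4.21],
    "D": [-14.26, -12.64, -26.97, -3.55],
    "E": [-2.74, 1.61, -4.04],
}
# Averages as reported in Table 2.
reported = {"A": -3.34, "B": -4.46, "C": -10.96, "D": -14.36, "E": -1.72}
reported_total = -7.52


def mean(values):
    return sum(values) / len(values)


# Half a unit in the second decimal place, slightly widened so that exact
# ties (such as -3.335) are accepted regardless of the rounding convention.
TOL = 0.0051

class_ok = {k: abs(mean(v) - reported[k]) < TOL for k, v in mddi_bdbr.items()}
all_values = [x for v in mddi_bdbr.values() for x in v]
total_ok = abs(mean(all_values) - reported_total) < TOL
print(class_ok, total_ok)
```

All checks pass, and the overall figure of −7.52% matches the mean over the 18 individual sequences rather than the mean of the five class averages.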


Fig. 14. Subjective quality comparison for different schemes. The detailed results (bitrate, PSNR) are shown beside each scheme name. (a) Original. (b) Anchor (3821.28, 29.40). (c) BIP (3845.76, 29.65). (d) BIP+MDDT (3752.64, 29.81). (e) UDS (3875.04, 29.80). (f) MDDI (3799.68, 31.23).


In addition, the MDDI scheme also shows a better texture recovery performance for the tile at the bottom relative to the other schemes.

As mentioned in the previous discussion, two factors contribute to the coding gains. Firstly, less overhead for the MB header makes the modified 8×8 block preferred over the 4×4 block. Secondly, the more accurate intra prediction of the proposed MDDI scheme further reduces the bits for transform coefficients. To verify these assumptions, we investigate the intra prediction residual characteristics for different methods in the following.

The detailed comparison is illustrated in Fig. 15. The intra mode distributions for different methods are shown in Fig. 15(a) and (b), where the straight lines indicate the direction assigned by the specific intra mode in Fig. 2, and the squares represent the DC mode. The different colors correspond to the various intra prediction block sizes: red indicates the 4×4 block, green the 8×8 block and blue the 16×16 block. Apparently, more 8×8 blocks arise in Fig. 15(b), and the selected directional intra modes are generally consistent with the texture in these local regions. Furthermore, the intra prediction residuals for different methods are shown in Fig. 15(c) and (d). For comparison, the absolute values of the residuals for both methods are normalized to 0–1, and a heat map is employed to visualize the amplitude of the absolute residuals. From Fig. 15(c) and (d), we can find a similar residual distribution in both the DIP and MDDI methods. Some worse predictions can be found in the MDDI scheme, which correspond to brighter areas in Fig. 15(d). In fact, this is also consistent with the analysis of the prediction distance. Although the proposed scheme is designed to improve the prediction accuracy on the 8×8 block, its SPD is still larger than that of the 4×4 block. However, compared with the bit saving on the MB header, the loss in prediction accuracy is acceptable.

To further investigate the prediction accuracy of our proposed MDDI scheme, a fairer comparison is done in Fig. 16, where the residuals are only produced by 8×8 intra prediction blocks. The same normalization as in Fig. 15 is also applied to the absolute residuals here. It is obvious that the bright areas are sparser in our MDDI scheme, as shown in Fig. 16(b). Especially, in the lower right corner of the heat map, the brightness in Fig. 16(b) is obviously weaker than that in Fig. 16(a), which means that more accurate prediction can be achieved in our proposed MDDI scheme. In addition, for some areas which are highlighted in both Fig. 16(a) and (b), a sparse characteristic can be found in the latter, where many dark fringes arise in the highlighted areas in Fig. 16(b). Since the proposed MDDI scheme is based on the down-sampling method, the larger residuals still remain in the down-sampled pixels, which are predicted by the DIP method, and smaller residuals show up in the non-sampled pixels, which are predicted by the MDDI interpolation. This results in the alternating bright and dark strips in Fig. 16(b). It would not bring more high frequency components into the transform, because the down-sampled and non-sampled pixels are processed separately as shown in Fig. 12. Moreover, more bit savings can be obtained from the precise interpolation prediction of the non-sampled pixels.

Fig. 15. Intra prediction modes and residual distribution of BQSquare for different methods as QP = 22. (a) Intra modes for DIP. (b) Intra modes for MDDI. (c) Prediction residual for DIP. (d) Prediction residual for MDDI.

Fig. 16. 8×8 intra prediction residual distribution of BQSquare for different methods as QP = 22. (a) Prediction residual for DIP. (b) Prediction residual for MDDI.

Fig. 17. Intra prediction block size distribution for different methods with QP = 22. (a) Kimono1. (b) vidyo3.

Table 3. Calculation amount analysis for additional interpolation prediction.

| Intra mode | 0 | 1 | 3 | 4 | 5 | 6 | 7 | 8 | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| Addition | 64 | 64 | 68 | 71 | 65 | 64 | 64 | 65 | 65.625 |
| Multiplication | 0 | 0 | 6 | 11 | 9 | 6 | 12 | 9 | 6.625 |
| Shift | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 |

At the same time, some failure cases are also reported, such as the sequences Kimono1 and vidyo3. To analyze the reason for these failures, we investigate their intra prediction block size distributions in Fig. 17. From Fig. 17 and the video content analysis, two common features account for this result. Firstly, only simple scenes exist in these sequences. Secondly, a high I8MB ratio arises in the block size distribution. Since our MDDI scheme is primarily developed for complex content, its superiority in prediction accuracy diminishes for simple scenes, where the DIP method already achieves efficient prediction. In addition, in the MDDI scheme, the 8×8 block is divided into four 4×4 blocks for transform and entropy coding. When a precise prediction has already been achieved in the whole 8×8 block, the separate entropy coding of each 4×4 block may increase the total run lengths relative to coding one 8×8 block. For the sequence Kimono1 in Fig. 17(a), the I8MB ratio can be more than 90% in the DIP method, which means that the conventional DIP method has achieved efficient prediction. In Fig. 17(b), the I8MB ratio also accounts for the majority of all block sizes. That is, neither of these sequences is consistent with the complex scene assumption of our MDDI scheme. Finally, the I8MB ratio of the MDDI scheme drops in both of these sequences with respect to the DIP method, as shown in Fig. 17. This brings some limitations to the real application of the proposed MDDI scheme. However, this problem could be solved by combining the conventional DIP scheme with our MDDI scheme at the cost of additional complexity and syntax modification. A more robust scheme will be studied in our future work to improve the MDDI scheme.

4.3. Computational complexity

A qualitative complexity analysis is made first in this section. Although no additional coding passes are induced in the optimal intra mode determination process, the additional calculation for interpolation prediction induces a slight increase in computational complexity in our MDDI scheme. Since the H.264/AVC encoder needs to traverse every intra mode to select the optimal coding options, the average calculation amount in the prediction module is positively related to the codec's complexity. Accordingly, we count the additional interpolation computations for the non-sampled pixels according to (19)–(21). All of the addition, multiplication and shift operations for each directional intra mode in the 8×8 block have been counted and the results are shown in Table 3.
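The entries of Table 3 can be related to (19)–(21) by counting operations per interpolated pixel. The sketch below assumes that each constant factor (3·I in (20), 2·I in (21)) is counted as one multiplication, which is consistent with the table but is not stated explicitly in the text. The split of the 32 non-sampled pixels among the three rules used for mode 4 is inferred here from the table totals and is not given in the paper.

```python
# (additions, multiplications, shifts) per interpolated pixel for each rule.
OPS_PER_RULE = {
    19: (2, 0, 1),  # (I0 + I1 + 1) >> 1
    20: (2, 1, 1),  # (3*I0 + I1 + 2) >> 2
    21: (3, 1, 1),  # (2*I0 + I1 + I2 + 2) >> 2
}


def count_ops(n19: int, n20: int, n21: int):
    """Total (additions, multiplications, shifts) for the given pixel counts."""
    counts = {19: n19, 20: n20, 21: n21}
    return tuple(
        sum(counts[rule] * OPS_PER_RULE[rule][k] for rule in counts)
        for k in range(3)
    )


# Modes 0 and 1: all 32 non-sampled pixels use rule (19).
print(count_ops(32, 0, 0))
# Mode 4: inferred split of 21 / 4 / 7 pixels among rules (19) / (20) / (21).
print(count_ops(21, 4, 7))
```

Under this counting, the first call reproduces the mode 0 and mode 1 columns of Table 3 (64 additions, 0 multiplications, 32 shifts), and the inferred split reproduces the mode 4 column (71 additions, 11 multiplications, 32 shifts).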

To quantitatively evaluate the efficiency of our proposed MDDI scheme, we compare the execution time of the MDDI and DIP methods. To be fair, both methods are implemented under the same KTA reference software [24] and hardware configuration. The complexity assessment experiment is carried out on a PC with an Intel Pentium Dual-Core processor at 2.8 GHz, 2 GB DDR2 RAM, and Microsoft Windows XP. A detailed comparison of results in terms of ΔT is shown in Table 4.

It can be seen that the proposed MDDI scheme induces an additional 15.67% of encoding time relative to the anchor scheme on average. This is consistent with our qualitative analysis, that is, the interpolation operations bring additional computation in the MDDI scheme. In addition, compared with the 0.54 dB coding gain on average, the complexity increase is acceptable.

Table 4. Average encoding time comparison in terms of ΔT (%).

| Class | Sequence | ΔT |
|---|---|---|
| Class A | PeopleOnStreet | 16.59 |
| | Traffic | 16.07 |
| | Average Class A | 16.33 |
| Class B | BasketballDrive | 16.75 |
| | BQTerrace | 15.61 |
| | Cactus | 16.72 |
| | ParkScene | 15.64 |
| | Kimono1 | 17.44 |
| | Average Class B | 16.43 |
| Class C | BasketballDrill | 15.47 |
| | BQMall | 15.86 |
| | PartyScene | 12.89 |
| | RaceHorses | 14.21 |
| | Average Class C | 14.61 |
| Class D | BasketballPass | 15.24 |
| | BlowingBubbles | 13.90 |
| | BQSquare | 11.39 |
| | RaceHorses | 14.69 |
| | Average Class D | 13.81 |
| Class E | vidyo1 | 17.65 |
| | vidyo3 | 18.33 |
| | vidyo4 | 17.56 |
| | Average Class E | 17.85 |
| | Average Total | 15.67 |

5. Conclusion and future works

In this paper, an improved intra prediction scheme was presented. Intra mode syntax was exploited to design an adaptive down-sampling structure, where the directional interpolation can be naturally adopted to achieve high precision prediction for the non-sampled pixels. Since no extra coding passes or overhead are introduced in the proposed MDDI scheme, we can achieve a significant performance improvement with an acceptable complexity increase.

In our future work, the MDDI scheme with a more flexible sampling rate and interpolation method will be studied and implemented on the HEVC test model, which has larger block sizes and more directional modes.

Acknowledgments

This work was partially supported by NSFC (No. 61271289), by the Ph.D. Programs Foundation of Ministry of Education of China (No. 20110185110002) and by the Fundamental Research Funds for the Central Universities (ZYGX2012YB007).

References

[1] M. Tanimoto, FTV: free-viewpoint television, Signal Processing: Image Communication 27 (6) (2012) 555–570.

[2] ITU-T Recommendation H.264 and ISO/IEC 14496-10 (MPEG-4) AVC, Advanced Video Coding for Generic Audiovisual Services (Mar. 2005).

[3] T. Wiegand, G. Sullivan, G. Bjontegaard, A. Luthra, Overview of the H.264/AVC video coding standard, IEEE Transactions on Circuits and Systems for Video Technology 13 (7) (2003) 560–576.

[4] A. Puri, X. Chen, A. Luthra, Video coding using the H.264/MPEG-4 AVC compression standard, Signal Processing: Image Communication 19 (9) (2004) 793–849.

[5] Joint Collaborative Team on Video Coding, San José, CA, USA, High efficiency video coding (HEVC) text specification draft 6, JCTVC-H1003 (Feb. 2012).

[6] J. Wu, F. Qi, G. Shi, Self-similarity based structural regularity for just-noticeable difference estimation, Journal of Visual Communication and Image Representation 23 (6) (2012) 845–852.

[7] J.-A. Choi, Y.-S. Ho, Efficient intra coding structure for high resolution videos using line-by-line prediction and adaptive transform selection, in: IEEE International Conference on Image Processing, 2010, pp. 1217–1220.

[8] W. Wu, P. Tao, M. Xiao, J. Wen, R. Li, Improved intra prediction for high definition video using localized horizontal spatial prediction, in: IEEE International Conference on Image Processing, 2010, pp. 1245–1248.

[9] Y. Ye, M. Karczewicz, Improved H.264 intra coding based on bi-directional intra prediction, directional transform, and adaptive coefficient scanning, in: IEEE International Conference on Image Processing, 2008, pp. 2116–2119.

[10] A.M. Bruckstein, M. Elad, R. Kimmel, Down-scaling for better transform compression, IEEE Transactions on Image Processing 12 (9) (2003) 1132–1144.

[11] M. Shen, P. Xue, C. Wang, Down-sampling based video coding using super-resolution technique, IEEE Transactions on Circuits and Systems for Video Technology 21 (6) (2011) 755–765.

[12] J. Wu, Y. Xing, G. Shi, L. Jiao, Image compression with downsampling and overlapped transform at low bit rates, in: IEEE International Conference on Image Processing, 2009, pp. 29–32.

[13] X. Wu, X. Zhang, X. Wang, Low bit-rate image compression via adaptive down-sampling and constrained least squares upconversion, IEEE Transactions on Image Processing 18 (3) (2009) 552–561.

[14] W. Lin, L. Dong, Adaptive downsampling to improve image compression at low bit rates, IEEE Transactions on Image Processing 15 (9) (2006) 2513–2521.

[15] V.-A. Nguyen, Y.-P. Tan, W. Lin, Adaptive downsampling/upsampling for better video compression at low bit rate, in: IEEE International Symposium on Circuits and Systems, 2008, pp. 1624–1627.

[16] D. Zheng, D. Wang, L. Zhang, High definition video intra-only coding based on node-cell macroblock pixel structure and 2-D interleaved DCT, in: IEEE International Conference on Image Processing, 2011, pp. 1681–1684.

[17] C. Lai, Y. Lin, New intra prediction using the correlation between pixels and lines, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T VCEG and ISO/IEC MPEG, Dresden, DE, Doc. JCTVC-A025 (Apr. 2010).

[18] T. Wiegand, H. Schwarz, Source coding: Part I of fundamentals of source and video coding, Foundations and Trends in Signal Processing 4 (1–2) (2011) 1–222.

[19] I.E. Richardson, The H.264 Advanced Video Compression Standard, 2nd ed., John Wiley & Sons Ltd., 2010.

[20] R.M. Gray, Source Coding Theory, Kluwer Academic Publishers, 1990.

[21] T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, G.J. Sullivan, Rate-constrained coder control and comparison of video coding standards, IEEE Transactions on Circuits and Systems for Video Technology 13 (7) (2003) 688–703.

[22] Y. Liu, K.N. Ngan, Weighted adaptive lifting-based wavelet transform for image coding, IEEE Transactions on Image Processing 17 (4) (2008) 500–511.

[23] W. Dong, G. Shi, J. Xu, Adaptive nonseparable interpolation for image compression with directional wavelet transform, IEEE Signal Processing Letters 15 (2008) 233–236.

[24] ITU-T VCEG KTA Reference Software, ⟨http://iphome.hhi.de/suehring/tml/download/KTA⟩ (Jan. 2011).

[25] T.K. Tan, G. Sullivan, T. Wedi, Recommended Simulation Common Conditions for Coding Efficiency Experiments Rev. 1, ITU-T Q.6/SG16, Marrakech, Morocco, VCEG-AE010 (Jan. 2007).

[26] F. Bossen, Common test conditions and software reference configurations, ITU-T and ISO/IEC, JCTVC-B300 (Jul. 2010).

[27] G. Bjontegaard, Calculation of average PSNR differences between RD-Curves, ITU-T VCEG, VCEG-M33 (Apr. 2001).