
ACCURATE PARAMETER ESTIMATION AND EFFICIENT FADE DETECTION FOR
WEIGHTED PREDICTION IN H.264 VIDEO COMPRESSION

     Rui Zhang* and Guy Cote**

    Cisco Systems Inc, 170 West Tasman Drive, San Jose, CA 95134*

    Apple, 1 Infinite Loop, Cupertino, CA 95014**

    [email protected], [email protected]

ABSTRACT

Weighted prediction is a useful tool in video compression for encoding scenes with lighting changes, such as fading scenes. Estimating weighted prediction parameters has been extensively discussed in the literature; however, no mathematical model has been proposed. Moreover, the detection of fading scenes in a real-time encoding system has received little attention. This paper addresses both of these aspects. An accurate parameter estimation algorithm for H.264 encoding is first derived for both the multiplicative factor and the additive offset, based on a fading model. An efficient algorithm is then proposed to detect fades in a real-time encoding system, with simple statistics calculations, a very low storage requirement, and low encoding delay. Simulation results show very accurate detection and compression gains of 5-30% over existing techniques.

Index Terms— Weighted prediction, fade detection, video compression, H.264

    1.  INTRODUCTION

Motion compensation is a major tool for achieving compression efficiency in video coding systems, where the current picture is predicted from a reference picture and only the prediction difference is encoded. The more correlated the prediction picture is to the current picture, the higher the compression efficiency. However, in some video scenes, particularly fading scenes, the current picture is more correlated to the reference picture scaled by a weighting factor than to the reference picture itself. Hence weighted prediction (WP) is a useful tool in such scenarios. Modern video coding standards, such as H.264, have adopted WP to improve coding efficiency in certain conditions.

In a real-time coding system, there are typically two steps for WP. First, the fading scenes are detected; second, the WP parameters (a multiplicative factor and an additive offset in H.264) are estimated. In a practical system, both tasks need to be accomplished with low delay; simple calculations and a low storage requirement are also required.

For the fade detection problem, most of the algorithms in the literature rely on a relatively long window of pictures to observe enough statistics for accurate detection. For example, Alattar proposed a method exploiting the average luminance changes and the semi-parabolic behavior of the variance curve [1]; Qian et al. proposed an algorithm that exploits accumulating histogram differences [2]. However, such methods require the availability of the statistics of the entire fade duration, which introduces long delays and is impractical in real-time encoding systems. In this paper, we focus on algorithms that detect the fade within a very short window of pictures, and that are robust to different conditions, such as motion.

For the parameter estimation problem, the simple and empirical method of using pixel average values (DC) is often used in the literature [3] and in the H.264 reference software [4]. The multiplicative weighting factor is calculated as the ratio of the DC values of the current picture and the reference picture; the additive offset is set to zero. In this paper, an accurate estimation of both the multiplicative weighting factor and the additive offset is derived mathematically from the fade model. The simulation results show that this accuracy in parameter estimation reduces the bit rate by 5%-30% for the same video quality. This paper focuses only on uni-directional prediction and global WP. The algorithm can be easily extended to bi-directional prediction and localized WP [5].

The rest of the paper is organized as follows. An overview of WP in an encoding system is presented in Section 2. An accurate parameter estimation method is then derived from the mathematical model of fade in Section 3. An efficient and robust fade detection algorithm which uses simple statistics is described in Section 4. Simulation results are presented in Section 5.

    2.  WEIGHTED PREDICTION OVERVIEW

Figure 1 shows the procedure for applying WP in a real-time encoding system. First, some statistics are generated through video analysis. The statistics within a small window, from several previous pictures up to the current picture, are then used to detect a fade. Each picture is assigned a state value indicating whether the picture is in the NORMAL state or in the FADE state. Such state values are saved for each picture. When encoding a picture, if there is a FADE state in either the current picture or one of its reference pictures, WP will be used for this current-reference pair, and the statistics of the current picture and the corresponding reference picture are processed to estimate the WP parameters. These parameters are then passed on to the encoding engine. Otherwise, normal encoding is done.

978-1-4244-1764-3/08/$25.00 ©2008 IEEE    ICIP 2008

    Figure 1: Weighted Prediction Workflow Chart
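The decision step above can be sketched in a few lines (a minimal illustration, not the paper's implementation; the state encoding and the function name are assumptions):

```python
# Hypothetical state encoding for the per-picture records.
NORMAL, FADE = 0, 1

def use_wp(cur_state, ref_states):
    """Decide, for each current-reference pair, whether WP parameters
    should be estimated: WP is used when the current picture or the
    reference picture carries a FADE state."""
    return [cur_state == FADE or s == FADE for s in ref_states]

# Current picture is NORMAL; the second reference picture is in a fade,
# so WP is applied only to that pair.
flags = use_wp(NORMAL, [NORMAL, FADE])
```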

    3.  PARAMETER ESTIMATION

This section first describes the general mathematical model of fading scenes. The proposed parameter estimation algorithm is then derived.

First consider the following fade model. Let f(t,i,j) denote the pixel value at position (i,j) in frame t of one original sequence f, and g(t,i,j) denote the pixel value at position (i,j) in frame t of another original sequence g. The linear combination of these two sequences within one particular period T is represented as:

    F(t,i,j) = \alpha(t)\, f(t,i,j) + \beta(t)\, g(t,i,j)    (3-1)

where \alpha(t) + \beta(t) = 1. When g is a solid color and the weighting factor \alpha(t) is getting larger (smaller), it is called a fade in (out) of f.

Now consider the WP model. For weighted uni-prediction, when the pixel at position (i,j) in frame t is predicted from the pixel at position (m,n) in frame t-1, the following relationship is assumed:

    F(t,i,j) = w(t)\, F(t-1,m,n) + o(t)    (3-2)

From the fade model, we can derive:

    F(t,i,j) = \frac{\alpha(t)}{\alpha(t-1)}\, F(t-1,m,n) + \Big[\beta(t)\, g(t,i,j) - \frac{\alpha(t)}{\alpha(t-1)}\,\beta(t-1)\, g(t-1,m,n)\Big]    (3-3)

Hence only when g is a solid color C, i.e., the values are the same regardless of time and location, can we match exactly to the WP model with:

    w(t) = \frac{\alpha(t)}{\alpha(t-1)}, \qquad o(t) = \Big[\beta(t) - \frac{\alpha(t)}{\alpha(t-1)}\,\beta(t-1)\Big] C    (3-4)


Note that \alpha(t), \beta(t) and C are all unknown to the encoder. We have to estimate w(t) and o(t) from statistics observed on the fading scenes.
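As a consistency check, substituting \beta(t) = 1 - \alpha(t) into (3-4) shows that the offset depends only on the weight and the solid color:

```latex
o(t) = \Big[\beta(t) - \frac{\alpha(t)}{\alpha(t-1)}\,\beta(t-1)\Big] C
     = \big[(1 - \alpha(t)) - w(t)\,(1 - \alpha(t-1))\big] C
     = \big(1 - w(t)\big)\, C
```

so for a fade to black (C = 0) the offset vanishes and the weight alone carries the fade.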

Now we derive the parameter estimation method for the fade case. Assuming the signals are ergodic, the mean and variance of the original sequence f can be defined as:

    m_f(t) = \mathrm{mean}(f(t)) = m, \qquad \sigma_f^2(t) = \mathrm{variance}(f(t)) = \sigma^2    (3-5)

The mean and variance of the combined signal can be defined as:

    M(t) = \mathrm{mean}(F(t)), \qquad \sigma_F^2(t) = \mathrm{variance}(F(t))    (3-6)

Then we have:

    M(t)   = \alpha(t)\, m + (1 - \alpha(t))\, C
    M(t-1) = \alpha(t-1)\, m + (1 - \alpha(t-1))\, C    (3-7)

and

    \sigma_F^2(t)   = \alpha^2(t)\,\sigma_f^2(t)     = \alpha^2(t)\,\sigma^2
    \sigma_F^2(t-1) = \alpha^2(t-1)\,\sigma_f^2(t-1) = \alpha^2(t-1)\,\sigma^2    (3-8)

Therefore we can derive the weight using

    w(t) = \frac{\alpha(t)}{\alpha(t-1)} = \frac{M(t) - C}{M(t-1) - C}    (3-9)

or

    w(t) = \frac{\alpha(t)}{\alpha(t-1)} = \sqrt{\frac{\sigma_F^2(t)}{\sigma_F^2(t-1)}} = \frac{\sigma_F(t)}{\sigma_F(t-1)}    (3-10)

Since the solid color value C is generally unknown to the encoder, using the square root of the variance ratio makes for a more accurate and robust estimation.

After the weight is derived, the offset can be easily calculated as:

    o(t) = M(t) - w(t)\, M(t-1)    (3-11)

In H.264, after a fade is detected and WP is to be used, the parameters are calculated for each pair of current picture and reference picture.
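The estimation in (3-10) and (3-11) can be sketched numerically. The following is a minimal illustration on a synthetic linear fade to black (C = 0); the function and variable names are assumptions, not from the paper:

```python
import numpy as np

def estimate_wp_params(cur, ref):
    """Estimate the WP weight and offset for a current/reference pair:
    the weight from the ratio of luma standard deviations, eq. (3-10),
    and the offset from the means once the weight is known, eq. (3-11)."""
    w = float(np.std(cur)) / float(np.std(ref))        # w(t) = sigma_F(t) / sigma_F(t-1)
    o = float(np.mean(cur)) - w * float(np.mean(ref))  # o(t) = M(t) - w(t) M(t-1)
    return w, o

# Synthetic linear fade to black (solid color C = 0): F(t) = alpha(t) * f,
# with alpha(t) = 1 - t/T, so the true weight is alpha(5)/alpha(4) = 5/6
# and the true offset is (1 - w) * C = 0.
rng = np.random.default_rng(0)
f = rng.integers(16, 236, size=(64, 64)).astype(float)  # one source picture
T = 10
ref = (1 - 4 / T) * f   # reference picture, frame t-1 = 4
cur = (1 - 5 / T) * f   # current picture, frame t = 5

w, o = estimate_wp_params(cur, ref)
```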

    4.  FADE DETECTION

Fading effects result in lighting changes, which are reflected in both the luma average values M(t) and the luma variance values \sigma_F^2(t). We propose to check both of these statistics to achieve simple yet efficient and robust detection.

First we look at the first-order difference of the luma average values for each picture. From Equation (3-7) we have:

    \Delta M(t) = M(t) - M(t-1) = (\alpha(t) - \alpha(t-1))\,(m - C)    (4-1)

For a linear fading model where \alpha(t) = t/T, \Delta M(t) is a constant value [1]. For more general cases, \Delta M(t) should have the same sign throughout the fade. For example, for a fade out of signal f into a black scene, (m - C) is always greater than zero, while (\alpha(t) - \alpha(t-1)) is always less than zero, hence \Delta M(t) is always less than zero. Furthermore, a fade is a steady change between pictures, i.e., the changes between adjacent frames are very similar, so we expect the second-order differences of the luma average values, \Delta M(t) - \Delta M(t-1), to be close to zero.

We also define the ratio of the luma variances for two adjacent pictures as:

    r(t) = \frac{\sigma_F^2(t)}{\sigma_F^2(t-1)}    (4-2)

It is obvious that for a fade out of f, r(t) is always less than one, while for a fade in of f, it is always greater than one. To avoid false alarms when entering the fading mode, we also expect real fading changes between the pictures, i.e., r(t) should be somewhat away from one when transitioning from the NORMAL mode to the FADE mode.

Fading is a continuous behavior, so it should be detected using a window of pictures. In the following, statistics of N frames are used; N equal to 1 means that only the current picture's statistics are used. This implies a delay of N-1 frames between the video analysis and the encoding. In a practical encoding system, only a short window is allowed in order to achieve low delay. For example, with a hierarchical B-picture GOP (Group of Pictures) structure of IbBbP, where B is a reference bi-directional picture and b is a non-reference bi-directional picture, N can be set to 4 without introducing further delay.

In summary, we define the following criteria for fade detection, using the above functions of the two statistics. For each current picture, its state is initialized as NORMAL. Only if all of the criteria are satisfied is a fade declared. The state of the current picture and the states of the pictures in the past N-1 frames are then set as FADE.

1. Detect a luminance level change (picture getting brighter or darker) among the past N frames, i.e., \Delta M(t), \Delta M(t-1), ..., \Delta M(t-N+1) have the same sign.

2. Detect a steady change between pictures (the changes between adjacent frames are similar), i.e., \Delta M(t) - \Delta M(t-1), ..., \Delta M(t-N+2) - \Delta M(t-N+1) are within a threshold MAX_DELTA_DELTA_DC.

3. Detect a consistent change of the luma variance (continuously larger than one or less than one) among the past N frames, i.e., r(t) - 1, r(t-1) - 1, ..., r(t-N+1) - 1 have the same sign.

4. Detect a noticeable change in the ratio of variances, i.e., all of r(t), r(t-1), ..., r(t-N+1) are less than a threshold FADE_MIN_VAR_RATIO or greater than a threshold FADE_MAX_VAR_RATIO. This criterion is only checked if the previous frame t-1 is in the NORMAL state, to avoid false alarms when entering the FADE state.

     

Default values for MAX_DELTA_DELTA_DC, FADE_MIN_VAR_RATIO, and FADE_MAX_VAR_RATIO have been determined experimentally as 10, 0.96, and 1.05, respectively. When all of the above criteria are satisfied, the states of frames t, t-1, ..., t-N+1 are all set as FADE. Note that a fade is a continuous behavior, so the states of all the frames in this N-frame window are set at the same time; also note that the delay happens during the fade detection. A frame can transition from the NORMAL to the FADE state, but once a frame is in the FADE state, it will stay there (i.e., a transition from FADE to NORMAL is not allowed for the same frame). For example, when frames 0-3 are analyzed and the above criteria are not satisfied, their states remain NORMAL; but when frames 1-4 are analyzed and the criteria are satisfied, the states of frames 1-4 are all set as FADE, reflecting the entry into FADE at frame 1. Then, when frames 2-5 are analyzed and the criteria are not satisfied, frame 5 is in the NORMAL state, but frames 2-4 stay in the FADE state, reflecting the exit from FADE at frame 5.
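The four criteria can be expressed compactly. The following is a minimal sketch over a window of per-picture luma means M and variances V (oldest first), using the paper's default thresholds; the helper name and data layout are assumptions:

```python
# Default thresholds from the paper.
MAX_DELTA_DELTA_DC = 10
FADE_MIN_VAR_RATIO = 0.96
FADE_MAX_VAR_RATIO = 1.05

def detect_fade(M, V, prev_normal=True):
    """Apply the four fade criteria to a window of luma means M and
    luma variances V (oldest first). Returns True when a fade is declared."""
    dM = [b - a for a, b in zip(M, M[1:])]     # first differences Delta M
    ddM = [b - a for a, b in zip(dM, dM[1:])]  # second differences
    r = [b / a for a, b in zip(V, V[1:])]      # variance ratios r(t)
    c1 = all(d > 0 for d in dM) or all(d < 0 for d in dM)  # same-sign mean change
    c2 = all(abs(d) <= MAX_DELTA_DELTA_DC for d in ddM)    # steady change
    c3 = all(x > 1 for x in r) or all(x < 1 for x in r)    # consistent variance change
    c4 = (not prev_normal) or all(                         # noticeable ratio change
        x < FADE_MIN_VAR_RATIO or x > FADE_MAX_VAR_RATIO for x in r)
    return c1 and c2 and c3 and c4

# A fade-out: the mean drops steadily and the variance shrinks each frame.
fade = detect_fade([120, 110, 100, 90], [400.0, 324.0, 262.4, 212.6])
```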

After the fade detection, the decision to use WP or not is made. If any of the reference pictures or the current picture is in the FADE state, then WP is used. For each reference picture in the prediction list, if its state or the state of the current picture is FADE, the WP parameters for this pair are calculated and transmitted in the bitstream.

    5.  SIMULATION RESULTS

Three sequences were used in the simulation. "Trailer" is a 480x204 sequence from a movie trailer with a fade out; "Low-Motion" and "High-Motion" are synthetically generated 720x480 sequences with both fade in and fade out, containing low-motion scenes and high-motion scenes respectively.

To evaluate the effectiveness of the fade detection algorithm under different delays, a delay of 2 frames and a delay of 3 frames are simulated. Both are short windows suitable for real-time encoding systems. Figure 2 shows the detection results for the Trailer sequence. In "True Transition", a value of 1 means the NORMAL state, while a value of 2 means the FADE state. The detection errors are calculated as the difference between the true transition and the detected transition. For this particular sequence, the delay of 2 frames introduced some false alarms, while the delay of 3 frames gave correct detections. The false alarms happened on some zoom scenes, where the statistics happened to be similar to the fade case. For the other two synthetic sequences, both the delay of 2 frames and the delay of 3 frames gave correct detections. So a delay of 3 frames is in general sufficient for fade detection with the proposed algorithm.

[Plot for Figure 3, "Results for Trailer Sequence": PSNR (dB) vs. bit rate for No WP, WP with DC, and Proposed WP]

To evaluate the performance of the proposed parameter estimation algorithm, all three sequences are encoded using QP = 28, 32, 36, and 40 with an H.264 codec using I/P pictures only. Three methods are compared. In "No WP", no weighted prediction is used. In "WP with DC", the weight is estimated as the ratio of the luma DC values, while the offset is set to zero; this is the algorithm used in the JM encoder and is the most popular method. "Proposed WP" represents our proposed algorithm. The detection results with a delay of 3 frames are used to decide which pictures use weighted prediction for both "WP with DC" and "Proposed WP", so the only difference is the parameter estimation. Figure 3 and Figure 4 illustrate the rate-distortion (RD) performance of all the methods for "Trailer" and "High-Motion" respectively. Table 1 gives the average PSNR gain and bitrate savings using the measurement in [6]. It clearly shows that the proposed WP algorithm outperforms the traditional method, with 5%-30% bitrate savings. The gains are larger at lower bit rates and in higher-motion scenes.

    Figure 3: RD Performance for “Trailer”

[Plot for Figure 4, "Results for High-motion Sequence": PSNR (dB) vs. bit rate for No WP, WP with DC, and Proposed WP]

    6.  CONCLUSIONS

In this paper, an accurate weighted prediction parameter estimation algorithm and an efficient and robust fade detection algorithm were proposed. The algorithms use very simple statistics with low delay, making them suitable for practical real-time encoding systems. Simulation results show accurate detection and significant compression efficiency gains.

    Figure 4: RD Performance for "High-Motion"

Table 1: Performance comparison

Sequence    WP with DC              Proposed WP
            Bitrate(%)  PSNR(dB)    Bitrate(%)  PSNR(dB)
Trailer     -17.93      0.88        -22.78      1.10
Low-Mot     -20.02      1.12        -34.35      1.85
High-Mot     -9.86      0.54        -23.11      1.06

REFERENCES

[1] A. M. Alattar, "Detecting Fade Regions in Uncompressed Video Sequences", pp. 3025-3028, ICASSP 1997.

[2] X. Qian, G. Liu and R. Su, "Effective Fades and Flashlight Detection Based on Accumulating Histogram Differences", pp. 1245-1258, IEEE Transactions on CSVT, vol. 16, no. 10, 2006.

[3] J. Boyce, "Weighted Prediction in the H.264/MPEG4 AVC video coding standard", ISCAS, pp. 789-792, May 2004.

[Plot for Figure 2, "Fade Detection Results": transition state vs. picture number, showing True Transition, Delay 2 Detection Error, and Delay 3 Detection Error]

[4] JVT Reference Software, http://bs.hhi.de/~suehring/download

[5] P. Yin, A. Tourapis and J. Boyce, "Localized Weighted Prediction for Video Coding", pp. 4365-4368, ISCAS, May 2005.

[6] G. Bjontegaard, "Calculation of average PSNR differences between RD-Curves", document VCEG-M33, March 2001.

    Figure 2: Fade Detection Results for “Trailer”
