
ACCURATE PARAMETER ESTIMATION AND EFFICIENT FADE DETECTION FOR
WEIGHTED PREDICTION IN H.264 VIDEO COMPRESSION

     Rui Zhang* and Guy Cote**

    Cisco Systems Inc, 170 West Tasman Drive, San Jose, CA 95134*

    Apple, 1 Infinite Loop, Cupertino, CA 95014**

    [email protected], [email protected]

ABSTRACT

Weighted prediction is a useful tool in video compression for encoding scenes with lighting changes, such as fading scenes. Estimating weighted prediction parameters has been extensively discussed in the literature; however, no mathematical model has been proposed. Moreover, the detection of fading scenes in a real-time encoding system has received little attention. This paper addresses both of these aspects. An accurate parameter estimation algorithm for H.264 encoding is first derived for both the multiplicative factor and the additive offset, based on a fading model. An efficient algorithm is then proposed to detect fades in a real-time encoding system, with simple statistics calculations, a very low storage requirement, and low encoding delay. Simulation results show very accurate detection and compression gains of 5-30% over existing techniques.

Index Terms— Weighted prediction, fade detection, video compression, H.264

    1.  INTRODUCTION

Motion compensation is a major tool for achieving compression efficiency in video coding systems, where the current picture is predicted from a reference picture and only the prediction difference is encoded. The more correlated the prediction picture is to the current picture, the higher the compression efficiency. However, in some video scenes, particularly fading scenes, the current picture is more correlated to the reference picture scaled by a weighting factor than to the reference picture itself. Hence weighted prediction (WP) is a useful tool in such scenarios. Modern video coding standards, such as H.264, have adopted WP to improve coding efficiency in certain conditions.

In a real-time coding system, there are typically two steps for WP. First, the fading scenes are detected; second, the WP parameters (a multiplicative factor and an additive offset in H.264) are estimated. In a practical system, both tasks need to be accomplished with low delay; simple calculations and a low storage requirement are also required.

For the fade detection problem, most of the algorithms in the literature rely on a relatively long window of pictures to observe enough statistics for accurate detection. For example, Alattar proposed a method exploiting the average luminance changes and the semi-parabolic behavior of the variance curve [1]; Qian et al. proposed an algorithm that exploits accumulating histogram differences [2]. However, such methods require the availability of the statistics of the entire fade duration, which introduces long delays and is impractical in real-time encoding systems. In this paper, we focus on algorithms that detect the fade within a very short window of pictures, and that are robust to different conditions, such as motion.

For the parameter estimation problem, the simple and empirical method of using pixel average values (DC) is often used in the literature [3] and in the H.264 reference software [4]. The multiplicative weighting factor is calculated as the ratio of the DC values of the current picture and the reference picture; the additive offset is set to zero. In this paper, an accurate estimation of both the multiplicative weighting factor and the additive offset is derived mathematically from the fade model. The simulation results show that this accuracy in parameter estimation reduces the bit rate by 5%-30% for the same video quality. This paper focuses only on uni-directional prediction and global WP. The algorithm can be easily extended to bi-directional prediction and localized WP [5].

The rest of the paper is organized as follows. An overview of WP in an encoding system is presented in Section 2. An accurate parameter estimation method is then derived from the mathematical model of fade in Section 3. An efficient and robust fade detection algorithm which uses simple statistics is described in Section 4. Simulation results are presented in Section 5.

    2.  WEIGHTED PREDICTION OVERVIEW

Figure 1 shows the procedure for applying WP in a real-time encoding system. First, some statistics are generated through video analysis. The statistics within a small window, from several previous pictures up to the current picture, are then used to detect a fade. Each picture is assigned a state value indicating whether the picture is in the NORMAL state or in the FADE state. Such state values are saved for each picture. When encoding a picture, if there is a FADE state in either the current picture or one of its reference pictures, WP will be used for this current-reference pair, and the statistics of the current picture and the corresponding reference picture are processed to estimate the WP parameters. These parameters are then passed on to the encoding engine. Otherwise, normal encoding is done.

978-1-4244-1764-3/08/$25.00 ©2008 IEEE    ICIP 2008

    Figure 1: Weighted Prediction Workflow Chart
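The decision step above can be sketched in a few lines (a minimal illustration, not the paper's implementation; the state encoding and the function name are assumptions):

```python
# Hypothetical state encoding for the per-picture records.
NORMAL, FADE = 0, 1

def use_wp(cur_state, ref_states):
    """Decide, for each current-reference pair, whether WP parameters
    should be estimated: WP is used when the current picture or the
    reference picture carries a FADE state."""
    return [cur_state == FADE or s == FADE for s in ref_states]

# Current picture is NORMAL; the second reference picture is in a fade,
# so WP is applied only to that pair.
flags = use_wp(NORMAL, [NORMAL, FADE])
```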

    3.  PARAMETER ESTIMATION

This section first describes the general mathematical model of fading scenes. The proposed parameter estimation algorithm is then derived.

First consider the following fade model. Let f(t,i,j) denote the pixel value at position (i,j) in frame t of one original sequence f, and g(t,i,j) denote the pixel value at position (i,j) in frame t of another original sequence g. The linear combination of these two sequences within one particular period T is represented as:

    F(t,i,j) = \alpha(t)\, f(t,i,j) + \beta(t)\, g(t,i,j)    (3-1)

where \alpha(t) + \beta(t) = 1. When g is a solid color and the weighting factor \alpha(t) is getting larger (smaller), it is called a fade in (out) of f.

Now consider the WP model. For weighted uni-prediction, when the pixel at position (i,j) in frame t is predicted from the pixel at position (m,n) in frame t-1, the following relationship is assumed:

    F(t,i,j) = w(t)\, F(t-1,m,n) + o(t)    (3-2)

From the fade model, we can derive:

    F(t,i,j) = \frac{\alpha(t)}{\alpha(t-1)}\, F(t-1,m,n) + \Big[\beta(t)\, g(t,i,j) - \frac{\alpha(t)}{\alpha(t-1)}\,\beta(t-1)\, g(t-1,m,n)\Big]    (3-3)

Hence only when g is a solid color C, i.e., the values are the same regardless of time and location, can we match exactly to the WP model with:

    w(t) = \frac{\alpha(t)}{\alpha(t-1)}, \qquad o(t) = \Big[\beta(t) - \frac{\alpha(t)}{\alpha(t-1)}\,\beta(t-1)\Big] C    (3-4)


Note that \alpha(t), \beta(t) and C are all unknown to the encoder. We have to estimate w(t) and o(t) from statistics observed on the fading scenes.
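As a consistency check, substituting \beta(t) = 1 - \alpha(t) into (3-4) shows that the offset depends only on the weight and the solid color:

```latex
o(t) = \Big[\beta(t) - \frac{\alpha(t)}{\alpha(t-1)}\,\beta(t-1)\Big] C
     = \big[(1 - \alpha(t)) - w(t)\,(1 - \alpha(t-1))\big] C
     = \big(1 - w(t)\big)\, C
```

so for a fade to black (C = 0) the offset vanishes and the weight alone carries the fade.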

Now we derive the parameter estimation method for the fade case. Assuming the signals are ergodic, the mean and variance of the original sequence f can be defined as:

    m_f(t) = \mathrm{mean}(f(t)) = m, \qquad \sigma_f^2(t) = \mathrm{variance}(f(t)) = \sigma^2    (3-5)

The mean and variance of the combined signal can be defined as:

    M(t) = \mathrm{mean}(F(t)), \qquad \sigma_F^2(t) = \mathrm{variance}(F(t))    (3-6)

Then we have:

    M(t)   = \alpha(t)\, m + (1 - \alpha(t))\, C
    M(t-1) = \alpha(t-1)\, m + (1 - \alpha(t-1))\, C    (3-7)

and

    \sigma_F^2(t)   = \alpha^2(t)\,\sigma_f^2(t)     = \alpha^2(t)\,\sigma^2
    \sigma_F^2(t-1) = \alpha^2(t-1)\,\sigma_f^2(t-1) = \alpha^2(t-1)\,\sigma^2    (3-8)

Therefore we can derive the weight using

    w(t) = \frac{\alpha(t)}{\alpha(t-1)} = \frac{M(t) - C}{M(t-1) - C}    (3-9)

or

    w(t) = \frac{\alpha(t)}{\alpha(t-1)} = \sqrt{\frac{\sigma_F^2(t)}{\sigma_F^2(t-1)}} = \frac{\sigma_F(t)}{\sigma_F(t-1)}    (3-10)

Since the solid color value C is generally unknown to the encoder, using the square root of the variance ratio makes for a more accurate and robust estimation.

After the weight is derived, the offset can be easily calculated as:

    o(t) = M(t) - w(t)\, M(t-1)    (3-11)

In H.264, after a fade is detected and WP is to be used, the parameters are calculated for each pair of current picture and reference picture.
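The estimation in (3-10) and (3-11) can be sketched numerically. The following is a minimal illustration on a synthetic linear fade to black (C = 0); the function and variable names are assumptions, not from the paper:

```python
import numpy as np

def estimate_wp_params(cur, ref):
    """Estimate the WP weight and offset for a current/reference pair:
    the weight from the ratio of luma standard deviations, eq. (3-10),
    and the offset from the means once the weight is known, eq. (3-11)."""
    w = float(np.std(cur)) / float(np.std(ref))        # w(t) = sigma_F(t) / sigma_F(t-1)
    o = float(np.mean(cur)) - w * float(np.mean(ref))  # o(t) = M(t) - w(t) M(t-1)
    return w, o

# Synthetic linear fade to black (solid color C = 0): F(t) = alpha(t) * f,
# with alpha(t) = 1 - t/T, so the true weight is alpha(5)/alpha(4) = 5/6
# and the true offset is (1 - w) * C = 0.
rng = np.random.default_rng(0)
f = rng.integers(16, 236, size=(64, 64)).astype(float)  # one source picture
T = 10
ref = (1 - 4 / T) * f   # reference picture, frame t-1 = 4
cur = (1 - 5 / T) * f   # current picture, frame t = 5

w, o = estimate_wp_params(cur, ref)
```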

    4.  FADE DETECTION

Fading effects result in lighting changes, which are reflected in both the luma average values M(t) and the luma variance values \sigma_F^2(t). We propose to check both of these statistics to achieve simple yet efficient and robust detection.

First we look at the first-order difference of the luma average values for each picture. From Equation (3-7) we have:

    \Delta M(t) = M(t) - M(t-1) = (\alpha(t) - \alpha(t-1))\,(m - C)    (4-1)

For a linear fading model where \alpha(t) = t/T, \Delta M(t) is a constant value [1]. For more general cases, \Delta M(t) should have the same sign throughout the fade. For example, for a fade out of signal f into a black scene, (m - C) is always greater than zero, while (\alpha(t) - \alpha(t-1)) is always less than zero, hence \Delta M(t) is always less than zero. Furthermore, a fade is a steady change between pictures, i.e., the changes between adjacent frames are very similar, so we expect the second-order differences of the luma average values, \Delta M(t) - \Delta M(t-1), to be close to zero.

We also define the ratio of the luma variances for two adjacent pictures as:

    r(t) = \frac{\sigma_F^2(t)}{\sigma_F^2(t-1)}    (4-2)

It is obvious that for a fade out of f, r(t) is always less than one, while for a fade in of f, it is always greater than one. To avoid false alarms when entering the fading mode, we also expect real fading changes between the pictures, i.e., r(t) should be somewhat away from one when transitioning from the NORMAL mode to the FADE mode.

Fading is a continuous behavior, so it should be detected using a window of pictures. In the following, statistics of N frames are used; N equal to 1 means that only the current picture's statistics are used. This implies a delay of N-1 frames between the video analysis and the encoding. In a practical encoding system, only a short window is allowed in order to achieve low delay. For example, with a hierarchical B-picture GOP (Group of Pictures) structure of IbBbP, where B is a reference bi-directional picture and b is a non-reference bi-directional picture, N can be set to 4 without introducing further delay.

In summary, we define the following criteria for fade detection, using the above functions of the two statistics. For each current picture, its state is initialized as NORMAL. Only if all of the criteria are satisfied is a fade declared. The state of the current picture and the states of the pictures in the past N-1 frames are then set as FADE.

1. Detect a luminance level change (picture getting brighter or darker) among the past N frames, i.e., \Delta M(t), \Delta M(t-1), ..., \Delta M(t-N+1) have the same sign.

2. Detect a steady change between pictures (the changes between adjacent frames are similar), i.e., \Delta M(t) - \Delta M(t-1), ..., \Delta M(t-N+2) - \Delta M(t-N+1) are within a threshold MAX_DELTA_DELTA_DC.

3. Detect a consistent change of the luma variance (continuously larger than one or less than one) among the past N frames, i.e., r(t) - 1, r(t-1) - 1, ..., r(t-N+1) - 1 have the same sign.

4. Detect a noticeable change in the ratio of variances, i.e., all of r(t), r(t-1), ..., r(t-N+1) are less than a threshold FADE_MIN_VAR_RATIO or greater than a threshold FADE_MAX_VAR_RATIO. This criterion is only checked if the previous frame t-1 is in the NORMAL state, to avoid false alarms when entering the FADE state.

     

Default values for MAX_DELTA_DELTA_DC, FADE_MIN_VAR_RATIO, and FADE_MAX_VAR_RATIO have been determined experimentally as 10, 0.96, and 1.05, respectively. When all of the above criteria are satisfied, the states of frames t, t-1, ..., t-N+1 are all set as FADE. Note that a fade is a continuous behavior, so the states of all the frames in this N-frame window are set at the same time; also note that the delay happens during the fade detection. A frame can transition from the NORMAL to the FADE state, but once a frame is in the FADE state, it will stay there (i.e., a transition from FADE to NORMAL is not allowed for the same frame). For example, when frames 0-3 are analyzed and the above criteria are not satisfied, their states remain NORMAL; but when frames 1-4 are analyzed and the criteria are satisfied, the states of frames 1-4 are all set as FADE, reflecting the entry into FADE at frame 1. Then, when frames 2-5 are analyzed and the criteria are not satisfied, frame 5 is in the NORMAL state, but frames 2-4 stay in the FADE state, reflecting the exit from FADE at frame 5.
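The four criteria can be expressed compactly. The following is a minimal sketch over a window of per-picture luma means M and variances V (oldest first), using the paper's default thresholds; the helper name and data layout are assumptions:

```python
# Default thresholds from the paper.
MAX_DELTA_DELTA_DC = 10
FADE_MIN_VAR_RATIO = 0.96
FADE_MAX_VAR_RATIO = 1.05

def detect_fade(M, V, prev_normal=True):
    """Apply the four fade criteria to a window of luma means M and
    luma variances V (oldest first). Returns True when a fade is declared."""
    dM = [b - a for a, b in zip(M, M[1:])]     # first differences Delta M
    ddM = [b - a for a, b in zip(dM, dM[1:])]  # second differences
    r = [b / a for a, b in zip(V, V[1:])]      # variance ratios r(t)
    c1 = all(d > 0 for d in dM) or all(d < 0 for d in dM)  # same-sign mean change
    c2 = all(abs(d) <= MAX_DELTA_DELTA_DC for d in ddM)    # steady change
    c3 = all(x > 1 for x in r) or all(x < 1 for x in r)    # consistent variance change
    c4 = (not prev_normal) or all(                         # noticeable ratio change
        x < FADE_MIN_VAR_RATIO or x > FADE_MAX_VAR_RATIO for x in r)
    return c1 and c2 and c3 and c4

# A fade-out: the mean drops steadily and the variance shrinks each frame.
fade = detect_fade([120, 110, 100, 90], [400.0, 324.0, 262.4, 212.6])
```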

After the fade detection, the decision to use WP or not is made. If any of the reference pictures or the current picture is in the FADE state, then WP is used. For each reference picture in the prediction list, if its state or the state of the current picture is FADE, the WP parameters for this pair are calculated and transmitted in the bitstream.

    5.  SIMULATION RESULTS

Three sequences were used in the simulation. "Trailer" is a 480x204 sequence from a movie trailer with a fade out; "Low-Motion" and "High-Motion" are synthetically generated 720x480 sequences with both fade in and fade out, containing low-motion scenes and high-motion scenes respectively.

To evaluate the effectiveness of the fade detection algorithm under different delays, a delay of 2 frames and a delay of 3 frames are simulated. Both are short windows suitable for real-time encoding systems. Figure 2 shows the detection results for the Trailer sequence. In "True Transition", a value of 1 means the NORMAL state, while a value of 2 means the FADE state. The detection errors are calculated as the difference between the true transition and the detected transition. For this particular sequence, the delay of 2 frames introduced some false alarms, while the delay of 3 frames gave correct detections. The false alarms happened on some zoom scenes, where the statistics happened to be similar to the fade case. For the other two synthetic sequences, both the delay of 2 frames and the delay of 3 frames gave correct detections. So a delay of 3 frames is in general sufficient for fade detection with the proposed algorithm.

[Plot for Figure 3, "Results for Trailer Sequence": PSNR (dB) vs. bit rate for No WP, WP with DC, and Proposed WP]

To evaluate the performance of the proposed parameter estimation algorithm, all three sequences are encoded using QP = 28, 32, 36, and 40 with an H.264 codec using I/P pictures only. Three methods are compared. In "No WP", no weighted prediction is used. In "WP with DC", the weight is estimated as the ratio of the luma DC values, while the offset is set to zero; this is the algorithm used in the JM encoder and is the most popular method. "Proposed WP" represents our proposed algorithm. The detection results with a delay of 3 frames are used to decide which pictures use weighted prediction for both "WP with DC" and "Proposed WP", so the only difference is the parameter estimation. Figure 3 and Figure 4 illustrate the rate-distortion (RD) performance of all the methods for "Trailer" and "High-Motion" respectively. Table 1 gives the average PSNR gain and bitrate savings using the measurement in [6]. It clearly shows that the proposed WP algorithm outperforms the traditional method, with 5%-30% bitrate savings. The gains are larger at lower bit rates and in higher-motion scenes.

    Figure 3: RD Performance for “Trailer”

[Plot for Figure 4, "Results for High-motion Sequence": PSNR (dB) vs. bit rate for No WP, WP with DC, and Proposed WP]

    6.  CONCLUSIONS

In this paper, an accurate weighted prediction parameter estimation algorithm and an efficient and robust fade detection algorithm were proposed. The algorithms use very simple statistics with low delay, making them suitable for practical real-time encoding systems. Simulation results show accurate detection and significant compression efficiency gains.

    Figure 4: RD Performance for "High-Motion"

Table 1: Performance comparison

Sequence    WP with DC              Proposed WP
            Bitrate(%)  PSNR(dB)    Bitrate(%)  PSNR(dB)
Trailer     -17.93      0.88        -22.78      1.10
Low-Mot     -20.02      1.12        -34.35      1.85
High-Mot     -9.86      0.54        -23.11      1.06

REFERENCES

[1] A. M. Alattar, "Detecting Fade Regions in Uncompressed Video Sequences", pp. 3025-3028, ICASSP 1997.

[2] X. Qian, G. Liu and R. Su, "Effective Fades and Flashlight Detection Based on Accumulating Histogram Differences", pp. 1245-1258, IEEE Transactions on CSVT, vol. 16, no. 10, 2006.

[3] J. Boyce, "Weighted Prediction in the H.264/MPEG4 AVC video coding standard", ISCAS, pp. 789-792, May 2004.

[Plot for Figure 2, "Fade Detection Results": transition state vs. picture number, showing True Transition, Delay 2 Detection Error, and Delay 3 Detection Error]

[4] JVT Reference Software, http://bs.hhi.de/~suehring/download

[5] P. Yin, A. Tourapis and J. Boyce, "Localized Weighted Prediction for Video Coding", pp. 4365-4368, ISCAS, May 2005.

[6] G. Bjontegaard, "Calculation of average PSNR differences between RD-Curves", document VCEG-M33, March 2001.

    Figure 2: Fade Detection Results for “Trailer”
