ACCURATE PARAMETER ESTIMATION AND EFFICIENT FADE DETECTION FOR
WEIGHTED PREDICTION IN H.264 VIDEO COMPRESSION
Rui Zhang* and Guy Cote**
Cisco Systems Inc, 170 West Tasman Drive, San Jose, CA 95134*
Apple, 1 Infinite Loop, Cupertino, CA 95014**
[email protected], [email protected]
ABSTRACT
Weighted prediction is a useful tool in video compression to
encode scenes with lighting changes, such as fading scenes.
Estimating weighted prediction parameters has been
extensively discussed in the literature; however, no
mathematical model has been proposed. Moreover, the
detection of the fading scenes in a real-time encoding system
has received little attention. This paper addresses both of these
aspects. An accurate parameter estimation algorithm for H.264
encoding is first derived for both the multiplicative factor and
the additive offset based on a fading model. An efficient
algorithm is then proposed to detect fade in a real-time
encoding system, with simple statistics calculations, very low
storage requirement, and low encoding delay. Simulation
results show very accurate detection and compression gains of
5-30% over existing techniques.
Index Terms— Weighted prediction, fade detection, video
compression, H.264
1. INTRODUCTION
Motion compensation is a major tool to achieve compression
efficiency in video coding systems, where the current picture is
predicted from a reference picture and only the prediction
difference is encoded. The more correlated the prediction
picture is to the current picture, the higher the compression
efficiency. However, in some video scenes, particularly
fading scenes, the current picture is more correlated to the
reference picture scaled by a weighting factor than to the
reference picture itself. Hence the weighted prediction (WP) is
a useful tool under such scenarios. Modern video coding
standards, such as H.264, have adopted WP to improve coding
efficiency in certain conditions.
In a real-time coding system, there are typically two steps
for WP. First, the fading scenes are detected; second, the WP parameters (a multiplicative factor and an additive offset in
H.264) are estimated. In a practical system, both tasks need to
be accomplished with low delay; simple calculations and low
storage requirement are also required.
For the fading detection problem, most of the algorithms in
the literature rely on a relatively long window of pictures to
observe enough statistics for an accurate detection. For
example, Alattar proposed a method exploiting the average
luminance changes and the semi-parabolic behavior of the variance
curve [1]; Qian et al. proposed an algorithm that exploits the
accumulating histogram differences [2]. However, such
methods require the availability of the statistics of the entire
fade duration, which introduces long delays and is impractical
in real-time encoding systems. In this paper, we focus on
algorithms that detect the fade in a very short window of
pictures, and that are robust to different conditions, such as motion.
For the parameter estimation problem, the simple and
empirical method of using pixel average values (DC) is often
used in the literature [3] and the H.264 reference software [4].
The multiplicative weighting factor is calculated as the ratio of
the DC values for the current picture and the reference picture;
the additive offset is set to zero. In this paper, an accurate
estimation of both the multiplicative weighting factor and the
additive offset is derived mathematically from the fade model.
The simulation results show that this accuracy in parameter
estimation reduces bit rate by 5%-30% for the same video
quality. This paper focuses only on uni-directional prediction
and global WP. The algorithm can be easily extended to
bi-directional prediction and localized WP [5].

The rest of the paper is organized as follows. An overview
of WP in an encoding system is presented in Section 2. An
accurate parameter estimation method is then derived from the
mathematical model of fade in Section 3. An efficient and
robust fade detection algorithm which uses simple statistics is
described in Section 4. Simulation results are presented in
Section 5.
2. WEIGHTED PREDICTION OVERVIEW
Figure 1 shows the procedure of applying WP in a real-time
encoding system. First, some statistics are generated through
video analysis. The statistics within a small window, from several previous pictures to the current picture, are then used
to detect fade. Each picture is assigned a state value indicating
if the picture is in the state of NORMAL or in the state of
FADE. Such state values are saved for each picture. When
encoding a picture, if there is a FADE state in either the current
picture or one of its reference pictures, the WP will be used for
978-1-4244-1764-3/08/$25.00 ©2008 IEEE, ICIP 2008
this current-reference pair, and statistics of the current picture and
the corresponding reference picture are processed to estimate
the WP parameters. These parameters are then passed on to the
encoding engine. Otherwise the normal encoding is done.
Figure 1: Weighted Prediction Workflow Chart
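As a rough sketch of the decision step in Figure 1, the pairwise rule (WP is used whenever either picture of a current/reference pair is in the FADE state) can be written as follows; the state constants and helper function are our illustration, not code from the paper:

```python
# Illustrative sketch of the per-picture WP decision described in Section 2.
# The NORMAL/FADE values and the helper name are assumptions for illustration.
NORMAL, FADE = 0, 1

def decide_wp(cur_state, ref_states):
    """Return, for each reference picture, whether WP should be used:
    True if either the current picture or that reference is in FADE."""
    return [cur_state == FADE or ref == FADE for ref in ref_states]

# Example: current picture NORMAL, first reference FADE, second NORMAL.
flags = decide_wp(NORMAL, [FADE, NORMAL])
```

For each pair flagged True, the WP parameters of Section 3 would be estimated and passed to the encoding engine; otherwise normal encoding is done.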
3. PARAMETER ESTIMATION
This section first describes the general mathematical model of
fading scenes. The proposed parameter estimation algorithm is
then derived.
First consider the following fade model. Let f(t,i,j) denote the pixel value at position (i,j) in frame t of one original sequence f, and g(t,i,j) denote the pixel value at position (i,j) in frame t of another original sequence g. The linear combination of these two sequences within one particular period T is represented as:

F(t,i,j) = α(t) f(t,i,j) + β(t) g(t,i,j)    (3-1)

where α(t) + β(t) = 1. When g is a solid color and the weighting factor α(t) is getting larger (smaller), it is called a fade in (out) of f.
Now consider the WP model. For weighted uni-prediction, when the pixel at position (i,j) in frame t is predicted from the pixel at position (m,n) in frame t-1, the following relationship is assumed:

F(t,i,j) = w(t) F(t-1,m,n) + o(t)    (3-2)

From the fade model, assuming the motion-compensated original pixels match, i.e., f(t,i,j) = f(t-1,m,n), we can derive

F(t,i,j) = [α(t)/α(t-1)] F(t-1,m,n) + [β(t) g(t,i,j) - (α(t)/α(t-1)) β(t-1) g(t-1,m,n)]    (3-3)

Hence only when g is a solid color C, i.e., its values are the same regardless of time and location, can we match the WP model exactly, with

w(t) = α(t)/α(t-1)
o(t) = [β(t) - (α(t)/α(t-1)) β(t-1)] C    (3-4)
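As a numerical sanity check of the derivation above (with arbitrary illustrative values, not data from the paper), one can verify that when g is a solid color C, the fade model reproduces the WP model with w(t) and o(t) as in Equation (3-4):

```python
# Check that F(t) = w(t) F(t-1) + o(t) holds exactly when g is a
# solid color C. All numeric values below are arbitrary choices.
alpha_prev, alpha_cur = 0.8, 0.6          # alpha(t-1), alpha(t)
beta_prev, beta_cur = 1 - alpha_prev, 1 - alpha_cur
C = 16.0                                  # solid color value
f_pixel = 100.0                           # f(t,i,j) = f(t-1,m,n)

F_prev = alpha_prev * f_pixel + beta_prev * C   # fade model, frame t-1
F_cur = alpha_cur * f_pixel + beta_cur * C      # fade model, frame t

w = alpha_cur / alpha_prev                      # Equation (3-4)
o = (beta_cur - w * beta_prev) * C

assert abs(F_cur - (w * F_prev + o)) < 1e-9     # WP model matches exactly
```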
Note that α(t), β(t), and C are all unknown to the encoder. We have to estimate w(t) and o(t) from statistics observed on the fading scenes.
Now we derive the parameter estimation method for the fade case. Assuming the signals are ergodic, the mean and variance of the original sequence f can be defined as:

m_f(t) = mean(f(t)) = m_f
σ_f^2(t) = variance(f(t)) = σ_f^2    (3-5)

The mean and variance of the combined signal can be defined as:

M(t) = mean(F(t))
σ_F^2(t) = variance(F(t))    (3-6)
Then we have:

M(t) = α(t) m_f + (1 - α(t)) C
M(t-1) = α(t-1) m_f + (1 - α(t-1)) C    (3-7)

and

σ_F^2(t) = α^2(t) σ_f^2
σ_F^2(t-1) = α^2(t-1) σ_f^2    (3-8)
Therefore we can derive the weight using

w(t) = α(t)/α(t-1) = (M(t) - C)/(M(t-1) - C)    (3-9)

or

w(t) = α(t)/α(t-1) = σ_F(t)/σ_F(t-1)    (3-10)

Since the solid color value C is generally unknown to the encoder, using the square root of the variance ratio, as in (3-10), makes a more accurate and robust estimation.

After the weight is derived, the offset can be easily calculated as:

o(t) = M(t) - w(t) M(t-1)    (3-11)

In H.264, after a fade is detected and WP is to be used, the parameters are calculated for each pair of the current picture and a reference picture.
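The estimation in Equations (3-10) and (3-11) amounts to a few arithmetic operations per current/reference pair. A minimal sketch, assuming the per-picture luma mean and variance are already available from the video analysis stage (the function name and the zero-variance guard are ours):

```python
import math

def estimate_wp_params(mean_cur, var_cur, mean_ref, var_ref):
    """Weight from the standard-deviation ratio, Equation (3-10);
    offset from Equation (3-11). Falls back to w = 1 if the
    reference variance is zero (flat picture), an assumed guard."""
    w = math.sqrt(var_cur / var_ref) if var_ref > 0 else 1.0
    o = mean_cur - w * mean_ref
    return w, o

# Example: during a fade out the variance shrinks 4x, so w = 0.5.
w, o = estimate_wp_params(mean_cur=60.0, var_cur=100.0,
                          mean_ref=90.0, var_ref=400.0)
```

In an actual H.264 encoder, w and o would still need to be quantized to the integer weight/offset representation signaled in the slice header; that step is omitted here.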
4. FADE DETECTION
Fading effects result in lighting changes, which are reflected in both the luma average values M(t) and the luma variance values σ_F^2(t). We propose to check both of these statistics to achieve simple yet efficient and robust detection.

First we look at the first-order derivative of the luma average values for each picture. From Equation (3-7) we have:

ΔM(t) = M(t) - M(t-1) = (α(t) - α(t-1))(m_f - C)    (4-1)

For a linear fading model where α(t) = t/T, ΔM(t) is a constant value [1]. For more general cases, ΔM(t) should have the same sign during the fade. For example, for a fade out of signal f into a black scene, (m_f - C) is always greater than zero, while (α(t) - α(t-1)) is always less than zero, hence ΔM(t) is always less than zero. Furthermore, a fade is always a steady change between pictures, i.e., the changes between adjacent frames are very similar, so we expect the second derivative of the luma average values, Δ²M(t) = ΔM(t) - ΔM(t-1), to be close to zero.
We also define the ratio of the luma variances for two adjacent pictures as:

r(t) = σ_F^2(t)/σ_F^2(t-1) = α^2(t)/α^2(t-1)    (4-2)

It is obvious that for a fade out of f, r(t) is always less than one, while for a fade in of f, it is always greater than one. To avoid false alarms of entering the fading mode, we also expect real fading changes between the pictures, i.e., r(t) should be some distance away from one when entering from the NORMAL mode into the FADE mode.
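The detector therefore needs only the first and second differences of the luma mean and the adjacent variance ratios r(t), all cheap to compute from per-picture statistics. A sketch, with our own function name and a synthetic linear fade to black as the example:

```python
def fade_statistics(means, variances):
    """First differences of the luma mean (Delta M), second differences
    (Delta^2 M), and adjacent variance ratios r, per Equation (4-2)."""
    d_mean = [b - a for a, b in zip(means, means[1:])]
    dd_mean = [b - a for a, b in zip(d_mean, d_mean[1:])]
    ratios = [b / a for a, b in zip(variances, variances[1:])]
    return d_mean, dd_mean, ratios

# Synthetic linear fade to black: mean falls steadily, variance shrinks.
d, dd, r = fade_statistics([80.0, 60.0, 40.0, 20.0],
                           [400.0, 225.0, 100.0, 25.0])
# d is constant (-20 each step), dd is zero, and every r is below one,
# exactly the signature the criteria below look for.
```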
Fading is a continuous behavior. It should be detected using
a window of pictures. In the following representation, statistics
of N frames are used; N = 1 means only the current picture's statistics are used. This implies a delay of N-1 frames between the video analysis and the encoding. In a
practical encoding system, only a short window is allowed to
achieve low delay. For example, with a hierarchical B picture
GOP (Group of Pictures) structure of IbBbP, where B is a
reference bi-directional picture, and b is a non-reference bi-
directional picture, N can be set as 4 without introducing
further delay.
In summary, we define the following criteria for the fade
detection, using the above functions of the two statistics. For
each current picture, its state is initialized as NORMAL. Only
if all of the criteria are satisfied is a fade declared. The state of
the current picture and the states of the pictures in the past N-1
frames are then set as FADE.
1. Detect a luminance level change (picture getting brighter or darker) among the past N frames, i.e., ΔM(t), ΔM(t-1), ..., ΔM(t-N+1) all have the same sign.
2. Detect a steady change between pictures (the changes between adjacent frames are similar), i.e., Δ²M(t), Δ²M(t-1), ..., Δ²M(t-N+2) are all within a threshold MAX_DELTA_DELTA_DC.
3. Detect a consistent change of the luma variance (continuously larger than one or continuously less than one) among the past N frames, i.e., r(t)-1, r(t-1)-1, ..., r(t-N+1)-1 all have the same sign.
4. Detect a noticeable change in the ratio of variances, i.e., all of r(t), r(t-1), ..., r(t-N+1) are less than a threshold FADE_MIN_VAR_RATIO or greater than a threshold FADE_MAX_VAR_RATIO. This criterion is only checked if the previous frame t-1 is in the NORMAL state, to avoid false alarms of entering the FADE state.
Default values for MAX_DELTA_DELTA_DC, FADE_MIN_VAR_RATIO, and FADE_MAX_VAR_RATIO have been determined experimentally as 10, 0.96, and 1.05, respectively.

When all of the above criteria are satisfied, the states of frames t, t-1, ..., t-N+1 are all set as FADE. Note that because a fade is a continuous behavior, the states of all the frames in this N-frame window are set at the same time; also note that the delay happens during the fade detection. A frame can transition from the NORMAL state to the FADE state, but once a frame is in the FADE state it stays there (i.e., a transition from FADE back to NORMAL is not allowed for the same frame). For example, when frames 0-3 are analyzed and the above criteria are not satisfied, their states remain NORMAL; but when frames 1-4 are analyzed and the criteria are satisfied, the states of frames 1-4 are all set as FADE, reflecting the entry into FADE at frame 1. Then, when frames 2-5 are analyzed and the criteria are not satisfied, frame 5 is in the NORMAL state, but frames 2-4 remain in the FADE state, reflecting the exit from FADE at frame 5.
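Putting the four criteria together, the per-window check can be sketched as follows, assuming the window statistics are already available; the function and argument names are ours, and the thresholds are the experimental defaults quoted above:

```python
MAX_DELTA_DELTA_DC = 10.0
FADE_MIN_VAR_RATIO = 0.96
FADE_MAX_VAR_RATIO = 1.05

def detect_fade(d_mean, dd_mean, var_ratio, prev_frame_normal):
    """Return True if all four fade criteria hold over the window.
    d_mean: N first differences of the luma mean; dd_mean: their
    N-1 second differences; var_ratio: N adjacent variance ratios."""
    # 1. Luminance changes in one consistent direction.
    monotone = all(d > 0 for d in d_mean) or all(d < 0 for d in d_mean)
    # 2. The change is steady between adjacent pictures.
    steady = all(abs(dd) <= MAX_DELTA_DELTA_DC for dd in dd_mean)
    # 3. The variance ratio stays on one side of one.
    one_sided = (all(x > 1 for x in var_ratio)
                 or all(x < 1 for x in var_ratio))
    # 4. Only when entering from NORMAL: the change is noticeable.
    noticeable = (not prev_frame_normal) or all(
        x < FADE_MIN_VAR_RATIO or x > FADE_MAX_VAR_RATIO
        for x in var_ratio)
    return monotone and steady and one_sided and noticeable

# A synthetic fade-to-black window triggers detection.
fading = detect_fade([-20.0, -20.0, -20.0], [0.0, 0.0],
                     [0.5625, 0.4444, 0.25], prev_frame_normal=True)
```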
After the fade detection, the decision to use WP or not is made. If any of the reference pictures or the current picture is in the FADE state, then WP is used. For each reference picture in the prediction list, if its state or the state of the current picture is FADE, the WP parameters for this pair are calculated and transmitted in the bitstream.
5. SIMULATION RESULTS
Three sequences were used in the simulation. "Trailer" is a 480x204 sequence from a movie trailer with a fade out; "Low-Motion" and "High-Motion" are synthetically generated 720x480 sequences with both fade in and fade out, containing low-motion and high-motion scenes respectively.
To evaluate the effectiveness of the fade detection
algorithms with different delays, delay of 2 frames and delay of
3 frames are simulated. Both are short windows and suitable for
real-time encoding systems. Figure 2 shows the detection results for the Trailer sequence. In "True Transition", value 1 means a NORMAL state, while value 2 means a FADE state. The detection errors are calculated as the difference between the true transition and the detected transition. For this particular sequence, a delay of 2 frames introduced some false alarms, while a delay of 3 frames gave the correct detections. The false alarms happened on some zoom scenes, where the statistics happened to be similar to the fade case. For the other two synthetic
sequences, both the delay of 2 frames and the delay of 3 frames gave correct detections. A delay of 3 frames is thus in general sufficient for fade detection with the proposed algorithm.
[Figure 3 plot: PSNR (dB) versus bit rate for the Trailer sequence, comparing No WP, WP with DC, and Proposed WP.]
To evaluate the performance of the proposed parameter estimation algorithm, all three sequences are encoded at QP = 28, 32, 36, and 40 with an H.264 codec using I/P pictures only. Three methods are compared. In "No WP", no weighted prediction is used. In "WP with DC", the weight is estimated as the ratio of the luma DC values, while the offset is set to zero; this is the algorithm used in the JM encoder and is the most popular method. "Proposed WP" represents our proposed algorithm. The detection results with a delay of 3 frames are used to decide which pictures use weighted prediction for both "WP with DC" and "Proposed WP", so the only difference is the parameter estimation. Figure 3 and Figure 4 illustrate the rate-distortion (RD) performance of all the methods for "Trailer" and "High-Motion" respectively. Table 1 gives the average PSNR gain and bitrate savings using the measurement in [6]. It clearly shows that the proposed WP algorithm outperforms the traditional methods with 5%-30% bitrate savings. The gains are bigger at lower bit rates and in higher-motion scenes.
Figure 3: RD Performance for “Trailer”
[Figure 4 plot: PSNR (dB) versus bit rate for the High-Motion sequence, comparing No WP, WP with DC, and Proposed WP.]
6. CONCLUSIONS
In this paper, an accurate weighted prediction parameter
estimation algorithm and an efficient, robust fade detection
algorithm were proposed. The algorithms use very simple
statistics with low delay, which is suitable for practical real-
time encoding systems. Simulation results show accurate
detection results and significant compression efficiency gains.
Figure 4: RD Performance for "High-Motion"
Table 1: Performance comparison

Sequence    WP with DC             Proposed WP
            Bitrate(%)  PSNR(dB)   Bitrate(%)  PSNR(dB)
Trailer     -17.93      0.88       -22.78      1.10
Low-Mot     -20.02      1.12       -34.35      1.85
High-Mot    -9.86       0.54       -23.11      1.06

REFERENCES
[1] A. M. Alattar, "Detecting Fade Regions in Uncompressed Video Sequences", ICASSP, pp. 3025-3028, 1997.
[2] X. Qian, G. Liu and R. Su, "Effective Fades and Flashlight Detection Based on Accumulating Histogram Differences", IEEE Transactions on CSVT, vol. 16, no. 10, pp. 1245-1258, 2006.
[3] J. Boyce, "Weighted Prediction in the H.264/MPEG4 AVC Video Coding Standard," ISCAS, pp. 789-792, May 2004.
[4] JVT Reference Software,
http://bs.hhi.de/~suehring/download
[5] P. Yin, A. Tourapis and J. Boyce, "Localized Weighted Prediction for Video Coding", ISCAS, pp. 4365-4368, May 2005.
[6] G. Bjontegaard, "Calculation of Average PSNR Differences between RD-Curves", document VCEG-M33, March 2001.
Figure 2: Fade Detection Results for “Trailer”