

CHAPTER 3

LINEAR PREDICTION TECHNIQUES FOR VBR VIDEO

TRAFFIC

3.1 INTRODUCTION

Optimal allocation of network resources for streaming of

multimedia content is a major challenge facing the research community today.

Videos encoded using popular compression standards such as MPEG-4 and

H.264 generate variable bit rate (VBR) traffic. Allocating fixed bandwidth

for such traffic would either degrade the quality of service or lead to under-

utilization of the network resources. Instead, the network bandwidth can be

allocated dynamically based on the nature of data currently being transmitted.

Traffic Prediction is the process of predicting future network traffic

based on the characteristics of the past traffic. Optimal dynamic allocation of

network resources is possible if future traffic content can be predicted in

advance. Since predictions have to be done in real-time, the technique used

must be simple, fast and accurate.

Linear Prediction techniques are simple and efficient, and thus are

suitable for real-time predictions. These techniques are based on estimating

the future traffic patterns as a linear weighted sum of past traffic patterns.

Initially, a suitable mathematical model is identified for the observed past
traffic, and the model parameters are then changed dynamically (based on
criteria such as prediction error or scene changes) so as to adapt to
changes in the future traffic behavior. Predictions are generally done at the frame
level or the GOP level. Many researchers have focused on predicting
I-frame traffic, and only occasionally on P- or B-frame traffic.

Most of the existing techniques are based on linear regression (AR,
DAR, ARMA, ARIMA, etc.), and a few are based on simple adaptive filters
such as LMS (Least Mean Square), NLMS (Normalized Least Mean Square) and
RLS (Recursive Least Squares). Model parameters are generally estimated
using one of the Yule-Walker, Levinson-Durbin, Least Squares,
Maximum Entropy or Maximum Likelihood methods (Porat 1994).

The accuracy of the linear predictor depends on the number of past

observations used in prediction, which is called the prediction order. Higher

order predictors might give better prediction results, but at the expense of

increased computational complexity. Moreover, MPEG encoded traces

exhibit both SRD (Short Range Dependent) and LRD (Long Range

Dependent) behaviors. Thus, for real-time predictions, lower order predictors

are preferred over higher order predictors, as these are simple and also show

better performance if appropriately used. All of the regression based models

can be classified under SRD models.

Accurately predicting traffic with unknown behavior is an acid test
for predictors. MPEG encoded VBR video traces exhibit a high level of
burstiness, which is indicative of large and sudden traffic fluctuations.
A higher level of burstiness is generally attributed to changes in the content and
activity of the video. Smoothing the traffic prior to prediction can improve
prediction performance. Traffic smoothing can be achieved by
aggregation, differencing, removing sudden peaks, etc.


Almost all of the predictors perform well when the traffic variations
are small. But if the variations are large and show no trend, the forecast
becomes uncertain and prediction performance decreases. ARIMA based
models are understood to perform better in such situations. The performance
of a prediction model is evaluated by comparing the characteristics
of the predicted trace with those of the empirical trace, using any of QQ
plots, RPE, ACF comparison, histogram comparison, SNR$^{-1}$, RMSE, leaky
bucket simulations, NMSE, etc.

A brief discussion on some of the popular existing techniques is

given in Section 2.1.3.

3.2 PROPOSED WORK

3.2.1 Moving Average Based Predictors for VBR Video Traffic

The aim is to achieve high accuracy in online VBR traffic

prediction at reduced complexity. Here some new prediction techniques are

proposed, which are equivalent in complexity to a second order linear

predictor, but perform much better. This work focuses on real-time prediction

of video traffic encoded in MPEG-4 standard.

An MPEG-4 encoded video is made up of units called groups of

pictures (GOPs), which consist of I, P and B frames. The GOP-level traffic

can be considered as an aggregation of the frame-level traffic. The focus here

is at single-step-ahead GOP size prediction, where the size of the next GOP in

the video traffic sequence is predicted using the previously encountered GOP

sizes.

New prediction techniques for MPEG-4 encoded variable bit rate

video traffic are proposed based on the concept of moving average and the


discussion is extended further using the gradient-descent approach. The

resultant predictors are both simple and accurate, and suitable for real-time

prediction. Here, NLMS technique is used as a base-line predictor for

comparative study and a performance improvement of about 11% is achieved.

3.2.1.1 Simple linear prediction

As stated in Section 2.1.1, given two consecutive GOP sizes $G_{i-2}$
and $G_{i-1}$, the value of $G_i$ is predicted as follows:

Linear Predictor: $G_i = \alpha G_{i-1} + \beta G_{i-2}$, (3.1)

where the coefficients α and β depend on the characteristics of the traffic and

are determined through the method of least squares.

The least squares method determines the values of α and β, for

which the sum of squares of prediction errors is minimum. This method

requires the entire video trace to be known in advance, and hence is not

suitable for real-time prediction. On the other hand, the LMS filter (Yoo

2002) is a real-time approach to prediction, where the coefficients are

adaptively set after each prediction step based on the prediction error in that

step. Another issue with the linear predictor given above is that it assumes the
size of every GOP to be linearly dependent on the sizes of the previous two
GOPs and ignores its dependence on all other GOP sizes. In other words, the

predictor considers only the local traffic characteristics and ignores the

average traffic characteristics.
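The offline least squares fit of Equation (3.1) can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the function name `fit_linear_predictor` and the direct solution of the 2x2 normal equations are assumptions made for the example. It needs the whole trace up front, which is exactly why the method is unsuitable for real-time prediction.

```python
def fit_linear_predictor(gops):
    """Least-squares fit of G[i] ~ alpha*G[i-1] + beta*G[i-2] over a whole trace."""
    # Accumulate the entries of the 2x2 normal equations.
    s11 = s12 = s22 = b1 = b2 = 0.0
    for i in range(2, len(gops)):
        x1, x2, y = gops[i - 1], gops[i - 2], gops[i]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        b1 += x1 * y
        b2 += x2 * y
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("trace is too short or degenerate for a unique fit")
    # Cramer's rule for [[s11, s12], [s12, s22]] @ [alpha, beta] = [b1, b2].
    alpha = (b1 * s22 - b2 * s12) / det
    beta = (b2 * s11 - b1 * s12) / det
    return alpha, beta
```

On a synthetic trace generated exactly by a second-order recursion, the fit recovers the generating coefficients.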

The proposed new traffic predictors exploit the average

characteristics of the video traffic. The method is applied to online traffic

prediction and is found to outperform the conventional linear predictors.


3.2.1.2 Moving average predictors

Based on the average measures as discussed in Section 2.1.3.3, two

new predictors for GOP size prediction are proposed here. The size of the

next GOP is predicted as a linear combination of the current GOP size and

one of the moving average measures. Given a GOP size $G_k$, the value of $G_{k+1}$
is predicted as follows:

SMA Predictor: $G_{k+1} = \alpha \times G_k + \beta \times \mathrm{SMA}_k$ (3.2)

EMA Predictor: $G_{k+1} = \alpha \times G_k + \beta \times \mathrm{EMA}_k$ (3.3)

In the next section, it is shown that through reasonable

approximations, these predictors can be simplified further.
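A small sketch of Equations (3.2) and (3.3) is given below. The definitions of SMA and EMA belong to Section 2.1.3.3 and are not reproduced in this chapter, so the textbook forms are assumed here: an average over a window of N values, and a recursive average with smoothing factor δ. The helper names and the choice of which GOP sizes are passed in as history are illustrative assumptions.

```python
def sma(history, n):
    """Simple moving average of the last n values in history."""
    window = history[-n:]
    return sum(window) / len(window)


def ema_update(prev_ema, value, delta):
    """One exponential-moving-average step with smoothing factor delta."""
    return delta * value + (1.0 - delta) * prev_ema


def predict_next(g_k, ma_k, alpha, beta):
    """Equations (3.2)/(3.3): next GOP size from the current size and a moving average."""
    return alpha * g_k + beta * ma_k
```

For example, `predict_next(200, sma([90, 110], 2), 0.7, 0.3)` combines the current GOP size 200 with the average 100 of the two earlier sizes.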

3.2.1.3 Shot change detection

A shot is a group of consecutive GOPs having similar sizes. The
accuracy of GOP prediction can be improved by using a separate predictor
for each shot. In the case of the second-order linear predictor, the values of α and
β can be varied for each shot (Kwong and Johnston 1992). A simple
algorithm for shot boundary detection is given in Figure 3.1.

A shot boundary is declared when the standard deviation (SD) of
the GOP sizes since the last shot boundary exceeds a given threshold T.
The smaller the value of T, the greater the number of shots and so the higher
the prediction accuracy. For the experiments, the value of T was set to
20000 bytes.
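The shot boundary rule described above (Figure 3.1) can be sketched in Python as follows. This is an illustrative reading of the algorithm rather than the original implementation: the population form of the standard deviation and the function name `shot_boundaries` are assumptions.

```python
import math


def shot_boundaries(gop_sizes, threshold):
    """Return 1-based GOP indices declared as shot boundaries (Figure 3.1)."""
    boundaries = []
    prev = 0  # GOPs prev+1..k (1-based) form the current shot
    for k in range(1, len(gop_sizes) + 1):
        shot = gop_sizes[prev:k]
        mean = sum(shot) / len(shot)
        sd = math.sqrt(sum((g - mean) ** 2 for g in shot) / len(shot))
        if sd > threshold:
            boundaries.append(k)
            prev = k - 1  # the new shot starts at GOP k
    return boundaries
```

With a threshold of 50, the toy trace `[100, 110, 105, 500, 510, 505, 100]` yields boundaries at GOPs 4 and 7, where the size level jumps.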

3.2.1.4 α-Predictors

Algorithm: Shot_Boundary

    prev = 0
    for k = 1 to no_of_gops
        $\bar{G} = \frac{1}{k - prev} \sum_{i=prev+1}^{k} G_i$
        $SD = \sqrt{\frac{1}{k - prev} \sum_{i=prev+1}^{k} (G_i - \bar{G})^2}$
        if SD > T
            Declare ‘k’ as a shot boundary
            prev = k-1
        end if
    end for

Figure 3.1 Algorithm for Shot Boundary Detection

Figure 3.2 Plot of alpha (α) vs. beta (β) for Tokyo Olympics (MQ)

The shot detection algorithm was applied to an MPEG-4 video trace
of Tokyo Olympics encoded in medium quality. The parameter T in the shot

detection algorithm was set to 20000 bytes. For each shot detected, the values

of the linear predictor coefficients α and β were determined using the method

of least squares. It was found that there was a high negative correlation
between α and β, with the coefficient of determination equal to 0.940994. The
plot of α vs. β, shown in Figure 3.2, fits a straight line closely. The
corresponding equation is 1.008284 α + 1.020820 β = 1, which can be
approximated by α + β = 1. It is therefore reasonable to approximate β by 1 – α.
Based on this observation, the proposed moving average predictors are
modified to arrive at simpler predictors. These simplified
predictors will from now on be referred to as α-SMA and α-EMA. Given a GOP
size $G_k$, the value of $G_{k+1}$ is predicted as follows:

α-SMA Predictor: $G_{k+1} = \alpha \times G_k + (1 - \alpha) \times \mathrm{SMA}_k$ (3.4)

α-EMA Predictor: $G_{k+1} = \alpha \times G_k + (1 - \alpha) \times \mathrm{EMA}_k$ (3.5)

In addition, a modification to the linear predictor, denoted as the
α-Linear predictor, is also introduced,

α-Linear Predictor: $G_{k+1} = \alpha \times G_k + (1 - \alpha) \times G_{k-1}$ (3.6)

and its performance is analyzed.
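Equations (3.4) to (3.6) share one form, which may be easier to read when rearranged. The lines below are only a restatement, with $\mathrm{MA}_k$ standing for $\mathrm{SMA}_k$, $\mathrm{EMA}_k$ or $G_{k-1}$ as appropriate:

```latex
\begin{align*}
G_{k+1} &= \alpha\, G_k + (1-\alpha)\,\mathrm{MA}_k \\
        &= \mathrm{MA}_k + \alpha\,(G_k - \mathrm{MA}_k)
\end{align*}
```

Read this way, α = 0 predicts the moving average, α = 1 predicts the current GOP size, and intermediate values interpolate between the two. The factor $(G_k - \mathrm{MA}_k)$ is also the quantity that multiplies the error in the online update of α.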

3.2.1.5 Real-Time prediction

The performance of the proposed predictors is analyzed on two
prediction tasks, namely, offline traffic prediction, where the entire video
trace is known in advance, and online traffic prediction, where the predictor
coefficients need to be learnt in real-time. For online traffic prediction,
an approach similar to that of the Normalized Least Mean Square (NLMS) filter
(Yoo 2002) is used.

An online version of the α-moving average predictors based on
the NLMS predictor is presented first. The gradient-descent approach is used
for optimization. Initially, the coefficient α is set to 0, and after each prediction
step it is adjusted so as to minimize the mean square error.

Given GOP size $G_k$ and moving average $\mathrm{MA}_k$, the prediction error
in predicting $G_{k+1}$ is given by

$\mathrm{err}_k = G_{k+1} - (\alpha_k \times G_k + (1 - \alpha_k) \times \mathrm{MA}_k)$ (3.7)

where $\alpha_k$ is the value of the predictor coefficient at instant k.

The mean square error (MSE) is given by

$\xi_k = E\{\mathrm{err}_k^2\}$ (3.8)

where E{.} denotes expected value. The partial derivative of the MSE
function with respect to the coefficient $\alpha_k$ is

$\nabla \xi_k = \nabla E\{\mathrm{err}_k^2\} = -2E\{\mathrm{err}_k (G_k - \mathrm{MA}_k)\}$ (3.9)

The coefficient is then updated by taking steps proportional to the
negative of the gradient as follows:

$\alpha_{k+1} = \alpha_k - \frac{\mu}{2} \nabla \xi_k = \alpha_k + \mu E\{\mathrm{err}_k (G_k - \mathrm{MA}_k)\}$ (3.10)

where 0 < µ < 2 is called the step-size and determines the convergence of the
algorithm.

By approximating the expectation term by its instantaneous value, we get

$\alpha_{k+1} = \alpha_k + \mu \, \mathrm{err}_k (G_k - \mathrm{MA}_k)$. (3.11)

Generally, a normalized version of this update equation is used, as given
below:

$\alpha_{k+1} = \alpha_k + \frac{\mu \times \mathrm{err}_k \times (G_k - \mathrm{MA}_k)}{G_k^2 + \mathrm{MA}_k^2}$ (3.12)
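The online α-moving average predictor can be sketched as a short loop built on Equations (3.7) and (3.12). This is a minimal illustration under stated assumptions: the moving averages are supplied by the caller, the default step-size of 0.5 is arbitrary, and the guard against a zero denominator is an addition for robustness that the text does not discuss.

```python
def online_alpha_ma(gops, moving_avgs, mu=0.5):
    """One-step-ahead predictions using the normalized update of Equation (3.12).

    gops[k] is G_k and moving_avgs[k] is MA_k (SMA or EMA, computed by the caller).
    Returns the predictions for gops[1:], in order.
    """
    alpha = 0.0  # initial coefficient, as stated in the text
    predictions = []
    for k in range(len(gops) - 1):
        g, ma = gops[k], moving_avgs[k]
        pred = alpha * g + (1.0 - alpha) * ma      # Equations (3.4)/(3.5)
        predictions.append(pred)
        err = gops[k + 1] - pred                   # Equation (3.7)
        norm = g * g + ma * ma
        if norm > 0:                               # skip the update on an all-zero input
            alpha += mu * err * (g - ma) / norm    # Equation (3.12)
    return predictions
```

Because α starts at 0, the first prediction is simply the moving average; α moves away from 0 only when the current GOP size differs from the average and the prediction was wrong.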

In a similar manner, an online version of the α-linear predictor
requires the following update equation:

$\alpha_{k+1} = \alpha_k + \frac{\mu \times \mathrm{err}_k \times (G_k - G_{k-1})}{G_k^2 + G_{k-1}^2}$ (3.13)

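In the case of the EMA and SMA predictors with two coefficients,
the online versions contain two update equations, one for each coefficient, as given below:

$\alpha_{k+1} = \alpha_k + \frac{\mu \times \mathrm{err}_k \times G_k}{G_k^2 + \mathrm{MA}_k^2}$ (3.14)

$\beta_{k+1} = \beta_k + \frac{\mu \times \mathrm{err}_k \times \mathrm{MA}_k}{G_k^2 + \mathrm{MA}_k^2}$ (3.15)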
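A single step of the two-coefficient online update can be sketched as below. It assumes that the second update applies to β, the coefficient that multiplies the moving average; the function name, the default step-size and the zero-input guard are illustrative choices.

```python
def update_two_coefficients(alpha, beta, g_k, ma_k, g_next, mu=0.5):
    """One normalized update of (alpha, beta) after observing g_next."""
    err = g_next - (alpha * g_k + beta * ma_k)   # prediction error for G_{k+1}
    norm = g_k * g_k + ma_k * ma_k
    if norm == 0:
        return alpha, beta                       # nothing to learn from a zero input
    alpha += mu * err * g_k / norm               # Equation (3.14)
    beta += mu * err * ma_k / norm               # Equation (3.15)
    return alpha, beta
```

With µ = 1 the updated coefficients reproduce the sample just observed, which is a general property of normalized LMS and a convenient sanity check.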

3.2.1.6 Experiments and results

The proposed predictors were tested on MPEG-4 video traces from

the video trace library and their performances were compared with that of the

linear predictor in (Lanfranchi and Bing 2008) and the NLMS predictor.

Three videos were considered for the experiment, namely Star Wars IV, NBC

News and Tokyo Olympics. In all these videos, the frame rate was 30 fps and

each GOP was encoded in G16B1 pattern. Table 3.1 contains the detailed

video characteristics for the video traces encoded in low (QP = 28), medium

(QP = 12) and high (QP = 2) quality, where QP is the quantization parameter

used in encoding.

Table 3.1 Video Characteristics

Quality  QP  Frame        Mean frame  S.D. of    Mean frame  Peak frame  Frame
             compression  size        frame      bit rate    bit rate    peak/mean
             ratio        (bytes)     sizes      (bit/s)     (bit/s)     ratio

Star Wars IV (CIF 352x288: 53953 frames)
High      2   39.47       3852.25     32319.06     924540     8665680     9.37
Medium   12  240.74        631.65      6847.82     151595     2478240    16.35
Low      28  329.94        460.89      3955.55     110613     1495920    13.52

NBC News (CIF 352x288: 49523 frames)
High      2   10.99      13839.02     56641.99    3321367    14670480     4.42
Medium   12   93.94       1618.79     14378.43     388510     4074000    10.49
Low      28  168.09        904.66      7110.84     217118     2040960     9.40

Tokyo Olympics (CIF 352x288: 133127 frames)
High      2   19.60       7757.12     56863.20    1861705    18666000    10.03
Medium   12  124.52       1221.19     10418.80     293084     4393200    14.99
Low      28  193.22        787         4966.64     188879     1755120     9.29

It can be seen that the peak-to-mean ratio of the frame sizes, which
is representative of the burstiness of a video, is maximum for the traces
with medium quality encoding. To analyze the performance of the proposed
predictors, the Relative Percentage Error (RPE) given in (Lanfranchi and Bing
2008) is used:

$RPE = \frac{\sum_{i=1}^{L} |\varepsilon_i|}{\sum_{i=1}^{L} G_i}$ (3.16)

where L is the total number of GOPs predicted and $\varepsilon_i$ is the error
corresponding to the i-th predicted GOP, whose size is given by $G_i$.
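The RPE metric of Equation (3.16) takes only a few lines to compute. The sketch below assumes that absolute errors are summed and that the ratio is scaled by 100 to give a percentage, which matches how the error tables report their values; the function name is illustrative.

```python
def relative_percentage_error(actual, predicted):
    """RPE of Equation (3.16), scaled to a percentage."""
    total_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    return 100.0 * total_error / sum(actual)
```

For actual sizes `[100, 200, 100]` and predictions `[90, 220, 100]`, the summed absolute error is 30 against a total size of 400, giving an RPE of 7.5.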

Offline traffic prediction

The performance of the SMA and α-SMA predictors is evaluated in
offline video traffic prediction, where the least squares method was used for
determining the coefficients. From the plot in Figure 3.3, which corresponds
to the Tokyo Olympics (MQ) trace, it is seen that the SMA predictor
outperforms the α-SMA predictor for all values of N. Also, at N = 4, the RPE
is minimum for both predictors and lower than the RPE of the linear and
α-linear predictors.

Similarly, in Figure 3.4, the EMA predictor outperforms the
α-EMA predictor, and for 0.25 < δ < 0.68 both these predictors
outperform the linear and α-linear predictors. Also, note that as δ tends to
1, the performance of the EMA predictor approaches that of the linear
predictor and the performance of the α-EMA predictor approaches that of the
α-linear predictor. The error statistics for the various offline predictors are
tabulated in Tables 3.2 and 3.3. For each video trace, the statistics reported
correspond to the parameter values that gave the minimum RPE. For the EMA and
α-EMA predictors, the value of δ that gave the least RPE is denoted by δmin. The
average values of δmin for these predictors are 0.414 and 0.373 respectively.
The low values of δmin indicate that the past traffic has a large influence on
future predictions. For the SMA and α-SMA predictors, the value of N that
gave the minimum RPE is denoted by Nmin. The value of Nmin varies from 1 to 9
for different video traces.

Figure 3.3 Plot of N vs. RPE for Tokyo Olympics (MQ)

Figure 3.4 Plot of δ vs. RPE for Tokyo Olympics (MQ)

Table 3.2 Error Statistics for SMA, EMA and Linear Predictors (offline prediction case)

         SMA Predictor                        EMA Predictor                         Linear Predictor
Quality  Nmin  Mean |ε|  S.D. |ε|  RPE        δmin   Mean |ε|  S.D. |ε|  RPE        Mean |ε|  S.D. |ε|  RPE

Star Wars IV
LQ       5     104.62    125.48    22.689     0.383  104.17    126.20    22.591     105.14    129.64    22.802
MQ       3     153.31    203.39    24.257     0.493  152.73    203.41    24.165     153.56    207.36    24.297
HQ       1     749.92    1012.22   19.453     0.611  749.60    1005.56   19.445     749.92    1012.22   19.453

NBC News
LQ       3     205.72    245.81    22.736     0.54   204.303   247.19    22.578     206.34    254.63    22.804
MQ       3     369.12    454.40    22.794     0.408  367.44    451.82    22.69      371.59    470.37    22.947
HQ       5     1693.33   2041.72   12.233     0.316  1690.1    2037.86   12.209     1706.62   2101.41   12.329

Tokyo Olympics
LQ       4     109.49    130.86    13.91      0.421  109.36    130.84    13.892     109.82    132.73    13.951
MQ       4     180.19    240.03    14.752     0.368  180.08    239.61    14.743     180.89    241.46    14.809
HQ       9     885.37    1317.34   11.41      0.183  884.68    1317.53   11.401     891.16    1317.92   11.485

The small values of Nmin indicate that the simple moving average

based predictors perform well only when the average is computed for a few

previous frames in the local neighborhood.

Online traffic prediction

The online versions of the predictors were tested on different video
traces. For the EMA and α-EMA predictors, the value of δ was taken as 0.15.
For the SMA predictor, N was set to 20, while for the α-SMA predictor, the
value of N was fixed at 4. As expected, it is seen from Tables 3.2, 3.3 and 3.4
that the online versions do not perform as well as their offline counterparts.

Table 3.3 Error Statistics for α-SMA, α-EMA and α-Linear Predictors (offline prediction case)

         α-SMA Predictor                      α-EMA Predictor                       α-Linear Predictor
Quality  Nmin  Mean |ε|  S.D. |ε|  RPE        δmin   Mean |ε|  S.D. |ε|  RPE        Mean |ε|  S.D. |ε|  RPE

Star Wars IV
LQ       5     105.96    125.19    22.979     0.439  105.54    126.61    22.888     106.02    130.87    22.992
MQ       1     155.09    209.80    24.539     0.617  154.85    205.89    24.5       155.09    209.80    24.539
HQ       1     754.20    1022.59   19.564     0.001  753.93    988.34    19.557     754.20    1022.59   19.564

NBC News
LQ       3     207.71    246.06    22.955     0.444  206.34    244.42    22.827     208.12    256.76    23
MQ       3     374.02    453.98    23.097     0.429  372.21    451.79    22.985     374.55    474.80    23.129
HQ       5     1701.43   2040.30   12.291     0.273  1697.17   2028.62   12.26      1712.97   2107.78   12.375

Tokyo Olympics
LQ       4     109.85    131.12    13.976     0.476  109.75    131.44    13.942     110.01    133.46    13.976
MQ       4     180.80    240.88    14.802     0.455  180.66    241.11    14.79      181.23    243.06    14.837
HQ       7     887.55    1323.42   11.438     0.222  891.16    1317.92   11.427     892.51    1324.71   11.502

The NLMS predictor with prediction order 2 was used as the base-line
predictor for performance analysis. The value of the step-size µ that gave
the minimum prediction error for the base-line predictor was noted and used as
the step-size for the proposed online predictors. It is clearly seen from Table 3.4 that the
proposed predictors perform better than the NLMS predictor (even with higher
prediction order). Unlike in offline prediction, the α-predictors perform the best
among all the online predictors, with an average performance improvement of
10-11% over the base-line predictor.

Surprisingly, the α-SMA predictor gives the best overall
performance in online video traffic prediction. An important advantage of this
predictor is that it has the least complexity among the online predictors
considered and requires only a single variable to be changed after each
prediction step.

Clearly, the predictor is both simple and accurate and so it is well
suited for real-time VBR traffic prediction.

Table 3.4 Comparison of Performance of Online Predictors (Relative Percentage Error)

         Higher-order predictors     Base-line   Proposed predictors                            % improvement over base-line
         NLMS     NLMS     NLMS      NLMS
Quality  P=8      P=6      P=4       P=2         α-Linear  SMA      α-SMA    EMA      α-EMA     α-Linear  α-SMA   α-EMA

Star Wars IV
LQ       26.218   25.818   25.580    25.171      22.957    24.287   22.893   24.126   22.886    8.79      9.05    9.08
MQ       28.664   28.278   27.863    27.253      24.482    26.296   24.440   26.076   24.381    10.17     10.32   10.54
HQ       23.916   23.509   23.208    23.797      19.564    22.883   19.654   22.406   19.596    17.79     17.41   17.65

NBC News
LQ       25.809   25.723   25.560    25.742      22.891    24.438   22.720   24.108   22.838    11.08     11.74   11.28
MQ       26.244   26.047   25.822    25.955      23.048    24.683   22.863   24.319   22.900    11.20     11.91   11.77
HQ       15.017   14.538   14.310    13.912      12.362    13.469   12.253   13.231   12.213    11.14     11.92   12.21

Tokyo Olympics
LQ       15.691   15.518   15.358    15.080      13.981    14.693   13.954   14.672   13.963    7.29      7.47    7.41
MQ       17.021   16.898   16.780    16.381      14.853    15.909   14.826   15.886   14.889    9.33      9.49    9.11
HQ       13.506   13.346   13.155    12.692      11.482    12.274   11.436   12.318   11.446    9.53      9.90    9.82

Average                                                                                         10.70     11.02   10.99

3.2.2 VSSNLMS Augmented ARIMA Based Prediction for VBR

Video Traffic

An ARIMA based model augmented by variable step-size NLMS (VSSNLMS) for
real-time prediction of VBR video traffic is introduced here. The combination of the two
addresses challenges in traffic prediction such as prediction accuracy,
resource management and utilization. Applying ARIMA to a
VBR video trace results in a component-wise representation of the trace,
which is then used for prediction. The step-size-adjusted adaptive linear predictor (ALP) applied
afterwards ensures consistency in error fluctuation and, in turn, better accuracy.

Performance evaluation of the proposed method is carried out using
RMSE. The average prediction accuracy is improved by 26% and the
average error variance is reduced by 26%. The performance of the proposed
method is investigated by applying it to video traces of different
qualities and characteristics.

3.2.2.1 ARIMA process for GOP (16, 1)

ARIMA (Autoregressive Integrated Moving Average) as

introduced in Section 2.1.3.5 is a statistical methodology in time series

analysis which is chiefly used in the forecast or prediction of future terms

based on the characteristics of the past terms. It is the combination of three

components namely the Autoregression (AR), Integration (I) and Moving

Average (MA). The model is generally referred to as an ARIMA (p,d,q)

model, where p, d, and q are integers greater than or equal to zero and refer to

the order of the autoregressive, integrated and moving average parts of the

model, respectively. When one of the terms is zero, it is usual to drop AR, I or

MA. For instance, an I(1) model is ARIMA(0,1,0) and MA(1) model is

ARIMA(0,0,1). The proposed work makes good use of ARIMA for modeling

the frame size process.

The entire work can be divided into two segments. The input to the

first segment is the trace consisting of a frame size sequence. Here, an

ARIMA based mechanism is used to predict the future frame size based on

the past and current frame sizes. It is a well established fact that ARIMA based
predictors perform best for traffic with no seasonality or trend
(Kang et al 2010). So the input trace is first preprocessed to remove
the seasonality components and trend, if any, to make it fit for
ARIMA. Then the trace is decomposed into several component processes and
represented as a linear combination of its own past values along with the past
values of a newly generated ARIMA process obtained from the original trace,
as done in (Kang et al 2010). The prediction is done over the ARIMA model
to yield the predicted values of the future frame sizes. A comparison between
predicted and actual values is done to evaluate the performance of the model.

First of all, the input trace is prepared. For this, the seasonality
components must be removed. An added advantage of this process is that
the input can be decomposed and expressed in terms of additive components,
so that a separate model can be used for each subsequence.

Let $X_t$ be the input frame size sequence with a regular fixed GOP
pattern for a VBR compressed video, denoted by GOP(s,S), where s and S
are the spacings between successive P to (I or P) frames and between
consecutive I frames respectively. The sample process $X_t$ is decomposed as:

$X_t = X_t^s + X_t^S + \varepsilon_t$ (3.17)

where $X_t^s$ and $X_t^S$ denote the seasonal components that respectively appear in
every s-th and S-th sample, and $\varepsilon_t$ is the error term. Then the differencing
operation is performed multiple times for each lag. The difference orders D
and d are set prior to performing the differencing operation. As indicated
earlier, most of the traces used for experimentation in this work are
encoded in the G16B1 pattern. Assuming a GOP pattern GOP (16, 1), the
differenced process $Y_t$ can be formulated as:

$Y_t = (1 - B^1)^d (1 - B^{16})^D X_t$ (3.18)

where B is the backward shift operator, which is widely used in statistics to make
time series expressions more compact and is given by:

$B^k X_t = X_{t-k}$ (3.19)

Thus $(1 - B) X_t$ denotes the differenced time series, $X_t - X_{t-1}$.
Let the ARIMA model under consideration be $(1,1,1)_1 \times (1,1,1)_{16}$, so that d = D
= 1. Equation (3.18) can then be rewritten as:

$Y_t = (1 - B^1)(1 - B^{16}) X_t = (1 - B^{16} + B^{17} - B^1) X_t$ (3.20)

Expanding using the above convention,

$Y_t = X_t - X_{t-16} + X_{t-17} - X_{t-1}$ (3.21)

In deriving the state space model for ARIMA, $X_t$ needs to be represented
as a linear combination of its past values. From Equation (3.21), it can be
verified that

$X_t = X_{t-1} + X_{t-16} - X_{t-17} + Y_t$ (3.22)
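The differencing of Equation (3.21) and its inversion by Equation (3.22) can be checked numerically with the short sketch below. The function names and the `season` parameter (16 for the G16B1 pattern) are illustrative assumptions; the code only demonstrates that the two equations are exact inverses once the first 17 samples are known.

```python
def difference(x, season=16):
    """Y_t = X_t - X_{t-1} - X_{t-season} + X_{t-season-1}, Equation (3.21)."""
    lag = season + 1
    return [x[t] - x[t - 1] - x[t - season] + x[t - lag] for t in range(lag, len(x))]


def reconstruct(x_head, y, season=16):
    """Invert the differencing with Equation (3.22), given the first season+1 samples."""
    x = list(x_head)
    for y_t in y:
        t = len(x)
        x.append(x[t - 1] + x[t - season] - x[t - season - 1] + y_t)
    return x
```

A trace with a pure linear trend differences to zero, and any trace is recovered exactly from its first 17 samples plus the differenced series.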

The differenced process $Y_t$ is a multiplicative ARMA process
represented as ARMA $(1,1)_1 \times (1,1)_{16}$. This can be represented as

$(1 - \phi B^1)(1 - \Phi B^{16}) Y_t = (1 + \theta B^1)(1 + \Theta B^{16}) \varepsilon_t$ (3.23)

where $\phi$ and $\Phi$ are the coefficients of the autoregressive expression and
$\theta$ and $\Theta$ are the coefficients of the moving average expression.

To make the above equation more manageable, a new AR process
$Z_t$ is introduced as:

$(1 - \phi B^1)(1 - \Phi B^{16}) Z_t = \varepsilon_t$ (3.24)

Now, $Y_t$ can be rewritten in terms of $Z_t$ as

$Y_t = Z_t + \theta Z_{t-1} + \Theta Z_{t-16} + \theta\Theta Z_{t-17}$ (3.25)
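The step from Equations (3.23) and (3.24) to Equation (3.25) can be spelled out as follows. This is a sketch of the reasoning, assuming that the autoregressive operator is invertible and using the fact that polynomials in B commute:

```latex
\begin{align*}
(1-\phi B)(1-\Phi B^{16})\,Y_t
  &= (1+\theta B)(1+\Theta B^{16})\,\varepsilon_t \\
  &= (1+\theta B)(1+\Theta B^{16})(1-\phi B)(1-\Phi B^{16})\,Z_t \\
\Rightarrow\quad Y_t
  &= (1+\theta B)(1+\Theta B^{16})\,Z_t \\
  &= (1 + \theta B + \Theta B^{16} + \theta\Theta B^{17})\,Z_t \\
  &= Z_t + \theta Z_{t-1} + \Theta Z_{t-16} + \theta\Theta Z_{t-17}
\end{align*}
```

The moving average part of the model therefore acts on the auxiliary process $Z_t$ at lags 1, 16 and 17 only.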

Equation (3.22) can be rewritten by substituting for $Y_t$ using Equation (3.25)
as

$X_t = \tilde{X}_t + \tilde{Z}_t$ (3.26)

where $\tilde{X}_t = X_{t-1} + X_{t-16} - X_{t-17}$ and $\tilde{Z}_t = Z_t + \theta Z_{t-1} + \Theta Z_{t-16} + \theta\Theta Z_{t-17}$ (3.27)

Thus, the terms $X_{t-1}$, $X_{t-16}$, $X_{t-17}$, $Z_{t-1}$, $Z_{t-16}$ and $Z_{t-17}$ are enough to
express $X_t$. However, as these terms are certain lags apart, all the terms $X_{t-1}$
through $X_{t-17}$ and $Z_t$ through $Z_{t-17}$ need to be retained to represent $X_t$ properly.
Equation (3.26) is the governing equation of the ARIMA model and
hence of the entire prediction process. It is used to generate the predicted frame
size sequence. The error is measured by taking the difference between the
actual value and the predicted value in each step.

3.2.2.2 Proposed methodology

The second segment of the work chiefly consists of the
application of VSSNLMS to the result obtained in the first segment. Two
major modifications to the traditional NLMS are proposed here.

Initially, a threshold is chosen for the prediction error based on the
characteristics of the input trace. This ensures that the error values
obtained do not vary beyond the threshold in either direction, which results in a
lower error variance. It also gives greater freedom
in network management while allocating resources, as the fluctuation of the error
is curtailed by the threshold value. This is shown schematically in
Figure 3.5.

Next, the policy of using a variable step-size in NLMS is
adopted. Adaptive Linear Prediction (ALP) is a fundamental and well-known technique
in the fields of signal processing and forecasting. Many studies have
centered on ALP and have led to prediction methodologies that vary
in effectiveness, ease of use, accuracy and so on. It is a linear prediction
mechanism where the predicted value is expressed as a linear
combination of a certain number of past values of the same process.

Figure 3.5 Schematic Diagram of the Proposed Work
[Block labels in the schematic: Original trace, ARIMA trace, Threshold, Equation to compute ‘next’ µ, Algorithm to compute ‘next’ µ, Step-Size-Adjusted ALP, Predicted process]

3.2.2.3 LMS algorithm

This algorithm belongs to the class of adaptive linear predictors,
where the predictor coefficients (also called weights) are updated at every
prediction step based on the prediction error.

The fundamental equation that governs the p-th order linear predictor
is expressed as:

$\hat{X}_{t+N} = \sum_{l=0}^{p-1} w_t(l) X_{t-l} = \mathbf{w}_t^T \mathbf{X}_t$ (3.28)

where $\mathbf{w}_t = (w_t(0), \ldots, w_t(p-1))^T$ denotes the prediction filter coefficient
vector which minimizes the mean square error. Parameter p indicates the
number of past values used for prediction.

Let $e_t = X_{t-N} - \hat{X}_{t-N}$ be the prediction error at the t-th instant. The
new weights for the prediction at time instant ‘t+1’ are given by

W(t+1) = W(t) + µ e(t)X(t) (3.29)

where µ is the step-size, with µ ∈ (0, 2] (Haykin 1991). It is usually
kept fixed throughout the prediction process. However, it is difficult to
choose an optimal step-size. A larger step-size gives faster
convergence but poorer performance (i.e. higher prediction error). Similarly, a
smaller step-size leads to slower convergence but provides better
performance.

It has been proved that LMS converges in the mean if µ satisfies the
condition $0 < \mu < \frac{2}{\lambda_{max}}$, where $\lambda_{max}$ is the largest eigenvalue of the
autocorrelation matrix of the input.

The optimal weights for the LMS predictor can be obtained by
solving the Wiener-Hopf equations, which requires knowledge of the
autocorrelation function of the whole trace (Haykin 1991). But for online
prediction, the whole trace is not known in advance. So, the weights obtained
in this way only give a theoretical minimum on the mean square error.

3.2.2.4 NLMS algorithm

Unlike LMS predictors, the NLMS predictors have a different
weight update equation, as given below. Starting with an initial estimate of
the filter coefficient vector w, for each new data point the ALP method updates $w_t$
using the recursive equation

$W_{t+1} = W_t + \frac{\mu \, e_t X_t}{\|X_t\|^2}$ (3.30)

NLMS has an advantage over LMS in terms of its sensitivity to the
step-size µ, i.e. NLMS is relatively less sensitive to changes in µ. Moreover,
NLMS converges in the mean when 0 < µ < 2 (Haykin 1991). A larger µ
results in a larger prediction error, and a smaller µ results in a smaller
prediction error. On the other hand, the convergence rate is high for larger
values of µ and low for smaller values of µ.
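A fixed step-size NLMS predictor built on Equation (3.30) can be sketched as follows. This is an illustrative one-step-ahead version, not the exact predictor evaluated in this work: the zero initial weights, the default order and step-size, and the small constant `eps` that guards the division are all assumptions.

```python
def nlms_predict(trace, order=2, mu=0.5, eps=1e-12):
    """One-step-ahead NLMS predictions for trace[order:], Equation (3.30)."""
    w = [0.0] * order                                   # initial weight estimate (assumed zero)
    predictions = []
    for t in range(order - 1, len(trace) - 1):
        x = [trace[t - l] for l in range(order)]        # most recent sample first
        pred = sum(wl * xl for wl, xl in zip(w, x))
        predictions.append(pred)
        err = trace[t + 1] - pred
        norm = sum(xl * xl for xl in x) + eps           # eps guards against division by zero
        w = [wl + mu * err * xl / norm for wl, xl in zip(w, x)]
    return predictions
```

On a constant trace with µ = 1 the weights settle after a single update, which illustrates the trade-off described above: a large step-size converges quickly but reacts strongly to every error.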

During the process of updating the weight vector of generic NLMS,
the step-size parameter µ is generally fixed. This will be referred to as
FSSNLMS. But it has been verified in (Zhao et al 2002) that using a
variable step-size (VSS) for the prediction of each value can result in better
prediction accuracy. Presented here is a variable step-size prediction
technique that also incorporates a threshold value to reduce the error
variance.

The NLMS mechanism is altered to include a VSS value at every
prediction step. To derive the value of the step-size to be used for predicting
the next frame size, a modified version of the variable step-size prediction
equation in (Zhao et al 2002) is used. The original equation is as follows:

$\mu_{k+1} = \alpha \mu_k + \gamma (q_1 e_k^2 + q_2 e_{k-1}^2)$ (3.31)

where α, γ, $q_1$ and $q_2$ are constants having values determined in (Zhao et al
2002) as 0.98, 0.015, 0.7 and 0.3 respectively.

The above equation does not work well for traces with large frame
sizes. Thus it is modified as below.

$\mu_{k+1} = \alpha \mu_k + \gamma \, \frac{q_1 e_k^2 + q_2 e_{k-1}^2}{e_k^2 + e_{k-1}^2}$ (3.32)

where $\mu_i$ and $e_i$ represent the step-size and prediction error respectively for the i-th
value in the trace. This change can be explained by the
observation that the original equation is largely unfit for the huge frame sizes
generally present in high quality video frame size sequences. The step-size for
the (k+1)-th prediction instance ($\mu_{k+1}$) is computed using Equation (3.32) if
the prediction error in the k-th prediction instance is less than a predefined
threshold T. Otherwise, the value of $\mu_{k+1}$ is determined by the algorithm given
in Figure 3.6.
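The step-size rule, combining Equation (3.32) with the fallback search of Figure 3.6, can be sketched as below. Several details are assumptions made for the illustration: errors are compared with the threshold by magnitude, the decay term alone is used when both recent errors are zero, and the current step-size is kept when no earlier step qualifies. The function name and the list-based history are also hypothetical.

```python
def next_step_size(mu_hist, err_hist, threshold,
                   alpha=0.98, gamma=0.015, q1=0.7, q2=0.3):
    """Step-size for the next prediction.

    mu_hist[i] and err_hist[i] hold the step-size and prediction error of step i;
    both lists need at least two entries.
    """
    e_k, e_prev = err_hist[-1], err_hist[-2]
    if abs(e_k) < threshold:
        energy = e_k ** 2 + e_prev ** 2
        if energy == 0:
            return alpha * mu_hist[-1]          # no error: only the decay term remains
        # Equation (3.32): error term normalised by the recent error energy
        return alpha * mu_hist[-1] + gamma * (q1 * e_k ** 2 + q2 * e_prev ** 2) / energy
    # Figure 3.6: reuse the step-size of the latest step whose error was below threshold
    for mu_i, e_i in zip(reversed(mu_hist), reversed(err_hist)):
        if abs(e_i) < threshold:
            return mu_i
    return mu_hist[-1]                           # assumed fallback: keep the current step-size
```

Because the error term in Equation (3.32) is a ratio of error energies, it stays between γ·min(q1, q2) and γ·max(q1, q2) regardless of the frame size scale, which is the point of the modification.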

Algorithm: Find_$\mu_{k+1}$

    for (i = k; i > 0; i--)
        if ($e_i$ < T)
            { $\mu_{k+1} = \mu_i$
              break; }
    end for

Figure 3.6 Algorithm for Computing $\mu_{k+1}$

3.2.2.5 Experimental results

A variety of analyses have been performed here on video traffic
of varying quality, using 10 minute traces taken from the movie Star Wars IV
and from NBC News. The two traces analyzed here are taken from videos of
different kinds; the movie Star Wars IV has rapid scene changes and
insignificant correlations. In contrast, NBC News possesses a relatively
high degree of correlation. This suggests that the procedure explained here
is capable of yielding good results regardless of the broad type
of video used as input.

The general attributes of the input video traces are as follows. All
the traces used in the experiments are encoded using H.264 or MPEG-4. Each
of them is of 10 minutes duration and contains a total of 18000 frames
at 30 frames per second. The GOP pattern is G16B1 and the resolution
is CIF (352 × 288).

The frame size statistics of the individual input video traces used

for analysis is summarized in Table 3.5.

Table 3.5 Frame Size Statistics of Input Video Traces

Name          Quality  Max Frame Size  Min Frame Size  Avg Frame Size  Burstiness
Star Wars IV  Low               13344             168          699.58       19.06
Star Wars IV  Medium            82152             168          6332.6       12.97
Star Wars IV  High             476912           19600          111352        4.28
NBC News      Low               19024             168          1553.7       12.25
NBC News      Medium           140040             168           19227        7.28
NBC News      High             706032           61056          228174       11.56

The performance of three prediction schemes, namely traditional FSSNLMS, ARIMA, and ARIMA augmented by VSSNLMS, is compared using RMSE as the metric. The results obtained are summarized in Table 3.6.

Table 3.6 Performance Analysis by RMSE

                        RMSE                              Performance improvement (%)
Input trace   Quality   FSSNLMS  ARIMA  ARIMA+VSSNLMS     ARIMA over  ARIMA+VSSNLMS  ARIMA+VSSNLMS
                                                          FSSNLMS     over FSSNLMS   over ARIMA
Star Wars IV  Low          4.64   3.87           3.13          16.59          32.54          23.64
Star Wars IV  Medium       9.21   8.29           6.47           9.98          18.89          10.97
Star Wars IV  High        14.29  11.76          10.40          17.70          27.22          13.07
NBC News      Low          4.83   3.75           3.24          22.36          32.91          15.74
NBC News      Medium       7.87   6.72           5.12          14.61          29.86          21.73
NBC News      High        13.98  12.96          10.12           7.29          20.45          16.54
Average performance improvement                                14.75          26.97          16.94

ARIMA+VSSNLMS denotes ARIMA augmented with VSSNLMS.

It can be observed from the table that combining ARIMA with variable step-size NLMS performs best for all of the input samples. Also, ARIMA is superior to the traditional FSSNLMS. The method presented here, i.e., ARIMA augmented with VSSNLMS, therefore performs well for input traces that differ substantially in quality. As evident from Table 3.6, the average performance improvement of the proposed method is 26.97% over FSSNLMS and 16.94% over standard ARIMA. The two traces tested, Star Wars-IV and NBC News, have strongly contrasting characteristics, and the results for both were found to be satisfactory. This suggests that the proposed method performs relatively better and can handle traffic of different genres efficiently.
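The RMSE metric and the improvement percentages reported in Table 3.6 can be illustrated with a short Python sketch. It assumes the plain definition of RMSE over actual and predicted frame sizes (the text does not state whether any normalization was applied) and an improvement measured relative to the baseline value, which reproduces the "over FSSNLMS" entries of the first table row. The function names are hypothetical.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between actual and predicted frame sizes."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and equally long")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def improvement_pct(baseline: float, proposed: float) -> float:
    """Percentage reduction of an error measure relative to the baseline."""
    return 100.0 * (baseline - proposed) / baseline


# RMSE values for the low quality Star Wars IV trace, taken from Table 3.6.
fssnlms, arima, arima_vssnlms = 4.64, 3.87, 3.13
print(round(improvement_pct(fssnlms, arima), 2))          # 16.59
print(round(improvement_pct(fssnlms, arima_vssnlms), 2))  # 32.54
```

The same improvement_pct helper applies unchanged to any other error measure for which a smaller value is better.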

A detailed analysis is performed to assess how effective the provision for specifying a threshold is in keeping the prediction error consistent. The error variance is used as the performance measure. Throughout the analysis, the error threshold is kept at 10 percent of the average frame size. It has been observed that the method described here gives the minimum variance in prediction error, outperforming both ARIMA and the traditional FSSNLMS method. This is shown quantitatively in Table 3.7. The best-case average performance improvement, obtained over FSSNLMS, is 26.87%.

Table 3.7 Performance Analysis by Error Variance

                        Variance                                 Performance improvement (%)
Input trace   Quality   FSSNLMS     ARIMA       ARIMA with       ARIMA over  ARIMA with VSSNLMS  ARIMA with VSSNLMS
                                                VSSNLMS          FSSNLMS     over FSSNLMS        over ARIMA
Star Wars IV  Low        14517.414   12026.137    10023.76            17.16               30.95               16.65
Star Wars IV  Medium     56156.529   50700.641   45510.812             9.72               18.96               10.24
Star Wars IV  High      181345.601  162231.823   137691.16            10.54               24.07               15.13
NBC News      Low        11610.331     9467.55    7935.801            18.46               31.65               16.18
NBC News      Medium      13007.78   12752.034   10401.548             1.97               20.04               18.43
NBC News      High      186534.925    156091.6   120225.71            16.32               35.55               22.98
Average performance improvement                                       12.36               26.87               16.60
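The two quantities used in this analysis, the error threshold and the error variance, can be computed as in the following Python sketch. It assumes that the prediction error is the difference between the actual and the predicted frame size and that the population variance is meant; the text does not state which variance estimator was used. The function names are illustrative.

```python
from statistics import mean, pvariance
from typing import Sequence


def error_threshold(frame_sizes: Sequence[float], fraction: float = 0.10) -> float:
    """Threshold T set to a fraction (10 percent here) of the average frame size."""
    return fraction * mean(frame_sizes)


def error_variance(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Variance of the prediction errors (population variance is an assumption)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and equally long")
    errors = [a - p for a, p in zip(actual, predicted)]
    return pvariance(errors)
```

A lower error variance indicates that the prediction error fluctuates less around its mean, which is the consistency that the threshold is intended to enforce.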

Figure 3.7 depicts the prediction error obtained with the proposed method for the high quality video trace of the movie Star Wars-IV. The threshold value was fixed at 20000, and it can be observed that the fluctuation in error is limited by the threshold.

Figure 3.7 Fluctuation of Error (High Quality – Star Wars-IV)

3.3 SUMMARY

Some simple and accurate predictors for real-time MPEG-4 GOP traffic prediction have been implemented and their performance has been thoroughly analyzed. Performance analysis of the proposed methods is carried out using RPE. Extensive simulation results show that the proposed predictors outperform the standard NLMS predictor in online prediction. To validate these findings, a wide variety of experiments were conducted on traces of different qualities and types. The proposed techniques are both simple and accurate, and are suitable for real-time prediction. The average performance improvement achieved over the baseline predictor is up to 11%.

An ARIMA based mechanism augmented by VSSNLMS for the prediction of VBR video traffic was devised and implemented. The variable step-size for NLMS was implemented using a modified update equation. Evaluated using RMSE, the method showed an average improvement of 16.94% over standard ARIMA. The error variance was also reduced, by 26.87% on average relative to FSSNLMS. The methodology was validated through a detailed analysis of VBR video traces of different qualities taken from two different video sources. The provision for specifying a threshold value to limit the error fluctuation effectively controls the variation in prediction errors. Here, the threshold value was fixed empirically based on the characteristics of the input trace.
