16. Mean Square Estimation
Given some information that is related to an unknown quantity of interest, the problem is to obtain a good estimate for the unknown in terms of the observed data. Suppose $X_1, X_2, \ldots, X_n$ represent a sequence of random variables about which one set of observations is available, and $Y$ represents an unknown random variable. The problem is to obtain a good estimate for $Y$ in terms of the observations $X_1, X_2, \ldots, X_n$. Let

$$\hat{Y} = \varphi(X_1, X_2, \ldots, X_n) = \varphi(X) \tag{16-1}$$

represent such an estimate for $Y$. Note that $\varphi(\cdot)$ can be a linear or a nonlinear function of the observations $X_1, X_2, \ldots, X_n$. Clearly

$$\varepsilon(X) = Y - \hat{Y} = Y - \varphi(X) \tag{16-2}$$

represents the error in the above estimate, and $|\varepsilon|^2$ represents the square of the error.
PILLAI
Since $\varepsilon$ is a random variable, $E\{|\varepsilon|^2\}$ represents the mean square error. One strategy to obtain a good estimator would be to minimize the mean square error by varying over all possible forms of $\varphi(\cdot)$; this procedure gives rise to the Minimization of the Mean Square Error (MMSE) criterion for estimation. Thus under the MMSE criterion, the estimator $\varphi(\cdot)$ is chosen such that the mean square error $E\{|\varepsilon|^2\}$ is at its minimum.

Next we show that the conditional mean of $Y$ given $X$ is the best estimator in the above sense.

Theorem 1: Under the MMSE criterion, the best estimator for the unknown $Y$ in terms of $X_1, X_2, \ldots, X_n$ is given by the conditional mean of $Y$ given $X$. Thus

$$\hat{Y} = \varphi(X) = E\{Y \mid X\}. \tag{16-3}$$

Proof: Let $\hat{Y} = \varphi(X)$ represent an estimate of $Y$ in terms of $X = (X_1, X_2, \ldots, X_n)$. Then the error is $\varepsilon = Y - \hat{Y}$, and the mean square error is given by

$$\sigma_\varepsilon^2 = E\{|\varepsilon|^2\} = E\{|Y - \hat{Y}|^2\} = E\{|Y - \varphi(X)|^2\}. \tag{16-4}$$
Since

$$E\{z\} = E_X\{E_z\{z \mid X\}\}, \tag{16-5}$$

we can rewrite (16-4) as

$$\sigma_\varepsilon^2 = E_X\{E_Y\{|Y - \varphi(X)|^2 \mid X\}\} = \int_{-\infty}^{+\infty} E\{|Y - \varphi(X)|^2 \mid X\}\, f_X(x)\, dx, \tag{16-6}$$

where the inner expectation is with respect to $Y$, and the outer one is with respect to $X$. Thus to obtain the best estimator $\varphi$, we need to minimize $\sigma_\varepsilon^2$ in (16-6) with respect to $\varphi$. In (16-6), since $f_X(x) \ge 0$, $E\{|Y - \varphi(X)|^2 \mid X\} \ge 0$, and the variable $\varphi$ appears only in the integrand term, minimization of the mean square error in (16-6) with respect to $\varphi$ is equivalent to minimization of $E\{|Y - \varphi(X)|^2 \mid X\}$ with respect to $\varphi$.
Since $X$ is fixed at some value, $\varphi(X)$ is no longer random, and hence minimization of $E\{|Y - \varphi(X)|^2 \mid X\}$ is equivalent to

$$\frac{\partial}{\partial \varphi} E\{|Y - \varphi(X)|^2 \mid X\} = 0. \tag{16-7}$$

This gives

$$E\{(Y - \varphi(X)) \mid X\} = 0, \tag{16-8}$$

or

$$E\{Y \mid X\} - E\{\varphi(X) \mid X\} = 0.$$

But

$$E\{\varphi(X) \mid X\} = \varphi(X), \tag{16-9}$$

since when $X = x$, $\varphi(X)$ is a fixed number $\varphi(x)$. Using (16-9)
in (16-8) we get the desired estimator to be

$$\hat{Y} = \varphi(X) = E\{Y \mid X\} = E\{Y \mid X_1, X_2, \ldots, X_n\}. \tag{16-10}$$

Thus the conditional mean of $Y$ given $X_1, X_2, \ldots, X_n$ represents the best estimator for $Y$ that minimizes the mean square error.

The minimum value of the mean square error is given by

$$\sigma_{\min}^2 = E\{|Y - E\{Y \mid X\}|^2\} = E\{E\{|Y - E\{Y \mid X\}|^2 \mid X\}\} = E\{\operatorname{var}(Y \mid X)\} \ge 0. \tag{16-11}$$

As an example, suppose $Y = X^3$ is the unknown. Then the best MMSE estimator is given by

$$\hat{Y} = E\{Y \mid X\} = E\{X^3 \mid X\} = X^3. \tag{16-12}$$

Clearly if $Y = X^3$, then indeed $\hat{Y} = X^3$ is the best estimator for $Y$ in terms of $X$. Thus the best estimator can be nonlinear. Next, we will consider a less trivial example.
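The optimality of the conditional mean is easy to probe by simulation. The sketch below uses an assumed model (all choices illustrative: $X$ uniform on $(0,1)$, $Y = X^3$ plus independent zero-mean Gaussian noise, so that $E\{Y \mid X\} = X^3$ as in (16-12)) and compares the mean square error of the conditional-mean estimator with that of a least-squares linear estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Assumed toy model: X ~ Uniform(0, 1), Y = X**3 + noise,
# so the conditional mean is E{Y | X} = X**3, cf. (16-12).
X = rng.uniform(0.0, 1.0, n)
Y = X**3 + rng.normal(0.0, 0.1, n)

# Mean square error of the MMSE (conditional-mean) estimator.
mse_cond = np.mean((Y - X**3) ** 2)

# Best linear estimator a*X + b, fitted by least squares for comparison.
a, b = np.polyfit(X, Y, 1)
mse_lin = np.mean((Y - (a * X + b)) ** 2)
```

Here `mse_cond` settles near the noise variance $0.01$, while the linear estimator also carries the extra error of approximating a cubic by a straight line, so `mse_lin` exceeds it.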
Example: Let

$$f_{XY}(x, y) = \begin{cases} kxy, & 0 < x < y < 1, \\ 0, & \text{otherwise}, \end{cases}$$

where $k > 0$ is a suitable normalization constant. To determine the best estimate for $Y$ in terms of $X$, we need the conditional density $f_{Y|X}(y \mid x)$. The marginal density of $X$ is

$$f_X(x) = \int_x^1 f_{XY}(x, y)\, dy = \int_x^1 kxy\, dy = kx \left.\frac{y^2}{2}\right|_x^1 = \frac{kx(1 - x^2)}{2}, \qquad 0 < x < 1.$$

Thus

$$f_{Y|X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)} = \frac{kxy}{kx(1 - x^2)/2} = \frac{2y}{1 - x^2}, \qquad 0 < x < y < 1. \tag{16-13}$$

Hence the best MMSE estimator is given by
$$\hat{Y} = \varphi(X) = E\{Y \mid X\} = \int_x^1 y\, f_{Y|X}(y \mid x)\, dy = \int_x^1 y\,\frac{2y}{1 - x^2}\, dy = \frac{2}{1 - x^2}\int_x^1 y^2\, dy = \frac{2}{3}\,\frac{1 - x^3}{1 - x^2} = \frac{2}{3}\,\frac{1 + x + x^2}{1 + x}. \tag{16-14}$$

Once again the best estimator is nonlinear. In general the best estimator $E\{Y \mid X\}$ is difficult to evaluate, and hence next we will examine the special subclass of best linear estimators.

Best Linear Estimator

In this case the estimator $\hat{Y}$ is a linear function of the observations $X_1, X_2, \ldots, X_n$. Thus

$$\hat{Y}_l = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n = \sum_{i=1}^{n} a_i X_i, \tag{16-15}$$

where $a_1, a_2, \ldots, a_n$ are unknown quantities to be determined, and $\varepsilon = Y - \hat{Y}_l$ denotes the resulting error.
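The closed form (16-14) can be sanity-checked by simulation. In this example the normalization constant works out to $k = 8$ (the density integrates to $k/8$ over $0 < x < y < 1$), which permits exact sampling through the marginal of $Y$ and the conditional of $X$ given $Y$; the binning value $x = 0.5$ below is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

# f_XY(x, y) = 8xy on 0 < x < y < 1.  Exact sampling:
# f_Y(y) = 4y^3  =>  Y = U**(1/4);  f_{X|Y}(x|y) = 2x/y^2  =>  X = Y*sqrt(V).
Y = rng.uniform(size=n) ** 0.25
X = Y * np.sqrt(rng.uniform(size=n))

def phi(x):
    # The MMSE estimator (16-14).
    return (2.0 / 3.0) * (1 + x + x**2) / (1 + x)

# Empirical conditional mean of Y over samples with X near 0.5.
sel = np.abs(X - 0.5) < 0.01
emp = Y[sel].mean()
```

The empirical conditional mean `emp` should sit close to `phi(0.5)` $= 7/9 \approx 0.778$.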
The mean square error is given by

$$E\{|\varepsilon|^2\} = E\{|Y - \hat{Y}_l|^2\} = E\Big\{\Big|Y - \sum_{i=1}^{n} a_i X_i\Big|^2\Big\}, \tag{16-16}$$

and under the MMSE criterion $a_1, a_2, \ldots, a_n$ should be chosen so that the mean square error is at its minimum possible value. Let $\sigma_n^2$ represent that minimum possible value. Then

$$\sigma_n^2 = \min_{a_1, a_2, \ldots, a_n} E\Big\{\Big|Y - \sum_{i=1}^{n} a_i X_i\Big|^2\Big\}. \tag{16-17}$$

To minimize (16-16), we can equate

$$\frac{\partial}{\partial a_k} E\{|\varepsilon|^2\} = 0, \qquad k = 1, 2, \ldots, n. \tag{16-18}$$

This gives

$$\frac{\partial}{\partial a_k} E\{|\varepsilon|^2\} = E\Big\{\frac{\partial |\varepsilon|^2}{\partial a_k}\Big\} = E\Big\{2\varepsilon\,\frac{\partial \varepsilon^*}{\partial a_k}\Big\} = 0. \tag{16-19}$$

But
$$\frac{\partial \varepsilon^*}{\partial a_k} = \frac{\partial}{\partial a_k}\Big(Y - \sum_{i=1}^{n} a_i X_i\Big)^* = -X_k^*.$$

Substituting this into (16-19) and using (16-18), we get

$$\frac{\partial}{\partial a_k} E\{|\varepsilon|^2\} = -2\,E\{\varepsilon X_k^*\} = 0, \tag{16-20}$$

or the best linear estimator must satisfy

$$E\{\varepsilon X_k^*\} = 0, \qquad k = 1, 2, \ldots, n. \tag{16-21}$$

Notice that in (16-21), $\varepsilon = Y - \sum_{i=1}^{n} a_i X_i$ represents the estimation error and $X_k,\ k = 1 \to n$, represents the data. Thus from (16-21), the error is orthogonal to the data for the best linear estimator. This is the orthogonality principle. In other words, in the linear estimator (16-15), the unknown constants $a_1, a_2, \ldots, a_n$ must be selected such that the error
$$\varepsilon = Y - \sum_{i=1}^{n} a_i X_i$$

is orthogonal to every data sample $X_1, X_2, \ldots, X_n$ for the best linear estimator that minimizes the mean square error.

Interestingly, a general form of the orthogonality principle holds good in the case of nonlinear estimators also.

Nonlinear Orthogonality Rule: Let $h(X)$ represent any functional form of the data and $E\{Y \mid X\}$ the best estimator for $Y$ given $X$. With

$$e = Y - E\{Y \mid X\},$$

we shall show that

$$E\{e\, h(X)\} = 0, \tag{16-22}$$

implying that

$$e = Y - E\{Y \mid X\} \perp h(X).$$

This follows since

$$E\{e\, h(X)\} = E\{(Y - E[Y \mid X])\, h(X)\} = E\{Y h(X)\} - E\{E[Y \mid X]\, h(X)\} = E\{Y h(X)\} - E\{E[Y h(X) \mid X]\} = E\{Y h(X)\} - E\{Y h(X)\} = 0.$$
Thus in the nonlinear version of the orthogonality rule, the error is orthogonal to any functional form of the data.

The orthogonality principle in (16-20) can be used to obtain the unknowns $a_1, a_2, \ldots, a_n$ in the linear case.
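The nonlinear orthogonality rule (16-22) is also easy to probe numerically. A minimal sketch with an assumed model $Y = X^2 + W$ ($W$ zero-mean and independent of $X$, so $E\{Y \mid X\} = X^2$) checks that the error is uncorrelated with several arbitrary functions $h(X)$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Assumed toy model: Y = X**2 + W, with W zero-mean and independent of X,
# so E{Y | X} = X**2 and the error e = Y - E{Y|X} equals W.
X = rng.uniform(-1.0, 1.0, n)
Y = X**2 + rng.normal(0.0, 0.5, n)
e = Y - X**2

# Sample correlations of e with arbitrary functions h(X) should vanish.
corr = [abs(np.mean(e * h(X))) for h in (np.sin, np.exp, lambda x: x**3)]
```

None of the choices of $h$ is special here; by (16-22) any measurable function of the data would do.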
For example, suppose $n = 2$, and we need to estimate $Y$ in terms of $X_1$ and $X_2$ linearly. Thus

$$\hat{Y}_l = a_1 X_1 + a_2 X_2.$$

From (16-20), the orthogonality rule gives

$$E\{\varepsilon X_1^*\} = E\{(Y - a_1 X_1 - a_2 X_2) X_1^*\} = 0,$$
$$E\{\varepsilon X_2^*\} = E\{(Y - a_1 X_1 - a_2 X_2) X_2^*\} = 0.$$

Thus

$$a_1 E\{|X_1|^2\} + a_2 E\{X_2 X_1^*\} = E\{Y X_1^*\},$$
$$a_1 E\{X_1 X_2^*\} + a_2 E\{|X_2|^2\} = E\{Y X_2^*\},$$

or
$$\begin{pmatrix} E\{|X_1|^2\} & E\{X_2 X_1^*\} \\ E\{X_1 X_2^*\} & E\{|X_2|^2\} \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} E\{Y X_1^*\} \\ E\{Y X_2^*\} \end{pmatrix}. \tag{16-23}$$

(16-23) can be solved to obtain $a_1$ and $a_2$ in terms of the cross-correlations. The minimum value of the mean square error in (16-17) is given by
$$\sigma_n^2 = \min_{a_1, \ldots, a_n} E\{|\varepsilon|^2\} = \min E\{\varepsilon \varepsilon^*\} = \min E\Big\{\varepsilon\Big(Y - \sum_{i=1}^{n} a_i X_i\Big)^*\Big\} = \min\Big[E\{\varepsilon Y^*\} - \sum_{i=1}^{n} a_i^*\, E\{\varepsilon X_i^*\}\Big]. \tag{16-24}$$

But using (16-21), the second term in (16-24) is zero, since the error is orthogonal to the data $X_i$ when $a_1, a_2, \ldots, a_n$ are chosen to be optimum. Thus the minimum value of the mean square error is given by
$$\sigma_n^2 = E\{\varepsilon Y^*\} = E\Big\{\Big(Y - \sum_{i=1}^{n} a_i X_i\Big) Y^*\Big\} = E\{|Y|^2\} - \sum_{i=1}^{n} a_i E\{X_i Y^*\}, \tag{16-25}$$

where $a_1, a_2, \ldots, a_n$ are the optimum values from (16-21). Since the linear estimate in (16-15) is only a special case of the general estimator $\varphi(X)$ in (16-1), the best linear estimator that satisfies (16-20) cannot be superior to the best nonlinear estimator $E\{Y \mid X\}$. Often the best linear estimator will be inferior to the best estimator in (16-3).
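The machinery in (16-21)-(16-25) translates directly into a few lines of numerical linear algebra. The sketch below uses an assumed two-observation model with sample averages in place of expectations; the coefficients come from the normal equations (16-23), after which the orthogonality condition (16-21) and the minimum-error expression (16-25) can both be verified:

```python
import numpy as np

rng = np.random.default_rng(3)
m = 400_000

# Assumed example: Y = X1 + 0.5*X2 + noise, estimated linearly from X1, X2.
X1 = rng.normal(size=m)
X2 = 0.6 * X1 + 0.8 * rng.normal(size=m)        # correlated with X1
Y = X1 + 0.5 * X2 + rng.normal(0.0, 0.3, m)

# Normal equations (16-23), with expectations replaced by sample averages.
R = np.array([[np.mean(X1 * X1), np.mean(X2 * X1)],
              [np.mean(X1 * X2), np.mean(X2 * X2)]])
r = np.array([np.mean(Y * X1), np.mean(Y * X2)])
a1, a2 = np.linalg.solve(R, r)

eps = Y - (a1 * X1 + a2 * X2)

# (16-21): the error is orthogonal to the data (exactly, for sample moments).
orth = max(abs(np.mean(eps * X1)), abs(np.mean(eps * X2)))

# (16-25): minimum mean square error via cross-correlations.
sigma2 = np.mean(Y * Y) - a1 * np.mean(X1 * Y) - a2 * np.mean(X2 * Y)
```

Because the coefficients solve the empirical normal equations exactly, `orth` vanishes to machine precision and `sigma2` agrees with the residual power `np.mean(eps**2)`, which here approaches the noise variance $0.09$.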
This raises the following question: are there situations in which the best estimator in (16-3) also turns out to be linear? In those situations it is enough to use (16-21) and obtain the best linear estimators, since they also represent the best global estimators. Such is the case if $Y$ and $X_1, X_2, \ldots, X_n$ are distributed as jointly Gaussian random variables. We summarize this in the next theorem and prove that result.

Theorem 2: If $X_1, X_2, \ldots, X_n$ and $Y$ are jointly Gaussian zero-mean random variables, then the best estimate for $Y$ in terms of $X_1, X_2, \ldots, X_n$ is always linear.
Proof: Let

$$\hat{Y} = \varphi(X_1, X_2, \ldots, X_n) = E\{Y \mid X\} \tag{16-26}$$

represent the best (possibly nonlinear) estimate of $Y$, and

$$\hat{Y}_l = \sum_{i=1}^{n} a_i X_i \tag{16-27}$$

the best linear estimate of $Y$. Then from (16-21),

$$\varepsilon \triangleq Y - \hat{Y}_l = Y - \sum_{i=1}^{n} a_i X_i \tag{16-28}$$

is orthogonal to the data $X_k,\ k = 1 \to n$. Thus

$$E\{\varepsilon X_k^*\} = 0, \qquad k = 1 \to n. \tag{16-29}$$

Also from (16-28),

$$E\{\varepsilon\} = E\{Y\} - \sum_{i=1}^{n} a_i E\{X_i\} = 0. \tag{16-30}$$
Using (16-29)-(16-30), we get

$$E\{\varepsilon X_k^*\} = E\{\varepsilon\}\, E\{X_k^*\} = 0, \qquad k = 1 \to n. \tag{16-31}$$

From (16-31), we obtain that $\varepsilon$ and $X_k$ are zero mean uncorrelated random variables for $k = 1 \to n$. But $\varepsilon$ itself represents a Gaussian random variable, since from (16-28) it represents a linear combination of a set of jointly Gaussian random variables. Thus $\varepsilon$ and $X$ are jointly Gaussian and uncorrelated random variables. As a result, $\varepsilon$ and $X$ are independent random variables. Thus from their independence,

$$E\{\varepsilon \mid X\} = E\{\varepsilon\}. \tag{16-32}$$

But from (16-30), $E\{\varepsilon\} = 0$, and hence from (16-32),

$$E\{\varepsilon \mid X\} = 0. \tag{16-33}$$

Substituting (16-28) into (16-33), we get

$$E\{\varepsilon \mid X\} = E\Big\{Y - \sum_{i=1}^{n} a_i X_i \,\Big|\, X\Big\} = 0,$$
or

$$E\{Y \mid X\} = E\Big\{\sum_{i=1}^{n} a_i X_i \,\Big|\, X\Big\} = \sum_{i=1}^{n} a_i X_i = \hat{Y}_l. \tag{16-34}$$

From (16-26), $E\{Y \mid X\} = \varphi(X)$ represents the best possible estimator, and from (16-28), $\hat{Y}_l = \sum_{i=1}^{n} a_i X_i$ represents the best linear estimator. Thus the best linear estimator is also the best possible overall estimator in the Gaussian case.
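Theorem 2 can be illustrated by simulation: for a zero-mean jointly Gaussian pair (parameters below assumed purely for illustration), the empirical conditional mean of $Y$ given $X \approx x_0$ falls on the straight line $a\,x_0$ determined by the linear MMSE coefficient:

```python
import numpy as np

rng = np.random.default_rng(4)
m = 1_000_000

# Zero-mean jointly Gaussian pair (assumed parameters): Y = 0.6*X + W.
X = rng.normal(size=m)
Y = 0.6 * X + 0.8 * rng.normal(size=m)

# Best linear coefficient from the (scalar) normal equation, cf. (16-23).
a = np.mean(X * Y) / np.mean(X * X)

# Empirical conditional means E{Y | X ~ x0} should lie on the line a*x0.
devs = []
for x0 in (-1.0, 0.0, 1.0):
    sel = np.abs(X - x0) < 0.05
    devs.append(abs(Y[sel].mean() - a * x0))
```

No nonlinear correction is visible at any of the probed points, in contrast to the cubic example following (16-12).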
Next we turn our attention to prediction problems using linear estimators.

Linear Prediction

Suppose $X_1, X_2, \ldots, X_n$ are known and $X_{n+1}$ is unknown. Thus $Y = X_{n+1}$, and this represents a one-step prediction problem. If the unknown is $X_{n+k}$, then it represents a $k$-step ahead prediction problem. Returning to the one-step predictor, let $\hat{X}_{n+1}$ represent the best linear predictor. Then
$$\hat{X}_{n+1} \triangleq -\sum_{i=1}^{n} a_i X_i, \tag{16-35}$$

where the error

$$\varepsilon_n = X_{n+1} - \hat{X}_{n+1} = X_{n+1} + \sum_{i=1}^{n} a_i X_i = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n + X_{n+1} = \sum_{i=1}^{n+1} a_i X_i, \qquad a_{n+1} = 1, \tag{16-36}$$

is orthogonal to the data, i.e.,

$$E\{\varepsilon_n X_k^*\} = 0, \qquad k = 1 \to n. \tag{16-37}$$

Using (16-36) in (16-37), we get

$$E\{\varepsilon_n X_k^*\} = \sum_{i=1}^{n+1} a_i E\{X_i X_k^*\} = 0, \qquad k = 1 \to n. \tag{16-38}$$

Suppose $X_i$ represents a sample of a wide sense stationary
stochastic process $X(t)$, so that

$$E\{X_i X_k^*\} = R(i - k) = r_{i-k} = r_{k-i}^*. \tag{16-39}$$

Thus (16-38) becomes

$$E\{\varepsilon_n X_k^*\} = \sum_{i=1}^{n+1} a_i\, r_{i-k} = 0, \qquad a_{n+1} = 1,\quad k = 1 \to n. \tag{16-40}$$

Expanding (16-40) for $k = 1, 2, \ldots, n$, we get the following set of linear equations:

$$\begin{aligned}
a_1 r_0 + a_2 r_1 + a_3 r_2 + \cdots + a_n r_{n-1} + r_n &= 0 \qquad &&\leftarrow k = 1\\
a_1 r_1^* + a_2 r_0 + a_3 r_1 + \cdots + a_n r_{n-2} + r_{n-1} &= 0 \qquad &&\leftarrow k = 2\\
&\ \ \vdots &&\\
a_1 r_{n-1}^* + a_2 r_{n-2}^* + a_3 r_{n-3}^* + \cdots + a_n r_0 + r_1 &= 0 \qquad &&\leftarrow k = n.
\end{aligned} \tag{16-41}$$
Similarly, using (16-25), the minimum mean square error is given by

$$\sigma_n^2 = E\{|\varepsilon_n|^2\} = E\{\varepsilon_n \varepsilon_n^*\} = E\{\varepsilon_n X_{n+1}^*\} = \sum_{i=1}^{n+1} a_i E\{X_i X_{n+1}^*\} = a_1 r_n^* + a_2 r_{n-1}^* + \cdots + a_n r_1^* + r_0. \tag{16-42}$$

The $n$ equations in (16-41) together with (16-42) can be represented as

$$\begin{pmatrix}
r_0 & r_1 & r_2 & \cdots & r_n \\
r_1^* & r_0 & r_1 & \cdots & r_{n-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r_n^* & r_{n-1}^* & r_{n-2}^* & \cdots & r_0
\end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ \sigma_n^2 \end{pmatrix}. \tag{16-43}$$
Let

$$T_n = \begin{pmatrix}
r_0 & r_1 & \cdots & r_n \\
r_1^* & r_0 & \cdots & r_{n-1} \\
\vdots & \vdots & \ddots & \vdots \\
r_n^* & r_{n-1}^* & \cdots & r_0
\end{pmatrix}. \tag{16-44}$$

Notice that $T_n$ is Hermitian Toeplitz and positive definite. Using (16-44), the unknowns in (16-43) can be represented as

$$\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \\ 1 \end{pmatrix} = T_n^{-1}\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ \sigma_n^2 \end{pmatrix} = \sigma_n^2\,\big(\text{last column of } T_n^{-1}\big). \tag{16-45}$$
Let

$$T_n^{-1} = \begin{pmatrix}
T^{11} & T^{12} & \cdots & T^{1, n+1} \\
T^{21} & T^{22} & \cdots & T^{2, n+1} \\
\vdots & \vdots & \ddots & \vdots \\
T^{n+1, 1} & T^{n+1, 2} & \cdots & T^{n+1, n+1}
\end{pmatrix}. \tag{16-46}$$

Then from (16-45),

$$\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \\ 1 \end{pmatrix} = \sigma_n^2\begin{pmatrix} T^{1, n+1} \\ T^{2, n+1} \\ \vdots \\ T^{n+1, n+1} \end{pmatrix}. \tag{16-47}$$

Thus

$$\sigma_n^2 = \frac{1}{T^{n+1, n+1}} > 0, \tag{16-48}$$
and

$$a_i = \sigma_n^2\, T^{i, n+1} = \frac{T^{i, n+1}}{T^{n+1, n+1}}, \qquad i = 1 \to n. \tag{16-49}$$

Eq. (16-49) represents the best linear predictor coefficients, and they can be evaluated from the last column of $T_n^{-1}$ in (16-45). Using these, the best one-step ahead predictor in (16-35) takes the form

$$\hat{X}_{n+1} = -\frac{1}{T^{n+1, n+1}}\sum_{i=1}^{n} T^{i, n+1} X_i, \tag{16-50}$$

and from (16-48), the minimum mean square error is given by the $(n+1, n+1)$ entry of $T_n^{-1}$.

From (16-36), since the one-step linear prediction error is

$$\varepsilon_n = X_{n+1} + a_n X_n + a_{n-1} X_{n-1} + \cdots + a_1 X_1, \tag{16-51}$$
we can represent (16-51) formally as a filtering operation,

$$X_{n+1} \longrightarrow \big[\, 1 + a_n z^{-1} + a_{n-1} z^{-2} + \cdots + a_1 z^{-n} \,\big] \longrightarrow \varepsilon_n.$$

Thus, let

$$A_n(z) = 1 + a_n z^{-1} + a_{n-1} z^{-2} + \cdots + a_1 z^{-n}; \tag{16-52}$$

then from the above representation, we also have

$$\varepsilon_n \longrightarrow \Big[\, \frac{1}{A_n(z)} \,\Big] \longrightarrow X_{n+1}.$$

The filter

$$H(z) = \frac{1}{A_n(z)} = \frac{1}{1 + a_n z^{-1} + a_{n-1} z^{-2} + \cdots + a_1 z^{-n}} \tag{16-53}$$

represents an AR($n$) filter, and this shows that linear prediction leads to an auto regressive (AR) model.
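The chain (16-43)-(16-49) is straightforward to exercise numerically. The sketch below assumes an AR(1)-type autocorrelation $r_k = \rho^{|k|}$ (unit variance), for which the exact one-step predictor is known to be $\hat{X}_{n+1} = \rho X_n$, i.e. $a_n = -\rho$ with $\sigma_n^2 = 1 - \rho^2$:

```python
import numpy as np

# Assumed autocorrelation sequence r_k = rho^|k| (AR(1), unit variance).
rho, n = 0.7, 4
r = rho ** np.arange(n + 1)                   # r_0 ... r_n (real, so r* = r)

# Hermitian Toeplitz matrix T_n of (16-44).
Tn = np.array([[r[abs(i - j)] for j in range(n + 1)] for i in range(n + 1)])

Tinv = np.linalg.inv(Tn)
last = Tinv[:, -1]                            # last column of Tn^{-1}, cf. (16-45)
sigma2 = 1.0 / last[-1]                       # (16-48)
a = sigma2 * last[:-1]                        # (16-49): a_1 ... a_n
```

The recovered coefficient vector has a single nonzero entry $a_n = -\rho$, so (16-35) collapses to $\hat{X}_{n+1} = \rho X_n$, as expected for a first-order Markov-type correlation structure.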
The polynomial $A_n(z)$ in (16-52)-(16-53) can be simplified using (16-43)-(16-44). To see this, we rewrite $A_n(z)$ as

$$A_n(z) = 1 + a_n z^{-1} + \cdots + a_1 z^{-n} = [z^{-n}, z^{-(n-1)}, \ldots, z^{-1}, 1]\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \\ 1 \end{pmatrix} = [z^{-n}, z^{-(n-1)}, \ldots, z^{-1}, 1]\; T_n^{-1}\begin{pmatrix} 0 \\ \vdots \\ 0 \\ \sigma_n^2 \end{pmatrix}. \tag{16-54}$$

To simplify (16-54), we can make use of the following matrix identity:

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}\begin{pmatrix} I & -A^{-1}B \\ 0 & I \end{pmatrix} = \begin{pmatrix} A & 0 \\ C & D - C A^{-1} B \end{pmatrix}. \tag{16-55}$$
Taking determinants, we get

$$\begin{vmatrix} A & B \\ C & D \end{vmatrix} = |A|\,\big|D - C A^{-1} B\big|. \tag{16-56}$$

In particular, if $D \equiv 0$ (a scalar), with $C$ a row vector and $B$ a column vector, we get

$$\begin{vmatrix} A & B \\ C & 0 \end{vmatrix} = -\,|A|\; C A^{-1} B, \tag{16-57}$$

thus establishing the identity we need. Using (16-57) in (16-54), with

$$C = [z^{-n}, z^{-(n-1)}, \ldots, z^{-1}, 1], \qquad A = T_n, \qquad B = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ \sigma_n^2 \end{pmatrix},$$
we get

$$A_n(z) = \frac{-\,\sigma_n^2}{|T_n|}\begin{vmatrix}
r_0 & r_1 & \cdots & r_n & 0 \\
r_1^* & r_0 & \cdots & r_{n-1} & 0 \\
\vdots & & & & \vdots \\
r_n^* & r_{n-1}^* & \cdots & r_0 & 1 \\
z^{-n} & z^{-(n-1)} & \cdots & 1 & 0
\end{vmatrix}. \tag{16-58}$$

Referring back to (16-43) and using Cramer's rule to solve for $a_{n+1}\ (= 1)$, we get

$$a_{n+1} = \frac{\sigma_n^2\,|T_{n-1}|}{|T_n|} = 1,$$
or

$$\sigma_n^2 = \frac{|T_n|}{|T_{n-1}|} > 0.$$

Thus the polynomial in (16-58) reduces to

$$A_n(z) = \frac{1}{|T_{n-1}|}\begin{vmatrix}
r_0 & r_1 & \cdots & r_n \\
r_1^* & r_0 & \cdots & r_{n-1} \\
\vdots & & & \vdots \\
r_{n-1}^* & r_{n-2}^* & \cdots & r_1 \\
z^{-n} & z^{-(n-1)} & \cdots & 1
\end{vmatrix} = 1 + a_n z^{-1} + a_{n-1} z^{-2} + \cdots + a_1 z^{-n}. \tag{16-59}$$

The filter in (16-53) can therefore be alternatively represented as

$$H(z) = \frac{1}{A_n(z)} \sim \mathrm{AR}(n), \tag{16-60}$$

with $A_n(z)$ as in (16-59), and in fact represents a stable
AR filter of order $n$, whose input error signal $\varepsilon_n$ is white noise of constant spectral height equal to $|T_n|/|T_{n-1}|$ and whose output is $X_{n+1}$. It can be shown that $A_n(z)$ has all its zeros in $|z| > 1$ provided $|T_n| > 0$, thus establishing stability.

Linear Prediction Error

From (16-59), the mean square error using $n$ samples is given by

$$\sigma_n^2 = \frac{|T_n|}{|T_{n-1}|} > 0. \tag{16-61}$$

Suppose one more sample from the past is available to evaluate $X_{n+1}$ (i.e., $X_n, X_{n-1}, \ldots, X_1, X_0$ are available). Proceeding as above, the new coefficients and the mean square error $\sigma_{n+1}^2$ can be determined. From (16-59)-(16-61),

$$\sigma_{n+1}^2 = \frac{|T_{n+1}|}{|T_n|}. \tag{16-62}$$
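The two routes to the prediction error — the $(n+1, n+1)$ entry of $T_n^{-1}$ in (16-48) and the determinant ratio in (16-61) — can be cross-checked numerically. The autocorrelation sequence below is that of an assumed MA(1) process $x_k = w_k + 0.5\,w_{k-1}$, for which $r_0 = 1.25$, $r_1 = 0.5$, and $r_k = 0$ otherwise:

```python
import numpy as np

def T(m, r):
    # Toeplitz matrix T_m of (16-44) for a real autocorrelation sequence r.
    return np.array([[r[abs(i - j)] for j in range(m + 1)] for i in range(m + 1)])

# Assumed MA(1) autocorrelations: r_0 = 1.25, r_1 = 0.5, r_k = 0 for k > 1.
n = 5
r = np.zeros(n + 2)
r[0], r[1] = 1.25, 0.5

sigma2_det = np.linalg.det(T(n, r)) / np.linalg.det(T(n - 1, r))   # (16-61)
sigma2_inv = 1.0 / np.linalg.inv(T(n, r))[-1, -1]                  # (16-48)
```

Both values agree, and both exceed the innovation variance $1$ of the assumed MA(1) model, toward which $\sigma_n^2$ decreases as $n$ grows.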
Using another matrix identity it is easy to show that

$$|T_{n+1}| = \frac{|T_n|^2}{|T_{n-1}|}\big(1 - |s_{n+1}|^2\big). \tag{16-63}$$

Since $|T_k| > 0$, we must have $(1 - |s_{n+1}|^2) > 0$, or $|s_{n+1}| < 1$, for every $n$. From (16-63), we have

$$\sigma_{n+1}^2 = \frac{|T_{n+1}|}{|T_n|} = \frac{|T_n|}{|T_{n-1}|}\big(1 - |s_{n+1}|^2\big),$$

or

$$\sigma_{n+1}^2 = \sigma_n^2\big(1 - |s_{n+1}|^2\big) < \sigma_n^2, \tag{16-64}$$

since $(1 - |s_{n+1}|^2) < 1$. Thus the mean square error decreases as more and more samples are used from the past in the linear predictor. In general, from (16-64), the mean square errors for the one-step predictor form a monotonic nonincreasing sequence
$$\sigma_n^2 \ge \sigma_{n+1}^2 \ge \cdots \ge \sigma_k^2 \ge \cdots \to \sigma_\infty^2 \ge 0, \tag{16-65}$$

whose limiting value is $\sigma_\infty^2 \ge 0$. Clearly $\sigma_\infty^2$ corresponds to the irreducible error in linear prediction using the entire past samples, and it is related to the power spectrum of the underlying process $X(nT)$ through the relation

$$\sigma_\infty^2 = \exp\left\{\frac{1}{2\pi}\int_{-\pi}^{+\pi} \ln S_{XX}(\omega)\, d\omega\right\} \ge 0, \tag{16-66}$$

where $S_{XX}(\omega) \ge 0$ represents the power spectrum of $X(nT)$. For any finite power process, we have

$$\int_{-\pi}^{+\pi} S_{XX}(\omega)\, d\omega < \infty,$$

and since $S_{XX}(\omega) \ge 0$, we have $\ln S_{XX}(\omega) \le S_{XX}(\omega)$. Thus

$$\int_{-\pi}^{+\pi} \ln S_{XX}(\omega)\, d\omega \le \int_{-\pi}^{+\pi} S_{XX}(\omega)\, d\omega < \infty. \tag{16-67}$$
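Formula (16-66) (the Kolmogorov-Szegő formula) can be checked numerically for an assumed AR(1) process $X_n = \rho X_{n-1} + W_n$ with $\operatorname{Var}\{W\} = q = 1 - \rho^2$ (so $r_0 = 1$): its power spectrum is $S_{XX}(\omega) = q / |1 - \rho e^{-j\omega}|^2$, and the irreducible one-step error should come out equal to the innovation variance $q$:

```python
import numpy as np

# Assumed AR(1) process: rho = 0.7, innovation variance q = 1 - rho^2.
rho = 0.7
q = 1 - rho**2

# Midpoint grid on (-pi, pi); grid averages approximate (1/2pi) * integrals.
N = 200_000
w = -np.pi + (np.arange(N) + 0.5) * (2 * np.pi / N)
S = q / np.abs(1 - rho * np.exp(-1j * w)) ** 2

sigma_inf2 = np.exp(np.mean(np.log(S)))   # (16-66)
```

As a side check, the grid average of $S$ itself recovers $r_0 = 1$ (the total power), while the geometric mean recovers $q$, illustrating how the log in (16-66) singles out the unpredictable part of the power.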
Moreover, if the power spectrum is strictly positive at every frequency, i.e.,

$$S_{XX}(\omega) > 0 \quad \text{in } -\pi < \omega < \pi, \tag{16-68}$$

then from (16-66),

$$\int_{-\pi}^{+\pi} \ln S_{XX}(\omega)\, d\omega > -\infty, \tag{16-69}$$

and hence

$$\sigma_\infty^2 = \exp\left\{\frac{1}{2\pi}\int_{-\pi}^{+\pi} \ln S_{XX}(\omega)\, d\omega\right\} > e^{-\infty} = 0. \tag{16-70}$$

That is, for processes that satisfy the strict positivity condition in (16-68) almost everywhere in the interval $(-\pi, \pi)$, the final minimum mean square error is strictly positive (see (16-70)); i.e., such processes are not completely predictable even using their entire set of past samples, or they are inherently stochastic,
since the next output contains information that is not contained in the past samples. Such processes are known as regular stochastic processes, and their power spectrum $S_{XX}(\omega)$ is strictly positive.

[Figure: power spectrum of a regular stochastic process, $S_{XX}(\omega) > 0$ for $-\pi < \omega < \pi$.]

Conversely, if a process has a power spectrum such that $S_{XX}(\omega) = 0$ in $\omega_1 < \omega < \omega_2$, then from (16-70), $\sigma_\infty^2 = 0$.

[Figure: power spectrum with a spectral gap, $S_{XX}(\omega) = 0$ for $\omega_1 < \omega < \omega_2$.]
Such processes are completely predictable from their past data samples. In particular,

$$X(nT) = \sum_k a_k \cos(n\omega_k T + \phi_k) \tag{16-71}$$

is completely predictable from its past samples, since its $S_{XX}(\omega)$ consists of line spectrum.

[Figure: line spectrum $S_{XX}(\omega)$ with impulses at the frequencies $\omega_1, \ldots, \omega_k$.]

$X(nT)$ in (16-71) is a deterministic stochastic process.