
Adaptive Beamforming Algorithms

S. R. Zinka

[email protected]

October 29, 2014

CT531, DA-IICT


Outline

1 Least Mean Squares

2 Sample Matrix Inversion

3 Recursive Least Squares

4 Accelerated Gradient Approach

5 Conjugate Gradient Method


Gradient Descent Method

[Figure: gradient-descent iterates x0, x1, x2, x3, x4 stepping toward the minimum of F]

\[
x_{n+1} = x_n - \gamma\, \nabla F(x_n) \tag{1}
\]

In general, γ is allowed to change at every iteration. However, for this algorithm we use a constant γ.


Gradient Descent Method - γ and Convergence

[Figure: effect of the step size γ on gradient-descent convergence]


Least Mean Squares Algorithm

The error (between the reference signal and the array output) is given as

\[
\varepsilon(t) = r(t) - w^H x(t). \tag{2}
\]

Taking the expected value of the squared magnitude of (2) gives

\[
\begin{aligned}
\xi &= E\{|\varepsilon(t)|^2\} = E\{\varepsilon(t)\, \varepsilon^*(t)\} \\
&= E\{[r(t) - w^H x(t)]\, [r(t) - w^H x(t)]^*\} \\
&= E\{[r(t) - w^H x(t)]\, [r^*(t) - x^H(t)\, w]\} \\
&= E\{|r(t)|^2 + w^H x(t)\, x^H(t)\, w - w^H x(t)\, r^*(t) - r(t)\, x^H(t)\, w\} \\
&= E\{|r(t)|^2\} + w^H R\, w - w^H E\{x(t)\, r^*(t)\} - E\{r(t)\, x^H(t)\}\, w \\
&= E\{|r(t)|^2\} + w^H R\, w - w^H z - z^H w, \tag{3}
\end{aligned}
\]

where \(R = E\{x(t)\, x^H(t)\}\) is the array correlation matrix and \(z = E\{x(t)\, r^*(t)\}\) is known as the signal correlation vector.


Least Mean Squares Algorithm ... Contd

The mean square error surface \(E\{|\varepsilon(t)|^2\}\) is a quadratic function of w and is minimized by setting its gradient with respect to w* equal to zero, which gives

\[
\nabla_{w^*} E\{|\varepsilon(t)|^2\}\big|_{w_{\mathrm{opt}}} = 0
\;\Rightarrow\; R\, w_{\mathrm{opt}} - z = 0
\;\Rightarrow\; w_{\mathrm{MSE}} = R^{-1} z. \tag{4}
\]

In general, we do not know the signal statistics and thus must resort to estimating the array correlation matrix (R) and the signal correlation vector (z) over a range of snapshots or for each instant in time.


Least Mean Squares Algorithm ... Contd

The instantaneous estimates of R and z are as given below:

\[
\hat{R}(k) = x(k)\, x^H(k) \tag{5}
\]
\[
\hat{z}(k) = r^*(k)\, x(k) \tag{6}
\]


Least Mean Squares Algorithm ... Contd

The method of steepest descent can be approximated in terms of the weights using the LMS method as shown below:

\[
\begin{aligned}
w(k+1) &= w(k) - \gamma\, \nabla_{w^*}(\xi) \\
&= w(k) - \gamma\, [R\, w(k) - z] \\
&\approx w(k) - \gamma\, [\hat{R}(k+1)\, w(k) - \hat{z}(k+1)] \\
&= w(k) - \gamma\, [x(k+1)\, x^H(k+1)\, w(k) - r^*(k+1)\, x(k+1)] \\
&= w(k) - \gamma\, \{x(k+1)\, [x^H(k+1)\, w(k) - r^*(k+1)]\} \\
&= w(k) - \gamma\, \{x(k+1)\, [w^H(k)\, x(k+1) - r(k+1)]^*\} \\
&= w(k) + \gamma\, \varepsilon^*(k+1)\, x(k+1) \tag{7}
\end{aligned}
\]
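As a concrete illustration, here is a minimal NumPy sketch of this update loop. The names (lms_beamformer, x_snapshots, r) are ours, not from the slides; x_snapshots is assumed to be an L × K matrix whose columns are array snapshots, and r the K-sample reference signal.

```python
import numpy as np

def lms_beamformer(x_snapshots, r, gamma):
    """LMS weight adaptation, eq. (7): w(k+1) = w(k) + gamma * eps*(k+1) x(k+1)."""
    L, K = x_snapshots.shape
    w = np.zeros(L, dtype=complex)          # initial weight vector w(0)
    for k in range(K):
        x = x_snapshots[:, k]               # current snapshot x(k+1)
        eps = r[k] - np.vdot(w, x)          # error eps = r - w^H x (vdot conjugates w)
        w = w + gamma * np.conj(eps) * x    # step along the instantaneous gradient estimate
    return w
```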


LMS - Convergence of Weight Vector

According to the LMS algorithm,

\[
w(k+1) = w(k) - \gamma \left[ x(k+1)\, x^H(k+1)\, w(k) - r^*(k+1)\, x(k+1) \right].
\]

Taking the expected value on both sides of the above equation (and assuming w(k) is independent of x(k+1)),
\[
\begin{aligned}
E[w(k+1)] &= E[w(k)] - \gamma\, E[x(k+1)\, x^H(k+1)\, w(k)] + \gamma\, E[r^*(k+1)\, x(k+1)] \\
\Rightarrow\quad w(k+1) &= w(k) - \gamma\, R\, w(k) + \gamma\, z \\
&= [I - \gamma R]\, w(k) + \gamma z, \tag{8}
\end{aligned}
\]
where w(k) now denotes the mean weight vector.

Define a mean error vector v as
\[
v(k) = w(k) - w_{\mathrm{MSE}}, \tag{9}
\]
where \(w_{\mathrm{MSE}} = R^{-1} z\). Then,
\[
\begin{aligned}
v(k+1) &= w(k+1) - w_{\mathrm{MSE}} \\
&= [I - \gamma R]\, w(k) + \gamma z - w_{\mathrm{MSE}} \\
&= [I - \gamma R]\, [v(k) + w_{\mathrm{MSE}}] + \gamma z - w_{\mathrm{MSE}} \\
&= [I - \gamma R]\, v(k) + \underbrace{[I - \gamma R]\, w_{\mathrm{MSE}} + \gamma z - w_{\mathrm{MSE}}}_{=\,0} \\
&= [I - \gamma R]\, v(k). \tag{10}
\end{aligned}
\]


LMS - Convergence of Weight Vector ... Contd

From equation (10),
\[
v(k+1) = [I - \gamma R]^{k+1}\, v(0). \tag{11}
\]
Using eigenvalue decomposition, the above equation can be written as
\[
v(k+1) = \left[I - \gamma\, Q \Lambda Q^H\right]^{k+1} v(0), \tag{12}
\]
where Λ is a diagonal matrix of the eigenvalues and \(Q Q^H = I\).

The above equation can further be written as
\[
v(k+1) = Q\, [I - \gamma \Lambda]^{k+1}\, Q^H\, v(0). \tag{13}
\]
For convergence, as the iteration number increases, each diagonal element of the matrix \([I - \gamma \Lambda]^{k+1}\) should diminish, i.e.,
\[
|1 - \gamma \lambda_i| < 1, \quad \forall i. \tag{14}
\]


LMS - Convergence of Weight Vector ... Contd

From the previous slide, the step size γ should satisfy (for all λ_i values)
\[
\begin{aligned}
|1 - \gamma \lambda_i| < 1
\;&\Rightarrow\; -1 < 1 - \gamma \lambda_i < 1 \\
&\Rightarrow\; -2 < -\gamma \lambda_i < 0 \\
&\Rightarrow\; 0 < \gamma \lambda_i < 2 \\
&\Rightarrow\; 0 < \gamma < \frac{2}{\lambda_i}\;\;\forall i
\;\Rightarrow\; 0 < \gamma < \frac{2}{\lambda_{\max}}. \tag{15}
\end{aligned}
\]
As the sum of all eigenvalues of R equals its trace (the sum of its diagonal elements), the gradient step size γ can be selected in terms of measurable quantities using
\[
\gamma < \frac{2}{\operatorname{trace}(R)}. \tag{16}
\]
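Since trace(R) equals the expected snapshot power \(E\{x^H x\}\), the bound (16) can be evaluated from measured data alone. A small sketch of this; the helper name and the safety margin are our own choices, not from the slides:

```python
import numpy as np

def lms_step_size(x_snapshots, margin=0.1):
    """Choose gamma below the bound gamma < 2 / trace(R), eq. (16).
    trace(R) = E{x^H x} is estimated by the average snapshot power."""
    trace_R = np.mean(np.sum(np.abs(x_snapshots) ** 2, axis=0))
    return margin * 2.0 / trace_R   # margin << 1 keeps gamma well inside the bound
```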


Drawbacks

• One of the drawbacks of the LMS adaptive scheme is that the algorithm must go through many iterations before satisfactory convergence is achieved.

• The rate of convergence of the weights is dictated by the eigenvalue spread of R.


Sample / Direct Matrix Inversion

The sample matrix is a time-average estimate of the array correlation matrix using K time samples. For the MMSE optimality criterion, the optimum weight vector is given as

\[
w_{\mathrm{MSE}} = R^{-1} z, \tag{17}
\]
where \(R = E\left[x\, x^H\right]\) and \(z = E\left[r^*\, x\right]\).

We can estimate the correlation matrix by calculating the time average such that

\[
R \approx \hat{R} = \frac{1}{K} \sum_{i=N_1}^{N_2} x(i)\, x^H(i), \tag{18}
\]
where \(N_2 - N_1 = K\) is the observation interval. Similarly, the signal correlation vector can be estimated as
\[
z \approx \hat{z} = \frac{1}{K} \sum_{i=N_1}^{N_2} r^*(i)\, x(i). \tag{19}
\]

Since we use a K-length block of data, this method is called a block adaptive approach. We are thus adapting the weights block-by-block.


Calculating the Estimates of R and z

Let's define a matrix \(X_K\) as shown below:
\[
X_K =
\begin{bmatrix}
x_1(N_1) & x_1(N_1+1) & \cdots & x_1(N_1+K) \\
x_2(N_1) & x_2(N_1+1) & & \vdots \\
\vdots & \vdots & \ddots & \vdots \\
x_L(N_1) & & \cdots & x_L(N_1+K)
\end{bmatrix} \tag{20}
\]
Then the estimate of the array correlation matrix is given by
\[
\hat{R} = \frac{1}{K}\, X_K X_K^H. \tag{21}
\]
Similarly, the estimate of the correlation vector is given by
\[
\hat{z} = \frac{1}{K}\, X_K\, r^H, \tag{22}
\]
where \(r = \left[\, r(N_1)\;\; r(N_1+1)\;\; \cdots\;\; r(N_1+K) \,\right]\) is the row vector of reference-signal samples, so that \(X_K\, r^H = \sum_i r^*(i)\, x(i)\).


SMI Weights

\[
w_{\mathrm{MSE}} = \hat{R}^{-1} \hat{z} = \left[ X_K X_K^H \right]^{-1} \left[ X_K\, r^H \right] \tag{23}
\]
(the factors of 1/K cancel).
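A block-adaptive SMI sketch in NumPy, assuming X holds one K-sample block (columns are snapshots) and r the matching reference samples. The diagonal loading term is a common remedy for the ill-conditioning discussed on the next slide, not something prescribed by these slides:

```python
import numpy as np

def smi_weights(X, r, loading=1e-3):
    """Sample matrix inversion, eq. (23): w = R^-1 z from one block of K snapshots."""
    L, K = X.shape
    R = X @ X.conj().T / K                  # sample correlation matrix, eq. (21)
    z = X @ r.conj() / K                    # sample correlation vector, eq. (22)
    R += loading * np.trace(R).real / L * np.eye(L)   # diagonal loading (regularization)
    return np.linalg.solve(R, z)            # solve R w = z instead of forming R^-1
```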


SMI - Drawbacks

• The correlation matrix may be ill-conditioned, resulting in errors or singularities when inverted.

• Computationally intensive: for large arrays, there is the challenge of inverting large matrices.


Recursive Least Squares

Following an estimation method similar to the one used in the SMI algorithm, one can estimate R and z as shown below:

\[
R \approx \hat{R}(k) = \sum_{i=1}^{k} x(i)\, x^H(i), \tag{24}
\]
\[
z \approx \hat{z}(k) = \sum_{i=1}^{k} r^*(i)\, x(i), \tag{25}
\]
where k is the block length.


Recursive Least Squares ... Weighted Estimation

Both (24) and (25) use rectangular windows; thus they weight all previous time samples equally. Since the signal sources can change or slowly move with time, we might want to deemphasize the earliest data samples and emphasize the most recent ones, as shown below:

\[
\hat{R}(k) = \sum_{i=1}^{k} \alpha^{k-i}\, x(i)\, x^H(i), \tag{26}
\]
\[
\hat{z}(k) = \sum_{i=1}^{k} \alpha^{k-i}\, r^*(i)\, x(i), \tag{27}
\]
where α is the forgetting factor, with \(0 \le \alpha \le 1\).


Recursive Least Squares ... Recursive Formulation

\[
\begin{aligned}
\hat{R}(k) &= \sum_{i=1}^{k} \alpha^{k-i}\, x(i)\, x^H(i) \\
&= x(k)\, x^H(k) + \sum_{i=1}^{k-1} \alpha^{k-i}\, x(i)\, x^H(i) \\
&= x(k)\, x^H(k) + \alpha \sum_{i=1}^{k-1} \alpha^{k-1-i}\, x(i)\, x^H(i) \\
&= \alpha\, \hat{R}(k-1) + x(k)\, x^H(k),
\end{aligned}
\]
\[
\begin{aligned}
\hat{z}(k) &= \sum_{i=1}^{k} \alpha^{k-i}\, r^*(i)\, x(i) \\
&= \alpha\, \hat{z}(k-1) + r^*(k)\, x(k).
\end{aligned}
\]


Recursive Least Squares ... Recursive Formulation ... Contd

Applying the matrix inversion (Sherman-Morrison) lemma to \(\hat{R}(k) = \alpha\, \hat{R}(k-1) + x(k)\, x^H(k)\),
\[
\begin{aligned}
\hat{R}^{-1}(k) &= \left[\alpha\, \hat{R}(k-1) + x(k)\, x^H(k)\right]^{-1} \\
&= \alpha^{-1} \hat{R}^{-1}(k-1) - \frac{\alpha^{-1} \hat{R}^{-1}(k-1)\, x(k)\, x^H(k)\, \alpha^{-1} \hat{R}^{-1}(k-1)}{1 + x^H(k)\, \alpha^{-1} \hat{R}^{-1}(k-1)\, x(k)} \\
&= \alpha^{-1} \hat{R}^{-1}(k-1) - \frac{\alpha^{-2}\, \hat{R}^{-1}(k-1)\, x(k)\, x^H(k)\, \hat{R}^{-1}(k-1)}{1 + \alpha^{-1}\, x^H(k)\, \hat{R}^{-1}(k-1)\, x(k)} \\
&= \alpha^{-1} \hat{R}^{-1}(k-1) - \frac{\alpha^{-1}\, \hat{R}^{-1}(k-1)\, x(k)}{1 + \alpha^{-1}\, x^H(k)\, \hat{R}^{-1}(k-1)\, x(k)}\, \alpha^{-1}\, x^H(k)\, \hat{R}^{-1}(k-1) \\
&= \alpha^{-1} \hat{R}^{-1}(k-1) - \alpha^{-1}\, g(k)\, x^H(k)\, \hat{R}^{-1}(k-1),
\end{aligned}
\]
where
\[
g(k) = \frac{\alpha^{-1}\, \hat{R}^{-1}(k-1)\, x(k)}{1 + \alpha^{-1}\, x^H(k)\, \hat{R}^{-1}(k-1)\, x(k)}.
\]
We can also prove that
\[
g(k) = \left[\alpha^{-1} \hat{R}^{-1}(k-1) - \alpha^{-1}\, g(k)\, x^H(k)\, \hat{R}^{-1}(k-1)\right] x(k) = \hat{R}^{-1}(k)\, x(k).
\]


Recursive Least Squares ... Recursive Formulation ... Contd

\[
\begin{aligned}
w(k) &= \hat{R}^{-1}(k)\, \hat{z}(k) \\
&= \hat{R}^{-1}(k)\, \left[\alpha\, \hat{z}(k-1) + r^*(k)\, x(k)\right] \\
&= \alpha\, \hat{R}^{-1}(k)\, \hat{z}(k-1) + r^*(k)\, \hat{R}^{-1}(k)\, x(k) \\
&= \alpha \left[\alpha^{-1} \hat{R}^{-1}(k-1) - \alpha^{-1}\, g(k)\, x^H(k)\, \hat{R}^{-1}(k-1)\right] \hat{z}(k-1) + r^*(k)\, \hat{R}^{-1}(k)\, x(k) \\
&= \left[\hat{R}^{-1}(k-1) - g(k)\, x^H(k)\, \hat{R}^{-1}(k-1)\right] \hat{z}(k-1) + r^*(k)\, \hat{R}^{-1}(k)\, x(k) \\
&= \hat{R}^{-1}(k-1)\, \hat{z}(k-1) - g(k)\, x^H(k)\, \hat{R}^{-1}(k-1)\, \hat{z}(k-1) + r^*(k)\, g(k) \\
&= w(k-1) - g(k)\, x^H(k)\, w(k-1) + r^*(k)\, g(k) \\
&= w(k-1) + g(k) \left[r^*(k) - x^H(k)\, w(k-1)\right] \\
&= w(k-1) + g(k)\, \varepsilon^*(k), \tag{28}
\end{aligned}
\]
where \(g(k) = \hat{R}^{-1}(k)\, x(k)\) has been used and \(\varepsilon(k) = r(k) - w^H(k-1)\, x(k)\) is the a priori error.
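Putting the pieces together, a minimal RLS sketch in NumPy. P(k) tracks \(\hat{R}^{-1}(k)\); the initialization P(0) = δI is the usual convention (δ and the function names are our own choices):

```python
import numpy as np

def rls_beamformer(x_snapshots, r, alpha=0.99, delta=100.0):
    """RLS weight adaptation with forgetting factor alpha, eq. (28)."""
    L, K = x_snapshots.shape
    w = np.zeros(L, dtype=complex)
    P = delta * np.eye(L, dtype=complex)    # P(0) ~ R^-1(0)
    for k in range(K):
        x = x_snapshots[:, k]
        Px = P @ x
        g = Px / (alpha + np.vdot(x, Px))   # gain g(k) = a^-1 P x / (1 + a^-1 x^H P x)
        eps = r[k] - np.vdot(w, x)          # a priori error eps(k) = r(k) - w^H(k-1) x(k)
        w = w + g * np.conj(eps)            # w(k) = w(k-1) + g(k) eps*(k)
        P = (P - np.outer(g, x.conj()) @ P) / alpha   # R^-1 update from the lemma
    return w
```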


Quadratic Form

\[
F(x) = x^H A\, x - b^H x - x^H b + c
\]
\[
\nabla_{x^*} F(x) = A\, x - b
\]


The Method of Steepest Descent

[Figure: steepest-descent iterates x0 through x4 stepping toward the minimum of F]

\[
x_{n+1} = x_n - \gamma\, \nabla F(x_n) \tag{29}
\]

Unlike in the LMS method, in the AGM the value of γ is allowed to change at every iteration.


What is the optimum γ value?



The Method of Steepest Descent

[Figure: the cost F along the search line as a function of γ, minimized at the optimal step]

The optimal γ minimizes the cost along the current search direction:
\[
\frac{\partial}{\partial \gamma}\, F\left[x_n - \gamma\, \nabla F(x_n)\right] = 0,
\]
i.e.,
\[
\frac{\partial}{\partial \gamma}\, F\left[x_n - \gamma\, (A x_n - b)\right] = 0.
\]


The Method of Steepest Descent

\[
F[x_{n+1}] = [x_n - \gamma (A x_n - b)]^H A\, [x_n - \gamma (A x_n - b)] - b^H [x_n - \gamma (A x_n - b)] - [x_n - \gamma (A x_n - b)]^H b.
\]
So,
\[
\begin{aligned}
\frac{dF}{d\gamma} &= -x_n^H A\, (A x_n - b) - (A x_n - b)^H A\, x_n + 2\gamma\, (A x_n - b)^H A\, (A x_n - b) + b^H (A x_n - b) + (A x_n - b)^H b \\
&= \left[-x_n^H A + b^H\right] (A x_n - b) - (A x_n - b)^H (A x_n - b) + 2\gamma\, (A x_n - b)^H A\, (A x_n - b) \\
&= -(A x_n - b)^H (A x_n - b) - (A x_n - b)^H (A x_n - b) + 2\gamma\, (A x_n - b)^H A\, (A x_n - b) \\
&= -2\, (A x_n - b)^H (A x_n - b) + 2\gamma\, (A x_n - b)^H A\, (A x_n - b).
\end{aligned}
\]
Setting \(dF/d\gamma = 0\),
\[
\gamma = \frac{(A x_n - b)^H (A x_n - b)}{(A x_n - b)^H A\, (A x_n - b)}.
\]
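A direct transcription into NumPy; the function name, tolerance, and iteration cap are our own (hypothetical) choices, and A is assumed Hermitian positive definite:

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=500):
    """Minimize F(x) = x^H A x - b^H x - x^H b + c with the optimal step
    gamma = r^H r / (r^H A r), where r = b - A x = -(A x - b)."""
    x = x0.astype(complex)
    for _ in range(max_iter):
        r = b - A @ x                        # residual = negative gradient
        rr = np.vdot(r, r).real
        if rr < tol:
            break
        gamma = rr / np.vdot(r, A @ r).real  # optimal line-search step
        x = x + gamma * r                    # x_{n+1} = x_n - gamma (A x_n - b)
    return x
```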


Consecutive Orthogonal Paths

[Figure: contour plot showing that consecutive steepest-descent search directions are orthogonal]

\[
\begin{aligned}
&(A x_n - b)^H (A x_{n+1} - b) = 0 \\
\Rightarrow\;& (A x_n - b)^H \left\{ A\, [x_n - \gamma (A x_n - b)] - b \right\} = 0 \\
\Rightarrow\;& (A x_n - b)^H \left\{ (A x_n - b) - \gamma\, A\, (A x_n - b) \right\} = 0 \\
\Rightarrow\;& (A x_n - b)^H (A x_n - b) - \gamma\, (A x_n - b)^H A\, (A x_n - b) = 0 \\
\Rightarrow\;& \gamma = \frac{(A x_n - b)^H (A x_n - b)}{(A x_n - b)^H A\, (A x_n - b)}.
\end{aligned}
\]


So, the Accelerated Gradient Method is ...

If we approximate the array correlation matrix and the signal correlation vector by their instantaneous estimates \(R \approx \hat{R} = x(k+1)\, x^H(k+1)\) and \(z \approx \hat{z} = r^*(k+1)\, x(k+1)\), then

\[
\begin{aligned}
w(k+1) &= w(k) - \gamma\, \nabla_{w^*}(\xi) \\
&= w(k) - \gamma \left[R\, w(k) - z\right] \\
&= w(k) - \left( \frac{\left[R\, w(k) - z\right]^H \left[R\, w(k) - z\right]}{\left[R\, w(k) - z\right]^H R \left[R\, w(k) - z\right]} \right) \left[R\, w(k) - z\right] \\
&\approx w(k) - \left( \frac{\left[\hat{R}\, w(k) - \hat{z}\right]^H \left[\hat{R}\, w(k) - \hat{z}\right]}{\left[\hat{R}\, w(k) - \hat{z}\right]^H \hat{R} \left[\hat{R}\, w(k) - \hat{z}\right]} \right) \left[\hat{R}\, w(k) - \hat{z}\right] \\
&= w(k) + \varepsilon^*(k+1) \left( \frac{x^H(k+1)\, x(k+1)}{x^H(k+1)\, x(k+1)\; x^H(k+1)\, x(k+1)} \right) x(k+1) \\
&= w(k) + \left[ \frac{\varepsilon^*(k+1)}{x^H(k+1)\, x(k+1)} \right] x(k+1), \tag{30}
\end{aligned}
\]
because \(\hat{R}\, w(k) - \hat{z} = -\varepsilon^*(k+1)\, x(k+1)\).
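Note that (30) is a normalized-LMS-style update: the step is scaled by the instantaneous input power. A minimal sketch (the names and the small regularizer eps0 are our own):

```python
import numpy as np

def agm_beamformer(x_snapshots, r, eps0=1e-12):
    """Accelerated-gradient update, eq. (30): w(k+1) = w(k) + eps* x / (x^H x)."""
    L, K = x_snapshots.shape
    w = np.zeros(L, dtype=complex)
    for k in range(K):
        x = x_snapshots[:, k]
        eps = r[k] - np.vdot(w, x)          # error against the reference signal
        power = np.vdot(x, x).real + eps0   # x^H x, guarded against division by zero
        w = w + np.conj(eps) * x / power    # data-normalized gradient step
    return w
```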


Steepest Descent vs Conjugate Gradient

[Figure: contour plots comparing the search paths of steepest descent and the conjugate gradient method]


Before proceeding further, let's recap the Gram-Schmidt method ...


Gram-Schmidt Conjugation

[Figure: constructing the A-conjugate direction d(1) from u1 by removing its component along d(0)]

\[
d_{(0)} = u_0, \qquad
d_{(1)} = u_1 - \beta_{10}\, d_{(0)}, \qquad
d_{(2)} = u_2 - \beta_{20}\, d_{(0)} - \beta_{21}\, d_{(1)}
\]
In general,
\[
d_{(i)} = u_i - \sum_{j=0}^{i-1} \beta_{ij}\, d_{(j)}, \qquad
\beta_{ij} = \frac{d_{(j)}^H\, A\, u_i}{d_{(j)}^H\, A\, d_{(j)}}.
\]
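A sketch of this conjugation in NumPy: given independent vectors u_i (columns of U) and a Hermitian positive-definite A, it returns A-orthogonal directions d_i. This is the textbook O(n³) construction, not an optimized one, and the names are ours:

```python
import numpy as np

def gram_schmidt_conjugate(U, A):
    """A-orthogonalize the columns of U: d_i = u_i - sum_j beta_ij d_j,
    with beta_ij = d_j^H A u_i / (d_j^H A d_j)."""
    n = U.shape[1]
    D = np.zeros(U.shape, dtype=complex)
    for i in range(n):
        d = U[:, i].astype(complex)
        for j in range(i):
            beta = np.vdot(D[:, j], A @ U[:, i]) / np.vdot(D[:, j], A @ D[:, j])
            d = d - beta * D[:, j]          # remove the A-projection onto d_j
        D[:, i] = d
    return D
```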


Now, let's move on to the Conjugate Gradient method ...


Problem Statement

Let’s consider a convex cost function as shown below:

\[
F(x) = x^H A\, x - b^H x - x^H b + c. \tag{31}
\]

Let's say that the optimal solution which minimizes the above cost function is \(x^*\). Then,
\[
A\, x^* - b = 0. \tag{32}
\]
So, solving the cost function for its stationary point is equivalent to solving the above system of linear equations.


Conjugate Gradient Method


Expressing Error in terms of Search Vectors

\[
\begin{aligned}
e_0 &= x^* - x_0 = \alpha_1 p_1 + \alpha_2 p_2 + \alpha_3 p_3 + \cdots + \alpha_n p_n \\
e_1 &= x^* - x_1 = \alpha_2 p_2 + \alpha_3 p_3 + \cdots + \alpha_n p_n \\
&\;\;\vdots \\
e_{n-1} &= x^* - x_{n-1} = \alpha_n p_n
\end{aligned}
\]
If \(\{p_k\}\) is a simple orthogonal set,
\[
\alpha_k = \frac{p_k^H\, e_{k-1}}{p_k^H\, p_k}. \tag{33}
\]

Also, it can be noted that \(e_k\) is orthogonal to \(p_k, p_{k-1}, p_{k-2}\), etc. However, we can't use this approach, because we don't know the value of \(x^*\): if we knew \(x^*\) at the outset, we wouldn't need to solve the optimization problem at all.


Expressing Residual in terms of Search Vectors

\[
\begin{aligned}
r_0 &= b - A x_0 = \alpha_1 A p_1 + \alpha_2 A p_2 + \alpha_3 A p_3 + \cdots + \alpha_n A p_n \\
r_1 &= b - A x_1 = \alpha_2 A p_2 + \alpha_3 A p_3 + \cdots + \alpha_n A p_n \\
&\;\;\vdots \\
r_{n-1} &= b - A x_{n-1} = \alpha_n A p_n
\end{aligned}
\]
If \(\{p_k\}_A\) is an orthogonal (conjugate) set with respect to the matrix A,
\[
\alpha_k = \frac{p_k^H\, r_{k-1}}{p_k^H\, A\, p_k}. \tag{34}
\]
Instead of \(e_k\), in this case \(r_k\) is orthogonal to \(p_k, p_{k-1}, p_{k-2}\), etc.


How to Generate the Orthogonal Set

Till now, we assumed that we knew the conjugate (i.e., A-orthogonal) set \(\{p_k\}_A\). There are several ways to generate this set, both iterative and non-iterative. We will choose the Gram-Schmidt iterative process to generate this set.


Gram-Schmidt Conjugation

It can easily be shown that \(r_0, r_1, r_2, \cdots, r_{n-1}\) are independent of each other. So, we can use these residuals to construct an A-orthogonal search set \(\{p_1, p_2, p_3, \cdots, p_n\}_A\):
\[
p_{k+1} = r_k - \sum_{j=1}^{k} \beta_{kj}\, p_j \tag{35}
\]
and
\[
\beta_{kj} = \frac{p_j^H\, A\, r_k}{p_j^H\, A\, p_j}. \tag{36}
\]


So, the Basic Algorithm is ...

• Choose a starting position \(x_0\).
• Calculate the residual at \(x_0\) and assign this residual vector to \(p_1\), i.e., \(p_1 = r_0\).
• Now, calculate the value of \(\alpha_1\) using \(p_1\) and \(r_0\) according to equation (34).
• Move to the next position according to the equation \(x_k = x_{k-1} + \alpha_k p_k\).
• Once you have reached the next position, calculate the next search direction using \(r_k\) and \(\{p_1, p_2, \cdots, p_k\}\) according to equations (35) and (36).
• Repeat this process until you reach \(x^*\).


Let's further refine the conjugate gradient algorithm ...


The Residual is Orthogonal to all Previous Residuals

Search directions are dictated by the following equation:

\[
p_{k+1} = r_k - \sum_{j=1}^{k} \beta_{kj}\, p_j.
\]
Taking the Hermitian transpose on both sides, we get
\[
p_{k+1}^H = r_k^H - \sum_{j=1}^{k} \beta_{kj}^*\, p_j^H.
\]
Multiplying both sides on the right by \(r_{k+1}\) gives
\[
p_{k+1}^H\, r_{k+1} = r_k^H\, r_{k+1} - \sum_{j=1}^{k} \beta_{kj}^*\, p_j^H\, r_{k+1}.
\]
Since the residual \(r_{k+1}\) is orthogonal to all the search directions \(p_1, \ldots, p_{k+1}\) (as noted earlier), the left-hand side and every term in the sum vanish, leaving
\[
0 = r_k^H\, r_{k+1} - 0 \;\Rightarrow\; r_k^H\, r_{k+1} = 0. \tag{37}
\]


Writing α_k in terms of Residuals

From (34),

\[
\alpha_k = \frac{p_k^H\, r_{k-1}}{p_k^H\, A\, p_k}.
\]
Let's re-formulate the above equation in terms of residuals as shown below:
\[
\begin{aligned}
&r_k^H\, r_{k-1} = 0 \\
\Rightarrow\;& (r_{k-1} - \alpha_k\, A\, p_k)^H\, r_{k-1} = 0 \\
\Rightarrow\;& \left(r_{k-1}^H - \alpha_k^*\, p_k^H A^H\right) r_{k-1} = 0 \\
\Rightarrow\;& r_{k-1}^H\, r_{k-1} - \alpha_k^*\, p_k^H A^H\, r_{k-1} = 0 \\
\Rightarrow\;& \alpha_k^* = \frac{r_{k-1}^H\, r_{k-1}}{p_k^H A^H\, r_{k-1}}
\;\Rightarrow\; \alpha_k = \frac{r_{k-1}^H\, r_{k-1}}{r_{k-1}^H\, A\, p_k} \\
\Rightarrow\;& \alpha_k = \frac{r_{k-1}^H\, r_{k-1}}{\left(p_k + \sum_{j=1}^{k-1} \beta_{k-1,j}\, p_j\right)^H A\, p_k}
\;\Rightarrow\; \alpha_k = \frac{r_{k-1}^H\, r_{k-1}}{p_k^H\, A\, p_k}. \tag{38}
\end{aligned}
\]


Writing β_kj in terms of Residuals

We know that

\[
r_j = r_{j-1} - \alpha_j\, A\, p_j.
\]
So,
\[
r_j^H = r_{j-1}^H - \alpha_j^*\, p_j^H A^H.
\]
Multiplying both sides on the right by \(r_k\) gives
\[
r_j^H\, r_k = r_{j-1}^H\, r_k - \alpha_j^*\, p_j^H A^H\, r_k.
\]
When A is Hermitian,
\[
\begin{aligned}
&r_j^H\, r_k = r_{j-1}^H\, r_k - \alpha_j^*\, p_j^H\, A\, r_k \\
\Rightarrow\;& r_j^H\, r_k = r_{j-1}^H\, r_k - \alpha_j^*\, \beta_{kj}\, p_j^H\, A\, p_j \\
\Rightarrow\;& r_j^H\, r_k = r_{j-1}^H\, r_k - \beta_{kj}\, r_{j-1}^H\, r_{j-1} \\
\Rightarrow\;& \beta_{kj} = \frac{r_{j-1}^H\, r_k - r_j^H\, r_k}{r_{j-1}^H\, r_{j-1}}.
\end{aligned}
\]


Writing β_kj in terms of Residuals ... Contd

Since \(j \le k\) while calculating the β_kj values, the term \(r_{j-1}^H\, r_k\) is always zero, and \(r_j^H\, r_k\) also vanishes for \(j < k\). So β_kj = 0 when \(j < k\), and
\[
\beta_{kk} = -\frac{r_k^H\, r_k}{r_{k-1}^H\, r_{k-1}}. \tag{39}
\]
So, finally,
\[
p_{k+1} = r_k - \beta_{kk}\, p_k. \tag{40}
\]


Basic Version vs Refined Version

Basic version:
\[
\alpha_k = \frac{p_k^H\, r_{k-1}}{p_k^H\, A\, p_k}, \qquad
p_{k+1} = r_k - \sum_{j=1}^{k} \beta_{kj}\, p_j, \qquad
\beta_{kj} = \frac{p_j^H\, A\, r_k}{p_j^H\, A\, p_j}.
\]
Refined version:
\[
\alpha_k = \frac{r_{k-1}^H\, r_{k-1}}{p_k^H\, A\, p_k}, \qquad
p_{k+1} = r_k - \beta_{kk}\, p_k, \qquad
\beta_{kk} = -\frac{r_k^H\, r_k}{r_{k-1}^H\, r_{k-1}}.
\]
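The refined version needs only one matrix-vector product per iteration. A compact sketch for Hermitian positive-definite A (the function name and stopping rule are our own choices):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    """Refined CG: alpha_k = r^H r / (p^H A p), beta_kk = -r_k^H r_k / r_{k-1}^H r_{k-1},
    p_{k+1} = r_k - beta_kk p_k (eqs. 38-40)."""
    x = x0.astype(complex)
    r = b - A @ x                           # initial residual r_0
    p = r.copy()                            # first search direction p_1 = r_0
    rr = np.vdot(r, r).real
    for _ in range(max_iter or len(b)):
        if rr < tol:
            break
        Ap = A @ p
        alpha = rr / np.vdot(p, Ap).real    # eq. (38)
        x = x + alpha * p                   # x_k = x_{k-1} + alpha_k p_k
        r = r - alpha * Ap                  # r_k = r_{k-1} - alpha_k A p_k
        rr_new = np.vdot(r, r).real
        beta = -rr_new / rr                 # eq. (39); note the sign convention
        p = r - beta * p                    # eq. (40)
        rr = rr_new
    return x
```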


If A is not Hermitian?

The actual problem is solving
\[
A\, x^* = b.
\]
If A is not Hermitian, then we can reformulate the problem as shown below:
\[
A^H A\, x^* = A^H b \;\Rightarrow\; A_1\, x^* = b_1, \tag{41}
\]
where \(A_1 = A^H A\) and \(b_1 = A^H b\). Since \(A_1\) is Hermitian (and positive definite for full-rank A), the conjugate gradient method can be applied to (41). Also, for this new problem, \(r_k^{\mathrm{new}} = b_1 - A_1 x_k = A^H (b - A x_k) = A^H r_k\). So the algorithm is

\[
\alpha_k = \frac{\left(A^H r_{k-1}\right)^H \left(A^H r_{k-1}\right)}{p_k^H\, A^H A\, p_k}, \qquad
p_{k+1} = r_k^{\mathrm{new}} - \beta_{kk}\, p_k = A^H r_k - \beta_{kk}\, p_k, \qquad
\beta_{kk} = -\frac{\left(A^H r_k\right)^H \left(A^H r_k\right)}{\left(A^H r_{k-1}\right)^H \left(A^H r_{k-1}\right)}.
\]
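In code, this just means running the same CG routine on the normal equations; a small wrapper around the conjugate_gradient sketch above. Forming the explicit A^H A product is fine for small arrays, though it squares the condition number:

```python
def conjugate_gradient_normal(A, b, x0, **kwargs):
    """CG for non-Hermitian A via the normal equations A^H A x = A^H b, eq. (41)."""
    AH = A.conj().T
    return conjugate_gradient(AH @ A, AH @ b, x0, **kwargs)
```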


So, the Adaptive Algorithm using CGM is ...

\[
w(k+1) = w(k) + \alpha_{k+1}\, p_{k+1}, \qquad
p_{k+1} = \hat{R}^H r_k - \beta_{kk}\, p_k,
\]
where
\[
\alpha_k = \frac{\left(\hat{R}^H r_{k-1}\right)^H \left(\hat{R}^H r_{k-1}\right)}{p_k^H\, \hat{R}^H \hat{R}\, p_k}, \qquad
\beta_{kk} = -\frac{\left(\hat{R}^H r_k\right)^H \left(\hat{R}^H r_k\right)}{\left(\hat{R}^H r_{k-1}\right)^H \left(\hat{R}^H r_{k-1}\right)},
\]
and
\[
r_k = z - R\, w(k) \approx \hat{z} - \hat{R}\, w(k) = \varepsilon^*(k+1)\, x(k+1).
\]
