• $\{x(n)\}$ are the WSS input samples
• $\{d(n)\}$ is the WSS desired output
• $\{\hat{d}(n)\}$ is the estimate of the desired signal, given by $\hat{d}(n) = \mathbf{w}^H(n)\mathbf{x}(n)$
where $\mathbf{x}(n) = [x(n), x(n-1), \ldots, x(n-M+1)]^T$
and $\mathbf{w}(n) = [w_0(n), w_1(n), \ldots, w_{M-1}(n)]^T$ is the filter weight vector at time n.
Then
$$e(n) = d(n) - \hat{d}(n) = d(n) - \mathbf{w}^H(n)\mathbf{x}(n)$$
Thus the MSE at time n is
$$J(n) = E\{|e(n)|^2\} = \sigma_d^2 - \mathbf{p}^H\mathbf{w}(n) - \mathbf{w}^H(n)\mathbf{p} + \mathbf{w}^H(n)\mathbf{R}\,\mathbf{w}(n)$$
where
$\sigma_d^2$ – variance of the desired signal
$\mathbf{p}$ – cross-correlation between x(n) and d(n)
$\mathbf{R}$ – correlation matrix of x(n)
When w(n) is set to the (optimal) Wiener solution, then
$$\mathbf{w}(n) = \mathbf{w}_0 = \mathbf{R}^{-1}\mathbf{p}$$
and
$$J(n) = J_{\min} = \sigma_d^2 - \mathbf{p}^H\mathbf{w}_0$$
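As a numerical check of the Wiener formulas, here is a minimal NumPy sketch; the values of R, p, and $\sigma_d^2$ are assumed for illustration (the same R and p reappear in the system identification example later):

```python
import numpy as np

# Assumed second-order statistics, for illustration only
R = np.array([[1.0, 0.8],
              [0.8, 1.0]])    # correlation matrix of x(n)
p = np.array([0.8, 0.5])      # cross-correlation between x(n) and d(n)
var_d = 1.0                   # variance of the desired signal (assumed)

w0 = np.linalg.solve(R, p)    # Wiener solution w0 = R^{-1} p
J_min = var_d - p @ w0        # J_min = sigma_d^2 - p^H w0

print("w0 =", w0)             # ~ [1.11, -0.389]
print("J_min =", J_min)
```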
Hence, in order to iteratively find $\mathbf{w}_0$, we use the method of steepest descent. To illustrate this concept, let M = 2; in the 2-D space of w(n), the MSE forms a bowl-shaped function.
A contour plot of the MSE is given below.
Thus, if we are at a specific point on the bowl and imagine dropping a marble there, it would reach the minimum by following the path of steepest descent.
[Figure: MSE contours in the $(w_1, w_2)$ plane; the negative gradient $-\nabla J$, with components $-\partial J/\partial w_1$ and $-\partial J/\partial w_2$, points toward the minimum at $\mathbf{w}_0$.]
Hence the direction in which we change the filter weight vector is $-\nabla J(n)$, or
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \frac{1}{2}\mu[-\nabla J(n)]$$
or, since $\nabla J(\mathbf{w}(n)) = -2\mathbf{p} + 2\mathbf{R}\mathbf{w}(n)$,
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu[\mathbf{p} - \mathbf{R}\mathbf{w}(n)]$$
for n = 0, 1, 2, ..., where μ is called the step size and $\mathbf{w}(0) = \mathbf{0}$ (in general).
Stability: Since the SD method uses feedback, the system can go unstable.
• Bounds on the step size guaranteeing stability can be determined with respect to the eigenvalues of R (Widrow, 1970).
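A minimal sketch of the steepest descent recursion, reusing the assumed R and p from above (here $\lambda_{\max} = 1.8$, so the step size must satisfy $\mu < 2/1.8$):

```python
import numpy as np

R = np.array([[1.0, 0.8],
              [0.8, 1.0]])
p = np.array([0.8, 0.5])

mu = 0.3            # 0 < mu < 2/lambda_max ensures stability (lambda_max = 1.8 here)
w = np.zeros(2)     # w(0) = 0

for n in range(100):
    w = w + mu * (p - R @ w)    # w(n+1) = w(n) + mu [p - R w(n)]

print("w(100)  =", w)
print("Wiener  =", np.linalg.solve(R, p))
```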
Define the error vector for the tap weights as
$$\mathbf{c}(n) = \mathbf{w}(n) - \mathbf{w}_0$$
Then, using $\mathbf{p} = \mathbf{R}\mathbf{w}_0$ in the update,
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu[\mathbf{R}\mathbf{w}_0 - \mathbf{R}\mathbf{w}(n)] = \mathbf{w}(n) - \mu\mathbf{R}\,\mathbf{c}(n)$$
and
$$\mathbf{w}(n+1) - \mathbf{w}_0 = \mathbf{w}(n) - \mathbf{w}_0 - \mu\mathbf{R}\,\mathbf{c}(n)$$
or
$$\mathbf{c}(n+1) = \mathbf{c}(n) - \mu\mathbf{R}\,\mathbf{c}(n) = [\mathbf{I} - \mu\mathbf{R}]\,\mathbf{c}(n)$$
Using the unitary similarity transform
$$\mathbf{R} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^H$$
we have
$$\mathbf{c}(n+1) = [\mathbf{I} - \mu\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^H]\,\mathbf{c}(n)$$
Premultiplying by $\mathbf{Q}^H$ gives
$$\mathbf{Q}^H\mathbf{c}(n+1) = [\mathbf{Q}^H - \mu\boldsymbol{\Lambda}\mathbf{Q}^H]\,\mathbf{c}(n) = [\mathbf{I} - \mu\boldsymbol{\Lambda}]\,\mathbf{Q}^H\mathbf{c}(n)$$
Define the transformed coefficients as
$$\mathbf{v}(n) = \mathbf{Q}^H\mathbf{c}(n) = \mathbf{Q}^H(\mathbf{w}(n) - \mathbf{w}_0)$$
Then
$$\mathbf{v}(n+1) = [\mathbf{I} - \mu\boldsymbol{\Lambda}]\,\mathbf{v}(n)$$
with initial condition
$$\mathbf{v}(0) = \mathbf{Q}^H(\mathbf{w}(0) - \mathbf{w}_0) = -\mathbf{Q}^H\mathbf{w}_0 \quad \text{if } \mathbf{w}(0) = \mathbf{0}$$
The k-th term of v(n+1) (the k-th mode) is given by
$$v_k(n+1) = (1 - \mu\lambda_k)v_k(n), \qquad k = 1, 2, \ldots, M$$
or, using the recursion,
$$v_k(n) = (1 - \mu\lambda_k)^n v_k(0)$$
Thus for $\lim_{n\to\infty} v_k(n) = 0$ for all k we must have
$$|1 - \mu\lambda_k| < 1, \qquad \text{i.e.} \quad 0 < \mu < \frac{2}{\lambda_{\max}}$$
The k-th mode has geometric decay,
$$v_k(n) = (1 - \mu\lambda_k)^n v_k(0)$$
We can characterize the rate of decay by finding the time constant $\tau_k$, the time it takes the mode to decay to $e^{-1}$ of its initial value. Thus
$$v_k(\tau_k) = (1 - \mu\lambda_k)^{\tau_k} v_k(0) = e^{-1}v_k(0)$$
which gives
$$\tau_k = \frac{-1}{\ln(1 - \mu\lambda_k)} \approx \frac{1}{\mu\lambda_k} \quad \text{for small } \mu\lambda_k$$
Recall that
$$\begin{aligned}
J(n) &= J_{\min} + (\mathbf{w}(n) - \mathbf{w}_0)^H \mathbf{R}\, (\mathbf{w}(n) - \mathbf{w}_0) \\
&= J_{\min} + (\mathbf{w}(n) - \mathbf{w}_0)^H \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^H (\mathbf{w}(n) - \mathbf{w}_0) \\
&= J_{\min} + \mathbf{v}^H(n)\boldsymbol{\Lambda}\mathbf{v}(n) \\
&= J_{\min} + \sum_{k=1}^{M} \lambda_k |v_k(n)|^2 \\
&= J_{\min} + \sum_{k=1}^{M} \lambda_k (1 - \mu\lambda_k)^{2n} |v_k(0)|^2
\end{aligned}$$
Thus $\lim_{n\to\infty} J(n) = J_{\min}$.
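The learning curve J(n) can be evaluated directly from this eigen-decomposition. A sketch, again with the assumed R, p, and $\sigma_d^2$ from above:

```python
import numpy as np

R = np.array([[1.0, 0.8],
              [0.8, 1.0]])
p = np.array([0.8, 0.5])
var_d, mu = 1.0, 0.3

lam, Q = np.linalg.eigh(R)            # R = Q diag(lam) Q^H
w0 = np.linalg.solve(R, p)
J_min = var_d - p @ w0
v0 = Q.T @ (np.zeros(2) - w0)         # v(0) = Q^H (w(0) - w0); real data, so Q^H = Q^T

# J(n) = J_min + sum_k lam_k (1 - mu lam_k)^{2n} |v_k(0)|^2
n = np.arange(50)[:, None]
J = J_min + ((lam * np.abs(v0)**2) * (1 - mu * lam)**(2 * n)).sum(axis=1)
print(J[0], "->", J[-1])              # starts at sigma_d^2, decays to J_min
```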
Example: Consider a two-tap predictor.
Consider the effects of the following cases:
• Varying the eigenvalue spread $\chi(\mathbf{R}) = \lambda_{\max}/\lambda_{\min}$ while keeping μ fixed.
• Varying μ while keeping the eigenvalue spread $\chi(\mathbf{R})$ fixed.
Example: Consider the system identification problem shown below.
For M = 2, suppose
$$\mathbf{R} = \begin{bmatrix} 1 & 0.8 \\ 0.8 & 1 \end{bmatrix}, \qquad \mathbf{p} = \begin{bmatrix} 0.8 \\ 0.5 \end{bmatrix}$$
[Figure: system identification block diagram — x(n) drives both the unknown system, producing d(n), and the adaptive filter w(n), producing $\hat{d}(n)$; $e(n) = d(n) - \hat{d}(n)$.]
From the eigenanalysis we have
$$\lambda_1 = 1.8, \qquad \lambda_2 = 0.2, \qquad \text{and} \qquad \mu < \frac{2}{1.8}$$
also
$$\mathbf{q}_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad \mathbf{q}_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \end{bmatrix}, \qquad \text{and} \qquad \mathbf{Q} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$
Also,
$$\mathbf{w}_0 = \mathbf{R}^{-1}\mathbf{p} = \begin{bmatrix} 1.11 \\ -0.389 \end{bmatrix}$$
Thus $\mathbf{v}(n) = \mathbf{Q}^H[\mathbf{w}(n) - \mathbf{w}_0]$. Noting that $\mathbf{w}(0) = \mathbf{0}$,
$$\mathbf{v}(0) = -\mathbf{Q}^H\mathbf{w}_0 = -\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 1.11 \\ -0.389 \end{bmatrix} = \begin{bmatrix} -0.51 \\ -1.06 \end{bmatrix}$$
and
$$v_1(n) = (1 - \mu(1.8))^n(-0.51)$$
$$v_2(n) = (1 - \mu(0.2))^n(-1.06)$$
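A short sketch reproducing these numbers (note that `eigh` returns eigenvalues in ascending order and eigenvector signs are arbitrary, so the entries of v(0) may appear with flipped signs or swapped order):

```python
import numpy as np

R = np.array([[1.0, 0.8],
              [0.8, 1.0]])
p = np.array([0.8, 0.5])

lam, Q = np.linalg.eigh(R)        # lam = [0.2, 1.8]
w0 = np.linalg.solve(R, p)        # ~ [1.11, -0.389]
v0 = -Q.T @ w0                    # v(0) = -Q^H w0; ~ [-1.06, -0.51] up to sign/order

mu = 0.3
n = np.arange(10)
v1 = (1 - mu * 1.8)**n * (-0.51)  # fast mode, decays quickly
v2 = (1 - mu * 0.2)**n * (-1.06)  # slow mode, dominates convergence time
print(lam, w0, v0)
```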
The Least Mean Square (LMS) Algorithm
The error performance surface used by the SD method is not always known a priori. We can instead use estimated values. The estimates are random variables, and thus this leads to a stochastic approach.
We will use the following instantaneous estimates:
$$\hat{\mathbf{R}}(n) = \mathbf{x}(n)\mathbf{x}^H(n), \qquad \hat{\mathbf{p}}(n) = \mathbf{x}(n)d^*(n)$$
Recall the SD update
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \frac{1}{2}\mu[-\nabla J(n)]$$
where the gradient of the error surface at w(n) was shown to be
$$\nabla J(\mathbf{w}(n)) = -2\mathbf{p} + 2\mathbf{R}\mathbf{w}(n)$$
Using the instantaneous estimates,
$$\begin{aligned}
\hat{\nabla} J(\mathbf{w}(n)) &= -2\mathbf{x}(n)d^*(n) + 2\mathbf{x}(n)\mathbf{x}^H(n)\mathbf{w}(n) \\
&= -2\mathbf{x}(n)[d^*(n) - \mathbf{x}^H(n)\mathbf{w}(n)] \\
&= -2\mathbf{x}(n)[d(n) - \hat{d}(n)]^* \\
&= -2\mathbf{x}(n)e^*(n)
\end{aligned}$$
where $e^*(n)$ is the complex conjugate of the estimation error.
Putting this into the update,
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,\mathbf{x}(n)e^*(n)$$
Thus the LMS algorithm belongs to the family of stochastic gradient algorithms.
The update is extremely simple. While the instantaneous estimates may have large variance, the LMS algorithm is recursive and effectively averages these estimates over time.
The simplicity and good performance of the LMS algorithm make it the benchmark against which other optimization algorithms are judged.
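A minimal NumPy sketch of the LMS recursion; the function name, the toy 2-tap system, and all parameter values are assumptions for illustration:

```python
import numpy as np

def lms(x, d, M, mu):
    """LMS sketch: w(n+1) = w(n) + mu * x(n) * conj(e(n))."""
    N = len(x)
    w = np.zeros(M, dtype=complex)
    e = np.zeros(N, dtype=complex)
    for n in range(M - 1, N):
        xn = x[n - M + 1 : n + 1][::-1]   # x(n) = [x(n), ..., x(n-M+1)]^T
        e[n] = d[n] - np.vdot(w, xn)      # e(n) = d(n) - w^H(n) x(n)
        w = w + mu * xn * np.conj(e[n])
    return w, e

# Hypothetical usage: identify a 2-tap system from noisy observations
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
d = np.convolve(x, [1.0, 0.5])[:len(x)] + 0.01 * rng.standard_normal(len(x))
w, e = lms(x, d, M=2, mu=0.01)
print(w)    # approaches [1.0, 0.5]
```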
The LMS algorithm can be analyzed by invoking the independence theory, which states:
1) The vectors x(1), x(2), ..., x(n) are statistically independent.
2) x(n) is independent of d(1), d(2), ..., d(n−1).
3) d(n) is statistically dependent on x(n), but is independent of d(1), d(2), ..., d(n−1).
4) x(n) and d(n) are mutually Gaussian.
The independence theorem is justified in some cases, e.g. beamforming, where we receive independent vector observations. In other cases it is not well justified, but it allows the analysis to proceed.
Using the independence theory we can show that w(n) converges to the optimal solution in the mean:
$$\lim_{n\to\infty} E\{\mathbf{w}(n)\} = \mathbf{w}_0$$
To show this, evaluate the update:
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,\mathbf{x}(n)e^*(n)$$
$$\mathbf{w}(n+1) - \mathbf{w}_0 = \mathbf{w}(n) - \mathbf{w}_0 + \mu\,\mathbf{x}(n)e^*(n)$$
$$\begin{aligned}
\mathbf{c}(n+1) &= \mathbf{c}(n) + \mu\,\mathbf{x}(n)[d(n) - \mathbf{w}^H(n)\mathbf{x}(n)]^* \\
&= \mathbf{c}(n) + \mu\,\mathbf{x}(n)d^*(n) - \mu\,\mathbf{x}(n)\mathbf{x}^H(n)[\mathbf{c}(n) + \mathbf{w}_0] \\
&= [\mathbf{I} - \mu\,\mathbf{x}(n)\mathbf{x}^H(n)]\,\mathbf{c}(n) + \mu\,\mathbf{x}(n)[d(n) - \mathbf{w}_0^H\mathbf{x}(n)]^* \\
&= [\mathbf{I} - \mu\,\mathbf{x}(n)\mathbf{x}^H(n)]\,\mathbf{c}(n) + \mu\,\mathbf{x}(n)e_0^*(n)
\end{aligned}$$
where $e_0(n) = d(n) - \mathbf{w}_0^H\mathbf{x}(n)$ is the estimation error of the optimal (Wiener) filter.
Note that since w(n) is based on past inputs and desired responses, w(n) (and hence c(n)) is independent of x(n).
Thus, taking expectations,
$$E\{\mathbf{c}(n+1)\} = [\mathbf{I} - \mu\mathbf{R}]\,E\{\mathbf{c}(n)\} + \mu E\{\mathbf{x}(n)e_0^*(n)\}$$
The second term is zero. (Why? By the principle of orthogonality, the error of the optimal filter is orthogonal to the input data: $E\{\mathbf{x}(n)e_0^*(n)\} = \mathbf{0}$.) Hence
$$E\{\mathbf{c}(n+1)\} = [\mathbf{I} - \mu\mathbf{R}]\,E\{\mathbf{c}(n)\}$$
Using arguments similar to the SD case, we have
$$\lim_{n\to\infty} E\{\mathbf{c}(n)\} = \mathbf{0} \quad \text{if} \quad 0 < \mu < \frac{2}{\lambda_{\max}}$$
Noting that
$$\lambda_{\max} \le \mathrm{trace}[\mathbf{R}] = N\,r(0) = N\sigma_x^2$$
where N is the filter length, a more conservative bound is
$$0 < \mu < \frac{2}{N\sigma_x^2}$$
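In practice $\sigma_x^2$ must itself be estimated from data; a small hedged helper (the sample-power estimator is an assumption):

```python
import numpy as np

def lms_step_bound(x, num_taps):
    """Conservative LMS bound 2/(N * sigma_x^2), N = number of taps;
    sigma_x^2 is estimated by the sample power of x (assumed estimator)."""
    power = np.mean(np.abs(np.asarray(x))**2)   # estimate of r(0) = sigma_x^2
    return 2.0 / (num_taps * power)
```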
An equivalent condition is to show that
$$\lim_{n\to\infty} J(n) = \lim_{n\to\infty} E\{|e(n)|^2\} = \text{constant}$$
Write e(n) as
$$\begin{aligned}
e(n) &= d(n) - \hat{d}(n) = d(n) - \mathbf{w}^H(n)\mathbf{x}(n) \\
&= d(n) - \mathbf{w}_0^H\mathbf{x}(n) - \mathbf{c}^H(n)\mathbf{x}(n) \\
&= e_0(n) - \mathbf{c}^H(n)\mathbf{x}(n)
\end{aligned}$$
Thus
$$\begin{aligned}
J(n) &= E\{|e(n)|^2\} \\
&= E\{(e_0(n) - \mathbf{c}^H(n)\mathbf{x}(n))(e_0(n) - \mathbf{c}^H(n)\mathbf{x}(n))^*\} \\
&= J_{\min} + \underbrace{E\{\mathbf{c}^H(n)\mathbf{x}(n)\mathbf{x}^H(n)\mathbf{c}(n)\}}_{J_{ex}(n)} \\
&= J_{\min} + J_{ex}(n)
\end{aligned}$$
where the cross terms vanish by the principle of orthogonality and the independence theory, and $J_{ex}(n)$ is the excess MSE.
Since $J_{ex}(n)$ is a scalar,
$$\begin{aligned}
J_{ex}(n) &= E\{\mathbf{c}^H(n)\mathbf{x}(n)\mathbf{x}^H(n)\mathbf{c}(n)\} \\
&= E\{\mathrm{trace}[\mathbf{c}^H(n)\mathbf{x}(n)\mathbf{x}^H(n)\mathbf{c}(n)]\} \\
&= E\{\mathrm{trace}[\mathbf{x}(n)\mathbf{x}^H(n)\mathbf{c}(n)\mathbf{c}^H(n)]\} \\
&= \mathrm{trace}[E\{\mathbf{x}(n)\mathbf{x}^H(n)\mathbf{c}(n)\mathbf{c}^H(n)\}]
\end{aligned}$$
Invoking the independence theorem,
$$J_{ex}(n) = \mathrm{trace}[E\{\mathbf{x}(n)\mathbf{x}^H(n)\}\,E\{\mathbf{c}(n)\mathbf{c}^H(n)\}] = \mathrm{trace}[\mathbf{R}\mathbf{K}(n)]$$
where
$$\mathbf{K}(n) = E\{\mathbf{c}(n)\mathbf{c}^H(n)\}$$
Thus
$$J(n) = J_{\min} + J_{ex}(n) = J_{\min} + \mathrm{trace}[\mathbf{R}\mathbf{K}(n)]$$
Recall $\mathbf{R} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^H$, or $\boldsymbol{\Lambda} = \mathbf{Q}^H\mathbf{R}\mathbf{Q}$.
Let
$$\mathbf{S}(n) \triangleq \mathbf{Q}^H\mathbf{K}(n)\mathbf{Q}$$
where S(n) need not be diagonal. Then $\mathbf{K}(n) = \mathbf{Q}\mathbf{S}(n)\mathbf{Q}^H$ and
$$\begin{aligned}
J_{ex}(n) &= \mathrm{trace}[\mathbf{R}\mathbf{K}(n)] \\
&= \mathrm{trace}[\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^H\mathbf{Q}\mathbf{S}(n)\mathbf{Q}^H] \\
&= \mathrm{trace}[\mathbf{Q}\boldsymbol{\Lambda}\mathbf{S}(n)\mathbf{Q}^H] \\
&= \mathrm{trace}[\boldsymbol{\Lambda}\mathbf{S}(n)\mathbf{Q}^H\mathbf{Q}] \\
&= \mathrm{trace}[\boldsymbol{\Lambda}\mathbf{S}(n)]
\end{aligned}$$
Since Λ is diagonal,
$$J_{ex}(n) = \mathrm{trace}[\boldsymbol{\Lambda}\mathbf{S}(n)] = \sum_{i=1}^{M} \lambda_i s_i(n)$$
where $s_1(n), s_2(n), \ldots, s_M(n)$ are the diagonal elements of S(n).
The recursion expression can be modified to yield a recursion on S(n), which is
$$\mathbf{S}(n+1) = (\mathbf{I} - \mu\boldsymbol{\Lambda})\mathbf{S}(n)(\mathbf{I} - \mu\boldsymbol{\Lambda}) + \mu^2 J_{\min}\boldsymbol{\Lambda}$$
which for the diagonal elements is
$$s_i(n+1) = (1 - \mu\lambda_i)^2 s_i(n) + \mu^2\lambda_i J_{\min}, \qquad i = 1, 2, \ldots, M$$
Suppose $J_{ex}(n)$ converges; then $s_i(n+1) = s_i(n)$ and, from the above,
$$s_i(n) = \frac{\mu^2\lambda_i J_{\min}}{1 - (1 - \mu\lambda_i)^2} = \frac{\mu^2\lambda_i J_{\min}}{2\mu\lambda_i - \mu^2\lambda_i^2} = \frac{\mu J_{\min}}{2 - \mu\lambda_i}, \qquad i = 1, \ldots, M$$
Utilizing
$$J_{ex}(n) = \mathrm{trace}[\boldsymbol{\Lambda}\mathbf{S}(n)] = \sum_{i=1}^{M} \lambda_i s_i(n)$$
we see
$$\lim_{n\to\infty} J_{ex}(n) = J_{\min}\sum_{i=1}^{M} \frac{\mu\lambda_i}{2 - \mu\lambda_i}$$
The LMS misadjustment is defined as
$$\mathcal{M} = \frac{\lim_{n\to\infty} J_{ex}(n)}{J_{\min}} = \sum_{i=1}^{M} \frac{\mu\lambda_i}{2 - \mu\lambda_i}$$
A misadjustment of 10% or less is generally considered acceptable.
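A one-line check of the misadjustment formula, using the eigenvalues from the system identification example above:

```python
import numpy as np

def misadjustment(mu, eigvals):
    """LMS misadjustment: sum_i mu*lam_i / (2 - mu*lam_i)."""
    lam = np.asarray(eigvals)
    return float(np.sum(mu * lam / (2.0 - mu * lam)))

print(misadjustment(0.1, [1.8, 0.2]))   # ~0.109, i.e. near the 10% guideline
```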
Example: one-tap predictor of an order-one AR process. Let
$$x(n) = -a\,x(n-1) + v(n)$$
and use a one-tap predictor. The weight update is
$$\begin{aligned}
w(n+1) &= w(n) + \mu\,x(n-1)e(n) \\
&= w(n) + \mu\,x(n-1)[x(n) - w(n)x(n-1)]
\end{aligned}$$
Note $w_0 = -a$. Consider two cases, and set $\mu = 0.05$:

     a        σ_x²
   −0.99     0.93627
    0.99     0.995
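A simulation sketch of this example. The driving-noise variance is back-solved from the tabulated $\sigma_x^2$ via $\sigma_x^2 = \sigma_v^2/(1 - a^2)$; the seed and run length are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
a, mu, N = -0.99, 0.05, 5000
var_x = 0.93627                           # tabulated value for a = -0.99
var_v = var_x * (1 - a**2)                # since sigma_x^2 = sigma_v^2 / (1 - a^2)

# AR(1) process: x(n) = -a x(n-1) + v(n)
x = np.zeros(N)
v = rng.normal(0.0, np.sqrt(var_v), N)
for n in range(1, N):
    x[n] = -a * x[n - 1] + v[n]

# One-tap LMS predictor: w(n+1) = w(n) + mu x(n-1) [x(n) - w(n) x(n-1)]
w = 0.0
for n in range(1, N):
    e = x[n] - w * x[n - 1]
    w = w + mu * x[n - 1] * e

print("final w =", w, " target w0 = -a =", -a)
```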
Consider the expected trajectory of w(n). Recall
$$\begin{aligned}
w(n+1) &= w(n) + \mu\,x(n-1)e(n) \\
&= w(n) + \mu\,x(n-1)[x(n) - w(n)x(n-1)] \\
&= [1 - \mu\,x(n-1)x(n-1)]\,w(n) + \mu\,x(n-1)x(n)
\end{aligned}$$
Since $x(n) = -a\,x(n-1) + v(n)$,
$$\begin{aligned}
w(n+1) &= [1 - \mu\,x(n-1)x(n-1)]\,w(n) + \mu\,x(n-1)[-a\,x(n-1) + v(n)] \\
&= [1 - \mu\,x(n-1)x(n-1)]\,w(n) - \mu\,a\,x(n-1)x(n-1) + \mu\,x(n-1)v(n)
\end{aligned}$$
Taking the expectation and invoking the independence theory,
$$E\{w(n+1)\} = (1 - \mu\sigma_x^2)E\{w(n)\} - \mu\sigma_x^2\,a$$
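Iterating this scalar recursion gives the theoretical mean trajectory; a short sketch using the a = −0.99 case from the table:

```python
import numpy as np

a, mu, var_x = -0.99, 0.05, 0.93627
E_w = np.zeros(200)
for n in range(199):
    # E{w(n+1)} = (1 - mu*sigma_x^2) E{w(n)} - mu*sigma_x^2 * a
    E_w[n + 1] = (1 - mu * var_x) * E_w[n] - mu * var_x * a
print(E_w[-1])   # approaches the fixed point w0 = -a = 0.99
```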
We can also derive a theoretical expression for J(n).
Note that the initial value of J(n) is $J(0) = \sigma_x^2$ and the final value is
$$J(\infty) = J_{\min} + J_{ex} = \sigma_v^2 + \frac{\mu\lambda_1}{2 - \mu\lambda_1}J_{\min}$$
If μ is small, with $\lambda_1 = \sigma_x^2$,
$$J(\infty) \approx \sigma_v^2 + \frac{\mu\sigma_v^2\sigma_x^2}{2} = \sigma_v^2\left(1 + \frac{\mu\sigma_x^2}{2}\right)$$
Also, the time constant is
$$\tau_1 = \frac{-1}{2\ln(1 - \mu\lambda_1)} = \frac{-1}{2\ln(1 - \mu\sigma_x^2)} \approx \frac{1}{2\mu\sigma_x^2}$$
so that
$$J(n) = \left[\sigma_x^2 - \sigma_v^2\left(1 + \frac{\mu\sigma_x^2}{2}\right)\right](1 - \mu\sigma_x^2)^{2n} + \sigma_v^2\left(1 + \frac{\mu\sigma_x^2}{2}\right)$$
Example: Adaptive equalization
Goal: pass a known signal through an unknown channel, then filter the received signal to invert the effects of the channel and noise on the signal.
The signal is a Bernoulli sequence
$$x_n = \begin{cases} +1 & \text{with probability } 1/2 \\ -1 & \text{with probability } 1/2 \end{cases}$$
The channel has a raised-cosine response
$$h_n = \begin{cases} \dfrac{1}{2}\left[1 + \cos\!\left(\dfrac{2\pi}{W}(n-2)\right)\right] & n = 1, 2, 3 \\ 0 & \text{otherwise} \end{cases}$$
Note that W controls the eigenvalue spread $\chi(\mathbf{R})$.
Also, the additive noise is ∼ N(0, 0.001).
Note that $h_n$ is symmetric about n = 2 and thus introduces a delay of 2. We will use an M = 11 tap filter, which will be symmetric about n = 5 and introduce a delay of 5.
Thus an overall delay of δ = 5 + 2 = 7 is added to the system.
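A sketch of this equalization experiment (the run length, seed, and the W = 3.5 case are assumptions; the desired signal is the transmitted symbol delayed by δ = 7):

```python
import numpy as np

rng = np.random.default_rng(0)
W, M, mu, delta, N = 3.5, 11, 0.075, 7, 20000

# Raised-cosine channel, nonzero only for n = 1, 2, 3
h = np.array([0.5 * (1 + np.cos(2 * np.pi / W * (n - 2))) for n in (1, 2, 3)])

s = rng.choice([-1.0, 1.0], size=N)                  # Bernoulli +/-1 sequence
x = np.convolve(s, h)[:N] + rng.normal(0.0, np.sqrt(0.001), N)

w = np.zeros(M)
for n in range(M - 1, N):
    xn = x[n - M + 1 : n + 1][::-1]                  # tap-input vector
    e = s[n - delta] - w @ xn                        # error against delayed symbol
    w = w + mu * xn * e                              # real-valued LMS update
```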
[Figure: channel response and filter response.]
Consider three W values. Note the step size is bounded by the W = 3.5 case:
$$\mu \le \frac{2}{N\,r(0)} = \frac{2}{11(1.3022)} = 0.14$$
Choose μ = 0.075 in all cases.
Example: Directionality of the LMS algorithm
• The speed of convergence of the LMS algorithm is faster in certain directions in the weight space.
• If the convergence is in the appropriate direction, the convergence can be accelerated by an increased eigenvalue spread.
Consider the deterministic signal
$$x(n) = A_1\cos(\omega_1 n) + A_2\cos(\omega_2 n)$$
with
$$\mathbf{R} = \frac{1}{2}\begin{bmatrix} A_1^2 + A_2^2 & A_1^2\cos\omega_1 + A_2^2\cos\omega_2 \\ A_1^2\cos\omega_1 + A_2^2\cos\omega_2 & A_1^2 + A_2^2 \end{bmatrix}$$
which gives
$$\lambda_1 = \frac{1}{2}A_1^2(1 + \cos\omega_1) + \frac{1}{2}A_2^2(1 + \cos\omega_2)$$
$$\lambda_2 = \frac{1}{2}A_1^2(1 - \cos\omega_1) + \frac{1}{2}A_2^2(1 - \cos\omega_2)$$
and
$$\mathbf{q}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad \mathbf{q}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
Consider two cases:
$$x_a(n) = \cos(0.6n) + 0.5\cos(0.23n) \quad \text{with } \chi(\mathbf{R}) = 12.9$$
$$x_b(n) = \cos(1.2n) + 0.5\cos(0.1n) \quad \text{with } \chi(\mathbf{R}) = 2.9$$
In each case let
$$\mathbf{p} = \lambda_1\mathbf{q}_1 \;\Rightarrow\; \mathbf{R}\mathbf{w}_0 = \lambda_1\mathbf{q}_1 \;\Rightarrow\; \mathbf{w}_0 = \mathbf{q}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
and
$$\mathbf{p} = \lambda_2\mathbf{q}_2 \;\Rightarrow\; \mathbf{R}\mathbf{w}_0 = \lambda_2\mathbf{q}_2 \;\Rightarrow\; \mathbf{w}_0 = \mathbf{q}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
Look at 200 iterations of the algorithm: first the minimum eigenfilter, $\mathbf{w}_0 = \mathbf{q}_2 = [1, -1]^T$, then the maximum eigenfilter, $\mathbf{w}_0 = \mathbf{q}_1 = [1, 1]^T$.
Normalized LMS Algorithm
In the standard LMS algorithm the correction is proportional to $\mu\,\mathbf{x}(n)e^*(n)$:
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,\mathbf{x}(n)e^*(n)$$
If x(n) is large, the update suffers from gradient noise amplification. The normalized LMS algorithm seeks to avoid gradient noise amplification:
• The step size is made time varying, μ(n), and optimized to minimize the error.
Thus let
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \frac{1}{2}\mu(n)[-\nabla J(n)] = \mathbf{w}(n) + \mu(n)[\mathbf{p} - \mathbf{R}\mathbf{w}(n)]$$
Choose μ(n) such that the updated w(n+1) produces the minimum MSE,
$$J(n+1) = E\{|e(n+1)|^2\}$$
where
$$e(n+1) = d(n+1) - \mathbf{w}^H(n+1)\mathbf{x}(n+1)$$
Thus we choose μ(n) such that it minimizes J(n+1).
The optimal step size, $\mu_0(n)$, will be a function of R and ∇(n). As before, we use instantaneous estimates of these values.
To determine $\mu_0(n)$, expand J(n+1):
$$\begin{aligned}
J(n+1) &= E\{e(n+1)e^*(n+1)\} \\
&= \sigma_d^2 - \mathbf{p}^H\mathbf{w}(n+1) - \mathbf{w}^H(n+1)\mathbf{p} + \mathbf{w}^H(n+1)\mathbf{R}\,\mathbf{w}(n+1)
\end{aligned}$$
Now use the fact that $\mathbf{w}(n+1) = \mathbf{w}(n) - \frac{1}{2}\mu(n)\nabla(n)$:
$$\begin{aligned}
J(n+1) ={}& \sigma_d^2 - \mathbf{p}^H\mathbf{w}(n) + \tfrac{1}{2}\mu(n)\mathbf{p}^H\nabla(n) - \mathbf{w}^H(n)\mathbf{p} + \tfrac{1}{2}\mu(n)\nabla^H(n)\mathbf{p} \\
& + \mathbf{w}^H(n)\mathbf{R}\,\mathbf{w}(n) - \tfrac{1}{2}\mu(n)\nabla^H(n)\mathbf{R}\,\mathbf{w}(n) - \tfrac{1}{2}\mu(n)\mathbf{w}^H(n)\mathbf{R}\,\nabla(n) \\
& + \tfrac{1}{4}\mu^2(n)\nabla^H(n)\mathbf{R}\,\nabla(n)
\end{aligned}$$
Differentiating with respect to μ(n),
$$\frac{\partial J(n+1)}{\partial\mu(n)} = \tfrac{1}{2}\mathbf{p}^H\nabla(n) + \tfrac{1}{2}\nabla^H(n)\mathbf{p} - \tfrac{1}{2}\nabla^H(n)\mathbf{R}\,\mathbf{w}(n) - \tfrac{1}{2}\mathbf{w}^H(n)\mathbf{R}\,\nabla(n) + \tfrac{1}{2}\mu(n)\nabla^H(n)\mathbf{R}\,\nabla(n)$$
Setting this equal to 0,
$$\mu_0(n)\,\nabla^H(n)\mathbf{R}\,\nabla(n) = \nabla^H(n)[\mathbf{R}\mathbf{w}(n) - \mathbf{p}] + [\mathbf{w}^H(n)\mathbf{R} - \mathbf{p}^H]\nabla(n)$$
Since $\nabla(n) = 2[\mathbf{R}\mathbf{w}(n) - \mathbf{p}]$, each term on the right-hand side equals $\tfrac{1}{2}\nabla^H(n)\nabla(n)$.
Thus the optimal step size is
$$\mu_0(n) = \frac{\tfrac{1}{2}\nabla^H(n)\nabla(n) + \tfrac{1}{2}\nabla^H(n)\nabla(n)}{\nabla^H(n)\mathbf{R}\,\nabla(n)} = \frac{\nabla^H(n)\nabla(n)}{\nabla^H(n)\mathbf{R}\,\nabla(n)}$$
Using instantaneous estimates,
$$\hat{\nabla}(n) = 2[\mathbf{x}(n)\mathbf{x}^H(n)\mathbf{w}(n) - \mathbf{x}(n)d^*(n)] = -2\mathbf{x}(n)[d(n) - \hat{d}(n)]^* = -2\mathbf{x}(n)e^*(n)$$
and $\hat{\mathbf{R}}(n) = \mathbf{x}(n)\mathbf{x}^H(n)$. Thus
$$\mu_0(n) = \frac{4|e(n)|^2\,\mathbf{x}^H(n)\mathbf{x}(n)}{4|e(n)|^2\,\mathbf{x}^H(n)\mathbf{x}(n)\,\mathbf{x}^H(n)\mathbf{x}(n)} = \frac{1}{\mathbf{x}^H(n)\mathbf{x}(n)} = \frac{1}{\|\mathbf{x}(n)\|^2}$$
Thus the NLMS update is
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \frac{\tilde{\mu}}{\|\mathbf{x}(n)\|^2}\,\mathbf{x}(n)e^*(n)$$
To avoid problems when $\|\mathbf{x}(n)\|^2 \approx 0$, we add an offset:
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \frac{\tilde{\mu}}{a + \|\mathbf{x}(n)\|^2}\,\mathbf{x}(n)e^*(n)$$
where a > 0.
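A minimal sketch of the offset-protected NLMS recursion (the function name and the value of the offset a are assumptions):

```python
import numpy as np

def nlms(x, d, M, mu_tilde, a=1e-6):
    """NLMS sketch: w(n+1) = w(n) + mu_tilde/(a + ||x(n)||^2) x(n) conj(e(n))."""
    N = len(x)
    w = np.zeros(M, dtype=complex)
    e = np.zeros(N, dtype=complex)
    for n in range(M - 1, N):
        xn = x[n - M + 1 : n + 1][::-1]
        e[n] = d[n] - np.vdot(w, xn)                     # e(n) = d(n) - w^H x(n)
        norm2 = np.vdot(xn, xn).real                     # ||x(n)||^2
        w = w + (mu_tilde / (a + norm2)) * xn * np.conj(e[n])
    return w, e
```

With $0 < \tilde{\mu} < 2$ this behaves like LMS whose effective step size is renormalized by the instantaneous input power.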
Consider now the convergence of the NLMS algorithm. Starting from
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \frac{\tilde{\mu}}{\|\mathbf{x}(n)\|^2}\,\mathbf{x}(n)e^*(n)$$
and substituting $e(n) = d(n) - \mathbf{w}^H(n)\mathbf{x}(n)$,
$$\begin{aligned}
\mathbf{w}(n+1) &= \mathbf{w}(n) + \frac{\tilde{\mu}}{\|\mathbf{x}(n)\|^2}\,\mathbf{x}(n)[d(n) - \mathbf{w}^H(n)\mathbf{x}(n)]^* \\
&= \left[\mathbf{I} - \tilde{\mu}\,\frac{\mathbf{x}(n)\mathbf{x}^H(n)}{\|\mathbf{x}(n)\|^2}\right]\mathbf{w}(n) + \tilde{\mu}\,\frac{\mathbf{x}(n)d^*(n)}{\|\mathbf{x}(n)\|^2}
\end{aligned}$$
Compare NLMS and LMS:
NLMS:
$$\mathbf{w}(n+1) = \left[\mathbf{I} - \tilde{\mu}\,\frac{\mathbf{x}(n)\mathbf{x}^H(n)}{\|\mathbf{x}(n)\|^2}\right]\mathbf{w}(n) + \tilde{\mu}\,\frac{\mathbf{x}(n)d^*(n)}{\|\mathbf{x}(n)\|^2}$$
LMS:
$$\mathbf{w}(n+1) = [\mathbf{I} - \mu\,\mathbf{x}(n)\mathbf{x}^H(n)]\,\mathbf{w}(n) + \mu\,\mathbf{x}(n)d^*(n)$$
Comparing, we see the following corresponding terms:

  LMS                                  NLMS
  $\mu$                                $\tilde{\mu}$
  $\mathbf{x}(n)\mathbf{x}^H(n)$       $\mathbf{x}(n)\mathbf{x}^H(n)/\|\mathbf{x}(n)\|^2$
  $\mathbf{x}(n)d^*(n)$                $\mathbf{x}(n)d^*(n)/\|\mathbf{x}(n)\|^2$
Since in the LMS case the stability bound is
$$0 < \mu < \frac{2}{\mathrm{trace}[E\{\mathbf{x}(n)\mathbf{x}^H(n)\}]} = \frac{2}{\mathrm{trace}[\mathbf{R}]}$$
and for the corresponding NLMS term
$$\mathrm{trace}\left[E\left\{\frac{\mathbf{x}(n)\mathbf{x}^H(n)}{\|\mathbf{x}(n)\|^2}\right\}\right] = E\left\{\frac{\mathbf{x}^H(n)\mathbf{x}(n)}{\|\mathbf{x}(n)\|^2}\right\} = E\left\{\frac{\|\mathbf{x}(n)\|^2}{\|\mathbf{x}(n)\|^2}\right\} = 1$$
the NLMS update
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \frac{\tilde{\mu}}{\|\mathbf{x}(n)\|^2}\,\mathbf{x}(n)e^*(n)$$
will converge if $0 < \tilde{\mu} < 2$.