Adaptive Filtering: Method of Steepest Descent
Steepest descent is an old, deterministic method, which is the basis for stochastic gradient-based methods.
This is a feedback approach to finding the minimum of the error performance surface
- the error surface must be known
- the adaptive approach converges to the optimal solution w_0 = R^{-1} p without inverting a matrix
• x(n) are the WSS input samples
• d(n) is the WSS desired output
• d̂(n) is the estimate of the desired signal, given by
  d̂(n) = w^H(n) x(n)
where x(n) = [x(n), x(n-1), ..., x(n-M+1)]^T and w(n) = [w_0(n), w_1(n), ..., w_{M-1}(n)]^T is the filter weight vector at time n.
Then
  e(n) = d(n) - d̂(n) = d(n) - w^H(n) x(n)
Thus the MSE at time n is
  J(n) = E|e(n)|² = σ_d² - w^H p - p^H w + w^H R w
where
  σ_d² - variance of the desired signal
  p - cross-correlation between x(n) and d(n)
  R - correlation matrix of x(n)
When w(n) is set to the (optimal) Wiener solution, then
  w(n) = w_0 = R^{-1} p
and
  J(n) = J_min = σ_d² - p^H w_0
Hence, in order to iteratively find w_0, we use the method of steepest descent. To illustrate this concept, let M = 2; in the 2-D space of w(n), the MSE forms a bowl-shaped function.
A contour of the MSE is given below.
Thus, if we are at a specific point in the bowl, we can imagine dropping a marble: it would reach the minimum by going along the path of steepest descent.
[Figure: bowl-shaped error surface J(w) over (w_1, w_2) with minimum at w_0, and the corresponding contour plot showing the negative gradient -∇J(w) with components -∂J/∂w_1 and -∂J/∂w_2.]
Hence the direction in which we change the filter weight vector is -∇J(n), or
  w(n+1) = w(n) + ½ µ [-∇J(n)]
or, since ∇J(w(n)) = -2p + 2R w(n),
  w(n+1) = w(n) + µ [p - R w(n)]
for n = 0, 1, 2, ..., where µ is called the step size and w(0) = 0 (in general).
Stability: Since the SD method uses feedback, the system can go unstable
• bounds on the step size guaranteeing stability can be determined with respect to the eigenvalues of R (Widrow, 1970)
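As a minimal sketch of this recursion (assuming NumPy; the R and p values are borrowed from the system identification example later in these notes), the SD update can be coded directly:

```python
import numpy as np

# Steepest-descent update: w(n+1) = w(n) + mu * (p - R w(n)).
# R and p are taken from the later system identification example.
R = np.array([[1.0, 0.8],
              [0.8, 1.0]])              # input correlation matrix
p = np.array([0.8, 0.5])                # cross-correlation vector

mu = 0.1                                # step size, must satisfy 0 < mu < 2/lambda_max
assert 0 < mu < 2 / np.linalg.eigvalsh(R).max()

w = np.zeros(2)                         # w(0) = 0 (the usual initialization)
for n in range(2000):
    w = w + mu * (p - R @ w)            # feedback update, no matrix inversion

print("SD solution:    ", w)
print("Wiener solution:", np.linalg.solve(R, p))
```

After enough iterations the two printed vectors agree, illustrating that SD reaches w_0 = R^{-1} p without ever inverting R.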
Define the error vector for the tap weights as
  c(n) = w(n) - w_0
Then, using p = R w_0 in the update,
  w(n+1) = w(n) + µ [p - R w(n)]
         = w(n) + µ [R w_0 - R w(n)]
         = w(n) - µ R c(n)
and
  w(n+1) - w_0 = w(n) - w_0 - µ R c(n)
or
  c(n+1) = c(n) - µ R c(n) = [I - µR] c(n)
Using the unitary similarity transform
  R = Q Λ Q^H
we have
  c(n+1) = [I - µ Q Λ Q^H] c(n)
Premultiplying by Q^H gives
  Q^H c(n+1) = [Q^H - µ Q^H Q Λ Q^H] c(n)
             = [I - µΛ] Q^H c(n)
Define the transformed coefficients as
  v(n) = Q^H c(n) = Q^H (w(n) - w_0)
Then
  v(n+1) = [I - µΛ] v(n)
with initial condition
  v(0) = Q^H (w(0) - w_0) = -Q^H w_0   if w(0) = 0
The kth term (mode) in v(n+1) is given by
  v_k(n+1) = (1 - µλ_k) v_k(n),   k = 1, 2, ..., M
or, using the recursion,
  v_k(n) = (1 - µλ_k)^n v_k(0)
Thus, for lim_{n→∞} v_k(n) = 0 for all k, we must have
  |1 - µλ_k| < 1 for all k, or 0 < µ < 2/λ_max
The kth mode has geometric decay
  v_k(n) = (1 - µλ_k)^n v_k(0)
We can characterize the rate of decay by finding the time constant τ_k, the time it takes the mode to decay to e^{-1} of its initial value. Thus
  v_k(τ_k) = (1 - µλ_k)^{τ_k} v_k(0) = e^{-1} v_k(0)
  ⇓
  τ_k = -1 / ln(1 - µλ_k) ≈ 1/(µλ_k)   for µλ_k << 1
The overall rate of decay is bounded by
  -1 / ln(1 - µλ_max) ≤ τ ≤ -1 / ln(1 - µλ_min)
Recall that
  J(n) = J_min + (w(n) - w_0)^H R (w(n) - w_0)
       = J_min + (w(n) - w_0)^H Q Λ Q^H (w(n) - w_0)
       = J_min + v^H(n) Λ v(n)
       = J_min + Σ_{k=1}^{M} λ_k |v_k(n)|²
       = J_min + Σ_{k=1}^{M} λ_k (1 - µλ_k)^{2n} |v_k(0)|²
Thus lim_{n→∞} J(n) = J_min.
Example: Consider a two-tap predictor.
Consider the effects of the following cases (a simulation sketch follows the list):
• Varying the eigenvalue spread χ(R) = λ_max / λ_min and keeping µ fixed.
• Varying µ and keeping the eigenvalue spread χ(R) fixed.
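The plots for these cases are not reproduced here. As a hedged sketch (assuming NumPy; the AR(2) coefficient pairs below are illustrative choices, not values from the notes), one way to set the experiment up is:

```python
import numpy as np

rng = np.random.default_rng(0)

def sd_two_tap_predictor(a1, a2, mu, iters=200, N=50_000):
    """Steepest descent for a two-tap predictor of the AR(2) process
    x(n) = -a1 x(n-1) - a2 x(n-2) + v(n); R and p are estimated from data."""
    v = rng.standard_normal(N)
    x = np.zeros(N)
    for n in range(2, N):
        x[n] = -a1 * x[n - 1] - a2 * x[n - 2] + v[n]
    x /= x.std()                                   # normalize so r(0) = 1
    U = np.stack([x[1:-1], x[:-2]])                # predictor inputs [x(n-1), x(n-2)]
    d = x[2:]                                      # desired output x(n)
    R = U @ U.T / d.size
    p = U @ d / d.size
    spread = np.linalg.cond(R)                     # eigenvalue spread chi(R)
    w = np.zeros(2)
    for _ in range(iters):
        w = w + mu * (p - R @ w)                   # SD update
    return w, spread

# Two illustrative coefficient pairs giving a small and a large eigenvalue spread,
# run with the same fixed step size mu.
for a1, a2 in [(-0.195, 0.95), (-1.595, 0.95)]:
    w, spread = sd_two_tap_predictor(a1, a2, mu=0.3)
    print(f"a1={a1}, a2={a2}: chi(R) ~ {spread:.1f}, w(200) ~ {w}")
```

With µ held fixed, the run with the larger eigenvalue spread should be noticeably further from its Wiener solution after the same number of iterations, which is the effect the omitted plots illustrate.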
Example: Consider the system identification problem.
[Figure: block diagram in which x(n) drives both the unknown system, producing d(n), and the adaptive filter w(n), producing d̂(n); the error is e(n) = d(n) - d̂(n).]
For M = 2 suppose
  R_x = [ 1    0.8
          0.8  1   ],   p = [ 0.8
                              0.5 ]
From eigenanalysis we have
  λ_1 = 1.8,  λ_2 = 0.2,  and  µ < 2/1.8
Also
  q_1 = (1/√2) [ 1, 1 ]^T,   q_2 = (1/√2) [ 1, -1 ]^T
and
  Q = (1/√2) [ 1   1
               1  -1 ]
Also,
  w_0 = R^{-1} p = [ 1.11, -0.389 ]^T
Thus v(n) = Q^H [w(n) - w_0]. Noting that
  v(0) = -Q^H w_0 = -(1/√2) [ 1   1
                              1  -1 ] [ 1.11, -0.389 ]^T = [ -0.51, -1.06 ]^T
and
  v_1(n) = (1 - 1.8µ)^n (-0.51)
  v_2(n) = (1 - 0.2µ)^n (-1.06)
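A quick numerical check of this eigenanalysis (assuming NumPy; note that numpy may order or sign the eigenvectors differently from the notes):

```python
import numpy as np

R = np.array([[1.0, 0.8],
              [0.8, 1.0]])
p = np.array([0.8, 0.5])

lam, Q = np.linalg.eigh(R)                # eigenvalues in ascending order
print("eigenvalues:", lam)                 # 0.2 and 1.8
print("step-size bound 2/lambda_max:", 2 / lam.max())

w0 = np.linalg.solve(R, p)                 # Wiener solution R^{-1} p
print("w0:", w0)                            # approximately [1.11, -0.389]

v0 = -Q.T @ w0                              # v(0) = Q^H (w(0) - w0) with w(0) = 0
print("v(0):", v0)                          # +/- [0.51, 1.06], up to eigenvector sign/order
```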
The Least Mean Square (LMS) Algorithm
The error performance surface used by the SD method is not always known a priori. We can use estimated values. The estimates are random variables, and thus this leads to a stochastic approach.
We will use the following instantaneous estimates
  R̂(n) = x(n) x^H(n)
  p̂(n) = x(n) d*(n)
Recall the SD update
  w(n+1) = w(n) + ½ µ [-∇J(w(n))]
where the gradient of the error surface at w(n) was shown to be
  ∇J(w(n)) = -2p + 2R w(n)
Using the instantaneous estimates,
  ∇̂J(w(n)) = -2 x(n) d*(n) + 2 x(n) x^H(n) w(n)
            = -2 x(n) [d*(n) - x^H(n) w(n)]
            = -2 x(n) [d(n) - d̂(n)]*
            = -2 x(n) e*(n)
where e*(n) is the complex conjugate of the estimation error.
Putting this in the update,
  w(n+1) = w(n) + µ x(n) e*(n)
Thus the LMS algorithm belongs to the family of stochastic gradient algorithms.
The update is extremely simple. While the instantaneous estimates may have large variance, the LMS algorithm is recursive and effectively averages these estimates.
The simplicity and good performance of the LMS algorithm make it the benchmark against which other optimization algorithms are judged.
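A minimal LMS sketch (assuming NumPy; the function name and the small system-identification test below are illustrative, not from the notes):

```python
import numpy as np

def lms(x, d, M, mu):
    """LMS filter: w(n+1) = w(n) + mu * x(n) * conj(e(n)),
    with x(n) = [x(n), x(n-1), ..., x(n-M+1)]^T and e(n) = d(n) - w^H(n) x(n)."""
    N = len(x)
    w = np.zeros(M, dtype=complex)
    e = np.zeros(N, dtype=complex)
    for n in range(M - 1, N):
        xn = x[n - M + 1:n + 1][::-1]            # tap-input vector x(n)
        e[n] = d[n] - np.vdot(w, xn)             # vdot conjugates w, giving w^H x
        w = w + mu * xn * np.conj(e[n])
    return w, e

# Illustrative use: identify a short FIR system from noisy observations.
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
d = 0.8 * x + 0.5 * np.concatenate(([0.0], x[:-1])) + 0.01 * rng.standard_normal(5000)
w, e = lms(x, d, M=2, mu=0.01)
print("estimated taps:", w.real)                 # approaches [0.8, 0.5]
```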
The LMS algorithm can be analyzed by
invoking the independence theory, which
states
1) The vectors x(1), x(2), …, x(n) are
statistically independent.
2) x(n) is independent of d(1), d(2), …,
d(n-1)
3) d(n) is statistically dependent on x(n),
but is independent of d(1), d(2), …, d(n-1)
4) x(n) and d(n) are mutually Gaussian.
The independence theorem is justified in
some cases, e.g. beamforming where we
receive independent vector observations. In
other cases it is not well justified, but allows
the analysis to proceed.
Using the independence theory we can show that w(n) converges to the optimal solution in the mean:
  lim_{n→∞} E[w(n)] = w_0
To show this, evaluate the update
  w(n+1) = w(n) + µ x(n) e*(n)
  w(n+1) - w_0 = w(n) - w_0 + µ x(n) e*(n)
  c(n+1) = c(n) + µ x(n) [d*(n) - x^H(n) w(n)]
         = c(n) + µ x(n) d*(n) - µ x(n) x^H(n) [c(n) + w_0]
         = [I - µ x(n) x^H(n)] c(n) + µ x(n) [d*(n) - x^H(n) w_0]
Note that since w(n) is based on past inputs and desired responses, w(n) (and c(n)) is independent of x(n).
Thus
  E[c(n+1)] = E{[I - µ x(n) x^H(n)] c(n)} + µ E{x(n) [d*(n) - x^H(n) w_0]}
  ⇓
  E[c(n+1)] = [I - µR] E[c(n)] + µ E{x(n) [d*(n) - x^H(n) w_0]}   (this term is zero, why?)
            = [I - µR] E[c(n)]
Using arguments similar to the SD case, we have
  lim_{n→∞} E[c(n)] = 0   if 0 < µ < 2/λ_max
or equivalently
  lim_{n→∞} E[w(n)] = w_0   if 0 < µ < 2/λ_max
Noting that
  λ_max ≤ trace[R] = N r(0) = N σ_x²
a more conservative bound is
  0 < µ < 2/(N σ_x²)
Also, convergence in the mean,
  lim_{n→∞} E[w(n)] = w_0,
is a weak condition that says nothing about the variance, which may even grow.
A stronger condition is convergence in the mean square, which says
  lim_{n→∞} E|c(n)|² = constant
An equivalent condition is to show that
  lim_{n→∞} J(n) = lim_{n→∞} E|e(n)|² = constant
Write e(n) as
  e(n) = d(n) - d̂(n) = d(n) - w^H(n) x(n)
       = d(n) - w_0^H x(n) - c^H(n) x(n)
       = e_0(n) - c^H(n) x(n)
Thus
  J(n) = E|e(n)|²
       = E{[e_0(n) - c^H(n) x(n)] [e_0*(n) - x^H(n) c(n)]}
       = J_min + E[c^H(n) x(n) x^H(n) c(n)]
       = J_min + J_ex(n)
where the second term is denoted J_ex(n).
Since J_ex(n) is a scalar,
  J_ex(n) = E[c^H(n) x(n) x^H(n) c(n)]
          = E{trace[c^H(n) x(n) x^H(n) c(n)]}
          = E{trace[x(n) x^H(n) c(n) c^H(n)]}
          = trace{E[x(n) x^H(n) c(n) c^H(n)]}
Invoking the independence theorem,
  J_ex(n) = trace{E[x(n) x^H(n)] E[c(n) c^H(n)]} = trace[R K(n)]
where
  K(n) = E[c(n) c^H(n)]
Thus
  J(n) = J_min + J_ex(n) = J_min + trace[R K(n)]
Recall
  R = Q Λ Q^H   or   Λ = Q^H R Q
Let
  S(n) ≜ Q^H K(n) Q
where S(n) need not be diagonal. Then K(n) = Q S(n) Q^H and
  J_ex(n) = trace[R K(n)]
          = trace[Q Λ Q^H Q S(n) Q^H]
          = trace[Q Λ S(n) Q^H]
          = trace[Λ S(n) Q^H Q]
          = trace[Λ S(n)]
Since Λ is diagonal,
  J_ex(n) = trace[Λ S(n)] = Σ_{i=1}^{M} λ_i s_i(n)
where s_1(n), s_2(n), …, s_M(n) are the diagonal elements of S(n).
The recursion expression can be modified to yield a recursion on S(n), which is
  S(n+1) = (I - µΛ) S(n) (I - µΛ) + µ² J_min Λ
which for the diagonal elements is
  s_i(n+1) = (1 - µλ_i)² s_i(n) + µ² λ_i J_min,   i = 1, 2, …, M
Suppose J_ex(n) converges; then s_i(n+1) = s_i(n) and, from the above,
  s_i(n) = µ² λ_i J_min / [1 - (1 - µλ_i)²]
         = µ² λ_i J_min / (2µλ_i - µ²λ_i²)
         = µ J_min / (2 - µλ_i),   i = 1, 2, …, M
Utilizing
  J_ex(n) = trace[Λ S(n)] = Σ_{i=1}^{M} λ_i s_i(n)
we see
  lim_{n→∞} J_ex(n) = Σ_{i=1}^{M} µ λ_i J_min / (2 - µλ_i)
The LMS misadjustment is defined as
  M = lim_{n→∞} J_ex(n) / J_min = Σ_{i=1}^{M} µλ_i / (2 - µλ_i)
A misadjustment of 10% or less is generally considered acceptable.
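For instance (a small check, reusing the eigenvalues λ = 1.8 and 0.2 from the earlier example with illustrative step sizes):

```python
import numpy as np

lam = np.array([1.8, 0.2])               # eigenvalues of R from the earlier example
for mu in (0.01, 0.05, 0.1):             # illustrative step sizes
    M = np.sum(mu * lam / (2 - mu * lam))
    print(f"mu = {mu}: misadjustment ~ {100 * M:.1f}%")
```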
Example: one-tap predictor of an order-one AR process. Let
  x(n) = -a x(n-1) + v(n)
and use a one-tap predictor. The weight update is
  w(n+1) = w(n) + µ x(n-1) e(n)
         = w(n) + µ x(n-1) [x(n) - w(n) x(n-1)]
Note w_0 = -a. Consider two cases and set µ = 0.05:
  a        σ_x²
  -0.99    0.93627
   0.99    0.995
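A simulation sketch of this predictor (assuming NumPy; the driving-noise variance is derived here from σ_x² = σ_v²/(1 - a²), an assumption consistent with the AR(1) model, not a value stated in the notes):

```python
import numpy as np

rng = np.random.default_rng(1)

def one_tap_lms_predictor(a, sigma_x2, mu=0.05, N=1000):
    """LMS one-tap predictor of x(n) = -a x(n-1) + v(n); w(n) should approach w0 = -a."""
    sigma_v2 = sigma_x2 * (1 - a**2)         # so that var{x} = sigma_x2 (AR(1) relation)
    v = np.sqrt(sigma_v2) * rng.standard_normal(N)
    x = np.zeros(N)
    for n in range(1, N):
        x[n] = -a * x[n - 1] + v[n]
    w = 0.0
    for n in range(1, N):
        e = x[n] - w * x[n - 1]               # prediction error
        w = w + mu * x[n - 1] * e             # LMS update
    return w

for a, sigma_x2 in [(-0.99, 0.93627), (0.99, 0.995)]:
    w = one_tap_lms_predictor(a, sigma_x2)
    print(f"a = {a}: w(N) ~ {w:.3f}  (w0 = {-a})")
```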
Consider the expected trajectory of w(n). Recall
  w(n+1) = w(n) + µ x(n-1) e(n)
         = w(n) + µ x(n-1) [x(n) - w(n) x(n-1)]
         = [1 - µ x(n-1) x(n-1)] w(n) + µ x(n-1) x(n)
Since x(n) = -a x(n-1) + v(n),
  w(n+1) = [1 - µ x(n-1) x(n-1)] w(n) + µ x(n-1) [-a x(n-1) + v(n)]
         = [1 - µ x²(n-1)] w(n) - µ a x²(n-1) + µ x(n-1) v(n)
Taking the expectation and invoking the independence theorem,
  E[w(n+1)] = (1 - µσ_x²) E[w(n)] - µσ_x² a
We can also derive a theoretical expression for J(n).
Note that the initial value of J(n) is
  J(0) = σ_x²
and the final value is
  J(∞) = J_min + J_ex = σ_v² + µλ_1 J_min / (2 - µλ_1)
If µ is small,
  J(∞) ≈ σ_v² + µ σ_x² σ_v² / 2 = σ_v² (1 + µσ_x²/2)
Also, the time constant is
  τ_1 = -1 / [2 ln(1 - µλ_1)] = -1 / [2 ln(1 - µσ_x²)] ≈ 1/(2µσ_x²)
so that
  J(n) = [σ_x² - σ_v²(1 + µσ_x²/2)] (1 - µσ_x²)^{2n} + σ_v²(1 + µσ_x²/2)
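Plugging in the two cases from the table above (a small arithmetic check with µ = 0.05; the σ_v² values follow from the same AR(1) assumption used in the simulation sketch):

```python
mu = 0.05
for a, sigma_x2 in [(-0.99, 0.93627), (0.99, 0.995)]:
    sigma_v2 = sigma_x2 * (1 - a**2)                  # assumed AR(1) relation
    J0 = sigma_x2                                     # J(0) = sigma_x^2
    Jinf = sigma_v2 * (1 + mu * sigma_x2 / 2)         # J(inf) for small mu
    tau = 1 / (2 * mu * sigma_x2)                     # learning-curve time constant
    print(f"a = {a}: J(0) = {J0:.3f}, J(inf) ~ {Jinf:.4f}, tau ~ {tau:.0f} iterations")
```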
Example: Adaptive equalization
Goal: Pass known signal through unknown
channel to invert effects of channel and
noise on signal.
The signal is a Bernoulli sequence
  x_n = +1 with probability 1/2
        -1 with probability 1/2
The channel has a raised cosine response
  h_n = ½ [1 + cos(2π(n-2)/W)],   n = 1, 2, 3
        0 otherwise
Note that W controls the eigenvalue spread χ(R).
Also, the additive noise is ~ N(0, 0.001).
Note that h_n is symmetric about n = 2 and thus introduces a delay of 2. We will use an M = 11 tap filter, which will be symmetric about n = 5 and introduce a delay of 5.
Thus an overall delay of δ = 5 + 2 = 7 is added to the system.
[Figure: channel impulse response and equalizer (filter) response.]
Consider three W values. Note the step size is bounded by the W = 3.5 case:
  µ ≤ 2 / (N r(0)) = 2 / (11 × 1.3022) = 0.14
Choose µ = 0.075 in all cases.
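A sketch of this equalization experiment (assuming NumPy; N and the exact set of W values here are illustrative choices, since the notes state only that three W values are compared and that the W = 3.5 case bounds the step size):

```python
import numpy as np

rng = np.random.default_rng(2)

def equalizer_run(W, mu=0.075, M=11, delay=7, N=2000):
    """LMS equalizer for the raised-cosine channel h_n, n = 1, 2, 3, plus N(0, 0.001) noise."""
    x = rng.choice([-1.0, 1.0], size=N)                      # Bernoulli +/-1 training sequence
    h = np.zeros(4)
    for n in (1, 2, 3):
        h[n] = 0.5 * (1 + np.cos(2 * np.pi * (n - 2) / W))   # channel, peak at n = 2
    u = np.convolve(x, h)[:N] + np.sqrt(0.001) * rng.standard_normal(N)
    w = np.zeros(M)
    err = np.zeros(N)
    for n in range(M - 1, N):
        un = u[n - M + 1:n + 1][::-1]                         # equalizer tap-input vector
        e = x[n - delay] - w @ un                             # training symbol delayed by 7
        w = w + mu * un * e                                   # LMS update
        err[n] = e**2
    return err

for W in (2.9, 3.1, 3.5):                                     # illustrative W values
    err = equalizer_run(W)
    print(f"W = {W}: mean squared error over last 200 samples ~ {err[-200:].mean():.4f}")
```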
Example: Directionality of the LMS
algorithm
• The speed of convergence of the LMS
algorithm is faster in certain directions in
the weight space.
• If the convergence is in the appropriate
direction, the convergence can be
accelerated by increased eigenvalue
spread.
Consider the deterministic signal
  x(n) = A_1 cos(ω_1 n) + A_2 cos(ω_2 n)
with
  R = ½ [ A_1² + A_2²                     A_1² cos(ω_1) + A_2² cos(ω_2)
          A_1² cos(ω_1) + A_2² cos(ω_2)   A_1² + A_2²                   ]
which gives
  λ_1 = ½ A_1² (1 + cos ω_1) + ½ A_2² (1 + cos ω_2)
  λ_2 = ½ A_1² (1 - cos ω_1) + ½ A_2² (1 - cos ω_2)
and
  q_1 = [ 1, 1 ]^T,   q_2 = [ 1, -1 ]^T
Consider two cases:
  x_a(n) = cos(1.2n) + 0.5 cos(0.1n)    with χ(R) = 2.9
  x_b(n) = cos(0.6n) + 0.5 cos(0.23n)   with χ(R) = 12.9
In each case let
  p_1 = λ_1 q_1  ⇒  R w_0 = λ_1 q_1  ⇒  w_0 = q_1 = [ 1, 1 ]^T
and
  p_2 = λ_2 q_2  ⇒  R w_0 = λ_2 q_2  ⇒  w_0 = q_2 = [ 1, -1 ]^T
Look at 200 iterations of the algorithm: first the minimum eigenfilter, w_0 = q_2 = [ 1, -1 ]^T, then the maximum eigenfilter, w_0 = q_1 = [ 1, 1 ]^T.
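A sketch of this experiment (assuming NumPy; the step size µ = 0.05 is an illustrative choice, and the desired signal is formed as d(n) = q^T x(n) so that the Wiener solution is the chosen eigenvector):

```python
import numpy as np

def lms_directional(w1, w2, which, N=200, mu=0.05):
    """Two-tap LMS with x(n) = cos(w1 n) + 0.5 cos(w2 n) and d(n) = q^T x(n),
    so that w0 = q; which = 'min' uses q2 = [1, -1], 'max' uses q1 = [1, 1]."""
    n = np.arange(N + 1)
    x = np.cos(w1 * n) + 0.5 * np.cos(w2 * n)
    q = np.array([1.0, -1.0]) if which == "min" else np.array([1.0, 1.0])
    w = np.zeros(2)
    for k in range(1, N + 1):
        xk = np.array([x[k], x[k - 1]])        # tap-input vector
        e = q @ xk - w @ xk                     # error against d(n) = q^T x(n)
        w = w + mu * xk * e                     # LMS update
    return w

for w1, w2, chi in [(1.2, 0.1, 2.9), (0.6, 0.23, 12.9)]:
    for which in ("min", "max"):
        w = lms_directional(w1, w2, which)
        print(f"chi(R) = {chi}, {which} eigenfilter: w(200) ~ {w}")
```

After 200 iterations, the minimum-eigenfilter runs should remain noticeably further from their Wiener solutions than the maximum-eigenfilter runs, especially for the larger eigenvalue spread, which is the directionality effect described above.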
Normalized LMS Algorithm
In the standard LMS algorithm the
correction is proportional to µ x(n) e*(n):
  w(n+1) = w(n) + µ x(n) e*(n)
If x(n) is large, the update suffers from
gradient noise amplification. The
normalized LMS algorithm seeks to avoid
gradient noise amplification
• The step size is made time varying, µ(n),
and optimized to minimize error.
Thus let
  w(n+1) = w(n) + ½ µ(n) [-∇J(n)]
         = w(n) + µ(n) [p - R w(n)]
Choose µ(n) such that the updated w(n+1) produces the minimum MSE,
  J(n+1) = E|e(n+1)|²
where
  e(n+1) = d(n+1) - w^H(n+1) x(n+1)
Thus we choose µ(n) such that it minimizes J(n+1).
The optimal step size, µ_0(n), will be a function of R and the gradient ∇(n) = ∇J(n). As before, we use instantaneous estimates of these values.
To determine µ_0(n), expand J(n+1):
  J(n+1) = E[e(n+1) e*(n+1)]
         = σ_d² - w^H(n+1) p - p^H w(n+1) + w^H(n+1) R w(n+1)
Now use the fact that w(n+1) = w(n) - ½ µ(n) ∇(n):
  J(n+1) = σ_d² - [w(n) - ½ µ(n) ∇(n)]^H p - p^H [w(n) - ½ µ(n) ∇(n)]
           + [w(n) - ½ µ(n) ∇(n)]^H R [w(n) - ½ µ(n) ∇(n)]
         = σ_d² - w^H(n) p + ½ µ(n) ∇^H(n) p - p^H w(n) + ½ µ(n) p^H ∇(n)
           + w^H(n) R w(n) - ½ µ(n) ∇^H(n) R w(n) - ½ µ(n) w^H(n) R ∇(n)
           + ¼ µ²(n) ∇^H(n) R ∇(n)
Differentiating with respect to µ(n),
  ∂J(n+1)/∂µ(n) = ½ ∇^H(n) p + ½ p^H ∇(n) - ½ ∇^H(n) R w(n) - ½ w^H(n) R ∇(n)
                  + ½ µ(n) ∇^H(n) R ∇(n)
Setting this equal to 0,
  µ(n) ∇^H(n) R ∇(n) = ∇^H(n) [R w(n) - p] + [R w(n) - p]^H ∇(n)
so that
  µ_0(n) = {∇^H(n) [R w(n) - p] + [R w(n) - p]^H ∇(n)} / [∇^H(n) R ∇(n)]
Since R w(n) - p = ½ ∇(n),
  µ_0(n) = [½ ∇^H(n) ∇(n) + ½ ∇^H(n) ∇(n)] / [∇^H(n) R ∇(n)]
         = ∇^H(n) ∇(n) / [∇^H(n) R ∇(n)]
Using instantaneous estimates,
  R̂(n) = x(n) x^H(n)
  ∇̂(n) = 2 [x(n) x^H(n) w(n) - x(n) d*(n)]
        = -2 x(n) [d(n) - d̂(n)]*
        = -2 x(n) e*(n)
Thus
  µ_0(n) = ∇̂^H(n) ∇̂(n) / [∇̂^H(n) R̂(n) ∇̂(n)]
         = 4 |e(n)|² x^H(n) x(n) / [4 |e(n)|² x^H(n) x(n) x^H(n) x(n)]
         = 1 / [x^H(n) x(n)]
         = 1 / ||x(n)||²
Thus the NLMS update is
  w(n+1) = w(n) + (µ̃ / ||x(n)||²) x(n) e*(n)
where the time-varying step size is µ(n) = µ̃ / ||x(n)||².
To avoid problems when 0||)(|| 2 ≈nx we add
an offset
)()(||)(||
~)()1(
2nen
nann ∗
++=+ x
xww
µ
where a > 0.Consider now the convergence of the NLMS
algorithm.
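A minimal NLMS sketch (assuming NumPy; the function name and default values are illustrative):

```python
import numpy as np

def nlms(x, d, M, mu_tilde=1.0, a=1e-6):
    """Normalized LMS: w(n+1) = w(n) + mu_tilde/(a + ||x(n)||^2) * x(n) * conj(e(n)).
    Stability requires 0 < mu_tilde < 2 (see the convergence argument below)."""
    N = len(x)
    w = np.zeros(M, dtype=complex)
    e = np.zeros(N, dtype=complex)
    for n in range(M - 1, N):
        xn = x[n - M + 1:n + 1][::-1]                 # tap-input vector x(n)
        e[n] = d[n] - np.vdot(w, xn)                  # a priori error
        norm2 = np.real(np.vdot(xn, xn))              # ||x(n)||^2
        w = w + (mu_tilde / (a + norm2)) * xn * np.conj(e[n])
    return w, e
```

It can be used exactly like the LMS sketch earlier; the per-sample normalization makes the effective step size insensitive to the input power.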
Consider now the convergence of the NLMS algorithm. Starting from
  w(n+1) = w(n) + (µ̃ / ||x(n)||²) x(n) e*(n)
and substituting e(n) = d(n) - w^H(n) x(n),
  w(n+1) = [I - (µ̃ / ||x(n)||²) x(n) x^H(n)] w(n) + (µ̃ / ||x(n)||²) x(n) d*(n)
Compare NLMS and LMS:
NLMS:
  w(n+1) = [I - (µ̃ / ||x(n)||²) x(n) x^H(n)] w(n) + (µ̃ / ||x(n)||²) x(n) d*(n)
LMS:
  w(n+1) = [I - µ x(n) x^H(n)] w(n) + µ x(n) d*(n)
Comparing, we see the following corresponding terms:
  LMS              NLMS
  µ                µ̃
  x(n) x^H(n)      x(n) x^H(n) / ||x(n)||²
  x(n) d*(n)       x(n) d*(n) / ||x(n)||²
Since in the LMS case
][
2
)]()([
20
Rxx tracennEtrace H=<< µ
guarantees stability by analogy,
the NLMS condition is
]||)(||
)()([
2~0
2
<<
n
nnEtrace
H
xxx
µ
make the following approximation
||)(||
)()(
||)(||
)()(22 nE
nnE
n
nnE
HH
xxx
xxx ≈
Then
||)(||
)]()([
||)(||
)]()([
||)(||
)]()([
||)(||
)()(
22
22
nE
nntraceE
nE
nntraceE
nE
nnEtrace
n
nnEtrace
HH
HH
xxx
xxx
xxx
xxx
==
=
                                    = E[||x(n)||²] / E[||x(n)||²] = 1
Thus the NLMS update
  w(n+1) = w(n) + (µ̃ / ||x(n)||²) x(n) e*(n)
will converge if 0 < µ̃ < 2.
• The NLMS has a simpler convergence criterion than the LMS
• The NLMS generally converges faster
than the LMS algorithm