METHOD OF STEEPEST DESCENT

ELE 774 - Adaptive Signal Processing, Week 5

Transcript of METHOD OF STEEPEST DESCENT

Page 1: METHOD OF  STEEPEST DESCENT

METHOD OF STEEPEST DESCENT

Week 5

Page 2: METHOD OF  STEEPEST DESCENT

Mean Square Error (Revisited)

For a transversal filter (of length M), the output is written as a weighted sum of the M most recent input samples, and the error term with respect to a certain desired response is the difference between the desired response and the filter output; the standard forms are reconstructed below.
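In the usual transversal-filter notation (w: M-by-1 tap-weight vector, u(n): M-by-1 tap-input vector, d(n): desired response), the equations omitted from this transcript presumably read

  y(n) = \sum_{k=0}^{M-1} w_k^* \, u(n-k) = \mathbf{w}^H \mathbf{u}(n)

  e(n) = d(n) - y(n)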

Page 3: METHOD OF  STEEPEST DESCENT

Mean Square Error (Revisited) Following these terms, the MSE criterion is defined as

Substituting e(n) and manipulating the expression, we get

where

Quadratic in w !
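Under the standard definitions p = E[u(n) d*(n)] and R = E[u(n) u^H(n)] (a reconstruction, since the slide equations are not in the transcript), the expanded MSE is

  J(\mathbf{w}) = E[|e(n)|^2] = \sigma_d^2 - \mathbf{w}^H\mathbf{p} - \mathbf{p}^H\mathbf{w} + \mathbf{w}^H\mathbf{R}\mathbf{w}

where \sigma_d^2 = E[|d(n)|^2]; the last term makes J quadratic in w.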

Page 4: METHOD OF  STEEPEST DESCENT

Mean Square Error (Revisited) For notational simplicity, express MSE in terms of vector/matrices

where
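The vector/matrix quantities referred to by the "where" above are, in standard notation,

  \mathbf{u}(n) = [u(n), u(n-1), \ldots, u(n-M+1)]^T, \qquad \mathbf{w} = [w_0, w_1, \ldots, w_{M-1}]^T

  \mathbf{R} = E[\mathbf{u}(n)\mathbf{u}^H(n)] \ (M \times M), \qquad \mathbf{p} = E[\mathbf{u}(n)\, d^*(n)] \ (M \times 1)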

Page 5: METHOD OF  STEEPEST DESCENT

Mean Square Error (Revisited)

We found that the solution (the optimum filter coefficients wo) is given by the Wiener-Hopf equations.

Inversion of R can be very costly. J(w) is quadratic in w → convex in w → the gradient vanishes at wo.

The surface has a single minimum and it is global, attained at wo.

Can we reach wo with a less demanding algorithm?
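Written out (a standard reconstruction of the omitted equations), these statements are

  \mathbf{R}\mathbf{w}_o = \mathbf{p} \quad \text{(Wiener-Hopf equations)}

  \nabla J(\mathbf{w}_o) = \mathbf{0}, \qquad J_{\min} = J(\mathbf{w}_o) = \sigma_d^2 - \mathbf{p}^H\mathbf{w}_o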

Page 6: METHOD OF  STEEPEST DESCENT

Basic Idea of the Method of Steepest Descent

Can we find wo in an iterative manner?

Page 7: METHOD OF  STEEPEST DESCENT

Basic Idea of the Method of Steepest Descent

Starting from w(0), generate a sequence {w(n)} with the property that J decreases at every step.

Many sequences can be found following different rules.

The method of steepest descent generates the points using the gradient. The gradient of J at point w gives the direction in which the function increases most; the negative gradient then gives the direction in which the function decreases most. Release a tiny ball on the surface of J → it follows the negative gradient of the surface.
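In symbols (a reconstruction of the expressions omitted here), the descent property and the search directions are

  J(\mathbf{w}(n+1)) < J(\mathbf{w}(n)), \quad n = 0, 1, 2, \ldots

  \text{steepest ascent: } \nabla J(\mathbf{w}), \qquad \text{steepest descent: } -\nabla J(\mathbf{w})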

Page 8: METHOD OF  STEEPEST DESCENT

Basic Idea of the Method of Steepest Descent

For notational simplicity, let g denote the gradient of J; then we move in the direction given by the negative gradient, –g.

How far should we go in –g? This is defined by the step size parameter μ. The optimum step size can be obtained by line search, which is difficult; generally a constant step size is taken for simplicity.

Then, at each step the improvement in J is (from a Taylor series expansion)
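With g(n) = ∇J(w(n)), the recursion and the resulting first-order improvement take the standard form (using Haykin's convention, in which a factor of 1/2 is absorbed into the step-size bookkeeping):

  \mathbf{w}(n+1) = \mathbf{w}(n) - \tfrac{1}{2}\mu\,\mathbf{g}(n)

  J(\mathbf{w}(n+1)) \approx J(\mathbf{w}(n)) - \tfrac{1}{2}\mu\,\|\mathbf{g}(n)\|^2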

Page 9: METHOD OF  STEEPEST DESCENT

Application of SD to Wiener Filter For w(n)

From the theory of the Wiener filter we know the gradient of J explicitly.

Then the update eqn. becomes a simple recursion in w(n), which defines a feedback connection.
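Reconstructed in standard form, the gradient and the resulting steepest-descent update for the Wiener filter are

  \nabla J(\mathbf{w}(n)) = -2\mathbf{p} + 2\mathbf{R}\mathbf{w}(n)

  \mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,[\mathbf{p} - \mathbf{R}\mathbf{w}(n)], \quad n = 0, 1, 2, \ldots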

Page 10: METHOD OF  STEEPEST DESCENT

Convergence Analysis Feedback → may cause stability problems under certain conditions.

Stability depends on the step size, μ, and the autocorrelation matrix, R.

Does SD converge? Under which conditions? What is the rate of convergence?

We may use the canonical representation.

Let the weight-error vector be

then the update eqn. becomes
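Writing the weight-error vector as c(n) = w(n) − wo (the symbol is not in the transcript) and using Rwo = p, the update equation becomes

  \mathbf{c}(n+1) = (\mathbf{I} - \mu\mathbf{R})\,\mathbf{c}(n)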

Page 11: METHOD OF  STEEPEST DESCENT

Convergence Analysis Let

be the eigendecomposition of R.

Then

Using QQ^H = I

Apply the change of coordinates

Then, the update eqn. becomes
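Carrying the eigendecomposition R = QΛQ^H (Q unitary, Λ = diag(λ1, …, λM)) through the update, a reconstruction of the omitted steps is

  \mathbf{c}(n+1) = (\mathbf{I} - \mu\,\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^H)\,\mathbf{c}(n)

  \mathbf{Q}^H\mathbf{c}(n+1) = (\mathbf{I} - \mu\boldsymbol{\Lambda})\,\mathbf{Q}^H\mathbf{c}(n) \quad \text{(using } \mathbf{Q}\mathbf{Q}^H = \mathbf{I}\text{)}

  \mathbf{v}(n) \triangleq \mathbf{Q}^H\mathbf{c}(n) \;\Rightarrow\; \mathbf{v}(n+1) = (\mathbf{I} - \mu\boldsymbol{\Lambda})\,\mathbf{v}(n)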

Page 12: METHOD OF  STEEPEST DESCENT

Convergence Analysis We know that Λ is diagonal, then the k-th natural mode is

or, with the initial values vk(0), we have

Note the geometric series
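In symbols (standard form), the k-th natural mode and its solution in terms of the initial value are

  v_k(n+1) = (1 - \mu\lambda_k)\,v_k(n)

  v_k(n) = (1 - \mu\lambda_k)^n\,v_k(0), \qquad k = 1, \ldots, M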

Page 13: METHOD OF  STEEPEST DESCENT

Convergence Analysis Obviously for stability

or, simply

Geometric series results in an exponentially decaying curve with time constant τk, where letting

or
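The stability condition and the time constant of the k-th mode, written out in the usual form, are

  |1 - \mu\lambda_k| < 1 \ \text{for all } k \quad\Longleftrightarrow\quad 0 < \mu < \frac{2}{\lambda_{\max}}

  (1 - \mu\lambda_k) = e^{-1/\tau_k} \quad\Rightarrow\quad \tau_k = \frac{-1}{\ln(1 - \mu\lambda_k)}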

Page 14: METHOD OF  STEEPEST DESCENT

Convergence Analysis We have

but

We know that Q is composed of the eigenvectors of R, then

or

Each filter coefficient decays exponentially. The overall rate of convergence is limited by the slowest and the fastest modes.
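Transforming back to the original coordinates (the columns of Q are the eigenvectors qk of R), the tap-weight vector and the bounds on the overall time constant τa take the standard form

  \mathbf{w}(n) = \mathbf{w}_o + \mathbf{Q}\mathbf{v}(n) = \mathbf{w}_o + \sum_{k=1}^{M} \mathbf{q}_k (1 - \mu\lambda_k)^n\, v_k(0)

  \frac{-1}{\ln(1 - \mu\lambda_{\max})} \;\le\; \tau_a \;\le\; \frac{-1}{\ln(1 - \mu\lambda_{\min})}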

Page 15: METHOD OF  STEEPEST DESCENT

Convergence Analysis For small step size

What is v(0)? The initial value v(0) is

For simplicity assume that w(0)=0, then
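In symbols (standard reconstruction): for small μλk the time constant simplifies, and the initial condition in the transformed coordinates is

  \tau_k \approx \frac{1}{\mu\lambda_k} \qquad (\mu\lambda_k \ll 1)

  \mathbf{v}(0) = \mathbf{Q}^H[\mathbf{w}(0) - \mathbf{w}_o]; \qquad \mathbf{w}(0) = \mathbf{0} \;\Rightarrow\; \mathbf{v}(0) = -\mathbf{Q}^H\mathbf{w}_o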

Page 16: METHOD OF  STEEPEST DESCENT

Convergence Analysis Transient behaviour:

From the canonical form we know that

then

As long as the upper limit on the step size parameter μ is satisfied, J(n) converges to Jmin regardless of the initial point.
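From the canonical form, the learning curve can be written as (a standard reconstruction)

  J(n) = J_{\min} + \sum_{k=1}^{M} \lambda_k\,|v_k(n)|^2 = J_{\min} + \sum_{k=1}^{M} \lambda_k (1 - \mu\lambda_k)^{2n}\,|v_k(0)|^2

which tends to Jmin as n → ∞ whenever 0 < μ < 2/λmax.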

Page 17: METHOD OF  STEEPEST DESCENT

Convergence Analysis The progress of J(n) for n=0,1,... is called the learning curve.

The learning curve of the steepest-descent algorithm consists of a sum of exponentials, each of which corresponds to a natural mode of the problem.

# natural modes = # filter taps

Page 18: METHOD OF  STEEPEST DESCENT

Example

A predictor with 2 taps (w1(n) and w2(n)) is used to find the parameters of the AR process.

Examine the transient behaviour for (a) fixed step size, varying eigenvalue spread, and (b) fixed eigenvalue spread, varying step size.

σv² is adjusted so that σu² = 1.
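The AR model itself is not reproduced in the transcript; for a two-tap predictor it is presumably the second-order autoregressive process

  u(n) + a_1 u(n-1) + a_2 u(n-2) = v(n)

with v(n) white noise of variance σv², and a1, a2 the parameters whose numerical values are set on the original slides.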

Page 19: METHOD OF  STEEPEST DESCENT

Example The AR process:

Two eigenmodes

Condition number (Eigenvalue Spread):
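For the 2×2 correlation matrix of a stationary process, the eigenmodes and the eigenvalue spread take the standard form (a reconstruction; the numerical values used in the lecture are on the original slides)

  \mathbf{R} = \begin{bmatrix} r(0) & r(1) \\ r(1) & r(0) \end{bmatrix}, \qquad \lambda_1 = r(0) + r(1), \quad \lambda_2 = r(0) - r(1)

  \chi(\mathbf{R}) = \frac{\lambda_{\max}}{\lambda_{\min}}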

Page 20: METHOD OF  STEEPEST DESCENT

Example (Experiment 1)

Experiment 1: Keep the step size μ fixed.

Change the eigenvalue spread χ(R).

Page 21: METHOD OF  STEEPEST DESCENT

Example (Experiment 1)

Page 22: METHOD OF  STEEPEST DESCENT

Page 23: METHOD OF  STEEPEST DESCENT

Example (Experiment 2)

Keep the eigenvalue spread χ(R) fixed.

Change the step size μ (μmax = 1.1).

Page 24: METHOD OF  STEEPEST DESCENT

Page 25: METHOD OF  STEEPEST DESCENT

Example (Experiment 2)

Depending on the value of μ, the learning curve can be:
- Overdamped: moves smoothly to the minimum ((very) small μ)
- Underdamped: oscillates towards the minimum (large μ < μmax)
- Critically damped
Generally the rate of convergence is slow for the first two.

Page 26: METHOD OF  STEEPEST DESCENT

Observations SD is a ‘deterministic’ algorithm, i.e. we assume that R and p are known exactly. In practice they can only be estimated (sample averages?), which can have high computational complexity.

SD is a local search algorithm, but for Wiener filtering the cost surface is convex (quadratic), so convergence is guaranteed as long as μ < μmax is satisfied; a sketch of this in practice follows below.
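A minimal Python sketch (not from the slides) of the steepest-descent recursion with R and p estimated by sample averages; the AR(2) coefficients, sample size and step-size choice below are arbitrary placeholders:

import numpy as np

# Generate data from an assumed AR(2) process (placeholder coefficients).
rng = np.random.default_rng(0)
a1, a2 = -0.5, 0.25           # hypothetical AR parameters, not the lecture's values
N, M = 5000, 2                # number of samples, predictor length (2 taps)
v = rng.standard_normal(N)
u = np.zeros(N)
for n in range(2, N):
    u[n] = -a1 * u[n - 1] - a2 * u[n - 2] + v[n]

# Estimate R and p by sample averages (desired response u(n), inputs u(n-1), u(n-2)).
U = np.column_stack([u[M - 1 - k : N - 1 - k] for k in range(M)])   # rows: [u(n-1), u(n-2)]
d = u[M:]                                                           # desired: u(n)
R = U.T @ U / len(d)
p = U.T @ d / len(d)

# Steepest-descent recursion: w(n+1) = w(n) + mu * (p - R w(n)).
mu = 1.0 / np.linalg.eigvalsh(R).max()    # safely below the stability bound 2/lambda_max
w = np.zeros(M)
for _ in range(200):
    w = w + mu * (p - R @ w)

print("SD estimate:", w, " direct Wiener solution:", np.linalg.solve(R, p))

With enough iterations the SD estimate coincides with the directly solved Wiener solution, illustrating that the recursion avoids explicitly inverting R.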

Page 27: METHOD OF  STEEPEST DESCENT

Observations The origin of SD comes from the Taylor series expansion (as does that of many other local search optimization algorithms).

Convergence can be very slow. To speed up the process, the second-order term (the Hessian) can also be included, as in Newton’s method.

Newton’s method, however, has high computational complexity (matrix inversion) and numerical stability problems.
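For reference, the Newton-type update alluded to here takes the standard form (a reconstruction, not transcribed from the slide)

  \mathbf{w}(n+1) = \mathbf{w}(n) - [\nabla^2 J(\mathbf{w}(n))]^{-1}\,\nabla J(\mathbf{w}(n))

For the quadratic MSE surface the Hessian is ∇²J = 2R, so a single Newton step lands exactly on wo = R⁻¹p, at the cost of inverting R.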