
Transcript of "ADALINE, MADALINE and the Widrow-Hoff Rule"

Page 1: Adaline,Madaline,Widrow Hoff

ADALINE, MADALINE and the Widrow-Hoff Rule

Page 2: Adaline,Madaline,Widrow Hoff

Adaptive Linear Combiner

Page 3: Adaline,Madaline,Widrow Hoff

ADALINE
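As a concrete reference for the next few slides, here is a minimal sketch in Python (NumPy) of an ADALINE: an adaptive linear combiner s = wᵀx + b followed by a signum quantizer. The weights, bias and example input here are illustrative values, not taken from the slides.

```python
import numpy as np

def adaline_output(w, b, x):
    """ADALINE forward pass: adaptive linear combiner followed by a signum quantizer."""
    s = np.dot(w, x) + b          # linear sum from the adaptive linear combiner
    y = 1.0 if s >= 0 else -1.0   # binary (+1/-1) output of the quantizer
    return s, y

# Example: a 2-input ADALINE with hand-picked weights
w = np.array([0.5, -0.3])
b = 0.1
s, y = adaline_output(w, b, np.array([1.0, 1.0]))
print(s, y)   # approximately 0.3 and +1
```

A MADALINE combines several such elements, with their quantized outputs feeding a fixed logic element such as a majority vote.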

Page 4: Adaline,Madaline,Widrow Hoff

MADALINE

Page 6: Adaline,Madaline,Widrow Hoff

Minimal Disturbance Principle

• Adjust the weights to reduce the error with respect to the current pattern, with minimal disturbance to the patterns already learnt.

• In other words, make changes to the weight vector in the same direction as the input vector, as shown below.
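A worked form of this idea, using the notation of the α-LMS rule that appears later in the deck (w_k, x_k and d_k are the weight vector, input pattern and desired response at step k); the key point is that the correction is collinear with the input:

```latex
\varepsilon_k = d_k - w_k^{\mathsf{T}} x_k ,\qquad
\Delta w_k = w_{k+1} - w_k = \alpha\,\varepsilon_k\,\frac{x_k}{\lVert x_k\rVert^{2}} \;\propto\; x_k
```

Because the change lies entirely along x_k, the responses to previously learnt patterns are disturbed as little as possible while the error on the current pattern is reduced by the chosen fraction α.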

Page 7: Adaline,Madaline,Widrow Hoff

Learning Rules

Page 8: Adaline,Madaline,Widrow Hoff

Error Correction – Single Element Network

Page 9: Adaline,Madaline,Widrow Hoff

Perceptron Convergence Rule (Non-Linear)

Weight update: w_{k+1} = w_k + (α/2) ε̃_k x_k

Quantizer error: ε̃_k = d_k − q_k, where q_k = sgn(w_kᵀ x_k) is the quantized (+1/−1) output
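A minimal sketch of one perceptron-convergence update, assuming the standard form in which adaptation happens only when the quantized output disagrees with the desired ±1 response; α and the example values are illustrative.

```python
import numpy as np

def perceptron_step(w, x, d, alpha=1.0):
    """One perceptron convergence update: adapt only when the quantizer is wrong."""
    q = 1.0 if np.dot(w, x) >= 0 else -1.0   # quantized output in {+1, -1}
    quantizer_error = d - q                   # in {-2, 0, +2}
    # When the quantizer error is zero no change is made; otherwise the weight
    # vector moves along +/- x, i.e. in the direction of the input pattern.
    return w + 0.5 * alpha * quantizer_error * x

w = np.zeros(3)
w = perceptron_step(w, np.array([1.0, 0.5, -1.0]), d=-1.0)
print(w)   # [-1.0, -0.5, 1.0]
```

Unlike α-LMS, the step size does not depend on the magnitude of the linear error, only on whether the quantized output disagrees with the desired response.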

Pages 10-12: Adaline,Madaline,Widrow Hoff

Geometric Visualization of the Perceptron Convergence Rule

Page 13: Adaline,Madaline,Widrow Hoff

α-LMS: Least Mean Square (Linear)

Page 14: Adaline,Madaline,Widrow Hoff

Weight update equation: w_{k+1} = w_k + α ε_k x_k / |x_k|²

Error for the k-th input pattern: ε_k = d_k − w_kᵀ x_k

Change in error for the k-th input pattern after the weights have been updated: Δε_k = −α ε_k

Condition for convergence and stability: 0 < α < 2 (in practice, roughly 0.1 < α < 1)
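A minimal sketch of α-LMS using the update above; the synthetic data, α value and names are illustrative. It also checks numerically that the error on the presented pattern shrinks by the factor (1 − α).

```python
import numpy as np

def alpha_lms_step(w, x, d, alpha=0.5):
    """One alpha-LMS update: correct a fixed fraction alpha of the current error."""
    err = d - np.dot(w, x)                       # linear error on this pattern
    w_new = w + alpha * err * x / np.dot(x, x)   # move only along the input direction
    return w_new, err

rng = np.random.default_rng(0)
w = np.zeros(4)
x, d = rng.standard_normal(4), 1.0
w, err_before = alpha_lms_step(w, x, d, alpha=0.5)
err_after = d - np.dot(w, x)
print(err_before, err_after)   # err_after is (1 - alpha) * err_before, up to rounding
```

After the update, the error on the presented pattern is exactly (1 − α) times its previous value, which is the "change in error" relation stated above.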

Page 15: Adaline,Madaline,Widrow Hoff

Error Correction Rules for Multi-Layer Networks

Page 16: Adaline,Madaline,Widrow Hoff

Madaline Rule I (Non-Linear)

Page 17: Adaline,Madaline,Widrow Hoff

Steps:

• If the output matches the desired response, no adaptation is made.

• If the output is different:

- Find the adaline whose linear sum is closest to 0.

- Adapt its weights in the LMS direction far enough to reverse its output.

- LOAD SHARING: repeat until the desired response is obtained (a minimal sketch follows).
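A minimal sketch of one MR I adaptation, assuming the classic arrangement of a single layer of adalines feeding a fixed majority-vote output element; the three-adaline network, the margin used to reverse an output, and all names are illustrative choices, not taken from the slides.

```python
import numpy as np

def madaline_output(W, x):
    """Forward pass of a one-layer MADALINE: adaline quantizers feeding a majority vote."""
    s = W @ x                                 # linear sums of the adalines
    y = np.where(s >= 0, 1.0, -1.0)           # quantized +/-1 adaline outputs
    vote = 1.0 if y.sum() >= 0 else -1.0      # fixed majority-vote output element
    return s, y, vote

def mr1_adapt(W, x, d, margin=0.1):
    """Madaline Rule I for one pattern (sketch): while the vote is wrong, take the
    not-yet-correct adaline whose linear sum is closest to zero and push it, along
    the input direction (LMS direction), just far enough to reverse its output."""
    s, y, vote = madaline_output(W, x)
    while vote != d:
        candidates = np.where(y != d)[0]              # adalines whose reversal helps
        j = candidates[np.argmin(np.abs(s[candidates]))]
        # move s_j to a small margin on the desired side: minimal disturbance
        W[j] += (d * margin - s[j]) * x / np.dot(x, x)
        s, y, vote = madaline_output(W, x)
    return W

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 2)) * 0.1         # 3 adalines, 2 inputs
x, d = np.array([1.0, -1.0]), 1.0
W = mr1_adapt(W, x, d)
print(madaline_output(W, x)[2])               # 1.0
```

Pushing the linear sum only just past zero is the minimal-disturbance choice: the adaline that was already closest to changing its decision is the one asked to change.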

Page 18: Adaline,Madaline,Widrow Hoff

Madaline Rule II (Non-Linear)

Page 19: Adaline,Madaline,Widrow Hoff

Steps (for one training pattern):

• Similar to MR I.

• Uses the concept of trial adaptation: a small perturbation of suitable amplitude and polarity is added to an adaline's linear sum to tentatively reverse its output.

• If the output Hamming error is reduced, change that adaline's weights in the direction collinear with its input; otherwise, no adaptation.

• Keep doing this for all adalines with sufficiently small linear output magnitude.

• Finally, the last layer is adapted using α-LMS (a minimal sketch of the trial-adaptation step follows).
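A minimal sketch of the MR II trial-adaptation step, assuming a two-layer madaline whose output layer is held fixed for brevity; the network sizes, the margin used to reverse an output, and all names are illustrative.

```python
import numpy as np

def sgn(a):
    return np.where(a >= 0, 1.0, -1.0)

def net_outputs(W1, W2, x, h=None):
    """Two-layer madaline forward pass; passing h forces (trial-flips) the hidden outputs."""
    if h is None:
        h = sgn(W1 @ x)                       # first-layer adaline outputs
    return h, sgn(W2 @ h)                     # hidden outputs and +/-1 network outputs

def mr2_trial_step(W1, W2, x, d, margin=0.1):
    """Madaline Rule II trial adaptation for one pattern (sketch): visit first-layer
    adalines in order of increasing |linear sum|, tentatively invert each output, and
    accept (adapting weights along the input direction) only if the Hamming error drops."""
    s = W1 @ x
    h, y = net_outputs(W1, W2, x)
    err = np.sum(y != d)                              # output Hamming error
    for j in np.argsort(np.abs(s)):                   # smallest linear magnitude first
        h_trial = h.copy()
        h_trial[j] = -h_trial[j]                      # trial perturbation: invert output j
        _, y_trial = net_outputs(W1, W2, x, h=h_trial)
        if np.sum(y_trial != d) < err:                # error reduced: accept the reversal
            # adapt along x, far enough that the linear sum really crosses zero
            W1[j] += (h_trial[j] * margin - s[j]) * x / np.dot(x, x)
            s[j] = h_trial[j] * margin
            h, y, err = h_trial, y_trial, np.sum(y_trial != d)
    return W1

rng = np.random.default_rng(2)
W1 = rng.standard_normal((4, 3)) * 0.1                # 4 first-layer adalines, 3 inputs
W2 = rng.standard_normal((2, 4))                      # 2 output adalines (held fixed here)
x, d = rng.standard_normal(3), np.array([1.0, -1.0])
W1 = mr2_trial_step(W1, W2, x, d)
```

The last bullet's step, adapting the output layer with α-LMS, is omitted here; W2 is simply held fixed to keep the sketch short.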

Page 20: Adaline,Madaline,Widrow Hoff

Steepest Descent – Single Element Network

Page 21: Adaline,Madaline,Widrow Hoff

Error Surface of a Linear Combiner

Page 22: Adaline,Madaline,Widrow Hoff

The Optimal Wiener-Hopf Weight

The squared error can be written as: ε_k² = d_k² − 2 d_k x_kᵀ w + wᵀ x_k x_kᵀ w

Taking the expectation of the above expression yields: E[ε_k²] = E[d_k²] − 2 pᵀ w + wᵀ R w, where p = E[d_k x_k] is the cross-correlation vector and R = E[x_k x_kᵀ] is the input correlation matrix

Page 23: Adaline,Madaline,Widrow Hoff

So the MSE surface equation is: ξ(w) = E[d_k²] − 2 pᵀ w + wᵀ R w

With the global optimal weight solution (the Wiener-Hopf solution): w* = R⁻¹ p
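A minimal numerical sketch, using synthetic data with illustrative names: estimate R and p from samples, then compare the Wiener-Hopf solution against the weights that generated the data.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([0.7, -0.2, 0.4])

# Synthetic stationary inputs and noisy desired responses
X = rng.standard_normal((10_000, 3))
d = X @ w_true + 0.05 * rng.standard_normal(10_000)

R = (X.T @ X) / len(X)           # estimate of the input correlation matrix E[x x^T]
p = (X.T @ d) / len(X)           # estimate of the cross-correlation vector E[d x]

w_star = np.linalg.solve(R, p)   # Wiener-Hopf solution w* = R^{-1} p
print(np.round(w_star, 2))       # close to w_true
```

This closed-form solution is the target that steepest descent and μ-LMS approach iteratively without ever forming R and p explicitly.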

Page 24: Adaline,Madaline,Widrow Hoff

Gradient Descent Algorithm

The aim of gradient descent is to make weight updates in the direction of the negative gradient, scaled by a factor μ that controls the stability and convergence of the algorithm; ∇_k is the gradient at the point on the MSE surface corresponding to w = w_k.

w_{k+1} = w_k + μ(−∇_k)
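A sketch of steepest descent on the quadratic MSE surface above: there the gradient is available in closed form, ∇_k = 2(R w_k − p), and stability requires 0 < μ < 1/λ_max, where λ_max is the largest eigenvalue of R. The R and p values here are illustrative.

```python
import numpy as np

def steepest_descent(R, p, mu, steps=200):
    """Steepest descent on the MSE surface xi(w) = E[d^2] - 2 p^T w + w^T R w."""
    w = np.zeros_like(p)
    for _ in range(steps):
        grad = 2.0 * (R @ w - p)     # exact gradient of the quadratic surface
        w = w + mu * (-grad)         # w_{k+1} = w_k + mu * (-grad_k)
    return w

R = np.array([[2.0, 0.5], [0.5, 1.0]])
p = np.array([1.0, 0.3])
mu = 0.5 / np.linalg.eigvalsh(R).max()   # inside the stability range 0 < mu < 1/lambda_max
print(steepest_descent(R, p, mu), np.linalg.solve(R, p))   # both approach w* = R^{-1} p
```

With μ inside the stability range, the iteration converges to the Wiener-Hopf solution of the previous slide.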

Page 25: Adaline,Madaline,Widrow Hoff

μ-LMS (Linear)

• It uses the instantaneous gradient, i.e. the gradient of the squared error of the current training sample is used as an approximation of the actual gradient.

Page 26: Adaline,Madaline,Widrow Hoff

• Since the instantaneous gradient can be easily calculated from the current sample, there is no need to average instantaneous gradients over all patterns in the training set.

• For stability and convergence we need 0 < μ < 1/λ_max, where λ_max is the largest eigenvalue of the input correlation matrix R (a minimal sketch follows).
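A minimal sketch of μ-LMS, using the common Widrow-Hoff form w_{k+1} = w_k + 2μ ε_k x_k; the synthetic data and names are illustrative and match the Wiener-Hopf example above.

```python
import numpy as np

def mu_lms(X, d, mu, epochs=5):
    """mu-LMS: follow the instantaneous gradient of the current sample's squared error."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_k, d_k in zip(X, d):
            err = d_k - np.dot(w, x_k)
            w = w + 2.0 * mu * err * x_k       # instantaneous-gradient step
    return w

rng = np.random.default_rng(3)
X = rng.standard_normal((2000, 3))
d = X @ np.array([0.7, -0.2, 0.4]) + 0.05 * rng.standard_normal(2000)
R = (X.T @ X) / len(X)
mu = 0.1 / np.linalg.eigvalsh(R).max()         # well inside 0 < mu < 1/lambda_max
print(np.round(mu_lms(X, d, mu), 2))           # approaches the Wiener solution
```

Unlike α-LMS, the step is not normalized by |x_k|², so a suitable μ depends on the input power through R.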

Page 27: Adaline,Madaline,Widrow Hoff

Madaline Rule III (Non-Linear)

Page 28: Adaline,Madaline,Widrow Hoff

Steps:

• A small perturbation is added to the adaline's linear sum (its input to the nonlinearity).

• The change in error and the change in output due to this perturbation are calculated.

• Given this change in the output error with respect to the perturbation, the instantaneous gradient can be calculated.

• It can be shown to be mathematically equivalent to backpropagation when the perturbation is small.

Page 29: Adaline,Madaline,Widrow Hoff

Approximate Gradient: ∇̂_k = ∂(ε_k²)/∂w_k

Since s_k = x_kᵀ w_k, and therefore ∂s_k/∂w_k = x_k:  ∇̂_k = (∂(ε_k²)/∂s_k) x_k

So for a small perturbation Δs:  ∇̂_k ≈ (Δ(ε_k²)/Δs) x_k

The weight update equation is thus: w_{k+1} = w_k − μ (Δ(ε_k²)/Δs) x_k
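A minimal sketch of the single-element MR III gradient estimate, assuming a sigmoidal (tanh) adaline; the perturbation size, weights and example values are illustrative. The estimate is compared against the analytic gradient that backpropagation would use.

```python
import numpy as np

def mr3_gradient(w, x, d, delta=1e-4):
    """Estimate the instantaneous gradient of eps^2 w.r.t. w by perturbing the linear sum."""
    s = np.dot(w, x)
    err2 = (d - np.tanh(s)) ** 2                    # squared error at the nominal sum
    err2_pert = (d - np.tanh(s + delta)) ** 2       # squared error with s perturbed by delta
    return ((err2_pert - err2) / delta) * x         # (Delta eps^2 / Delta s) * x

w = np.array([0.3, -0.8])
x, d = np.array([1.0, 0.5]), 1.0

grad_est = mr3_gradient(w, x, d)
# Analytic gradient of eps^2 = (d - tanh(s))^2, as backpropagation would compute it
s = np.dot(w, x)
grad_true = -2.0 * (d - np.tanh(s)) * (1.0 - np.tanh(s) ** 2) * x
print(np.allclose(grad_est, grad_true, atol=1e-3))   # True for small delta

mu = 0.1
w = w + mu * (-grad_est)                              # MR III weight update
```

As the perturbation shrinks, the estimate approaches the derivative that backpropagation computes analytically, which is the equivalence claimed on the "Steps" slide.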

Page 30: Adaline,Madaline,Widrow Hoff

• No need to know a priori the nature of the activation function

• Robust to drifts in analog hardware

Alternatively, since Δ(ε_k²) ≈ 2 ε_k Δε_k for a small perturbation:

The weight update equation is thus: w_{k+1} = w_k − 2μ ε_k (Δε_k/Δs) x_k

Page 31: Adaline,Madaline,Widrow Hoff

Steepest Descent – Multi-Layer Networks

Page 32: Adaline,Madaline,Widrow Hoff

Madaline Rule III

Page 33: Adaline,Madaline,Widrow Hoff

Steps:

• Same as for a single element, except that the change due to the perturbation is measured at the output of the network, after multiple layers.

• Add a perturbation to the linear sum of the adaline being adapted.

• Measure the change in the sum of squared output errors caused by this perturbation.

• Obtain the instantaneous gradient of the MSE with respect to the weight vector of the perturbed adaline (a worked form follows).
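A worked form of the multi-layer update, following the single-element expressions above; here ε_k² denotes the sum of squared errors over all network outputs, Δs is the perturbation added to the chosen adaline's linear sum, and x_k is the input vector seen by that adaline (the notation is carried over from the single-element case, not taken verbatim from the slide):

```latex
\hat{\nabla}_k \approx \frac{\Delta\!\left(\varepsilon_k^{2}\right)}{\Delta s}\, x_k ,
\qquad
w_{k+1} = w_k - \mu\,\frac{\Delta\!\left(\varepsilon_k^{2}\right)}{\Delta s}\, x_k
```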

Page 34: Adaline,Madaline,Widrow Hoff

Relevance to Present-Day Work

• μ-LMS and α-LMS are still used today

• MR-III and MR-II can be applied to complicated architectures

• Given an arbitrary activation function, one can still use MR-III, since it does not require the activation function to be known