ADALINE, MADALINE and the Widrow-Hoff Rule
Adaptive Linear Combiner
ADALINE
MADALINE
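A minimal sketch of the ADALINE element, an adaptive linear combiner followed by a signum quantizer (a MADALINE is a layered network of such elements); the function name and the bias convention are illustrative assumptions, not from the original slides:

```python
import numpy as np

def adaline_output(w, x):
    """One ADALINE: adaptive linear combiner followed by a signum quantizer.

    w : weight vector (last entry treated as the bias weight here)
    x : input pattern of +1/-1 components
    Returns (linear_sum, quantized_output).
    """
    x_aug = np.append(x, 1.0)        # append a constant bias input
    s = np.dot(w, x_aug)             # linear combiner output
    y = 1.0 if s >= 0 else -1.0      # signum quantizer
    return s, y
```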
Minimal Disturbance Principle
• Adjust the weights to reduce the error on the current pattern with minimal disturbance to patterns already learnt.
• In other words, make weight changes in the same direction as the input vector (a one-line illustration follows).
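One way to see why (notation mine, not from the slides): if the weight change is collinear with the input, $\Delta w_k = \lambda x_k$, then the response to another stored pattern $x_j$ changes by $\Delta w_k^{T} x_j = \lambda\, x_k^{T} x_j$, which is small whenever $x_j$ has little correlation with $x_k$, while the response to the current pattern changes by the full amount $\lambda\,|x_k|^{2}$.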
Learning Rules
Error Correction – Single Element Network
Perceptron Convergence Rule (Non-Linear)
Weight update:
Quantizer error:
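A standard reconstruction of these equations (with $d_k$ the desired response, $x_k$ the input pattern, and $y_k$ the quantizer output; not taken from the original slide):

Quantizer error: $\tilde{\epsilon}_k = d_k - y_k$, where $y_k = \operatorname{sgn}(w_k^{T} x_k)$
Weight update: $w_{k+1} = w_k + \dfrac{\alpha}{2}\,\tilde{\epsilon}_k\, x_k$

Since $\tilde{\epsilon}_k \in \{-2, 0, +2\}$, the weights change only when the quantized output is wrong, and then by a step collinear with the input.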
Geometric Visualization of the Perceptron Convergence Rule
α Least Mean Square (α-LMS) (Linear)
Weight update equation:
Error for the kth input pattern:
Change in error for the kth input pattern after the weights have been updated:
Condition for convergence and stability:
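A standard reconstruction of the α-LMS relations referred to above (not from the original slide):

Weight update: $w_{k+1} = w_k + \alpha\,\dfrac{\epsilon_k\, x_k}{|x_k|^{2}}$
Error for the kth pattern: $\epsilon_k = d_k - w_k^{T} x_k$
Change in error after the update: $\Delta\epsilon_k = -\Delta w_k^{T} x_k = -\alpha\,\epsilon_k$, i.e. the error on the current pattern shrinks by the factor α
Convergence and stability: $0 < \alpha < 2$ (values of order 0.1 to 1.0 are typical in practice)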
Error Correction Rules for Multi-Layer Networks
Madaline Rule I (Non-Linear)
Steps:
• If the output matches the desired response, no adaptation is made.
• If the output is different:
- Find the adaline whose linear sum is closest to 0
- Adapt its weights in the LMS direction far enough to reverse its output.
- LOAD SHARING: repeat with further adalines until the desired response is obtained (a sketch of one adaptation pass follows).
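A minimal sketch of one MR-I adaptation pass, assuming a single-layer Madaline whose adaline outputs feed a fixed majority-vote element; the combiner, the function names, and the value of α are illustrative assumptions. Here W is an (n_adalines, n_inputs) float array and x a ±1 input vector:

```python
import numpy as np

def madaline_output(W, x):
    """Single-layer Madaline: adalines feeding a fixed majority-vote element."""
    s = W @ x                                # linear sums, one per adaline
    y = np.where(s >= 0, 1.0, -1.0)          # signum quantizers
    return s, (1.0 if y.sum() >= 0 else -1.0)

def mr1_adapt(W, x, d, alpha=1.0):
    """One MR-I pass for input pattern x with desired Madaline response d (+1 or -1)."""
    s, out = madaline_output(W, x)
    tried = set()
    while out != d and len(tried) < W.shape[0]:
        # pick the not-yet-adapted adaline whose linear sum is closest to zero
        j = min((i for i in range(W.shape[0]) if i not in tried),
                key=lambda i: abs(s[i]))
        tried.add(j)
        # alpha-LMS step toward d; with alpha = 1 this drives the sum to d,
        # reversing that adaline's output ("far enough to reverse")
        W[j] = W[j] + alpha * (d - s[j]) * x / np.dot(x, x)
        s, out = madaline_output(W, x)       # load sharing: re-check, adapt more if needed
    return W
```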
Madaline Rule II (Non-Linear)
Steps (for one training pattern):
• Similar to MR I
• Uses trial adaptation: a small perturbation of suitable amplitude and polarity is added.
• If the output Hamming error is reduced, change the weights of that adaline in a direction collinear with the input; otherwise, make no adaptation.
• Repeat this for all adalines whose linear output magnitude is sufficiently small.
• Finally, the last layer is adapted using α-LMS (a sketch of the trial-adaptation step follows this list).
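A sketch of the accept/reject trial-adaptation step for one hidden adaline; the forward pass of the rest of the network is left as a caller-supplied function, and all names and the value of α are illustrative assumptions:

```python
import numpy as np

def mr2_trial_adapt(W, j, x, desired, forward, alpha=1.0):
    """MR-II trial adaptation of hidden adaline j on input pattern x.

    W       : (n_adalines, n_inputs) weights of this layer
    forward : forward(W, x) -> vector of +1/-1 network outputs (assumed helper)
    desired : vector of desired +1/-1 network outputs
    """
    hamming = lambda y: int(np.sum(y != desired))    # output Hamming error
    base_err = hamming(forward(W, x))

    s_j = np.dot(W[j], x)                            # adaline j's linear sum
    trial = W.copy()
    # trial change, collinear with the input, just large enough to reverse adaline j's output
    trial[j] = W[j] + alpha * (-np.sign(s_j) - s_j) * x / np.dot(x, x)

    if hamming(forward(trial, x)) < base_err:        # Hamming error reduced: accept the change
        return trial
    return W                                         # otherwise: no adaptation
```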
Steepest Descent – Single Element Network
Error Surface of a Linear Combiner
The Optimal Wiener-Hopf Weight
The squared error can be written as:
Taking expectation of the above expression yields:
So the MSE surface equation is:
With the globally optimal weight solution:
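A standard reconstruction of these equations, with $R = E[x_k x_k^{T}]$ the input correlation matrix and $p = E[d_k x_k]$ the cross-correlation vector (not taken from the original slide):

Squared error: $\epsilon_k^{2} = d_k^{2} - 2 d_k\, x_k^{T} w + w^{T} x_k x_k^{T} w$
Taking expectations: $\xi = E[\epsilon_k^{2}] = E[d_k^{2}] - 2 p^{T} w + w^{T} R w$
The MSE surface is therefore a quadratic bowl in $w$, and setting $\nabla\xi = 2Rw - 2p = 0$ gives the Wiener-Hopf optimum:
$w^{*} = R^{-1} p$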
Gradient Descent Algorithm
Gradient descent makes weight updates in the direction of the negative gradient, scaled by a factor μ that controls the stability and convergence of the algorithm; 𝛻𝑘 is the gradient of the MSE surface at the point corresponding to w = 𝑤𝑘.
𝑤𝑘+1 = 𝑤𝑘 + 𝜇(−𝛻𝑘)
μ-LMS (Linear)
• It uses the instantaneous gradient, i.e. the gradient of the squared error of the current training sample, as an approximation of the true gradient.
• The instantaneous gradient can be computed cheaply from the current sample alone, with no need to average instantaneous gradients over all patterns in the training set.
• For stability and convergence we need (see the reconstruction below):
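A standard reconstruction of the μ-LMS relations: $\epsilon_k = d_k - w_k^{T} x_k$ and $w_{k+1} = w_k + 2\mu\,\epsilon_k\, x_k$, with convergence of the weight mean requiring $0 < \mu < 1/\lambda_{\max}$, where $\lambda_{\max}$ is the largest eigenvalue of the input correlation matrix R (a more conservative, more practical bound is $0 < \mu < 1/\operatorname{tr}(R)$). A minimal training-loop sketch, with illustrative array names:

```python
import numpy as np

def mu_lms_train(X, d, mu=0.01, epochs=20):
    """mu-LMS: one negative-instantaneous-gradient step per training sample.

    X : (n_patterns, n_inputs) training inputs
    d : (n_patterns,) desired responses
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_k, d_k in zip(X, d):
            eps = d_k - np.dot(w, x_k)     # error on the current pattern
            w = w + 2.0 * mu * eps * x_k   # step along the negative instantaneous gradient
    return w
```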
Madaline Rule III (Non-Linear)
Steps:
• A small perturbation is added to the input of the nonlinearity (the linear sum).
• The change in error and the change in output caused by this perturbation are calculated.
• From the change in output error with respect to the perturbation, the instantaneous gradient can be estimated.
• It is shown to be mathematically equivalent to backpropagation when the perturbation is small.
Approximate Gradient:
Since:
And therefore:
So for small perturbation:
So weight update equation is thus:
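A reconstruction of the chain referred to above, with $s_k = w_k^{T} x_k$ the linear sum and $\Delta s$ the added perturbation (not from the original slide):

Approximate gradient: $\hat{\nabla}_k = \dfrac{\partial \epsilon_k^{2}}{\partial w_k} = \dfrac{\partial \epsilon_k^{2}}{\partial s_k}\,\dfrac{\partial s_k}{\partial w_k}$
Since $\dfrac{\partial s_k}{\partial w_k} = x_k$, and therefore, for a small perturbation, $\dfrac{\partial \epsilon_k^{2}}{\partial s_k} \approx \dfrac{\Delta(\epsilon_k^{2})}{\Delta s}$:
$\hat{\nabla}_k \approx \dfrac{\Delta(\epsilon_k^{2})}{\Delta s}\, x_k$
Weight update: $w_{k+1} = w_k - \mu\,\dfrac{\Delta(\epsilon_k^{2})}{\Delta s}\, x_k$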
• No need to know a priori the nature of the activation function
• Robust to drifts in analog hardware
Alternatively:
So weight update equation is thus:
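Reconstructed alternative form: since $\Delta(\epsilon_k^{2}) \approx 2\,\epsilon_k\,\Delta\epsilon_k$ for a small perturbation, the update can equivalently be written as

$w_{k+1} = w_k - 2\mu\,\epsilon_k\,\dfrac{\Delta\epsilon_k}{\Delta s}\, x_k$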
Steepest Descent – Multi-Layer Networks
Madaline Rule III
Steps:
• Same as for a single element, except that here the change due to the perturbation is measured at the output of the multi-layer network.
• Add the perturbation to the linear sum of the chosen adaline.
• Measure the change in the sum of squared output errors caused by this perturbation.
• Obtain the instantaneous gradient of the MSE with respect to the weight vector of the perturbed adaline (a sketch follows this list).
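A sketch of one MR-III gradient estimate and update for a single hidden adaline in a two-layer network; the layer shapes, the tanh activation, and the perturbation size are assumptions for illustration (in a hardware realization the activation need not be known in closed form, only measurable):

```python
import numpy as np

def mr3_update_hidden(W1, W2, x, d, j, mu=0.05, ds=1e-3, act=np.tanh):
    """Perturb hidden adaline j's linear sum and step down the estimated gradient.

    W1 : (n_hidden, n_inputs) first-layer weights
    W2 : (n_outputs, n_hidden) second-layer weights
    x  : input pattern,  d : desired output vector
    """
    def sse(perturbation):
        s1 = W1 @ x
        s1[j] += perturbation              # perturbation added to the linear sum
        h = act(s1)                        # hidden-layer outputs
        y = act(W2 @ h)                    # network outputs
        e = d - y
        return float(np.dot(e, e))         # sum of squared output errors

    grad = (sse(ds) - sse(0.0)) / ds       # instantaneous gradient of the error w.r.t. s1[j]
    W1[j] = W1[j] - mu * grad * x          # steepest-descent step, collinear with the input
    return W1
```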
Relevance to Present-Day Work
• μ-LMS and α-LMS are still used today
• MR-III and MR-II can be applied to complicated architectures
• Given an arbitrary activation function, one can use the MR-III procedure without requiring the activation function to be known analytically