ADALINE, MADALINE and the Widrow-Hoff Rule
Adaptive Linear Combiner
ADALINE
MADALINE
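A minimal sketch of the ADALINE element, an adaptive linear combiner followed by a signum quantizer (a MADALINE is a layered network of such elements); the function name and the bias convention are illustrative assumptions, not from the original slides:

```python
import numpy as np

def adaline_output(w, x):
    """One ADALINE: adaptive linear combiner followed by a signum quantizer.

    w : weight vector (last entry treated as the bias weight here)
    x : input pattern of +1/-1 components
    Returns (linear_sum, quantized_output).
    """
    x_aug = np.append(x, 1.0)        # append a constant bias input
    s = np.dot(w, x_aug)             # linear combiner output
    y = 1.0 if s >= 0 else -1.0      # signum quantizer
    return s, y
```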
Minimal Disturbance Principle
• Adjust the weights to reduce the error on the current pattern with minimal disturbance to patterns already learnt.
• In other words, make weight changes in the same direction as the input vector (a one-line illustration follows).
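One way to see why (notation mine, not from the slides): if the weight change is collinear with the input, $\Delta w_k = \lambda x_k$, then the response to another stored pattern $x_j$ changes by $\Delta w_k^{T} x_j = \lambda\, x_k^{T} x_j$, which is small whenever $x_j$ has little correlation with $x_k$, while the response to the current pattern changes by the full amount $\lambda\,|x_k|^{2}$.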
Learning Rules
Error Correction – Single Element Network
Perceptron Convergence Rule (Non-Linear)
Weight update:
Quantizer error:
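A standard reconstruction of these equations (with $d_k$ the desired response, $x_k$ the input pattern, and $y_k$ the quantizer output; not taken from the original slide):

Quantizer error: $\tilde{\epsilon}_k = d_k - y_k$, where $y_k = \operatorname{sgn}(w_k^{T} x_k)$
Weight update: $w_{k+1} = w_k + \dfrac{\alpha}{2}\,\tilde{\epsilon}_k\, x_k$

Since $\tilde{\epsilon}_k \in \{-2, 0, +2\}$, the weights change only when the quantized output is wrong, and then by a step collinear with the input.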
Geometric Visualization of the Perceptron Convergence Rule
α Least Mean Square (α-LMS) (Linear)
Weight update equation:
Error for the kth input pattern:
Change in error for the kth input pattern after the weights have been updated:
Condition for convergence and stability:
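A standard reconstruction of the α-LMS relations referred to above (not from the original slide):

Weight update: $w_{k+1} = w_k + \alpha\,\dfrac{\epsilon_k\, x_k}{|x_k|^{2}}$
Error for the kth pattern: $\epsilon_k = d_k - w_k^{T} x_k$
Change in error after the update: $\Delta\epsilon_k = -\Delta w_k^{T} x_k = -\alpha\,\epsilon_k$, i.e. the error on the current pattern shrinks by the factor α
Convergence and stability: $0 < \alpha < 2$ (values of order 0.1 to 1.0 are typical in practice)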
Error Correction Rules for Multi-Layer Networks
Madaline Rule I (Non-Linear)
Steps:
• If the output matches the desired response, no adaptation is made.
• If the output is different:
- Find the adaline whose linear sum is closest to 0
- Adapt its weights in the LMS direction far enough to reverse its output.
- LOAD SHARING: repeat with further adalines until the desired response is obtained (a sketch of one adaptation pass follows).
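A minimal sketch of one MR-I adaptation pass, assuming a single-layer Madaline whose adaline outputs feed a fixed majority-vote element; the combiner, the function names, and the value of α are illustrative assumptions. Here W is an (n_adalines, n_inputs) float array and x a ±1 input vector:

```python
import numpy as np

def madaline_output(W, x):
    """Single-layer Madaline: adalines feeding a fixed majority-vote element."""
    s = W @ x                                # linear sums, one per adaline
    y = np.where(s >= 0, 1.0, -1.0)          # signum quantizers
    return s, (1.0 if y.sum() >= 0 else -1.0)

def mr1_adapt(W, x, d, alpha=1.0):
    """One MR-I pass for input pattern x with desired Madaline response d (+1 or -1)."""
    s, out = madaline_output(W, x)
    tried = set()
    while out != d and len(tried) < W.shape[0]:
        # pick the not-yet-adapted adaline whose linear sum is closest to zero
        j = min((i for i in range(W.shape[0]) if i not in tried),
                key=lambda i: abs(s[i]))
        tried.add(j)
        # alpha-LMS step toward d; with alpha = 1 this drives the sum to d,
        # reversing that adaline's output ("far enough to reverse")
        W[j] = W[j] + alpha * (d - s[j]) * x / np.dot(x, x)
        s, out = madaline_output(W, x)       # load sharing: re-check, adapt more if needed
    return W
```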
Madaline Rule II (Non-Linear)
Steps (for one training pattern):
• Similar to MR I
• Uses trial adaptation: a small perturbation of suitable amplitude and polarity is added.
• If the output Hamming error is reduced, change the weights of that adaline in a direction collinear with the input; otherwise, make no adaptation.
• Repeat this for all adalines whose linear output magnitude is sufficiently small.
• Finally, the last layer is adapted using α-LMS (a sketch of the trial-adaptation step follows this list).
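A sketch of the accept/reject trial-adaptation step for one hidden adaline; the forward pass of the rest of the network is left as a caller-supplied function, and all names and the value of α are illustrative assumptions:

```python
import numpy as np

def mr2_trial_adapt(W, j, x, desired, forward, alpha=1.0):
    """MR-II trial adaptation of hidden adaline j on input pattern x.

    W       : (n_adalines, n_inputs) weights of this layer
    forward : forward(W, x) -> vector of +1/-1 network outputs (assumed helper)
    desired : vector of desired +1/-1 network outputs
    """
    hamming = lambda y: int(np.sum(y != desired))    # output Hamming error
    base_err = hamming(forward(W, x))

    s_j = np.dot(W[j], x)                            # adaline j's linear sum
    trial = W.copy()
    # trial change, collinear with the input, just large enough to reverse adaline j's output
    trial[j] = W[j] + alpha * (-np.sign(s_j) - s_j) * x / np.dot(x, x)

    if hamming(forward(trial, x)) < base_err:        # Hamming error reduced: accept the change
        return trial
    return W                                         # otherwise: no adaptation
```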
Steepest Descent – Single Element Network
Error Surface of a Linear Combiner
The Optimal Wiener-Hopf Weight
The squared error can be written as:
Taking expectation of the above expression yields:
So the MSE surface equation is:
With the globally optimal weight solution:
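A standard reconstruction of these equations, with $R = E[x_k x_k^{T}]$ the input correlation matrix and $p = E[d_k x_k]$ the cross-correlation vector (not taken from the original slide):

Squared error: $\epsilon_k^{2} = d_k^{2} - 2 d_k\, x_k^{T} w + w^{T} x_k x_k^{T} w$
Taking expectations: $\xi = E[\epsilon_k^{2}] = E[d_k^{2}] - 2 p^{T} w + w^{T} R w$
The MSE surface is therefore a quadratic bowl in $w$, and setting $\nabla\xi = 2Rw - 2p = 0$ gives the Wiener-Hopf optimum:
$w^{*} = R^{-1} p$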
Gradient Descent Algorithm
Gradient descent makes weight updates in the direction of the negative gradient, scaled by a factor μ that controls the stability and convergence of the algorithm; 𝛻𝑘 is the gradient of the MSE surface at the point corresponding to w = 𝑤𝑘.
𝑤𝑘+1 = 𝑤𝑘 + 𝜇(−𝛻𝑘)
μ-LMS (Linear)
• It uses the instantaneous gradient, i.e. the gradient of the squared error of the current training sample, as an approximation of the true gradient.
• The instantaneous gradient can be computed cheaply from the current sample alone, with no need to average instantaneous gradients over all patterns in the training set.
• For stability and convergence we need (see the reconstruction below):
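A standard reconstruction of the μ-LMS relations: $\epsilon_k = d_k - w_k^{T} x_k$ and $w_{k+1} = w_k + 2\mu\,\epsilon_k\, x_k$, with convergence of the weight mean requiring $0 < \mu < 1/\lambda_{\max}$, where $\lambda_{\max}$ is the largest eigenvalue of the input correlation matrix R (a more conservative, more practical bound is $0 < \mu < 1/\operatorname{tr}(R)$). A minimal training-loop sketch, with illustrative array names:

```python
import numpy as np

def mu_lms_train(X, d, mu=0.01, epochs=20):
    """mu-LMS: one negative-instantaneous-gradient step per training sample.

    X : (n_patterns, n_inputs) training inputs
    d : (n_patterns,) desired responses
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_k, d_k in zip(X, d):
            eps = d_k - np.dot(w, x_k)     # error on the current pattern
            w = w + 2.0 * mu * eps * x_k   # step along the negative instantaneous gradient
    return w
```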
Madaline Rule III (Non-Linear)
Steps:
• A small perturbation is added to the input of the nonlinearity (the linear sum).
• The change in error and the change in output caused by this perturbation are calculated.
• From the change in output error with respect to the perturbation, the instantaneous gradient can be estimated.
• It is shown to be mathematically equivalent to backpropagation when the perturbation is small.
Approximate Gradient:
Since:
And therefore:
So for small perturbation:
So weight update equation is thus:
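A reconstruction of the chain referred to above, with $s_k = w_k^{T} x_k$ the linear sum and $\Delta s$ the added perturbation (not from the original slide):

Approximate gradient: $\hat{\nabla}_k = \dfrac{\partial \epsilon_k^{2}}{\partial w_k} = \dfrac{\partial \epsilon_k^{2}}{\partial s_k}\,\dfrac{\partial s_k}{\partial w_k}$
Since $\dfrac{\partial s_k}{\partial w_k} = x_k$, and therefore, for a small perturbation, $\dfrac{\partial \epsilon_k^{2}}{\partial s_k} \approx \dfrac{\Delta(\epsilon_k^{2})}{\Delta s}$:
$\hat{\nabla}_k \approx \dfrac{\Delta(\epsilon_k^{2})}{\Delta s}\, x_k$
Weight update: $w_{k+1} = w_k - \mu\,\dfrac{\Delta(\epsilon_k^{2})}{\Delta s}\, x_k$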
• No need to know a priori the nature of the activation function
• Robust to drifts in analog hardware
Alternatively:
So weight update equation is thus:
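Reconstructed alternative form: since $\Delta(\epsilon_k^{2}) \approx 2\,\epsilon_k\,\Delta\epsilon_k$ for a small perturbation, the update can equivalently be written as

$w_{k+1} = w_k - 2\mu\,\epsilon_k\,\dfrac{\Delta\epsilon_k}{\Delta s}\, x_k$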
Steepest Descent – Multi-Layer Networks
Madaline Rule III
Steps:
• Same as for a single element, except that here the change due to the perturbation is measured at the output of the multi-layer network.
• Add the perturbation to the linear sum of the chosen adaline.
• Measure the change in the sum of squared output errors caused by this perturbation.
• Obtain the instantaneous gradient of the MSE with respect to the weight vector of the perturbed adaline (a sketch follows this list).
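A sketch of one MR-III gradient estimate and update for a single hidden adaline in a two-layer network; the layer shapes, the tanh activation, and the perturbation size are assumptions for illustration (in a hardware realization the activation need not be known in closed form, only measurable):

```python
import numpy as np

def mr3_update_hidden(W1, W2, x, d, j, mu=0.05, ds=1e-3, act=np.tanh):
    """Perturb hidden adaline j's linear sum and step down the estimated gradient.

    W1 : (n_hidden, n_inputs) first-layer weights
    W2 : (n_outputs, n_hidden) second-layer weights
    x  : input pattern,  d : desired output vector
    """
    def sse(perturbation):
        s1 = W1 @ x
        s1[j] += perturbation              # perturbation added to the linear sum
        h = act(s1)                        # hidden-layer outputs
        y = act(W2 @ h)                    # network outputs
        e = d - y
        return float(np.dot(e, e))         # sum of squared output errors

    grad = (sse(ds) - sse(0.0)) / ds       # instantaneous gradient of the error w.r.t. s1[j]
    W1[j] = W1[j] - mu * grad * x          # steepest-descent step, collinear with the input
    return W1
```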
Relevance to Present-Day Work
• μ-LMS and α-LMS are still used today
• MR-III and MR-II can be applied to complicated architectures
• Given an arbitrary activation function, one can use the MR-III procedure without requiring the activation function to be known analytically