NN.adaline

Transcript of NN.adaline

Page 1: NN.adaline

Neural Networks - Adaline

L. Manevitz

Page 2: NN.adaline

Plan of Lecture

• Perceptron: Connected and Convex examples

• Adaline: Square Error

• Gradient

• Calculate for AND, XOR

• Discuss limitations

• LMS algorithm: derivation.

Page 3: NN.adaline

What are the best weights?

• Most examples classified correctly?

• Least Square Error

• Least Square Error Before Cut-Off!

• To minimize Σ_k (d(k) – Σ_i w_i x_i(k))**2 (first sum over examples; second over dimension); see the sketch below.
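As a concrete reading of this objective, here is a minimal NumPy sketch (the AND data and the trial weight vector are illustrative assumptions) of the summed squared error for a given weight vector, with the bias folded in as a constant input component:

```python
import numpy as np

def sum_squared_error(W, X, d):
    """Sum over examples k of (d(k) - W.x(k))**2, before any cut-off."""
    errors = d - X @ W              # Err(k) = d(k) - W x(k) for every example
    return float(np.sum(errors ** 2))

# AND examples, with a constant bias input of 1 as the last component.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
d = np.array([0, 0, 0, 1], dtype=float)

print(sum_squared_error(np.array([0.5, 0.5, -0.5]), X, d))   # error of one trial W
```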

Page 4: NN.adaline

Least Square Minimization

• Find the gradient of the error over all examples. Either calculate the minimum directly or move opposite to the gradient.

• Widrow-Hoff (LMS): use the instantaneous example as an approximation to the gradient.

– Advantages: no memory; on-line; serves a similar function as noise to avoid local problems.

– Adjust by w(new) = w(old) + η δ x for each example x (see the sketch below).

– Here δ = (desired output – w x) and η is the learning rate.
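A minimal sketch of a single Widrow-Hoff update on one example; the learning rate η and the example values are assumptions for illustration:

```python
import numpy as np

def lms_step(w, x, desired, eta):
    """One Widrow-Hoff (LMS) update: w(new) = w(old) + eta * delta * x,
    where delta = desired - w.x is the error on this single example."""
    delta = desired - w @ x
    return w + eta * delta * x

w = np.array([0.0, 0.0, 0.0])                 # initial weights (bias folded into x)
x = np.array([1.0, 1.0, 1.0])                 # one input, constant bias as last component
print(lms_step(w, x, desired=1.0, eta=0.1))   # -> [0.1 0.1 0.1]
```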

Page 5: NN.adaline

LMS Derivation

• Errsq(k) = (d(k) – W x(k))**2

• Grad(Errsq) = 2 (d(k) – W x(k)) (-x(k))

• W(new) = W(old) – η Grad(Errsq)

• To ease calculations, write Err(k) = d(k) – W x(k), so Grad(Errsq) = -2 Err(k) x(k) (checked numerically in the sketch below)

• W(new) = W(old) + 2η Err(k) x(k)

• Continue with the next choice of k
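The sign and the factor of 2 in this gradient can be checked numerically. A small sketch (all values are illustrative assumptions) comparing -2 Err(k) x(k) with a central finite-difference estimate of Grad(Errsq):

```python
import numpy as np

def errsq(W, x, d):
    return (d - W @ x) ** 2

W = np.array([0.2, -0.4, 0.1])
x = np.array([1.0, 0.5, 1.0])
d = 1.0

analytic = -2.0 * (d - W @ x) * x        # Grad(Errsq) = -2 Err(k) x(k)

eps = 1e-6
numeric = np.array([(errsq(W + eps * e, x, d) - errsq(W - eps * e, x, d)) / (2 * eps)
                    for e in np.eye(3)])

print(analytic)                           # the two estimates should agree closely
print(numeric)
```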

Page 6: NN.adaline

Applications

• Adaline has better convergence properties than Perceptron

• Useful in noise correction

• There is an Adaline in every modem.

Page 7: NN.adaline

LMS (Least Mean Square Alg.)

• 1. Apply an input x(k) to the Adaline

• 2. Find the square error of current input

– Errsq(k) = (d(k) - W x(k))**2

• 3. Approximate Grad(Errsq) by

– differentiating Errsq

– approximating the average of Errsq by the single-example Errsq(k)

– obtaining -2 Err(k) x(k), where Err(k) = d(k) – W x(k)

• 4. Update W: W(new) = W(old) + 2η Err(k) x(k)

• 5. Repeat steps 1 to 4 (the full loop is sketched below).
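Steps 1-5 assembled into a minimal training loop (a NumPy sketch; the AND data, learning rate η, and fixed epoch count are assumptions rather than part of the slides):

```python
import numpy as np

def lms_train(X, d, eta=0.1, epochs=50):
    """Cycle through the examples, updating W by 2*eta*Err(k)*x(k) each time."""
    W = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_k, d_k in zip(X, d):
            err_k = d_k - W @ x_k          # step 2: the (signed) error on this example
            W = W + 2 * eta * err_k * x_k  # steps 3-4: move opposite the gradient
    return W

# AND, with a constant bias input as the last component.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
d = np.array([0, 0, 0, 1], dtype=float)
W = lms_train(X, d)
print(W)        # near the least-squares weights [0.5, 0.5, -0.25]
print(X @ W)    # linear outputs; a cut-off at 0.5 recovers AND
```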

Page 8: NN.adaline

Comparison with Perceptron

• Both use an updating rule that changes the weights with each input

• One corrects a binary error; the other minimizes a continuous error

• Adaline always converges; see what happens with XOR (sketch after this list)

• Both can REPRESENT linearly separable functions
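To see the contrast concretely, a sketch (same conventions as above; data and learning rate are assumptions) of LMS run on XOR: the weights still converge, but only to the least-squares compromise, whose linear output is 0.5 on every example, so no cut-off classifies XOR; the perceptron rule, by contrast, never stops updating on XOR.

```python
import numpy as np

def lms_train(X, d, eta=0.05, epochs=200):
    W = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_k, d_k in zip(X, d):
            W = W + 2 * eta * (d_k - W @ x_k) * x_k
    return W

# XOR, with a constant bias input as the last component.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
d = np.array([0, 1, 1, 0], dtype=float)
W = lms_train(X, d)
print(W)        # converges near the least-squares solution [0, 0, 0.5]
print(X @ W)    # all four linear outputs near 0.5: no cut-off separates XOR
```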

Page 9: NN.adaline

Convergence Phenomenon

• LMS converges depending on the choice of the learning rate η

• How to choose it? (see the sketch below)
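A rough numerical illustration of why the choice matters (the AND data, the candidate rates, and the epoch count are assumptions for the example): a small η converges slowly, a moderate η settles near the least-squares minimum, and too large an η makes the error grow.

```python
import numpy as np

def final_error(eta, epochs=100):
    """Run LMS on the AND data and return the final summed squared error."""
    X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
    d = np.array([0, 0, 0, 1], dtype=float)
    W = np.zeros(3)
    for _ in range(epochs):
        for x_k, d_k in zip(X, d):
            W = W + 2 * eta * (d_k - W @ x_k) * x_k
    return float(np.sum((d - X @ W) ** 2))

# The least-squares minimum for AND with a linear output is 0.25.
for eta in (0.01, 0.1, 0.5):
    print(eta, final_error(eta))   # small/moderate rates head toward 0.25; 0.5 blows up
```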

Page 10: NN.adaline

Limitations

• Limited to linearly separable problems

• How can we get around it?

– Use a network of neurons?

– Use a transformation of the data so that it becomes linearly separable (see the sketch below)
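One concrete transformation (an assumption for illustration, not taken from the slides): appending the product x1*x2 as an extra input makes XOR linearly separable, so a single linear unit with a cut-off can then handle it.

```python
import numpy as np

# XOR is not linearly separable in (x1, x2) alone, but it is once x1*x2 is appended.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0, 1, 1, 0], dtype=float)

X_t = np.column_stack([X, X[:, 0] * X[:, 1], np.ones(len(X))])   # [x1, x2, x1*x2, bias]

# Weights chosen by hand: x1 + x2 - 2*x1*x2 reproduces XOR exactly.
W = np.array([1.0, 1.0, -2.0, 0.0])
print(X_t @ W)   # -> [0. 1. 1. 0.]; separable with a cut-off at 0.5
```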

Page 11: NN.adaline

Multi-level Neural Networks

• Representability

– Arbitrarily complicated decisions

– Continuous approximation: arbitrary continuous functions (and more) (Cybenko's theorem)

• Learnability

– Change McCulloch-Pitts neurons to sigmoid units etc.

– Derive backpropagation using the chain rule (like the LMS derivation); a sketch follows below.
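A minimal backpropagation sketch under assumed choices not fixed by the slides (three sigmoid hidden units, one sigmoid output, squared error, XOR data, learning rate 0.5): the updates are just the LMS step pushed through the sigmoids with the chain rule.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(W, V, x):
    """Feed-forward pass: sigmoid hidden layer, then a sigmoid output unit."""
    h = sigmoid(W @ np.append(x, 1.0))        # hidden activations (bias folded in)
    y = sigmoid(V @ np.append(h, 1.0))        # network output
    return y, h

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0, 1, 1, 0], dtype=float)

W = rng.normal(size=(3, 3))                   # 3 hidden units, 2 inputs + bias
V = rng.normal(size=4)                        # 3 hidden units + bias
eta = 0.5

for _ in range(10000):
    for x, t in zip(X, d):
        y, h = forward(W, V, x)
        # Chain rule: the output delta is the LMS-style error times the sigmoid slope;
        # the hidden deltas are that delta propagated back through V.
        delta_out = (t - y) * y * (1 - y)
        delta_hid = delta_out * V[:3] * h * (1 - h)
        V += eta * delta_out * np.append(h, 1.0)
        W += eta * np.outer(delta_hid, np.append(x, 1.0))

print([round(float(forward(W, V, x)[0]), 2) for x in X])   # typically close to [0, 1, 1, 0]
```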

Page 12: NN.adaline

Replacement of Threshold Neurons with Sigmoid or Differentiable Neurons

[Figure: threshold (hard cut-off) activation vs. sigmoid activation]
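For reference, a tiny sketch of the two activation functions (plain NumPy, names chosen here): the threshold unit is flat almost everywhere, while the sigmoid is differentiable, which is what the gradient-based derivation needs.

```python
import numpy as np

def threshold(z):
    """Hard cut-off (McCulloch-Pitts style): derivative is 0 wherever it exists."""
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    """Smooth replacement: derivative is sigmoid(z) * (1 - sigmoid(z)) everywhere."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-4, 4, 9)
print(threshold(z))
print(np.round(sigmoid(z), 2))
```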

Page 13: NN.adaline

Prediction

[Block diagram: an input/output signal, a delay element, the NN, and a comparison of the NN's prediction with the actual signal]

Page 14: NN.adaline

Sample Feed forward Network (No loops)

[Diagram: feed-forward network (no loops) with an input layer, layers of weights Wji and Vik, and an output; each unit computes F(Σj wji xj)]