ANN Module2


A presentation on artificial neural networks

Transcript of ANN Module2

  • Overcoming the Linear Separability Limitation: The linear separability limitation of single-layer networks can be overcome by adding more layers. Multilayer networks can perform more general tasks.

  • Perceptron Training Algorithm: The training method can be summarized as follows (a code sketch follows this item).

    1. Apply an input pattern and calculate the output.
    2. (a) If the output is correct, go to step 1. (b) If the output is incorrect and is zero, add each input to its corresponding weight. (c) If the output is incorrect and is one, subtract each input from its corresponding weight.
    3. Go to step 1.
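
A minimal Python sketch of these steps for a single threshold neuron; the threshold value and the example patterns are assumptions made here purely for illustration:

```python
import numpy as np

def perceptron_step(w, x, target, threshold=0.0):
    """One pass of the perceptron rule described above."""
    output = 1 if np.dot(w, x) > threshold else 0   # step 1: compute the output
    if output == target:                            # step 2a: correct, leave weights alone
        return w
    if output == 0:                                 # step 2b: output 0 but should be 1, add inputs
        return w + x
    return w - x                                    # step 2c: output 1 but should be 0, subtract inputs

# Repeat over the training set (step 3) until every pattern is classified correctly
w = np.zeros(2)
for x, t in [((0.0, 1.0), 1), ((1.0, 0.0), 0)]:
    w = perceptron_step(w, np.array(x), t)
print(w)
```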

  • THE DELTA RULE: The delta rule is an important generalization of the perceptron training algorithm. The perceptron training algorithm is generalized by introducing a term δ = (T - A), where

    T = Target output, A = Actual output.

    If δ = 0, step 2a applies; if δ > 0, step 2b; if δ < 0, step 2c.

  • In any of these cases, the perceptron training algorithm is satisfied if δ is multiplied by the value of each input xi and this product is added to the corresponding weight. A learning-rate coefficient η is multiplied with the δ·xi product to control the average size of the weight changes.

  • δ = (T - A),  Δi = η δ xi,  wi(n+1) = wi(n) + Δi,  where
    Δi = the correction associated with the i-th input xi,
    wi(n+1) = the value of weight i after adjustment,
    wi(n) = the value of weight i before adjustment.
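
This update written out in Python; the learning rate and the sample numbers are illustrative assumptions:

```python
def delta_rule_update(w, x, target, actual, eta=0.5):
    """w_i(n+1) = w_i(n) + eta * delta * x_i, with delta = T - A."""
    delta = target - actual
    return [w_i + eta * delta * x_i for w_i, x_i in zip(w, x)]

# Target 1 but actual output 0 on input (1, 0): only the first weight is increased
print(delta_rule_update([0.2, -0.4], [1, 0], target=1, actual=0))
```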

  • Problems with the Perceptron Training Algorithm: It is difficult to determine whether the input sets are linearly separable or not. In real-world situations the inputs are often time-varying and may be separable at one time and not at another. The number of steps required is not properly defined. There is no proof that the perceptron algorithms are faster than simply changing the weight values.

  • Module 2. Back propagation: Training Algorithm - Applications - Network Configurations - Network Paralysis - Local Minima - Temporal Instability.

  • INTRODUCTION: The expansion of ANNs was under eclipse due to the lack of algorithms for training multilayer ANNs. Back propagation is a systematic method of training multilayer ANNs. The back propagation algorithm dramatically expanded the range of problems that can be solved using ANNs.

  • BACK PROPAGATION: Back propagation is a systematic method for training multilayer artificial neural networks. It overcomes the linear separability limitation: the limitation of single-layer perceptron networks can be overcome by adding more layers, and multilayer networks can perform more general tasks. The multilayer perceptron, trained by the back propagation algorithm, is the most widely used NN.

  • [Figure: Three-layer neural network: first layer, second layer, third layer. *Notations are incorrect.]

  • Back Propagation Training Algorithm: Network Configuration

  • The generally used activation function for NNs trained with the back propagation algorithm is the sigmoid function.

    The sigmoid function gives a nonlinear gain for the artificial neuron.

  • Why is the sigmoid function used in back propagation?

    BP requires a function that is differentiable everywhere. The sigmoid function has the additional advantage of providing a form of automatic gain control. A multilayer network has more representational power with nonlinear activation functions.

    d(OUT)/d(NET) = OUT(1 - OUT)
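
A small Python check of the sigmoid and this derivative identity; the sample NET values are arbitrary:

```python
import numpy as np

def sigmoid(net):
    """Logistic sigmoid: OUT = 1 / (1 + exp(-NET))."""
    return 1.0 / (1.0 + np.exp(-net))

net = np.array([-2.0, 0.0, 2.0])        # arbitrary sample NET values
out = sigmoid(net)

analytic = out * (1.0 - out)            # d(OUT)/d(NET) = OUT * (1 - OUT)

h = 1e-6                                # numerical derivative for comparison
numeric = (sigmoid(net + h) - sigmoid(net - h)) / (2 * h)

print(np.allclose(analytic, numeric))   # True
```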

  • [Figure: Multilayer back propagation network, with outputs compared against Target 1, Target 2 and Target 3.]

  • Overview of Training: objective of training, training pair, training set, training steps.

  • The Steps Required: 1. Select a training pair from the training set and apply the input vector to the network input. 2. Calculate the output of the network. 3. Calculate the error between the network output and the target. 4. Adjust the weights of the network in a way that minimises the error. Repeat steps 1 to 4 for each vector in the training set, until the error for the entire set is acceptably low. (A skeleton of this loop follows.)
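
A skeleton of this loop in Python; the forward and update_weights methods, the tolerance and the epoch limit are hypothetical names and values, not taken from the slides:

```python
def train(network, training_set, tolerance=0.01, max_epochs=1000):
    """Repeat steps 1-4 over the whole training set until the total error is acceptably low."""
    for _ in range(max_epochs):
        total_error = 0.0
        for x, target in training_set:        # step 1: select a training pair, apply the input
            out = network.forward(x)           # step 2: compute the network output
            error = target - out               # step 3: error between output and target
            network.update_weights(x, error)   # step 4: adjust weights to reduce the error
            total_error += error ** 2          # assuming a single scalar output
        if total_error < tolerance:            # stop when the error over the whole set is low
            return network
    return network
```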

  • Forward Pass

    Steps 1 and 2 constitute the forward pass: signals propagate from input to output.

    NET = XW, OUT = F(NET) = F(XW). (A sketch follows this item.)

    Reverse Pass: Steps 3 and 4 constitute the reverse pass. Weights in the OUTPUT LAYER are adjusted with the modified delta rule. Training is more complicated in the HIDDEN LAYERS, as their outputs have no target for comparison.
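
A minimal forward pass in this vector-matrix notation; the layer sizes and random weights are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.random((1, 3))             # one input vector with 3 components (assumed size)
W1 = rng.random((3, 4))            # input-to-hidden weights (4 hidden neurons)
W2 = rng.random((4, 2))            # hidden-to-output weights (2 output neurons)

F = lambda net: 1.0 / (1.0 + np.exp(-net))   # sigmoid activation

hidden_out = F(X @ W1)             # NET = XW, OUT = F(NET) for the hidden layer
output = F(hidden_out @ W2)        # same again for the output layer
print(output.shape)                # (1, 2)
```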

  • Adjusting Weights of the Output Layer: The training process is as follows. Consider the weight from neuron p in hidden layer j to neuron q in output layer k. The OUTPUT of the neuron in layer k is subtracted from the target value to produce the error signal. This is multiplied by the derivative of the activation function calculated for layer k: δ = OUT(1 - OUT)(Target - OUT).

  • [Figure: Adjusting weights of the output layer, contd.: the training rate η and the weight Wqp before and after adjustment, Wqp(n) and Wqp(n+1).]

  • Then δ is multiplied by the OUT value from the source neuron p in hidden layer j. This product is multiplied by the learning rate η (typically taken as a value between 0.01 and 1.0), and the result is added to the weight. An identical process is carried out for each weight proceeding from a neuron in the hidden layer to the output layer.

  • The following equations illustrate this calculation:

    δq,k = OUTq,k (1 - OUTq,k) (Targetq - OUTq,k)
    ΔWpq,k = η δq,k OUTp,j
    Wpq,k(n+1) = Wpq,k(n) + ΔWpq,k
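
These updates as a short Python sketch for a single sigmoid output neuron; the numbers are arbitrary examples:

```python
def output_layer_update(w_pq, out_hidden_p, out_q, target_q, eta=0.1):
    """Modified delta rule for one output-layer weight w_pq."""
    delta_q = out_q * (1.0 - out_q) * (target_q - out_q)   # error signal times sigmoid derivative
    dw = eta * delta_q * out_hidden_p                       # weight correction
    return w_pq + dw, delta_q

# e.g. hidden output 0.8, actual output 0.6, target 1.0
print(output_layer_update(w_pq=0.5, out_hidden_p=0.8, out_q=0.6, target_q=1.0))
```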

  • Adjusting Weights of the Hidden Layer: Hidden layers have no target vectors, so the training process described above cannot be applied to them directly. Back propagation trains the hidden layers by propagating the output error back through the network layer by layer, adjusting the weights at each layer. The same weight-update equation as in the previous case can be utilised here as well.

  • How is δ generated for hidden layers? First, δ is calculated for each neuron in the output layer. It is used to adjust the weights feeding into the output layer. Then these δ values are propagated back through the same weights to generate a δ for each neuron in the first hidden layer.

  • Calculate the hidden-layer δ by summing up all the weighted δs arriving from the layer above. These δs are then used for adjusting the weights of this hidden layer, and they are propagated back to all the preceding layers in a similar way. (A sketch follows.)
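
A sketch of this back propagation of δ to one hidden neuron; in the standard formulation the weighted sum of output δs is also multiplied by the sigmoid derivative of the hidden neuron's output, as done here, and the sample values are arbitrary:

```python
def hidden_delta(out_hidden_p, deltas_out, weights_p_to_out):
    """delta_p = OUT_p * (1 - OUT_p) * sum_q(delta_q * w_pq)."""
    weighted_sum = sum(d_q * w_pq for d_q, w_pq in zip(deltas_out, weights_p_to_out))
    return out_hidden_p * (1.0 - out_hidden_p) * weighted_sum

# Two output neurons feeding their deltas back to one hidden neuron
print(hidden_delta(0.8, deltas_out=[0.05, -0.02], weights_p_to_out=[0.4, 0.7]))
```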

  • Derivation of the Learning Rule for Back Propagation: Assumptions and Notations.

    yk = the output of the k-th neuron, yk = f(yin,k)
    yin,k = the net input to neuron k
    E = the squared error, E = 0.5 (Target - OUT)²
    zj = the output from the j-th hidden neuron
    δ = the portion of the error correction applied to a weight
    j = hidden layer, k = output layer

  • E = 0.5 (Target - OUT)², i.e. E = 0.5 (tk - yk)².

  • Usually the activation function for the BP network is either the binary sigmoid function (range [0,1]) or the bipolar sigmoid function (range [-1,1]). Hence the above equation for δk takes the corresponding sigmoid-derivative form (summarised below).
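
For reference, the two standard sigmoid forms, their derivatives, and the resulting output-layer error term; these expressions are standard results given here as a sketch rather than taken from the slides:

```latex
% Binary sigmoid and its derivative
f(x) = \frac{1}{1 + e^{-x}}, \qquad f'(x) = f(x)\bigl(1 - f(x)\bigr)

% Bipolar sigmoid and its derivative
f(x) = \frac{2}{1 + e^{-x}} - 1, \qquad f'(x) = \tfrac{1}{2}\bigl(1 + f(x)\bigr)\bigl(1 - f(x)\bigr)

% Output-layer error term in either case
\delta_k = (t_k - y_k)\, f'(y_{in,k})
```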

  • For Hidden Layer

  • Now consider the first case (the output-layer weights), and then the second case (the hidden-layer weights), as sketched below.
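
A standard sketch of the two gradient-descent cases, using the notation above; the input-to-hidden weight v_ij and the input x_i are symbols assumed here for illustration:

```latex
% Case 1: output-layer weight w_{jk} (hidden neuron j to output neuron k), with E = 0.5 \sum_k (t_k - y_k)^2
\frac{\partial E}{\partial w_{jk}} = -(t_k - y_k)\, f'(y_{in,k})\, z_j = -\delta_k z_j,
\qquad \delta_k = (t_k - y_k)\, f'(y_{in,k})

\Delta w_{jk} = -\eta\, \frac{\partial E}{\partial w_{jk}} = \eta\, \delta_k\, z_j

% Case 2: hidden-layer weight v_{ij} (input x_i to hidden neuron j)
\delta_j = f'(z_{in,j}) \sum_k \delta_k\, w_{jk},
\qquad \Delta v_{ij} = \eta\, \delta_j\, x_i
```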

  • Bias Adjustments (for the hidden and output layers).

    Example: Find the equation for the change in weight given by the back propagation algorithm when the activation function used is the tan-sigmoid function. (A worked form follows.)
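
One way to work this example, assuming the tan-sigmoid activation f(x) = tanh(x), whose derivative is 1 - f(x)²; the answer is supplied here as a sketch, not taken from the slides:

```latex
f(y_{in,k}) = \tanh(y_{in,k}), \qquad f'(y_{in,k}) = 1 - y_k^2

\delta_k = (t_k - y_k)\,(1 - y_k^2)

\Delta w_{jk} = \eta\, \delta_k\, z_j, \qquad \Delta b_k = \eta\, \delta_k
```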

  • Example: For the network shown, the initial weights and biases are chosen to be w1(0) = -1, b1(0) = 1, w2(0) = -2, b2(0) = 1. An input-target pair is given as (p = -1, T = 1). Perform the back propagation algorithm for one iteration. F = tan-sigmoid function. (A numerical sketch follows.)
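
A numerical sketch of one iteration for this 1-1-1 network, assuming tanh activations in both layers and a learning rate of 0.1 (the learning rate is not specified in the transcript, so it is an assumption here):

```python
import numpy as np

eta = 0.1                       # assumed learning rate
w1, b1, w2, b2 = -1.0, 1.0, -2.0, 1.0
p, T = -1.0, 1.0

# Forward pass
a1 = np.tanh(w1 * p + b1)       # hidden output: tanh(2)
a2 = np.tanh(w2 * a1 + b2)      # network output

# Backward pass (tanh derivative is 1 - output**2)
delta2 = (T - a2) * (1.0 - a2 ** 2)      # output-layer error term
delta1 = delta2 * w2 * (1.0 - a1 ** 2)   # propagated back through w2

# Weight and bias updates
w2 += eta * delta2 * a1
b2 += eta * delta2
w1 += eta * delta1 * p
b1 += eta * delta1
print(w1, b1, w2, b2)
```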

  • Example: For the neural network shown in the figure, with the given initial data, determine the new weights after applying the sample (0,0) once. Assume the learning rate is 0.3 and take the activation function for the hidden layer and the output layer as given.

  • Applications of the Back Propagation Algorithm: short-term load forecasting, image processing, online motor fault detection, power system stability.

  • Network Paralysis: During BP training the weights can become very large. This forces all or most of the neurons to operate at large values of OUT, a region where the derivative of the activation function is very small. Since the error sent back for training is proportional to this derivative, it also becomes very small, and the training process can come to a virtual standstill (called network paralysis). It is commonly avoided by reducing the step size.

  • Local Minima: The back propagation algorithm employs a type of gradient descent. The error surface of a complex network is highly convoluted, full of hills, valleys and folds, and the network can get trapped in a local minimum (a shallow valley) when there is a much deeper minimum nearby. (This problem is referred to as local minima.) It is avoided by statistical training methods; Wasserman proposed a combined statistical and gradient-descent method.

  • Temporal Instability: The human brain has the ability to retain existing data while recording new data. Conventional ANNs have failed to solve this stability problem: learning a new pattern may erase or modify previously learned patterns. In a BPNN, a new set of applied inputs may badly change the existing weights, so complete retraining is required. In real-world problems the NN is exposed to a continuously changing environment, and a BPNN may learn nothing because of the continuous change in the input patterns, never arriving at satisfactory settings.