
8/8/2019 Artificial Neural Networks


CHAPTER III

Artificial Neural Networks



Artificial Neural Networks

An Artificial Neural Network (ANN) is an information-processing paradigm that

is inspired by the way biological nervous systems, such as the brain, process information.

The key element of this paradigm is the novel structure of the information processing

system. It is composed of a large number of highly interconnected processing elements

(neurons) working in unison to solve specific problems.

What is an artificial neuron, and how can it be modeled on biological neurons?

An artificial neuron is a simple model of the basic generic neuron.

We construct these neural networks by first trying to deduce the essential features of neurons and their interconnections. We then typically program a computer to simulate these features.

The figure of the simple neuron shown below illustrates what an artificial neuron is.

Fig 3.1 A Simple Neuron

The brain is a highly complex, nonlinear, and parallel computer (information-processing system). A neural network can be defined as follows:

A neural network is a massively parallel-distributed processor that has a natural

propensity for storing experiential knowledge and making it available for use. It

resembles the brain in two respects:

1) Knowledge is acquired by the network through a learning process.

2) Interneuron connection strengths known as synaptic weights are used to store

the knowledge.


3.1 Construction of Artificial Neural Networks

Neural networks are composed of simple elements operating in parallel. These

elements are inspired by biological nervous systems [34]. As in nature, the network function

is determined largely by the connections between elements.

Fig 3.2: Neural network, including connections (called weights) between neurons. Figure labels: Input, Output, Target, Compare, Adjust Weights.

In the above figure, the inputs and the weights are supplied to the network, and the output is compared with the target. If the target is not reached, the weights are adjusted, and the process continues until the target is reached. Nowadays the entire simulation is carried out in computer software, for example Trazan or MATLAB.
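The compare-and-adjust loop of Fig 3.2 can be sketched in a few lines of base MATLAB. This is only an illustration under stated assumptions: a single linear neuron with one weight, and hypothetical values for the input p, the target t, the learning rate lr and the tolerance tol. The error-proportional update is one common choice, not necessarily the rule used by any particular package.

```matlab
% Minimal sketch of the loop in Fig 3.2 (all values are hypothetical).
p   = 2;      % input
t   = 1;      % target
w   = 0;      % initial weight
lr  = 0.1;    % learning rate (assumed)
tol = 1e-6;   % stop when the output is this close to the target

for iter = 1:1000
    a = w * p;            % network output
    e = t - a;            % compare the output with the target
    if abs(e) < tol       % target reached: stop adjusting
        break;
    end
    w = w + lr * e * p;   % adjust the weight in proportion to the error
end
```

With these values the error shrinks by a constant factor on every pass, so the loop stops well before the iteration limit.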

3.2: Introduction to the Neural Network Toolbox in MATLAB Software

3.2.1 What is MATLAB?

MATLAB is a high-performance language for technical computing. It integrates

computation, visualization, and programming in an easy-to-use environment where

problems and solutions are expressed in familiar mathematical notation. Typical uses include math and computation; algorithm development; data acquisition; modeling, simulation, and prototyping; data analysis, exploration, and visualization; scientific and engineering graphics; and application development, including graphical user interface building. The name MATLAB stands for matrix laboratory. MATLAB software is now available in Version 7.0.

3.2.2 Neural Network Toolbox:

The Neural Network Toolbox is a simple and user-friendly environment in the MATLAB software used for modeling neural networks.

3.3 A Simple Neuron construction in MATLAB software :

Fig 3.3: Neuron without bias, a = f(wp) (left), and neuron with bias, a = f(wp + b) (right)

p is the input of the neuron. a is the output of the neuron. w is the weight.

f is the transfer function. The neuron on the right has the scalar bias b.

The output of the network depends on the bias b and the weights w given to the network.

The transfer function shown above produces a scalar output using the weights and biases provided in the network. In these transfer functions w and b are the adjustable

scalar parameters of the neuron. The central idea of neural networks is that such

parameters can be adjusted so that the network exhibits some desired or interesting

behavior. Thus, we can train the network to do a particular job by adjusting the weight or

bias parameters, or perhaps the network itself will adjust these parameters to achieve

some desired end.
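The two neuron expressions can be evaluated directly in base MATLAB. The sketch below assumes a linear transfer function and hypothetical values for w, p and b; any other transfer function could be substituted for f.

```matlab
% Single-input neuron, with and without bias (hypothetical values).
f = @(n) n;          % transfer function; linear here by assumption
p = 2;               % input
w = 3;               % weight
b = -1.5;            % bias

a_no_bias   = f(w * p);        % a = f(wp)     -> 6
a_with_bias = f(w * p + b);    % a = f(wp + b) -> 4.5
```

Changing w or b changes the output for the same input, which is exactly what training exploits.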

3.4 Models of Neuron


There are many transfer functions included in this toolbox. One of them, the hard-limit transfer function, is explained below.

Fig 3.5: Hard-Limit Transfer Function

For n < 0, the response is a = 0;

for n >= 0, the response is a = +1.

Typing the following code in the MATLAB environment produces the plot shown above:

n = -5:0.1:5; plot(n, hardlim(n), 'b+:');
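For readers without the toolbox, the hard-limit rule stated above can be written as a one-line function handle. The name hardlim_sketch is hypothetical and is used so that it does not shadow the toolbox function.

```matlab
% Hard-limit transfer function in base MATLAB (hypothetical helper name).
hardlim_sketch = @(n) double(n >= 0);   % 0 for n < 0, 1 for n >= 0

n = [-2 -0.1 0 0.1 2];
a = hardlim_sketch(n);                  % a = [0 0 1 1 1]
```

Passing n = -5:0.1:5 to hardlim_sketch should give the same step shape as the plot command above.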

3.6 Network architectures

The manner in which the neurons of a neural network are structured is intimately

linked with the learning algorithms used to train the network.

3.6.1 Single-layer feed-forward networks : A layered neural network is a network

of neurons organized in the form of layers. In this network, there is just an

input layer of source nodes that projects onto an output layer of neurons, but

not vice versa.

3.6.2 Multi-layer feed forward networks: The second class of a feed-forward

neural network distinguishes itself by the presence of one or more hidden

layers, whose neurons are correspondingly called hidden neurons. The ability

of hidden neurons to extract higher-order statistics is particularly valuable

when the size of the input layer is large.



3.6.3 Recurrent networks : A recurrent neural network distinguishes itself from a

feed-forward neural network in that it has at least one feedback loop. The

presence of feedback loops has a profound impact on the learning capability

of the network and on its performance.

3.7 Network Learning Categories :

A learning rule is defined as a procedure for modifying the weights and biases of

a network. (This procedure can also be referred to as a training algorithm.) The learning

rule is applied to train the network to perform some particular task.

Learning rules in this toolbox fall into two broad categories:

3.7.1 Unsupervised learning.

3.7.2 Supervised learning.

Fig 3.6: (a) Feed-forward network with a single layer of neurons; (b) feed-forward network with two hidden layers and an output layer (Input, Layer 1, Layer 2, Output)

3.7.1 Unsupervised learning : The weights and biases are modified in response to network inputs only. There are no target outputs available. Most of these algorithms perform clustering operations. They categorize the input patterns into a finite number of classes. This is especially useful in such applications as vector quantization.
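A clustering rule of this kind can be sketched in base MATLAB as a winner-take-all update: for each input, the nearest class prototype is moved a little toward that input, and no targets are used. The data, the initial prototypes and the learning rate below are hypothetical, and this is a generic competitive rule rather than a specific toolbox function.

```matlab
% Winner-take-all clustering sketch (hypothetical data and settings).
P = [0.0 0.2 0.1 1.0 0.9 1.1;    % each column is one input vector
     0.0 0.1 0.2 1.0 1.1 0.9];
W = [0.0 1.0;                    % each column is one class prototype
     0.5 0.5];
lr = 0.5;                        % learning rate (assumed)

for epoch = 1:10
    for q = 1:size(P, 2)
        p = P(:, q);
        d = sum((W - repmat(p, 1, size(W, 2))).^2, 1);  % squared distances
        [~, k] = min(d);                                % winning prototype
        W(:, k) = W(:, k) + lr * (p - W(:, k));         % move winner toward input
    end
end

% Assign each input to its nearest prototype.
classes = zeros(1, size(P, 2));
for q = 1:size(P, 2)
    d = sum((W - repmat(P(:, q), 1, size(W, 2))).^2, 1);
    [~, classes(q)] = min(d);
end
```

The columns of W play the role of the code vectors in vector quantization.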

3.7.2 Supervised Learning : In supervised learning, the learning rule is provided with a set of examples (the training set) of proper network behavior

$\{p_1, t_1\}, \{p_2, t_2\}, \ldots, \{p_Q, t_Q\}$

where $p_q$ is an input to the network, and $t_q$ is the corresponding correct (target) output. As the inputs are applied to the network, the network outputs are compared to the targets. The learning rule is then used to adjust the weights and biases of the network in order to move the network outputs closer to the targets.

The supervised learning algorithms include the least mean square (LMS) algorithm and its generalization, known as the Backpropagation (BP) algorithm [25]. The Backpropagation algorithm derives its name from the fact that the error terms in the algorithm are propagated backward through the network on a layer-by-layer basis.

3.8 Creating a neuron :


The newlin function is used in the creation of a neuron in MATLAB software.

NEWLIN (PR, S, ID, LR) takes these arguments

PR - Rx2 matrix of min and max values for R input elements.

S - Number of elements in the output vector.

ID - Input delay vector, default = [0].

LR - Learning rate, default = 0.01;

and returns a new linear layer.

SIM Simulate a neural network

[Y,Pf,Af,E,perf] = SIM(net,P,Pi,Ai,T) takes,

net - Network.

P - Network inputs.

Pi - Initial input delay conditions, default = zeros.

Ai - Initial layer delay conditions, default = zeros.

T - Network targets, default = zeros.

and returns:

Y - Network outputs.

Pf - Final input delay conditions.

Af - Final layer delay conditions.

E - Network errors.

perf - Network performance.

Note that arguments Pi, Ai, Pf, and Af are optional and need only be used for networks that have input or layer delays.

3.9 Simple program to run Neural Networks in MATLAB Software :

Fig 3.7: Feed forward network with two inputs and one output

The simplest situation for simulating a network occurs when the network to be

simulated is static (has no feedback or delays). Here two inputs are present and one

output.

To set up this feed-forward network, use the following command:

net = newlin([1 3;1 3],1); % newlin constructs a linear neuron (layer)

For simplicity assign the weight matrix and bias to be

W = [1,2]; b = 0;

The commands for these assignments are

net.IW{1,1} = [1 2]; % IW = Input weights.

net.b{1} = 0; % b = Bias.

Concurrent vectors are presented to the network as a single matrix. The command is

P = [1 2 2 3; 2 1 3 1];

To simulate a network the command is

A = sim(net, P); % Sim command is used to simulate the network.

A single matrix of concurrent vectors is presented to the network and the network

produces a single matrix of concurrent vectors as output.
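The result of this simulation can be checked by hand in base MATLAB. The sketch assumes that the static linear layer computes W*P + b with a linear transfer function, and it reuses the weight matrix, bias and input matrix given above.

```matlab
% Hand computation of the static linear network's output.
W = [1 2];               % weight matrix from the example
b = 0;                   % bias from the example
P = [1 2 2 3;            % four concurrent input vectors, one per column
     2 1 3 1];

A = W * P + b;           % A = [5 4 8 5]
```

Each column of P yields one entry of A, which is why a matrix of concurrent inputs produces a matrix of concurrent outputs.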

3.10 LINEAR CLASSIFICATION

Linear classification is the association of an input vector with a particular target

vector. Linear networks can be trained to perform linear classification with the function

train . This function applies each vector of a set of input vectors and calculates the

network weight and bias increments due to each of the inputs according to learnp . Then

the network is adjusted with the sum of all these corrections. Each pass through the input

vectors is called an epoch .


Finally, train applies the inputs to the new network, calculates the outputs,

compares them to the associated targets, and calculates a mean square error. If the error

goal is met, or if the maximum number of epochs is reached, the training is stopped, and

train returns the new network and a training record. Otherwise train goes through another

epoch .

There are four input vectors and four targets, and we would like to produce a network that

gives the output corresponding to each input vector when that vector is presented.

Use train to get the weights and biases for a network that produces the correct

targets for each input vector. The initial weights and bias for the new network are 0 by

default. Set the error goal to 0.1 rather than accept its default of 0.

The problem runs, producing the following training record.

Thus, the performance goal is met in 64 epochs. The new weights and bias are

You can simulate the new network as shown below
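The epoch loop described in this section can be sketched in base MATLAB as follows. The input vectors, targets and learning rate are illustrative assumptions and are not the values behind the training record mentioned above, so the number of epochs this sketch needs should not be compared with the figure quoted in the text. The update sums the corrections from all input vectors before changing the network, and it stops at an error goal of 0.1 or at the epoch limit.

```matlab
% Batch training sketch for one linear neuron (illustrative data only).
P = [2 1 -2 -1;      % four input vectors, one per column
     2 -2 2 1];
T = [0 1 0 1];       % one target per input vector
W = [0 0]; b = 0;    % initial weights and bias are zero
lr = 0.01;           % learning rate (assumed)
goal = 0.1;          % error goal
max_epochs = 1000;   % epoch limit (assumed)

for epoch = 1:max_epochs
    A = W * P + b;              % outputs for all input vectors
    E = T - A;                  % errors against the targets
    perf = mean(E.^2);          % mean square error
    if perf <= goal
        break;                  % error goal met
    end
    W = W + lr * E * P';        % sum of the per-vector weight corrections
    b = b + lr * sum(E);        % sum of the per-vector bias corrections
end
```

After the loop, W * P + b gives the trained network's response to the four input vectors.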


3.11 BACK PROPAGATION ALGORITHMS

Backpropagation is a method used to update the weights of a neural network. In this process,

input vectors and the corresponding target vectors are used to train a network until it can

approximate a function, associate input vectors with specific output vectors or classify

input vectors in an appropriate way as defined by us.

The network is created using the function newff. It requires four inputs and returns

the network object. The first input is an R-by-2 matrix of minimum and maximum values

for each of the R elements of the input vector. The second input is an array containing the

sizes of each layer. The third input is a cell array containing the names of the transfer

functions to be used in each layer. The final input contains the name of the training function to be used.

Eg: net = newff([-1 2; 0 5],[3,1],{'tansig','purelin'},'trainlm');

(tansig and purelin are the transfer functions)

(trainlm is the training function)

init is the function used to initialize weights. The function sim simulates a network. sim takes the network input p and the network object net and returns the network outputs a. The output window shown alongside was produced using these three functions: newff, init, and sim.
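What such a network computes when it is simulated can be sketched in base MATLAB. The sketch assumes the same shape as the example (two inputs, three hidden neurons, one output, tansig then purelin) and a plain uniform random initialization, which is an assumption and not the toolbox's own initialization routine. The helper names ending in _sketch are hypothetical.

```matlab
% Forward pass of a 2-3-1 network (hypothetical random initialization).
tansig_sketch  = @(n) 2 ./ (1 + exp(-2 * n)) - 1;  % hyperbolic tangent sigmoid
purelin_sketch = @(n) n;                           % linear transfer function

W1 = 2 * rand(3, 2) - 1;   b1 = 2 * rand(3, 1) - 1;   % hidden layer, 3 neurons
W2 = 2 * rand(1, 3) - 1;   b2 = 2 * rand(1, 1) - 1;   % output layer, 1 neuron

p = [-1 0 1 2;     % four input vectors, one per column,
      0 1 3 5];    % inside the ranges [-1 2] and [0 5]

Q  = size(p, 2);
a1 = tansig_sketch(W1 * p + repmat(b1, 1, Q));   % hidden-layer outputs, 3 x Q
a  = purelin_sketch(W2 * a1 + b2);               % network outputs, 1 x Q
```

The hidden-layer outputs stay within [-1, 1], while the linear output layer lets the network output take any value.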

3.11.1 Training: Once the network weights and biases are initialized, the network is

ready for training. The network can be trained for function approximation, pattern

association, or pattern classification. The training process requires a set of examples of

proper network behavior, network inputs p and target outputs t. During training the

weights and biases of the network are iteratively adjusted to minimize the network performance function.
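The iterative adjustment can be illustrated with plain gradient-descent backpropagation on a small 1-3-1 network. This is a sketch under stated assumptions: the data, the initial weights and the learning rate are hypothetical, the performance function is taken to be the mean squared error, and simple gradient descent is used here in place of the toolbox training functions.

```matlab
% Gradient-descent backpropagation sketch for a 1-3-1 network.
p = linspace(-1, 1, 11);        % inputs, 1 x Q (hypothetical)
t = p.^2;                       % targets (hypothetical function to fit)
Q = numel(p);

W1 = [0.5; -0.3; 0.8];  b1 = [0.1; -0.2; 0.05];   % hidden layer (tanh)
W2 = [0.4 -0.6 0.2];    b2 = 0;                   % output layer (linear)
lr = 0.05;                                        % learning rate (assumed)
n_epochs = 200;
mse_hist = zeros(1, n_epochs);

for epoch = 1:n_epochs
    % Forward pass
    a1 = tanh(W1 * p + repmat(b1, 1, Q));   % 3 x Q
    a2 = W2 * a1 + b2;                      % 1 x Q
    e  = t - a2;
    mse_hist(epoch) = mean(e.^2);

    % Backward pass: propagate the error layer by layer
    d2 = -2 * e / Q;                        % dMSE/da2, 1 x Q
    d1 = (W2' * d2) .* (1 - a1.^2);         % dMSE/dn1, 3 x Q

    % Gradient-descent updates
    W2 = W2 - lr * d2 * a1';
    b2 = b2 - lr * sum(d2);
    W1 = W1 - lr * d1 * p';
    b1 = b1 - lr * sum(d1, 2);
end
```

Plotting mse_hist shows how the performance function falls as training proceeds.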

Of the various training functions used with the back propagation algorithms, Levenberg-Marquardt (trainlm) and Bayesian Regulation Backpropagation (trainbr) are explained below.


3.11.1.1 Levenberg-Marquardt ( trainlm): The Levenberg-Marquardt algorithm was

designed to approach second-order training speed. trainlm is a network training function

that updates weight and bias values according to Levenberg-Marquardt optimization.

trainlm can train any network as long as its weight, net input, and transfer functions have

derivative functions [32].
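For reference, the Levenberg-Marquardt update is usually written as below. The symbols are assumptions introduced here: $x_k$ is the vector of weights and biases at iteration $k$, $J$ is the Jacobian of the network errors with respect to the weights and biases, $e$ is the vector of network errors, and $\mu$ is the adaptive damping parameter.

```latex
H \approx J^{T} J, \qquad g = J^{T} e, \qquad
x_{k+1} = x_k - \left[ J^{T} J + \mu I \right]^{-1} J^{T} e
```

When $\mu$ is close to zero the step approaches a Gauss-Newton step, and when $\mu$ is large it approaches gradient descent with a small step size, which is how the method approaches second-order speed without computing the full Hessian.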

3.11.1.2 Bayesian Regulation Backpropagation (trainbr): trainbr is a network training function that updates the weight and bias values according to Levenberg-Marquardt optimization. It minimizes a linear combination of squared errors and squared weights, and it adjusts that combination during training so that the resulting network has good generalization qualities. This process is called Bayesian regularization, and it takes place within the Levenberg-Marquardt algorithm. trainbr can train any network as long as its weight, net input, and transfer functions have derivative functions.
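The linear combination of squared errors and weights is commonly written as the objective below. The symbols $\alpha$ and $\beta$ are assumptions introduced here for the two weighting factors, $e_i$ are the network errors on the training set, and $w_j$ are the network weights and biases.

```latex
F = \beta E_D + \alpha E_W, \qquad
E_D = \sum_{i} e_i^{2}, \qquad
E_W = \sum_{j} w_j^{2}
```

A large $\alpha$ relative to $\beta$ favors small weights and a smoother network response, while a small $\alpha$ favors a close fit to the training data. Bayesian regularization re-estimates the two factors during training rather than fixing them in advance.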


3.12 IMPROVED GENERALIZATION

One of the problems that occur during neural network training is called

overfitting. The error on the training set is driven to a very small value, but when new data is presented to the network the error is large. The network has memorized the

training examples, but it has not learned to generalize to new situations.

The following figure shows the response of a 1-20-1 neural network that has been

trained to approximate a noisy sine function. The underlying sine function is shown by

the dotted line, the noisy measurements are given by the + symbols, and the neural

network response is given by the solid line. Clearly this network has overfitted the data

and will not generalize well.

Fig: 3.8: Noisy sine function

MATLAB provides two methods for improving generalization. One of them is regularization, which modifies the performance function; this function is normally chosen to be the sum of squares of the network errors on the training set. It is desirable to determine the optimal regularization parameters in an automated fashion.

One approach to this process is the Bayesian framework of David MacKay. In this framework, the weights and biases of the network are assumed to be random variables with specified distributions. The function that implements this Bayesian approach is trainbr.


3.12.1 Bayesian Regulation Backpropagation : trainbr [35, 36] is a network training

function that updates the weight and bias values according to Levenberg-Marquardt

optimization. It minimizes a combination of squared errors and weights, and then

determines the correct combination so as to produce a network, which generalizes well.

The process is called Bayesian regularization.

trainbr(net,Pd,Tl,Ai,Q,TS,VV,TV)

takes these inputs,

net - Neural network.
Pd - Delayed input vectors.
Tl - Layer target vectors.
Ai - Initial input delay conditions.
Q - Batch size.
TS - Time steps.
VV - Either empty matrix [] or structure of validation vectors.

and returns,

net - Trained network.
TR - Training record of various values over each epoch:
TR.epoch - Epoch number.
TR.perf - Training performance.
TR.vperf - Validation performance.
TR.tperf - Test performance.
TR.mu - Adaptive mu value.

Bayesian regularization minimizes a linear combination of squared errors and

weights. It also modifies the linear combination so that at the end of training the resulting

network has good generalization qualities. This Bayesian regularization takes place

within the Levenberg-Marquardt algorithm.

Bayesian regularization has been implemented in the function trainbr. The following code, listing 3.12 (a), shows how you can train a 1-20-1 network using this function to approximate the noisy sine wave shown in Fig 3.8.




One feature of this algorithm is that it provides a measure of how many network

parameters (weights and biases) are being effectively used by the network. In this case,

the final trained network uses approximately 12 parameters out of the 61 total weights

and biases in the 1-20-1 network. This effective number of parameters should remain

approximately the same, no matter how large the number of parameters in the network

becomes. (This assumes that the network has been trained for a sufficient number of

iterations to ensure convergence.)
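The total of 61 weights and biases follows directly from the 1-20-1 layer sizes, as the short base MATLAB count below shows. The variable names are hypothetical.

```matlab
% Count the weights and biases of a fully connected 1-20-1 network.
layer_sizes = [1 20 1];    % input, hidden layer, output layer

n_weights = sum(layer_sizes(1:end-1) .* layer_sizes(2:end));  % 1*20 + 20*1 = 40
n_biases  = sum(layer_sizes(2:end));                          % 20 + 1 = 21
n_params  = n_weights + n_biases;                             % 61
```

Changing layer_sizes shows how quickly the total grows with network size, while the effective number of parameters reported by the algorithm is expected to stay roughly constant.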

The trainbr algorithm generally works best when the network inputs and targets

are scaled so that they fall approximately in the range [-1,1]. The following figure shows

the response of the trained network. In contrast to the previous figure, in which a 1-20-1

network overfits the data, here you see that the network response is very close to the

underlying sine function (dotted line), and, therefore, the network will generalize well to

new inputs. You could have tried an even larger network, but the network response would

never overfit the data. This eliminates the guesswork required in determining the

optimum network size. When using trainbr, it is important to let the

algorithm run until the effective number of parameters has converged.

The training might stop with the message "Maximum MU reached". This is typical, and is a good indication that the algorithm has truly converged. You can also tell that the algorithm has converged if the sum squared error and sum squared weights are relatively constant over several iterations. When this occurs you might want to click the Stop Training button in the training window.

Fig 3.9: Response of the trained network using sine wave function

Table 3.1 List of the algorithms that are tested and the acronyms used to identify them.




3.13 Preprocessing and Postprocessing

Neural network training can be made more efficient if you perform certain

preprocessing steps on the network inputs and targets.

3.13.1 Min and Max (mapminmax): Before training, it is often useful to scale the inputs and targets so that they always fall within a specified range. You can use the function mapminmax to scale inputs and targets so that they fall in the range [-1,1]. The following code illustrates the use of this function.

The original network inputs and targets are given in the matrices p and t. The

normalized inputs and targets pn and tn that are returned will all fall in the interval [-1,1].

The structures ps and ts contain the settings, in this case the minimum and maximum

values of the original inputs and targets. After the network has been trained, the ps

settings should be used to transform any future inputs that are applied to the network.

They effectively become a part of the network, just like the network weights and biases.

If mapminmax is used to scale the targets, then the output of the network will be trained

to produce outputs in the range [-1,1]. To convert these outputs back into the same units that were used for the original targets, use the settings ts. The following code simulates

the network that was trained in the previous code, and then converts the network output

back into the original units.

The network output an corresponds to the normalized targets tn. The

unnormalized network output a is in the same units as the original targets t.
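The scaling to [-1,1] and the conversion back to original units can be sketched in base MATLAB without the toolbox. The target values are hypothetical, the stored minimum and maximum stand in for the settings structure ts, and the normalized targets are used as a stand-in for a trained network's output.

```matlab
% Min-max scaling to [-1, 1] and back (hypothetical data).
t    = [2 4 6 10];                 % original targets
tmin = min(t);                     % stored settings, playing the role of ts
tmax = max(t);

tn = 2 * (t - tmin) / (tmax - tmin) - 1;     % normalized targets: [-1 -0.5 0 1]

an = tn;                                     % stand-in for the network output
a  = (an + 1) * (tmax - tmin) / 2 + tmin;    % back in the original units
```

The same two stored values must be used in both directions, which is why they effectively become part of the network.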

3.13.2 Preprocessing data (premnmx): premnmx preprocesses the network training set by normalizing the inputs and targets so that they fall in the interval [-1,1].

p = [-10 -7.5 -5 -2.5 0 2.5 5 7.5 10];

[pn,minp,maxp] = premnmx(p);

pn =

-1.0000 -0.7500 -0.5000 -0.2500 0 0.2500 0.5000 0.7500 1.0000


3.13.3 TRAMNMX : tramnmx transforms data using a precalculated min and max.

tramnmx transforms the network input set using minimum and maximum values that were previously computed by premnmx. This function needs to be used when

a network has been trained using data normalized by premnmx . All subsequent

inputs to the network need to be transformed using the same normalization.

p = [-10 -7.5 -5 -2.5 0 2.5 5 7.5 10];

t = [0 7.07 -10 -7.07 0 7.07 10 7.07 0];

[pn,minp,maxp,tn,mint,maxt] = premnmx(p,t);

net = newff(minmax(pn),[5 1],{'tansig' 'purelin'},'trainlm');

net = train(net,pn,tn);

p2 = [4 -7];

[p2n] = tramnmx(p2,minp,maxp);

an = sim(net,pn);

p2n =

0.4000 -0.7000
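The normalized values above can be reproduced by hand in base MATLAB by applying the stored minimum and maximum of the training inputs to the new inputs. The sketch assumes the usual linear mapping of [minp, maxp] onto [-1, 1].

```matlab
% Apply a previously computed min and max to new inputs.
minp = -10;                 % minimum of the training inputs p
maxp = 10;                  % maximum of the training inputs p
p2   = [4 -7];              % new inputs

p2n = 2 * (p2 - minp) / (maxp - minp) - 1;   % p2n = [0.4 -0.7]
```

Recomputing the minimum and maximum from p2 itself would give a different scaling from the one the network was trained with.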

3.13.4 Posttraining Analysis ( postreg ): The postreg function is used to perform the

regression analysis of the trained network. The figure shown is the regression

analysis of the above network.

m = 0.9819; b = 0.0002; r = 0.9905;

The network output and the corresponding targets are passed to postreg. It returns three parameters. The first two, m and b, correspond to the slope and the y-intercept of the best linear regression relating targets to network outputs. If there were a perfect fit (outputs exactly equal to targets), the slope would be 1, and the y-intercept would be 0. The third, r, is the correlation coefficient between the outputs and the targets; a value close to 1 indicates a good fit.
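The same three quantities can be computed in base MATLAB with polyfit and corrcoef. The targets and outputs below are hypothetical and are not the values from the network above.

```matlab
% Regression of network outputs on targets (hypothetical data).
t = [0 1 2 3 4];                 % targets
a = [0.1 0.9 2.1 2.9 4.1];       % network outputs

coeffs = polyfit(t, a, 1);       % best linear fit a = m*t + b
m = coeffs(1);                   % slope
b = coeffs(2);                   % y-intercept

R = corrcoef(t, a);              % 2 x 2 correlation matrix
r = R(1, 2);                     % correlation between outputs and targets
```

For these values the slope is 1 and the intercept is 0.02, so the fit is close to the perfect-fit line.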




Fig: 3.10: Regression analysis of noisy sine wave function

The following figure illustrates the graphical output provided by postreg . The

network outputs are plotted versus the targets as open circles. The best linear fit is

indicated by a dashed line. The perfect fit (output equal to targets) is indicated by the

solid line. In this example, it is difficult to distinguish the best linear fit line from the

perfect fit line because the fit is so good.




3.14 Optimization using gblSolve function [37]

gblSolve is a standalone version of glbSolve.m, which is a part of the optimization environment TOMLAB [38]. It is a global optimization routine that solves problems of the form

min f(x) subject to x_L <= x <= x_U

PriLev > 0 Small info
PriLev > 1 Each iteration info

OUTPUT PARAMETERS

Result Structure with fields:
x_k Matrix with all points fulfilling f(x)=min(f).
f_k Smallest function value found.
Iter Number of iterations.
FuncEv Number of function evaluations.
GLOBAL.C Matrix with all rectangle centerpoints.
GLOBAL.D Vector with distances from centerpoint to the vertices.
GLOBAL.L Matrix with all rectangle side lengths in each dimension.
GLOBAL.F Vector with function values.
GLOBAL.d Row vector of all different distances, sorted.
GLOBAL.d_min Row vector of minimum function value for each distance.




TOMLAB, developed by the Applied Optimization and Modeling group (TOM) at Mälardalen University, is an open MATLAB environment for research and teaching in

optimization. TOMLAB is based on NLPLIB TB, a toolbox for nonlinear programming

and parameter estimation and OPERA TB, a MATLAB toolbox for linear and discrete

optimization. Although TOMLAB includes more than 65 different optimization

algorithms, until recently there has been no routine included that handles global

optimization problems. Therefore the DIRECT algorithm attracted our interest.

DIRECT is an algorithm developed by Donald R. Jones for finding the global minimum of a multivariate function subject to simple bounds, using no derivative information. The algorithm is a modification of the standard Lipschitzian approach that eliminates the need to specify a Lipschitz constant. The idea is to carry out simultaneous searches using all possible constants from zero to infinity. The Lipschitz constant is viewed as

a weighting parameter that indicates how much emphasis to place on global versus local

search. In standard Lipschitzian methods, this constant is usually large because it must be

equal to or exceed the maximum rate of change of the objective function. As a result,

these methods place a high emphasis on global search, which leads to slow convergence.

In contrast, the DIRECT algorithm carries out simultaneous searches using all possible

constants, and therefore operates on both the global and local level.

DIRECT deals with problems of the form

$\min_x f(x)$ s.t. $x_L \le x \le x_U$

where $f \in \mathbb{R}$ and $x, x_L, x_U \in \mathbb{R}^n$. It is guaranteed to converge to the global optimal function value if the objective function f is continuous, or at least continuous in the neighborhood of a global optimum. This can be guaranteed since, as the number of iterations goes to infinity, the set of points sampled by DIRECT forms a dense subset of the unit hypercube. In other words, given any point x in the unit hypercube and any $\delta > 0$, DIRECT will eventually sample a point (compute the objective function) within a distance $\delta$ of x.

The first step in the DIRECT algorithm is to transform the search space to be the

unit hypercube. The function is then sampled at the center point of this cube. Computing the function value at the center point instead of at the vertices is an advantage when dealing with problems in higher dimensions. The hypercube is then divided into smaller hyperrectangles whose center points are also sampled. Instead of using a Lipschitz constant when determining the rectangles to sample next, DIRECT identifies a set of potentially optimal rectangles in each iteration. All potentially optimal rectangles are further divided into smaller rectangles whose center points are sampled. When no Lipschitz constant is used, there is no natural way of defining convergence. Instead, the procedure described above is performed for a predefined number of iterations. In our implementation it is possible to restart the optimization with the final status of all parameters from the previous run.
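The first DIRECT step described above can be sketched in base MATLAB: the search space is mapped to the unit hypercube, the center is sampled, and two further points are sampled along each dimension at one third of the side length from the center. The objective function and bounds are hypothetical, and the sketch stops before the selection of potentially optimal rectangles, so it is not a full implementation of DIRECT or of gblSolve.

```matlab
% First sampling step of DIRECT on the unit hypercube (sketch only).
f   = @(x) (x(1) - 1)^2 + (x(2) - 2)^2;   % hypothetical objective
x_L = [-5; 0];                            % hypothetical lower bounds
x_U = [10; 15];                           % hypothetical upper bounds

to_orig = @(u) x_L + u .* (x_U - x_L);    % unit cube -> original space
n = numel(x_L);

c = 0.5 * ones(n, 1);                     % center of the unit hypercube
C = c;                                    % sampled points, one per column
F = f(to_orig(c));                        % function values at sampled points

delta = 1/3;                              % one third of the side length
for i = 1:n
    e_i = zeros(n, 1);
    e_i(i) = 1;                           % unit vector along dimension i
    for s = [-1 1]
        u = c + s * delta * e_i;          % center of a new hyperrectangle
        C(:, end + 1) = u;
        F(end + 1) = f(to_orig(u));
    end
end

[f_best, k] = min(F);                     % best value found so far
x_best = to_orig(C(:, k));                % corresponding point
```

Only center points are evaluated, so the step costs 2n + 1 function evaluations regardless of how many vertices the hypercube has.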

An Example of the use of gblSolve

1. Create a MATLAB m-file function for computing the objective function f.

function f = funct1(x)

f = (x(2) - 5*x(1)^2/(4*pi^2) + 5*x(1)/pi - 6)^2 + 10*(1 - 1/(8*pi))*cos(x(1)) + 10;

2. Define the input arguments at the MATLAB prompt:

fun = 'funct1';

x_L = [-5 0];

x_U = [10 15];

GLOBAL.iterations = 20;

PriLev = 2;

3. Now, you can call gblSolve:

Result = gblSolve(fun,x_L,x_U,GLOBAL,PriLev);




To use the restart option do:

Result = gblSolve(fun,x_L,x_U,GLOBAL,PriLev); % First run

GLOBAL = Result.GLOBAL;

GLOBAL.iterations = 30;

Result = gblSolve(fun,x_L,x_U,GLOBAL,PriLev); % Restart

If you want a scatter plot of all sampled points in the search space, do:

C = Result.GLOBAL.C;

plot(C(1,:),C(2,:),'.');

Fig 3.11: Sampled points by gblsolve in the parameter space