8/8/2019 Artificial Neural Networsks
CHAPTER III
Artificial Neural Networks
An Artificial Neural Network (ANN) is an information-processing paradigm that
is inspired by the way biological nervous systems, such as the brain, process information.
The key element of this paradigm is the novel structure of the information processing
system. It is composed of a large number of highly interconnected processing elements
(neurons) working in unison to solve specific problems.
What is an artificial neuron, and how can it be modeled on biological neurons?
An artificial neuron is a simple model of a generic biological neuron.
We construct these neural networks by first trying to deduce the essential features
of neurons and their interconnections. We then typically program a computer to simulate
these features.
The figure of a simple neuron below illustrates what an artificial neuron is.
Fig 3.1 A Simple Neuron
The brain is a highly complex, nonlinear, and parallel computer (information-processing
system). A neural network can be defined as follows:
A neural network is a massively parallel distributed processor that has a natural
propensity for storing experiential knowledge and making it available for use. It
resembles the brain in two respects:
1) Knowledge is acquired by the network through a learning process.
2) Interneuron connection strengths, known as synaptic weights, are used to store
the knowledge.
3.1 Construction of Artificial Neural Networks
Neural networks are composed of simple elements operating in parallel. These
elements are inspired by biological nervous systems [34]. As in nature, the network function
is determined largely by the connections between elements.
Fig 3.2: Neural network, including connections (called weights) between neurons; the output is compared with the target and the weights are adjusted
In the above figure, the inputs and the weights are given to the network,
and the output is compared with the target. If the target is not reached, the weights are
adjusted, and the process continues until the target is reached. The entire simulation
is nowadays conducted using computer software, for example Trazan, MATLAB, etc.
3.2 Introduction to the Neural Network Toolbox in MATLAB Software
3.2.1 What is MATLAB?
MATLAB is a high-performance language for technical computing. It integrates
computation, visualization, and programming in an easy-to-use environment where
problems and solutions are expressed in familiar mathematical notation. Typical uses
include math and computation; algorithm development; data acquisition; modeling,
simulation, and prototyping; data analysis, exploration, and visualization; scientific and
engineering graphics; and application development, including graphical user interface
building. The name MATLAB stands for matrix laboratory. MATLAB software is now
available in Version 7.0.
3.2.2 Neural Network Toolbox:
The Neural Network Toolbox is a simple and user-friendly environment within the
MATLAB software for modeling neural networks.
3.3 A Simple Neuron construction in MATLAB software :
Fig 3.3 Neuron without bias, a = f(wp) (left), and neuron with bias, a = f(wp + b) (right)
Here p is the input of the neuron, a is the output of the neuron, w is the weight, and
f is the transfer function. The neuron on the right has the scalar bias b.
The output of the network depends on the bias b and the weights w given to
the network.
The transfer function f produces the scalar output a from the net input wp + b.
Here w and b are the adjustable
scalar parameters of the neuron. The central idea of neural networks is that such
parameters can be adjusted so that the network exhibits some desired or interesting
behavior. Thus, we can train the network to do a particular job by adjusting the weight or
bias parameters, or perhaps the network itself will adjust these parameters to achieve
some desired end.
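The relation a = f(wp + b) can be illustrated with a minimal sketch in plain Python (not the toolbox API). The function name `neuron`, the default identity transfer function, and the numeric values are assumptions chosen only for illustration.

```python
def neuron(p, w, b=0.0, f=lambda n: n):
    """Scalar neuron: a = f(w*p + b).

    f defaults to the identity (a linear transfer function); any other
    scalar function can be passed in its place.
    """
    return f(w * p + b)

# Illustrative values only (assumed, not taken from the text).
a_without_bias = neuron(p=2.0, w=3.0)        # a = f(wp)
a_with_bias = neuron(p=2.0, w=3.0, b=-1.5)   # a = f(wp + b)
print(a_without_bias, a_with_bias)
```

Changing w or b changes the output for the same input p, which is the sense in which these two parameters are "adjustable".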
3.4 Models of Neuron
Many transfer functions are included in this toolbox. One of them, the hard-limit
transfer function, is explained below.
Fig 3.5 Hard-Limit Transfer Function:
For n < 0, response a = 0;
For n >= 0, response a = +1.
Typing the following code in the MATLAB environment produces the plot shown above:
n = -5:0.1:5; plot(n, hardlim(n), 'b+:');
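The step behavior described above can be checked with a small plain-Python stand-in. The function below is a sketch written for this illustration; it only mirrors the rule stated in the text and is not the toolbox's hardlim implementation. The sample points are assumed.

```python
def hardlim(n):
    """Hard-limit transfer function: 0 for n < 0, 1 for n >= 0."""
    return 1 if n >= 0 else 0

# A few sample points on either side of the step at n = 0.
samples = [-5.0, -0.1, 0.0, 0.1, 5.0]
responses = [hardlim(n) for n in samples]
print(responses)  # [0, 0, 1, 1, 1]
```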
3.6 Network architectures
The manner in which the neurons of a neural network are structured is intimately
linked with the learning algorithms used to train the network.
3.6.1 Single-layer feed-forward networks : A layered neural network is a network
of neurons organized in the form of layers. In this network, there is just an
input layer of source nodes that projects onto an output layer of neurons, but
not vice versa.
3.6.2 Multi-layer feed forward networks: The second class of a feed-forward
neural network distinguishes itself by the presence of one or more hidden
layers, whose neurons are correspondingly called hidden neurons. The ability
of hidden neurons to extract higher-order statistics is particularly valuable
when the size of the input layer is large.
3.6.3 Recurrent networks : A recurrent neural network distinguishes itself from a
feed-forward neural network in that it has at least one feedback loop. The
presence of feedback loops has a profound impact on the learning capability
of the network and on its performance.
3.7 Network Learning Categories :
A learning rule is defined as a procedure for modifying the weights and biases of
a network. (This procedure can also be referred to as a training algorithm.) The learning
rule is applied to train the network to perform some particular task.
Learning rules in this toolbox fall into two broad categories:
3.7.1 Unsupervised learning.
3.7.2 Supervised learning.
3.7.1 Unsupervised learning : The weights and biases are modified in response to
network inputs only. There are no target outputs available. Most of these
algorithms perform clustering operations. They categorize the input patterns
into a finite number of classes. This is especially useful in such applications as
vector quantization.
Fig 3.6 (a) Feed-forward network with a single layer of neurons; (b) feed-forward network with two hidden layers and an output layer
3.7.2 Supervised Learning : In supervised learning, the learning rule is provided
with a set of examples (the training set) of proper network behavior
$\{p_1, t_1\}, \{p_2, t_2\}, \ldots, \{p_Q, t_Q\}$
where $p_Q$ is an input to the network, and $t_Q$ is the corresponding correct
(target) output. As the inputs are applied to the network, the network outputs are
compared to the targets. The learning rule is then used to adjust the weights and
biases of the network in order to move the network outputs closer to the targets.
The supervised learning algorithms include the least mean square (LMS)
algorithm and its generalization, known as the backpropagation (BP) algorithm [25]. The
backpropagation algorithm derives its name from the fact that the error
terms are propagated backward through the network on a layer-by-layer basis.
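As a rough illustration of the LMS idea mentioned above, the following plain-Python sketch applies a Widrow-Hoff style update to a single linear neuron: each output is compared with its target and the weights and bias are nudged toward it. The training pairs, the learning rate of 0.05, the 200 epochs, and the update form w + lr*e*p are all assumptions made for this example, not values from the source.

```python
def lms_epoch(w, b, inputs, targets, lr):
    """Run one pass of an LMS (Widrow-Hoff) style rule over the training set."""
    sse = 0.0
    for p, t in zip(inputs, targets):
        a = sum(wi * pi for wi, pi in zip(w, p)) + b    # linear neuron output
        e = t - a                                       # error against the target
        w = [wi + lr * e * pi for wi, pi in zip(w, p)]  # move weights toward the target
        b = b + lr * e                                  # move bias the same way
        sse += e * e
    return w, b, sse

# Hypothetical training set: targets follow t = 1*p1 + 2*p2 exactly.
inputs = [(1, 2), (2, 1), (2, 3), (3, 1)]
targets = [5, 4, 8, 5]

w, b = [0.0, 0.0], 0.0
sse_history = []
for epoch in range(200):
    w, b, sse = lms_epoch(w, b, inputs, targets, lr=0.05)
    sse_history.append(sse)

print(sse_history[0], sse_history[-1])
```

The sum of squared errors recorded per epoch shrinks as the outputs move closer to the targets, which is the behavior the learning rule is meant to produce.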
3.8 Creating a neuron :
The newlin function is used to create a neuron in the MATLAB software.
NEWLIN(PR, S, ID, LR) takes these arguments,
PR - Rx2 matrix of min and max values for R input elements.
S - Number of elements in the output vector.
ID - Input delay vector, default = [0].
LR - Learning rate, default = 0.01.
and returns a new linear layer.
SIM simulates a neural network.
[Y,Pf,Af,E,perf] = SIM(net,P,Pi,Ai,T) takes,
net - Network.
P - Network inputs.
Pi - Initial input delay conditions, default = zeros.
Ai - Initial layer delay conditions, default = zeros.
T - Network targets, default = zeros.
and returns:
Y - Network outputs.
Pf - Final input delay conditions.
Af - Final layer delay conditions.
E - Network errors.
perf - Network performance.
Note that arguments Pi, Ai, Pf, and Af are optional and need only be used for
networks that have input or layer delays.
3.9 Simple program to run Neural Networks in MATLAB Software :
Fig 3.7: Feed forward network with two inputs and one output
The simplest situation for simulating a network occurs when the network to be
simulated is static (has no feedback or delays). Here the network has two inputs and one
output.
To set up this feed-forward network, use the following command:
net = newlin([1 3;1 3],1); % newlin constructs the linear neuron
For simplicity, assign the weight matrix and bias to be
W = [1,2]; b = 0;
The commands for these assignments are
net.IW{1,1} = [1 2]; % IW = input weights
net.b{1} = 0; % b = bias
Concurrent vectors are presented to the network as a single matrix. The command is
P = [1 2 2 3; 2 1 3 1];
To simulate the network, the command is
A = sim(net, P); % sim simulates the network
A single matrix of concurrent vectors is presented to the network, and the network
produces a single matrix of concurrent vectors as output.
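For a static linear neuron, the simulation reduces to computing W*P + b column by column. The plain-Python sketch below reproduces that arithmetic for the W, b, and P given above. The helper name `simulate_linear` is hypothetical and stands in for the toolbox call only in this simple static case.

```python
W = [1.0, 2.0]        # input weights, as assigned to net.IW{1,1} in the text
b = 0.0               # bias, as assigned to net.b{1}
P = [[1, 2, 2, 3],    # first input element of each concurrent vector
     [2, 1, 3, 1]]    # second input element of each concurrent vector

def simulate_linear(W, b, P):
    """Return one output per column of P for a single linear neuron."""
    n_cols = len(P[0])
    return [sum(W[r] * P[r][c] for r in range(len(W))) + b
            for c in range(n_cols)]

A = simulate_linear(W, b, P)
print(A)  # [5.0, 4.0, 8.0, 5.0]
```

Each column of P is treated independently, which is what "concurrent vectors" means here: their order does not affect the outputs.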
3.10 LINEAR CLASSIFICATION
Linear classification is the association of an input vector with a particular target
vector. Linear networks can be trained to perform linear classification with the function
train . This function applies each vector of a set of input vectors and calculates the
network weight and bias increments due to each of the inputs according to learnp . Then
the network is adjusted with the sum of all these corrections. Each pass through the input
vectors is called an epoch .
Finally, train applies the inputs to the new network, calculates the outputs,
compares them to the associated targets, and calculates a mean square error. If the error
goal is met, or if the maximum number of epochs is reached, the training is stopped, and
train returns the new network and a training record. Otherwise train goes through another
epoch .
There are four input vectors and four targets, and we would like to produce a network that
gives the output corresponding to each input vector when that vector is presented.
Use train to get the weights and biases for a network that produces the correct
targets for each input vector. The initial weights and bias for the new network are 0 by
default. Set the error goal to 0.1 rather than accept its default of 0.
The problem runs, producing the following training record.
Thus, the performance goal is met in 64 epochs. The new weights and bias are
You can simulate the new network as shown below
3.11 BACK PROPAGATION ALGORITHMS
Backpropagation is the method used to update the weights of the neural network. In this process,
input vectors and the corresponding target vectors are used to train a network until it can
approximate a function, associate input vectors with specific output vectors, or classify
input vectors in an appropriate way as defined by the user.
The network is created using the function newff. It requires four inputs and returns
the network object. The first input is an R-by-2 matrix of minimum and maximum values
for each of the R elements of the input vector. The second input is an array containing the
sizes of each layer. The third input is a cell array containing the names of the transfer
functions to be used in each layer. The final input contains the name of the training
function to be used.
Eg: net = newff([-1 2; 0 5],[3,1],{'tansig','purelin'},'trainlm');
(tansig and purelin are the transfer functions)
(trainlm is the training function)
init is the function used to initialize the weights. The function sim simulates a
network: it takes the network input p and the network object net and returns the
network outputs a. The output window shown alongside uses all three functions,
newff, init, and sim.
3.11.1 Training: Once the network weights and biases are initialized, the network is
ready for training. The network can be trained for function approximation, pattern
association, or pattern classification. The training process requires a set of examples of
proper network behavior: network inputs p and target outputs t. During training the
weights and biases of the network are iteratively adjusted to minimize the network
performance function.
Various training functions are used with the backpropagation algorithm. Of these,
Levenberg-Marquardt (trainlm) and Bayesian Regulation Backpropagation
(trainbr) are explained below.
3.11.1.1 Levenberg-Marquardt ( trainlm): The Levenberg-Marquardt algorithm was
designed to approach second-order training speed. trainlm is a network training function
that updates weight and bias values according to Levenberg-Marquardt optimization.
trainlm can train any network as long as its weight, net input, and transfer functions have
derivative functions [32].
3.11.1.2 Bayesian Regulation Backpropagation (trainbr): trainbr is a network training
function that updates the weight and bias values according to Levenberg-Marquardt
optimization. It minimizes a linear combination of squared errors and weights, and
modifies that combination so that at the end of training the resulting network
generalizes well. The process is called Bayesian regularization, and it takes place
within the Levenberg-Marquardt algorithm. trainbr can train any network as long as its
weight, net input, and transfer functions have derivative functions.
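The "linear combination of squared errors and weights" can be written compactly. The notation below follows the usual MacKay-style convention and is an assumption here, since the source does not name the symbols: $E_D$ is the sum of squared errors between targets $t_i$ and outputs $a_i$, $E_W$ is the sum of squared weights $w_j$, and $\alpha$, $\beta$ are the coefficients that the regularization procedure adjusts.

```latex
F = \beta E_D + \alpha E_W, \qquad
E_D = \sum_{i=1}^{N} (t_i - a_i)^2, \qquad
E_W = \sum_{j=1}^{M} w_j^2
```

A larger $\alpha$ relative to $\beta$ penalizes large weights more strongly and so favors a smoother network response; a larger $\beta$ favors fitting the training data closely.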
3.12 IMPROVED GENERALIZATION
One of the problems that occur during neural network training is called
overfitting. The error on the training set is driven to a very small value, but when new
data is presented to the network the error is large. The network has memorized the
training examples, but it has not learned to generalize to new situations.
The following figure shows the response of a 1-20-1 neural network that has been
trained to approximate a noisy sine function. The underlying sine function is shown by
the dotted line, the noisy measurements are given by the + symbols, and the neural
network response is given by the solid line. Clearly this network has overfitted the data
and will not generalize well.
Fig: 3.8: Noisy sine function
The MATLAB software describes two methods for improving generalization:
regularization and early stopping. Regularization modifies the performance function, which is
normally chosen to be the sum of squares of the network errors on the training set. It is
desirable to determine the optimal regularization parameters in an automated fashion.
One approach to this process is the Bayesian framework of David MacKay. In
this framework, the weights and biases of the network are assumed to be random
variables with specified distributions. This Bayesian approach is implemented in the function
trainbr.
3.12.1 Bayesian Regulation Backpropagation : trainbr [35,36] is a network training
function that updates the weight and bias values according to Levenberg-Marquardt
optimization. It minimizes a combination of squared errors and weights, and then
determines the correct combination so as to produce a network that generalizes well.
The process is called Bayesian regularization.
trainbr(net,Pd,Tl,Ai,Q,TS,VV,TV)
takes these inputs,
net - Neural network.
Pd - Delayed input vectors.
Tl - Layer target vectors.
Ai - Initial input delay conditions.
Q - Batch size.
TS - Time steps.
VV - Either empty matrix [] or structure of validation vectors.
and returns,
net - Trained network.
TR - Training record of various values over each epoch:
TR.epoch - Epoch number.
TR.perf - Training performance.
TR.vperf - Validation performance.
TR.tperf - Test performance.
TR.mu - Adaptive mu value.
Bayesian regularization minimizes a linear combination of squared errors and
weights. It also modifies the linear combination so that at the end of training the resulting
network has good generalization qualities. This Bayesian regularization takes place
within the Levenberg-Marquardt algorithm.
Bayesian regularization has been implemented in the function trainbr. The
following code, 3.12 (a), shows how you can train a 1-20-1 network using this function to
approximate the noisy sine wave shown in Fig 3.8.
One feature of this algorithm is that it provides a measure of how many network
parameters (weights and biases) are being effectively used by the network. In this case,
the final trained network uses approximately 12 parameters out of the 61 total weights
and biases in the 1-20-1 network. This effective number of parameters should remain
approximately the same, no matter how large the number of parameters in the network
becomes. (This assumes that the network has been trained for a sufficient number of
iterations to ensure convergence.)
The trainbr algorithm generally works best when the network inputs and targets
are scaled so that they fall approximately in the range [-1,1]. The following figure shows
the response of the trained network. In contrast to the previous figure, in which a 1-20-1
network overfits the data, here you see that the network response is very close to the
underlying sine function (dotted line), and, therefore, the network will generalize well to
new inputs. You could have tried an even larger network, but the network response would
never overfit the data. This eliminates the guesswork required in determining the
optimum network size. When using trainbr, it is important to let the
algorithm run until the effective number of parameters has converged.
The training might stop with the message Maximum MU
reached. This is typical, and is a good indication that the algorithm
has truly converged. You can also tell that the algorithm has converged
if the sum squared error and sum squared weights are relatively
constant over several iterations. When this occurs you might want to
click the Stop Training button in the training window.
Fig 3.9: Response of the trained network using sine wave function
Table 3.1 List of the algorithms that are tested and the acronyms used to identify them.
3.13 Preprocessing and Postprocessing
Neural network training can be made more efficient if you perform certain
preprocessing steps on the network inputs and targets.
3.13.1 Min and Max (mapminmax): Before training, it is often useful to scale the inputs and targets so that they always fall within a specified range. You can use the function mapminmax to scale inputs and targets so that they fall in the range [-1,1]. The following code illustrates the use of this function.
The original network inputs and targets are given in the matrices p and t. The
normalized inputs and targets pn and tn that are returned will all fall in the interval [-1,1].
The structures ps and ts contain the settings, in this case the minimum and maximum
values of the original inputs and targets. After the network has been trained, the ps
settings should be used to transform any future inputs that are applied to the network.
They effectively become a part of the network, just like the network weights and biases.
If mapminmax is used to scale the targets, then the output of the network will be trained
to produce outputs in the range [-1,1]. To convert these outputs back into the same units that were used for the original targets, use the settings ts. The following code simulates
the network that was trained in the previous code, and then converts the network output
back into the original units.
The network output an corresponds to the normalized targets tn. The
unnormalized network output a is in the same units as the original targets t.
3.13.2 Preprocessing data (premnmx): premnmx preprocesses the network training set by normalizing the inputs and targets so that they fall in the interval [-1,1].
p = [-10 -7.5 -5 -2.5 0 2.5 5 7.5 10];
[pn,minp,maxp] = premnmx(p); % only the inputs are normalized here
pn =
-1.0000 -0.7500 -0.5000 -0.2500 0 0.2500 0.5000 0.7500 1.0000
3.13.3 TRAMNMX : tramnmx transforms data using a precalculated min and max.
tramnmx transforms the network input set using minimum and maximum values
that were previously computed by premnmx. This function needs to be used when
a network has been trained using data normalized by premnmx. All subsequent
inputs to the network need to be transformed using the same normalization.
p = [-10 -7.5 -5 -2.5 0 2.5 5 7.5 10];
t = [0 7.07 -10 -7.07 0 7.07 10 7.07 0];
[pn,minp,maxp,tn,mint,maxt] = premnmx(p,t); % normalize training inputs and targets
net = newff(minmax(pn),[5 1],{'tansig' 'purelin'},'trainlm');
net = train(net,pn,tn);
p2 = [4 -7]; % new inputs in the original units
[p2n] = tramnmx(p2,minp,maxp); % reuse the training min and max
an = sim(net,p2n); % simulate on the normalized new inputs
p2n =
0.4000 -0.7000
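The normalization arithmetic behind these numbers is simple enough to check by hand. The plain-Python sketch below assumes the usual min-max mapping onto [-1, 1], namely 2*(v - min)/(max - min) - 1, and reproduces the pn and p2n values shown above. The function names are hypothetical and are not the toolbox functions.

```python
def fit_minmax(values):
    """Return (min, max) of the training data; these play the role of minp and maxp."""
    return min(values), max(values)

def to_unit_range(values, lo, hi):
    """Map values from [lo, hi] onto [-1, 1]."""
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

def from_unit_range(scaled, lo, hi):
    """Inverse mapping, from [-1, 1] back to the original units."""
    return [(s + 1.0) * (hi - lo) / 2.0 + lo for s in scaled]

p = [-10, -7.5, -5, -2.5, 0, 2.5, 5, 7.5, 10]
minp, maxp = fit_minmax(p)
pn = to_unit_range(p, minp, maxp)

# New inputs must be scaled with the stored training min and max,
# not with their own min and max.
p2n = to_unit_range([4, -7], minp, maxp)
p2_back = from_unit_range(p2n, minp, maxp)

print([round(v, 4) for v in pn])
print([round(v, 4) for v in p2n])
```

Scaling the new inputs with the training minimum and maximum keeps them on the same scale the network saw during training, which is why the stored settings are treated as part of the network.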
3.13.4 Posttraining Analysis (postreg): The postreg function is used to perform a
regression analysis of the trained network. The figure shown is the regression
analysis of the above network, for which
m = 0.9819; b = 0.0002; r = 0.9905;
The network output and the corresponding targets are passed to postreg. It returns
three parameters. The first two, m and b, correspond to the slope and the y-intercept of
the best linear regression relating targets to network outputs. If there were a perfect fit
(outputs exactly equal to targets), the slope would be 1, and the y-intercept would be 0.
The third, r, is the correlation coefficient between the outputs and the targets; a value
close to 1 indicates a good fit.
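The three quantities can be computed with ordinary least squares. The following plain-Python sketch regresses network outputs on targets and returns a slope, an intercept, and a correlation coefficient. It is an assumed textbook formulation rather than the postreg source, and the sample outputs are hypothetical.

```python
import math

def regress_outputs_on_targets(targets, outputs):
    """Least-squares fit outputs ~ m*targets + b, plus the correlation r."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_a = sum(outputs) / n
    s_tt = sum((t - mean_t) ** 2 for t in targets)
    s_aa = sum((a - mean_a) ** 2 for a in outputs)
    s_ta = sum((t - mean_t) * (a - mean_a) for t, a in zip(targets, outputs))
    m = s_ta / s_tt                      # slope of the best linear fit
    b = mean_a - m * mean_t              # y-intercept of the best linear fit
    r = s_ta / math.sqrt(s_tt * s_aa)    # correlation coefficient
    return m, b, r

targets = [0.0, 1.0, 2.0, 3.0]

# Perfect fit: outputs equal targets, so slope 1, intercept 0, correlation 1.
m_perfect, b_perfect, r_perfect = regress_outputs_on_targets(targets, targets)

# Hypothetical imperfect outputs, for comparison.
m, b, r = regress_outputs_on_targets(targets, [0.1, 0.9, 2.1, 2.9])
print(m_perfect, b_perfect, r_perfect)
print(m, b, r)
```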
Fig: 3.10: Regression analysis of noisy sine wave function
The figure above illustrates the graphical output provided by postreg. The
network outputs are plotted versus the targets as open circles. The best linear fit is
indicated by a dashed line. The perfect fit (output equal to targets) is indicated by the
solid line. In this example, it is difficult to distinguish the best linear fit line from the
perfect fit line because the fit is so good.
3.14 Optimization using gblsolve function [37]
The function gblsolve is a standalone version of glbSolve.m, which is part of the
optimization environment TOMLAB [38]. It is a global
optimization routine that solves problems of the form
min f(x) subject to x_L <= x <= x_U
The printing level PriLev controls the output:
PriLev > 0 Small info
PriLev > 1 Each iteration info
OUTPUT PARAMETERS
Result Structure with fields:
x_k Matrix with all points fulfilling f(x)=min(f).
f_k Smallest function value found.
Iter Number of iterations.
FuncEv Number of function evaluations.
GLOBAL.C Matrix with all rectangle centerpoints.
GLOBAL.D Vector with distances from centerpoint to the vertices.
GLOBAL.L Matrix with all rectangle side lengths in each dimension.
GLOBAL.F Vector with function values.
GLOBAL.d Row vector of all different distances, sorted.
GLOBAL.d_min Row vector of minimum function value for each distance.
TOMLAB developed by the Applied Optimization and Modeling group (TOM) at
Malardalen University, is an open MATLAB environment for research and teaching in
optimization. TOMLAB is based on NLPLIB TB, a toolbox for nonlinear programming
and parameter estimation and OPERA TB, a MATLAB toolbox for linear and discrete
optimization. Although TOMLAB includes more than 65 different optimization
algorithms, until recently there has been no routine included that handles global
optimization problems. The DIRECT algorithm therefore attracted our interest.
DIRECT is an algorithm developed by Donald R. Jones for finding the global
minimum of a multivariate function subject to simple bounds, using no derivative
information. The algorithm is a modification of the standard Lipschitzian approach that
eliminates the need to specify a Lipschitz constant. The idea is to carry out simultaneous
searches using all possible constants from zero to infinity. The Lipschitz constant is viewed as
a weighting parameter that indicates how much emphasis to place on global versus local
search. In standard Lipschitzian methods, this constant is usually large because it must be
equal to or exceed the maximum rate of change of the objective function. As a result,
these methods place a high emphasis on global search, which leads to slow convergence.
In contrast, the DIRECT algorithm carries out simultaneous searches using all possible
constants, and therefore operates on both the global and local level.
DIRECT deals with problems of the form
$$\min_x f(x) \quad \text{s.t.} \quad x_L \le x \le x_U$$
where $f \in \mathbb{R}$ and $x, x_L, x_U \in \mathbb{R}^n$. It is guaranteed to converge to the global
optimal function value if the objective function f is continuous, or at least continuous in
the neighborhood of a global optimum. This can be guaranteed since, as the number of
iterations goes to infinity, the set of points sampled by DIRECT forms a dense subset of
the unit hypercube. In other words, given any point x in the unit hypercube and any
$\delta > 0$, DIRECT will eventually sample a point (compute the objective function) within a
distance $\delta$ of x.
The first step in the DIRECT algorithm is to transform the search space to be the
unit hypercube. The function is then sampled at the center-point of this cube. Computing
the function value at the center-point instead of at the vertices is an advantage when
dealing with problems in higher dimensions. The hypercube is then divided into smaller
hyperrectangles whose center-points are also sampled. Instead of using a Lipschitz
constant when determining the rectangles to sample next, DIRECT identifies a set of
potentially optimal rectangles in each iteration. All potentially optimal rectangles are
further divided into smaller rectangles whose center-points are sampled. When no
Lipschitz constant is used, there is no natural way of defining convergence. Instead, the
procedure described above is performed for a predefined number of iterations. In our
implementation it is possible to restart the optimization with the final status of all
parameters from the previous run.
An Example of the use of gblSolve
1. Create a MATLAB m-file function for computing the objective function f.
function f = funct1(x);
f = (x(2)-5*x(1)^2/(4*pi^2)+5*x(1)/pi-6)^2+10*(1-1/(8*pi))*cos(x(1))+10;
2. Define the input arguments at the MATLAB prompt:
fun = 'funct1';
x_L = [-5 0];
x_U = [10 15];
GLOBAL.iterations = 20;
PriLev = 2;
3. Now, you can call gblSolve:
Result = gblSolve(fun,x_L,x_U,GLOBAL,PriLev);
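The first DIRECT step described earlier, mapping the bounded search space to the unit hypercube and sampling its center-point, can be sketched for this example in plain Python. This covers only that first step, not the full algorithm or gblSolve itself. The objective assumes that the extraction dropped a minus sign after x(2) and that the cosine coefficient is 1 - 1/(8*pi), as in the usual Branin-type form; the helper name `from_unit_cube` is hypothetical.

```python
import math

def funct1(x):
    """Objective of the example (Branin-type form, under the stated assumptions)."""
    x1, x2 = x
    return ((x2 - 5 * x1 ** 2 / (4 * math.pi ** 2) + 5 * x1 / math.pi - 6) ** 2
            + 10 * (1 - 1 / (8 * math.pi)) * math.cos(x1) + 10)

def from_unit_cube(u, x_L, x_U):
    """Map a point u in the unit hypercube back to the original bounds."""
    return [lo + ui * (hi - lo) for ui, lo, hi in zip(u, x_L, x_U)]

x_L = [-5.0, 0.0]    # lower bounds from the example
x_U = [10.0, 15.0]   # upper bounds from the example

# The center-point of the unit square, expressed in the original coordinates.
center = from_unit_cube([0.5, 0.5], x_L, x_U)
f_center = funct1(center)
print(center, f_center)
```

Sampling the center needs one function evaluation regardless of dimension, whereas sampling the vertices of an n-dimensional box needs 2^n, which is the advantage the text refers to for higher-dimensional problems.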
To use the restart option do:
Result = gblSolve(fun,x_L,x_U,GLOBAL,PriLev); % First run
GLOBAL = Result.GLOBAL;
GLOBAL.iterations = 30;
Result = gblSolve(fun,x_L,x_U,GLOBAL,PriLev); % Restart
If you want a scatter plot of all sampled points in the search space, do:
C = Result.GLOBAL.C;
plot(C(1,:),C(2,:),'.');
Fig 3.11: Sampled points by gblsolve in the parameter space