CHAPTER 5
ARTIFICIAL NEURAL NETWORKS (ANN) AND
SUPPORT VECTOR MACHINES (SVM)
5.1 AN OVERVIEW OF ANN
An artificial neural network (ANN) is a form of computing inspired by
the functioning of the brain and nervous system. The ANN approach is based on
the highly interconnected structure of brain cells (Margrave et al 1999).
ANNs are based on the present understanding of biological nervous systems,
though much of the biological detail is neglected (Freeman and Skapura 1991).
Neural networks represent highly idealized mathematical models of our
understanding of such complex systems (Jordan et al 1998). An ANN is an
information processing paradigm inspired by the way biological
nervous systems, such as the brain, process information. The key element of this
paradigm is the novel structure of the information processing system. It is
composed of a large number of highly interconnected processing elements
(neurons) working in unison to solve specific problems. An ANN is
configured for a specific application, such as pattern recognition or data
classification, through a learning process (Roy et al 1995, Jordan et al 1998,
Silverman and Noetzel 1990, Wendel and Dual 1996). Learning in biological
systems involves adjustments to the synaptic connections that exist between
the neurons. This is true of ANNs as well.
ANNs are mathematical models whose purpose is to simulate the
human brain in a simple and objective way. Such a model should therefore have the
fundamental capacity of a brain, namely learning capacity, which permits carrying
out tasks that are considered typical of the human brain, such as pattern
recognition, creation of associations, system identification and clustering
(Veiga et al 2005). Although they are less complex than the human brain,
neural networks can process in a short period of time enormous amounts of
data that could otherwise be analyzed only by a specialist. One of the most
important characteristics of an artificial neural network is its ability to be
trained, or to learn by example, much as the human brain does (Haykin 1994).
ANNs fall under the category of parametric models
that are generally lumped (Masnata and Sunseri 1995). Application of an ANN
does not require complete details about the catchment’s characteristics,
because the ANN is a black box approach. The ANN technique can be applied to
data that may be incomplete, noisy and ambiguous. ANNs are ideally suited
to dynamic problems and are economical in storing information. They are simple
and easy to adopt compared with other kinds of models (Berry et al 1991, Kim
et al 2001 and Spanner et al 2000).
Artificial neural networks have been developed as generalizations
of mathematical models of human cognition or neural biology, based on the
assumptions that:
Information processing occurs at many simple elements called neurons.
Signals are passed between neurons over connection links.
Each connection link has an associated weight, which, in a typical neural net, multiplies the signal transmitted.
Each neuron applies an activation function (usually non-linear) to its net input (sum of weighted input signals) to determine its output signal.
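The four assumptions above can be illustrated with a minimal Python sketch of a single neuron. The function name `neuron_output`, the optional bias term and the choice of the logistic activation are illustrative assumptions and do not come from the original text.

```python
import math

def neuron_output(inputs, weights, bias=0.0):
    """Return the output signal of one neuron (illustrative sketch)."""
    # Net input: every incoming signal is multiplied by its connection weight.
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Non-linear activation; the logistic function is assumed here.
    return 1.0 / (1.0 + math.exp(-net))

signals = [0.5, -1.0, 2.0]   # signals arriving over three connection links
weights = [0.4, 0.3, 0.1]    # one weight per link (hypothetical values)
print(neuron_output(signals, weights))
```

With these values the net input is 0.1, so the output lies slightly above 0.5, the value the logistic function returns for a net input of zero.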
A neural network is characterized by
Its pattern of connections between the neurons (called its
architecture)
Its method of determining the weights on the connections
(called its training or learning algorithms)
Its activation function.
Commonly neural networks are adjusted or trained, so that a
particular input leads to a specific target output. Such a situation is shown in
Figure 5.1. There, the network is adjusted, based on a comparison of the
output and the target, until the network output matches the target (Wasserman
1989). Typically many such input/target pairs are used, in this supervised
learning, to train a network.
Figure 5.1 Basic function of ANN
A typical ANN consists of a large number of neurons (also called units,
cells or nodes) that are organized according to a particular arrangement. Each neuron
is connected to other neurons by means of directed communication links, each
with an associated weight. The weights represent information being used by
the net to solve the problem.
Each neuron has an internal state, called its activation or activity
level, which is a function of the inputs it has received. Typically a neuron
sends its activation as a signal to several other neurons. It is important to note
that a neuron can send only one signal at a time, although that signal is
broadcast to several other neurons.
5.2 OPTIMAL NETWORK ARCHITECTURE
Neural networks operate on the principle of learning from a
training set. Applying an ANN to any classification problem requires a thorough
knowledge of how to choose an appropriate network type and
training algorithm, suitable values for parameters
such as initial weights, learning rate and momentum rate, an appropriate network
structure, the training period and the method of pre- and post-processing of input
and output data (Baum and Haussler 1989). It must be noted that the whole
exercise is based only on a trial and error approach. There exists a variety of
neural network models and learning procedures. Figure 5.2 shows the
architecture of the neural system.
Figure 5.2 Architecture of neural system
[Flowchart blocks: input patterns stored in a file; neural network parameters stored in a file; normalized target patterns stored in a file; training the neural network; final weight values, after training, stored in a file; production mode; end user; final result]
The determination of optimal network architecture as a part of the learning strategy was proposed by Houghton and Shen (1990). Determination of the appropriate neural network architecture is one of the most difficult tasks in the model building process (Tho et al 2004). The various types of network architecture available are feed forward networks, Jordan-Elman nets, Ward nets, jump connection nets, unsupervised Kohonen nets, probabilistic nets, general regression nets, GMDH (Polynomial set), Recurrent Neural Networks (RNN) and Radial Basis Function (RBF) networks.
The process of selecting a suitable architecture for a required problem can be broadly classified into three steps (Carling and Alison 1995):
1. Fixing the architecture
2. Training the network
3. Testing the network
The following procedure is used to determine the optimal network architecture by trial and error. First, the input/output parameters, training size and learning algorithm are decided, and a network is chosen with a trial number of nodes in the hidden layer. Hidden nodes perform a twofold function: first they compute a signal from all incoming information, and second they transform this signal using a non-linear activation function (Roy et al 1995). The network is trained for a fixed number of epochs, and the network gradient is observed over these epochs. Then the network architecture is changed by increasing or decreasing the number of hidden nodes, and the training procedure is repeated for the new architecture. This procedure is continued for several different architectures. Eventually, the network architecture that resulted in the maximum error gradient over the training epochs is adopted as the optimal architecture (Gallent 1993).
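The trial and error loop described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the trainer is a compact one-hidden-layer tanh network written for this example, the XOR-style data are hypothetical, and each candidate is ranked by its final training error, which is a simpler stand-in for the error-gradient criterion named in the text. The names `train_mlp` and `select_hidden_nodes` are invented for the sketch.

```python
import math
import random

def train_mlp(data, n_hidden, epochs=300, lr=0.2, seed=1):
    """Train a one-hidden-layer tanh network; return the squared error per epoch."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Each row holds the input weights of one node plus a trailing bias weight.
    w_hid = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    errors = []
    for _ in range(epochs):
        sse = 0.0
        for x, target in data:
            xb = list(x) + [1.0]
            hid = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in w_hid]
            hb = hid + [1.0]
            out = math.tanh(sum(w * v for w, v in zip(w_out, hb)))
            err = target - out
            sse += err * err
            # Output delta first, then hidden deltas (using the old output weights).
            d_out = err * (1.0 - out * out)
            d_hid = [d_out * w_out[j] * (1.0 - hid[j] ** 2) for j in range(n_hidden)]
            for j in range(n_hidden + 1):
                w_out[j] += lr * d_out * hb[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_hid[j][i] += lr * d_hid[j] * xb[i]
        errors.append(sse)
    return errors

def select_hidden_nodes(data, candidates):
    """Train one network per candidate size and keep the lowest final error."""
    final_error = {n: train_mlp(data, n)[-1] for n in candidates}
    best = min(final_error, key=final_error.get)
    return best, final_error

# Hypothetical XOR-style data with targets kept inside the tanh range.
xor = [([0, 0], -0.9), ([0, 1], 0.9), ([1, 0], 0.9), ([1, 1], -0.9)]
best, table = select_hidden_nodes(xor, candidates=[1, 2, 4, 8])
print(best, table)
```

In practice the comparison would be made on data held back from training, since the training error alone tends to favour larger networks.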
Finding a suitable network architecture can be a very time consuming
exercise. If the architecture is too small, the network may not have sufficient
degrees of freedom to learn the process correctly. On the other hand, if the
network is too large, it may not converge during training or may overfit the
data (Thavasimuthu et al 1996). Another way of determining the ‘optimum’
number of neurons in the hidden layer(s) is to add links and hidden nodes to a
simple network until convergence occurs.
A different approach was attempted by Ogilvy (1993), who
progressively added or removed nodes until an optimum structure was
attained. The number of hidden layer neurons required is much more difficult
to determine, since no general methodology is available for its determination.
In general, the most popular way of determining the appropriate number of
neurons in the hidden layer is the trial and error approach (Brown and
DeNale 1991, Katragadda et al 1997 and McBride et al 2004).
Regardless of the approach used to optimize the number of neurons
in the hidden layer, care needs to be taken: too many neurons
increase training times unnecessarily by making it more difficult to estimate
a suitable set of interconnection weights, while too few neurons can cause
difficulties in mapping input to output in the training set. The Multi Layer
Perceptron (MLP) is one of the most fundamental and suitable types of ANN
architecture for practical applications of model identification. It is reported
in the literature that other kinds of ANN architecture, such as Radial Basis
Function and Recurrent Neural Networks, do not provide any major advantage
over the MLP architecture. Both the accuracy of classification and a network's learning ability can be severely affected if the architecture is not suitable.
Two well known classes of neural networks that can be used for
classification applications are feed forward networks and probabilistic
networks. Of the two, feed forward networks have found the widest application
and have thus been adopted in this study.
5.3 FEED FORWARD NETWORKS
In a feed forward network the weighted connections feed
activations only in the forward direction from the input layer to the output
layer. The input neurons receive and process the input signals and send the
output to other neurons in the network where this process is continued
(Margrave et al 1999). This type of network where information passes one
way through the network is known as a feed forward network. A three-layered
feed forward ANN, also known as a Multi Layer Perceptron (MLP), is shown in
Figure 5.3 along with a typical processing element that has an activation
function and a threshold function embedded in its body.
Figure 5.3 Three layer feed forward ANN along with processing element
The data passing through the connections from one neuron to
another are manipulated by weights that control the strength of a passing
signal. When these weights are modified, the data transferred through the
network change and the network output alters. In the feed forward network,
the nodes are generally arranged in layers, starting from the first input layer
and ending at the final output layer. There can be several hidden layers, with
each layer having one or more nodes. Each neuron has one or more
inputs and outputs. The output is computed according to
the weighted sum of all its inputs and a selected activation function. The
various types of activation function available are linear, logistic, hyperbolic
tangent and Gaussian. In most studies the logistic sigmoidal function or the
hyperbolic tangent function is adopted. The basic characteristics of the
sigmoid functions are that they are continuous, differentiable everywhere and
monotonically increasing. The number of neurons in the input, hidden and
output layer is specified by the user dealing with the problem to which the
network is applied (Moura et al 2001).
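The four activation functions named above can be written out as a short Python sketch. The function names and the unnormalized Gaussian form exp(-x^2) are assumptions made for illustration; toolboxes differ in the exact Gaussian they use.

```python
import math

def linear(x):
    return x

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def hyperbolic_tangent(x):
    return math.tanh(x)

def gaussian(x):
    # Assumed form; peaks at 1 for x = 0 and decays on both sides.
    return math.exp(-x * x)

def logistic_derivative(x):
    # Closed form s(x) * (1 - s(x)); it is positive everywhere,
    # which is why the logistic function is monotonically increasing.
    s = logistic(x)
    return s * (1.0 - s)

for x in (-2.0, 0.0, 2.0):
    print(x, linear(x), logistic(x), hyperbolic_tangent(x), gaussian(x))
```

The logistic output lies between 0 and 1 and the hyperbolic tangent between -1 and 1, so the choice between them usually follows the range of the target data.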
The number of input variables determines the number of input
neurons, while the number of output variables determines the number of
output neurons. An excessive number of neurons may hinder the
training process by delaying it. Information passes from the input to
the output side. The nodes in one layer are connected to those in the next, but
not to those in the same layer. Thus, the output of a node in a layer is only
dependent on the inputs it receives from the previous layer and the
corresponding weights. The multilayer feed forward networks have been
found to have the best performance with regard to input output function
approximation. Song et al (2002) have mentioned that three-layer feed
forward ANNs can be used to model real world functional relationships that
may be of unknown or poorly defined form and complexity. The feed
forward network is capable of nonlinear pattern recognition and memory
association (Bishop 1995). One of the most important types of feed forward
network is the Back Propagation Network.
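As a minimal sketch of the one-way flow of information described in this section, the following Python code passes an input vector through a layered network in which each node sees only the outputs of the previous layer. The 2-3-1 layout, the weight values and the function names are hypothetical and chosen only for illustration.

```python
import math

def layer_forward(inputs, weights, biases, activation):
    """Outputs of one layer; each node uses only the previous layer's outputs."""
    return [activation(sum(w * x for w, x in zip(node_w, inputs)) + b)
            for node_w, b in zip(weights, biases)]

def feed_forward(inputs, layers):
    """Pass a signal from the input side to the output side, one way only."""
    signal = inputs
    for weights, biases, activation in layers:
        signal = layer_forward(signal, weights, biases, activation)
    return signal

# Hypothetical 2-3-1 network: 2 inputs, 3 hidden tanh nodes, 1 linear output.
hidden = ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1], math.tanh)
output = ([[1.0, -1.0, 0.5]], [0.2], lambda v: v)
print(feed_forward([1.0, 2.0], [hidden, output]))
```

Because no node is connected to another node in its own layer, each layer can be evaluated as soon as the previous one is complete.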
5.3.1 Back Propagation Networks (BPN)
Back propagation is a systematic method for training multi-layer
artificial neural networks. It has a mathematical foundation that is strong, if
not highly practical. The back propagation network is a multi-layer feed forward network that uses the extended
gradient-descent based delta learning rule, commonly known as the back
propagation (of errors) rule. Back propagation provides a computationally efficient
method for changing the weights in a feed forward network, with
differentiable activation function units, to learn a training set of input-output
examples. Being a gradient descent method, it minimizes the total squared
error of the output computed by the net. The network is trained by a supervised
learning method. The aim is to train the net to achieve a
balance between the ability to respond correctly to the input patterns that are
used for training and the ability to provide good responses to inputs that are
similar to, but not identical with, those patterns.
Back propagation networks (BPN) are multi-layer networks whose
hidden layers use sigmoid transfer functions. The transfer function in the
hidden layers should be differentiable and thus either log-sigmoid or
tan-sigmoid functions are typically used. In this study, the tan-sigmoid transfer
function, ‘tansig’, is used for both the hidden layers and the output layer. Transfer functions
calculate a layer’s output from its net input. Each hidden layer and output
layer is made of artificial neurons, which are connected through adaptive
weights. The training function selected for the network is ‘trainlm’.
This type of neural network is trained using a process of supervised
learning in which the network is presented with a series of matched input and
output patterns, and the connection strengths or weights of the connections
are automatically adjusted to decrease the difference between the actual and
desired outputs. The schematic diagram of the feed forward back propagation
network structure is shown in Figure 5.4. After the training phase, the testing
data set is presented to the trained model to see how well the network has
learnt and how well it performs.
Figure 5.4 Schematic diagram of BPN structure
There are generally four steps to perform the classification of data:
1. Assemble the training data
2. Create the network object
3. Train the network
4. Simulate the network response to new inputs
Before any data has been run through the network, the weights for
the nodes are random, which has the effect of making the network much like a
newborn's brain: developed but without knowledge. When presented with an
input pattern, each input node takes the value of the corresponding attribute in
the input pattern. These values are then ‘fired’, at which time each node in
the hidden layer multiplies each attribute value by a weight and adds the products
together. If this sum is above the node's threshold value, the node fires a value of '1';
otherwise it fires a value of '0'. The same process is repeated in the output
layer with the values from the hidden layer, and if the threshold value is
exceeded, the input pattern is given the classification. When training the
network, once a classification has been given, it is compared to the actual
classification.
Any error is then ‘back propagated’ through the network, which causes
the hidden and output layer nodes to adjust their weights in response to the
error in classification. The modification of the weights is done
according to the gradient of the error curve, moving downhill toward
the nearest local minimum. Unfortunately, the local minimum is not
always the global minimum, so the network may settle in a
non-optimal configuration. The network can sometimes be deterred from settling
in local minima by increasing or decreasing the number of hidden layer nodes
or even by rerunning the algorithm (because the weights will be
reinitialized to a different set of random numbers, which may keep them from
falling into a local minimum that is not the global minimum).
Standard back propagation is a gradient descent algorithm, as is the
Widrow-Hoff learning rule, in which the network weights are moved along
the negative of the gradient of the performance function. The term back
propagation refers to the manner in which the gradient is computed for
nonlinear multilayer networks. There are a number of variations on the basic
algorithm that are based on other standard optimization techniques, such as
conjugate gradient and Newton methods.
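The gradient descent idea behind the Widrow-Hoff rule can be sketched for a single linear unit in Python. The toy data, the learning rate and the function name `widrow_hoff_step` are assumptions made for this illustration; the performance function is taken to be half the squared error.

```python
def widrow_hoff_step(weights, x, target, lr):
    """One update that moves the weights along the negative error gradient."""
    output = sum(w * xi for w, xi in zip(weights, x))
    error = target - output
    # For E = 0.5 * error**2, dE/dw_i = -error * x_i,
    # so stepping against the gradient gives w_i + lr * error * x_i.
    return [w + lr * error * xi for w, xi in zip(weights, x)]

# Toy data generated from target = 2*x1 - 1*x2 (an assumed relationship).
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
weights = [0.0, 0.0]
for _ in range(200):                 # 200 passes through the three samples
    for x, t in samples:
        weights = widrow_hoff_step(weights, x, t, lr=0.1)
print(weights)
```

Because the toy data are exactly linear, the weights approach the generating coefficients 2 and -1; with noisy data they would settle near the least squares solution instead.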
5.4 TRAINING OF ANN
Connection weights of the network are learned through a process
called ‘training’, in which a large number of input-output pattern pairs are
presented to the network in a repetitive fashion designed to provide iterative
corrections to the weights. Each iteration (‘epoch’) is a single pass through all
training pattern pairs. In most studies the MLP is trained using the
error back propagation algorithm (Wendel and Dual 1996). The final weight
vector of a successfully trained neural network represents its knowledge about
the problem. In general, it is assumed that the network does not have any prior
knowledge about the problem before it is trained. So, at the beginning of
training, the network weights are initialized with a set of random values
(Selvakumar et al 2004). Learning in neural networks involves adjusting the
weights of interconnections.
The most commonly used training algorithm for feed forward
networks is the back propagation algorithm (Veiga et al 2005). The back
propagation algorithm is a gradient descent method in which the weights of the
connections are updated using partial derivatives of the error with respect to
the weights. However, the standard back propagation algorithm can train only
a network of predetermined size. In the BP algorithm, the weight associated
with the neuron is adjusted by an amount proportional to the strength of the
signal in the connection and the total measure of the error. The total error at
the output layer is then reduced by redistributing this error backward through
the hidden layers until the input layer is reached. The process continues for
the prescribed number of sweeps or until the prescribed error tolerance is
reached (Moura et al 2001).
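The weight update by partial derivatives described above can be sketched for a small 2-2-1 network in Python. The weight values, the single training pattern and the learning rate are hypothetical, bias terms are left out for brevity, and logistic units are assumed throughout.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(w_hid, w_out, x):
    hid = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hid]
    out = sigmoid(sum(w * h for w, h in zip(w_out, hid)))
    return hid, out

def squared_error(w_hid, w_out, x, target):
    return 0.5 * (target - forward(w_hid, w_out, x)[1]) ** 2

def backprop_gradients(w_hid, w_out, x, target):
    """Partial derivatives of the squared error with respect to every weight."""
    hid, out = forward(w_hid, w_out, x)
    # Error term at the output node.
    delta_out = (out - target) * out * (1.0 - out)
    # Redistribute the error backward to each hidden node.
    delta_hid = [delta_out * w_out[j] * hid[j] * (1.0 - hid[j])
                 for j in range(len(hid))]
    # Each gradient is the error term times the signal on that connection.
    grad_out = [delta_out * h for h in hid]
    grad_hid = [[d * xi for xi in x] for d in delta_hid]
    return grad_hid, grad_out

w_hid = [[0.15, 0.20], [0.25, 0.30]]
w_out = [0.40, 0.45]
x, target, lr = [0.05, 0.10], 0.01, 0.5

grad_hid, grad_out = backprop_gradients(w_hid, w_out, x, target)
before = squared_error(w_hid, w_out, x, target)
# Gradient descent: every weight moves against its own partial derivative.
w_hid = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(w_hid, grad_hid)]
w_out = [w - lr * g for w, g in zip(w_out, grad_out)]
after = squared_error(w_hid, w_out, x, target)
print(before, after)
```

A full training run would repeat this step over all pattern pairs, epoch after epoch, until the prescribed number of sweeps or the error tolerance is reached.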
Some major limitations of the BP algorithm are:
It is easily trapped by local optima.
Convergence is a slow process.
The architecture is often ineffective when searching weight
spaces of high dimensions.
Performance of a BP-ANN simulator is quite sensitive to the
initial starting point.
During training, the network is trained to associate outputs with
input patterns. When the network is trained, it identifies the input pattern and
tries to output the associated output pattern. The power of neural networks
comes to life when a pattern that has no output associated with it is given as
an input. In this case, the network gives the output that corresponds to the
taught input pattern that is least different from the given pattern. The training
process of a typical ANN is shown in Figure 5.5.
Figure 5.5 Training process of a typical ANN
The input layer is thus transparent and is a means of providing
information to the network. The last (output) layer consists of the values
predicted by the network and thus represents the model output. The process
consists of presenting an input pattern to the network, making a prediction of
the output and then comparing this predicted output to the input pattern’s
actual output. An excessive number of input variables increases training
time and decreases performance (Ogaji and Singh 2003). The goal of the
training process is to present a sufficient number P of unique input-output
pattern pairs which, when coupled with a suitable methodology for iterative
correction of the interconnection weights, produce a final set of weights that
minimizes the global error.
The number of hidden layers and the number of nodes in each
hidden layer are usually determined by a trial-and-error procedure. The
nodes within neighbouring layers of the network are fully connected by
links. A synaptic weight is assigned to each link to represent the relative
connection strength of the two nodes at its ends in predicting the input-output
relationship. Figure 5.6 represents the structure of a neuron.
Figure 5.6 Neuron
5.5 DATA PREPARATION FOR ANN
One of the major strengths of neural networks is their ability to
deal with incomplete, noisy and non-stationary data (Oleg Karpash et al 2006).
However, with appropriate data preparation in advance, it is quite possible to
improve the performance of a neural network still further (Carling and Alison
1995). The steps involved in data preparation are selection of suitable
inputs and outputs, noise reduction, pre-processing such as standardizing or
normalizing the data, and finally grouping the data into calibration and validation
sets.
5.5.1 Selection of Suitable Inputs / Outputs
Selecting too many input variables and therefore too many free
parameters can lead to poor generalization performance. As a result, it is
crucial to reduce the dimension of the input vector either by constructing
more powerful variables in the preprocessing phase or by eliminating
variables with low information content.
5.5.2 Data Preprocessing
The input and output variables should be standardized to make sure
that they receive equal attention during the training process (Bond et al 1992).
Friedman and Kandal (1999) have emphasized the importance of correct
standardization factors. They have mentioned that the choice of
standardization ranges significantly influences the performance of the ANN
and also cautioned that ANN should not be used for extrapolation. Without
standardization in MLPs, input variables measured on different scales will
dominate training to a greater or lesser extent. Data standardization plays a
vital role in improving the efficiency of the training algorithm.
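Two common ways of putting variables on a comparable scale can be sketched in Python as follows. The function names, the target range of -1 to 1 and the sample values are assumptions for illustration; the text does not state which standardization was actually used.

```python
def minmax_scale(values, low=-1.0, high=1.0):
    """Linearly map values so that the minimum goes to low and the maximum to high."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        raise ValueError("cannot scale a constant variable")
    return [low + (high - low) * (v - v_min) / span for v in values]

def zscore(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    if std == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - mean) / std for v in values]

amplitude = [0.2, 0.5, 0.8]             # hypothetical variable on a small scale
frequency = [1000.0, 2500.0, 4000.0]    # hypothetical variable on a large scale
print(minmax_scale(amplitude), minmax_scale(frequency))
print(zscore(frequency))
```

After scaling, both variables span the same range, so neither dominates the weighted sums simply because of its units. The scaling constants should be taken from the training data and reused unchanged for testing data.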
5.5.3 Model Training and Testing
Available records have been divided into two independent sets: the
training set and the testing set. The training set is used to minimize the error
and the testing set is used to avoid overfitting when implementing the neural
model (Opitz 1999, Ravanbod 2005). The method of splitting the data
(systematic or random) can significantly affect the results.
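The two ways of splitting the records can be sketched in Python as follows. The function names, the choice of every fourth record for the systematic split and the 25 percent testing fraction are assumptions for illustration only.

```python
import random

def systematic_split(records, every=4):
    """Send every `every`-th record to the testing set and the rest to training."""
    train = [r for i, r in enumerate(records) if (i + 1) % every != 0]
    test = [r for i, r in enumerate(records) if (i + 1) % every == 0]
    return train, test

def random_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and cut off a testing set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(1, 13))   # twelve hypothetical record identifiers
print(systematic_split(records))
print(random_split(records))
```

Both functions return disjoint training and testing sets that together contain every record; only the selection rule differs.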
5.6 STRUCTURE OF BPN NETWORK
The command structure for creating, training and testing the Back
propagation network is given and clearly explained below.
Creating and Initializing BPN Network
To create a general feed forward neural network, the command is
net = newff(input_range, [number1, number2], {transfer1, transfer2}, training_algorithm);
Here, input_range gives the minimum and maximum values of the input data.
[number1, number2] is a list of the number of units in each layer.
{transfer1, transfer2} is a list of transfer functions for each layer.
These are strings. The tan sigmoidal transfer function, called ‘tansig’,
and the linear transfer function, called ‘purelin’ in Matlab, are typically
used.
The tan sigmoid transfer function shown in Figure 5.7 takes the
input, which may have any value between plus and minus infinity, and
squashes the output into the range -1 to 1. The purelin transfer function takes
the input, which may have any value between plus and minus infinity, and
passes it through unchanged, so the network output can take any value. It is shown in Figure 5.8.
Figure 5.7 Tan sigmoidal transfer function
Figure 5.8 Linear transfer function
The training algorithm is described by a string. Matlab includes many
algorithms: gradient descent, gradient descent with adaptive learning rate,
conjugate gradient descent, scaled conjugate gradient back propagation and
others. Table 5.1 lists the fastest commonly used algorithms for
classification.
Table 5.1 List of fastest algorithms
S. No. Algorithm
1. trainscg - Scaled Conjugate Gradient
2. trainlm - Levenberg-Marquardt
3. traingda, traingdx - Variable Learning Rate back propagation
4. traincgf - Fletcher Reeves Conjugate Gradient
Default Training Parameters
The training parameters for the network are initialized by the
following default training parameters.
net.trainparam.show
net.trainparam.epochs
net.trainparam.goal
The training parameters depend on the selected algorithm.
The number of hidden layers and the number of neurons in each hidden layer are
determined by experimentation, i.e. a trial and error approach.
Training BPN network
The training of neural network is done by using the command,
net = train(net,p,t);
Here, net is the network object. This command trains the network using the
training data p and target data t with the learning algorithm specified when
net was created using newff.
Testing BPN Network
The response of the network to the input data is tested by
using the sim function in Matlab.
a = sim(net, q);
where q is the new input or testing input.
5.7 PROBABILISTIC NEURAL NETWORK
Probabilistic neural network (PNN) is predominantly a classifier
which maps any input pattern into a number of classifications. It is an
implementation of a statistical algorithm called kernel discriminant analysis in
which the operations are organized into a network with four layers: the
input layer, hidden layer, pattern layer/summation layer and output (decision) layer.
Figure 5.9 illustrates the schematic diagram of probabilistic neural network.
Their design is straightforward and does not depend on training.
Figure 5.9 Probabilistic neural network structure
Input layer: There is one neuron in the input layer for each predictor
variable. In the case of categorical variables, N-1 neurons are used where N is
the number of categories. The input neurons (or processing before the input
layer) standardize the range of the values by subtracting the median and
dividing by the interquartile range. The input neurons then feed the values to
each of the neurons in the hidden layer.
Hidden layer: This layer has one neuron for each case in the training data set.
The neuron stores the values of the predictor variables for the case along with
the target value. When presented with the x vector of input values from the
input layer, a hidden neuron computes the Euclidean distance of the test case
from the neuron’s center point and then applies the RBF kernel function using
the sigma value(s). The resulting value is passed to the neurons in the pattern
layer.
Pattern layer / Summation layer: There is one pattern neuron for each
category of the target variable. The actual target category of each training
case is stored with each hidden neuron; the weighted value coming out of a
hidden neuron is fed only to the pattern neuron that corresponds to the hidden
neuron’s category. The pattern neurons add the values for the class they
represent (hence, it is a weighted vote for that category).
Decision layer: The decision layer compares the weighted votes for each
target category accumulated in the pattern layer and uses the largest vote to
predict the target category.
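The hidden, pattern and decision layers described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: a single sigma value shared by all neurons, a Gaussian RBF kernel, hypothetical two-dimensional training cases with labels "A" and "B", and no median and interquartile-range standardization of the inputs. The function name `pnn_classify` is invented for the sketch.

```python
import math
from collections import defaultdict

def pnn_classify(x, training_cases, sigma=0.5):
    """Classify x with a minimal probabilistic neural network sketch."""
    votes = defaultdict(float)
    # Hidden layer: one neuron per training case, applying a Gaussian RBF
    # kernel to the Euclidean distance from the case's stored values.
    for center, category in training_cases:
        dist_sq = sum((a - b) ** 2 for a, b in zip(x, center))
        activation = math.exp(-dist_sq / (2.0 * sigma ** 2))
        # Pattern layer: the value is added only to the neuron's own category.
        votes[category] += activation
    # Decision layer: the category with the largest weighted vote wins.
    return max(votes, key=votes.get), dict(votes)

cases = [([0.0, 0.0], "A"), ([0.2, 0.1], "A"),
         ([1.0, 1.0], "B"), ([0.9, 1.2], "B")]
label, votes = pnn_classify([0.1, 0.2], cases)
print(label, votes)
```

There is no iterative training step: storing the cases is the whole of training, which is why every stored case has to be visited each time a new input is classified.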
The Probabilistic Neural Network (PNN) (Specht 1988) is a
representative alternative, as it has all the advantages of neural networks while
excluding the typical disadvantages of back-propagation neural networks
(Schemerr et al 2000 and Song et al 2002). Given that the architecture
of a PNN can be directly determined by the flaw classification
problem provided, the training of a PNN classifier can be completed instantaneously
(Zaknich 1998). These researchers also suggested applying a Bayes decision
strategy to make the classification performance of PNNs consistent.
B. Sadoun also supported PNN as a flaw classifier in his study and compared
PNN with various other ANN paradigms including backpropagation
networks, radial basis function networks, general regression neural networks
and LVQ networks (Sadoun 2001).
Even though PNN has advantages as an alternative ANN paradigm
to multi-layered neural networks, it has drawbacks that are
commonly acknowledged (Ko and Byun 2002, Zaknich 1998). The first
disadvantage of PNN is its higher memory demand during
execution, which makes applying the trained network to new test
data slow. The other drawback is that its efficiency depends strongly
on its initial training data: a PNN needs to be trained with
correct and appropriate data in order to achieve good efficiency. Neither of these two
main disadvantages can be ignored when PNN is considered
as the classifier for ultrasonic flaw signals.
PNNs are derived from Bayes decision networks. They train
quickly since the training is done in one pass of each training vector, rather
than several. Probabilistic neural networks estimate the probability density
function for each class based on the training samples, using a Parzen window or a similar
probability density estimator. This is calculated for each test vector. Usually a
spherical Gaussian basis function is used, although many other functions
work equally well.
Vectors must be normalized prior to input into the network. There
is an input unit for each dimension in the vector. The input layer is fully
connected to the hidden layer. The hidden layer has a node for each
classification. Each hidden node calculates the dot product of the input vector
with a test vector, subtracts 1 from it and divides the result by the standard
deviation squared. The output layer has a node for each pattern classification.
The sum for each hidden node is sent to the output layer and the highest
value wins.
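The dot product form described above is equivalent to a Gaussian of the Euclidean distance when both vectors have unit length, because then the squared distance equals 2 - 2(x·w). The Python sketch below checks this numerically. It assumes that the quantity (x·w - 1) / sigma^2 is passed through an exponential, as in the usual PNN formulation; the vectors, the sigma value and the function names are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def dot_product_activation(x, w, sigma):
    # Dot product, minus 1, divided by sigma squared, then exponentiated.
    z = sum(a * b for a, b in zip(x, w))
    return math.exp((z - 1.0) / sigma ** 2)

def gaussian_activation(x, w, sigma):
    # Gaussian kernel of the squared Euclidean distance.
    dist_sq = sum((a - b) ** 2 for a, b in zip(x, w))
    return math.exp(-dist_sq / (2.0 * sigma ** 2))

x = normalize([3.0, 4.0])
w = normalize([1.0, 2.0])
print(dot_product_activation(x, w, 0.5), gaussian_activation(x, w, 0.5))
```

The two printed values agree to within rounding error, which is why normalization of the vectors is required before the dot product form can be used.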
The Probabilistic neural network trains immediately but execution
time is slow and it requires a large amount of space in memory. It really
works only for classifying data. The training set must be a thorough
representation of the data. Probabilistic neural networks handle data that has
spikes and points outside the norm better than other neural nets.
PNN have advantages and disadvantages compared to Multilayer
Perceptron networks (BPN):
It is usually much faster to train a PNN network than a
multilayer perceptron network.
PNN networks often are more accurate than multilayer
perceptron networks.
PNN networks generate accurate predicted target probability
scores.
PNN networks approach Bayes optimal classification.
PNN networks are slower than multilayer perceptron networks at
classifying new cases.
PNN networks require more memory space to store the model.
5.8 ADVANTAGES OF NEURAL NETWORK
Neural networks, with their remarkable ability to derive meaning
from complicated or imprecise data, can be used to extract patterns and
detect trends that are too complex to be noticed by either humans or other
computing techniques. A trained neural network can be thought of as an
‘expert’ in the category of information it has been given to analyse. This
expert can then be used to provide projections given new situations of interest
and answer ‘what if’ questions.
ANN models have many advantages. Some of them are as follows:
The application of a neural network does not require a priori
knowledge of the underlying process (black box approach).
The complex relationships existing between various aspects
of the process under investigation need not be known.
This approach is faster than its conventional
counterparts, flexible in the range of problems it can solve and
highly adaptive to newer environments.
ANNs are data driven, whereas conventional
approaches are model driven (Lee and Castro 2005).
The data used do not have to follow a Gaussian distribution
The data used may possess irregular seasonal variation
ANNs are non-linear models and perform well even when
limited data are available
They are very robust and are able to deal with outliers and
noisy or incomplete data (Mekdeci and McLaughlin 1995).
Other advantages include:
Adaptive learning: An ability to learn how to do tasks based on
the data given for training or initial experience.
Self-organisation: An ANN can create its own organization or
representation of the information it receives during the
learning time.
Real time operation: ANN computations may be carried out in
parallel, and special hardware devices are being designed and
manufactured which take advantage of this capability.
Fault tolerance via redundant information coding: Partial
destruction of a network leads to a corresponding
degradation of performance. However, some network
capabilities may be retained even with major network damage.
Neural networks allow more complex modeling than the
regression procedure (Windsor et al 1993).
The combination of simplicity, interpolation, reasonably
accurate prediction statistics, ability to provide conditional
simulations and computational speed suggests that artificial
neural networks can be a useful tool in water resources
systems analysis (Pittner and Kamarthi 1999). Owing to these
established advantages, ANNs currently have numerous real
world applications such as image processing, speech
processing, robotics and stock market prediction. There
has been extensive research on their implementation in
systems engineering related fields, such as time series
prediction, rule based control and rainfall runoff modeling.
5.9 SUPPORT VECTOR MACHINES
5.9.1 Introduction to SVM
Support vector machines (SVMs) are a set of related supervised
learning methods used for classification and regression. In simple words,
given a set of training examples, each marked as belonging to one of two
categories, an SVM training algorithm builds a model that predicts whether a
new example falls into one category or the other (Olivier Bousquet 2001).
Intuitively, an SVM model is a representation of the examples as points in
space, mapped so that the examples of the separate categories are divided by a
clear gap that is as wide as possible. New examples are then mapped into that
same space and predicted to belong to a category based on which side of the
gap they fall on (Lee and Estivill-Castro 2004).
More formally, a support vector machine constructs a hyperplane or
set of hyperplanes in a high or infinite dimensional space, which can be used
for classification, regression or other tasks. Intuitively, a good separation is
achieved by the hyperplane that has the largest distance to the nearest training
data points of any class (the so-called functional margin), since in general the
larger the margin, the lower the generalization error of the classifier (Carl
Gold and Peter Sollich 2003, Osuna et al 1997).
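The idea of preferring the hyperplane with the largest distance to the nearest training points can be made concrete with a short sketch. It compares two candidate hyperplanes of the form w.x + b = 0 that both separate the same data. The function names, the toy data and the two candidate hyperplanes are hypothetical. The distance computed here is measured in the usual Euclidean sense, which is commonly called the geometric margin.

```python
import math


def signed_distance(w, b, x):
    """Signed Euclidean distance of point x from the hyperplane w.x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w


def geometric_margin(w, b, X, y):
    """Smallest label-corrected distance of any training point to the hyperplane."""
    return min(label * signed_distance(w, b, x) for x, label in zip(X, y))


# Hypothetical toy data: labels are +1 or -1.
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
y = [1, 1, -1, -1]

# Two candidate hyperplanes; both classify every point correctly.
margin_a = geometric_margin((1.0, 1.0), -2.0, X, y)  # x1 + x2 = 2
margin_b = geometric_margin((1.0, 0.0), -1.5, X, y)  # x1 = 1.5
print(margin_a, margin_b)
```

Both candidates separate the data, but the first leaves a wider gap to the nearest points, so it is the one a maximum-margin criterion would prefer.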
5.9.2 SVM for Classification
Classifying data is a common task in machine learning. Each of the given
data points belongs to one of two groups. The goal of classification is to
predict which of the two groups a new data point belongs to.
SVM classification is of two types, binary class and
multiclass, which are discussed below.
5.9.2.1 Binary class SVM
Binary class SVM is one type of support vector machine. It is
used to classify data into two different classes. In binary class SVM,
classification is done by constructing a hyperplane in N-dimensional space that
optimally separates the data into two categories. Figure 5.10 shows the
classification of two different classes by binary class SVM.
Figure 5.10 Classification of two different classes by binary class SVM
(labels in the figure: support vector, hyperplane, margin)
Figure 5.10 shows the classification of two different groups of data
points. The dots belong to one group and the holes belong to
another group. The plane which separates the two different classes is
known as the hyperplane. The points which are used to create the hyperplane
are called support vectors. The gap between the support vectors of the two
classes is known as the margin.
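The construction of a separating hyperplane for two classes can be sketched as follows. This sketch uses batch subgradient descent on the regularised hinge loss, which is a simple stand-in chosen for brevity; it is not the solver used in this study, and SVM packages typically solve a quadratic programming problem instead. The function names, the parameter values (`lam`, `lr`, `epochs`) and the toy data are assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on the regularised hinge loss (labels +1/-1)."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the (lam/2)*||w||^2 term
        grad_b = 0.0
        for x, label in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if label * score < 1.0:      # inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= label * x[j] / n
                grad_b -= label / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable toy data.
X = [(2.0, 2.0), (3.0, 3.0), (3.0, 2.0), (2.0, 3.0),
     (-2.0, -2.0), (-3.0, -3.0), (-3.0, -2.0), (-2.0, -3.0)]
y = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(X, y)
print(predict(w, b, (2.5, 2.5)), predict(w, b, (-2.5, -2.5)))
```

Only the points closest to the boundary keep contributing to the update once the others satisfy the margin condition, which mirrors the role of the support vectors described above.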
5.9.2.2 Multi class SVM
Multi class SVM is used to classify data into more than two
classes. The data points of the different groups are separated by constructing
several hyperplanes between the groups.
The single multiclass problem is reduced to multiple binary
problems. Each of these problems yields a binary classifier, which is assumed to
produce an output function that gives relatively large values for examples
from the positive class and relatively small values for examples belonging to
the negative class (Sathiya and Keerthi 2002). Classification of new instances
in the one-versus-all case is done by a winner-takes-all strategy, in which the
classifier with the highest output function assigns the class.
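The winner-takes-all rule can be sketched with one linear output function per class. The four class labels and the weights below are hypothetical placeholders standing in for trained binary classifiers; they are not results from this study.

```python
def decision_value(w, b, x):
    """Output function of one binary classifier: large for its own class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def one_vs_rest_predict(classifiers, x):
    """Winner-takes-all: the class whose classifier gives the highest output."""
    scores = {label: decision_value(w, b, x)
              for label, (w, b) in classifiers.items()}
    return max(scores, key=scores.get), scores


# Hypothetical one-versus-all classifiers for four classes: label -> (w, b).
classifiers = {
    "north": ((0.0, 1.0), 0.0),
    "south": ((0.0, -1.0), 0.0),
    "east": ((1.0, 0.0), 0.0),
    "west": ((-1.0, 0.0), 0.0),
}

label, scores = one_vs_rest_predict(classifiers, (0.5, 2.0))
print(label, scores)
```

With K classes this scheme needs K binary classifiers, each trained to separate one class from all the others.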
Figure 5.11 Classification of four different classes by Multi class SVM
For the one-versus-one approach, classification is done by a max-wins
voting strategy, in which every classifier assigns the instance to one of
the two classes, the vote count for the assigned class is increased by one,
and finally the class with the most votes determines the instance classification
(Tong and Chang 2001). Figure 5.11 shows the classification of four
different classes using multi class SVM.
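The max-wins voting strategy can be sketched as follows. To keep the example short, each pairwise classifier is taken to be the perpendicular bisector between two hypothetical class centres. This is only a stand-in for trained binary SVMs, and the class labels, centres and function names are assumptions.

```python
from collections import Counter
from itertools import combinations


def bisector(center_a, center_b):
    """Hyperplane (w, b) that is positive on the side nearer to center_a."""
    w = tuple(a - c for a, c in zip(center_a, center_b))
    b = -(sum(a * a for a in center_a) - sum(c * c for c in center_b)) / 2.0
    return w, b


def one_vs_one_predict(pairwise, x):
    """Max-wins voting: each pairwise classifier casts one vote."""
    votes = Counter()
    for (first, second), (w, b) in pairwise.items():
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        votes[first if score >= 0 else second] += 1
    return votes.most_common(1)[0][0], votes


# Hypothetical class centres for four classes.
centers = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (0.0, 4.0), "D": (4.0, 4.0)}

# One classifier per pair of classes: 4 * 3 / 2 = 6 classifiers in total.
pairwise = {(p, q): bisector(centers[p], centers[q])
            for p, q in combinations(sorted(centers), 2)}

label, votes = one_vs_one_predict(pairwise, (3.5, 1.0))
print(label, dict(votes))
```

With K classes this scheme needs K(K-1)/2 binary classifiers, and a tie-breaking rule is required when two classes receive the same number of votes; this sketch simply keeps the first class counted.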
5.10 SUMMARY
In this chapter, an overview of Artificial Neural Networks with
respect to optimal network architecture, training of ANN and data
preparation for ANN is discussed with reference to the literature. The feed
forward Back Propagation Network, the structure of the BPN and the
Probabilistic Neural Network are also explained. In addition, a brief
explanation of Support Vector Machines is provided at the end.