CHAPTER 4

ARTIFICIAL NEURAL NETWORKS

4.1 INTRODUCTION

Artificial Neural Networks (ANNs) are relatively crude electronic models based on the neural structure of the brain. The brain learns from experience, and artificial neural networks try to mimic this functioning of the brain. Even simple animal brains are capable of functions that are currently impossible for computers. Computers do many things well, but they have trouble recognizing even simple patterns.

The brain stores information as patterns. Some of these patterns are very complicated and give us the ability to recognize individual faces from many different angles. This process of storing information as patterns, utilizing those patterns, and then solving problems encompasses a new field in computing, one which does not utilize traditional programming but instead involves the creation of massively parallel networks and the training of those networks to solve specific problems.

The exact workings of the human brain are still a mystery, yet some aspects are known. The most basic element of the human brain is a specific type of cell called a 'neuron'. These neurons provide the abilities to remember, think, and apply previous experiences to our every action. There are about 100 billion of them, and each connects itself with as many as 200,000 other neurons, although 1,000 to 10,000 connections are typical. The power of the human mind comes from the sheer number of these basic components and the multiple connections between them. It also comes from genetic programming and learning.

The individual neurons are complicated. They have a myriad of parts, subsystems and control mechanisms, and they convey information via a host of electrochemical pathways. Together, these neurons and their connections form a process which is not binary, not stable, and not synchronous.

4.2 A BIOLOGICAL NEURON

Basically, a biological neuron receives inputs from other sources, combines them in some way, performs a generally nonlinear operation on the result, and then outputs the final result. Fig. 4.1 shows the relationship of these four parts.

Fig. 4.1: A biological neuron

Within humans there are many variations on the basic type of neuron, yet all biological neurons have the same four basic components. They are known by their biological names: cell body (soma), dendrites, axon, and synapses.

Cell body (Soma): The body of the neuron cell contains the nucleus and carries out the biochemical transformations necessary to the life of the neuron.

Dendrite: Each neuron has fine, hair-like tubular structures (extensions) around it, which branch out into a tree around the cell body. They accept incoming signals.

Axon: A long, thin, tubular structure which works like a transmission line.

Synapse: Neurons are connected to one another in complex spatial arrangements. When the axon reaches its final destination, it branches again in what is called a terminal arborization. At the ends of the axon are highly complex and specialized structures called synapses. The connection between two neurons takes place at these synapses.

Dendrites receive input through the synapses of other neurons. The soma processes these incoming signals over time and converts the processed value into an output, which is sent out to other neurons through the axon and the synapses.

4.3 AN ARTIFICIAL NEURON

The artificial neuron simulates the four basic functions of a biological neuron. Fig. 4.2 shows a basic representation of an artificial neuron.

Fig. 4.2: A basic artificial neuron

In Fig. 4.2, the various inputs to the network are represented by the mathematical symbol x(n). Each of these inputs is multiplied by a connection weight; the weights are represented by w(n). In the simplest case, these products are summed, fed to a transfer function (activation function) to generate a result, and this result is sent as output. This is also possible with other network structures, which utilize different summing functions as well as different transfer functions.
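As a concrete illustration, this simplest case can be sketched in a few lines of Python. It is a minimal sketch under stated assumptions: the weights, inputs and step threshold below are arbitrary illustrative values, not part of any particular network design.

import numpy as np

def neuron_output(inputs, weights, threshold=0.0):
    # Summation function: x(1)*w(1) + x(2)*w(2) + ...
    s = np.dot(inputs, weights)
    # Transfer (activation) function: a simple step against a threshold
    return 1.0 if s > threshold else 0.0

x = np.array([0.5, 0.9])    # inputs x(n)
w = np.array([0.3, 0.8])    # connection weights w(n)
print(neuron_output(x, w))  # 0.5*0.3 + 0.9*0.8 = 0.87 > 0, so the output is 1.0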

Some applications, like recognition of text, identification of speech, and image deciphering of scenes, require binary answers. These applications may utilize the binary properties of ORing and ANDing of inputs along with summing operations. Such functions can be built into the summation and transfer functions of a network.

Seven major components make up an artificial neuron. These components are valid whether the neuron is used for input, output, or is in the hidden layers.

Component 1. Weighting Factors: A neuron usually receives many simultaneous inputs. Each input has its own relative weight, which gives the input the impact that it needs on the processing element's summation function. Some inputs are made more important than others so that they have a greater effect on the processing element as they combine to produce a neural response. Weights are adaptive coefficients that determine the intensity of the input signal as registered by the artificial neuron; they are a measure of an input's connection strength. These strengths can be modified in response to various training sets and according to a network's specific topology or its learning rules.

Component 2. Summation Function: The inputs and corresponding weights are vectors which can be represented as $(i_1, i_2, \ldots, i_n)$ and $(w_1, w_2, \ldots, w_n)$. The total input signal is the dot product of these two vectors; the result, $i_1 w_1 + i_2 w_2 + \cdots + i_n w_n$, is a single number.

The summation function can be more complex than a simple weighted sum of products. The inputs and weighting coefficients can be combined in many different ways before being passed on to the transfer function. In addition to summing, the summation function can select the minimum, maximum, majority, or product, or apply one of several normalizing algorithms. The specific algorithm for combining neural inputs is determined by the chosen network architecture and paradigm. Some summation functions have an additional 'activation function' applied to the result before it is passed on to the transfer function, for the purpose of allowing the summation output to vary with respect to time.
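A hedged sketch of such alternative combiners follows; the dictionary of named operations is purely an illustrative device, not a standard API, and the input values are arbitrary.

import numpy as np

x = np.array([0.2, 0.7, 0.5])   # inputs (i1, i2, i3)
w = np.array([0.9, 0.1, 0.4])   # weights (w1, w2, w3)
products = x * w                # element-wise i_j * w_j

# A few of the combining rules mentioned above, applied to the products.
combiners = {
    "sum":     np.sum,
    "minimum": np.min,
    "maximum": np.max,
    "product": np.prod,
}
for name, combine in combiners.items():
    print(name, combine(products))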


Component 3. Transfer Function: The result of the summation function is transformed into a working output through an algorithmic process known as the transfer function. In the transfer function, the summation can be compared with some threshold to determine the neural output. If the sum is greater than the threshold value, the processing element generates a signal; if it is less than the threshold, no signal (or some inhibitory signal) is generated. Both types of response are significant. The threshold, or transfer function, is generally non-linear. Linear functions are limited because the output is simply proportional to the input.

The step type of transfer function outputs zero and one, one and minus one, or some other numeric combination. Another type, the 'threshold' or ramping function, can mirror the input within a given range and still act as a step function outside that range; it is a linear function that is clipped to minimum and maximum values, making it non-linear. Another option is an 'S' curve, which approaches minimum and maximum values at its asymptotes. It is called a sigmoid when it ranges between 0 and 1, and a hyperbolic tangent when it ranges between -1 and 1. Both the function and its derivative are continuous.
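The transfer functions just described can be sketched as follows; this is a minimal illustration, and the clipping range of the ramp function is assumed to be [-1, 1].

import numpy as np

def step(v):      # outputs 0 or 1
    return np.where(v > 0.0, 1.0, 0.0)

def ramp(v):      # mirrors the input in [-1, 1], clipped outside it
    return np.clip(v, -1.0, 1.0)

def sigmoid(v):   # 'S' curve between 0 and 1
    return 1.0 / (1.0 + np.exp(-v))

def tanh(v):      # 'S' curve between -1 and 1
    return np.tanh(v)

v = np.linspace(-3.0, 3.0, 7)
for f in (step, ramp, sigmoid, tanh):
    print(f.__name__, np.round(f(v), 3))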

Component 4. Scaling and Limiting: After the transfer function, the result can pass through additional processes which scale and limit it. Scaling simply multiplies the transfer value by a scale factor and then adds an offset. Limiting is the mechanism which ensures that the scaled result does not exceed an upper or lower bound. This limiting is in addition to the hard limits that the original transfer function may have performed.
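A minimal sketch of this post-processing step, with an assumed scale factor, offset and bounds:

import numpy as np

def scale_and_limit(y, scale=2.0, offset=0.5, lower=0.0, upper=1.0):
    # Scaling multiplies a scale factor times the transfer value and adds
    # an offset; limiting then clips the result to the [lower, upper] bounds.
    return float(np.clip(scale * y + offset, lower, upper))

print(scale_and_limit(0.4))  # 2.0*0.4 + 0.5 = 1.3, limited to the upper bound 1.0
print(scale_and_limit(0.1))  # 2.0*0.1 + 0.5 = 0.7, within bounds, passes unchanged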

Component 5. Output Function (Competition): Each processing element is allowed one output signal, which it may give to hundreds of other neurons. Normally, the output is directly equivalent to the transfer function's result. Some network topologies, however, modify the transfer result to incorporate competition among neighboring processing elements. Neurons are allowed to compete with each other, inhibiting other processing elements unless they have great strength. Competition can occur at one or both of two levels. First, competition determines which artificial neuron will be active or provide an output. Second, competitive inputs help determine which processing element will participate in the learning or adaptation process.

Component 6. Error Function and Back-Propagated Value: In most learning networks the difference between the current output and the desired output is calculated as an error, which is then transformed by the error function to match a particular network architecture. Most basic architectures use this error directly; some square the error while retaining its sign, some cube the error, and other paradigms modify the error to fit their specific purposes. The error is propagated backwards to a previous layer. This back-propagated value can be either the error, the error scaled in some manner (often by the derivative of the transfer function), or some other desired output, depending on the network type. Normally, this back-propagated value, after being scaled by the learning function, is multiplied against each of the incoming connection weights to modify them before the next learning cycle.

Component 7. Learning Function: Its purpose is to modify the weights on the inputs of each processing element according to some neural-based algorithm.

4.4 AN ARTIFICIAL NEURAL NETWORK

Fig. 4.3: An artificial neural network

Fig. 4.3 shows an artificial neural network. Inputs enter the processing element from the upper left. The first step is to multiply each of these inputs by its respective weighting factor w(n). These modified inputs are then fed into the summing function, which usually sums the products; however, many different types of operations can be selected. These operations can produce a number of different values, which are then propagated forward: values such as the average, the largest, the smallest, the ORed values, the ANDed values, etc. Other types of summing functions can also be created, and sometimes they may be further complicated by the addition of an activation function which enables the summing function to operate in a time-sensitive way.

The output of the summing function is then sent into a transfer function, which turns this number into a real output (a 0 or a 1, -1 or +1, or some other number) via some algorithm. The transfer function can also scale the output or control its value via thresholds. This output is then sent to other processing elements or to an outside connection, as dictated by the structure of the network.

4.5 TRANSFER (ACTIVATION) FUNCTIONS

The transfer function for neural networks must be differentiable, and therefore continuous, to enable error correction, since the derivative of the transfer function is required for computation of the local gradient. One example of a suitable transfer function is the sigmoid function, an S-shaped graph that is one of the most common forms of transfer function used in the construction of ANNs. It is defined as a strictly increasing function; mathematically, its derivative is always positive. It exhibits a graceful balance between linear and nonlinear behavior. One example is the logistic function, represented by the equation

$\varphi(v) = \dfrac{1}{1 + e^{-v}}$

This function has a characteristic shape: at the extremes of $\varphi(v)$, $\varphi(v)$ is flat and $\varphi'(v)$ is very small, while at midrange $\varphi'(v)$ is at its maximum, as seen in Fig. 4.4.

Fig. 4.4: Logistic transfer function
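These properties are easy to verify numerically. The sketch below uses the identity $\varphi'(v) = \varphi(v)\,(1 - \varphi(v))$, which follows from the logistic equation above; the sample points are arbitrary.

import numpy as np

def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))

def logistic_deriv(v):
    p = logistic(v)
    return p * (1.0 - p)   # phi'(v) = phi(v) * (1 - phi(v))

for v in (-6.0, 0.0, 6.0):
    print(v, round(logistic_deriv(v), 4))
# At the extremes the derivative is tiny; at midrange (v = 0) it peaks at 0.25.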


Several other transfer functions can also be employed as shown in Fig. 4.5.

Fig. 4.5: Transfer functions with different characteristic constant values

4.6 LAYER ARRANGEMENT IN A NEURAL NETWORK

The neurons can be clustered together in many ways. This clustering occurs in the human mind in such a way that information can be processed in a dynamic, interactive, and self-organizing way. Biologically, neural networks are constructed in a three-dimensional world from microscopic components, and these neurons seem capable of nearly unrestricted interconnections. That is not true of any existing man-made network. Artificial neural networks are the simple clustering of artificial neurons by creating layers and interconnections, as shown in Fig. 4.6.


Fig. 4.6: Layer arrangement in a neural network

Basically, a neural network is the grouping of neurons into layers, the connections between these layers, and the summation and transfer functions; together, these comprise a functioning neural network. Most applications require networks that contain at least three layers: input, hidden, and output. The input layer receives the data, either from input files or directly from electronic sensors in real-time applications. The output layer sends information directly to the outside world, to a secondary computer process, or to other devices. Between these two layers there can be many hidden layers, which contain many neurons in various interconnected structures. The inputs and outputs of each of these hidden neurons simply go to other neurons.

In most networks, each neuron in a hidden layer receives signals from all of the neurons in the preceding layer, typically the input layer. After a neuron performs its function, it passes its output to all of the neurons in the following layer, typically the output layer, providing a feed-forward path. Each of these connections gives a variable strength to an input. There are two types of connections: one causes the summing mechanism of the next neuron to add (excite), while the other causes it to subtract (inhibit). Some networks want a neuron to inhibit the other neurons in the same layer, which is called 'lateral inhibition'. Its most common use is in the output layer; for example, in text recognition, if the probability of a character being 'P' is 0.85 and that of it being 'F' is 0.65, the network wants to choose the highest probability and inhibit all the others. It can do that with lateral inhibition (competition).
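A minimal winner-take-all sketch of this lateral inhibition follows; complete inhibition of the losing outputs is an assumption here, since real networks may inhibit only partially.

import numpy as np

def lateral_inhibition(activations):
    # Keep only the strongest output; inhibit (zero out) all competitors.
    out = np.zeros_like(activations)
    winner = np.argmax(activations)
    out[winner] = activations[winner]
    return out

probs = np.array([0.85, 0.65])    # the 'P' vs 'F' example above
print(lateral_inhibition(probs))  # [0.85 0.  ] -- the network chooses 'P'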

4.7 TYPES OF ARTIFICIAL NEURAL NETWORKS

4.7.1 SINGLE LAYER FEED FORWARD NETWORK

A neural network in which the input layer of source nodes projects onto an output layer of neurons, but not vice-versa, is known as a single layer feed-forward or acyclic network. Here, 'single layer' refers to the output layer of computation nodes, as shown in Fig. 4.7.

Fig. 4.7: A single layer feedforward network (input layer and output layer)

4.7.2 MULTILAYER FEED FORWARD NETWORK

This type of network (Fig. 4.8) consists of one or more hidden layers, whose computation nodes are called hidden neurons or hidden units. The function of the hidden neurons is to intervene between the external input and the network output in some useful manner and to extract higher-order statistics. The source nodes in the input layer of the network supply the input signal to the neurons in the second layer (the first hidden layer). The output signals of the second layer are used as inputs to the third layer, and so on. The set of output signals of the neurons in the output layer of the network constitutes the overall response of the network to the activation pattern supplied by the source nodes in the input (first) layer.

Fig. 4.8: A multilayer feed forward network (input layer, hidden layer, output layer)

A short characterization of feedforward networks:

1. Typically, activation is fed forward from input to output through 'hidden layers', though many other architectures exist.

2. Mathematically, they implement static input-output mappings (see the sketch below).

3. The most popular supervised training algorithm is the backpropagation algorithm.

4. They have proven useful in many practical applications as approximators of nonlinear functions and as pattern classifiers.
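As a sketch of point 2, a trained multilayer feedforward network is simply a fixed function from input vectors to output vectors; the layer sizes and random weights below are arbitrary illustrations.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

W1 = rng.normal(size=(4, 3))  # input layer (3 nodes) -> hidden layer (4 units)
W2 = rng.normal(size=(2, 4))  # hidden layer -> output layer (2 neurons)

def feedforward(u):
    h = sigmoid(W1 @ u)       # hidden activations
    return sigmoid(W2 @ h)    # overall network response

u = np.array([0.1, 0.5, -0.3])
print(feedforward(u))  # the same input always gives the same output: a static mapping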

4.7.3 RECURRENT NETWORK

A feed forward neural network having one or more hidden layers with at least one feedback loop is known as a recurrent network, as shown in Fig. 4.9. The feedback may be self-feedback, i.e., the output of a neuron is fed back to its own input. Sometimes, feedback loops involve the use of unit delay elements, which results in nonlinear dynamic behaviour, assuming that the neural network contains nonlinear units.

Fig. 4.9: A recurrent network (input layer and output layer)

There are various other types of networks, like delta-bar-delta, Hopfield, vector quantization, counter-propagation, probabilistic, Hamming, Boltzmann, bidirectional associative memory, spatio-temporal pattern, adaptive resonance, self-organizing map, recirculation, etc.

A recurrent neural network has (at least one) cyclic path of synaptic connections. Basic characteristics:

1. All biological neural networks are recurrent.

2. Mathematically, they implement dynamical systems (see the sketch below).

3. Several types of training algorithms are known, with no clear winner.

4. Theoretical and practical difficulties have by and large prevented practical applications so far.
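As a sketch of point 2, a single unit with self-feedback through a unit delay element already implements a nonlinear dynamical system; the weights and the input sequence are arbitrary assumptions.

import numpy as np

def recurrent_step(x_prev, u, w_in=0.8, w_fb=0.5):
    # The new state depends on the current input and, via the unit delay,
    # on the unit's own previous output (self-feedback).
    return np.tanh(w_in * u + w_fb * x_prev)

x = 0.0  # initial state
for t, u in enumerate([1.0, 1.0, 0.0, 0.0]):
    x = recurrent_step(x, u)
    print(t, round(float(x), 4))
# The output keeps evolving even after the input drops to zero.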

4.8 TRAINING OF ARTIFICIAL NEURAL NETWORKS

Once a network has been structured for a particular application, it is ready for training. At the beginning, the initial weights are chosen randomly, and then the training or learning begins. There are two approaches to training: supervised and unsupervised.

4.8.1 SUPERVISED TRAINING

In supervised training, both the inputs and the outputs are provided. The network processes the inputs and compares its resulting outputs against the desired outputs. Errors are then propagated back through the system, causing it to adjust the weights which control the network. This process occurs over and over as the weights are continually tweaked. The set of data which enables the training is called the 'training set'. During the training of a network, the same set of data is processed many times as the connection weights are continually refined.

Sometimes a network may never learn. This could be because the input data do not contain the specific information from which the desired output is derived. Networks also fail to converge if there is not enough data to enable complete learning. Ideally, there should be enough data that part of it can be held back as a test set. Many-layered networks with multiple nodes are capable of memorizing data. To determine whether the system is simply memorizing its data in some non-significant way, supervised training needs to hold back a set of data to be used to test the system after it has undergone its training.

If a network simply cannot solve the problem, the designer has to review the inputs and outputs, the number of layers, the number of elements per layer, the connections between the layers, the summation, transfer, and training functions, and even the initial weights themselves. Another part of the designer's creativity governs the rules of training. There are many laws (algorithms) used to implement the adaptive feedback required to adjust the weights during training. The most common technique is known as back-propagation.

Training is not just a technique but a conscious analysis, to ensure that the network is not overtrained. Initially, an artificial neural network configures itself with the general statistical trends of the data. Later, it continues to 'learn' about other aspects of the data, which may be spurious from a general viewpoint.

When the system has finally been correctly trained and no further learning is needed, the weights can, if desired, be 'frozen'. In some systems, this finalized network is then turned into hardware so that it can be fast. Other systems do not lock themselves in but continue to learn while in production use.

4.8.2 UNSUPERVISED OR ADAPTIVE TRAINING

The other type is unsupervised training (learning). In this type, the network is provided with inputs but not with desired outputs; the system itself must then decide what features it will use to group the input data. This is often referred to as self-organization or adaptation. These networks use no external influences to adjust their weights; instead, they internally monitor their performance. They look for regularities or trends in the input signals and make adaptations according to the function of the network. Even without being told whether it is right or wrong, the network still must have some information about how to organize itself; this information is built into the network topology and learning rules. An unsupervised learning algorithm might emphasize cooperation among clusters of processing elements. In such a scheme, the clusters would work together: if some external input activated any node in the cluster, the cluster's activity as a whole could be increased; likewise, if external input to nodes in the cluster was decreased, that could have an inhibitory effect on the entire cluster.

Competition between processing elements can also form a basis for learning. Training of competitive clusters can amplify the responses of specific groups to specific stimuli and, as such, associate those groups with each other and with a specific appropriate response. Normally, when competition for learning is in effect, only the weights belonging to the winning processing element are updated. Presently, unsupervised learning is not well understood, and it remains the subject of considerable research.
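A hedged sketch of such competitive, winner-take-all learning is given below; the random input data and the learning rate of 0.1 are assumptions, and this is essentially the same update used in Kohonen-style networks.

import numpy as np

rng = np.random.default_rng(1)
W = rng.random((3, 2))   # three competing units, two input connections each
rate = 0.1

for _ in range(100):
    x = rng.random(2)                    # unlabeled input pattern
    winner = np.argmax(W @ x)            # the unit with the largest output wins
    W[winner] += rate * (x - W[winner])  # only the winner's weights are updated
print(np.round(W, 2))  # each row has drifted toward a group of similar inputs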

4.9 LEARNING RATES

The rate at which ANNs learn depends upon several controllable factors. A slower rate means more time spent in producing an adequately trained system. With faster learning rates, however, the network may not be able to make the fine discriminations that are possible with a system that learns more slowly.

Most learning functions have some provision for a learning rate (learning constant). Usually this term is positive and between 0 and 1. If the learning rate is greater than 1, it is easy for the learning algorithm to overshoot in correcting the weights, and the network will oscillate. Small values of the learning rate will not correct the current error as quickly, but if small steps are taken in correcting errors, there is a good chance of arriving at the best minimum convergence.
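The overshoot effect is easy to demonstrate on a one-dimensional error surface. In this sketch the quadratic error E(w) = w^2, whose gradient is 2w, stands in for the network error; the rates and step count are illustrative.

def descend(rate, w=1.0, steps=5):
    # Repeatedly step against the gradient of E(w) = w**2.
    for _ in range(steps):
        w = w - rate * 2.0 * w
    return w

print(descend(0.1))  # small rate: slow but steady progress toward the minimum at 0
print(descend(1.1))  # rate > 1: every step overshoots, and w oscillates and grows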

4.10 LEARNING LAWS (ALGORITHMS)

Many learning laws are in common use. Most of them are some sort of variation of the best known and oldest, Hebb's Rule.

Hebb's Rule: This was introduced by Donald Hebb in 'Organization of Behavior'. The basic rule is: if a neuron receives an input from another neuron and both are highly active (have the same sign), the weight between the two neurons should be strengthened.

Hopfield Law: If the desired output and the input are both active or both inactive, increment the connection weight by the learning rate; otherwise, decrement the weight by the learning rate.

The Delta Rule: This rule is based on the simple idea of continuously modifying the strengths of the input connections to reduce the difference (the delta) between the desired output value and the actual output of a processing element.
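A minimal sketch of the delta rule for a single linear processing element; the training pair and the learning rate are illustrative assumptions.

import numpy as np

x = np.array([1.0, 0.5])  # input pattern
d = 1.0                   # desired output
w = np.zeros(2)           # initial connection strengths
rate = 0.2

for _ in range(20):
    y = np.dot(w, x)       # actual output of the processing element
    delta = d - y          # the 'delta': desired minus actual
    w += rate * delta * x  # strengthen connections to reduce the difference
print(np.round(w, 3), round(float(np.dot(w, x)), 3))  # the output approaches d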

The Gradient Descent Rule: This is similar to the Delta Rule in that the derivative of the transfer function is used to modify the delta error before it is applied to the connection weights. However, an additional proportional constant tied to the learning rate is appended to the final modifying factor acting upon the weight.

Kohonen's Law: Here, the processing elements compete for the opportunity to learn, i.e., to update their weights. The element with the largest output is declared the winner and has the capability of inhibiting its competitors as well as exciting its neighbors. Only the winner is permitted an output, and only the winner plus its neighbors are allowed to adjust their connection weights.

4.11 BACKPROPAGATION FOR FEED FORWARD NETWORKS

The backpropagation (BP) algorithm is the most commonly used training method for feed forward networks. Consider a multi-layer perceptron with k hidden layers. Together with the layer of input units and the layer of output units, this gives k+2 layers of units altogether, numbered 0, ..., k+1. Let the number of input units be K, the number of output units be L, and the number of units in hidden layer m be $N_m$. The weight between the jth unit in layer m and the ith unit in layer m+1 is denoted by $w_{ij}^m$. The activation of the ith unit in layer m is $x_i^m$ (for m = 0 this is an input value; for m = k+1, an output value). The training data for a feedforward network training task consist of T input-output (vector-valued) data pairs

$u(n) = \bigl(x_1^0(n), \ldots, x_K^0(n)\bigr)^t, \qquad d(n) = \bigl(d_1(n), \ldots, d_L(n)\bigr)^t,$

where n denotes the training instance. The activation of non-input units is computed according to

$x_i^{m+1}(n) = f\Bigl(\sum_{j=1,\ldots,N_m} w_{ij}^m \, x_j^m(n)\Bigr).$

Presented with training input u(n), this update equation is used to compute the activations of units in subsequent hidden layers, until a network response $y(n) = \bigl(x_1^{k+1}(n), \ldots, x_L^{k+1}(n)\bigr)^t$ is obtained in the output layer. The objective of training is to find a set of network weights such that the summed squared error

$E = \sum_{n=1,\ldots,T} \bigl\| d(n) - y(n) \bigr\|^2 = \sum_{n=1,\ldots,T} E(n)$

is minimized. This is done by incrementally changing the weights along the direction of the error gradient with respect to the weights,

$\frac{\partial E}{\partial w_{ij}^m} = \sum_{n=1,\ldots,T} \frac{\partial E(n)}{\partial w_{ij}^m},$

using a (small) learning rate $\gamma$:

$\text{new } w_{ij}^m = w_{ij}^m - \gamma \, \frac{\partial E}{\partial w_{ij}^m}.$

This is the formula used in batch learning mode, where new weights are computed after presenting all training samples. One such pass through all samples is called an epoch. Before the first epoch, the weights are initialized, typically to small random numbers. A variant is incremental learning, where weights are changed after the presentation of individual training samples:

$\text{new } w_{ij}^m = w_{ij}^m - \gamma \, \frac{\partial E(n)}{\partial w_{ij}^m}.$

The subtask in this method is the computation of the error gradients $\frac{\partial E(n)}{\partial w_{ij}^m}$. The backpropagation algorithm is a scheme to perform these computations. The procedure for one epoch of batch processing is given below.

Input: current weights $w_{ij}^m$, training samples.

Output: new weights.

Computation steps:

1. For each sample n, compute the activations of internal and output units (forward pass).

2. Compute, by proceeding backward through m = k+1, k, ..., 1, for each unit $x_i^m$ the error propagation term $\delta_i^m(n)$:

$\delta_i^{k+1}(n) = \bigl(d_i(n) - y_i(n)\bigr) \left.\frac{\partial f(u)}{\partial u}\right|_{u = z_i^{k+1}}$

for the output layer and

$\delta_j^m(n) = \left(\sum_{i=1}^{N_{m+1}} \delta_i^{m+1}(n)\, w_{ij}^m\right) \left.\frac{\partial f(u)}{\partial u}\right|_{u = z_j^m}$

for the hidden layers, where

$z_i^m(n) = \sum_{j=1}^{N_{m-1}} x_j^{m-1}(n)\, w_{ij}^{m-1}$

is the internal state (or potential) of unit $x_i^m$. This is the error backpropagation pass. Mathematically, the error propagation term $\delta_j^m(n)$ represents the error gradient with respect to the potential of the unit $x_j^m$:

$\delta_j^m(n) = -\left.\frac{\partial E(n)}{\partial u}\right|_{u = z_j^m}.$

3. Adjust the connection weights according to

$\text{new } w_{ij}^{m-1} = w_{ij}^{m-1} + \gamma \sum_{n=1}^{T} \delta_i^m(n)\, x_j^{m-1}(n).$

After every such epoch, compute the error. Stop when the error falls below a predetermined threshold, when the change in error falls below another predetermined threshold, or when the number of epochs exceeds a predetermined maximal number of epochs. Many such epochs (on the order of thousands in nontrivial tasks) may be required until a sufficiently small error is achieved. One epoch requires O(TM) multiplications and additions, where M is the total number of network connections.
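The whole procedure condenses into a short NumPy sketch. It follows the equations above (batch mode, logistic f, error summed over samples); the XOR training set, the layer sizes and the learning rate are chosen purely for illustration, and bias terms are added as is common practice, although the derivation above omits them.

import numpy as np

rng = np.random.default_rng(0)

def f(z):        # logistic transfer function
    return 1.0 / (1.0 + np.exp(-z))

def f_prime(z):  # its derivative, f'(z) = f(z) * (1 - f(z))
    s = f(z)
    return s * (1.0 - s)

# Illustrative task: XOR, i.e. T = 4 training pairs (u(n), d(n)).
U = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
D = np.array([[0.], [1.], [1.], [0.]])

sizes = [2, 4, 1]   # one hidden layer (k = 1)
W = [rng.normal(scale=0.5, size=(sizes[m + 1], sizes[m])) for m in range(2)]
B = [np.zeros(sizes[m + 1]) for m in range(2)]
gamma = 0.5

for epoch in range(20000):
    gW = [np.zeros_like(w) for w in W]
    gB = [np.zeros_like(b) for b in B]
    E = 0.0
    for u, d in zip(U, D):
        xs, zs = [u], []                       # forward pass
        for w, b in zip(W, B):
            zs.append(w @ xs[-1] + b)          # potentials z
            xs.append(f(zs[-1]))               # activations x
        y = xs[-1]
        E += float(np.sum((d - y) ** 2))       # accumulate E(n)
        delta = (d - y) * f_prime(zs[-1])      # backward pass, output layer first
        for m in (1, 0):
            gW[m] += np.outer(delta, xs[m])    # sum the gradients over samples
            gB[m] += delta
            if m > 0:                          # propagate delta to the layer below
                delta = (W[m].T @ delta) * f_prime(zs[m - 1])
    for m in range(2):                         # batch update, as in step 3
        W[m] += gamma * gW[m]
        B[m] += gamma * gB[m]
    if E < 0.01:                               # stopping criterion on the error
        break

# Usually reaches a small error within a few thousand epochs on this task.
print("epochs:", epoch + 1, "summed squared error:", round(E, 4))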

The basic gradient descent approach (and its backpropagation implementation) is notorious for slow convergence, because the learning rate γ must typically be chosen small to avoid instability. Another approach, to achieve faster convergence, is to use second-order gradient descent techniques, which exploit the curvature of the gradient but have epoch complexity O(TM²).

4.12 APPLICATIONS OF NEURAL NETWORKS

4.12.1 GENERAL APPLICATIONS

Many of the networks being designed at present are statistically quite accurate (up to 85% to 90% accuracy). Currently, neural networks are not the user interface that translates spoken words into instructions for a machine, but some day this will be achieved. VCRs, home security systems, CD players, and word processors will simply be activated by voice. Touch screens and voice editing will replace the word processors of today while bringing spreadsheets and databases to a new level of usability. Neural network design is progressing in other, more promising application areas.

(i) Language Processing: These applications include text-to-speech conversion, auditory input for machines, automatic language translation, secure voice-keyed locks, automatic transcription, aids for the deaf, aids for the physically disabled which respond to voice commands, and natural language processing.

(ii) Character Recognition: Neural network based products are available which can recognize hand-printed characters through a scanner. They are 98% accurate for numbers, a little less for alphabetical characters. The Quantum Neural Network software package (Qnspec) is available for recognizing characters, including cursive ones.

(iii) Image (data) Compression: Neural networks can do real-time compression and decompression of data. These networks can reduce eight bits of data to three and then reverse that process to reconstruct the eight bits again.


(iv) Pattern Recognition: Many pattern recognition applications are in use, like a system that can detect bombs in luggage at airports by identifying small variances and patterns in specialized sensors' outputs, a back-propagation neural network which can discriminate between a true and a false heart attack, a network which can scan and read Pap smears, etc. Many automated quality control applications now in use are based on pattern recognition.

(v) Signal Processing: Neural networks have proven capable of filtering out electronic noise. Another application is a system that can detect engine misfire simply from the engine sound.

(vi) Financial: Banks, credit card companies and lending institutions deal with many decisions that are not clear-cut; they involve learning and statistical trends. Neural networks are now trained on the data from past decisions and are being used in decision-making.

(vii) Servo Control: A neural system known as Martingale's Parametric Avalanche, a spatio-temporal pattern recognition network, is being designed to control the shuttle during in-flight maneuvers. Another application is ALVINN, the Autonomous Land Vehicle In a Neural Network.

4.12.2 APPLICATIONS IN POWER SYSTEMS

The electric power industry is currently undergoing an unprecedented reform, and one of the most exciting and potentially profitable recent developments is the increasing use of ANN techniques. A major advantage of the ANN approach is that the domain knowledge is distributed in the neurons, and information processing is carried out in a parallel, distributed manner. Being adaptive units, ANNs are able to learn complex relationships even when no functional model exists. This provides the capability to do 'black box modeling' with little or no prior knowledge of the function itself. ANNs have the ability to properly classify a highly non-linear relationship and, once trained, they can classify new data much faster than would be possible by solving the model analytically. The rising interest in ANNs is largely due to the emergence of powerful new methods as well as to the availability of computational power suitable for simulation. The field is particularly exciting today because ANN algorithms and architectures can be implemented in VLSI technology for real-time applications.

The application of ANNs in many areas of electrical power systems has led to acceptable results.

1. Load Forecasting

Load forecasting is a very common and popular problem, which plays an important role in the economics, financing, development, expansion and planning of power systems.

• Short-term load forecasting, over an interval ranging from an hour to a week, is important for various applications such as unit commitment, economic dispatch, energy transfer scheduling and real-time control.

• Mid-term load forecasting, ranging from one month to five years, is used to purchase enough fuel for power plants after electricity tariffs are calculated.

• Long-term load forecasting (LTLF), covering 5 to 20 years or more, is used by planning engineers and economists to determine the type and the size of generating plants that minimize both fixed and variable costs.

ANNs can be used to solve these problems. Most projects using ANNs have successfully considered many factors, such as weather conditions, holidays, weekends and sports match days, in the forecasting model. This is because of the learning ability of ANNs with many input factors. The main advantages of ANNs that have increased their use in forecasting are:

1. Training can be conducted off-line, without time constraints and without direct coupling to the power system for data acquisition.

2. The ability to adjust the parameters for ANN inputs that have no functional relationship between them, such as weather conditions and the load profile.

2. Fault Diagnosis / Fault Location

Progress in the areas of communication and digital technology has increased the amount of information available at supervisory control and data acquisition (SCADA) systems. Although this information is very useful, during events that cause outages the operator may be overwhelmed by the excessive number of simultaneously operating alarms, which increases the time required to identify the main outage cause and to start the restoration process. Besides, factors such as stress and inexperience can affect the operator's performance; thus, a tool to support the real-time decision-making process is desired. The protection devices are responsible for detecting the occurrence of a fault and, when necessary, they send trip signals to circuit breakers (CBs) in order to isolate the defective part of the system. However, when relays or CBs do not work properly, larger parts of the system may be disconnected. After such events, in order to avoid damage to energy distribution utilities and consumers, it is essential to restore the system as soon as possible. Moreover, before starting the restoration, it is necessary to identify the event that caused the sequence of alarms, such as a protection system failure, defects in communication channels, corrupted data acquisition, etc. The heuristic nature of the reasoning involved in the operator's analysis and the absence of an analytical formulation lead to the use of artificial intelligence techniques. Model-based systems including the temporal characteristics of protection schemes, based on a general regression neural network (GRNN) in feed forward topology, can be successfully used for this purpose. ANNs are fault tolerant and are able to learn off-line from a set of historical or simulated data in order to make on-line inferences. They are able to produce a diagnosis even in difficult situations, such as the presence of noisy data or protection system failures.

3. Economic Dispatch

The main goal of economic dispatch (ED) is to minimize operating costs depending on demand and subject to certain constraints, i.e., to allocate the required load demand among the available generation units. In practice, the whole of a unit's operating range is not always available for load allocation due to physical operation limitations. Several methods, like the Lagrangian relaxation method, linear programming techniques, Beale's quadratic programming, the Newton method, the augmented Lagrangian function, etc., are being used. The economic dispatch problem is a nonconvex optimization problem. Neural networks, and especially the Hopfield model, have a well-demonstrated capability of solving such combinatorial optimization problems.

4. Security Assessment

The principal task of an electric power system is to deliver the power requested by the customers, without exceeding acceptable voltage and frequency limits. This task has to be solved in real time and in a safe, reliable and economical manner. Fig. 4.10 shows a simplified diagram of the principal data flow in a power system, where real-time measurements are stored in a database. The state estimation then adjusts for bad and missing data. Based on the estimated values, the current mathematical model of the power system is established. Based on the simulation of potential equipment outages, the security level of the system is determined. If the system is considered unsafe with respect to one or more potential outages, control actions have to be taken.

Fig. 4.10: Data flow in power system operation

Generally, there are two types of security assessment: static and dynamic. In both types, different operational states are defined as follows:

• Normal or secure state: All customer demands are met and operation is within prescribed limits.

• Alert or critical state: The system variables are still within limits and the constraints are satisfied, but a small disturbance can drive the system toward instability.

• Emergency or unsecure state: The power system enters the emergency mode of operation upon violation of security-related inequality constraints.

ANNs with the backpropagation training algorithm can be successfully used to solve these problems.

5. Alarm Processing

In emergencies, engineers are expected to quickly evaluate various options and implement an optimal corrective action. However, the number of real-time messages (alarms) received is too large for the time available for their evaluation. Processing such alarms in real time and alerting the operator to the root cause, or to the most important of these alarms, has been identified as a valuable operational aid. ANNs have been implemented for such alarm processing. The fast response of a trained ANN and its generalization abilities are very useful in this application.

6. Eddy Current Analysis

Analysis of eddy current losses requires the numerical solution of integro-differential equations. Discretizing these equations and solving them using finite-element methods is computationally very expensive. Cellular neural networks have been developed as an alternative to finite-element methods, yielding a faster, computationally less expensive and simpler method of solving these equations. These networks calculate eddy currents and eddy current losses in a source-current-carrying conductor in a time-varying magnetic field. This implementation opens up a wide range of applications in structural analysis, electromagnetic field computations, etc.

7. Harmonic Source Monitoring

It is often required to identify and monitor harmonic sources in systems containing non-linear loads. ANNs can be trained using simulation results for varying load conditions, and can be used in conjunction with a state estimator to pinpoint and monitor the sources of the harmonics.

8. Applications in Nuclear Power Plants

ANNs have potential applications in enhancing the safety and efficiency of nuclear power plants, such as the diagnosis of specific abnormal conditions, detection of a change of mode of operation, signal validation, monitoring of check valves, modeling of the plant thermodynamics, monitoring of plant parameters, analysis of plant vibrations, etc.

ANNs can also be used in many other applications, some of which are listed below:

• Load frequency control (Automatic Generation Control)

• Hydroelectric generation scheduling

• Power system stabilizer design

• Load flow studies

• Load modeling

• HVDC

• Non-conventional energy field

• Maintenance scheduling

• Unit commitment

The main advantages of using ANNs in power system applications are:

• Their capability of dealing with stochastic variations of the scheduled operating point, with increasing data

• Very fast, on-line processing and classification

• Implicit nonlinear modeling and filtering of system data

4.13 CONCLUDING REMARKS

Artificial neural networks have been dealt with in detail in this chapter. In this research work, ANN controllers based on optimal and suboptimal control strategies have been designed and developed for various types of interconnected power systems; the procedure is discussed in Chapter 5.

Since these ANN controllers have been trained with data obtained from optimal and suboptimal control strategies, they have given results comparable to those of optimal and suboptimal control. The actual results are shown and discussed in Chapter 6.