CP8206 – Soft Computing & Machine Intelligence
• PRINCIPLE OF ARTIFICIAL NEURAL NETWORKS
Important properties of artificial neural networks will be discussed, namely:
(i) the underlying principle of artificial neural networks,
(ii) the general representation of neural networks, &
(iii) the principles of the error-correction algorithm.
• ARTIFICIAL INTELLIGENCE & NEURAL NETWORKS
During the past twenty years, interest in applying the results of Artificial Intelligence (AI) research has been growing rapidly.
AI relates to the development of the theories & techniques required for a computational engine to efficiently perceive, think & act with intelligence in complex environments.
The artificial intelligence discipline is concerned with intelligent computer systems, i.e., systems exhibiting the characteristics associated with intelligence in human behavior, such as understanding language, learning, solving problems & reasoning.
• BRANCHES OF AI
Developments in some branches of AI have already led to new technologies having significant effects on problem-solving approaches. These include new ways of defining problems, new methods of representing the existing knowledge regarding the problems & new problem-handling methods.
There are several distinctive areas of research in Artificial Intelligence, most importantly:
• artificial neural networks,
• fuzzy logic systems,
• expert systems,
each with its own specific interest, research techniques, terminology & objectives
(Fig. 1).
Fig. 1: Partial taxonomy of Artificial Intelligence, depicting a number of important AI branches & their relationships: AI comprises neural networks, fuzzy systems, expert systems & genetic algorithms, together with hybrid branches such as fuzzy-expert, neuro-genetic & neuro-fuzzy systems.
• NEURAL NETWORKS
Among the various branches of AI, the area of artificial neural networks in particular has received considerable attention during the past twenty years.
An artificial neural network is a massively parallel & distributed processor that has a natural propensity for storing experiential knowledge & making it available for use.
The underlying idea is to implement a processor that works in a fashion similar to the human brain.
• NEURAL NETWORKS
A NN resembles the brain in two respects: first, the knowledge is acquired through a learning process, & second, inter-neuron connection strengths, known as weights, are used to store the knowledge.
The learning process involves modification of the connection weights to obtain a desired objective.
Major applications of neural networks can be categorized into five groups: pattern recognition, image processing, signal processing, system identification & control.
• NEURAL NETWORKS
There are a variety of definitions for artificial neural networks, each of which highlights some aspect of this methodology, such as:
• its similarity to its biological counterpart,
• its parallel computation capabilities, &
• its interaction with the outside world.
A neural network is a non-programmable dynamic system, with capabilities such as trainability & adaptivity, that can be trained to store, process & retrieve information. It also possesses the ability to learn & to generalize based on past observations.
• NEURAL NETWORKS
Neural networks owe their computing power to their parallel/distributed structure & the manner in which the activation functions are defined. This information processing ability provides the possibility of solving complex problems.
• Function approximation (I/O mapping): the ability to approximate any nonlinear function to the desired degree of accuracy.
• Learning & generalization: the ability to learn I/O patterns, extract the hidden relationships among the presented data, & provide acceptable responses to new data that the network has not yet experienced. This enables neural networks to provide models based on imprecise information.
• NEURAL NETWORKS
• Adaptivity: capable of modifying their memory, & thus their functionality, over time.
• Fault tolerance: due to their highly parallel/distributed structure, failure of a number of neurons to generate the correct response does not lead to failure of the overall performance of the system.
• NEURAL NETWORKS - DISADVANTAGES
• large dimensionality, which leads to memory restrictions;
• selection of an optimum configuration;
• convergence difficulties, especially when the solution is trapped in a local minimum;
• choice of training methodology;
• black-box representation, lack of explanation capabilities & transparency.
• NEURAL NETWORKS
A neural network can be characterized in terms of:
• Neurons: the basic processing units, defining the manner in which computation is performed.
• Neuron activation functions: indicate the function of each neuron.
• Inter-neuron connection patterns: define the way neurons are connected to each other.
• Learning algorithms: define how knowledge is stored in the network.
• NEURON MODEL
The NN paradigm attempts to clone the physical structure & functionality of the biological neuron.
Artificial neurons, like their biological counterparts, receive inputs, [x1, x2, ..., xr], from the outside world or other neurons through incoming connections.
Each neuron then generates product terms, [wi·xi], using the inputs & connection weights ([w1, w2, ..., wr] represents the connection memory).
The product terms are then summed using an addition operator to produce the neuron's internal activity index, v(t).
• NEURON MODEL
This index is passed to an activation function, ϕ(.), which produces an output,
y(t).
$$v(t) = \sum_{i=1}^{r} w_i x_i \qquad (1)$$
$$y(t) = \varphi\bigl(v(t)\bigr) \qquad (2)$$
A more general model of the neuron functionality can be provided by the
introduction of a threshold measure, w0, for the activation function.
• NEURON MODEL
This signifies the scenario where a neuron generates an output if its input is
beyond the threshold (Fig. 2), i.e.,
$$y(t) = \varphi\!\left(\sum_{i=1}^{r} w_i x_i - w_0\right) \qquad (3)$$
This model is a simple yet useful approximation of the biological neuron & can be
used to develop different neural structures including feedforward & feedback
networks (Fig. 3).
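To make the model concrete, a minimal Python sketch of the thresholded neuron of Eq. (3); this is illustrative only (the sigmoid choice for ϕ(.) & all names are assumptions, not part of the original slides):

```python
import math

def neuron_output(x, w, w0):
    """Single-neuron model of Eq. (3): y = phi(sum_i w_i * x_i - w0)."""
    v = sum(wi * xi for wi, xi in zip(w, x)) - w0  # internal activity index v(t)
    return 1.0 / (1.0 + math.exp(-v))              # phi(.) chosen here as a sigmoid

# a 3-input neuron with threshold w0 = 0.3
print(neuron_output([0.5, -1.0, 2.0], [0.8, 0.2, 0.4], 0.3))
```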
Fig. 2: Nonlinear model of a neuron: the inputs x1, x2, …, xr are scaled by the synaptic weights wk1, …, wkr (synaptic operation), aggregated by the summation operator ∑ together with a ±1 threshold input (aggregation operation), & passed through the activation function ϕ(.) to produce the output yk (somatic operation).
• TYPES OF ACTIVATION FUNCTIONS
Each neuron includes a nonlinear function, known as the activation function, that
transforms several weighted input signals into a single numerical output signal.
The neuron activation function, ϕ(.), expresses the functionality of the neuron.
There are at least three main classes of activation function, including linear,
sigmoid & Gaussian.
Table 3.1 illustrates different types of activation functions.
• NEURAL NETWORK ARCHITECTURES
The manner in which neurons are connected together defines the architecture of
a neural network. These architectures can be classified into two main groups
(Fig. 3):
• Feedforward neural network
• Recurrent neural network
CP8206 – Soft Computing & Machine Intelligence
18
Fig. 3: Classification of different neural network structures: feedforward networks (single layer & multi layer, including the perceptron, radial basis function & lattice networks) & recurrent networks (single layer & multi layer, e.g., the Hopfield & Elman networks).
• FEEDFORWARD NEURAL NETWORK
The flow of information is from input to output.
• SINGLE LAYER NETWORK (Fig. 4):
The main body of the structure consists of only one layer (a one-dimensional vector) of neurons.
It can be considered a linear association network that relates output patterns to input patterns, as in the sketch following Fig. 4.
Fig. 4: Single layer feedforward neural network: the inputs x1, x2, …, xr feed a single layer of neurons, each with activation function ϕ(.), producing the outputs y1, y2, …, yr.
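A minimal sketch of the single-layer network of Fig. 4 (illustrative only; the tanh choice for ϕ(.) & the weight values are assumptions):

```python
import numpy as np

def single_layer_forward(x, W, phi=np.tanh):
    """Single-layer feedforward network (Fig. 4): every output neuron
    applies phi to its own weighted sum of the same inputs."""
    return phi(W @ x)              # row k of W holds the weights of output neuron k

x = np.array([1.0, 0.5, -0.2])
W = np.array([[0.2, -0.1, 0.4],
              [0.0, 0.3, -0.5]])   # 2 output neurons, 3 inputs
print(single_layer_forward(x, W))
```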
• A MULTI-LAYER NETWORK (Fig. 5):
The structure consists of two or more layers of neurons.
The function of the additional layers is to extract higher-order statistics.
The network acquires a global perspective despite its local connectivity, by virtue of the extra set of synaptic connections & the extra dimension of neural interaction.
A multi-layer network is specified by:
• the number of inputs & outputs, the number of layers,
• the number of neurons in each layer,
• the network connection pattern, &
• the activation function for each layer.
Fig. 5: Multi-layer feedforward neural network: the inputs x1, …, xp feed a first (hidden) layer of neurons with activation function ϕ1(.), whose outputs feed a second layer with activation function ϕ2(.), producing the outputs y1, …, yq.
• RECURRENT NEURAL NETWORK
A recurrent structure represents a network in which there is at least one feedback connection.
Fig. 6 depicts a multi-layer recurrent neural network, which is similar to the feedforward case except for the presence of feedback loops & the unit-delay operator, z⁻¹, which introduces the delay involved in feeding the output back to the input.
Fig. 6: Multi-layer recurrent neural network: the two-layer feedforward structure of Fig. 5 augmented with feedback connections that route the outputs y1, …, yq back to the input side through unit delays z⁻¹.
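A sketch of one time step of the recurrent structure in Fig. 6 (hypothetical weight names; the feedback path through z⁻¹ is modelled by carrying the previous output into the next call):

```python
import numpy as np

def recurrent_step(x_t, y_prev, W_in, W_fb, phi=np.tanh):
    """One step of a recurrent layer (Fig. 6): the delayed output
    y(t-1), supplied by the unit delay z^-1, re-enters the layer."""
    return phi(W_in @ x_t + W_fb @ y_prev)

W_in = np.array([[0.5, -0.2, 0.1],
                 [0.3, 0.4, -0.1]])   # input weights (2 neurons, 3 inputs)
W_fb = np.array([[0.2, 0.0],
                 [-0.1, 0.3]])        # feedback weights (2 x 2)
y = np.zeros(2)                       # y(0): no feedback yet
for t in range(3):
    y = recurrent_step(np.array([1.0, 0.0, 0.5]), y, W_in, W_fb)
    print(t, y)
```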
Table 3.1: Neural Network Activation Functions
• Piecewise linear: $f(x) = -1$ if $x < -b$; $a\,x$ if $|x| \le b$; $+1$ if $x > b$
• Linear: $f(x) = a\,x$
• Indicator: $f(x) = \mathrm{sgn}(x)$
• Sigmoid: $f(x) = \dfrac{1}{1 + e^{-a\,x}}$
• Bipolar sigmoid: $f(x) = \tanh(a\,x)$
• Gaussian: $f(x) = e^{-x^2/(2\sigma^2)}$
(Each function is accompanied by its plot in the original table; the plots are omitted here.)
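For reference, the activation functions of Table 3.1 written out in Python (a, b & sigma are the shape parameters of the table; vectorized with NumPy):

```python
import numpy as np

def piecewise_linear(x, a=1.0, b=1.0):
    """f(x) = -1 for x < -b, a*x for |x| <= b, +1 for x > b."""
    return np.where(x < -b, -1.0, np.where(x > b, 1.0, a * x))

def linear(x, a=1.0):          return a * x
def indicator(x):              return np.sign(x)                 # sgn(x)
def sigmoid(x, a=1.0):         return 1.0 / (1.0 + np.exp(-a * x))
def bipolar_sigmoid(x, a=1.0): return np.tanh(a * x)
def gaussian(x, sigma=1.0):    return np.exp(-x**2 / (2 * sigma**2))

x = np.linspace(-10, 10, 5)
for f in (piecewise_linear, linear, indicator, sigmoid, bipolar_sigmoid, gaussian):
    print(f.__name__, f(x))
```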
• MULTI-LAYER PERCEPTRON (MLP)
A class of NNs that consists of one input layer together with one output layer, which represent the system inputs & outputs respectively, & one or more hidden layers that provide the learning capability for the network (Fig. 7).
The basic element of an MLP network is an artificial neuron whose activation function, for the hidden layers, is a smooth, differentiable function (usually a sigmoid). The neurons in the output layer have a linear activation function.
Fig. 7: General structure of a Multi-Layer Perceptron network, illustrating the concept of input, hidden & output layers. The inputs x1, …, xn feed m hidden neurons with sigmoid activation $g(x) = \frac{1}{1+e^{-x}}$ & a linear output layer, so that
$$f(x_1,\ldots,x_n) = \sum_{j=1}^{m} \omega_j \cdot g\!\left(\sum_{i=1}^{n} w_{ij} x_i - \theta_j\right)$$
where $w_{ij}$, $b_{ij}$ are the hidden-layer weights & biases (i = 1,…,n indexes the inputs; j = 1,…,m indexes the hidden neurons) & $\omega_{jk}$ are the output-layer weights (k indexes the outputs).
• MLP
The output of an MLP network, therefore, can be represented as follows:
$$F(x_1,\ldots,x_p) = \underbrace{\sum_{i=1}^{M} \omega_i \cdot \overbrace{g\!\Bigl(\underbrace{\sum_{j=1}^{p} w_{ij} x_j - \theta_i}_{\text{internal activation}}\Bigr)}^{\text{hidden layer output}}}_{\text{output layer output}} \qquad (4)$$
where F(⋅) is the network output, [x1,…,xp] is the input vector having p inputs, M denotes the number of hidden neurons, w represents the hidden-layer connection weights, θ is the threshold value associated with the hidden neurons, & ω represents the output-layer connection weights, which in effect serve as the coefficients of the linear output function.
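As a concrete (hypothetical) rendering of Eq. (4), a forward pass with sigmoid hidden units & a linear output; the dimensions & random parameters are made up for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x, W, theta, omega):
    """MLP output of Eq. (4):
    F(x) = sum_i omega_i * g(sum_j w_ij x_j - theta_i),
    with a sigmoid hidden layer & a linear output layer."""
    hidden = sigmoid(W @ x - theta)   # M hidden activations
    return omega @ hidden             # linear combination -> scalar output

p, M = 3, 5                           # p inputs, M hidden neurons
rng = np.random.default_rng(0)
x     = rng.normal(size=p)
W     = rng.normal(size=(M, p))       # hidden-layer weights w_ij
theta = rng.normal(size=M)            # hidden thresholds
omega = rng.normal(size=M)            # output-layer weights
print(mlp_forward(x, W, theta, omega))
```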
• UNIVERSAL APPROXIMATION
It has been proven mathematically that standard multi-layer perceptron networks using arbitrary squashing functions are capable of approximating any continuous function from one finite-dimensional space to another to any desired degree of accuracy, provided sufficient hidden neurons are available.
A squashing function is a non-decreasing function that is defined as follows:
$$\sigma(t) \to \begin{cases} 1 & \text{as } t \to +\infty \\ 0 & \text{as } t \to -\infty \end{cases} \qquad (6)$$
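A quick numerical check (illustrative) that the logistic sigmoid satisfies the squashing property of Eq. (6):

```python
import math

def sigma(t):
    return 1.0 / (1.0 + math.exp(-t))    # logistic sigmoid

# sigma(t) -> 1 as t -> +infinity & sigma(t) -> 0 as t -> -infinity
for t in (-20.0, -5.0, 0.0, 5.0, 20.0):
    print(t, round(sigma(t), 8))
```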
• UNIVERSAL APPROXIMATION
It has been further shown that the approximation can be achieved using any multi-layer perceptron with only one hidden layer & a sigmoid function.
MLPs are a class of universal approximators & can be used successfully to solve difficult problems in diverse areas using the error back-propagation learning algorithm.
Furthermore, failure in learning can be attributed to factors such as inadequate learning, an insufficient number of hidden neurons, & the non-deterministic nature of the relationship between inputs & outputs.
• THE STONE-WEIERSTRASS THEOREM
The Stone-Weierstrass theorem can be used to prove that NNs are capable of uniformly approximating any real continuous function on a compact set to an arbitrary degree of accuracy. The theorem states that for any given real continuous function, f, on a compact set U ⊂ Rⁿ, there exists a NN, F, that is an approximate realization of the function f(⋅):
$$F(x_1,\ldots,x_p) = \sum_{i=1}^{M} \omega_i \cdot \varphi\!\left(\sum_{j=1}^{p} w_{ij} x_j - \theta_i\right) \qquad (7)$$
$$\bigl| F(x_1,\ldots,x_p) - f(x_1,\ldots,x_p) \bigr| < \varepsilon \qquad (8)$$
for all $\{x_1,\ldots,x_p\} \in U$, where ε denotes the approximation error, an arbitrarily small positive value.
• LEARNING PROCESS
Learning is accomplished through the associations between different I/O patterns. Regularities & irregularities in the training data are extracted, & consequently are validated using validation data.
It is achieved by stimulating the network using data representing the function to be learned & attempting to optimize a related performance measure.
It is assumed that the data represents a system that is deterministic in nature, with unknown probability distributions.
• LEARNING PROCESS
The fashion in which the parameters are adjusted determines the type of learning. There are two general learning paradigms (Fig. 8):
• Unsupervised learning
• Supervised learning
Unsupervised learning is not within the scope of this course & will not be discussed.
Fig. 8: A classification of learning algorithms: unsupervised learning (e.g., Kohonen self-organizing, Hebbian/associative & competitive learning) & supervised learning (e.g., back-propagation, the Widrow-Hoff rule & the perceptron rule).
• SUPERVISED LEARNING
The organization & training of a neural network by a combination of repeated presentations of input patterns & their associated output patterns.
This is equivalent to adjusting the network weights: in supervised learning, a set of training data is used to help the network arrive at appropriate connection weights.
This can be seen in the conventional delta rule, one of the early supervised algorithms, developed from the work of McCulloch & Pitts, & Rosenblatt. In this method, a training data set is always available that provides the system with the ideal output values for a set of known inputs, & the goal is to obtain the strength of each connection in the network, as in the sketch below.
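A minimal sketch of the delta-rule idea for a single linear unit (the learning rate eta, the toy data & all names are illustrative assumptions, not the original formulation):

```python
import numpy as np

def delta_rule_update(w, x, d, eta=0.1):
    """One delta-rule step for a linear unit: nudge each weight
    in proportion to the output error (d - y) & the input x."""
    y = w @ x                 # actual output for the known input x
    return w + eta * (d - y) * x

w = np.zeros(3)
# one training pair: known input & ideal (desired) output
x, d = np.array([1.0, 0.5, -1.0]), 2.0
for _ in range(50):
    w = delta_rule_update(w, x, d)
print(w @ x)                  # approaches the desired output d = 2.0
```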
• BACK-PROPAGATION
The best-known supervised learning algorithm. This learning rule was first developed by Werbos & later improved by Rumelhart et al.
Learning is done on the basis of direct comparison of the output of the network with known correct answers.
It is an efficient method of computing the change in each connection weight in a multi-layer network so as to reduce the error in the outputs, & it works by propagating errors backwards from the output layer to the input layer.
• BACK-PROPAGATION
Assume that wji denotes the connection weight from the ith neuron to the jth, xj signifies the input to the jth neuron, yj represents the corresponding output, & dj is the desired output. Then:
Total input to unit j:
$$x_j = \sum_i y_i\, w_{ji} \qquad (9)$$
Output from unit j:
$$y_j = \frac{1}{1 + e^{-x_j}} \qquad (10)$$
The back-propagation algorithm attempts to minimize the global error which, for a given set of weights, is half the sum of the squared differences between the actual & desired outputs of the units, i.e.,
$$E = \frac{1}{2} \sum_c \sum_j \bigl( y_{j,c} - d_{j,c} \bigr)^2 \qquad (11)$$
where E denotes the global error, c indexes the training cases & j indexes the output units.
The error derivatives for all weights can be computed by working backwards from the output units after a case has been presented, & given the derivatives, the weights are updated to reduce the error.
Fig. 9: Basic idea of the back-propagation learning algorithm. For units j receiving inputs from units i, the error derivatives are obtained by the chain rule:
$$\frac{\partial E}{\partial y_j} = y_j - d_j$$
$$\frac{\partial E}{\partial x_j} = \frac{\partial E}{\partial y_j} \cdot \frac{\partial y_j}{\partial x_j} = \frac{\partial E}{\partial y_j}\, y_j (1 - y_j); \qquad \frac{\partial y_j}{\partial x_j} = y_j (1 - y_j)$$
$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial x_j} \cdot \frac{\partial x_j}{\partial w_{ji}} = \frac{\partial E}{\partial x_j}\, y_i; \qquad \frac{\partial x_j}{\partial w_{ji}} = y_i$$
$$\frac{\partial E}{\partial y_i} = \sum_j \frac{\partial E}{\partial x_j} \cdot \frac{\partial x_j}{\partial y_i} = \sum_j \frac{\partial E}{\partial x_j}\, w_{ji}; \qquad \frac{\partial x_j}{\partial y_i} = w_{ji}$$
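The derivatives of Fig. 9, transcribed into Python for a single sigmoid layer (a sketch; the variable names mirror the equations, while the layer sizes & values are made up):

```python
import numpy as np

def layer_gradients(y_i, w_ji, d):
    """Error derivatives of Fig. 9 for one sigmoid layer:
    forward pass via Eqs. (9)-(10), then the chain rule backwards."""
    x_j = w_ji @ y_i                        # Eq. (9): total input to unit j
    y_j = 1.0 / (1.0 + np.exp(-x_j))        # Eq. (10): output of unit j
    dE_dy_j = y_j - d                       # dE/dy_j
    dE_dx_j = dE_dy_j * y_j * (1.0 - y_j)   # dE/dx_j
    dE_dw_ji = np.outer(dE_dx_j, y_i)       # dE/dw_ji = dE/dx_j * y_i
    dE_dy_i = w_ji.T @ dE_dx_j              # dE/dy_i = sum_j dE/dx_j * w_ji
    return dE_dw_ji, dE_dy_i                # weight gradients & upstream error

y_i = np.array([0.2, 0.7, 0.1, 0.9])        # outputs of the previous layer
w_ji = np.array([[0.1, -0.3, 0.5, 0.2],
                 [0.4, 0.1, -0.2, 0.0]])
d = np.array([1.0, 0.0])                    # desired outputs
grads, upstream = layer_gradients(y_i, w_ji, d)
print(grads.shape, upstream.shape)          # (2, 4) (4,)
```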
• BACK-PROPAGATION
Back-propagation consists of two passes: forward & backward.
Forward pass: a training case is presented to the network. The training case itself consists of an input vector & its associated (desired) output.
Backward pass: starts when the output error, i.e., the difference between the desired & actual outputs, is propagated back through the network & changes are made to the connection weights in order to reduce the output error.
Different training cases are then presented to the network. The process of presenting epochs of training cases to the network continues until the average error over the entire training set reaches a defined error goal.
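Putting the two passes together, a bare-bones training loop over epochs (an illustrative sketch for a single sigmoid layer with no hidden units; the error goal, learning rate & toy data are assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(cases, W, eta=0.5, error_goal=1e-3, max_epochs=10000):
    """Epoch loop: forward pass, backward pass & weight update for
    each training case until the average error meets the error goal."""
    for epoch in range(max_epochs):
        total = 0.0
        for x, d in cases:                              # one training case
            y = sigmoid(W @ x)                          # forward pass
            err = y - d
            total += 0.5 * float(err @ err)             # Eq. (11) contribution
            W -= eta * np.outer(err * y * (1 - y), x)   # backward pass update
        if total / len(cases) < error_goal:             # average error vs goal
            return W, epoch + 1
    return W, max_epochs

cases = [(np.array([0.0, 1.0]), np.array([1.0])),
         (np.array([1.0, 0.0]), np.array([0.0]))]
W, epochs = train(cases, np.zeros((1, 2)))
print(epochs, [sigmoid(W @ x)[0] for x, _ in cases])
```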
Fig. 10: Basic presentation of the back-propagation learning algorithm as a flowchart:
1. Define the network structure, connection pattern, activation functions & performance measure; prepare the training & validation data.
2. Provide a stimulus from the training set to the network; feedforward flow of information generates the output & the performance measure.
3. If the performance measure is not satisfactory, the error is back-propagated through the network & changes proportional to the derivative of the error with respect to each weight are made to the synaptic weights; return to step 2.
4. Otherwise, provide a stimulus from the validation set to the network; feedforward flow of information generates the output & the performance measure.
5. If the validation performance is satisfactory, training ends; otherwise return to step 2.