CP8206 – Soft Computing & Machine Intelligence
• PRINCIPLE OF ARTIFICIAL NEURAL NETWORKS
Important properties of artificial neural networks will be discussed, namely:
(i) the underlying principle of artificial neural networks,
(ii) the general representation of neural networks, &
(iii) the principles of the error-correction algorithm.
• ARTIFICIAL INTELLIGENCE & NEURAL NETWORKS
During the past twenty years, interest in applying the results of Artificial Intelligence (AI) research has been growing rapidly.
AI relates to the development of the theories & techniques required for a computational engine to efficiently perceive, think & act with intelligence in complex environments.
The artificial intelligence discipline is concerned with intelligent computer systems, i.e., systems exhibiting the characteristics associated with intelligence in human behavior, such as understanding language, learning, solving problems & reasoning.
• BRANCHES OF AI
Developments in some branches of AI have already led to new technologies having significant effects on problem-solving approaches. These include new ways of defining problems, new methods of representing the existing knowledge regarding the problems & new problem-handling methods.
There are several distinctive areas of research in Artificial Intelligence, most importantly:
• artificial neural networks,
• fuzzy logic systems,
• expert systems,
each with its own specific interest, research techniques, terminology & objectives
(Fig. 1).
Fig. 1: Partial taxonomy of Artificial Intelligence, depicting a number of important AI branches & their relationships: AI comprises neural networks, fuzzy systems, expert systems & genetic algorithms, together with hybrid branches such as fuzzy-expert, neuro-genetic & neuro-fuzzy systems.
• NEURAL NETWORKS
Among the various branches of AI, the area of artificial neural networks in particular has received considerable attention during the past twenty years.
An artificial neural network is a massively parallel & distributed processor that has a natural propensity for storing experiential knowledge & making it available for use.
The underlying idea is to implement a processor that works in a fashion similar to the human brain.
• NEURAL NETWORKS
A NN resembles the brain in two respects: first, the knowledge is acquired through a learning process, & second, inter-neuron connection strengths, known as weights, are used to store the knowledge.
The learning process involves modification of the connection weights to obtain a desired objective.
Major applications of neural networks can be categorized into five groups: pattern recognition, image processing, signal processing, system identification & control.
• NEURAL NETWORKS
There are a variety of definitions for artificial neural networks, each of which highlights some aspect of this methodology, such as:
• its similarity to its biological counterpart,
• its parallel computation capabilities, &
• its interaction with the outside world.
A neural network is a non-programmable dynamic system, with capabilities such as trainability & adaptivity, that can be trained to store, process & retrieve information. It also possesses the ability to learn & to generalize based on past observations.
• NEURAL NETWORKS
Neural networks owe their computing power to their parallel/distributed structure & the manner in which the activation functions are defined. This information processing ability provides the possibility of solving complex problems.
• Function approximation (I/O mapping): the ability to approximate any nonlinear function to the desired degree of accuracy.
• Learning & generalization: the ability to learn I/O patterns, extract the hidden relationships among the presented data, & provide acceptable responses to new data that the network has not yet experienced. This enables neural networks to provide models based on imprecise information.
• NEURAL NETWORKS
• Adaptivity: capable of modifying their memory, & thus their functionality, over time.
• Fault tolerance: due to their highly parallel/distributed structure, failure of a number of neurons to generate the correct response does not lead to failure of the overall performance of the system.
• NEURAL NETWORKS - DISADVANTAGES
• large dimensionality, which leads to memory restrictions;
• selection of an optimum configuration;
• convergence difficulties, especially when the solution is trapped in a local minimum;
• choice of training methodology;
• black-box representation, lack of explanation capabilities & transparency.
• NEURAL NETWORKS
A neural network can be characterized in terms of:
• Neurons: the basic processing units, defining the manner in which computation is performed.
• Neuron activation functions: indicate the function of each neuron.
• Inter-neuron connection patterns: define the way neurons are connected to each other.
• Learning algorithms: define how knowledge is stored in the network.
• NEURON MODEL
The NN paradigm attempts to clone the physical structure & functionality of the biological neuron.
Artificial neurons, like their biological counterparts, receive inputs, [x1, x2, ..., xr], from the outside world or other neurons through incoming connections.
Each neuron then generates product terms, [wi·xi], using the inputs & connection weights ([w1, w2, ..., wr] represents the connection memory).
The product terms are then summed using an addition operator to produce the neuron's internal activity index, v(t).
• NEURON MODEL
This index is passed to an activation function, ϕ(.), which produces an output,
y(t).
$$v(t) = \sum_{i=1}^{r} w_i x_i \qquad (1)$$
$$y(t) = \varphi\bigl(v(t)\bigr) \qquad (2)$$
A more general model of the neuron functionality can be provided by the
introduction of a threshold measure, w0, for the activation function.
• NEURON MODEL
This signifies the scenario where a neuron generates an output if its input is
beyond the threshold (Fig. 2), i.e.,
$$y(t) = \varphi\!\left(\sum_{i=1}^{r} w_i x_i - w_0\right) \qquad (3)$$
This model is a simple yet useful approximation of the biological neuron & can be
used to develop different neural structures including feedforward & feedback
networks (Fig. 3).
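To make the model concrete, a minimal Python sketch of the thresholded neuron of Eq. (3); this is illustrative only (the sigmoid choice for ϕ(.) & all names are assumptions, not part of the original slides):

```python
import math

def neuron_output(x, w, w0):
    """Single-neuron model of Eq. (3): y = phi(sum_i w_i * x_i - w0)."""
    v = sum(wi * xi for wi, xi in zip(w, x)) - w0  # internal activity index v(t)
    return 1.0 / (1.0 + math.exp(-v))              # phi(.) chosen here as a sigmoid

# a 3-input neuron with threshold w0 = 0.3
print(neuron_output([0.5, -1.0, 2.0], [0.8, 0.2, 0.4], 0.3))
```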
Fig. 2: Nonlinear model of a neuron: the inputs x1, x2, …, xr are scaled by the synaptic weights wk1, …, wkr (synaptic operation), aggregated by the summation operator ∑ together with a ±1 threshold input (aggregation operation), & passed through the activation function ϕ(.) to produce the output yk (somatic operation).
• TYPES OF ACTIVATION FUNCTIONS
Each neuron includes a nonlinear function, known as the activation function, that
transforms several weighted input signals into a single numerical output signal.
The neuron activation function, ϕ(.), expresses the functionality of the neuron.
There are at least three main classes of activation function, including linear,
sigmoid & Gaussian.
Table 3.1 illustrates different types of activation functions.
• NEURAL NETWORK ARCHITECTURES
The manner in which neurons are connected together defines the architecture of
a neural network. These architectures can be classified into two main groups
(Fig. 3):
• Feedforward neural network
• Recurrent neural network
CP8206 – Soft Computing & Machine Intelligence
18
Fig. 3: Classification of different neural network structures: feedforward networks (single layer & multi layer, including the perceptron, radial basis function & lattice networks) & recurrent networks (single layer & multi layer, e.g., the Hopfield & Elman networks).
• FEEDFORWARD NEURAL NETWORK
The flow of information is from input to output.
• SINGLE LAYER NETWORK (Fig. 4):
The main body of the structure consists of only one layer (a one-dimensional vector) of neurons.
It can be considered a linear association network that relates output patterns to input patterns, as in the sketch following Fig. 4.
Fig. 4: Single layer feedforward neural network: the inputs x1, x2, …, xr feed a single layer of neurons, each with activation function ϕ(.), producing the outputs y1, y2, …, yr.
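A minimal sketch of the single-layer network of Fig. 4 (illustrative only; the tanh choice for ϕ(.) & the weight values are assumptions):

```python
import numpy as np

def single_layer_forward(x, W, phi=np.tanh):
    """Single-layer feedforward network (Fig. 4): every output neuron
    applies phi to its own weighted sum of the same inputs."""
    return phi(W @ x)              # row k of W holds the weights of output neuron k

x = np.array([1.0, 0.5, -0.2])
W = np.array([[0.2, -0.1, 0.4],
              [0.0, 0.3, -0.5]])   # 2 output neurons, 3 inputs
print(single_layer_forward(x, W))
```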
• A MULTI-LAYER NETWORK (Fig. 5):
The structure consists of two or more layers of neurons.
The function of the additional layers is to extract higher-order statistics.
The network acquires a global perspective despite its local connectivity, by virtue of the extra set of synaptic connections & the extra dimension of neural interaction.
A multi-layer network is specified by:
• the number of inputs & outputs, the number of layers,
• the number of neurons in each layer,
• the network connection pattern, &
• the activation function for each layer.
Fig. 5: Multi-layer feedforward neural network: the inputs x1, …, xp feed a first (hidden) layer of neurons with activation function ϕ1(.), whose outputs feed a second layer with activation function ϕ2(.), producing the outputs y1, …, yq.
• RECURRENT NEURAL NETWORK
A recurrent structure represents a network in which there is at least one feedback connection.
Fig. 6 depicts a multi-layer recurrent neural network, which is similar to the feedforward case except for the presence of feedback loops & the unit-delay operator, z⁻¹, which introduces the delay involved in feeding the output back to the input.
Fig. 6: Multi-layer recurrent neural network: the two-layer feedforward structure of Fig. 5 augmented with feedback connections that route the outputs y1, …, yq back to the input side through unit delays z⁻¹.
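A sketch of one time step of the recurrent structure in Fig. 6 (hypothetical weight names; the feedback path through z⁻¹ is modelled by carrying the previous output into the next call):

```python
import numpy as np

def recurrent_step(x_t, y_prev, W_in, W_fb, phi=np.tanh):
    """One step of a recurrent layer (Fig. 6): the delayed output
    y(t-1), supplied by the unit delay z^-1, re-enters the layer."""
    return phi(W_in @ x_t + W_fb @ y_prev)

W_in = np.array([[0.5, -0.2, 0.1],
                 [0.3, 0.4, -0.1]])   # input weights (2 neurons, 3 inputs)
W_fb = np.array([[0.2, 0.0],
                 [-0.1, 0.3]])        # feedback weights (2 x 2)
y = np.zeros(2)                       # y(0): no feedback yet
for t in range(3):
    y = recurrent_step(np.array([1.0, 0.0, 0.5]), y, W_in, W_fb)
    print(t, y)
```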
Table 3.1: Neural Network Activation Functions
• Piecewise linear: $f(x) = -1$ if $x < -b$; $a\,x$ if $|x| \le b$; $+1$ if $x > b$
• Linear: $f(x) = a\,x$
• Indicator: $f(x) = \mathrm{sgn}(x)$
• Sigmoid: $f(x) = \dfrac{1}{1 + e^{-a\,x}}$
• Bipolar sigmoid: $f(x) = \tanh(a\,x)$
• Gaussian: $f(x) = e^{-x^2/(2\sigma^2)}$
(Each function is accompanied by its plot in the original table; the plots are omitted here.)
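For reference, the activation functions of Table 3.1 written out in Python (a, b & sigma are the shape parameters of the table; vectorized with NumPy):

```python
import numpy as np

def piecewise_linear(x, a=1.0, b=1.0):
    """f(x) = -1 for x < -b, a*x for |x| <= b, +1 for x > b."""
    return np.where(x < -b, -1.0, np.where(x > b, 1.0, a * x))

def linear(x, a=1.0):          return a * x
def indicator(x):              return np.sign(x)                 # sgn(x)
def sigmoid(x, a=1.0):         return 1.0 / (1.0 + np.exp(-a * x))
def bipolar_sigmoid(x, a=1.0): return np.tanh(a * x)
def gaussian(x, sigma=1.0):    return np.exp(-x**2 / (2 * sigma**2))

x = np.linspace(-10, 10, 5)
for f in (piecewise_linear, linear, indicator, sigmoid, bipolar_sigmoid, gaussian):
    print(f.__name__, f(x))
```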
• MULTI-LAYER PERCEPTRON (MLP)
A class of NNs that consists of one input layer together with one output layer, which represent the system inputs & outputs respectively, & one or more hidden layers that provide the learning capability for the network (Fig. 7).
The basic element of an MLP network is an artificial neuron whose activation function, for the hidden layers, is a smooth, differentiable function (usually a sigmoid). The neurons in the output layer have a linear activation function.
Fig. 7: General structure of a Multi-Layer Perceptron network, illustrating the concept of input, hidden & output layers. The inputs x1, …, xn feed m hidden neurons with sigmoid activation $g(x) = \frac{1}{1+e^{-x}}$ & a linear output layer, so that
$$f(x_1,\ldots,x_n) = \sum_{j=1}^{m} \omega_j \cdot g\!\left(\sum_{i=1}^{n} w_{ij} x_i - \theta_j\right)$$
where $w_{ij}$, $b_{ij}$ are the hidden-layer weights & biases (i = 1,…,n indexes the inputs; j = 1,…,m indexes the hidden neurons) & $\omega_{jk}$ are the output-layer weights (k indexes the outputs).
• MLP
The output of an MLP network, therefore, can be represented as follows:
$$F(x_1,\ldots,x_p) = \underbrace{\sum_{i=1}^{M} \omega_i \cdot \overbrace{g\!\Bigl(\underbrace{\sum_{j=1}^{p} w_{ij} x_j - \theta_i}_{\text{internal activation}}\Bigr)}^{\text{hidden layer output}}}_{\text{output layer output}} \qquad (4)$$
where F(⋅) is the network output, [x1,…,xp] is the input vector having p inputs, M denotes the number of hidden neurons, w represents the hidden-layer connection weights, θ is the threshold value associated with the hidden neurons, & ω represents the output-layer connection weights, which in effect serve as the coefficients of the linear output function.
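As a concrete (hypothetical) rendering of Eq. (4), a forward pass with sigmoid hidden units & a linear output; the dimensions & random parameters are made up for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x, W, theta, omega):
    """MLP output of Eq. (4):
    F(x) = sum_i omega_i * g(sum_j w_ij x_j - theta_i),
    with a sigmoid hidden layer & a linear output layer."""
    hidden = sigmoid(W @ x - theta)   # M hidden activations
    return omega @ hidden             # linear combination -> scalar output

p, M = 3, 5                           # p inputs, M hidden neurons
rng = np.random.default_rng(0)
x     = rng.normal(size=p)
W     = rng.normal(size=(M, p))       # hidden-layer weights w_ij
theta = rng.normal(size=M)            # hidden thresholds
omega = rng.normal(size=M)            # output-layer weights
print(mlp_forward(x, W, theta, omega))
```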
• UNIVERSAL APPROXIMATION
It has been proven mathematically that standard multi-layer perceptron networks using arbitrary squashing functions are capable of approximating any continuous function from one finite-dimensional space to another to any desired degree of accuracy, provided sufficient hidden neurons are available.
A squashing function is a non-decreasing function that is defined as follows:
$$\sigma(t) \to \begin{cases} 1 & \text{as } t \to +\infty \\ 0 & \text{as } t \to -\infty \end{cases} \qquad (6)$$
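A quick numerical check (illustrative) that the logistic sigmoid satisfies the squashing property of Eq. (6):

```python
import math

def sigma(t):
    return 1.0 / (1.0 + math.exp(-t))    # logistic sigmoid

# sigma(t) -> 1 as t -> +infinity & sigma(t) -> 0 as t -> -infinity
for t in (-20.0, -5.0, 0.0, 5.0, 20.0):
    print(t, round(sigma(t), 8))
```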
• UNIVERSAL APPROXIMATION
It has been further shown that the approximation can be achieved using any multi-layer perceptron with only one hidden layer & a sigmoid function.
MLPs are a class of universal approximators & can be used successfully to solve difficult problems in diverse areas using the error back-propagation learning algorithm.
Furthermore, failure in learning can be attributed to factors such as inadequate learning, an insufficient number of hidden neurons, & the non-deterministic nature of the relationship between inputs & outputs.
• THE STONE-WEIERSTRASS THEOREM
The Stone-Weierstrass theorem can be used to prove that NNs are capable of uniformly approximating any real continuous function on a compact set to an arbitrary degree of accuracy. The theorem states that for any given real continuous function, f, on a compact set U ⊂ Rⁿ, there exists a NN, F, that is an approximate realization of the function f(⋅):
$$F(x_1,\ldots,x_p) = \sum_{i=1}^{M} \omega_i \cdot \varphi\!\left(\sum_{j=1}^{p} w_{ij} x_j - \theta_i\right) \qquad (7)$$
$$\bigl| F(x_1,\ldots,x_p) - f(x_1,\ldots,x_p) \bigr| < \varepsilon \qquad (8)$$
for all $\{x_1,\ldots,x_p\} \in U$, where ε denotes the approximation error, an arbitrarily small positive value.
• LEARNING PROCESS
Learning is accomplished through the associations between different I/O patterns. Regularities & irregularities in the training data are extracted, & consequently are validated using validation data.
It is achieved by stimulating the network using data representing the function to be learned & attempting to optimize a related performance measure.
It is assumed that the data represents a system that is deterministic in nature, with unknown probability distributions.
• LEARNING PROCESS
The fashion in which the parameters are adjusted determines the type of learning. There are two general learning paradigms (Fig. 8):
• Unsupervised learning
• Supervised learning
Unsupervised learning is not within the scope of this course & will not be discussed.
Fig. 8: A classification of learning algorithms: unsupervised learning (e.g., Kohonen self-organizing, Hebbian/associative & competitive learning) & supervised learning (e.g., back-propagation, the Widrow-Hoff rule & the perceptron rule).
• SUPERVISED LEARNING
The organization & training of a neural network by a combination of repeated presentations of input patterns & their associated output patterns.
This is equivalent to adjusting the network weights: in supervised learning, a set of training data is used to help the network arrive at appropriate connection weights.
This can be seen in the conventional delta rule, one of the early supervised algorithms, developed from the work of McCulloch & Pitts, & Rosenblatt. In this method, a training data set is always available that provides the system with the ideal output values for a set of known inputs, & the goal is to obtain the strength of each connection in the network, as in the sketch below.
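A minimal sketch of the delta-rule idea for a single linear unit (the learning rate eta, the toy data & all names are illustrative assumptions, not the original formulation):

```python
import numpy as np

def delta_rule_update(w, x, d, eta=0.1):
    """One delta-rule step for a linear unit: nudge each weight
    in proportion to the output error (d - y) & the input x."""
    y = w @ x                 # actual output for the known input x
    return w + eta * (d - y) * x

w = np.zeros(3)
# one training pair: known input & ideal (desired) output
x, d = np.array([1.0, 0.5, -1.0]), 2.0
for _ in range(50):
    w = delta_rule_update(w, x, d)
print(w @ x)                  # approaches the desired output d = 2.0
```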
• BACK-PROPAGATION
The best-known supervised learning algorithm. This learning rule was first developed by Werbos & later improved by Rumelhart et al.
Learning is done on the basis of direct comparison of the output of the network with known correct answers.
It is an efficient method of computing the change in each connection weight in a multi-layer network so as to reduce the error in the outputs, & it works by propagating errors backwards from the output layer to the input layer.
• BACK-PROPAGATION
Assume that wji denotes the connection weight from the ith neuron to the jth, xj signifies the input to the jth neuron, yj represents the corresponding output, & dj is the desired output. Then:
Total input to unit j:
$$x_j = \sum_i y_i\, w_{ji} \qquad (9)$$
Output from unit j:
$$y_j = \frac{1}{1 + e^{-x_j}} \qquad (10)$$
The back-propagation algorithm attempts to minimize the global error which, for a given set of weights, is half the sum of the squared differences between the actual & desired outputs of the units, i.e.,
$$E = \frac{1}{2} \sum_c \sum_j \bigl( y_{j,c} - d_{j,c} \bigr)^2 \qquad (11)$$
where E denotes the global error, c indexes the training cases & j indexes the output units.
The error derivatives for all weights can be computed by working backwards from the output units after a case has been presented, & given the derivatives, the weights are updated to reduce the error.
Fig. 9: Basic idea of the back-propagation learning algorithm. For units j receiving inputs from units i, the error derivatives are obtained by the chain rule:
$$\frac{\partial E}{\partial y_j} = y_j - d_j$$
$$\frac{\partial E}{\partial x_j} = \frac{\partial E}{\partial y_j} \cdot \frac{\partial y_j}{\partial x_j} = \frac{\partial E}{\partial y_j}\, y_j (1 - y_j); \qquad \frac{\partial y_j}{\partial x_j} = y_j (1 - y_j)$$
$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial x_j} \cdot \frac{\partial x_j}{\partial w_{ji}} = \frac{\partial E}{\partial x_j}\, y_i; \qquad \frac{\partial x_j}{\partial w_{ji}} = y_i$$
$$\frac{\partial E}{\partial y_i} = \sum_j \frac{\partial E}{\partial x_j} \cdot \frac{\partial x_j}{\partial y_i} = \sum_j \frac{\partial E}{\partial x_j}\, w_{ji}; \qquad \frac{\partial x_j}{\partial y_i} = w_{ji}$$
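The derivatives of Fig. 9, transcribed into Python for a single sigmoid layer (a sketch; the variable names mirror the equations, while the layer sizes & values are made up):

```python
import numpy as np

def layer_gradients(y_i, w_ji, d):
    """Error derivatives of Fig. 9 for one sigmoid layer:
    forward pass via Eqs. (9)-(10), then the chain rule backwards."""
    x_j = w_ji @ y_i                        # Eq. (9): total input to unit j
    y_j = 1.0 / (1.0 + np.exp(-x_j))        # Eq. (10): output of unit j
    dE_dy_j = y_j - d                       # dE/dy_j
    dE_dx_j = dE_dy_j * y_j * (1.0 - y_j)   # dE/dx_j
    dE_dw_ji = np.outer(dE_dx_j, y_i)       # dE/dw_ji = dE/dx_j * y_i
    dE_dy_i = w_ji.T @ dE_dx_j              # dE/dy_i = sum_j dE/dx_j * w_ji
    return dE_dw_ji, dE_dy_i                # weight gradients & upstream error

y_i = np.array([0.2, 0.7, 0.1, 0.9])        # outputs of the previous layer
w_ji = np.array([[0.1, -0.3, 0.5, 0.2],
                 [0.4, 0.1, -0.2, 0.0]])
d = np.array([1.0, 0.0])                    # desired outputs
grads, upstream = layer_gradients(y_i, w_ji, d)
print(grads.shape, upstream.shape)          # (2, 4) (4,)
```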
• BACK-PROPAGATION
Back-propagation consists of two passes: forward & backward.
Forward pass: a training case is presented to the network. The training case itself consists of an input vector & its associated (desired) output.
Backward pass: starts when the output error, i.e., the difference between the desired & actual outputs, is propagated back through the network & changes are made to the connection weights in order to reduce the output error.
Different training cases are then presented to the network. The process of presenting epochs of training cases to the network continues until the average error over the entire training set reaches a defined error goal.
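Putting the two passes together, a bare-bones training loop over epochs (an illustrative sketch for a single sigmoid layer with no hidden units; the error goal, learning rate & toy data are assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(cases, W, eta=0.5, error_goal=1e-3, max_epochs=10000):
    """Epoch loop: forward pass, backward pass & weight update for
    each training case until the average error meets the error goal."""
    for epoch in range(max_epochs):
        total = 0.0
        for x, d in cases:                              # one training case
            y = sigmoid(W @ x)                          # forward pass
            err = y - d
            total += 0.5 * float(err @ err)             # Eq. (11) contribution
            W -= eta * np.outer(err * y * (1 - y), x)   # backward pass update
        if total / len(cases) < error_goal:             # average error vs goal
            return W, epoch + 1
    return W, max_epochs

cases = [(np.array([0.0, 1.0]), np.array([1.0])),
         (np.array([1.0, 0.0]), np.array([0.0]))]
W, epochs = train(cases, np.zeros((1, 2)))
print(epochs, [sigmoid(W @ x)[0] for x, _ in cases])
```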
Fig. 10: Basic presentation of the back-propagation learning algorithm as a flowchart:
1. Define the network structure, connection pattern, activation functions & performance measure; prepare the training & validation data.
2. Provide a stimulus from the training set to the network; feedforward flow of information generates the output & the performance measure.
3. If the performance measure is not satisfactory, the error is back-propagated through the network & changes proportional to the derivative of the error with respect to each weight are made to the synaptic weights; return to step 2.
4. Otherwise, provide a stimulus from the validation set to the network; feedforward flow of information generates the output & the performance measure.
5. If the validation performance is satisfactory, training ends; otherwise return to step 2.