CS407 Neural Computation
Lecture 2: Neurobiology and Architectures of ANNs
Lecturer: A/Prof. M. Bennamoun
NERVOUS SYSTEM & HUMAN BRAIN
Organization of the nervous system
• Central Nervous System
– Spinal cord
– Brain
• Hindbrain & Midbrain: brain stem & cerebellum
• Forebrain
– Sub-cortical structures: thalamus, hypothalamus, limbic system
– Cortex: frontal, parietal, occipital & temporal lobes in left & right hemispheres
THE BIOLOGICAL NEURON
The Structure of Neurons
[Figure: a neuron — cell body, nucleus, dendrites, axon, synapse]
The Structure of Neurons
• A neuron has a cell body, a branching input structure (the dendrites) and a branching output structure (the axon).
• Axons connect to dendrites via synapses.
• Electro-chemical signals are propagated from the dendritic input, through the cell body, and down the axon to other neurons.
The Structure of Neurons
• A neuron only fires if its input signal exceeds a certain amount (the threshold) within a short time period.
• Synapses vary in strength:
– Good connections allow a large signal.
– Slight connections allow only a weak signal.
– Synapses can be either excitatory or inhibitory.
Neurotransmission
http://www.health.org/pubs/qdocs/mom/TG/intro.htm
Neurons come in many shapes & sizes
The brain’s plasticity
The ability of the brain to alter its neural pathways.
• Recovery from brain damage
– Dead neurons are not replaced, but branches of the axons of healthy neurons can grow into the pathways and take over the functions of damaged neurons.
– Equipotentiality: more than one area of the brain may be able to control a given function.
– The younger the person, the better the recovery (e.g. recovery from left hemispherectomy).
THE ARTIFICIAL NEURON: Model
Models of Neuron
A neuron is an information processing unit with:
• A set of synapses or connecting links
– each characterized by a weight or strength
• An adder
– sums the input signals, weighted by the synapses
– a linear combiner
• An activation function
– also called a squashing function
– squashes (limits) the output to some finite value
Nonlinear model of a neuron (I)
[Figure: input signals x1 … xm pass through synaptic weights wk1 … wkm into a summing junction Σ (together with a bias bk), then through an activation function ϕ(.) to produce the output yk]

vk = Σ (j = 1 … m) wkj xj + bk

yk = ϕ(vk)
Analogy
• Inputs represent synapses
• Weights represent the strengths of synaptic links
• wi represents dendrite secretion
• The summation block represents the addition of the secretions
• The output represents the axon voltage
Nonlinear model of a neuron (II)
[Figure: same model, with the bias absorbed as an extra input x0 = +1 carrying the weight wk0 = bk]

vk = Σ (j = 0 … m) wkj xj, where x0 = +1 and wk0 = bk (bias)

yk = ϕ(vk)
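Models (I) and (II) are equivalent: the bias can be absorbed as an extra weight wk0 = bk on a fixed input x0 = +1. A minimal sketch of that equivalence (function names are ours, not from the slides):

```python
import numpy as np

def neuron_model_1(x, w, b):
    """Model (I): vk = sum_j wkj*xj + bk, then yk = phi(vk) (sigmoid here)."""
    v = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-v))

def neuron_model_2(x, w, b):
    """Model (II): prepend x0 = +1 with weight wk0 = bk, sum from j = 0."""
    x_aug = np.concatenate(([1.0], x))
    w_aug = np.concatenate(([b], w))
    v = np.dot(w_aug, x_aug)
    return 1.0 / (1.0 + np.exp(-v))

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.2, 0.4, -0.1])
b = 0.3
assert abs(neuron_model_1(x, w, b) - neuron_model_2(x, w, b)) < 1e-12
```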
THE ARTIFICIAL NEURON: Activation Function
Types of Activation Function
[Figure: three plots of Oj against ini — the hard-limiting threshold function, the piecewise-linear function, and the sigmoid function]
• Hard-limiting threshold function: O = 1 if in > t, O = 0 if in ≤ t. Corresponds to the biological paradigm: the neuron either fires or not.
• Piecewise-linear function.
• Sigmoid function ('S'-shaped curve, differentiable): ϕ(v) = 1 / (1 + exp(−a v)), where a is the slope parameter.
Activation Functions...
• Threshold or step function (McCulloch & Pitts model)
• Linear: neurons using a linear activation function are called ADALINEs in the literature (Widrow 1960)
• Sigmoidal functions: functions that more closely describe the non-linear behaviour of biological neurons
Activation Functions... sigmoid
[Figure: sigmoid curves for increasing β, all passing through 1/2 at v = 0]

ϕβ(v) = 1 / (1 + exp(−β v))

(i) if v → ∞ then ϕβ(v) → 1
(ii) if v → −∞ then ϕβ(v) → 0
(iii) if v is fixed and β → ∞ then ϕβ(v) → 1(v), where 1(v) is the modified Heaviside function
Activation Functions... sigmoid
H(ν) is the Heaviside function:

H(ν) = 1 if ν ≥ 0, 0 if ν < 0

1(ν) is the modified Heaviside function:

1(ν) = 1 if ν > 0, 1/2 if ν = 0, 0 if ν < 0

[Figure: plots of H(ν) and 1(ν)]
Activation Function value range
• Signum function (sign): output in {−1, +1}
• Hyperbolic tangent function: ϕ(v) = tanh(v), output in (−1, +1)
[Figure: plots of the signum and hyperbolic tangent functions]
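A minimal sketch of the activation functions above (function names and the piecewise-linear breakpoints are ours, for illustration):

```python
import math

def hard_limit(v, t=0.0):
    """Threshold (McCulloch & Pitts): fires iff the input exceeds t."""
    return 1.0 if v > t else 0.0

def piecewise_linear(v):
    """Linear on [-0.5, 0.5], saturating at 0 and 1 (assumed breakpoints)."""
    return max(0.0, min(1.0, v + 0.5))

def sigmoid(v, a=1.0):
    """phi(v) = 1 / (1 + exp(-a*v)); a is the slope parameter."""
    return 1.0 / (1.0 + math.exp(-a * v))

def signum(v):
    """Sign function with output range {-1, +1}."""
    return 1.0 if v >= 0 else -1.0

def tanh_act(v):
    """phi(v) = tanh(v), output in (-1, +1)."""
    return math.tanh(v)

assert sigmoid(0.0) == 0.5          # sigmoid passes through 1/2 at v = 0
assert sigmoid(1.0, a=50.0) > 0.999  # large slope approaches the hard limit
```

Note how increasing the slope parameter `a` makes the sigmoid approach the hard-limiting threshold, mirroring the β → ∞ limit above.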
Stochastic Model of a Neuron
• So far we have introduced only deterministic models of ANNs.
• A stochastic (probabilistic) model can also be defined.
• If x denotes the state of a neuron, then P(v) denotes the probability of firing the neuron, where v is the induced activation potential (bias + linear combination):

P(v) = 1 / (1 + exp(−v / T))
Stochastic Model of a Neuron…
• where T is a pseudo-temperature used to control the noise level (and therefore the uncertainty in firing)
• As T → 0, the stochastic model reduces to the deterministic model:

x = +1 if v ≥ 0
x = −1 if v < 0
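A short sketch of the firing probability above (our illustration), showing that a small pseudo-temperature T makes the neuron effectively deterministic:

```python
import math

def firing_probability(v, T):
    """P(v) = 1 / (1 + exp(-v/T)); T is the pseudo-temperature."""
    return 1.0 / (1.0 + math.exp(-v / T))

# At high T the neuron fires almost at random; as T -> 0 the rule
# collapses to the deterministic +1 / -1 threshold on v.
assert firing_probability(0.0, 1.0) == 0.5
assert firing_probability(1.0, 0.01) > 0.999    # effectively always fires
assert firing_probability(-1.0, 0.01) < 0.001   # effectively never fires
```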
DECISION BOUNDARIES
Decision boundaries
• In simple cases, divide feature space by drawing a hyperplane across it.
• Known as a decision boundary.
• Discriminant function: returns different values on opposite sides of the boundary (a straight line in two dimensions).
• Problems which can be classified in this way are linearly separable.
E.g. Decision Surface of a Perceptron
[Figure: in the (x1, x2) plane, one set of + and − points separable by a straight line (linearly separable), and another set of interleaved + and − points that no single line can separate (non-linearly separable)]
• A perceptron is able to represent some useful functions
• AND(x1, x2): choose weights w0 = −1.5, w1 = 1, w2 = 1
• But functions that are not linearly separable (e.g. XOR) are not representable
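Checking the quoted AND weights (w0 = −1.5, w1 = w2 = 1, with a fixed bias input x0 = +1 and a hard limiter at 0) — a minimal sketch:

```python
def perceptron(x1, x2, w0=-1.5, w1=1.0, w2=1.0):
    """Hard-limiting perceptron with bias input x0 = +1."""
    v = w0 * 1 + w1 * x1 + w2 * x2
    return 1 if v > 0 else 0

# The chosen weights implement AND exactly: only (1, 1) clears the threshold.
for x1 in (0, 1):
    for x2 in (0, 1):
        assert perceptron(x1, x2) == (x1 and x2)
```

No choice of (w0, w1, w2) can make this single unit compute XOR, since its decision surface is always a single straight line.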
Linear Separability
[Figure: class-A points A(x1,y1) … A(x7,y7) and class-B points B(x8,y8) … B(x11,y11) in the (X1, X2) plane, separated by the decision boundary]

x2 = −(w1/w2) x1 + t/w2
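The boundary line x2 = −(w1/w2) x1 + t/w2 can be recovered directly from the weights; a small sketch (weight values are ours, for illustration):

```python
def boundary_x2(x1, w1, w2, t):
    """Point on the decision boundary: x2 = -(w1/w2)*x1 + t/w2."""
    return -(w1 / w2) * x1 + t / w2

def classify(x1, x2, w1, w2, t):
    """Side A iff w1*x1 + w2*x2 > t, side B otherwise."""
    return 'A' if w1 * x1 + w2 * x2 > t else 'B'

w1, w2, t = 1.0, 1.0, 1.5
x1 = 0.5
x2_on_line = boundary_x2(x1, w1, w2, t)   # = 1.0 for these weights
# Points just above and just below the line fall on opposite sides (w2 > 0):
assert classify(x1, x2_on_line + 0.1, w1, w2, t) == 'A'
assert classify(x1, x2_on_line - 0.1, w1, w2, t) == 'B'
```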
Rugby players & Ballet dancers
[Figure: weight (kg, ~50–120) plotted against height (m, ~1–2); rugby players and ballet dancers form two clusters separable by a straight line]
Training the neuron
[Figure: inputs x1 and x2 with unknown weights W1 and W2, plus a bias input x0 = −1 with weight W0 = t, feeding a summing junction Σ and a hard limiter with output +1/−1]

f(ν) = 1 if ν > 0, 0 if ν = 0, −1 if ν < 0

The decision boundary is w0 x0 + w1 x1 + w2 x2 = 0, with x0 = −1 and w0 = t.

It is clear that:
(x, y) ∈ A iff x1 w1 + x2 w2 > t
(x, y) ∈ B iff x1 w1 + x2 w2 < t

Finding the wi is called learning.
THE ARTIFICIAL NEURON: Learning
Supervised Learning
– The desired response of the system is provided by a teacher, e.g. the distance ρ[d, o] as an error measure
– Estimate the negative error gradient direction and reduce the error accordingly
– Modify the synaptic weights to reduce the error in multidimensional weight space (stochastic minimization)
Unsupervised Learning (learning without a teacher)
– The desired response is unknown, so no explicit error information can be used to improve network behavior, e.g. finding the cluster boundaries of input patterns
– Suitable weight self-adaptation mechanisms have to be embedded in the trained network
Training
A linear threshold unit is used. W — weight value; t — threshold value.

Output = 1 if Σ (i = 0 …) wi xi > t, 0 otherwise
Simple network
AND with a biased input
[Figure: inputs X and Y with weights W2 = 1 and W3 = 1, plus a bias input −1 with weight W1 = 1.5; threshold t = 0.0]

Output = 1 if Σ wi xi > t, 0 otherwise
Learning algorithm

While epoch produces an error
    Present network with next inputs from epoch
    Error = T – O
    If Error <> 0 then
        Wj = Wj + LR * Ij * Error
    End If
End While
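The pseudocode above translates directly into Python. A sketch trained on AND, using the slide's bias convention (a fixed input I0 = −1 whose weight plays the role of the threshold) and the initial weights used later in the lecture:

```python
def train_and(lr=0.1, w=(0.3, 0.5, -0.4), max_epochs=100):
    """Perceptron learning rule Wj = Wj + LR * Ij * Error, on AND."""
    # Each pattern: inputs (I0 = -1 bias, I1, I2) and target T.
    data = [((-1, 0, 0), 0), ((-1, 0, 1), 0), ((-1, 1, 0), 0), ((-1, 1, 1), 1)]
    w = list(w)
    for _ in range(max_epochs):          # safety cap on epochs
        had_error = False
        for inputs, target in data:
            o = 1 if sum(wj * xj for wj, xj in zip(w, inputs)) > 0 else 0
            error = target - o           # Error = T - O
            if error != 0:
                had_error = True
                for j, xj in enumerate(inputs):
                    w[j] += lr * xj * error   # Wj = Wj + LR * Ij * Error
        if not had_error:                # epoch produced no error: done
            break
    return w

w = train_and()
# The learned weights classify all four AND patterns correctly:
for inputs, target in [((-1, 0, 0), 0), ((-1, 0, 1), 0), ((-1, 1, 0), 0), ((-1, 1, 1), 1)]:
    o = 1 if sum(wj * xj for wj, xj in zip(w, inputs)) > 0 else 0
    assert o == target
```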
Learning algorithm
Epoch : Presentation of the entire training set to the neural network. In the case of the AND function an epoch consists of four sets of inputs being presented to the network (i.e. [0,0], [0,1], [1,0], [1,1])
Error: The error value is the amount by which the value output by the network differs from the target value. For example, if we required the network to output 0 and it output a 1, then Error = -1
Learning algorithm
Target Value, T : When we are training a network we not only present it with the input but also with a value that we require the network to produce. For example, if we present the network with [1,1] for the AND function the target value will be 1
Output , O : The output value from the neuron
Ij : Inputs being presented to the neuron
Wj : Weight from input neuron (Ij) to the output neuron
LR : The learning rate, which dictates how quickly the network converges. It is set by experimentation; a typical value is 0.1
Training the neuron
For AND:
A B | Output
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1
[Figure: network with inputs x and y, bias input −1, unknown weights W1, W2, W3, threshold t = 0.0]
• What are the weight values?
• Initialize with random weight values
Training the neuron
For AND:
A B | Output
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1
[Figure: same network with initial weights W1 = 0.3 (on the −1 bias input), W2 = 0.5, W3 = −0.4; threshold t = 0.0]

I1 I2 I3 | Summation                               | Output
-1  0  0 | (-1*0.3) + (0*0.5) + (0*-0.4) = -0.3    | 0
-1  0  1 | (-1*0.3) + (0*0.5) + (1*-0.4) = -0.7    | 0
-1  1  0 | (-1*0.3) + (1*0.5) + (0*-0.4) =  0.2    | 1
-1  1  1 | (-1*0.3) + (1*0.5) + (1*-0.4) = -0.2    | 0
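Re-computing the first-epoch table with the stated initial weights — a sketch confirming that two patterns are misclassified (the third wrongly fires, the fourth wrongly stays silent), so the learning rule still has work to do:

```python
w = (0.3, 0.5, -0.4)                     # weights on (I1 = -1 bias, I2, I3)
rows = [(-1, 0, 0), (-1, 0, 1), (-1, 1, 0), (-1, 1, 1)]

outputs = []
for inputs in rows:
    v = sum(wj * xj for wj, xj in zip(w, inputs))   # weighted summation
    outputs.append(1 if v > 0 else 0)               # hard limit at t = 0

print(outputs)   # [0, 0, 1, 0] — matches the table; AND targets are [0, 0, 0, 1]
```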
Learning in Neural Networks
• Learn values of weights from I/O pairs
• Start with random weights
• Load a training example’s input
• Observe the computed output
• Modify the weights to reduce the difference
• Iterate over all training examples
• Terminate when the weights stop changing OR when the error is very small
NETWORK ARCHITECTURE/TOPOLOGY
Network Architecture
Single-layer Feedforward Networks
– input layer and output layer
– a single (computation) layer
– feedforward, acyclic
Multilayer Feedforward Networks
– hidden layers: hidden neurons and hidden units
– enable the network to extract higher-order statistics
– e.g. a 10-4-2 network, a 100-30-10-3 network
– fully connected layered network
Recurrent Networks
– at least one feedback loop
– with or without hidden neurons
Network Architecture
[Figure: a single-layer network; a multiple-layer fully connected network; a recurrent network without hidden units (with unit-delay operators on the feedback paths); a recurrent network with hidden units]
Feedforward Networks (static)
[Figure: input layer, hidden layers, output layer]
Feedforward Networks…
• One input and one output layer
• One or more hidden layers
• Each hidden layer is built from artificial neurons
• Each element of the preceding layer is connected with each element of the next layer
• There is no interconnection between artificial neurons of the same layer
• Finding the weights is a task which depends on the problem the specific network is meant to solve
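The structure above can be sketched as a layered forward pass: every unit in one layer feeds every unit in the next, with no connections within a layer. The 3-4-2 shape and random weights below are our illustration, not from the slides:

```python
import numpy as np

def forward(x, layers):
    """Fully connected feedforward pass; layers is a list of (W, b) pairs,
    with a sigmoid activation applied at every layer."""
    a = np.asarray(x, dtype=float)
    for W, b in layers:
        a = 1.0 / (1.0 + np.exp(-(W @ a + b)))
    return a

rng = np.random.default_rng(0)
# A hypothetical 3-4-2 network: 3 inputs, 4 hidden units, 2 outputs.
layers = [(rng.standard_normal((4, 3)), rng.standard_normal(4)),
          (rng.standard_normal((2, 4)), rng.standard_normal(2))]

y = forward([0.1, -0.2, 0.3], layers)
assert y.shape == (2,)                  # one value per output neuron
assert np.all((y > 0) & (y < 1))        # sigmoid outputs lie in (0, 1)
```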
Feedback Networks (Recurrent or dynamic systems)
[Figure: input layer, hidden layers, output layer, with feedback connections from later layers back to earlier ones]
Feedback Networks … (Recurrent or dynamic systems)
• The interconnections go in both directions between neurons, or via feedback loops.
• The Boltzmann machine is an example of a recurrent net; it is a generalization of Hopfield nets. Another example of recurrent nets: Adaptive Resonance Theory (ART) nets.
Neural network as directed Graph
[Figure: signal-flow graph of a neuron — inputs x0 = +1, x1, …, xm on edges weighted wk0 = bk, wk1, …, wkm, converging on vk and passing through ϕ(.) to the output yk]
Neural network as directed Graph…
The block diagram can be simplified by the idea of a signal-flow graph:
• a node is associated with a signal
• a directed link is associated with a transfer function
– synaptic links
• governed by a linear input-output relation
• the signal xj is multiplied by the synaptic weight wkj
– activation links
• governed by a nonlinear input-output relation
• the nonlinear activation function
Feedback
• The output determines in part its own value via feedback.
• Depending on w, the system is stable, linearly divergent, or exponentially divergent.
• We are interested in the case |w| < 1: infinite memory
– the output depends on inputs of the infinite past
• A NN with feedback loops is a recurrent network.
[Figure: signal-flow graph with input xj(n), internal signal xj’(n), feedback weight w, unit delay z⁻¹, and output yk(n)]

yk(n) = Σ (l = 0 … ∞) w^(l+1) xj(n − l)
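Unrolling the single-weight feedback loop gives the sum above: with |w| < 1 the contribution of old inputs decays geometrically (the "infinite memory" fades), so a finite history already approximates the infinite sum. A sketch (our illustration):

```python
def feedback_output(xs, w):
    """y(n) = sum_{l=0..n} w^(l+1) * x(n-l) over a finite input history xs."""
    n = len(xs) - 1
    return sum(w ** (l + 1) * xs[n - l] for l in range(n + 1))

# With a constant input x = 1 the sum is geometric and tends to w / (1 - w).
w = 0.5
y = feedback_output([1.0] * 50, w)
assert abs(y - w / (1 - w)) < 1e-9   # 50 terms already match the limit
```

For |w| ≥ 1 the same sum grows without bound, which is the divergent regime mentioned above.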
NEURAL PROCESSING
Neural Processing
• Recall
– The process of computing an output o for a given input x, performed by the ANN.
– Its objective is to retrieve the information, i.e., to decode the stored content which must have been encoded in the network previously.
• Autoassociation
– When a network is presented with a pattern similar to a member of the stored set, autoassociation associates the input pattern with the closest stored pattern.
Neural Processing…
• Autoassociation: reconstruction of an incomplete or noisy image.
• Heteroassociation:
– The network associates the input pattern with pairs of stored patterns.
Neural Processing…
• Classification
– A set of patterns is already divided into a number of classes, or categories.
– When an input pattern is presented, the classifier recalls the information regarding the class membership of the input pattern.
– The classes are expressed by discrete-valued output vectors, thus the output neurons of the classifier employ binary activation functions.
– A special case of heteroassociation.
• Recognition
– The desired response is the class number, but the input pattern doesn’t exactly correspond to any of the patterns in the stored set.
Neural Processing…
• Clustering
– Unsupervised classification of patterns/objects, without providing information about the actual classes.
– The network must discover for itself any existing patterns, regularities, separating properties, etc.
– While discovering these, the network undergoes changes of its parameters; this is called self-organization.
Summary
• Parallel distributed processing (especially a hardware-based neural net) is a good approach for complex pattern recognition (e.g. image recognition, forecasting, text retrieval, optimization).
• Less need to determine relevant factors a priori when building a neural network.
• Lots of training data are needed.
• High tolerance to noisy data; in fact, noisy data can enhance post-training performance.
• Difficult to verify or discern learned relationships, even with special knowledge extraction utilities developed for neural nets.