CS407 Neural Computation
Lecture 2: Neurobiology and Architectures of ANNs
Lecturer: A/Prof. M. Bennamoun
NERVOUS SYSTEM & HUMAN BRAIN
Organization of the nervous system
• Central Nervous System
– Spinal cord
– Brain
• Hindbrain & Midbrain: brain stem & cerebellum
• Forebrain
– Sub-cortical structures: thalamus, hypothalamus, limbic system
– Cortex: frontal, parietal, occipital & temporal lobes in left & right hemispheres
THE BIOLOGICAL NEURON
The Structure of Neurons
[Figure: a neuron — cell body, nucleus, dendrites, axon, synapse]
The Structure of Neurons
• A neuron has a cell body, a branching input structure (the dendrites) and a branching output structure (the axon).
• Axons connect to dendrites via synapses.
• Electro-chemical signals are propagated from the dendritic input, through the cell body, and down the axon to other neurons.
The Structure of Neurons
• A neuron only fires if its input signal exceeds a certain amount (the threshold) within a short time period.
• Synapses vary in strength:
– Good connections allow a large signal.
– Slight connections allow only a weak signal.
– Synapses can be either excitatory or inhibitory.
Neurotransmission
http://www.health.org/pubs/qdocs/mom/TG/intro.htm
Neurons come in many shapes & sizes
The brain’s plasticity
The ability of the brain to alter its neural pathways.
• Recovery from brain damage
– Dead neurons are not replaced, but branches of the axons of healthy neurons can grow into the pathways and take over the functions of damaged neurons.
– Equipotentiality: more than one area of the brain may be able to control a given function.
– The younger the person, the better the recovery (e.g. recovery from left hemispherectomy).
THE ARTIFICIAL NEURON: Model
Models of Neuron
A neuron is an information processing unit with:
• A set of synapses or connecting links
– each characterized by a weight or strength
• An adder
– sums the input signals, weighted by the synapses
– a linear combiner
• An activation function
– also called a squashing function
– squashes (limits) the output to some finite value
Nonlinear model of a neuron (I)
[Figure: input signals x1 … xm pass through synaptic weights wk1 … wkm into a summing junction Σ (together with a bias bk), then through an activation function ϕ(.) to produce the output yk]

vk = Σ (j = 1 … m) wkj xj + bk

yk = ϕ(vk)
Analogy
• Inputs represent synapses
• Weights represent the strengths of synaptic links
• wi represents dendrite secretion
• The summation block represents the addition of the secretions
• The output represents the axon voltage
Nonlinear model of a neuron (II)
[Figure: same model, with the bias absorbed as an extra input x0 = +1 carrying the weight wk0 = bk]

vk = Σ (j = 0 … m) wkj xj, where x0 = +1 and wk0 = bk (bias)

yk = ϕ(vk)
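Models (I) and (II) are equivalent: the bias can be absorbed as an extra weight wk0 = bk on a fixed input x0 = +1. A minimal sketch of that equivalence (function names are ours, not from the slides):

```python
import numpy as np

def neuron_model_1(x, w, b):
    """Model (I): vk = sum_j wkj*xj + bk, then yk = phi(vk) (sigmoid here)."""
    v = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-v))

def neuron_model_2(x, w, b):
    """Model (II): prepend x0 = +1 with weight wk0 = bk, sum from j = 0."""
    x_aug = np.concatenate(([1.0], x))
    w_aug = np.concatenate(([b], w))
    v = np.dot(w_aug, x_aug)
    return 1.0 / (1.0 + np.exp(-v))

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.2, 0.4, -0.1])
b = 0.3
assert abs(neuron_model_1(x, w, b) - neuron_model_2(x, w, b)) < 1e-12
```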
THE ARTIFICIAL NEURON: Activation Function
Types of Activation Function
[Figure: three plots of Oj against ini — the hard-limiting threshold function, the piecewise-linear function, and the sigmoid function]
• Hard-limiting threshold function: O = 1 if in > t, O = 0 if in ≤ t. Corresponds to the biological paradigm: the neuron either fires or not.
• Piecewise-linear function.
• Sigmoid function ('S'-shaped curve, differentiable): ϕ(v) = 1 / (1 + exp(−a v)), where a is the slope parameter.
Activation Functions...
• Threshold or step function (McCulloch & Pitts model)
• Linear: neurons using a linear activation function are called ADALINEs in the literature (Widrow 1960)
• Sigmoidal functions: functions that more closely describe the non-linear behaviour of biological neurons
Activation Functions... sigmoid
[Figure: sigmoid curves for increasing β, all passing through 1/2 at v = 0]

ϕβ(v) = 1 / (1 + exp(−β v))

(i) if v → ∞ then ϕβ(v) → 1
(ii) if v → −∞ then ϕβ(v) → 0
(iii) if v is fixed and β → ∞ then ϕβ(v) → 1(v), where 1(v) is the modified Heaviside function
Activation Functions... sigmoid
H(ν) is the Heaviside function:

H(ν) = 1 if ν ≥ 0, 0 if ν < 0

1(ν) is the modified Heaviside function:

1(ν) = 1 if ν > 0, 1/2 if ν = 0, 0 if ν < 0

[Figure: plots of H(ν) and 1(ν)]
Activation Function value range
• Signum function (sign): output in {−1, +1}
• Hyperbolic tangent function: ϕ(v) = tanh(v), output in (−1, +1)
[Figure: plots of the signum and hyperbolic tangent functions]
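A minimal sketch of the activation functions above (function names and the piecewise-linear breakpoints are ours, for illustration):

```python
import math

def hard_limit(v, t=0.0):
    """Threshold (McCulloch & Pitts): fires iff the input exceeds t."""
    return 1.0 if v > t else 0.0

def piecewise_linear(v):
    """Linear on [-0.5, 0.5], saturating at 0 and 1 (assumed breakpoints)."""
    return max(0.0, min(1.0, v + 0.5))

def sigmoid(v, a=1.0):
    """phi(v) = 1 / (1 + exp(-a*v)); a is the slope parameter."""
    return 1.0 / (1.0 + math.exp(-a * v))

def signum(v):
    """Sign function with output range {-1, +1}."""
    return 1.0 if v >= 0 else -1.0

def tanh_act(v):
    """phi(v) = tanh(v), output in (-1, +1)."""
    return math.tanh(v)

assert sigmoid(0.0) == 0.5          # sigmoid passes through 1/2 at v = 0
assert sigmoid(1.0, a=50.0) > 0.999  # large slope approaches the hard limit
```

Note how increasing the slope parameter `a` makes the sigmoid approach the hard-limiting threshold, mirroring the β → ∞ limit above.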
Stochastic Model of a Neuron
• So far we have introduced only deterministic models of ANNs.
• A stochastic (probabilistic) model can also be defined.
• If x denotes the state of a neuron, then P(v) denotes the probability of firing the neuron, where v is the induced activation potential (bias + linear combination):

P(v) = 1 / (1 + exp(−v / T))
Stochastic Model of a Neuron…
• where T is a pseudo-temperature used to control the noise level (and therefore the uncertainty in firing)
• As T → 0, the stochastic model reduces to the deterministic model:

x = +1 if v ≥ 0
x = −1 if v < 0
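A short sketch of the firing probability above (our illustration), showing that a small pseudo-temperature T makes the neuron effectively deterministic:

```python
import math

def firing_probability(v, T):
    """P(v) = 1 / (1 + exp(-v/T)); T is the pseudo-temperature."""
    return 1.0 / (1.0 + math.exp(-v / T))

# At high T the neuron fires almost at random; as T -> 0 the rule
# collapses to the deterministic +1 / -1 threshold on v.
assert firing_probability(0.0, 1.0) == 0.5
assert firing_probability(1.0, 0.01) > 0.999    # effectively always fires
assert firing_probability(-1.0, 0.01) < 0.001   # effectively never fires
```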
DECISION BOUNDARIES
Decision boundaries
• In simple cases, divide feature space by drawing a hyperplane across it.
• Known as a decision boundary.
• Discriminant function: returns different values on opposite sides of the boundary (a straight line in two dimensions).
• Problems which can be classified in this way are linearly separable.
E.g. Decision Surface of a Perceptron
[Figure: in the (x1, x2) plane, one set of + and − points separable by a straight line (linearly separable), and another set of interleaved + and − points that no single line can separate (non-linearly separable)]
• A perceptron is able to represent some useful functions
• AND(x1, x2): choose weights w0 = −1.5, w1 = 1, w2 = 1
• But functions that are not linearly separable (e.g. XOR) are not representable
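Checking the quoted AND weights (w0 = −1.5, w1 = w2 = 1, with a fixed bias input x0 = +1 and a hard limiter at 0) — a minimal sketch:

```python
def perceptron(x1, x2, w0=-1.5, w1=1.0, w2=1.0):
    """Hard-limiting perceptron with bias input x0 = +1."""
    v = w0 * 1 + w1 * x1 + w2 * x2
    return 1 if v > 0 else 0

# The chosen weights implement AND exactly: only (1, 1) clears the threshold.
for x1 in (0, 1):
    for x2 in (0, 1):
        assert perceptron(x1, x2) == (x1 and x2)
```

No choice of (w0, w1, w2) can make this single unit compute XOR, since its decision surface is always a single straight line.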
Linear Separability
[Figure: class-A points A(x1,y1) … A(x7,y7) and class-B points B(x8,y8) … B(x11,y11) in the (X1, X2) plane, separated by the decision boundary]

x2 = −(w1/w2) x1 + t/w2
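The boundary line x2 = −(w1/w2) x1 + t/w2 can be recovered directly from the weights; a small sketch (weight values are ours, for illustration):

```python
def boundary_x2(x1, w1, w2, t):
    """Point on the decision boundary: x2 = -(w1/w2)*x1 + t/w2."""
    return -(w1 / w2) * x1 + t / w2

def classify(x1, x2, w1, w2, t):
    """Side A iff w1*x1 + w2*x2 > t, side B otherwise."""
    return 'A' if w1 * x1 + w2 * x2 > t else 'B'

w1, w2, t = 1.0, 1.0, 1.5
x1 = 0.5
x2_on_line = boundary_x2(x1, w1, w2, t)   # = 1.0 for these weights
# Points just above and just below the line fall on opposite sides (w2 > 0):
assert classify(x1, x2_on_line + 0.1, w1, w2, t) == 'A'
assert classify(x1, x2_on_line - 0.1, w1, w2, t) == 'B'
```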
Rugby players & Ballet dancers
[Figure: weight (kg, ~50–120) plotted against height (m, ~1–2); rugby players and ballet dancers form two clusters separable by a straight line]
Training the neuron
[Figure: inputs x1 and x2 with unknown weights W1 and W2, plus a bias input x0 = −1 with weight W0 = t, feeding a summing junction Σ and a hard limiter with output +1/−1]

f(ν) = 1 if ν > 0, 0 if ν = 0, −1 if ν < 0

The decision boundary is w0 x0 + w1 x1 + w2 x2 = 0, with x0 = −1 and w0 = t.

It is clear that:
(x, y) ∈ A iff x1 w1 + x2 w2 > t
(x, y) ∈ B iff x1 w1 + x2 w2 < t

Finding the wi is called learning.
THE ARTIFICIAL NEURON: Learning
Supervised Learning
– The desired response of the system is provided by a teacher, e.g. the distance ρ[d, o] as an error measure
– Estimate the negative error gradient direction and reduce the error accordingly
– Modify the synaptic weights to reduce the error in multidimensional weight space (stochastic minimization)
Unsupervised Learning (learning without a teacher)
– The desired response is unknown, so no explicit error information can be used to improve network behavior, e.g. finding the cluster boundaries of input patterns
– Suitable weight self-adaptation mechanisms have to be embedded in the trained network
Training
A linear threshold unit is used. W — weight value; t — threshold value.

Output = 1 if Σ (i = 0 …) wi xi > t, 0 otherwise
Simple network
AND with a biased input
[Figure: inputs X and Y with weights W2 = 1 and W3 = 1, plus a bias input −1 with weight W1 = 1.5; threshold t = 0.0]

Output = 1 if Σ wi xi > t, 0 otherwise
Learning algorithm

While epoch produces an error
    Present network with next inputs from epoch
    Error = T – O
    If Error <> 0 then
        Wj = Wj + LR * Ij * Error
    End If
End While
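The pseudocode above translates directly into Python. A sketch trained on AND, using the slide's bias convention (a fixed input I0 = −1 whose weight plays the role of the threshold) and the initial weights used later in the lecture:

```python
def train_and(lr=0.1, w=(0.3, 0.5, -0.4), max_epochs=100):
    """Perceptron learning rule Wj = Wj + LR * Ij * Error, on AND."""
    # Each pattern: inputs (I0 = -1 bias, I1, I2) and target T.
    data = [((-1, 0, 0), 0), ((-1, 0, 1), 0), ((-1, 1, 0), 0), ((-1, 1, 1), 1)]
    w = list(w)
    for _ in range(max_epochs):          # safety cap on epochs
        had_error = False
        for inputs, target in data:
            o = 1 if sum(wj * xj for wj, xj in zip(w, inputs)) > 0 else 0
            error = target - o           # Error = T - O
            if error != 0:
                had_error = True
                for j, xj in enumerate(inputs):
                    w[j] += lr * xj * error   # Wj = Wj + LR * Ij * Error
        if not had_error:                # epoch produced no error: done
            break
    return w

w = train_and()
# The learned weights classify all four AND patterns correctly:
for inputs, target in [((-1, 0, 0), 0), ((-1, 0, 1), 0), ((-1, 1, 0), 0), ((-1, 1, 1), 1)]:
    o = 1 if sum(wj * xj for wj, xj in zip(w, inputs)) > 0 else 0
    assert o == target
```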
Learning algorithm
Epoch : Presentation of the entire training set to the neural network. In the case of the AND function an epoch consists of four sets of inputs being presented to the network (i.e. [0,0], [0,1], [1,0], [1,1])
Error: The error value is the amount by which the value output by the network differs from the target value. For example, if we required the network to output 0 and it output a 1, then Error = -1
Learning algorithm
Target Value, T : When we are training a network we not only present it with the input but also with a value that we require the network to produce. For example, if we present the network with [1,1] for the AND function the target value will be 1
Output , O : The output value from the neuron
Ij : Inputs being presented to the neuron
Wj : Weight from input neuron (Ij) to the output neuron
LR : The learning rate, which dictates how quickly the network converges. It is set by experimentation; a typical value is 0.1
Training the neuron
For AND:
A B | Output
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1
[Figure: network with inputs x and y, bias input −1, unknown weights W1, W2, W3, threshold t = 0.0]
• What are the weight values?
• Initialize with random weight values
Training the neuron
For AND:
A B | Output
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1
[Figure: same network with initial weights W1 = 0.3 (on the −1 bias input), W2 = 0.5, W3 = −0.4; threshold t = 0.0]

I1 I2 I3 | Summation                               | Output
-1  0  0 | (-1*0.3) + (0*0.5) + (0*-0.4) = -0.3    | 0
-1  0  1 | (-1*0.3) + (0*0.5) + (1*-0.4) = -0.7    | 0
-1  1  0 | (-1*0.3) + (1*0.5) + (0*-0.4) =  0.2    | 1
-1  1  1 | (-1*0.3) + (1*0.5) + (1*-0.4) = -0.2    | 0
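Re-computing the first-epoch table with the stated initial weights — a sketch confirming that two patterns are misclassified (the third wrongly fires, the fourth wrongly stays silent), so the learning rule still has work to do:

```python
w = (0.3, 0.5, -0.4)                     # weights on (I1 = -1 bias, I2, I3)
rows = [(-1, 0, 0), (-1, 0, 1), (-1, 1, 0), (-1, 1, 1)]

outputs = []
for inputs in rows:
    v = sum(wj * xj for wj, xj in zip(w, inputs))   # weighted summation
    outputs.append(1 if v > 0 else 0)               # hard limit at t = 0

print(outputs)   # [0, 0, 1, 0] — matches the table; AND targets are [0, 0, 0, 1]
```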
Learning in Neural Networks
• Learn values of weights from I/O pairs
• Start with random weights
• Load a training example’s input
• Observe the computed output
• Modify the weights to reduce the difference
• Iterate over all training examples
• Terminate when the weights stop changing OR when the error is very small
NETWORK ARCHITECTURE/TOPOLOGY
Network Architecture
Single-layer Feedforward Networks
– input layer and output layer
– a single (computation) layer
– feedforward, acyclic
Multilayer Feedforward Networks
– hidden layers: hidden neurons and hidden units
– enable the network to extract higher-order statistics
– e.g. a 10-4-2 network, a 100-30-10-3 network
– fully connected layered network
Recurrent Networks
– at least one feedback loop
– with or without hidden neurons
Network Architecture
[Figure: a single-layer network; a multiple-layer fully connected network; a recurrent network without hidden units (with unit-delay operators on the feedback paths); a recurrent network with hidden units]
Feedforward Networks (static)
[Figure: input layer, hidden layers, output layer]
Feedforward Networks…
• One input and one output layer
• One or more hidden layers
• Each hidden layer is built from artificial neurons
• Each element of the preceding layer is connected with each element of the next layer
• There is no interconnection between artificial neurons of the same layer
• Finding the weights is a task which depends on the problem the specific network is meant to solve
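The structure above can be sketched as a layered forward pass: every unit in one layer feeds every unit in the next, with no connections within a layer. The 3-4-2 shape and random weights below are our illustration, not from the slides:

```python
import numpy as np

def forward(x, layers):
    """Fully connected feedforward pass; layers is a list of (W, b) pairs,
    with a sigmoid activation applied at every layer."""
    a = np.asarray(x, dtype=float)
    for W, b in layers:
        a = 1.0 / (1.0 + np.exp(-(W @ a + b)))
    return a

rng = np.random.default_rng(0)
# A hypothetical 3-4-2 network: 3 inputs, 4 hidden units, 2 outputs.
layers = [(rng.standard_normal((4, 3)), rng.standard_normal(4)),
          (rng.standard_normal((2, 4)), rng.standard_normal(2))]

y = forward([0.1, -0.2, 0.3], layers)
assert y.shape == (2,)                  # one value per output neuron
assert np.all((y > 0) & (y < 1))        # sigmoid outputs lie in (0, 1)
```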
Feedback Networks (Recurrent or dynamic systems)
[Figure: input layer, hidden layers, output layer, with feedback connections from later layers back to earlier ones]
Feedback Networks … (Recurrent or dynamic systems)
• The interconnections go in both directions between neurons, or via feedback loops.
• The Boltzmann machine is an example of a recurrent net; it is a generalization of Hopfield nets. Another example of recurrent nets: Adaptive Resonance Theory (ART) nets.
Neural network as directed Graph
[Figure: signal-flow graph of a neuron — inputs x0 = +1, x1, …, xm on edges weighted wk0 = bk, wk1, …, wkm, converging on vk and passing through ϕ(.) to the output yk]
Neural network as directed Graph…
The block diagram can be simplified by the idea of a signal-flow graph:
• a node is associated with a signal
• a directed link is associated with a transfer function
– synaptic links
• governed by a linear input-output relation
• the signal xj is multiplied by the synaptic weight wkj
– activation links
• governed by a nonlinear input-output relation
• the nonlinear activation function
Feedback
• The output determines in part its own value via feedback.
• Depending on w, the system is stable, linearly divergent, or exponentially divergent.
• We are interested in the case |w| < 1: infinite memory
– the output depends on inputs of the infinite past
• A NN with feedback loops is a recurrent network.
[Figure: signal-flow graph with input xj(n), internal signal xj’(n), feedback weight w, unit delay z⁻¹, and output yk(n)]

yk(n) = Σ (l = 0 … ∞) w^(l+1) xj(n − l)
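Unrolling the single-weight feedback loop gives the sum above: with |w| < 1 the contribution of old inputs decays geometrically (the "infinite memory" fades), so a finite history already approximates the infinite sum. A sketch (our illustration):

```python
def feedback_output(xs, w):
    """y(n) = sum_{l=0..n} w^(l+1) * x(n-l) over a finite input history xs."""
    n = len(xs) - 1
    return sum(w ** (l + 1) * xs[n - l] for l in range(n + 1))

# With a constant input x = 1 the sum is geometric and tends to w / (1 - w).
w = 0.5
y = feedback_output([1.0] * 50, w)
assert abs(y - w / (1 - w)) < 1e-9   # 50 terms already match the limit
```

For |w| ≥ 1 the same sum grows without bound, which is the divergent regime mentioned above.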
NEURAL PROCESSING
Neural Processing
• Recall
– The process of computing an output o for a given input x, performed by the ANN.
– Its objective is to retrieve the information, i.e., to decode the stored content which must have been encoded in the network previously.
• Autoassociation
– When a network is presented with a pattern similar to a member of the stored set, autoassociation associates the input pattern with the closest stored pattern.
Neural Processing…
• Autoassociation: reconstruction of an incomplete or noisy image.
• Heteroassociation:
– The network associates the input pattern with pairs of stored patterns.
Neural Processing…
• Classification
– A set of patterns is already divided into a number of classes, or categories.
– When an input pattern is presented, the classifier recalls the information regarding the class membership of the input pattern.
– The classes are expressed by discrete-valued output vectors, thus the output neurons of the classifier employ binary activation functions.
– A special case of heteroassociation.
• Recognition
– The desired response is the class number, but the input pattern doesn’t exactly correspond to any of the patterns in the stored set.
Neural Processing…
• Clustering
– Unsupervised classification of patterns/objects, without providing information about the actual classes.
– The network must discover for itself any existing patterns, regularities, separating properties, etc.
– While discovering these, the network undergoes changes of its parameters; this is called self-organization.
Summary
• Parallel distributed processing (especially a hardware-based neural net) is a good approach for complex pattern recognition (e.g. image recognition, forecasting, text retrieval, optimization).
• Less need to determine relevant factors a priori when building a neural network.
• Lots of training data are needed.
• High tolerance to noisy data; in fact, noisy data can enhance post-training performance.
• Difficult to verify or discern learned relationships, even with special knowledge extraction utilities developed for neural nets.