Neural Networks: The Perceptron and Learning
The Biological Neuron
The brain is a collection of about 10 billion interconnected neurons. Each neuron is a cell that uses biochemical reactions to receive, process, and transmit information.
Each terminal button is connected to other neurons across a small gap called a synapse.
A neuron's dendritic tree is connected to about a thousand neighboring neurons. When one of those neurons fires, a positive or negative charge is received by one of the dendrites. The strengths of all the received charges are added together through the processes of spatial and temporal summation.
Model of a neuron
Each neuron within the network is usually a simple processing unit which takes one or more inputs through synapses (connecting links). Every input has an associated weight (strength) which modifies the strength of that input.
The adder sums all the weighted inputs and calculates an output to be passed on.
An activation function exists for limiting the output.
Neural computing requires a number of neurons to be connected together into a neural network. Neurons are arranged in layers.
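The neuron model just described (weighted inputs, an adder, and a limiting activation function) can be sketched in a few lines of Python. This is an illustrative sketch; the function names and the example weights are ours, not from any library:

```python
# Minimal sketch of the neuron model: weighted inputs, an adder, and an
# activation function that limits the output.
def neuron_output(inputs, weights, bias, activation):
    v = sum(w * x for w, x in zip(weights, inputs)) + bias  # adder
    return activation(v)                                    # limiter

def hard_limit(v):
    # Symmetric hard limiter: +1 if the summed input is non-negative, else -1.
    return 1 if v >= 0 else -1

print(neuron_output([1, 0], [0.4, 0.2], -0.6, hard_limit))  # -> -1
```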
Model of a neuron
In mathematical terms:
u = Σ_{i=1}^{m} w_i x_i
u = linear combiner output
v = u + b = induced local field (activation potential)
Affine transformation by the bias
If we want, we can consider the bias as just another input, with x_0 = +1 and w_0 = b:
v = Σ_{i=0}^{m} w_i x_i
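A quick sanity check that the two formulations agree; the integer weights here are made up for the demo:

```python
# Folding the bias in as an extra input x0 = +1 with weight w0 = b gives the
# same induced local field v as adding b explicitly.
w, b = [2, 3], -1
x = [1, 1]
v_explicit = sum(wi * xi for wi, xi in zip(w, x)) + b
v_augmented = sum(wi * xi for wi, xi in zip([b] + w, [1] + x))
print(v_explicit, v_augmented)  # -> 4 4
```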
Types of activation functions: Linear Transfer Function
The activation function is generally non-linear. Linear functions are limited because the output is simply proportional to the input.
Types of activation functions: Symmetric Hard Limit Transfer Function
Types of activation functions: Threshold function
φ(v) = 1 if v ≥ 0, 0 if v < 0
Commonly known as the Heaviside function.
Types of activation functions: Satlin Transfer Function
Types of activation functions: Tan Sigmoid Function
Types of activation functions: Sigmoid function
One example is the logistic function:
φ(v) = 1 / (1 + exp(−a·v))
You can see that the function gets closer to the threshold function as the value of a increases.
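These activation functions can be compared numerically. The sketch below is illustrative (the function names are ours) and shows how the logistic curve steepens toward the threshold function as a grows:

```python
import math

def hard_limit(v):                  # symmetric hard limit: -1 or +1
    return 1 if v >= 0 else -1

def satlin(v):                      # saturating linear: clamp to [0, 1]
    return min(1.0, max(0.0, v))

def logistic(v, a=1.0):             # sigmoid: 1 / (1 + exp(-a*v))
    return 1.0 / (1.0 + math.exp(-a * v))

# As the slope parameter a increases, logistic(v, a) approaches a 0/1 step.
for a in (1, 5, 50):
    print(a, round(logistic(0.2, a), 4), round(logistic(-0.2, a), 4))
```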
Types of activation functions: Gauss Function
Rules of knowledge representation
Knowledge representation: Rules
Rule 1: Similar inputs from similar classes should usually produce similar representations inside the network, and should be classified into the same class.
Problem: how do you define similarity?
Input vector x = [x_1, x_2, …, x_m]^T
Euclidean distance: d(x_i, x_j) = ‖x_i − x_j‖ = sqrt( Σ_{k=1}^{m} (x_{ik} − x_{jk})² )
Knowledge representation: Rules
Rule 1 (cont.): how do you define similarity?
Dot product: (x_i, x_j) = x_i^T x_j = Σ_{k=1}^{m} x_{ik} x_{jk}
Usually we normalize the vectors to have unit length: ‖x_i‖ = ‖x_j‖ = 1.
Knowledge representation: Rules
Rule 1 (cont.): how do you define similarity?
For unit-length vectors the two measures agree, since ‖x_i − x_j‖² = 2 − 2·x_i^T x_j. So as the Euclidean distance approaches 0:
‖x_i − x_j‖² → 0
x_i^T x_j → 1
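The relationship between the two similarity measures can be checked numerically; the vectors below are arbitrary examples:

```python
import math

# For unit-length vectors, ||xi - xj||^2 = 2 - 2 * (xi . xj): a small
# Euclidean distance and a dot product near 1 express the same similarity.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

xi = normalize([3.0, 4.0])   # arbitrary example vectors
xj = normalize([4.0, 3.0])
dist_sq = sum((x - y) ** 2 for x, y in zip(xi, xj))
print(dist_sq, 2 - 2 * dot(xi, xj))  # the two sides agree
```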
Knowledge representation: Rules
Rule 2: Items to be classified as separate classes should be given widely different representations in the network.
Knowledge representation: Rules
Rule 3: If a particular feature is important, then there should be a large number of neurons involved in representing it.
Knowledge representation: Rules
Rule 4: Prior information and invariances should be built into the design of the neural network whenever possible.
This simplifies the design of the NN by not having to learn additional information:
Fewer free parameters to learn
Information transmission is faster
Cost is reduced
Knowledge representation: Rules
Rule 4 (cont.): How do we build prior information into a NN?
Unfortunately, there are no well-defined rules to do this. Some rules of thumb:
Restrict the network architecture, usually to local connections called receptive fields.
Constrain the choice of synaptic weights, usually achieved through weight sharing.
Knowledge representation: Invariances
The network should be invariant to trivial transformations of the inputs, e.g. rotation of a picture.
Techniques:
Invariance by structure: pick a structure that isn't sensitive to the meaningless transformations of the input.
Invariance by training: let the classifier learn invariances.
Invariance by feature space: pick a feature set that is invariant to the transformations.
Supervised Learning & Unsupervised Learning
Classical conditioning: Pavlov's dog
[Figures: stages of classical conditioning]
Supervised learning
In supervised training, both the inputs and the desired outputs are provided.
The network processes the inputs and compares its resulting outputs against the desired outputs.
Errors are then propagated back through the system, causing the system to adjust the weights that control the network.
This process occurs over and over as the weights are continually tweaked.
The set of data that enables the training is called the training set.
During the training of a network, the same set of data is processed many times as the connection weights are refined.
Example architectures: multilayer perceptrons
Unsupervised learning
In unsupervised training, the network is provided with inputs but not with desired outputs.
The system itself must then decide what features it will use to group the input data.
This is often referred to as self-organization or adaptation.
Example architectures: Kohonen self-organizing maps (SOM)
Rosenblatt’s Perceptron
Perceptron
Recall from the previous section: the NN is a linear combiner followed by a hard limiter.
Induced local field: v = Σ_{i=1}^{m} w_i x_i + b
Recap: we can represent the bias as a +1 input with weight b.
Thus we have a decision hyperplane:
Σ_{i=1}^{m} w_i x_i + b = 0
It is common to plot a map of the decision regions in the m-dimensional input space spanned by the m input variables x_1, x_2, …, x_m.
Using the Symmetric Hard Limit Transfer Function
φ(v) = +1 if v ≥ 0, −1 if v < 0
The perceptron neuron produces a 1 if the net input into the transfer function is equal to or greater than 0; otherwise it produces a -1.
Perceptron convergence algorithm
We start with:
Training vectors: (m+1) × 1 vectors x(n) = [+1, x_1(n), …, x_m(n)]^T
Weight vector: (m+1) × 1 vector w(n) = [b, w_1(n), …, w_m(n)]^T (the first component is the bias b)
Actual response y(n)
Desired response d(n)
Learning rate parameter η such that 0 < η ≤ 1
Perceptron convergence algorithm
1. Initialization: set w(0) = 0.
2. At step n, activate the perceptron by applying input vector x(n).
3. Compute the actual response y(n) = φ(w^T(n) x(n)).
4. Adapt the weight vector: w(n+1) = w(n) + η·[d(n) − y(n)]·x(n).
5. Increment n and go to step 2.
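The five steps above can be sketched directly. This is a minimal illustration (our own code, not a library routine) trained on the AND-gate data used in the example that follows; floating-point rounding may make the learned weights differ slightly from the hand trace, but the rule still converges to a separating line:

```python
def sgn(v):
    return 1 if v >= 0 else -1

def train_perceptron(samples, eta=0.1, epochs=20):
    w = [0.0, 0.0, 0.0]                                        # step 1: w(0) = 0
    for _ in range(epochs):
        for x, d in samples:                                   # step 2: apply x(n)
            y = sgn(sum(wi * xi for wi, xi in zip(w, x)))      # step 3: response
            w = [wi + eta * (d - y) * xi
                 for wi, xi in zip(w, x)]                      # step 4: adapt
    return w                                                   # step 5: loop over n

# AND gate, bias folded in as a leading +1 input; targets in {-1, +1}.
data = [([1, 0, 0], -1), ([1, 0, 1], -1), ([1, 1, 0], -1), ([1, 1, 1], 1)]
w = train_perceptron(data)
print([sgn(sum(wi * xi for wi, xi in zip(w, x))) for x, _ in data])  # -> [-1, -1, -1, 1]
```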
Perceptron convergence algorithm: Example, AND gate

n | x1 | x2 | d(n)
0 | 0  | 0  | -1
1 | 0  | 1  | -1
2 | 1  | 0  | -1
3 | 1  | 1  | +1
Perceptron convergence algorithm: Example
[Figure: perceptron with inputs x1, x2 and a +1 bias input]
Perceptron convergence algorithm: Example
We start with training vectors, (m+1) × 1:
x(0) = [+1, 0, 0]^T
x(1) = [+1, 0, +1]^T
x(2) = [+1, +1, 0]^T
x(3) = [+1, +1, +1]^T
Perceptron convergence algorithm: Example
We start with:
Bias b = 0
Learning rate parameter η such that 0 < η ≤ 1: η = 0.1
Desired responses:
d(0) = −1, d(1) = −1, d(2) = −1, d(3) = +1
Perceptron convergence algorithm: Example
Initialization: set w(0) = [b, w_1, w_2]^T = [0, 0, 0]^T.
n = 0: activate the perceptron by applying input vector
x(0) = [+1, 0, 0]^T
v(0) = w^T(0) x(0) = [0, 0, 0]·[+1, 0, 0]^T = 0
Perceptron convergence algorithm: Example
Compute the actual response:
y(0) = φ(v(0)) = φ(0) = 1
Perceptron convergence algorithm: Example
Adapt the weight vector: e(0) = d(0) − y(0) = −1 − 1 = −2
w_b = 0 + 0.1·(−2)·1 = −0.2
w_1 = 0 + 0.1·(−2)·0 = 0
w_2 = 0 + 0.1·(−2)·0 = 0
w(1) = [−0.2, 0, 0]^T
Perceptron convergence algorithm: Example
Increment n and go to step 2.
n = 1: x(1) = [+1, 0, +1]^T
v(1) = w^T(1) x(1) = [−0.2, 0, 0]·[+1, 0, +1]^T = −0.2
Perceptron convergence algorithm: Example
Compute the actual response:
y(1) = φ(−0.2) = −1
Perceptron convergence algorithm: Example
Adapt the weight vector: e(1) = d(1) − y(1) = −1 − (−1) = 0, so the weights are unchanged:
w(2) = w(1) = [−0.2, 0, 0]^T
Perceptron convergence algorithm: Example
Increment n and go to step 2.
n = 2: x(2) = [+1, +1, 0]^T
v(2) = [−0.2, 0, 0]·[+1, +1, 0]^T = −0.2
Perceptron convergence algorithm: Example
Compute the actual response:
y(2) = φ(−0.2) = −1
Perceptron convergence algorithm: Example
Adapt the weight vector: e(2) = d(2) − y(2) = −1 − (−1) = 0, so the weights are unchanged:
w(3) = w(2) = [−0.2, 0, 0]^T
Perceptron convergence algorithm: Example
Increment n and go to step 2.
n = 3: x(3) = [+1, +1, +1]^T
v(3) = [−0.2, 0, 0]·[+1, +1, +1]^T = −0.2
Perceptron convergence algorithm: Example
Compute the actual response:
y(3) = φ(−0.2) = −1
Perceptron convergence algorithm: Example
Adapt the weight vector: e(3) = d(3) − y(3) = 1 − (−1) = 2
w_b = −0.2 + 0.1·2·1 = 0
w_1 = 0 + 0.1·2·1 = 0.2
w_2 = 0 + 0.1·2·1 = 0.2
w(4) = [0, 0.2, 0.2]^T
Perceptron convergence algorithm: Example

n | X1 X2 xb | d(n) | W1 W2 Wb | C1 C2 Cb | s | Y(n) | e | η·e | new W1 W2 Wb
0 | 0 0 1 | -1 | 0 0 0    | 0 0 0    | 0    | 1  | -2 | -0.2 | 0 0 -0.2
1 | 0 1 1 | -1 | 0 0 -0.2 | 0 0 -0.2 | -0.2 | -1 | 0  | 0    | 0 0 -0.2
2 | 1 0 1 | -1 | 0 0 -0.2 | 0 0 -0.2 | -0.2 | -1 | 0  | 0    | 0 0 -0.2
3 | 1 1 1 | 1  | 0 0 -0.2 | 0 0 -0.2 | -0.2 | -1 | 2  | 0.2  | 0.2 0.2 0

Per sensor: C1 = x1·W1, C2 = x2·W2, Cb = xb·Wb; sum s = C1 + C2 + Cb.
Threshold function: Y(n) = 1 if s ≥ 0, else -1. Error e = d(n) − Y(n); correction per sensor = η·e.
Perceptron convergence algorithm: Example

n  | X1 X2 xb | d(n) | W1 W2 Wb      | C1 C2 Cb      | s    | Y(n) | e  | η·e  | new W1 W2 Wb
0  | 0 0 1 | -1 | 0 0 0          | 0 0 0         | 0    | 1  | -2 | -0.2 | 0 0 -0.2
1  | 0 1 1 | -1 | 0 0 -0.2       | 0 0 -0.2      | -0.2 | -1 | 0  | 0    | 0 0 -0.2
2  | 1 0 1 | -1 | 0 0 -0.2       | 0 0 -0.2      | -0.2 | -1 | 0  | 0    | 0 0 -0.2
3  | 1 1 1 | 1  | 0 0 -0.2       | 0 0 -0.2      | -0.2 | -1 | 2  | 0.2  | 0.2 0.2 0
4  | 0 0 1 | -1 | 0.2 0.2 0      | 0 0 0         | 0    | 1  | -2 | -0.2 | 0.2 0.2 -0.2
5  | 0 1 1 | -1 | 0.2 0.2 -0.2   | 0 0.2 -0.2    | 0    | 1  | -2 | -0.2 | 0.2 0 -0.4
6  | 1 0 1 | -1 | 0.2 0 -0.4     | 0.2 0 -0.4    | -0.2 | -1 | 0  | 0    | 0.2 0 -0.4
7  | 1 1 1 | 1  | 0.2 0 -0.4     | 0.2 0 -0.4    | -0.2 | -1 | 2  | 0.2  | 0.4 0.2 -0.2
8  | 0 0 1 | -1 | 0.4 0.2 -0.2   | 0 0 -0.2      | -0.2 | -1 | 0  | 0    | 0.4 0.2 -0.2
9  | 0 1 1 | -1 | 0.4 0.2 -0.2   | 0 0.2 -0.2    | 0    | 1  | -2 | -0.2 | 0.4 0 -0.4
10 | 1 0 1 | -1 | 0.4 0 -0.4     | 0.4 0 -0.4    | 0    | 1  | -2 | -0.2 | 0.2 0 -0.6
11 | 1 1 1 | 1  | 0.2 0 -0.6     | 0.2 0 -0.6    | -0.4 | -1 | 2  | 0.2  | 0.4 0.2 -0.4
12 | 0 0 1 | -1 | 0.4 0.2 -0.4   | 0 0 -0.4      | -0.4 | -1 | 0  | 0    | 0.4 0.2 -0.4
13 | 0 1 1 | -1 | 0.4 0.2 -0.4   | 0 0.2 -0.4    | -0.2 | -1 | 0  | 0    | 0.4 0.2 -0.4
14 | 1 0 1 | -1 | 0.4 0.2 -0.4   | 0.4 0 -0.4    | 0    | 1  | -2 | -0.2 | 0.2 0.2 -0.6
15 | 1 1 1 | 1  | 0.2 0.2 -0.6   | 0.2 0.2 -0.6  | -0.2 | -1 | 2  | 0.2  | 0.4 0.4 -0.4
16 | 0 0 1 | -1 | 0.4 0.4 -0.4   | 0 0 -0.4      | -0.4 | -1 | 0  | 0    | 0.4 0.4 -0.4
17 | 0 1 1 | -1 | 0.4 0.4 -0.4   | 0 0.4 -0.4    | 0    | 1  | -2 | -0.2 | 0.4 0.2 -0.6
18 | 1 0 1 | -1 | 0.4 0.2 -0.6   | 0.4 0 -0.6    | -0.2 | -1 | 0  | 0    | 0.4 0.2 -0.6
19 | 1 1 1 | 1  | 0.4 0.2 -0.6   | 0.4 0.2 -0.6  | 0    | 1  | 0  | 0    | 0.4 0.2 -0.6
20 | 0 0 1 | -1 | 0.4 0.2 -0.6   | 0 0 -0.6      | -0.6 | -1 | 0  | 0    | 0.4 0.2 -0.6
21 | 0 1 1 | -1 | 0.4 0.2 -0.6   | 0 0.2 -0.6    | -0.4 | -1 | 0  | 0    | 0.4 0.2 -0.6
22 | 1 0 1 | -1 | 0.4 0.2 -0.6   | 0.4 0 -0.6    | -0.2 | -1 | 0  | 0    | 0.4 0.2 -0.6
23 | 1 1 1 | 1  | 0.4 0.2 -0.6   | 0.4 0.2 -0.6  | 0    | 1  | 0  | 0    | 0.4 0.2 -0.6

Per sensor: C1 = x1·W1, C2 = x2·W2, Cb = xb·Wb; sum s = C1 + C2 + Cb.
Threshold function: Y(n) = 1 if s ≥ 0, else -1. Error e = d(n) − Y(n); correction per sensor = η·e.
Perceptron convergence algorithm: Example
AND Perceptron
[Figure: the trained perceptron with weights w1 = 0.4, w2 = 0.2 and bias wb = -0.6]
Perceptron convergence algorithm: Example
Decision hyperplane:
0.4·x1 + 0.2·x2 − 0.6 = 0
[Figure: the decision line in the (x1, x2) plane, crossing the x1 axis at 1.5 and the x2 axis at 3]
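As a quick check, the learned line 0.4·x1 + 0.2·x2 − 0.6 = 0 classifies all four AND inputs correctly:

```python
# Evaluate the learned hyperplane 0.4*x1 + 0.2*x2 - 0.6 = 0 on the AND data:
# only (1, 1) falls on the +1 side.
w1, w2, wb = 0.4, 0.2, -0.6
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    v = w1 * x1 + w2 * x2 + wb
    print((x1, x2), 1 if v >= 0 else -1)
```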
Perceptron convergence rule
If there exists a set of connection weights w* which is able to perform the transformation d(n) = φ(w*^T x(n)), the perceptron learning rule will converge to some solution (which may or may not be the same as w*) in a finite number of steps for any initial choice of the weights.
Perceptron convergence algorithm: Example (initial weights W1 = W2 = Wb = 1)

n  | X1 X2 xb | d(n) | W1 W2 Wb      | C1 C2 Cb      | s    | Y(n) | e  | η·e  | new W1 W2 Wb
0  | 0 0 1 | -1 | 1 1 1          | 0 0 1         | 1    | 1  | -2 | -0.2 | 1 1 0.8
1  | 0 1 1 | -1 | 1 1 0.8        | 0 1 0.8       | 1.8  | 1  | -2 | -0.2 | 1 0.8 0.6
2  | 1 0 1 | -1 | 1 0.8 0.6      | 1 0 0.6       | 1.6  | 1  | -2 | -0.2 | 0.8 0.8 0.4
3  | 1 1 1 | 1  | 0.8 0.8 0.4    | 0.8 0.8 0.4   | 2    | 1  | 0  | 0    | 0.8 0.8 0.4
4  | 0 0 1 | -1 | 0.8 0.8 0.4    | 0 0 0.4       | 0.4  | 1  | -2 | -0.2 | 0.8 0.8 0.2
5  | 0 1 1 | -1 | 0.8 0.8 0.2    | 0 0.8 0.2     | 1    | 1  | -2 | -0.2 | 0.8 0.6 0
6  | 1 0 1 | -1 | 0.8 0.6 0      | 0.8 0 0       | 0.8  | 1  | -2 | -0.2 | 0.6 0.6 -0.2
7  | 1 1 1 | 1  | 0.6 0.6 -0.2   | 0.6 0.6 -0.2  | 1    | 1  | 0  | 0    | 0.6 0.6 -0.2
8  | 0 0 1 | -1 | 0.6 0.6 -0.2   | 0 0 -0.2      | -0.2 | -1 | 0  | 0    | 0.6 0.6 -0.2
9  | 0 1 1 | -1 | 0.6 0.6 -0.2   | 0 0.6 -0.2    | 0.4  | 1  | -2 | -0.2 | 0.6 0.4 -0.4
10 | 1 0 1 | -1 | 0.6 0.4 -0.4   | 0.6 0 -0.4    | 0.2  | 1  | -2 | -0.2 | 0.4 0.4 -0.6
11 | 1 1 1 | 1  | 0.4 0.4 -0.6   | 0.4 0.4 -0.6  | 0.2  | 1  | 0  | 0    | 0.4 0.4 -0.6
12 | 0 0 1 | -1 | 0.4 0.4 -0.6   | 0 0 -0.6      | -0.6 | -1 | 0  | 0    | 0.4 0.4 -0.6
13 | 0 1 1 | -1 | 0.4 0.4 -0.6   | 0 0.4 -0.6    | -0.2 | -1 | 0  | 0    | 0.4 0.4 -0.6
14 | 1 0 1 | -1 | 0.4 0.4 -0.6   | 0.4 0 -0.6    | -0.2 | -1 | 0  | 0    | 0.4 0.4 -0.6
15 | 1 1 1 | 1  | 0.4 0.4 -0.6   | 0.4 0.4 -0.6  | 0.2  | 1  | 0  | 0    | 0.4 0.4 -0.6

Per sensor: C1 = x1·W1, C2 = x2·W2, Cb = xb·Wb; sum s = C1 + C2 + Cb.
Threshold function: Y(n) = 1 if s ≥ 0, else -1. Error e = d(n) − Y(n); correction per sensor = η·e.
Perceptron convergence algorithm: Example
Decision hyperplane:
0.4·x1 + 0.4·x2 − 0.6 = 0
[Figure: the decision line in the (x1, x2) plane, crossing both axes at 1.5]
Simple logic functions: AND
[Figure: AND in the (x1, x2) plane; the two classes are linearly separable]
Simple logic functions: OR
[Figure: OR in the (x1, x2) plane; the two classes are linearly separable]
Simple logic functions: XOR?
[Figure: XOR in the (x1, x2) plane; no single line separates the two classes]
A simple perceptron can't represent a logical XOR function.
Add a hidden layer
[Figure: two-layer network for XOR; hidden neuron with bias -0.5, output neuron with bias -0.5 and weight -2 on the hidden output]
XOR

n | X1 X2 | b1 b2       | d(n) | V1(n) Y1(n) | V2(n) Y2(n)
0 | -1 -1 | -0.5 -0.5 | -1 | -2.5 -1 | -0.5 -1
1 | -1  1 | -0.5 -0.5 |  1 | -0.5 -1 |  1.5  1
2 |  1 -1 | -0.5 -0.5 |  1 | -0.5 -1 |  1.5  1
3 |  1  1 | -0.5 -0.5 | -1 |  1.5  1 | -0.5 -1

Layer 1 (hidden): V1 = x1 + x2 + b1; Y1(n) = 1 if V1 ≥ 0, else -1.
Layer 2 (output): V2 = x1 + x2 − 2·Y1 + b2; Y2(n) = 1 if V2 ≥ 0, else -1.
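The network in the table can be written out directly. The layer equations below are our reconstruction from the V1 and V2 columns (hidden unit v1 = x1 + x2 − 0.5, output v2 = x1 + x2 − 2·y1 − 0.5), so treat the exact weights as a reading of the figure rather than a definitive specification:

```python
def sgn(v):
    return 1 if v >= 0 else -1

def xor_net(x1, x2):
    # Layer 1: hidden neuron with bias b1 = -0.5.
    y1 = sgn(x1 + x2 - 0.5)
    # Layer 2: output neuron with bias b2 = -0.5 and weight -2 on y1.
    y2 = sgn(x1 + x2 - 2 * y1 - 0.5)
    return y2

for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print((x1, x2), xor_net(x1, x2))  # XOR in the {-1, +1} encoding
```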
XOR: Decision hyperplane

n | X1 X2 | Y1(n) | Y2(n)
0 | -1 -1 | -1 | -1
1 | -1  1 | -1 |  1
2 |  1 -1 | -1 |  1
3 |  1  1 |  1 | -1

[Figure: the hidden output Y1 adds a dimension in which the four points become linearly separable]
Perceptron with a hidden layer
By adding the extra dimension, we made the XOR problem a separable case.
Multi-layer networks are more powerful in their expressive ability.
However, we can't use the same learning algorithm as before: the system is no longer linear, and the training algorithm does not converge.
In the next lecture, we will learn how to handle this.
References
The lecture slides are based on the slides prepared by Dr. Chathura De Silva and Dr. Upali Kohomban for this class in previous years.