1
Financial Informatics –XV:
Perceptron Learning
1
Khurshid Ahmad, Professor of Computer Science,
Department of Computer Science
Trinity College, Dublin 2, IRELAND
November 19th, 2008. https://www.cs.tcd.ie/Khurshid.Ahmad/Teaching.html
2
Widrow-Hoff Networks:Error-correction or Performance Learning
Widrow and Hoff showed that the weight update law, known variously as the Widrow-Hoff learning law, the LMS learning law and the delta rule, adjusts the weights in proportion to the error and the input:

w_new = w_old + η e(t) x(t)

η is a positive constant, also known as the rate of learning, usually selected by trial and error through the heuristic that if η is too large, w will not converge, and if η is too small, w will take a long time to converge.
Typically 0.01 ≤ η ≤ 10, with η = 0.1 as a usual starting point.
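As a sketch, one Widrow-Hoff update can be written in a few lines of Python; the function name `lms_update` and the default η = 0.1 are illustrative choices, not part of the lecture:

```python
# One Widrow-Hoff (LMS / delta-rule) update: w_new = w_old + eta * e(t) * x(t)
def lms_update(w, x, d, eta=0.1):
    net = sum(wi * xi for wi, xi in zip(w, x))   # weighted sum of the inputs
    e = d - net                                  # error signal e(t) = d - net
    return [wi + eta * e * xi for wi, xi in zip(w, x)]

# One training presentation: pattern x = (1, 1) with desired output d = 1.
w = lms_update([0.0, 0.0], [1.0, 1.0], d=1.0)    # -> [0.1, 0.1]
```

With η = 0.1 the weights move a tenth of the way towards cancelling the error on each presentation, which is why the choice of η trades off convergence against speed.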
3
Widrow-Hoff Networks:
Error-correction or Performance Learning
Widrow and Hoff showed that the weight update law, called variously Widrow/Hoff learning law, the LMS learning law and the delta rule: For each training cycle, t, the least mean square learning law says that
e(t) = d − net(x)
Δw = η e(t) x(t)
w(t+1) = w(t) + η e(t) x(t)
4
Widrow-Hoff Networks:
Error-correction or Performance Learning
The net input is calculated by computing the weighted sum of all the input patterns.
During each training cycle, the difference between the actual and desired output is minimised using the well-known least-squares minimisation technique, where an attempt is made to minimise the error energy E, the mean of the squared errors:

E = (1/n)(e1² + e2² + ... + en²)

Δw = −η ∂E/∂w

∂E/∂wj = −(2/n)(e1x1j + e2x2j + ... + enxnj)

Δwj = η(2/n)(e1x1j + e2x2j + ... + enxnj)
5
Widrow-Hoff Networks:
Error-correction or Performance Learning
Widrow and Hoff showed that the weight update law, called variously the Widrow-Hoff learning law, the LMS learning law and the delta rule: for each training cycle, t, the least mean square learning law says that

e(t) = d − net(x)
Δw = η e(t) x(t)
w(t+1) = w(t) + η e(t) x(t)
6
Widrow-Hoff Networks:
Error-correction or Performance Learning
More precisely, let us consider n patterns that a Widrow-Hoff network has to 'learn' to recognise correctly. For each weight in the network, say wj, Widrow and Hoff showed that

Δwj(t) = (η/n)(e1(t)x1j(t) + e2(t)x2j(t) + ... + en(t)xnj(t))
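The averaged update over the n patterns can be sketched as follows; the function name and sample data are illustrative:

```python
# One batch step of the rule above: delta_w_j = (eta/n) * sum_k e_k(t) * x_kj(t)
def batch_delta_update(w, patterns, eta=0.1):
    n = len(patterns)
    deltas = [0.0] * len(w)
    for x, d in patterns:                        # k-th pattern and its target
        net = sum(wi * xi for wi, xi in zip(w, x))
        e = d - net                              # error for the k-th pattern
        for j, xj in enumerate(x):
            deltas[j] += e * xj                  # accumulate e_k * x_kj
    return [wi + eta * dj / n for wi, dj in zip(w, deltas)]

w = batch_delta_update([0.0, 0.0], [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)], eta=1.0)
```

Averaging over the patterns means a single noisy example moves the weights less than it would under the one-pattern-at-a-time rule.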
7
Rosenblatt’s Perceptron
Rosenblatt, Selfridge and others generalised the McCulloch-Pitts form of the linear threshold law to laws in which the activities of all pathways (cf. dendrites) impinging on a neuron are computed, and the neuron fires whenever some weighted sum of those activities exceeds a given amount.
8
Rosenblatt’s Perceptron
In the early days of neural network modelling, considerable attention was paid to McCulloch and Pitts, who essentially incorporated the behaviouristic learning approach, that of interrelating stimuli and responses as a mechanism for learning, due originally to Donald Hebb, into a network of all-or-none neurons. This led a number of other workers to adopt this approach during the late 1940's. Prominent among these workers were
Rosenblatt (1962): PERCEPTRONS
Selfridge (1959): PANDEMONIUM
The modellers called these networks adaptive networks, in that the network adapted to its environment quite autonomously. Rosenblatt developed a network architecture, and successfully implemented aspects of it, which could make and learn choices between different patterns of sensory stimuli.
9
Rosenblatt’s Perceptron
Rosenblatt's Perceptron has the following 'properties':
(1) It can receive inputs from other neurons
(2) The 'recipient' neuron can integrate the input
(3) The connection weights are modelled as follows:
    If the presence of feature xi stimulates the perceptron to fire, then wi will be positive;
    if the presence of feature xi inhibits the perceptron, then wi will be negative.
(4) The output function of the neuron is all-or-none
(5) Learning is a process of modifying the weights
Whatever a neuron can compute, it can learn to compute!
10
Rosenblatt’s Perceptron
Rosenblatt's Perceptron is an early example of the so-called electronic neurons. The electronic neuron, a simulation of the biological neuron, had the following properties:
(1) It can receive inputs from a number of sources (~dendrites inputting onto a neuron), e.g. other neurons (e.g. sensory input to an inter-neuron). Typically the inputs are vector-like, i.e. a magnitude and a sign: x = {x1, x2, ..., xn}
(2) The 'recipient' electronic neuron can integrate the input, either by simply summing up the individual inputs or by weighing the individual inputs in proportion to their 'strength of connection' (wi) with the recipient (a biological neuron can filter, add, subtract and amplify the input), and then summing up the weighted input as g(x).
Usually the summation function g(x) has an additional weight w0, the threshold weight, which incorporates the propensity of the electronic neuron to fire irrespective of the input (the depolarisation of the biological neuron's membrane induced by an external stimulus results in the neuron responding with an action potential or impulse; the critical value of this depolarisation is called the threshold value).
11
Rosenblatt’s Perceptron
(3) The connection weights are modelled as follows:
(3a) If the presence of some features xi tends to make the perceptron fire, then wi will be positive;
(3b) If the presence of some features xi inhibits the perceptron, then wi will be negative.
(4) The output function of the electronic neuron (the impulse output along the axon terminals of biological neurons) is all-or-none output in that the output function:
output(x) = 1 if g(x) > 0
output(x) = 0 if g(x) < 0
(5) Learning, in electronic neurons, is a process of modifying the values of the weights (plasticity of synaptic connections) and the threshold.
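The summation g(x) with its threshold weight w0, and the all-or-none output of property (4), can be sketched as follows; the function names are illustrative:

```python
# Weighted summation with a threshold weight w0, and the all-or-none output.
def g(x, w, w0):
    return w0 + sum(wi * xi for wi, xi in zip(w, x))

def output(x, w, w0):
    return 1 if g(x, w, w0) > 0 else 0

# With weights (+1, +1) and threshold weight w0 = -1.5, the unit fires
# only when both inputs are 1 (it behaves as an AND unit).
y = output([1, 1], [1, 1], -1.5)    # -> 1
```

Folding the threshold into w0 is what lets learning modify the threshold and the weights by the same mechanism, as property (5) requires.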
12
Rosenblatt’s Perceptron
Rosenblatt was serious about using his perceptrons to build a computer system. Rosenblatt demonstrated that his perceptrons can LEARN to build four logic gates.
A combination of these gates, in turn, comprises the central processing unit of a computer (and other parts): Ergo, perceptrons can learn to build themselves into computer systems!!
Rosenblatt became very famous for suggesting that one can design a computer based on neuro-scientific evidence
13
Rosenblatt’s Perceptron
The XOR ‘problem’: The simple perceptron cannot learn a linear decision surface to separate the different outputs, because no such decision surface exists.
Non-linear relationships between inputs and outputs, such as that of an XOR gate, are used to simulate vision systems that can tell whether a line drawing is connected or not, and to separate figure from ground in a picture.
14
Rosenblatt’s Perceptron
Rosenblatt was serious about using his perceptrons to build a computer system. Rosenblatt demonstrated that his perceptrons can LEARN to build four logic gates. A combination of these gates, in turn, comprises the central processing unit of a computer: Ergo, perceptrons can learn to build themselves into computer systems!!
The logic gates can be traced back to George Boole (1815-1864), Professor of Mathematics at Queens College, Cork (now University College Cork). Boole developed an algebra for analysing logic and published it in his famous book:
‘An Investigation of the Laws of Thought, on Which Are Founded the Mathematical Theories of Logic and Probabilities’.
15
Rosenblatt’s Perceptron
The logic gates can be traced back to George Boole (1815-1864), Professor of Mathematics at Queens College, Cork (now University College Cork). Boole developed an algebra for analysing logic and published it in his famous book:
‘An Investigation of the Laws of Thought, on Which Are Founded the Mathematical Theories of Logic and Probabilities’.
Boole’s algebra of logic forms the basis of (computer) hardware design and includes the various processing units within.
Boolean logic is used to specify how key operations on a computer system, like addition, subtraction, comparison of two values and so on, are to be executed.
Boolean logic is an integral part of hardware design and hardware circuits are usually referred to as logic circuits.
16
Rosenblatt’s Perceptron
Logic Gate: A digital circuit that implements an elementary logical operation. It has one or more inputs but ONLY one output. The conditions applied to the input(s) determine the voltage levels at the output. The output, typically, has two values, ‘0’ or ‘1’.
Digital Circuit: A circuit that responds to discrete values of input (voltage) and produces discrete values of output (voltage).
Binary Logic Circuits: Extensively used in computers to carry out instructions and arithmetical processes. Any logical procedure may be effected by a suitable combination of the gates. Binary circuits are typically formed from components such as integrated circuits.
17
Rosenblatt’s Perceptron
Logic Circuits: Designed to perform a particular logical function based on AND, OR (either), and NOR (neither). Those circuits that operate between two discrete (input) voltage levels, high & low, are described as binary logic circuits.
Logic element: Small part of a logic circuit, typically, a logic gate, that may be represented by the mathematical operators in symbolic logic.
18
Rosenblatt’s Perceptron
Gate | Input(s)      | Output
AND  | Two (or more) | High if and only if both (or all) inputs are high.
NOT  | One           | High if input is low, and vice versa.
OR   | Two (or more) | High if any one (or more) of the inputs is high.
19
Rosenblatt’s Perceptron
The operation of an AND gate

Input 1 | Input 2 | Output
0       | 0       | 0
0       | 1       | 0
1       | 0       | 0
1       | 1       | 1

AND(x, y) = minimum_value(x, y);
AND(1, 0) = minimum_value(1, 0) = 0;
AND(1, 1) = minimum_value(1, 1) = 1
20
Rosenblatt’s Perceptron
A single-layer perceptron can perform a number of the logical operations that are performed by conventional computational devices.
A hard-wired perceptron performs the AND operation; it is hard-wired because the weights are predetermined and not learnt.
[Figure: inputs x1 and x2 with weights w1 = +1 and w2 = +1 feed a threshold unit with θ = 1.5; y = 1 if w1x1 + w2x2 − θ > 0, else y = 0.]
21
Rosenblatt’s Perceptron
A single-layer perceptron can perform a number of the logical operations that are performed by conventional computational devices.
A learning perceptron performs the AND operation.
An algorithm: train the network for a number of epochs.
(1) Set the initial weights w1 and w2 and the threshold θ to random numbers;
(2) Compute the weighted sum: x1*w1 + x2*w2 − θ;
(3) Calculate the output using a delta function: y(i) = delta(x1*w1 + x2*w2 − θ), where delta(x) = 1 if x ≥ 0 and delta(x) = 0 if x < 0;
(4) Compute the difference between the desired and actual output: e(i) = ydesired − y(i);
(5) If the errors during a training epoch are all zero then stop; otherwise update wj(i+1) = wj(i) + η*xj*e(i), j = 1, 2.
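The algorithm above can be sketched in Python. To reproduce the epoch tables that follow exactly, and to avoid floating-point round-off at the net = 0 boundary, this sketch scales every quantity by 10 and works in integers (so w = (3, -1) stands for (0.3, -0.1), theta = 2 for 0.2, eta = 1 for 0.1); following the tables, the step function fires when the net input is greater than or equal to zero:

```python
def step(x):
    return 1 if x >= 0 else 0        # the epoch tables fire on net >= 0

def train_and(w, theta=2, eta=1, max_epochs=100):
    """Train a two-input perceptron on the AND gate (all values scaled by 10)."""
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for (x1, x2), d in data:
            y = step(x1 * w[0] + x2 * w[1] - theta)   # actual output
            e = d - y                                 # error = desired - actual
            if e != 0:
                errors += 1
                w = (w[0] + eta * x1 * e, w[1] + eta * x2 * e)
        if errors == 0:
            return w, epoch          # an error-free epoch: training stops
    return w, max_epochs

w, epochs = train_and((3, -1))       # -> weights (1, 1), i.e. (0.1, 0.1), in 5 epochs
```

Run against the initial weights (0.3, -0.1) of the worked example, the sketch converges in five epochs to w1 = w2 = 0.1, matching the tables on the following slides.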
22
Rosenblatt’s Perceptron
A single-layer perceptron can perform a number of the logical operations that are performed by conventional computational devices:

η = 0.1, θ = 0.2

Epoch | X1 X2 | Ydesired | Initial weights W1, W2 | Actual output | Error | Final weights W1, W2
1     | 0 0   | 0        | 0.3, -0.1              | 0             | 0     | 0.3, -0.1
      | 0 1   | 0        | 0.3, -0.1              | 0             | 0     | 0.3, -0.1
      | 1 0   | 0        | 0.3, -0.1              | 1             | -1    | 0.2, -0.1
      | 1 1   | 1        | 0.2, -0.1              | 0             | 1     | 0.3, 0.0
23
Rosenblatt’s Perceptron
A single-layer perceptron can perform a number of the logical operations that are performed by conventional computational devices:

Epoch | X1 X2 | Ydesired | Initial weights W1, W2 | Actual output | Error | Final weights W1, W2
2     | 0 0   | 0        | 0.3, 0.0               | 0             | 0     | 0.3, 0.0
      | 0 1   | 0        | 0.3, 0.0               | 0             | 0     | 0.3, 0.0
      | 1 0   | 0        | 0.3, 0.0               | 1             | -1    | 0.2, 0.0
      | 1 1   | 1        | 0.2, 0.0               | 1             | 0     | 0.2, 0.0
24
Rosenblatt’s Perceptron
A single-layer perceptron can perform a number of the logical operations that are performed by conventional computational devices:

Epoch | X1 X2 | Ydesired | Initial weights W1, W2 | Actual output | Error | Final weights W1, W2
3     | 0 0   | 0        | 0.2, 0.0               | 0             | 0     | 0.2, 0.0
      | 0 1   | 0        | 0.2, 0.0               | 0             | 0     | 0.2, 0.0
      | 1 0   | 0        | 0.2, 0.0               | 1             | -1    | 0.1, 0.0
      | 1 1   | 1        | 0.1, 0.0               | 0             | 1     | 0.2, 0.1
25
Rosenblatt’s Perceptron
A single-layer perceptron can perform a number of the logical operations that are performed by conventional computational devices:

Epoch | X1 X2 | Ydesired | Initial weights W1, W2 | Actual output | Error | Final weights W1, W2
4     | 0 0   | 0        | 0.2, 0.1               | 0             | 0     | 0.2, 0.1
      | 0 1   | 0        | 0.2, 0.1               | 0             | 0     | 0.2, 0.1
      | 1 0   | 0        | 0.2, 0.1               | 1             | -1    | 0.1, 0.1
      | 1 1   | 1        | 0.1, 0.1               | 1             | 0     | 0.1, 0.1
26
Rosenblatt’s Perceptron
A single-layer perceptron can perform a number of the logical operations that are performed by conventional computational devices:

Epoch | X1 X2 | Ydesired | Initial weights W1, W2 | Actual output | Error | Final weights W1, W2
5     | 0 0   | 0        | 0.1, 0.1               | 0             | 0     | 0.1, 0.1
      | 0 1   | 0        | 0.1, 0.1               | 0             | 0     | 0.1, 0.1
      | 1 0   | 0        | 0.1, 0.1               | 0             | 0     | 0.1, 0.1
      | 1 1   | 1        | 0.1, 0.1               | 1             | 0     | 0.1, 0.1

All errors in epoch 5 are zero, so training stops with the final weights w1 = 0.1, w2 = 0.1.
27
Preamble: Neural networks 'learn' by adapting in accordance with a training regimen. Five key algorithms:
ERROR-CORRECTION OR PERFORMANCE LEARNING
HEBBIAN OR COINCIDENCE LEARNING
BOLTZMANN LEARNING (STOCHASTIC NET LEARNING)
COMPETITIVE LEARNING
FILTER LEARNING (GROSSBERG'S NETS)
29
Rosenblatt’s Perceptron
A single-layer perceptron can perform a number of the logical operations that are performed by conventional computational devices.
However, the single-layer perceptron cannot perform the exclusive-OR or XOR operation. The reason is that a single-layer perceptron can only classify patterns into two classes, say C1 and C2, and the two classes must be sufficiently separated from each other to ensure that the decision surface can consist of a hyperplane.
[Figure: linearly separable classes C1 and C2, divided by a straight line, contrasted with linearly non-separable classes.]
30
Rosenblatt’s Perceptron
An informal perceptron learning algorithm:
• If the perceptron fires when it should not, make each wi smaller by an amount proportional to xi.
• If the perceptron fails to fire when it should fire, make each wi larger by a similar amount.
31
Rosenblatt’s Perceptrons
A neuron learns because it is adaptive:
• SUPERVISED LEARNING: The connection strengths of a neuron are modifiable depending on the input signal received, its output value and a pre-determined or desired response. The desired response is sometimes called the teacher response. The difference between the desired response and the actual output is called the error signal.
• UNSUPERVISED LEARNING: In some cases the teacher's response is not available and no error signal is available to guide the learning. When no teacher's response is available the neuron, if properly configured, will modify its weights based only on the input and/or output. Zurada (1992:59-63)
32
Rosenblatt’s Perceptrons
Rosenblatt's perceptrons learn in the presence of a teacher. The desired signal is denoted as di and the output as yi; the error signal is r. The weights are modified in accordance with the perceptron learning rule; the weight change, denoted Δw, is proportional to the error signal, and c is a proportionality constant:

r = di − yi;
yi = sgn(wᵀx);
Δw = c[di − sgn(wᵀxi)]xi;
Δwij = c[di − sgn(wᵀxi)]xj
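The rule can be sketched in Python for a bipolar teacher signal d in {-1, +1}; the constant c = 0.5 and the function names are illustrative:

```python
# Perceptron learning rule: w <- w + c * [d - sgn(w.x)] * x
def sgn(v):
    return 1 if v >= 0 else -1

def perceptron_update(w, x, d, c=0.5):
    y = sgn(sum(wi * xi for wi, xi in zip(w, x)))   # actual output sgn(w.x)
    return [wi + c * (d - y) * xi for wi, xi in zip(w, x)]

# One presentation: the unit outputs +1 but the teacher wants -1,
# so each weight moves by c * (-2) * xi.
w = perceptron_update([0.0, 0.0], [1.0, -1.0], d=-1)   # -> [-1.0, 1.0]
```

Note that when the output already agrees with the teacher, d − sgn(wᵀx) is zero and the weights are left unchanged.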
33
Rosenblatt’s Perceptrons
A fixed-increment perceptron algorithm.
Given: A classification problem with n input features (x1, x2, ..., xn) and 2 output classes.
Compute: A set of weights (w0, w1, ..., wn) that will cause a perceptron to fire whenever the input falls into the first output class.
An Algorithm
Step Action
1. Create a perceptron with n+1 inputs and n+1 weights, where the extra input x0 is always set to 1.
2. Initialise the weights (w0, w1, ..., wn) to random real values.
3. Iterate through the training set, collecting all examples misclassified by the current set of weights.
4. If all examples are classified correctly, output the weights and quit.
5. Otherwise, compute the vector sum S of the misclassified input vectors, where each vector has the form (x0, x1, x2, ..., xn). In creating the sum, add to S a vector x if x is an input for which the perceptron incorrectly fails to fire, but add vector −x if x is an input for which the perceptron incorrectly fires. Multiply the sum by a scale factor η.
6. Modify the weights (w0, w1, ..., wn) by adding the elements of the vector S to them. GO TO STEP 3.
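Steps 1-6 can be sketched in Python; for reproducibility this sketch initialises the weights to zero rather than to random values (a deviation from step 2), and η = 0.5 is an illustrative choice:

```python
def fixed_increment_perceptron(examples, eta=0.5, max_iters=1000):
    """Sketch of the fixed-increment algorithm: examples is a list of
    (x, label) pairs, label 1 meaning the perceptron should fire.
    Each input is augmented with x0 = 1 so w0 absorbs the threshold."""
    n = len(examples[0][0])
    w = [0.0] * (n + 1)                              # step 2 (zeroed, not random)
    for _ in range(max_iters):
        s = [0.0] * (n + 1)                          # step 5: vector sum S
        mistakes = 0
        for x, label in examples:
            ax = [1.0] + list(x)                     # step 1: extra input x0 = 1
            fires = sum(wi * xi for wi, xi in zip(w, ax)) > 0
            if label == 1 and not fires:             # failed to fire: add x
                s = [si + xi for si, xi in zip(s, ax)]
                mistakes += 1
            elif label == 0 and fires:               # fired wrongly: add -x
                s = [si - xi for si, xi in zip(s, ax)]
                mistakes += 1
        if mistakes == 0:                            # step 4: all correct, quit
            return w
        w = [wi + eta * si for wi, si in zip(w, s)]  # steps 5-6: w <- w + eta*S
    return w

# Learning the AND gate: the perceptron should fire only on input (1, 1).
w = fixed_increment_perceptron([((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)])
```

On this linearly separable problem the loop terminates after a handful of passes, as the perceptron convergence theorem promises.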
34
Rosenblatt’s Perceptrons
Consider the following set of training vectors x1, x2, and x3, which are to be used in training a Rosenblatt's perceptron, labelled j, with the desired responses d1, d2, and d3, and initial weights wj1, wj2, and wj3.
[Figure: a perceptron j with three inputs weighted by wj1, wj2 and wj3 and output yj; the training vectors x1, x2, x3 carry the desired responses d1, d2, d3.]
35
Rosenblatt’s Perceptrons
The Method:
The perceptron j has to learn all three patterns x1, x2, and x3, such that when we show it the same three patterns, or similar ones, the perceptron recognises them.
How will the perceptron indicate that it has recognised the patterns? By responding as d1, d2, and d3, respectively when shown x1, x2, and x3.
We have to show the patterns repeatedly to the perceptron. At each showing (training cycle) the weights change in an attempt to produce the correct desired response.
36
Rosenblatt’s Perceptrons
Definition of OR

Input X1 | Input X2 | Output Y
0        | 0        | 0
0        | 1        | 1
1        | 0        | 1
1        | 1        | 1

[Figure: in the x1-x2 plane, a decision line separates (0,0), which denotes 0, from (0,1), (1,0) and (1,1), which denote 1.]
37
Rosenblatt’s Perceptrons
Definition of AND

Input X1 | Input X2 | Output Y
0        | 0        | 0
0        | 1        | 0
1        | 0        | 0
1        | 1        | 1

[Figure: in the x1-x2 plane, a decision line separates (1,1), which denotes 1, from (0,0), (0,1) and (1,0), which denote 0.]
38
Rosenblatt’s Perceptrons
Definition of XOR

Input X1 | Input X2 | Output Y
0        | 0        | 0
0        | 1        | 1
1        | 0        | 1
1        | 1        | 0

[Figure: in the x1-x2 plane, no single decision line separates the 1s from the 0s; two decision lines (#1 and #2) are needed to isolate (0,1) and (1,0) from (0,0) and (1,1).]
Rosenblatt’s Perceptron
The XOR ‘problem’: The simple perceptron cannot learn a linear decision surface to separate the different outputs, because no such decision surface exists.
Non-linear relationships between inputs and outputs, such as that of an XOR gate, are used to simulate vision systems that can tell whether a line drawing is connected or not, and to separate figure from ground in a picture.
Rosenblatt’s Perceptron
The XOR ‘problem’: For simulating the behaviour of an XOR gate we need to draw elliptical decision surfaces that would encircle the two ‘1’ outputs: a simple perceptron is unable to do so.
Solution? Employ two separate line-drawing stages.
Rosenblatt’s Perceptron
The XOR ‘problem’: One line drawing separates the pattern where both inputs are ‘0’, leading to an output ‘0’, and another line drawing separates the pattern where both inputs are ‘1’, also leading to an output ‘0’, from the remaining I/O patterns, where exactly one of the inputs is ‘1’, leading to an output ‘1’.
Rosenblatt’s Perceptron
The XOR ‘solution’: In effect we use two perceptrons to solve the XOR problem: the output of the first perceptron becomes an input of the second. If the first perceptron sees both inputs as ‘1’, it sends a massive inhibitory signal to the second perceptron, causing it to output ‘0’. Otherwise the second perceptron gets no inhibition from the first, and outputs ‘1’ whenever either of its inputs is ‘1’.
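The two-perceptron arrangement can be sketched as follows; the particular weights and thresholds are illustrative choices, not taken from the lecture:

```python
# Two perceptrons solving XOR: the first detects the (1, 1) case and
# inhibits the second through a strong negative weight.
def step(v):
    return 1 if v > 0 else 0

def xor_two_perceptrons(x1, x2):
    h = step(x1 + x2 - 1.5)                  # first perceptron: fires only on (1, 1)
    return step(x1 + x2 - 2.0 * h - 0.5)     # second: OR of inputs, minus inhibition

table = [(a, b, xor_two_perceptrons(a, b)) for a in (0, 1) for b in (0, 1)]
# table -> [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

The inhibitory weight (-2.0 here) only needs to be large enough to cancel both excitatory inputs; any more strongly negative value works equally well.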
Rosenblatt’s Perceptron
The XOR ‘solution’: The multilayer perceptron designed to solve the XOR problem has a serious problem. The perceptron convergence theorem does not extend to multilayer perceptrons. The perceptron learning algorithm can adjust the weights between the inputs and outputs, but it cannot adjust the weights between perceptrons.
For this we have to wait for the back-propagation learning algorithms.
44
Rosenblatt’s Perceptrons
• A perceptron computes a binary function of its input. A group of perceptrons can be trained on sample input-output pairs until it learns to compute the correct function.
• Each perceptron, in some models, can function independently of the others in the group, so they can be trained separately, provided the classes are linearly separable.
• Thresholds can be varied together with the weights.
• Given values of x1 and x2, the perceptron is trained so that it outputs 1 for white dots and 0 for black dots.
45
Rosenblatt’s Perceptrons
Rosenblatt’s contribution: What Rosenblatt proved was that if the patterns were drawn from two linearly separable classes, then the perceptron algorithm converges and positions the decision surface in the form of a hyperplane between the two classes: the perceptron convergence theorem (Haykin 117).
46
Rosenblatt’s Perceptrons
X1 | X2 | X1 XOR X2
0  | 0  | 0
0  | 1  | 1
1  | 0  | 1
1  | 1  | 0

[Figure: a two-perceptron network for XOR: the first unit combines x1 and x2 with weights +1, +1 and a threshold weight of -1.5, and its output feeds the second unit through a large inhibitory weight (-9.0); the second unit combines x1 and x2 with weights +1, +1 and a threshold weight of -0.5.]

J(w) is called the Perceptron Criterion Function:
J(w) = Σ (−wᵀx), summed over the misclassified samples; x contributes to the sum if it is misclassified as a negative example, and −x contributes if x is misclassified as a positive example.
47
Rosenblatt’s Perceptrons
The rate of change of J(w) with respect to each of the weights w1, w2, w3, ..., wn tells us the direction to move in. To find a solution, change the weights in the direction of the negative gradient, recompute J(w), recompute the gradient of J(w), and iterate until J(w) = 0:
w_new = w_old − η∇J(w)
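The iteration on J(w) can be sketched as follows, with the samples sign-normalised (negative examples multiplied by −1) so that a sample x is misclassified exactly when wᵀx ≤ 0; the names and η = 0.2 are illustrative:

```python
# Gradient descent on the Perceptron Criterion Function
# J(w) = sum of -w.x over the misclassified (sign-normalised) samples.
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def perceptron_criterion(w, samples):
    return sum(-dot(w, x) for x in samples if dot(w, x) <= 0)

def descend(w, samples, eta=0.2, max_iters=500):
    for _ in range(max_iters):
        wrong = [x for x in samples if dot(w, x) <= 0]
        if not wrong:                                # J(w) = 0: all correct
            return w
        grad = [-sum(col) for col in zip(*wrong)]    # gradient of J at w
        w = [wi - eta * gi for wi, gi in zip(w, grad)]
    return w

# Two sign-normalised, augmented samples; separable, so descent reaches J(w) = 0.
w = descend([0.0, 0.0], [[1.0, 1.0], [-1.0, 1.0]])
```

Because the gradient of J is just minus the sum of the misclassified samples, each descent step is the familiar "add the misclassified vectors to the weights" update of the fixed-increment algorithm.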
48
Rosenblatt’s Perceptrons
Multilayer Perceptron: The perceptron built around a single neuron is limited to performing pattern classification with only two classes (hypotheses). By expanding the output (computation) layer of the perceptron to include more than one neuron, it is possible to perform classification with more than two classes, but the classes have to be separable.