1
Financial Informatics –XV:
Perceptron Learning
1
Khurshid Ahmad, Professor of Computer Science,
Department of Computer Science
Trinity College, Dublin 2, IRELAND
November 19th, 2008. https://www.cs.tcd.ie/Khurshid.Ahmad/Teaching.html
2
Widrow-Hoff Networks:Error-correction or Performance Learning
Widrow and Hoff showed that the weight update law, known variously as the Widrow-Hoff learning law, the LMS learning law and the delta rule, adjusts the weights in proportion to the error and the input:

w_new = w_old + η e(t) x(t)

η is a positive constant, also known as the rate of learning, usually selected by trial and error through the heuristic that if η is too large, w will not converge, and if η is too small, w will take a long time to converge.
Typically 0.01 ≤ η ≤ 10, with η = 0.1 as a usual starting point.
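As a sketch, one Widrow-Hoff update can be written in a few lines of Python; the function name `lms_update` and the default η = 0.1 are illustrative choices, not part of the lecture:

```python
# One Widrow-Hoff (LMS / delta-rule) update: w_new = w_old + eta * e(t) * x(t)
def lms_update(w, x, d, eta=0.1):
    net = sum(wi * xi for wi, xi in zip(w, x))   # weighted sum of the inputs
    e = d - net                                  # error signal e(t) = d - net
    return [wi + eta * e * xi for wi, xi in zip(w, x)]

# One training presentation: pattern x = (1, 1) with desired output d = 1.
w = lms_update([0.0, 0.0], [1.0, 1.0], d=1.0)    # -> [0.1, 0.1]
```

With η = 0.1 the weights move a tenth of the way towards cancelling the error on each presentation, which is why the choice of η trades off convergence against speed.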
3
Widrow-Hoff Networks:
Error-correction or Performance Learning
Widrow and Hoff showed that the weight update law, called variously Widrow/Hoff learning law, the LMS learning law and the delta rule: For each training cycle, t, the least mean square learning law says that
e(t) = d − net(x)
Δw = η e(t) x(t)
w(t+1) = w(t) + η e(t) x(t)
4
Widrow-Hoff Networks:
Error-correction or Performance Learning
The net input is calculated by computing the weighted sum of all the input patterns.
During each training cycle, the difference between the actual and desired output is minimised using the well-known least-squares minimisation technique, where an attempt is made to minimise the error energy E, the mean of the squared errors:

E = (1/n)(e1² + e2² + ... + en²)

Δw = −η ∂E/∂w

∂E/∂wj = −(2/n)(e1x1j + e2x2j + ... + enxnj)

Δwj = η(2/n)(e1x1j + e2x2j + ... + enxnj)
5
Widrow-Hoff Networks:
Error-correction or Performance Learning
Widrow and Hoff showed that the weight update law, called variously the Widrow-Hoff learning law, the LMS learning law and the delta rule: for each training cycle, t, the least mean square learning law says that

e(t) = d − net(x)
Δw = η e(t) x(t)
w(t+1) = w(t) + η e(t) x(t)
6
Widrow-Hoff Networks:
Error-correction or Performance Learning
More precisely, let us consider n patterns that a Widrow-Hoff network has to 'learn' to recognise correctly. For each weight in the network, say wj, Widrow and Hoff showed that

Δwj(t) = (η/n)(e1(t)x1j(t) + e2(t)x2j(t) + ... + en(t)xnj(t))
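The averaged update over the n patterns can be sketched as follows; the function name and sample data are illustrative:

```python
# One batch step of the rule above: delta_w_j = (eta/n) * sum_k e_k(t) * x_kj(t)
def batch_delta_update(w, patterns, eta=0.1):
    n = len(patterns)
    deltas = [0.0] * len(w)
    for x, d in patterns:                        # k-th pattern and its target
        net = sum(wi * xi for wi, xi in zip(w, x))
        e = d - net                              # error for the k-th pattern
        for j, xj in enumerate(x):
            deltas[j] += e * xj                  # accumulate e_k * x_kj
    return [wi + eta * dj / n for wi, dj in zip(w, deltas)]

w = batch_delta_update([0.0, 0.0], [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)], eta=1.0)
```

Averaging over the patterns means a single noisy example moves the weights less than it would under the one-pattern-at-a-time rule.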
7
Rosenblatt’s Perceptron
Rosenblatt, Selfridge and others generalised the McCulloch-Pitts form of the linear threshold law to laws in which the activities of all pathways (cf. dendrites) impinging on a neuron are computed, and the neuron fires whenever some weighted sum of those activities exceeds a given amount.
8
Rosenblatt’s Perceptron
In the early days of neural network modelling, considerable attention was paid to McCulloch and Pitts, who essentially incorporated the behaviouristic learning approach, that of interrelating stimuli and responses as a mechanism for learning, due originally to Donald Hebb, into a network of all-or-none neurons. This led a number of other workers to adopt this approach during the late 1940's. Prominent among these workers were
Rosenblatt (1962): PERCEPTRONS
Selfridge (1959): PANDEMONIUM
The modellers called these networks adaptive networks, in that the network adapted to its environment quite autonomously. Rosenblatt developed a network architecture, and successfully implemented aspects of it, which could make and learn choices between different patterns of sensory stimuli.
9
Rosenblatt’s Perceptron
Rosenblatt's Perceptron has the following 'properties':
(1) It can receive inputs from other neurons
(2) The 'recipient' neuron can integrate the input
(3) The connection weights are modelled as follows:
    If the presence of feature xi stimulates the perceptron to fire, then wi will be positive;
    if the presence of feature xi inhibits the perceptron, then wi will be negative.
(4) The output function of the neuron is all-or-none
(5) Learning is a process of modifying the weights
Whatever a neuron can compute, it can learn to compute!
10
Rosenblatt’s Perceptron
Rosenblatt's Perceptron is an early example of the so-called electronic neurons. The electronic neuron, a simulation of the biological neuron, had the following properties:
(1) It can receive inputs from a number of sources (~dendrites inputting onto a neuron), e.g. other neurons (e.g. sensory input to an inter-neuron). Typically the inputs are vector-like, i.e. a magnitude and a sign: x = {x1, x2, ..., xn}
(2) The 'recipient' electronic neuron can integrate the input, either by simply summing up the individual inputs or by weighing the individual inputs in proportion to their 'strength of connection' (wi) with the recipient (a biological neuron can filter, add, subtract and amplify the input), and then summing up the weighted input as g(x).
Usually the summation function g(x) has an additional weight w0, the threshold weight, which incorporates the propensity of the electronic neuron to fire irrespective of the input (the depolarisation of the biological neuron's membrane induced by an external stimulus results in the neuron responding with an action potential or impulse; the critical value of this depolarisation is called the threshold value).
11
Rosenblatt’s Perceptron
(3) The connection weights are modelled as follows:
(3a) If the presence of some features xi tends to make the perceptron fire, then wi will be positive;
(3b) If the presence of some features xi inhibits the perceptron, then wi will be negative.
(4) The output function of the electronic neuron (the impulse output along the axon terminals of biological neurons) is all-or-none output in that the output function:
output(x) = 1 if g(x) > 0
output(x) = 0 if g(x) < 0
(5) Learning, in electronic neurons, is a process of modifying the values of the weights (plasticity of synaptic connections) and the threshold.
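The summation g(x) with its threshold weight w0, and the all-or-none output of property (4), can be sketched as follows; the function names are illustrative:

```python
# Weighted summation with a threshold weight w0, and the all-or-none output.
def g(x, w, w0):
    return w0 + sum(wi * xi for wi, xi in zip(w, x))

def output(x, w, w0):
    return 1 if g(x, w, w0) > 0 else 0

# With weights (+1, +1) and threshold weight w0 = -1.5, the unit fires
# only when both inputs are 1 (it behaves as an AND unit).
y = output([1, 1], [1, 1], -1.5)    # -> 1
```

Folding the threshold into w0 is what lets learning modify the threshold and the weights by the same mechanism, as property (5) requires.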
12
Rosenblatt’s Perceptron
Rosenblatt was serious about using his perceptrons to build a computer system. Rosenblatt demonstrated that his perceptrons can LEARN to build four logic gates.
A combination of these gates, in turn, comprises the central processing unit of a computer (and other parts): Ergo, perceptrons can learn to build themselves into computer systems!!
Rosenblatt became very famous for suggesting that one can design a computer based on neuro-scientific evidence
13
Rosenblatt’s Perceptron
The XOR ‘problem’: The simple perceptron cannot learn a linear decision surface to separate the different outputs, because no such decision surface exists.
Non-linear relationships between inputs and outputs, such as that of an XOR gate, are used to simulate vision systems that can tell whether a line drawing is connected or not, and to separate figure from ground in a picture.
14
Rosenblatt’s Perceptron
Rosenblatt was serious about using his perceptrons to build a computer system. Rosenblatt demonstrated that his perceptrons can LEARN to build four logic gates. A combination of these gates, in turn, comprises the central processing unit of a computer: Ergo, perceptrons can learn to build themselves into computer systems!!
The logic gates can be traced back to George Boole (1815-1864), Professor of Mathematics at Queens College, Cork (now University College Cork). Boole developed an algebra for analysing logic and published it in his famous book:
‘An Investigation of the Laws of Thought, on Which Are Founded the Mathematical Theories of Logic and Probabilities’.
15
Rosenblatt’s Perceptron
The logic gates can be traced back to George Boole (1815-1864), Professor of Mathematics at Queens College, Cork (now University College Cork). Boole developed an algebra for analysing logic and published it in his famous book:
‘An Investigation of the Laws of Thought, on Which Are Founded the Mathematical Theories of Logic and Probabilities’.
Boole’s algebra of logic forms the basis of (computer) hardware design and includes the various processing units within.
Boolean logic is used to specify how key operations on a computer system, like addition, subtraction, comparison of two values and so on, are to be executed.
Boolean logic is an integral part of hardware design and hardware circuits are usually referred to as logic circuits.
16
Rosenblatt’s Perceptron
Logic Gate: A digital circuit that implements an elementary logical operation. It has one or more inputs but ONLY one output. The conditions applied to the input(s) determine the voltage levels at the output. The output, typically, has two values, ‘0’ or ‘1’.
Digital Circuit: A circuit that responds to discrete values of input (voltage) and produces discrete values of output (voltage).
Binary Logic Circuits: Extensively used in computers to carry out instructions and arithmetical processes. Any logical procedure may be effected by a suitable combination of the gates. Binary circuits are typically formed from components such as integrated circuits.
17
Rosenblatt’s Perceptron
Logic Circuits: Designed to perform a particular logical function based on AND, OR (either), and NOR (neither). Those circuits that operate between two discrete (input) voltage levels, high & low, are described as binary logic circuits.
Logic element: Small part of a logic circuit, typically, a logic gate, that may be represented by the mathematical operators in symbolic logic.
18
Rosenblatt’s Perceptron
Gate | Input(s)      | Output
AND  | Two (or more) | High if and only if both (or all) inputs are high.
NOT  | One           | High if input is low, and vice versa.
OR   | Two (or more) | High if any one (or more) of the inputs is high.
19
Rosenblatt’s Perceptron
The operation of an AND gate

Input 1 | Input 2 | Output
0       | 0       | 0
0       | 1       | 0
1       | 0       | 0
1       | 1       | 1

AND(x, y) = minimum_value(x, y);
AND(1, 0) = minimum_value(1, 0) = 0;
AND(1, 1) = minimum_value(1, 1) = 1
20
Rosenblatt’s Perceptron
A single-layer perceptron can perform a number of the logical operations that are performed by conventional computational devices.
A hard-wired perceptron performs the AND operation; it is hard-wired because the weights are predetermined and not learnt.
[Figure: inputs x1 and x2 with weights w1 = +1 and w2 = +1 feed a threshold unit with θ = 1.5; y = 1 if w1x1 + w2x2 − θ > 0, else y = 0.]
21
Rosenblatt’s Perceptron
A single-layer perceptron can perform a number of the logical operations that are performed by conventional computational devices.
A learning perceptron performs the AND operation.
An algorithm: train the network for a number of epochs.
(1) Set the initial weights w1 and w2 and the threshold θ to random numbers;
(2) Compute the weighted sum: x1*w1 + x2*w2 − θ;
(3) Calculate the output using a delta function: y(i) = delta(x1*w1 + x2*w2 − θ), where delta(x) = 1 if x ≥ 0 and delta(x) = 0 if x < 0;
(4) Compute the difference between the desired and actual output: e(i) = ydesired − y(i);
(5) If the errors during a training epoch are all zero then stop; otherwise update wj(i+1) = wj(i) + η*xj*e(i), j = 1, 2.
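The algorithm above can be sketched in Python. To reproduce the epoch tables that follow exactly, and to avoid floating-point round-off at the net = 0 boundary, this sketch scales every quantity by 10 and works in integers (so w = (3, -1) stands for (0.3, -0.1), theta = 2 for 0.2, eta = 1 for 0.1); following the tables, the step function fires when the net input is greater than or equal to zero:

```python
def step(x):
    return 1 if x >= 0 else 0        # the epoch tables fire on net >= 0

def train_and(w, theta=2, eta=1, max_epochs=100):
    """Train a two-input perceptron on the AND gate (all values scaled by 10)."""
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for (x1, x2), d in data:
            y = step(x1 * w[0] + x2 * w[1] - theta)   # actual output
            e = d - y                                 # error = desired - actual
            if e != 0:
                errors += 1
                w = (w[0] + eta * x1 * e, w[1] + eta * x2 * e)
        if errors == 0:
            return w, epoch          # an error-free epoch: training stops
    return w, max_epochs

w, epochs = train_and((3, -1))       # -> weights (1, 1), i.e. (0.1, 0.1), in 5 epochs
```

Run against the initial weights (0.3, -0.1) of the worked example, the sketch converges in five epochs to w1 = w2 = 0.1, matching the tables on the following slides.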
22
Rosenblatt’s Perceptron
A single-layer perceptron can perform a number of the logical operations that are performed by conventional computational devices:

η = 0.1, θ = 0.2

Epoch | X1 X2 | Ydesired | Initial weights W1, W2 | Actual output | Error | Final weights W1, W2
1     | 0 0   | 0        | 0.3, -0.1              | 0             | 0     | 0.3, -0.1
      | 0 1   | 0        | 0.3, -0.1              | 0             | 0     | 0.3, -0.1
      | 1 0   | 0        | 0.3, -0.1              | 1             | -1    | 0.2, -0.1
      | 1 1   | 1        | 0.2, -0.1              | 0             | 1     | 0.3, 0.0
23
Rosenblatt’s Perceptron
A single-layer perceptron can perform a number of the logical operations that are performed by conventional computational devices:

Epoch | X1 X2 | Ydesired | Initial weights W1, W2 | Actual output | Error | Final weights W1, W2
2     | 0 0   | 0        | 0.3, 0.0               | 0             | 0     | 0.3, 0.0
      | 0 1   | 0        | 0.3, 0.0               | 0             | 0     | 0.3, 0.0
      | 1 0   | 0        | 0.3, 0.0               | 1             | -1    | 0.2, 0.0
      | 1 1   | 1        | 0.2, 0.0               | 1             | 0     | 0.2, 0.0
24
Rosenblatt’s Perceptron
A single-layer perceptron can perform a number of the logical operations that are performed by conventional computational devices:

Epoch | X1 X2 | Ydesired | Initial weights W1, W2 | Actual output | Error | Final weights W1, W2
3     | 0 0   | 0        | 0.2, 0.0               | 0             | 0     | 0.2, 0.0
      | 0 1   | 0        | 0.2, 0.0               | 0             | 0     | 0.2, 0.0
      | 1 0   | 0        | 0.2, 0.0               | 1             | -1    | 0.1, 0.0
      | 1 1   | 1        | 0.1, 0.0               | 0             | 1     | 0.2, 0.1
25
Rosenblatt’s Perceptron
A single-layer perceptron can perform a number of the logical operations that are performed by conventional computational devices:

Epoch | X1 X2 | Ydesired | Initial weights W1, W2 | Actual output | Error | Final weights W1, W2
4     | 0 0   | 0        | 0.2, 0.1               | 0             | 0     | 0.2, 0.1
      | 0 1   | 0        | 0.2, 0.1               | 0             | 0     | 0.2, 0.1
      | 1 0   | 0        | 0.2, 0.1               | 1             | -1    | 0.1, 0.1
      | 1 1   | 1        | 0.1, 0.1               | 1             | 0     | 0.1, 0.1
26
Rosenblatt’s Perceptron
A single-layer perceptron can perform a number of the logical operations that are performed by conventional computational devices:

Epoch | X1 X2 | Ydesired | Initial weights W1, W2 | Actual output | Error | Final weights W1, W2
5     | 0 0   | 0        | 0.1, 0.1               | 0             | 0     | 0.1, 0.1
      | 0 1   | 0        | 0.1, 0.1               | 0             | 0     | 0.1, 0.1
      | 1 0   | 0        | 0.1, 0.1               | 0             | 0     | 0.1, 0.1
      | 1 1   | 1        | 0.1, 0.1               | 1             | 0     | 0.1, 0.1

All errors in epoch 5 are zero, so training stops with the final weights w1 = 0.1, w2 = 0.1.
27
Preamble: Neural networks 'learn' by adapting in accordance with a training regimen. Five key algorithms:
ERROR-CORRECTION OR PERFORMANCE LEARNING
HEBBIAN OR COINCIDENCE LEARNING
BOLTZMANN LEARNING (STOCHASTIC NET LEARNING)
COMPETITIVE LEARNING
FILTER LEARNING (GROSSBERG'S NETS)
29
Rosenblatt’s Perceptron
A single-layer perceptron can perform a number of the logical operations that are performed by conventional computational devices.
However, the single-layer perceptron cannot perform the exclusive-OR or XOR operation. The reason is that a single-layer perceptron can only classify patterns into two classes, say C1 and C2, and the two classes must be sufficiently separated from each other to ensure that the decision surface can consist of a hyperplane.
[Figure: linearly separable classes C1 and C2, divided by a straight line, contrasted with linearly non-separable classes.]
30
Rosenblatt’s Perceptron
An informal perceptron learning algorithm:
• If the perceptron fires when it should not, make each wi smaller by an amount proportional to xi.
• If the perceptron fails to fire when it should fire, make each wi larger by a similar amount.
31
Rosenblatt’s Perceptrons
A neuron learns because it is adaptive:
• SUPERVISED LEARNING: The connection strengths of a neuron are modifiable depending on the input signal received, its output value and a pre-determined or desired response. The desired response is sometimes called the teacher response. The difference between the desired response and the actual output is called the error signal.
• UNSUPERVISED LEARNING: In some cases the teacher's response is not available and no error signal is available to guide the learning. When no teacher's response is available the neuron, if properly configured, will modify its weights based only on the input and/or output. Zurada (1992:59-63)
32
Rosenblatt’s Perceptrons
Rosenblatt's perceptrons learn in the presence of a teacher. The desired signal is denoted as di and the output as yi; the error signal is r. The weights are modified in accordance with the perceptron learning rule; the weight change, denoted Δw, is proportional to the error signal, and c is a proportionality constant:

r = di − yi;
yi = sgn(wᵀx);
Δw = c[di − sgn(wᵀxi)]xi;
Δwij = c[di − sgn(wᵀxi)]xj
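The rule can be sketched in Python for a bipolar teacher signal d in {-1, +1}; the constant c = 0.5 and the function names are illustrative:

```python
# Perceptron learning rule: w <- w + c * [d - sgn(w.x)] * x
def sgn(v):
    return 1 if v >= 0 else -1

def perceptron_update(w, x, d, c=0.5):
    y = sgn(sum(wi * xi for wi, xi in zip(w, x)))   # actual output sgn(w.x)
    return [wi + c * (d - y) * xi for wi, xi in zip(w, x)]

# One presentation: the unit outputs +1 but the teacher wants -1,
# so each weight moves by c * (-2) * xi.
w = perceptron_update([0.0, 0.0], [1.0, -1.0], d=-1)   # -> [-1.0, 1.0]
```

Note that when the output already agrees with the teacher, d − sgn(wᵀx) is zero and the weights are left unchanged.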
33
Rosenblatt’s Perceptrons
A fixed-increment perceptron algorithm.
Given: A classification problem with n input features (x1, x2, ..., xn) and 2 output classes.
Compute: A set of weights (w0, w1, ..., wn) that will cause a perceptron to fire whenever the input falls into the first output class.
An Algorithm
Step Action
1. Create a perceptron with n+1 inputs and n+1 weights, where the extra input x0 is always set to 1.
2. Initialise the weights (w0, w1, ..., wn) to random real values.
3. Iterate through the training set, collecting all examples misclassified by the current set of weights.
4. If all examples are classified correctly, output the weights and quit.
5. Otherwise, compute the vector sum S of the misclassified input vectors, where each vector has the form (x0, x1, x2, ..., xn). In creating the sum, add to S a vector x if x is an input for which the perceptron incorrectly fails to fire, but add vector −x if x is an input for which the perceptron incorrectly fires. Multiply the sum by a scale factor η.
6. Modify the weights (w0, w1, ..., wn) by adding the elements of the vector S to them. GO TO STEP 3.
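Steps 1-6 can be sketched in Python; for reproducibility this sketch initialises the weights to zero rather than to random values (a deviation from step 2), and η = 0.5 is an illustrative choice:

```python
def fixed_increment_perceptron(examples, eta=0.5, max_iters=1000):
    """Sketch of the fixed-increment algorithm: examples is a list of
    (x, label) pairs, label 1 meaning the perceptron should fire.
    Each input is augmented with x0 = 1 so w0 absorbs the threshold."""
    n = len(examples[0][0])
    w = [0.0] * (n + 1)                              # step 2 (zeroed, not random)
    for _ in range(max_iters):
        s = [0.0] * (n + 1)                          # step 5: vector sum S
        mistakes = 0
        for x, label in examples:
            ax = [1.0] + list(x)                     # step 1: extra input x0 = 1
            fires = sum(wi * xi for wi, xi in zip(w, ax)) > 0
            if label == 1 and not fires:             # failed to fire: add x
                s = [si + xi for si, xi in zip(s, ax)]
                mistakes += 1
            elif label == 0 and fires:               # fired wrongly: add -x
                s = [si - xi for si, xi in zip(s, ax)]
                mistakes += 1
        if mistakes == 0:                            # step 4: all correct, quit
            return w
        w = [wi + eta * si for wi, si in zip(w, s)]  # steps 5-6: w <- w + eta*S
    return w

# Learning the AND gate: the perceptron should fire only on input (1, 1).
w = fixed_increment_perceptron([((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)])
```

On this linearly separable problem the loop terminates after a handful of passes, as the perceptron convergence theorem promises.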
34
Rosenblatt’s Perceptrons
Consider the following set of training vectors x1, x2, and x3, which are to be used in training a Rosenblatt's perceptron, labelled j, with the desired responses d1, d2, and d3, and initial weights wj1, wj2, and wj3.
[Figure: a perceptron j with three inputs weighted by wj1, wj2 and wj3 and output yj; the training vectors x1, x2, x3 carry the desired responses d1, d2, d3.]
35
Rosenblatt’s Perceptrons
The Method:
The perceptron j has to learn all three patterns x1, x2, and x3, such that when we show it the same three patterns, or similar ones, the perceptron recognises them.
How will the perceptron indicate that it has recognised the patterns? By responding as d1, d2, and d3, respectively when shown x1, x2, and x3.
We have to show the patterns repeatedly to the perceptron. At each showing (training cycle) the weights change in an attempt to produce the correct desired response.
36
Rosenblatt’s Perceptrons
Definition of OR

Input X1 | Input X2 | Output Y
0        | 0        | 0
0        | 1        | 1
1        | 0        | 1
1        | 1        | 1

[Figure: in the x1-x2 plane, a decision line separates (0,0), which denotes 0, from (0,1), (1,0) and (1,1), which denote 1.]
37
Rosenblatt’s Perceptrons
Definition of AND

Input X1 | Input X2 | Output Y
0        | 0        | 0
0        | 1        | 0
1        | 0        | 0
1        | 1        | 1

[Figure: in the x1-x2 plane, a decision line separates (1,1), which denotes 1, from (0,0), (0,1) and (1,0), which denote 0.]
38
Rosenblatt’s Perceptrons
Definition of XOR

Input X1 | Input X2 | Output Y
0        | 0        | 0
0        | 1        | 1
1        | 0        | 1
1        | 1        | 0

[Figure: in the x1-x2 plane, no single decision line separates the 1s from the 0s; two decision lines (#1 and #2) are needed to isolate (0,1) and (1,0) from (0,0) and (1,1).]
Rosenblatt’s Perceptron
The XOR ‘problem’: The simple perceptron cannot learn a linear decision surface to separate the different outputs, because no such decision surface exists.
Non-linear relationships between inputs and outputs, such as that of an XOR gate, are used to simulate vision systems that can tell whether a line drawing is connected or not, and to separate figure from ground in a picture.
Rosenblatt’s Perceptron
The XOR ‘problem’: For simulating the behaviour of an XOR gate we need to draw elliptical decision surfaces that would encircle the two ‘1’ outputs: a simple perceptron is unable to do so.
Solution? Employ two separate line-drawing stages.
Rosenblatt’s Perceptron
The XOR ‘problem’: One line drawing separates the pattern where both inputs are ‘0’, leading to an output ‘0’, and another line drawing separates the pattern where both inputs are ‘1’, also leading to an output ‘0’, from the remaining I/O patterns, where exactly one of the inputs is ‘1’, leading to an output ‘1’.
Rosenblatt’s Perceptron
The XOR ‘solution’: In effect we use two perceptrons to solve the XOR problem: the output of the first perceptron becomes an input of the second. If the first perceptron sees both inputs as ‘1’, it sends a massive inhibitory signal to the second perceptron, causing it to output ‘0’. Otherwise the second perceptron gets no inhibition from the first, and outputs ‘1’ whenever either of its inputs is ‘1’.
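The two-perceptron arrangement can be sketched as follows; the particular weights and thresholds are illustrative choices, not taken from the lecture:

```python
# Two perceptrons solving XOR: the first detects the (1, 1) case and
# inhibits the second through a strong negative weight.
def step(v):
    return 1 if v > 0 else 0

def xor_two_perceptrons(x1, x2):
    h = step(x1 + x2 - 1.5)                  # first perceptron: fires only on (1, 1)
    return step(x1 + x2 - 2.0 * h - 0.5)     # second: OR of inputs, minus inhibition

table = [(a, b, xor_two_perceptrons(a, b)) for a in (0, 1) for b in (0, 1)]
# table -> [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

The inhibitory weight (-2.0 here) only needs to be large enough to cancel both excitatory inputs; any more strongly negative value works equally well.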
Rosenblatt’s Perceptron
The XOR ‘solution’: The multilayer perceptron designed to solve the XOR problem has a serious problem. The perceptron convergence theorem does not extend to multilayer perceptrons. The perceptron learning algorithm can adjust the weights between the inputs and outputs, but it cannot adjust the weights between perceptrons.
For this we have to wait for the back-propagation learning algorithms.
44
Rosenblatt’s Perceptrons
• A perceptron computes a binary function of its input. A group of perceptrons can be trained on sample input-output pairs until it learns to compute the correct function.
• Each perceptron, in some models, can function independently of the others in the group, so they can be trained separately, provided the classes are linearly separable.
• Thresholds can be varied together with the weights.
• Given values of x1 and x2, the perceptron is trained so that it outputs 1 for white dots and 0 for black dots.
45
Rosenblatt’s Perceptrons
Rosenblatt’s contribution: What Rosenblatt proved was that if the patterns were drawn from two linearly separable classes, then the perceptron algorithm converges and positions the decision surface in the form of a hyperplane between the two classes: the perceptron convergence theorem (Haykin 117).
46
Rosenblatt’s Perceptrons
X1 | X2 | X1 XOR X2
0  | 0  | 0
0  | 1  | 1
1  | 0  | 1
1  | 1  | 0

[Figure: a two-perceptron network for XOR: the first unit combines x1 and x2 with weights +1, +1 and a threshold weight of -1.5, and its output feeds the second unit through a large inhibitory weight (-9.0); the second unit combines x1 and x2 with weights +1, +1 and a threshold weight of -0.5.]

J(w) is called the Perceptron Criterion Function:
J(w) = Σ (−wᵀx), summed over the misclassified samples; x contributes to the sum if it is misclassified as a negative example, and −x contributes if x is misclassified as a positive example.
47
Rosenblatt’s Perceptrons
The rate of change of J(w) with respect to each of the weights w1, w2, w3, ..., wn tells us the direction to move in. To find a solution, change the weights in the direction of the negative gradient, recompute J(w), recompute the gradient of J(w), and iterate until J(w) = 0:
w_new = w_old − η∇J(w)
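The iteration on J(w) can be sketched as follows, with the samples sign-normalised (negative examples multiplied by −1) so that a sample x is misclassified exactly when wᵀx ≤ 0; the names and η = 0.2 are illustrative:

```python
# Gradient descent on the Perceptron Criterion Function
# J(w) = sum of -w.x over the misclassified (sign-normalised) samples.
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def perceptron_criterion(w, samples):
    return sum(-dot(w, x) for x in samples if dot(w, x) <= 0)

def descend(w, samples, eta=0.2, max_iters=500):
    for _ in range(max_iters):
        wrong = [x for x in samples if dot(w, x) <= 0]
        if not wrong:                                # J(w) = 0: all correct
            return w
        grad = [-sum(col) for col in zip(*wrong)]    # gradient of J at w
        w = [wi - eta * gi for wi, gi in zip(w, grad)]
    return w

# Two sign-normalised, augmented samples; separable, so descent reaches J(w) = 0.
w = descend([0.0, 0.0], [[1.0, 1.0], [-1.0, 1.0]])
```

Because the gradient of J is just minus the sum of the misclassified samples, each descent step is the familiar "add the misclassified vectors to the weights" update of the fixed-increment algorithm.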
48
Rosenblatt’s Perceptrons
Multilayer Perceptron: The perceptron built around a single neuron is limited to performing pattern classification with only two classes (hypotheses). By expanding the output (computation) layer of the perceptron to include more than one neuron, it is possible to perform classification with more than two classes, but the classes have to be separable.