Neural Networks: Rosenblatt's Perceptron


Transcript of Neural Networks: Rosenblatt's Perceptron

Page 1: Neural Networks: Rosenblatt's Perceptron

CHAPTER 01

ROSENBLATT’S PERCEPTRON

CSC445: Neural Networks

Prof. Dr. Mostafa Gadal-Haqq M. Mostafa

Computer Science Department

Faculty of Computer & Information Sciences

AIN SHAMS UNIVERSITY

(Most of the figures in this presentation are copyrighted to Pearson Education, Inc.)

Page 2: Neural Networks: Rosenblatt's Perceptron


Rosenblatt's Perceptron

Introduction

The Perceptron

The Perceptron Convergence Theorem

Computer Experiment

The Batch Perceptron Algorithm

Page 3: Neural Networks: Rosenblatt's Perceptron


Introduction

The Perceptron:

is the simplest form of a neural network.

consists of a single neuron with adjustable synaptic weights and bias.

can be used to classify linearly separable patterns, i.e., patterns that lie on opposite sides of a hyperplane.

is limited to pattern classification with only two classes.

Page 4: Neural Networks: Rosenblatt's Perceptron


The Perceptron

Linearly and nonlinearly separable classes.

Figure 1.4 (a) A pair of linearly separable patterns. (b) A pair of nonlinearly separable patterns.

Page 5: Neural Networks: Rosenblatt's Perceptron


The Perceptron

A nonlinear neuron consisting of a linear combiner followed by a hard limiter (e.g., the signum activation function).

Weights are adapted using an error-correction rule.

Figure 1.3 Signal-flow graph of the perceptron.

The hard limiter produces the output

$$y = \varphi(v) = \begin{cases} +1 & \text{if } v > 0 \\ -1 & \text{if } v \le 0 \end{cases}$$

where the induced local field is

$$v = \sum_{i=0}^{m} w_i x_i = \sum_{i=1}^{m} w_i x_i + b \qquad (x_0 = +1,\; w_0 = b)$$
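As a concrete companion to these formulas, a minimal Python sketch of the linear combiner plus hard limiter (the weights, bias, and inputs are illustrative):

```python
import numpy as np

def perceptron_output(w, b, x):
    """Linear combiner followed by a signum hard limiter."""
    v = np.dot(w, x) + b          # induced local field: v = sum_i w_i * x_i + b
    return 1 if v > 0 else -1     # hard limiter: y = +1 if v > 0, else -1

# Illustrative neuron: weights [2, -1], bias -0.5
w, b = np.array([2.0, -1.0]), -0.5
print(perceptron_output(w, b, np.array([1.0, 0.5])))   # v = 1.0  -> +1
print(perceptron_output(w, b, np.array([0.0, 1.0])))   # v = -1.5 -> -1
```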

Page 6: Neural Networks: Rosenblatt's Perceptron


The Perceptron

The decision boundary

A hyperplane defined by

$$\sum_{i=1}^{m} w_i x_i + b = 0$$

For the perceptron to function properly, the two classes C1 and C2 must be linearly separable.

Figure 1.2 Illustration of the hyperplane (in this example, a straight line) as decision boundary for a two-dimensional, two-class pattern-classification problem.
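To make the geometry concrete, a small sketch with illustrative weights that tests on which side of such a hyperplane a point falls:

```python
import numpy as np

# Illustrative decision boundary in 2D: x1 + x2 - 1 = 0 (w = [1, 1], b = -1).
w, b = np.array([1.0, 1.0]), -1.0

for x in (np.array([2.0, 2.0]), np.array([0.0, 0.0])):
    v = np.dot(w, x) + b
    # Points with v > 0 are assigned to C1, points with v <= 0 to C2.
    print(x, "-> C1" if v > 0 else "-> C2")
```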

Page 7: Neural Networks: Rosenblatt's Perceptron


The Perceptron Convergence Algorithm

The fixed-increment convergence theorem for the perceptron (Rosenblatt, 1962):

Let the subsets of training vectors X1 and X2 be linearly separable. Let the inputs presented to the perceptron originate from these two subsets. The perceptron converges after some $n_o$ iterations, in the sense that

$$\mathbf{w}(n_o) = \mathbf{w}(n_o + 1) = \mathbf{w}(n_o + 2) = \cdots$$

is a solution vector for $n_o \le n_{\max}$.

Reading: the proof is given on pages 82-83 of Chapter 1 in Haykin.

Page 8: Neural Networks: Rosenblatt's Perceptron


The Perceptron Convergence Algorithm

To derive the error-correction learning algorithm, we write the input vector (with the bias absorbed as a fixed input $x_0 = +1$) and the weight vector as

$$\mathbf{x}(n) = [+1,\, x_1(n),\, x_2(n),\, \ldots,\, x_m(n)]^T$$

$$\mathbf{w}(n) = [b,\, w_1(n),\, w_2(n),\, \ldots,\, w_m(n)]^T$$

Then

$$v(n) = \sum_{i=0}^{m} w_i(n)\, x_i(n) = \mathbf{w}^T(n)\, \mathbf{x}(n)$$

The learning algorithm finds a weight vector $\mathbf{w}$ such that

$$\mathbf{w}^T \mathbf{x} > 0 \quad \text{for every input vector } \mathbf{x} \text{ belonging to class } C_1$$

$$\mathbf{w}^T \mathbf{x} \le 0 \quad \text{for every input vector } \mathbf{x} \text{ belonging to class } C_2$$
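A short sketch of this bias-absorbing convention (names and numbers are illustrative):

```python
import numpy as np

# Augment the input with a fixed x0 = +1 so the bias becomes weight w0 = b,
# and the induced local field reduces to a single inner product w^T x.
def augment(x):
    return np.concatenate(([1.0], x))

w = np.array([-0.5, 2.0, -1.0])        # [b, w1, w2]
x = augment(np.array([1.0, 0.5]))      # [+1, x1, x2]
print(np.dot(w, x))                    # 1.0, same as w1*x1 + w2*x2 + b
```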

Page 9: Neural Networks: Rosenblatt's Perceptron


The Perceptron Convergence Algorithm

The learning algorithm finds a weight vector $\mathbf{w}$ such that

$$\mathbf{w}^T \mathbf{x} > 0 \quad \text{for every input vector } \mathbf{x} \text{ belonging to class } C_1$$

$$\mathbf{w}^T \mathbf{x} \le 0 \quad \text{for every input vector } \mathbf{x} \text{ belonging to class } C_2$$

Page 10: Neural Networks: Rosenblatt's Perceptron


The Perceptron Convergence Algorithm

Given the subsets of training vectors X1 and X2, the training problem is to find a weight vector w such that the previous two inequalities are satisfied. This is achieved by updating the weights as follows:

$$\mathbf{w}(n+1) = \mathbf{w}(n) - \eta(n)\,\mathbf{x}(n) \quad \text{if } \mathbf{w}^T(n)\,\mathbf{x}(n) > 0 \text{ and } \mathbf{x}(n) \text{ belongs to } C_2$$

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \eta(n)\,\mathbf{x}(n) \quad \text{if } \mathbf{w}^T(n)\,\mathbf{x}(n) \le 0 \text{ and } \mathbf{x}(n) \text{ belongs to } C_1$$

(If x(n) is correctly classified, the weight vector is left unchanged.)

The learning-rate parameter η(n) is a positive number that may vary from step to step. For fixed η, we have the fixed-increment learning rule.

The algorithm converges as long as η(n) is positive (given that the classes are linearly separable).
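The two rules can be folded into one signed correction, w <- w + η d(n) x(n) on misclassified samples, with d(n) = +1 for C1 and -1 for C2. A minimal sketch of the resulting fixed-increment algorithm on toy data (the helper name and the data are illustrative):

```python
import numpy as np

def train_perceptron(X, d, eta=1.0, max_epochs=100):
    """Fixed-increment perceptron learning on bias-augmented inputs.

    X: (N, m) input patterns; d: (N,) desired responses, +1 for C1, -1 for C2.
    """
    Xa = np.hstack([np.ones((len(X), 1)), X])   # prepend x0 = +1 (bias input)
    w = np.zeros(Xa.shape[1])                   # w0 plays the role of the bias b
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(Xa, d):
            y = 1 if np.dot(w, x) > 0 else -1   # hard-limiter output
            if y != target:                     # error-correction step:
                w += eta * target * x           #   +eta*x for C1, -eta*x for C2
                errors += 1
        if errors == 0:                         # an error-free epoch: converged
            break
    return w

# Toy linearly separable example (AND-like labels)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([-1, -1, -1, +1])
print(train_perceptron(X, d))
```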

Page 11: Neural Networks: Rosenblatt's Perceptron


The Perceptron Convergence Algorithm


Page 12: Neural Networks: Rosenblatt's Perceptron


The Perceptron Convergence Algorithm


Page 13: Neural Networks: Rosenblatt's Perceptron


Perceptron and Bayes Classifier

Bayes Classifier

Figure 1.6 Signal-flow graph of the Gaussian classifier.

Page 14: Neural Networks: Rosenblatt's Perceptron


Perceptron and Bayes Classifier

Bayes Classifier

Figure 1.7 Two overlapping, one-dimensional Gaussian distributions.


Page 15: Neural Networks: Rosenblatt's Perceptron


The Batch Perceptron Algorithm

We define the perceptron cost function as

$$J(\mathbf{w}) = \sum_{\mathbf{x}(n) \in H} \left( -\mathbf{w}^T \mathbf{x}(n)\, d(n) \right)$$

where H is the set of samples x misclassified by a perceptron using w as its weight vector, and d(n) is the desired response (+1 for class C1, -1 for class C2).

The cost function J(w) is differentiable with respect to the weight vector w. Differentiating J(w) with respect to w yields the gradient vector

$$\nabla J(\mathbf{w}) = \sum_{\mathbf{x}(n) \in H} \left( -\mathbf{x}(n)\, d(n) \right)$$

In the method of steepest descent, the adjustment to the weight vector w at each time step of the algorithm is applied in a direction opposite to the gradient vector ∇J(w).

Page 16: Neural Networks: Rosenblatt's Perceptron


The Batch Perceptron Algorithm

Accordingly, the algorithm takes the form

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \eta(n) \sum_{\mathbf{x}(n) \in H} \mathbf{x}(n)\, d(n)$$

which embodies the batch perceptron algorithm for computing the weight vector w.

The algorithm is said to be of the "batch" kind because, at each time step, a batch of misclassified samples is used to compute the adjustment.
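A sketch of one such steepest-descent step, assuming bias-augmented inputs and labels d(n) in {+1, -1} as before; points that land exactly on the boundary are counted as misclassified here, a minor simplification:

```python
import numpy as np

def batch_perceptron_step(w, X, d, eta):
    """One batch update: w(n+1) = w(n) + eta * sum over H of d(n) * x(n)."""
    v = X @ w                                   # induced local fields for all samples
    H = np.sign(v) != d                         # misclassified set H (boundary counts as wrong)
    grad = -(d[H][:, None] * X[H]).sum(axis=0)  # gradient of J(w), summed over H
    return w - eta * grad                       # move opposite to the gradient
```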

Page 17: Neural Networks: Rosenblatt's Perceptron


Batch Learning

Presentation of all N examples in the training sample constitutes one epoch.

The cost function of learning is defined by the average error energy E_av (a sketch of the formula is given after the advantages below).

The weights are updated on an epoch-by-epoch basis.

Advantages:

Accurate estimation of the gradient vector.

Parallelization of the learning process.
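As an assumption about the intended formula (Haykin's average error energy, reconstructed from memory rather than from the slide), a LaTeX sketch:

```latex
% Average error energy over one epoch of N examples (assumed form):
% E(n) is the instantaneous error energy of example n, built from the
% error signals e_j(n) at the output neurons j.
\mathcal{E}_{\mathrm{av}}(N) = \frac{1}{N} \sum_{n=1}^{N} \mathcal{E}(n),
\qquad
\mathcal{E}(n) = \frac{1}{2} \sum_{j} e_j^{2}(n)
```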

Page 18: Neural Networks: Rosenblatt's Perceptron


Computer Experiment: Pattern Classification


Figure 1.8 The double-moon classification problem.
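A sketch of a generator for this data set, under the usual parameterization of the experiment (ring radius r = 10 and width w = 6 as in Haykin; the function and its defaults are assumptions, not the lecture's code):

```python
import numpy as np

def double_moon(n, r=10.0, w=6.0, d=1.0, seed=None):
    """Sample n points from each half-moon of the double-moon problem.

    Region A is the upper half-ring centered at the origin; region B is the
    lower half-ring, shifted right by r and down by d. Negative d pushes the
    two moons into each other, so they are no longer linearly separable.
    """
    rng = np.random.default_rng(seed)
    radius = rng.uniform(r - w / 2, r + w / 2, size=2 * n)
    theta = rng.uniform(0.0, np.pi, size=2 * n)
    x, y = radius * np.cos(theta), radius * np.sin(theta)
    region_a = np.column_stack([x[:n], y[:n]])            # class C1
    region_b = np.column_stack([x[n:] + r, -y[n:] - d])   # class C2
    return region_a, region_b

a, b = double_moon(1000, d=1.0, seed=0)   # the separable case of Figure 1.9
```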

Page 19: Neural Networks: Rosenblatt's Perceptron


Computer Experiment: Pattern Classification


Figure 1.9 Perceptron with the double-moon set at distance d = 1.

Page 20: Neural Networks: Rosenblatt's Perceptron


Computer Experiment: Pattern Classification


Figure 1.10 Perceptron with the double-moon set at distance d = -4.

Page 21: Neural Networks: Rosenblatt's Perceptron

Homework 1

• Problems: 1.1, 1.4, and 1.5

• Computer Experiment: 1.6

Page 22: Neural Networks: Rosenblatt's Perceptron

Next Time: Model Building Through Regression