Neural Networks: The Perceptron and Learning
The Biological Neuron
The brain is a collection of about 10 billion interconnected neurons. Each neuron is a cell that uses biochemical reactions to receive, process, and transmit information.
Each terminal button is connected to other neurons across a small gap called a synapse.
A neuron's dendritic tree is connected to about a thousand neighboring neurons. When one of those neurons fires, a positive or negative charge is received by one of the dendrites. The strengths of all the received charges are added together through the processes of spatial and temporal summation.
Model of a neuron
Each neuron within the network is usually a simple processing unit which takes one or more inputs through synapses (connecting links). Every input has an associated weight (strength) which modifies the strength of that input.
The adder sums all the weighted inputs and calculates an output to be passed on.
An activation function exists for limiting the output.
Neural computing requires a number of neurons to be connected together into a neural network. Neurons are arranged in layers.
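The neuron model just described (weighted inputs, an adder, and a limiting activation function) can be sketched in a few lines of Python. This is an illustrative sketch; the function names and the example weights are ours, not from any library:

```python
# Minimal sketch of the neuron model: weighted inputs, an adder, and an
# activation function that limits the output.
def neuron_output(inputs, weights, bias, activation):
    v = sum(w * x for w, x in zip(weights, inputs)) + bias  # adder
    return activation(v)                                    # limiter

def hard_limit(v):
    # Symmetric hard limiter: +1 if the summed input is non-negative, else -1.
    return 1 if v >= 0 else -1

print(neuron_output([1, 0], [0.4, 0.2], -0.6, hard_limit))  # -> -1
```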
Model of a neuron
In mathematical terms:
u = Σ_{i=1}^{m} w_i x_i
u = linear combiner output
v = u + b = induced local field (activation potential)
Affine transformation by the bias
If we want, we can consider the bias as just another input, with x_0 = +1 and w_0 = b:
v = Σ_{i=0}^{m} w_i x_i
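A quick sanity check that the two formulations agree; the integer weights here are made up for the demo:

```python
# Folding the bias in as an extra input x0 = +1 with weight w0 = b gives the
# same induced local field v as adding b explicitly.
w, b = [2, 3], -1
x = [1, 1]
v_explicit = sum(wi * xi for wi, xi in zip(w, x)) + b
v_augmented = sum(wi * xi for wi, xi in zip([b] + w, [1] + x))
print(v_explicit, v_augmented)  # -> 4 4
```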
Types of activation functions: Linear Transfer Function
The activation function is generally non-linear. Linear functions are limited because the output is simply proportional to the input.
Types of activation functions: Symmetric Hard Limit Transfer Function
Types of activation functions: Threshold function
φ(v) = 1 if v ≥ 0, 0 if v < 0
Commonly known as the Heaviside function.
Types of activation functions: Satlin Transfer Function
Types of activation functions: Tan Sigmoid Function
Types of activation functions: Sigmoid function
One example is the logistic function:
φ(v) = 1 / (1 + exp(−a·v))
You can see that the function gets closer to the threshold function as the value of a increases.
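These activation functions can be compared numerically. The sketch below is illustrative (the function names are ours) and shows how the logistic curve steepens toward the threshold function as a grows:

```python
import math

def hard_limit(v):                  # symmetric hard limit: -1 or +1
    return 1 if v >= 0 else -1

def satlin(v):                      # saturating linear: clamp to [0, 1]
    return min(1.0, max(0.0, v))

def logistic(v, a=1.0):             # sigmoid: 1 / (1 + exp(-a*v))
    return 1.0 / (1.0 + math.exp(-a * v))

# As the slope parameter a increases, logistic(v, a) approaches a 0/1 step.
for a in (1, 5, 50):
    print(a, round(logistic(0.2, a), 4), round(logistic(-0.2, a), 4))
```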
Types of activation functions: Gauss Function
Rules of knowledge representation
Knowledge representation: Rules
Rule 1: Similar inputs from similar classes should usually produce similar representations inside the network, and should be classified into the same class.
Problem: how do you define similarity?
Input vector x = [x_1, x_2, …, x_m]^T
Euclidean distance: d(x_i, x_j) = ‖x_i − x_j‖ = sqrt( Σ_{k=1}^{m} (x_{ik} − x_{jk})² )
Knowledge representation: Rules
Rule 1 (cont.): how do you define similarity?
Dot product: (x_i, x_j) = x_i^T x_j = Σ_{k=1}^{m} x_{ik} x_{jk}
Usually we normalize the vectors to have unit length: ‖x_i‖ = ‖x_j‖ = 1.
Knowledge representation: Rules
Rule 1 (cont.): how do you define similarity?
For unit-length vectors the two measures agree, since ‖x_i − x_j‖² = 2 − 2·x_i^T x_j. So as the Euclidean distance approaches 0:
‖x_i − x_j‖² → 0
x_i^T x_j → 1
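The relationship between the two similarity measures can be checked numerically; the vectors below are arbitrary examples:

```python
import math

# For unit-length vectors, ||xi - xj||^2 = 2 - 2 * (xi . xj): a small
# Euclidean distance and a dot product near 1 express the same similarity.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

xi = normalize([3.0, 4.0])   # arbitrary example vectors
xj = normalize([4.0, 3.0])
dist_sq = sum((x - y) ** 2 for x, y in zip(xi, xj))
print(dist_sq, 2 - 2 * dot(xi, xj))  # the two sides agree
```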
Knowledge representation: Rules
Rule 2: Items to be classified as separate classes should be given widely different representations in the network.
Knowledge representation: Rules
Rule 3: If a particular feature is important, then there should be a large number of neurons involved in representing it.
Knowledge representation: Rules
Rule 4: Prior information and invariances should be built into the design of the neural network whenever possible.
This simplifies the design of the NN by not having to learn additional information:
Fewer free parameters to learn
Information transmission is faster
Cost is reduced
Knowledge representation: Rules
Rule 4 (cont.): How do we build prior information into a NN?
Unfortunately, there are no well-defined rules to do this. Some rules of thumb:
Restrict the network architecture, usually to local connections called receptive fields.
Constrain the choice of synaptic weights, usually achieved through weight sharing.
Knowledge representation: Invariances
The network should be invariant to trivial transformations of the inputs, e.g. rotation of a picture.
Techniques:
Invariance by structure: pick a structure that isn't sensitive to the meaningless transformations of the input.
Invariance by training: let the classifier learn invariances.
Invariance by feature space: pick a feature set that is invariant to the transformations.
Supervised Learning & Unsupervised Learning
Classical conditioning: Pavlov's dog
[Figures: stages of classical conditioning]
Supervised learning
In supervised training, both the inputs and the desired outputs are provided.
The network processes the inputs and compares its resulting outputs against the desired outputs.
Errors are then propagated back through the system, causing the system to adjust the weights that control the network.
This process occurs over and over as the weights are continually tweaked.
The set of data that enables the training is called the training set.
During the training of a network, the same set of data is processed many times as the connection weights are refined.
Example architectures: multilayer perceptrons
Unsupervised learning
In unsupervised training, the network is provided with inputs but not with desired outputs.
The system itself must then decide what features it will use to group the input data.
This is often referred to as self-organization or adaptation.
Example architectures: Kohonen self-organizing maps (SOM)
Rosenblatt’s Perceptron
Perceptron
Recall from the previous section: the NN is a linear combiner followed by a hard limiter.
Induced local field: v = Σ_{i=1}^{m} w_i x_i + b
Recap: we can represent the bias as a +1 input with weight b.
Thus we have a decision hyperplane:
Σ_{i=1}^{m} w_i x_i + b = 0
It is common to plot a map of the decision regions in the m-dimensional input space spanned by the m input variables x_1, x_2, …, x_m.
Using the Symmetric Hard Limit Transfer Function
φ(v) = +1 if v ≥ 0, −1 if v < 0
The perceptron neuron produces a 1 if the net input into the transfer function is equal to or greater than 0; otherwise it produces a -1.
Perceptron convergence algorithm
We start with:
Training vectors: (m+1) × 1 vectors x(n) = [+1, x_1(n), …, x_m(n)]^T
Weight vector: (m+1) × 1 vector w(n) = [b, w_1(n), …, w_m(n)]^T (the first component is the bias b)
Actual response y(n)
Desired response d(n)
Learning rate parameter η such that 0 < η ≤ 1
Perceptron convergence algorithm
1. Initialization: set w(0) = 0.
2. At step n, activate the perceptron by applying input vector x(n).
3. Compute the actual response y(n) = φ(w^T(n) x(n)).
4. Adapt the weight vector: w(n+1) = w(n) + η·[d(n) − y(n)]·x(n).
5. Increment n and go to step 2.
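The five steps above can be sketched directly. This is a minimal illustration (our own code, not a library routine) trained on the AND-gate data used in the example that follows; floating-point rounding may make the learned weights differ slightly from the hand trace, but the rule still converges to a separating line:

```python
def sgn(v):
    return 1 if v >= 0 else -1

def train_perceptron(samples, eta=0.1, epochs=20):
    w = [0.0, 0.0, 0.0]                                        # step 1: w(0) = 0
    for _ in range(epochs):
        for x, d in samples:                                   # step 2: apply x(n)
            y = sgn(sum(wi * xi for wi, xi in zip(w, x)))      # step 3: response
            w = [wi + eta * (d - y) * xi
                 for wi, xi in zip(w, x)]                      # step 4: adapt
    return w                                                   # step 5: loop over n

# AND gate, bias folded in as a leading +1 input; targets in {-1, +1}.
data = [([1, 0, 0], -1), ([1, 0, 1], -1), ([1, 1, 0], -1), ([1, 1, 1], 1)]
w = train_perceptron(data)
print([sgn(sum(wi * xi for wi, xi in zip(w, x))) for x, _ in data])  # -> [-1, -1, -1, 1]
```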
Perceptron convergence algorithm: Example, AND gate

n | x1 | x2 | d(n)
0 | 0  | 0  | -1
1 | 0  | 1  | -1
2 | 1  | 0  | -1
3 | 1  | 1  | +1
Perceptron convergence algorithm: Example
[Figure: perceptron with inputs x1, x2 and a +1 bias input]
Perceptron convergence algorithm: Example
We start with training vectors, (m+1) × 1:
x(0) = [+1, 0, 0]^T
x(1) = [+1, 0, +1]^T
x(2) = [+1, +1, 0]^T
x(3) = [+1, +1, +1]^T
Perceptron convergence algorithm: Example
We start with:
Bias b = 0
Learning rate parameter η such that 0 < η ≤ 1: η = 0.1
Desired responses:
d(0) = −1, d(1) = −1, d(2) = −1, d(3) = +1
Perceptron convergence algorithm: Example
Initialization: set w(0) = [b, w_1, w_2]^T = [0, 0, 0]^T.
n = 0: activate the perceptron by applying input vector
x(0) = [+1, 0, 0]^T
v(0) = w^T(0) x(0) = [0, 0, 0]·[+1, 0, 0]^T = 0
Perceptron convergence algorithm: Example
Compute the actual response:
y(0) = φ(v(0)) = φ(0) = 1
Perceptron convergence algorithm: Example
Adapt the weight vector: e(0) = d(0) − y(0) = −1 − 1 = −2
w_b = 0 + 0.1·(−2)·1 = −0.2
w_1 = 0 + 0.1·(−2)·0 = 0
w_2 = 0 + 0.1·(−2)·0 = 0
w(1) = [−0.2, 0, 0]^T
Perceptron convergence algorithm: Example
Increment n and go to step 2.
n = 1: x(1) = [+1, 0, +1]^T
v(1) = w^T(1) x(1) = [−0.2, 0, 0]·[+1, 0, +1]^T = −0.2
Perceptron convergence algorithm: Example
Compute the actual response:
y(1) = φ(−0.2) = −1
Perceptron convergence algorithm: Example
Adapt the weight vector: e(1) = d(1) − y(1) = −1 − (−1) = 0, so the weights are unchanged:
w(2) = w(1) = [−0.2, 0, 0]^T
Perceptron convergence algorithm: Example
Increment n and go to step 2.
n = 2: x(2) = [+1, +1, 0]^T
v(2) = [−0.2, 0, 0]·[+1, +1, 0]^T = −0.2
Perceptron convergence algorithm: Example
Compute the actual response:
y(2) = φ(−0.2) = −1
Perceptron convergence algorithm: Example
Adapt the weight vector: e(2) = d(2) − y(2) = −1 − (−1) = 0, so the weights are unchanged:
w(3) = w(2) = [−0.2, 0, 0]^T
Perceptron convergence algorithm: Example
Increment n and go to step 2.
n = 3: x(3) = [+1, +1, +1]^T
v(3) = [−0.2, 0, 0]·[+1, +1, +1]^T = −0.2
Perceptron convergence algorithm: Example
Compute the actual response:
y(3) = φ(−0.2) = −1
Perceptron convergence algorithm: Example
Adapt the weight vector: e(3) = d(3) − y(3) = 1 − (−1) = 2
w_b = −0.2 + 0.1·2·1 = 0
w_1 = 0 + 0.1·2·1 = 0.2
w_2 = 0 + 0.1·2·1 = 0.2
w(4) = [0, 0.2, 0.2]^T
Perceptron convergence algorithm: Example

n | X1 X2 xb | d(n) | W1 W2 Wb | C1 C2 Cb | s | Y(n) | e | η·e | new W1 W2 Wb
0 | 0 0 1 | -1 | 0 0 0    | 0 0 0    | 0    | 1  | -2 | -0.2 | 0 0 -0.2
1 | 0 1 1 | -1 | 0 0 -0.2 | 0 0 -0.2 | -0.2 | -1 | 0  | 0    | 0 0 -0.2
2 | 1 0 1 | -1 | 0 0 -0.2 | 0 0 -0.2 | -0.2 | -1 | 0  | 0    | 0 0 -0.2
3 | 1 1 1 | 1  | 0 0 -0.2 | 0 0 -0.2 | -0.2 | -1 | 2  | 0.2  | 0.2 0.2 0

Per sensor: C1 = x1·W1, C2 = x2·W2, Cb = xb·Wb; sum s = C1 + C2 + Cb.
Threshold function: Y(n) = 1 if s ≥ 0, else -1. Error e = d(n) − Y(n); correction per sensor = η·e.
Perceptron convergence algorithm: Example

n  | X1 X2 xb | d(n) | W1 W2 Wb      | C1 C2 Cb      | s    | Y(n) | e  | η·e  | new W1 W2 Wb
0  | 0 0 1 | -1 | 0 0 0          | 0 0 0         | 0    | 1  | -2 | -0.2 | 0 0 -0.2
1  | 0 1 1 | -1 | 0 0 -0.2       | 0 0 -0.2      | -0.2 | -1 | 0  | 0    | 0 0 -0.2
2  | 1 0 1 | -1 | 0 0 -0.2       | 0 0 -0.2      | -0.2 | -1 | 0  | 0    | 0 0 -0.2
3  | 1 1 1 | 1  | 0 0 -0.2       | 0 0 -0.2      | -0.2 | -1 | 2  | 0.2  | 0.2 0.2 0
4  | 0 0 1 | -1 | 0.2 0.2 0      | 0 0 0         | 0    | 1  | -2 | -0.2 | 0.2 0.2 -0.2
5  | 0 1 1 | -1 | 0.2 0.2 -0.2   | 0 0.2 -0.2    | 0    | 1  | -2 | -0.2 | 0.2 0 -0.4
6  | 1 0 1 | -1 | 0.2 0 -0.4     | 0.2 0 -0.4    | -0.2 | -1 | 0  | 0    | 0.2 0 -0.4
7  | 1 1 1 | 1  | 0.2 0 -0.4     | 0.2 0 -0.4    | -0.2 | -1 | 2  | 0.2  | 0.4 0.2 -0.2
8  | 0 0 1 | -1 | 0.4 0.2 -0.2   | 0 0 -0.2      | -0.2 | -1 | 0  | 0    | 0.4 0.2 -0.2
9  | 0 1 1 | -1 | 0.4 0.2 -0.2   | 0 0.2 -0.2    | 0    | 1  | -2 | -0.2 | 0.4 0 -0.4
10 | 1 0 1 | -1 | 0.4 0 -0.4     | 0.4 0 -0.4    | 0    | 1  | -2 | -0.2 | 0.2 0 -0.6
11 | 1 1 1 | 1  | 0.2 0 -0.6     | 0.2 0 -0.6    | -0.4 | -1 | 2  | 0.2  | 0.4 0.2 -0.4
12 | 0 0 1 | -1 | 0.4 0.2 -0.4   | 0 0 -0.4      | -0.4 | -1 | 0  | 0    | 0.4 0.2 -0.4
13 | 0 1 1 | -1 | 0.4 0.2 -0.4   | 0 0.2 -0.4    | -0.2 | -1 | 0  | 0    | 0.4 0.2 -0.4
14 | 1 0 1 | -1 | 0.4 0.2 -0.4   | 0.4 0 -0.4    | 0    | 1  | -2 | -0.2 | 0.2 0.2 -0.6
15 | 1 1 1 | 1  | 0.2 0.2 -0.6   | 0.2 0.2 -0.6  | -0.2 | -1 | 2  | 0.2  | 0.4 0.4 -0.4
16 | 0 0 1 | -1 | 0.4 0.4 -0.4   | 0 0 -0.4      | -0.4 | -1 | 0  | 0    | 0.4 0.4 -0.4
17 | 0 1 1 | -1 | 0.4 0.4 -0.4   | 0 0.4 -0.4    | 0    | 1  | -2 | -0.2 | 0.4 0.2 -0.6
18 | 1 0 1 | -1 | 0.4 0.2 -0.6   | 0.4 0 -0.6    | -0.2 | -1 | 0  | 0    | 0.4 0.2 -0.6
19 | 1 1 1 | 1  | 0.4 0.2 -0.6   | 0.4 0.2 -0.6  | 0    | 1  | 0  | 0    | 0.4 0.2 -0.6
20 | 0 0 1 | -1 | 0.4 0.2 -0.6   | 0 0 -0.6      | -0.6 | -1 | 0  | 0    | 0.4 0.2 -0.6
21 | 0 1 1 | -1 | 0.4 0.2 -0.6   | 0 0.2 -0.6    | -0.4 | -1 | 0  | 0    | 0.4 0.2 -0.6
22 | 1 0 1 | -1 | 0.4 0.2 -0.6   | 0.4 0 -0.6    | -0.2 | -1 | 0  | 0    | 0.4 0.2 -0.6
23 | 1 1 1 | 1  | 0.4 0.2 -0.6   | 0.4 0.2 -0.6  | 0    | 1  | 0  | 0    | 0.4 0.2 -0.6

Per sensor: C1 = x1·W1, C2 = x2·W2, Cb = xb·Wb; sum s = C1 + C2 + Cb.
Threshold function: Y(n) = 1 if s ≥ 0, else -1. Error e = d(n) − Y(n); correction per sensor = η·e.
Perceptron convergence algorithm: Example
AND Perceptron
[Figure: the trained perceptron with weights w1 = 0.4, w2 = 0.2 and bias wb = -0.6]
Perceptron convergence algorithm: Example
Decision hyperplane:
0.4·x1 + 0.2·x2 − 0.6 = 0
[Figure: the decision line in the (x1, x2) plane, crossing the x1 axis at 1.5 and the x2 axis at 3]
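As a quick check, the learned line 0.4·x1 + 0.2·x2 − 0.6 = 0 classifies all four AND inputs correctly:

```python
# Evaluate the learned hyperplane 0.4*x1 + 0.2*x2 - 0.6 = 0 on the AND data:
# only (1, 1) falls on the +1 side.
w1, w2, wb = 0.4, 0.2, -0.6
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    v = w1 * x1 + w2 * x2 + wb
    print((x1, x2), 1 if v >= 0 else -1)
```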
Perceptron convergence rule
If there exists a set of connection weights w* which is able to perform the transformation d(n) = φ(w*^T x(n)), the perceptron learning rule will converge to some solution (which may or may not be the same as w*) in a finite number of steps for any initial choice of the weights.
Perceptron convergence algorithm: Example (initial weights W1 = W2 = Wb = 1)

n  | X1 X2 xb | d(n) | W1 W2 Wb      | C1 C2 Cb      | s    | Y(n) | e  | η·e  | new W1 W2 Wb
0  | 0 0 1 | -1 | 1 1 1          | 0 0 1         | 1    | 1  | -2 | -0.2 | 1 1 0.8
1  | 0 1 1 | -1 | 1 1 0.8        | 0 1 0.8       | 1.8  | 1  | -2 | -0.2 | 1 0.8 0.6
2  | 1 0 1 | -1 | 1 0.8 0.6      | 1 0 0.6       | 1.6  | 1  | -2 | -0.2 | 0.8 0.8 0.4
3  | 1 1 1 | 1  | 0.8 0.8 0.4    | 0.8 0.8 0.4   | 2    | 1  | 0  | 0    | 0.8 0.8 0.4
4  | 0 0 1 | -1 | 0.8 0.8 0.4    | 0 0 0.4       | 0.4  | 1  | -2 | -0.2 | 0.8 0.8 0.2
5  | 0 1 1 | -1 | 0.8 0.8 0.2    | 0 0.8 0.2     | 1    | 1  | -2 | -0.2 | 0.8 0.6 0
6  | 1 0 1 | -1 | 0.8 0.6 0      | 0.8 0 0       | 0.8  | 1  | -2 | -0.2 | 0.6 0.6 -0.2
7  | 1 1 1 | 1  | 0.6 0.6 -0.2   | 0.6 0.6 -0.2  | 1    | 1  | 0  | 0    | 0.6 0.6 -0.2
8  | 0 0 1 | -1 | 0.6 0.6 -0.2   | 0 0 -0.2      | -0.2 | -1 | 0  | 0    | 0.6 0.6 -0.2
9  | 0 1 1 | -1 | 0.6 0.6 -0.2   | 0 0.6 -0.2    | 0.4  | 1  | -2 | -0.2 | 0.6 0.4 -0.4
10 | 1 0 1 | -1 | 0.6 0.4 -0.4   | 0.6 0 -0.4    | 0.2  | 1  | -2 | -0.2 | 0.4 0.4 -0.6
11 | 1 1 1 | 1  | 0.4 0.4 -0.6   | 0.4 0.4 -0.6  | 0.2  | 1  | 0  | 0    | 0.4 0.4 -0.6
12 | 0 0 1 | -1 | 0.4 0.4 -0.6   | 0 0 -0.6      | -0.6 | -1 | 0  | 0    | 0.4 0.4 -0.6
13 | 0 1 1 | -1 | 0.4 0.4 -0.6   | 0 0.4 -0.6    | -0.2 | -1 | 0  | 0    | 0.4 0.4 -0.6
14 | 1 0 1 | -1 | 0.4 0.4 -0.6   | 0.4 0 -0.6    | -0.2 | -1 | 0  | 0    | 0.4 0.4 -0.6
15 | 1 1 1 | 1  | 0.4 0.4 -0.6   | 0.4 0.4 -0.6  | 0.2  | 1  | 0  | 0    | 0.4 0.4 -0.6

Per sensor: C1 = x1·W1, C2 = x2·W2, Cb = xb·Wb; sum s = C1 + C2 + Cb.
Threshold function: Y(n) = 1 if s ≥ 0, else -1. Error e = d(n) − Y(n); correction per sensor = η·e.
Perceptron convergence algorithm: Example
Decision hyperplane:
0.4·x1 + 0.4·x2 − 0.6 = 0
[Figure: the decision line in the (x1, x2) plane, crossing both axes at 1.5]
Simple logic functions: AND
[Figure: AND in the (x1, x2) plane; the two classes are linearly separable]
Simple logic functions: OR
[Figure: OR in the (x1, x2) plane; the two classes are linearly separable]
Simple logic functions: XOR?
[Figure: XOR in the (x1, x2) plane; no single line separates the two classes]
A simple perceptron can't represent a logical XOR function.
Add a hidden layer
[Figure: two-layer network for XOR; hidden neuron with bias -0.5, output neuron with bias -0.5 and weight -2 on the hidden output]
XOR

n | X1 X2 | b1 b2       | d(n) | V1(n) Y1(n) | V2(n) Y2(n)
0 | -1 -1 | -0.5 -0.5 | -1 | -2.5 -1 | -0.5 -1
1 | -1  1 | -0.5 -0.5 |  1 | -0.5 -1 |  1.5  1
2 |  1 -1 | -0.5 -0.5 |  1 | -0.5 -1 |  1.5  1
3 |  1  1 | -0.5 -0.5 | -1 |  1.5  1 | -0.5 -1

Layer 1 (hidden): V1 = x1 + x2 + b1; Y1(n) = 1 if V1 ≥ 0, else -1.
Layer 2 (output): V2 = x1 + x2 − 2·Y1 + b2; Y2(n) = 1 if V2 ≥ 0, else -1.
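The network in the table can be written out directly. The layer equations below are our reconstruction from the V1 and V2 columns (hidden unit v1 = x1 + x2 − 0.5, output v2 = x1 + x2 − 2·y1 − 0.5), so treat the exact weights as a reading of the figure rather than a definitive specification:

```python
def sgn(v):
    return 1 if v >= 0 else -1

def xor_net(x1, x2):
    # Layer 1: hidden neuron with bias b1 = -0.5.
    y1 = sgn(x1 + x2 - 0.5)
    # Layer 2: output neuron with bias b2 = -0.5 and weight -2 on y1.
    y2 = sgn(x1 + x2 - 2 * y1 - 0.5)
    return y2

for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print((x1, x2), xor_net(x1, x2))  # XOR in the {-1, +1} encoding
```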
XOR: Decision hyperplane

n | X1 X2 | Y1(n) | Y2(n)
0 | -1 -1 | -1 | -1
1 | -1  1 | -1 |  1
2 |  1 -1 | -1 |  1
3 |  1  1 |  1 | -1

[Figure: the hidden output Y1 adds a dimension in which the four points become linearly separable]
Perceptron with a hidden layer
By adding the extra dimension, we made the XOR problem a separable case.
Multi-layer networks are more powerful in their expressive ability.
However, we can't use the same learning algorithm as before: the system is no longer linear, and the training algorithm does not converge.
In the next lecture, we will learn how to handle this.
References
The lecture slides are based on the slides prepared by Dr. Chathura De Silva and Dr. Upali Kohomban for this class in previous years.