Artificial Neural Networks and Single Layer Perceptron · 2019. 12. 9.

Transcript of the slide deck "Artificial Neural Networks and Single Layer Perceptron" (course: Supervised Learning).

Page 1: Artificial Neural Networks and Single Layer Perceptron

Supervised Learning

Artificial Neural Networks and Single Layer Perceptron

Page 2: Supervised Learning

Supervised Learning
• Learning from correct answers

[Diagram: Supervised Learning System with Inputs and Outputs; training info: desired (target) outputs]

Page 3: Supervised Learning Methods

Supervised Learning Methods
• Artificial neural networks
• Decision trees
• Gaussian process regression
• Naive Bayes classifier
• Nearest neighbor algorithm
• Support vector machines
• Random forests
• Ensembles of classifiers
• …

Page 4: Applications of Supervised Learning

Applications of Supervised Learning
• Handwriting recognition
• Object recognition
• Optical character recognition
• Spam detection
• Pattern recognition
• Speech recognition
• Prediction models
• …

Page 5: Inspiration: The Human Brain

Inspiration: The Human Brain
• The brain doesn’t seem to have a CPU
• Instead, it has lots of simple, parallel, asynchronous units, called neurons
• There are about 10^11 (100 billion) neurons of about 20 types
• For comparison, the NVIDIA Tesla K80 GPU has 4992 cores

Page 6: Neurons

Neurons
• A neuron is a single cell that has a number of relatively short fibers, called dendrites, and one long fiber, called an axon

Page 7: Synapses

Synapses
• A synapse is a structure that permits a neuron (or nerve cell) to pass an electrical signal to another cell

Page 8: Signal Generation: Action Potential

Signal Generation: Action Potential
• Neurotransmitters move across the synapse and change the electrical potential of the cell body

Page 9: From Real to Artificial Neurons

From Real to Artificial Neurons

[Diagram: inputs u1, u2, …, un, weighted by w1, w2, …, wn, feed a summation unit Σ; the activation a = Σi=1..n wi ui passes through an activation function with threshold θ to produce the output v]

Page 10: Artificial Neurons

Artificial Neurons
• Threshold Logic Unit (TLU) proposed by Warren McCulloch and Walter Pitts in 1943
  – Initially with binary inputs and outputs
  – Heaviside function as threshold
• Perceptron developed by Frank Rosenblatt in 1957
  – Arbitrary inputs and outputs
  – Linear transfer function

Page 11: Activation Functions

Activation Functions

[Plots of four activation functions v = f(a): threshold, linear, piece-wise linear, sigmoid]

Page 12: Threshold Logic Unit (TLU)

Threshold Logic Unit (TLU)

[Diagram: inputs u1, u2, …, un with weights w1, w2, …, wn feed a summation unit Σ]

a = Σi=1..n wi ui

v = 1 if a ≥ θ
v = 0 if a < θ
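A minimal sketch of this TLU in Python (the function name `tlu` and the example values are illustrative, not from the slides; the weights and threshold shown happen to realize logical AND):

```python
def tlu(u, w, theta):
    """Threshold Logic Unit: fires (1) iff the weighted sum reaches theta."""
    a = sum(wi * ui for wi, ui in zip(w, u))  # activation a = sum_i w_i u_i
    return 1 if a >= theta else 0

# Two inputs with unit weights and threshold 1.5
print(tlu([1, 1], [1, 1], 1.5))  # fires: 1
print(tlu([0, 1], [1, 1], 1.5))  # does not fire: 0
```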

Page 13: Classification

Classification
• Identifying to which of a set of categories a new observation belongs, on the basis of a set of data containing observations whose category is known (Wikipedia)

[Scatter plot: Variable 1 vs. Variable 2, showing Class 1, Class 2, Class 3 and a query point labeled "? (1, 2 or 3)"]

Page 14: Decision Surface of a TLU

Decision Surface of a TLU

[Plot in the (u1, u2) plane: the decision line w1 u1 + w2 u2 = θ separates points classified as 1 (activation > θ) from points classified as 0 (activation < θ)]

Page 15: Scalar Products and Projections

Scalar Products and Projections

[Sketches: w•u > 0 when the angle ϕ between w and u is acute, w•u = 0 when they are perpendicular, w•u < 0 when ϕ is obtuse]

w•u = |w||u| cos ϕ
|uw| = |u| cos ϕ      (uw is the projection of u onto w)
w•u = |w||uw|

Page 16: Geometric Interpretation

Geometric Interpretation

[Plot: the weight vector w is perpendicular to the decision line w1 u1 + w2 u2 = θ; points with w•u > θ give v = 1, points with w•u < θ give v = 0]

On the decision line:
w•u = |w||u| cos ϕ = |w||uw| = θ
|uw| = θ/|w|

Page 17: Geometric Interpretation (cont.)

[Same construction, continued: the projection uw of an input u onto w has length |uw| = θ/|w| exactly on the decision line]

Page 18: Geometric Interpretation (cont.)

[Same construction, continued]

Page 19: Decision Surface: Summary

Decision Surface: Summary
• In n dimensions the relation w•u = θ defines an (n-1)-dimensional hyperplane, which is perpendicular to the weight vector w
• On one side of the hyperplane (w•u > θ) all patterns are classified by the TLU as “1”, while those classified as “0” lie on the other side

Page 20: Example: Logical AND

Example: Logical AND

[Plot: the input points (0,0), (0,1), (1,0) with class 0 and (1,1) with class 1]

u1  u2  a  v
0   0   ?  0
0   1   ?  0
1   0   ?  0
1   1   ?  1

w1 = ?, w2 = ?, θ = ?

Page 21: Example: Logical AND

Example: Logical AND

u1  u2  a  v
0   0   0  0
0   1   1  0
1   0   1  0
1   1   2  1

w1 = 1, w2 = 1, θ = ?

Page 22: Example: Logical AND

Example: Logical AND

u1  u2  a  v
0   0   0  0
0   1   1  0
1   0   1  0
1   1   2  1

w1 = 1, w2 = 1, θ = 1.5
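The finished AND unit can be checked directly against its truth table; a small sketch (the helper name `tlu` is illustrative):

```python
def tlu(u, w, theta):
    # TLU output: 1 iff the activation reaches the threshold
    return 1 if sum(wi * ui for wi, ui in zip(w, u)) >= theta else 0

w, theta = [1, 1], 1.5  # weights and threshold from the slide
for u1 in (0, 1):
    for u2 in (0, 1):
        print(u1, u2, "->", tlu([u1, u2], w, theta))
# Only (1, 1) fires, matching the AND truth table.
```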

Page 23: Example: Logical OR

Example: Logical OR

[Plot: the input points (0,1), (1,0), (1,1) with class 1 and (0,0) with class 0]

u1  u2  a  v
0   0   ?  0
0   1   ?  1
1   0   ?  1
1   1   ?  1

w1 = ?, w2 = ?, θ = ?

Page 24: Example: Logical OR

Example: Logical OR

u1  u2  a  v
0   0   0  0
0   1   1  1
1   0   1  1
1   1   2  1

w1 = 1, w2 = 1, θ = 0.5

Page 25: Example: Logical NOT

Example: Logical NOT

u1  a  v
0   ?  1
1   ?  0

w1 = ?, θ = ?

Page 26: Example: Logical NOT

Example: Logical NOT

u1  a   v
0   0   1
1   -1  0

w1 = -1, θ = -0.5

Page 27: Threshold as Weight

Threshold as Weight

[Diagram: the threshold is absorbed as an extra weight wn+1 = θ on a constant extra input un+1 = -1]

a = Σi=1..n+1 wi ui

v = 1 if a ≥ 0
v = 0 if a < 0
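The threshold-as-weight trick can be sketched like this (the function name `tlu_bias` is illustrative):

```python
def tlu_bias(u, w_aug):
    """TLU with the threshold folded into the weights.

    w_aug has n+1 entries: the last one is theta, paired with a
    constant extra input u_{n+1} = -1, so the test becomes a >= 0.
    """
    u_aug = list(u) + [-1]  # append the constant -1 input
    a = sum(wi * ui for wi, ui in zip(w_aug, u_aug))
    return 1 if a >= 0 else 0

# The AND unit again, now as w_aug = [w1, w2, theta] = [1, 1, 1.5]
print(tlu_bias([1, 1], [1, 1, 1.5]))  # 1
print(tlu_bias([0, 1], [1, 1, 1.5]))  # 0
```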

Page 28: Geometric Interpretation

Geometric Interpretation

[Plot: with the threshold absorbed into the weights, the decision line w•u = 0 passes through the origin of the augmented input space; v = 1 on one side, v = 0 on the other]

Page 29: Training ANNs

Training ANNs
• Training set S of examples {u, vt}
  – u is an input vector and
  – vt the desired target output
  – Example: Logical AND
    S = {(0,0),0}, {(0,1),0}, {(1,0),0}, {(1,1),1}
• Iterative process
  – present a training example u
  – compute network output v
  – compare output v with target vt
  – adjust weights w and threshold θ
• Learning rule
  – Specifies how to change the weights w and threshold θ of the network as a function of the inputs u, output v and target vt

Page 30: Presentation of Training Samples

Presentation of Training Samples
• Presenting all training examples once to the ANN is called an epoch
• Training examples can be presented in
  – Fixed order (1,2,3,…,M) [on/off-line]
  – Randomly permuted order (5,2,7,…,3) [off-line]
  – Completely random order (4,1,7,1,5,4,…) [off-line]

Page 31: Perceptron Learning Rule

Perceptron Learning Rule
• w′ = w + µ (vt − v) u
  – w′i = wi + ∆wi = wi + µ (vt − v) ui (i = 1..n+1), with wn+1 = θ and un+1 = -1
  – The parameter µ is called the learning rate. It determines the magnitude of the weight updates ∆wi
  – If the output is correct (vt = v) the weights are not changed (∆wi = 0)
  – If the output is incorrect (vt ≠ v) the weights wi are changed such that w′ moves toward the input u (when vt = 1, v = 0) or away from it (when vt = 0, v = 1)
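One update of this rule, sketched in Python (the function name `perceptron_step` and the learning rate value are illustrative; the threshold is included as the last weight, paired with input -1):

```python
def perceptron_step(w, u, vt, mu=0.5):
    """One perceptron update with the threshold included as w[-1].

    w'_i = w_i + mu * (vt - v) * u_i, with u_{n+1} = -1.
    """
    u_aug = list(u) + [-1]                       # augmented input
    a = sum(wi * ui for wi, ui in zip(w, u_aug))
    v = 1 if a >= 0 else 0                       # TLU output
    return [wi + mu * (vt - v) * ui for wi, ui in zip(w, u_aug)], v

# Misclassified example (target 1, output 0): w moves toward u
w, v = perceptron_step([0.0, 0.0, 0.5], [1, 1], vt=1)
print(w, v)  # the unit now fires on (1, 1)
```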

Page 32: Adjusting the Weight Vector

Adjusting the Weight Vector

[Sketches:
Target vt = 1, output v = 0 (angle ϕ > 90°): w′ = w + µu moves w in the direction of u.
Target vt = 0, output v = 1 (angle ϕ < 90°): w′ = w - µu moves w away from the direction of u.]

Page 33: Perceptron Learning Rule (cont.)

Perceptron Learning Rule (cont.)
• If we do not include the threshold as an input, we use the following learning rule:

w′ = w + µ (vt − v) u
and
θ′ = θ − µ (vt − v)

Page 34: Procedure of Perceptron Training

Procedure of Perceptron Training
• While v ≠ vt for at least one training vector pair
  – For each training vector pair (u, vt)
    • calculate the output v from the activation a = Σi wi ui
    • if v ≠ vt
        update weight vector w: w′ = w + µ (vt − v) u
        update threshold θ: θ′ = θ − µ (vt − v) (if not included in the weights)
      else
        do nothing

Page 35: Demo: Perceptron with TLU

Demo: Perceptron with TLU

[Two plots on the range -5..5 showing the demo data and the learned decision line]

Page 36: TLU Convergence

TLU Convergence
• The algorithm converges to the correct classification
  – if the training data is linearly separable
  – given a sufficiently small learning rate µ
• The solution w is not unique, since if w•u = 0 defines a hyperplane, so does w′ = k w

Page 37: Multiple TLUs

Multiple TLUs

[Diagram: inputs u1, u2, u3, …, un fully connected to outputs v1, v2, …, vn; weight wji connects ui with vj]

• Learning rule:
w′ji = wji + µ (vtj − vj) ui
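A sketch of one update for a whole layer of TLUs (the function name is illustrative); each output unit j just applies the single-unit rule independently:

```python
def layer_step(W, u, vt, mu=0.5):
    """One update for a single layer of TLUs.

    W[j][i] connects input u_i to output v_j; the last column holds the
    thresholds, paired with a constant input -1.  Rule:
        w'_ji = w_ji + mu * (vt_j - v_j) * u_i
    """
    u_aug = list(u) + [-1]
    v = [1 if sum(wi * ui for wi, ui in zip(row, u_aug)) >= 0 else 0
         for row in W]
    W_new = [[wi + mu * (vtj - vj) * ui for wi, ui in zip(row, u_aug)]
             for row, vtj, vj in zip(W, vt, v)]
    return W_new, v

# Two units: only unit 0 is wrong (target 1, output 0), so only row 0 changes
W2, v = layer_step([[0.0, 0.0, 0.5], [0.0, 0.0, 0.5]], [1, 0], vt=[1, 0])
print(v, W2)
```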

Page 38: Example: Character Recognition

Example: Character Recognition
• 26 classes: A, B, C, …, Z
• Target output is a vector
  – e.g., vtA = [1 0 0 … 0], vtB = [0 1 0 … 0], …

[Diagram: inputs u1, u2, u3, …, un feeding 26 output units v1 (A), v2 (B), …, v26 (Z)]

Page 39: Activation Functions

Activation Functions

[The four activation-function plots again: threshold, linear, piece-wise linear, sigmoid]

Page 40: Perceptron with Linear Function

Perceptron with Linear Function

[Diagram: inputs u1, u2, …, un with weights w1, w2, …, wn; no threshold, the output equals the activation]

v = a = Σi=1..n wi ui

Note: We will use the notation t instead of vt

Page 41: Gradient Descent Learning Rule

Gradient Descent Learning Rule
• Consider a linear unit without threshold and with continuous output v (not just -1/1 or 0/1):
  v = w1 u1 + … + wn un
• Train the wi such that they minimize the squared error
  E[w1,…,wn] = 1/2 Σd∈D (td − vd)^2
  – where D is the set of training examples and
  – t the target outputs

Page 42: Gradient Descent Learning Rule (cont.)

Gradient Descent Learning Rule (cont.)

Gradient: ∇E[w] = [∂E/∂w1, …, ∂E/∂wn]
Update: ∆w = -µ ∇E[w]
Per-example error: Ed[w] = 1/2 (td − vd)^2

[Plot: the error surface E[w1, w2] as a bowl; a gradient step moves (w1, w2) to (w1+∆w1, w2+∆w2) downhill]

-1/µ ∆wi = ∂Ed/∂wi
         = ∂/∂wi 1/2 (td − vd)^2
         = ∂/∂wi 1/2 (td − Σj wj uj)^2
         = (td − vd)(-ui)

We get: ∆wi = µ (td − vd) ui    (also known as the Delta rule)
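The Delta rule as a sketch in Python (names and values illustrative); one stochastic step on a single example:

```python
def delta_rule_step(w, u, t, mu=0.05):
    """One Delta-rule update for a linear unit: dw_i = mu * (t - v) * u_i."""
    v = sum(wi * ui for wi, ui in zip(w, u))  # linear output v = w . u
    return [wi + mu * (t - v) * ui for wi, ui in zip(w, u)], v

w = [0.0, 0.0]
w, v = delta_rule_step(w, u=[1.0, 2.0], t=1.0)
print(w)  # moved in the direction that reduces the squared error on this example
```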

Page 43: Incremental vs. Batch Gradient Descent

Incremental vs. Batch Gradient Descent
• Incremental (stochastic) mode: w′ = w − µ ∇Ed[w] over individual training examples d
  Ed[w] = 1/2 (td − vd)^2
• Batch mode: w′ = w − µ ∇ED[w] over the entire data D
  ED[w] = 1/2 Σd (td − vd)^2
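The two modes sketched side by side (function names and the tiny dataset are illustrative); they differ because the incremental mode updates w between examples:

```python
def linear_out(w, u):
    return sum(wi * ui for wi, ui in zip(w, u))

def stochastic_epoch(w, data, mu=0.1):
    """Incremental mode: update after every single example."""
    for u, t in data:
        v = linear_out(w, u)
        w = [wi + mu * (t - v) * ui for wi, ui in zip(w, u)]
    return w

def batch_epoch(w, data, mu=0.1):
    """Batch mode: accumulate the gradient over all of D, then update once."""
    grad = [0.0] * len(w)
    for u, t in data:
        v = linear_out(w, u)
        grad = [gi + (t - v) * ui for gi, ui in zip(grad, u)]
    return [wi + mu * gi for wi, gi in zip(w, grad)]

data = [([1.0], 1.0), ([1.0], 0.0)]   # conflicting targets for the same input
print(stochastic_epoch([0.0], data))  # second update sees the changed weight
print(batch_epoch([0.0], data))       # one combined update
```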

Page 44: Perceptron vs. Gradient Descent Rule

Perceptron vs. Gradient Descent Rule
• Perceptron rule
  w′i = wi + µ (vt − v) ui
  – derived from manipulation of the decision surface
• Gradient descent rule
  w′i = wi + µ (td − vd) ui
  – derived from minimization of the error function
    E[w1,…,wn] = 1/2 Σd (td − vd)^2
    by means of gradient descent

Page 45: Demo: Gradient Descent Learning Rule

Demo: Gradient Descent Learning Rule

[Two plots on the range -5..5 showing the demo data and the fitted decision line]

Page 46: Linear Regression

Linear Regression
• Common statistical method
• Describes the relationship between variables
• Predicts one variable knowing the others
• We make the assumption that there is a linear relationship
  – between an outcome (dependent variable, response variable) and a predictor (independent variable, explanatory variable, feature) or
  – between one variable and several other variables
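For the single-predictor case, the least-squares line can be computed in closed form; a sketch (the function name and the noiseless toy data are illustrative):

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (simple linear regression)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Noiseless example: points on y = 2x + 1 are recovered exactly
print(fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))  # (2.0, 1.0)
```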

Page 47: Describing Relationships

Describing Relationships

[Two scatter plots: sepal length (cm) vs. sepal width (cm) shows a very weak relationship / very low correlation (cor. coef. = -0.1094); sepal length (cm) vs. petal width (cm) shows a strong relationship / high correlation (cor. coef. = 0.8180)]

Page 48: Making Predictions

Making Predictions

[Two plots with query points marked "?": son's height (cm) vs. father's height (cm) (predicting variables), and profit (Euro) vs. days (predicting future outcomes)]

Page 49: Demo: Linear Regression

Demo: Linear Regression

[Two plots of f(x) over x in [0, 1] showing the fitted regression line]

Page 50: Perceptron vs. Gradient Descent

Perceptron vs. Gradient Descent
• Perceptron (TLU)
  – Guaranteed to succeed if
    • training examples are linearly separable
    • given a sufficiently small learning rate µ
  – Robust against outliers
• Gradient descent (linear transfer function)
  – Guaranteed to converge to the hypothesis with minimum squared error
    • even when the training data is not separable by a hyperplane
    • given a sufficiently small learning rate µ
  – Very sensitive to outliers

Page 51: Activation Functions

Activation Functions

[The four activation-function plots again: threshold, linear, piece-wise linear, sigmoid]

Page 52: Perceptron with Sigmoid Function

Perceptron with Sigmoid Function

[Diagram: inputs u1, u2, …, un with weights w1, w2, …, wn feed a summation unit Σ]

a = Σi=1..n wi ui
v = σ(a) = 1/(1 + e^-a)

Page 53: Gradient Descent Rule for Sigmoid Activation Function

Gradient Descent Rule for Sigmoid Activation Function

Gradient: ∇E[w] = [∂E/∂w1, …, ∂E/∂wn]
Update: ∆w = -µ ∇E[w]
Per-example error: Ed[w] = 1/2 (td − vd)^2

[Plots of σ(a) and its derivative σ′(a)]

-1/µ ∆wi = ∂Ed/∂wi
         = ∂/∂wi 1/2 (td − vd)^2
         = ∂/∂wi 1/2 (td − σ(Σj wj uj))^2
         = (td − vd) σ′(Σj wj uj) (-ui)

With
vd = σ(a) = 1/(1 + e^-a)
σ′(a) = e^-a/(1 + e^-a)^2 = σ(a)(1 − σ(a))

we get: ∆wi = µ vd (1 − vd)(td − vd) ui
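One sigmoid-unit update, sketched (function names and values illustrative); note how σ′(a) = v(1 − v) lets the rule reuse the already-computed output:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def sigmoid_step(w, u, t, mu=0.5):
    """Gradient step for a sigmoid unit: dw_i = mu * v*(1-v)*(t-v) * u_i."""
    a = sum(wi * ui for wi, ui in zip(w, u))
    v = sigmoid(a)
    g = v * (1.0 - v) * (t - v)  # uses sigma'(a) = v * (1 - v)
    return [wi + mu * g * ui for wi, ui in zip(w, u)], v

w = [0.0, 0.0]
w, v = sigmoid_step(w, u=[1.0, 1.0], t=1.0)
print(v)  # 0.5, since a = 0 before the update
```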

Page 54: Demo: Gradient Descent with Sigmoid Activation Function

Demo: Gradient Descent with Sigmoid Activation Function

[Two plots on the range -5..5 showing the demo data and the learned decision boundary]

Page 55: Summary

Summary
• Threshold Logic Unit
  – works only for linearly separable patterns
  – robust to outliers
• Linear Neuron
  – converges to minimal squared error even when patterns are not linearly separable
  – very sensitive to outliers
• Sigmoid Neuron
  – combination of TLU and linear neuron
  – converges to minimal squared error even when patterns are not linearly separable
  – robust to outliers