
Neural Networks

thanks to: www.cs.vu.nl/~elena/slides

Basics of neural network theory and practice for supervised and unsupervised learning.

Most popular Neural Network models:
• architectures
• learning algorithms
• applications


Neural Networks

• A NN is a machine learning approach inspired by the way in which the brain performs a particular learning task:

– Knowledge about the learning task is given in the form of examples.

– Inter-neuron connection strengths (weights) are used to store the acquired information (the training examples).

– During the learning process the weights are modified in order to model the particular learning task correctly on the training examples.


Learning

• Supervised Learning
– Recognizing hand-written digits, pattern recognition, regression.
– Labeled examples (input, desired output).
– Neural Network models: perceptron, feed-forward, radial basis function, support vector machine.

• Unsupervised Learning
– Finding similar groups of documents in the web, content-addressable memory, clustering.
– Unlabeled examples (different realizations of the input alone).
– Neural Network models: self-organizing maps, Hopfield networks.

Neurons


Network architectures

• Three different classes of network architectures:

– single-layer feed-forward
– multi-layer feed-forward
– recurrent

In feed-forward architectures the neurons are organized in acyclic layers.

• The architecture of a neural network is linked with the learning algorithm used to train it.

Single Layer Feed-forward

[Figure: an input layer of source nodes projecting directly onto an output layer of neurons]

Multi-layer feed-forward

[Figure: a 3-4-2 network: input layer (3 source nodes), hidden layer (4 neurons), output layer (2 neurons)]

Recurrent network

• Recurrent network with hidden neuron(s): the unit-delay operator z⁻¹ implies a dynamic system.

[Figure: recurrent network with input, hidden, and output units; feedback connections pass through unit-delay (z⁻¹) elements]

The Neuron

• The neuron is the basic information processing unit of a NN. It consists of:

1. A set of synapses or connecting links, each link characterized by a weight: w1, w2, …, wm.

2. An adder function (linear combiner) which computes the weighted sum of the inputs:

u = \sum_{j=1}^{m} w_j x_j

3. An activation function \varphi (squashing function) for limiting the amplitude of the output of the neuron:

y = \varphi(u + b)

where b is the bias.
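As a concrete illustration, here is a minimal Python sketch of this neuron model (the function name and the choice of sign as activation are illustrative, not from the original slides):

    import numpy as np

    def neuron(x, w, b):
        """Basic neuron: weighted sum of inputs plus bias, passed through an activation."""
        u = np.dot(w, x)   # adder: u = sum_j w_j * x_j
        v = u + b          # induced local field
        return np.sign(v)  # activation function (here: a hard limiter)

    # Example: two inputs with hand-chosen weights
    y = neuron(x=np.array([1.0, -2.0]), w=np.array([0.5, 0.25]), b=0.1)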


The Neuron

[Figure: neuron model; input signals x1, x2, …, xm are multiplied by synaptic weights w1, w2, …, wm, combined by the summing function Σ together with the bias b to give the local field v, and passed through the activation function φ(·) to produce the output y]

Bias of a Neuron

• Bias b has the effect of applying an affine transformation to u:

v = u + b

• v is the induced field of the neuron, where

u = \sum_{j=1}^{m} w_j x_j

[Figure: plot of v versus u; the bias b shifts the line v = u vertically]

Bias as extra input

• The bias is an external parameter of the neuron. It can be modeled by adding an extra input x0 = +1 with weight w0 = b:

v = \sum_{j=0}^{m} w_j x_j, \qquad w_0 = b

[Figure: neuron model as before, with the extra input x0 = +1 and weight w0 alongside inputs x1, …, xm and weights w1, …, wm, summing function Σ, activation φ(·), local field v, output y]
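A short Python check that the two formulations agree (an illustrative sketch):

    import numpy as np

    def local_field(x, w, b):
        # explicit bias
        return np.dot(w, x) + b

    def local_field_augmented(x, w, b):
        # bias folded in as weight w0 = b on a constant extra input x0 = +1
        x_aug = np.concatenate(([1.0], x))
        w_aug = np.concatenate(([b], w))
        return np.dot(w_aug, x_aug)

    x = np.array([2.0, -1.0]); w = np.array([0.5, 0.25]); b = 0.1
    assert np.isclose(local_field(x, w, b), local_field_augmented(x, w, b))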


Dimensions of a Neural Network

• Various types of neurons

• Various network architectures

• Various learning algorithms

• Various applications


Face Recognition

90% accuracy in learning head pose and in recognizing 1 of 20 faces.

Handwritten digit recognition


Learning in NN

• Hebb (1949): learning by modifying connections

• Widrow & Hoff (1960): learning by comparing with a target

Architecture

• We consider the following architecture: a feed-forward NN with one layer.

• It is sufficient to study single layer perceptrons with just one neuron:

Single layer perceptrons

• Generalization to single layer perceptrons with more neurons is easy because:

• The output units are independent of each other
• Each weight only affects one of the outputs

Perceptron: Neuron Model

• Uses a non-linear (McCulloch-Pitts) model of the neuron:

[Figure: inputs x1, x2, …, xn with weights w1, w2, …, wn and bias b produce the local field v; the output is y = φ(v)]

• φ is the sign function:

φ(v) = +1 if v ≥ 0
       −1 if v < 0
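In code the decision rule is a one-liner (a sketch; the function name is illustrative):

    import numpy as np

    def perceptron_output(x, w, b):
        """McCulloch-Pitts neuron: hard limiter (sign) applied to the local field."""
        v = np.dot(w, x) + b
        return 1 if v >= 0 else -1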


Perceptron: Applications

• The perceptron is used for classification: classify correctly a set of examples into one of two classes C1, C2:

If the output of the perceptron is +1, the input is assigned to class C1.

If the output is −1, the input is assigned to class C2.

Perceptron: Classification

• The equation below describes a hyperplane in the input space. This hyperplane is used to separate the two classes C1 and C2:

\sum_{i=1}^{m} w_i x_i + b = 0

[Figure: two-dimensional input space (x1, x2); the decision boundary is the line w1 x1 + w2 x2 + b = 0, and the decision region for C1 is w1 x1 + w2 x2 + b ≥ 0]
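A two-dimensional sketch of this rule in Python (the weights are chosen only for illustration):

    import numpy as np

    def classify(x, w, b):
        """Assign x to C1 on the positive side of the hyperplane, otherwise to C2."""
        return "C1" if np.dot(w, x) + b >= 0 else "C2"

    w = np.array([2.0, -1.0]); b = 0.0            # boundary 2*x1 - x2 = 0
    print(classify(np.array([1.0, 1.0]), w, b))   # -> C1
    print(classify(np.array([-1.0, 1.0]), w, b))  # -> C2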


Perceptron: Limitations

• The perceptron can only model linearly separable functions.

• The perceptron can be used to model the following Boolean functions:
• AND
• OR
• COMPLEMENT

• But it cannot model XOR. Why?

Perceptron: Learning Algorithm

• Variables and parameters:

x(n) = input vector = [+1, x1(n), x2(n), …, xm(n)]^T

w(n) = weight vector = [b(n), w1(n), w2(n), …, wm(n)]^T

b(n) = bias
y(n) = actual response
d(n) = desired response
η = learning rate parameter

The fixed-increment learning algorithm

• Initialization: set w(1) = 0.

• Activation: activate the perceptron by applying input example (vector x(n) and desired response d(n)).

• Compute actual response of the perceptron:

y(n) = sgn[w^T(n) x(n)]

• Adapt weight vector: if d(n) and y(n) are different, then

w(n + 1) = w(n) + η d(n) x(n)

where d(n) = +1 if x(n) ∈ C1, and d(n) = −1 if x(n) ∈ C2.

• Continuation: increment time step n by 1 and go to the Activation step.
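Putting the steps together, a minimal Python sketch of the fixed-increment rule (assuming inputs are augmented with a leading +1 so that the bias is w[0]; names are illustrative):

    import numpy as np

    def train_perceptron(X, d, eta=1.0, max_epochs=100):
        """Fixed-increment perceptron learning.
        X: one example per row, each already augmented with a leading +1.
        d: desired responses in {+1, -1}."""
        w = np.zeros(X.shape[1])                         # Initialization: w(1) = 0
        for _ in range(max_epochs):
            updated = False
            for x_n, d_n in zip(X, d):
                y_n = 1 if np.dot(w, x_n) >= 0 else -1   # y(n) = sgn(w^T(n) x(n))
                if y_n != d_n:
                    w = w + eta * d_n * x_n              # adapt weight vector
                    updated = True
            if not updated:                              # full epoch with no updates: done
                break
        return w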


Example

Consider the 2D training set C1 ∪ C2, where:
C1 = {(1,1), (1,-1), (0,-1)} (elements of class +1)
C2 = {(-1,-1), (-1,1), (0,1)} (elements of class −1)

Use the perceptron learning algorithm to classify these examples.

• w(1) = [1, 0, 0]^T, η = 1

Trick

Consider the augmented training set C′1 ∪ C′2, with first entry fixed to 1 (to deal with the bias as an extra weight):

C′1: (1, 1, 1), (1, 1, −1), (1, 0, −1)
C′2: (1, −1, −1), (1, −1, 1), (1, 0, 1)

Replace x with −x for all x ∈ C′2 and use the following simpler update rule:

w(n+1) = w(n) + x(n)   if w(n)·x(n) ≤ 0
w(n+1) = w(n)          otherwise

Example

• Training set after application of the trick:

(1, 1, 1), (1, 1, −1), (1, 0, −1), (−1, 1, 1), (−1, 1, −1), (−1, 0, −1)

• Application of the perceptron learning algorithm:

Adjusted pattern | Weight applied w(n) | w(n)·x(n) | Update? | New weight
(1, 1, 1)        | (1, 0, 0)           |  1        | No      | (1, 0, 0)
(1, 1, −1)       | (1, 0, 0)           |  1        | No      | (1, 0, 0)
(1, 0, −1)       | (1, 0, 0)           |  1        | No      | (1, 0, 0)
(−1, 1, 1)       | (1, 0, 0)           | −1        | Yes     | (0, 1, 1)
(−1, 1, −1)      | (0, 1, 1)           |  0        | Yes     | (−1, 2, 0)
(−1, 0, −1)      | (−1, 2, 0)          |  1        | No      | (−1, 2, 0)

End of epoch 1

Example

Adjusted pattern | Weight applied w(n) | w(n)·x(n) | Update? | New weight
(1, 1, 1)        | (−1, 2, 0)          |  1        | No      | (−1, 2, 0)
(1, 1, −1)       | (−1, 2, 0)          |  1        | No      | (−1, 2, 0)
(1, 0, −1)       | (−1, 2, 0)          | −1        | Yes     | (0, 2, −1)
(−1, 1, 1)       | (0, 2, −1)          |  1        | No      | (0, 2, −1)
(−1, 1, −1)      | (0, 2, −1)          |  3        | No      | (0, 2, −1)
(−1, 0, −1)      | (0, 2, −1)          |  1        | No      | (0, 2, −1)

End of epoch 2

At epoch 3 no updates are performed (check!), so execution of the algorithm stops.
Final weight vector: (0, 2, −1), i.e. the decision hyperplane is 2x1 − x2 = 0.
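The two epochs above can be reproduced with a short script implementing the trick-based rule (a sketch; the negated C′2 patterns are hard-coded):

    import numpy as np

    # Augmented patterns; the C'2 patterns are already replaced by their negatives
    patterns = np.array([( 1, 1,  1), ( 1, 1, -1), ( 1, 0, -1),
                         (-1, 1,  1), (-1, 1, -1), (-1, 0, -1)], dtype=float)

    w = np.array([1.0, 0.0, 0.0])     # w(1) = [1, 0, 0]^T
    while True:
        updated = False
        for x in patterns:
            if np.dot(w, x) <= 0:     # misclassified or on the boundary
                w = w + x             # w(n+1) = w(n) + x(n)
                updated = True
        if not updated:               # an epoch with no updates: stop
            break

    print(w)                          # -> [ 0.  2. -1.], i.e. 2*x1 - x2 = 0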


Example

[Figure: the six training points in the (x1, x2) plane, C1 points marked + and C2 points marked −, separated by the decision boundary 2x1 − x2 = 0]

Convergence of the learning algorithm

Suppose the datasets C1, C2 are linearly separable. Then the perceptron convergence algorithm converges after n0 iterations, with n0 ≤ nmax, on the training set C1 ∪ C2.

XOR is not linearly separable.
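A quick empirical check (illustrative; reuses the train_perceptron sketch from above, extended to report convergence): the fixed-increment rule settles on the linearly separable AND data but keeps updating forever on XOR.

    import numpy as np

    def train_perceptron(X, d, eta=1.0, max_epochs=1000):
        w = np.zeros(X.shape[1])
        for _ in range(max_epochs):
            updated = False
            for x_n, d_n in zip(X, d):
                y_n = 1 if np.dot(w, x_n) >= 0 else -1
                if y_n != d_n:
                    w = w + eta * d_n * x_n
                    updated = True
            if not updated:
                return w, True         # converged
        return w, False                # still updating after max_epochs

    X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)  # +1-augmented
    d_and = np.array([-1, -1, -1,  1])   # AND: linearly separable
    d_xor = np.array([-1,  1,  1, -1])   # XOR: not linearly separable

    print(train_perceptron(X, d_and)[1])  # True
    print(train_perceptron(X, d_xor)[1])  # False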


Adaline: Adaptive Linear Element

• Adaline uses a linear neuron model and the Least-Mean-Square (LMS) learning algorithm.
The idea: try to minimize the squared error, which is a function of the weights:

E(w(n)) = \frac{1}{2} e^2(n)

e(n) = d(n) - \sum_{j=0}^{m} x_j(n) w_j(n)

• We can find the minimum of the error function E by means of the steepest descent method.

Steepest Descent Method

w(n+1) = w(n) - η (gradient of E(w(n)))

• start with an arbitrary point
• find a direction in which E is decreasing most rapidly
• make a small step in that direction

gradient of E(w) = \left[ \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_m} \right]
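A generic sketch of the method in Python (eta is the step size η; the quadratic example is only for illustration):

    import numpy as np

    def steepest_descent(grad_E, w0, eta=0.1, steps=100):
        """Minimize E by repeatedly stepping against its gradient."""
        w = np.asarray(w0, dtype=float)
        for _ in range(steps):
            w = w - eta * grad_E(w)   # w(n+1) = w(n) - eta * gradient of E at w(n)
        return w

    # Example: E(w) = ||w||^2 / 2 has gradient w, so descent drives w toward 0
    w_min = steepest_descent(lambda w: w, w0=[3.0, -2.0])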


Least-Mean-Square algorithm (Widrow-Hoff algorithm)

• Approximation of the gradient of E: with

E(w(n)) = \frac{1}{2} e^2(n), \qquad e(n) = d(n) - \sum_{j=0}^{m} x_j(n) w_j(n),

we have

\frac{\partial E(w(n))}{\partial w(n)} = e(n) \frac{\partial e(n)}{\partial w(n)} = -e(n) x^T(n)

• The update rule for the weights becomes:

w(n+1) = w(n) + η e(n) x(n)

Summary of the LMS algorithm

Training sample: input signal vector x(n)
                 desired response d(n)

User-selected parameter η > 0

Initialization: set ŵ(1) = 0

Computation: for n = 1, 2, … compute

e(n) = d(n) - ŵ^T(n) x(n)
ŵ(n+1) = ŵ(n) + η x(n) e(n)
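A minimal Python sketch of this loop (X rows are +1-augmented input vectors as before; eta and the data are illustrative):

    import numpy as np

    def lms(X, d, eta=0.05, epochs=50):
        """Widrow-Hoff / LMS: one weight update per presented example."""
        w = np.zeros(X.shape[1])              # initialization: w(1) = 0
        for _ in range(epochs):
            for x_n, d_n in zip(X, d):
                e_n = d_n - np.dot(w, x_n)    # e(n) = d(n) - w^T(n) x(n)
                w = w + eta * e_n * x_n       # w(n+1) = w(n) + eta * x(n) e(n)
        return w

    # Example: fit the linearly separable set from the perceptron example
    X = np.array([[1, 1, 1], [1, 1, -1], [1, 0, -1],
                  [1, -1, -1], [1, -1, 1], [1, 0, 1]], dtype=float)
    d = np.array([1, 1, 1, -1, -1, -1], dtype=float)
    w = lms(X, d)
    print(np.sign(X @ w))                     # matches d for a small enough eta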


Comparison of LMS and the Perceptron

• Perceptron and Adaline represent different implementations of a single-layer perceptron based on error-correction learning.

• Model of a neuron:
Perceptron: non-linear; hard-limiter activation function (McCulloch-Pitts model).
LMS: linear.

• Learning process:
Perceptron: a finite number of iterations.
LMS: continuous.