Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel...

79
Data Mining - Neural Networks Dr. Jean-Michel RICHER 2018 Dr. Jean-Michel RICHER Data Mining - Neural Networks 1 / 79

Transcript of Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel...

Page 1: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Data Mining - Neural Networks

Dr. Jean-Michel RICHER

[email protected]

Dr. Jean-Michel RICHER Data Mining - Neural Networks 1 / 79

Page 2: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Outline

1. Introduction

2. History and working principle

3. Improvements of NN

4. How to learn with a NN ?

5. Backpropagation example

6. Interesting links and applications

Dr. Jean-Michel RICHER Data Mining - Neural Networks 2 / 79

Page 3: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

1. Introduction

Dr. Jean-Michel RICHER Data Mining - Neural Networks 3 / 79

Page 4: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

What we will cover

What we will coverbasics of Artificial Neural Networksthe perceptronthe multi-layer networkthe sigmoid functionbackpropagationSynaptic.js

Dr. Jean-Michel RICHER Data Mining - Neural Networks 4 / 79

Page 5: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

2. History and workingprinciple

Dr. Jean-Michel RICHER Data Mining - Neural Networks 5 / 79

Page 6: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

ANN

Artificial Neural NetworksNNs, ANNs or Connectionist Systems are computingsystems inspired by the biological neural networks thatconstitute animal brainsbased on a collection of connected units or nodescalled artificial neuronsthey try to model how neurons in the brain functionsuch systems learn or progressively improve theirperformance by considering examples (trainingphase)

Note: strong and weak AI, intelligence = calculation ?

Dr. Jean-Michel RICHER Data Mining - Neural Networks 6 / 79

Page 7: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

ANN

Specific Artificial Neural Networksfor image recognition: Convolutional Neural Network(CNN or ConvNet), a variation of multilayerperceptrons designed to require minimalpreprocessingfor speech recognition: Time Delay Neural Network(TDNN)

Dr. Jean-Michel RICHER Data Mining - Neural Networks 7 / 79

Page 8: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

what can you do with NN ?

A first example: MNISTthe MNIST database of handwritten digits of 28× 28pixels784 inputs and 10 outputsdatabase of 60.000 examples and a test set of 10.000smallest error rate of 0.35% with 6-layers NN (Ciresanet al., 2010)smallest error rate of 0.23% with ConvolutionalNetwork (Ciresan et al., 2012)

Dr. Jean-Michel RICHER Data Mining - Neural Networks 8 / 79

Page 9: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

ANN

McCulloch and Pitts, 1943Warren S. McCulloch, a neuroscientist, and WalterPitts, a logician explain the complex decisionprocesses in a brain using a linear threshold gatetakes a sum and returns 0 if the result is below thethreshold and 1 otherwise.very simple: binary inputs and outputs, threshold stepactivation function, no weighting of inputs

Dr. Jean-Michel RICHER Data Mining - Neural Networks 9 / 79

Page 10: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

ANN

Donald O. Hebb, 1949Hebbian rule basis of nearly all neural learningproceduresconnection between two neurons is strengthenedwhen both neurons are active at the same timethis change in strength is proportional to the productof the two activitiesuse weights

Dr. Jean-Michel RICHER Data Mining - Neural Networks 10 / 79

Page 11: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

ANN

Rosenblatt, 1958Frank Rosenblatt, a psychologist at Cornell, wasworking on understanding the comparatively simplerdecision systems present in the eye of a fly, whichunderlie and determine its flee response.he proposed the idea of a Perceptron (Mark IPerceptron)an algorithm for pattern recognitionsimple input output relationship, modeled on aMcCulloch-Pittsperceptron learning: weights are adjusted only whena pattern is misclassified

Dr. Jean-Michel RICHER Data Mining - Neural Networks 11 / 79

Page 12: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

ANN

Bernard Widrow, Marcian E. Hoff, 1960professor Widrow and his student Hoff introduced theADALINE (ADAptive LInear NEuron)a fast and precise adaptive learning system: leastmean squares filter (LMS)delta rule: minimises the output error using(approximate) gradient descentfound in nearly every analog telephone for real-timeadaptive echo filtering

Note: Hoff received his master’s degree from Stanford Uni-versity in 1959 and his PhD in 1962, father of the micropro-cessor at Intel

Dr. Jean-Michel RICHER Data Mining - Neural Networks 12 / 79

Page 13: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

ANN

Minsky and Papert, 1969Marvin Minsky and Seymour Papert led a campaignto discredit neural network researchall neural networks suffer from the same fatal flaw asthe perceptron (XOR)they left the impression that neural network researchwas a dead end

Note: Minsky (MIT) is known for co-founding the field of AI,Papert (MIT) developed the Logo programming language

Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79

Page 14: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

ANN

Paul Werbos, 1974in 1974 developed the back-propagation learningmethod although its importance wasn’t fullyappreciated until a 1986accelerates the training of multi-layer networksinput vector is applied to the network andpropagated forward from the input layer to thehidden layer, and then to the output layeran error value is then calculated by using the desiredoutput and the actual output for each output neuronin the network.the error value is propagated backward through theweights of the network beginning with the outputneurons through the hidden layer and to the inputlayer

Dr. Jean-Michel RICHER Data Mining - Neural Networks 14 / 79

Page 15: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

ANN

Geoffrey Hinton, David Rumelhart, Ronald Williams ,1986

Backpropagation: repeatedly adjust the weights soas to minimize the difference between actual outputand desired outputHidden Layers: neuron nodes stacked in betweeninputs and outputs, allow NN to learn morecomplicated features (such as XOR logic)

Dr. Jean-Michel RICHER Data Mining - Neural Networks 15 / 79

Page 16: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Multi Layer NN

Figure: from the course of Nahua Kang ontowardsdatascience.com

Dr. Jean-Michel RICHER Data Mining - Neural Networks 16 / 79

Page 17: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

ANN

Deep LearningDeep Learning is about constructing machinelearning models that learn a hierarchicalrepresentation of the dataNeural Networks are a class of machine learningalgorithmsexample: NVIDIA CUDA Deep Neural Network library(cuDNN) is a GPU-accelerated library of primitives fordeep neural networks.

Dr. Jean-Michel RICHER Data Mining - Neural Networks 17 / 79

Page 18: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

ANN working principle

The Artificial Neuronconnected with n input channels x1 to xn

each has a synaptic weight w1 to wn

there is a bias b

use an activation function fa

The output is defined as:

y = fa(n∑

i=1

xi ×wi + b)

Dr. Jean-Michel RICHER Data Mining - Neural Networks 18 / 79

Page 19: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

ANN principle

Neuron formulaCan be modified by incorporating the bias into the xi ×wi

set x0 = 1w0 = b

the formula becomes:

y = fa

(n∑

i=0

xi ×wi

)

Dr. Jean-Michel RICHER Data Mining - Neural Networks 19 / 79

Page 20: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

The perceptron

1

0

w1

fa∑

xn

x1

x0

w0

wn

y

Dr. Jean-Michel RICHER Data Mining - Neural Networks 20 / 79

Page 21: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Neuron

Figure: from the course of Nahua Kang ontowardsdatascience.com

Dr. Jean-Michel RICHER Data Mining - Neural Networks 21 / 79

Page 22: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

The perceptron

Neuron activationThe heaviside (thresold or binary) function is of the form

y =

{1 if

∑ni=0 wi × xi > 0

0 otherwise

The perceptron is a simple model of prediction.

Dr. Jean-Michel RICHER Data Mining - Neural Networks 22 / 79

Page 23: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

The perceptron

Learn with perceptron

initialize w ;while not convergence do

compute errors;update w from errors;

endAlgorithm 1: Perceptron learning scheme

wj = wj + η(y − y)× xj

where η is the learning constant (not too big, not too small)between 0.05 and 0.15

Dr. Jean-Michel RICHER Data Mining - Neural Networks 23 / 79

Page 24: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

The perceptron

AND / ORThe perceptron can implement boolean formulas like theboolean OR or the AND

a b a ∨ b a ∧ b0 0 0 00 1 1 01 0 1 01 0 1 1

Dr. Jean-Michel RICHER Data Mining - Neural Networks 24 / 79

Page 25: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

The perceptron

Exemple with AND

X =

x0 x1 x21 0 01 0 11 1 01 1 1

y =

0001

η = 0.1w = [0.1, 0.2, 0.05]

Dr. Jean-Michel RICHER Data Mining - Neural Networks 25 / 79

Page 26: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

The perceptron

Exemple with AND - first casetake X0 = [1, 0, 0], y0 = 0∑

wi × xi = 0.1× 1 + 0.2× 0 + 0.05× 0 = 0.1fa(0.1) = 1 = y

w0 = w0 + η(y − y)× X00 = 0.1 + 0.1× (0− 1)× 1 = 0

w1 = w1 + η(y − y)× X10 = 0.2 + 0.1× (0− 1)× 0 = 0.2

w2 = w2 + η(y − y)× X20 = 0.05+ 0.1× (0− 1)× 0 = 0.05

continue with X1 = [1, 0, 1], ...

Dr. Jean-Michel RICHER Data Mining - Neural Networks 26 / 79

Page 27: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

The perceptron

Exemple with AND - Convergencew = [−0.30000001, 0.22, 0.10500001]result is

x0 x1 x2 yp1. 0. 0. 01. 0. 1. 01. 1. 0. 01. 1. 1. 1

It works !

Dr. Jean-Michel RICHER Data Mining - Neural Networks 27 / 79

Page 28: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

The perceptron

ExerciseTry to implement the perceptron in python, C++ orJavaand test it for the boolean AND and OR

Dr. Jean-Michel RICHER Data Mining - Neural Networks 28 / 79

Page 29: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

The perceptron

Why XOR is not possible with a perceptron ? (1/2)The one layer perceptron acts as a linear separator:

0 1

0

1

0

0

0

1

AND OR XOR0 1

0

1

0

1

1

1

0 1

0

1

0

1

1

0

Dr. Jean-Michel RICHER Data Mining - Neural Networks 29 / 79

Page 30: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

The perceptron

Why XOR is not possible with a perceptron ? (2/2)

a b a XOR b equation0 0 0 w0 + 0×w1 + 0×w2 ≤ 0 (1)0 1 1 w0 + 0×w1 + 1×w2 > 0 (2)1 0 1 w0 + 1×w1 + 0×w2 > 0 (3)1 1 0 w0 + 1×w1 + 1×w2 ≤ 0 (4)

adding (1) and (4) and then (2) and (3):

(1) + (4) 2w0 + w1 + w2 ≤ 0(2) + (3) 2w0 + w1 + w2 > 0

impossible !

Dr. Jean-Michel RICHER Data Mining - Neural Networks 30 / 79

Page 31: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

3. Improvements of NeuralNetworks

Dr. Jean-Michel RICHER Data Mining - Neural Networks 31 / 79

Page 32: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Other activation functions

Heaviside problemIf the activation function is linear then the final output is stilla linear combination of the input data

SigmoidA sigmoid function is a real function (special case of thelogistic function):

bounded (min, max)differentiablehas a characteristic "S"-shaped curve

s(x) = σ(x) =1

1 + e−x =ex

1 + ex

Dr. Jean-Michel RICHER Data Mining - Neural Networks 32 / 79

Page 33: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

The sigmoid function

Other sigmoid-like functions

hyperbolic tangent: tanh(x) = ex−e−x

ex+e−x

arctangent function: arctan(x)

error function: 2√π

∫ x0 e−t2

dt

See wikipedia for a complete list

Dr. Jean-Michel RICHER Data Mining - Neural Networks 33 / 79

Page 34: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

The sigmoid function

Dr. Jean-Michel RICHER Data Mining - Neural Networks 34 / 79

Page 35: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Characteristic features of the sigmoid function

Properties of the sigmoid functionoutput values range from 0 to 1the curve crosses 0.5 at x = 0simple derivative of s(x)× (1− s(x))

used for models where you have to predictprobability of an output

See math.stackexchange.com for demonstration of thederivative

Dr. Jean-Michel RICHER Data Mining - Neural Networks 35 / 79

Page 36: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

4. How to learn with aNeural Network ?

Dr. Jean-Michel RICHER Data Mining - Neural Networks 36 / 79

Page 37: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

The backpropagation

Notationswe will use L to refer to a layer

yL represents the output of layer L

xL−1 represents the input layer for the computation ofyL

wL is the vector of weightsthe output is then computed by

yL = σ(wL × xL−1 + bL)

where σ is the sigmoid activation function and bL is the bias

Dr. Jean-Michel RICHER Data Mining - Neural Networks 37 / 79

Page 38: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

The backpropagation

zL

0.6

zL−1

0.2

σ(zL)

yL−1 yL

σ(zL−1) wL,bL

To simplify understanding we will write:

yL = σ(zL)

withzL = wL × xL−1 + bL

Dr. Jean-Michel RICHER Data Mining - Neural Networks 38 / 79

Page 39: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

The backpropagation

Imagine you want to build a NN to implement the XORfunction using a hidden layer:

L0 L1 L2

y1 y2x1

Input layer Hidden layer Output layer

we propagate the input values to the output layer:

y1 = σ(w1 × x0 + b1)

y2 = σ(w2 × y1 + b2)

We then can compare y2 to the expected value yexp

Dr. Jean-Michel RICHER Data Mining - Neural Networks 39 / 79

Page 40: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

The backpropagation

Error function and gradientIf y2 and yexp (the expected value for the output) are differ-ent we need to modify the wi and bi , for this we computethe error as:

E(y2) =12(yexp − y2)

2

which in fact results from:

E(y2) =12(yexp − σ(W2 × σ(W1 × x0 + b1) + b2)

2

and in fact the error depends of w1, b1, w2, b2

Dr. Jean-Michel RICHER Data Mining - Neural Networks 40 / 79

Page 41: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

The backpropagation

Error function and gradientWe will use the gradient of E to determine the influence ofthe wL’s and the bias bL’s:

∇E =

(∂E∂wL

,∂E∂bL

)+∇E is the direction to increase the function−∇E is the direction to decrease the function

Dr. Jean-Michel RICHER Data Mining - Neural Networks 41 / 79

Page 42: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

The backpropagation

How to compute ∂E∂wL

?

Remember that

zL = WL × yL−1 + bLyL = σ(zL)

E = 12(yexp − yL)

2

So the derivative of E with respect to wL can be rewritten:

∂E∂wL

=

(∂E∂yL

)(∂yL

∂zL

)(∂zL

∂wL

)

Dr. Jean-Michel RICHER Data Mining - Neural Networks 42 / 79

Page 43: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

The backpropagation

How to compute ∂E∂wL

?

Remember that

∂E∂yL

= 12 × 2× (yexp − yL)×−1

∂yL∂zL

= σ′(zL)∂zL∂wL

= yL−1

So the derivative of E with respect to wL is:

∂E∂wL

= −× (yexp − yL)× σ′(zL)× yL−1

Dr. Jean-Michel RICHER Data Mining - Neural Networks 43 / 79

Page 44: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

The backpropagation

What about ∂E∂bL

?

Following the same demonstration, we get:

∂E∂bL

= −× (yexp − yL)× σ′(zL)

Dr. Jean-Michel RICHER Data Mining - Neural Networks 44 / 79

Page 45: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

The backpropagation

Last stepwe have to sum all errors for each input datathen propagate the change to the previous layerusing the gradient:L2_delta = L2_error * sigmoid_deriv(L2)

L1_error = L2_delta.dot(w2.T)

L1_delta = L1_error * sigmoid_deriv(L1)

w2 += L1.T.dot(L2_delta) * eta

w1 += L0.T.dot(L1_delta) * eta

Dr. Jean-Michel RICHER Data Mining - Neural Networks 45 / 79

Page 46: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

The backpropagation

Dr. Jean-Michel RICHER Data Mining - Neural Networks 46 / 79

Page 47: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

How to design a NN ?

Design of Neural Network1 collect data (data structure ?)2 normalize data3 define training sets (Fold technique)4 define a test set (or use one of the folds)5 train the network using backpropagation6 test result

Follow the tutorial of Jason Brownlee on the net calledHow to Implement the Backpropagation Algorithm FromScratch In Python, November 2016.

Dr. Jean-Michel RICHER Data Mining - Neural Networks 47 / 79

Page 48: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Backpropagation example

0.05

0.10

W2

0.01

W3

0.99

expectedvalues

0.35 0.60

b3b2

0.15 0.40

y12 = σ(z1

2)

y22 = σ(z2

2)

0.450.20

0.500.25

y23 = σ(z2

3)

y13 = σ(z1

3)x11

x21

0.30 0.55

Dr. Jean-Michel RICHER Data Mining - Neural Networks 48 / 79

Page 49: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Backpropagation example

define W2 and W3 as matrices:

W2 =

0.15 0.2

0.25 0.3

=

w1,12 w1,2

2

w2,12 w2,2

2

W3 =

0.4 0.45

0.5 0.55

=

w1,13 w1,2

3

w2,13 w2,2

3

Dr. Jean-Michel RICHER Data Mining - Neural Networks 49 / 79

Page 50: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Backpropagation example

Propagate values of x11 and x2

1 by computing z12 and y1

2 :

z12 = b2 + w1,1

2 × x11 + w1,2

2 × x21

z12 = 0.35 + 0.15× 0.05 + 0.2× 0.1 = 0.3775

y12 = σ(z1

2 )

y12 = 1/(1 + e−0.3775) = 0.5932

Dr. Jean-Michel RICHER Data Mining - Neural Networks 50 / 79

Page 51: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Backpropagation example

Repeat the process for z22 and y2

2 :

z22 = b2 + w2,1

2 × x11 + w2,2

2 × x21

z22 = 0.35 + 0.25× 0.05 + 0.3× 0.1 = 0.3925

y22 = σ(z1

2 )

y22 = 1/(1 + e−0.3925) = 0.5968

Dr. Jean-Michel RICHER Data Mining - Neural Networks 51 / 79

Page 52: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Backpropagation example

To simplify the computation we coud write: z12

z22

=

w1,12 w1,2

2

w2,12 w2,2

2

︸ ︷︷ ︸

W2

×

x12

x22

︸ ︷︷ ︸

X1

+b2 ×

1

1

︸ ︷︷ ︸

B2

or

Z2 = W2 × X1 + B2

and then

Y2 = σ(Z2)

Dr. Jean-Michel RICHER Data Mining - Neural Networks 52 / 79

Page 53: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Backpropagation example

Propagate values of y12 and y2

2 by computing z13 and y1

3 :

z13 = b3 + w1,1

3 × y12 + w1,2

3 × y22

z13 = 0.6 + 0.4× 0.5932 + 0.45× 0.5968 = 1.1059

y13 = σ(z1

3 )

y13 = 1/(1 + e−1.1059) = 0.7513

Dr. Jean-Michel RICHER Data Mining - Neural Networks 53 / 79

Page 54: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Backpropagation example

Do the same for z23 and y2

3 :

z23 = b3 + w2,1

3 × y12 + w2,2

3 × y22

z23 = 0.6 + 0.5× 0.5932 + 0.55× 0.5968 = 1.2249

y23 = σ(z1

3 )

y23 = 1/(1 + e−1.2249) = 0.7729

Dr. Jean-Michel RICHER Data Mining - Neural Networks 54 / 79

Page 55: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Backpropagation example

Compute the error of the network where yexp is the vectorof expected values:

E(y3) = 12∑2

i=1(yiexp − y i

3)2

E(y3) = 12

((y1

exp − y13 )

2 + (y2exp − y2

3 )2)

E(y3) = 12

((0.01− 0.7513)2 + (0.99− 0.7729)2

)E(y3) = 1

2 (0.5496 + 0.0471)

E(y3) = 12(0.5496 + 0.0471) = 0.2983

Dr. Jean-Michel RICHER Data Mining - Neural Networks 55 / 79

Page 56: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Backpropagation example

We need to compute the gradient of the error to updateW3:

y13 = σ(z1

3)

W 1,13

E(y3)z1

3 y13

Dr. Jean-Michel RICHER Data Mining - Neural Networks 56 / 79

Page 57: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Backpropagation example

We apply the chain rule for w1,13 :

∂E(y3)

∂w1,13

= ∂E(y3)

∂y13× ∂y1

3∂z1

3× ∂z1

3

∂w1,13

where∂E(y3)

∂y13

= 2× 12 × (y1

exp − y13 )×−1

∂y13

∂z13

= σ′(z13 ) = y1

3 × (1− y13 )

∂z13

∂w1,13

= y12

Dr. Jean-Michel RICHER Data Mining - Neural Networks 57 / 79

Page 58: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Backpropagation example

∂E(y3)

∂w1,13

= −(y1exp − y1

3 )× y13 × (1− y1

3 )× y12

= −(0.01− 0.7513)× 0.7513× (1− 0.7513)× 0.5932

= 0.7413× 0.1868× 0.5932

= 0.0821

Dr. Jean-Michel RICHER Data Mining - Neural Networks 58 / 79

Page 59: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Backpropagation example

For w1,23 :

∂E(y3)

∂w1,23

= ∂E(y3)

∂y13× ∂y1

3∂z1

3× ∂z1

3

∂w1,23

where∂E(y3)

∂y13

= 2× 12 × (y1

exp − y13 )×−1

∂y13

∂z13

= σ′(z13 ) = y1

3 × (1− y13 )

∂z13

∂w1,23

= y22

Dr. Jean-Michel RICHER Data Mining - Neural Networks 59 / 79

Page 60: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Backpropagation example

∂E(y3)

∂w1,23

= −(y1exp − y1

3 )× y13 × (1− y1

3 )× y22

= −(0.01− 0.7513)× 0.7513× (1− 0.7513)× 0.5968

= 0.7413× 0.1868× 0.5968

= 0.0826

Dr. Jean-Michel RICHER Data Mining - Neural Networks 60 / 79

Page 61: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Backpropagation example

For w2,13 :

∂E(y3)

∂w2,13

= ∂E(y3)

∂y23× ∂y2

3∂z2

3× ∂z2

3

∂w2,13

where∂E(y3)

∂y23

= 2× 12 × (y2

exp − y23 )×−1

∂y23

∂z23

= σ′(z23 ) = y2

3 × (1− y23 )

∂z23

∂w2,13

= y12

Dr. Jean-Michel RICHER Data Mining - Neural Networks 61 / 79

Page 62: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Backpropagation example

∂E(y3)

∂w2,13

= −(y2exp − y2

3 )× y23 × (1− y2

3 )× y12

= −(0.99− 0.7729)× 0.7729× (1− 0.7729)× 0.5932

= −0.2171× 0.1755× 0.5932

= −0.02260

Dr. Jean-Michel RICHER Data Mining - Neural Networks 62 / 79

Page 63: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Backpropagation example

For w2,23 :

∂E(y3)

∂w2,23

= ∂E(y3)

∂y23× ∂y2

3∂z2

3× ∂z2

3

∂w2,23

where∂E(y3)

∂y23

= 2× 12 × (y2

exp − y23 )×−1

∂y23

∂z23

= σ′(z23 ) = y2

3 × (1− y23 )

∂z23

∂w2,23

= y22

Dr. Jean-Michel RICHER Data Mining - Neural Networks 63 / 79

Page 64: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Backpropagation example

∂E(y3)

∂w2,13

= −(y2exp − y2

3 )× y23 × (1− y2

3 )× y12

= −(0.99− 0.7729)× 0.7729× (1− 0.7729)× 0.5968

= −0.2171× 0.1755× 0.5968

= −0.02274

Dr. Jean-Michel RICHER Data Mining - Neural Networks 64 / 79

Page 65: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Backpropagation example

We now can update W3:

W ?3 = W3 − η

0.0821 0.0826

−0.0226 −0.0227

where η is the learning rate, we set it to 0.5 in this case

W ?3 =

[0.358916479717885 0.4086661860762330.511301270238737 0.561370121107989

]

Dr. Jean-Michel RICHER Data Mining - Neural Networks 65 / 79

Page 66: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Limits of NN

Limits of Neural Networksdoes the use of the gradient function gives theminimum ?like for Maximum Parsimony: does the minimumrepresent the best network ?number and size of the hidden layers ?

Dr. Jean-Michel RICHER Data Mining - Neural Networks 66 / 79

Page 67: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

6. Interesting linksand applications

Dr. Jean-Michel RICHER Data Mining - Neural Networks 67 / 79

Page 69: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Toolkits

There are many tookits for NN available for many languages:

GPU computing: cuDNN (NVidia)Theano (University of Montreal)Tensorflow (Google)Caffe (Berkeley AI Research)MXNet (Microsoft, Nvidia, Intel, ...)many more on wikipedia

Dr. Jean-Michel RICHER Data Mining - Neural Networks 69 / 79

Page 70: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Synaptic.js

Synaptic.jsSynaptic.js defines itself as the javascript architecture-freeneural network library for node.js and the browser

you can easily define a NNtrain it efficientlyintegrate the code in a web page

Dr. Jean-Michel RICHER Data Mining - Neural Networks 70 / 79

Page 71: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Synaptic.js for XOR

Dr. Jean-Michel RICHER Data Mining - Neural Networks 71 / 79

Page 72: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Synaptic.js for XOR

Manuallyvar manualTrainingSet = [

{ input: [0,0], output: [0] },

{ input: [0,1], output: [1] },

{ input: [1,0], output: [1] },

{ input: [1,1], output: [0] }

]

GeneratedgeneratedTrainingSet = [];

for (var i = 0; i < 4; ++i) {

var op1 = Math.trunc(i/2);

var op2 = Math.trunc(i & 1);

input=[op1 , op2];

generatedTrainingSet.push({ input,

"output": [Math.trunc(op1 ^ op2)] });

}

Dr. Jean-Michel RICHER Data Mining - Neural Networks 72 / 79

Page 73: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Synaptic.js for XOR

Training of the neural networkDon’t use myTrain.trainXOR() which automatically pro-vides a XOR training set, but use train(..)

// var trainingSet = manualTrainingSet;

var trainingSet = generatedTrainingSet;

myTrainer.train(trainingSet);

Dr. Jean-Michel RICHER Data Mining - Neural Networks 73 / 79

Page 74: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Synaptic.js for IRIS

Neural Network for the IRIS datasetModify the example of the XOR network to create anetwork for the IRIS dataset

take the IRIS dataset from WEKA and convert it toJSONload the JSON data into the web page using JQuerytrain the network and display the results

Dr. Jean-Michel RICHER Data Mining - Neural Networks 74 / 79

Page 75: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Synaptic.js for IRIS

Dr. Jean-Michel RICHER Data Mining - Neural Networks 75 / 79

Page 76: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Synaptic.js for IRIS

JQuery<script type="text/javascript"

src="https://ajax.googleapis.com/ajax/libs/

jquery/2.1.3/jquery.min.js">

</script>

<script type="text/javascript">

var trainingSet = [];

$(document).ready(function(){

$.getJSON("iris.json", function(result){

for (i in result) {

...

}

...

});

});

</script>

Dr. Jean-Michel RICHER Data Mining - Neural Networks 76 / 79

Page 77: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

Synaptic.js for IRIS

NormalizationTo get the best results you need to normalize the data, forexample:

feature scaling:

x ′ =x − xmin

xmax − xmin

standard score:x ′ =

x − µσ

where µ and σ are respectively the mean andstandard deviation of the data

Dr. Jean-Michel RICHER Data Mining - Neural Networks 77 / 79

Page 78: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

6. End

Dr. Jean-Michel RICHER Data Mining - Neural Networks 78 / 79

Page 79: Data Mining - Neural Networks - univ-angers.frricher/dm/data_mining_6_ann.pdf · Dr. Jean-Michel RICHER Data Mining - Neural Networks 13 / 79. ANN Paul Werbos, 1974 in 1974 developed

University of Angers - Faculty of Sciences

UA - Angers2 Boulevard Lavoisier 49045 AngersCedex 01Tel: (+33) (0)2-41-73-50-72

Dr. Jean-Michel RICHER Data Mining - Neural Networks 79 / 79