Machine Learning and Computer Vision Group
Deep Learning with TensorFlow (http://cvml.ist.ac.at/courses/DLWT_W18)
Lecture 3: Artificial Neural Networks (Multi-layer Perceptrons)
Lars Bollmann
Contents
01 Introduction: brief history, biological neuron, and artificial neural networks
02 Perceptron: features & drawbacks
03 Multi-layer Perceptron: description & training
04 Implementation in TensorFlow: High-Level API and plain TensorFlow
Brief history
[History image from [1]]
Biological neuron
[Images: a multipolar neuron [2] and neural activity [3], [4]]
Artificial neural network (ANN)
Logical AND
Logical OR
McCulloch & Pitts (1943): A Logical Calculus of the Ideas Immanent in Nervous Activity [5]
[Figure 1 from the paper: diagrams of simple neural nets. Caption: the neuron c_i is always marked with the numeral i upon the body of the cell, and the corresponding action is denoted by N with subscript i, as in the text. The nets express temporal logic formulas; e.g. net (b) realizes logical OR, $N_3(t) \equiv N_1(t-1) \vee N_2(t-1)$, and net (c) logical AND, $N_3(t) \equiv N_1(t-1) \wedge N_2(t-1)$.]
Heat illusion
“learning”
All images from [5]
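These nets are threshold units: a neuron fires if and only if enough of its inputs fired on the previous tick. A minimal Python sketch of such all-or-none units for AND and OR (the function names and the threshold encoding are illustrative choices, not from the paper):

```python
# McCulloch-Pitts style all-or-none neuron: fires (1) iff the number of
# active inputs reaches the threshold. Names and thresholds are
# illustrative, not taken from the original paper.
def mp_neuron(inputs, threshold):
    return 1 if sum(inputs) >= threshold else 0

def logical_and(a, b):
    return mp_neuron([a, b], threshold=2)   # fires only if both inputs fire

def logical_or(a, b):
    return mp_neuron([a, b], threshold=1)   # fires if at least one input fires

assert [logical_and(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 0, 0, 1]
assert [logical_or(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 1]
```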
The Perceptron: Rosenblatt (1957)
[Figure: a single linear threshold unit with inputs x1, x2, a bias input 1, weights $w_{1,1}$ and $w_{2,1}$, a summation node Σ, and output y]
• Linear threshold unit:
  $\hat{y} = \begin{cases} 1 & \text{if } \sum_{i=1}^{m} w_{i,1}\, x_i + b > 0 \\ 0 & \text{otherwise} \end{cases}$
• Update rule for weights:
  $w_{i,j}^{(\text{next step})} = w_{i,j} + \eta\,(y_j - \hat{y}_j)\, x_i$
Features:
• Converges if the training instances are linearly separable
Drawbacks:
• Still a linear classifier
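A minimal NumPy sketch of this training rule, applied to the linearly separable logical-AND problem; the function name and the hyperparameter defaults are illustrative:

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, n_epochs=50):
    """Train one linear threshold unit with the perceptron update rule."""
    w = np.zeros(X.shape[1])                    # one weight per input feature
    b = 0.0                                     # bias term
    for _ in range(n_epochs):
        for xi, target in zip(X, y):
            y_hat = 1 if xi @ w + b > 0 else 0  # linear threshold unit
            w += eta * (target - y_hat) * xi    # w <- w + eta * (y - y_hat) * x
            b += eta * (target - y_hat)
    return w, b

# Logical AND is linearly separable, so the rule converges
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
```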
Multi-layer Perceptron (MLP)
[Figure: a fully connected MLP with inputs x1, x2 plus a bias unit 1, one hidden layer of Σ units with another bias unit, and outputs y1, y2, y3]
• Activation of one neuron: $a_i^{(L)} = f_{\text{activation}}\Big(\sum_j w_{i,j}\, a_j^{(L-1)} + b\Big)$
• Alternative activation functions: see [6]
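A short NumPy sketch of this formula, vectorized over all neurons of a layer; the vectorized form and the names are my assumption, since the slide states the formula per neuron:

```python
import numpy as np

def layer_activation(a_prev, W, b, f_activation=np.tanh):
    """a^(L) = f_activation(W @ a^(L-1) + b) for all neurons of layer L.
    W[i, j] is the weight from neuron j of layer L-1 to neuron i of layer L."""
    return f_activation(W @ a_prev + b)
```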
Training
[Figure: the same MLP, now with squared-error terms $(\hat{y}_1 - y_1)^2$, $(\hat{y}_2 - y_2)^2$, $(\hat{y}_3 - y_3)^2$ attached to the three outputs]
• Activation of one neuron: $a_i^{(L)} = f_{\text{activation}}\Big(\sum_j w_{i,j}\, a_j^{(L-1)} + b\Big)$
• Find the optimal weights & bias terms to minimize a cost function (e.g. MSE, cross-entropy)
• Weights: influence of the neurons in the previous layer
• Bias terms: nudge a neuron towards being active/inactive
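As a concrete example, the per-instance squared-error cost suggested by the figure could be written as follows (a sketch; the names are illustrative):

```python
import numpy as np

def instance_cost(y_hat, y):
    """Squared-error cost for one training instance:
    C_i = (y_hat_1 - y_1)^2 + (y_hat_2 - y_2)^2 + (y_hat_3 - y_3)^2."""
    return np.sum((y_hat - y) ** 2)
```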
Training
[Figure: the same MLP with the squared-error terms on the outputs]
• Initialize weights & bias terms with random values
• Minimize the cost function using gradient descent (GD)
• Average cost over all instances: $C = \frac{1}{m} \sum_{i=1}^{m} C_i$
• Use mini-batches to compute the gradient with respect to the weights and biases
• Take a gradient descent step per mini-batch
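A sketch of this loop, assuming a hypothetical grad_fn(w, b, X_batch, y_batch) that returns the gradients of the cost averaged over the mini-batch:

```python
import numpy as np

def minibatch_gd(w, b, X, y, grad_fn, eta=0.01, batch_size=32, n_epochs=10):
    """Mini-batch gradient descent: average gradients over each mini-batch,
    then take one GD step. grad_fn is assumed to return (dC/dw, dC/db)."""
    m = len(X)
    for _ in range(n_epochs):
        order = np.random.permutation(m)                # reshuffle each epoch
        for start in range(0, m, batch_size):
            batch = order[start:start + batch_size]
            dw, db = grad_fn(w, b, X[batch], y[batch])  # averaged over batch
            w = w - eta * dw                            # gradient descent step
            b = b - eta * db
    return w, b
```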
Training: Backpropagation
• Forward pass: compute the output of all neurons for a training instance
• Backward pass: compute the partial derivatives of the cost for each weight and bias
• Average over the mini-batch: combine the partial derivatives into a gradient vector
[Figure: the last two layers of a simplified chain network, with activations $a^{(L-1)}$, $a^{(L)}$, weight $w^{(L)}$, and bias $b$]
• Cost: $C_0 = (a^{(L)} - y)^2$
• Activation: $a^{(L)} = f_{\text{act}}(z^{(L)})$ with $z^{(L)} = w^{(L)} a^{(L-1)} + b$
• Derivative of $C_0$ with respect to $w^{(L)}$ (chain rule):
  $\frac{\partial C_0}{\partial w^{(L)}} = \frac{\partial z^{(L)}}{\partial w^{(L)}} \cdot \frac{\partial a^{(L)}}{\partial z^{(L)}} \cdot \frac{\partial C_0}{\partial a^{(L)}}$
• Average over the mini-batch:
  $\frac{\partial C_{mb}}{\partial w^{(L)}} = \frac{1}{m_{mb}} \sum_{i=1}^{m_{mb}} \frac{\partial C_i}{\partial w^{(L)}}$
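A sketch of the three chain-rule factors for the simplified one-neuron-per-layer chain on the slide, assuming tanh as the activation function (the slide leaves f_act generic):

```python
import numpy as np

def backprop_last_layer(a_prev, w, b, y):
    """Partial derivatives of C0 = (a - y)^2 w.r.t. w and b for one layer
    with z = w * a_prev + b and a = tanh(z)."""
    z = w * a_prev + b
    a = np.tanh(z)
    dC_da = 2.0 * (a - y)            # dC0/da
    da_dz = 1.0 - np.tanh(z) ** 2    # da/dz, derivative of tanh
    dz_dw = a_prev                   # dz/dw
    dC_dw = dz_dw * da_dz * dC_da    # chain rule, exactly as on the slide
    dC_db = 1.0 * da_dz * dC_da      # dz/db = 1
    return dC_dw, dC_db
```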
TensorFlow
High-Level API:
• TF.Learn
• DNNClassifier
• Parameters: # layers, # neurons per layer, batch size, # iterations, activation function
Plain TensorFlow:
1. Construction phase
2. Training phase
3. Using the trained network
→ Demo in Jupyter
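A minimal plain-TensorFlow sketch of the three phases in the TF 1.x graph style used by this course; the toy data, layer sizes, and learning rate are illustrative:

```python
import numpy as np
import tensorflow as tf  # TF 1.x graph-style API

# Toy data: 4 instances, 2 features, 2 classes (illustrative only)
X_train = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]], dtype=np.float32)
y_train = np.array([0, 1, 1, 0])

# 1. Construction phase: build the computation graph
X = tf.placeholder(tf.float32, shape=(None, 2), name="X")
y = tf.placeholder(tf.int64, shape=(None,), name="y")
hidden = tf.layers.dense(X, 10, activation=tf.nn.relu, name="hidden")
logits = tf.layers.dense(hidden, 2, name="outputs")
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
training_op = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    # 2. Training phase: run the training op repeatedly
    sess.run(init)
    for epoch in range(500):
        sess.run(training_op, feed_dict={X: X_train, y: y_train})
    # 3. Using the trained network: predict on (here: the training) instances
    Z = sess.run(logits, feed_dict={X: X_train})
    y_pred = Z.argmax(axis=1)
```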
Hyperparameters
# hidden layers:
• More layers need exponentially fewer neurons for complex functions, converge faster, generalize better, and allow reuse of layers
• Start with few layers and increase the number
# neurons / layer:
• Funnel shape or constant size; choosing it is a "black art"
• Too many neurons cause overfitting
Activation function:
• ReLU is a good choice in general
• Softmax for the output layer when classes are mutually exclusive
Summary
• From artificial neurons to deep neural networks: initial concepts in the 1940s, a dark age, and the recent boom
• Multi-layer Perceptron: layers, neurons, biases, weights, activation function
• Training of ANNs: gradient descent, backpropagation & mini-batches
• Implementation in TensorFlow: High-Level API, plain TensorFlow, hyperparameter tuning
References
[1] https://www.slideshare.net/deview/251-deep-learning-using-cu-dnn/4
[2] https://en.wikipedia.org/wiki/Neuron#/media/File:Blausen_0657_MultipolarNeuron.png
[3] https://en.wikipedia.org/wiki/Neural_oscillation
[4] http://www.lesicalab.com/research/
[5] McCulloch, W. S. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115–133.
[6] https://medium.com/@shrutijadon10104776/survey-on-activation-functions-for-deep-learning-9689331ba09