Artificial Neural Network III Backpropagation - Staff.kmutt.ac.th
15/08/54
Artificial Neural Network III: Backpropagation
Werapon Chiracharit
Department of Electronic and Telecommunication Engineering
King Mongkut’s University of Technology Thonburi
Feedforward
Backpropagation
Backpropagation Learning (1)
e.g. To approximate the non-linear function
y = 1 + sin(πx/4) , -2 ≤ x < 2
18/08/11 RMUTK
Backpropagation Learning (2)
Create a 2-layer network:
• Layer 1 (log-sigmoid layer): weights W1 (2×1), biases B1 (2×1), transfer function f1 = logsig
• Layer 2 (linear layer): weights W2 (1×2), bias B2 (1×1), transfer function f2 = purelin
Backpropagation Learning (3)
y = purelin( W2( logsig( W1x + B1 ) ) + B2)
where f11(a1) = f12(a1) = logsig(a1) = 1 / (1 + e^(-a1))
f21(a2) = purelin(a2) = a2
Step 1
Initialize weights and biases, generally with small random values:
W1(0) = [-0.27; -0.41], B1(0) = [-0.48; -0.13]
W2(0) = [0.09 -0.17], B2(0) = [0.48]
Backpropagation Learning (4)
Step 2 Forward
Initialize the input, let x = 1:
y1 = logsig( W1x + B1 ) = logsig( [-0.27; -0.41][1] + [-0.48; -0.13] ) = [0.321; 0.368]
y2 = purelin( W2y1 + B2 ) = purelin( [0.09 -0.17][0.321; 0.368] + [0.48] ) = [0.446]
Error, E = 1 + sin(πx/4) - y2 = 1 + sin(π(1)/4) - 0.446 = 1.261
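The forward pass and error above can be checked with a short standalone Python sketch (the slides use MATLAB; the function and variable names here are just for illustration):

```python
import math

def logsig(a):
    return 1.0 / (1.0 + math.exp(-a))

# Initial weights and biases from Step 1
W1 = [-0.27, -0.41]   # hidden-layer weights (2x1)
B1 = [-0.48, -0.13]   # hidden-layer biases
W2 = [0.09, -0.17]    # output-layer weights (1x2)
B2 = 0.48             # output-layer bias

x = 1.0
# Hidden layer: y1 = logsig(W1*x + B1), elementwise
y1 = [logsig(w * x + b) for w, b in zip(W1, B1)]
# Output layer: y2 = purelin(W2*y1 + B2)
y2 = sum(w * y for w, y in zip(W2, y1)) + B2
E = 1 + math.sin(math.pi * x / 4) - y2
print([round(v, 3) for v in y1], round(y2, 3), round(E, 3))
# → [0.321, 0.368] 0.446 1.261
```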
Backpropagation Learning (5)
Step 3 Backward
The derivatives of the transfer functions are
f1'(a1) = d( 1 / (1 + e^(-a1)) )/da1
        = e^(-a1) / (1 + e^(-a1))^2
        = [1 - 1/(1 + e^(-a1))] [1/(1 + e^(-a1))]
        = (1 - y1) y1
f2'(a2) = d(a2)/da2 = 1
Backpropagation Learning (6)
Backpropagate the sensitivities, using gradient descent:
S2 = d(E^2)/da2 = -2 E f2'(a2) = (-2)(1.261)(1) = -2.522
S1 = d(E^2)/da1 = [da2/da1] [d(E^2)/da2] , chain rule
   = diag[ (1-0.321)(0.321), (1-0.368)(0.368) ] (W2)^T S2
   = diag[ (1-0.321)(0.321), (1-0.368)(0.368) ] [0.09; -0.17] (-2.522)
   = [-0.0495; 0.0997]
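The sensitivity computation can be sketched elementwise in Python (a standalone check of the numbers above, not the toolbox code):

```python
E = 1.261              # error from the forward pass
y1 = [0.321, 0.368]    # hidden-layer outputs
W2 = [0.09, -0.17]     # output-layer weights

# Output-layer sensitivity: S2 = -2*E*f2'(a2), and purelin has derivative 1
S2 = -2 * E * 1
# Hidden-layer sensitivity, elementwise: S1_i = (1 - y1_i)*y1_i * W2_i * S2
S1 = [(1 - y) * y * w * S2 for y, w in zip(y1, W2)]
print(round(S2, 3), [round(s, 4) for s in S1])
# → -2.522 [-0.0495, 0.0997]
```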
Backpropagation Learning (7)
Step 4 Update weights and biases with learning rate α = 0.1 (batch training)
W1(1) = W1(0) - α S1 x = [-0.27; -0.41] - (0.1)[-0.0495; 0.0997](1)
      = [-0.265; -0.420]
B1(1) = B1(0) - α S1 = [-0.48; -0.13] - (0.1)[-0.0495; 0.0997]
      = [-0.475; -0.140]
W2(1) = W2(0) - α S2 y1^T = [0.09 -0.17] - (0.1)(-2.522)[0.321 0.368]
      = [0.171 -0.0772]
B2(1) = B2(0) - α S2 = [0.48] - (0.1)(-2.522)
      = [0.732]
Step 5 Repeat from Step 2 until the error is acceptable.
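Applying the Step 4 updates in Python reproduces the new weights (a sketch reusing the sensitivities computed in Step 3, with α = 0.1):

```python
alpha = 0.1
x = 1.0
y1 = [0.321, 0.368]        # hidden outputs from the forward pass
S1 = [-0.0495, 0.0997]     # sensitivities from the backward pass
S2 = -2.522

W1 = [-0.27, -0.41]; B1 = [-0.48, -0.13]
W2 = [0.09, -0.17];  B2 = 0.48

# Gradient-descent updates: parameter <- parameter - alpha * sensitivity * (layer input)
W1 = [w - alpha * s * x for w, s in zip(W1, S1)]
B1 = [b - alpha * s for b, s in zip(B1, S1)]
W2 = [w - alpha * S2 * y for w, y in zip(W2, y1)]
B2 = B2 - alpha * S2
print([round(v, 3) for v in W1], [round(v, 3) for v in W2], round(B2, 3))
# → [-0.265, -0.42] [0.171, -0.077] 0.732
```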
Backprop Algorithm (1)
• For a K-layer network,
ak = fk( Wkak-1 + Bk ) , k=1,2,…,K
where a0 = P
• Training with L samples, input: P = [P1 1×L; P2 1×L; … PN 1×L]
  target: T = [T1 1×L; T2 1×L; … TMK 1×L]
(Network diagram: input layer → 1st layer → 2nd layer → … → Kth layer (output); layers 1 to K-1 are hidden layers. Layer k has weights Wk of size Mk×Mk-1, biases Bk of size Mk×1 and transfer function fk.)
Backprop Algorithm (2)
• Mean square error, E = E[(T - aK)^2] = (T - aK)^T(T - aK)
• Gradient descent update rule,
Wk(t+1) = Wk(t) - α ∂E/∂Wk
Bk(t+1) = Bk(t) - α ∂E/∂Bk
• Chain rule, ∂E/∂Wk = (∂E/∂nk)(∂nk/∂Wk)
  ∂E/∂Bk = (∂E/∂nk)(∂nk/∂Bk)
Let the sensitivity of the error be defined as Sk = ∂E/∂nk
Backprop Algorithm (3)
From nk = Wkak-1 + Bk,
∂nk/∂Wk = ak-1 and ∂nk/∂Bk = 1
Therefore, ∂E/∂Wk = Sk(ak-1)^T and ∂E/∂Bk = Sk
Update rule,
Wk(t+1) = Wk(t) - α Sk(ak-1)^T
Bk(t+1) = Bk(t) - α Sk , k=1,2,…,K
Backprop Algorithm (4)
• Chain rule, Sk = ∂E/∂nk = (∂nk+1/∂nk)^T (∂E/∂nk+1)
  = [ ∂(Wk+1ak + Bk+1)/∂nk ]^T Sk+1
  = [ ∂(Wk+1fk(nk) + Bk+1)/∂nk ]^T Sk+1
  = fk'(nk) (Wk+1)^T Sk+1
for k = K-1, K-2, …, 2, 1
Backpropagation order: SK → SK-1 → … → S2 → S1
Backprop Algorithm (5)
At the final layer,
SK = ∂E/∂nK = ∂(T - aK)^2/∂nK
   = ∂(T - fK(nK))^2/∂nK
   = -2 (T - aK) fK'(nK)
• Repeat updating and check convergence/divergence
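The whole algorithm can be sketched end-to-end for the 1-2-1 example network, trained on samples of 1 + sin(πx/4). This is a plain-Python illustration of the update equations above, not the toolbox implementation; the sample grid, random seed and epoch count are arbitrary choices:

```python
import math, random

def logsig(a):
    return 1.0 / (1.0 + math.exp(-a))

# 1-2-1 network: logsig hidden layer, purelin output (as in the worked example)
random.seed(0)
W1 = [random.uniform(-0.5, 0.5) for _ in range(2)]
B1 = [random.uniform(-0.5, 0.5) for _ in range(2)]
W2 = [random.uniform(-0.5, 0.5) for _ in range(2)]
B2 = random.uniform(-0.5, 0.5)
alpha = 0.1

samples = [-2 + 4 * i / 20 for i in range(21)]        # x in [-2, 2]
target = lambda x: 1 + math.sin(math.pi * x / 4)

def mse():
    err = 0.0
    for x in samples:
        y1 = [logsig(w * x + b) for w, b in zip(W1, B1)]
        y2 = sum(w * y for w, y in zip(W2, y1)) + B2
        err += (target(x) - y2) ** 2
    return err / len(samples)

e0 = mse()
for epoch in range(3000):
    for x in samples:
        # forward pass
        y1 = [logsig(w * x + b) for w, b in zip(W1, B1)]
        y2 = sum(w * y for w, y in zip(W2, y1)) + B2
        E = target(x) - y2
        # backward pass: sensitivities
        S2 = -2 * E                                   # purelin derivative = 1
        S1 = [(1 - y) * y * w * S2 for y, w in zip(y1, W2)]
        # gradient-descent updates
        W2 = [w - alpha * S2 * y for w, y in zip(W2, y1)]
        B2 = B2 - alpha * S2
        W1 = [w - alpha * s * x for w, s in zip(W1, S1)]
        B1 = [b - alpha * s for b, s in zip(B1, S1)]
print(e0, mse())   # the mean square error should drop substantially
```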
Faster Training
• Gradient descent with momentum, 'traingdm': uses not only the local gradient, but also the recent trend of the updates
W(t+1) = W(t) - α ∂E/∂W + μ [ W(t) - W(t-1) ]
“Heuristic techniques”
• Adaptive learning-rate gradient descent, 'traingda'
• Resilient backpropagation, 'trainrp': reduces the effect of the sigmoid functions, whose slopes approach zero as the input grows large
• Conjugate gradient, 'traincgf': searches along conjugate directions of the gradient
“Numerical optimization techniques”
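As a toy illustration of the momentum rule, here is a scalar sketch on the hypothetical error surface E(w) = w² (the values of α and μ are assumed for the example):

```python
# Gradient descent with momentum on E(w) = w^2, minimum at w = 0
alpha, mu = 0.1, 0.9          # learning rate and momentum coefficient (assumed)
w_prev = w = 5.0              # starting point
for _ in range(200):
    grad = 2 * w              # dE/dw for E = w^2
    w_next = w - alpha * grad + mu * (w - w_prev)
    w_prev, w = w, w_next
print(w)                      # close to the minimum at 0
```

The momentum term μ[W(t) - W(t-1)] keeps the update moving in the recent direction, which damps oscillation across narrow valleys of the error surface.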
Preprocessing and Postprocessing
• Normalize inputs and targets to a specified range, mapminmax()
• Set inputs and targets to zero mean and unit standard deviation, mapstd()
• Principal component analysis, processpca(), using the eigenvector technique
• Fix NaN values (not a number, e.g. from division by zero), fixunknowns()
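The first two preprocessing steps can be sketched in Python (rough equivalents of mapminmax and mapstd; MATLAB's exact conventions, e.g. sample vs. population standard deviation, may differ):

```python
import statistics

def map_minmax(xs, lo=-1.0, hi=1.0):
    """Scale values linearly into [lo, hi] (default [-1, 1], like mapminmax)."""
    mn, mx = min(xs), max(xs)
    return [lo + (hi - lo) * (x - mn) / (mx - mn) for x in xs]

def map_std(xs):
    """Shift/scale values to zero mean and unit standard deviation (like mapstd)."""
    mu, sd = statistics.mean(xs), statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]

data = [2.0, 4.0, 6.0, 8.0]
r = map_minmax(data)   # roughly [-1, -0.333, 0.333, 1]
s = map_std(data)      # zero mean, unit standard deviation
print(r, s)
```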
XOR Problem (1)
Known: p1, p2, t
Unknown: w11,1, w11,2, w12,1, w12,2, w21,1, w21,2, b11, b12, b2
Input points in the p1-p2 plane: (0,0), (1,0), (0,1), (1,1)
(Network diagram: input layer (p1, p2) → hidden layer: W1 (2×2), B1 (2×1), f1 → output layer: W2 (1×2), B2 (1×1), f2 → a2)
XOR Problem (2)
% Define input and target vector
>> P=[0 0 1 1; 0 1 0 1];
>> T=[0 1 1 0];
>> plotpv(P, T)
XOR Problem (3)
% Create a feedforward network: input ranges (N×2), sizes of the hidden and
% output layers, transfer functions of the hidden and output layers, and the
% backprop training function
>> net=newff([0 1; 0 1], [2 1], {'tansig' 'purelin'}, …
'traingdm' );
% Initialize weight (optional)
>> net=init(net);>> net=init(net);
% Learning rate
>> net.trainParam.lr=0.1;
XOR Problem (4)
% Initial weights and biases
>> net.IW{1}, net.LW{2}
ans = 2.8409 -2.7585
      3.0861 -2.4812
ans = -0.3293 0.3595
>> net.b{1}, net.b{2}
ans = ‐2.0211
1.6774
ans = ‐0.7269
XOR Problem (5)
% Number of passes (epochs) through the entire data set
>> net.trainParam.epochs=1000;
% Performance goal
>> net.trainParam.goal=1e-1;
% Training
>> net=train(net, P, T);
>> net.IW{1}, net.LW{2}
ans = 2.6459 ‐2.7467
3.0803 ‐2.2518
ans = 0.3904 ‐0.3804
XOR Problem (6)
>> net.b{1}
ans = -2.2043
1.8839
>> net.b{2}
ans = 0.8191
XOR Problem (7)
>> y=sim(net, P) % Testing
y = 0.0749 0.5627 0.6008 0.0593
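The trained network's output can be reproduced outside MATLAB from the printed weights and biases. A standalone Python sketch of the 2-2-1 tansig/purelin network (using the final values shown on the previous slides):

```python
import math

# Final weights/biases printed by MATLAB after training
W1 = [[2.6459, -2.7467],
      [3.0803, -2.2518]]
b1 = [-2.2043, 1.8839]
W2 = [0.3904, -0.3804]
b2 = 0.8191

def tansig(n):
    return math.tanh(n)

def net(p):
    # Hidden layer: a1 = tansig(W1*p + b1)
    a1 = [tansig(sum(w * x for w, x in zip(row, p)) + b)
          for row, b in zip(W1, b1)]
    # Output layer: a2 = purelin(W2*a1 + b2)
    return sum(w * a for w, a in zip(W2, a1)) + b2

y = [net(p) for p in ([0, 0], [0, 1], [1, 0], [1, 1])]
print([round(v, 4) for v in y])
# close to the slide's sim() output: 0.0749 0.5627 0.6008 0.0593
```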
• Performance
XOR Problem (8)
• Training State
XOR Problem (9)
• Regression
Training, Validation and Testing
for each epoch
    for each training data set
        Propagate the error through the network
        Adjust the weights
    Calculate the accuracy over the training data
    for each validation data set
        Calculate the accuracy over the validation data
    if the threshold validation accuracy is met
        Exit training % Early stopping to prevent overtraining
    else
        Continue training
Finally, evaluate the network on the testing data set.
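The early-stopping loop above can be sketched as follows (the per-epoch validation accuracies and the threshold are hypothetical stand-ins for a real train/validation split):

```python
def train_with_early_stopping(val_accuracy_per_epoch, threshold=0.9, max_epochs=100):
    """Stop training as soon as validation accuracy reaches the threshold."""
    for epoch, val_acc in enumerate(val_accuracy_per_epoch[:max_epochs], start=1):
        # ... propagate error and adjust weights would happen here ...
        if val_acc >= threshold:
            return epoch          # early stop: validation target met
    return max_epochs             # otherwise train for the full budget

# Hypothetical validation accuracies over five epochs
epochs_run = train_with_early_stopping([0.5, 0.7, 0.85, 0.92, 0.95])
print(epochs_run)   # stops at epoch 4, before overtraining the last epoch
```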
Neural Network GUI (1)
>> nntool
Neural Network GUI (2)
e.g. Solving the XOR problem (2-layer perceptron)
• Click “New…” and “Data”
• “Create” input P
Neural Network GUI (3)
• “Create” target T
Neural Network GUI (4)
• “Create” network
Neural Network GUI (5)
• Choose and open the network
Neural Network GUI (6)
• “Initialize Weights”
Neural Network GUI (7)
• “Training Info” and “Training Parameters”
Neural Network GUI (8)
• “Train Network”
Neural Network GUI (9)
• “Performance”
Neural Network GUI (10)
• “Training State”
Neural Network GUI (11)
• “Regression”
Neural Network GUI (12)
• “Simulate Network”
Neural Network GUI (13)
• Final weights and biases
References
• Martin T. Hagan, Howard B. Demuth and Mark Beale, Neural Network Design, PWS Publishing, 1996
• Neural Network Design webpage, http://hagan.okstate.edu/nnd.html
• MathWorks Neural Network Toolbox webpage, http://www.mathworks.com/products/neuralnet/demos.html
Thank you for your attention
Q & A