Artificial Neural Network III: Backpropagation


Page 1

Artificial Neural Network III: Backpropagation

Werapon Chiracharit

Department of Electronic and Telecommunication Engineering

King Mongkut’s University of Technology Thonburi

Feedforward

Backpropagation

Backpropagation Learning (1)

e.g. To approximate the nonlinear function

y = 1 + sin(πx/4) , −2 ≤ x < 2
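As a quick sanity check, the target function can be plotted in MATLAB; a minimal sketch (the sample spacing 0.1 is an arbitrary choice):

>> x = -2:0.1:2;          % input range
>> y = 1 + sin(pi*x/4);   % target function to approximate
>> plot(x, y)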


Page 2

Backpropagation Learning (2)

Create a 2-layer network:

[Network diagram: input x feeds a log-sigmoid hidden layer (weights W1: 2×1, biases B1: 2×1), whose outputs a1 (2×1) feed a linear output layer (weights W2: 1×2, bias B2: 1×1) producing y = a2.]

Backpropagation Learning (3)

y = purelin( W2 logsig( W1x + B1 ) + B2 )

where f1_1(a1) = f1_2(a1) = logsig(a1) = 1 / (1 + e^(−a1))

f2_1(a2) = purelin(a2) = a2

Step 1

Initialize weights and biases, generally to small random values:

W1(0) = [−0.27; −0.41], B1(0) = [−0.48; −0.13]

W2(0) = [0.09 −0.17], B2(0) = [0.48]


Page 3

Backpropagation Learning (4)

Step 2: Forward propagation

Initialize the input; let x = 1

y1 = logsig( [−0.27; −0.41][1] + [−0.48; −0.13] ) = [0.321; 0.368]

y2 = purelin( [0.09 −0.17][0.321; 0.368] + [0.48] ) = [0.446]

Error, E = 1 + sin(πx/4) − y2 = 1 + sin(π(1)/4) − 0.446 = 1.261

[Diagram: x → [W1 (2×1), +B1 (2×1), f1] → a1 = y1 (2×1) → [W2 (1×2), +B2 (1×1), f2] → a2 = y2 (1×1).]
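The forward pass above can be reproduced directly in MATLAB; a minimal sketch (the variable names are illustrative, not toolbox API):

>> W1 = [-0.27; -0.41]; B1 = [-0.48; -0.13];  % initial hidden-layer parameters
>> W2 = [0.09 -0.17];   B2 = 0.48;            % initial output-layer parameters
>> x  = 1;
>> y1 = 1 ./ (1 + exp(-(W1*x + B1)))          % logsig: [0.321; 0.368]
>> y2 = W2*y1 + B2                            % purelin: 0.446
>> E  = 1 + sin(pi*x/4) - y2                  % error: 1.261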

Backpropagation Learning (5)

Step 3: Backward propagation

The derivatives of the transfer functions are

f1′(a1) = d[1 / (1 + e^(−a1))]/da1

= e^(−a1) / (1 + e^(−a1))²

= [1 − 1/(1 + e^(−a1))] [1/(1 + e^(−a1))]

= (1 − y1) y1

f2′(a2) = d(a2)/da2

= 1


Page 4

Backpropagation Learning (6)

Backpropagate the sensitivities, using gradient descent:

S2 = d(E²)/da2 = −2 E f2′(a2) = (−2)(1.261)(1) = −2.522

S1 = d(E²)/da1 = [da2/da1] [d(E²)/da2]   (chain rule)

= [d(W2 y1 + B2)/da1] S2 = f1′(a1) ∘ (W2ᵀ S2)   (∘ = elementwise product)

= [(1−0.321)(0.321); (1−0.368)(0.368)] ∘ [0.09; −0.17] (−2.522)

= [−0.0495; 0.0997]

[Diagram: forward path x → [W1, +B1, f1] → y1 → [W2, +B2, f2] → y2, compared with the target 1 + sin(πx/4) to form the error E, which is propagated backward through the layers.]

Backpropagation Learning (7)

Step 4: Update weights and biases with learning rate α = 0.1 (batch training)

W1(1) = W1(0) − α S1 x = [−0.27; −0.41] − (0.1)[−0.0495; 0.0997](1)

= [−0.265; −0.420]

B1(1) = B1(0) − α S1 = [−0.48; −0.13] − (0.1)[−0.0495; 0.0997]

= [−0.475; −0.140]

W2(1) = W2(0) − α S2 y1ᵀ = [0.09 −0.17] − (0.1)(−2.522)[0.321 0.368]

= [0.171 −0.0772]

B2(1) = B2(0) − α S2 = [0.48] − (0.1)(−2.522)

= [0.732]

Step 5: Repeat from Step 2 until the error is acceptably small.
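Continuing the forward-pass sketch above, Steps 3 and 4 amount to a few lines; here f1′(a1) is computed as (1 − y1).*y1 and the products are elementwise:

>> S2 = -2 * E * 1;                      % output sensitivity, f2' = 1: -2.522
>> S1 = W2' .* ((1 - y1) .* y1) * S2;    % hidden sensitivities: [-0.0495; 0.0997]
>> alpha = 0.1;                          % learning rate
>> W1 = W1 - alpha * S1 * x;             % [-0.265; -0.420]
>> B1 = B1 - alpha * S1;                 % [-0.475; -0.140]
>> W2 = W2 - alpha * S2 * y1';           % [0.171  -0.0772]
>> B2 = B2 - alpha * S2;                 % [0.732]

Repeating both sketches over many inputs x drawn from [−2, 2) implements Step 5.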

Page 5

Backprop Algorithm (1)

• For a K-layer network:

ak = fk( Wk ak−1 + Bk ) , k = 1, 2, …, K

where a0 = P

• Training with L samples; input: P = [P1; P2; …; PN], each row 1×L (so P is N×L)

target: T = [T1; T2; …; TMK], each row 1×L (so T is MK×L)

[Diagram: input layer P (N×1) → 1st layer (W1: M1×N, B1: M1×1, f1, output a1: M1×1) → 2nd layer (W2: M2×M1, B2: M2×1, f2, output a2: M2×1) → … → Kth layer (WK: MK×MK−1, BK: MK×1, fK, output aK: MK×1). Layers 1 to K−1 are hidden layers; layer K is the output layer.]

Backprop Algorithm (2)

• Mean square error, E = E[(T − aK)²] = (T − aK)ᵀ(T − aK)

• Gradient-descent update rule:

Wk(t+1) = Wk(t) − α ∂E/∂Wk

Bk(t+1) = Bk(t) − α ∂E/∂Bk

Chain rule: ∂E/∂Wk = (∂E/∂nk) (∂nk/∂Wk)

∂E/∂Bk = (∂E/∂nk) (∂nk/∂Bk)

Define the sensitivity of the error, Sk = ∂E/∂nk


Page 6

Backprop Algorithm (3)

From nk = Wk ak−1 + Bk:

∂nk/∂Wk = ak−1 and ∂nk/∂Bk = 1

Therefore, ∂E/∂Wk = Sk (ak−1)ᵀ and ∂E/∂Bk = Sk

Update rule:

Wk(t+1) = Wk(t) − α Sk (ak−1)ᵀ

Bk(t+1) = Bk(t) − α Sk , k = 1, 2, …, K


Backprop Algorithm (4)

• Chain rule: Sk = ∂E/∂nk = (∂E/∂nk+1) (∂nk+1/∂nk)

= [∂(Wk+1 ak + Bk+1)/∂nk] Sk+1

= [∂(Wk+1 fk(nk) + Bk+1)/∂nk] Sk+1

= fk′(nk) ∘ (Wk+1ᵀ Sk+1)

for k = K−1, K−2, …, 3, 2, 1

S1 ← S2 ← … ← SK−2 ← SK−1 ← SK

Backpropagation: the sensitivities propagate backward through the layers.

Page 7

Backprop Algorithm (5)

At the final layer,

SK = ∂E/∂nK = ∂(T − aK)²/∂nK

= ∂(T − fK(nK))²/∂nK

= −2 E fK′(nK)

• Repeat the updates and check for convergence/divergence.

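Collecting the pieces, one full update for a K-layer network can be sketched as follows. This is a minimal illustration assuming logsig hidden layers and a purelin output layer (not toolbox code); a{k+1} stores what the slides call ak, and the values reproduce the 1-2-1 worked example:

% One gradient-descent step for a K-layer network
K = 2; alpha = 0.1;                         % example: the 1-2-1 network above
W = {[-0.27; -0.41], [0.09 -0.17]};
B = {[-0.48; -0.13], 0.48};
x = 1;  t = 1 + sin(pi*x/4);                % one training sample

a = cell(K+1, 1);  a{1} = x;                % forward pass, a{1} = a0 = P
for k = 1:K-1
    a{k+1} = 1 ./ (1 + exp(-(W{k}*a{k} + B{k})));   % logsig hidden layers
end
a{K+1} = W{K}*a{K} + B{K};                  % purelin output layer

S = cell(K, 1);                             % backpropagate sensitivities
S{K} = -2 * (t - a{K+1});                   % SK = -2 E fK'(nK), fK' = 1
for k = K-1:-1:1
    S{k} = ((1 - a{k+1}) .* a{k+1}) .* (W{k+1}' * S{k+1});
end

for k = 1:K                                 % update rule
    W{k} = W{k} - alpha * S{k} * a{k}';     % Wk <- Wk - alpha Sk (ak-1)'
    B{k} = B{k} - alpha * S{k};
end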

Faster Training

“Heuristic techniques”:

• Gradient descent with momentum, 'traingdm': uses not only the local gradient but also recent trends in the error,

W(t+1) = W(t) − α ∂E/∂W + γ [ W(t) − W(t−1) ]

(α: learning rate, γ: momentum constant)

• Adaptive learning-rate gradient descent, 'traingda'

• Resilient backpropagation, 'trainrp': reduces the effect of sigmoid functions, whose slopes approach zero as the input gets large

“Numerical optimization techniques”:

• Conjugate gradient, 'traincgf': searches along conjugate directions of the gradient
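In the toolbox, these algorithms are selected by name when the network is created, or by reassigning the training function afterwards; a brief sketch (network sizes reused from the XOR example later in these slides):

>> net = newff([0 1; 0 1], [2 1], {'tansig' 'purelin'}, 'trainrp');
>> net.trainFcn = 'traincgf';   % switch algorithm on an existing network
>> net = train(net, P, T);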


Page 8

Preprocessing and Postprocessing

• Normalize, i.e. map inputs and targets into a specified range, mapminmax()

• Set inputs and targets to zero mean and unit standard deviation, mapstd()

• Principal component analysis, processpca(), using an eigenvector technique

• Fix NaN values (not-a-number, e.g. from division by zero), fixunknowns()
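Typical usage of the normalization functions; a minimal sketch (the settings structures ps and ts record each mapping so it can be applied again or reversed):

>> [pn, ps] = mapminmax(P);   % scale each input row into [-1, 1]
>> [tn, ts] = mapstd(T);      % zero mean, unit standard deviation
>> net = train(net, pn, tn);  % train on the preprocessed data
>> y = mapstd('reverse', sim(net, pn), ts);   % map outputs back to target units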


XOR Problem (1)

Known: p1, p2, t

Unknown: w1(1,1), w1(1,2), w1(2,1), w1(2,2), w2(1,1), w2(1,2), b1(1), b1(2), b2

[Diagram: the four XOR input points (0,0), (1,0), (0,1), (1,1) in the p1–p2 plane.]

[Diagram: 2-2-1 network; input P (2×1) → hidden layer (W1: 2×2, B1: 2×1, f1, n1: 2×1, a1: 2×1) → output layer (W2: 1×2, B2: 1×1, f2, n2: 1×1, a2: 1×1).]

Page 9

XOR Problem (2)

% Define input and target vector

>> P=[0 0 1 1; 0 1 0 1];

>> T=[0 1 1 0];

>> plotpv(P, T)


XOR Problem (3)

% Create a feedforward network: input ranges, sizes of the hidden and
% output layers, their transfer functions, and the backprop training function

>> net=newff([0 1; 0 1], [2 1], {'tansig' 'purelin'}, 'traingdm');

% Initialize weights (optional)

>> net=init(net);

% Learning rate

>> net.trainParam.lr=0.1;
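Since 'traingdm' was chosen, the momentum constant can be set as well; an optional step (0.9 is only an illustrative value):

% Momentum constant
>> net.trainParam.mc=0.9;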


Page 10

XOR Problem (4)

% Initial weights and biases

>> net.IW{1}, net.LW{2}

ans = 2.8409  −2.7585

3.0861  ‐2.4812 

ans = ‐0.3293  0.3595

>> net.b{1}, net.b{2}

ans = ‐2.0211

1.6774 

ans = ‐0.7269


XOR Problem (5)

% Number of updates through the entire data set

>> net.trainParam.epochs=1000;

>> net.trainParam.goal=1e-1;

% Training

>> net=train(net, P, T);

>> net.IW{1}, net.LW{2}

ans = 2.6459  ‐2.7467

3.0803  ‐2.2518 

ans = 0.3904  ‐0.3804


Page 11

XOR Problem (6)

>> net.b{1}

ans = −2.2043

1.8839

>> net.b{2} 

ans = 0.8191


XOR Problem (7)

>> y=sim(net, P) % Testing

y = 0.0749  0.5627  0.6008  0.0593
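Because the targets are binary, the continuous outputs can be thresholded at 0.5 to read off the predicted classes; here simple rounding recovers the XOR truth table:

>> round(y)
ans = 0  1  1  0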

• Performance


Page 12

XOR Problem (8)

• Training State


XOR Problem (9)

• Regression


Page 13

Training, Validation and Testing

for each epoch
    for each training data set
        Propagate the error through the network
        Adjust the weights
        Calculate the accuracy over the training data
    for each validation data set
        Calculate the accuracy over the validation data
    if the threshold validation accuracy is met
        Exit training   % early stopping to prevent overtraining
    else
        Continue training

Finally, evaluate the network on the testing data set.
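In the toolbox, this scheme is what train() performs when it is given validation data; a minimal sketch assuming the older train(net, P, T, Pi, Ai, VV, TV) calling form (newer versions split the data via net.divideFcn instead; Ptrain, Pval, Ptest and the corresponding targets are illustrative names):

>> VV.P = Pval;  VV.T = Tval;    % validation set (structure with fields P, T)
>> TV.P = Ptest; TV.T = Ttest;   % test set
>> [net, tr] = train(net, Ptrain, Ttrain, [], [], VV, TV);
% train() stops early once the validation error fails to improve
% for net.trainParam.max_fail consecutive epochs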

Neural Network GUI (1)

>> nntool

Page 14

Neural Network GUI (2)

e.g. Solving the XOR problem (2-layer perceptron)

• Click “New…” and “Data”

• “Create” input P


Neural Network GUI (3)

• “Create” target T


Page 15

Neural Network GUI (4)

• “Create” network


Neural Network GUI (5)

• Choose and open the network


Page 16

Neural Network GUI (6)

• “Initialize Weights”


Neural Network GUI (7)

• “Training Info” and “Training Parameters”


Page 17

Neural Network GUI (8)

• “Train Network”


Neural Network GUI (9)

• “Performance”


Page 18

Neural Network GUI (10)

• “Training State”


Neural Network GUI (11)

• “Regression”


Page 19

Neural Network GUI (12)

• “Simulate Network”


Neural Network GUI (13)


• Final weights and biases

Page 20

References

• Martin T. Hagan, Howard B. Demuth and Mark Beale, Neural Network Design, PWS Publishing, 1996

• Neural Network Design webpage, http://hagan.okstate.edu/nnd.html

• MathWorks Neural Network Toolbox webpage, http://www.mathworks.com/products/neuralnet/demos.html


Thank you for your attention

Q & A