Artificial Neural Network III Backpropagation - Staff.kmutt.ac.th
15/08/54
Artificial Neural Network III: Backpropagation
Werapon Chiracharit
Department of Electronic and Telecommunication Engineering
King Mongkut’s University of Technology Thonburi
Feedforward
Backpropagation
Backpropagation Learning (1)
e.g. To approximate the non-linear function
y = 1 + sin(πx/4) , -2 ≤ x < 2
18/08/11 RMUTK
Backpropagation Learning (2)
Create a 2-layer network:
• Layer 1 (log-sigmoid layer): weights W1 (2×1), biases B1 (2×1), transfer function f1 = logsig
• Layer 2 (linear layer): weights W2 (1×2), bias B2 (1×1), transfer function f2 = purelin
Backpropagation Learning (3)
y = purelin( W2( logsig( W1x + B1 ) ) + B2)
where f11(a1) = f12(a1) = logsig(a1) = 1 / (1 + e^(-a1))
f21(a2) = purelin(a2) = a2
Step 1
Initialize weights and biases, generally with small random values:
W1(0) = [-0.27; -0.41], B1(0) = [-0.48; -0.13]
W2(0) = [0.09 -0.17], B2(0) = [0.48]
Backpropagation Learning (4)
Step 2 Forward
Initialize the input, let x = 1:
y1 = logsig( W1x + B1 ) = logsig( [-0.27; -0.41][1] + [-0.48; -0.13] ) = [0.321; 0.368]
y2 = purelin( W2y1 + B2 ) = purelin( [0.09 -0.17][0.321; 0.368] + [0.48] ) = [0.446]
Error, E = 1 + sin(πx/4) - y2 = 1 + sin(π(1)/4) - 0.446 = 1.261
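The forward pass and error above can be checked with a short standalone Python sketch (the slides use MATLAB; the function and variable names here are just for illustration):

```python
import math

def logsig(a):
    return 1.0 / (1.0 + math.exp(-a))

# Initial weights and biases from Step 1
W1 = [-0.27, -0.41]   # hidden-layer weights (2x1)
B1 = [-0.48, -0.13]   # hidden-layer biases
W2 = [0.09, -0.17]    # output-layer weights (1x2)
B2 = 0.48             # output-layer bias

x = 1.0
# Hidden layer: y1 = logsig(W1*x + B1), elementwise
y1 = [logsig(w * x + b) for w, b in zip(W1, B1)]
# Output layer: y2 = purelin(W2*y1 + B2)
y2 = sum(w * y for w, y in zip(W2, y1)) + B2
E = 1 + math.sin(math.pi * x / 4) - y2
print([round(v, 3) for v in y1], round(y2, 3), round(E, 3))
# → [0.321, 0.368] 0.446 1.261
```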
Backpropagation Learning (5)
Step 3 Backward
The derivatives of the transfer functions are
f1'(a1) = d( 1 / (1 + e^(-a1)) )/da1
        = e^(-a1) / (1 + e^(-a1))^2
        = [1 - 1/(1 + e^(-a1))] [1/(1 + e^(-a1))]
        = (1 - y1) y1
f2'(a2) = d(a2)/da2 = 1
Backpropagation Learning (6)
Backpropagate the sensitivities, using gradient descent:
S2 = d(E^2)/da2 = -2 E f2'(a2) = (-2)(1.261)(1) = -2.522
S1 = d(E^2)/da1 = [da2/da1] [d(E^2)/da2] , chain rule
   = diag[ (1-0.321)(0.321), (1-0.368)(0.368) ] (W2)^T S2
   = diag[ (1-0.321)(0.321), (1-0.368)(0.368) ] [0.09; -0.17] (-2.522)
   = [-0.0495; 0.0997]
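The sensitivity computation can be sketched elementwise in Python (a standalone check of the numbers above, not the toolbox code):

```python
E = 1.261              # error from the forward pass
y1 = [0.321, 0.368]    # hidden-layer outputs
W2 = [0.09, -0.17]     # output-layer weights

# Output-layer sensitivity: S2 = -2*E*f2'(a2), and purelin has derivative 1
S2 = -2 * E * 1
# Hidden-layer sensitivity, elementwise: S1_i = (1 - y1_i)*y1_i * W2_i * S2
S1 = [(1 - y) * y * w * S2 for y, w in zip(y1, W2)]
print(round(S2, 3), [round(s, 4) for s in S1])
# → -2.522 [-0.0495, 0.0997]
```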
Backpropagation Learning (7)
Step 4 Update weights and biases with learning rate α = 0.1 (batch training)
W1(1) = W1(0) - α S1 x = [-0.27; -0.41] - (0.1)[-0.0495; 0.0997](1)
      = [-0.265; -0.420]
B1(1) = B1(0) - α S1 = [-0.48; -0.13] - (0.1)[-0.0495; 0.0997]
      = [-0.475; -0.140]
W2(1) = W2(0) - α S2 y1^T = [0.09 -0.17] - (0.1)(-2.522)[0.321 0.368]
      = [0.171 -0.0772]
B2(1) = B2(0) - α S2 = [0.48] - (0.1)(-2.522)
      = [0.732]
Step 5 Repeat from Step 2 until the error is acceptable.
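Applying the Step 4 updates in Python reproduces the new weights (a sketch reusing the sensitivities computed in Step 3, with α = 0.1):

```python
alpha = 0.1
x = 1.0
y1 = [0.321, 0.368]        # hidden outputs from the forward pass
S1 = [-0.0495, 0.0997]     # sensitivities from the backward pass
S2 = -2.522

W1 = [-0.27, -0.41]; B1 = [-0.48, -0.13]
W2 = [0.09, -0.17];  B2 = 0.48

# Gradient-descent updates: parameter <- parameter - alpha * sensitivity * (layer input)
W1 = [w - alpha * s * x for w, s in zip(W1, S1)]
B1 = [b - alpha * s for b, s in zip(B1, S1)]
W2 = [w - alpha * S2 * y for w, y in zip(W2, y1)]
B2 = B2 - alpha * S2
print([round(v, 3) for v in W1], [round(v, 3) for v in W2], round(B2, 3))
# → [-0.265, -0.42] [0.171, -0.077] 0.732
```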
Backprop Algorithm (1)
• For a K-layer network,
ak = fk( Wkak-1 + Bk ) , k=1,2,…,K
where a0 = P
• Training with L samples, input: P = [P1 1×L; P2 1×L; … PN 1×L]
  target: T = [T1 1×L; T2 1×L; … TMK 1×L]
(Network diagram: input layer → 1st layer → 2nd layer → … → Kth layer (output); layers 1 to K-1 are hidden layers. Layer k has weights Wk of size Mk×Mk-1, biases Bk of size Mk×1 and transfer function fk.)
Backprop Algorithm (2)
• Mean square error, E = E[(T - aK)^2] = (T - aK)^T(T - aK)
• Gradient descent update rule,
Wk(t+1) = Wk(t) - α ∂E/∂Wk
Bk(t+1) = Bk(t) - α ∂E/∂Bk
• Chain rule, ∂E/∂Wk = (∂E/∂nk)(∂nk/∂Wk)
  ∂E/∂Bk = (∂E/∂nk)(∂nk/∂Bk)
Let the sensitivity of the error be defined as Sk = ∂E/∂nk
Backprop Algorithm (3)
From nk = Wkak-1 + Bk,
∂nk/∂Wk = ak-1 and ∂nk/∂Bk = 1
Therefore, ∂E/∂Wk = Sk(ak-1)^T and ∂E/∂Bk = Sk
Update rule,
Wk(t+1) = Wk(t) - α Sk(ak-1)^T
Bk(t+1) = Bk(t) - α Sk , k=1,2,…,K
Backprop Algorithm (4)
• Chain rule, Sk = ∂E/∂nk = (∂nk+1/∂nk)^T (∂E/∂nk+1)
  = [ ∂(Wk+1ak + Bk+1)/∂nk ]^T Sk+1
  = [ ∂(Wk+1fk(nk) + Bk+1)/∂nk ]^T Sk+1
  = fk'(nk) (Wk+1)^T Sk+1
for k = K-1, K-2, …, 2, 1
Backpropagation order: SK → SK-1 → … → S2 → S1
Backprop Algorithm (5)
At the final layer,
SK = ∂E/∂nK = ∂(T - aK)^2/∂nK
   = ∂(T - fK(nK))^2/∂nK
   = -2 (T - aK) fK'(nK)
• Repeat updating and check convergence/divergence
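The whole algorithm can be sketched end-to-end for the 1-2-1 example network, trained on samples of 1 + sin(πx/4). This is a plain-Python illustration of the update equations above, not the toolbox implementation; the sample grid, random seed and epoch count are arbitrary choices:

```python
import math, random

def logsig(a):
    return 1.0 / (1.0 + math.exp(-a))

# 1-2-1 network: logsig hidden layer, purelin output (as in the worked example)
random.seed(0)
W1 = [random.uniform(-0.5, 0.5) for _ in range(2)]
B1 = [random.uniform(-0.5, 0.5) for _ in range(2)]
W2 = [random.uniform(-0.5, 0.5) for _ in range(2)]
B2 = random.uniform(-0.5, 0.5)
alpha = 0.1

samples = [-2 + 4 * i / 20 for i in range(21)]        # x in [-2, 2]
target = lambda x: 1 + math.sin(math.pi * x / 4)

def mse():
    err = 0.0
    for x in samples:
        y1 = [logsig(w * x + b) for w, b in zip(W1, B1)]
        y2 = sum(w * y for w, y in zip(W2, y1)) + B2
        err += (target(x) - y2) ** 2
    return err / len(samples)

e0 = mse()
for epoch in range(3000):
    for x in samples:
        # forward pass
        y1 = [logsig(w * x + b) for w, b in zip(W1, B1)]
        y2 = sum(w * y for w, y in zip(W2, y1)) + B2
        E = target(x) - y2
        # backward pass: sensitivities
        S2 = -2 * E                                   # purelin derivative = 1
        S1 = [(1 - y) * y * w * S2 for y, w in zip(y1, W2)]
        # gradient-descent updates
        W2 = [w - alpha * S2 * y for w, y in zip(W2, y1)]
        B2 = B2 - alpha * S2
        W1 = [w - alpha * s * x for w, s in zip(W1, S1)]
        B1 = [b - alpha * s for b, s in zip(B1, S1)]
print(e0, mse())   # the mean square error should drop substantially
```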
Faster Training
• Gradient descent with momentum, 'traingdm': uses not only the local gradient, but also the recent trend of the updates
W(t+1) = W(t) - α ∂E/∂W + μ [ W(t) - W(t-1) ]
“Heuristic techniques”
• Adaptive learning-rate gradient descent, 'traingda'
• Resilient backpropagation, 'trainrp': reduces the effect of the sigmoid functions, whose slopes approach zero as the input grows large
• Conjugate gradient, 'traincgf': searches along conjugate directions of the gradient
“Numerical optimization techniques”
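As a toy illustration of the momentum rule, here is a scalar sketch on the hypothetical error surface E(w) = w² (the values of α and μ are assumed for the example):

```python
# Gradient descent with momentum on E(w) = w^2, minimum at w = 0
alpha, mu = 0.1, 0.9          # learning rate and momentum coefficient (assumed)
w_prev = w = 5.0              # starting point
for _ in range(200):
    grad = 2 * w              # dE/dw for E = w^2
    w_next = w - alpha * grad + mu * (w - w_prev)
    w_prev, w = w, w_next
print(w)                      # close to the minimum at 0
```

The momentum term μ[W(t) - W(t-1)] keeps the update moving in the recent direction, which damps oscillation across narrow valleys of the error surface.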
Preprocessing and Postprocessing
• Normalize inputs and targets to a specified range, mapminmax()
• Set inputs and targets to zero mean and unit standard deviation, mapstd()
• Principal component analysis, processpca(), using the eigenvector technique
• Fix NaN values (not a number, e.g. from division by zero), fixunknowns()
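The first two preprocessing steps can be sketched in Python (rough equivalents of mapminmax and mapstd; MATLAB's exact conventions, e.g. sample vs. population standard deviation, may differ):

```python
import statistics

def map_minmax(xs, lo=-1.0, hi=1.0):
    """Scale values linearly into [lo, hi] (default [-1, 1], like mapminmax)."""
    mn, mx = min(xs), max(xs)
    return [lo + (hi - lo) * (x - mn) / (mx - mn) for x in xs]

def map_std(xs):
    """Shift/scale values to zero mean and unit standard deviation (like mapstd)."""
    mu, sd = statistics.mean(xs), statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]

data = [2.0, 4.0, 6.0, 8.0]
r = map_minmax(data)   # roughly [-1, -0.333, 0.333, 1]
s = map_std(data)      # zero mean, unit standard deviation
print(r, s)
```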
XOR Problem (1)
Known: p1, p2, t
Unknown: w11,1, w11,2, w12,1, w12,2, w21,1, w21,2, b11, b12, b2
Input points in the p1-p2 plane: (0,0), (1,0), (0,1), (1,1)
(Network diagram: input layer (p1, p2) → hidden layer: W1 (2×2), B1 (2×1), f1 → output layer: W2 (1×2), B2 (1×1), f2 → a2)
XOR Problem (2)
% Define input and target vector
>> P=[0 0 1 1; 0 1 0 1];
>> T=[0 1 1 0];
>> plotpv(P, T)
XOR Problem (3)
% Create a feedforward network: input ranges (N×2), sizes of the hidden and
% output layers, transfer functions of the hidden and output layers, and the
% backprop training function
>> net=newff([0 1; 0 1], [2 1], {'tansig' 'purelin'}, …
'traingdm' );
% Initialize weight (optional)
>> net=init(net);>> net=init(net);
% Learning rate
>> net.trainParam.lr=0.1;
XOR Problem (4)
% Initial weights and biases
>> net.IW{1}, net.LW{2}
ans = 2.8409 -2.7585
      3.0861 -2.4812
ans = -0.3293 0.3595
>> net.b{1}, net.b{2}
ans = ‐2.0211
1.6774
ans = ‐0.7269
XOR Problem (5)
% Number of passes (epochs) through the entire data set
>> net.trainParam.epochs=1000;
% Performance goal
>> net.trainParam.goal=1e-1;
% Training
>> net=train(net, P, T);
>> net.IW{1}, net.LW{2}
ans = 2.6459 ‐2.7467
3.0803 ‐2.2518
ans = 0.3904 ‐0.3804
XOR Problem (6)
>> net.b{1}
ans = -2.2043
1.8839
>> net.b{2}
ans = 0.8191
XOR Problem (7)
>> y=sim(net, P) % Testing
y = 0.0749 0.5627 0.6008 0.0593
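The trained network's output can be reproduced outside MATLAB from the printed weights and biases. A standalone Python sketch of the 2-2-1 tansig/purelin network (using the final values shown on the previous slides):

```python
import math

# Final weights/biases printed by MATLAB after training
W1 = [[2.6459, -2.7467],
      [3.0803, -2.2518]]
b1 = [-2.2043, 1.8839]
W2 = [0.3904, -0.3804]
b2 = 0.8191

def tansig(n):
    return math.tanh(n)

def net(p):
    # Hidden layer: a1 = tansig(W1*p + b1)
    a1 = [tansig(sum(w * x for w, x in zip(row, p)) + b)
          for row, b in zip(W1, b1)]
    # Output layer: a2 = purelin(W2*a1 + b2)
    return sum(w * a for w, a in zip(W2, a1)) + b2

y = [net(p) for p in ([0, 0], [0, 1], [1, 0], [1, 1])]
print([round(v, 4) for v in y])
# close to the slide's sim() output: 0.0749 0.5627 0.6008 0.0593
```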
• Performance
XOR Problem (8)
• Training State
XOR Problem (9)
• Regression
Training, Validation and Testing
for each epoch
    for each training data set
        Propagate the error through the network
        Adjust the weights
    Calculate the accuracy over the training data
    for each validation data set
        Calculate the accuracy over the validation data
    if the threshold validation accuracy is met
        Exit training % Early stopping to prevent overtraining
    else
        Continue training
Finally, evaluate the network on the testing data set.
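The early-stopping loop above can be sketched as follows (the per-epoch validation accuracies and the threshold are hypothetical stand-ins for a real train/validation split):

```python
def train_with_early_stopping(val_accuracy_per_epoch, threshold=0.9, max_epochs=100):
    """Stop training as soon as validation accuracy reaches the threshold."""
    for epoch, val_acc in enumerate(val_accuracy_per_epoch[:max_epochs], start=1):
        # ... propagate error and adjust weights would happen here ...
        if val_acc >= threshold:
            return epoch          # early stop: validation target met
    return max_epochs             # otherwise train for the full budget

# Hypothetical validation accuracies over five epochs
epochs_run = train_with_early_stopping([0.5, 0.7, 0.85, 0.92, 0.95])
print(epochs_run)   # stops at epoch 4, before overtraining the last epoch
```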
Neural Network GUI (1)
>> nntool
Neural Network GUI (2)
e.g. Solving the XOR problem (2-layer perceptron)
• Click “New…” and “Data”
• “Create” input P
Neural Network GUI (3)
• “Create” target T
Neural Network GUI (4)
• “Create” network
Neural Network GUI (5)
• Choose and open the network
Neural Network GUI (6)
• “Initialize Weights”
Neural Network GUI (7)
• “Training Info” and “Training Parameters”
Neural Network GUI (8)
• “Train Network”
Neural Network GUI (9)
• “Performance”
Neural Network GUI (10)
• “Training State”
Neural Network GUI (11)
• “Regression”
Neural Network GUI (12)
• “Simulate Network”
Neural Network GUI (13)
• Final weights and biases
References
• Martin T. Hagan, Howard B. Demuth and Mark Beale, Neural Network Design, PWS Publishing, 1996
• Neural Network Design webpage, http://hagan.okstate.edu/nnd.html
• MathWorks Neural Network Toolbox webpage, http://www.mathworks.com/products/neuralnet/demos.html
Thank you for your attention
Q & A