Computing Gradient Vector and Jacobian Matrix in Arbitrarily Connected Neural Networks
Author: Bogdan M. Wilamowski, Fellow, IEEE, Nicholas J. Cotton, Okyay Kaynak, Fellow, IEEE, and Günhan Dündar
Source: IEEE INDUSTRIAL ELECTRONICS MAGAZINE
Date: 2012/3/28
Presenter: 林哲緯
Outline
• Numerical Analysis Methods
• Neural Network Architectures
• NBN Algorithm
Minimization problem
Newton's method
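The equation itself was an image on the original slide; for minimizing a cost function E(w), the standard Newton update is

  w_{k+1} = w_k - H_k^{-1} g_k

where g_k is the gradient and H_k the Hessian (matrix of second derivatives) of E at w_k. Computing and inverting H_k is exactly what the later algorithms try to avoid.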
Minimization problem
http://www.nd.com/NSBook/NEURAL%20AND%20ADAPTIVE%20SYSTEMS14_Adaptive_Linear_Systems.html
Steepest descent method
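The corresponding update, written with the learning constant α defined later in the slides, is

  w_{k+1} = w_k - α g_k

Steepest descent needs only first derivatives, but converges slowly near the minimum, whereas Newton's method converges quickly there at the price of second derivatives.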
Least square problem
Gauss–Newton algorithm
http://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm
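For a sum-of-squares cost E(w) = Σ_p Σ_m e_{pm}^2, Gauss–Newton approximates the Hessian by J^T J, so only first-order derivatives are needed:

  w_{k+1} = w_k - (J_k^T J_k)^{-1} J_k^T e_k

where J is the Jacobian of the error vector e with respect to the weights.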
Levenberg–Marquardt algorithm
• Levenberg–Marquardt algorithm
  – Combines the advantages of the Gauss–Newton algorithm and the steepest descent method
  – Behaves like steepest descent far from the minimum
  – Behaves like the Gauss–Newton algorithm close to the minimum
  – Finds a local minimum, not the global minimum
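The update that realizes this combination (reconstructed here; the slide showed it as an image) blends the two methods through the learning parameter μ:

  w_{k+1} = w_k - (J_k^T J_k + μ I)^{-1} J_k^T e_k

A large μ gives small, steepest-descent-like steps; as μ → 0 the rule approaches the Gauss–Newton step.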
Levenberg–Marquardt algorithm
• Advantages
  – Only linear algebra operations are needed
  – Only first-order derivatives (the Jacobian) are required
• Disadvantage
  – The matrix inversion required at every iteration is costly, so the method does not scale to very large networks
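To make that cost concrete: for N weights, P training patterns, and no outputs, J is a (P · no) × N matrix and J^T J is N × N, so both the Jacobian storage and the inversion (roughly O(N^3)) grow quickly with network and data size.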
Outline
• Numerical Analysis Methods
• Neural Network Architectures
• NBN Algorithm
Weight updating rules

First-order algorithm:
  α : learning constant
  g : gradient vector

Second-order algorithm:
  J : Jacobian matrix
  μ : learning parameter
  I : identity matrix
  e : error vector
Figure: example network architectures – MLP (multilayer perceptron), ACN (arbitrarily connected network), FCN (fully connected network)
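The two update rules these legends refer to were shown as images; written out in the standard form:

  First-order (steepest descent):      w_{k+1} = w_k - α g_k
  Second-order (Levenberg–Marquardt):  w_{k+1} = w_k - (J_k^T J_k + μ I)^{-1} J_k^T e_k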
Forward & Backward Computation
Forward: 12345, 21345, 12435, or 21435
Backward: 54321, 54312, 53421, or 53412
Any order in which every neuron fires after all of its input neurons is a valid forward order; the backward (gradient) pass visits the neurons in the reverse order, as sketched below.
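A minimal sketch of such a forward pass, assuming each neuron is stored with its input connections in one of the valid orders above (the node indexing and the tanh activation are illustrative assumptions, not taken from the paper):

  import math

  def forward(neurons, inputs):
      # neurons: list of (input_node_indices, weights, bias) tuples, listed
      # in a valid order, i.e. every neuron appears after all of its inputs
      values = list(inputs)                  # node values, network inputs first
      for in_idx, weights, bias in neurons:
          net = bias + sum(w * values[i] for w, i in zip(weights, in_idx))
          values.append(math.tanh(net))      # assumed sigmoid-type activation
      return values                          # neuron outputs appended in order

Note that each backward order above is just a forward order read right to left, e.g. 54321 reversed is 12345.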
Jacobian matrix
Rows: patterns × outputs (P × no)
Columns: weights
P = number of training patterns
no = number of outputs

For the example network: rows = 2 × 1 = 2, columns = 8, so the Jacobian is a 2 × 8 matrix.
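Written out, each row of the Jacobian holds the derivatives of one error with respect to every weight:

  J[(p,m), n] = ∂e_{pm} / ∂w_n ,   with P · no rows and N columns

so here (P = 2 patterns, no = 1 output, N = 8 weights) there is one row per pattern–output pair and one column per weight.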
Outline
• Numerical Analysis Methods
• Neural Network Architectures
• NBN Algorithm
Direct Computation of Quasi-Hessian Matrix and Gradient Vector
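The formulas behind this slide title were shown as images; a reconstruction of the direct-computation rules it refers to: the quasi-Hessian Q and gradient g are accumulated one Jacobian row at a time instead of forming the full Jacobian:

  Q = J^T J = Σ_{p=1..P} Σ_{m=1..no} j_{pm}^T j_{pm}
  g = J^T e = Σ_{p=1..P} Σ_{m=1..no} j_{pm}^T e_{pm}

where j_{pm} is the single 1 × N row of J for pattern p and output m. Only one row must be held in memory at a time rather than the full (P · no) × N Jacobian, which is the source of the memory reduction cited in the conclusion.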
Conclusion
• The memory requirement for computing the quasi-Hessian matrix and gradient vector is decreased by a factor of P × M (patterns × outputs)
• The method can be applied to arbitrarily connected neural networks
• Two procedures are given:
  – a backpropagation process (single output)
  – a process without backpropagation (multiple outputs)