Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm...
Transcript of Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm...
![Page 1: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/1.jpg)
Neumann OptimizerA Practical Optimization Algorithm for
Deep Neural Networks
Shankar Krishnan, Ying Xiao, Rif A. SaurousGoogle AI Perception Research
![Page 2: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/2.jpg)
Training Deep Networks
● Loss function is non-convex
● Finite sample size
● Lower loss doesn’t necessarily imply better model performance
● Computational hurdles
![Page 3: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/3.jpg)
Iterative Methods
Create function approximation around the current iterate
Different methods result depending on information used for the approximation
![Page 4: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/4.jpg)
Iterative Methods
Create function approximation around the current iterate
Different methods result depending on information used for the approximation
1st order:
![Page 5: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/5.jpg)
Iterative Methods
Create function approximation around the current iterate
Different methods result depending on information used for the approximation
1st order:
2nd order:
![Page 6: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/6.jpg)
Iterative Methods
Create function approximation around the current iterate
Different methods result depending on information used for the approximation
1st order:
2nd order:
Cubic reg.:
![Page 7: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/7.jpg)
Neumann Optimizer
● Designed for large batch setting
● Incorporates second order information, without explicitly representing
the Hessian
● Based on cubic regularized approximation
● Total per step cost is same as Adam
![Page 8: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/8.jpg)
Neumann Optimizer - Results
● Linear speedup with increasing batch size
● 0.8 - 0.9% test accuracy improvements over baseline on image
models
● 10-30% faster than current methods for comparable computational
resources
![Page 9: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/9.jpg)
Two Loop Algorithm
● For each mini-batch: solve the quadratic problem of the mini-batch to “good” accuracy.
● Use the solutions of the quadratic problem as updates.
● Hypothesis: with very large mini-batches, the cost-benefit is in favour of extracting more information.
![Page 10: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/10.jpg)
Inner loop: solving a quadratic via power series
How to solve the quadratic?
Use the geometric (Neumann) series for inverse, if
This implies the iteration:
![Page 11: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/11.jpg)
Eliminate the Hessian calculation (too expensive)
Back to the mini-batch problem:
This implies an iteration:
![Page 12: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/12.jpg)
Convex Quadratics
![Page 13: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/13.jpg)
Some tweaks for actual neural networksCubic regularizer (convexification of overall problem) + Repulsive regularizer
Convexification of each mini-batch.
![Page 14: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/14.jpg)
Spectrum of the Hessian
● Studied on various image models● Lanczos method during
optimization● Similar behavior of extreme
eigenvalues● Consistent with other studies
[Sagun ‘16, Chaudhari ‘16, Dauphin ‘14]
![Page 15: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/15.jpg)
Spectrum of the Hessian
● Studied on various image models● Lanczos method during
optimization● Similar behavior of extreme
eigenvalues● Consistent with other studies
[Sagun ‘16, Chaudhari ‘16, Dauphin ‘14]
![Page 16: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/16.jpg)
Our Algorithm
![Page 17: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/17.jpg)
Experimental Results
● Used image models for experimentation
○ Cifar10, Cifar100, Imagenet
● Various network architectures
○ Resnet-V1, Inception-V3, Inception-Resnet-V2
● Training on GPUs (P100, K40)
○ Multiple GPUs in sync mode for large batch training
● Update steps/epochs to measure performance
![Page 18: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/18.jpg)
Experiments on ImageNet (Inception V3)
![Page 19: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/19.jpg)
Experiments on ImageNet (Resnet Architectures)
![Page 20: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/20.jpg)
Final Top-1 Validation Error
![Page 21: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/21.jpg)
Large batch training
![Page 22: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/22.jpg)
Scaling Performance on Resnet-50
![Page 23: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/23.jpg)
Mini-batch Training
Stochastic methods use unbiased estimates to carry out optimization
For first order methods with batch size B,
![Page 24: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/24.jpg)
Mini-batch Training
Stochastic methods use unbiased estimates to carry out optimization
For first order methods with batch size B,
![Page 25: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/25.jpg)
Mini-batch Training
Stochastic methods use unbiased estimates to carry out optimization
For first order methods with batch size B,
After T steps:
![Page 26: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/26.jpg)
Mini-batch Training
Stochastic methods use unbiased estimates to carry out optimization
For first order methods with batch size B,
After T steps:
![Page 27: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/27.jpg)
Conclusion and Future Work
● Stochastic optimization algorithm for training deep nets
● Exhibits linear speedup with batch size
● Gives better model performance
● Total per step cost on par with existing popular optimizers
![Page 28: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/28.jpg)
Conclusion and Future Work
● Stochastic optimization algorithm for training deep nets
● Exhibits linear speedup with batch size
● Gives better model performance
● Total per step cost on par with existing popular optimizers
● Other model architectures and models
○ Preconditioned Neumann
● Experiment on TPUs
![Page 29: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/29.jpg)
Thank you!
Paper: https://goo.gl/7M9Avr
Code: Tensorflow implementation coming soon
![Page 30: Deep Neural Networks A Practical Optimization Algorithm for · A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan, Ying Xiao, Rif A. Saurous Google AI Perception](https://reader036.fdocuments.in/reader036/viewer/2022081601/61228db5cd9bba2b4850b8e5/html5/thumbnails/30.jpg)
Ablation Experiment