Google Brain team Thanks: AutoML: Automated Machine...
Transcript of Google Brain team Thanks: AutoML: Automated Machine...
![Page 1: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/1.jpg)
AutoML: Automated Machine LearningBarret Zoph, Quoc Le
Thanks:Google Brain team
![Page 2: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/2.jpg)
CIFAR-10
AutoML
Acc
ura
cy
ML Experts
![Page 3: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/3.jpg)
ImageNet
AutoML
Top
-1 A
ccu
racy
ML Experts
![Page 4: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/4.jpg)
Current:
![Page 5: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/5.jpg)
Current:
But can we turn this into:
![Page 6: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/6.jpg)
Importance of architectures for Vision
● Designing neural network architectures is hard● Lots of human efforts go into tuning them● There is not a lot of intuition into how to design them well● Can we try and learn good architectures automatically?
Two layers from the famous Inception V4 computer vision model.Canziani et al, 2017 Szegedy et al, 2017
![Page 7: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/7.jpg)
Convolutional Architectures
Krizhevsky et al, 2012
![Page 8: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/8.jpg)
Neural Architecture Search
● Key idea is that we can specify the structure and connectivity of a neural network by using a configuration string
○ [“Filter Width: 5”, “Filter Height: 3”, “Num Filters: 24”]
● Our idea is to use a RNN (“Controller”) to generate this string that specifies a neural network architecture
● Train this architecture (“Child Network”) to see how well it performs on a validation set
● Use reinforcement learning to update the parameters of the Controller model based on the accuracy of the child model
![Page 9: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/9.jpg)
Controller: proposes ML models Train & evaluate models
20K
Iterate to find the
most accurate
model
![Page 10: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/10.jpg)
Neural Architecture Search for Convolutional Networks
Controller RNNSoftmax classifier
Embedding
![Page 11: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/11.jpg)
Training with REINFORCE
![Page 12: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/12.jpg)
Training with REINFORCEAccuracy of architecture on held-out dataset
Architecture predicted by the controller RNN viewed as a sequence of actions
Parameters of Controller RNN
![Page 13: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/13.jpg)
Training with REINFORCEAccuracy of architecture on held-out dataset
Architecture predicted by the controller RNN viewed as a sequence of actions
Parameters of Controller RNN
![Page 14: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/14.jpg)
Training with REINFORCEAccuracy of architecture on held-out dataset
Architecture predicted by the controller RNN viewed as a sequence of actions
Parameters of Controller RNN
Number of models in minibatch
![Page 15: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/15.jpg)
Distributed Training
![Page 16: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/16.jpg)
Overview of Experiments
● Apply this approach to Penn Treebank and CIFAR-10● Evolve a convolutional neural network on CIFAR-10 and a recurrent neural
network cell on Penn Treebank● Achieve SOTA on the Penn Treebank dataset and almost SOTA on CIFAR-10
with a smaller and faster network● Cell found on Penn Treebank beats LSTM baselines on other language modeling
datasets and on machine translation
![Page 17: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/17.jpg)
Neural Architecture Search for CIFAR-10
● We apply Neural Architecture Search to predicting convolutional networks on CIFAR-10
● Predict the following for a fixed number of layers (15, 20, 13): ○ Filter width/height○ Stride width/height○ Number of filters
![Page 18: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/18.jpg)
Neural Architecture Search for CIFAR-10
[1,3,5,7] [1,3,5,7] [1,2,3] [1,2,3] [24,36,48,64]
![Page 19: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/19.jpg)
CIFAR-10 Prediction Method
● Expand search space to include branching and residual connections● Propose the prediction of skip connections to expand the search space● At layer N, we sample from N-1 sigmoids to determine what layers should be fed
into layer N● If no layers are sampled, then we feed in the minibatch of images● At final layer take all layer outputs that have not been connected and
concatenate them
![Page 20: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/20.jpg)
Neural Architecture Search for CIFAR-10Weight Matrices
![Page 21: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/21.jpg)
CIFAR-10 Experiment Details
● Use 100 Controller Replicas each training 8 child networks concurrently● Method uses 800 GPUs concurrently at one time● Reward given to the Controller is the maximum validation accuracy of the last 5
epochs squared● Split the 50,000 Training examples to use 45,000 for training and 5,000 for
validation● Each child model was trained for 50 epochs● Run for a total of 12,800 child models● Used curriculum training for the Controller by gradually increasing the number of
layers sampled
![Page 22: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/22.jpg)
Neural Architecture Search for CIFAR-10
5% faster
Best result of evolution (Real et al, 2017): 5.4%Best result of Q-learning (Baker et al, 2017): 6.92%
![Page 23: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/23.jpg)
Neural Architecture Search for ImageNet
● Neural Architecture Search directly on ImageNet is expensive
● Key idea is to run Neural Architecture Search on CIFAR-10 to find a “cell”
● Construct a bigger net from the “cell” and train the net on ImageNet
![Page 24: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/24.jpg)
Neural Architecture Search for ImageNet
![Page 25: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/25.jpg)
Neural Architecture Search for ImageNet
![Page 26: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/26.jpg)
How the cell was found
![Page 27: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/27.jpg)
How the cell was found
![Page 28: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/28.jpg)
How the cell was found
1. Elementwise addition2. Concatenation along the filter dimension
![Page 29: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/29.jpg)
The cell again
![Page 30: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/30.jpg)
Performance of cell on ImageNet
![Page 31: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/31.jpg)
Platform aware Architecture Search
![Page 32: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/32.jpg)
Platform aware Architecture Search
![Page 33: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/33.jpg)
Better ImageNet models transfer better
POC: skornblith@, shlens@, qvl@
![Page 34: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/34.jpg)
Controller: proposes Child Networks Train & evaluate Child Networks
20K
Iterate to find the
most accurate
Child Network
Reinforcement Learning or Evolution Search
Architecture / Optimization Algorithm / Nonlinearity
![Page 35: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/35.jpg)
Learn the Optimization Update Rule
Neural Optimizer Search using Reinforcement Learning,Irwan Bello, Barret Zoph, Vijay Vasudevan, and Quoc Le. ICML 2017
![Page 36: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/36.jpg)
Confidential + Proprietary
![Page 37: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/37.jpg)
Confidential + Proprietary
![Page 38: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/38.jpg)
Confidential + Proprietary
Strange hump
Basically linear
![Page 39: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/39.jpg)
Confidential + Proprietary
Mobile NASNet-A on ImageNet
![Page 40: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/40.jpg)
Data processing Machine Learning ModelData
Focus of machine learning research
![Page 41: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/41.jpg)
Data processing Machine Learning ModelData
Focus of machine learning research
Very important but manually tuned
![Page 42: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/42.jpg)
Data Augmentation
![Page 43: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/43.jpg)
Controller: proposes Child Networks Train & evaluate Child Networks
20K
Iterate to find the
most accurate
Child Network
Reinforcement Learning or Evolution Search
Architecture / Optimization Algorithm / Nonlinearity / Augmentation Strategy
![Page 44: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/44.jpg)
AutoAugment: Example Policy
Probability of applyingMagnitude
![Page 45: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/45.jpg)
CIFAR-10
State-of-art: 2.1% errorAutoAugment: 1.5% error
ImageNet
State-of-art: 3.9% errorAutoAugment: 3.5% error
![Page 46: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/46.jpg)
Controller: proposes Child Networks Train & evaluate Child Networks
20K
Iterate to find the
most accurate
Child Network
Reinforcement Learning or Evolution Search
Architecture / Optimization Algorithm / Nonlinearity / Augmentation Strategy
Summary of AutoML and its progress
![Page 47: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/47.jpg)
References
● Neural Architecture Search with Reinforcement Learning. Barret Zoph and Quoc V. Le. ICLR, 2017
● Learning Transferable Architectures for Large Scale Image Recognition. Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc V. Le. CVPR, 2018
● AutoAugment: Learning Augmentation Policies from Data. Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, Quoc V. Le. Arxiv, 2018
● Searching for Activation Functions. Prajit Ramachandran, Barret Zoph, Quoc Le. ICLR Workshop, 2018
![Page 48: Google Brain team Thanks: AutoML: Automated Machine Learningrail.eecs.berkeley.edu/deeprlcourse-fa18/static/slides/lec-25.pdf · Searching for Activation Functions. Prajit Ramachandran,](https://reader036.fdocuments.in/reader036/viewer/2022063015/5fd3596c0bd6617d7120f092/html5/thumbnails/48.jpg)
RL vs random search