PVTOL: Designing Portability, Productivity and Performance for Multicore Architectures
Designing architectures by hand is hard
Change architecture
Run experiments on architecture
Analyze results (and bugs, training details, …)
McCulloch-Pitts Neuron: 1943
LSTM: 1997
Search architectures automatically
• speed up architecture search enormously
• remove the human prior
• perhaps reveal what makes a good architecture
Change architecture
Run experiments on architecture
Analyze results (and bugs, training details, …)
Controller
Performance
Reward
Boot up GPUs
Baker et al. 2016, Zoph and Le 2017
Recurrent Neural Networks (RNN)
[Diagram: an RNN cell with input $x_t$ and hidden state $h_t$.]
Recurrent Neural Networks (RNN)
Commonly used: Long Short-Term Memory (LSTM)
[Diagram labels: $c_t$, $x_t$, $h_t$, $x_{t-1}$, $h_{t-1}$.]
Outline
1. Flexible language (DSL) to define architectures
2. Components: Ranking Function & Reinforcement Learning Generator
3. Experiments: Language Modeling & Machine Translation
Domain Specific Language (DSL) or how to define an architecture
Zoph and Le 2017
Domain Specific Language (DSL) or how to define an architecture
$\mathrm{Tanh}(\mathrm{Add}(\mathrm{MM}(x_t), \mathrm{MM}(h_{t-1})))$
Domain Specific Language (DSL) or how to define an architecture
Core
• Variables $x_t$, $x_{t-1}$, $h_{t-1}$
• MM
• Sigmoid, Tanh, ReLU
• Add, Mult
• $\mathrm{Gate3}(x, y, f) = \sigma(f) \circ x + (1 - \sigma(f)) \circ y$
• Memory cell $c_t$
Expanded
• Sub, Div
• Sin, Cos, PosEnc
• LayerNorm
• SeLU
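To make the DSL concrete, here is a minimal sketch in plain Python of how nested operator expressions such as Tanh(Add(MM(x_t), MM(h_{t-1}))) could be represented and evaluated. The tuple encoding, the `evaluate` helper, the variable name `h_tm1`, and the randomly initialised per-node weights are illustrative assumptions, not the authors' implementation. Only some of the core operators are covered.

```python
import math
import random

# Assumed encoding: a node is a variable name (str) or a tuple (operator, child, ...).

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def evaluate(node, env, weights, rng, path="root"):
    """Recursively evaluate a DSL expression on list-valued vectors."""
    if isinstance(node, str):              # variable such as "x_t"
        return env[node]
    op, *children = node
    args = [evaluate(c, env, weights, rng, f"{path}/{i}")
            for i, c in enumerate(children)]
    if op == "MM":                         # one H x H weight matrix per MM node
        H = len(args[0])
        W = weights.setdefault(
            path, [[rng.uniform(-0.1, 0.1) for _ in range(H)] for _ in range(H)])
        return matvec(W, args[0])
    if op == "Tanh":
        return [math.tanh(a) for a in args[0]]
    if op == "Sigmoid":
        return sigmoid(args[0])
    if op == "ReLU":
        return [max(0.0, a) for a in args[0]]
    if op == "Add":
        return [a + b for a, b in zip(*args)]
    if op == "Mult":
        return [a * b for a, b in zip(*args)]
    if op == "Gate3":                      # sigma(f) * x + (1 - sigma(f)) * y
        x, y, f = args
        g = sigmoid(f)
        return [gi * xi + (1.0 - gi) * yi for gi, xi, yi in zip(g, x, y)]
    raise ValueError(f"unknown operator: {op}")

# The simple RNN cell from the slide: Tanh(Add(MM(x_t), MM(h_{t-1})))
rnn_cell = ("Tanh", ("Add", ("MM", "x_t"), ("MM", "h_tm1")))
env = {"x_t": [0.5, -1.0, 2.0], "h_tm1": [0.0, 0.1, -0.2]}
h_t = evaluate(rnn_cell, env, weights={}, rng=random.Random(0))
```

Because every operator is just a tuple tag, adding one of the expanded operators (Sub, Div, Sin, ...) would only need another branch in `evaluate`.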
Instantiable Framework
Architecture Generator
given the current architecture, output the next operator
1. Random
2. REINFORCE
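The random variant of the generator can be sketched as follows. This is an illustration under assumptions: the arity table, the depth limit of 4, and the names `next_operator` and `generate` are hypothetical, and the random policy here looks only at the current depth of the partial architecture.

```python
import random

# Operator arities for the core DSL (illustrative assumption).
ARITY = {"MM": 1, "Sigmoid": 1, "Tanh": 1, "ReLU": 1,
         "Add": 2, "Mult": 2, "Gate3": 3}
VARIABLES = ["x_t", "h_tm1"]

def next_operator(depth, max_depth, rng):
    """Random policy: pick the operator (or variable) for the next open slot."""
    if depth >= max_depth:
        return rng.choice(VARIABLES)        # force a leaf at the depth limit
    return rng.choice(list(ARITY) + VARIABLES)

def generate(rng, max_depth=4, depth=0):
    """Grow an architecture top-down, one operator at a time."""
    choice = next_operator(depth, max_depth, rng)
    if choice in VARIABLES:
        return choice
    children = [generate(rng, max_depth, depth + 1)
                for _ in range(ARITY[choice])]
    return (choice, *children)

def tree_depth(node):
    """Depth of a tree, counting a bare variable as depth 0."""
    if isinstance(node, str):
        return 0
    return 1 + max(tree_depth(c) for c in node[1:])

architecture = generate(random.Random(1))
```

A learned generator such as REINFORCE would keep the same loop and replace `next_operator` with a policy that scores the candidate operators.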
Reinforcement Learning Generator
[Diagram: agent-environment loop. The agent sends an action (example: ReLU); the environment returns an observation and a reward (example: Performance: 42).]
Ranking Function
Goal: predict performance of an architecture
Train with architecture-performance pairs
Language Modeling
$P(w_i \mid w_1, w_2, \ldots, w_{i-1})$
“Why did the chicken cross the ___”
Performance measurement: perplexity
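For reference, perplexity is the exponential of the average negative log-probability the model assigns to each observed word. The sketch below uses made-up probabilities purely for illustration.

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-probability of the observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Hypothetical probabilities a model assigned to the true next words.
probs = [0.25, 0.25, 0.25, 0.25]
ppl = perplexity(probs)   # uniform guessing over 4 words: perplexity 4, up to rounding
```

Lower is better: a model that always assigns probability 1 to the correct word reaches the minimum perplexity of 1.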
Language Modeling (LM) with Random Search + Ranking Function
LM with Ranking Function: selected architectures improve
The BC3 cell
Weight matrices $W, U, V, X \in \mathbb{R}^{H \times H}$
LM with Ranking Function: improvement over many human architectures
Machine Translation
Test evaluation: BLEU score
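[Diagram: encoder-decoder model. Labels: Embed, Encoder, Decoder, Softmax. Source sentence: “He loved to eat .”; decoder inputs: “NULL Er”; decoder outputs: “Er liebte”.]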
Machine Translation with Reinforcement Learning Generator
Machine Translation (MT) with Reinforcement Learning Generator (RL)
• Generator = 3-layer NN (linear-LSTM-linear) outputting action scores
• Choose action with multinomial and epsilon-greedy strategy ($\epsilon = 0.05$)
• Train generator on soft priors first (use activations, …)
• Small dataset to evaluate an architecture in ~2 hours
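The action-selection bullet can be read as follows: with probability ε pick a uniformly random action, otherwise sample from the multinomial distribution defined by the softmax of the generator's scores. That reading, the function names, and the example scores are assumptions made for this sketch; the slides do not spell out the exact combination.

```python
import math
import random

def softmax(scores):
    m = max(scores)                           # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def choose_action(scores, rng, epsilon=0.05):
    """Explore uniformly with probability epsilon, else sample from softmax(scores)."""
    if rng.random() < epsilon:
        return rng.randrange(len(scores))
    probs = softmax(scores)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]

rng = random.Random(0)
scores = [2.0, 0.5, -1.0]                     # hypothetical scores for 3 operators
counts = [0, 0, 0]
for _ in range(10_000):
    counts[choose_action(scores, rng)] += 1
```

The uniform branch guarantees that even operators with very low scores are still tried occasionally, which matters when the generator starts from soft priors.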
MT with RL: re-scale loss to reward great architectures more
[Plot: reward (y-axis, starting at 0) against loss (x-axis, from ∞ to 0).]
MT with RL: switch between exploration and exploitation
[Plot: log(performance) (y-axis) against epochs (x-axis).]
MT with RL: good architectures found
MT with RL: many good architectures found
[Plot: number of architectures (y-axis) against perplexity (x-axis).]
MT with RL: rediscovery of human architectures
• $\mathrm{Add}(\mathrm{Transformation}(x_t), x_t)$
variant of residual networks (He et al., 2016)
• $\mathrm{Gate3}(\mathrm{Transformation}(x_t), x_t, \mathrm{Sigmoid}(\ldots))$
highway networks (Srivastava et al., 2015)
• Motifs found in multiple cells
MT with RL: novel operators only used after “it clicked”
[Plot: x-axis is epochs.]
MT with RL: novel operators contribute to successful architectures
Related work
• Hyper-parameter search: Bergstra et al. 2011, Snoek et al. 2012
• Neuroevolution: Stanley et al. 2009, Bayer et al. 2009, Fernando et al. 2016, Liu et al. 2017 (← also random search)
• RL search: Baker et al. 2016, Zoph and Le 2017
• Subgraph selection: Pham, Guan et al. 2018
• Weight prediction: Ha et al. 2016, Brock et al. 2018
• Optimizer search: Bello et al. 2017
Discussion
• Remove need for expert knowledge to a degree
• Cost of running these experiments
  • us: 5 days on 28 GPUs (best architecture after 40 hours)
  • Zoph and Le 2017: 4 days using 450 GPUs
• Hard to analyze the diversity of architectures (much more quantitative than qualitative)
• Definition of search space difficult
• We’re using a highly complex system to find other highly complex systems in a highly complex space
Contributions
1. Flexible language (DSL) to define architectures
2. Ranking Function (Language Modeling), Reinforcement Learning Generator (Machine Translation)
3. Explore uncommon operators
Future Work
• Search architectures that correspond to biology
• Allow for more flexible search space
• Find architectures that do well on multiple tasks
Backup
Compilation: DSL → Model
• DSL is basically executable
• Traverse tree from source nodes towards final node $h_t$
• Produce code: initialization and forward call
• Collect all matrix multiplications on single source node and batch them
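The compilation steps above can be sketched as a post-order traversal that emits two lists of source lines, one for initialization and one for the forward call, while recording which matrix multiplications share a source node so they could be batched. Everything here is an assumption for illustration: the tuple encoding, the name `compile_cell`, and the emitted `Linear(H, H)` call, which stands in for whatever layer constructor the target framework provides.

```python
from collections import defaultdict

def compile_cell(tree):
    """Return (init_lines, forward_lines, mm_by_source) for a nested-tuple DSL tree."""
    init, forward = [], []
    mm_by_source = defaultdict(list)   # source variable -> weights applied to it
    counter = 0

    def visit(node):
        nonlocal counter
        if isinstance(node, str):      # source node such as "x_t"
            return node
        op, *children = node
        args = [visit(c) for c in children]   # children first: sources towards h_t
        counter += 1
        out = f"n{counter}"
        if op == "MM":
            weight = f"W{counter}"
            init.append(f"self.{weight} = Linear(H, H)")
            forward.append(f"{out} = self.{weight}({args[0]})")
            if isinstance(children[0], str):
                # MMs on the same source could be fused into one larger matmul
                mm_by_source[children[0]].append(weight)
        else:
            forward.append(f"{out} = {op}({', '.join(args)})")
        return out

    result = visit(tree)
    forward.append(f"h_t = {result}")
    return init, forward, dict(mm_by_source)

rnn_cell = ("Tanh", ("Add", ("MM", "x_t"), ("MM", "h_tm1")))
init_lines, forward_lines, groups = compile_cell(rnn_cell)
```

The sketch only records the batching groups; actually fusing the grouped weights into a single multiplication is left out.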
Restrictions on generated architectures
• Gate3(…, …, Sigmoid(…))
• Have to use $x_t$, $h_{t-1}$
• Maximum 21 nodes, depth 8
• Prevent stacking two identical operations
  • MM(MM(x)) is mathematically identical to MM(x)
  • Sigmoid(Sigmoid(x)) is unlikely to be useful
  • ReLU(ReLU(x)) is redundant
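These restrictions translate naturally into a validity check on a candidate tree. The sketch below assumes the nested-tuple encoding used for illustration, counts variables as nodes, and treats a bare variable as depth 0; how the authors count nodes and depth is not stated on the slide, so those conventions are assumptions.

```python
def count_nodes(node):
    if isinstance(node, str):
        return 1
    return 1 + sum(count_nodes(c) for c in node[1:])

def depth(node):
    if isinstance(node, str):
        return 0
    return 1 + max(depth(c) for c in node[1:])

def variables(node):
    if isinstance(node, str):
        return {node}
    return set().union(*(variables(c) for c in node[1:]))

def stacks_identical(node):
    """True if any operator is applied directly to the same operator."""
    if isinstance(node, str):
        return False
    op, *children = node
    return any((not isinstance(c, str) and c[0] == op) or stacks_identical(c)
               for c in children)

def gate3_ok(node):
    """Every Gate3 must take a Sigmoid(...) as its third argument."""
    if isinstance(node, str):
        return True
    op, *children = node
    if op == "Gate3":
        f = children[2]
        if isinstance(f, str) or f[0] != "Sigmoid":
            return False
    return all(gate3_ok(c) for c in children)

def is_valid(tree, max_nodes=21, max_depth=8):
    return (count_nodes(tree) <= max_nodes
            and depth(tree) <= max_depth
            and {"x_t", "h_tm1"} <= variables(tree)
            and not stacks_identical(tree)
            and gate3_ok(tree))

rnn_cell = ("Tanh", ("Add", ("MM", "x_t"), ("MM", "h_tm1")))
ok = is_valid(rnn_cell)
```

Rejecting invalid candidates before training keeps the expensive evaluation budget for architectures that can plausibly work.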
How to define proper search space?
• Too small: will find nothing radically novel
• Too big: need Google computing resources
• Baseline experiment parameters restrict successful architectures
MT with RL: learned encoding very different
MT with RL: parent-child operator preference