Post on 27-Feb-2020
JANUS: Fast and Flexible Deep Learningvia Symbolic Graph Execution of Imperative Programs
Eunji Jeong, Sungwoo Cho, Gyeong-In Yu,Joo Seong Jeong, Dong-Jin Shin, Byung-Gon Chun
1
Deep Learning (DL) Frameworks
ExecuteDefine
Images From:http://www.mdpi.com/https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/Going Deeper with Convolutions, 2014, https://towardsdatascience.com/learn-how-recurrent-neural-networks-work-84e975feaaf7Short-Term Load Forecasting Using EMD-LSTM Neural Networks with a Xgboost Algorithm for Feature Importance Evaluation, Energies 2017https://skymind.ai/wiki/generative-adversarial-network-ganhttps://en.wikipedia.org/wiki/Reinforcement_learninghttps://medium.com/@Petuum/intro-to-dynamic-neural-networks-and-dynet-67694b18cb23
2
Symbolic DL Frameworks
✓ Build a Symbolic Graph✓ Execute the Graph
def build_graph(g): x = g.input(float) linear = g.add(g.mul(W, x), b)
build_graph(graph)run_graph(graph, x_data)
Imperative DL Frameworks
✓ Directly Execute the Computations
Two Paradigms
x
Mul
Add
W
b
3
def linear(x): return W * x + blinear(x_data)
imperative
Symbolic DL Frameworks
✓ Build a Symbolic Graph✓ Execute the Graph
def build_graph(g): x = g.input(float) linear = g.add(g.mul(W, x), b)
build_graph(graph)run_graph(graph, x_data)
Imperative DL Frameworks
✓ Directly Execute the Computations
Two Paradigms
x
Mul
Add
W
b
4
def linear(x): return W * x + blinear(x_data)
imperative
Symbolic DL Frameworks
✓ Build a Symbolic Graph✓ Execute the Graph
def build_graph(g): x = g.input(float) linear = g.add(g.mul(W, x), b)
build_graph(graph)run_graph(graph, x_data)
Imperative DL Frameworks
✓ Directly Execute the Computations
Two Paradigms
x
Mul
Add
W
b
5
def linear(x): return W * x + blinear(x_data)
def linear(x): return W * x + blinear(x_data)
imperative
Symbolic DL Frameworks
✓ Build a Symbolic Graph✓ Execute the Graph
def build_graph(g): x = g.input(float) linear = g.add(g.mul(W, x), b)
build_graph(graph)run_graph(graph, x_data)
Imperative DL Frameworks
✓ Directly Execute the Computations
Two Paradigms
x
Mul
Add
W
b
6
def linear(x): return W * x + blinear(x_data)
imperative
Imperative DL Frameworks
✓ Directly Execute the Computations
Symbolic DL Frameworks
✓ Build a Symbolic Graph✓ Execute the Graph
def build_graph(g): x = g.input(float) linear = g.add(g.mul(W, x), b)
build_graph(graph)run_graph(graph, x_data)
Two Paradigms
x
Mul
Add
W
b
7
def linear(x): return W * x + blinear(x_data)
imperative
Imperative DL Frameworks
✓ Directly Execute the Computations
Symbolic DL Frameworks
✓ Build a Symbolic Graph✓ Execute the Graph
def build_graph(g): x = g.input(float) linear = g.add(g.mul(W, x), b)
build_graph(graph)run_graph(graph, x_data)
Two Paradigms
x
Mul
Add
W
b
8
def linear(x): return W * x + blinear(x_data)
imperative
Symbolic DL Frameworks
+ Easy to Optimize+ Compiler Optimization+ Parallel Execution of Operations+ Deploy on GPU, Cluster, Mobile, ...
- Decoupled View:Hard to Program & Debug
Imperative DL Frameworks
+ Direct Execution:Easy to Program & Debug
- Hard to Optimize
Pros & Cons
Pros
Cons
9
Performance Programmability
Symbolic DL Frameworks
+ Easy to Optimize+ Compiler Optimization+ Parallel Execution of Operations+ Deploy on GPU, Cluster, Mobile,...
- Decoupled View:Hard to Program & Debug
Imperative DL Frameworks
+ Direct Execution:Easy to Program & Debug
- Hard to Optimize
Pros & Cons
Pros
Cons
10
Performance Programmability
Symbolic DL Frameworks
+ Easy to Optimize+ Compiler Optimization+ Parallel Execution of Operations+ Deploy on GPU, Cluster, Mobile,...
- Decoupled View:Hard to Program & Debug
Imperative DL Frameworks
+ Direct Execution:Easy to Program & Debug
- Hard to Optimize
Pros & Cons
Pros
Cons
11
Performance Programmability
Symbolic DL Frameworks
+ Easy to Optimize+ Compiler Optimization+ Parallel Execution of Operations+ Deploy on GPU, Cluster, Mobile,...
- Decoupled View:Hard to Program & Debug
Imperative DL Frameworks
+ Direct Execution:Easy to Program & Debug
- Hard to Optimize
Pros & Cons
Pros
Cons
12
Performance Programmability
Symbolic DL Frameworks
+ Easy to Optimize+ Compiler Optimization+ Parallel Execution of Operations+ Deploy on GPU, Cluster, Mobile, ...
- Decoupled View:Hard to Program & Debug
Imperative DL Frameworks
+ Direct Execution:Easy to Program & Debug
- Hard to Optimize
Pros & Cons
Pros
Cons
13
Performance Programmability
Symbolic DL Frameworks
+ Easy to Optimize+ Compiler Optimization+ Parallel Execution of Operations+ Deploy on GPU, Cluster, Mobile, ...
- Decoupled View:Hard to Program & Debug
Imperative DL Frameworks
+ Direct Execution:Easy to Program & Debug
- Hard to Optimize
Pros & Cons
Pros
Cons
14
Performance Programmability
Symbolic DL Frameworks
+ Easy to Optimize+ Compiler Optimization+ Parallel Execution of Operations+ Deploy on GPU, Cluster, Mobile,...
- Decoupled View:Hard to Program & Debug
Imperative DL Frameworks
+ Direct Execution:Easy to Program & Debug
- Hard to Optimize
What People Want Is...
Pros
Cons
15
ProgrammabilityPerformance
Imperative DL Program
def foo(x): prod = mul(3, x) return add(prod, 2)
JANUS: Combining the Best of Both Worlds
16
Symbolic DL Graph
x
Mul
Add
3
2
“Easy Programmability” “High Performance”
TransparentConversion
JANUS: Combining the Best of Both Worlds
17
● 11 models in 5 major neural network categories:○ Convolutional Neural Networks (CNN) LeNet, ResNet-50, Inception-v3○ Recurrent Neural Networks (RNN) LSTM, LM○ Recursive Neural Networks (TreeNN) TreeRNN, TreeLSTM○ Generative Adversarial Networks (GAN) GAN, PIX2PIX○ Deep Reinforcement Learning (DRL) A3C, PPO
● Up to 47.6x speedup compared to imperative DL framework,comparable performance (within 4%) to symbolic DL frameworkwith unmodified imperative DL programs
Outline
● Approach
● Challenges
● JANUS
● Evaluation
18
Challenges in Graph Conversion
19
Imperative DL Program
def foo(x): tmp = mul(3, x) return add(tmp, 2)
TransparentConversion
Symbolic DL Graph
x
Mul
Add
3
2
Challenges in Graph Conversion
20
Imperative Python DL Program
def foo(x): tmp = mul(3, x) return add(tmp, 2)
TransparentConversion
Symbolic DL Graph
x
Mul
Add
3
2
De-facto Standard Languagefor DL Programming
Challenges in Graph Conversion
21
Imperative Python DL Program
def foo(x): tmp = mul(3, x) return add(tmp, 2)
TransparentConversion
Symbolic DL Graph
x
Mul
Add
3
2?
Discrepancy between Python Programs and DL Graphs
22
TransparentConversion?
“Dynamic” “Static”
Imperative Python DL Program
def foo(x): tmp = mul(3, x) return add(tmp, 2)
Symbolic DL Graph
x
Mul
Add
3
2
TransparentConversion
Discrepancy between Python Programs and DL Graphs
23
?Characteristics● determined at runtime● change at runtime
“Dynamic” “Static”
Imperative Python DL Program
def foo(x): tmp = mul(3, x) return add(tmp, 2)
Symbolic DL Graph
x
Mul
Add
3
2
TransparentConversion
Discrepancy between Python Programs and DL Graphs
24
Symbolic DL Graph
INT, 10x1x
INT, 10x1Mul
INT, 10x1Add
INT3
INT2
Characteristics● must be given
when building a graph
?Characteristics● determined at runtime● change at runtime
“Dynamic” “Static”
Imperative Python DL Program
def foo(x): tmp = mul(3, x) return add(tmp, 2)
Imperative Python DL Program
def foo(x): tmp = mul(3, x) return add(tmp, 2)
TransparentConversion
Discrepancy between Python Programs and DL Graphs
25
Symbolic DL Graph
INT, 10x1x
INT, 10x1Mul
INT, 10x1ADD
INT3
INT2
Characteristics● must be given
when building a graph
?Characteristics● determined at runtime● change at runtime
“Dynamic” “Static”
DST:NEED INFO
SRC: NO INFO
class RNNModel(object):def __call__(self, sequence):
state = self.stateoutputs = []for item in sequence:
state = rnn_cell(state, item)outputs += [state]
self.state = statereturn compute_loss(outputs)
for sequence in sequences:optimize(lambda: model(sequence))
Example: Recurrent Neural Network (RNN)
26
Correctness & Performance
of Graph Execution
Dynamic Features of Python
√ Dynamic Control Flow
√ Dynamic Types
√ Impure Functions
?
class RNNModel(object):def __call__(self, sequence):
state = self.stateoutputs = []for item in sequence:
state = rnn_cell(state, item)outputs += [state]
self.state = statereturn compute_loss(outputs)
for sequence in sequences:optimize(lambda: model(sequence))
RNN Example
27
sawThey dogsseq[0]:
sheWas sick?seq[1]:
Dynamic Control Flow Dynamic Types Impure Function
class RNNModel(object):def __call__(self, sequence):
state = self.stateoutputs = []for item in sequence:
state = rnn_cell(state, item)outputs += [state]
self.state = statereturn compute_loss(outputs)
for sequence in sequences:optimize(lambda: model(sequence))
RNN Example
28
RNNCell
sawThey dogs
state
out[0]
seq[0]:
sheWas sick?seq[1]:
Dynamic Control Flow Dynamic Types Impure Function
class RNNModel(object):def __call__(self, sequence):
state = self.stateoutputs = []for item in sequence:
state = rnn_cell(state, item)outputs += [state]
self.state = statereturn compute_loss(outputs)
for sequence in sequences:optimize(lambda: model(sequence))
RNN Example
29
RNNCell
RNNCell
sawThey dogs
state
out[1]
out[0]
seq[0]:
sheWas sick?seq[1]:
Dynamic Control Flow Dynamic Types Impure Function
class RNNModel(object):def __call__(self, sequence):
state = self.stateoutputs = []for item in sequence:
state = rnn_cell(state, item)outputs += [state]
self.state = statereturn compute_loss(outputs)
for sequence in sequences:optimize(lambda: model(sequence))
RNN Example
30
RNNCell
RNNCell
RNNCell
sawThey dogs
state
out[1]
out[0]
out[2]
seq[0]:
sheWas sick?seq[1]:
Dynamic Control Flow Dynamic Types Impure Function
class RNNModel(object):def __call__(self, sequence):
state = self.stateoutputs = []for item in sequence:
state = rnn_cell(state, item)outputs += [state]
self.state = statereturn compute_loss(outputs)
for sequence in sequences:optimize(lambda: model(sequence))
RNN Example
31
Dynamic Control Flow Dynamic Types Impure Function
RNN Example
32
class RNNModel(object):def __call__(self, sequence):
state = self.stateoutputs = []for item in sequence:
state = rnn_cell(state, item)outputs += [state]
self.state = statereturn compute_loss(outputs)
for sequence in sequences:optimize(lambda: model(sequence))
state
CellSwitch
Merge
i<N
Next
Dynamic Control Flow Dynamic Types Impure Function
RNN Example
● Correct
33
class RNNModel(object):def __call__(self, sequence):
state = self.stateoutputs = []for item in sequence:
state = rnn_cell(state, item)outputs += [state]
self.state = statereturn compute_loss(outputs)
for sequence in sequences:optimize(lambda: model(sequence))
state
CellSwitch
Merge
i<N
Next
Dynamic Control Flow Dynamic Types Impure Function
RNN Example
● Correct● Slow
34
class RNNModel(object):def __call__(self, sequence):
state = self.stateoutputs = []for item in sequence:
state = rnn_cell(state, item)outputs += [state]
self.state = statereturn compute_loss(outputs)
for sequence in sequences:optimize(lambda: model(sequence))
state
CellSwitch
Merge
i<N
Next
Dynamic Control Flow Dynamic Types Impure Function
RNN Example
● Correct● Slow
● Fast
35
class RNNModel(object):def __call__(self, sequence):
state = self.stateoutputs = []for item in sequence:
state = rnn_cell(state, item)outputs += [state]
self.state = statereturn compute_loss(outputs)
for sequence in sequences:optimize(lambda: model(sequence))
state
CellSwitch
Merge
i<N
Next
Cell
state
Cell
Cell
Dynamic Control Flow Dynamic Types Impure Function
RNN Example
● Correct● Slow
● Fast● Incorrect● Need Info
36
class RNNModel(object):def __call__(self, sequence):
state = self.stateoutputs = []for item in sequence:
state = rnn_cell(state, item)outputs += [state]
self.state = statereturn compute_loss(outputs)
for sequence in sequences:optimize(lambda: model(sequence))
state
CellSwitch
Merge
i<N
Next
Cell
state
Cell
Cell
Dynamic Control Flow Dynamic Types Impure Function
class RNNModel(object):def __call__(self, sequence):
state = self.stateoutputs = []for item in sequence:
state = rnn_cell(state, item)outputs += [state]
self.state = statereturn compute_loss(outputs)
for sequence in sequences:optimize(lambda: model(sequence))
RNN Example
37
RNNCell
RNNCell
RNNCell
sawThey dogs
state state
out[1]
out[0]
out[2]
seq[0]:
sheWas sick?seq[1]:
Dynamic Control Flow Dynamic Types Impure Function
class RNNModel(object):def __call__(self, sequence):
state = self.stateoutputs = []for item in sequence:
state = rnn_cell(state, item)outputs += [state]
self.state = statereturn compute_loss(outputs)
for sequence in sequences:optimize(lambda: model(sequence))
RNN Example
38
PlaceHoldertype: intshape: ?
PlaceHoldertype: float
shape: ?...
● Correct● Inefficient
Dynamic Control Flow Dynamic Types Impure Function
class RNNModel(object):def __call__(self, sequence):
state = self.stateoutputs = []for item in sequence:
state = rnn_cell(state, item)outputs += [state]
self.state = statereturn compute_loss(outputs)
for sequence in sequences:optimize(lambda: model(sequence))
RNN Example
39
PlaceHoldertype: int
shape: (3x128)
● Correct● Inefficient
● Fast● Incorrect● Need Info
...
Dynamic Control Flow Dynamic Types Impure Function
PlaceHoldertype: intshape: ?
PlaceHoldertype: float
shape: ?
class RNNModel(object):def __call__(self, sequence):
state = self.stateoutputs = []for item in sequence:
state = rnn_cell(state, item)outputs += [state]
self.state = statereturn compute_loss(outputs)
for sequence in sequences:optimize(lambda: model(sequence))
RNN Example
40
RNNCell
RNNCell
RNNCell
sawThey dogs
state state
out[1]
out[0]
out[2]
seq[0]:
sheWas sick?seq[1]:
Dynamic Control Flow Dynamic Types Impure Function
class RNNModel(object):def __call__(self, sequence):
state = self.stateoutputs = []for item in sequence:
state = rnn_cell(state, item)outputs += [state]
self.state = statereturn compute_loss(outputs)
for sequence in sequences:optimize(lambda: model(sequence))
RNN Example
41
RNNCell
RNNCell
RNNCell
sawThey dogs
state state
out[1]
out[0]
out[2]
seq[0]:
sheWas sick?
state
seq[1]:
RNNCell
RNNCell
RNNCell
out[1]
out[0]
out[2]
state
Dynamic Control Flow Dynamic Types Impure Function
class RNNModel(object):def __call__(self, sequence):
state = self.stateoutputs = []for item in sequence:
state = rnn_cell(state, item)outputs += [state]
self.state = statereturn compute_loss(outputs)
for sequence in sequences:optimize(lambda: model(sequence))
RNN Example
42
?
Dynamic Control Flow Dynamic Types Impure Function
Challenge:
achieving Correctness & Performance at the same time
Challenge Summary
43
Imperative Python DL Programwith Dynamic Features
Correct & FastSymbolic DL Graph?
Challenge:
achieving Correctness & Performance at the same time
Challenge Summary
44
Imperative Python DL Programwith Dynamic Features
CorrectGraph
FastGraph
Slow
Incorrect
Challenge:
achieving Correctness & Performance at the same time
Challenge Summary
45
Imperative Python DL Programwith Dynamic Features
CorrectGraph
FastGraph
Slow
Incorrect
Outline
● Approach
● Challenges
● JANUS
● Evaluation
46
Solution: Speculative Graph Generation and Execution
● Goal: Correctness & Performance
● [Performance] Speculatively Specialize the Graph○ Make reasonable assumptions based on the execution history (Profiling)○ Run specialized graph (Common Case)
● [Correctness] Validate Assumptions○ Fallback if an assumption is broken (Rare Case)
47
Imperative DL Program
for item in sequence: state = rnn(state, item) outputs += [state]
Imperative Executor
Pre-defined DL Operations .Python Interpreter .
Overall Workflow on JANUS
48
Fast Path(Common Case)
Correct Path(Rare Case)
Imperative DL Program
for item in sequence: state = rnn(state, item) outputs += [state]
Imperative Executor
Pre-defined DL Operations .Python Interpreter .
Profiler
len:3
49
Overall Workflow on JANUS
Modified Python Interpreterfor Transparent Profiling
Fast Path(Common Case)
Correct Path(Rare Case)
Imperative DL Program
for item in sequence: state = rnn(state, item) outputs += [state]
Symbolic DL Graph
Pre-defined DL Operations .Python Interpreter .
Profiler
GraphGenerator
50
Overall Workflow on JANUS
Cell
state
Cell
Cell
Optimize Graph withProfile Information
Standard Compiler Pass● Reaching Definition Analysis● Type inference● Constant Propagation● ...
len:3
Fast Path(Common Case)
Correct Path(Rare Case)
Imperative DL Program
for item in sequence: state = rnn(state, item) outputs += [state]
Symbolic DL Graph
Pre-defined DL Operations .Python Interpreter .
Profiler
GraphGenerator
51
Overall Workflow on JANUS
Cell
state
Cell
Cell
len == 3?
Assert
Validate Assumptionfor Correctness
len:3
Fast Path(Common Case)
Correct Path(Rare Case)
Imperative DL Program
for item in sequence: state = rnn(state, item) outputs += [state]
Symbolic DL Graph
Pre-defined DL Operations .Python Interpreter .
Profiler
GraphGenerator
52
Graph Cache
Overall Workflow on JANUS
Cell
state
Cell
Cell
len == 3?
Assert
len:3
Fast Path(Common Case)
Correct Path(Rare Case)
Symbolic Graph Executor
Imperative DL Program
for item in sequence: state = rnn(state, item) outputs += [state]
Symbolic DL Graph
Pre-defined DL Operations .Python Interpreter .
Profiler
GraphGenerator
53
Graph Cache
Overall Workflow on JANUS
Cell
state
Cell
Cell
len == 3?
Assert
len:3
Fast Path(Common Case)
Correct Path(Rare Case)
Cell
state
Cell
Cell
len == 3?
Assert
Symbolic Graph Executor
Imperative DL Program
for item in sequence: state = rnn(state, item) outputs += [state]
Symbolic DL Graph
Pre-defined DL Operations .Python Interpreter .
Profiler
GraphGenerator
54
Graph Cache
Overall Workflow on JANUS
AssumptionFailure
len:3
Fast Path(Common Case)
Correct Path(Rare Case)
Symbolic Graph Executor
Cell
state
Cell
Cell
len == 3?
Assert
Imperative DL Program
for item in sequence: state = rnn(state, item) outputs += [state]
Symbolic DL Graph
Pre-defined DL Operations .Python Interpreter .
Profiler
GraphGenerator
55
Graph Cache
Overall Workflow on JANUS
len:3
Fast Path(Common Case)
Correct Path(Rare Case)
Imperative Executor
Imperative DL Program
for item in sequence: state = rnn(state, item) outputs += [state]
Pre-defined DL Operations .Python Interpreter .
56
Overall Workflow on JANUS
len:3
Fast Path(Common Case)
Correct Path(Rare Case)
Profiler
GraphGenerator
Graph Cache
Imperative Executor
Imperative DL Program
for item in sequence: state = rnn(state, item) outputs += [state]
Pre-defined DL Operations .Python Interpreter .
Profiler
GraphGenerator
57
Graph Cache
Overall Workflow on JANUS
len:?
Fast Path(Common Case)
Correct Path(Rare Case)
Imperative Executor
Imperative DL Program
for item in sequence: state = rnn(state, item) outputs += [state]
Symbolic DL Graph
Pre-defined DL Operations .Python Interpreter .
Profiler
GraphGenerator
58
Graph Cache
Overall Workflow on JANUS
state
CellSwitch
Merge
i<N
Next
len:?
Fast Path(Common Case)
Correct Path(Rare Case)
Imperative Executor Symbolic Graph Executor
Imperative DL Program
for item in sequence: state = rnn(state, item) outputs += [state]
Pre-defined DL Operations .Python Interpreter .
Profiler
GraphGenerator
59
Graph Cache
Overall Workflow on JANUS
Cell
state
Cell
Cell
len == 3?
Assert
Symbolic DL Graphlen:3
Imperative Executor
Imperative DL Program
for item in sequence: state = rnn(state, item) outputs += [state]
Pre-defined DL Operations .Python Interpreter .
Profiler
GraphGenerator
60
Additional System Aspects
Imperative Execution for Full Python Coverage
len:3
See our paper for more details!
Python Coverage Global State Consistency
SetAttr
0xb84c “data”
value
Symbolic Graph Executor
Python Interpreter .
61
Additional System Aspects
Pre-defined DL Operations. .
“Impure”Symbolic DL Graph
“Impure”Imperative DL Program
def foo(obj): obj.data = value do_sth if pred else pass
Python Coverage Global State Consistency
pred?
Assertdo_sth
Python Heap
ImpureImperative DL Program
def foo(obj): obj.data = value do_sth if pred else pass
SetAttr
0xb84c “data”
value
pred?
Assertdo_sth
Symbolic Graph Executor
Python Interpreter .
62
Additional System Aspects
Pre-defined DL Operations. .
AssumptionFailure
Unsafe to fallback after heap update
ImpureSymbolic DL Graph
?
Python Coverage Global State Consistency
ModifiedPython Heap
Symbolic Graph Executor
Python Interpreter .
63
Additional System Aspects
Pre-defined DL Operations. .
Python Heap Local Copy
ImpureImperative DL Program
def foo(obj): obj.data = value do_sth if pred else pass
SetAttr
0xb84c “data”
value
pred?
Assertdo_sth
ImpureSymbolic DL Graph
Write-back after validating assumptions
See our paper for more details!
Python Coverage Global State Consistency
Outline
● Approach
● Challenges
● JANUS
● Evaluation
√ Correctness and Performance
√ Breakdown of Performance Improvement
64
Evaluation Setup: Frameworks & Environments
● Frameworks
○ JANUS. Implemented on top of TensorFlow and CPython
○ Symbolic. TensorFlow
○ Imperative. TensorFlow Eager
● Hardware & Software Setup
○ 6 machines connected via Mellanox ConnectX-4 cards w/ 100Gbps InfiniBand
○ Each machine w/ 2x(Intel Xeon E5-2695)+6x(NVIDIA GeForce Titan Xp)
○ Ubuntu 16.04, TensorFlow 1.8.0, CUDA 9.0
○ Horovod 0.12.1, NCCL v2.1, OpenMPI v3.0.065
Executors: TensorFlow (TF) + TF Eager (modified)LoC: 4700 LoC / TF diff 771 LoC / CPython diff 1096 LoC
Evaluation Setup: Applications
11 models in 5 categories using various dynamic characteristics of Python
● Convolutional Neural Networks (CNN) LeNet, ResNet-50, Inception-v3
● Recurrent Neural Networks (RNN) LSTM, LM
● Recursive Neural Networks (TreeNN) TreeRNN, TreeLSTM
● Deep Reinforcement Learning (DRL) A3C, PPO
● Generative Adversarial Networks (GAN) AN, PIX2PIX
66
ImageNet Test Error with ResNet50
67
ImperativeSymbolic
Faster
Time
Test Error (%)
36 GPUs
ImageNet Test Error with ResNet50
68
ImperativeSymbolic
Time
Test Error (%)
JANUS
Faster
36 GPUs
ImageNet Test Error with ResNet50
69
ImperativeSymbolic
Time
Test Error (%)
JANUS
Faster
36 GPUs
3.4x Faster Convergence
ImageNet Test Error with ResNet50
70
ImperativeSymbolic
Time
Test Error (%)
JANUS
Faster
36 GPUs
3.4x Faster Convergence
Overlapping computationand communication
Model Convergence
71 Faster
RNN TreeNN
DRL GAN
ImperativeSymbolic
6 GPUs CPU
4 GPUs 1 GPU
Model Convergence
72
3.1x
Faster
18.4x
2.6x3.2x
ImperativeSymbolic
JANUSRNN TreeNN
DRL GAN
6 GPUs CPU
4 GPUs 1 GPU
CNN
GAN
DRL
TreeNN
RNN
Normalized Training Throughput
73
LeNet
ResNet-50
Inception-v3
LSTM
LM
TreeRNN
TreeLSTM
A3C
PPO
AN
PIX2PIX
Single machine w/ Single GPU
Imp.
Symbolic
47.6x over Imperative
96.0% ofSymbolic
Imperative JANUS
CNN
GAN
DRL
TreeNN
RNN
Normalized Training Throughput
74
LeNet
ResNet-50
Inception-v3
LSTM
LM
TreeRNN
TreeLSTM
A3C
PPO
AN
PIX2PIX
Imp.
SymbolicImperative JANUS
}Hand-optimized GPU ops
dominated execution time;
Working on applying further graph optimizations!
Single machine w/ Single GPU
CNN
GAN
DRL
TreeNN
RNN
JANUS Speedup over Imperative Execution
75
LeNet
ResNet-50
Inception-v3
LSTM
LM
TreeRNN
TreeLSTM
A3C
PPO
AN
PIX2PIX
Imp.
JANUSSingle machine w/ Single GPU
CNN
GAN
DRL
TreeNN
RNN
JANUS Speedup over Imperative Execution: Breakdown
76
LeNet
ResNet-50
Inception-v3
LSTM
LM
TreeRNN
TreeLSTM
A3C
PPO
AN
PIX2PIX
Imp.
Single machine w/ Single GPUSpeedup
“without” specializationby runtime profiling
CNN
GAN
DRL
TreeNN
RNN
JANUS Speedup over Imperative Execution: Breakdown
77
LeNet
ResNet-50
Inception-v3
LSTM
LM
TreeRNN
TreeLSTM
A3C
PPO
AN
PIX2PIX
Imp.
Speedup“with” specializationby runtime profiling
Single machine w/ Single GPU
CNN
GAN
DRL
TreeNN
RNN
JANUS Speedup over Imperative Execution: Breakdown
78
LeNet
ResNet-50
Inception-v3
LSTM
LM
TreeRNN
TreeLSTM
A3C
PPO
AN
PIX2PIX
Imp.
BypassingPython Heap
Control Flow Unrolling
Type Specialization
Composed of CNNs
Single machine w/ Single GPU
Related Works
79
● Imperative to symbolic: one-shot converters
○ TensorFlow: defun, AutoGraph, Swift for TensorFlow, JAX, ...○ PyTorch JIT trace, script○ MXNet Gluon
● Cannot handle the dynamic semantics of Python correctly & efficiently
Conclusion
● Programmability and debuggability of imperative DL frameworkswith the performance of symbolic DL frameworks
● Speculative graph generation and execution with runtime profiling
● Up to 47.6x speedup over imperative DL framework,within up to 4% difference compared to symbolic DL framework,while transparently and correctly executing imperative DL programs
80