
JANUS: Fast and Flexible Deep Learning via Symbolic Graph Execution of Imperative Programs

Eunji Jeong, Sungwoo Cho, Gyeong-In Yu, Joo Seong Jeong, Dong-Jin Shin, Byung-Gon Chun


Deep Learning (DL) Frameworks

Define → Execute

[Figure: examples of DL workloads — CNNs, RNNs/LSTMs, GANs, reinforcement learning, dynamic neural networks]

Two Paradigms

Symbolic DL Frameworks
✓ Build a Symbolic Graph
✓ Execute the Graph

def build_graph(g):
    x = g.input(float)
    linear = g.add(g.mul(W, x), b)

build_graph(graph)
run_graph(graph, x_data)

[Graph: x and W feed Mul; Mul and b feed Add]

Imperative DL Frameworks
✓ Directly Execute the Computations

def linear(x):
    return W * x + b

linear(x_data)
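To make the contrast concrete, here is a minimal runnable sketch of the same linear computation in both styles. This is our illustration, not the deck's code: it assumes TensorFlow 1.x-style APIs via tf.compat.v1, with the deck's g.input/g.mul/g.add pseudo-API mapped onto placeholders and ops.

# Minimal sketch of the two paradigms (assumes TensorFlow, via tf.compat.v1).
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

W, b = 3.0, 2.0

# Symbolic: build a graph of ops first...
graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32)            # g.input(float)
    linear = tf.add(tf.multiply(W, x), b)     # g.add(g.mul(W, x), b)

# ...then execute it with concrete data.
with tf.Session(graph=graph) as sess:
    print(sess.run(linear, feed_dict={x: 4.0}))   # 14.0

# Imperative: directly execute the computation.
def linear_fn(x):
    return W * x + b

print(linear_fn(4.0))                         # 14.0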


Pros & Cons: Performance vs. Programmability

Symbolic DL Frameworks
Pros:
+ Easy to Optimize
+ Compiler Optimization
+ Parallel Execution of Operations
+ Deploy on GPU, Cluster, Mobile, ...
Cons:
- Decoupled View: Hard to Program & Debug

Imperative DL Frameworks
Pros:
+ Direct Execution: Easy to Program & Debug
Cons:
- Hard to Optimize


What People Want Is...

Both at once: Programmability and Performance.

JANUS: Combining the Best of Both Worlds

Imperative DL Program ("Easy Programmability"):

def foo(x):
    prod = mul(3, x)
    return add(prod, 2)

    ↓ Transparent Conversion

Symbolic DL Graph ("High Performance"):
[Graph: x and 3 feed Mul; Mul and 2 feed Add]

● 11 models in 5 major neural network categories:
  ○ Convolutional Neural Networks (CNN): LeNet, ResNet-50, Inception-v3
  ○ Recurrent Neural Networks (RNN): LSTM, LM
  ○ Recursive Neural Networks (TreeNN): TreeRNN, TreeLSTM
  ○ Generative Adversarial Networks (GAN): GAN, PIX2PIX
  ○ Deep Reinforcement Learning (DRL): A3C, PPO

● Up to 47.6x speedup compared to the imperative DL framework, and comparable performance (within 4%) to the symbolic DL framework, with unmodified imperative DL programs

Outline

● Approach

● Challenges

● JANUS

● Evaluation


Challenges in Graph Conversion

Imperative Python DL Program (Python is the de-facto standard language for DL programming):

def foo(x):
    tmp = mul(3, x)
    return add(tmp, 2)

    ↓ Transparent Conversion?

Symbolic DL Graph:
[Graph: x and 3 feed Mul; Mul and 2 feed Add]


Discrepancy between Python Programs and DL Graphs

"Dynamic": Imperative Python DL Program (the source provides no static info)

def foo(x):
    tmp = mul(3, x)
    return add(tmp, 2)

Characteristics:
● determined at runtime
● change at runtime

"Static": Symbolic DL Graph (the destination needs that info)

[Graph: every value carries a declared type and shape — x: INT, 10x1; Mul: INT, 10x1; Add: INT, 10x1; constants 3 and 2: INT]

Characteristics:
● must be given when building a graph
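A small illustration of the "dynamic" side (plain Python plus NumPy; our example, not the deck's): the same source runs on values whose type and shape exist only at runtime, which is exactly the information a static graph must be told up front.

# The same Python function handles an int, a float, and a 10x1 array;
# nothing in the source pins down a type or shape.
import numpy as np

def foo(x):
    tmp = 3 * x
    return tmp + 2

print(foo(4))                  # int -> 14
print(foo(4.0))                # float -> 14.0
print(foo(np.ones((10, 1))))   # 10x1 array -> elementwise 5.0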

Example: Recurrent Neural Network (RNN)

class RNNModel(object):
    def __call__(self, sequence):
        state = self.state
        outputs = []
        for item in sequence:
            state = rnn_cell(state, item)
            outputs += [state]
        self.state = state
        return compute_loss(outputs)

for sequence in sequences:
    optimize(lambda: model(sequence))

This program uses three dynamic features of Python:
√ Dynamic Control Flow
√ Dynamic Types
√ Impure Functions

How do they affect the correctness & performance of graph execution?

RNN Example: Dynamic Control Flow

[Figure: running the loop on seq[0] = "They saw dogs" unrolls it into RNNCell → RNNCell → RNNCell, threading state through the cells and emitting out[0], out[1], out[2]; the next input, seq[1] = "Was she sick?", is processed the same way. The number of iterations depends on the runtime sequence length.]

Converting the loop to a graph gives two options (see the sketch below):

● Control-flow ops (state → Switch → Cell → Merge, guarded by i < N with a Next iterator): Correct, but Slow.

● Unrolling (Cell → Cell → Cell): Fast, but Incorrect for sequences of other lengths, and the length info is needed at graph-build time.
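To make the trade-off concrete, here is a hedged sketch of the two conversions of a summing loop (TensorFlow 1.x APIs via tf.compat.v1; our illustration, not the deck's code).

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

seq = tf.placeholder(tf.float32, shape=[None])    # length unknown at build time

# Option 1: symbolic loop (Switch/Merge generated under the hood).
# Correct for any length, but pays per-iteration control-flow overhead.
def body(i, state):
    return i + 1, state + seq[i]

_, looped = tf.while_loop(lambda i, s: i < tf.size(seq), body,
                          [tf.constant(0), tf.constant(0.0)])

# Option 2: unrolled for an assumed length of 3.
# Fast, but only valid while the assumption holds.
fixed = tf.placeholder(tf.float32, shape=[3])
unrolled = fixed[0] + fixed[1] + fixed[2]

with tf.Session() as sess:
    print(sess.run(looped, {seq: [1., 2., 3., 4.]}))   # 10.0, any length works
    print(sess.run(unrolled, {fixed: [1., 2., 3.]}))   # 6.0, length 3 only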

RNN Example: Dynamic Types

Python values carry no static type or shape, so graph inputs must be declared as placeholders. Again there are two options:

● Fully unknown placeholders (PlaceHolder type: int, shape: ?; PlaceHolder type: float, shape: ?; ...): Correct, but Inefficient.

● Specialized placeholders (PlaceHolder type: int, shape: 3x128): Fast, but Incorrect for other types and shapes, and the type/shape info is needed at graph-build time.

RNN Example: Impure Function

The model reads self.state before the loop and writes it back afterward, so the hidden state is carried across sequences through the Python heap.

[Figure: the RNNCell chain for seq[0] = "They saw dogs" hands its final state to the RNNCell chain for seq[1] = "Was she sick?"; a converted graph must preserve this side effect.]

A toy version of this impurity follows.
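This toy stand-in (ours, not the deck's rnn_cell) shows why the heap write matters: calling the model twice must thread state through self.state, so a conversion that drops that write silently changes results.

# Toy model mirroring RNNModel's structure: the heap write to self.state
# makes __call__ impure, and the second call's result depends on it.
class ToyModel(object):
    def __init__(self):
        self.state = 0

    def __call__(self, sequence):
        state = self.state
        for item in sequence:
            state = state + item    # stands in for rnn_cell(state, item)
        self.state = state          # the impure part: a Python heap write
        return state

model = ToyModel()
print(model([1, 2, 3]))   # 6
print(model([4]))         # 10, not 4: it reads the state written above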

Challenge Summary

Challenge: achieving Correctness & Performance at the same time.

An Imperative Python DL Program with Dynamic Features converts to either a Correct Graph (Slow) or a Fast Graph (Incorrect); what we want is a Correct & Fast Symbolic DL Graph.

Outline

● Approach

● Challenges

● JANUS

● Evaluation


Solution: Speculative Graph Generation and Execution

● Goal: Correctness & Performance

● [Performance] Speculatively specialize the graph
  ○ Make reasonable assumptions based on the execution history (Profiling)
  ○ Run the specialized graph (Common Case)

● [Correctness] Validate assumptions
  ○ Fall back if an assumption is broken (Rare Case); see the sketch below
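A minimal, runnable sketch of this idea (all names here are illustrative; JANUS's real implementation lives inside TensorFlow and CPython): profile the first run, specialize on the observed value, guard the fast path with an assertion, and fall back to imperative execution when the assumption breaks.

class AssumptionFailure(Exception):
    pass

class SpecializedGraph(object):
    """Stands in for a symbolic graph unrolled for one profiled length."""
    def __init__(self, cell, length):
        self.cell, self.length = cell, length

    def execute(self, state, sequence):
        if len(sequence) != self.length:     # the Assert op: validate assumption
            raise AssumptionFailure()
        for item in sequence:                # unrolled in a real graph
            state = self.cell(state, item)
        return state

def run(cell, state, sequence, cache):
    graph = cache.get(cell)
    if graph is None:                        # first run: profile & specialize
        graph = SpecializedGraph(cell, len(sequence))
        cache[cell] = graph                  # the Graph Cache
    try:
        return graph.execute(state, sequence)    # fast path (common case)
    except AssumptionFailure:                # rare case: fall back, stay correct
        for item in sequence:
            state = cell(state, item)        # imperative execution
        return state

cache = {}
add = lambda s, x: s + x
print(run(add, 0, [1, 2, 3], cache))  # specializes for len 3 -> 6
print(run(add, 0, [7], cache))        # assumption broken -> fallback -> 7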

Overall Workflow on JANUS

Imperative DL Program:

for item in sequence:
    state = rnn(state, item)
    outputs += [state]

The program first runs on the Imperative Executor: pre-defined DL operations on a Python interpreter, modified for transparent profiling. The Profiler records runtime values (e.g., len: 3).

The Graph Generator then optimizes a Symbolic DL Graph with the profile information, using a standard compiler pass (reaching definition analysis, type inference, constant propagation, ...): the loop is specialized into Cell → Cell → Cell, and an Assert (len == 3?) validates the assumption for correctness. Generated graphs are stored in a Graph Cache.

Fast Path (Common Case): the Symbolic Graph Executor runs the specialized graph.

Correct Path (Rare Case): on an Assumption Failure, execution falls back to the Imperative Executor, which continues profiling. If the profile does not pin a value down (len: ?), the Graph Generator can instead emit an unspecialized graph with control-flow ops (state → Switch → Cell → Merge, i < N, Next) for the Symbolic Graph Executor.

Additional System Aspects

● Python Coverage: imperative execution covers all of Python. Program parts that cannot be converted to a graph simply keep running on the Python interpreter.

● Global State Consistency: an impure imperative DL program

def foo(obj):
    obj.data = value
    if pred:
        do_sth()

converts to an impure symbolic DL graph: a SetAttr op writing attribute "data" of object 0xb84c, and an Assert on pred guarding do_sth. It is unsafe to fall back after an Assumption Failure once the Python heap has been updated, so JANUS applies such writes to a local copy of the Python heap and writes them back only after validating the assumptions (a toy illustration follows).

See our paper for more details!
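A toy illustration of the write-back rule (our helper names, not JANUS's API): side effects land in a local copy of the object's state, and only a successful validation publishes them to the real Python heap.

# Heap writes go to a local copy; the copy is published only after the
# guarding assumption validates, so a fallback never sees partial updates.
import copy

class AssumptionFailure(Exception):
    pass

def foo_with_writeback(obj, pred, value):
    local = copy.copy(obj.__dict__)     # local copy of the Python heap slice
    local["data"] = value               # the SetAttr lands in the copy
    if not pred:                        # the Assert: validate the assumption
        raise AssumptionFailure()       # safe to fall back: real heap untouched
    obj.__dict__.update(local)          # write-back after validation

class Obj(object):
    pass

o = Obj()
foo_with_writeback(o, True, 42)
print(o.data)                           # 42
try:
    foo_with_writeback(o, False, 99)
except AssumptionFailure:
    print(o.data)                       # still 42: the failed run left no trace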

Outline

● Approach

● Challenges

● JANUS

● Evaluation

√ Correctness and Performance

√ Breakdown of Performance Improvement


Evaluation Setup: Frameworks & Environments

● Frameworks

  ○ JANUS: implemented on top of TensorFlow and CPython
    (executors: TensorFlow (TF) + TF Eager, modified;
     LoC: 4700, with a 771 LoC TF diff and a 1096 LoC CPython diff)

  ○ Symbolic: TensorFlow

  ○ Imperative: TensorFlow Eager

● Hardware & Software Setup

  ○ 6 machines connected via Mellanox ConnectX-4 cards w/ 100Gbps InfiniBand

  ○ Each machine w/ 2x Intel Xeon E5-2695 + 6x NVIDIA GeForce Titan Xp

  ○ Ubuntu 16.04, TensorFlow 1.8.0, CUDA 9.0

  ○ Horovod 0.12.1, NCCL v2.1, OpenMPI v3.0.0

Evaluation Setup: Applications

11 models in 5 categories, exercising various dynamic characteristics of Python:

● Convolutional Neural Networks (CNN): LeNet, ResNet-50, Inception-v3

● Recurrent Neural Networks (RNN): LSTM, LM

● Recursive Neural Networks (TreeNN): TreeRNN, TreeLSTM

● Deep Reinforcement Learning (DRL): A3C, PPO

● Generative Adversarial Networks (GAN): GAN, PIX2PIX

ImageNet Test Error with ResNet-50

[Chart: test error (%) over time on 36 GPUs, for Imperative, Symbolic, and JANUS. JANUS converges 3.4x faster than Imperative; the gain comes from overlapping computation and communication.]

Model Convergence

[Charts: convergence over time for RNN (6 GPUs), TreeNN (CPU), DRL (4 GPUs), and GAN (1 GPU), comparing Imperative, Symbolic, and JANUS. JANUS converges faster than Imperative: 3.1x (RNN), 18.4x (TreeNN), 2.6x (DRL), and 3.2x (GAN).]

Normalized Training Throughput

[Chart: single machine w/ single GPU; training throughput of Imperative, JANUS, and Symbolic, normalized per model: CNN (LeNet, ResNet-50, Inception-v3), RNN (LSTM, LM), TreeNN (TreeRNN, TreeLSTM), DRL (A3C, PPO), GAN (GAN, PIX2PIX). JANUS achieves up to 47.6x the throughput of Imperative and 96.0% of Symbolic.]

For the CNN models, hand-optimized GPU ops dominated execution time; we are working on applying further graph optimizations!

JANUS Speedup over Imperative Execution: Breakdown

[Charts: single machine w/ single GPU; per-model speedup of JANUS over Imperative (LeNet, ResNet-50, Inception-v3, LSTM, LM, TreeRNN, TreeLSTM, A3C, PPO, GAN, PIX2PIX), shown both "without" and "with" specialization by runtime profiling. The annotations attribute the gains to bypassing the Python heap, control flow unrolling, and type specialization; the DRL models are composed of CNNs.]

Related Works

● Imperative to symbolic: one-shot converters

  ○ TensorFlow: defun, AutoGraph, Swift for TensorFlow, JAX, ...
  ○ PyTorch: JIT trace, script
  ○ MXNet: Gluon

● These one-shot converters cannot handle the dynamic semantics of Python correctly & efficiently, as the example below illustrates.
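For instance (TensorFlow 2.x's tf.function, a descendant of defun; our example, not the deck's): one-shot tracing captures Python side effects only at trace time, so an impure statement silently stops re-executing.

# tf.function traces the Python body once per input signature; the impure
# list append below runs during tracing, not on every call.
import tensorflow as tf

calls = []

@tf.function
def f(x):
    calls.append(1)        # Python side effect: captured only at trace time
    return x * 3 + 2

f(tf.constant(1.0))
f(tf.constant(2.0))        # same signature: reuses the traced graph
print(len(calls))          # 1, not 2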

Conclusion

● Programmability and debuggability of imperative DL frameworks, with the performance of symbolic DL frameworks

● Speculative graph generation and execution with runtime profiling

● Up to 47.6x speedup over the imperative DL framework, within 4% of the symbolic DL framework, while transparently and correctly executing unmodified imperative DL programs