Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML,...
Transcript of Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML,...
![Page 1: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/1.jpg)
Relay: a high level differentiable IRJared Roesch
TVMConf December 12th, 2018
!1
![Page 2: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/2.jpg)
!2
This represents months of joint work with lots of great folks:
![Page 3: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/3.jpg)
TVM Stack
Optimization
AutoTVM
AutoVTA
High-Level Differentiable IR
Tensor Expression IR
LLVM, CUDA, Metal VTA
Edge FPGA
Cloud FPGA
ASIC Hardware Fleet
!3
Relay
![Page 4: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/4.jpg)
How do we represent deep learning?
• Build parametric functions which approximate impossible or hard to program functions.
• In order to perform deep learning we need:
• To represent computation
• To differentiate
• To optimize
!4
![Page 5: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/5.jpg)
!5
Existing Approach
Computation Graph
Tensor Expression IR
LLVM, CUDA, Metal VTA
Edge FPGA
Cloud FPGA
ASIC
!5
LSTM Training LoopResnet, DCGAN
![Page 6: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/6.jpg)
!6
Existing Approach
High-Level Differentiable IR
Tensor Expression IR
LLVM, CUDA, Metal VTA
Edge FPGA
Cloud FPGA
ASIC
!6
LSTM Training LoopResnet, DCGAN
![Page 7: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/7.jpg)
for i in range(…): inp, hs = …
out, nhs = RNNCell(inp, hs)
Python
Relay
for i in range(…): input, hs = …
out, nhs = RNNCell(inp, hs)
![Page 8: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/8.jpg)
Challenges
• How do we represent control-flow, functional abstraction, and recursion?
• How do we represent and optimize training?
• How do we perform end-to-end whole model optimization?
!8
![Page 9: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/9.jpg)
Relay
• Relay is the high level IR of the TVM stack.
• Generalize computation graphs to differentiable programs.
• Enables whole-program optimization for deep learning.
• Composed of new IR, auto-diff, optimizer, and backends.
• Relay is open source.
!9
![Page 10: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/10.jpg)
Initial Results
• Relay shows promising initial results when evaluated in inference tasks:
• We are able fully optimize models such as generative RNNs, outperforming PyTorch by up to 3x on model inference.
• We demonstrate performance comparable to NNVM and outperform TensorFlow and TensorFlow Lite.
• We show that Relay can be executed on FPGAs, resulting in up to an 11x performance improvement over baseline.
!10
![Page 11: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/11.jpg)
Text Format
AST Optimizer
CompiledOperators
OperatorLanguage
On-disk representat
ion
Model Importer
DSL
Ahead of time
compiler
Reference Interpreter
Graph Runtime
GPU
CPU
FPGA
Frontend Compiler Execution
!11
![Page 12: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/12.jpg)
IR
• A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning.
• Features closures, reference, ADTs, and primitive operators, tensors are the primary value type.
• We can use this to represent full-models including a generative RNN and training loops.
• Functional style makes it possible to analyze and transform as pure data-flow.
!12
![Page 13: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/13.jpg)
RNN
x0
…
oN
sns1 s2 sn + 1
o1 o2
x1 xN
s0
!13
![Page 14: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/14.jpg)
!14
def @generate(n, i, h, …): if (n == 0) [] else let (output, new_hidden) = @rnn_cell(i, h, …); output + @generate( n - 1, output, new_hidden, …)
Parameters
Loop Counter
Functional style loop
![Page 15: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/15.jpg)
Typing
• Typing these programs introduces a few challenges:
• Need static Tensor shape information to match accelerator primitives, optimize aggressively, and provide better errors.
• Provide flexible typing for operators which contain shape input and output relationships such as broadcast, flatten, concat, squeeze, and more.
!15
![Page 16: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/16.jpg)
!16
Tensor<f32, (32, 3, 32, 32)>
Tensor : (BaseType, Shape) -> Type
Float : (Width: Int, Lanes: Int) -> BaseType
f32 = Float<32, 1>
4-d Tensor N * Channels * Height * Width
![Page 17: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/17.jpg)
Type Relation
• Operators, the primitive building block of machine learning, are hard to type check (e.g. preconditions must hold over input tensors).
• A call can contain a series of relations which must hold over the input types.
• Enables very flexible typing of operators.
• For example can implement variable arguments using relations (concat) and input/output relationships (broadcast).
!17
![Page 18: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/18.jpg)
add : forall (Lhs: Type, Rhs: Type, Out: Type),(Lhs, Rhs) -> Out
where Broadcast(Lhs, Rhs, Out)
Broadcast(Tensor<f32, (3, 4, 5)>, Tensor<f32 (n, 3, 4, 5), Tensor<f32, (n, 3, 4, 5)>)
Broadcast(Tensor<f32, (1, 5)>, Tensor<f32, (n, 5)>, Tensor<f32, (n, 5)>)
For example we can type broadcasting addition:
Broadcasting is a tricky rule often employed in machine learning:
!18
![Page 19: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/19.jpg)
!19
concat : forall (Args: Type, Out: Type),(Args) -> Out
where IsTuple(Args), Concat(Args, Out)
Or more complex constraints such as:
![Page 20: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/20.jpg)
Optimizations• We implement various optimizations over these programs including:
• Standard Optimizations
• Fusion
• Constant Propagation
• Accelerator Specific Optimizations
• Quantization (see Ziheng’s talk)
• FoldScaleAxis
• Data Packing
!20
![Page 21: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/21.jpg)
Backends
Graph Runtime
Interpreter
AoT Compiler
FPGA
GPU
CPU
Relay
!21
![Page 22: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/22.jpg)
Backends
• We implemented multiple execution backends to demonstrate the versatility of Relay as an IR.
• Each backend builds on TVM’s existing low level Tensor IR (HalideIR).
• TVM is used for operators, but the rest of the program must be executed (e.g. allocation, control-flow, recursion).
!22
![Page 23: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/23.jpg)
Operator Compilation
!23
TVM operators.so
def @my_func(…) {…
}
![Page 24: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/24.jpg)
Graph Runtime
• TVM’s existing execution pipeline, can execute a subset of Relay programs.
• Requires a graph, a shared library containing operators, and parameters
GraphRTS
+ operators.so
!24
![Page 25: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/25.jpg)
Interpreter
• A reference interpreter for Relay.
• Implements the reference semantics.
• Uses naive recursive AST traversal for interpreting control flow.
• Uses JIT compilation for operators.
!25
![Page 26: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/26.jpg)
AoT Compiler
• A case study of what Relay IR affords, we built prototype compiler in less than 3 weeks.
• Generates code for CPU/GPU, FPGA support in the future.
• Removes interpretation overhead and enables optimization.
• Written as a pure Python library and uses Relay as dependency.
!26
![Page 27: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/27.jpg)
Ahead of time compiler
def @my_func(…) {…
}Standard Optimize
AoT Optimize LittleCpp Clang
librelay_aot_my_func.so
!27
f = compile(my_func)f(…)
![Page 28: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/28.jpg)
VTA
• VTA is a target for Relay.
• We can compile high level models written in Frameworks such as MxNet directly to Relay.
• Generic compilation to VTA will be upstreamed soon after the conference.
!28
![Page 29: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/29.jpg)
VTA
• VTA is a target for Relay.
• We can compile high level models written in Frameworks such as MxNet directly to Relay.
• Generic compilation to VTA will be upstreamed soon after the conference.
DRAM
LOADMODULE
INPUT BUFFER
WEIGHT BUFFER
STORE BUFFER
MICRO-OP BUFFER
REGISTER FILE
Tensor Core
Vector ALU
LD→CMP Q
CMP→LD Q
CMP→ST Q
ST→CMP Q
COMPUTE MODULE
STOREMODULE
LOAD CMD Q
COMPUTE CMD Q
STORE CMD Q
INSTRUCTION FETCH MODULE
!28
![Page 30: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/30.jpg)
Evaluation• Relay supports expressive models:
• We demonstrate Relay’s ability to optimize full models such as generative RNNs, beating PyTorch by up to 3x.
• Relay provides competitive performance:
• We demonstrate better than TensorFlow and on par performance with NNVM on a suite of models.
• Relay supports customized hardware:
• We show how Relay and TVM can be used to execute on FPGA based accelerators, bring 11x performance improvement over baseline.
!29
![Page 31: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/31.jpg)
!30
Relay-Interpreted RNNRelay-Interpreted Cell
Relay-Compiled CellRelay-Compiled RNN
PyTorch
![Page 32: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/32.jpg)
Relay
Relay
CNN Results
Relay
!31
![Page 33: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/33.jpg)
VTA Results
!32
![Page 34: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/34.jpg)
Future Work
• Evaluating Relay on training tasks.
• AutoRelay: applying ideas from AutoTVM to Relay.
• A high-level full differentiable programming language frontend (i.e Python frontend, Haskell DSL).
• Novel analyses and optimizations for DL (e.g automatic differential privacy).
• Non-standard data types (e.g unums, posits).
!33
![Page 35: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/35.jpg)
Lessons Learned
• Using a full program representation we were able to:
• Rephrase shape inference as type checking.
• Use Relay as platform to develop novel optimizations such as automatic quantization.
• Execute Relay programs via a variety of backends and hardware devices.
• Demonstrate an increase in expressiveness does not come at the cost of performance.
!34
![Page 36: Relay: a high level differentiable IR · • A functional IR, an ML-like (ReasonML, OCaml, SML, …) language tailored to machine learning. • Features closures, reference, ADTs,](https://reader035.fdocuments.in/reader035/viewer/2022070713/5ed003f1f1ee1431c204c142/html5/thumbnails/36.jpg)
Conclusion
• Relay is a new intermediate representation for optimizing deep learning programs.
• We apply the straightforward insight that machine learning models are just programs.
• This generalization enables support for a greater range of programs, new optimizations, and the ability to target a wide range of devices.
• Excited about production and research collaborations.
http://sampl.cs.washington.edu
http://tvm.ai
!35