▣ Naums Mogers
Christophe Dubach
ARM Research Summit, September 2019
Functional Interface for Performance Portability on Parallel Accelerators
Hardware accelerators
▣ Architectures: CPU, GPU, CGRA, FPGA, ASIC
▣ Scopes: Server, Desktop, Mobile, Embedded
▣ Applications: ML, Security, Vision, DB, Graph
▣ Requirements: Energy, Memory
Current landscape
▣ Applications
▣ XLA → TPU
▣ BrainWave ISA → BrainWave
▣ ArmNN → ARM ML
▣ MDK → Movidius
▣ HiAI → HiSilicon NPU
▣ OpenCL → GPUs
▣ OpenMP → Multicore CPUs
▣ VHDL → FPGAs
What we need
▣ Applications
▣ Domain-Specific Languages (DSLs): Image Processing, Neural Networks, Graph Analytics
▣ Language for Parallelism: Data-Parallel Intermediate Language
▣ Compiler Technology: Performance Portable Code Generator
▣ XLA → TPU
▣ BrainWave ISA → BrainWave
▣ ArmNN → ARM ML
▣ MDK → Movidius
▣ HiAI → HiSilicon NPU
▣ OpenCL → GPUs
▣ OpenMP → Multicore CPUs
▣ VHDL → FPGAs
What is the right interface for HW accelerators?
Functional approach
Abstract
▣ Expresses algorithm (WHAT), not implementation (HOW)
Safe
▣ Easier to use and parallelise
High-level
▣ Captures plenty of algorithmic meta-info for analysis
Expressive
▣ Control flow
▣ Memory management
Pure
▣ Easy to transform
Composable
▣ Easier to maintain and to reuse code
Lift
▣ Applications
▣ Domain-Specific Languages (DSLs): Image Processing, Neural Networks, Graph Analytics
▣ Language for Parallelism (Lift): Data-Parallel Intermediate Language
▣ Compiler Technology (Lift): Rewrite rule-based compiler to HW-specific languages
▣ XLA → TPU
▣ BrainWave ISA → BrainWave
▣ ArmNN → ARM ML
▣ MDK → Movidius
▣ HiAI → HiSilicon NPU
▣ OpenCL → GPUs
▣ OpenMP → Multicore CPUs
▣ VHDL → FPGAs
Lift www.lift-project.org
▣ Functional data-parallel language and compiler
Lift IR
▣ Algorithmic patterns: Map, Reduce, Zip, Split, Scatter, Gather, Slide
▣ Data types: Int, Float, Vector, Array
▣ Address space operators: toGlobal, toLocal, toPrivate
▣ Casters: toVector, toScalar
▣ Data operators: add, mul, dot, tanh
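To give a feel for how the algorithmic patterns on this slide compose, here is a minimal Python sketch. It is not Lift code and does not use the Lift API: the function names (`map_`, `zip_`, `reduce_`, `slide`, `dot`, `window_sums`) are stand-ins chosen for this illustration, and the windowing semantics of `slide` are an assumption (overlapping windows of a given size and step).

```python
from functools import reduce
from operator import add, mul
from typing import Callable, List, Sequence, Tuple


def map_(f: Callable, xs: Sequence) -> List:
    """Map: apply f to every element."""
    return [f(x) for x in xs]


def zip_(xs: Sequence, ys: Sequence) -> List[Tuple]:
    """Zip: pair up elements of two equally long arrays."""
    if len(xs) != len(ys):
        raise ValueError("zip_ expects arrays of equal length")
    return list(zip(xs, ys))


def reduce_(f: Callable, init, xs: Sequence):
    """Reduce: fold the array with a binary operator and an initial value."""
    return reduce(f, xs, init)


def slide(size: int, step: int, xs: Sequence) -> List[Sequence]:
    """Slide (assumed semantics): overlapping windows of `size`, moved by `step`."""
    return [xs[i:i + size] for i in range(0, len(xs) - size + 1, step)]


def dot(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Dot product written as Zip, then Map(mul), then Reduce(add)."""
    return reduce_(add, 0, map_(lambda pair: mul(*pair), zip_(xs, ys)))


def window_sums(xs: Sequence[float]) -> List[float]:
    """A 3-point stencil written as Slide, then Map(Reduce(add))."""
    return map_(lambda window: reduce_(add, 0, window), slide(3, 1, xs))


dot_result = dot([1, 2, 3], [4, 5, 6])          # 1*4 + 2*5 + 3*6 = 32
stencil_result = window_sums([1, 2, 3, 4, 5])   # [6, 9, 12]
```

Both functions say only what is computed; nothing in them fixes how the work is split across threads or memory levels, which is the property the functional approach relies on.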
Lift IR: Views
Reorder(stride(s)) >> Map(f)
▣ Virtual composable data layout transformations
  □ Reorder, Transpose, Slide, Slice, etc.
▣ Expressed with Views
▣ Help avoid extra memory writes
NO WRITES TO MEMORY
TRANSFORMED READS
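The idea behind Reorder(stride(s)) >> Map(f) can be illustrated with a small Python sketch. This is not Lift's implementation: `StrideView` and `map_view` are hypothetical names, and the index formula is an assumed definition of a strided reordering, used here only for illustration. The view holds nothing but an index function, so the reordering itself writes nothing to memory; the consumer reads through transformed indices.

```python
from typing import Callable, List, Sequence


class StrideView:
    """Read-only reordered view over `data`; no element is ever copied."""

    def __init__(self, data: Sequence[float], stride: int) -> None:
        if stride <= 0 or len(data) % stride != 0:
            raise ValueError("stride must be positive and divide the length")
        self.data = data
        self.stride = stride
        self.rows = len(data) // stride  # n in the assumed index formula

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, i: int) -> float:
        # Transformed read: logical index i is mapped to a physical index.
        # Assumed formula: i -> i / n + s * (i mod n), with n = length / s.
        return self.data[i // self.rows + self.stride * (i % self.rows)]


def map_view(f: Callable[[float], float], view: StrideView) -> List[float]:
    """Map(f) consuming the view; the only writes are to the result."""
    return [f(view[i]) for i in range(len(view))]


xs = [0, 1, 2, 3, 4, 5, 6, 7]
view = StrideView(xs, 2)
result = map_view(lambda x: x * 10, view)  # [0, 20, 40, 60, 10, 30, 50, 70]
```

Because views are plain index functions, several of them could be stacked (for example a transpose over a slice) and still resolve to one index computation at the point of the read.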
Lift IR
IR level: Generic
▣ Algorithmic patterns: Map, Reduce, Zip, Split, Scatter, Gather, Slide
▣ Data types: Int, Float, Vector, Array
▣ Address space operators: toGlobal, toLocal, toPrivate
▣ Casters: toVector, toScalar
▣ Data operators: add, mul, dot, tanh
IR level: DSL
▣ conv, lstm, blur, sharpen, select
▣ Data types: Vector, Matrix, Tensor
IR level: Platform-specific
▣ Algorithmic patterns: MapGlobal, MapLocal, ReduceSeq
▣ Data types: Int8, Float16, Float32
▣ Address space operators: toDRAM, toSRAM, toRegister
▣ Casters: toInt8, toFloat16
▣ Data operators: VVMul, MVMul, VTanh
How do we achieve performance portability?
Lift: Rewrite Rules
▣ Express algorithmic implementation choices
▣ Preserve semantic correctness
▣ Leverage algorithmic info
▣ Decouple optimisation from code generation
Examples shown: Split-join rule, Map fusion rule, GEMV rule
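Two of the rules named on this slide can be checked on plain lists. The sketch below is Python, not Lift, and the helper names (`map_`, `split`, `join`, `split_join`, `fused`) are stand-ins for this illustration. It assumes the usual statement of the rules: the split-join rule rewrites Map(f) into Join after Map(Map(f)) after Split(n), and the map fusion rule rewrites Map(f) after Map(g) into a single Map of the composed function.

```python
from typing import Callable, List, Sequence


def map_(f: Callable, xs: Sequence) -> List:
    return [f(x) for x in xs]


def split(n: int, xs: Sequence) -> List[Sequence]:
    """Split an array into chunks of n elements (n must divide the length)."""
    if n <= 0 or len(xs) % n != 0:
        raise ValueError("chunk size must be positive and divide the length")
    return [xs[i:i + n] for i in range(0, len(xs), n)]


def join(xss: Sequence[Sequence]) -> List:
    """Flatten one level of nesting; the inverse of split."""
    return [x for xs in xss for x in xs]


def split_join(f: Callable, n: int, xs: Sequence) -> List:
    """Right-hand side of the split-join rule: Join . Map(Map(f)) . Split(n)."""
    return join(map_(lambda chunk: map_(f, chunk), split(n, xs)))


def fused(f: Callable, g: Callable, xs: Sequence) -> List:
    """Right-hand side of the map fusion rule: Map(f . g), one traversal."""
    return map_(lambda x: f(g(x)), xs)


xs = list(range(8))
inc = lambda x: x + 1
dbl = lambda x: x * 2

# Both sides of each rule compute the same result.
assert split_join(inc, 4, xs) == map_(inc, xs)
assert fused(inc, dbl, xs) == map_(inc, map_(dbl, xs))
```

The rewritten forms matter because they expose implementation choices: the nested maps produced by the split-join rule can be assigned to different levels of a parallel hierarchy, and the fused map avoids materialising an intermediate array.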
Lift: Rewrite Rules
IR level: Generic
▣ Split-join rule
▣ Map fusion rule
▣ Reduce rules
▣ ...
IR level: DSL
▣ Algorithm choices for high-level primitives
▣ Precision level
▣ ...
IR level: Platform-specific
▣ Using built-ins
▣ Lowering to the platform programming model
▣ ...
Lift: rewriting
(Figure, built up over six slides: the rewriting process. Labels in order of appearance: "How to optimise?", "Hard starting point", "Search", "Exploitation", "Built-in primitive", "Parallelisation choice", "Code generation".)
Lift: Rewrite Rules
▣ Domain-specific and generic
▣ Reusable
▣ Provably correct
▣ Self-contained, extensible
Lift: Constraint Inference
▣ Required for valid search space generation when using tuning parameters
▣ Leverages algorithmic meta-info
▣ Can express heuristics and HW restrictions
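As a rough illustration of why constraints are needed before searching, the Python sketch below enumerates only the valid combinations of two tuning parameters. It is not Lift's inference algorithm: the constraints are written by hand, and the parameter names (`tile`, `local_size`, `max_local_size`) are hypothetical. The divisibility constraints stand in for what algorithmic meta-info could provide (a split size has to divide the array length), and the upper bound stands in for a hardware restriction.

```python
from typing import List, Tuple

Config = Tuple[int, int]  # (tile, local_size), both hypothetical tuning parameters


def valid_configs(n: int, max_local_size: int) -> List[Config]:
    """Enumerate (tile, local_size) pairs that satisfy all constraints.

    Assumed constraints, for illustration only:
      * tile divides the array length n         (from a Split-like pattern)
      * local_size divides tile                 (from a nested Split-like pattern)
      * local_size <= max_local_size            (hardware restriction)
    """
    configs: List[Config] = []
    for tile in range(1, n + 1):
        if n % tile != 0:
            continue
        for local_size in range(1, min(tile, max_local_size) + 1):
            if tile % local_size == 0:
                configs.append((tile, local_size))
    return configs


space = valid_configs(8, 2)
# [(1, 1), (2, 1), (2, 2), (4, 1), (4, 2), (8, 1), (8, 2)]
```

Sampling the two parameters independently from 1..8 would give 64 combinations, most of which would be rejected; generating only the 7 valid ones keeps the later search from spending its budget on invalid points.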
Lift: Search Space Exploration
▣ Uniform random sampling
▣ Predictor models
▣ Genetic algorithms
▣ …
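The simplest strategy in this list, uniform random sampling, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Lift tooling: `random_search` and `toy_cost` are hypothetical names, the candidate list is made up, and `toy_cost` is an arbitrary stand-in for compiling a candidate and measuring its runtime on the target device.

```python
import random
from typing import Callable, Sequence, Tuple

Config = Tuple[int, int]  # hypothetical (tile, local_size) tuning point


def random_search(space: Sequence[Config],
                  cost: Callable[[Config], float],
                  budget: int,
                  seed: int = 0) -> Config:
    """Evaluate `budget` distinct points drawn uniformly and keep the cheapest."""
    if not space or budget <= 0:
        raise ValueError("need a non-empty space and a positive budget")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    candidates = rng.sample(list(space), min(budget, len(space)))
    return min(candidates, key=cost)


def toy_cost(config: Config) -> float:
    """Made-up cost with its minimum at (4, 2); a real run would time a kernel."""
    tile, local_size = config
    return abs(tile - 4) + abs(local_size - 2)


space = [(1, 1), (2, 1), (2, 2), (4, 1), (4, 2), (8, 1), (8, 2)]
best_full = random_search(space, toy_cost, budget=len(space))  # (4, 2)
best_small = random_search(space, toy_cost, budget=3)
```

With a budget that covers the whole space the search is exhaustive; with a smaller budget it returns the best point among those sampled, which is where predictor models or genetic algorithms could spend the same budget more selectively.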
Lift: Research Directions
▣ Linear algebra
▣ Sparse data parallelism
▣ Optimising reductions
▣ Stencil computations
▣ 3D wave modelling
▣ High-level synthesis for FPGAs
▣ Machine Learning
Lift for Machine Learning
▣ Machine Learning
  □ Convolution inference optimisation
  □ Platforms: Mali GPUs, BrainWave
Lift for Machine Learning
Naums Mogers, PhD student, Edinburgh
How to best exploit HW accelerators?
Christof Schlaak, PhD student, Edinburgh
How to generate accelerator architectures?
Lu Li, Postdoctoral Researcher, Edinburgh
How to optimise the host code?
How to drive the rewriting process?
Christophe Dubach, Reader, Edinburgh
All of the above
Lift source code is published
https://github.com/lift-project/lift
http://www.lift-project.org