CS 744: TVM

Shivaram Venkataraman
Fall 2020

Tensor Virtual Machine


ADMINISTRIVIA

- Course project titles
- Project proposal aka Introduction (10/16)
  - Introduction
  - Related Work
  - Timeline (with eval plan)
  → Assignment: 2-page writeup
- Midterm: Oct 22

MACHINE LEARNING: STACK

✓ Distributed training: make distributed easy

→ Inference: just the forward pass

Hardware

MOTIVATION: PERFORMANCE PORTABILITY

PyTorch → model file

Compute primitives: matrix multiply, conv

You want high performance across hardware backends

Dependence on vendor-specific libraries

ML models evolve fast ⇒ new operators, new combinations of operators ⇒ not available in existing vendor libraries

→ Python code describes the ML model
→ TVM
→ Binary file that runs on hardware

OPTIMIZATION: COMPUTATION GRAPHS

Operator Fusion

Data layout

Operator fusion:
→ 1-to-1 operators
→ Sum reduction, scaling after
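A minimal sketch of what fusing 1-to-1 operators saves, written in plain Python rather than TVM. The operator names (`scale`, `add_bias`, `relu`) and the fused functions are hypothetical and chosen only for illustration; lists stand in for tensors.

```python
# Illustrative only: plain Python lists stand in for tensors.

def scale(xs, a):
    return [a * x for x in xs]

def add_bias(xs, b):
    return [x + b for x in xs]

def relu(xs):
    return [max(x, 0.0) for x in xs]

def unfused(xs, a, b):
    # Three passes over the data and two intermediate buffers (t1, t2).
    t1 = scale(xs, a)
    t2 = add_bias(t1, b)
    return relu(t2)

def fused(xs, a, b):
    # One pass, no intermediate buffers: all three 1-to-1 operators
    # are applied to an element before moving to the next one.
    return [max(a * x + b, 0.0) for x in xs]

def fused_sum_then_scale(xs, a):
    # Reduction followed by scaling: scale the single reduced value
    # instead of writing out an intermediate result tensor.
    return a * sum(xs)

data = [-2.0, -0.5, 0.0, 1.5, 3.0]
assert unfused(data, 2.0, 1.0) == fused(data, 2.0, 1.0)
```

Both versions compute the same values; the fused one avoids writing intermediate results to memory and reading them back.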

Data layout:
→ Row major, column major, blocked
→ How the data is represented
→ Example: 2-layer NN
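The three layouts can be compared by where element (i, j) of a matrix ends up in flat memory. This is a sketch under assumptions: the function names are hypothetical, and the blocked layout shown (square tiles stored row-major, elements inside a tile also row-major) is only one of several possible blocked schemes.

```python
def row_major_offset(i, j, rows, cols):
    # Rows are contiguous in memory.
    return i * cols + j

def col_major_offset(i, j, rows, cols):
    # Columns are contiguous in memory.
    return j * rows + i

def blocked_offset(i, j, rows, cols, block):
    # Assumes `block` divides both rows and cols.
    tiles_per_row = cols // block
    tile_index = (i // block) * tiles_per_row + (j // block)
    within_tile = (i % block) * block + (j % block)
    return tile_index * block * block + within_tile

rows, cols, block = 4, 4, 2
for i in range(rows):
    print([blocked_offset(i, j, rows, cols, block) for j in range(cols)])
```

Which layout is best depends on the access pattern of the operator and on the hardware, which is why the choice is left to the compiler.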

TENSOR EXPRESSION LANGUAGE

Common arithmetic and math operations
Know the shape of the output and the data accessed

Operator → expressed in tensor expression language
↳ (tensor) math operations
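The idea of stating the output shape up front and giving one index expression per output element can be sketched in a few lines. The `compute` helper below is a hypothetical stand-in written for this note, not TVM's API; it simply evaluates the expression at every index.

```python
from itertools import product

def compute(shape, fn):
    """Evaluate fn at every index of `shape`; returns {index_tuple: value}."""
    return {idx: fn(*idx) for idx in product(*(range(n) for n in shape))}

A = [[1, 2, 3], [4, 5, 6]]    # shape (2, 3)
B = [[1, 0], [0, 1], [2, 2]]  # shape (3, 2)

# Matrix multiply: the output shape is (2, 2), and the lambda says
# exactly which elements of A and B each output element reads.
C = compute((2, 2), lambda i, j: sum(A[i][k] * B[k][j] for k in range(3)))

# Transpose: same style, different access pattern.
T = compute((3, 2), lambda i, j: A[j][i])
```

Because the shape and the accessed data are explicit, a compiler is free to decide separately how the loops are ordered, tiled, or parallelized.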

CODE GENERATION

Nested parallelism

Tensorization

Halide → expression
OpenMP

    for i in 1..n:
        for j in 1..m:

→ threads can each take loop iterations

Tensorization:
→ load, store, add → what is the hardware instruction set?
→ Extensible! Allows you to register operator intrinsics
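A sketch of tensorization under stated assumptions: a registry of intrinsics, and a matrix-vector product whose innermost loop is replaced by a call to a registered 4-wide dot product. The names `register_intrinsic`, `INTRINSICS`, and `dot4` are hypothetical; `dot4` is ordinary Python standing in for a hardware instruction.

```python
INTRINSICS = {}
WIDTH = 4  # width of the hypothetical hardware dot-product instruction

def register_intrinsic(name):
    # Decorator that records an intrinsic under a name.
    def wrap(fn):
        INTRINSICS[name] = fn
        return fn
    return wrap

@register_intrinsic("dot4")
def dot4(a, b):
    # Stand-in for one instruction that multiplies two 4-element
    # vectors and adds up the products.
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

def matvec_scalar(M, v):
    out = []
    for i in range(len(M)):
        acc = 0
        for j in range(len(v)):
            acc += M[i][j] * v[j]
        out.append(acc)
    return out

def matvec_tensorized(M, v):
    # Assumes len(v) is a multiple of WIDTH.
    intrin = INTRINSICS["dot4"]
    out = []
    for i in range(len(M)):
        acc = 0
        for j0 in range(0, len(v), WIDTH):
            # The inner scalar loop is replaced by one intrinsic call.
            acc += intrin(M[i][j0:j0 + WIDTH], v[j0:j0 + WIDTH])
        out.append(acc)
    return out

M = [[1, 2, 3, 4, 5, 6, 7, 8], [8, 7, 6, 5, 4, 3, 2, 1]]
v = [1, 2, 3, 4, 5, 6, 7, 8]
assert matvec_scalar(M, v) == matvec_tensorized(M, v)
```

Supporting a new accelerator then amounts to registering its intrinsics rather than rewriting each operator.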

Latency HIDING

What is the goal?

Same as PyTorch etc.
↳ Overlap computation and communication

Schedule that utilizes memory bandwidth & compute units
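A back-of-the-envelope timeline model of why overlapping helps. It assumes one load unit and one compute unit, a fixed load and compute time per tile, and enough buffer space that the loader never stalls; all names and numbers are illustrative.

```python
def serial_time(n_tiles, load, compute):
    # Load a tile, compute on it, then load the next one.
    return n_tiles * (load + compute)

def overlapped_time(n_tiles, load, compute):
    # The loader works back-to-back; compute on a tile starts as soon
    # as that tile is loaded and the compute unit is free.
    load_done = 0
    compute_done = 0
    for _ in range(n_tiles):
        load_done += load
        compute_done = max(compute_done, load_done) + compute
    return compute_done

print(serial_time(4, 2, 3), overlapped_time(4, 2, 3))
```

With overlap the total time approaches the larger of total load time and total compute time, so a good schedule keeps both memory bandwidth and compute units busy.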

AUTOMATING OPTIMIZATION

Goal: Create a specialized operator for input shape and layout

Challenge: Choose appropriate schedule optimizations
- Tiling size, loop unrolling

Automate the optimizer!
→ Lots of different choices, and also lots of parameters to choose
→ Use ML? What configurations to try?
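To get a feel for how quickly the number of choices grows, here is a sketch that enumerates a toy schedule space. The knob names (`tile_i`, `tile_j`, `unroll`) and the rule that tile sizes must divide the loop extent are assumptions made for this example.

```python
from itertools import product

def config_space(extent, unroll_factors):
    # Candidate tile sizes: every divisor of the loop extent.
    tiles = [t for t in range(1, extent + 1) if extent % t == 0]
    return [
        {"tile_i": ti, "tile_j": tj, "unroll": u}
        for ti, tj, u in product(tiles, tiles, unroll_factors)
    ]

space = config_space(64, [1, 2, 4, 8])
print(len(space))
```

Even two tiling knobs and one unrolling knob on a 64 x 64 loop nest give 196 configurations, and real operators have many more knobs, so measuring every configuration on hardware is impractical.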

ML-BASED COST MODEL

Machine Learning Model Design Choices
- Speed: Faster than the time it takes to evaluate a config
- Quality: Use a rank objective to predict the relative order of runtimes

Gradient tree boosting model
- memory access count
- reuse ratio of each memory buffer at each loop level
- one-hot encoding of loop annotations

code generated → features → predicted runtime (n seconds)
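Why a rank objective is enough: the search only needs to know which config is faster, not by how many milliseconds. The sketch below scores a model by the fraction of config pairs it orders correctly. This is an evaluation metric chosen for illustration, not the training loss used in TVM, and the numbers are made up.

```python
from itertools import combinations

def pairwise_rank_accuracy(predicted, measured):
    # Consider every pair of configs with different measured runtimes
    # and check whether the prediction orders the pair the same way.
    pairs = [
        (i, j)
        for i, j in combinations(range(len(measured)), 2)
        if measured[i] != measured[j]
    ]
    correct = sum(
        1
        for i, j in pairs
        if (predicted[i] < predicted[j]) == (measured[i] < measured[j])
    )
    return correct / len(pairs)

measured_ms = [8.0, 20.0, 12.0, 35.0]
scores_a = [0.1, 0.7, 0.3, 0.9]      # wrong scale, right order
scores_b = [9.0, 19.0, 21.0, 30.0]   # close in ms, but swaps two configs

print(pairwise_rank_accuracy(scores_a, measured_ms))
print(pairwise_rank_accuracy(scores_b, measured_ms))
```

`scores_a` is useless as a runtime estimate yet ranks every pair correctly, while `scores_b` looks closer in absolute terms but gets one pair wrong.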

ML-BASED COST MODEL

Iteration
- Select a batch of candidates
- Collect data
- Use as training data to update the model

How to select candidates? Parallel Simulated Annealing
- Start from a random config
- Walk to a nearby config
- Successful if cost decreases, else reject

cost = model-predicted perf when using the config

→ each candidate is a configuration, e.g. (c1, 8 ms), (c2, 20 ms)
→ measured candidates become training data (step above)

Ask the model: is c_j better than the current config?
Yes → go and try it on the cluster
No → generate another config
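The walk described above can be sketched as follows. It follows the simplified rule from the slide (accept only if the predicted cost decreases); classic simulated annealing also accepts some cost-increasing moves with a temperature-dependent probability. `predicted_cost` is a hypothetical stand-in for the learned cost model, and the knob values are made up. The walkers are independent, so they could run in parallel; here they run one after another.

```python
import random

TILES = [1, 2, 4, 8, 16, 32, 64]
UNROLLS = [1, 2, 4, 8]

def predicted_cost(cfg):
    # Hypothetical cost model: lower is better, best at (16, 4).
    tile, unroll = cfg
    return abs(TILES.index(tile) - 4) + abs(UNROLLS.index(unroll) - 2)

def neighbor(cfg, rng):
    # Move one knob one step up or down, staying inside the space.
    tile, unroll = cfg
    if rng.random() < 0.5:
        i = min(max(TILES.index(tile) + rng.choice([-1, 1]), 0), len(TILES) - 1)
        return (TILES[i], unroll)
    j = min(max(UNROLLS.index(unroll) + rng.choice([-1, 1]), 0), len(UNROLLS) - 1)
    return (tile, UNROLLS[j])

def walk(rng, steps=200):
    # Start from a random config and walk to nearby configs.
    cfg = (rng.choice(TILES), rng.choice(UNROLLS))
    for _ in range(steps):
        cand = neighbor(cfg, rng)
        if predicted_cost(cand) < predicted_cost(cfg):
            cfg = cand  # successful: predicted cost decreased
        # otherwise reject and stay at cfg
    return cfg

def select_candidates(n_walkers=4, seed=0):
    rng = random.Random(seed)
    finals = [walk(rng) for _ in range(n_walkers)]
    # Best predicted configs first; these would be measured on hardware.
    return sorted(set(finals), key=predicted_cost)

print(select_candidates())
```

Only the configs that survive the walk are run on real devices, and their measured runtimes then go back into the model as training data.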

DISTRIBUTED DEVICE POOL

Pool of devices to speed up profiling
RPC interface to run a trial on device

Share device pools for multiple graphs
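A sketch of the device pool idea using threads and a queue from the Python standard library. `DevicePool`, `fake_measure`, and the device names are hypothetical; `fake_measure` stands in for an RPC call that would run the compiled code on a remote device and return its runtime.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

class DevicePool:
    def __init__(self, device_names):
        self._free = queue.Queue()
        for name in device_names:
            self._free.put(name)

    def run_trial(self, config, measure):
        device = self._free.get()      # block until a device is free
        try:
            return config, device, measure(config, device)
        finally:
            self._free.put(device)     # hand the device back to the pool

def fake_measure(config, device):
    # Hypothetical stand-in for an RPC that returns a runtime in ms.
    tile, unroll = config
    return abs(tile - 16) + abs(unroll - 4) + 1.0

def profile_batch(pool, configs, n_workers):
    # More workers than devices is fine: extra workers wait for a device.
    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        return list(ex.map(lambda c: pool.run_trial(c, fake_measure), configs))

pool = DevicePool(["dev0", "dev1"])
configs = [(8, 2), (16, 4), (32, 8)]
results = profile_batch(pool, configs, n_workers=4)
best = min(results, key=lambda r: r[2])
print(best[0], best[2])
```

Because the pool only hands out free devices, several tuning jobs can share the same `DevicePool` without stepping on each other's measurements.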

SUMMARY

TVM: Compiler for ML inference models
Support high performance for a range of models, hardware devices

Key ideas
- Graph-level optimizations → operator fusion
- Tensor expression language: code-gen, latency hiding, etc.
- ML-based cost model for automation

DISCUSSION
https://forms.gle/WiVgJ3abGXXgfBN99

Consider that you are building an optimizer for Spark programs instead of ML inference. What would be some configuration knobs that you could similarly tune? What might be different from the TVM optimizer?

Similar logic:
→ Latency hiding: overlap computation and communication
→ Operator fusion → map operators; access patterns
  ↳ operators are user defined: challenging?
→ Partitioning → can you automate?
  ↳ number of partitions / co-partitioning → performance!
→ Cache → config space
→ Persistence → manually inserted

What is your takeaway from the following figure?


NEXT STEPS

Next class: Ray
Course project: Oct 16 (introductions)
Midterm: Oct 22

Latency hiding in Spark?
rdd1: map tasks
rdd2: reduce tasks ← shuffle files
← no comm; wait
