Compiler needs to drive the ISA - sable.mcgill.caclump/cdp2010/CASCON_Shriraman.pdf · Compiler...

Compiler needs to drive the ISA

Arrvindh Shriraman

Simon Fraser University, Jan 1

with input from Sandhya Dwarkadas, Michael Scott, Engin Ipek, and Chen Ding

Special thanks to feedback from NSF ACAR II Panel, “What next in Instruction-Level-Parallelism research?”


Brief Introduction

I am a computer architect; you probably know more about compilers!

I have many collaborators, but the opinions and mistakes are my own

Non-goal: deciding what the architecture stack should look like in the future

Goal: get compiler designers to actively participate in defining the HW-SW interface


Outline

Inefficiency of current microprocessors
  Energy wall
  RISC ISA
  Where does energy go?

Big and Morphable ISA

Heterogeneous execution substrate


Energy Wall

             2008    2014    2020
             45nm    22nm    11nm
Total Si     1       4       16
Power/Si     1       1       0.6
Usable Si    100%    25%     4%
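The table's bottom row can be reproduced with simple arithmetic. This is a minimal sketch that assumes the "Power/Si" row is a relative power budget and that usable silicon is that budget divided by total silicon. The formula and the names `usable_fraction`, `total_si`, and `power_budget` are illustrative assumptions, not taken from the slides.

```python
# Values copied from the Energy Wall table; the formula itself is an assumption.
nodes = ["45nm (2008)", "22nm (2014)", "11nm (2020)"]
total_si = [1, 4, 16]          # relative amount of silicon (transistor budget)
power_budget = [1, 1, 0.6]     # "Power/Si" row, read here as a relative power budget


def usable_fraction(total, power):
    """Fraction of the silicon that can be powered at once (assumed model)."""
    return min(1.0, power / total)


usable = [usable_fraction(t, p) for t, p in zip(total_si, power_budget)]

for node, u in zip(nodes, usable):
    # Rounded to whole percent, as in the table.
    print(f"{node}: {u:.0%} usable")
```

Under this reading, the 11nm column works out to 0.6 / 16 = 3.75%, which the table rounds to 4%.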


RISC Instructions

Simple by design
  focus on pipelineability and latency

Instructions encode little work
  no parallelism within instructions
  Instruction-parallelism reaching its limits

[Figure labels: (HW Overheads)^2, Instr. Parallelism. Classic study: Tejas and Smith]


RISC Instructions (contd.)

Out-of-order (OO) HW
  needs to scale resources quadratically
  needs to reduce misses and branch penalty quadratically

Pipelining small instructions is inefficient
  fetch consumes 4x more energy than execute

[Figure: Energy breakdown. Labels: Fetch (38%), Pipeline, Reg., D$, Exe; remaining values: 10%, 15%, 6%, 31%]


Current ISAs - Do we need them?

Tension between hardware and software

Software is allowed to provide only limited info on data types and operation semantics

Not necessarily useful for hardware either
  energy is the #1 constraint, not area or clock
  HW needs to work hard to mine parallelism
  does not support energy-efficient functional units

[Figure: ISA as the interface between Applications and Hardware]


Agenda


Need to perform more work per instruction

ISA needs app-specific information

Hardware needs to cut down overheads


Outline

Inefficiency of current microprocessors

Dynamic Morphable ISA
  Decoupled Multi-level ISA
  Big Instructions
  Application-based ISA extensions
  Challenges

Heterogeneous execution substrate


De-coupled ISAs

Two levels of ISA abstraction
  App ISA for customization, more info, and portability
  HW ISA for platform-specific opt. and accelerators

Always-present virtual machine
  can use rich SW program information
  can use feedback from HW for dynamic optimization

[Figure: Applications, Application ISA, code translator and optimizer, Hardware ISA, Hardware]
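The two-level split can be illustrated with a toy translator. The sketch below is hypothetical throughout: the opcodes (`vadd4`, `simd_add`, `add`, `store`), the target names, and the `translate` function are invented for illustration and do not correspond to any real application or hardware ISA. It only shows the idea that one application-level operation can be lowered differently depending on the hardware underneath.

```python
# Hypothetical two-level ISA: every opcode and target name here is invented.
APP_PROGRAM = [
    ("vadd4", "c", "a", "b"),   # application-level op: add four lanes of a and b
    ("store", "out", "c"),
]


def translate(program, target):
    """Lower application-ISA ops to ops for a hypothetical hardware ISA."""
    hw_ops = []
    for op, *args in program:
        if op == "vadd4" and target == "simd":
            # The SIMD target has a matching wide instruction: one-to-one mapping.
            hw_ops.append(("simd_add", *args))
        elif op == "vadd4":
            # The scalar target expands the op into four lane-wise adds.
            dst, x, y = args
            hw_ops.extend(
                ("add", f"{dst}[{i}]", f"{x}[{i}]", f"{y}[{i}]") for i in range(4)
            )
        else:
            # Anything else passes through unchanged in this sketch.
            hw_ops.append((op, *args))
    return hw_ops


simd_code = translate(APP_PROGRAM, "simd")
scalar_code = translate(APP_PROGRAM, "scalar")
print(len(simd_code), len(scalar_code))  # prints "2 5"
```

The application program is written once. Only the translator knows whether the platform offers a wide add, which is the portability argument made for the App ISA.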


Application ISA

Big Instructions
  compound instructions that encode more work

Can capture relationship between operations
  provides information about operation parallelism
  enables hardware to exploit parallelism

[Figure: ways of grouping operations D1-D4 into one instruction: SSE, AVX; VLIW, EPIC; Austin TRIPS]
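A toy energy model shows why encoding more work per instruction helps. It takes the slides' claim that fetch consumes 4x more energy than execute and assumes, purely for illustration, that fetch cost is paid once per instruction while execute cost is paid once per operation. Pipeline, register, and cache energy are ignored, and `FETCH_COST`, `EXEC_COST`, and `energy` are hypothetical names.

```python
FETCH_COST = 4.0   # relative energy per instruction fetched (from the "4x" claim)
EXEC_COST = 1.0    # relative energy per operation executed


def energy(num_ops, ops_per_instr):
    """Relative energy to run num_ops when each instruction carries ops_per_instr ops."""
    instrs = -(-num_ops // ops_per_instr)   # ceiling division
    return instrs * FETCH_COST + num_ops * EXEC_COST


risc = energy(1024, 1)   # one operation per instruction
big = energy(1024, 4)    # compound instruction carrying four operations
print(risc, big, risc / big)  # prints "5120.0 2048.0 2.5"
```

In this simplified model, packing four operations into each instruction cuts energy by 2.5x, because the fetch overhead is amortized over more work.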


Application ISA

Custom instructions
  library runtimes: C++ Boost, MATLAB, R, Sphinx
  generate application-specific instructions

Static instructions
  many applications have small hot code regions
  70% of code is shared between applications

Dynamic instructions
  enable software to add new instructions to the ISA
  90% coverage with 40,000 static instructions*

* GreenDroid - HotChips 2010
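The coverage figure can be read as a question about profiles: how many static instructions must be specialized to cover a given share of dynamic execution? The sketch below is one hypothetical way to compute that from a profile. The region names and counts are made up, and the greedy selection in `regions_for_coverage` is an assumption, not the method used by GreenDroid.

```python
def regions_for_coverage(profile, target_pct):
    """Greedily pick the hottest regions until target_pct of dynamic instructions is covered.

    profile maps region name -> (static instruction count, dynamic instruction count).
    """
    total_dynamic = sum(dyn for _, dyn in profile.values())
    chosen, covered, static_total = [], 0, 0
    hottest_first = sorted(profile.items(), key=lambda kv: kv[1][1], reverse=True)
    for name, (static, dyn) in hottest_first:
        if covered * 100 >= target_pct * total_dynamic:
            break
        chosen.append(name)
        covered += dyn
        static_total += static
    return chosen, static_total, covered / total_dynamic


# Made-up profile: a few small loops dominate execution.
profile = {
    "decode_loop": (120, 700_000),
    "blend": (80, 200_000),
    "parse_cfg": (900, 60_000),
    "init": (2000, 40_000),
}
chosen, static_total, coverage = regions_for_coverage(profile, 90)
print(chosen, static_total, coverage)
```

In this invented profile, 200 of 3,100 static instructions cover 90% of execution, which is the kind of skew that makes hot-region instructions attractive.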


Research Agenda (1/2)

What is the basic application ISA?
  x86, LLVM, RISC, Java bytecode, Nvidia PTX?

How to specify and modify the application ISA?
  pragmas, macro language?
  generate from hot path?

How to enable software to make use of the new ISA?
  compiler generators? Is that crazy or feasible?
  how to alter the backend of a compiler at runtime?


Research Agenda (2/2)

Reconsider instruction semantics
  do we need precise exceptions at the instruction level?
  atomicity of which operation is guaranteed?
  major impact on hardware complexity


Example Designs

IBM System 38
  Application ISA: TMI
  Implementation ISA: IBM AS-400 descendants

Transmeta CMS (ahead of its time?)
  Application ISA: x86 for compatibility
  Implementation ISA: VLIW for energy efficiency

Nvidia CUDA
  Application ISA: PTX
  Implementation ISA: Nvidia GPUs


Outline

Inefficiency of current microprocessors

Dynamic Morphable ISA

Heterogeneous execution substrates
  Architecture
  Program state management
  Examples


Accelerator-based architecture

Dedicated specialized hardware to improve performance under same energy

Granularity
  80s: floating point unit, co-processors
  90s-00s: vector units
  Future: GPU, DSP, and more...

[Figure: CPU alongside a signal processor, a crypto accelerator, and a GPU]


New Challenge: Program State

Multicore CPU (e.g., Power6)              Accelerators (e.g., GPU)
Hardware caching                          Scratchpad memories
Fine-grained sharing                      Coarse-grain sharing
Minimal programmer effort
SW cannot directly optimize               SW can optimize
Latency-optimized                         Throughput-optimized
small reg files (<128 bytes)              large reg files (2MB)
large caches                              small cache
per L1 = 64KB, L2 = 4MB                   all L1s = L2 = 768KB
90% of on-chip mem. hardware controlled   70% of on-chip mem. software controlled

Weak memory models


Program state challenges

Where to stage data required by big instructions?
  registers, on-chip cache, memory
  what about bandwidth?

SW runtime to manage on-chip caches?
  similar to GC, except for physical storage
  on-chip memory shared by many tasks
  need to organize private and shared data

Managing program state in SW too hard?
  can we afford not to? HW is too reactive and suboptimal


New Challenges: Task Mapping

Task: bag of instructions

How to choose which tasks to run on the accelerator?

How to choose the accelerator?

[Figure: Matrix-Multiplication. Speedup (0 to 10) vs. % on CPU (0 to 100). Adaptive Task Mapping [Micro 2010]]
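To see why the share of work placed on the CPU matters, consider a toy model. It assumes the work is perfectly divisible, the CPU and one accelerator run concurrently, and there are no transfer or launch costs. The functions `speedup` and `best_split` and the 9x accelerator figure are illustrative assumptions. This is not the Adaptive Task Mapping algorithm from the cited work.

```python
def speedup(cpu_share, accel_speed):
    """Speedup over CPU-only execution.

    cpu_share: fraction of the work run on the CPU (0.0 to 1.0).
    accel_speed: how many times faster the accelerator is than the CPU.
    """
    cpu_time = cpu_share                           # CPU-only time is normalized to 1.0
    accel_time = (1.0 - cpu_share) / accel_speed
    return 1.0 / max(cpu_time, accel_time)         # both sides must finish


def best_split(accel_speed, steps=100):
    """Try CPU shares in 1% steps and return the one with the highest speedup."""
    shares = [i / steps for i in range(steps + 1)]
    return max(shares, key=lambda s: speedup(s, accel_speed))


share = best_split(9.0)
print(f"best CPU share: {share:.0%}, speedup: {speedup(share, 9.0):.1f}x")
```

With a hypothetical 9x accelerator, this model peaks when the CPU keeps about 10% of the work: giving the accelerator everything leaves the CPU idle, while giving the CPU too much makes it the bottleneck.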


Example systems

Focus: design-time chip generator for embedded
  applications are reasonably stable
  profile application and create HW for hot code

Industry: Tensilica Systems
  new language to specify extension units and instructions
  automate integration of special-purpose HW w/ CPU
  generates compiler backend for new instructions

Academia: Conservation Cores [UCSD]
  generated HW from profiled hot code in program
  cold code on CPU; hot code on specialized HW


Summary

Decouple ISA

Application ISA
  HLLs can communicate more information
  enable more dynamic optimizations
  allow for dynamic extensions for app-customization

Heterogeneous accelerator cores
  new compute units driven by software libraries
  SW needs to manage on-chip program-state

[Figure: Applications, Application ISA, software code translator, Hardware ISA, Hardware]
