Transcript of ...av6ds/talks/hpca2019.pdf

Page 1: Composite-ISA Cores: Enabling Multi-ISA Heterogeneity using a Single ISA

Ashish Venkat, Harsha Basavaraj, Dean Tullsen

Page 2: The Landscape of Modern Computing

Software (rapidly evolving, more complex, and diverse)

Page 3: The Landscape of Modern Computing

Software (rapidly evolving, more complex, and diverse)

Hardware (traditionally homogeneous)

Intel Nehalem

Area: 263 mm²

Transistors: 731 M

Technology: 45 nm

Page 4: The Landscape of Modern Computing

Software (rapidly evolving, more complex, and diverse)

Hardware (traditionally homogeneous)

Intel Westmere

Area: 240 mm²

Transistors: 1.17 B

Technology: 32 nm

Page 5: The Landscape of Modern Computing

Software (rapidly evolving, more complex, and diverse)

Hardware (leakage-limited era)

As we continue to shrink transistors, power density will shoot up.

Power efficiency is key

Page 6: The Landscape of Modern Computing

Software (rapidly evolving, more complex, and diverse)

Hardware (leakage-limited era)

As we continue to shrink transistors, power density will shoot up.

Cost of Generality

Page 7: The Landscape of Modern Computing

Software (rapidly evolving, more complex, and diverse)

Hardware (more and more specialized)

Digital Signal Processing

Multimedia Processing

GPU

Cryptographic Acceleration

Image Processing

High Performance Cores

Low Power Cores

Page 8: Hardware Specialization

– Domain-specific specialization: accelerate the performance of a particular class of computation

Page 9: Hardware Specialization

– Domain-specific specialization: accelerate the performance of a particular class of computation

– Microarchitectural heterogeneity: use small power-efficient and large high performance cores that cater to diverse execution characteristics

Page 10: Hardware Specialization

– Domain-specific specialization: accelerate the performance of a particular class of computation

Intel HD Graphics (CPU+GPU)
AMD APU (CPU+GPU)
Huawei Kirin (Neural Acceleration)
Google Cloud TPU (ML acceleration)
Microsoft Catapult (Bing search acceleration)

– Microarchitectural heterogeneity: use small power-efficient and large high performance cores that cater to diverse execution characteristics

Qualcomm Snapdragon (A57 + A53)
Intel Go™ (Xeon + Atom)
Samsung Exynos 7 (A73 + A53)
Nvidia Tegra 3 (A-9 Variable SMP)
Apple A11 (Monsoon + Mistral)

Page 11: Hardware Specialization vs Programmability

Traditional Programming/Execution Model, Traditional Hardware

Abstraction layers (top to bottom):

Algorithms

High Level Language Code

Assembly Code

Instruction Set Architecture (Machine Language)

CPU Microarchitecture

Hardware Logic (Gates/Registers)

Devices (Silicon)

Page 12: Hardware Specialization vs Programmability

Rapidly diverging Programming/Execution Model, Specialized Hardware

Specialized hardware: Digital Signal Processing, Multimedia Processing, GPU, Cryptographic Acceleration, Image Processing, “Big” Cores, “Little” Cores

Traditional stack (top to bottom): Algorithms; High Level Language Code; Assembly Code; Instruction Set Architecture (Machine Language); CPU Microarchitecture; Hardware Logic (Gates/Registers); Devices (Silicon)

Specialized stack (top to bottom): Algorithms; High Level Language-1, High Level Language-2, . . ., High Level Language-n; Assembly Language-1, Assembly Language-2, . . ., Assembly Language-n; ISA-1, ISA-2, . . ., ISA-n; Specialized Core-1, Specialized Core-2, . . ., Specialized Core-n; Hardware Logic (Gates/Registers); Devices (Silicon)

Page 13: Hardware Specialization vs Programmability

Traditional stack (top to bottom): Algorithms; High Level Language Code; Assembly Code; Instruction Set Architecture (Machine Language); CPU Microarchitecture; Hardware Logic (Gates/Registers); Devices (Silicon)

Specialized stack (top to bottom): Algorithms; High Level Language-1, High Level Language-2, . . ., High Level Language-n; Assembly Language-1, Assembly Language-2, . . ., Assembly Language-n; ISA-1, ISA-2, . . ., ISA-n; Specialized Core-1, Specialized Core-2, . . ., Specialized Core-n; Hardware Logic (Gates/Registers); Devices (Silicon)

How can we benefit from more specialization while preserving our traditional models of programming?

Page 14: Evolution of Architectural Heterogeneity

Homogeneous multicore (Same ISA, Same Microarchitecture): x86-64 core-i7, x86-64 core-i7, x86-64 core-i7, x86-64 core-i7

Single-ISA heterogeneous multicore (Same ISA, Different Microarchitectures): ARM Cortex A15, ARM Cortex A5, ARM Cortex A12, ARM Cortex A9

63% speedup OR 69% energy savings with 3% performance loss*

*Rakesh Kumar, Keith Farkas, Norm P. Jouppi, Partha Ranganathan, Dean M. Tullsen, MICRO’03

Heterogeneous-ISA multicore (Different ISAs, Different Microarchitectures): MIPS R10000, Thumb Cortex-A7, x86-64 ev6, x86-64 core-i7

*Ashish Venkat, Matthew DeVuyst, Sriskanda Shamasunder, Kazem Taram, Dean M. Tullsen, ASPLOS’12, ISCA’14, ASPLOS’16, ISCA’18

Restricting cores to a single ISA eliminates an important dimension of heterogeneity

Page 15: Our contention is . . .

• Restricting cores to a single ISA eliminates an important dimension of heterogeneity

• ISAs are designed for different goals:
– High performance (e.g., x86-64)
– Low power (e.g., ARM)
– Reduced code size - Thumb ISA saves 30% in instruction fetch energy
– Domain specific instructions
– Compute bound vs memory bound
– Instruction-level parallelism vs Data-level parallelism

Page 16: Harnessing ISA Diversity (ISCA 2014)

• Exploits ISA Affinity
– Application code regions have a natural ISA preference

• Enables ISA-microarchitecture co-design
– Significant synergy in combining heterogeneous ISAs with heterogeneous hardware

• 21% Performance Improvement and 23% Energy Savings on average

Cores shown: MIPS R10000, Thumb Cortex-A7, Alpha ev6, x86-64 core-i7

Page 17: Why is cross-ISA process migration a hard problem?

• Different machine code
• Different data formats (types, widths, endianness, alignment)
• Different register sets
• Different stack frame layouts

Cores shown: MIPS R10000, Thumb Cortex-A7, x86-64 ev6, x86-64 core-i7

Page 18: Other deployment concerns

• Multi-vendor Licensing
• Legal Barriers
• Verification Costs
• Differences in ABI
• Heterogeneous Memory Consistency Models

Page 19: This research . . .

Composite-ISA Cores: Enabling Multi-ISA Heterogeneity using a Single ISA

• Avoids multi-vendor licensing issues.
• Significantly reduces binary translation costs.
• Greater flexibility allows us to match or exceed the performance and efficiency advantages of multi-vendor ISA heterogeneity.

Cores shown: x86-custom-1, x86-custom-2, x86-custom-3, x86-custom-4

Page 20: Outline

ISA Feature Set Derivation

Compiler and Runtime Strategy

Architectural Design Space Exploration

(Background figure: the specialized hardware/software stack shown under "Hardware Specialization vs Programmability".)

Page 21: ISA Feature Set Derivation

• Start with a baseline (x86-like) superset ISA

• Customize along 5 different dimensions
– Register Depth
– Register Width
– Addressing Mode Complexity
– Predication
– Data-Parallel Execution

• 26 different composite ISAs
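The customization dimensions above can be sketched as a small enumeration. This is a minimal sketch, not the authors' derivation: the option values are the ones listed later in the talk, the identifiers and the name format are illustrative, and SIMD support is assumed to follow instruction complexity because the talk states that microx86 cores do not implement SIMD. The unpruned cross product has 32 entries, while the slide reports 26 composite ISAs, so some combinations are evidently excluded; the slides do not say which.

```python
from itertools import product

# Option values come from the slides; the identifiers are illustrative only.
REGISTER_DEPTHS = (8, 16, 32, 64)
REGISTER_WIDTHS = (32, 64)
COMPLEXITIES = ("x86", "microx86")
PREDICATION = ("Partial", "Full")


def candidate_isas():
    """Return the unpruned cross product of the customization options."""
    isas = []
    for complexity, width, depth, pred in product(
        COMPLEXITIES, REGISTER_WIDTHS, REGISTER_DEPTHS, PREDICATION
    ):
        isas.append(
            {
                "name": f"{complexity}-{width}W-{depth}D-{pred}",
                "complexity": complexity,
                "register_width": width,
                "register_depth": depth,
                "predication": pred,
                # Assumption: SIMD follows instruction complexity, since the
                # talk states that microx86 cores do not implement SIMD.
                "simd": complexity == "x86",
            }
        )
    return isas


if __name__ == "__main__":
    isas = candidate_isas()
    print(len(isas), "unpruned combinations")
```

The name format mirrors labels such as microx86-32W-16D-Partial that appear on the later design-space slides.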

Page 22: Feature Diversity: Register Depth

• The number of programmable registers exposed by the ISA

• Performance/Power Implications:
– Impacts a number of machine-specific and machine-independent compiler optimizations
– Increasing the register depth from 16 to 32 results in 10.3% fewer loads, 3.7% fewer stores, 3.5% fewer integer arithmetic instructions, and 2.7% fewer branches.
– Greater register depth typically implies a larger register file

Composite-ISA heterogeneous multicore: x86-64 (32 regs), x86-64 (8 regs), x86-64 (16 regs), x86-64 (64 regs)

“x86-64 only” heterogeneous multicore: x86-64 (16 regs), x86-64 (16 regs), x86-64 (16 regs), x86-64 (16 regs)

Page 23: Feature Diversity: Register Width

• Wider Types (64-bit)
– Allows access to larger virtual memory
– May allow for better register usage via sub-register coalescing (improves performance)
– Potentially larger cache working set (e.g., when pointers are members of a large structure)

• Smaller Types (32-bit)
– Require emulation of wider types (hurts performance)
– Enable compact register files (consume 6.4% less power than a 64-bit organization)

Composite-ISA heterogeneous multicore: x86 (32-bit), x86 (32-bit), x86-64 (64-bit), x86-64 (64-bit)

“x86-64 only” heterogeneous multicore: x86-64 (64-bit), x86-64 (64-bit), x86-64 (64-bit), x86-64 (64-bit)
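To make "emulation of wider types" concrete, the sketch below adds two 64-bit integers using only 32-bit quantities and an explicit carry, the same shape as an ADD followed by an ADC on 32-bit x86. The function names are illustrative and are not taken from the talk; the point is only that one 64-bit operation becomes several 32-bit ones.

```python
MASK32 = 0xFFFFFFFF


def split64(value):
    """Split a 64-bit value into (low, high) 32-bit halves."""
    return value & MASK32, (value >> 32) & MASK32


def join64(lo, hi):
    """Recombine (low, high) 32-bit halves into one 64-bit value."""
    return (hi << 32) | lo


def add64_with_32bit_ops(a, b):
    """Add two 64-bit values using only 32-bit-wide intermediate results."""
    a_lo, a_hi = split64(a)
    b_lo, b_hi = split64(b)
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32                 # 1 when the low halves overflow
    lo = lo_sum & MASK32
    hi = (a_hi + b_hi + carry) & MASK32  # carry out of bit 63 is discarded
    return join64(lo, hi)
```

The result wraps modulo 2^64, as a 64-bit register would.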

Page 24: Feature Diversity: Addressing Mode Complexity

• Reduced set of addressing modes (microx86 – a RISC version of x86)
– 1:1 macro-op to micro-op encoding (simpler decoders)
– 9.8% reduction in peak power and 15.1% reduction in area

• Complete set of addressing modes (CISC x86)
– Compact code generation (fewer instruction cache accesses)
– Multiple bandwidth optimizations (micro-op cache, micro-op fusion, loop buffer, etc.)

Composite-ISA heterogeneous multicore: x86 (CISC), microx86 (RISC), x86 (CISC), microx86 (RISC)

“x86-64 only” heterogeneous multicore: x86-64 (CISC), x86-64 (CISC), x86-64 (CISC), x86-64 (CISC)
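The code-density trade-off can be illustrated with a toy model. The sketch below performs the same read-modify-write once as a single x86-style macro-op with a base + index*scale + displacement memory operand, and once as a sequence of simple steps. The exact addressing modes microx86 retains are not given on the slide, so the simple sequence (and its base + displacement load/store) is an assumption, and the returned instruction counts describe only this toy sequence, not measured results.

```python
def add_to_memory_cisc(regs, mem, base, index, scale, disp, src):
    """Model one x86-style macro-op: add [base + index*scale + disp], src."""
    assert scale in (1, 2, 4, 8)          # scale factors x86 allows
    addr = regs[base] + regs[index] * scale + disp
    mem[addr] = mem.get(addr, 0) + regs[src]
    return 1                              # instructions fetched


def add_to_memory_simple(regs, mem, base, index, scale, disp, src):
    """Same effect using single-step operations (hypothetical sequence)."""
    assert scale in (1, 2, 4, 8)
    regs["t0"] = regs[index] * scale            # 1: scale the index
    regs["t0"] = regs["t0"] + regs[base]        # 2: add the base
    regs["t1"] = mem.get(regs["t0"] + disp, 0)  # 3: load via base + displacement
    regs["t1"] = regs["t1"] + regs[src]         # 4: add the source register
    mem[regs["t0"] + disp] = regs["t1"]         # 5: store the result
    return 5                                    # instructions fetched
```

Both functions leave memory in the same state; the simple form needs more instructions (more instruction cache traffic) but each one maps to a single operation, which is what allows simpler decoders.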

Page 25: Feature Diversity: Predication

• Partial Predication
– x86 already implements partial predication via CMOV instructions (predicated on condition codes)

• Full Predication
– Any instruction can be predicated on any architectural register
– Enables more aggressive if-conversion (6.5% fewer branches and 0.6% more integer arithmetic)
– Allows the designer to choose simpler branch predictors in tightly power-constrained environments

Composite-ISA heterogeneous multicore: x86-64 (full predication), x86-64 (CMOV), x86-64 (full predication), x86-64 (CMOV)

“x86-64 only” heterogeneous multicore: x86-64 (CMOV), x86-64 (CMOV), x86-64 (CMOV), x86-64 (CMOV)
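If-conversion replaces a branch with straight-line code whose results are selected by a predicate. The sketch below shows the idea on a diamond-shaped if/else; it is only a source-level analogy (Python has no predicated instructions), and the function names are illustrative. It also shows why the slide reports fewer branches but slightly more arithmetic: both arms are computed unconditionally.

```python
def abs_diff_branchy(a, b):
    """Original control flow: one conditional branch, one arm executes."""
    if a > b:
        return a - b
    else:
        return b - a


def abs_diff_if_converted(a, b):
    """If-converted form: both arms execute, a predicate selects the result."""
    p = int(a > b)          # predicate, computed once (1 or 0)
    then_value = a - b      # 'then' arm, executed unconditionally
    else_value = b - a      # 'else' arm, executed unconditionally
    return p * then_value + (1 - p) * else_value  # arithmetic select
```

The two functions return the same value for every input pair; the second trades a hard-to-predict branch for extra integer work.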

Page 26: Feature Diversity: Data-Parallel Execution

• microx86 cores do not implement SIMD instructions
– Saves 7.4% in peak power and 17.3% in area
– Execute a pre-compiled scalarized version when available
– Migrate to an x86 core that implements SIMD during vector phases

Composite-ISA heterogeneous multicore: x86-64 (SSE), microx86 (no-SSE), microx86-64 (no-SSE), microx86-64 (no-SSE)

“x86-64 only” heterogeneous multicore: x86-64 (SSE), x86-64 (SSE), x86-64 (SSE), x86-64 (SSE)

Page 27: ISA Encoding

• Standard ISA extension methodology
• Two new prefix bytes (for predication and register depth) that leverage unimplemented opcodes
• Compiler/Assembler is code density-aware

Page 28: Decoder Design

• 4

• Impact on the x86 front end
– More Prefix Decoding Logic
– Wider queues and buffers
– Wider Micro-Op Cache
– Mix of simple/complex macro-op decoders (microx86 vs x86)

• Decoder Power and Area estimates with our customizations
– Pre-decoder (Full RTL Design): 0.87% increase in peak power and 0.65% in area
– Smallest ISA (microx86-32) consumes 0.66% less peak power and 1.12% less area than x86-64
– Largest ISA (superx86) consumes 0.3% more peak power and 0.46% more area than x86-64

Page 29: Outline

ISA Feature Set Exploration

Compiler and Runtime Strategy

Architectural Design Space Exploration

(Background figure: the specialized hardware/software stack shown under "Hardware Specialization vs Programmability".)

Page 30: Compiler Strategy

Compiler passes shown: Vectorization; Type Legalization; Instruction Selection; Simple, Triangle, Diamond If-Conversion; Register Allocation; Machine Code Generation (LLVM-MC)

Composite-ISA Features:
• Data Parallelism: {SIMD, no SIMD}
• Register Width: {32-bit, 64-bit}
• Addressing Mode Options: {x86, microx86}
• Register Depth: {8, 16, 32, 64 registers}
• Predication: {partial (CMOV), full predication}

Composite-ISA Encoding Prefixes and Options

Page 31: Feature Affinity

Page 32: Migration Strategy

Feature Upgrade
Application uses features {x, y}; it moves from a core that implements features {x, y} to a core that implements features {w, x, y, z}.
• Common Case (91.5% of migrations)
• No binary translation required

Feature Downgrade
Application uses features {x, y, z}; it moves from a core that implements features {w, x, y, z} to a core that implements features {x, y}.
• Minimal binary translation required
• Average Performance Impact: 0.46%
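The upgrade/downgrade distinction above amounts to a set comparison. The following is a minimal sketch under one assumption: a migration is treated as an upgrade whenever the target core implements every feature the running code uses, and as a downgrade otherwise. The function name and the returned fields are hypothetical and are not taken from the talk.

```python
def plan_migration(used_features, target_features):
    """Classify a migration and list the features that need translation."""
    used = frozenset(used_features)
    target = frozenset(target_features)
    missing = used - target  # features the code uses but the target lacks
    if not missing:
        # Target covers everything in use: the code runs unmodified.
        return {"kind": "feature upgrade", "translate": frozenset()}
    # Only code that relies on the missing features needs binary translation.
    return {"kind": "feature downgrade", "translate": missing}
```

For the two cases on the slide, `plan_migration({"x", "y"}, {"w", "x", "y", "z"})` reports an upgrade with nothing to translate, while `plan_migration({"x", "y", "z"}, {"x", "y"})` reports a downgrade in which only feature `z` has to be translated.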

Page 33: Outline

ISA Feature Set Exploration

Compiler and Runtime Strategy

Architectural Design Space Exploration

(Background figure: the specialized hardware/software stack shown under "Hardware Specialization vs Programmability".)

Page 34: Design Space Exploration

Multi-vendor heterogeneous-ISA multicore: Alpha, Thumb, Alpha, x86-64

Homogeneous multicore: x86-64, x86-64, x86-64, x86-64

Single-ISA heterogeneous multicore: x86-64, x86-64, x86-64, x86-64

DSE inputs: Micro-architectural parameters + ISA parameters + Budget constraints

Composite-ISA heterogeneous multicore: x86 custom-1, x86 custom-2, x86 custom-3, x86 custom-4

Page 35: Design Space Exploration: Choice of micro-architectural parameters

Design Parameter: Design Choice
Execution Semantics: In-order, Out-of-order
Issue Width: 1, 2, 4
Branch Predictor: 2-level local, gshare, tournament
Instruction Queue Size: 32, 64 entries
Reorder Buffer Size: 64, 128 entries
Physical Register File Configurations: (96 INT, 64 FP/SIMD), (64 INT, 96 FP/SIMD)
Integer ALUs: 1, 3, 6
Integer Multiply/Divide Units: 1, 2
FP/SIMD ALUs: 1, 2, 4
FP Multiply/Divide Units: 1, 2
Load/Store Queue: 16, 32 entries
Instruction Cache: 32KB 4-way, 64KB 4-way
Private Data Cache: 32KB 4-way, 64KB 8-way
Shared Last Level (L2) cache: 4-banked 4MB 4-way, 4-banked 8MB 8-way

4680 distinct single core design points and 102.5 trillion 4-core configurations

49733 core hours on the 2 petaflop Comet Cluster at the San Diego Supercomputing Center
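As a sanity check on the size of this space, the sketch below multiplies out the per-core choices in the table. The unpruned product (41,472 per-core microarchitectures, before the shared L2 and ISA choices are even counted) is far larger than the reported 4680 single-core design points, so the explored set must be pruned; the slide does not state the pruning rules. One reading, offered only as an assumption, is that 4680 factors as 26 composite ISAs times 180 microarchitectural configurations, since 4680 = 26 × 180 exactly.

```python
from math import prod

# Per-core options transcribed from the table; the shared L2 is left out
# because it is a chip-level choice, not a per-core one.
PER_CORE_CHOICES = {
    "execution semantics": ["in-order", "out-of-order"],
    "issue width": [1, 2, 4],
    "branch predictor": ["2-level local", "gshare", "tournament"],
    "instruction queue entries": [32, 64],
    "reorder buffer entries": [64, 128],
    "physical register file (INT, FP/SIMD)": [(96, 64), (64, 96)],
    "integer ALUs": [1, 3, 6],
    "integer multiply/divide units": [1, 2],
    "FP/SIMD ALUs": [1, 2, 4],
    "FP multiply/divide units": [1, 2],
    "load/store queue entries": [16, 32],
    "instruction cache": ["32KB 4-way", "64KB 4-way"],
    "private data cache": ["32KB 4-way", "64KB 8-way"],
}

# Size of the unpruned per-core cross product.
unpruned_cores = prod(len(options) for options in PER_CORE_CHOICES.values())

# Figures reported on the slides.
REPORTED_DESIGN_POINTS = 4680
REPORTED_COMPOSITE_ISAS = 26

# Assumption: design points = composite ISAs x microarchitectural configs.
configs_per_isa, remainder = divmod(REPORTED_DESIGN_POINTS, REPORTED_COMPOSITE_ISAS)

if __name__ == "__main__":
    print("unpruned per-core microarchitectures:", unpruned_cores)
    print("reported points per composite ISA:", configs_per_isa, "remainder", remainder)
```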

Page 36: Multi-programmed Workload Throughput

[Figure: Speedup (y-axis, 0 to 1.4) versus Peak Power Budget (x-axis: 20, 40, 60, Unlimited; from tight constraints to liberal constraints). Series: Homogeneous (x86-64); Single-ISA Heterogeneous (x86-64 + Hardware Heterogeneity); Heterogeneous-ISA (x86-64 + Alpha + Thumb + Hardware Heterogeneity); Composite-ISA (x86-64 + Hardware Heterogeneity + Full Feature Diversity)]

We generally gain more from ISA feature diversity than hardware heterogeneity

Page 37: Design Space Exploration: Multi-programmed workload throughput

Benefits of composite-ISA cores come from:
• Feature affinity: different code regions have a natural affinity for one feature or another
• ISA-microarchitecture co-design: squeeze more powerful cores into the same budget

Best Single-ISA Heterogeneous CMP: OOO Medium End, OOO Medium End, OOO Medium End, InOrder

Best Composite-ISA CMP: OOO Medium End, OOO Medium End, InOrder, OOO High End

Both designs are constrained at a peak power budget of 40W

Micro-architectural diversity

Page 38: Design Space Exploration: Multi-programmed workload throughput

Benefits of composite-ISA cores come from:
• Feature affinity: different code regions have a natural affinity for one feature or another
• ISA-microarchitecture co-design: squeeze more powerful cores into the same budget

Best Single-ISA Heterogeneous CMP: x86-64 OOO Medium End, x86-64 OOO Medium End, x86-64 OOO Medium End, x86-64 InOrder

Best Composite-ISA CMP: microx86 OOO Medium End, microx86 OOO Medium End, x86 InOrder, microx86 OOO High End

Both designs are constrained at a peak power budget of 40W

Diversity of Addressing Mode Availability

Page 39: Design Space Exploration: Multi-programmed workload throughput

Benefits of composite-ISA cores come from:
• Feature affinity: different code regions have a natural affinity for one feature or another
• ISA-microarchitecture co-design: squeeze more powerful cores into the same budget

Best Single-ISA Heterogeneous CMP: x86-64 OOO Medium End, x86-64 OOO Medium End, x86-64 OOO Medium End, x86-64 InOrder

Best Composite-ISA CMP: microx86-64W OOO Medium End, microx86-32W OOO Medium End, x86-32W InOrder, microx86-32W OOO High End

Both designs are constrained at a peak power budget of 40W

Diversity of Register Width

Page 40: Design Space Exploration: Multi-programmed workload throughput

Benefits of composite-ISA cores come from:
• Feature affinity: different code regions have a natural affinity for one feature or another
• ISA-microarchitecture co-design: squeeze more powerful cores into the same budget

Best Single-ISA Heterogeneous CMP: x86-64 OOO Medium End, x86-64 OOO Medium End, x86-64 OOO Medium End, x86-64 InOrder

Best Composite-ISA CMP: microx86-64W-32D OOO Medium End, microx86-32W-16D OOO Medium End, x86-32W-64D InOrder, microx86-32W-16D OOO High End

Both designs are constrained at a peak power budget of 40W

Diversity of Register Depth

Page 41: Design Space Exploration: Multi-programmed workload throughput

Benefits of composite-ISA cores come from:
• Feature affinity: different code regions have a natural affinity for one feature or another
• ISA-microarchitecture co-design: squeeze more powerful cores into the same budget

Best Single-ISA Heterogeneous CMP: x86-64 OOO Medium End, x86-64 OOO Medium End, x86-64 OOO Medium End, x86-64 InOrder

Best Composite-ISA CMP: microx86-64W-32D-Partial OOO Medium End, microx86-32W-16D-Partial OOO Medium End, x86-32W-64D-Full InOrder, microx86-32W-16D-Partial OOO High End

Both designs are constrained at a peak power budget of 40W

Diversity of Predication Support

Page 42: Design Space Exploration: Multi-programmed workload throughput

Benefits of composite-ISA cores come from:
• Feature affinity: different code regions have a natural affinity for one feature or another
• ISA-microarchitecture co-design: squeeze more powerful cores into the same budget

Best Single-ISA Heterogeneous CMP: x86-64 OOO Medium End, x86-64 OOO Medium End, x86-64 OOO Medium End, x86-64 InOrder

Best Composite-ISA CMP: microx86-64W-32D-Partial OOO Medium End, microx86-32W-16D-Partial OOO Medium End, x86-32W-64D-Full InOrder, microx86-32W-16D-Partial OOO High End

Both designs are constrained at a peak power budget of 40W

Full Feature Set Diversity

Page 43: Feature Sensitivity

[Figure: Performance Degradation (y-axis, 0.00% to 8.00%) versus Feature Constraint (x-axis), grouped as Register Depth (8, 16, 32, 64), Register Width (32, 64), Instruction Complexity (microx86, x86), and Predication (Partial, Full)]

The best performing designs typically employ most features.

Page 44: Processor Transistor Investment

[Figure: Area, normalized to Full Feature Diversity (y-axis, 0 to 1.4), versus Feature Constraint (x-axis), grouped as Register Depth (8, 16, 32, 64), Register Width (32, 64), Instruction Complexity (microx86, x86), Predication (Partial, Full), and Full Feature Diversity. Area components: Fetch, Decode, Branch Predictor, Scheduler, Register File, Functional Units]

Page 45: Multi-programmed Workload Efficiency

31% energy savings and 35% reduction in EDP at ZERO performance loss

We gain performance and save energy simultaneously

[Figure: Normalized EDP (y-axis, 0 to 1.2) versus Peak Power Budget (x-axis: 20, 40, 60, Unlimited). Series: Homogeneous (x86-64); Single-ISA Heterogeneous (x86-64 + Hardware Heterogeneity); Heterogeneous-ISA (x86-64 + Alpha + Thumb + Hardware Heterogeneity); Composite-ISA (x86-64 + Hardware Heterogeneity + Full Feature Diversity)]

Page 46: In summary . . .

Composite-ISA Cores: Enabling Multi-ISA Heterogeneity using a Single ISA

• Effectively avoids multi-vendor licensing issues, verification, and binary translation costs
• Gives the processor designer and the compiler a rich set of ISA feature options
• Greater flexibility allows us to match or exceed the performance and efficiency advantages of multi-vendor ISA heterogeneity.

Cores shown: x86-custom-1, x86-custom-2, x86-custom-3, x86-custom-4

Page 47: Composite-ISA Cores: Enabling Multi-ISA Heterogeneity using a Single ISA

Ashish Venkat, Harsha Basavaraj, Dean Tullsen