GPRM

Towards Automated Design Space Exploration and Code Generation using Type Transformations

S Waqar Nabi & Wim Vanderbauwhede

First International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC'15)

Sunday, November 15, 2015

Austin, TX

www.tytra.org.uk

Using Safe Transformations and a Cost-Model for HPC on FPGAs

The TyTra project context

o Our approach, blue-sky target, down-to-earth target, where we are now, how we are different

Key contributions

o (1) Type transformations to create design-variants, (2) a new Intermediate Language, and (3) an FPGA Cost model

The cost model

o Performance and resource-usage estimates, some results

Using safe transformations and an associated light-weight cost-model opens the route to a fully automated design-space exploration flow.

THE CONTEXT

Our approach, blue-sky target, down-to-earth target, where we are now, how we are different

Blue Sky Target

[Figure: Legacy Scientific Code and a Heterogeneous HPC Target Description, combined through a Cost Model, yield an Optimized HPC solution!]

The goal that keeps us motivated!

(The pragmatic target is somewhat more modest…)

The Short-Term Target

Our focus is on FPGA targets, and we currently require design entry in a Functional Language using High-Level Functions (maps, folds) [a kind of DSL].

The cunning plan…

1. Use the functional programming paradigm to (auto) generate program-variants which translate to design-variants on the FPGA.

2. Create an Intermediate Language that:

• Is able to capture points in the entire design-space

• Allows a light-weight cost-model to be built around it

• Is a convenient target for the front-end compiler

3. Create a light-weight cost-model that can estimate the performance and resource-utilization for each variant.

A performance-portable code-base that builds on a purely software programming paradigm.


And you may very well ask…


The jury is still out…

How our work is different

Our observations on limitations of current tools and flows:

1. Design-entry is in a custom high-level language which nevertheless has hardware-specific semantics.

2. The architecture of the FPGA solution is specified by the programmer; compilers cannot optimize it.

3. Solutions create soft-processors on the FPGA, which are not optimized for HPC (orientation towards embedded applications).

4. Design-space exploration takes a prohibitively long time.

5. The compiler is application-specific (e.g. DSP applications).

We are not there yet, but in principle our approach entirely eliminates the first four limitations and mitigates the fifth.

KEY CONTRIBUTIONS

(1) Type transformations for generating program variants, (2) a new Intermediate Language, and (3) a light-weight Cost Model

1. Type Transformations to Generate Program Variants

Functional Programming

Types

o More general than types in C

o Our focus is on types of functions that perform array operations

o reshape, maps and folds

Type transformations

o Can be derived automatically

o Provably correct

o Essentially reshape the arrays

A functional paradigm with high-level functions allows creation of design-variants that are correct-by-construction.

Illustration of Variant Generation through Type-Transformation

• typeA : Vect (im*jm*km) dataType -- 1D data

• Single execution thread

• typeB : Vect km (Vect (im*jm) dataType) -- transformed 2D data

• (km concurrent execution threads)

• output = mappipe kernel_func input -- original program

• inputTr = reshapeTo km input -- reshaping data

• output = mappar (mappipe kernel_func) inputTr -- new program

Simple and provably correct transformations in a high-level functional language translate to design-variants on the FPGA.
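To make the correctness argument concrete, here is a minimal Python model of the transformation above. It sketches the semantics only and is not TyTra's implementation: `mappipe`, `mappar`, `reshape_to` and `flatten` are hypothetical Python stand-ins for the functional-language `mappipe`, `mappar` and `reshapeTo`, and the hardware meaning (one pipelined thread vs. `km` concurrent lanes) appears only in comments. The property it checks is that flattening the output of the new program gives exactly the output of the original one.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def mappipe(f: Callable[[A], B], xs: List[A]) -> List[B]:
    """Apply f to every element in order (models one pipelined thread)."""
    return [f(x) for x in xs]


def mappar(f: Callable[[A], B], xs: List[A]) -> List[B]:
    """Apply f to every element independently (models concurrent lanes).

    Python evaluates this sequentially; only the meaning is modelled.
    """
    return [f(x) for x in xs]


def reshape_to(k: int, xs: List[A]) -> List[List[A]]:
    """Split a 1D vector into k equal rows: Vect (n*k) a -> Vect k (Vect n a)."""
    if k <= 0 or len(xs) % k != 0:
        raise ValueError("length must be a positive multiple of k")
    n = len(xs) // k
    return [xs[i * n:(i + 1) * n] for i in range(k)]


def flatten(xss: List[List[A]]) -> List[A]:
    """Inverse of reshape_to: concatenate the rows back into one vector."""
    return [x for row in xss for x in row]


def kernel_func(x: int) -> int:
    # Hypothetical element-wise kernel, for illustration only.
    return 3 * x + 1


im, jm, km = 2, 3, 4
data = list(range(im * jm * km))                            # typeA: 1D data
original = mappipe(kernel_func, data)                       # single thread
variant = mappar(lambda row: mappipe(kernel_func, row),     # km threads
                 reshape_to(km, data))
assert flatten(variant) == original  # same result, different structure
```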

2. A New Intermediate Language

• Manage-IR

• Deals with

• memory objects (arrays)

• streams (loops over arrays)

• offset streams

• loops over work-units

• block-memory transfers

• Compute-IR

• Streaming model

• SSA instructions define the datapath

Strongly and statically typed

All computations expressed in SSA (Static Single Assignment) form

Largely (and deliberately) based on the LLVM-IR
3. Cost Model

[Figure: The Cost Model, relating the Design Space and the Estimation Space]

THE FPGA COST-MODEL

Performance Estimate, Resource-utilization Estimate, Experimental Results

The Cost-Model Use-Case

A set of standardized experiments feeds target-specific empirical data to the cost model, and the rest comes from the IR description.

Two Types of Estimates

Resource-Utilization Estimates

o ALUTs, REGs, DSPs

Performance Estimates

o Estimating memory-access bandwidth for specific data patterns

o Estimating FPGA operating frequency

Both estimates are needed to allow the compiler to choose the best design variant.
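As a sketch of why both estimates are needed, the following Python fragment assumes (hypothetically) that each variant carries a resource estimate and a throughput estimate: the compiler discards variants that do not fit the device and keeps the fastest of the rest. The `Variant` type, the device capacities and all numbers are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Variant:
    name: str
    resources: Dict[str, float]  # estimated ALUTs, REGs, DSPs, ...
    ewut: float                  # estimated work-units per second


def fits(v: Variant, capacity: Dict[str, float]) -> bool:
    """Feasible only if every estimated resource fits on the device."""
    return all(v.resources.get(r, 0) <= cap for r, cap in capacity.items())


def choose_best(variants: List[Variant],
                capacity: Dict[str, float]) -> Optional[Variant]:
    """Among the variants that fit, return the one with the highest throughput."""
    feasible = [v for v in variants if fits(v, capacity)]
    return max(feasible, key=lambda v: v.ewut, default=None)


# Made-up numbers, for illustration only.
capacity = {"ALUT": 100_000, "REG": 200_000, "DSP": 256}
variants = [
    Variant("1 pipeline", {"ALUT": 5_000, "REG": 8_000, "DSP": 4}, 1.0e3),
    Variant("4 pipelines", {"ALUT": 20_000, "REG": 32_000, "DSP": 16}, 3.8e3),
    Variant("32 pipelines", {"ALUT": 160_000, "REG": 256_000, "DSP": 128}, 25.0e3),
]
best = choose_best(variants, capacity)
print(best.name if best else "no feasible variant")  # 4 pipelines
```

The fastest variant on paper is rejected because its resource estimate exceeds the device, which is exactly the decision a throughput estimate alone could not make.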

1. Resource Estimates

Observation

o The regularity of the FPGA fabric allows very simple first- or second-order expressions to be built up for most instructions, based on a few experiments.

Key Determinants

o Primitive (SSA) instructions used in the IR of the kernel functions

o Data-types

o Structure of the various functions (par, comb, pipe, seq)

o Control logic overhead

A set of one-time, simple synthesis experiments on the target device helps us create a very accurate resource-utilization cost model.
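The kind of first- or second-order expression described above can be obtained with a least-squares fit over a handful of synthesis results. The sketch below is a plain-Python polynomial fit via the normal equations, offered as an assumption about how such expressions could be built; it is not the TyTra tooling, and the data points are synthetic (generated from a known quadratic) rather than real synthesis measurements.

```python
from typing import List, Sequence


def fit_poly(xs: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Least-squares fit y ~ c0 + c1*x + ... + cd*x^d via the normal equations.

    Returns [c0, c1, ..., cd]. Adequate for the tiny, well-conditioned
    systems (degree 1 or 2, a handful of points) this sketch targets.
    """
    n = degree + 1
    # Normal equations (V^T V) c = V^T y, with V the Vandermonde matrix.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(a[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (b[r] - rest) / a[r][r]
    return coeffs


def evaluate(coeffs: Sequence[float], x: float) -> float:
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Synthetic stand-in for "ALUTs vs. bit-width" results of a few experiments.
widths = [8, 16, 24, 32]
aluts = [0.25 * w * w + 3 * w + 20 for w in widths]
coeffs = fit_poly(widths, aluts, 2)
print(evaluate(coeffs, 48))  # extrapolated estimate for a 48-bit operand
```

Because the synthetic points lie exactly on 0.25w² + 3w + 20, the estimate at w = 48 should come out very close to 740. With real measurements, the size of the residuals indicates whether a first-order expression suffices or a second-order one is needed.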

Resource Estimates - Example

[Figure: resource-utilization expressions for Integer Division and Integer Multiplication]

Light-weight cost expressions are associated with every legal SSA instruction in the TyTra-IR.
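A minimal sketch of per-instruction costing: each SSA opcode maps to a cost expression in the bit-width, and a kernel's resource estimate is the sum over its instructions. The opcode names, the `COST_EXPR` table and every coefficient are placeholders, not the expressions shown on the slide; the `KeyError` mirrors the requirement that every legal instruction has a cost. Control-logic overhead, also listed as a key determinant, is left out for brevity.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Resources = Dict[str, float]

# Placeholder cost expressions per SSA opcode, as functions of bit-width w.
# The coefficients are invented; real ones would be fitted per target device.
COST_EXPR: Dict[str, Callable[[int], Resources]] = {
    "add": lambda w: {"ALUT": 1.0 * w, "REG": 1.0 * w},
    "mul": lambda w: {"REG": 2.0 * w, "DSP": 2.0},
    "udiv": lambda w: {"ALUT": 0.9 * w * w, "REG": 1.1 * w * w},
}


def estimate_resources(instrs: List[Tuple[str, int]]) -> Resources:
    """Sum the cost expression of every (opcode, bit-width) SSA instruction."""
    total: Counter = Counter()
    for op, width in instrs:
        if op not in COST_EXPR:
            raise KeyError(f"no cost expression for instruction '{op}'")
        total.update(COST_EXPR[op](width))  # adds values key by key
    return dict(total)


kernel = [("add", 32), ("add", 32), ("mul", 32), ("add", 32)]
print(estimate_resources(kernel))  # {'ALUT': 96.0, 'REG': 160.0, 'DSP': 2.0}
```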

2. Performance Estimate

Effective Work-Unit Throughput (EWUT)

o Work-Unit = executing the kernel over the entire index-space

Key Determinants

o Memory execution model

o Sustained memory bandwidth for the target architecture and design-variant

• Data-access pattern

o Design configuration of the FPGA

o Operating frequency of the FPGA

o Compute-bound or IO-bound?

The performance model is trickier, especially calculating estimates of sustained memory bandwidth and FPGA operating frequency.
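One simple way to combine these determinants, assumed here purely for illustration, is a roofline-style bound: the work-unit rate is the lower of the compute-limited rate (clock frequency over cycles per work-unit) and the IO-limited rate (sustained bandwidth over bytes per work-unit). The deck's own expressions also account for the memory execution model and are not reproduced here; the function name and numbers below are hypothetical.

```python
from typing import Tuple


def ewut(freq_hz: float, cycles_per_work_unit: float,
         bytes_per_work_unit: float, sustained_bw: float) -> Tuple[float, str]:
    """Work-units per second under a simple 'slower of compute and IO' bound.

    Assumes computation and memory traffic overlap perfectly, so whichever
    is slower sets the throughput.
    """
    compute_rate = freq_hz / cycles_per_work_unit  # work-units/s if compute-bound
    io_rate = sustained_bw / bytes_per_work_unit    # work-units/s if IO-bound
    if compute_rate <= io_rate:
        return compute_rate, "compute-bound"
    return io_rate, "IO-bound"


# Hypothetical: 200 MHz, 1e6 cycles and 16 MB of traffic per work-unit,
# 2 GB/s sustained bandwidth.
rate, bound = ewut(200e6, 1e6, 16e6, 2e9)
print(rate, bound)  # 125.0 IO-bound
```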

Performance Estimate

Dependence on Memory Execution Model

[Figure: activity vs. time for Host, Device-DRAM, Device-Buffers, Offset-Buffers and Kernel Pipeline Execution]

Three types of memory execution

A given design-variant can be categorized based on:

- Architectural description

- IR description

Type A

[Figure: activity vs. time for Host, Device-DRAM, Device-Buffers, Offset-Buffers and Kernel Pipeline Execution; the timeline applies to all work-unit iterations]

Type B

[Figure: the same timeline, with activities marked "First Iteration only", "Last Iteration only" and "All other iterations" across the work-unit iterations]

Type C

[Figure: the same timeline, with activities marked "First Iteration only", "Last Iteration only" and "All other iterations" across the work-unit iterations]

Once a design-variant is categorized, performance can be estimated accordingly.
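A hedged sketch of how a categorized variant could be costed over its work-unit iterations: each activity is tagged as occurring in every iteration, in the first iteration only, or in the last iteration only, and the per-iteration times are summed. The tagging scheme, the activity names and the timings are assumptions, and activities are assumed not to overlap, which a real model of a pipelined design would relax.

```python
from typing import List, Tuple

# An activity: (name, seconds per occurrence, when it occurs), where "when" is
# "all" (every work-unit iteration), "first" or "last" (that iteration only).
Activity = Tuple[str, float, str]


def iteration_time(activities: List[Activity], i: int, n: int) -> float:
    """Time of iteration i (0-based) out of n, with activities run back to back."""
    t = 0.0
    for _name, secs, when in activities:
        if when not in ("all", "first", "last"):
            raise ValueError(f"unknown occurrence tag: {when}")
        if when == "all" or (when == "first" and i == 0) or (when == "last" and i == n - 1):
            t += secs
    return t


def total_time(activities: List[Activity], n: int) -> float:
    """Total time of n work-unit iterations."""
    if n < 1:
        raise ValueError("need at least one iteration")
    return sum(iteration_time(activities, i, n) for i in range(n))


# Hypothetical timings: host transfers in every iteration vs. only at the ends.
every_iter = [("host->device", 0.010, "all"), ("kernel", 0.002, "all"),
              ("device->host", 0.010, "all")]
ends_only = [("host->device", 0.010, "first"), ("kernel", 0.002, "all"),
             ("device->host", 0.010, "last")]
print(total_time(every_iter, 100), total_time(ends_only, 100))
```

With 100 iterations, confining the host transfers to the first and last iterations cuts the modelled total from about 2.2 s to about 0.22 s, which is why the category of a variant matters so much to its estimate.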

Performance Estimate

Dependence on Data Access Pattern

We define a rho (ρ) factor as a scaling factor applied to the peak memory bandwidth:

o It varies from 0 to 1

o It depends on the data-access pattern

o It is derived empirically through one-time standardized experiments on the target node
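In symbols, the sustained bandwidth for a given access pattern is ρ times the peak bandwidth. The Python sketch below assumes ρ is taken as the ratio of measured to peak bandwidth in each standardized experiment, clamped to [0, 1]; the pattern names, the 16 GB/s peak and the measurements are hypothetical.

```python
from typing import Dict


def rho_from_experiment(measured_bw: float, peak_bw: float) -> float:
    """rho = measured sustained bandwidth / peak bandwidth, clamped to [0, 1]."""
    if peak_bw <= 0:
        raise ValueError("peak bandwidth must be positive")
    return max(0.0, min(1.0, measured_bw / peak_bw))


def sustained_bw(peak_bw: float, rho_table: Dict[str, float], pattern: str) -> float:
    """Scale the peak bandwidth by the rho factor of the given access pattern."""
    return rho_table[pattern] * peak_bw


# Hypothetical one-time measurements on a node with 16 GB/s peak bandwidth.
peak = 16e9
rho = {
    "contiguous": rho_from_experiment(12e9, peak),  # 0.75
    "strided": rho_from_experiment(2e9, peak),      # 0.125
}
print(sustained_bw(peak, rho, "strided"))  # 2000000000.0
```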

Design configuration of the FPGA: determined from the IR description of the design-variant.

Performance Estimates

The Parameters and their Evaluation

Performance Estimates

Parameters from Architecture Description

Performance Estimates

Parameters Calculated Empirically

Performance Estimates

Parameters derived from IR description of Kernel

Performance Estimates

The Expressions
Performance Estimates

Experimental Results (Type C)

Estimated (E) vs actual (A) cost and throughput for the C2 and C1 configurations of a successive over-relaxation kernel. [Note that the cycles per kernel are estimated very accurately, but the Effective Workgroup Throughput (EWGT) estimate is off because of the inaccuracy of the frequency estimate for the FPGA.]
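Why an accurate cycle count can still give an inaccurate throughput: if throughput is modelled as clock frequency divided by cycles per workgroup (an assumption made here for illustration), then with the cycle count exact, the relative error in throughput equals the relative error in the frequency estimate. The numbers below are invented and are not the SOR results.

```python
def ewgt(freq_hz: float, cycles_per_workgroup: float) -> float:
    """Throughput in workgroups per second for a given clock and cycle count."""
    return freq_hz / cycles_per_workgroup


def relative_error(estimate: float, actual: float) -> float:
    return (estimate - actual) / actual


# Hypothetical: cycle count estimated exactly, clock over-estimated by 25%.
cycles = 1e6
est = ewgt(250e6, cycles)
act = ewgt(200e6, cycles)
print(relative_error(est, act), relative_error(250e6, 200e6))  # 0.25 0.25
```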


Does the TyTra Approach Work?


Design-Space Exploration?


CONCLUSION

The route to Automated Design Space Exploration on FPGAs for HPC Applications

The larger aim is to create a turn-key compiler from legacy scientific code to a heterogeneous HPC platform

o Current focus is on FPGAs, and on design entry in a Functional Language

Our main contributions are:

o Type transformations to create design-variants,

o New Intermediate Language, and

o FPGA Cost model

Our FPGA Cost Model

o Works on the TyTra-UIR, is light-weight, accurate (enough), and allows us to evaluate design-variants

Using safe transformations on a functional language paradigm and a light-weight cost-model brings us closer to a turn-key HPC compiler for legacy code.

Acknowledgement

We wish to acknowledge support by EPSRC through grant EP/L00058X/1.

The woods are lovely, dark and deep, But I have promises to keep,

And lines to code before I sleep, And lines to code before I sleep.


EXTRAS

Parallel Approaches

• What we do is very similar to:

• Loop optimizations to accelerate a scientific application

• Using skeletons to create a high-level abstraction for parallel programming

• Tools that automatically explore design-space

Our Approach to a Light-Weight Cost Model

An IR sufficiently low-level to expose the parameters needed for the

o The TyTra-IR has sufficient structural information to associate it directly with resources on an FPGA

Because TyTra-IR is a customized language, we can ensure that:

o All legal instructions (and structures) have a cost associated with them

o As long as the front-end compiler can compile an HLL to the TyTra-IR, we can cost HL program variants

o Costing resources on specific FPGA devices, and estimating memory bandwidth for various patterns on the target node, requires some empirical data.

• We are working on creating a set of standardized experiments that

We are not there yet, but in principle, our approach entirely eliminates all these limitations.

Quite a few avenues…

• Experiment with more kernels, their program-variants, estimated vs actual costs, and (correct) code-generation. Use the CHStone benchmarks.

• Computation-aware caches, optimized for halo-based scientific computations

• Integrate with the Altera-OpenCL platform for host-device communication

• Back-end optimizations, LLVM passes, LLVM TyTra-IR translation

• Route to TyTra-IR from SAC

• Integrate the TyTra-FPGA flow with SACGPU (OpenCL flow) for heterogeneous targets

• Use of Multi-party Session Types to ensure correctness of transformations

• Even code-generation for clusters?

• Abstract descriptions of target hardware

• SystemC-TLM model to profile applications and perform high-level partitioning in a heterogeneous environment

etcetera, etcetera, etcetera

The platform model for TyTra (FPGA)

The Manage-IR; Memory Objects

Each TyTra-IR memory object with its OpenCL view, LLVM-SPIR view and FPGA hardware mapping, where given:

• Cmem (Constant Memory): OpenCL view Constant Memory; LLVM-SPIR view 3: Constant

• Imem (Instruction Memory): OpenCL view Constant Memory; hardware DistRAM / BRAM

• Pipemem (Pipeline registers): hardware DistRAM

• Pmem (Private Memory; data memory for instruction processors): OpenCL view Private Memory; LLVM-SPIR view 0: Private; hardware DistRAM

• Cachemem (Data and Constant Cache): hardware DistRAM / BRAM

• Lmem (Local, shared memory): OpenCL view Local Memory; LLVM-SPIR view 4: Local; hardware M20K (BRAM) or DistRAM

• Gmem (Global memory): OpenCL view Global Memory; LLVM-SPIR view 1: Global; hardware on-board DRAM

• Hmem (Host memory): OpenCL view Host Memory; hardware host communication

The Manage-IR; Stream Objects

Can have a 1-1 or Many-1 relation with memory objects

Have a 1-1 relation with arguments to pipe functions (i.e. port connections to compute-cores)

The Manage-IR; repeat blocks

• Repeatedly call a kernel without referring back to the host (outer-loop)

• May involve block memory transfers between iterations

The Manage-IR; stream windows

• Access offsets in streams

• Use on-chip buffers for storing data read from memory
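A minimal sketch of a stream window, assuming a one-dimensional stream and a contiguous range of offsets `lo..hi`: a fixed-size buffer (here a Python `deque` standing in for on-chip memory) lets every element be read once and reused by each window that needs it. `stream_window` is a hypothetical name, and boundary (halo) elements are simply skipped rather than handled as a real stencil kernel might.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def stream_window(xs: Iterable[T], lo: int, hi: int) -> Iterator[List[T]]:
    """Yield [x[i+lo], ..., x[i+hi]] for every i whose whole window is in range."""
    if not lo <= 0 <= hi:
        raise ValueError("expected lo <= 0 <= hi")
    size = hi - lo + 1
    buf: Deque[T] = deque(maxlen=size)  # models the on-chip offset buffer
    for x in xs:                         # each element is read from memory once
        buf.append(x)                    # oldest element drops out when full
        if len(buf) == size:
            yield list(buf)


# A 3-point stencil (offsets -1, 0, +1) over a hypothetical stream.
smoothed = [sum(w) / 3 for w in stream_window([1, 2, 3, 4, 5], -1, 1)]
print(smoothed)  # [2.0, 3.0, 4.0]
```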

The Compute-IR

• Structural semantics

• @function_name (…args…) par

• @function_name (…args…) seq

• @function_name (…args…) pipe

• @function_name (…args…) comb

• Nesting these functions gives us the expressiveness to explore various parallelism configurations

• Streaming ports

• Counters and nested counters

• SSA data-path instructions


Example: Simple Vector Operation

The Kernel

Version 1 – Single Pipeline (C2)

[Figure: a single Core_Compute pipeline of the kernel's add and mul instructions, reading from lmem a, lmem b and lmem c and writing to lmem y, each side through Stream Control]

The parser can also automatically find ILP and schedule in an ASAP fashion
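To illustrate what finding ILP and scheduling ASAP means for an SSA datapath, the sketch below assigns each instruction the earliest pipeline stage at which its operands are ready, assuming a latency of one stage per instruction. The kernel is hypothetical (it only mirrors the add/mul mix in the figures), and this is not the TyTra parser itself.

```python
from typing import Dict, List, Tuple

# SSA instruction: (result, opcode, operands).
Instr = Tuple[str, str, List[str]]


def asap_schedule(instrs: List[Instr]) -> Dict[str, int]:
    """Earliest pipeline stage for each SSA result, assuming unit latency.

    Operands not produced by an earlier instruction are treated as kernel
    inputs available at stage 0. Instructions must appear in definition
    order, which straight-line SSA code satisfies.
    """
    stage: Dict[str, int] = {}
    for dst, _op, srcs in instrs:
        if dst in stage:
            raise ValueError(f"'{dst}' assigned twice: not SSA")
        stage[dst] = max((stage[s] + 1 for s in srcs if s in stage), default=0)
    return stage


def stages(schedule: Dict[str, int]) -> List[List[str]]:
    """Group results by stage; several results in one stage expose ILP."""
    out: List[List[str]] = [[] for _ in range(max(schedule.values()) + 1)]
    for name, s in schedule.items():
        out[s].append(name)
    return out


# Hypothetical kernel: y = (a + b) * (b + c) + a
kernel = [("t1", "add", ["a", "b"]), ("t2", "add", ["b", "c"]),
          ("t3", "mul", ["t1", "t2"]), ("y", "add", ["t3", "a"])]
print(stages(asap_schedule(kernel)))  # [['t1', 't2'], ['t3'], ['y']]
```

The two independent additions land in the same stage, which is the instruction-level parallelism a single pipeline can exploit; the schedule depth (three stages here) is the pipeline's fill latency.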

Version 2 – 4 Parallel Pipelines (C1)

[Figure: four Core_Compute pipelines of the kernel's add and mul instructions in parallel, reading from lmem a, lmem b and lmem c and writing to lmem y, each side through Stream Control]
Version 3 – Scalar Instruction Processor (C4)

[Figure: a Core_Compute containing a single PE (Instruction Processor) whose ALU executes {add, add, mul, add}, reading from lmem a, lmem b and lmem c and writing to lmem y, each side through Stream Control]

The ALU would be customized for the instructions mapped to this PE at compile-time.

Version 3 – Single Sequential Processor

[Figure: the same arrangement with a Generic PE whose ALU executes {add, add, mul, add}]
Version 4 – Multiple Processors / Vectorization (C5)

[Figure: four Core_Compute blocks, each a Generic PE whose ALU executes {add, add, mul, add}, reading from lmem a, lmem b and lmem c and writing to lmem y, each side through Stream Control]

Version 4 – Multiple Sequential Processors (Vectorization)

Note the continued use of stream abstractions even though the PEs are Instruction Processors now.
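A first-order comparison of the four versions can be sketched under idealized assumptions: a pipeline accepts one work-item per cycle per lane after a fill latency, while an instruction processor executes one instruction per cycle with no overlap between work-items. These formulas, function names and sizes are hypothetical and ignore memory bandwidth and clock frequency, which the cost model treats separately.

```python
import math


def cycles_pipelines(n_items: int, depth: int, lanes: int = 1) -> int:
    """Idealized pipelines: each lane takes one item per cycle after depth-1 fill cycles."""
    return math.ceil(n_items / lanes) + depth - 1


def cycles_processors(n_items: int, n_instrs: int, pes: int = 1) -> int:
    """Idealized instruction processors: one instruction per cycle per PE, no overlap."""
    return math.ceil(n_items / pes) * n_instrs


n, depth, n_instrs = 1024, 3, 4  # hypothetical sizes
print({
    "C2 (1 pipeline)": cycles_pipelines(n, depth),
    "C1 (4 pipelines)": cycles_pipelines(n, depth, lanes=4),
    "C4 (1 processor)": cycles_processors(n, n_instrs),
    "C5 (4 processors)": cycles_processors(n, n_instrs, pes=4),
})
```

Under these assumptions the pipelined versions approach one work-item per cycle per lane, while each processor pays one cycle per instruction, so C5 with four PEs only roughly matches C2 here because the kernel has four instructions.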