CR18: Advanced Compilers L01 Introduction Tomofumi Yuki.


CR18: Advanced Compilers

L01 Introduction

Tomofumi Yuki

Myself

Tomofumi Yuki, researcher at Inria

Ph.D. from Colorado State University in 2012; up to high school in Japan; CSU for bachelor's, master's, and Ph.D.

Member of Compsys @ LIP: compilers/languages, automatic parallelization

2

This Course

Part I: High-level (loop-level) transformations: parallelism, data locality

Part II: High-Level Synthesis: C to hardware

3

Compiler Optimizations

Low-level optimizations: register allocation, instruction scheduling, constant propagation, ...

High-level optimizations: loop transformations, coarse-grained parallelism, ...

4

Our focus

High-Level Optimizations

Goals: Parallelism and Data Locality

Why Parallelism?

Why Data Locality?

Why High-Level?

5

Why Loop Transformations?

The 90/10 Rule

Loop nests: the hotspot of almost all programs; a few lines of change => huge impact; a natural source of parallelism

6

“90% of the execution time is spent in less than 10% of the source code”

Why Loop Transformations?

Which is faster?

7

for (i=0; i<N; i++)
  for (j=0; j<N; j++)
    for (k=0; k<N; k++)
      C[i][j] += A[i][k] * B[k][j];

for (i=0; i<N; i++)
  for (k=0; k<N; k++)
    for (j=0; j<N; j++)
      C[i][j] += A[i][k] * B[k][j];

Why is it Faster?

Hardware Prefetching

8

for (i=0; i<N; i++)
  for (j=0; j<N; j++)
    for (k=0; k<N; k++)
      C[i][j] += A[i][k] * B[k][j];

for (i=0; i<N; i++)
  for (k=0; k<N; k++)
    for (j=0; j<N; j++)
      C[i][j] += A[i][k] * B[k][j];

In the i-j-k version, each step of the innermost loop (k) leaves C[i][j] unchanged, moves A[i][k] to the next column, but moves B[k][j] to the next row, a large-stride access. In the i-k-j version, the innermost loop (j) leaves A[i][k] unchanged and moves both C[i][j] and B[k][j] to the next column, giving unit-stride accesses that the hardware prefetcher handles well.
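A concrete way to see the difference is to time both loop orders. The following is a minimal, self-contained C sketch (not from the slides); the matrix size N=512, the random initialization, and the use of clock() are arbitrary choices for illustration.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 512

static double A[N][N], B[N][N], C[N][N];

/* i-j-k order: innermost loop walks B down a column (stride N). */
static void mm_ijk(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
}

/* i-k-j order: innermost loop walks C and B along a row (stride 1). */
static void mm_ikj(void) {
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                C[i][j] += A[i][k] * B[k][j];
}

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = (double)rand() / RAND_MAX;
            B[i][j] = (double)rand() / RAND_MAX;
            C[i][j] = 0.0;
        }

    clock_t t0 = clock();
    mm_ijk();
    clock_t t1 = clock();
    mm_ikj();
    clock_t t2 = clock();

    printf("i-j-k: %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("i-k-j: %.3f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}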

How to Automate?

The most challenging part! The same optimization doesn’t work for:

Why?

9

for (i=0; i<N; i++) for (j=0; j<N; j++) for (k=0; k<N; k++) { C1[i][j] += A1[i][k] * B1[k][j]; C2[i][j] += A2[i][k] * B2[k][j]; C3[i][j] += A3[i][k] * B3[k][j]; C4[i][j] += A4[i][k] * B4[k][j];}

It’s Not Just Transformations

Many, many reasoning steps: what to apply? How to apply? When to apply? What is its impact?

Quality of the analysis: How long does it take? Can it potentially degrade performance? Provable properties (completeness, etc.)

10

Compiler research is all about coming up with techniques/abstractions/representations to allow the compiler to perform deep analysis.

Today’s Agenda

The Big Picture: programming languages, compilers

Basic Concepts: iteration spaces and loop nests, polyhedral domains and functions, parametric integer programming

Short history of the polyhedral model

11

Compiler Advances

Old compiler vs. recent compiler: modern architecture, different versions of gcc

How much speedup from the compiler alone, after 20 years of research?

12

Compiler Advances

Old compiler vs. recent compiler: modern architecture, different versions of gcc. About a 2x difference after 20 years (anecdotal).

Not so much?

13

14

“The most remarkable accomplishment by far of the compiler field is the widespread use of high-level languages.”

by Mary Hall, David Padua, and Keshav Pingali [Compiler Research: The Next 50 Years, 2009]

Placement of Compiler Research: Part of Programming Languages

15

[Diagram: the compiler as part of programming languages research, alongside runtime systems, program verification, type theory, program synthesis, program analysis, and program transformation.]

Earlier Accomplishments

Getting efficient assembly: register allocation, instruction scheduling, ...

High-level language features: object orientation, dynamic types, automated memory management, ...

16

New twists

New machines: SIMD, IBM Cell, GPGPU, Xeon Phi

New language features: even Java has lambda functions now; parallelism-oriented features

New types of apps: smartphones, tablets

New goals: energy and security

17

Recent research topics

Parallelism: multi-cores, GPUs, ...; language features for parallelism

Security/Reliability: verification, certified compilers

Power/Energy: data movement, voltage scaling

18

Goals of the Compiler

Higher abstraction: no more writing assembly! Enables language features:

loops, functions, classes, aspects, ...

Performance while increasing productivity: speed, space, energy, ...; compiler optimizations

19

Personal View: The compiler is there to allow lazy programming

Job Market

Where do compiler people work? IBM, MathWorks, Amazon, Apple, start-ups

Many opportunities in France: MathWorks @ Grenoble, many start-ups

20

Today’s Agenda

The Big Picture: programming languages, compilers

Basic Concepts: iteration spaces and loop nests, polyhedral domains and functions, parametric integer programming

Short history of the polyhedral model

21

Program IR

Abstract Syntax Tree: the basic representation within compilers

How to inspect the AST to determine if a loop is parallel?

22

for (i in 1..N) A[i] = B[i] + 1;

[AST figure: NodeFor (iterator=i, LB=1, UB=N) containing NodeAssignment with LHS A[i] and RHS NodeBinOp (op=+) over B[i] and 1]

Not really suitable for high-level analysis
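To make the representation concrete, here is a minimal C sketch of node types matching the tree above. It is my own illustration, not code from the course, and the layout (NodeKind, the Node struct, the children array) is a hypothetical design.

/* Hypothetical AST node layout mirroring the figure above. */
typedef enum { NODE_FOR, NODE_ASSIGN, NODE_BINOP, NODE_ARRAY_REF, NODE_CONST } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *name;         /* iterator for NODE_FOR, array name for NODE_ARRAY_REF */
    char op;                  /* operator for NODE_BINOP, e.g. '+' */
    int value;                /* literal for NODE_CONST */
    struct Node *children[3]; /* FOR: LB, UB, body; ASSIGN: lhs, rhs;
                                 BINOP: left, right; ARRAY_REF: subscript */
} Node;

/* Deciding whether "for (i in 1..N) A[i] = B[i] + 1;" is parallel means walking
   this tree, recovering the subscript functions of A and B, and reasoning about
   which iterations touch the same array cells -- information that the tree
   structure itself does not expose directly. */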

Extended Graphs

Completely unroll the loops

23

for (i=0; i<5; i++)
  for (j=1; j<4; j++) {
    A[i][j] = A[i][j-1] + B[i][j];
  }

A[0][1] = A[0][0] + B[0][1];
A[0][2] = A[0][1] + B[0][2];
A[0][3] = A[0][2] + B[0][3];
A[1][1] = A[1][0] + B[1][1];
A[1][2] = A[1][1] + B[1][2];
A[1][3] = A[1][2] + B[1][3];

....

Extended Graphs

Completely unroll the loops

The difficulty: program parameters. It is “easy” with a DAG representation, but there are scalability issues, and what if the parameters are not known?

24

for (i=0; i<N; i++)
  for (j=1; j<M; j++) {
    A[i][j] = A[i][j-1] + B[i][j];
  }

Iteration Spaces

Need an abstraction for statement instances

25

for (i=0; i<N; i++)
  for (j=1; j<M; j++) {
    A[i][j] = A[i][j-1] + B[i][j];
  }

[Iteration space figure, axes i and j]

instance = integer vector [i,j]

space = integer set: 0≤i<N and 1≤j<M

Lexicographic Order

Dictionary order applied to loop nests: a, aaa, aaaa, aab, aba, b

Compare instances: (i,j) is before (i',j') iff i<i', or i=i' and j<j'

26

[Iteration space figure, axes i and j]

for (i=1; i<N; i++)
  for (j=1; j<M; j++)
    S0;
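A minimal C sketch (my own illustration, not course code) of the abstraction just introduced: instances as integer vectors, and the lexicographic "is before" test on them.

#include <stdio.h>

/* Lexicographic "is a before b?" for d-dimensional iteration vectors. */
static int lex_before(const int *a, const int *b, int d) {
    for (int x = 0; x < d; x++) {
        if (a[x] < b[x]) return 1;
        if (a[x] > b[x]) return 0;
    }
    return 0; /* equal vectors: not strictly before */
}

int main(void) {
    int s[2] = {2, 5}, t[2] = {3, 1};
    /* (2,5) is before (3,1) because 2 < 3, regardless of the j component. */
    printf("(2,5) before (3,1)? %d\n", lex_before(s, t, 2));
    printf("(3,1) before (2,5)? %d\n", lex_before(t, s, 2));
    return 0;
}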

What is the Polyhedral Model? It Depends (on who you ask)

If you ask me... a compiler intermediate representation (IR): linear-algebra based, a compact representation, takes advantage of regularities

27

Polyhedral Representation

High-level abstraction of the program. Iteration space: integer polyhedron. Dependences: affine functions.

Usual optimization flow:
1. extract the polyhedral representation
2. reason about / transform the model
3. generate code at the end

28

Polyhedral Domains

Statement instances as integer polyhedra

Example: N²/2 instances of S0, denoted S0<i,j>

Represented as the polyhedron {i,j | 1≤i<N, 1≤j≤i}; geometric view:

29

for (i=1; i<N; i++)
  for (j=1; j<=i; j++)
    S0;

[Figure: triangular iteration space in (i,j), bounded by 1≤i, i<N, 1≤j, j≤i]
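As a quick sanity check on the N²/2 claim, here is a small brute-force C sketch (not from the slides; the sample value N=10 is arbitrary) that enumerates the domain and compares the count with N(N-1)/2.

#include <stdio.h>

int main(void) {
    int N = 10;                      /* arbitrary sample parameter value */
    long count = 0;
    for (int i = 1; i < N; i++)      /* 1 <= i < N  */
        for (int j = 1; j <= i; j++) /* 1 <= j <= i */
            count++;                 /* one instance S0<i,j> */
    printf("N=%d: %ld instances (N*(N-1)/2 = %d)\n", N, count, N * (N - 1) / 2);
    return 0;
}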

Examples (Domains)

What are the domains of these statements?

30

for (i=0; i<=N; i++) {
  for (j=0; j<=M; j++) {
    S1;
  }
  S2;
}

for (i=0; i<=N; i++) {
  for (j=M; j>=0; j--) {
    S1;
  }
}

for (i=0; i<=N; i++) {
  for (j=0; j<=M; j+=2) {
    S1;
  }
}

for (i=0; i<=N; i++) {
  for (j=0; j<=M; j++) {
    if (j>i) S1;
  }
}

Z-Polyhedron

Polyhedron with holes: intersection with lattices; image of a domain by an affine function

Just a polyhedron in a higher-dimensional space

31

0≤i≤N and i%2=0

0≤i≤N and i=2j

[Figure: the even points 0, 2, 4, ... on the i axis, seen as the image of the line i=2j in (i,j) space]
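A small C sketch (my own illustration; the sample value N=10 is arbitrary) showing the two equivalent views of the same Z-polyhedron: the modulo constraint, and the image of an ordinary polyhedron living one dimension higher.

#include <stdio.h>

int main(void) {
    int N = 10; /* arbitrary sample parameter value */

    /* View 1: polyhedron intersected with a lattice (holes via i % 2 == 0). */
    printf("0<=i<=%d and i%%2==0:", N);
    for (int i = 0; i <= N; i++)
        if (i % 2 == 0) printf(" %d", i);
    printf("\n");

    /* View 2: ordinary polyhedron in one higher dimension, projected on i. */
    printf("image of {(i,j) | i=2j, 0<=i<=%d} on i:", N);
    for (int j = 0; 2 * j <= N; j++)
        printf(" %d", 2 * j);
    printf("\n");
    return 0;
}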

Dependence Functions

Affine functions over statement instances

Dataflow (i,j→i,j+1)

Dependence (i,j→i,j-1)

32

for (i=1; i<N; i++)
  for (j=1; j<M; j++)
S0: A[i][j] = A[i][j-1];

[Iteration space figure, axes i and j]

Dependence Functions

Dependences can be domain-qualified

Dataflow:
  (i,j → i+1,1) if j=M-1
  (i,j → i,j+1) otherwise

33

for (i=1; i<N; i++)
  for (j=1; j<M; j++)
S0: v++;

[Iteration space figure, axes i and j]

Composing Transformations

Key strength of the framework

35

[Figure: loop nests in the "loop world" (for i / for j ...; for j / for i ...; for j / for i' ... for i'' ...) correspond to polyhedral representations poly and poly' at the abstraction level, linked by transformations T1 and T2 that compose.]

Parametric Analysis

Real-world code is filled with parameters: code for an N×M matrix, not 100×200

If the code is not parametric, and compilation time is not a big deal, it is an “easy” problem

Dealing with (potentially) infinitely many different executions of a program

36

What is the last iteration?

Key analysis

What is the instance that last wrote to A[k]?

Can be formulated as an ILP: 0<i<N, 0<j≤i, i+j=k; find the lexicographically maximum (i,j). Many analysis questions become ILPs for regular programs.

37

for (i=1; i<N; i++)
  for (j=1; j<=i; j++)
S0: A[i+j] = ...;
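What an ILP/PIP solver answers symbolically can be cross-checked by brute force for fixed parameter values. The following C sketch (not from the slides; the sample values N=8 and k=9 are arbitrary) scans the iteration space of S0 in lexicographic order and reports the last instance that writes A[k].

#include <stdio.h>

int main(void) {
    int N = 8, k = 9;   /* arbitrary sample values of the parameters */
    int last_i = -1, last_j = -1;

    /* Lexicographic scan of the iteration space of S0. */
    for (int i = 1; i < N; i++)
        for (int j = 1; j <= i; j++)
            if (i + j == k) {       /* this instance writes A[k] */
                last_i = i;
                last_j = j;
            }

    if (last_i >= 0)
        printf("A[%d] last written by S0<%d,%d>\n", k, last_i, last_j);
    else
        printf("A[%d] is never written for N=%d\n", k, N);
    return 0;
}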

Parametric Integer Programming

Constraints: j≤10, i+j≤10, j-i≤N, i,j≥0, N>0

Objective: maximize j

Parametric solution: (0,N) if N≥10; (N,N) if N<10

38

[Figure: feasible region bounded by j≤10, i+j≤10, and j-i≤N, with the direction of maximization shown]

Parametric Integer Programming

Constraints: j≤10, i+j≤10, j-i≤N, i,j≥0, N>0

Objective: maximize j

Parametric solution: (0,N) if N≥10 (N-j+i≥0); (N,N) if N<10 (N-j+i<0)

39


1. Look at the sign of constraints
2. Create branches for each case

Today’s Agenda

The Big Picture: programming languages, compilers

Basic Concepts: iteration spaces and loop nests, polyhedral domains and functions, parametric integer programming

Short history of the polyhedral model

40

History of the Polyhedral Model (also the layout for Part I of the class)

Keep in mind: history is not objective

41

Origins of the Polyhedral Model Two Starting Points

Loop program analysis; systems of recurrence equations

Loop view: is this loop parallel? What are the dependences?

Equational view: is this system of equations executable? How to find legal schedules?

42

Polyhedral Timeline

43

[Timeline, roughly 1970 to the 2000s: recurrence equations and systolic arrays; loop dependence analysis and loop transformation; Parametric Integer Programming (1988); Array Dataflow Analysis (1991); then scheduling, code generation, and memory allocation, targeting multi-core, GPGPU, and distributed memory.]

Polyhedral Model: Short Story

44

[Figure, from a (very) subjective point of view (originally by Steven Derrien): tools Polylib and PIP (early 90s), Cloog (2003), Pluto (2008); applications ranging from massively parallel processor arrays (VLSI), multi-dimensional process networks for system-level design (MPSoC), and memory optimization for embedded multimedia, to loop transformations for HLS (FPGA) and automatic parallelization for shared and distributed memory machines in the multi-core/GPU era.]

Polyhedral Equational Model

Idea: map computations to code/hardware; computations are specified as equations

Example: Matrix Multiply

45

for i in 0 .. P
  for j in 0 .. Q
    for k in 0 .. R
      C[i][j] += A[i][k] * B[k][j];

C[i,j,k] = A[i,k] * B[k,j]               : if k=0
         = A[i,k] * B[k,j] + C[i,j,k-1]  : if k>0

C[i,j] = Σk(A[i,k] * B[k,j]);
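As an operational reading of the equations, here is a minimal C sketch (my own, not course code; P, Q, R are small arbitrary sizes) that evaluates the single-assignment form directly, keeping the extra k dimension so that each C[i,j,k] is written exactly once.

#include <stdio.h>

#define P 3
#define Q 3
#define R 3

int main(void) {
    double A[P + 1][R + 1], B[R + 1][Q + 1], C[P + 1][Q + 1][R + 1];

    /* Sample inputs (arbitrary values). */
    for (int i = 0; i <= P; i++)
        for (int k = 0; k <= R; k++)
            A[i][k] = i + k;
    for (int k = 0; k <= R; k++)
        for (int j = 0; j <= Q; j++)
            B[k][j] = k - j;

    /* Single-assignment evaluation of the equations:
     *   C[i,j,k] = A[i,k]*B[k,j]               if k = 0
     *   C[i,j,k] = A[i,k]*B[k,j] + C[i,j,k-1]  if k > 0   */
    for (int i = 0; i <= P; i++)
        for (int j = 0; j <= Q; j++)
            for (int k = 0; k <= R; k++)
                C[i][j][k] = A[i][k] * B[k][j] + (k > 0 ? C[i][j][k - 1] : 0.0);

    /* The final result C[i,j] is C[i,j,R]. */
    printf("C[1][2] = %g\n", C[1][2][R]);
    return 0;
}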

The Connection

Array Dataflow Analysis [Feautrier 1991]

Converts loops to equations; limited to affine loops

domain: {[i,j,k] : 0≤i≤P ∧ 0≤j≤Q ∧ 0≤k≤R}

dependences: S0<i,j,k> → S0<i,j,k-1>; dataflow: (i,j,k → i,j,k+1)

46

for i in 0 .. P
  for j in 0 .. Q
    for k in 0 .. R
S0:   C[i][j] += A[i][k] * B[k][j];

Next Time

Dependence analysis; array dataflow analysis; legality of transformations

47