CR18: Advanced Compilers L01 Introduction Tomofumi Yuki.
CR18: Advanced Compilers
L01 Introduction
Tomofumi Yuki
Myself
Tomofumi Yuki, researcher at Inria
  Ph.D. from Colorado State University in 2012
    up to high school in Japan
    CSU for all of bachelor's, master's, and Ph.D.
  Member of Compsys @ LIP
    compilers/languages
    automatic parallelization
This Course
Part I: High-level (loop-level) transformations
  parallelism
  data locality
Part II: High-Level Synthesis
  C to hardware
Compiler Optimizations
Low-level Optimizations
  register allocation
  instruction scheduling
  constant propagation
  ...
High-level Optimizations
  loop transformations
  coarse grained parallelism
  ...
Our focus
High-Level Optimizations
Goals: Parallelism and Data Locality
Why Parallelism?
Why Data Locality?
Why High-Level?
Why Loop Transformations?
The 90/10 Rule
  “90% of the execution time is spent in less than 10% of the source code”
Loop Nests
  hotspot of almost all programs
  few lines of change => huge impact
  natural source of parallelism
Why Loop Transformations?
Which is faster?
  for (i=0; i<N; i++)
    for (j=0; j<N; j++)
      for (k=0; k<N; k++)
        C[i][j] += A[i][k] * B[k][j];

  for (i=0; i<N; i++)
    for (k=0; k<N; k++)
      for (j=0; j<N; j++)
        C[i][j] += A[i][k] * B[k][j];
Why is it Faster?
Hardware Prefetching
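  for (i=0; i<N; i++)
    for (j=0; j<N; j++)
      for (k=0; k<N; k++)
        C[i][j] += A[i][k] * B[k][j];
  Each innermost iteration: C[i][j] unchanged, A[i][k] next col, B[k][j] next row

  for (i=0; i<N; i++)
    for (k=0; k<N; k++)
      for (j=0; j<N; j++)
        C[i][j] += A[i][k] * B[k][j];
  Each innermost iteration: C[i][j] next col, A[i][k] unchanged, B[k][j] next col

C stores each row contiguously, so "next col" is a sequential access that the hardware prefetcher can follow, while "next row" jumps N elements ahead.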
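The two loop orders can be written side by side as a minimal sketch. The fixed size SIZE and the function names are assumptions made for illustration; they do not come from the slides. Both functions accumulate each C[i][j] over k in the same order, so they produce the same result and differ only in how memory is traversed.

```c
enum { SIZE = 4 };  /* hypothetical small size, for illustration only */

/* (i, j, k) order: the innermost loop moves down a column of B,
   touching a different row of B at every iteration. */
void matmul_ijk(double A[SIZE][SIZE], double B[SIZE][SIZE],
                double C[SIZE][SIZE]) {
    for (int i = 0; i < SIZE; i++)
        for (int j = 0; j < SIZE; j++)
            for (int k = 0; k < SIZE; k++)
                C[i][j] += A[i][k] * B[k][j];
}

/* (i, k, j) order: the innermost loop moves along a row of both C and B,
   so consecutive iterations touch consecutive memory locations. */
void matmul_ikj(double A[SIZE][SIZE], double B[SIZE][SIZE],
                double C[SIZE][SIZE]) {
    for (int i = 0; i < SIZE; i++)
        for (int k = 0; k < SIZE; k++)
            for (int j = 0; j < SIZE; j++)
                C[i][j] += A[i][k] * B[k][j];
}
```

Both functions expect C to be zero-initialized by the caller. Any timing difference only becomes visible for sizes much larger than the one used here.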
How to Automate?
The most challenging part!
The same optimization doesn’t work for:
  for (i=0; i<N; i++)
    for (j=0; j<N; j++)
      for (k=0; k<N; k++) {
        C1[i][j] += A1[i][k] * B1[k][j];
        C2[i][j] += A2[i][k] * B2[k][j];
        C3[i][j] += A3[i][k] * B3[k][j];
        C4[i][j] += A4[i][k] * B4[k][j];
      }
Why?
It’s Not Just Transformations
Many, many reasoning steps:
  What to apply?
  How to apply?
  When to apply?
  What is its impact?
Quality of the analysis:
  How long does it take?
  Can it potentially degrade performance?
  Provable properties (completeness, etc.)
Compiler Research is all about coming up with techniques/abstractions/representations to allow the compiler to perform deep analysis.
Today’s Agenda
The Big Picture
  programming language
  compilers
Basic Concepts
  iteration space and loop nests
  polyhedral domains and functions
  parametric integer programming
Short history of polyhedral model
Compiler Advances
Old compiler vs recent compiler
  modern architecture
  different versions of gcc
How much speedup by compiler alone after 20 years of research?
Compiler Advances
Old compiler vs recent compiler
  modern architecture
  different versions of gcc
  2x difference after 20 years (anecdotal)
Not so much?
“The most remarkable accomplishment by far of the compiler field is the widespread use of high-level languages.”
by Mary Hall, David Padua, and Keshav Pingali [Compiler Research: The Next 50 Years, 2009]
Placement of Compiler Research
Part of Programming Languages
[Figure: compilers shown within programming languages research, alongside runtime systems, program verification, type theory, program synthesis, program analysis, and program transformation]
Earlier Accomplishments
Getting efficient assembly
  register allocation
  instruction scheduling
  ...
High-level language features
  object-orientation
  dynamic types
  automated memory management
  ...
New twists
New machines
  SIMD, IBM Cell, GPGPU, Xeon Phi
New language features
  even Java has lambda functions now
  parallelism oriented features
New types of Apps
  smartphones, tablets
New goals
  energy and security
Recent research topics
Parallelism
  multi-cores, GPUs, ...
  language features for parallelism
Security/Reliability
  verification
  certified compilers
Power/Energy
  data movement
  voltage scaling
Goals of the Compiler
Higher abstraction
  No more writing assembly!
  enables language features
    loops, functions, classes, aspects, ...
Performance while increasing productivity
  speed, space, energy, ...
  compiler optimizations
Personal View: Compiler is there to allow lazy programming
Job Market
Where do they work?
  IBM
  Mathworks
  Amazon
  start-ups
  Apple
Many opportunities in France
  Mathworks @ Grenoble
  Many start-ups
Program IR
Abstract Syntax Tree
  basic representation within compilers
  how to inspect the AST to determine if a loop is parallel?

  for (i in 1..N)
    A[i] = B[i] + 1;

  NodeFor (iterator=i, LB=1, UB=N)
    NodeAssignment
      A[i]
      NodeBinOp (op=+)
        B[i]
        1

Not really suitable for high-level analysis
Extended Graphs
Completely unroll the loops

  for (i=0; i<5; i++)
    for (j=1; j<4; j++) {
      A[i][j] = A[i][j-1] + B[i][j];
    }

  A[0][1] = A[0][0] + B[0][1];
  A[0][2] = A[0][1] + B[0][2];
  A[0][3] = A[0][2] + B[0][3];
  A[1][1] = A[1][0] + B[1][1];
  A[1][2] = A[1][1] + B[1][2];
  A[1][3] = A[1][2] + B[1][3];
  ....
Extended Graphs
Completely unroll the loops
The difficulty: program parameters
  it’s “easy” with DAG representation
  scalability issues
  what if parameters are not known?

  for (i=0; i<N; i++)
    for (j=1; j<M; j++) {
      A[i][j] = A[i][j-1] + B[i][j];
    }
Iteration Spaces
Need an abstraction for statement instances

  for (i=0; i<N; i++)
    for (j=1; j<M; j++) {
      A[i][j] = A[i][j-1] + B[i][j];
    }

  instance = integer vector [i,j]
  space = integer set 0≤i<N and 1≤j<M
[Figure: the iteration space on i and j axes]
Lexicographic Order
Dictionary order applied to loop nests
  e.g. a, aaa, aaaa, aab, aba, b
Compare instances
  (i,j) is before (i’,j’) if i<i’ or (i=i’ and j<j’)

  for (i=1; i<N; i++)
    for (j=1; j<M; j++)
      S0;
[Figure: the iteration space on i and j axes]
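The comparison rule for two-dimensional instances can be sketched directly in C. The function name lex_before is an assumption chosen for this illustration.

```c
#include <stdbool.h>

/* Returns true when instance (i, j) executes strictly before (ip, jp)
   in a two-deep loop nest: compare the outer index first and use the
   inner index only to break ties. */
bool lex_before(int i, int j, int ip, int jp) {
    return i < ip || (i == ip && j < jp);
}
```

For example, lex_before(1, 5, 2, 1) holds because the outer index decides, whatever the inner indices are.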
What is the Polyhedral Model? It Depends (on who you ask)
If you ask me...
  Compiler Intermediate Representation (IR)
  linear algebra based
  compact representation
  takes advantage of regularities
Polyhedral Representation
High-level abstraction of the program
  Iteration space: integer polyhedron
  Dependences: affine functions
Usual optimization flow
  1. extract polyhedral representation
  2. reason/transform the model
  3. generate code in the end
Polyhedral Domains
Statement instances as integer polyhedra

  for (i=1; i<N; i++)
    for (j=1; j<=i; j++)
      S0;

Example:
  about N²/2 instances of S0 (exactly N(N-1)/2)
  Denoted as S0<i,j>
  Represented as polyhedron {i,j|1≤i<N, 1≤j≤i}
  Geometric view
[Figure: the triangle bounded by 1≤i, i<N, 1≤j, and j≤i on i and j axes]
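As a small sanity check on the domain above, the sketch below counts the instances of S0 by running the loop nest and tests membership in the set {i,j|1≤i<N, 1≤j≤i}. The function names are assumptions for illustration only.

```c
#include <stdbool.h>

/* Counts the instances of S0 by executing the loop nest with an empty body. */
long count_instances(long n) {
    long count = 0;
    for (long i = 1; i < n; i++)
        for (long j = 1; j <= i; j++)
            count++;
    return count;
}

/* Membership test for the polyhedron {i,j | 1<=i<n, 1<=j<=i}. */
bool in_domain(long i, long j, long n) {
    return 1 <= i && i < n && 1 <= j && j <= i;
}
```

The count matches the closed form N(N-1)/2, which is the number of integer points in the triangle.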
Examples (Domains)
What are the domains of these statements?

  for (i=0; i<=N; i++) {
    for (j=0; j<=M; j++) {
      S1;
    }
    S2;
  }

  for (i=0; i<=N; i++) {
    for (j=M; j>=0; j--) {
      S1;
    }
  }

  for (i=0; i<=N; i++) {
    for (j=0; j<=M; j+=2) {
      S1;
    }
  }

  for (i=0; i<=N; i++) {
    for (j=0; j<=M; j++) {
      if (j>i)
        S1;
    }
  }
Z-Polyhedron
Polyhedron with holes
  intersection with lattices
  image of domain by affine function
Just a polyhedron in higher dimensional space
  0<=i<=N and i%2=0
  0<=i<=N and i=2j
[Figure: the points satisfying i=2j on i and j axes]
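The two descriptions of the same set can be compared with a minimal sketch: one tests the modulo constraint directly, the other searches for the extra dimension j with i=2j. The function names are assumptions for illustration.

```c
#include <stdbool.h>

/* Direct description: 0 <= i <= n and i % 2 == 0. */
bool in_z_polyhedron(int i, int n) {
    return 0 <= i && i <= n && i % 2 == 0;
}

/* Higher-dimensional description: there exists j with 0 <= i <= n and i == 2j.
   The loop enumerates every j for which 2j stays within [0, n]. */
bool in_projected_polyhedron(int i, int n) {
    for (int j = 0; 2 * j <= n; j++)
        if (i == 2 * j)
            return true;
    return false;
}
```

Both predicates accept exactly the even values in [0, n]; the second one uses only affine constraints once j is added as a dimension.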
Dependence Functions
Affine functions over statement instances

  for (i=1; i<N; i++)
    for (j=1; j<M; j++)
      S0: A[i][j] = A[i][j-1];

  Dataflow (i,j→i,j+1)
  Dependence (i,j→i,j-1)
[Figure: the iteration space on i and j axes]
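The dependence function (i,j→i,j-1) can be checked against an actual execution with a small sketch. It records which instance last wrote each array cell while running the loop nest, then compares that with what the function predicts. The sizes ROWS and COLS and the function names are assumptions for illustration.

```c
#include <stdbool.h>

enum { ROWS = 5, COLS = 6 };  /* hypothetical values for N and M */

/* Dependence function from the slide: the value read by instance (i, j)
   is produced by instance (i, j-1). Returns false when that source lies
   outside the domain, i.e. the value read is an input of the program. */
bool producer(int i, int j, int *pi, int *pj) {
    if (j - 1 < 1)
        return false;
    *pi = i;
    *pj = j - 1;
    return true;
}

/* Runs the loop nest, tracking the last writer of every cell, and checks
   that each read of A[i][j-1] agrees with producer(). */
bool check_dependences(void) {
    /* writer_i == 0 means "never written", since every instance has i >= 1 */
    int writer_i[ROWS][COLS] = {{0}};
    int writer_j[ROWS][COLS] = {{0}};
    for (int i = 1; i < ROWS; i++)
        for (int j = 1; j < COLS; j++) {
            int pi = 0, pj = 0;
            bool has_producer = producer(i, j, &pi, &pj);
            bool was_written = writer_i[i][j - 1] != 0;
            if (has_producer != was_written)
                return false;
            if (has_producer &&
                (writer_i[i][j - 1] != pi || writer_j[i][j - 1] != pj))
                return false;
            writer_i[i][j] = i;  /* S0 writes A[i][j] */
            writer_j[i][j] = j;
        }
    return true;
}
```

This brute-force check only works for fixed sizes; the point of the affine function is that it describes the same information for all N and M at once.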
Dependence Functions
Dependences can be domain qualified

  for (i=1; i<N; i++)
    for (j=1; j<M; j++)
      S0: v++;

  Dataflow
    if j=M-1: (i,j→i+1,1)
    else:     (i,j→i,j+1)
[Figure: the iteration space on i and j axes]
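The piecewise dataflow for the scalar v can be sketched as a function with one branch per domain piece. A second function walks the loop nest to confirm that each instance is indeed followed by the predicted one. The names are assumptions for illustration.

```c
#include <stdbool.h>

/* Domain-qualified dataflow: the value of v produced at (i, j) is consumed
   by (i+1, 1) at the end of a row (j == m-1), and by (i, j+1) otherwise. */
void next_instance(int i, int j, int m, int *ni, int *nj) {
    if (j == m - 1) {
        *ni = i + 1;
        *nj = 1;
    } else {
        *ni = i;
        *nj = j + 1;
    }
}

/* Walks the loop nest and checks that every executed instance is the one
   predicted by next_instance() from its predecessor. */
bool matches_loop_order(int n, int m) {
    int expected_i = 1, expected_j = 1;  /* first instance of the nest */
    for (int i = 1; i < n; i++)
        for (int j = 1; j < m; j++) {
            if (i != expected_i || j != expected_j)
                return false;
            next_instance(i, j, m, &expected_i, &expected_j);
        }
    return true;
}
```

For the very last instance, the function returns a point outside the domain, which corresponds to a value of v that is not consumed inside the loop nest.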
Composing Transformations
Key strength of the framework
[Figure: in the loop world, "for i for j ..." becomes "for j for i ..." and then "for j for i’ ... for i’’ ..."; in the abstraction, the same steps are transformations T1 and T2 that take poly to poly’]
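One way to see why composition is cheap in the abstraction is to treat each transformation as an integer matrix acting on iteration vectors, so that applying T1 and then T2 is a single matrix product. The sketch below is a simplification: it uses loop interchange and loop skewing as the two transformations, which is an assumption for illustration (the T2 in the figure appears to split a loop, which is not a 2x2 matrix). All type and function names are hypothetical.

```c
/* A linear transformation on 2-D iteration vectors, and a point (i, j). */
typedef struct { int m[2][2]; } Transform;
typedef struct { int v[2]; } Point;

/* Applies t to p: result = t * p. */
Point apply(Transform t, Point p) {
    Point r;
    for (int row = 0; row < 2; row++)
        r.v[row] = t.m[row][0] * p.v[0] + t.m[row][1] * p.v[1];
    return r;
}

/* Composes two transformations: compose(t2, t1) applies t1 first, then t2,
   which is the matrix product t2 * t1. */
Transform compose(Transform t2, Transform t1) {
    Transform r;
    for (int row = 0; row < 2; row++)
        for (int col = 0; col < 2; col++)
            r.m[row][col] = t2.m[row][0] * t1.m[0][col]
                          + t2.m[row][1] * t1.m[1][col];
    return r;
}

/* Loop interchange: (i, j) -> (j, i). */
static const Transform INTERCHANGE = {{{0, 1}, {1, 0}}};
/* Loop skewing: (i, j) -> (i, i + j). */
static const Transform SKEW = {{{1, 0}, {1, 1}}};
```

Applying compose(SKEW, INTERCHANGE) to a point gives the same result as applying INTERCHANGE and then SKEW, without ever generating the intermediate loop nest.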
Parametric Analysis
Real-world code is filled with parameters
  code for NxM matrix, not 100x200
If the code is not parametric, and compilation time is not a big deal, it is an “easy” problem
Dealing with (potentially) infinitely many different executions of a program
What is the last iteration?
Key analysis

  for (i=1; i<N; i++)
    for (j=1; j<=i; j++)
      S0: A[i+j] = ...;

What is the instance that last wrote to A[k]?
Can be formulated as an ILP
  0<i<N, 0<j<=i, i+j=k
  find the lexicographically maximum (i,j)
Many analysis questions become ILP for regular programs
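For fixed N and k, the last writer of A[k] can be found by brute force, simply by running the loop nest and remembering the latest matching instance. The sketch below also includes a closed form in terms of N and k; that closed form was derived by hand for this illustration and is not taken from the slides. It is the kind of answer a parametric solver would return. Function names are assumptions.

```c
#include <stdbool.h>

/* Brute force: the loop nest runs in lexicographic order, so the last
   instance seen with i + j == k is the lexicographic maximum. */
bool last_writer_brute(int n, int k, int *li, int *lj) {
    bool found = false;
    for (int i = 1; i < n; i++)
        for (int j = 1; j <= i; j++)
            if (i + j == k) {
                *li = i;
                *lj = j;
                found = true;
            }
    return found;
}

/* Hand-derived closed form (an assumption of this sketch):
   A[k] is written only for 2 <= k <= 2n-2; the last writer takes the
   largest feasible i, which is k-1 when k <= n and n-1 otherwise. */
bool last_writer_closed(int n, int k, int *li, int *lj) {
    if (k < 2 || k > 2 * n - 2)
        return false;
    *li = (k <= n) ? k - 1 : n - 1;
    *lj = k - *li;
    return true;
}
```

The brute-force version needs concrete values of N and k, while the closed form covers every value at once, split into cases on the parameters.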
Parametric Integer Programming
Constraints
  j≤10, i+j≤10
  j-i≤N
  i,j≥0, N>0
Objective
  maximize j
Parametric Solution
  (0,N) if N≥10
  (N,N) if N<10
[Figure: the region bounded by j≤10, j-i≤N, and i+j≤10, with the direction of maximization]
Parametric Integer Programming
Constraints
  j≤10, i+j≤10
  j-i≤N
  i,j≥0, N>0
Objective
  maximize j
Parametric Solution
  (0,N) if N≥10 (N-j+i≥0)
  (N,N) if N<10 (N-j+i<0)
1. Look at the sign of constraints
2. Create branches for each case
[Figure: the region bounded by j≤10, j-i≤N, and i+j≤10, with the direction of maximization]
History of the Polyhedral Model
Also layout for Part I of the class
Keep in mind history is not objective
Origins of the Polyhedral Model
Two Starting Points
  Loop program analysis
  Systems of recurrence equations
Loop-view
  is this loop parallel?
  what are the dependences?
Equational-view
  is this system of equations executable?
  how to find legal schedules?
Polyhedral Timeline
[Figure: timeline with marks at 1970, 1990, and 2000. Early threads: recurrence equations and systolic arrays; loop dependence analysis and loop transformation. Milestones: Parametric Integer Programming (1988), Array Dataflow Analysis (1991). Topics: Scheduling, Code Generation, Memory Allocation. Targets: multi-core, GPGPU, Distributed Memory]
Polyhedral Model: Short Story
From a (very) subjective point of view … (originally by Steven Derrien)
[Figure: tools: Polylib, PIP (early 90s), Cloog (2003), Pluto (2008). Targets: VLSI, FPGA, MPSoC, GPU, Multi-core. Topics: Massively parallel Processor Arrays; Automatic parallelization for shared and distributed memory machines; Multi-dimensional Process Networks for System Level Design; Memory optimization for embedded multimedia; Loop transformations for HLS; Multi-core era]
Polyhedral Equational Model
Idea: Map computations to code/hardware
  computations specified as equations
Example: Matrix Multiply

  for i in 0 .. P
    for j in 0 .. Q
      for k in 0 .. R
        C[i][j] += A[i][k] * B[k][j];

  C[i,j,k] = A[i,k] * B[k,j]              : if k=0
           = A[i,k] * B[k,j] + C[i,j,k-1] : if k>0

  C[i,j] = Σk(A[i,k]*B[k,j]);
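The loop form and the recurrence form can be compared with a minimal sketch. The equational version stores every partial sum C[i,j,k] in its own cell, so no memory location is overwritten, and the final result is the value at k=R. The bounds P, Q, R are hypothetical small constants, the loops are inclusive as in the slide, and the function names are assumptions for illustration.

```c
enum { P = 2, Q = 3, R = 4 };  /* hypothetical inclusive upper bounds */

/* Loop version: accumulates in place; C must be zero-initialized. */
void matmul_loops(int A[P + 1][R + 1], int B[R + 1][Q + 1],
                  int C[P + 1][Q + 1]) {
    for (int i = 0; i <= P; i++)
        for (int j = 0; j <= Q; j++)
            for (int k = 0; k <= R; k++)
                C[i][j] += A[i][k] * B[k][j];
}

/* Equational version: acc[i][j][k] follows the recurrence
     acc[i,j,k] = A[i,k] * B[k,j]                  if k = 0
                = A[i,k] * B[k,j] + acc[i,j,k-1]   if k > 0
   and every cell is written exactly once. */
void matmul_equations(int A[P + 1][R + 1], int B[R + 1][Q + 1],
                      int C[P + 1][Q + 1]) {
    int acc[P + 1][Q + 1][R + 1];
    for (int i = 0; i <= P; i++)
        for (int j = 0; j <= Q; j++) {
            for (int k = 0; k <= R; k++) {
                int product = A[i][k] * B[k][j];
                acc[i][j][k] = (k == 0) ? product
                                        : product + acc[i][j][k - 1];
            }
            C[i][j] = acc[i][j][R];  /* the last partial sum is the result */
        }
}
```

The equational form makes the dependence on k-1 explicit, at the cost of an extra dimension of storage that a later memory allocation step would be expected to remove.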
The Connection
Array Dataflow Analysis [Feautrier 1991]
  convert loops to equations
  limited to affine loops

  for i in 0 .. P
    for j in 0 .. Q
      for k in 0 .. R
        S0: C[i][j] += A[i][k] * B[k][j];

  domain: {[i,j,k] : 0≤i≤P ∧ 0≤j≤Q ∧ 0≤k≤R}
  dependences: S0<i,j,k> → S0<i,j,k-1>
  dataflow: (i,j,k→i,j,k+1)
Next Time
Dependence Analysis
Array Dataflow Analysis
Legality of transformations