Advanced Compilers Code Generation - CNUplas.cnu.ac.kr/courses/2017f/a_compilers/ac 7 code...
Backend of Compilers
[Figure: backend stages. Boxes: Machine-independent Optimization, Instruction Scheduling, Register Allocation, Instruction Selection, Machine Code Emission/Optimization. Phase labels: Machine-independent Optimization; Virtual to physical Mapping / Machine-dependent Optimization]
Backend = Code generation + Optimization
Code Generation
• Storage Management
• Exception Handling
• Instruction Selection
• Register Allocation
Management of Storage
• In compiler-generated machine code, the memory-management code plays a critical role.
#include <stdio.h>
void main(){
    int i;
    printf("Hello, CSE!\n");
}
Compiled output:
        .file   "s09.c"
        .section .rodata
.LC0:
        .string "Hello, CSE!"
        .text
        .globl  main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $20, %esp
        movl    $.LC0, (%esp)
        call    puts
        addl    $20, %esp
        popl    %ebp
        ret
        .size   main, .-main
Note: "subl $20, %esp" allocates stack memory for int i and the string parameter.
2 Classes of Storage in Process
• Registers
– Fast access
– Invisible to users (programmers) in most cases
– NO indirect access is allowed
• Memory
– (relatively) Slow access, indirect accesses are allowed
– Candidates: Globals/statics, Composite types (structs, arrays..) ,
Variables accessed via ‘&’ operator
* Whether a variable becomes a register or a memory variable should be
decided during the HIR-to-LIR translation
4 Categories of Memory
• Code space : an area of memory for instruction sequence
– read-only, if possible
• Static (or Global) – an area of memory for a set of variables
with the same life time as the program
• Stack – an area of memory for a set of local variables (with
“block” life time)
• Heap – an area of memory allocated dynamically at run time
(via malloc, new, etc.)
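As a rough illustration of the four categories, the C sketch below places one object in each of them. The names next_id, call_count and make_squares are made up for this example and do not come from the slides.

```c
#include <stdlib.h>

/* Static (global) area: lives as long as the program. */
static int call_count = 0;

/* Code space: the instructions of next_id() themselves live there. */
int next_id(void) {
    int local = call_count + 1; /* stack: exists only during this call */
    call_count = local;         /* the static variable keeps the value */
    return local;
}

/* Heap: the array outlives the call that created it.
 * The caller owns the result and must free() it. */
int *make_squares(int n) {
    if (n <= 0) return NULL;
    int *squares = malloc((size_t)n * sizeof *squares);
    if (squares == NULL) return NULL;
    for (int i = 0; i < n; i++) squares[i] = i * i;
    return squares;
}
```

Calling next_id() twice yields 1 and then 2, because call_count survives between calls while local is re-created in each frame.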
Memory Organization
[Figure: memory regions drawn in this order: Code, Static Data, Stack, . . . , Heap]
• code, static data : fixed sizes (determined by the compiler)
• stack, heap : variable sizes at runtime
• Stack: grows upward
• Heap: grows downward
• The relative positions of stack/heap might be switched
• Run-time stack
– A stack made of frames
• one frame (or an activation record) for each function call
– Activation record : the execution environment of the
corresponding function call
• Each call has one frame even for recursive calls
• contents: local variables, arguments, return values, other temporary
storage ...
• Heap allocation
– a contiguous portion of the global area, obtained from the OS
– operations to request and to return memory during program
execution are necessary;
– otherwise, garbage collection should be supported by the programming
language
• keeps the available memory divided into a free section and an in-use
section (see OS textbook!)
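The rule "each call has one frame, even for recursive calls" can be checked with a small C sketch. It is only an illustration under the assumption of an ordinary stack-based implementation; frame_local, descend and count_live_frames are hypothetical names.

```c
#define MAX_DEPTH 16

/* Address of the local variable in each live activation. */
static const int *frame_local[MAX_DEPTH];

/* Recurse to max_depth. At the deepest call all activations are still
 * live, so count how many recorded locals are distinct objects that
 * still hold the value stored by their own activation. */
static int descend(int depth, int max_depth) {
    int local = depth;              /* one copy per activation record */
    frame_local[depth] = &local;
    if (depth + 1 < max_depth)
        return descend(depth + 1, max_depth);

    int distinct = 0;
    for (int i = 0; i < max_depth; i++) {
        int ok = (*frame_local[i] == i);
        for (int j = 0; j < i; j++)
            if (frame_local[i] == frame_local[j]) ok = 0;
        distinct += ok;
    }
    return distinct;
}

int count_live_frames(int max_depth) {
    if (max_depth < 1 || max_depth > MAX_DEPTH) return 0;
    return descend(0, max_depth);
}
```

count_live_frames(5) reports 5: the same function is active five times, and each activation has its own copy of local.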
Initial Stack Frame (startup state)
• Command line arguments: argc, argv
• Environment variables (env)
<Initial stack layout for ELF binaries>
(bottom of the stack first; addresses decrease toward argc at the top of the stack)
  NULL                      end of environment (integer)
  env[0] … env[n]           environment variables (pointers)
  NULL                      end of args (integer)
  argv[1] … argv[argc-1]    program args (pointers)
  argv[0]                   program name (pointer)
  argc                      argument counter (integer)
A figure in http://asm.sourceforge.net/articles/startup.html#st, used
with some modification
Runtime Layout (ELF)
(higher addresses at the top, lower addresses at the bottom)
  region           contents                          notes
  system           env / argv / argc
  stack            stack frame for main()            ebp of main(); stack pointer : esp
                   … available for stack growth
  shared library   malloc.o (lib*.so)                library functions (dynamically linked)
                   printf.o (lib*.so)
                   … available for heap
  heap             heap (malloc(), calloc(), new)
  data             int x; (global var)
                   int y = 100; (global var)
  text (a.out)     xx.o (lib*.a)                     library functions (statically linked)
                   xxx.o (lib*.a)
                   file.o
                   main.o …func(72,73);…
                   crt0.o (startup routine)
.text, .data .. : what existed before loading
What if main() calls function func(72, 73)?
Runtime Layout (ELF) after main() calls func(72, 73)
(higher addresses at the top, lower addresses at the bottom)
  region           contents                          notes
  system           env / argv / argc
  stack            stack frame for main()            ebp of main()
                   stack frame for func()            ebp of func(); stack pointer : esp (while executing func())
                   … available for stack growth
The shared library, heap, data and text regions are the same as before the call.
Functions and Run-time stacks
• Call/return of a function and run-time stack operation
– when f is called, push f's frame onto the RT stack
– when f returns, pop f's frame from the RT stack
– Top frame = frame of the function currently being executed
• How to access the top frame?
– Stack pointer (esp): top position of the frame
– Base pointer (ebp): base position of the frame
– A local variable is accessed via its offset from the frame pointer FP (ebp) or from SP
Role of Compiler
"to generate code that enforces the scheme above"
When main() calls func(72, 73)
func(int x, int y)
{
    int a;
    int b[3];
    …
Stack (higher addresses first; the frame for main() is above the frame for func()):
  system : env / argv / argc
  main()'s local variables
  offset   content   meaning
  +12      73        y
  +8       72        x
  +4       ra        return address
   0       mpf       caller's frame pointer     <- ebp
  -4       garbage   a
  -8       garbage   b[2]
  -12      garbage   b[1]
  -16      garbage   b[0]                       <- esp
  … available for stack growth
(offsets are relative to ebp; mpf is the saved frame pointer of main())
Variables and Arguments: func(72, 73)
(same code and stack layout as on the previous slide)
[ebp+12] : 73, that is, y
[ebp+8] : 72, that is, x
[ebp+4] : return address
[ebp] : main()'s ebp
a : [ebp-4]
b[1] : [ebp-12]
Variables and Arguments: func(72, 73) (calling sequence in main())
push 73 ; y
push 72 ; x
call func ; pushes the return address (ra) and jumps to func
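The calling sequence and the resulting offsets can be replayed on a toy model of the stack. The sketch below is not real x86 code: Machine, push, at_ebp and enter_func are hypothetical helpers, and the 16 bytes of locals simply follow the a / b[3] layout of the slide.

```c
#include <stdint.h>

enum { STACK_WORDS = 64 };      /* size of the toy stack, in 4-byte words */

typedef struct {
    int32_t mem[STACK_WORDS];   /* mem[i] models the word at byte address 4*i */
    int32_t esp;                /* byte address of the top of the stack */
    int32_t ebp;                /* byte address of the current frame base */
} Machine;

static void machine_init(Machine *m) {
    m->esp = 4 * STACK_WORDS;   /* empty stack; it grows toward lower addresses */
    m->ebp = m->esp;            /* stands in for main()'s frame pointer */
}

static void push(Machine *m, int32_t value) {
    m->esp -= 4;
    m->mem[m->esp / 4] = value;
}

/* The word at [ebp + offset]; offset must be a multiple of 4. */
static int32_t *at_ebp(Machine *m, int32_t offset) {
    return &m->mem[(m->ebp + offset) / 4];
}

/* Caller side plus callee prologue for func(x, y). */
static void enter_func(Machine *m, int32_t x, int32_t y, int32_t return_address) {
    push(m, y);                 /* push 73 ; y */
    push(m, x);                 /* push 72 ; x */
    push(m, return_address);    /* call func pushes the return address */
    push(m, m->ebp);            /* prologue: save the caller's ebp (mpf) */
    m->ebp = m->esp;            /* prologue: ebp = esp */
    m->esp -= 16;               /* room for a and b[0..2] */
}
```

After enter_func(&m, 72, 73, ra), *at_ebp(&m, 8) is 72 and *at_ebp(&m, 12) is 73, matching [ebp+8] and [ebp+12] above, while at_ebp(&m, -4) is the slot of a.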
Alignment
char w;
int x[3];
char y;
short z;
• In most cases, a variable is aligned based on its size
eg. C/C++ : char byte aligned, short halfword aligned, int
word aligned
eg.
char w 1 byte
int x[3] 12 bytes, starting at a word aligned address
(3 empty bytes between w and x)
char y 1 byte, starting at any address
short z 2 bytes, starting at a halfword aligned address
(1 empty byte between y and z)
Total size = 20 bytes!
Alignment of Structures
struct {
    char w;
    int x[3];
    char y;
    short z;
};
eg. the largest field type is int (4 bytes)
size of the struct : a multiple of 4
starting address of the struct : also a multiple of 4
"word aligned"
struct as a whole : aligned to the alignment of its largest field type (each field keeps its own alignment)
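The padding described above can be measured with offsetof and sizeof. This is a sketch that assumes a typical ABI with 4-byte int and 2-byte short, both naturally aligned; struct Record and the two helper functions are names invented for the example.

```c
#include <stddef.h>

struct Record {
    char  w;
    int   x[3];
    char  y;
    short z;
};

/* Padding bytes between the end of w and the start of x. */
size_t padding_after_w(void) {
    return offsetof(struct Record, x)
         - (offsetof(struct Record, w) + sizeof(char));
}

/* Padding bytes between the end of y and the start of z. */
size_t padding_after_y(void) {
    return offsetof(struct Record, z)
         - (offsetof(struct Record, y) + sizeof(char));
}
```

Under the stated assumptions, padding_after_w() is 3, padding_after_y() is 1, and sizeof(struct Record) is 20, as on the slide.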
Example. GCC-x86 4.7.x
#include <stdio.h>
void main(){
    int i;
    printf("Hello,CSE!\n");
}
Compiled output:
        .file   "s09.c"
        .section .rodata
.LC0:
        .string "Hello, CSE!"
        .text
        .globl  main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $20, %esp
        movl    $.LC0, (%esp)
        call    puts
        addl    $20, %esp
        popl    %ebp
        ret
        .size   main, .-main
"subl $20, %esp" allocates stack memory for int i and the string parameter:
16 bytes for i (because of gcc-x86's own alignment policy) and 4 bytes for the string pointer.
Exceptions
• Exceptions are for error handling
– invalid input
– invalid resource state
– file does not exist, network error, …
– erroneous execution condition
• divide-by-zero, …
• In real production code, error-handling code may be a
large part (30%-50% or more)
C++
#include <iostream>
#include <fstream>
using namespace std;
int main () {
  ifstream file;
  // Set the state flags for which a failure exception is thrown.
  file.exceptions ( ifstream::failbit | ifstream::badbit );
  try {
    file.open ("test.txt");
    while (!file.eof()) file.get();
  } catch (const ifstream::failure& e) {  // catch by reference to avoid slicing
    cout << "Exception opening/reading file";
  }
  file.close();
  return 0;
}
class ios_base::failure : public exception {
  // the exceptions thrown by the elements of
  // the standard input/output library
public:
  explicit failure (const string& msg);
  virtual ~failure();
  virtual const char* what() const noexcept;
};
flag values of std::ios_base::iostate : eofbit, failbit, badbit, goodbit
Java
InputStream input = null;
try{
input = new FileInputStream("c:\\data\\input-text.txt");
int data = input.read();
while(data != -1) {
//do something with data...
doSomethingWithData(data);
data = input.read();
}
}catch(IOException e){
//do something with e... log, perhaps rethrow etc.
} finally {
if(input != null) input.close();
}
Note : C++ does not support 'finally' blocks.
Throw
#include <iostream>
using namespace std;
int main () {
  try
  {
    throw 20;
  } catch (int e)
  {
    cout << "An exception occurred. Exception Nr. "
         << e << endl;
  }
  return 0;
}
Output:
An exception occurred. Exception Nr. 20
Chaining
InputStream input = null;
try{
input = new FileInputStream("c:\\data\\input-text.txt");
int data = input.read();
while(data != -1) {
//do something with data...
doSomethingWithData(data);
data = input.read();
}
}catch(IOException e){
throw new MyException();
}
What Should Be Done When an Exception Occurs
try {
  f(1);       // when an exception occurs: goto handler A
  Object x;
  g(2);       // when an exception occurs: destroy x + goto handler A
} catch (Exc) {
  // handler A: catches type "Exc"
}
Note: "try" can be nested, so the handlers are organized in a stack
Basic Exception Handling Mechanism
1 Setjmp/longjmp-based
– global “goto”
– C’s primitive exception
2 Table-driven method
– more complex and more space usage
– but faster
1 Setjmp/longjmp
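#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>
int main(void) {
  jmp_buf env; int i;
  /* Assigning setjmp()'s result works with common compilers, although
     ISO C only guarantees its use in simple tests such as if/switch. */
  i = setjmp(env);
  printf("i = %d\n", i);
  if (i != 0) exit(0);
  longjmp(env, 2);
  printf("get printed?\n");
}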
• First, we call setjmp(), and it returns 0.
• Then we call longjmp() with a value of 2, which causes the code to return
from setjmp() with a value of 2.
– That value is printed out, and the code exits. (“get printed?” will not be printed)
$ sj1
i = 0
i = 2
$ _
• setjmp() : saves the contents of the registers
• longjmp() : restores them later, so the program "returns" to the state
it was in when setjmp() was called.
Setjmp/longjmp Approach
buffer buf;
void f() {
if (0 == setjmp(buf))
g();
}
void g() {
h();
}
void h() {
longjmp(buf, 1);
}
struct context
{
int ebx;
int edi;
int esi;
int ebp;
int esp;
int eip;
};
typedef struct context buffer[1];
Setjmp/longjmp Approach (Cont'd)
buffer buf;
void f () {
  if (0==setjmp (buf))  // try ..catch: save the context before the try block;
                        // the saved context is also what leads back to the handler
    g ();
  else
    k();                // handle the exception with k()
}
void g () {
  h ();
}
void h () {
  longjmp (buf, 1);     // throws: fetch the handler, restore the machine state,
                        // and jump to the handler's code
}
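Since "try" can be nested, the saved contexts have to form a stack. The following C sketch shows one possible way to do that with setjmp/longjmp; it is an illustration, not how any particular compiler does it. try_call, throw_exc, handler_stack and the sample bodies are hypothetical names, and a thrown code of 0 cannot be told apart from normal completion.

```c
#include <setjmp.h>
#include <stdlib.h>

#define MAX_HANDLERS 8

static jmp_buf handler_stack[MAX_HANDLERS]; /* one saved context per active "try" */
static int handler_top = 0;
static int pending_code = 0;                /* plays the role of the exception object */

/* throw: pop the innermost handler and jump back into its try_call(). */
static void throw_exc(int code) {
    if (handler_top == 0) abort();          /* uncaught exception */
    pending_code = code;
    handler_top--;
    longjmp(handler_stack[handler_top], 1);
}

/* try { body(); } catch (...): returns 0 if body finished normally,
 * otherwise the code that was thrown. */
static int try_call(void (*body)(void)) {
    if (handler_top == MAX_HANDLERS) abort();
    if (setjmp(handler_stack[handler_top]) == 0) {
        handler_top++;                      /* the handler is now active */
        body();
        handler_top--;                      /* normal exit: deactivate it */
        return 0;
    }
    return pending_code;                    /* reached via longjmp */
}

static void quiet(void) {}
static void leaf_throws(void) { throw_exc(42); }
static void nested(void) {
    int inner = try_call(leaf_throws);      /* the inner try catches 42 */
    throw_exc(inner + 1);                   /* a new exception for the outer try */
}
```

try_call(leaf_throws) returns 42, and try_call(nested) returns 43 because the inner handler is popped before the outer one is used. Note that setjmp is executed on every try_call, even when nothing is thrown.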
2 Table Driven Approach
• Table 1 : maps each throw point to its action table
– from the program counter (PC) at the point where the exception is
thrown
– to an action table
• Table 2 : Action table
– performs the various operations required for exception processing
eg.
• invoking destructors
• adjusting the stack
• matching the exception type to the address of an exception handler
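A minimal sketch of the PC-to-action lookup, written for the earlier try { f(1); Object x; g(2); } example. The addresses, the ActionEntry layout and find_action are all hypothetical; real tables also encode exception-type matching, which is left out here.

```c
#include <stddef.h>

typedef enum { ACTION_NONE, ACTION_DESTROY_X } Cleanup;

typedef struct {
    unsigned start_pc;    /* first address covered (inclusive) */
    unsigned end_pc;      /* end of the range (exclusive) */
    Cleanup  cleanup;     /* what to undo before entering the handler */
    unsigned handler_pc;  /* address of the handler code */
} ActionEntry;

/* Made-up addresses for: try { f(1); Object x; g(2); } catch (Exc) {...} */
static const ActionEntry action_table[] = {
    { 0x100, 0x110, ACTION_NONE,      0x200 },  /* throw inside f(1) */
    { 0x110, 0x130, ACTION_DESTROY_X, 0x200 },  /* throw inside g(2): x is alive */
};

/* Table 1: map the PC of the throw point to its action-table entry. */
const ActionEntry *find_action(unsigned pc) {
    size_t count = sizeof action_table / sizeof action_table[0];
    for (size_t i = 0; i < count; i++)
        if (pc >= action_table[i].start_pc && pc < action_table[i].end_pc)
            return &action_table[i];
    return NULL;   /* no handler here: unwinding would continue in the caller */
}
```

Nothing in this scheme runs on the normal path; the table is consulted only when an exception is actually thrown.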
Discussions
• All variables that are declared outside the try block have to be
restored to their initial value
Lecture s = new Lecture();
// s.lecturer is assumed initially null
try {
s.lecturer = new ThatMan();
FileInputStream(); // exception!
// s.lecturer (in memory) should be restored
...
} catch (IOException e) {...}
Discussions
• Setjmp/longjmp approach
– setjmp should be called at the beginning of every try
block even if no exception is ever thrown
– a list of saved buffers (buf) must be maintained
– a list of objects on the stack must be maintained (in C++)
Discussions (Cont'd)
• Table-driven approach (the most widely used)
– Significantly more efficient than the setjmp/longjmp approach
– The tables themselves have to encode a lot of possible actions
• Space problem
• Reorganizing the code implies reorganizing the tables accordingly
• Vulnerable to attack
• Some compiler optimizations should not be allowed
void f(){
  int x = 0; // looks like dead code, but cannot be optimized out
  try { x = f1(x); … } catch (...) { cout << "…"; }
}
Low-level, Tree-based Intermediate Representation
Tree-based IR
– With abstract machine instructions
– used in machine code generation
eg) from Tiger book: the memory word at address e + c
    MEM(BINOP(PLUS, e, CONST(c)))
cf. RTL
Tree-based Intermediate Representation
MEM(e) : the value of one word of memory starting at the address e.
When it is used as the left-hand side of MOVE, it is interpreted as a
"store"; otherwise it means a "fetch" operation
TEMP(t) : register t
SEQ(s1, s2) : after evaluation of statement s1, statement s2 is evaluated
ESEQ(s, e) : statement s is evaluated for side effects and then e is evaluated
for a result
BINOP(o, e1, e2) : o is a binary operator like PLUS and MINUS.
The result is the evaluation of o with e1 and e2 as operands
This result is saved in memory and the address is returned
CONST(i) : integer constant i
from Tiger book
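One way to hold such trees in memory is a tagged node type. The sketch below covers only the expression forms CONST, TEMP, BINOP and MEM, and recognizes the shape MEM(BINOP(PLUS, e, CONST c)), which a code generator could cover with a single load of M[e + c]. The type and function names (Exp, mk_mem, match_load_offset, ...) are assumptions for this example, not the Tiger book's actual interface, and the nodes are never freed to keep the sketch short.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef enum { EXP_CONST, EXP_TEMP, EXP_BINOP, EXP_MEM } Kind;
typedef enum { OP_PLUS, OP_MINUS, OP_MUL } BinOp;

typedef struct Exp {
    Kind kind;
    int value;                 /* CONST: the integer; TEMP: register number */
    BinOp op;                  /* BINOP only */
    struct Exp *left, *right;  /* BINOP: operands; MEM: left is the address */
} Exp;

static Exp *new_exp(Kind kind, int value, BinOp op, Exp *left, Exp *right) {
    Exp *e = malloc(sizeof *e);
    if (e == NULL) abort();
    e->kind = kind; e->value = value; e->op = op;
    e->left = left; e->right = right;
    return e;
}

Exp *mk_const(int i)  { return new_exp(EXP_CONST, i, OP_PLUS, NULL, NULL); }
Exp *mk_temp(int t)   { return new_exp(EXP_TEMP, t, OP_PLUS, NULL, NULL); }
Exp *mk_binop(BinOp op, Exp *a, Exp *b) { return new_exp(EXP_BINOP, 0, op, a, b); }
Exp *mk_mem(Exp *addr) { return new_exp(EXP_MEM, 0, OP_PLUS, addr, NULL); }

/* True if e has the shape MEM(BINOP(PLUS, base, CONST offset)). */
bool match_load_offset(const Exp *e, const Exp **base, int *offset) {
    if (e == NULL || e->kind != EXP_MEM) return false;
    const Exp *addr = e->left;
    if (addr == NULL || addr->kind != EXP_BINOP || addr->op != OP_PLUS) return false;
    if (addr->right == NULL || addr->right->kind != EXP_CONST) return false;
    *base = addr->left;
    *offset = addr->right->value;
    return true;
}
```

For example, mk_mem(mk_binop(OP_PLUS, mk_temp(1), mk_const(8))) matches with base TEMP(1) and offset 8, while a bare mk_temp(1) does not match.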
Simple Equivalence Relationships
We can choose one among the sub-trees of the same semantics
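(1) ESEQ(s1, ESEQ(s2, e)) = ESEQ(SEQ(s1, s2), e)
(2) BINOP(op, e1, ESEQ(s, e2)) = ESEQ(MOVE(TEMP t, e1), ESEQ(s, BINOP(op, TEMP t, e2)))
(3) BINOP(op, ESEQ(s, e1), e2) = ESEQ(s, BINOP(op, e1, e2))
(4) BINOP(op, e1, ESEQ(s, e2)) = ESEQ(s, BINOP(op, e1, e2)), valid only when s and e1 commute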
More Instruction Selection (Option 1)
[Figure: one tiling of the following tree]
MOVE(MEM(BINOP(PLUS,
               MEM(BINOP(PLUS, TEMP fp, CONST a)),
               BINOP(MULT, TEMP i, CONST 4))),
     MEM(BINOP(PLUS, TEMP fp, CONST x)))
More Instruction Selection (Option 2)
[Figure: a different tiling of the same tree]
MOVE(MEM(BINOP(PLUS,
               MEM(BINOP(PLUS, TEMP fp, CONST a)),
               BINOP(MULT, TEMP i, CONST 4))),
     MEM(BINOP(PLUS, TEMP fp, CONST x)))
Equivalence of The Machine Codes
(Option 1)
LOAD   r1 <- M[fp + a]
ADDI   r2 <- r0 + 4
MUL    r2 <- ri * r2
ADD    r1 <- r1 + r2
LOAD   r2 <- M[fp + x]
STORE  M[r1 + 0] <- r2
(Option 2)
LOAD   r1 <- M[fp + a]
ADDI   r2 <- r0 + 4
MUL    r2 <- ri * r2
ADD    r1 <- r1 + r2
LOAD   r2 <- fp + x
STORE  M[r1] <- M[r2]
Operands in Low-Level IR (Review)
• Operands
– Virtual registers
• We assume infinitely many virtual registers
– Special registers – stack pointer, pc, …
– Literals
• We assume there is no limit on the values of literals
– Symbolic names – in most cases, "labels"
Register Allocation
• Motivation
– Virtual register (VR)
• We assume infinitely many virtual registers
– But the number of actual registers is finite, and varies from machine to
machine
• Register allocation
– Put as many VRs as possible into physical registers, and allocate the remaining
VRs to memory
– Optimization for the best performance : put frequently used VRs into
physical registers
– Spilling : allocating a virtual register to memory when no physical register is left for it
Interference
• Interference : two different definitions interfere if their live ranges have
an operation in common
– Live range : generated from liveness analysis and reaching definition analysis
• Interference graph
– Nodes of the graph = variables
– Edges : two nodes are linked if they interfere with each other
1: a = 0
2: b = a
3: b*b
4: c = 2
5: a*c+3
Live ranges:
For def1 a = {1,2,3,4,5}
For def2 b = {2,3}
For def4 c = {4,5}
[Figure: interference graph with nodes a, b, c and edges a-b and a-c]
examples and materials from Princeton Univ.
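The interference test on the live ranges above can be sketched with bit sets. This assumes operation numbers below 32 so that a range fits in one unsigned word; LiveRange, range_of and interferes are hypothetical names.

```c
#include <stdbool.h>

/* A live range as a bit set: bit p is set if the value is live at operation p.
 * Operation numbers are assumed to be in 0..31. */
typedef unsigned LiveRange;

static LiveRange range_of(const int *points, int count) {
    LiveRange r = 0;
    for (int i = 0; i < count; i++)
        r |= 1u << points[i];
    return r;
}

/* Two definitions interfere if their live ranges share an operation. */
static bool interferes(LiveRange a, LiveRange b) {
    return (a & b) != 0;
}
```

With a = {1,2,3,4,5}, b = {2,3} and c = {4,5}, interferes() holds for (a, b) and (a, c) but not for (b, c), so b and c may share a register.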
Graph Coloring
• Graph Coloring
– Used to allocate virtual registers (that is, variables) to physical registers
– “Linked nodes should be painted in different colors”
– Simple example:
• Two registers : 2-coloring (two colors)
[Figure: the interference graph with edges a-b and a-c, colored with two colors; color-to-register legend: eax, ebx]
1: a = 0
2: b = a
3: b*b
4: c = 2
5: a*c+3
K-Graph Coloring Algorithm
• Kempe's algorithm [1879] --- an old problem
• Step 1 (simplify) Find a node linked with fewer than k edges, and remove that
node together with the edges linked to it
– push these onto a stack
• Step 2 (color) if the remaining graph has been simplified and
can be k-colored
– pop a node (and all the related edges pushed together) from the stack,
– and give the node a color different from all of its neighbor nodes
• Step 3 (Spill, optional) if the algorithm above fails
– Actually, Steps 1-2 are not applicable in many cases
• Graph coloring is an NP-complete problem
– Solution : select several (victim) variables and allocate them to memory
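Steps 1 and 2 can be sketched in C on an adjacency matrix. This is a simplified version (no spill-candidate selection and no optimistic coloring); kempe_color and MAX_NODES are hypothetical names, and the matrix is assumed symmetric with no self-edges.

```c
#include <stdbool.h>

#define MAX_NODES 16

/* Returns true and fills color[0..n-1] with values in 0..k-1 if the
 * simplify/color heuristic succeeds; returns false when every remaining
 * node has k or more neighbours (Step 3, spilling, would be needed). */
bool kempe_color(int n, bool adj[MAX_NODES][MAX_NODES], int k,
                 int color[MAX_NODES]) {
    bool removed[MAX_NODES] = { false };
    int stack[MAX_NODES];
    int top = 0;

    /* Step 1 (simplify): repeatedly remove a node with fewer than k neighbours. */
    while (top < n) {
        int pick = -1;
        for (int v = 0; v < n && pick < 0; v++) {
            if (removed[v]) continue;
            int degree = 0;
            for (int u = 0; u < n; u++)
                if (!removed[u] && adj[v][u]) degree++;
            if (degree < k) pick = v;
        }
        if (pick < 0) return false;   /* stuck: no node is simple enough */
        removed[pick] = true;
        stack[top++] = pick;
    }

    /* Step 2 (color): pop nodes and give each the lowest color that none
     * of its already-colored neighbours uses. */
    for (int v = 0; v < n; v++) color[v] = -1;
    while (top > 0) {
        int v = stack[--top];
        bool used[MAX_NODES] = { false };
        for (int u = 0; u < n; u++)
            if (adj[v][u] && color[u] >= 0) used[color[u]] = true;
        int c = 0;
        while (used[c]) c++;          /* c < k: v had fewer than k neighbours when removed */
        color[v] = c;
    }
    return true;
}
```

For the graph with edges a-b and a-c and k = 2 the function succeeds and gives b and c the same color; for a triangle with k = 2 it returns false, which is the case that leads to spilling.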
Case of Step 3 (1)
• Some lucky cases!
[Figure: interference graph with nodes a, b, c, d, e; all nodes have 2 neighbours; stack: d; color-to-register legend: eax, ebx]
Case of Step 3 (2)
• But there exist graphs where coloring with only k colors is
impossible
[Figure: the same graph with nodes a, b, c, d, e; no colors left for e or a!]
spilling!
Spilling code
• Code rewriting
– Introduce a new temporary, and rewrite the code
• eg. Assuming that t2 is to be spilled,
'add t1, t2' is rewritten as follows:
– "define a memory area bound to the to-be-spilled variable"
(here, t2)
eg. [ebp-24] in the runtime stack
– and "introduce a new temporary variable" (here, t35)
• mov t35, [ebp - 24]
• add t1, t35
note : t35's live range is very short (one or two instructions),
so the possibility of interference is very low (much lower than for t2)
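The rewriting step can be sketched as a small C function that works on instruction text. It only handles the case shown above, where the spilled temporary is read as the source operand; rewrite_spilled_use and its parameters are invented for this example. A spilled destination would additionally need a store back to the slot after the instruction.

```c
#include <stdio.h>
#include <string.h>

/* Rewrite "op dst, src" for a spilled source operand.
 * Returns 1 if the instruction was rewritten with a fresh temporary,
 * 0 if it was copied unchanged, and -1 if out is too small. */
int rewrite_spilled_use(const char *op, const char *dst, const char *src,
                        const char *spilled, const char *slot,
                        int *next_temp, char *out, size_t out_size) {
    int n;
    int rewritten = (strcmp(src, spilled) == 0);
    if (rewritten) {
        int fresh = (*next_temp)++;          /* e.g. t35: short live range */
        n = snprintf(out, out_size, "mov t%d, %s\n%s %s, t%d\n",
                     fresh, slot, op, dst, fresh);
    } else {
        n = snprintf(out, out_size, "%s %s, %s\n", op, dst, src);
    }
    if (n < 0 || (size_t)n >= out_size) return -1;
    return rewritten;
}
```

With spilled = "t2", slot = "[ebp - 24]" and *next_temp = 35, the instruction add t1, t2 becomes the two lines mov t35, [ebp - 24] and add t1, t35.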