Advanced Compilers Code Generation - CNUplas.cnu.ac.kr/courses/2017f/a_compilers/ac 7 code...

Advanced Compilers: Code Generation
Fall 2017, Chungnam National Univ., Eun-Sun Cho


Backend of Compilers

[Figure: backend stages Machine-independent Optimization, Instruction Scheduling, Register Allocation, Instruction Selection, and Machine Code Emission/Optimization, grouped into machine-independent optimization and virtual-to-physical mapping / machine-dependent optimization]

Backend = Code generation + Optimization

Code Generation

• Storage Management

• Exception Handling

• Instruction Selection

• Register Allocation


Storage Management


Management of Storage

• In compiler-generated machine code, memory-management code plays a critical role.

#include <stdio.h>

void main() {   /* non-standard (ISO C requires int main); kept to match the assembly */
    int i;
    printf("Hello, CSE!\n");
}

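compiles to (gcc, 32-bit x86):

        .file   "s09.c"
        .section .rodata
.LC0:
        .string "Hello, CSE!"
        .text
        .globl  main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $20, %esp          # allocate stack memory for int i and the string parameter
        movl    $.LC0, (%esp)
        call    puts
        addl    $20, %esp
        popl    %ebp
        ret
        .size   main, .-main

Note: GCC replaces printf() of a constant string ending in '\n' with puts(), which appends the newline itself; that is why .LC0 has no '\n'.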

Two Classes of Storage in a Process

• Registers

– Fast access

– Invisible to users (programmers) in most cases

– No indirect access is allowed (a register has no address)

• Memory

– (Relatively) slow access; indirect access is allowed

– Candidates: globals/statics, composite types (structs, arrays, ...), variables whose address is taken with the '&' operator

* Whether a variable becomes a register variable or a memory variable is decided during HIR-to-LIR translation.

Four Categories of Memory

• Code space: an area of memory for the instruction sequence

– read-only, if possible

• Static (or global): an area of memory for variables with the same lifetime as the program

• Stack: an area of memory for local variables (with "block" lifetime)

• Heap: an area of memory allocated dynamically at run time (via malloc, new, etc., which obtain memory from the OS through system calls)
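As a small illustration of the static, stack, and heap categories (a sketch; the function names are made up for this example), the C code below keeps one counter in the static area, one variable in the current stack frame, and one value on the heap:

```c
#include <assert.h>
#include <stdlib.h>

/* Returns how many times it has been called. 'calls' lives in the static
   area: it is initialized once and keeps its value between calls. */
int count_calls(void) {
    static int calls = 0;   /* static area: lifetime = whole program */
    int local = 0;          /* stack frame: re-created and re-initialized on every call */
    local++;
    calls++;
    assert(local == 1);     /* the automatic variable never remembers earlier calls */
    return calls;
}

/* Heap storage outlives the frame that allocated it; the caller must free it. */
int *make_counter_on_heap(int start) {
    int *p = malloc(sizeof *p);   /* heap: lives until free() */
    if (p != NULL)
        *p = start;
    return p;
}
```

Calling count_calls() twice returns 1 and then 2, while local is 1 on every call; the pointer returned by make_counter_on_heap() stays valid after that function's frame is gone, until it is freed.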

Memory Organization

[Figure: memory regions Code, Static Data, Stack, Heap, ...]

• code, static data: fixed sizes (determined by the compiler)

• stack, heap: variable sizes at runtime

– Stack: grows upward; heap: grows downward (in the figure)

– The relative positions of stack and heap might be switched

Executable Formats

• ELF (Executable and Linkable Format)

• Windows PE (Portable Executable)


• Run-time stack

– A stack made of frames

• one frame (or activation record) for each function call

– Activation record: the execution environment of the corresponding function call

• Each call has its own frame, even recursive calls

• contents: local variables, arguments, return values, other temporary storage ...

• Heap allocation

– a contiguous portion of the global area, obtained from the OS

– operations for requesting and returning memory during program execution are necessary;

– otherwise, the programming language must support garbage collection

• available memory is kept divided into free and in-use sections (see an OS textbook!)
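A sketch (hypothetical names) showing that every call, including each recursive call, gets its own activation record: each activation of fact() records the address of its local marker while its frame is alive, and the recorded addresses all differ. Addresses are stored as uintptr_t integers because a pointer to a local becomes indeterminate once its frame is popped.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DEPTH 16

/* Address of the local 'marker' in each activation, recorded as an
   integer while that activation is still alive. */
static uintptr_t frame_addr[MAX_DEPTH];

/* Recursive factorial that also records where each activation's local lives. */
long fact(int n, int depth) {
    long marker = n;                        /* one copy per activation record */
    assert(depth < MAX_DEPTH);
    frame_addr[depth] = (uintptr_t)&marker;
    if (n <= 1)
        return marker;
    return marker * fact(n - 1, depth + 1); /* caller's frame stays alive during the call */
}

/* Returns 1 if the first 'count' recorded addresses are pairwise distinct. */
int frames_distinct(int count) {
    for (int i = 0; i < count; i++)
        for (int j = i + 1; j < count; j++)
            if (frame_addr[i] == frame_addr[j])
                return 0;
    return 1;
}
```

fact(5, 0) creates five activations (n = 5 down to 1) that are all alive at the same time, so their markers must occupy five different stack locations.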

Initial Stack Frame (startup state)

• Command-line arguments: argc, argv

• Environment variables (env)

<Initial stack layout for ELF binaries>, listed from the top of the stack (lowest address, where the stack pointer points at startup) toward the bottom (higher addresses):

    argc                      argument counter (integer)
    argv[0]                   program name (pointer)
    argv[1] ... argv[argc-1]  program args (pointers)
    NULL                      end of args (integer)
    env[0] ... env[n]         environment variables (pointers)
    NULL                      end of environment (integer)

A figure in http://asm.sourceforge.net/articles/startup.html#st used with some modification.
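Because both arrays end with a NULL pointer, startup code and library functions can walk them without a separate length. A minimal sketch (count_until_null is a hypothetical helper; ISO C guarantees argv[argc] == NULL, while passing envp as a third parameter of main is a common extension rather than standard C):

```c
#include <assert.h>
#include <stddef.h>

/* Counts the entries of a NULL-terminated pointer array such as argv or envp.
   The terminating NULL marks the end, so no length has to be passed. */
size_t count_until_null(char *const *vec) {
    size_t n = 0;
    assert(vec != NULL);
    while (vec[n] != NULL)
        n++;
    return n;
}
```

For argv, count_until_null(argv) equals argc.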

Runtime Layout (ELF), from higher to lower addresses

    system          env / argv / argc
    stack           stack frame for main()          <- ebp of main(); stack pointer: esp
                    ... available for stack growth
    shared library  malloc.o, printf.o (lib*.so)    library functions (dynamically linked)
                    ... available for heap
    heap            heap (malloc(), calloc(), new)
    data            int x;        (global var)
                    int y = 100;  (global var)
    text (a.out)    xx.o, xxx.o (lib*.a)            library functions (statically linked)
                    file.o
                    main.o  ...func(72,73);...
                    crt0.o (startup routine)

The .text and .data contents already exist in the executable before it is loaded.

What if main() calls function func(72, 73)?

Runtime Layout (ELF), after main() calls func(72, 73)

Same layout as above, except that a stack frame for func() is pushed below the stack frame for main(): while func() executes, ebp points into func()'s frame (ebp of func()) and the stack pointer esp points to the top of func()'s frame.

Functions and Run-time Stacks

• Call/return of a function and run-time stack operations

– when f is called, push f's frame onto the RT stack

– when f returns, pop f's frame from the RT stack

– Top frame = frame of the function currently being executed

• How to access the top frame?

– Stack pointer (esp): top position of the frame

– Base pointer (ebp): base position of the frame

– A local variable is accessed via its offset from the frame pointer (FP, here ebp) or from SP

Role of Compiler

"to generate code that implements the above scheme"

When main() calls func(72, 73)

func(int x, int y)
{
    int a;
    int b[3];

Stack (higher addresses at the top; mpf = main()'s frame pointer):

                      system: env / argv / argc
                      main()'s local variables        (frame for main())
    offset   value    meaning
    +12      73       y
     +8      72       x
     +4      ra       return address
      0      mpf      caller's frame pointer          <- ebp   (frame for func())
     -4      garbage  a
     -8      garbage  b[2]
    -12      garbage  b[1]
    -16      garbage  b[0]                            <- esp
                      ... available for stack growth

Variables and Arguments: func(72, 73)

With the frame layout above:

    [ebp+4]  : return address
    [ebp+8]  : 72, that is, x
    [ebp+12] : 73, that is, y
    [ebp]    : main()'s ebp
    a        : [ebp-4]
    b[1]     : [ebp-12]

Variables and Arguments: func(72, 73) (cont'd)

The caller main() pushes the arguments from right to left and then calls func:

    push 73     ; y
    push 72     ; x
    call func   ; pushes the return address and jumps to func
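To make the offsets concrete, here is a toy C model of this calling sequence on a simulated 32-bit stack. The callee prologue (push ebp; mov ebp, esp; sub esp, 16) is the conventional 32-bit x86 one and is assumed here, since the slides do not show func's prologue; the return address and saved ebp values are made up.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 256

/* Toy model of a downward-growing 32-bit stack: byte addresses
   0..STACK_BYTES-1, stored as 4-byte words. */
static uint32_t mem[STACK_BYTES / 4];
static uint32_t esp = STACK_BYTES;   /* empty stack: esp at the highest address */
static uint32_t ebp;

static void push(uint32_t v) {
    assert(esp >= 4);
    esp -= 4;                        /* the stack grows toward lower addresses */
    mem[esp / 4] = v;
}

static uint32_t load(uint32_t addr) { return mem[addr / 4]; }

/* Caller:   push 73; push 72; call func
   Callee:   push ebp; mov ebp, esp; sub esp, 16   (room for a and b[3]) */
void simulate_call(uint32_t ret_addr, uint32_t callers_ebp) {
    ebp = callers_ebp;
    push(73);                        /* y: arguments are pushed right to left */
    push(72);                        /* x */
    push(ret_addr);                  /* what 'call' pushes */
    push(ebp);                       /* save the caller's frame pointer */
    ebp = esp;                       /* mov ebp, esp */
    esp -= 16;                       /* int a + int b[3] = 16 bytes */
}

/* Reads the word at [ebp + offset]. */
uint32_t frame_word(int32_t offset) { return load(ebp + offset); }
```

After simulate_call(0x1234, 0xABCD), frame_word(8) is 72 (x), frame_word(12) is 73 (y), frame_word(4) is the return address, frame_word(0) is the caller's ebp, and a and b[0] would live at offsets -4 and -16.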

Alignment

• In most cases, a variable is aligned according to its size

e.g. C/C++: char is byte-aligned, short halfword-aligned, int word-aligned

e.g. for the declarations char w; int x[3]; char y; short z;

    char w     1 byte
    x[3]       12 bytes, starting at a word-aligned address
               (3 empty bytes between w and x)
    char y     1 byte, starting at any address
    short z    2 bytes, starting at a halfword-aligned address
               (1 empty byte between y and z)

Total size = 20 bytes!

Alignment of Structures

struct {
    char  w;
    int   x[3];
    char  y;
    short z;
};

e.g. the largest field is int (4 bytes):

size of the struct: a multiple of 4

starting address of the struct: also a multiple of 4 ("word aligned")

fields in a struct: the struct is aligned to its largest (most strictly aligned) field
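The numbers above can be checked with sizeof and offsetof. This sketch assumes a common ABI such as GCC on x86 or x86-64 (4-byte int aligned to 4, 2-byte short aligned to 2) and also shows that ordering fields from the most to the least strictly aligned removes the padding; struct reordered is an illustration, not part of the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed ABI: 4-byte int aligned to 4, 2-byte short aligned to 2. */
static_assert(sizeof(int) == 4 && _Alignof(int) == 4, "assumes 4-byte, 4-aligned int");
static_assert(sizeof(short) == 2 && _Alignof(short) == 2, "assumes 2-byte, 2-aligned short");

/* The slide's struct, in declaration order:
   w at 0, 3 padding bytes, x at 4..15, y at 16, 1 padding byte, z at 18..19. */
struct as_declared {
    char  w;
    int   x[3];
    char  y;
    short z;
};

/* Same fields, most strictly aligned first:
   x at 0..11, z at 12..13, w at 14, y at 15; no padding, 16 bytes in total. */
struct reordered {
    int   x[3];
    short z;
    char  w;
    char  y;
};
```

Under these assumptions, sizeof(struct as_declared) is 20 and sizeof(struct reordered) is 16; C compilers may not reorder fields themselves, so the programmer has to do it.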

Example. GCC-x86 4.7.x

In the Hello program compiled above, subl $20, %esp reserves 20 bytes of stack in gcc-x86: 16 for i (because of GCC's own alignment policy) and 4 for the string pointer passed to puts.

Exception Handling Codes

Exceptions

• Exceptions are for error handling

– invalid input

– invalid resource state

– file does not exist, network error, …

– erroneous execution conditions

• divide-by-zero, …

• In real production code, error-handling code may be a large part (30%-50% or more)

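C++

#include <iostream>
#include <fstream>
using namespace std;

int main() {
    ifstream file;
    // Set the state flags for which a failure exception is thrown.
    file.exceptions(ifstream::failbit | ifstream::badbit);
    try {
        file.open("test.txt");
        // peek() at end of file sets only eofbit (not in the mask);
        // a get() at end of file would also set failbit and throw.
        while (file.peek() != ifstream::traits_type::eof()) file.get();
        file.close();   // inside try: close() on an unopened file sets failbit
    } catch (const ifstream::failure& e) {
        cout << "Exception opening/reading file";
    }
    return 0;
}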

class ios_base::failure : public exception {
    // the exceptions thrown by the elements of
    // the standard input/output library
public:
    explicit failure(const string& msg);
    virtual ~failure();
    virtual const char* what() const noexcept;
};

(Since C++11, ios_base::failure derives from std::system_error rather than directly from std::exception.)

Flag values of std::ios_base::iostate: eofbit, failbit, badbit, goodbit

Java

InputStream input = null;
try {
    input = new FileInputStream("c:\\data\\input-text.txt");
    int data = input.read();
    while (data != -1) {
        // do something with data...
        doSomethingWithData(data);
        data = input.read();
    }
} catch (IOException e) {
    // do something with e... log, perhaps rethrow etc.
} finally {
    if (input != null) {
        try {
            input.close();   // close() can itself throw IOException
        } catch (IOException e) {
            // ignore (or log) a failure to close
        }
    }
}

Note: C++ has no 'finally' block; cleanup is usually done in destructors (RAII).

Throw

#include <iostream>
using namespace std;

int main() {
    try {
        throw 20;
    } catch (int e) {
        cout << "An exception occurred. Exception Nr. " << e << endl;
    }
    return 0;
}

Output:

An exception occurred. Exception Nr. 20

Chaining

InputStream input = null;
try {
    input = new FileInputStream("c:\\data\\input-text.txt");
    int data = input.read();
    while (data != -1) {
        // do something with data...
        doSomethingWithData(data);
        data = input.read();
    }
} catch (IOException e) {
    // chain: pass e as the cause (assumes the user-defined MyException
    // has a constructor taking a Throwable cause)
    throw new MyException(e);
}

What Should Happen When an Exception Occurs

try {
    f(1);        // exception here: goto handler A
    Object x;
    g(2);        // exception here: destroy x, then goto handler A
} catch (Exc) {
    // handler A: catches type "Exc"
}

Note: try blocks can be nested, so the handlers are organized as a stack.

Basic Exception Handling Mechanisms

1. Setjmp/longjmp-based

– a global "goto"

– C's primitive form of exception handling

2. Table-driven method

– more complex and uses more space

– but faster

1. Setjmp/longjmp

#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    jmp_buf env;
    int i;

    i = setjmp(env);   /* common idiom; strictly, ISO C allows setjmp only in a few
                          contexts, e.g. if (setjmp(env) == 0) */
    printf("i = %d\n", i);
    if (i != 0) exit(0);
    longjmp(env, 2);
    printf("get printed?\n");
    return 0;
}

• First, we call setjmp(), and it returns 0.

• Then we call longjmp() with a value of 2, which makes setjmp() return again, this time with the value 2.

– That value is printed out, and the code exits ("get printed?" is never printed).

$ sj1
i = 0
i = 2
$ _

• setjmp(): saves the contents of the registers (the calling environment)

• longjmp(): restores them later, so the program "returns" to the state it was in when setjmp() was called

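Setjmp/longjmp Approach

buffer buf;

void f() {
    if (0 == setjmp(buf))
        g();
}

void g() {
    h();
}

void h() {
    longjmp(buf, 1);
}

struct context {
    int ebx;
    int edi;
    int esi;
    int ebp;
    int esp;
    int eip;
};

typedef struct context buffer[1];

Here buffer is a simplified model of jmp_buf: the registers to be restored (ebx, edi, esi, ebp), the stack pointer, and the resume address (eip).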

Setjmp/longjmp Approach (cont'd)

buffer buf;

void f() {
    if (0 == setjmp(buf))   /* try: save the context before the try block */
        g();
    else                    /* catch: the saved context is also where the handler is invoked */
        k();                /* handle the exception with k() */
}

void g() {
    h();
}

void h() {
    longjmp(buf, 1);        /* throw: fetch the handler, restore the machine state,
                               and jump to the handler's code */
}
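A minimal runnable version of the f/g/h/k example, using the standard <setjmp.h> jmp_buf instead of the slide's buffer type (the handled_with_k flag exists only so that the effect can be observed):

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf handler;          /* saved context of the single enclosing "try" */
static int handled_with_k = 0;   /* set by the handler; static, so it survives longjmp */

static void h(void) {
    longjmp(handler, 1);         /* "throw": setjmp in f() returns again, with 1 */
}

static void g(void) {
    h();
    assert(!"not reached: longjmp in h() skips the rest of g()");
}

static void k(void) {
    handled_with_k = 1;          /* "catch" body */
}

/* Returns 1 if the exception thrown in h() was handled by k(). */
int f(void) {
    handled_with_k = 0;
    if (setjmp(handler) == 0)    /* "try": the direct call returns 0 */
        g();
    else                         /* reached via longjmp with value 1 */
        k();
    return handled_with_k;
}
```

The frames of g() and h() are abandoned by longjmp, which is exactly why C++ needs extra bookkeeping to run destructors with this approach.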

2. Table-Driven Approach

• Table 1: maps each throw point to its action table

– from the program counter (PC) at the point where the exception is thrown

– to an action table

• Table 2: action table

– performs the various operations required for exception processing, e.g.

• invoking destructors

• adjusting the stack

• matching the exception type to the address of an exception handler
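A sketch of the Table 1 lookup, assuming a hypothetical table of PC ranges sorted by start address (the addresses and action names are invented and loosely follow the earlier f(1)/g(2) example; real compilers use richer encodings, e.g. the call-site tables GCC emits in .gcc_except_table). In this sketch, NO_HANDLER stands for "no handler in this function; keep unwinding into the caller".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical action codes standing in for Table 2 entries. */
enum action { NO_HANDLER, HANDLER_A, DESTROY_X_THEN_HANDLER_A };

/* Table 1: one row per PC range [start, end), sorted by start. */
struct call_site {
    unsigned start, end;
    enum action action;
};

static const struct call_site table[] = {
    { 0x100, 0x110, HANDLER_A },                 /* e.g. around the call f(1) */
    { 0x110, 0x130, DESTROY_X_THEN_HANDLER_A },  /* e.g. around g(2): x must be destroyed */
};

/* Binary search for the range that contains pc. */
enum action lookup_action(unsigned pc) {
    size_t lo = 0, hi = sizeof table / sizeof table[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        assert(table[mid].start < table[mid].end);   /* well-formed range */
        if (pc < table[mid].start)
            hi = mid;
        else if (pc >= table[mid].end)
            lo = mid + 1;
        else
            return table[mid].action;
    }
    return NO_HANDLER;
}
```

No code runs on the non-exceptional path; the cost is paid only when an exception is thrown, which is why this approach is faster than calling setjmp at every try.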

Discussions

• All variables declared outside the try block have to be restored to their initial values

Lecture s = new Lecture();
// s.lecturer is assumed to be initially null
try {
    s.lecturer = new ThatMan();
    new FileInputStream(...);   // exception!
    // s.lecturer (in memory) should be restored
    ...
} catch (IOException e) {...}

Discussions

• Setjmp/longjmp approach

– setjmp must be called at the beginning of every try block, even if no exception is ever thrown

– a list of bufs (one per active try block) must be maintained

– a list of objects on the stack must be maintained (in C++), so that their destructors can be run
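Nested try blocks need one saved context per active try, kept as a stack. A C sketch under that assumption (throw_exc, nested_try, and the fixed MAX_TRY depth are hypothetical; no destructors are run, which is the extra bookkeeping C++ would need):

```c
#include <assert.h>
#include <setjmp.h>

#define MAX_TRY 8

static jmp_buf try_stack[MAX_TRY];   /* one saved context per active try block */
static int try_top = 0;              /* number of active try blocks */
static int current_exc;              /* code of the exception being propagated */

/* "throw": pop the innermost try block and jump to its saved context. */
static void throw_exc(int code) {
    assert(try_top > 0);
    current_exc = code;
    try_top--;
    longjmp(try_stack[try_top], 1);
}

static void may_fail(int code) {
    if (code != 0)
        throw_exc(code);
}

/* An inner try nested in an outer try. The inner handler rethrows the
   exception as code + 100; the outer handler returns that code.
   Returns 0 if nothing was thrown. */
int nested_try(int code) {
    assert(try_top + 2 <= MAX_TRY);
    try_top++;
    if (setjmp(try_stack[try_top - 1]) == 0) {       /* outer try */
        try_top++;
        if (setjmp(try_stack[try_top - 1]) == 0) {   /* inner try */
            may_fail(code);
            try_top--;                               /* inner try ends normally */
        } else {                                     /* inner catch */
            throw_exc(current_exc + 100);            /* rethrow to the outer handler */
        }
        try_top--;                                   /* outer try ends normally */
        return 0;
    } else {                                         /* outer catch */
        return current_exc;
    }
}
```

Note that setjmp runs on entry to every try block even when nothing is thrown, which is the overhead mentioned above.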


Discussions (cont'd)

• Table-driven approach: the most widely used

– Significantly more efficient than the setjmp/longjmp approach

– The tables themselves have to encode a lot of possible actions

• Space problem

• Reorganizing the code implies reorganizing the tables accordingly

• Vulnerable to attack

• Compiler optimizations are restricted, e.g.:

void f() {
    int x = 0;   // dead code, but cannot be optimized out
    try { x = f1(x); ... } catch (...) { cout << "..."; }
}

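Exception Handling in GIMPLE

• throw is NOT directly supported; it is expressed through function calls (in g++, calls to runtime functions such as __cxa_allocate_exception and __cxa_throw)

[Figures omitted; annotation: invoking destructors and adjusting the stack…]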

Instruction Selection


Low-level, Tree-based Intermediate Representation

Tree-based IR

– with abstract machine instructions

– used in machine code generation

e.g. from the Tiger book

[Figure: the tree MEM(BINOP(PLUS, e, CONST c)), which fetches the word at address e + c; cf. RTL]

Tree-based Intermediate Representation

MEM(e): the value of one word of memory starting at address e. When used as the left-hand side of MOVE, it means "store"; otherwise it means "fetch".

TEMP(t): temporary t, which acts like a register (the abstract machine has unlimited temporaries)

SEQ(s1, s2): statement s1 is evaluated, then statement s2

ESEQ(s, e): statement s is evaluated for its side effects, then e is evaluated for a result

BINOP(o, e1, e2): o is a binary operator such as PLUS or MINUS; the result is the value of applying o to operands e1 and e2 (e1 is evaluated before e2)

CONST(i): integer constant i

(from the Tiger book)
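To make the node semantics concrete, here is a small evaluator for a subset of the Tree IR (CONST, TEMP, BINOP, MEM, plus MOVE(MEM(...), ...) as a store). The C representation and the word-addressed memory are simplifying assumptions, not the Tiger book's code.

```c
#include <assert.h>
#include <stddef.h>

enum kind  { CONST, TEMP, BINOP, MEM };
enum binop { PLUS, MINUS, MUL };

struct exp {
    enum kind kind;
    int value;                        /* CONST: constant i; TEMP: temporary number t */
    enum binop op;                    /* BINOP only */
    const struct exp *left, *right;   /* BINOP: e1, e2; MEM: left is the address e */
};

/* Abstract machine state; memory is word-addressed here (address n = n-th word). */
#define NTEMPS 16
#define NWORDS 64
static int temps[NTEMPS];
static int memory[NWORDS];

/* Evaluates an expression; MEM in expression position means "fetch". */
int eval(const struct exp *e) {
    switch (e->kind) {
    case CONST:
        return e->value;
    case TEMP:
        assert(e->value >= 0 && e->value < NTEMPS);
        return temps[e->value];
    case BINOP: {
        int l = eval(e->left);        /* e1 is evaluated before e2 */
        int r = eval(e->right);
        switch (e->op) {
        case PLUS:  return l + r;
        case MINUS: return l - r;
        case MUL:   return l * r;
        }
        break;
    }
    case MEM: {
        int addr = eval(e->left);
        assert(addr >= 0 && addr < NWORDS);
        return memory[addr];
    }
    }
    assert(!"malformed tree");
    return 0;
}

/* MOVE(MEM(addr), src): MEM on the left-hand side of MOVE means "store". */
void move_to_mem(const struct exp *addr, const struct exp *src) {
    int a = eval(addr);
    assert(a >= 0 && a < NWORDS);
    memory[a] = eval(src);
}
```

With temporary 0 playing the role of fp, MEM(BINOP(PLUS, TEMP 0, CONST 8)) fetches the word at fp + 8, and MOVE(MEM(...), CONST 42) stores into the same location.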

Simple Equivalence Relationships

We can replace a subtree with any subtree that has the same semantics:

    ESEQ(s1, ESEQ(s2, e))        =  ESEQ(SEQ(s1, s2), e)
    BINOP(op, ESEQ(s, e1), e2)   =  ESEQ(s, BINOP(op, e1, e2))
    BINOP(op, e1, ESEQ(s, e2))   =  ESEQ(MOVE(TEMP t, e1), ESEQ(s, BINOP(op, TEMP t, e2)))   (t a new temporary)
    BINOP(op, e1, ESEQ(s, e2))   =  ESEQ(s, BINOP(op, e1, e2))   (if s and e1 commute)

More Instruction Selection (Option 1 and Option 2)

[Figure: two different tilings (Option 1, Option 2) of the same tree MOVE(MEM(BINOP(PLUS, MEM(BINOP(PLUS, TEMP fp, CONST a)), BINOP(MULT, TEMP i, CONST 4))), MEM(BINOP(PLUS, TEMP fp, CONST x))), i.e. a[i] := x]

Equivalence of the Machine Codes

Option 1:

    LOAD   r1 ← M[fp + a]
    ADDI   r2 ← r0 + 4
    MUL    r2 ← ri × r2
    ADD    r1 ← r1 + r2
    LOAD   r2 ← M[fp + x]
    STORE  M[r1 + 0] ← r2

Option 2:

    LOAD   r1 ← M[fp + a]
    ADDI   r2 ← r0 + 4
    MUL    r2 ← ri × r2
    ADD    r1 ← r1 + r2
    ADDI   r2 ← fp + x
    MOVEM  M[r1] ← M[r2]
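To check that both sequences have the same effect, the sketch below runs them on a toy machine (the register numbering, the byte-addressed word memory, and the sample values in the test are assumptions made for illustration). Both leave the same memory contents; only the final value of r2 differs.

```c
#include <assert.h>
#include <string.h>

#define WORDS 64

/* Toy load/store machine: r[0] is always 0; memory is byte-addressed,
   one int per 4-byte word. */
struct machine {
    int r[8];
    int mem[WORDS];
};

enum { R0 = 0, R1 = 1, R2 = 2, RI = 3, FP = 4 };   /* RI holds i, FP holds fp */

static int *word(struct machine *m, int addr) {
    assert(addr >= 0 && addr % 4 == 0 && addr / 4 < WORDS);
    return &m->mem[addr / 4];
}

/* Option 1: a[i] := x by loading x into r2 and storing it. */
void option1(struct machine *m, int a, int x) {
    m->r[R1] = *word(m, m->r[FP] + a);        /* r1 <- M[fp + a]  (base of array a) */
    m->r[R2] = m->r[R0] + 4;                  /* r2 <- r0 + 4 */
    m->r[R2] = m->r[RI] * m->r[R2];           /* r2 <- i * 4 */
    m->r[R1] = m->r[R1] + m->r[R2];           /* r1 <- address of a[i] */
    m->r[R2] = *word(m, m->r[FP] + x);        /* r2 <- M[fp + x]  (value of x) */
    *word(m, m->r[R1] + 0) = m->r[R2];        /* M[r1 + 0] <- r2 */
}

/* Option 2: same address computation, then a memory-to-memory move. */
void option2(struct machine *m, int a, int x) {
    m->r[R1] = *word(m, m->r[FP] + a);
    m->r[R2] = m->r[R0] + 4;
    m->r[R2] = m->r[RI] * m->r[R2];
    m->r[R1] = m->r[R1] + m->r[R2];
    m->r[R2] = m->r[FP] + x;                  /* r2 <- fp + x  (address of x, not its value) */
    *word(m, m->r[R1]) = *word(m, m->r[R2]);  /* M[r1] <- M[r2] */
}
```

Which option a selector prefers depends on the target: Option 2 needs a memory-to-memory move instruction, Option 1 only plain loads and stores.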

Register Allocation


Operands in Low-Level IR (Review)

• Operands

– Virtual registers

• We assume infinitely many virtual registers

– Special registers: stack pointer, pc, …

– Literals

• We assume literal values have no size limit

– Symbolic names: in most cases, "labels"

Register Allocation

• Motivation

– Virtual registers (VRs)

• Although we assume infinitely many virtual registers,

– the number of actual registers is finite, and varies from machine to machine

• Register allocation

– Put as many VRs as possible into physical registers, and allocate the remaining VRs to memory

– Optimization for the best performance: put frequently used VRs into physical registers

– Spilling: allocating virtual registers to memory, when this is unavoidable

Interference

• Interference: two different definitions interfere if their live ranges share a common instruction

– Live range: computed by liveness analysis and reaching-definitions analysis

• Interference graph

– Nodes of the graph = variables

– Edges: two nodes are linked if they interfere with each other

Example:

    1: a = 0
    2: b = a
    3: b*b
    4: c = 2
    5: a*c+3

Live ranges: for def1, a = {1,2,3,4,5}; for def2, b = {2,3}; for def4, c = {4,5}

Interference graph: edges a-b and a-c (b and c do not interfere)

examples and materials from Princeton Univ.
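A sketch of building the interference graph from these live ranges: each live range is a bit set of program points, and two variables are linked when their sets intersect (the slide's definition; the helper names are hypothetical).

```c
#include <assert.h>
#include <stdint.h>

#define NVARS 3   /* a, b, c in the example */

/* Live range as a bit set of program points: bit p set = live at line p. */
static uint32_t point_set(int from, int to) {
    uint32_t s = 0;
    assert(0 <= from && from <= to && to < 32);
    for (int p = from; p <= to; p++)
        s |= UINT32_C(1) << p;
    return s;
}

/* adj[i][j] = 1 when live ranges i and j share at least one program point. */
void build_interference(const uint32_t live[], int n, int adj[][NVARS]) {
    assert(n <= NVARS);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            adj[i][j] = (i != j) && (live[i] & live[j]) != 0;
}
```

With a = {1..5}, b = {2,3}, c = {4,5}, the result links a with b and a with c, while b and c stay unlinked and can therefore share a register.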

Graph Coloring

• Graph Coloring

– Used to allocate virtual registers (that is, variables) to physical registers

– "Linked nodes must be painted in different colors"

– Simple example:

• Two registers: 2-coloring (two colors)

[Figure: the interference graph of the example above (a linked to b and to c), 2-colored with color = register (eax, ebx): a gets one register, b and c share the other]

K-Graph Coloring Algorithm

• Kempe's algorithm [1879]: an old problem

• Step 1 (simplify): find a node with fewer than k edges, and remove that node together with the edges linked to it

– push these onto a stack; repeat

• Step 2 (color): if the remaining (simplified) graph can be k-colored,

– pop a node (and all the related edges pushed with it) from the stack,

– and color the node differently from all its neighbor nodes

• Step 3 (spill), optional: if the algorithm above fails

– In practice, Steps 1-2 alone do not succeed in many cases

• Graph coloring is an NP-complete problem

– Solution: select several (victim) variables and allocate them to memory
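A compact C sketch of Steps 1 and 2 on an adjacency matrix (the function name and the MAXN limit are assumptions). It returns 0 when simplification gets stuck, which is where Step 3 would pick a spill victim.

```c
#include <assert.h>

#define MAXN 16

/* Kempe-style k-coloring. Step 1 pushes nodes with fewer than k remaining
   neighbours; Step 2 pops them and picks a color unused by colored neighbours.
   Returns 1 and fills color[] on success, 0 if Step 1 gets stuck. */
int kempe_color(int n, int adj[][MAXN], int k, int color[]) {
    int removed[MAXN] = {0};
    int stack[MAXN], top = 0;
    assert(n <= MAXN && k >= 1 && k <= MAXN);

    /* Step 1: simplify */
    while (top < n) {
        int found = -1;
        for (int v = 0; v < n && found < 0; v++) {
            if (removed[v])
                continue;
            int deg = 0;
            for (int u = 0; u < n; u++)
                if (!removed[u] && adj[v][u])
                    deg++;
            if (deg < k)
                found = v;
        }
        if (found < 0)
            return 0;           /* every remaining node has >= k neighbours: spill */
        removed[found] = 1;
        stack[top++] = found;
    }

    /* Step 2: color, in reverse order of removal */
    for (int v = 0; v < n; v++)
        color[v] = -1;
    while (top > 0) {
        int v = stack[--top];
        int used[MAXN] = {0};
        for (int u = 0; u < n; u++)
            if (adj[v][u] && color[u] >= 0)
                used[color[u]] = 1;
        int c = 0;
        while (used[c])
            c++;
        assert(c < k);          /* v had fewer than k neighbours when it was pushed */
        color[v] = c;
    }
    return 1;
}
```

On the a/b/c example with k = 2 it colors a differently from b and c; a triangle needs k = 3. Because it gives up as soon as every remaining node has k or more neighbours, it also gives up on some graphs that are in fact k-colorable; optimistic coloring (Briggs et al.) pushes such a node anyway and spills only if no color is left when it is popped.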

[Figure: Step 1 (simplify) and Step 2 (color) on an example graph with nodes a, b, c, d, e, showing the stack after each push and pop]

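Case of Step 3 (1)

• Some lucky cases!

[Figure: example graph with nodes a, b, c, d, e colored with two colors (color = register: eax, ebx); stack: d]

All nodes have 2 neighbours! (With k = 2, Step 1 finds no node with fewer than 2 edges, yet a 2-coloring can still exist.)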

Case of Step 3 (2)

• But there exist graphs where coloring with only k colors is impossible: spilling!

[Figure: example graph with nodes a, b, c, d, e; no colors left for e or a!]

Spilling code

• Code rewriting

– Introduce a new temporary, and rewrite the code

• e.g. assume that t2 is to be spilled. Then 'add t1, t2' is rewritten as follows:

– "define a memory area bound to the to-be-spilled variable" (here, t2), e.g. [ebp-24] in the runtime stack

– and "introduce a new temporary variable" (here, t35)

• mov t35, [ebp-24]

• add t1, t35

note: t35's live range is very short (one or two instructions), so it is much less likely to interfere with other variables than t2
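A sketch of this rewrite for a two-operand instruction such as add t1, t2, using a hypothetical toy instruction struct. Each occurrence of the spilled temporary gets a fresh, short-lived temporary (numbered from 35 in the test, to match t35 above), loaded from the stack slot just before the instruction; if the spilled temporary is also the destination, the result is stored back afterwards.

```c
#include <assert.h>
#include <stddef.h>

enum kind { OP, LOAD_SLOT, STORE_SLOT };

/* Toy instruction forms:
   OP:         op t<dst>, t<src>      (dst is read and written, as in x86 "add")
   LOAD_SLOT:  mov t<dst>, [ebp+slot]
   STORE_SLOT: mov [ebp+slot], t<src> */
struct instr {
    enum kind kind;
    const char *op;   /* OP only, e.g. "add" */
    int dst, src;     /* temporary numbers */
    int slot;         /* offset from ebp, LOAD_SLOT/STORE_SLOT only */
};

/* Rewrites one OP instruction for spilled temporary 'spilled' kept in
   [ebp+slot]. Fresh temporaries are numbered from *next_temp.
   Returns the number of instructions written to out (at most 4). */
int rewrite_spill(struct instr in, int spilled, int slot,
                  int *next_temp, struct instr out[4]) {
    int n = 0;
    int dst_temp = -1;
    assert(in.kind == OP);
    if (in.src == spilled) {                 /* use: reload into a fresh temp */
        int t = (*next_temp)++;
        out[n++] = (struct instr){ LOAD_SLOT, NULL, t, 0, slot };
        in.src = t;
    }
    if (in.dst == spilled) {                 /* read-modify-write: reload, then store back */
        dst_temp = (*next_temp)++;
        out[n++] = (struct instr){ LOAD_SLOT, NULL, dst_temp, 0, slot };
        in.dst = dst_temp;
    }
    out[n++] = in;
    if (dst_temp >= 0)
        out[n++] = (struct instr){ STORE_SLOT, NULL, 0, dst_temp, slot };
    return n;
}
```

For add t1, t2 with t2 in [ebp-24], the output is mov t35, [ebp-24] followed by add t1, t35, as on the slide; each new temporary is live only between its load and its single use.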