40-414 Compiler Design Register Allocation

54
1 40-414 Compiler Design Lecture 13 Register Allocation

Transcript of 40-414 Compiler Design Register Allocation

Page 1: 40-414 Compiler Design Register Allocation

1

40-414 Compiler Design

Lecture 13

Register Allocation

Page 2: 40-414 Compiler Design Register Allocation

2

Back-End (Revisited)

Back-End:

• Translate IR into machine code

• Choose instructions for each IR operation

• Decide what to keep in registers at each point

Instructionselection

IRMachine

code

Errors

RegisterAllocation

Page 3: 40-414 Compiler Design Register Allocation

3

The Register Allocation Problem

• Intermediate code uses unlimited temporaries– Simplifies code generation and optimization

– Complicates final translation to assembly

• Typical intermediate code uses too many temporaries

Prof. Aiken

Page 4: 40-414 Compiler Design Register Allocation

4

The Register Allocation Problem (Cont.)

• The problem:Rewrite the intermediate code to use no more temporaries than there are machine registers

• Method: – Assign multiple temporaries to each register

– But without changing the program behavior

Prof. Aiken

Many temps to one

Page 5: 40-414 Compiler Design Register Allocation

5

An Example

• Consider the program

a := c + de := a + bf := e - 1

• Assume a and e dead after use– Temporary a can be

“reused” after e := a + b– So can temporary e

• Can allocate a, e, and f all to one register (r1):

r1 := r2 + r3

r1 := r1 + r4

r1 := r1 - 1

• A dead temporary is not needed – A dead temporary can be

reused

Prof. Aiken

Many to one mapping

Page 6: 40-414 Compiler Design Register Allocation

6

History

• Register allocation is as old as compilers– Register allocation was used in the original

FORTRAN compiler in the ‘50s

– Very crude algorithms

• A breakthrough came in 1980 – Register allocation scheme based on graph coloring

– Relatively simple, global and works well in practice

Prof. Aiken

Page 7: 40-414 Compiler Design Register Allocation

Temporaries t1 and t2 can share the same register if at any point in the program at most one of t1 or t2 is live .

Or

If t1 and t2 are live at the same time, they cannot share a register

7

The Idea

Prof. Aiken

Page 8: 40-414 Compiler Design Register Allocation

8

Algorithm: Part I

• Compute live variables for each point:

a := b + cd := -ae := d + f

f := 2 * eb := d + e

e := e - 1

b := f + c

{b}

{c,e}

{b}

{c,f} {c,f}

{b,c,e,f}

{c,d,e,f}

{b,c,f}

{c,d,f}{a,c,f}

Prof. Aiken

Page 9: 40-414 Compiler Design Register Allocation

9

The Register Interference Graph

• Construct an undirected graph– A node for each temporary

– An edge between t1 and t2 if they are live simultaneously at some point in the program

• This is the register interference graph (RIG)– Two temporaries can be allocated to the same

register if there is no edge connecting them

Prof. Aiken

Page 10: 40-414 Compiler Design Register Allocation

10

Example

• For our example:

a

f

e

d

c

b

• E.g., b and c cannot be in the same register

• E.g., b and d could be in the same register

Prof. Aiken

Page 11: 40-414 Compiler Design Register Allocation

11

Notes on Register Interference Graphs

• Extracts exactly the information needed to characterize legal register assignments

• Gives a global (i.e., over the entire flow graph) picture of the register requirements

• After RIG construction the register allocation algorithm is architecture independent– It does not depend on any property of the machine

except for the number of registers

Prof. Aiken

Page 12: 40-414 Compiler Design Register Allocation

12

Definitions

• A coloring of a graph is an assignment of colors to nodes, such that nodes connected by an edge have different colors

• A graph is k-colorable if it has a coloring with k colors

Prof. Aiken

Page 13: 40-414 Compiler Design Register Allocation

13

Register Allocation Through Graph Coloring

• In our problem, colors = registers– We need to assign colors (registers) to graph nodes

(temporaries)

• Let k = number of machine registers

• If the RIG is k-colorable then there is a register assignment that uses no more than k registers

Prof. Aiken

Page 14: 40-414 Compiler Design Register Allocation

14

Graph Coloring Example

• Consider the example RIG

a

f

e

d

c

b

• There is no coloring with less than 4 colors

• There are 4-colorings of this graph

r4

r1

r2

r3

r2

r3

Prof. Aiken

Page 15: 40-414 Compiler Design Register Allocation

15

Example Review

a := b + cd := -ae := d + f

f := 2 * eb := d + e

e := e - 1

b := f + c

Prof. Aiken

Page 16: 40-414 Compiler Design Register Allocation

16

Example After Register Allocation

• Under this coloring the code becomes:

r2 := r3 + r4

r3 := -r2

r2 := r3 + r1

r1 := 2 * r2

r3 := r3 + r2

r2 := r2 - 1

r3 := r1 + r4

Prof. Aiken

Page 17: 40-414 Compiler Design Register Allocation

17

Computing Graph Colorings

• How do we compute graph colorings?

• It isn’t easy:1. This problem is very hard (NP-hard). No efficient

algorithms are known.– Solution: use heuristics

2. A coloring might not exist for a given number of registers– Solution: later

Prof. Aiken

Page 18: 40-414 Compiler Design Register Allocation

18

Graph Coloring Heuristic

• Observation:– Pick a node t with fewer than k neighbors in RIG

– Eliminate t and its edges from RIG

– If resulting graph is k-colorable, then so is the original graph

• Why?– Let c1,…,cn be the colors assigned to the neighbors

of t in the reduced graph

– Since n < k we can pick some color for t that is different from those of its neighbors

Prof. Aiken

n

Page 19: 40-414 Compiler Design Register Allocation

19

Graph Coloring Heuristic

1. The following works well in practice:– Pick a node t with fewer than k neighbors

– Put t on a stack and remove it from the RIG

– Repeat until the graph has one node

2. Assign colors to nodes on the stack – Start with the last node added

– At each step pick a color different from those assigned to already colored neighbors

Prof. Aiken

Page 20: 40-414 Compiler Design Register Allocation

20

Graph Coloring Example (1)

• Remove a

a

f

e

d

c

b

• Start with the RIG and with k = 4:

Stack: {}

Prof. Aiken

Page 21: 40-414 Compiler Design Register Allocation

21

Graph Coloring Example (2)

• Remove d

f

e

d

c

b

Stack: {a}

Prof. Aiken

Page 22: 40-414 Compiler Design Register Allocation

22

Graph Coloring Example (3)

• Note: all nodes now have fewer than 4 neighbors

f

e c

b

Stack: {d, a}

• Remove c

Prof. Aiken

Page 23: 40-414 Compiler Design Register Allocation

23

Graph Coloring Example (4)

f

e

b

Stack: {c, d, a}

• Remove b

Prof. Aiken

Page 24: 40-414 Compiler Design Register Allocation

24

Graph Coloring Example (5)

f

e

Stack: {b, c, d, a}

• Remove e

Prof. Aiken

Page 25: 40-414 Compiler Design Register Allocation

25

Graph Coloring Example (6)

fStack: {e, b, c, d, a}

• Remove f

Page 26: 40-414 Compiler Design Register Allocation

26

Graph Coloring Example (7)

• Now start assigning colors to nodes, starting with the top of the stack

Stack: {f, e, b, c, d, a}

Prof. Aiken

• Empty graph – done with the first part!

Page 27: 40-414 Compiler Design Register Allocation

27

Graph Coloring Example (8)

fStack: {e, b, c, d, a}

r1

Prof. Aiken

Page 28: 40-414 Compiler Design Register Allocation

28

Graph Coloring Example (9)

f

e

Stack: {b, c, d, a}

• e must be in a different register from f

r1

r2

Prof. Aiken

Page 29: 40-414 Compiler Design Register Allocation

29

Graph Coloring Example (10)

f

e

b

Stack: {c, d, a} r1

r2

r3

Prof. Aiken

Page 30: 40-414 Compiler Design Register Allocation

30

Graph Coloring Example (11)

f

e c

b

Stack: {d, a} r1

r2

r3

r4

Prof. Aiken

Page 31: 40-414 Compiler Design Register Allocation

31

Graph Coloring Example (12)

• d can be in the same register as b

f

e

d

c

b

Stack: {a} r1

r2

r3

r4

r3

Prof. Aiken

Page 32: 40-414 Compiler Design Register Allocation

32

Graph Coloring Example (13)

b

a

e c r4

fr1

r2

r3

r2

r3

d

Prof. Aiken

Page 33: 40-414 Compiler Design Register Allocation

33

What if the Heuristic Fails?

Prof. Aiken

• What happens if the graph coloring heuristic fails to find a coloring?

• In this case, we can’t hold all values in registers.

– Some values are spilled to memory

Page 34: 40-414 Compiler Design Register Allocation

34

What if the Heuristic Fails?

• What if all nodes have k or more neighbors ?

• Example: Try to find a 3-coloring of the RIG:

a

f

e

d

c

b

Prof. Aiken

Page 35: 40-414 Compiler Design Register Allocation

35

What if the Heuristic Fails?

• Remove a and get stuck (as shown below)

f

e

d

c

b

– There is no node with fewer than 3 neighbors

• Pick a node as a candidate for spilling– A spilled temporary “lives” in memory

– Assume that f is picked as a candidate

Prof. Aiken

Page 36: 40-414 Compiler Design Register Allocation

36

What if the Heuristic Fails?

• Remove f and continue the simplification– Simplification now succeeds: b, d, e, c

e

d

c

b

Prof. Aiken

Page 37: 40-414 Compiler Design Register Allocation

37

What if the Heuristic Fails?

• Eventually we must assign a color to f

• We hope that among the 4 neighbors of f we use less than 3 colors optimistic coloring

f

e

d

c

b r3

r1r2

r3

?

Prof. Aiken

In this ex., it doesn’t work

Page 38: 40-414 Compiler Design Register Allocation

38

Spilling

• If optimistic coloring fails, we spill f– Allocate a memory location for f

• Typically in the current stack frame

• Call this address fa

• Before each operation that reads f, insertf := load fa

• After each operation that writes f, insertstore f, fa

Prof. Aiken

Page 39: 40-414 Compiler Design Register Allocation

39

a := b + cd := -ae := d + f

f := 2 * eb := d + e

e := e - 1

b := f + c

Prof. Aiken

Spilling Example

• Original code

Page 40: 40-414 Compiler Design Register Allocation

40

Spilling Example

• This is the new code after spilling f

a := b + cd := -af := load fae := d + f

f := 2 * e

store f, fa

b := d + e

e := e - 1

f := load fab := f + c

Prof. Aiken

Page 41: 40-414 Compiler Design Register Allocation

A Problem

• This code reuses the register name f

• Correct, but suboptimal– Should use distinct register names whenever

possible

– Allows different uses to have different colors

41Prof. Aiken

Page 42: 40-414 Compiler Design Register Allocation

42

Spilling Example

• This is the new code after spilling f

a := b + cd := -af1 := load fae := d + f1

f2 := 2 * e

store f2, fa

b := d + e

e := e - 1

f3 := load fab := f3 + c

Prof. Aiken

Page 43: 40-414 Compiler Design Register Allocation

43

Recomputing Liveness Information

• The new liveness information after spilling:

a := b + cd := -af1 := load fae := d + f1

f2 := 2 * e

store f2, fa

b := d + e

e := e - 1

f3 := load fab := f3 + c

{b}

{c,e}

{b}

{c,f}{c,f}

{b,c,e,f}

{c,d,e,f}

{b,c,f}

{c,d,f}{a,c,f}

{c,d,f1}

{c,f2}

{c,f3}

Prof. Aiken

Page 44: 40-414 Compiler Design Register Allocation

44

Recomputing Liveness Information

• New liveness information is almost as before– Note f has been split into three temporaries

• fi is live only– Between a fi := load fa and the next instruction

– Between a store fi, fa and the preceding instr.

• Spilling reduces the live range of f– And thus reduces its interferences

– Which results in fewer RIG neighborsProf. Aiken

Page 45: 40-414 Compiler Design Register Allocation

45

Recompute RIG After Spilling

• Some edges of the spilled node are removed

• In our case f still interferes only with c and d

• And the resulting RIG is 3-colorable

a

f1

e

d

c

bf3

f2

Prof. Aiken

Page 46: 40-414 Compiler Design Register Allocation

46

Spilling Notes

• Additional spills might be required before a coloring is found

• The tricky part is deciding what to spill– But any choice is correct

• Possible heuristics:– Spill temporaries with most conflicts

– Spill temporaries with few definitions and uses

– Avoid spilling in inner loopsProf. Aiken

Page 47: 40-414 Compiler Design Register Allocation

47

Conclusions

• Register allocation is a “must have” in compilers:– Because intermediate code uses too many

temporaries

– Because it makes a big difference in performance

• Register allocation is more complicated for CISC machines

Prof. Aiken

Page 48: 40-414 Compiler Design Register Allocation

48

Caches

• Compilers are very good at managing registers– Much better than a programmer could be

• Compilers are not good at managing caches– This problem is still left to programmers– It is still an open question how much a compiler can

do to improve cache performance

• Compilers can, and a few do, perform some cache optimizations

Prof. Aiken

Page 49: 40-414 Compiler Design Register Allocation

49

Cache Optimization

• Consider the loopfor(j := 1; j < 10; j++)

for(i=1; i<1000; i++)

a[i] *= b[i]

• This program has terrible cache performance– Because each iteration of the inner loop refers to a new

element of arrays (i.e., fresh data) = [a cache miss]

Prof. Aiken

a[1] a[2] … b[1] …

a[1] *= b[1]a[2] *= b[2]

cache

Page 50: 40-414 Compiler Design Register Allocation

50

Cache Optimization (Cont.)

• Consider the program:for(i=1; i<1000; i++)

for(j := 1; j < 10; j++)

a[i] *= b[i]

– Computes the same thing

– But with much better cache behavior

– Might actually be more than 10x faster

• A compiler can perform this optimization– called loop interchange

Prof. Aiken

Page 51: 40-414 Compiler Design Register Allocation

51

Question?

Prof. Aiken

A := 1

B := A * 2

C := C - B

A := B + 1

A < 16

D := C + 1

Which of the following pairs of temporariesinterfere in the code fragment given at right?

A and B

B and C

A and C

C and D

1

2

3

4

5

6

Page 52: 40-414 Compiler Design Register Allocation

52

Question?

Prof. Aiken

f

e

d

c

b

f

e

da

c

b

a

f

e c

b

d a

f

e

d

c

b

Which of the following colorings is a validminimal coloring of the given RIG?

a

Page 53: 40-414 Compiler Design Register Allocation

53

Question?

Prof. Aiken

a

f

e

d

c

b

For the given RIG and k = 3, which of the following are valid deletion orders for the nodes of the RIG?

{d, e, c, b, a, f}

{e, f, a, b, c, d}

{d, c, b, a, f , e}

{d, e, b, c, a, f}

Page 54: 40-414 Compiler Design Register Allocation

54

Question?

Prof. Aiken

A := 1

B := A * 2

C := C - B

A := B + 1

A < 16

D := C + 1

For the given code fragment and RIG, find theminimum cost spill. In this example, the cost of spilling a node is given by:

A

C

B

D

1

2

3

4

5

6

# of occurrences (use or definition)- # of conflicts

+ 5 if the node corresponds to avariable used in a loop

A

D C

B