Integrated Register Allocation introduction


Page 1: Integrated Register Allocation introduction

Integrated Register Allocation Introduction

Shiva Chen

Page 2: Integrated Register Allocation introduction

Outline

• Register allocator
• Graph coloring register allocator
• Caller/callee save registers
• Register coalescing/live range splitting
• Integrated Register allocation
• Reading IRA RTL dump file
• Reference

Page 3: Integrated Register Allocation introduction

Register allocator

• Pseudo-registers
  – Most modern compilers are written as if there were an infinite number of virtual registers
• Register allocator
  – The pass that maps pseudo-registers onto hard registers and memory

Page 4: Integrated Register Allocation introduction

Graph coloring register allocator

• The first graph coloring allocator was built by Chaitin.
• It models register allocation as a graph coloring problem.
  – Construct an interference graph G
    • Each node in G: a live range
    • Each edge in G: interference between live ranges

move r1, r3
move r5, r4
sub r6, r3, r5

[Figure: interference graph G]

Note: The interference graph is not colored yet. The colors in the example only help to distinguish the live ranges.

Page 5: Integrated Register Allocation introduction

Graph coloring register allocator

• Degree
  – The number of neighbors of a node
• K-coloring
  – An assignment of k colors to the nodes of G
    • Adjacent nodes always have distinct colors
• Choose k as the number of hardware registers
  – Then we can map a k-coloring of G onto a hardware register assignment

[Figure: interference graph G with each node labeled by its degree (1, 2, 2, 3, 4)]
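The definitions above can be sketched in a few lines of Python. The graph, the node names, and the helper names `degree` and `is_k_coloring` are hypothetical and chosen only for illustration; they are not taken from the slide's figure.

```python
def degree(graph, node):
    """Degree of a node: the number of its neighbors."""
    return len(graph[node])


def is_k_coloring(graph, coloring, k):
    """Check that every node is colored, at most k colors are used,
    and adjacent nodes always have distinct colors."""
    if set(coloring) != set(graph):
        return False
    if len(set(coloring.values())) > k:
        return False
    return all(coloring[u] != coloring[v] for u in graph for v in graph[u])


# Hypothetical interference graph (not the one from the slide's figure).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
# With k = 3 hard registers, a color is simply a hard register name.
coloring = {"a": "r0", "b": "r1", "c": "r2", "d": "r0"}

print(degree(graph, "c"))
print(is_k_coloring(graph, coloring, 3))
```

Using register names as colors makes the last bullet concrete: a valid k-coloring is already a register assignment.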

Page 6: Integrated Register Allocation introduction

Graph coloring register allocator

• Chaitin's allocator
  – Create an empty stack
  – Repeat the following two steps until the graph is empty
    • If there exists a node with degree < k (trivially colorable)
      – Remove the node and push it onto the stack
    • Otherwise, choose a node to spill
      – Remove the node and its edges from the graph
  – Select: assign colors to the nodes
    • Pop a node from the stack
    • Give the node a color distinct from its neighbors' colors

[Figure: interference graph G with nodes w, x, y, z]

Page 7: Integrated Register Allocation introduction

Graph coloring register allocator

• Chaitin's allocator
  – E.g. each node in G has degree 2; suppose k (number of colors) = 2
    • There is no node with degree less than 2
      – 1. Spill one of the nodes
        » Choose x to spill

[Figure: steps 1-7 of Chaitin's allocator on G. x is removed as the spill candidate, the remaining nodes w, y, z are pushed onto the stack, then popped and colored; in step 7, x is spilled.]

When z is colored in step 6, a color must remain for it, because step 2 guaranteed that z's degree was < k when it was removed.
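The walk-through above can be reproduced with a minimal sketch of Chaitin-style simplify and select. The function name `chaitin_allocate`, the `pick_spill` parameter, and the integer colors are assumptions made for this illustration; the slides do not specify how the spill candidate is chosen, so the example passes a picker that selects x, as the slide does. The edge set (x and y each adjacent to w and z) is inferred from the statements that every node has degree 2 and that x's neighbors are w and z.

```python
def chaitin_allocate(graph, k, pick_spill=min):
    """Chaitin-style allocation sketch. Returns (coloring, spilled)."""
    work = {n: set(adj) for n, adj in graph.items()}  # mutable copy of G
    stack, spilled = [], []
    while work:
        # Simplify: look for a trivially colorable node (degree < k).
        node = next((n for n in sorted(work) if len(work[n]) < k), None)
        if node is not None:
            stack.append(node)
        else:
            # No such node: choose one to spill (heuristic is an assumption).
            node = pick_spill(work)
            spilled.append(node)
        # Remove the node and its edges from the graph.
        for neighbor in work.pop(node):
            work[neighbor].discard(node)
    # Select: pop nodes and give each a color distinct from its neighbors.
    coloring = {}
    while stack:
        node = stack.pop()
        used = {coloring[nb] for nb in graph[node] if nb in coloring}
        coloring[node] = next(c for c in range(k) if c not in used)
    return coloring, spilled


# Every node has degree 2: x and y are each adjacent to w and z.
cycle = {"x": {"w", "z"}, "y": {"w", "z"}, "w": {"x", "y"}, "z": {"x", "y"}}
coloring, spilled = chaitin_allocate(cycle, 2, pick_spill=lambda nodes: "x")
print(coloring, spilled)
```

The `next(...)` in the select phase cannot fail here: every node on the stack had degree < k when it was removed, which is the guarantee the slide points out for z.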

Page 8: Integrated Register Allocation introduction

Graph coloring register allocator

• Chaitin's allocator
  – E.g. each node in G has degree 2; suppose k (number of colors) = 2
    • This case should be 2-colorable without spilling
    • The spill is caused by Chaitin's approach assuming that the neighbors of a degree-2 node must have two different colors
      – E.g. x's neighbors w and z must have different colors
        » However, w and z could have the same color
    • In that case no spill is needed for k = 2 in G

[Figure: two copies of interference graph G with nodes w, x, y, z]

Page 9: Integrated Register Allocation introduction

Graph coloring register allocator

• Briggs' improvement
  – Also called the Chaitin-Briggs allocator
    • Push x onto the stack even though x's degree is not < 2
    • The spilling decision is made while coloring
      – In step 6, consider the colors actually needed instead of the degree
        » G becomes 2-colorable

[Figure: steps 1-6 of the Chaitin-Briggs allocator on G. All four nodes w, x, y, z are pushed onto the stack, then popped and colored with two colors.]
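A minimal sketch of the optimistic variant follows, assuming the same 4-node graph as in the Chaitin example (x and y each adjacent to w and z). The function name `briggs_allocate` and the choice of `min` to pick the candidate when no node is trivially colorable are assumptions for illustration only.

```python
def briggs_allocate(graph, k):
    """Chaitin-Briggs sketch: spill decisions are deferred to coloring."""
    work = {n: set(adj) for n, adj in graph.items()}  # mutable copy of G
    stack = []
    while work:
        node = next((n for n in sorted(work) if len(work[n]) < k), None)
        if node is None:
            # Optimistic push: keep the candidate on the stack anyway.
            node = min(work)  # candidate choice is an assumption
        stack.append(node)
        for neighbor in work.pop(node):
            work[neighbor].discard(node)
    coloring, spilled = {}, []
    while stack:
        node = stack.pop()
        used = {coloring[nb] for nb in graph[node] if nb in coloring}
        free = [c for c in range(k) if c not in used]
        if free:
            coloring[node] = free[0]
        else:
            spilled.append(node)  # spill only when no color is left
    return coloring, spilled


cycle = {"x": {"w", "z"}, "y": {"w", "z"}, "w": {"x", "y"}, "z": {"x", "y"}}
coloring, spilled = briggs_allocate(cycle, 2)
print(coloring, spilled)

# A triangle is not 2-colorable, so one node is still spilled.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
tri_coloring, tri_spilled = briggs_allocate(triangle, 2)
```

The only change from the Chaitin sketch is where the spill decision is taken: a node is spilled when its already-colored neighbors use all k colors, not when its degree is at least k.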

Page 10: Integrated Register Allocation introduction

Caller/callee save registers

Register allocation for a function.

It is simpler to consider only one function at a time.

This means we must preserve register contents across a function call.

[Figure: func1 calls func2; registers are saved with "push regs" and restored with "pop regs"]

Page 11: Integrated Register Allocation introduction

Caller/callee save registers

It is too expensive to push all registers for each function call.

Could we push only some of the registers?

[Figure: func1 calls func2; registers are saved with "push regs" and restored with "pop regs"]

Define register usage: the lifetime of some registers ends at a function call, while the lifetime of the others can cross a function call. We only need to push the registers whose lifetime crosses the function call.

Page 12: Integrated Register Allocation introduction

Caller/callee save registers

In func2, use caller-save registers first, because using callee-save registers requires push/pop.

[Figure: func1 calls func2; registers are saved with "push regs" and restored with "pop regs"]

Page 13: Integrated Register Allocation introduction

Caller/callee save registers

Use a callee-save register if the value must live across a function call.

If there are not enough callee-save registers to allocate, use a caller-save register across the call; this needs an extra push/pop around the call.

[Figure: in func1, "push caller save" before "call func2" and "pop caller save" after it]

Page 14: Integrated Register Allocation introduction

Caller/callee save registers

GCC has the -fipa-ra flag. With it, the compiler tends to use caller-save registers without push/pop around a function call when it knows the register usage of func2.

[Figure: in func1, "push caller save" before "call func2" and "pop caller save" after it]

Page 15: Integrated Register Allocation introduction

Register coalescing/live range splitting

• Register coalescing
  – Remove unnecessary moves by using just one pseudo-register
• Live range splitting
  – Split the live range of a pseudo-register if splitting reduces conflicts with other pseudo-registers.

Before coalescing (after splitting):

move r1, r2
add r7, r1, r5
mult r2, r7, r8

After coalescing (before splitting):

add r7, r2, r5
mult r2, r7, r8
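The coalescing direction of the example can be sketched as a simple renaming pass. This is an illustration only: the tuple encoding `(opcode, dest, sources...)` and the name `coalesce_moves` are assumptions, and the sketch presumes the caller has already checked that the move's source and destination do not interfere (the destination is not used after the source is redefined).

```python
def coalesce_moves(insns):
    """Drop each `move dst, src` and rename later uses of dst to src.

    insns is a list of (opcode, dest, *sources) tuples.
    Assumes dst and src of every move do not interfere.
    """
    alias = {}  # coalesced destination -> register that replaces it
    out = []
    for op, dest, *srcs in insns:
        srcs = [alias.get(s, s) for s in srcs]
        if op == "move":
            alias[dest] = srcs[0]  # the move itself disappears
            continue
        # A new definition of dest ends any alias that involves it.
        alias = {d: s for d, s in alias.items() if dest not in (d, s)}
        out.append((op, dest, *srcs))
    return out


before = [
    ("move", "r1", "r2"),
    ("add", "r7", "r1", "r5"),
    ("mult", "r2", "r7", "r8"),
]
after = coalesce_moves(before)
print(after)
```

Live range splitting goes the other way: it reintroduces a move and a fresh pseudo-register so that the two halves can be allocated independently.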

Page 16: Integrated Register Allocation introduction

Integrated Register allocation (IRA)

• IRA
  – Performs graph coloring on a top-down traversal of nested regions.
  – Performs the following three steps in an integrated way, based on dynamically changing hard register costs
    • Register coalescing
    • Live range splitting
    • Choosing a hard register

Page 17: Integrated Register Allocation introduction

Integrated Register allocation (IRA)

• Internal representation for IRA
  – Regions
    • The entire function is the root region
    • Natural loops
    • Use -fira-region=(one|all|mixed) to choose how regions are formed
      – one: only the root region (the entire function)
      – all: all loops as regions
      – mixed: all loops except those with low register pressure as regions
    • Without -fira-region= on the command line
      – -Os or -O0 defaults to -fira-region=one
      – -O1 or above defaults to -fira-region=mixed
      – Defined in toplev.c

Page 18: Integrated Register Allocation introduction

Integrated Register allocation (IRA)

• Internal representation for IRA
  – Allocno
    • The live range of a pseudo-register in a region

[Figure: pseudo r111 spans Region A and Region B; it has one allocno in A and another allocno in B]

Page 19: Integrated Register Allocation introduction

Integrated Register allocation (IRA)

• Internal representation for IRA
  – Each allocno has the following attributes
    • Cover class
      – The hard-register class available for the allocno
    • Hard register costs
      – The cost of each hard register of the cover class available for the allocno
      – The cost of caller-saved registers increases when the allocno lives across a call
    • Conflict hard-register costs
      – Help calculate the hard register costs
    • For more detail, see struct ira_allocno in ira-int.h

Page 20: Integrated Register Allocation introduction

Integrated Register allocation (IRA)

• To assign a hard register to allocno 1
  – Choose the hard register with the minimum full cost
    • For each hard register that allocno 1 could choose
      – full_cost = hard_register_cost - (conflict_hard_register_cost of allocno 2 and allocno 3) + (conflict_hard_register_cost of allocno 4)
        » conflict_hard_register_cost
          • High cost: the allocno tends not to use the register
          • Low cost: the allocno tends to use the register
        » If the conflict_hard_register_cost of allocno 2 and allocno 3 is high
          • allocno 2 and allocno 3 prefer not to be assigned the hard register
          • The full cost for allocno 1 becomes smaller
          • allocno 1 is more inclined to take the hard register

[Figure: allocno 1 together with allocno 2, allocno 3, and allocno 4]
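The full-cost formula above can be sketched as follows. This is only an arithmetic illustration of the formula as written on the slide, not GCC's implementation: the function names, the parameter names `subtracted` (for allocno 2 and allocno 3) and `added` (for allocno 4), and all cost numbers are hypothetical. The slide's figure, which showed how the four allocnos relate, is not available here.

```python
def full_cost(hard_reg, own_cost, subtracted, added):
    """full_cost = hard_register_cost
                   - conflict costs of the `subtracted` allocnos
                   + conflict costs of the `added` allocnos."""
    cost = own_cost[hard_reg]
    cost -= sum(costs.get(hard_reg, 0) for costs in subtracted)
    cost += sum(costs.get(hard_reg, 0) for costs in added)
    return cost


def choose_hard_reg(own_cost, subtracted, added):
    """Pick the hard register with the minimum full cost."""
    return min(sorted(own_cost),
               key=lambda reg: full_cost(reg, own_cost, subtracted, added))


# Hypothetical costs for two candidate hard registers.
allocno1 = {"r0": 10, "r1": 10}   # hard_register_cost of allocno 1
allocno2 = {"r0": 8, "r1": 1}     # conflict_hard_register_cost
allocno3 = {"r0": 4, "r1": 0}
allocno4 = {"r0": 2, "r1": 3}

best = choose_hard_reg(allocno1, [allocno2, allocno3], [allocno4])
print(best, full_cost(best, allocno1, [allocno2, allocno3], [allocno4]))
```

In this made-up example allocno 2 and allocno 3 have a high cost for r0, so they are unlikely to want it, and r0 becomes the cheapest choice for allocno 1.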

Page 21: Integrated Register Allocation introduction

Integrated Register allocation (IRA)

• Internal representation for IRA
  – Copy
    • Allocnos can be connected by copies.
    • Copies are used to modify the hard register costs of allocnos during coloring

Page 22: Integrated Register Allocation introduction

Integrated Register allocation (IRA)

• Reduce the hard register cost for allocnos connected by copies
  – If one of the allocnos is assigned to hard register n
    • The cost of hard register n is reduced for all connected allocnos
      – This means connected allocnos tend to choose the same hard register
        » This is the register coalescing in IRA.

[Figure: Allocno 1 and Allocno 2 are connected by a copy. If allocno 1 is assigned to hard register n, the cost of hard register n for allocno 2 is reduced.]
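A small sketch of this cost adjustment, under stated assumptions: the slides do not say by how much the cost is reduced, so the sketch simply subtracts the copy frequency. The function name `reduce_costs_for_copies`, the data layout, and all numbers are hypothetical.

```python
def reduce_costs_for_copies(costs, copies, assigned, hard_reg):
    """Return updated costs after `assigned` received `hard_reg`.

    costs:  allocno name -> {hard register: cost}
    copies: list of (allocno, allocno, frequency)
    """
    updated = {a: dict(c) for a, c in costs.items()}
    for first, second, freq in copies:
        if assigned == first:
            other = second
        elif assigned == second:
            other = first
        else:
            continue  # this copy does not involve the assigned allocno
        if hard_reg in updated[other]:
            # Assumption: the reduction equals the copy frequency.
            updated[other][hard_reg] -= freq
    return updated


costs = {
    "allocno1": {"r3": 40, "r4": 40},
    "allocno2": {"r3": 40, "r4": 40},
}
copies = [("allocno1", "allocno2", 25)]
updated = reduce_costs_for_copies(costs, copies, "allocno1", "r3")
print(updated["allocno2"])
```

After the update, r3 is cheaper than r4 for allocno 2, so a minimum-cost choice would place both allocnos in the same register and the copy between them becomes redundant.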

Page 23: Integrated Register Allocation introduction

• Copies are created for
  – 1. Move
       move r133, r145
  – 2. Operand constraint
       addx r33, r145, r223
       A copy is created if addx has an operand constraint that operands 0 and 1 must be the same register
  – 3. Shuffle
       A copy is created for allocnos that cross the region border

Page 24: Integrated Register Allocation introduction

Integrated Register allocation (IRA)

• Internal representation for IRA
  – Cap
    • Represents an allocno that exists in an inner region but not in the outer region.
    • Lets the outer region take the information of the inner region's allocno into account.
    • Created by create_caps () in ira-build.c
    • Caps exist only in the parent region

Page 25: Integrated Register Allocation introduction

Integrated Register allocation (IRA)

• IRA regional coloring
  – Start from the root region
    • Color one region at a time
    • From outer regions to inner regions
  – Implementation
    • do_coloring () in ira-color.c
      – Traverses the loop tree with the function ira_traverse_loop_tree
      – Colors one region at a time with the function color_pass
        » color_pass sets up the allocnos in the region and calls color_allocnos () to start allocating them

Page 26: Integrated Register Allocation introduction

Integrated Register allocation (IRA)

• IRA regional coloring
  – Update the costs of allocnos in the sub-region (Region B) after allocation finishes for the parent region (Region A)

[Figure: pseudo r111 is represented by Allocno 12 in Region A and by Allocno 13 in Region B]

The update code is implemented at the end of color_pass ().

If allocno 12 and allocno 13 belong to the same pseudo:

1. If allocno 12 is assigned to hard register r1, the costs of allocno 13 are updated as follows:

Hard_register_cost[other regs (!r1)] += move_cost * (exit_freq + enter_freq)

Memory_cost += load_cost * exit_freq + store_cost * enter_freq

Page 27: Integrated Register Allocation introduction

Integrated Register allocation (IRA)

• IRA regional coloring
  – Update the costs of allocnos in the sub-region (Region B) after allocation finishes for the parent region (Region A)

[Figure: pseudo r111 is represented by Allocno 12 in Region A and by Allocno 13 in Region B]

The update code is implemented at the end of color_pass ().

If allocno 12 and allocno 13 belong to the same pseudo:

2. If allocno 12 is assigned to memory, the memory cost of allocno 13 is updated as follows:

Memory_cost -= load_cost * exit_freq + store_cost * enter_freq
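The two update rules from this slide and the previous one can be put side by side in a short sketch. It follows the formulas as written on the slides and is not GCC's code: the function name `update_subregion_costs`, the use of `None` for "assigned to memory", and all numbers are assumptions for illustration.

```python
def update_subregion_costs(parent_reg, hard_reg_costs, memory_cost,
                           move_cost, load_cost, store_cost,
                           enter_freq, exit_freq):
    """Adjust the sub-region allocno's costs after the parent was colored.

    parent_reg is the hard register given to the parent-region allocno,
    or None if the parent allocno was assigned to memory.
    """
    costs = dict(hard_reg_costs)
    mem_move = load_cost * exit_freq + store_cost * enter_freq
    if parent_reg is None:
        # Case 2: parent lives in memory, so memory gets cheaper here too.
        memory_cost -= mem_move
    else:
        # Case 1: any other register needs moves at region entry and exit.
        penalty = move_cost * (exit_freq + enter_freq)
        for reg in costs:
            if reg != parent_reg:
                costs[reg] += penalty
        memory_cost += mem_move
    return costs, memory_cost


# Hypothetical numbers for allocno 13 in Region B.
base = {"r1": 0, "r2": 0}
in_reg = update_subregion_costs("r1", base, 100, move_cost=2, load_cost=4,
                                store_cost=6, enter_freq=10, exit_freq=10)
in_mem = update_subregion_costs(None, base, 100, move_cost=2, load_cost=4,
                                store_cost=6, enter_freq=10, exit_freq=10)
print(in_reg, in_mem)
```

Either way, the sub-region allocno is nudged toward the location its parent received, because a different location costs moves, loads, or stores on the region border.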

Page 28: Integrated Register Allocation introduction

Integrated Register allocation (IRA)

• IRA Coloring
  – Use Chaitin-Briggs coloring in each region
    • Starts in color_allocnos ()
    • Two buckets
      – Trivially colorable allocnos
      – Non-trivially colorable allocnos

Page 29: Integrated Register Allocation introduction

Integrated Register allocation (IRA)

• IRA Coloring
  – First pass
    • Put all allocnos on the coloring stack
      – Function push_allocnos_to_stack () in ira-color.c
      – Move colorable allocnos onto the stack
        » Update the colorable bucket after removing an allocno from the interference graph
      – If the colorable bucket becomes empty
        » Choose the allocno with minimum cost from the uncolorable bucket
          • The minimum-cost allocno is sorted to the head of the uncolorable bucket
          • Sorted by the function allocno_spill_priority_compare in ira-color.c

Page 30: Integrated Register Allocation introduction

Integrated Register allocation (IRA)

• IRA Coloring
  – Second pass
    • Pop allocnos from the stack and assign hard registers.
      – Function pop_allocnos_from_stack () in ira-color.c
      – Pop an allocno from the top of the stack and call assign_hard_reg ()
        » assign_hard_reg ()
          • Calculates full_hard_register_cost
          • Adds a cost for callee-save registers
            • Callee-save registers need push/pop in the prologue/epilogue
          • Chooses the hard register with the minimum cost for the allocno

Page 31: Integrated Register Allocation introduction

Integrated Register allocation (IRA)

• IRA Coloring
  – Sorted uncolorable bucket

ALLOCNO_BAD_SPILL_P (a) == 1 means that spilling a will result in additional reloads.

1. Sorted by ALLOCNO_BAD_SPILL_P if only one of the two allocnos is a bad spill.
2. Sorted by the allocno_spill_priority () function.
3. If the priorities are equal, sorted by ALLOCNO_COLOR_DATA (a)->temp.
4. If still equal, sorted by ALLOCNO_NUM (a).

Page 32: Integrated Register Allocation introduction

Integrated Register allocation (IRA)

• IRA Coloring
  – Sorted uncolorable bucket

1. data->temp comes from calculate_allocno_spill_cost (a).
2. ALLOCNO_EXCESS_PRESSURE_POINTS_NUM (a): the number of live points of the allocno at which the number of live allocnos exceeds the number of available hard registers of the class.
3. ira_reg_class_max_nregs: the maximum number of hard registers of ALLOCNO_CLASS (a) needed to hold a value of the allocno's mode.
4. An allocno with a lower spill cost and more high-pressure live points is more likely to be spilled.
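The four-level ordering of the uncolorable bucket can be sketched with a Python sort key. The `Allocno` dataclass and its field names are hypothetical stand-ins for the GCC macros named on the slides. The priority formula (spill cost divided by excess pressure points times max_nregs, plus one) is an assumption built from the three quantities the slide lists; the slide's own code listing is not available in this transcript.

```python
from dataclasses import dataclass


@dataclass
class Allocno:
    num: int                     # stands in for ALLOCNO_NUM (a)
    bad_spill: bool              # stands in for ALLOCNO_BAD_SPILL_P (a)
    spill_cost: int              # stands in for data->temp
    excess_pressure_points: int  # ALLOCNO_EXCESS_PRESSURE_POINTS_NUM (a)
    max_nregs: int = 1           # ira_reg_class_max_nregs for the allocno


def spill_priority(a):
    # Assumed formula: low cost and many high-pressure points -> low value.
    return a.spill_cost // (a.excess_pressure_points * a.max_nregs + 1)


def spill_sort_key(a):
    # 1. non-bad-spill first, 2. priority, 3. spill cost, 4. allocno number
    return (a.bad_spill, spill_priority(a), a.spill_cost, a.num)


bucket = [
    Allocno(num=0, bad_spill=False, spill_cost=100, excess_pressure_points=4),
    Allocno(num=1, bad_spill=False, spill_cost=100, excess_pressure_points=0),
    Allocno(num=2, bad_spill=True, spill_cost=10, excess_pressure_points=9),
    Allocno(num=3, bad_spill=False, spill_cost=60, excess_pressure_points=2),
]
ordered = sorted(bucket, key=spill_sort_key)
print([a.num for a in ordered])
```

The allocno at the head of the sorted bucket is the preferred spill candidate. In this made-up bucket, allocno 2 has the lowest priority value but sorts last because it is a bad spill.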

Page 33: Integrated Register Allocation introduction

Integrated Register allocation (IRA)

• Emitting code for register shuffling
  – Two allocnos representing the same pseudo-register may be assigned to different locations (hard register or memory)
    • Reload/LRA works on a pseudo-register basis
      – Reload/LRA has no way to assign different locations to the same pseudo-register
      – Split the pseudo-register
        » Create a new pseudo-register and generate a move
        » Source code in ira-emit.c

[Figure: Region A and Region B each hold an allocno of pseudo r111. After splitting, a new pseudo r199 is created and a move is emitted:]

move r199, r111

Page 34: Integrated Register Allocation introduction

Reading IRA RTL dump file

Compile with -fdump-rtl-ira -fira-verbose=9 to create the IRA dump file.

Pass 1 for finding pseudo/allocno costs

    r113: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS

preferred: best register class
alternative: alternative register class
allocno: current register class

  a0(r113,l0) costs: GENERAL_REGS:0,0 VFP_D0_D7_REGS:23490,23490 VFP_LO_REGS:23490,23490 ALL_REGS:23490,23490 MEM:15660,15660

Allocno 0 (r113 in region 0): register class GENERAL_REGS has cost 0,0, which is the cost after propagating the upper region's cost.

Page 35: Integrated Register Allocation introduction

Reading IRA RTL dump file

 Insn 13(l0): point = 1
 Insn 12(l0): point = 3
 Insn 11(l0): point = 5
 Insn 10(l0): point = 7
 Insn 45(l0): point = 9
 Insn 7(l0): point = 12
 Insn 6(l0): point = 14
 a0(r113): [2..9]
 a1(r112): [2..3]
 a2(r110): [10..14]
Compressing live ranges: from 17 to 4 - 23%
Ranges after the compression:
 a0(r113): [0..1]
 a1(r112): [0..1]
 a2(r110): [2..3]

A point is assigned to each instruction and is used to describe live ranges.

The live range of a0 is [2..9].

The program points are then compressed.
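To see why compression is harmless, the live ranges from this dump can be checked for overlap before and after compression. The sketch below treats two allocnos as conflicting when any of their live ranges share a program point; that rule and the helper names are assumptions for illustration, not a description of how IRA builds its conflict table.

```python
def ranges_overlap(first, second):
    """True if two inclusive [start..finish] ranges share a point."""
    return first[0] <= second[1] and second[0] <= first[1]


def build_conflicts(live_ranges):
    """Map each allocno to the set of allocnos whose ranges overlap its own."""
    conflicts = {name: set() for name in live_ranges}
    names = sorted(live_ranges)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if any(ranges_overlap(x, y)
                   for x in live_ranges[a] for y in live_ranges[b]):
                conflicts[a].add(b)
                conflicts[b].add(a)
    return conflicts


# Live ranges copied from the dump above.
original = {"a0": [(2, 9)], "a1": [(2, 3)], "a2": [(10, 14)]}
compressed = {"a0": [(0, 1)], "a1": [(0, 1)], "a2": [(2, 3)]}

print(build_conflicts(original))
print(build_conflicts(compressed))
```

Both versions give the same result: a0 and a1 overlap, and a2 overlaps neither, so the compressed points carry the same conflict information in less space.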

Page 36: Integrated Register Allocation introduction

Reading IRA RTL dump file

 a0(r113): [0..1] a1(r112): [0..1] a2(r110): [2..3]
+++Allocating 16 bytes for conflict table (uncompressed size 24)
;; a0(r113,l0) conflicts: a1(r112,l0)
;;     total conflict hard regs: 0 12 14
;;     conflict hard regs: 0 12 14

;; a1(r112,l0) conflicts: a0(r113,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

;; a2(r110,l0) conflicts:
  cp0:a0(r113)<->a2(r110)@783:move
  pref0:a2(r110)<-hr0@125
  regions=1, blocks=5, points=4
    allocnos=3 (big 0), copies=1, conflicts=0, ranges=3

total conflict hard regs: the hard registers that conflict with a0
conflict hard regs: the hard registers in region 0 that conflict with a0

cp0: a copy between a0 and a2 with frequency 783; the instruction is a move

pref0: a2 prefers hard register 0 with frequency 125

Page 37: Integrated Register Allocation introduction

Reading IRA RTL dump file

**** Allocnos coloring:

  Loop 0 (parent -1, header bb2, depth 0)
    bbs: 4 3 2
    all: 0r113 1r112 2r110
    modified regnos: 110 112 113
    border:
    Pressure: GENERAL_REGS=4
    Hard reg set forest:
      0:( 0-12 14 16-17)@0
        1:( 0-12 14)@67480
          2:( 1-11)@31320

Loop 0 with parent -1: Loop 0 is the entire function, with loop depth 0.

all: all pseudos in Loop 0
modified regnos: pseudos that are assigned a value in Loop 0

Pressure: the number of allocnos that choose the GENERAL_REGS register class in Loop 0

1:( 0-12 14)@67480: pre-order number (hard registers usable for set 1) @ spill cost

Depending on its conflicts, each allocno may have a different set of available hard registers. "Hard reg set forest" lists all possible sets of available hard registers.

Page 38: Integrated Register Allocation introduction

Reading IRA RTL dump file

      Allocno a0r113 of GENERAL_REGS(14) has 11 avail. regs 1-11, node: 1-11 (confl regs = 0 12-102)
      Allocno a1r112 of GENERAL_REGS(14) has 14 avail. regs 0-12 14, node: 0-12 14 (confl regs = 13 15-102)
      Forming thread by copy 0:a0r113-a2r110 (freq=783):
        Result (freq=3349): a0r113(1566) a2r110(1783)
      Pushing a1(r112,l0)(cost 0)
      Pushing a0(r113,l0)(cost 0)
      Pushing a2(r110,l0)(cost 0)
      Popping a2(r110,l0) -- assign reg 0
      Popping a0(r113,l0) -- assign reg 4
      Popping a1(r112,l0) -- assign reg 3

a0 has 11 available hard registers in GENERAL_REGS. The available registers are 1-11 (the conflict registers are 0 and 12-102).

Forming thread: a thread is created to represent a0-copy-a2.

Page 39: Integrated Register Allocation introduction

Reference

• Papers
  – Improvements to Graph Coloring Register Allocation
  – The top-down regional register allocation for irregular register file architectures
  – Register Allocation via Hierarchical Graph Coloring
• Source code
  – GCC official git branch gcc-5-branch
    • With SHA-1 deeac8d177ce6aa25ef631b3785a0eed0df18d2c