Future Works in LLVM Register Allocationllvm.org/devmtg/2009-10/RegisterAllocationFutureWorks.pdf0...

Post on 03-Oct-2020

4 views 0 download

Transcript of Future Works in LLVM Register Allocationllvm.org/devmtg/2009-10/RegisterAllocationFutureWorks.pdf0...

Future Works in LLVM Register Allocation

1

Talk Overview

1. Introduction

2. Upcoming Changes

3. PBQP

2

Register Allocation in LLVM

PHI Elimination Two-Address

Coalescing Allocator

Liveness

Rewriter

3

PHI Elimination Two-Address

Coalescing Allocator

Liveness

Rewriter

Lower PHI-instructions to copies

Register Allocation in LLVM

4

PHI Lowering

BB1: %x1 = .

BB3:%x3 = phi [%x1, BB1], [%x2, BB2]

BB2: %x2 = .

5

PHI Lowering

BB1: %x1 = .

BB3:%x3 = %xP

BB2: %x2 = .

6

PHI Lowering

BB1: %x1 = . %xP = %x1

BB3:%x3 = %xP

BB2: %x2 = . %xP = %x2

7

PHI Elimination Two-Address

Coalescing Allocator

Liveness

Rewriter

Lower PHI-instructions to copies

Register Allocation in LLVM

8

PHI Elimination Two-Address

Coalescing Allocator

Liveness

Rewriter

Register Allocation in LLVM

Lower Three-Address Instructions

9

Two-Address Instructions

10

Two-Address Instructions

x3 = x2 + x1

10

Two-Address Instructions

x3 = x2 + x1

x3 = x2x3 += x1

10

PHI Elimination Two-Address

Coalescing Allocator

Liveness

Rewriter

Register Allocation in LLVM

Lower three-address instructions

11

Register Allocation in LLVM

PHI Elimination Two-Address

Coalescing Allocator

Liveness

Rewriter

Construct live intervals

12

Live Intervals

BB:%x1 = ...

.

.

.%x2 = %x1

.

.

. ... = %x2

13

Live Intervals

BB:%x1 = ...

.

.

.%x2 = %x1

.

.

. ... = %x2

%x1

13

Live Intervals

BB:%x1 = ...

.

.

.%x2 = %x1

.

.

. ... = %x2

%x1 %x2

13

Register Allocation in LLVM

PHI Elimination Two-Address

Coalescing Allocator

Liveness

Rewriter

Construct Live Intervals

14

Register Allocation in LLVM

PHI Elimination Two-Address

Coalescing Allocator

Liveness

Rewriter

Aggressively eliminate copies

15

BB:%x1 = ...

.

.

.%x2 = %x1

.

.

. ... = %x2

Coalescing

%x1 %x2

16

BB:%x1 = ...

.

.

.

.

.

. ... = %x1

Coalescing

%x1

16

Register Allocation in LLVM

PHI Elimination Two-Address

Coalescing Allocator

Liveness

Rewriter

Aggressively eliminate copies

17

Register Allocation in LLVM

PHI Elimination Two-Address

Coalescing Allocator

Liveness

Rewriter

Compute register assignment

18

Register Allocation in LLVM

PHI Elimination Two-Address

Coalescing Allocator

Liveness

Rewriter

Apply register assignment

19

Register Allocation in LLVM

PHI Elimination Two-Address

Coalescing Allocator

Liveness

Rewriter

20

Improvements

• New and better optimizations

• New allocators

• Cleaner infrastructure

21

1. Optimizations

22

Rematerialization

23

. // stuff

... =

. // more stuff

... =

<expr>

vr1

vr1

vr1

Rematerialization

=

23

. // stuff

... =

. // more stuff

... =

<expr>

vr1

vr1

vr1

Rematerialization

=

23

. // stuff

... =

. // more stuff

... =

<expr>

vr1

vr1

vr1

[M]

[M]

[M]

Rematerialization

=

23

. // stuff

... =

. // more stuff

... =

<expr>

vr1

vr1

vr1

Rematerialization

=

23

. // stuff

... =

. // more stuff

... =

<expr>

<expr>

Rematerialization

23

Splitting

24

Splitting

24

Splitting

24

Splitting

24

Splitting

Spill

24

Spills introducedin to other loops

Splitting

Spill

24

Splitting

Register

Spill

24

2. New Allocators

25

New Allocators

26

New Allocators

• “Linear Scan” is not, in fact, linear

26

New Allocators

• “Linear Scan” is not, in fact, linear

• We want something faster

26

New Allocators

• “Linear Scan” is not, in fact, linear

• We want something faster

• Priority coloring?

26

New Allocators

• “Linear Scan” is not, in fact, linear

• We want something faster

• Priority coloring?

• Linear Scan?

26

New Allocators

• “Linear Scan” is not, in fact, linear

• We want something faster

• Priority coloring?

• Linear Scan?

• Need to tidy the infrastructure

26

3. Cleaner Infrastructure

27

Currently...

28

Currently...

Liveness Analysis

28

Currently...

Liveness Analysis Allocator

28

Currently...

Liveness Analysis Allocator

VirtualRegister Map

28

Currently...

Liveness Analysis Allocator

VirtualRegister Map

28

Currently...

Liveness Analysis Allocator

VirtualRegister Map

28

Currently...

Liveness Analysis Allocator

VirtualRegister Map

28

Currently...

Liveness Analysis Allocator Rewriter

VirtualRegister Map

28

Rewrite In Place

29

Rewrite In Place

Liveness Analysis

29

Rewrite In Place

Liveness Analysis

Live Intervals

29

Rewrite In Place

Liveness Analysis Allocator

Live Intervals

29

Rewrite In Place

Liveness Analysis Allocator

Live Intervals

Modify code in-place

29

Rewrite In Place

Liveness Analysis Allocator

Live Intervals

Modify code in-place

Live intervalskept up-to-date

29

Rewrite In Place

Liveness Analysis Allocator

Live IntervalsLive intervals remain valid post-alloc

Modify code in-place

Live intervalskept up-to-date

29

Improvements

• New and better optimizations

• New allocators

• Cleaner infrastructure

30

Upcoming Changes

31

Upcoming Changes

• Live index renumbering

• Improved splitting

• Better def/kill tracking for values

32

Live Indexes

33

BB: . . %x2 = %x1 add %x3, %x2 . .

Live Indexes

33

BB: . . %x2 = %x1 add %x3, %x2 . .

Live Indexes

123456

33

%x3 = ...

BB: . . %x2 = %x1

add %x3, %x2 . .

Live Indexes

123

456

33

%x3 = ...

BB: . . %x2 = %x1

add %x3, %x2 . .

Live Indexes

123

456

?

33

BB: . . %x2 = %x1 %x3 = ... add %x3, %x2 . .

Live Indexes

P1P2P3

P4P5P6

34

BB: . . %x2 = %x1 %x3 = ... add %x3, %x2 . .

Live Indexes

P1P2P3

P4P5P6

P1 ≺ P2 ≺ P3 ≺ P4 ≺ P5 ≺ P634

BB: . . %x2 = %x1 %x3 = ... add %x3, %x2 . .

Live Indexes

P1P2P3

P4P5P6

P7

P1 ≺ P2 ≺ P3 ≺ P4 ≺ P5 ≺ P634

BB: . . %x2 = %x1 %x3 = ... add %x3, %x2 . .

Live Indexes

P1P2P3

P4P5P6

P7

P1 ≺ P2 ≺ P3 ≺ P4 ≺ P5 ≺ P6P7 ≺34

Live Indexes

• unsigned ➙ LiveIndex

• Index Renumbering

✓⌚

35

Improved Splitting

36

Improved Splitting

• Break multi-value intervals into component values.

36

Improved Splitting

• Break multi-value intervals into component values.

• Each value gets a 2nd chance at allocation.

36

Improved Splitting

• Break multi-value intervals into component values.

• Each value gets a 2nd chance at allocation.

• ... but not a 3rd.

36

Improved Splitting

• Break multi-value intervals into component values.

• Each value gets a 2nd chance at allocation.

• ... but not a 3rd.

• 13% reduction in static memory references on test case (a pathological SSE kernel).

36

Better Def/Kill Tracking

37

Better Def/Kill Tracking

• Defined by a PHI - Track the def block.

For Values

37

Better Def/Kill Tracking

• Defined by a PHI - Track the def block.

• Killed by a PHI - Track the appropriate predecessor.

For Values

37

Future Work

• Better value def/kill tracking

• LiveIndex renumbering

• Improved splitting

• Rewrite-in-place

• New allocators

⌚⌚⌚⌚

38

PBQP

39

PBQP

• Discrete optimization problems

• NP-complete

• Subclass solvable in linear time

Partitioned Boolean Quadratic Problems

40

Irregular Architectures

• Multiple register classes.

• Register aliasing.

• Register pairing.

• ...

41

PBQP Example

42

PBQP Example

ZX

Y

42

PBQP Example

901

436

702

ZX

Y

42

PBQP Example

901

436

702

ZX

Y

42

PBQP Example

901

436

702

3 8 18 5 35 2 7

5 2 26 7 31 4 8

1 9 34 5 39 7 6

ZX

Y

42

PBQP Example

901

436

702

3 8 18 5 35 2 7

5 2 26 7 31 4 8

1 9 34 5 39 7 6

ZX

Y

Solution [3,2,1]:

42

PBQP Example

901

436

702

3 8 18 5 35 2 7

5 2 26 7 31 4 8

1 9 34 5 39 7 6

ZX

Y

Solution [3,2,1]:

42

PBQP Example

901

436

702

3 8 18 5 35 2 7

5 2 26 7 31 4 8

1 9 34 5 39 7 6

ZX

Y

Solution [3,2,1]:

42

PBQP Example

901

436

702

3 8 18 5 35 2 7

5 2 26 7 31 4 8

1 9 34 5 39 7 6

ZX

Y

Solution [3,2,1]:

42

PBQP Example

901

436

702

3 8 18 5 35 2 7

5 2 26 7 31 4 8

1 9 34 5 39 7 6

ZX

YNode Costs: 6+0+9 = 15

Solution [3,2,1]:

42

PBQP Example

901

436

702

3 8 18 5 35 2 7

5 2 26 7 31 4 8

1 9 34 5 39 7 6

ZX

YNode Costs: 6+0+9 = 15

Solution [3,2,1]:

42

PBQP Example

901

436

702

3 8 18 5 35 2 7

5 2 26 7 31 4 8

1 9 34 5 39 7 6

ZX

YNode Costs: 6+0+9 = 15

Solution [3,2,1]:

42

PBQP Example

901

436

702

3 8 18 5 35 2 7

5 2 26 7 31 4 8

1 9 34 5 39 7 6

ZX

YNode Costs: 6+0+9 = 15

Solution [3,2,1]:

42

PBQP Example

901

436

702

3 8 18 5 35 2 7

5 2 26 7 31 4 8

1 9 34 5 39 7 6

ZX

YNode Costs: 6+0+9 = 15Edge Costs: 2+6+9 = 17

Solution [3,2,1]:

42

PBQP Example

901

436

702

3 8 18 5 35 2 7

5 2 26 7 31 4 8

1 9 34 5 39 7 6

ZX

YNode Costs: 6+0+9 = 15Edge Costs: 2+6+9 = 17

Total: 32

Solution [3,2,1]:

42

PBQP Example

901

436

702

3 8 18 5 35 2 7

5 2 26 7 31 4 8

1 9 34 5 39 7 6

ZX

YNode Costs: 6+0+9 = 15Edge Costs: 2+6+9 = 17

Total: 32

Solution [3,2,1]:

Solution [1,2,3]: 19

42

PBQP Example

900

400

700

0 0 00 ∞ 00 0 ∞

0 0 00 ∞ 00 0 ∞

0 0 00 ∞ 00 0 ∞

ZX

Y

Nodes represent virtual registers.

Options reflect storage locations.

Option costs:

Typically zero cost for registers,spill cost estimate for stack slot.

Edge costs:

Depends on the constraint.

For Register Allocation:

43

Example 1

X Y

Interference on a Regular Architecture

44

Example 1

400

X Y

Interference on a Regular Architecture

Sp

r0

r1

44

Example 1

400

700

X Y

Interference on a Regular Architecture

Sp

r0

r1

Sp

r0

r1

44

Example 1

400

700

0 0 00 ∞ 00 0 ∞

X Y

Interference on a Regular Architecture

Sp

r0

r1

Sp

r0

r1

44

Example 1

400

700

0 0 00 ∞ 00 0 ∞

X Y

Interference on a Regular Architecture

Sp

r0

r1

Sp

r0

r1

44

Example 1

400

700

0 0 00 ∞ 00 0 ∞

X Y

Interference on a Regular Architecture

Sp

r0

r1

Sp

r0

r1

Cost (Sp, Sp) = 1144

Example 1

400

700

0 0 00 ∞ 00 0 ∞

X Y

Interference on a Regular Architecture

Sp

r0

r1

Sp

r0

r1

Cost (r0, Sp) = 745

Example 1

400

700

0 0 00 ∞ 00 0 ∞

X Y

Interference on a Regular Architecture

Sp

r0

r1

Sp

r0

r1

Cost (Sp, r0) = 446

Example 1

400

700

0 0 00 ∞ 00 0 ∞

X Y

Interference on a Regular Architecture

Sp

r0

r1

Sp

r0

r1

Cost (r0, r1) = 047

Example 1

400

700

0 0 00 ∞ 00 0 ∞

X Y

Interference on a Regular Architecture

Sp

r0

r1

Sp

r0

r1

Cost (r0, r1) = 0 ✓47

0 0 00 ∞ 00 0 ∞

Example 1

400

700

X Y

Interference on a Regular Architecture

Sp

r0

r1

Sp

r0

r1

Cost (r0, r1) = 0 ✓48

Example 1

400

700

0 0 00 ∞ 00 0 ∞

X Y

Interference on a Regular Architecture

Sp

r0

r1

Sp

r0

r1

Cost (r0, r1) = 0 ✓49

Example 1

400

700

0 0 00 ∞ 00 0 ∞

X Y

Interference on a Regular Architecture

Sp

r0

r1

Sp

r0

r1

Cost (r0, r0) = ∞50

Example 1

400

700

0 0 00 ∞ 00 0 ∞

X Y

Interference on a Regular Architecture

Sp

r0

r1

Sp

r0

r1

Cost (r0, r0) = ✗∞50

Example 1

0 0 00 ∞ 00 0 ∞

Interference on a Regular Architecture

50

Example 2

0 0 0 0 00 ∞ ∞ 0 00 0 0 ∞ 0

Interference on an Irregular Architecture

Sp

AX

BX

Sp AL AH BL CL

51

Example 3

0 0 00 -c 00 0 -c

Coalescing

Sp

AX

BX

Sp AX BX

52

Example 4

Register Pairing (Ri, Ri+1)

Sp

s0

s1

Sp s0 s1 s2 s3

s2

0 0 0 0 00 ∞ 0 ∞ ∞0 ∞ ∞ 0 ∞0 ∞ ∞ ∞ 0

53

The PBQP Allocator

solution = solve pbqpsolution -> allocation

regalloc -> pbqp

54

The PBQP Allocator

solution = solve pbqpsolution -> allocation

regalloc -> pbqp

54

The PBQP Allocator

solution = solve pbqpsolution -> allocation

regalloc -> pbqp

54

The PBQP Allocator

llvm/lib/CodeGen/RegAllocPBQP.cpp

solution = solve pbqpsolution -> allocation

regalloc -> pbqp

54

How does it work?

Solver uses a graph reduction algorithm.

Reduce problem to the empty graph with reduction rules, then reconstruct it.

55

PBQP

56

PBQP

PROS CONS

• Ideal for Irregularity

• Very Simple

• Reasonable quality

• Slooooooow

56

PBQP

PROS CONS

• Ideal for Irregularity

• Very Simple

• Reasonable quality

• Slooooooow

56

PBQP

PROS CONS

• Ideal for Irregularity

• Very Simple

• Reasonable quality

• Slooooooow

• Perfect opportunity

for a coffee

56

57

• Improved Optimizations.

• New Allocators.

• Cleaner Architecture.

• PBQP.

57

the end.

58