Advanced Compiler Techniques


Transcript of Advanced Compiler Techniques

Page 1: Advanced Compiler Techniques

Advanced Compiler Techniques

LIU Xianhua

School of EECS, Peking University

Loops

Page 2: Advanced Compiler Techniques

Content

• Concepts:
  – Dominators
  – Depth-First Ordering
  – Back edges
  – Graph depth
  – Reducibility
• Natural Loops
• Efficiency of Iterative Algorithms
• Dependences & Loop Transformation

Page 3: Advanced Compiler Techniques

Loops are Important!

• Loops dominate program execution time
  – They need special treatment during optimization
• Loops also affect the running time of program analyses
  – e.g., a dataflow problem can be solved in a single pass if the program has no loops

Page 4: Advanced Compiler Techniques

Dominators

• Node d dominates node n if every path from the entry to n goes through d.
  – Written as: d dom n
• Quick observations: every node dominates itself, and the entry dominates every node.
• Common cases:
  – The test of a while loop dominates all blocks in the loop body.
  – The test of an if-then-else dominates all blocks in either branch.

Page 5: Advanced Compiler Techniques

Dominator Tree

• Immediate dominance: d idom n
  – d dom n, d ≠ n, and there is no node m (other than d and n) such that d dom m and m dom n.
• The immediate dominance relationships form a tree.

[Figure: the 5-node example flow graph and its dominator tree.]

Page 6: Advanced Compiler Techniques

Finding Dominators

• A dataflow analysis problem: for each node, find all of its dominators.
  – Direction: forward
  – Confluence: set intersection
  – Boundary: OUT[Entry] = {Entry}
  – Initialization: OUT[B] = all nodes
  – Equations:
    • OUT[B] = IN[B] ∪ {B}
    • IN[B] = ∩ OUT[p] over all predecessors p of B
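As a concrete illustration of this formulation, the following Python sketch runs the iterative algorithm on a small CFG given as a dictionary from each node to its successor list. The edge set is an assumption chosen to reproduce the dominator sets shown on the next slide; it is not taken from the lecture's figure.

    # Minimal sketch of the iterative dominator computation described above.
    def dominators(cfg, entry):
        nodes = set(cfg)
        preds = {n: set() for n in nodes}
        for n, succs in cfg.items():
            for s in succs:
                preds[s].add(n)

        # Initialization: OUT[entry] = {entry}, OUT[B] = all nodes otherwise.
        out = {n: set(nodes) for n in nodes}
        out[entry] = {entry}

        changed = True
        while changed:
            changed = False
            for b in nodes - {entry}:
                # IN[B] = intersection of OUT[p] over all predecessors p of B
                in_b = set(nodes)
                for p in preds[b]:
                    in_b &= out[p]
                # OUT[B] = IN[B] U {B}
                new_out = in_b | {b}
                if new_out != out[b]:
                    out[b] = new_out
                    changed = True
        return out

    # Assumed example graph (edges chosen to match the next slide's sets):
    cfg = {1: [2, 4], 2: [3], 3: [2, 5], 4: [5], 5: [1]}
    print(dominators(cfg, 1))   # e.g. node 3 -> {1, 2, 3}

Because the confluence operator is intersection and every OUT set starts at the full node set, the sets only shrink, so the iteration is guaranteed to terminate.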


Page 7: Advanced Compiler Techniques

Example: Dominators

[Figure: the example flow graph annotated with its computed dominator sets — node 1: {1}, node 2: {1,2}, node 3: {1,2,3}, node 4: {1,4}, node 5: {1,5}.]

Page 8: Advanced Compiler Techniques

Depth-First Search

• Start at the entry.
• If you can follow an edge to an unvisited node, do so.
• If not, backtrack to your parent (the node from which you were visited).

Page 9: Advanced Compiler Techniques

Depth-First Spanning Tree

• Root = entry.
• Tree edges are the edges along which we first visit the node at the head.

[Figure: a depth-first spanning tree (DFST) of the 5-node example, rooted at node 1.]

Page 10: Advanced Compiler Techniques

Depth-First Node Order

• The reverse of the order in which a DFS retreats from the nodes: 1-4-5-2-3.
• Equivalently, the reverse of a postorder traversal of the tree (postorder here is 3-2-5-4-1).

[Figure: the example flow graph with its depth-first spanning tree.]
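A small sketch of how the DFST and the depth-first (reverse postorder) numbering can be computed. The CFG is the same assumed 5-node example used in the dominator sketch; the resulting order depends on the order in which successors are listed, so it need not match the 1-4-5-2-3 order in the figure.

    # Minimal sketch: DFS producing tree edges and depth-first order.
    def dfs_order(cfg, entry):
        visited, tree_edges, postorder = set(), [], []

        def dfs(n):
            visited.add(n)
            for s in cfg[n]:
                if s not in visited:
                    tree_edges.append((n, s))   # edge along which s is first visited
                    dfs(s)
            postorder.append(n)                 # record n as we retreat from it

        dfs(entry)
        df_order = list(reversed(postorder))    # depth-first order = reverse postorder
        return tree_edges, df_order

    cfg = {1: [2, 4], 2: [3], 3: [2, 5], 4: [5], 5: [1]}
    tree, order = dfs_order(cfg, 1)
    print(tree)    # [(1, 2), (2, 3), (3, 5), (1, 4)] for this successor ordering
    print(order)   # [1, 4, 2, 3, 5]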

Page 11: Advanced Compiler Techniques

Four Kinds of Edges

1. Tree edges.
2. Advancing edges (node to a proper descendant).
3. Retreating edges (node to an ancestor, including edges to self).
4. Cross edges (between two nodes, neither of which is an ancestor of the other).

Page 12: Advanced Compiler Techniques

A Little Magic

• Of these edges, only retreating edges go from high to low in DF order.
  – Proof idea: you must retreat from the head of a tree edge before you can retreat from its tail.
• Also surprising: all cross edges go right to left in the DFST.
  – Assuming we add children of any node from the left.

Page 13: Advanced Compiler Techniques

Example: Non-Tree Edges

[Figure: the example flow graph with its non-tree edges labeled as retreating, forward (advancing), and cross edges.]

Page 14: Advanced Compiler Techniques

Back Edges

• An edge is a back edge if its head dominates its tail.
• Theorem: every back edge is a retreating edge in every DFST of every flow graph.
  – The converse is almost always true, but not always.
  – Why: the head is reached before the tail in any DFST, and the search must reach the tail before retreating from the head, so the tail is a descendant of the head.

Page 15: Advanced Compiler Techniques

Example: Back Edges

[Figure: the example flow graph with dominator sets {1}, {1,2}, {1,2,3}, {1,4}, {1,5}; the back edges are those whose head dominates the tail (here 3 → 2 and 5 → 1).]

Page 16: Advanced Compiler Techniques

Reducible Flow Graphs

• A flow graph is reducible if every retreating edge in any DFST for that flow graph is a back edge.
• Testing reducibility: remove all back edges from the flow graph and check that the result is acyclic.
• Hint why it works: every cycle must include some retreating edge in any DFST.
  – In particular, the edge that enters the first node of the cycle that is visited.
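A sketch of this test, reusing dominators() from the "Finding Dominators" sketch: mark every edge whose head dominates its tail as a back edge, drop those edges, and check that what remains is acyclic. The example CFG is again the assumed 5-node graph.

    # Minimal sketch of back-edge detection and the reducibility test.
    def back_edges(cfg, entry):
        dom = dominators(cfg, entry)            # from the earlier sketch
        return {(t, h) for t, succs in cfg.items() for h in succs if h in dom[t]}

    def is_reducible(cfg, entry):
        removed = back_edges(cfg, entry)
        forward = {n: [s for s in succs if (n, s) not in removed]
                   for n, succs in cfg.items()}

        # DFS looking for a node already on the current path (a cycle).
        state = {n: 0 for n in cfg}             # 0 = unvisited, 1 = on path, 2 = done
        def has_cycle(n):
            state[n] = 1
            for s in forward[n]:
                if state[s] == 1 or (state[s] == 0 and has_cycle(s)):
                    return True
            state[n] = 2
            return False
        return not any(state[n] == 0 and has_cycle(n) for n in cfg)

    cfg = {1: [2, 4], 2: [3], 3: [2, 5], 4: [5], 5: [1]}
    print(back_edges(cfg, 1))     # {(3, 2), (5, 1)}
    print(is_reducible(cfg, 1))   # True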

Page 17: Advanced Compiler Techniques

DFST on a Cycle

[Figure: a cycle in the flow graph. Depth-first search reaches one node of the cycle first and must reach the remaining nodes before leaving the cycle, so the edge back into that first-visited node is a retreating edge.]

Page 18: Advanced Compiler Techniques


Why Reducibility?

• Folk theorem: All flow graphs in practice are reducible.

• Fact: If you use only while-loops, for-loops, repeat-loops, if-then(-else), break, and continue, then your flow graph is reducible.


Page 19: Advanced Compiler Techniques

Example: Remove Back Edges

[Figure: the example flow graph with its back edges (3 → 2 and 5 → 1) removed.]

The remaining graph is acyclic.

Page 20: Advanced Compiler Techniques

Example: Nonreducible Graph

[Figure: a three-node graph with entry A and a cycle between B and C (edges A → B, A → C, B → C, C → B).]

In any DFST, one of the edges between B and C will be a retreating edge. But neither head dominates its tail, so deleting back edges leaves the cycle.

Page 21: Advanced Compiler Techniques

Why Care About Back/Retreating Edges?

1. Proper ordering of nodes during an iterative algorithm assures that the number of passes is limited by the number of "nested" back edges.
2. The depth of nested loops upper-bounds the number of nested back edges.

Page 22: Advanced Compiler Techniques

DF Order and Retreating Edges

Suppose that for a reaching-definitions (RD) analysis, we visit nodes during each iteration in DF order.

The fact that a definition d reaches a block will propagate in one pass along any increasing sequence of blocks.

When d arrives at the tail of a retreating edge, it is too late to propagate d from OUT to IN: the IN at the head has already been computed for that round.

Page 23: Advanced Compiler Techniques

Example: DF Order

[Figure: the example flow graph where definition d is generated by node 2. The annotations mark which blocks d reaches after the first pass and which only after the second pass (those reached across a retreating edge).]

Page 24: Advanced Compiler Techniques

Depth of a Flow Graph

• The depth of a flow graph with a given DFST and DF order is the greatest number of retreating edges along any acyclic path.
• For RD, if we use DF order to visit nodes, we converge in depth+2 passes.
  – depth+1 passes to follow that number of increasing segments.
  – 1 more pass to realize we have converged.
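As a rough illustration of this bound, here is a round-robin reaching-definitions sketch that visits nodes in DF order and counts complete passes until nothing changes. The gen/kill encoding and CFG representation are assumptions, not the course's implementation.

    # Minimal sketch: round-robin RD in DF order, counting passes to convergence.
    def reaching_definitions(cfg, df_order, gen, kill):
        preds = {n: [] for n in cfg}
        for n, succs in cfg.items():
            for s in succs:
                preds[s].append(n)

        out = {n: set() for n in cfg}
        passes = 0
        changed = True
        while changed:
            changed = False
            passes += 1
            for b in df_order:                  # visit nodes in DF order
                in_b = set()
                for p in preds[b]:              # IN[B] = union of OUT[p]
                    in_b |= out[p]
                new_out = gen[b] | (in_b - kill[b])
                if new_out != out[b]:
                    out[b] = new_out
                    changed = True
        return out, passes                      # passes is at most depth + 2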

Page 25: Advanced Compiler Techniques

Example: Depth = 2

[Figure: an acyclic path 1 → 4 → 7 ⇢ 3 → 10 → 17 ⇢ 6 → 18 → 20, where → marks increasing (DF-order) edges and ⇢ marks retreating edges. Each increasing segment is handled in one pass, so the three segments are covered by passes 1, 2, and 3.]

Page 26: Advanced Compiler Techniques

Similarly . . .

• Available expressions (AE) also converges in depth+2 passes.
  – Unavailability propagates along retreat-free node sequences in one pass.
• So does live variables (LV) if we use the reverse of DF order.
  – A use propagates backward in one pass along paths that contain no retreating edge.

Page 27: Advanced Compiler Techniques

In General . . .

• The depth+2 bound works for any monotone framework, as long as information only needs to propagate along acyclic paths.
  – Example: if a definition reaches a point, it does so along an acyclic path.

Page 28: Advanced Compiler Techniques

However . . .

Constant propagation does not have this property: the constant must propagate from c to b to a along a path that wraps around the loop, so one extra iteration is needed per statement regardless of the graph's depth.

    L: a = b
       b = c
       c = 1
       goto L

Page 29: Advanced Compiler Techniques

Why Depth+2 is Good

• Normal control-flow constructs produce reducible flow graphs in which the number of back edges is at most the nesting depth of loops.
  – Nesting depth tends to be small.
  – A study by Knuth found that the average depth of typical flow graphs is about 2.75.

Page 30: Advanced Compiler Techniques

Example: Nested Loops

• 3 nested while-loops: depth = 3
• 3 nested repeat-loops: depth = 1

Page 31: Advanced Compiler Techniques

Natural Loops

• A natural loop is defined by:
  – A single entry point called the header.
    • The header dominates all nodes in the loop.
  – A back edge that enters the loop header.
    • Otherwise, it is not possible for control to return to the header directly from the "loop"; i.e., there really is no loop.

Page 32: Advanced Compiler Techniques

Find Natural Loops

• The natural loop of a back edge a → b is {b} plus the set of nodes that can reach a without going through b.
• To compute it: remove b from the flow graph and find all nodes from which a is reachable (plus a itself), as sketched below.
• Theorem: two natural loops are either disjoint, identical, or nested.
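The sketch referred to above: starting from a, walk predecessor edges backward and collect everything reachable without passing through b (b is placed in the loop up front, so the walk never expands it). The example back edges 3 → 2 and 5 → 1 match the next slide; the CFG edges themselves are assumed.

    # Minimal sketch: natural loop of a back edge a -> b.
    def natural_loop(cfg, a, b):
        preds = {n: [] for n in cfg}
        for n, succs in cfg.items():
            for s in succs:
                preds[s].append(n)

        loop = {b, a}
        stack = [a]
        while stack:
            n = stack.pop()
            for p in preds[n]:
                if p not in loop:        # b is already in loop, so we never pass it
                    loop.add(p)
                    stack.append(p)
        return loop

    cfg = {1: [2, 4], 2: [3], 3: [2, 5], 4: [5], 5: [1]}
    print(natural_loop(cfg, 3, 2))   # {2, 3}
    print(natural_loop(cfg, 5, 1))   # {1, 2, 3, 4, 5}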

Page 33: Advanced Compiler Techniques

Example: Natural Loops

[Figure: the example flow graph with the natural loop of back edge 3 → 2 and the natural loop of back edge 5 → 1 highlighted.]

Page 34: Advanced Compiler Techniques

Relationship between Loops

• If two loops do not have the same header:
  – they are either disjoint, or
  – one is entirely contained in (nested within) the other;
  – the innermost loop is one that contains no other loop.
• If two loops share the same header:
  – it is hard to tell which is the inner loop, so
  – they are combined and treated as one loop.

[Figure: two loops over nodes 1–4 that share the same header.]

Page 35: Advanced Compiler Techniques

Basic Parallelism

Examples:

    FOR i = 1 to 100
        a[i] = b[i] + c[i]

    FOR i = 11 TO 20
        a[i] = a[i-1] + 3

    FOR i = 11 TO 20
        a[i] = a[i-10] + 3

Does there exist a data dependence edge between two different iterations? (Assuming distinct arrays, the first loop has no such edge; the second carries a dependence of distance 1; the third reads only elements written before the loop, so it carries none.)

A data dependence edge is loop-carried if it crosses iteration boundaries.

DoAll loops: loops without loop-carried dependences.

Page 36: Advanced Compiler Techniques

Data Dependence of Variables

• True dependence: a write of a followed by a read of a.

      a = ...
      ... = a

• Anti-dependence: a read of a followed by a write of a.

      ... = a
      a = ...

• Output dependence: two writes of a.

      a = ...
      a = ...

• Input dependence: two reads of a.

      ... = a
      ... = a

Page 37: Advanced Compiler Techniques

Affine Array Accesses

Common patterns of data accesses (i, j, k are loop indexes):

    A[i], A[j], A[i-1], A[0], A[i+j], A[2*i], A[2*i+1], A[i,j], A[i-1, j+1]

Array indexes are affine expressions of the surrounding loop indexes:
    Loop indexes: i_n, i_(n-1), ..., i_1
    Integer constants: c_n, c_(n-1), ..., c_0
    Array index: c_n*i_n + c_(n-1)*i_(n-1) + ... + c_1*i_1 + c_0

An affine expression is a linear expression plus a constant term (c_0).

Page 38: Advanced Compiler Techniques

Formulating Data Dependence Analysis

    FOR i := 2 to 5 do
        A[i-2] = A[i]+1;

Between read access A[i] and write access A[i-2] there is a dependence if there exist two iterations ir and iw within the loop bounds such that iteration ir reads and iteration iw writes the same array element:

    ∃ integers iw, ir:  2 ≤ iw, ir ≤ 5  and  ir = iw - 2

Between write access A[i-2] and write access A[i-2] there is a dependence if:

    ∃ integers iw, iv:  2 ≤ iw, iv ≤ 5  and  iw - 2 = iv - 2

To rule out the case where the same instance depends on itself, add the constraint iw ≠ iv.
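For loop bounds this small, the two conditions above can simply be checked by enumeration. The following Python sketch does exactly that; real dependence tests of course solve such constraints symbolically rather than by brute force.

    # Minimal sketch: brute-force check of the dependence conditions above.
    def read_write_dependence():
        # read A[ir] in iteration ir, write A[iw-2] in iteration iw
        return [(iw, ir) for iw in range(2, 6) for ir in range(2, 6) if ir == iw - 2]

    def write_write_dependence():
        # writes A[iw-2] and A[iv-2]; exclude the same instance (iw != iv)
        return [(iw, iv) for iw in range(2, 6) for iv in range(2, 6)
                if iw - 2 == iv - 2 and iw != iv]

    print(read_write_dependence())   # [(4, 2), (5, 3)] -> a dependence exists
    print(write_write_dependence())  # [] -> no output dependence between distinct instances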

Page 39: Advanced Compiler Techniques

Memory Disambiguation is Undecidable at Compile Time

    read(n)
    for i = ...
        a[i] = a[n]

Page 40: Advanced Compiler Techniques

Domain of Data Dependence Analysis

Only loop bounds and array indexes that are affine functions of the loop variables are used:

    for i = 1 to n
        for j = 2i to 100
            a[i + 2j + 3][4i + 2j][i * i] = ...
            ... = a[1][2i + 1][j]

Assume a data dependence between the read and write operations if there exist:

    integers ir, jr, iw, jw such that
        1 ≤ iw, ir ≤ n
        2iw ≤ jw ≤ 100
        2ir ≤ jr ≤ 100
        iw + 2jw + 3 = 1
        4iw + 2jw = 2ir + 1

Equate each dimension of the array access; ignore the non-affine ones (here the third dimension, i*i).

    No solution  →  no data dependence
    Solution     →  there may be a dependence

Page 41: Advanced Compiler Techniques

Iteration Space

An abstraction for loops: each iteration is represented as a point (its coordinates) in the iteration space.

    for i = 0, 5
        for j = 0, 3
            a[i, j] = 3

[Figure: the 6 × 4 grid of iteration points in the (i, j) plane.]

Page 42: Advanced Compiler Techniques

Iteration Space

An abstraction for loops:

    for i = 0, 5
        for j = i, 3
            a[i, j] = 0

[Figure: the iteration points in the (i, j) plane; because j runs from i to 3, the space is triangular.]

Page 43: Advanced Compiler Techniques

Iteration Space

An abstraction for loops:

    for i = 0, 5
        for j = i, 7
            a[i, j] = 0

The loop bounds define the iteration space as a system of linear inequalities:

    0 ≤ i,   i ≤ 5,   i ≤ j,   j ≤ 7

[Figure: the iteration points in the (i, j) plane satisfying these inequalities.]

Page 44: Advanced Compiler Techniques

Affine Access

[Slide figure omitted. An affine access has the form A[F·x + f]: a matrix F applied to the iteration vector x, plus a constant vector f.]

Page 45: Advanced Compiler Techniques

Affine Transform

An affine loop transform maps old iteration coordinates (i, j) to new coordinates (u, v):

    (u, v)^T = B · (i, j)^T + b

where B is the transformation matrix and b a constant offset vector.

Page 46: Advanced Compiler Techniques

Loop Transformation

    for i = 1, 100
      for j = 1, 200
        A[i, j] = A[i, j] + 3
      end_for
    end_for

becomes, after interchanging the loops,

    for u = 1, 200
      for v = 1, 100
        A[v, u] = A[v, u] + 3
      end_for
    end_for

The transform is (u, v)^T = [[0, 1], [1, 0]] · (i, j)^T, i.e. u = j and v = i.

Page 47: Advanced Compiler Techniques

Old Iteration Space

    for i = 1, 100
      for j = 1, 200
        A[i, j] = A[i, j] + 3
      end_for
    end_for

The loop bounds define the old iteration space as linear inequalities:

    1 ≤ i ≤ 100,   1 ≤ j ≤ 200

Applying the interchange transform (u, v) = (j, i) to these inequalities yields the bounds of the new iteration space.

Page 48: Advanced Compiler Techniques

New Iteration Space

Substituting i = v and j = u into the old bounds gives the new iteration space:

    1 ≤ u ≤ 200,   1 ≤ v ≤ 100

    for u = 1, 200
      for v = 1, 100
        A[v, u] = A[v, u] + 3
      end_for
    end_for

Page 49: Advanced Compiler Techniques

Old Array Accesses

    for i = 1, 100
      for j = 1, 200
        A[i, j] = A[i, j] + 3
      end_for
    end_for

The access A[i, j] uses the identity access matrix: A[ [[1, 0], [0, 1]] · (i, j)^T ]. Expressing this access in the new coordinates (u, v) means composing it with the inverse of the loop transform.

Page 50: Advanced Compiler Techniques

New Array Accesses

After the transform, the access is A[ [[0, 1], [1, 0]] · (u, v)^T ], i.e. simply A[v, u]:

    for u = 1, 200
      for v = 1, 100
        A[v, u] = A[v, u] + 3
      end_for
    end_for

Page 51: Advanced Compiler Techniques

Interchange Loops?

    for i = 2, 1000
      for j = 1, 1000
        A[i, j] = A[i-1, j+1] + 3
      end_for
    end_for

Here the dependence vector is d_old = (1, -1). Interchanging would give:

    for u = 1, 1000
      for v = 2, 1000
        A[v, u] = A[v-1, u+1] + 3
      end_for
    end_for

Page 52: Advanced Compiler Techniques

Interchange Loops?

A transformation is legal if every new dependence is lexicographically positive, i.e. the leading non-zero entry of the dependence vector is positive.

    Distance vector: (1, -1) = (4, 2) - (3, 3)

Under the interchange transform the dependence becomes

    d_new = [[0, 1], [1, 0]] · d_old = [[0, 1], [1, 0]] · (1, -1)^T = (-1, 1)

which is lexicographically negative. Loop interchange is therefore not legal when there exists a dependence of the form (+, -).
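A minimal sketch of this legality check: apply the transformation matrix to each dependence (distance) vector and require the result to be lexicographically positive. The function names are illustrative.

    # Minimal sketch: legality of a unimodular loop transform on dependences.
    def lex_positive(vec):
        # The leading non-zero entry must be positive.
        for x in vec:
            if x != 0:
                return x > 0
        return False   # all-zero vector: dependence within a single iteration

    def transform_is_legal(T, dependences):
        # T is a matrix (list of rows); each dependence is a distance vector.
        def apply(T, d):
            return tuple(sum(T[r][c] * d[c] for c in range(len(d)))
                         for r in range(len(T)))
        return all(lex_positive(apply(T, d)) for d in dependences)

    interchange = [[0, 1], [1, 0]]
    print(transform_is_legal(interchange, [(1, -1)]))   # False: interchange illegal here
    print(transform_is_legal(interchange, [(1, 1)]))    # True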

Page 53: Advanced Compiler Techniques

GCD Test

    for i = 1, 100
        a[2*i] = ...
        ... = a[2*i+1] + 3

Is there any dependence? Solve the linear Diophantine equation 2*iw = 2*ir + 1.

Page 54: Advanced Compiler Techniques

GCD

The greatest common divisor (GCD) of integers a1, a2, ..., an, denoted gcd(a1, a2, ..., an), is the largest integer that evenly divides all of them.

Theorem: the linear Diophantine equation

    a1*x1 + a2*x2 + ... + an*xn = c

has an integer solution x1, x2, ..., xn iff gcd(a1, a2, ..., an) divides c.
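Applied to the a[2*i] / a[2*i+1] example two slides back, the dependence equation 2*iw - 2*ir = 1 fails the test because gcd(2, -2) = 2 does not divide 1. A minimal sketch:

    # Minimal sketch of the GCD test for one linear Diophantine equation.
    from math import gcd
    from functools import reduce

    def gcd_test(coeffs, c):
        # True: an integer solution may exist, so a dependence is possible.
        # False: no integer solution, hence no dependence.
        g = reduce(gcd, (abs(a) for a in coeffs))
        return c % g == 0

    print(gcd_test([2, -2], 1))        # False: a[2*i] and a[2*i+1] never overlap
    print(gcd_test([24, 36, 54], 30))  # True: gcd is 6, which divides 30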

Page 55: Advanced Compiler Techniques

Examples

Example 1:  2*x1 - 2*x2 = 1.   gcd(2, -2) = 2, which does not divide 1: no solutions.

Example 2:  24*x + 36*y + 54*z = 30.   gcd(24, 36, 54) = 6, which divides 30: many solutions.

Page 56: Advanced Compiler Techniques

Loop Fusion

    for i = 1, 1000
        A[i] = B[i] + 3
    end_for
    for j = 1, 1000
        C[j] = A[j] + 5
    end_for

becomes

    for i = 1, 1000
        A[i] = B[i] + 3
        C[i] = A[i] + 5
    end_for

Better reuse between the write of A[i] and the read of A[i].

Page 57: Advanced Compiler Techniques

Loop Distribution

    for i = 1, 1000
        A[i] = A[i-1] + 3
        C[i] = B[i] + 5
    end_for

is distributed into

    for i = 1, 1000
        A[i] = A[i-1] + 3
    end_for
    for i = 1, 1000
        C[i] = B[i] + 5
    end_for

The 2nd loop is parallel (no loop-carried dependence), while the 1st still carries one.

Page 58: Advanced Compiler Techniques

Register Blocking

    for j = 1, 2*m
      for i = 1, 2*n
        A[i, j] = A[i-1, j] + A[i-1, j-1]
      end_for
    end_for

becomes, after unrolling both loops by 2 (register blocking),

    for j = 1, 2*m, 2
      for i = 1, 2*n, 2
        A[i, j]     = A[i-1, j]   + A[i-1, j-1]
        A[i, j+1]   = A[i-1, j+1] + A[i-1, j]
        A[i+1, j]   = A[i, j]     + A[i, j-1]
        A[i+1, j+1] = A[i, j+1]   + A[i, j]
      end_for
    end_for

Better reuse: values such as A[i-1, j] and A[i, j] are each used by two of the unrolled statements.

Page 59: Advanced Compiler Techniques

Virtual Register Allocation

    for j = 1, 2*M, 2
      for i = 1, 2*N, 2
        r1 = A[i-1, j]
        r2 = r1 + A[i-1, j-1]
        A[i, j] = r2
        r3 = A[i-1, j+1] + r1
        A[i, j+1] = r3
        A[i+1, j] = r2 + A[i, j-1]
        A[i+1, j+1] = r3 + r2
      end_for
    end_for

Memory operations are reduced to register loads/stores: roughly 8MN loads become 4MN loads.

Page 60: Advanced Compiler Techniques

Scalar Replacement

    for i = 2, N+1
        ... = A[i-1] + 1
        A[i] = ...
    end_for

becomes

    t1 = A[1]
    for i = 2, N+1
        ... = t1 + 1
        t1 = ...
        A[i] = t1
    end_for

Loads and stores for array references are eliminated by keeping the value of A[i-1] in the scalar t1.

Page 61: Advanced Compiler Techniques

Unroll-and-Jam

    for j = 1, 2*M
      for i = 1, N
        A[i, j] = A[i-1, j] + A[i-1, j-1]
      end_for
    end_for

becomes, after unrolling the outer loop by 2 and jamming the copies,

    for j = 1, 2*M, 2
      for i = 1, N
        A[i, j]   = A[i-1, j]   + A[i-1, j-1]
        A[i, j+1] = A[i-1, j+1] + A[i-1, j]
      end_for
    end_for

This exposes more opportunity for scalar replacement: A[i-1, j] is now used twice in the same iteration.

Page 62: Advanced Compiler Techniques

Large Arrays

    for i = 1, 1000
      for j = 1, 1000
        A[i, j] = A[i, j] + B[j, i]
      end_for
    end_for

Suppose arrays A and B have row-major layout. Then B has poor cache locality, and loop interchange will not help (it would merely make A the poorly accessed array instead).

Page 63: Advanced Compiler Techniques

Loop Blocking

    for v = 1, 1000, 20
      for u = 1, 1000, 20
        for j = v, v+19
          for i = u, u+19
            A[i, j] = A[i, j] + B[j, i]
          end_for
        end_for
      end_for
    end_for

Accesses to small blocks of the arrays have good cache locality.

Page 64: Advanced Compiler Techniques

Loop Unrolling for ILP

    for i = 1, 10
        a[i] = b[i]; *p = ...
    end_for

becomes

    for i = 1, 10, 2
        a[i] = b[i]; *p = ...
        a[i+1] = b[i+1]; *p = ...
    end_for

Larger scheduling regions and fewer dynamic branches, at the cost of increased code size.

Page 65: Advanced Compiler Techniques

Next Time

• Homework
  – 9.6.2, 9.6.4, 9.6.7
• Static Single Assignment (SSA)
  – Readings: Cytron '91, Chow '97