Advanced Compiler Techniques


LIU Xianhua

School of EECS, Peking University

Loops


Content

• Concepts:
  – Dominators
  – Depth-First Ordering
  – Back edges
  – Graph depth
  – Reducibility
• Natural Loops
• Efficiency of Iterative Algorithms
• Dependences & Loop Transformation


Loops are Important!

• Loops dominate program execution time
  – Need special treatment during optimization
• Loops also affect the running time of program analyses
  – e.g., a dataflow problem can be solved in just a single pass if a program has no loops


Dominators

• Node d dominates node n if every path from the entry to n goes through d.
  – Written as: d dom n
• Quick observations:
  – Every node dominates itself.
  – The entry dominates every node.
• Common cases:
  – The test of a while loop dominates all blocks in the loop body.
  – The test of an if-then-else dominates all blocks in either branch.


Dominator Tree

• Immediate dominance: d idom n
  – d dom n, d ≠ n, and there is no m ≠ d, n such that d dom m and m dom n.
• Immediate dominance relationships form a tree.

[Figure: a flow graph on nodes 1–5 and its dominator tree]


Finding Dominators

• A dataflow analysis problem: for each node, find all of its dominators.
  – Direction: forward
  – Confluence: set intersection
  – Boundary: OUT[Entry] = {Entry}
  – Initialization: OUT[B] = all nodes
  – Equations:
    • OUT[B] = IN[B] ∪ {B}
    • IN[B] = ∩ OUT[p] over all predecessors p of B
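The dataflow formulation above can be sketched in a few lines. The flow graph used here is a hypothetical 5-node example (an assumption, chosen so its converged sets match the dominator sets shown in the following slides):

```python
# Iterative forward dataflow solution for dominators:
#   OUT[B] = IN[B] ∪ {B},  IN[B] = ∩ OUT[p] over predecessors p of B.

def dominators(succ, entry):
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, ss in succ.items():
        for s in ss:
            pred[s].add(n)
    out = {n: set(nodes) for n in nodes}  # initialization: OUT[B] = all nodes
    out[entry] = {entry}                  # boundary: OUT[Entry] = {Entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            inn = set(nodes)
            for p in pred[n]:
                inn &= out[p]             # confluence: set intersection
            new = inn | {n}
            if new != out[n]:
                out[n], changed = new, True
    return out

# Hypothetical flow graph with entry node 1.
cfg = {1: [2, 4], 2: [3], 3: [2, 5], 4: [5], 5: [1]}
dom = dominators(cfg, 1)
```

On this graph the solution converges to OUT[1] = {1}, OUT[2] = {1,2}, OUT[3] = {1,2,3}, OUT[4] = {1,4}, OUT[5] = {1,5}.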


Example: Dominators

[Figure: flow graph on nodes 1–5 annotated with the converged dominator sets: OUT[1] = {1}, OUT[2] = {1,2}, OUT[3] = {1,2,3}, OUT[4] = {1,4}, OUT[5] = {1,5}]


Depth-First Search

• Start at entry.
• If you can follow an edge to an unvisited node, do so.
• If not, backtrack to your parent (the node from which you were visited).


Depth-First Spanning Tree

• Root = entry.
• Tree edges are the edges along which we first visit the node at the head.

[Figure: depth-first spanning tree of the example graph on nodes 1–5]


Depth-First Node Order

• The reverse of the order in which a DFS retreats from the nodes: 1-4-5-2-3.
• Alternatively, the reverse of a postorder traversal of the tree (postorder: 3-2-5-4-1).

[Figure: example flow graph on nodes 1–5]
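The DF numbering can be computed as the reverse of the postorder in which a DFS retreats from the nodes. A minimal sketch; the graph is a hypothetical example (an assumption) whose DF order comes out as 1-4-5-2-3:

```python
# DF order = reverse postorder of a depth-first search from the entry.
def df_order(succ, entry):
    seen, post = set(), []
    def dfs(n):
        seen.add(n)
        for s in succ.get(n, ()):
            if s not in seen:   # follow an edge to an unvisited node
                dfs(s)
        post.append(n)          # n is recorded when the search retreats from it
    dfs(entry)
    return post[::-1]           # reverse of postorder

# Hypothetical flow graph with entry node 1.
order = df_order({1: [4, 5], 4: [5], 5: [2, 1], 2: [3], 3: [2]}, 1)
```

Here `post` ends up as [3, 2, 5, 4, 1], so `order` is [1, 4, 5, 2, 3].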


Four Kinds of Edges

1. Tree edges.
2. Advancing edges (node to proper descendant).
3. Retreating edges (node to ancestor, including edges to self).
4. Cross edges (between two nodes, neither of which is an ancestor of the other).


A Little Magic

• Of these edges, only retreating edges go from high to low in DF order.
  – Proof sketch: you must retreat from the head of a tree edge before you can retreat from its tail.
• Also surprising: all cross edges go right to left in the DFST.
  – Assuming we add children of any node from the left.


Example: Non-Tree Edges

[Figure: example graph on nodes 1–5 with its non-tree edges labeled retreating, forward (advancing), and cross]


Back Edges

• An edge is a back edge if its head dominates its tail.
• Theorem: every back edge is a retreating edge in every DFST of every flow graph.
  – The converse is almost always true, but not always.
  – Proof sketch: the head is reached before the tail in any DFST, and the search must reach the tail before retreating from the head, so the tail is a descendant of the head.


Example: Back Edges

[Figure: example graph on nodes 1–5 with dominator sets {1}, {1,2}, {1,2,3}, {1,4}, {1,5}; the back edges are 3→2 and 5→1]


Reducible Flow Graphs

• A flow graph is reducible if every retreating edge in any DFST for that flow graph is a back edge.
• Testing reducibility: remove all back edges from the flow graph and check that the result is acyclic.
• Hint why it works: every cycle must include some retreating edge in any DFST.
  – In particular, the edge that enters the first node of the cycle that is visited.
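The reducibility test can be sketched directly: compute dominators, drop every edge whose head dominates its tail, and check that the remainder is acyclic. Both graphs below are hypothetical examples (assumptions for illustration):

```python
# Reducibility test: remove all back edges (edges n -> s where s dominates n)
# and check that the remaining graph is acyclic (via Kahn's algorithm).
from collections import deque

def dominators(succ, entry):
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, ss in succ.items():
        for s in ss:
            pred[s].add(n)
    out = {n: set(nodes) for n in nodes}
    out[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = set(nodes)
            for p in pred[n]:
                new &= out[p]
            new |= {n}
            if new != out[n]:
                out[n], changed = new, True
    return out

def is_reducible(succ, entry):
    dom = dominators(succ, entry)
    kept = {n: [s for s in ss if s not in dom[n]]  # drop the back edges
            for n, ss in succ.items()}
    indeg = {n: 0 for n in kept}
    for ss in kept.values():
        for s in ss:
            indeg[s] += 1
    work = deque(n for n, d in indeg.items() if d == 0)
    seen = 0
    while work:
        n = work.popleft()
        seen += 1
        for s in kept[n]:
            indeg[s] -= 1
            if indeg[s] == 0:
                work.append(s)
    return seen == len(kept)   # acyclic iff every node was peeled off
```

On a 5-node graph with back edges 3→2 and 5→1 the residue is acyclic (reducible); on the A/B/C graph of the nonreducible example below, the B↔C cycle survives.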


DFST on a Cycle

[Figure: a cycle. Depth-first search reaches one node of the cycle first and must reach the remaining nodes before leaving the cycle, so the edge back into that first node is a retreating edge.]


Why Reducibility?

• Folk theorem: All flow graphs in practice are reducible.

• Fact: If you use only while-loops, for-loops, repeat-loops, if-then(-else), break, and continue, then your flow graph is reducible.


Example: Remove Back Edges

[Figure: example graph on nodes 1–5 with the back edges 3→2 and 5→1 removed]

Remaining graph is acyclic.


Example: Nonreducible Graph

[Figure: graph with entry A and edges A→B, A→C, B→C, C→B. In any DFST, one of the edges B→C or C→B will be a retreating edge.]

But neither head dominates its tail, so deleting back edges leaves the cycle intact.


Why Care About Back/Retreating Edges?

1. Proper ordering of nodes during an iterative algorithm assures that the number of passes is limited by the number of “nested” back edges.
2. The depth of nested loops upper-bounds the number of nested back edges.


DF Order and Retreating Edges

Suppose that for an RD (reaching definitions) analysis, we visit nodes during each iteration in DF order.

The fact that a definition d reaches a block will propagate in one pass along any increasing sequence of blocks.

When d arrives at the tail of a retreating edge, it is too late to propagate d from OUT to IN: the IN at the head has already been computed for that round.


Example: DF Order

[Figure: example graph on nodes 1–5. Definition d is generated by node 2; the first pass propagates d along increasing blocks, and the second pass carries it across the retreating edges.]


Depth of a Flow Graph

• The depth of a flow graph with a given DFST and DF order is the greatest number of retreating edges along any acyclic path.
• For RD, if we use DF order to visit nodes, we converge in depth+2 passes.
  – depth+1 passes to follow that number of increasing segments.
  – 1 more pass to realize we converged.


Example: Depth = 2

1→4→7 (increasing), then a retreating edge, then 3→10→17 (increasing), then a retreating edge, then 6→18→20 (increasing).

Pass 1 handles the first increasing segment, pass 2 the second, pass 3 the third.


Similarly . . .

• AE (available expressions) also works in depth+2 passes.
  – Unavailability propagates along retreat-free node sequences in one pass.
• So does LV (live variables) if we use the reverse of DF order.
  – A use propagates backward in one pass along paths that do not use a retreating edge.


In General . . .

• The depth+2 bound works for any monotone framework, as long as information only needs to propagate along acyclic paths.
  – Example: if a definition reaches a point, it does so along an acyclic path.


However . . .

Constant propagation does not have this property:

L: a = b
   b = c
   c = 1
   goto L

Here the constant 1 needs one iteration to reach b and another to reach a, so the number of passes is not bounded by the depth of the graph.


Why Depth+2 is Good

• Normal control-flow constructs produce reducible flow graphs, with the number of back edges at most the nesting depth of the loops.
  – Nesting depth tends to be small.
  – A study by Knuth found the average loop depth of typical flow graphs to be about 2.75.


Example: Nested Loops

3 nested while-loops; depth = 3

3 nested repeat-loops; depth = 1


Natural Loops

• A natural loop is defined by:
  – A single entry point, called the header.
    • The header dominates all nodes in the loop.
  – A back edge that enters the loop header.
    • Otherwise, it is not possible for control to return to the header directly from the “loop”; i.e., there really is no loop.


Find Natural Loops

• The natural loop of a back edge a→b is {b} plus the set of nodes that can reach a without going through b.
  – Equivalently: remove b from the flow graph, then find all nodes from which a is reachable.
• Theorem: two natural loops are either disjoint, identical, or nested.
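The construction above is a backward search from a that never expands through b. A minimal sketch; the flow graph below is a hypothetical example (an assumption):

```python
# Natural loop of a back edge a -> b: {b} plus all nodes that can reach a
# without passing through b. Since b is seeded into the loop set, the
# backward search never expands through it.
def natural_loop(succ, a, b):
    pred = {}
    for n, ss in succ.items():
        for s in ss:
            pred.setdefault(s, set()).add(n)
    loop = {b, a}
    stack = [a]
    while stack:
        n = stack.pop()
        for p in pred.get(n, ()):
            if p not in loop:
                loop.add(p)
                stack.append(p)
    return loop

# Hypothetical flow graph with back edges 3 -> 2 and 5 -> 1.
cfg = {1: [2, 4], 2: [3], 3: [2, 5], 4: [5], 5: [1]}
inner = natural_loop(cfg, 3, 2)   # {2, 3}
outer = natural_loop(cfg, 5, 1)   # the whole graph
```

Note that the two loops are nested ({2, 3} inside {1, 2, 3, 4, 5}), as the theorem predicts.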


Example: Natural Loops

[Figure: example graph on nodes 1–5 showing the natural loop of back edge 3→2 and the natural loop of back edge 5→1]


Relationship between Loops

• If two loops do not have the same header:
  – they are either disjoint, or
  – one is entirely contained in (nested within) the other.
  – Innermost loop: one that contains no other loop.
• If two loops share the same header:
  – it is hard to tell which is the inner loop, so
  – combine them into one loop.

[Figure: two loops sharing header 1, with nodes 2, 3, and 4]


Basic Parallelism Examples:

FOR i = 1 TO 100
  a[i] = b[i] + c[i]

FOR i = 11 TO 20
  a[i] = a[i-1] + 3

FOR i = 11 TO 20
  a[i] = a[i-10] + 3

• Does there exist a data dependence edge between two different iterations?
• A data dependence edge is loop-carried if it crosses iteration boundaries.
• DoAll loops: loops without loop-carried dependences.


Data Dependence of Variables

• True dependence:   a = …  followed by  … = a
• Anti-dependence:   … = a  followed by  a = …
• Output dependence: a = …  followed by  a = …
• Input dependence:  … = a  followed by  … = a


Affine Array Accesses

Common patterns of data accesses (i, j, k are loop indexes):
  A[i], A[j], A[i-1], A[0], A[i+j], A[2*i], A[2*i+1], A[i,j], A[i-1, j+1]

Array indexes are affine expressions of the surrounding loop indexes:
  loop indexes i_n, i_{n-1}, …, i_1 and integer constants c_n, c_{n-1}, …, c_0, with
  array index = c_n·i_n + c_{n-1}·i_{n-1} + … + c_1·i_1 + c_0

An affine expression is a linear expression plus a constant term (c_0).


Formulating Data Dependence Analysis

FOR i := 2 to 5 do
  A[i-2] = A[i]+1;

Between the read access A[i] and the write access A[i-2] there is a dependence if there exist two iterations ir and iw within the loop bounds such that iteration ir reads and iteration iw writes the same array element:
  ∃ integers iw, ir: 2 ≤ iw, ir ≤ 5 and ir = iw − 2

Between the write access A[i-2] and the write access A[i-2] there is a dependence if:
  ∃ integers iw, iv: 2 ≤ iw, iv ≤ 5 and iw − 2 = iv − 2

To rule out the case where the same instance depends on itself, add the constraint iw ≠ iv.
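For small constant loop bounds, the existential formulation above can be checked by brute-force enumeration; a sketch for the read/write pair of the example loop:

```python
# Brute-force check of the dependence formulation for
#   FOR i := 2 to 5 do A[i-2] = A[i]+1
# A dependence between read A[ir] and write A[iw-2] exists iff some
# in-bounds iterations touch the same element: ir == iw - 2.
from itertools import product

def read_write_dependence(lo, hi):
    return [(iw, ir)
            for iw, ir in product(range(lo, hi + 1), repeat=2)
            if ir == iw - 2]          # same array element accessed

sols = read_write_dependence(2, 5)
# iw in {4, 5} pairs with ir in {2, 3}, so a dependence exists
```

Real dependence analyzers solve such integer constraint systems symbolically rather than by enumeration; this sketch only illustrates what the constraints mean.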


Memory Disambiguation

Memory disambiguation is undecidable at compile time:

read(n)
FOR i = …
  a[i] = a[n]


Domain of Data Dependence Analysis

Only use loop bounds and array indexes that are affine functions of the loop variables:

for i = 1 to n
  for j = 2*i to 100
    a[i + 2j + 3][4i + 2j][i*i] = …
    … = a[1][2i + 1][j]

Assume a data dependence between the read and write operations if there exist integers ir, jr, iw, jw such that:
  1 ≤ iw, ir ≤ n
  2iw ≤ jw ≤ 100
  2ir ≤ jr ≤ 100
  iw + 2jw + 3 = 1
  4iw + 2jw = 2ir + 1

Equate each dimension of the array access; ignore non-affine dimensions (here the third).
  No solution ⇒ no data dependence.
  Solution ⇒ there may be a dependence.


Iteration Space

An abstraction for loops: each iteration is represented as a point (i, j) in the iteration space.

for i = 0, 5
  for j = 0, 3
    a[i, j] = 3

[Figure: rectangular 6×4 iteration space in the (i, j) plane]


Iteration Space

for i = 0, 5
  for j = i, 3
    a[i, j] = 0

[Figure: triangular iteration space; the inner loop runs only while i ≤ j ≤ 3]


Iteration Space

for i = 0, 5
  for j = i, 7
    a[i, j] = 0

The loop bounds can be written as linear inequalities on the iteration vector (i, j):
  i ≥ 0,  5 − i ≥ 0,  j − i ≥ 0,  7 − j ≥ 0

[Figure: trapezoidal iteration space in the (i, j) plane]


Affine Access

[Figure: affine array accesses written in matrix-vector form]


Affine Transform

An affine transform maps each iteration (i, j) of the old loop nest to an iteration (u, v) of the new one:

  (u, v)ᵀ = B · (i, j)ᵀ + b

[Figure: old iteration space over (i, j) mapped to the new iteration space over (u, v)]


Loop Transformation

for i = 1, 100
  for j = 1, 200
    A[i, j] = A[i, j] + 3
  end_for
end_for

becomes

for u = 1, 200
  for v = 1, 100
    A[v, u] = A[v, u] + 3
  end_for
end_for

Loop interchange is the affine transform (u, v)ᵀ = [[0, 1], [1, 0]] · (i, j)ᵀ.


Old Iteration Space

for i = 1, 100
  for j = 1, 200
    A[i, j] = A[i, j] + 3
  end_for
end_for

The old bounds as linear inequalities: 1 ≤ i ≤ 100, 1 ≤ j ≤ 200.


New Iteration Space

for u = 1, 200
  for v = 1, 100
    A[v, u] = A[v, u] + 3
  end_for
end_for

The new bounds, obtained by substituting (i, j) = (v, u) into the old ones: 1 ≤ v ≤ 100, 1 ≤ u ≤ 200.


Old Array Accesses

for i = 1, 100
  for j = 1, 200
    A[i, j] = A[i, j] + 3
  end_for
end_for

The access A[i, j] is the identity affine access A([[1, 0], [0, 1]] · (i, j)ᵀ).


New Array Accesses

for u = 1, 200
  for v = 1, 100
    A[v, u] = A[v, u] + 3
  end_for
end_for

Rewritten in the new iteration variables, the access becomes A([[0, 1], [1, 0]] · (u, v)ᵀ) = A[v, u].


Interchange Loops?

for i = 2, 1000
  for j = 1, 1000
    A[i, j] = A[i-1, j+1] + 3
  end_for
end_for

• e.g., dependence vector d_old = (1, −1)

for u = 1, 1000
  for v = 2, 1000
    A[v, u] = A[v-1, u+1] + 3
  end_for
end_for


Interchange Loops?

A transformation is legal if the new dependence is lexicographically positive, i.e., the leading non-zero entry of the dependence vector is positive.

Distance vector: (1, −1) = (4, 2) − (3, 3).

Interchange transforms the dependence by the same matrix: d_new = [[0, 1], [1, 0]] · d_old, so (1, −1) becomes (−1, 1), which is lexicographically negative. Loop interchange is therefore not legal if there exists a dependence of the form (+, −).
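The legality condition can be sketched as a small check: apply the interchange (which swaps the two components of each distance vector) and verify the result is still lexicographically positive. The function names are illustrative assumptions:

```python
# Legality of loop interchange: every transformed dependence distance
# vector must remain lexicographically positive.
def lex_positive(v):
    for x in v:
        if x != 0:
            return x > 0
    return False        # the all-zero vector carries no loop-carried dependence

def interchange_legal(dep_vectors):
    # interchange maps (d1, d2) -> (d2, d1)
    return all(lex_positive((d2, d1)) for d1, d2 in dep_vectors)

# (1, -1) becomes (-1, 1): lexicographically negative, so interchange
# of the example loop nest is illegal.
illegal_case = interchange_legal([(1, -1)])
legal_case = interchange_legal([(1, 1), (0, 1)])
```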


GCD Test

for i = 1, 100
  a[2*i] = …
  … = a[2*i+1] + 3

Is there any dependence? Solve the linear Diophantine equation 2*iw = 2*ir + 1.


GCD

The greatest common divisor (GCD) of integers a1, a2, …, an, denoted gcd(a1, a2, …, an), is the largest integer that evenly divides all of these integers.

Theorem: the linear Diophantine equation
  a1·x1 + a2·x2 + … + an·xn = c
has an integer solution x1, x2, …, xn iff gcd(a1, a2, …, an) divides c.
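The theorem gives a one-line dependence test; a sketch using the standard library:

```python
# GCD test: a1*x1 + ... + an*xn = c has an integer solution iff
# gcd(a1, ..., an) divides c.
from functools import reduce
from math import gcd

def has_integer_solution(coeffs, c):
    g = reduce(gcd, (abs(a) for a in coeffs))
    return c % g == 0

# a[2*i] vs a[2*i+1]: 2*iw - 2*ir = 1, and gcd(2, -2) = 2 does not
# divide 1, so the loop above carries no dependence.
no_dep = has_integer_solution([2, -2], 1)
# 24x + 36y + 54z = 30: gcd(24, 36, 54) = 6 divides 30, so solutions exist.
many = has_integer_solution([24, 36, 54], 30)
```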


Examples

Example 1: 2x1 − 2x2 = 1. gcd(2, −2) = 2 does not divide 1: no solutions.

Example 2: 24x + 36y + 54z = 30. gcd(24, 36, 54) = 6 divides 30: many solutions.


Loop Fusion

for i = 1, 1000
  A[i] = B[i] + 3
end_for
for j = 1, 1000
  C[j] = A[j] + 5
end_for

fuses into

for i = 1, 1000
  A[i] = B[i] + 3
  C[i] = A[i] + 5
end_for

Better reuse: the read of A[i] immediately follows the write of A[i].
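Fusion preserves semantics here because each C[i] reads only the A[i] written in the same iteration. A quick numeric sanity check (array size shrunk to 100 and B filled with arbitrary values, both assumptions for brevity):

```python
# Check that the fused loop computes the same A and C as the two
# separate loops. Index 0 is an unused pad to model 1-based arrays.
N = 100
B = [0] + [2 * i for i in range(1, N + 1)]

# original: two separate loops
A1, C1 = [0] * (N + 1), [0] * (N + 1)
for i in range(1, N + 1):
    A1[i] = B[i] + 3
for j in range(1, N + 1):
    C1[j] = A1[j] + 5

# fused: one loop, reading A[i] while it is still "hot"
A2, C2 = [0] * (N + 1), [0] * (N + 1)
for i in range(1, N + 1):
    A2[i] = B[i] + 3
    C2[i] = A2[i] + 5
```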


Loop Distribution

for i = 1, 1000
  A[i] = A[i-1] + 3
  C[i] = B[i] + 5
end_for

distributes into

for i = 1, 1000
  A[i] = A[i-1] + 3
end_for
for i = 1, 1000
  C[i] = B[i] + 5
end_for

The second loop is parallel.


Register Blocking

for j = 1, 2*m
  for i = 1, 2*n
    A[i, j] = A[i-1, j] + A[i-1, j-1]
  end_for
end_for

becomes

for j = 1, 2*m, 2
  for i = 1, 2*n, 2
    A[i, j]     = A[i-1, j]   + A[i-1, j-1]
    A[i, j+1]   = A[i-1, j+1] + A[i-1, j]
    A[i+1, j]   = A[i, j]     + A[i, j-1]
    A[i+1, j+1] = A[i, j+1]   + A[i, j]
  end_for
end_for

Better reuse among the A references within one unrolled iteration.
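The 2×2 blocking is legal because every value a statement reads is already final when the block executes. A numeric sanity check, with small sizes and arbitrary border values (both assumptions):

```python
# Check that the 2x2 register-blocked nest computes the same A as the
# original nest. Row 0 and column 0 act as the initialized border.
n, m = 4, 3

def init():
    return [[(i + j) % 7 for j in range(2 * m + 1)] for i in range(2 * n + 1)]

a = init()
for j in range(1, 2 * m + 1):          # original order: j outer, i inner
    for i in range(1, 2 * n + 1):
        a[i][j] = a[i - 1][j] + a[i - 1][j - 1]

b = init()
for j in range(1, 2 * m + 1, 2):       # 2x2 blocked version
    for i in range(1, 2 * n + 1, 2):
        b[i][j] = b[i - 1][j] + b[i - 1][j - 1]
        b[i][j + 1] = b[i - 1][j + 1] + b[i - 1][j]
        b[i + 1][j] = b[i][j] + b[i][j - 1]
        b[i + 1][j + 1] = b[i][j + 1] + b[i][j]
```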


Virtual Register Allocation

for j = 1, 2*M, 2
  for i = 1, 2*N, 2
    r1 = A[i-1, j]
    r2 = r1 + A[i-1, j-1]
    A[i, j] = r2
    r3 = A[i-1, j+1] + r1
    A[i, j+1] = r3
    A[i+1, j] = r2 + A[i, j-1]
    A[i+1, j+1] = r3 + r2
  end_for
end_for

Memory operations are reduced to register loads and stores: 8MN loads become 4MN loads.


Scalar Replacement

for i = 2, N+1
  … = A[i-1] + 1
  A[i] = …
end_for

becomes

t1 = A[1]
for i = 2, N+1
  … = t1 + 1
  t1 = …
  A[i] = t1
end_for

Eliminates loads and stores for the array references.


Unroll-and-Jam

for j = 1, 2*M
  for i = 1, N
    A[i, j] = A[i-1, j] + A[i-1, j-1]
  end_for
end_for

becomes

for j = 1, 2*M, 2
  for i = 1, N
    A[i, j]   = A[i-1, j]   + A[i-1, j-1]
    A[i, j+1] = A[i-1, j+1] + A[i-1, j]
  end_for
end_for

Exposes more opportunity for scalar replacement.


Large Arrays

for i = 1, 1000
  for j = 1, 1000
    A[i, j] = A[i, j] + B[j, i]
  end_for
end_for

Suppose arrays A and B have row-major layout. Then B has poor cache locality, and loop interchange will not help: it would merely make A the poorly accessed array.


Loop Blocking

for v = 1, 1000, 20
  for u = 1, 1000, 20
    for j = v, v+19
      for i = u, u+19
        A[i, j] = A[i, j] + B[j, i]
      end_for
    end_for
  end_for
end_for

Accesses to small blocks of the arrays have good cache locality.
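Blocking only reorders the iterations; it touches exactly the same (i, j) pairs as the original nest. A sketch demonstrating that, with the problem size and tile size shrunk (assumptions for brevity):

```python
# Loop blocking (tiling): the blocked traversal enumerates the same
# iteration points as the original two-deep nest, in a different,
# cache-friendlier order.
N, T = 100, 20

original = [(i, j) for i in range(N) for j in range(N)]

blocked = [(i, j)
           for v in range(0, N, T)     # tile origin along j
           for u in range(0, N, T)     # tile origin along i
           for j in range(v, v + T)
           for i in range(u, u + T)]
```

Since the update `A[i, j] = A[i, j] + B[j, i]` has no loop-carried dependences, any reordering of these points is legal.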


Loop Unrolling for ILP

for i = 1, 10
  a[i] = b[i]
  *p = …
end_for

becomes

for i = 1, 10, 2
  a[i] = b[i]
  *p = …
  a[i+1] = b[i+1]
  *p = …
end_for

Larger scheduling regions and fewer dynamic branches, at the cost of increased code size.


Next Time

• Homework
  – 9.6.2, 9.6.4, 9.6.7
• Static Single Assignment (SSA)
  – Readings: Cytron '91, Chow '97
