Advanced Compiler Techniques
LIU Xianhua
School of EECS, Peking University
Loops
“Advanced Compiler Techniques”
Content
• Concepts:
  – Dominators
  – Depth-First Ordering
  – Back edges
  – Graph depth
  – Reducibility
• Natural Loops
• Efficiency of Iterative Algorithms
• Dependences & Loop Transformation
Loops are Important!
• Loops dominate program execution time
  – Needs special treatment during optimization
• Loops also affect the running time of program analyses
  – e.g., a dataflow problem can be solved in just a single pass if a program has no loops
Dominators
• Node d dominates node n if every path from the entry to n goes through d.
  – written as: d dom n
• Quick observations:
  – Every node dominates itself.
  – The entry dominates every node.
• Common cases:
  – The test of a while loop dominates all blocks in the loop body.
  – The test of an if-then-else dominates all blocks in either branch.
Dominator Tree
• Immediate dominance: d idom n
  – d dom n, d ≠ n, and there is no m (m ≠ d, m ≠ n) such that d dom m and m dom n
• Immediate dominance relationships form a tree
[Figure: a flow graph on nodes 1–5 and its dominator tree]
Finding Dominators
• A dataflow analysis problem: for each node, find all of its dominators.
  – Direction: forward
  – Confluence: set intersection
  – Boundary: OUT[Entry] = {Entry}
  – Initialization: OUT[B] = all nodes
  – Equations:
    • OUT[B] = IN[B] ∪ {B}
    • IN[B] = ∩ OUT[p] over all predecessors p of B
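The iterative formulation above can be sketched directly in code. This is a minimal sketch, not from the slides, assuming the flow graph is represented as a dict mapping each node to its list of successors:

```python
def dominators(succ, entry):
    """Iterative dominator analysis: forward direction, set intersection
    as confluence, OUT[Entry] = {Entry}, OUT[B] initialized to all nodes."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, ss in succ.items():
        for s in ss:
            pred[s].add(n)
    out = {n: set(nodes) for n in nodes}
    out[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            # IN[B] = intersection of OUT[p] over all predecessors p of B
            in_n = set(nodes)
            for p in pred[n]:
                in_n &= out[p]
            # OUT[B] = IN[B] U {B}
            new_out = in_n | {n}
            if new_out != out[n]:
                out[n] = new_out
                changed = True
    return out
```

On a hypothetical five-node graph with edges 1→2, 1→4, 2→3, 3→2, 3→4, 4→5, 5→1 (an assumed graph for illustration, not necessarily the one in the slides), this yields, e.g., D(2) = {1, 2} and D(4) = {1, 4}.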
Example: Dominators
[Figure: the example flow graph with final dominator sets D(1)={1}, D(2)={1,2}, D(3)={1,2,3}, D(4)={1,4}, D(5)={1,5}, along with intermediate values from the iteration]
Depth-First Search
• Start at entry.
• If you can follow an edge to an unvisited node, do so.
• If not, backtrack to your parent (the node from which you were visited).
Depth-First Spanning Tree
• Root = entry.
• Tree edges are the edges along which we first visit the node at the head.
[Figure: a depth-first spanning tree of the example flow graph, rooted at node 1]
Depth-First Node Order
• The reverse of the order in which a DFS retreats from the nodes: 1-4-5-2-3
• Alternatively: the reverse of a postorder traversal of the tree (postorder 3-2-5-4-1)
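The DF numbering can be sketched as the reverse of a recorded postorder. A minimal sketch, not from the slides, again assuming a successor-map representation of the flow graph:

```python
def df_order(succ, entry):
    """Depth-first (reverse postorder) numbering of the flow graph."""
    visited, postorder = set(), []
    def dfs(n):
        visited.add(n)
        for s in succ.get(n, []):
            if s not in visited:
                dfs(s)           # tree edge: first visit of s
        postorder.append(n)      # record n when the search retreats from it
    dfs(entry)
    return postorder[::-1]       # DF order = reverse of postorder
```

On the hypothetical graph 1→2, 1→4, 2→3, 3→2, 3→4, 4→5, 5→1 used for illustration here, the resulting DF order is 1, 2, 3, 4, 5.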
Four Kinds of Edges
1. Tree edges.
2. Advancing edges (node to proper descendant).
3. Retreating edges (node to ancestor, including edges to self).
4. Cross edges (between two nodes, neither of which is an ancestor of the other).
A Little Magic
• Of these edges, only retreating edges go from high to low in DF order.
  – Sketch of proof: you must retreat from the head of a tree edge before you can retreat from its tail.
• Also surprising: all cross edges go right to left in the DFST.
  – Assuming we add children of any node from the left.
Example: Non-Tree Edges
[Figure: the example flow graph with its non-tree edges labeled retreating, forward (advancing), and cross]
Back Edges
• An edge is a back edge if its head dominates its tail.
• Theorem: every back edge is a retreating edge in every DFST of every flow graph.
  – The converse is almost always true, but not always.
  – Why: the head of a back edge dominates its tail, so it is reached before the tail in any DFST; the search must reach the tail before retreating from the head, so the tail is a descendant of the head.
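The definition above turns directly into a one-line check. A minimal sketch, assuming dominator sets have already been computed (by any means) as a dict from node to its set of dominators:

```python
def back_edges(succ, dom):
    """An edge t->h is a back edge iff its head h dominates its tail t."""
    return [(t, h) for t in succ for h in succ[t] if h in dom[t]]
```

On the hypothetical example graph used earlier (edges 1→2, 1→4, 2→3, 3→2, 3→4, 4→5, 5→1), the back edges are 3→2 and 5→1.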
Example: Back Edges
[Figure: the example flow graph with dominator sets D(2)={1,2}, D(3)={1,2,3}, D(4)={1,4}, D(5)={1,5}; the edges whose heads dominate their tails are the back edges]
Reducible Flow Graphs
• A flow graph is reducible if every retreating edge in any DFST for that flow graph is a back edge.
• Testing reducibility: Remove all back edges from the flow graph and check that the result is acyclic.
• Why it works: every cycle must include some retreating edge in any DFST.
  – In particular, the edge that enters the first node of the cycle to be visited.
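The test described above can be sketched as code: delete every edge whose head dominates its tail, then check acyclicity (here with Kahn's topological-sort algorithm). A minimal sketch, assuming a successor map and precomputed dominator sets:

```python
def is_reducible(succ, dom):
    """Remove all back edges (head dominates tail) and check that the
    remaining graph is acyclic."""
    fwd = {t: [h for h in succ[t] if h not in dom[t]] for t in succ}
    # Acyclicity check via Kahn's algorithm.
    indeg = {n: 0 for n in fwd}
    for t in fwd:
        for h in fwd[t]:
            indeg[h] += 1
    work = [n for n in fwd if indeg[n] == 0]
    seen = 0
    while work:
        n = work.pop()
        seen += 1
        for h in fwd[n]:
            indeg[h] -= 1
            if indeg[h] == 0:
                work.append(h)
    return seen == len(fwd)   # all nodes ordered <=> no cycle remains
```

On the hypothetical three-node graph of the nonreducible example (entry A, cycle between B and C, each entered from A), no edge's head dominates its tail, so nothing is removed and the test reports nonreducible.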
DFST on a Cycle
[Figure: a cycle; depth-first search reaches one node of the cycle first, and must reach the remaining nodes of the cycle before leaving it, so the edge returning to the first-reached node is a retreating edge]
Why Reducibility?
• Folk theorem: All flow graphs in practice are reducible.
• Fact: If you use only while-loops, for-loops, repeat-loops, if-then(-else), break, and continue, then your flow graph is reducible.
Example: Remove Back Edges
[Figure: the example flow graph with its back edges removed]
The remaining graph is acyclic.
Example: Nonreducible Graph
[Figure: a graph with entry A and a cycle between B and C, where each of B and C can be entered directly from A]
In any DFST, one of the edges between B and C will be a retreating edge. But neither head dominates its tail, so there are no back edges, and deleting back edges leaves the cycle.
Why Care About Back/Retreating Edges?
1. Proper ordering of nodes during an iterative algorithm assures that the number of passes is limited by the number of "nested" back edges.
2. The depth of nested loops upper-bounds the number of nested back edges.
DF Order and Retreating Edges
• Suppose that for an RD analysis, we visit nodes during each iteration in DF order.
• The fact that a definition d reaches a block will propagate in one pass along any increasing sequence of blocks.
• When d arrives at the tail of a retreating edge, it is too late to propagate d from OUT to IN: the IN at the head has already been computed for that round.
Example: DF Order
[Figure: the example flow graph annotated with where definition d, generated by node 2, has reached after the first pass and after the second pass]
Depth of a Flow Graph
• The depth of a flow graph with a given DFST and DF-order is the greatest number of retreating edges along any acyclic path.
• For RD, if we use DF order to visit nodes, we converge in depth+2 passes.
  – depth+1 passes to follow that many increasing segments.
  – 1 more pass to realize we have converged.
Example: Depth = 2
1->4->7 ---> 3->10->17 ---> 6->18->20
The path alternates increasing segments and retreating edges: the first increasing segment propagates in Pass 1, the second in Pass 2, the third in Pass 3.
Similarly . . .
• AE also works in depth+2 passes.
  – Unavailability propagates along retreat-free node sequences in one pass.
• So does LV if we use the reverse of DF order.
  – A use propagates backward in one pass along paths that do not use a retreating edge.
In General . . .
• The depth+2 bound works for any monotone framework, as long as information only needs to propagate along acyclic paths.
  – Example: if a definition reaches a point, it does so along an acyclic path.
However . . .
• Constant propagation does not have this property:
L: a = b
   b = c
   c = 1
   goto L
• The constant 1 reaches c in one pass, b in the next, and a in a third: the information must propagate around the cycle, so the acyclic-path assumption fails.
Why Depth+2 is Good
• Normal control-flow constructs produce reducible flow graphs with the number of back edges at most the nesting depth of loops.
  – Nesting depth tends to be small.
  – A study by Knuth has shown that the average depth of typical flow graphs is about 2.75.
Example: Nested Loops
3 nested while-loops; depth = 3
3 nested repeat-loops; depth = 1
Natural Loops
• A natural loop is defined by:
  – A single entry point called the header
    • the header dominates all nodes in the loop
  – A back edge that enters the loop header
    • otherwise, it is not possible for the flow of control to return to the header directly from within the "loop"; i.e., there really is no loop.
Find Natural Loops
• The natural loop of a back edge a->b is {b} plus the set of nodes that can reach a without going through b
• To compute it: remove b from the flow graph, then find all nodes that can reach a (i.e., a and its transitive predecessors)
• Theorem: unless they share a header, two natural loops are either disjoint or one is nested within the other.
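The construction above is a backward reachability search from a that never crosses b. A minimal sketch, assuming a successor-map flow graph:

```python
def natural_loop(succ, a, b):
    """Natural loop of back edge a->b: {b} plus all nodes that can reach a
    without going through b (backward search over predecessor edges)."""
    pred = {}
    for t in succ:
        for h in succ[t]:
            pred.setdefault(h, set()).add(t)
    loop, work = {b, a}, [a]
    while work:
        n = work.pop()
        for p in pred.get(n, ()):
            if p not in loop:       # b is already in `loop`, so the
                loop.add(p)         # search never expands through b
                work.append(p)
    return loop
```

On the hypothetical example graph (1→2, 1→4, 2→3, 3→2, 3→4, 4→5, 5→1), the natural loop of 3→2 is {2, 3} and the natural loop of 5→1 is the whole graph.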
Example: Natural Loops
[Figure: the example flow graph with the natural loop of back edge 3->2 and the natural loop of back edge 5->1 marked]
Relationship between Loops
• If two loops do not have the same header
  – they are either disjoint, or
  – one is entirely contained in (nested within) the other
  – innermost loop: one that contains no other loop
• If two loops share the same header
  – it is hard to tell which is the inner loop
  – so we combine them as one loop
[Figure: two loops on nodes 1–4 sharing the same header]
Basic Parallelism
• Examples:
FOR i = 1 to 100
  a[i] = b[i] + c[i]        (no loop-carried dependence)
FOR i = 11 TO 20
  a[i] = a[i-1] + 3         (loop-carried dependence of distance 1)
FOR i = 11 TO 20
  a[i] = a[i-10] + 3        (reads a[1..10], writes a[11..20]: no loop-carried dependence)
• Does there exist a data dependence edge between two different iterations?
• A data dependence edge is loop-carried if it crosses iteration boundaries.
• DoAll loops: loops without loop-carried dependences.
Data Dependence of Variables
• True dependence (read after write):
    a = ...
    ... = a
• Anti-dependence (write after read):
    ... = a
    a = ...
• Output dependence (write after write):
    a = ...
    a = ...
• Input dependence (read after read):
    ... = a
    ... = a
Affine Array Accesses
• Common patterns of data accesses (i, j, k are loop indexes):
    A[i], A[j], A[i-1], A[0], A[i+j], A[2*i], A[2*i+1], A[i,j], A[i-1, j+1]
• Array indexes are affine expressions of the surrounding loop indexes
  – Loop indexes: i_n, i_(n-1), ..., i_1
  – Integer constants: c_n, c_(n-1), ..., c_0
  – Array index: c_n*i_n + c_(n-1)*i_(n-1) + ... + c_1*i_1 + c_0
• Affine expression: a linear expression plus a constant term (c_0)
Formulating Data Dependence Analysis
FOR i := 2 to 5 do
  A[i-2] = A[i]+1;
• Between read access A[i] and write access A[i-2] there is a dependence if there exist two iterations i_r and i_w within the loop bounds such that iterations i_r and i_w read and write the same array element, respectively:
    ∃ integers i_w, i_r : 2 ≤ i_w, i_r ≤ 5 and i_r = i_w - 2
• Between write access A[i-2] and write access A[i-2] there is a dependence if:
    ∃ integers i_w, i_v : 2 ≤ i_w, i_v ≤ 5 and i_w - 2 = i_v - 2
• To rule out the case where the same instance depends on itself, add the constraint i_w ≠ i_v.
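Because the bounds here are small constants, both existence questions above can be checked by brute-force enumeration rather than symbolic constraint solving. A minimal sketch for this specific loop:

```python
# Loop: FOR i := 2 to 5 do A[i-2] = A[i]+1

# Read A[i_r] vs. write A[i_w - 2]: dependence iff i_r = i_w - 2
# for some iterations within the bounds 2..5.
read_write = [(iw, ir)
              for iw in range(2, 6) for ir in range(2, 6)
              if ir == iw - 2]

# Write A[i_w - 2] vs. write A[i_v - 2] in *distinct* iterations:
# dependence iff i_w - 2 = i_v - 2 and i_w != i_v (never, here).
write_write = [(iw, iv)
               for iw in range(2, 6) for iv in range(2, 6)
               if iw - 2 == iv - 2 and iw != iv]
```

The enumeration finds the read/write pairs (i_w, i_r) = (4, 2) and (5, 3), and no write/write dependence between distinct iterations.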
Memory Disambiguation
• Memory disambiguation is undecidable at compile time:
read(n)
FOR i = ...
  a[i] = a[n]
Domain of Data Dependence Analysis
• Only use loop bounds and array indexes that are affine functions of loop variables
for i = 1 to n
  for j = 2*i to 100
    a[i + 2j + 3][4i + 2j][i*i] = ...
    ... = a[1][2i + 1][j]
• Assume a data dependence between the read and write operations if there exist:
    ∃ integers i_r, j_r, i_w, j_w :
      1 ≤ i_w, i_r ≤ n
      2i_w ≤ j_w ≤ 100
      2i_r ≤ j_r ≤ 100
      i_w + 2j_w + 3 = 1
      4i_w + 2j_w = 2i_r + 1
• Equate each dimension of the array access; ignore non-affine dimensions (here, i*i).
• No solution ⇒ no data dependence; a solution ⇒ there may be a dependence.
Iteration Space
• An abstraction for loops
• An iteration is represented as coordinates in the iteration space.
for i = 0, 5
  for j = 0, 3
    a[i, j] = 3
[Figure: the 6×4 rectangular iteration space in the (i, j) plane]
Iteration Space
• An abstraction for loops
for i = 0, 5
  for j = i, 3
    a[i, j] = 0
[Figure: the iteration space in the (i, j) plane; no iterations exist for i > 3]
Iteration Space
• An abstraction for loops
for i = 0, 5
  for j = i, 7
    a[i, j] = 0
• The loop bounds can be written as linear inequalities over (i, j):
    0 ≤ i ≤ 5, i ≤ j ≤ 7, i.e., B·(i, j)^T ≤ b for an integer matrix B and vector b
[Figure: the trapezoidal iteration space in the (i, j) plane]
Affine Access
Affine Transform
• An affine transform maps each iteration (i, j) of the old space to an iteration (u, v) of the new space:
    (u, v)^T = B·(i, j)^T + b
Loop Transformation
for i = 1, 100
  for j = 1, 200
    A[i, j] = A[i, j] + 3
  end_for
end_for

for u = 1, 200
  for v = 1, 100
    A[v, u] = A[v, u] + 3
  end_for
end_for

• Transform: (u, v)^T = [0 1; 1 0]·(i, j)^T, i.e., loop interchange with u = j, v = i
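The interchange transform above can be sketched by applying the matrix [0 1; 1 0] to every point of the iteration space:

```python
def transform(points, B=((0, 1), (1, 0))):
    """Apply the linear transform (u, v) = B * (i, j) to iteration points.
    The default B is the interchange matrix from the slide: u = j, v = i."""
    return [(B[0][0]*i + B[0][1]*j, B[1][0]*i + B[1][1]*j)
            for (i, j) in points]

# Old space: 1 <= i <= 100, 1 <= j <= 200.
old_space = [(i, j) for i in range(1, 101) for j in range(1, 201)]
new_space = transform(old_space)
```

The transformed points fill exactly the new space 1 ≤ u ≤ 200, 1 ≤ v ≤ 100: the same iterations, renamed.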
Old Iteration Space
for i = 1, 100
  for j = 1, 200
    A[i, j] = A[i, j] + 3
  end_for
end_for
• The old iteration space is 1 ≤ i ≤ 100, 1 ≤ j ≤ 200, a system of linear inequalities over (i, j).
• Applying (u, v)^T = [0 1; 1 0]·(i, j)^T, i.e., (i, j) = (v, u), rewrites the same inequalities over (u, v).
New Iteration Space
for u = 1, 200
  for v = 1, 100
    A[v, u] = A[v, u] + 3
  end_for
end_for
• The new iteration space is 1 ≤ u ≤ 200, 1 ≤ v ≤ 100: the same set of iterations as before, expressed in the transformed coordinates.
Old Array Accesses
for i = 1, 100
  for j = 1, 200
    A[i, j] = A[i, j] + 3
  end_for
end_for
• The access A[i, j] has the identity index matrix: A[[1 0; 0 1]·(i, j)^T].
• Substituting (i, j)^T = [0 1; 1 0]·(u, v)^T expresses the same access in the new coordinates: A[[0 1; 1 0]·(u, v)^T].
New Array Accesses
for u = 1, 200
  for v = 1, 100
    A[v, u] = A[v, u] + 3
  end_for
end_for
• In the transformed loop the access is A[[0 1; 1 0]·(u, v)^T] = A[v, u].
Interchange Loops?
for i = 2, 1000
  for j = 1, 1000
    A[i, j] = A[i-1, j+1] + 3
  end_for
end_for
• e.g., dependence vector d_old = (1, -1)
After interchange:
for u = 1, 1000
  for v = 2, 1000
    A[v, u] = A[v-1, u+1] + 3
  end_for
end_for
Interchange Loops?
• A transformation is legal if every new dependence vector is lexicographically positive, i.e., its leading non-zero component is positive.
• Distance vector: (1, -1) = (4, 2) - (3, 3)
• Loop interchange is not legal if there exists a dependence of the form (+, -):
    d_new = [0 1; 1 0]·d_old = [0 1; 1 0]·(1, -1)^T = (-1, 1)^T, which is lexicographically negative.
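The legality condition above can be sketched as a direct check on dependence vectors:

```python
def lex_positive(d):
    """A vector is lexicographically positive if its leading
    non-zero component is positive."""
    for x in d:
        if x != 0:
            return x > 0
    return False   # the zero vector carries no cross-iteration dependence

def interchange_legal(deps):
    """Interchange maps each dependence (d1, d2) to (d2, d1); legal
    only if every transformed dependence stays lexicographically positive."""
    return all(lex_positive((d2, d1)) for (d1, d2) in deps)
```

For the slide's dependence (1, -1), interchange yields (-1, 1), which is lexicographically negative, so the interchange is rejected.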
GCD Test
for i = 1, 100
  a[2*i] = ...
  ... = a[2*i+1] + 3
• Is there any dependence?
• Solve a linear Diophantine equation: 2*i_w = 2*i_r + 1
GCD
• The greatest common divisor (GCD) of integers a_1, a_2, ..., a_n, denoted gcd(a_1, a_2, ..., a_n), is the largest integer that evenly divides all these integers.
• Theorem: the linear Diophantine equation
    a_1*x_1 + a_2*x_2 + ... + a_n*x_n = c
  has an integer solution x_1, x_2, ..., x_n iff gcd(a_1, a_2, ..., a_n) divides c.
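The theorem gives a one-line dependence test; a minimal sketch:

```python
from functools import reduce
from math import gcd

def has_integer_solution(coeffs, c):
    """a_1*x_1 + ... + a_n*x_n = c has an integer solution
    iff gcd(a_1, ..., a_n) divides c."""
    g = reduce(gcd, (abs(a) for a in coeffs))
    return c % g == 0
```

For the previous slide's loop (a[2*i] written, a[2*i+1] read), the equation 2*i_w - 2*i_r = 1 has coefficients gcd 2, which does not divide 1, so there is no dependence.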
Examples
• Example 1: 2x_1 - 2x_2 = 1. gcd(2, -2) = 2, which does not divide 1: no solutions.
• Example 2: 24x + 36y + 54z = 30. gcd(24, 36, 54) = 6, which divides 30: many solutions.
Loop Fusion
for i = 1, 1000
  A[i] = B[i] + 3
end_for
for j = 1, 1000
  C[j] = A[j] + 5
end_for

for i = 1, 1000
  A[i] = B[i] + 3
  C[i] = A[i] + 5
end_for
Better reuse: the read of A[i] immediately follows the write of A[i]
Loop Distribution
for i = 1, 1000
  A[i] = A[i-1] + 3
end_for
for i = 1, 1000
  C[i] = B[i] + 5
end_for

for i = 1, 1000
  A[i] = A[i-1] + 3
  C[i] = B[i] + 5
end_for
2nd loop is parallel
Register Blocking
for j = 1, 2*m
  for i = 1, 2*n
    A[i, j] = A[i-1, j] + A[i-1, j-1]
  end_for
end_for

for j = 1, 2*m, 2
  for i = 1, 2*n, 2
    A[i, j]     = A[i-1, j]   + A[i-1, j-1]
    A[i, j+1]   = A[i-1, j+1] + A[i-1, j]
    A[i+1, j]   = A[i, j]     + A[i, j-1]
    A[i+1, j+1] = A[i, j+1]   + A[i, j]
  end_for
end_for
Better reuse: several A elements are each used by more than one statement in the unrolled body
Virtual Register Allocation
for j = 1, 2*M, 2
  for i = 1, 2*N, 2
    r1 = A[i-1, j]
    r2 = r1 + A[i-1, j-1]
    A[i, j] = r2
    r3 = A[i-1, j+1] + r1
    A[i, j+1] = r3
    A[i+1, j] = r2 + A[i, j-1]
    A[i+1, j+1] = r3 + r2
  end_for
end_for
Memory operations reduced to register load/store
8MN loads to 4MN loads
Scalar Replacement
for i = 2, N+1
  ... = A[i-1] + 1
  A[i] = ...
end_for

t1 = A[1]
for i = 2, N+1
  ... = t1 + 1
  t1 = ...
  A[i] = t1
end_for
Eliminate loads and stores for array references
Unroll-and-Jam
for j = 1, 2*M
  for i = 1, N
    A[i, j] = A[i-1, j] + A[i-1, j-1]
  end_for
end_for

for j = 1, 2*M, 2
  for i = 1, N
    A[i, j]   = A[i-1, j]   + A[i-1, j-1]
    A[i, j+1] = A[i-1, j+1] + A[i-1, j]
  end_for
end_for
Expose more opportunity for scalar replacement
Large Arrays
for i = 1, 1000
  for j = 1, 1000
    A[i, j] = A[i, j] + B[j, i]
  end_for
end_for
Suppose arrays A and B have row-major layout
B has poor cache locality. Loop interchange will not help.
Loop Blocking
for v = 1, 1000, 20
  for u = 1, 1000, 20
    for j = v, v+19
      for i = u, u+19
        A[i, j] = A[i, j] + B[j, i]
      end_for
    end_for
  end_for
end_for
Access to small blocks of the arrays has good cache locality.
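Blocking only reorders iterations; it must visit exactly the same (i, j) pairs as the original nest. A minimal sketch checking this, with hypothetical bounds N = 40 and block size 20 (scaled down from the slide's 1000 and 20 for illustration):

```python
N, BLK = 40, 20   # hypothetical small bounds; the slide uses 1000 and 20

# Original loop nest: all (i, j) with 1 <= i, j <= N.
original = [(i, j) for i in range(1, N + 1) for j in range(1, N + 1)]

# Blocked nest: outer loops walk block origins, inner loops walk each block.
blocked = [(i, j)
           for v in range(1, N + 1, BLK)
           for u in range(1, N + 1, BLK)
           for j in range(v, v + BLK)
           for i in range(u, u + BLK)]
```

Both nests enumerate the same N×N set of iterations; only the order differs, which is what improves cache locality for B.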
Loop Unrolling for ILP
for i = 1, 10
  a[i] = b[i]
  *p = ...
end_for

for i = 1, 10, 2
  a[i] = b[i]
  *p = ...
  a[i+1] = b[i+1]
  *p = ...
end_for

Larger scheduling regions; fewer dynamic branches
But increased code size
Next Time
• Homework
  – 9.6.2, 9.6.4, 9.6.7
• Static Single Assignment (SSA)
  – Readings: Cytron '91, Chow '97