Proactive Loop-nest Optimizations
Mei Ye
Acknowledgements: Dinesh Suresh, Roy Ju, Michael Lai
Adjacent Loops
Five little pumpkins sitting on a gate …
[Figure: control-flow tree of a function. The root Func node has If and Block children; each If has Then and Else branches, which in turn hold nested If, Block, and Loop nodes.]
Proactive Loop Fusion
An automated pass that iteratively applies a set of code transformations (if-merging, head/tail duplication, code motion, and others) over the whole function, in no fixed order, to bring pairs of loops adjacent to each other so that loop fusion becomes possible.
Proactive Loop Fusion Candidates
A pair of loops are proactive loop fusion candidates iff:
1) They have a Least Common Predecessor (LCP) in the tree.
2) The paths from the candidates to the LCP have equal length.
3) Each pair of corresponding nodes on the two paths has the same type, and pairs of Ifs have identical condition expressions.
4) The loops are not adjacent to each other but are otherwise good fusion candidates.
Complexity: O((depth * n)^2), where depth is the depth in the tree and n is the number of loops at that depth.
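Conditions 1 to 3, plus the non-adjacency half of condition 4, can be sketched as a walk up the tree in lockstep from both loops. This is only an illustrative sketch: `Node`, `Kind`, `add_kid`, and `fusion_lcp` are hypothetical names rather than the compiler's own data structures, If conditions are compared as plain strings, and the "otherwise good fusion candidates" legality check is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node; not the compiler's SC_NODE.
enum class Kind { Func, If, Then, Else, Block, Loop };

struct Node {
    Kind kind;
    std::string cond;            // condition text, meaningful only for If nodes
    Node* parent = nullptr;
    std::vector<Node*> kids;
    explicit Node(Kind k, std::string c = "") : kind(k), cond(std::move(c)) {}
};

void add_kid(Node* parent, Node* kid) {
    assert(parent && kid);
    kid->parent = parent;
    parent->kids.push_back(kid);
}

int depth(const Node* n) {
    int d = 0;
    for (; n->parent; n = n->parent) ++d;
    return d;
}

// True if a and b are consecutive children of the same parent.
bool adjacent_siblings(const Node* a, const Node* b) {
    if (!a->parent || a->parent != b->parent) return false;
    const std::vector<Node*>& k = a->parent->kids;
    for (std::size_t i = 0; i + 1 < k.size(); ++i)
        if ((k[i] == a && k[i + 1] == b) || (k[i] == b && k[i + 1] == a))
            return true;
    return false;
}

// Returns the LCP if loops a and b pass the structural checks, else nullptr.
Node* fusion_lcp(Node* a, Node* b) {
    if (a == b || a->kind != Kind::Loop || b->kind != Kind::Loop) return nullptr;
    // Equal depth implies equal path length to any common ancestor.
    if (depth(a) != depth(b)) return nullptr;
    // Already-adjacent loops need no proactive transformation.
    if (adjacent_siblings(a, b)) return nullptr;
    Node* x = a->parent;
    Node* y = b->parent;
    while (x && y && x != y) {
        if (x->kind != y->kind) return nullptr;                         // same node type
        if (x->kind == Kind::If && x->cond != y->cond) return nullptr;  // same condition
        x = x->parent;
        y = y->parent;
    }
    return (x && x == y) ? x : nullptr;
}
```

Running this check over every pair of loops at the same depth is one way to arrive at a quadratic cost in the number of loops per level.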
[Figure: an LCP with children If, Block, If. Loop1 sits under the first If and Loop2 under the second If.]
Proactive Loop Fusion Transformation Candidates
Proactive loop fusion transformation candidates, cand1 and cand2:
1. Are immediate children of the LCP of the loop fusion candidates.
2. Are each either an If or a Loop.
3. For every sibling in between (cand1, cand2) that is a Block or an If: the Block can be safely and legally moved above cand1 if cand1 is a Loop; the If has at least one path that has no dependency on the loop fusion candidates.
4. For every sibling in (cand1, cand2] that is an If, its preceding siblings can be legally if-merged or head-duplicated into it.
5. For every sibling in [cand1, cand2) that is an If, its succeeding siblings can be legally if-merged or tail-duplicated into it.
[Figure: an LCP with children If, Block, If. The first If (containing Loop1) is cand1 and the second If (containing Loop2) is cand2.]
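[Figure: three-step transformation under an LCP. (1) The siblings are If, Block, If, with the two Ifs as cand1 and cand2; sc1 is the first If and sc2 the Block. (2) After tail-duplication of the Block into the first If, the siblings are If (sc1) and If (sc2). (3) After if-merging, a single If remains under the LCP.]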
Action Table
sc1    sc2     Action
Loop   Block   Safe code motion of sc2 before sc1; iteration continues on sc1.
If     Block   Tail duplication of sc2 into sc1; iteration continues on sc1.
Loop   If      Head duplication of sc1 into sc2; iteration continues on sc2.
If     If      If-merging or tail duplication of sc2 into sc1; iteration continues on sc1.
If     Loop    Tail duplication of sc2 into sc1; iteration continues on sc1.
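The action table can be read as a lookup keyed on the node types of two neighboring siblings. The following is a minimal sketch of that lookup; the names `Kind`, `Action`, `Step`, and `next_step` are hypothetical and only mirror the table rows, not the compiler's implementation.

```cpp
#include <cassert>

// Hypothetical encoding of the action table; not the compiler's own types.
enum class Kind { Loop, If, Block };
enum class Action { CodeMotion, TailDuplication, HeadDuplication,
                    IfMergeOrTailDuplication, None };

struct Step {
    Action action;          // transformation to apply to (sc1, sc2)
    bool continue_on_sc1;   // true: iterate on sc1 next; false: iterate on sc2
};

Step next_step(Kind sc1, Kind sc2) {
    if (sc1 == Kind::Loop && sc2 == Kind::Block)
        return {Action::CodeMotion, true};                // move sc2 before sc1
    if (sc1 == Kind::If && sc2 == Kind::Block)
        return {Action::TailDuplication, true};           // sc2 into sc1
    if (sc1 == Kind::Loop && sc2 == Kind::If)
        return {Action::HeadDuplication, false};          // sc1 into sc2
    if (sc1 == Kind::If && sc2 == Kind::If)
        return {Action::IfMergeOrTailDuplication, true};  // sc2 into sc1
    if (sc1 == Kind::If && sc2 == Kind::Loop)
        return {Action::TailDuplication, true};           // sc2 into sc1
    return {Action::None, true};                          // pair not in the table
}
```

Only the Loop/If row continues on sc2, because head duplication folds sc1 into sc2 and sc2 is the node that survives.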
Before if-merging:
  if (a) {
    for (i=0; i<n; i++)
      stmt1;
    if (b)
      stmt2;
  }
  if (a) {
    for (i=0; i<n; i++)
      stmt3;
  }
After if-merging:
  if (a) {
    for (i=0; i<n; i++)
      stmt1;
    if (b)
      stmt2;
    for (i=0; i<n; i++)
      stmt3;
  }
[Figure: tree before and after if-merging. Before: Func (the LCP) has two If(a) children, cand1 (sc1) and cand2 (sc2); the Then branch of the first holds Loop and If(b), and the Then branch of the second holds Loop. After: a single If(a) whose Then branch holds Loop, If(b), Loop.]
Before head duplication:
  if (a) {
    for (i=0; i<n; i++)
      stmt1;
    if (b)
      stmt2;
    for (i=0; i<n; i++)
      stmt3;
  }
After head duplication:
  if (a) {
    if (b) {
      for (i=0; i<n; i++)
        stmt1;
      stmt2;
    }
    else {
      for (i=0; i<n; i++)
        stmt1;
    }
    for (i=0; i<n; i++)
      stmt3;
  }
[Figure: tree before and after head duplication. Before: the Then branch of If(a) holds Loop (sc1), If(b) (sc2), Loop. After: the first Loop has been duplicated into both branches of If(b), leaving If(b) and Loop as sibling candidates.]
Before tail duplication:
  if (a) {
    if (b) {
      for (i=0; i<n; i++)
        stmt1;
      stmt2;
    }
    else {
      for (i=0; i<n; i++)
        stmt1;
    }
    for (i=0; i<n; i++)
      stmt3;
  }
After tail duplication:
  if (a) {
    if (b) {
      for (i=0; i<n; i++)
        stmt1;
      stmt2;
      for (i=0; i<n; i++)
        stmt3;
    }
    else {
      for (i=0; i<n; i++)
        stmt1;
      for (i=0; i<n; i++)
        stmt3;
    }
  }
[Figure: tree before and after tail duplication. Before: If(b) (sc1) and the trailing Loop (sc2) are siblings under the Then branch of If(a). After: the trailing Loop has been duplicated into both branches of If(b); the Then branch holds Loop, Block, Loop and the Else branch holds Loop, Loop.]
Before code motion:
  if (a) {
    if (b) {
      for (i=0; i<n; i++)
        stmt1;
      stmt2;
      for (i=0; i<n; i++)
        stmt3;
    }
    else {
      for (i=0; i<n; i++)
        stmt1;
      for (i=0; i<n; i++)
        stmt3;
    }
  }
After code motion:
  if (a) {
    if (b) {
      stmt2;
      for (i=0; i<n; i++)
        stmt1;
      for (i=0; i<n; i++)
        stmt3;
    }
    else {
      for (i=0; i<n; i++)
        stmt1;
      for (i=0; i<n; i++)
        stmt3;
    }
  }
[Figure: tree before and after code motion. Before: the Then branch of If(b) holds Loop, Block, Loop. After: the Block has been moved above the first Loop, so both branches of If(b) hold two adjacent Loops.]
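Once the loops are adjacent in every path, ordinary loop fusion can be applied. The sketch below compares the original two-If form with a fused form to show the end result of the sequence. The concrete bodies chosen for stmt1, stmt2, and stmt3 are assumptions made for illustration (the slides leave them abstract), picked so that stmt2 is independent of both loops and fusion is legal; `State`, `original_form`, `fused_form`, and `same` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Observable program state for the illustration.
struct State {
    std::vector<int> x, y;
    int t;
};

// Original shape: two separate if (a) regions, loops not adjacent.
State original_form(bool a, bool b, int n) {
    assert(n >= 0);
    State s{std::vector<int>(n, 0), std::vector<int>(n, 0), 0};
    if (a) {
        for (int i = 0; i < n; i++) s.x[i] = i + 1;       // stmt1 (assumed)
        if (b) s.t = 7;                                   // stmt2 (assumed)
    }
    if (a) {
        for (int i = 0; i < n; i++) s.y[i] = 2 * s.x[i];  // stmt3 (assumed)
    }
    return s;
}

// Shape after if-merging, head/tail duplication, code motion, then fusion.
State fused_form(bool a, bool b, int n) {
    assert(n >= 0);
    State s{std::vector<int>(n, 0), std::vector<int>(n, 0), 0};
    if (a) {
        if (b) {
            s.t = 7;                                      // stmt2 moved above the loops
            for (int i = 0; i < n; i++) {                 // fused loop
                s.x[i] = i + 1;
                s.y[i] = 2 * s.x[i];
            }
        } else {
            for (int i = 0; i < n; i++) {                 // fused loop
                s.x[i] = i + 1;
                s.y[i] = 2 * s.x[i];
            }
        }
    }
    return s;
}

bool same(const State& p, const State& q) {
    return p.x == q.x && p.y == q.y && p.t == q.t;
}
```

With these assumed statement bodies, both forms produce the same state for every combination of `a` and `b`; a real compiler would have to establish this through dependence analysis rather than by testing.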
void COMP_UNIT::Pro_loop_fusion_trans() {
  // Identify proactive loop fusion candidates and flag LCPs.
  pro_loop_fusion_trans->Classify_loops(func);
  // Start top-down proactive loop fusion transformations.
  pro_loop_fusion_trans->Top_down_trans(func);
}

void PRO_LOOP_FUSION_TRANS::Top_down_trans(SC_NODE * sc) {
  if (sc is a LCP) {  // Process LCPs.
    while (1) {
      // Find proactive loop fusion transformation candidates.
      Find_cand(sc, &cand1, &cand2);
      // Invoke proactive loop fusion transformations.
      if (cand1 && cand2)
        Traverse_trans(cand1, cand2);
      else
        break;
    }
    if (transformation happens) {
      // Re-identify proactive loop fusion candidates.
      Classify_loops(sc);
    }
  }
  // Recursively visit child nodes.
  SC_LIST_ITER sc_list_iter;
  SC_NODE * kid;
  FOR_ALL_ELEM(kid, sc_list_iter, Init(sc->Kids()))
    Top_down_trans(kid);
}
Complexity: O(n*m), where n is the number of LCPs and m is the number of intervening nodes among loop fusion candidates.
Proactive Loop Interchange
An automation that applies loop unswitching, reverse loop unswitching, if-condition distribution, if-condition tree height reduction and other control flow graph transformations to eliminate intervening statements between the outer loop and the inner loop in a loop-nest for the purpose of enabling loop interchange.
Before if-condition tree height reduction:
  for (i=0; i<n; i++) {
    if (a & (1<<i)) {
      if (b)
        bar();
      else if (c) {
        for (j=0; j<m; j++)
          a[j][i] = 0;
      }
    }
  }
After if-condition tree height reduction:
  for (i=0; i<n; i++) {
    if (a & (1<<i)) {
      if (!b && c) {
        for (j=0; j<m; j++)
          a[j][i] = 0;
      }
      else if (b)
        bar();
    }
  }
[Figure: tree before and after if-condition tree height reduction. Before: the outer Loop contains if(a&(1<<i)), which contains if(b) with Block in its Then branch and if(c) around the inner Loop in its Else branch. After: if(a&(1<<i)) contains if(!b&&c) with the inner Loop in its Then branch and if(b) around Block in its Else branch.]
Before if-condition distribution:
  for (i=0; i<n; i++) {
    if (a & (1<<i)) {
      if (!b && c) {
        for (j=0; j<m; j++)
          a[j][i] = 0;
      }
      else if (b)
        bar();
    }
  }
After if-condition distribution:
  for (i=0; i<n; i++) {
    if (!b && c) {
      if (a & (1<<i)) {
        for (j=0; j<m; j++)
          a[j][i] = 0;
      }
    }
    else if (b) {
      if (a & (1<<i))
        bar();
    }
  }
[Figure: tree before and after if-condition distribution. Before: the outer Loop contains if(a&(1<<i)), which contains if(!b&&c) with the inner Loop in its Then branch and if(b) around Block in its Else branch. After: the outer Loop contains if(!b&&c); its Then branch holds if(a&(1<<i)) around the inner Loop, and its Else branch holds if(b) around if(a&(1<<i)) around Block.]
Before reversed loop un-switching:
  for (i=0; i<n; i++) {
    if (!b && c) {
      if (a & (1<<i)) {
        for (j=0; j<m; j++)
          a[j][i] = 0;
      }
    }
    else if (b) {
      if (a & (1<<i))
        bar();
    }
  }
After reversed loop un-switching:
  for (i=0; i<n; i++) {
    if (!b && c) {
      for (j=0; j<m; j++) {
        if (a & (1<<i))
          a[j][i] = 0;
      }
    }
    else if (b) {
      if (a & (1<<i))
        bar();
    }
  }
[Figure: tree before and after reversed loop un-switching. Before: the Then branch of if(!b&&c) holds if(a&(1<<i)) around the inner Loop. After: the inner Loop sits directly in the Then branch of if(!b&&c), and if(a&(1<<i)) has moved inside the inner Loop.]
Before loop un-switching:
  for (i=0; i<n; i++) {
    if (!b && c) {
      for (j=0; j<m; j++) {
        if (a & (1<<i))
          a[j][i] = 0;
      }
    }
    else if (b) {
      if (a & (1<<i))
        bar();
    }
  }
After loop un-switching:
  if (!b && c) {
    for (i=0; i<n; i++) {
      for (j=0; j<m; j++) {
        if (a & (1<<i))
          a[j][i] = 0;
      }
    }
  }
  else if (b) {
    for (i=0; i<n; i++) {
      if (a & (1<<i))
        bar();
    }
  }
[Figure: tree before and after loop un-switching. Before: the outer Loop contains if(!b&&c) and, in its Else branch, if(b). After: if(!b&&c) is outermost; its Then branch holds the outer Loop directly enclosing the inner Loop, and its Else branch holds if(b) around a copy of the outer Loop.]
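After loop un-switching, the i and j loops are nested with no intervening statements, so they can be interchanged. The sketch below compares the original nest with an un-switched and interchanged version. It makes some assumptions for the sake of a runnable example: the scalar bitmask is renamed `mask` and the array `grid` (the slides use `a` for both), `bar()` is modeled as a call counter, and the function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

using Grid = std::vector<std::vector<int>>;  // grid[j][i], m rows by n columns

// Original loop nest; returns how many times bar() would be called.
int original_nest(Grid& grid, unsigned mask, bool b, bool c) {
    const int m = static_cast<int>(grid.size());
    const int n = m ? static_cast<int>(grid[0].size()) : 0;
    assert(n < 32);  // keeps 1u << i well defined
    int bar_calls = 0;
    for (int i = 0; i < n; i++) {
        if (mask & (1u << i)) {
            if (b)
                ++bar_calls;                       // stands in for bar()
            else if (c) {
                for (int j = 0; j < m; j++) grid[j][i] = 0;
            }
        }
    }
    return bar_calls;
}

// Un-switched form with the i and j loops interchanged.
int interchanged_nest(Grid& grid, unsigned mask, bool b, bool c) {
    const int m = static_cast<int>(grid.size());
    const int n = m ? static_cast<int>(grid[0].size()) : 0;
    assert(n < 32);
    int bar_calls = 0;
    if (!b && c) {
        for (int j = 0; j < m; j++) {              // j is now the outer loop
            for (int i = 0; i < n; i++) {          // i walks one row contiguously
                if (mask & (1u << i)) grid[j][i] = 0;
            }
        }
    } else if (b) {
        for (int i = 0; i < n; i++) {
            if (mask & (1u << i)) ++bar_calls;     // stands in for bar()
        }
    }
    return bar_calls;
}
```

In the interchanged form the inner loop varies the last index of `grid`, so the stores walk along a row instead of striding down a column.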
Heuristics
Proactive loop fusion
  Maximize loop fusion.
  Loops with large or unknown trip counts.
  Loops on symmetric paths with the same iteration space.
  Pre-check on transformation legality.
Proactive loop interchange
  Fully-permutable loop-nest.
  Memory reference iterates on the inner loop’s dimension.
  Inner loop has a large or unknown trip count.
  Simply-nested if-regions.
  Pre-check on transformation legality.
Peak scores of libquantum
Binary                                                             Istanbul (1c)   Istanbul (12c)
Default peak                                                       52.8            174
Default peak + proactive loop fusion                               81.6 (1.55x)    459 (2.64x)
Default peak + proactive loop fusion + proactive loop interchange  58.8 (-28%)     632 (+38%)
Speedups (x) are relative to default peak; percentages are relative to the loop-fusion row.
AMD Istanbul, 2.4GHz, 2 socket, 6 cores/socket, 64KB L1 instruction cache, 64KB L1 data cache, 512 KB L2 cache, 6MB/socket L3 cache, 32GB DDR2-800 memory, SLES10 SP2
Reference
Kit Barton (www.cs.ualberta.ca/~cbarton)
Gather the intervening code between loops using the dominance relation.
Build a data dependence graph of the intervening code.
Use a schedule queue to identify movable nodes.
Barton’s Non-Adjacent loops example
Original code:
  while (i < N) {
    a += i;
    i++;
  }
  b := a * 2;
  c := b + 6;
  g := 0;
  h := g + 10;
  if (c < 100)
    d := c/2;
  else
    e := c * 2;
  while (j < N) {
    f := g + 6;
    j++;
  }
Intervening code between the loops:
  b := a * 2;
  c := b + 6;
  g := 0;
  if (c < 100)
    d := c/2;
  else
    e := c * 2;
  h := g + 10;
Barton’s Non-Adjacent loops example
Original code:
  while (i < N) {
    a += i;
    i++;
  }
  b := a * 2;
  c := b + 6;
  g := 0;
  h := g + 10;
  if (c < 100)
    d := c/2;
  else
    e := c * 2;
  while (j < N) {
    f := g + 6;
    j++;
  }
After code motion (loops now adjacent):
  g := 0;
  h := g + 10;
  while (i < N) {
    a += i;
    i++;
  }
  while (j < N) {
    f := g + 6;
    j++;
  }
  b := a * 2;
  c := b + 6;
  if (c < 100)
    d := c/2;
  else
    e := c * 2;
Barton’s Pros & Cons
Pros
  Powerful, full-fledged code motion.
Cons
  Loops must be control-flow equivalent.
  No finer granularity in if-regions.