March 14, 2002
CMPUT680 - Winter 2006
Topic C: Loop Fusion
Kit Barton
www.cs.ualberta.ca/~cbarton
Outline
• Definition of loop fusion
• Basic concepts
• Prerequisites of loop fusion
• A loop fusion algorithm
• Example
Loop Fusion
• Combine 2 or more loops into a single loop
• This cannot violate any dependencies between the loop bodies
• Several conditions must be met for fusion to occur
• Often these conditions are not initially satisfied
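As a small illustration of the definition above, here is a Python sketch of two loops with identical bounds and their fused form. The arrays `y` and `z` and the loop bodies are made up for this example and do not come from the slides.

```python
def separate_loops(x):
    """Two loops with the same bounds, run one after the other."""
    n = len(x)
    y = [0] * n
    z = [0] * n
    for i in range(n):          # loop 1
        y[i] = x[i] * 2
    for i in range(n):          # loop 2
        z[i] = y[i] + 1
    return y, z


def fused_loop(x):
    """The same computation after fusion: one loop, both bodies in order."""
    n = len(x)
    y = [0] * n
    z = [0] * n
    for i in range(n):
        y[i] = x[i] * 2
        z[i] = y[i] + 1         # reads the y[i] written earlier in this iteration
    return y, z


data = [3, 1, 4, 1, 5]
print(separate_loops(data) == fused_loop(data))  # prints True
```

Fusion is legal here because iteration i of the second body only reads a value produced by iteration i of the first body.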
Advantages of Loop Fusion
• Saves increment and branch instructions
• Creates opportunities for data reuse
• Provides more instructions to the instruction scheduler, helping it balance the use of functional units
Disadvantages of Loop Fusion
• Increases code size, affecting instruction cache performance
• Increases register pressure within a loop
• Can create loops with more complex control flow
Background
• There has been extensive work done on loop fusion
• Most has focused on weighted loop fusion (Gao et al., Kennedy and McKinley, Megiddo and Sarkar)
• Extensive work has also been done on performing loop fusion to increase parallelism
Weighted Loop Fusion
• Associates a non-negative weight with each pair of loop nests
• Each weight measures the expected gain if the two loops are fused
• Gains include potential for array contraction, data reuse and improved local register allocation
Optimal Loop Fusion
• Fuse loops to optimize data reuse, taking into consideration resource constraints and register usage
• This problem is NP-Hard
Maximal Loop Fusion
• Our approach is to perform maximal loop fusion
• Fuse as many loops as possible, without considering resource constraints
• Fuse loops as soon as possible, not considering the consequences
Dominators and Post Dominators
• A node x in a directed graph G with a single entry node dominates node y in G if every path from the entry node of G to y must pass through x
• A node x in a directed graph G with a single exit node post-dominates node y in G if every path from y to the exit node of G must pass through x
(Allen & Kennedy, p. 150, 353)
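The two definitions can be turned into a short Python sketch. It uses the classic iterative set-intersection method rather than anything specific to the slides, and it assumes the CFG is a dictionary of successor lists in which every node appears as a key and is reachable from the entry.

```python
def dominators(succ, entry):
    """Return {node: set of nodes that dominate it} for a CFG of successor lists."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)
    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def post_dominators(succ, exit_node):
    """Post-dominators are the dominators of the reversed CFG, rooted at the exit."""
    reverse = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            reverse[t].append(n)
    return dominators(reverse, exit_node)


# Hypothetical diamond-shaped CFG: A branches to B or C, both rejoin at D.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(sorted(dominators(cfg, "A")["D"]))       # prints ['A', 'D']
print(sorted(post_dominators(cfg, "D")["A"]))  # prints ['A', 'D']
```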
Requirements for Loop Fusion
i. Loops must have identical iteration counts (be conforming)
ii. Loops must be control-flow equivalent
iii. Loops must be adjacent
iv. There cannot be any negative distance dependencies between the loops
Non-conforming Loops
• If iteration counts are different, one loop must be manipulated to make the iteration counts the same
1. Loop peeling
2. Introduce a guard into one of the loops
Loop Peeling
• Find the difference between the iteration counts of the two loops (n)
• Duplicate the body of the loop with the higher iteration count n times
• Update the iteration count of the peeled loop
Loop Peeling Example
Before:
while (i < 10)
{
  a[i] = a[i - 1] * 2;
  i++;
}
while (j < 12)
{
  b[j] = b[j - 1] - 2;
  j++;
}
After peeling:
while (i < 10)
{
  a[i] = a[i - 1] * 2;
  i++;
}
while (j < 10)
{
  b[j] = b[j - 1] - 2;
  j++;
}
b[j] = b[j - 1] - 2;
j++;
b[j] = b[j - 1] - 2;
j++;
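The peeled form can be checked against the original with a short Python sketch. The starting index of 1 and the initial array contents are assumptions, since the slide does not show them, and the peeled version is only equivalent when the loop runs at least `peel` iterations.

```python
def run_original(b, start=1, bound=12):
    """The second loop from the example, unchanged."""
    b = list(b)
    j = start
    while j < bound:
        b[j] = b[j - 1] - 2
        j += 1
    return b


def run_peeled(b, start=1, bound=12, peel=2):
    """The same loop with its last `peel` iterations copied out after the loop."""
    b = list(b)
    j = start
    while j < bound - peel:      # now conforms with a loop that stops at 10
        b[j] = b[j - 1] - 2
        j += 1
    for _ in range(peel):        # the peeled copies of the loop body
        b[j] = b[j - 1] - 2
        j += 1
    return b


b0 = [100] + [0] * 11
print(run_original(b0) == run_peeled(b0))  # prints True
```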
Guarding Iterations
• Increase the iteration count of the loop with fewer iterations
• Insert a guard branch around statements that would not normally be executed
Guarding Iterations Example
Before:
while (i < 10)
{
  a[i] = a[i - 1] * 2;
  i++;
}
while (j < 12)
{
  b[j] = b[j - 1] - 2;
  j++;
}
After guarding:
while (i < 12)
{
  if (i < 10)
  {
    a[i] = a[i - 1] * 2;
  }
  i++;
}
while (j < 12)
{
  b[j] = b[j - 1] - 2;
  j++;
}
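Once the first loop is guarded, both loops have the same iteration count and can share one induction variable. The Python sketch below compares the two original loops with a guarded and fused version. It assumes both indices start at 1 and keeps the increment outside the guard so the loop always makes progress.

```python
def two_loops(a, b):
    """The original pair of loops with different iteration counts."""
    a, b = list(a), list(b)
    i = j = 1
    while i < 10:
        a[i] = a[i - 1] * 2
        i += 1
    while j < 12:
        b[j] = b[j - 1] - 2
        j += 1
    return a, b


def guarded_and_fused(a, b):
    """One loop over the longer range; the shorter body sits behind a guard."""
    a, b = list(a), list(b)
    i = 1
    while i < 12:
        if i < 10:               # guard: iterations 10 and 11 skip this body
            a[i] = a[i - 1] * 2
        b[i] = b[i - 1] - 2
        i += 1                   # incremented on every iteration, guarded or not
    return a, b


a0 = [1] + [0] * 11
b0 = [100] + [0] * 11
print(two_loops(a0, b0) == guarded_and_fused(a0, b0))  # prints True
```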
Loop Peeling
• Advantage:
  • Does not generate control flow within a loop body
• Disadvantage:
  • Generates additional code outside of loops, which could possibly become intervening code between other loops
Guarding Iterations
• Advantages:
  • Does not introduce intervening code
  • Can be “undone” later
• Disadvantage:
  • Generates control flow within a loop
Control Flow Equivalence
• Two loops are control-flow equivalent if, whenever one executes, the other is guaranteed to execute as well
[Figure: two CFG fragments. The first contains Loop 1, BB, and Loop 2; the second contains Loop 1, Loop 3, BB, and Loop 2.]
Determining Control Flow Equivalence
• Use the concepts of dominators and post dominators
• Two loops L1 and L2 are control-flow equivalent if the following two conditions are true:
  • L1 dominates L2; and
  • L2 post dominates L1
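The two conditions can be tested directly from the path-based definitions: x dominates y if y becomes unreachable from the entry once x is removed, and symmetrically for post-dominance. The Python sketch below takes that approach. The two example CFGs and their node names are hypothetical, and each loop is treated as a single CFG node.

```python
def reachable(succ, start, removed=None):
    """Nodes reachable from start without passing through `removed`."""
    seen = set()
    stack = [start]
    while stack:
        n = stack.pop()
        if n == removed or n in seen:
            continue
        seen.add(n)
        stack.extend(succ.get(n, []))
    return seen


def dominates(succ, entry, x, y):
    """x dominates y if every path from entry to y passes through x."""
    return x == y or y not in reachable(succ, entry, removed=x)


def post_dominates(succ, exit_node, x, y):
    """x post-dominates y if every path from y to the exit passes through x."""
    return x == y or exit_node not in reachable(succ, y, removed=x)


def control_flow_equivalent(succ, entry, exit_node, l1, l2):
    return (dominates(succ, entry, l1, l2)
            and post_dominates(succ, exit_node, l2, l1))


# Hypothetical CFGs: a straight line, and one where L3 sits on a branch.
straight = {"entry": ["L1"], "L1": ["BB"], "BB": ["L2"], "L2": ["exit"], "exit": []}
branchy = {"entry": ["L1"], "L1": ["L3", "BB"], "L3": ["L2"], "BB": ["L2"],
           "L2": ["exit"], "exit": []}
print(control_flow_equivalent(straight, "entry", "exit", "L1", "L2"))  # prints True
print(control_flow_equivalent(branchy, "entry", "exit", "L1", "L3"))   # prints False
```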
Intervening Code
• Two loops are adjacent if there are no statements between the two loops
• Can be determined using the CFG:
  • If the immediate successor of the first loop is the second loop, the two loops are adjacent
• If two loops are not adjacent, there is intervening code between them.
Dealing with Non-Adjacent Loops
• If two loops are not adjacent, we attempt to make them adjacent by moving the intervening code
• Intervening code can be moved, as long as no data dependencies are violated:
  • Above the first loop
  • Below the second loop
  • Both
Intervening Code Example
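• Assume CFG has 20 nodes
  • 0-5 are above Loop 1
  • 17-19 are below Loop 2
• What algorithm should be used to determine which nodes are between Loop 1 and Loop 2?
[Figure: CFG fragment with Loop 1 at the top, nodes 6 through 16 below it in rows (6), (7), (8, 9), (10, 11, 12), (13, 14), (15), (16), and Loop 2 at the bottom.]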
Gathering Intervening Code
• Given two loops L1 and L2, a basic block B is intervening code between L1 and L2 if and only if:
  o B is strictly dominated by L1
  o B is not dominated by L2
• Once the dominance relations are known, the set subtraction can be efficiently computed using bit vectors
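The set subtraction can be sketched in Python, using arbitrary-precision integers as bit vectors with bit i standing for CFG node i. The node numbers in the example are invented for illustration and are not the ones from the slides.

```python
def bitset(nodes):
    """Pack an iterable of node numbers into an integer bit vector."""
    vector = 0
    for n in nodes:
        vector |= 1 << n
    return vector


def members(vector):
    """Unpack a bit vector into a sorted list of node numbers."""
    return [i for i in range(vector.bit_length()) if (vector >> i) & 1]


def intervening(strictly_dominated_by_l1, dominated_by_l2):
    """Blocks strictly dominated by L1 but not dominated by L2: one AND-NOT."""
    return strictly_dominated_by_l1 & ~dominated_by_l2


# Hypothetical dominance sets: L1 strictly dominates nodes 3-9, L2 dominates 7-9.
below_l1 = bitset(range(3, 10))
below_l2 = bitset(range(7, 10))
print(members(intervening(below_l1, below_l2)))  # prints [3, 4, 5, 6]
```

The whole subtraction is a single bitwise operation per machine word, which is why bit vectors suit this step.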
Intervening Code Example
[Figure: the same CFG fragment, with Loop 1 at the top, nodes 6 through 16 between the loops, and Loop 2 at the bottom.]
Loop 1
0000 0011 1111 1111 1111 1
Loop 2
0000 0000 0000 0000 1111 1
Difference
0000 0011 1111 1111 0000 0
Analyze Intervening Code
• Build a DDG of the intervening code
• Put all nodes with no predecessors into a queue
• For each node in the queue:
  • If there are no dependencies between the node and the loop:
    • Mark the node as moveable
    • Add all of the node's immediate successors to the queue
• All marked nodes can be moved around the loop
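A minimal Python sketch of the marking steps above, run on the statements of the non-adjacent loops example. The read and write sets, the `conflicts` test, and the rule that a successor is queued only once all of its DDG predecessors are marked are assumptions made for illustration, not details taken from the slides.

```python
from collections import deque


def conflicts(stmt, loop):
    """True if a flow, anti, or output dependence links the statement and the loop."""
    return bool(stmt["writes"] & (loop["reads"] | loop["writes"])
                or stmt["reads"] & loop["writes"])


def moveable_nodes(ddg, stmts, loop):
    """Walk the DDG from its roots and collect statements that can move past `loop`."""
    preds = {n: set() for n in ddg}
    for n, succs in ddg.items():
        for s in succs:
            preds[s].add(n)
    queue = deque(n for n in ddg if not preds[n])
    marked = set()
    order = []
    while queue:
        n = queue.popleft()
        if conflicts(stmts[n], loop):
            continue                      # node stays put, and so do its successors
        marked.add(n)
        order.append(n)
        for s in ddg[n]:
            if s not in marked and s not in queue and preds[s] <= marked:
                queue.append(s)
    return order


# Intervening statements from the example, with hypothetical read/write sets.
stmts = {
    "b":  {"reads": {"a"}, "writes": {"b"}},       # b := a * 2
    "c":  {"reads": {"b"}, "writes": {"c"}},       # c := b + 6
    "g":  {"reads": set(), "writes": {"g"}},       # g := 0
    "h":  {"reads": {"g"}, "writes": {"h"}},       # h := g + 10
    "if": {"reads": {"c"}, "writes": {"d", "e"}},  # if (c < 100) ...
}
ddg = {"b": ["c"], "c": ["if"], "g": ["h"], "h": [], "if": []}
loop1 = {"reads": {"a", "i", "N"}, "writes": {"a", "i"}}
loop2 = {"reads": {"g", "j", "N"}, "writes": {"f", "j"}}
print(moveable_nodes(ddg, stmts, loop1))  # prints ['g', 'h']
print(moveable_nodes(ddg, stmts, loop2))  # prints ['b', 'c', 'if']
```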
Non-Adjacent loops example
while (i < N) {
  a += i;
  i++;
}
b := a * 2;
c := b + 6;
g := 0;
h := g + 10;
if (c < 100)
  d := c/2;
else
  e := c * 2;
while (j < N) {
  f := g + 6;
  j++;
}
[Figure: DDG of the intervening code, with nodes b := a * 2; c := b + 6; g := 0; if (c < 100) d := c/2; else e := c * 2; h := g + 10;]
Non-Adjacent loops example
Before:
while (i < N) {
  a += i;
  i++;
}
b := a * 2;
c := b + 6;
g := 0;
h := g + 10;
if (c < 100)
  d := c/2;
else
  e := c * 2;
while (j < N) {
  f := g + 6;
  j++;
}
After moving the intervening code:
g := 0;
h := g + 10;
while (i < N) {
  a += i;
  i++;
}
while (j < N) {
  f := g + 6;
  j++;
}
b := a * 2;
c := b + 6;
if (c < 100)
  d := c/2;
else
  e := c * 2;
Non-Adjacent loops example
DDG:
  b := a * 2;
  c := b + 6;
  g := 0;
  if (c < 100)
    d := c/2;
  else
    e := c * 2;
  h := g + 10;
Node Queue:
  b := a * 2;
  g := 0;
  c := b + 6;
  if (c < 100)
    d := c/2;
  else
    e := c * 2;
Moveable Nodes:
  b := a * 2;
  c := b + 6;
  if (c < 100)
    d := c/2;
  else
    e := c * 2;
Loop 2:
  while (j < N) {
    f := g + 6;
    j++;
  }
Non-Adjacent loops example
DDG:
  b := a * 2;
  c := b + 6;
  g := 0;
  if (c < 100)
    d := c/2;
  else
    e := c * 2;
  h := g + 10;
Node Queue:
  b := a * 2;
  g := 0;
  h := g + 10;
Moveable Nodes:
  g := 0;
  h := g + 10;
Loop 1:
  while (i < N) {
    a += i;
    i++;
  }
Dependencies Preventing Fusion
Can the following loops be fused?
i = j = 1;
while (i < 10)
{
  a[i] = c[i] + 10;
  i++;
}
while (j < 10)
{
  b[j] = a[j+1] * 2;
  j++;
}
Dependencies Preventing Fusion
• If we look at the array access patterns of a[], we see the following
a[i] = c[i] + 10;
b[j] = a[j+1] * 2;
Dependencies Preventing Fusion
• By aligning the array access patterns, we get the following:
a[i] = c[i] + 10;
b[j] = a[j+1] * 2;
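The problem can be made concrete with a Python sketch. Element a[k] is written in iteration k of the first loop but read in iteration k-1 of the second, so the dependence distance is -1, and a naive fusion reads each element before it has been written. The zero-initialized arrays and the contents of `c` are assumptions made for illustration.

```python
def dependence_distance(write_offset, read_offset):
    """Distance for a write to a[i + write_offset] and a read of a[j + read_offset]."""
    return write_offset - read_offset


def separate(c):
    """The two loops as written, run one after the other."""
    a = [0] * 11
    b = [0] * 11
    for i in range(1, 10):
        a[i] = c[i] + 10
    for j in range(1, 10):
        b[j] = a[j + 1] * 2
    return b


def naively_fused(c):
    """Both bodies in one loop with no alignment: NOT equivalent."""
    a = [0] * 11
    b = [0] * 11
    for i in range(1, 10):
        a[i] = c[i] + 10
        b[i] = a[i + 1] * 2      # a[i + 1] has not been written yet
    return b


c = list(range(11))
print(dependence_distance(0, 1))         # prints -1
print(separate(c) == naively_fused(c))   # prints False
```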
Loop Alignment
Before:
i = j = 1;
while (i < 10)
{
  a[i] = c[i] + 10;
  i++;
}
while (j < 10)
{
  b[j] = a[j+1] * 2;
  j++;
}
After alignment:
j = 1;
i = 2;
a[1] = c[1] + 10;
while (i < 10)
{
  a[i] = c[i] + 10;
  i++;
}
while (j < 10)
{
  b[j] = a[j+1] * 2;
  j++;
}
Loop Alignment
• Loop alignment can be used to remove dependencies between loop bodies
• Easy to do when all dependencies have the same distance
• Gets tricky when there are multiple dependencies with different distances
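For the single-distance case, a Python sketch of alignment followed by fusion might look like this. The first iteration of the first loop is peeled off, the remaining iterations are shifted by one so that each a[j+1] is written just before it is read, and the last iteration of the second loop is left over after the fused loop. Array sizes and initial values are assumptions made for illustration.

```python
def separate(c):
    """The two original loops, run one after the other."""
    a = [0] * 11
    b = [0] * 11
    for i in range(1, 10):
        a[i] = c[i] + 10
    for j in range(1, 10):
        b[j] = a[j + 1] * 2
    return a, b


def aligned_and_fused(c):
    """Loop 1 shifted by one iteration, then fused with loop 2."""
    a = [0] * 11
    b = [0] * 11
    a[1] = c[1] + 10             # peeled first iteration of loop 1
    for j in range(1, 9):
        a[j + 1] = c[j + 1] + 10 # loop 1 body, running one iteration ahead
        b[j] = a[j + 1] * 2      # loop 2 body now reads a value written above
    b[9] = a[10] * 2             # leftover last iteration of loop 2
    return a, b


c = list(range(11))
print(separate(c) == aligned_and_fused(c))  # prints True
```

After alignment the dependence distance is 0, so the read and the write of each element fall in the same fused iteration.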
Putting it all together
• We’ve seen ways to deal with each of the preconditions of loop fusion
• If the conditions are not met, we apply transformations that try to make the code satisfy them
• If the transformations are successful, loop fusion can occur
• But in what order should these transformations be applied?
Loop Fusion Algorithm
For each Ni from outermost to innermost:
  Gather control equivalent loops in Ni into LoopSets
  For each set Si in LoopSets
    remove non-eligible loops from Si
    FusedLoops = true
    Direction = forward
    while FusedLoops == true
      if |Si| < 2 break
      Compute Dominance Relation
      FusedLoops = LoopFusionPass(Si, Direction)
      Reverse Direction
Loop Fusion Algorithm
LoopFusionPass(S, Direction)
  FusedLoops = false
  For each pair of loops Lj and Lk in S such that Lj dominates Lk in Direction
    if (DependenceDistance(Lj, Lk) < 0) continue
    if (InterveningCode(Lj, Lk) == true and
        IsInterveningCodeMoveable(Lj, Lk) == false) continue
    d = | IterationCount(Lj) - IterationCount(Lk) |
    if (Lj and Lk are non-conforming and
        (d cannot be determined at compile time or d > MAXPEEL)) continue
    if (Lj and Lk are non-conforming) Peel iterations
    MoveInterveningCode(Lj, Lk)
    if InterveningCode(Lj, Lk) == false
      FuseLoops(Lj, Lk)
      FusedLoops = true
  Return FusedLoops
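The control structure of the driver and of LoopFusionPass can be sketched in Python under heavy simplification. Everything here is hypothetical: the `Loop` class, trip counts modelled as a symbol plus a constant offset, a `blocked` set standing in for both the dependence test and the intervening-code test, and the `MAXPEEL` value. Intervening code is not modelled at all.

```python
from dataclasses import dataclass

MAXPEEL = 2  # hypothetical limit on how many iterations may be peeled


@dataclass
class Loop:
    members: tuple   # names of the original loops fused into this one
    symbol: str      # symbolic upper bound, e.g. "n" or "m"
    offset: int      # trip count is symbol + offset


def loop_fusion_pass(loops, blocked):
    """One pass over `loops`, assumed sorted in dominance order; fuses in place."""
    fused_loops = False
    j = 0
    while j < len(loops):
        k = j + 1
        while k < len(loops):
            lj, lk = loops[j], loops[k]
            pair_blocked = any(frozenset((x, y)) in blocked
                               for x in lj.members for y in lk.members)
            unknown = lj.symbol != lk.symbol      # d not known at compile time
            d = abs(lj.offset - lk.offset)
            if pair_blocked or unknown or d > MAXPEEL:
                k += 1
                continue
            # Peel d iterations from the longer loop, then fuse the pair.
            merged = tuple(sorted(lj.members + lk.members))
            loops[j] = Loop(merged, lj.symbol, min(lj.offset, lk.offset))
            del loops[k]
            fused_loops = True
        j += 1
    return fused_loops


def fuse_loop_set(loops, blocked=frozenset()):
    """Alternate forward and reverse passes until a pass fuses nothing."""
    loops = list(loops)
    forward = True
    fused_loops = True
    while fused_loops and len(loops) >= 2:
        if not forward:
            loops.reverse()
        fused_loops = loop_fusion_pass(loops, blocked)
        if not forward:
            loops.reverse()
        forward = not forward
    return loops


loop_set = [Loop(("L1",), "n", 0), Loop(("L2",), "n", -1),
            Loop(("L3",), "m", 0), Loop(("L4",), "n", -2)]
print([l.members for l in fuse_loop_set(loop_set)])
# prints [('L1', 'L2', 'L4'), ('L3',)]
```

Because intervening code is ignored, this sketch fuses all three conforming loops in the first forward pass, whereas the full algorithm may need a reverse pass to get the same grouping.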
Example
L1: do i1 = 1, n
      a(i1) = a(i1) * k1
    end do
L2: do i2 = 1, n-1
      d(i2) = a(i2) - b(i2+1) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
Loop Set: L1, L2, L3, L4
Peeling Loop 1
Before:
L1: do i1 = 1, n
      a(i1) = a(i1) * k1
    end do
L2: do i2 = 1, n-1
      d(i2) = a(i2) - b(i2+1) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
After:
S7: a(1) = a(1) * k1
L1: do i1 = 1, n-1
      a(i1+1) = a(i1+1) * k1
    end do
L2: do i2 = 1, n-1
      d(i2) = a(i2) - b(i2+1) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
Fuse L1 and L2
Before:
S7: a(1) = a(1) * k1
L1: do i1 = 1, n-1
      a(i1+1) = a(i1+1) * k1
    end do
L2: do i2 = 1, n-1
      d(i2) = a(i2) - b(i2+1) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
After:
S7: a(1) = a(1) * k1
L5: do i5 = 1, n-1
      a(i5+1) = a(i5+1) * k1
      d(i5) = a(i5) - b(i5+1) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
Compare L5 and L3
• We now compare loops L5 and L3
• They are not adjacent, but the intervening code can move
• Difference in iteration count is not known, so fusion fails
S7: a(1) = a(1) * k1
L5: do i5 = 1, n-1
      a(i5+1) = a(i5+1) * k1
      d(i5) = a(i5) - b(i5+1) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
Compare L5 and L4
S7: a(1) = a(1) * k1
L5: do i5 = 1, n-1
      a(i5+1) = a(i5+1) * k1
      d(i5) = a(i5) - b(i5+1) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
Intervening Code:
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
Peel L5
Before:
S7: a(1) = a(1) * k1
L5: do i5 = 1, n-1
      a(i5+1) = a(i5+1) * k1
      d(i5) = a(i5) - b(i5+1) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
After:
S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
L5: do i5 = 1, n-2
      a(i5+2) = a(i5+2) * k1
      d(i5+1) = a(i5+1) - b(i5+2) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
Move Intervening Code
Before:
S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
L5: do i5 = 1, n-2
      a(i5+2) = a(i5+2) * k1
      d(i5+1) = a(i5+1) - b(i5+2) * k2
    end do
S1: ds = 0.0
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
After:
S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
S1: ds = 0.0
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L5: do i5 = 1, n-2
      a(i5+2) = a(i5+2) * k1
      d(i5+1) = a(i5+1) - b(i5+2) * k2
    end do
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
Reverse Pass
S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
S1: ds = 0.0
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L5: do i5 = 1, n-2
      a(i5+2) = a(i5+2) * k1
      d(i5+1) = a(i5+1) - b(i5+2) * k2
    end do
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
Loop Set: L5, L3, L4
Sorted in Reverse Dominance Direction: L4, L3, L5
Compare L4 and L3
• Compare L4 and L3
• No dependencies to prevent fusion
• Iteration count cannot be determined at compile time
• Fusion fails
S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
S1: ds = 0.0
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L5: do i5 = 1, n-2
      a(i5+2) = a(i5+2) * k1
      d(i5+1) = a(i5+1) - b(i5+2) * k2
    end do
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
Compare L4 and L5
S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
S1: ds = 0.0
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L5: do i5 = 1, n-2
      a(i5+2) = a(i5+2) * k1
      d(i5+1) = a(i5+1) - b(i5+2) * k2
    end do
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
Intervening Code:
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
Move Intervening Code
Before:
S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
S1: ds = 0.0
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L5: do i5 = 1, n-2
      a(i5+2) = a(i5+2) * k1
      d(i5+1) = a(i5+1) - b(i5+2) * k2
    end do
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
After:
S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
S1: ds = 0.0
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L5: do i5 = 1, n-2
      a(i5+2) = a(i5+2) * k1
      d(i5+1) = a(i5+1) - b(i5+2) * k2
    end do
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
Fuse L4 and L5
Before:
S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
S1: ds = 0.0
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L5: do i5 = 1, n-2
      a(i5+2) = a(i5+2) * k1
      d(i5+1) = a(i5+1) - b(i5+2) * k2
    end do
L4: do i4 = 1, n-2
      b(i4) = a(i4) + b(i4) / c(i4)
    end do
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
After:
S7: a(1) = a(1) * k1
S8: a(2) = a(2) * k1
S9: d(1) = a(1) - b(2) * k2
S1: ds = 0.0
S2: if (n<m)
S3:   c(n-2) = n
S4: else
S5:   c(n-2) = m
L6: do i6 = 1, n-2
      a(i6+2) = a(i6+2) * k1
      d(i6+1) = a(i6+1) - b(i6+2) * k2
      b(i6) = a(i6) + b(i6) / c(i6)
    end do
L3: do i3 = 1, m
      ds = ds + d(i3)
    end do
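As a sanity check on the worked example, the original program and the final fused program can be transliterated into Python and compared. The input values, array sizes, and the `make_inputs` helper are assumptions made for illustration. Arrays are 1-based with index 0 unused, and the comparison assumes n >= 3 so that the two peeled iterations exist.

```python
def original(a, b, c, d, n, m, k1, k2):
    """The program before any transformation (loops L1 to L4)."""
    a, b, c, d = list(a), list(b), list(c), list(d)
    for i in range(1, n + 1):                 # L1
        a[i] = a[i] * k1
    for i in range(1, n):                     # L2
        d[i] = a[i] - b[i + 1] * k2
    ds = 0.0                                  # S1
    for i in range(1, m + 1):                 # L3
        ds = ds + d[i]
    c[n - 2] = float(n) if n < m else float(m)  # S2-S5
    for i in range(1, n - 1):                 # L4
        b[i] = a[i] + b[i] / c[i]
    return a, b, c, d, ds


def fused(a, b, c, d, n, m, k1, k2):
    """The program after peeling, code motion, and fusion (loops L6 and L3)."""
    a, b, c, d = list(a), list(b), list(c), list(d)
    a[1] = a[1] * k1                          # S7
    a[2] = a[2] * k1                          # S8
    d[1] = a[1] - b[2] * k2                   # S9
    ds = 0.0                                  # S1
    c[n - 2] = float(n) if n < m else float(m)  # S2-S5
    for i in range(1, n - 1):                 # L6
        a[i + 2] = a[i + 2] * k1
        d[i + 1] = a[i + 1] - b[i + 2] * k2
        b[i] = a[i] + b[i] / c[i]
    for i in range(1, m + 1):                 # L3
        ds = ds + d[i]
    return a, b, c, d, ds


def make_inputs(n, m):
    """Hypothetical input arrays, large enough for every subscript used."""
    size = max(n, m) + 2
    a = [float(i + 1) for i in range(size)]
    b = [float(2 * i + 1) for i in range(size)]
    c = [float(i + 2) for i in range(size)]
    d = [0.5 * i for i in range(size)]
    return a, b, c, d


n, m = 6, 4
inputs = make_inputs(n, m)
print(original(*inputs, n, m, 3.0, 0.5) == fused(*inputs, n, m, 3.0, 0.5))  # prints True
```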