CMPUT 680 - Compiler Design and Optimization
CMPUT680 - Fall 2003
Topic J: Wavefront Scheduling
José Nelson Amaral
http://www.cs.ualberta.ca/~amaral/courses/680
Reading Material
Bharadwaj, J., Menezes, K., McKinsey, C., “Wavefront Scheduling: Path Based Data Representation and Scheduling of Subgraphs,” Proceedings of 32nd International Symposium on Microarchitecture, Dec. 1996, pp. 100-113.
Bharadwaj, J., “Method and apparatus for instruction scheduling to reduce negative effects of compensation code,” Patent No. 5,894,576, April 3, 1999.
New Concepts
Global Code Scheduler (GCS)
Region Formation
Wavefront Scheduling
Path Vectors
Deferred Compensation
P-ready Code Motion
Scheduling Regions
Similar to Mahlke’s definition, here a region is a subgraph of a control flow graph that has a unique entry node that dominates all the nodes in the region.
There is a further restriction that the regions must be acyclic.
JS Edges
A Join-Split (JS) edge in a CFG goes from a split node to a join node.
A split node in a CFG is a node that has more than one immediate successor.
A join node in a CFG is a node that has more than one immediate predecessor.
[Figure: CFG fragment with nodes B, C, and D illustrating a JS edge]
Removal of JS Edges
The application of the wavefront scheduling technique requires the removal of all JS edges.
A JS edge is removed by adding an empty block (called a JS block) between the split node and the join node.
[Figure: the CFG fragment with nodes B, C, and D, before and after inserting the JS block G]
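The removal step can be sketched in a few lines of Python. This is only an illustration: the edge list (B branches to C and D, and C falls through to D) is an assumed reading of the figure, and the `JS_<src>_<dst>` naming scheme and the function name `split_js_edges` are hypothetical.

```python
from collections import defaultdict

def split_js_edges(edges):
    """Replace every JS edge (split node -> join node) by two edges
    that pass through a fresh, empty JS block."""
    succs, preds = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        succs[src].add(dst)
        preds[dst].add(src)

    new_edges, js_blocks = [], []
    for src, dst in edges:
        # A JS edge leaves a split node and enters a join node.
        is_js_edge = len(succs[src]) > 1 and len(preds[dst]) > 1
        if is_js_edge:
            js = f"JS_{src}_{dst}"  # hypothetical name for the new block
            js_blocks.append(js)
            new_edges += [(src, js), (js, dst)]
        else:
            new_edges.append((src, dst))
    return new_edges, js_blocks

# Assumed example graph: B is a split node, D is a join node.
edges = [("B", "C"), ("B", "D"), ("C", "D")]
new_edges, js_blocks = split_js_edges(edges)
print(js_blocks)   # ['JS_B_D']
print(new_edges)
```

Only the edge B -> D is split here: B -> C does not enter a join node, and C -> D does not leave a split node.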
Interface Blocks
A side entry node is a node in the region that has at least one immediate predecessor in the region, and at least one immediate predecessor outside the region.
[Figure: example region with nodes B, C, D, and E]
Which nodes are side entry nodes in the example?
D
Interface Blocks
A side exit node is a node in the region that has at least one immediate successor in the region, and at least one immediate successor outside the region.
Which nodes are side exit nodes in the example?
C and D
[Figure: example region with nodes B, C, D, and E]
Interface Blocks
When control enters or leaves the region, GCS may require a block to schedule compensation code in. Thus interface blocks are inserted between two nodes x and y iff:
(i) x is outside of the region, y is a side entry node, and there is an edge (x,y), or
(ii) y is outside the region, x is a side exit node, and there is an edge (x,y).
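Rules (i) and (ii) translate directly into set operations on predecessors and successors. The sketch below is not taken from the slides: the edge list, the outside nodes X, Y, and Z, and the function name `interface_edges` are assumptions, chosen so that the result agrees with the answers given for the example (side entry D, side exits C and D).

```python
def interface_edges(edges, region):
    """Return (side entries, side exits, edges that need an interface block)."""
    region = set(region)
    preds, succs = {}, {}
    for x, y in edges:
        succs.setdefault(x, set()).add(y)
        preds.setdefault(y, set()).add(x)

    # Side entry: predecessors both inside and outside the region.
    side_entries = {n for n in region
                    if preds.get(n, set()) & region and preds.get(n, set()) - region}
    # Side exit: successors both inside and outside the region.
    side_exits = {n for n in region
                  if succs.get(n, set()) & region and succs.get(n, set()) - region}

    needed = []
    for x, y in edges:
        if x not in region and y in side_entries:    # rule (i)
            needed.append((x, y))
        elif y not in region and x in side_exits:    # rule (ii)
            needed.append((x, y))
    return side_entries, side_exits, needed

# Hypothetical graph: region {B, C, D, E}; X, Y, Z lie outside the region.
edges = [("B", "C"), ("B", "D"), ("C", "E"), ("D", "E"),
         ("X", "D"), ("C", "Y"), ("D", "Z")]
side_entries, side_exits, needed = interface_edges(edges, "BCDE")
print(sorted(side_entries))  # ['D']
print(sorted(side_exits))    # ['C', 'D']
print(needed)                # [('X', 'D'), ('C', 'Y'), ('D', 'Z')]
```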
Interface Blocks
Where do we need interface blocks in the following example?
[Figure: example region with nodes B, C, D, and E]
Interface Blocks
We need three interface blocks.
[Figure: the example region with nodes B, C, D, and E and the added blocks F, G, and H]
Hierarchical Regions
For the global code scheduler, regions are hierarchical:
(1) First, the code of an innermost loop is selected and scheduled.
(2) Then a summary of the data flow and resource usage of the loop is computed, and the loop is converted into a single node in the graph.
Nested Regions
[Figure: CFG with blocks A, B, C, D, E, F1, F2, and F3, shown before and after adding blocks G, H, I, J, and K]
G, J, and K are JS blocks; H and I are interface blocks.
Path Vectors
There is a finite number of control paths in an acyclic scheduling region.
A path vector is a bit vector in which each bit in the vector represents a unique path in a region.
A subset of paths can be represented by a path vector by writing 1 for the paths in the subset and writing 0 for the paths not in the subset.
Paths in our Example
[Figure: the example region CFG with blocks A through K]
Paths:
P0: ABCDH
P1: ABCDJE
P2: ABGDH
P3: ABGDJE
P4: AFKE
P5: AFI
We can define the subset of all paths that include basic block G as BP(G) = {P2, P3}.
And we can represent this set by the block path vector (bits ordered from P5 on the left to P0 on the right):
BPV(G) = [0 0 1 1 0 0]
Paths in our Example
Block path vectors (bit order, left to right: P5 P4 P3 P2 P1 P0):
BPV(A) = [1 1 1 1 1 1]
BPV(B) = [0 0 1 1 1 1]
BPV(C) = [0 0 0 0 1 1]
BPV(D) = [0 0 1 1 1 1]
BPV(E) = [0 1 1 0 1 0]
BPV(F) = [1 1 0 0 0 0]
BPV(G) = [0 0 1 1 0 0]
BPV(H) = [0 0 0 1 0 1]
BPV(I) = [1 0 0 0 0 0]
BPV(J) = [0 0 1 0 1 0]
BPV(K) = [0 1 0 0 0 0]
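The block path vectors can be derived mechanically from the path list. The following sketch assumes that each vector is stored as a Python integer with bit i set when the block lies on path Pi; the helper names `block_path_vectors` and `show` are made up for this illustration.

```python
PATHS = ["ABCDH", "ABCDJE", "ABGDH", "ABGDJE", "AFKE", "AFI"]  # P0..P5

def block_path_vectors(paths):
    """Map each block to an int whose bit i is set iff the block is on path Pi."""
    bpv = {}
    for i, path in enumerate(paths):
        for block in path:
            bpv[block] = bpv.get(block, 0) | (1 << i)
    return bpv

def show(vector, n_paths):
    """Format a vector with the highest-numbered path on the left."""
    return "[" + " ".join(format(vector, f"0{n_paths}b")) + "]"

BPV = block_path_vectors(PATHS)
for block in sorted(BPV):
    print(f"BPV({block}) = {show(BPV[block], len(PATHS))}")
# The line for G reads: BPV(G) = [0 0 1 1 0 0]
```

Storing each vector in a single machine word is what makes the later set operations cheap: union, intersection, and difference each become one bitwise instruction as long as the number of paths fits in the word.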
Control Flow Relations
We can compute control flow relations such as dominance, post-dominance, control equivalence, disjointness, etc., by performing bitwise operations on these path vectors.
If BPV(x) = BPV(y), then blocks x and y are control flow equivalent.
If BPV(x) is a superset of BPV(y), then block x either dominates or post-dominates block y.
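As a rough illustration, each of these relations reduces to one or two bitwise operations. The sketch recomputes the vectors from the six example paths so that it runs on its own; the function names are hypothetical, and the disjointness test (empty intersection) is the natural reading of "disjointness" rather than a definition stated on the slides.

```python
PATHS = ["ABCDH", "ABCDJE", "ABGDH", "ABGDJE", "AFKE", "AFI"]  # P0..P5

# Bit i of BPV[b] is set iff block b lies on path Pi.
BPV = {}
for i, path in enumerate(PATHS):
    for block in path:
        BPV[block] = BPV.get(block, 0) | (1 << i)

def control_equivalent(x, y):
    """Same set of paths: x executes exactly when y executes."""
    return BPV[x] == BPV[y]

def is_superset(x, y):
    """Every path through y also goes through x."""
    return BPV[x] & BPV[y] == BPV[y]

def disjoint(x, y):
    """No path contains both x and y."""
    return BPV[x] & BPV[y] == 0

print(control_equivalent("B", "D"))  # True
print(is_superset("A", "E"))         # True
print(is_superset("E", "K"))         # True
print(is_superset("E", "D"))         # False
print(disjoint("C", "G"))            # True
```

Note that the superset test alone does not say which of the two relations holds; telling dominance from post-dominance needs the order of the two blocks along a path.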
Paths in our Example
Example 1: What is the relation between blocks B and D?
Blocks B and D are control flow equivalent because BPV(B) = BPV(D) = [0 0 1 1 1 1].
Paths in our Example
Example 2: What is the relation between blocks A and E?
Block A either dominates or post-dominates block E, because BPV(A) = [1 1 1 1 1 1] is a superset of BPV(E) = [0 1 1 0 1 0].
Paths in our Example
Example 3: Likewise, block E either dominates or post-dominates block K, because BPV(E) = [0 1 1 0 1 0] is a superset of BPV(K) = [0 1 0 0 0 0].
Problems with Cross-Block Scheduling
Most cross-block scheduling techniques are not judicious when scheduling compensation code.
Suppose that scheduling an instruction M in block x requires compensation code in block y.
Most schedulers cannot evaluate how desirable it is to place the compensation code in y.
Some schedulers only allow M to be scheduled in x if y has not been scheduled yet.
Compensation code is code that needs to be scheduled somewhere else to compensate for the execution of an instruction M in a block x.
Wavefront
A scheduling region is an acyclic region with JS edges eliminated and interface blocks added.
A wavefront is a strongly independent cut set that partitions a scheduling region into three parts:
nodes above the wavefront
nodes on the wavefront
nodes below the wavefront
The wavefront is strongly independent in the sense that no control flow path flows through more than one node in the wavefront.
Wavefront Dominance Property
The wavefront nodes collectively dominate all the nodes below the wavefront, and collectively post-dominate all the nodes above the wavefront.
Consider two blocks in the region:
Block k is not in the wavefront.
Block w is in the wavefront.
This property guarantees that when an instruction originally in block k is scheduled in block w, compensation code can be inserted entirely into blocks in the wavefront.
JS Edges and Strongly Independent Cuts
[Figure: the example region CFG without JS block G, with blocks A, B, C, D, E, F, H, I, J, and K]
Can you build a wavefront that includes C and satisfies the conditions of dominance, post-dominance, and no control path including more than one node in the wavefront?
First try: {C, F}
This wavefront does not post-dominate A and B, nor does it dominate D, H, J, and E.
JS Edges and Strongly Independent Cuts
Second try: {C, D, F}
The path ABCDH includes two nodes in the wavefront (C and D); therefore the wavefront is not a strongly independent cut set.
JS Edges and Strongly Independent Cuts
[Figure: the example region CFG with JS block G inserted between B and D]
When the proper JS block is inserted, we can easily find a wavefront that:
(1) post-dominates all predecessors,
(2) dominates all successors, and
(3) is a strongly independent cut set (no control path includes more than one node in the wavefront).
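In path vector terms, one way to read these conditions is that every path must cross exactly one wavefront node: the block path vectors of the wavefront nodes are pairwise disjoint and together cover all paths. The sketch below checks the two failed attempts and one candidate that works once G exists. It assumes that the region without the JS block has the same paths with G deleted; the function names and the candidate {C, G, F} are illustrative choices, not taken from the slides.

```python
PATHS_WITH_G = ["ABCDH", "ABCDJE", "ABGDH", "ABGDJE", "AFKE", "AFI"]
# Assumption: without the JS block, the same paths exist minus G.
PATHS_WITHOUT_G = [p.replace("G", "") for p in PATHS_WITH_G]

def path_vectors(paths):
    """Bit i of the result for block b is set iff b lies on path i."""
    bpv = {}
    for i, path in enumerate(paths):
        for block in path:
            bpv[block] = bpv.get(block, 0) | (1 << i)
    return bpv

def is_wavefront(blocks, paths):
    """True iff every path passes through exactly one of the given blocks."""
    bpv = path_vectors(paths)
    all_paths = (1 << len(paths)) - 1
    covered = 0
    for block in blocks:
        if covered & bpv[block]:
            return False            # some path crosses two wavefront nodes
        covered |= bpv[block]
    return covered == all_paths     # otherwise some path avoids the wavefront

print(is_wavefront(["C", "F"], PATHS_WITHOUT_G))       # False: ABDH is not cut
print(is_wavefront(["C", "D", "F"], PATHS_WITHOUT_G))  # False: ABCDH hits C and D
print(is_wavefront(["C", "G", "F"], PATHS_WITH_G))     # True
```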
Wavefront Scheduling
In directional scheduling (either top-down or bottom-up) there is a region of code that is already scheduled, another region that is not yet scheduled, and a boundary.
In wavefront scheduling, the wavefront is this boundary. The wavefront moves up or down according to the direction of scheduling chosen.
Example of Wavefront Scheduling
[Figure: the example region CFG with blocks A through K and the successive wavefronts W0 through W6]
Deferred Compensation
[Figure: CFG with blocks A, B, C, D, E, F, and G]
Suppose that an instruction M is originally in block A. If we want to move M downward, we have to schedule M in all paths that contain a use of the variable defined by M.
For instance, assume that there is a use of the variable defined by M in G.
Deferred Compensation
Path Summary:
P0 = AFG
P1 = ABDEG
P2 = ABCEG
Thus a clone of M must appear in paths P0, P1, and P2.
The compensation path vector of an instruction M is the set of all paths that must contain a clone of M when M is not scheduled in its original basic block.
CPV(M) = [1 1 1] (bit order, left to right: P2 P1 P0)
Deferred Compensation
[Figure: the CFG with wavefront W1 and the clone M’ placed in block F]
CPV(M) = [1 1 1]
Assume that we decide that it is desirable to schedule a clone of M, M’, in block F.
We update CPV(M) to:
CPV(M) = CPV(M) - BPV(F) = [1 1 1] - [0 0 1] = [1 1 0]
Deferred Compensation
[Figure: the CFG with wavefront W2 and the clone M’ in block F]
CPV(M) = [1 1 0]
Assume that at W2 we decide to schedule a clone of M, M’’, in block C.
CPV(M) = CPV(M) - BPV(C) = [1 1 0] - [1 0 0] = [0 1 0]
Deferred Compensation
[Figure: the CFG with the clones M’ in block F and M’’ in block C]
CPV(M) = [0 1 0]
Now we cannot close block D unless we schedule M.
Because BPV(D) is a superset of CPV(M), we know that this is the last compensation copy of M to be scheduled.
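The bookkeeping in this example can be replayed with integers as bit sets, where the set difference CPV(M) - BPV(x) becomes `cpv & ~bpv`. This is a minimal sketch under the assumption that bit i stands for path Pi; `schedule_clone` and `show` are hypothetical helper names, and the final step, placing the last clone in D, is the step the slide implies rather than one it spells out.

```python
PATHS = ["AFG", "ABDEG", "ABCEG"]  # P0, P1, P2

# Bit i of BPV[b] is set iff block b lies on path Pi.
BPV = {}
for i, path in enumerate(PATHS):
    for block in path:
        BPV[block] = BPV.get(block, 0) | (1 << i)

def show(vector):
    """Format as [P2 P1 P0], matching the notation of the example."""
    return "[" + " ".join(format(vector, "03b")) + "]"

def schedule_clone(cpv, block):
    """Paths through `block` no longer need a clone: CPV - BPV(block)."""
    return cpv & ~BPV[block]

cpv = (1 << len(PATHS)) - 1        # M left block A: all paths need a clone
history = [show(cpv)]              # [1 1 1]

cpv = schedule_clone(cpv, "F")     # clone M' in F
history.append(show(cpv))          # [1 1 0]

cpv = schedule_clone(cpv, "C")     # clone M'' in C
history.append(show(cpv))          # [0 1 0]

# A clone in D is the last one needed iff BPV(D) covers what remains.
d_is_last = (BPV["D"] & cpv) == cpv
cpv = schedule_clone(cpv, "D")
history.append(show(cpv))          # [0 0 0]

print(history, d_is_last)
```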
When to Move Code?
Bharadwaj, Menezes and McKinsey define the usefulness of moving code from an origin block O to a target block T in terms of the likelihood that control will flow through T and O given that control reaches T:
\[ \frac{\mathrm{Prob}\big(BPV(T) \cap BPV(O)\big)}{\mathrm{Prob}\big(BPV(T)\big)} \]
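To make the ratio concrete, the sketch below evaluates it on the six-path example region. Two things are assumed here and are not given in the slides: the probability of a path set is taken to be the sum of the probabilities of its paths, and the per-path probabilities in `PATH_PROBS` are invented purely for illustration.

```python
PATHS = ["ABCDH", "ABCDJE", "ABGDH", "ABGDJE", "AFKE", "AFI"]  # P0..P5
# Hypothetical profile data: probability of each path, summing to 1.
PATH_PROBS = [0.30, 0.20, 0.10, 0.10, 0.25, 0.05]

# Bit i of BPV[b] is set iff block b lies on path Pi.
BPV = {}
for i, path in enumerate(PATHS):
    for block in path:
        BPV[block] = BPV.get(block, 0) | (1 << i)

def prob(vector, path_probs):
    """Assumed model: a path set's probability is the sum over its paths."""
    return sum(p for i, p in enumerate(path_probs) if (vector >> i) & 1)

def usefulness(origin, target, bpv, path_probs):
    """Prob(BPV(T) & BPV(O)) / Prob(BPV(T))."""
    both = bpv[target] & bpv[origin]
    return prob(both, path_probs) / prob(bpv[target], path_probs)

# Moving code from D up into B: every path through B also reaches D.
print(usefulness("D", "B", BPV, PATH_PROBS))   # 1.0
# Moving code from E up into D: only P1 and P3 continue from D to E.
print(usefulness("E", "D", BPV, PATH_PROBS))
```

Under these made-up probabilities, hoisting from D into B is never wasted work, while hoisting from E into D pays off on fewer than half of the executions of D.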