CMPUT 680 - Compiler Design and Optimization

1

CMPUT680 - Fall 2003

Topic J: Wavefront Scheduling

José Nelson Amaral

http://www.cs.ualberta.ca/~amaral/courses/680


2

Reading Material

Bharadwaj, J., Menezes, K., McKinsey, C., “Wavefront Scheduling: Path Based Data Representation and Scheduling of Subgraphs,” Proceedings of the 32nd International Symposium on Microarchitecture, Dec. 1996, pp. 100-113.

Bharadwaj, J., “Method and apparatus for instruction scheduling to reduce negative effects of compensation code,” Patent No. 5,894,576, April 3, 1999.


3

New Concepts

Global Code Scheduler (GCS)

Region Formation

Wavefront Scheduling

Path Vectors

Deferred Compensation

P-ready Code Motion


4

Scheduling Regions

Similar to Mahlke’s definition, here a region is a subgraph of a control flow graph that has a unique entry node that dominates all the nodes in the region.

There is a further restriction that the regions must be acyclic.


5

JS-nodes

A Join-Split (JS) edge in a CFG goes from a split node to a join node.

A split node in a CFG is a node that has more than one immediate successor.

A join node in a CFG is a node that has more than one immediate predecessor.

[Figure: CFG fragments over nodes B, C, and D illustrating a JS edge.]


6

Removal of JS-nodes

[Figure: CFG fragment with nodes B, C, and D before the JS block is inserted.]

The application of the wavefront scheduling technique requires the removal of all JS-nodes.

A JS-node is removed by adding an empty block (called a JS block) between the split node and the join node.

[Figure: the same CFG fragment with the JS block G inserted.]
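The removal step can be sketched in a few lines of Python. This is only an illustration: the function name `insert_js_blocks`, the `JS0`, `JS1`, ... naming scheme, and the edge list for the three-node example (B -> C, C -> D, and the JS edge B -> D) are assumptions, since the figure itself is not reproduced here.

```python
from collections import defaultdict

def insert_js_blocks(edges):
    """Route every JS edge (split node -> join node) through a new empty block.

    `edges` is a list of (source, destination) pairs. Returns the rewritten
    edge list and the names of the blocks that were added.
    """
    succs = defaultdict(set)
    preds = defaultdict(set)
    for src, dst in edges:
        succs[src].add(dst)
        preds[dst].add(src)

    new_edges = []
    js_blocks = []
    for src, dst in edges:
        is_split = len(succs[src]) > 1   # more than one immediate successor
        is_join = len(preds[dst]) > 1    # more than one immediate predecessor
        if is_split and is_join:
            block = f"JS{len(js_blocks)}"  # hypothetical naming scheme
            js_blocks.append(block)
            new_edges += [(src, block), (block, dst)]
        else:
            new_edges.append((src, dst))
    return new_edges, js_blocks

# Assumed edges for the small example: B is a split node, D is a join node.
edges = [("B", "C"), ("C", "D"), ("B", "D")]
new_edges, js_blocks = insert_js_blocks(edges)
print(new_edges)  # [('B', 'C'), ('C', 'D'), ('B', 'JS0'), ('JS0', 'D')]
```

Split and join status is computed on the original graph, so inserting one JS block does not change the classification of the remaining edges.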


7

Interface Blocks

A side entry node is a node in the region that has at least one immediate predecessor in the region, and at least one immediate predecessor outside the region.

[Figure: example region with nodes B, C, D, and E.]

Which nodes are side entry nodes in the example?

D


8

Interface Blocks

A side exit node is a node in the region that has at least one immediate successor in the region, and at least one immediate successor outside the region.

Which nodes are side exit nodes in the example?

C and D

[Figure: example region with nodes B, C, D, and E; C and D are highlighted.]


9

Interface Blocks

When control enters or leaves the region, GCS may need a block in which to schedule compensation code. Thus an interface block is inserted between two nodes x and y iff:

(i) x is outside of the region, y is a side entry node, and there is an edge (x,y), or

(ii) y is outside the region, x is a side exit node, and there is an edge (x,y).
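Conditions (i) and (ii) translate directly into a test on each edge. The sketch below is an illustration under assumptions: the function name `interface_edges` is made up, and the edge list is a guess at the example region (B, C, D, E) with hypothetical outside nodes X, Y, and Z, chosen so that D is the only side entry node and C and D are the side exit nodes.

```python
def interface_edges(edges, region):
    """Return the edges (x, y) that need an interface block."""
    region = set(region)
    preds, succs = {}, {}
    for x, y in edges:
        succs.setdefault(x, set()).add(y)
        preds.setdefault(y, set()).add(x)

    def is_side_entry(n):
        ps = preds.get(n, set())
        return (n in region
                and any(p in region for p in ps)
                and any(p not in region for p in ps))

    def is_side_exit(n):
        ss = succs.get(n, set())
        return (n in region
                and any(s in region for s in ss)
                and any(s not in region for s in ss))

    result = []
    for x, y in edges:
        if x not in region and is_side_entry(y):    # condition (i)
            result.append((x, y))
        elif y not in region and is_side_exit(x):   # condition (ii)
            result.append((x, y))
    return result

# Assumed CFG: X, Y, and Z are hypothetical nodes outside the region.
region = {"B", "C", "D", "E"}
edges = [("B", "C"), ("B", "D"), ("C", "E"), ("D", "E"),
         ("X", "D"), ("C", "Y"), ("D", "Z")]
print(interface_edges(edges, region))  # [('X', 'D'), ('C', 'Y'), ('D', 'Z')]
```

With these assumed edges, three edges qualify, one entering at D and one leaving from each of C and D.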


10

Interface Blocks

Where do we need interface blocks in the following example?

[Figure: example region with nodes B, C, D, and E.]


11

Interface Blocks

We need three interface blocks.

[Figure: the example region (B, C, D, E) with three added blocks F, G, and H.]


12

Hierarchical Regions

For the global code scheduler, regions are hierarchical:

(1) First the code of an innermost loop is selected and scheduled.

(2) Then a summary of the data flow and resource usage of the loop is computed, and the loop is converted into a single node in the graph.


13

Nested Regions

[Figure: a region with nodes A, B, C, D, E, F1, F2, and F3, shown before and after blocks G, H, I, J, and K are inserted.]

G, J, and K are JS blocks. H and I are interface blocks.


14

Path Vectors

There is a finite number of control paths in an acyclic scheduling region.

A path vector is a bit vector in which each bit represents a unique path in a region.

A subset of paths can be represented by a path vector by writing 1 for the paths in the subset and writing 0 for the paths not in the subset.


15

Paths in our Example

[Figure: the example region with blocks A through K.]

Paths:
P0: ABCDH
P1: ABCDJE
P2: ABGDH
P3: ABGDJE
P4: AFKE
P5: AFI

We can define the subset of all paths that include basic block G as BP(G) = {P2, P3}.

And we can represent this set by the block path vector: BPV(G) = [ 0 0 1 1 0 0]
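As a rough illustration, block path vectors can be built from the path list by setting bit i of a block's vector whenever the block lies on path Pi. The helper names `block_path_vectors` and `show` are made up for this sketch. Printing the most significant bit first gives the order P5 ... P0.

```python
# Paths P0..P5 of the example region, written as strings of block names.
PATHS = ["ABCDH", "ABCDJE", "ABGDH", "ABGDJE", "AFKE", "AFI"]

def block_path_vectors(paths):
    """Map each block to an integer whose bit i is 1 iff the block is on path Pi."""
    bpv = {}
    for i, path in enumerate(paths):
        for block in path:
            bpv[block] = bpv.get(block, 0) | (1 << i)
    return bpv

def show(mask, n):
    """Render a path vector with the highest-numbered path on the left."""
    return "[" + " ".join(format(mask, f"0{n}b")) + "]"

bpv = block_path_vectors(PATHS)
print(show(bpv["G"], len(PATHS)))  # [0 0 1 1 0 0]
print(show(bpv["E"], len(PATHS)))  # [0 1 1 0 1 0]
```

Storing each vector as a machine integer keeps the later set operations (intersection, difference, subset test) down to single bitwise instructions.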


16


[Figure: the example region with blocks A through K.]


Bit order: P5 P4 P3 P2 P1 P0
BPV(A) = [ 1 1 1 1 1 1]
BPV(B) = [ 0 0 1 1 1 1]
BPV(C) = [ 0 0 0 0 1 1]
BPV(D) = [ 0 0 1 1 1 1]
BPV(E) = [ 0 1 1 0 1 0]
BPV(F) = [ 1 1 0 0 0 0]
BPV(G) = [ 0 0 1 1 0 0]
BPV(H) = [ 0 0 0 1 0 1]
BPV(I) = [ 1 0 0 0 0 0]
BPV(J) = [ 0 0 1 0 1 0]
BPV(K) = [ 0 1 0 0 0 0]


17

Control Flow Relations

We can compute control flow relations such as dominance, post-dominance, control equivalence, disjointness, etc., by performing bitwise operations on these path vectors.

If BPV(x) = BPV(y), then blocks x and y are control flow equivalent.

If BPV(x) is a superset of BPV(y), then block x either dominates or post-dominates block y.
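A minimal sketch of these tests with integer bit masks follows. The vectors are copied from the block path vector table of the example region (leftmost bit is P5), and the function names `equivalent`, `covers`, and `disjoint` are made up for this illustration.

```python
# Block path vectors of the example region; leftmost bit is P5, rightmost is P0.
BPV = {
    "A": 0b111111, "B": 0b001111, "C": 0b000011, "D": 0b001111,
    "E": 0b011010, "F": 0b110000, "G": 0b001100, "H": 0b000101,
    "I": 0b100000, "J": 0b001010, "K": 0b010000,
}

def equivalent(x, y):
    """Control flow equivalent: both blocks lie on exactly the same paths."""
    return BPV[x] == BPV[y]

def covers(x, y):
    """BPV(x) is a superset of BPV(y): x dominates or post-dominates y."""
    return (BPV[x] & BPV[y]) == BPV[y]

def disjoint(x, y):
    """No path passes through both blocks."""
    return (BPV[x] & BPV[y]) == 0

print(equivalent("B", "D"))  # True
print(covers("A", "E"))      # True
print(covers("E", "K"))      # True
print(disjoint("C", "G"))    # True
```

The superset test does not say which of the two relations holds; deciding between dominance and post-dominance needs the relative position of the blocks on the paths.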


18


[Figure: the example region with blocks A through K.]



Example 1: What is the relation between blocks B and D?

Blocks B and D are control flow equivalent because BPV(B) = BPV(D).


19


[Figure: the example region with blocks A through K.]



Example 2: What is the relation between blocks A and E?

Block A either dominates or post-dominates block E because BPV(A) is a superset of BPV(E).


20


[Figure: the example region with blocks A through K.]


Bit order: P5 P4 P3 P2 P1 P0
BPV(A) = [ 1 1 1 1 1 1]
BPV(B) = [ 0 0 1 1 1 1]
BPV(C) = [ 0 0 0 0 1 1]
BPV(D) = [ 0 0 1 1 1 1]
BPV(E) = [ 0 1 1 0 1 0]
BPV(F) = [ 1 1 0 0 0 0]
BPV(G) = [ 0 0 1 1 0 0]
BPV(H) = [ 0 0 0 1 0 1]
BPV(I) = [ 1 0 0 0 0 0]
BPV(J) = [ 0 0 1 0 1 0]
BPV(K) = [ 0 1 0 0 0 0]

Example 3: Likewise, block E either dominates or post-dominates block K because BPV(E) is a superset of BPV(K).


21

Problems with Cross-Block Scheduling

Most cross-block scheduling techniques are not judicious when scheduling compensation code.

Suppose that scheduling an instruction M in block x requires compensation code in block y.

Most schedulers cannot evaluate how desirable it is to place the compensation code in y.

Some schedulers only allow M to be scheduled in x if y has not been scheduled yet.

Compensation code is code that needs to be scheduled somewhere else to compensate for the execution of an instruction M in a block x.


22

Wavefront

A scheduling region is an acyclic region with JS edges eliminated and interface blocks added.

A wavefront is a strongly independent cut set that partitions a scheduling region into three parts:

(1) nodes above the wavefront
(2) nodes on the wavefront
(3) nodes below the wavefront

The wavefront is strongly independent in the sense that no control flow path flows through more than one node in the wavefront.
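One way to read this definition is that every control path must cross the wavefront exactly once: at least once because the wavefront is a cut set, and at most once because it is strongly independent. The sketch below checks that reading against the six paths P0 to P5 of the example region. The function name is made up, and the "exactly one" formulation is an interpretation, not a quotation from the paper.

```python
# Paths P0..P5 of the example region (with the JS block G present).
PATHS = ["ABCDH", "ABCDJE", "ABGDH", "ABGDJE", "AFKE", "AFI"]

def is_strongly_independent_cut(nodes, paths):
    """True when every path passes through exactly one node of `nodes`."""
    return all(sum(block in nodes for block in path) == 1 for path in paths)

print(is_strongly_independent_cut({"C", "G", "F"}, PATHS))  # True
print(is_strongly_independent_cut({"C", "D", "F"}, PATHS))  # False: ABCDH crosses C and D
print(is_strongly_independent_cut({"C", "F"}, PATHS))       # False: ABGDH crosses neither
```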


23

Wavefront Dominance Property

The wavefront nodes collectively dominate all the nodes below the wavefront, and collectively post-dominate all the nodes above the wavefront.

Consider two blocks in the region:
Block k is not in the wavefront.
Block w is in the wavefront.

This property guarantees that when an instruction originally in block k is scheduled in block w, compensation code can be inserted entirely into blocks in the wavefront.


24

JS-nodes and Strongly Independent Cuts

[Figure: the example region without the JS block G.]

Can you build a wavefront that includes C and satisfies the conditions of dominance, post-dominance, and no control path including more than one node in the wavefront?

First try: {C, F}

This wavefront neither post-dominates A and B nor dominates D, H, J, and E.


25


[Figure: the example region without the JS block G.]

Can you build a wavefront that includes C and satisfies the conditions of dominance, post-dominance, and no control path including more than one node in the wavefront?

Second try: {C, D, F}

The path ABCDH includes two nodes in the wavefront (C and D); therefore the wavefront is not a strongly independent cut set.


26


[Figure: the example region with the JS block G inserted.]

When the proper JS-node is inserted, we can easily find a wavefront that:
(1) post-dominates all predecessors,
(2) dominates all successors, and
(3) is a strongly independent cut set (no control path includes more than one node in the wavefront).


27

Wavefront Scheduling

In directional scheduling (either top-down or bottom-up) there is a region of code that is already scheduled, another region that is not yet scheduled, and a boundary.

In wavefront scheduling, the wavefront is this boundary. The wavefront moves up or down according to the direction of scheduling chosen.


28

Example of Wavefront Scheduling

[Figure: the example region with the successive wavefronts W0 through W6 marked.]


29


[Figure: a region with blocks A, B, C, D, E, F, and G.]

Consider an instruction M that is originally in block A. If we want to move M downward, we have to schedule M in all paths that contain a use of the variable defined by M.

For instance, assume that there is a use of the variable defined by M in G.


30


[Figure: a region with blocks A, B, C, D, E, F, and G.]

Path Summary:
P0 = AFG
P1 = ABDEG
P2 = ABCEG

Thus a clone of M must appear in paths P0, P1, and P2.

The compensation path vector of an instruction M is the set of all paths that must contain a clone of M when M is not scheduled in its original basic block.

CPV(M) = [1 1 1]


31


[Figure: the region at wavefront W1, where CPV(M) = [1 1 1].]

Assume that we decide that it is desirable to schedule a clone of M, M’, in block F.

We update CPV(M) to:
CPV(M) = CPV(M) - BPV(F) = [1 1 1] - [0 0 1] = [1 1 0]


32


[Figure: the region at wavefront W2, with M’ in block F; CPV(M) = [1 1 0].]

Assume that at W2 we decide to schedule a clone of M, M’’, in block C.

CPV(M) = CPV(M) - BPV(C) = [1 1 0] - [1 0 0] = [0 1 0]


33


[Figure: the region at wavefront W2, with M’ in block F and M’’ in block C; CPV(M) = [0 1 0].]

Now we cannot close block D unless we schedule M.

Because BPV(D) is a superset of CPV(M), we know that this is the last compensation copy of M to be scheduled.
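The bookkeeping in this example can be replayed with bit masks. The sketch below is an illustration, not the paper's implementation: the helper names `bpv` and `schedule_clone` are made up, bit i stands for path Pi, and the "last copy" test is taken to be that the vector of the block receiving the clone covers everything still set in CPV(M).

```python
PATHS = ["AFG", "ABDEG", "ABCEG"]  # P0, P1, P2 from the path summary

def bpv(block):
    """Block path vector: bit i is set iff `block` lies on path Pi."""
    return sum(1 << i for i, path in enumerate(PATHS) if block in path)

def schedule_clone(cpv, block):
    """Place a clone in `block`; return the updated CPV and a last-copy flag."""
    is_last = (bpv(block) & cpv) == cpv   # block covers all remaining paths
    return cpv & ~bpv(block), is_last     # set difference CPV - BPV(block)

cpv = 0b111                               # M must reach P2, P1, and P0
cpv, last = schedule_clone(cpv, "F")
print(format(cpv, "03b"), last)           # 110 False
cpv, last = schedule_clone(cpv, "C")
print(format(cpv, "03b"), last)           # 010 False
cpv, last = schedule_clone(cpv, "D")
print(format(cpv, "03b"), last)           # 000 True
```

Once the vector reaches zero, every path that needs the value defined by M contains exactly one clone.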


34

When to Move Code?

Bharadwaj, Menezes and McKinsey define the usefulness of moving code from an origin block O to a target block T in terms of the likelihood that control will flow through T and O given that control reaches T.

$$\frac{\mathrm{Prob}\big(BPV(T) \cap BPV(O)\big)}{\mathrm{Prob}\big(BPV(T)\big)}$$
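To make the ratio concrete, the sketch below evaluates it for the earlier six-path example region, taking the probability of a path vector to be the sum of the probabilities of its paths. The per-path probabilities are invented for illustration (the slides give none), and the function names `prob` and `usefulness` are made up.

```python
def prob(mask, path_probs):
    """Probability that control follows one of the paths set in `mask`."""
    return sum(p for i, p in enumerate(path_probs) if (mask >> i) & 1)

def usefulness(bpv_origin, bpv_target, path_probs):
    """Prob(BPV(T) & BPV(O)) / Prob(BPV(T)); 0.0 if T is never reached."""
    reach_target = prob(bpv_target, path_probs)
    if reach_target == 0:
        return 0.0
    return prob(bpv_target & bpv_origin, path_probs) / reach_target

# Hypothetical execution probabilities for paths P0..P5 (they sum to 1).
PATH_PROBS = [0.25, 0.25, 0.125, 0.125, 0.125, 0.125]

BPV_D = 0b001111   # D lies on P0..P3
BPV_E = 0b011010   # E lies on P1, P3, and P4

# Moving an instruction from origin E up to target D:
print(usefulness(BPV_E, BPV_D, PATH_PROBS))  # 0.5
```

With these made-up numbers, half of the executions that reach D go on to E, so a clone placed in D would do useful work half of the time.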

