Instr Sched 3

34
Instruction Scheduling - Part 3 Y.N. Srikant Department of Computer Science Indian Institute of Science Bangalore 560 012 NPTEL Course on Compiler Design Y.N. Srikant Instruction Scheduling

Transcript of Instr Sched 3

Page 1: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 1/34

Page 2: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 2/34

Instruction Scheduling

Reordering of instructions so as to keep the pipelines of

functional units full with no stalls

NP-Complete and needs heuristcs

Applied on basic blocks (local)

Global scheduling requires elongation of basic blocks

(super-blocks)

Y.N. Srikant Instruction Scheduling

Page 3: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 3/34

Optimal Instruction Scheduling using Integer Linear

Programming

This is useful for the evaluation of instruction scheduling

heuristics that do not generate optimal schedules

Careful implementation may enable these methods to be

deployed even in production quality compilersAssume a simple resource model in which all the

functional units are fully pipelined

Assume an architecture with integer ALU, FP add unit, FP

mult/div unit, and load/store unit with possibly differing

execution latencies

Assume that there are R r  instances of the functional unit r 

Y.N. Srikant Instruction Scheduling

Page 4: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 4/34

Optimal Instruction Scheduling using Integer Linear

Programming

Let σi  be the time at which instruction i is scheduledLet d (i , j ) be the weight of the edge (i , j ) of the DAG

To satisfy dependence constraints, for each arc (i , j ) of the

DAG

σ j  ≥ σi  + d (i , j ) (1)

A matrix K n ×T , where n  is the number of instructions in theDAG and T  is an estimate of the worst case execution timeof the schedule, is used

T  can be estimated by summing up the execution times ofall the instructions in the DAG

K [i , t ] is 1, if instruction i  is scheduled at time step t  and 0

otherwise

Y.N. Srikant Instruction Scheduling

Page 5: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 5/34

Optimal Instruction Scheduling using Integer Linear

Programming

The schedule time σi  of instruction i  can be expressed as

σi  = k i ,0 · 0 + k i ,1 · 1 + · · ·+ k i ,T −1 · (T − 1)

where exactly one of the k i , j  is 1

This can be written in matrix form for all σi ’s as:

σ0

σ1...

σn −1

=

k 0,0 k 0,1, · · · k 0,T −1

k 1,0 k 1,1 · · · k 1,T −1...

......

...

k n −1,0 k n −1,1 · · · k n −1,T −1

01...

T − 1

(2)To express that each instruction is scheduled exactly once,

we include the constraintt 

k i ,t  = 1, ∀i  (3)

Y.N. Srikant Instruction Scheduling

Page 6: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 6/34

Optimal Instruction Scheduling using Integer Linear

Programming

The resource constraint that no more than R r  instructionsare scheduled in any time step can be expressed as

i  ∈ F (r )

k i ,t  ≤ R r , ∀ t  and ∀ r  (4)

where F (r ) represents the set of instructions that can be

executed in functional unit type r .

The objective function is to minimize the execution time or

schedule length, subject to the constraints in equations 1-4

above. This can be represented as:

minimize(maxi 

(σi  + d (i , j )))

Y.N. Srikant Instruction Scheduling

S

Page 7: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 7/34

Delayed Load Scheduling Algorithm for Trees

RISC load/store architecture with delayed loadsSingle cycle issue/execution, with only loads pipelined

(load delay = 1 cycle)

Generates optimal code without any interlocks for

expression trees

Three phases

Computation of minReg  as in Sethi-Ullman code generationalgorithmOrdering of loads and operations as in the SU algorithmEmitting code in canonical DLS order

Uses 1 + minReg  number of registers and can handle only

one cycle load delay

Y.N. Srikant Instruction Scheduling

S hi Ull i R C i Al i h

Page 8: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 8/34

Sethi-Ullman minReg Computation Algorithm

if (isLeaf(node)) then {node.minReg = 1}

else

if (node.left.minReg == node.right.minReg) then{node.minReg = node.left.minReg + 1}

else {node.minReg = MAX(node.left.minReg,

node.right.minReg)}

Y.N. Srikant Instruction Scheduling

S thi Ull i R C t ti E l

Page 9: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 9/34

Sethi-Ullman minReg Computation Example

½ Ø ½     Ð Ó  

¾ Ø ¾     Ð Ó  

¿ Ø ¿     Ø ½ · Ø ¾  

Ø     Ð Ó  

Ø     Ð Ó  

Ø     Ð Ó  

Ø     Ø · Ø  

Ø     Ø ¹ Ø  

Ø     Ø ¿ ¶ Ø  

½ ¼     × Ø Ø  

´ µ ¿ ¹ Ö × × Ó  

3

3

2

1 1 111

2 2

i10

i9

i3

i1 i2 i5

i7

i8

i4i6

st

mult

addadd

sub

loadloadloadloadload

´ µ Ü Ô Ö × × Ó Ò Ì Ö  

Y.N. Srikant Instruction Scheduling

S thi Ull Al ith C d G E l

Page 10: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 10/34

Sethi-Ullman Algorithm Code Gen Example

½ Ö ½     Ð Ó  

¾ Ö ¾     Ð Ó  

¿ Ö ½     Ö ½ · Ö ¾  

Ö ¾     Ð Ó  

Ö ¿     Ð Ó  

Ö     Ð Ó  

Ö ¿     Ö ¿ · Ö  

Ö ¾     Ö ¿ ¹ Ö ¾  

Ö ½     Ö ½ ¶ Ö ¾  

½ ¼     × Ø Ö ½  

´ µ Ó Ë Õ Ù Ò Ù × Ò Ê × Ø Ö ×  

Ö ½     Ð Ó  

Ö ¾     Ð Ó  

Ö ½     Ö ½ · Ö ¾  

Ö ¾     Ð Ó  

Ö ½     Ö ½ ¹ Ö ¾  

½ Ö ¾     Ð Ó  

¾ Ö ¿     Ð Ó  

¿ Ö ¾     Ö ¾ · Ö ¿  

Ö ½     Ö ½ ¶ Ö ¾  

½ ¼     × Ø Ö ½  

´ µ Ç Ô Ø Ñ Ð Ó Ë Õ Ù Ò Û Ø ¿ Ê × Ø Ö ×  

Y.N. Srikant Instruction Scheduling

DLS C t ti E l

Page 11: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 11/34

DLS Computation Example

Ö ½     Ð Ó  

Ö ¾     Ð Ó  

Ö ½     Ö ½ · Ö ¾   ± ½ × Ø Ð Ð  

Ö ¾     Ð Ó  

Ö ½     Ö ½ ¹ Ö ¾   ± ½ × Ø Ð Ð  

½ Ö ¾     Ð Ó  

¾ Ö ¿     Ð Ó  

¿ Ö ¾     Ö ¾ · Ö ¿   ± ½ × Ø Ð Ð  

Ö ½     Ö ½ ¶ Ö ¾  

½ ¼     × Ø Ö ½  

´ µ Ë Ø Ð Ð × Ò Ë Ø ¹ Í Ð Ð Ñ Ò Ë Õ Ù Ò  

Ö ½     Ð Ó  

Ö ¾     Ð Ó  

Ö ¿     Ð Ó  

½ Ö     Ð Ó  

Ö ½     Ö ½ · Ö ¾  

¾ Ö ¾     Ð Ó  

Ö ½     Ö ½ ¹ Ö ¿  

¿ Ö     Ö · Ö ¾  

Ö ½     Ö ½ ¶ Ö  

½ ¼     × Ø Ö ½  

´ µ Ä Ë Ë Õ Ù Ò Û Ø Æ Ó Ë Ø Ð Ð ×  

Y.N. Srikant Instruction Scheduling

DLS Algorithm Main Program

Page 12: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 12/34

DLS Algorithm - Main Program

Procedure Generate(root: ExprNode)

{ label(root); //Calculate minReg values

opSched = loadSched = emptyList(); //Initialize

order(root, opSched, loadSched);//Find load and operation order

schedule(opSched, loadSched, root.minReg+1);

//Emit canonical order

}

Y.N. Srikant Instruction Scheduling

DLS Algorithm Finding SU Order

Page 13: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 13/34

DLS Algorithm - Finding SU Order

Procedure Order(root: ExprNode;

var opSched, loadSched: NodeList){ if (not(isLeaf(root))

{ if (root.left.minReg < root.right.minReg)

{ order(root.right, opSched, loadSched);

order(root.left, opSched, loadSched);

} else

{order(root.left, opSched, loadSched);

order(root.right, opSched, loadSched);

}

append(root, opSched);}

else { append(root, loadSched);

}

Y.N. Srikant Instruction Scheduling

DLS Algorithm Scheduling

Page 14: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 14/34

DLS Algorithm - Scheduling

Procedure schedule(opSched, loadSched: NodeList;

Regs: integer)

{ for i = 1 to MIN(Regs, length(loadSched)) do// loads first

{ ld = popHead(loadSched);

ld.reg = getReg(); gen(Load, ld.name, ld.Reg)}

while (not Empty(loadSched))

// (Operation,Load) pairs next{ op = popHead(opSched); op.reg = op.left.reg;

gen(op.op, op.left.reg, op.right.reg, op.reg);

ld = popHead(loadSched); ld.reg = op.right.reg;

gen(Load, ld.name, ld.reg) }

while (not Empty(opSched)) //Remaining Operations

{ op = popHead(opSched); op.reg = op.left.reg;

gen(op.op, op.left.reg, op.right.reg, op.reg);

freeReg(op.right.reg) }

}

Y.N. Srikant Instruction Scheduling

Global Acyclic Scheduling

Page 15: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 15/34

Global Acyclic Scheduling

Average size of a basic block is quite small (5 to 20instructions)

Effectiveness of instruction scheduling is limitedThis is a serious concern in architectures supportinggreater ILP

VLIW architectures with several function unitssuperscalar architectures (multiple instruction issue)

Global scheduling is for a set of basic blocks

Overlaps execution of successive basic blocksTrace scheduling, Superblock scheduling, Hyperblock

scheduling, Software pipelining, etc.

Y.N. Srikant Instruction Scheduling

Trace Scheduling

Page 16: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 16/34

Trace Scheduling

A Trace is a frequently executed acyclic sequence of basicblocks in a CFG (part of a path)

Identifying a trace

Identify the most frequently executed basic blockExtend the trace starting from this block, forward and

backward, along most frequently executed edges

Apply list scheduling on the trace (including the branch

instructions)

Execution time for the trace may reduce, but execution time

for the other paths may increaseHowever, overall performance will improve

Y.N. Srikant Instruction Scheduling

Trace Example

Page 17: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 17/34

Trace Example

Ó Ö ´ ¼ ½ ¼ ¼ · · µ  

ß 

´ ¼ µ  

· ×  

Ð ×  

 

× Ù Ñ × Ù Ñ ·  

 

´ µ À ¹ Ä Ú Ð Ó  

± ±  Ö ½    ¼ 

± ±  Ö     ¼ 

± ±  Ö     ¼ ¼ 

± ±  Ö     × 

½ ½ Ö ¾     Ð Ó ´ Ö ½ µ  

¾ ´ Ö ¾ ¼ µ Ó Ø Ó  

¾ ¿ Ö ¿     Ð Ó ´ Ö ½ µ  

Ö     Ö ¿ · Ö  

´ Ö ½ µ     Ö  

Ó Ø Ó  

¿ Ö     Ö ¾ 

´ Ö ½ µ     Ö ¾ 

Ö     Ö · Ö  

½ ¼ Ö ½     Ö ½ ·  

½ ½ ´ Ö ½     Ö µ Ó Ø Ó ½  

´ µ × × Ñ Ð Ý Ó  

B2

B1

B3

B4

main trace

´ µ Ó Ò Ø Ö Ó Ð Ð Ó Û Ö Ô  

Y.N. Srikant Instruction Scheduling

Trace - Basic Block Schedule

Page 18: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 18/34

Trace Basic Block Schedule

2-way issue architecture with 2 integer units

add, sub, store : 1 cycle, load : 2 cycles, goto : no stall

9 cycles for the main trace and 6 cycles for the off-trace

Ì Ñ Á Ò Ø º Í Ò Ø ½ Á Ò Ø º Í Ò Ø ¾  

¼  ½ Ö ¾     Ð Ó ´ Ö ½ µ  

½ 

¾  ¾ ´ Ö ¾ ¼ µ Ó Ø Ó  

¿  ¿ Ö ¿     Ð Ó ´ Ö ½ µ  

 

  Ö     Ö ¿ · Ö  

  ´ Ö ½ µ     Ö Ó Ø Ó  

¿  Ö     Ö ¾ ´ Ö ½ µ     Ö ¾ 

´ µ   Ö     Ö · Ö ½ ¼ Ö ½     Ö ½ ·  

´ µ   ½ ½ ´ Ö ½     Ö µ Ó Ø Ó ½  

Y.N. Srikant Instruction Scheduling

Trace Schedule

Page 19: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 19/34

Trace Schedule

Y.N. Srikant Instruction Scheduling

Trace Schedule

Page 20: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 20/34

Trace Schedule

6 cycles for the main trace and 7 cycles for the off-trace

Ì Ñ Á Ò Ø º Í Ò Ø ½ Á Ò Ø º Í Ò Ø ¾  

¼  ½ Ö ¾     Ð Ó ´ Ö ½ µ ¿ Ö ¿     Ð Ó ´ Ö ½ µ  

½ 

¾  ¾ ´ Ö ¾ ¼ µ Ó Ø Ó Ö     Ö ¿ · Ö  

¿  ´ Ö ½ µ     Ö  

´ µ   Ö     Ö · Ö ½ ¼ Ö ½     Ö ½ ·  

´ µ   ½ ½ ´ Ö ½     Ö µ Ó Ø Ó ½  

¿  Ö     Ö ¾ ´ Ö ½ µ     Ö ¾ 

  ½ ¾ Ó Ø Ó  

Y.N. Srikant Instruction Scheduling

Trace Scheduling - Issues

Page 21: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 21/34

Trace Scheduling Issues

Side exits and side entrances are ignored during

scheduling of a trace

Required compensation code is inserted during

book-keeping (after scheduling the trace)

Speculative code motion - load instruction moved ahead ofconditional branch

Example: Register r3 should not be live in block B3(off-trace path)May cause unwanted exceptions

Requires additional hardware support!

Y.N. Srikant Instruction Scheduling

Compensation Code - Side Exit

Page 22: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 22/34

Compensation Code Side Exit

Y.N. Srikant Instruction Scheduling

Compensation Code - Side Exit

Page 23: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 23/34

p

Y.N. Srikant Instruction Scheduling

Compensation Code - Side Entry

Page 24: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 24/34

p y

Y.N. Srikant Instruction Scheduling

Compensation Code - Side Entry

Page 25: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 25/34

p y

Y.N. Srikant Instruction Scheduling

Superblock Scheduling

Page 26: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 26/34

p g

A Superblock is a trace without side entrances

Control can enter only from the topMany exits are possibleEliminates several book-keeping overheads

Superblock formation

Trace formation as before

Tail duplication to avoid side entrances into a superblockCode size increases

Y.N. Srikant Instruction Scheduling

Superblock Example

Page 27: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 27/34

p p

5 cycles for the main trace and 6 cycles for the off-trace

B1

B2

B4

B3

B4’

SuperBlock 1

SuperBlock 2

´ µ Ó Ò Ø Ö Ó Ð Ð Ó Û Ö Ô  

Ì Ñ Á Ò Ø º Í Ò Ø ½ Á Ò Ø º Í Ò Ø ¾  

¼  ½ Ö ¾     Ð Ó ´ Ö ½ µ ¿ Ö ¿     Ð Ó ´ Ö ½ µ  

½ 

¾  ¾ ´ Ö ¾ ¼ µ Ó Ø Ó Ö     Ö ¿ · Ö  

¿  ´ Ö ½ µ     Ö ½ ¼ Ö ½     Ö ½ ·  

  Ö     Ö · Ö ½ ½ ´ Ö ½    Ö µ Ó Ø Ó ½  

¿  Ö     Ö ¾ ´ Ö ½ µ    Ö ¾ 

  ³ Ö     Ö · Ö ½ ¼ ³ Ö ½     Ö ½ ·  

  ½ ½ ³ ´ Ö ½    Ö µ Ó Ø Ó ½  

´ µ Ë Ù Ô Ö Ð Ó Ë Ù Ð  

Y.N. Srikant Instruction Scheduling

Hyperblock Scheduling

Page 28: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 28/34

yp g

Superblock scheduling does not work well withcontrol-intensive programs which have many control flow

paths

Hyperblock scheduling was proposed to handle such

programs

Here, the control flow graph is IF-converted to eliminate

conditional branches

IF-conversion replaces conditional branches with

appropriate predicated instructions

Now, control dependence is changed to a datadependence

Y.N. Srikant Instruction Scheduling

IF-Conversion Example

Page 29: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 29/34

Y.N. Srikant Instruction Scheduling

Hyperblock Example Code

Page 30: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 30/34

Ó Ö ´ ¼ ½ ¼ ¼ · · µ  

ß 

´ ¼ µ  

· ×  

Ð ×  

 

× Ù Ñ × Ù Ñ ·  

 

´ µ À ¹ Ä Ú Ð Ó  

± ±  Ö ½    ¼ 

± ±  Ö     ¼ 

± ±  Ö     ¼ ¼ 

± ±  Ö     × 

½ ½ Ö ¾     Ð Ó ´ Ö ½ µ  

¾ ´ Ö ¾ ¼ µ Ó Ø Ó  

¾ ¿ Ö ¿     Ð Ó ´ Ö ½ µ  

Ö     Ö ¿ · Ö  

´ Ö ½ µ     Ö  

Ó Ø Ó  

¿ Ö     Ö ¾ 

´ Ö ½ µ     Ö ¾ 

Ö     Ö · Ö  

½ ¼ Ö ½     Ö ½ ·  

½ ½ ´ Ö ½     Ö µ Ó Ø Ó ½  

´ µ × × Ñ Ð Ý Ó  

B2

B1

B3

B4

main trace

´ µ Ó Ò Ø Ö Ó Ð Ð Ó Û Ö Ô  

Y.N. Srikant Instruction Scheduling

Hyperblock Example

Page 31: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 31/34

6 cycles for the entire set of predicated instructions

Instructions i3 and i4 can be executed speculatively and

can be moved up, instead of being scheduled after cycle 2

B2

B1

B3

B4

Hyperblock

´ µ Ó Ò Ø Ö Ó Ð Ð Ó Û Ö Ô  

Ì Ñ Á Ò Ø º Í Ò Ø ½ Á Ò Ø º Í Ò Ø ¾  

¼  ½ Ö ¾     Ð Ó ´ Ö ½ µ ¿ Ö ¿     Ð Ó ´ Ö ½ µ  

½ 

¾  ¾ ³ Ô ½     ´ Ö ¾ ¼ µ Ö     Ö ¿ · Ö  

¿  ´ Ö ½ µ     Ö ¸ Ô ½ ´ Ö ½ µ     Ö ¾ ¸ Ô ½  

  ½ ¼ Ö ½     Ö ½ · Ö     Ö ¾ ¸ Ô ½  

  Ö     Ö · Ö ½ ½ ´ Ö ½    Ö µ Ó Ø Ó ½  

´ µ À Ý Ô Ö Ð Ó Ë Ù Ð  

Y.N. Srikant Instruction Scheduling

Delayed Branch Scheduling

Page 32: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 32/34

Delayed branching

One instruction immediately following the delayed branchinstruction will be executed before the branch is taken

The instruction occupying the delay slot should beindependent of the branch instruction

It is best to fill the branch delay slot with an instruction from

the basic block that the branch terminatesOtherwise, an instruction from either the target block or thefall-through block, whichever is most likely to be executed,is selected

The selected instruction should either be a root node of the

DAG of the basic block (target of fall-through)and has a destination register that is not live-in in the otherblockor has a destination register that can be renamed

Y.N. Srikant Instruction Scheduling

Delay Branch Scheduling Conditions - 1

Page 33: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 33/34

Y.N. Srikant Instruction Scheduling

Delay Branch Scheduling Conditions - 2

Page 34: Instr Sched 3

7/28/2019 Instr Sched 3

http://slidepdf.com/reader/full/instr-sched-3 34/34

Y.N. Srikant Instruction Scheduling