7/28/2019 Instr Sched 2
Instruction Scheduling - Part 2
Y.N. Srikant
Department of Computer Science
Indian Institute of Science
Bangalore 560 012
NPTEL Course on Compiler Design
Y.N. Srikant Instruction Scheduling
Instruction Scheduling
Reordering of instructions so as to keep the pipelines of functional units full with no stalls
NP-Complete and needs heuristics
Applied on basic blocks (local)
Global scheduling requires elongation of basic blocks (super-blocks)
Basic Block Scheduling
A basic block consists of micro-operation sequences (MOS), which are indivisible
Each MOS has several steps, each requiring resources
Each step of an MOS requires one cycle for execution
Precedence constraints (PCs) and resource constraints (RCs) must be satisfied by the scheduled program
  PCs relate to data dependences and execution delays
  RCs relate to limited availability of shared resources
A Simple List Scheduling Algorithm
Find the shortest schedule σ : V → N, such that precedence and resource constraints are satisfied. Holes are filled with NOPs.
FUNCTION ListSchedule (V, E)
BEGIN
  Ready = root nodes of V; Schedule = ∅;
  WHILE Ready ≠ ∅ DO
  BEGIN
    v = highest priority node in Ready;
    Lb = SatisfyPrecedenceConstraints (v, Schedule, σ);
    σ(v) = SatisfyResourceConstraints (v, Schedule, σ, Lb);
    Schedule = Schedule + {v};
    Ready = Ready − {v} + {u | NOT (u ∈ Schedule) AND ∀ (w, u) ∈ E, w ∈ Schedule};
  END
  RETURN σ;
END
Constraint Satisfaction Functions
FUNCTION SatisfyPrecedenceConstraints (v, Sched, σ)
BEGIN
  RETURN ( max over u ∈ Sched of σ(u) + d(u, v) )
END
FUNCTION SatisfyResourceConstraints (v, Sched, σ, Lb)
BEGIN
  FOR i := Lb TO ∞ DO
    IF ∀ 0 ≤ j < l(v), ρ_v(j) + Σ over u ∈ Sched of ρ_u(i + j − σ(u)) ≤ R THEN
      RETURN (i);
END
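The three functions above can be sketched in Python as follows. This is a minimal illustration, not the lecture's implementation: it assumes a single resource type with scalar capacity R, takes the maximum in the precedence check over the predecessors of v (all of which are already scheduled when v is ready), and uses hypothetical names (`list_schedule`, `usage`, `delay`, `priority`) and a hypothetical four-node DAG.

```python
def satisfy_precedence(v, sigma, preds, delay):
    """Earliest start of v allowed by its already scheduled predecessors."""
    return max((sigma[u] + delay[(u, v)] for u in preds[v]), default=0)


def satisfy_resources(v, sigma, usage, capacity, lb):
    """First cycle i >= lb at which every step j of v fits within capacity.

    usage[x][j] is the number of resource units x needs in its step j
    (assumption: one resource type, so R is a single number).
    """
    i = lb
    while True:
        fits = all(
            usage[v][j]
            + sum(
                usage[u][i + j - sigma[u]]
                for u in sigma
                if 0 <= i + j - sigma[u] < len(usage[u])
            )
            <= capacity
            for j in range(len(usage[v]))
        )
        if fits:
            return i
        i += 1


def list_schedule(nodes, delay, usage, capacity, priority):
    """Return sigma, a map from node to start cycle."""
    preds = {v: [] for v in nodes}
    succs = {v: [] for v in nodes}
    for u, v in delay:
        preds[v].append(u)
        succs[u].append(v)
    sigma = {}
    ready = {v for v in nodes if not preds[v]}
    while ready:
        # sorted() only makes tie-breaking deterministic
        v = max(sorted(ready), key=lambda n: priority[n])
        lb = satisfy_precedence(v, sigma, preds, delay)
        sigma[v] = satisfy_resources(v, sigma, usage, capacity, lb)
        ready.remove(v)
        ready |= {
            u for u in succs[v]
            if u not in sigma and all(w in sigma for w in preds[u])
        }
    return sigma


# Hypothetical DAG: n1 and n2 feed n3 (delay 2), n3 feeds n4 (delay 1).
nodes = ["n1", "n2", "n3", "n4"]
delay = {("n1", "n3"): 2, ("n2", "n3"): 2, ("n3", "n4"): 1}
usage = {n: [1] for n in nodes}                   # one unit for one step
priority = {"n1": 3, "n2": 3, "n3": 1, "n4": 0}   # height in the DAG
schedule = list_schedule(nodes, delay, usage, capacity=1, priority=priority)
print(schedule)
```

With a capacity of one unit, n2 is pushed to cycle 1 by the resource check, and n3 then waits for the delay from n2.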
List Scheduling - Priority Ordering for Nodes
1 Height of the node in the DAG (i.e., longest path from the node to a terminal node)
2 Estart and Lstart, the earliest and latest start times
  Violating Estart and Lstart may result in pipeline stalls
  $$\mathit{Estart}(v) = \max_{i = 1, \ldots, k} \bigl( \mathit{Estart}(u_i) + d(u_i, v) \bigr)$$
  where $u_1, u_2, \ldots, u_k$ are predecessors of $v$. Estart value of the source node is 0.
  $$\mathit{Lstart}(u) = \min_{i = 1, \ldots, k} \bigl( \mathit{Lstart}(v_i) - d(u, v_i) \bigr)$$
  where $v_1, v_2, \ldots, v_k$ are successors of $u$. Lstart value of the sink node is set as its Estart value.
  Estart and Lstart values can be computed using a top-down and a bottom-up pass, respectively, either statically (before scheduling begins), or dynamically during scheduling
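The two passes can be sketched as below. The function name `estart_lstart` and the small DAG are hypothetical, and the sketch assumes a single sink and that the nodes are supplied in topological order.

```python
def estart_lstart(order, delay):
    """order: nodes in topological order (source first, single sink last).
    delay: dict mapping each edge (u, v) to d(u, v)."""
    # Top-down pass: Estart(v) = max over predecessors u of Estart(u) + d(u, v)
    estart = {v: 0 for v in order}
    for v in order:
        for (u, w), d in delay.items():
            if w == v:
                estart[v] = max(estart[v], estart[u] + d)

    # Bottom-up pass: Lstart(sink) = Estart(sink),
    # Lstart(u) = min over successors v of Lstart(v) - d(u, v)
    sink = order[-1]
    lstart = {sink: estart[sink]}
    for u in reversed(order[:-1]):
        lstart[u] = min(lstart[v] - d for (a, v), d in delay.items() if a == u)
    return estart, lstart


# Hypothetical DAG: s -> a -> t and s -> b -> t
order = ["s", "a", "b", "t"]
delay = {("s", "a"): 2, ("s", "b"): 1, ("a", "t"): 3, ("b", "t"): 1}
estart, lstart = estart_lstart(order, delay)
print(estart)
print(lstart)
```

Here the path through a is critical, so a has equal Estart and Lstart, while b can start anywhere between cycles 1 and 4.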
List Scheduling - Slack
1 A node with a lower Estart (or Lstart) value has a higher priority
2 $\mathit{Slack} = \mathit{Lstart} - \mathit{Estart}$
  Nodes with lower slack are given higher priority
  Instructions on the critical path may have a slack value of zero and hence get priority
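A slack-based priority order can be sketched as a sort key. The (Estart, Lstart) windows below are hypothetical, and breaking ties on lower Estart is one possible choice rather than something the slide prescribes.

```python
def by_slack(windows):
    """windows: name -> (estart, lstart).
    Returns names ordered by increasing slack, ties broken by lower Estart."""
    return sorted(
        windows,
        key=lambda n: (windows[n][1] - windows[n][0], windows[n][0]),
    )


# Hypothetical start-time windows
windows = {"a": (0, 0), "b": (0, 3), "c": (2, 2), "d": (1, 2)}
ordering = by_slack(windows)
print(ordering)
```

Nodes a and c have zero slack (they lie on a critical path) and therefore come first.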
Simple List Scheduling - Example - 1
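[Figure: INSTRUCTION SCHEDULING - EXAMPLE. A DAG of six nodes; the legend labels each node with its node no., path length and exec time, and each edge with its latency.]
$$\text{path length}(n) = \begin{cases} \text{exec time}(n), & \text{if } n \text{ is a leaf} \\ \max_{m \in \mathit{succ}(n)} \{ \text{latency}(n, m) + \text{path length}(m) \}, & \text{otherwise} \end{cases}$$
Schedule = {3, 1, 2, 4, 6, 5}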
Simple List Scheduling - Example - 2
latencies: add, sub, store: 1 cycle; load: 2 cycles; mult: 3 cycles
path length and slack are shown on the left side and right side of the pair of numbers in parentheses
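[Figure: dependence DAG of instructions i1 to i9; the node annotations are listed below]
Instr.  Operation  Path length  (Estart, Lstart)  Slack
i1      ld         8            (0, 0)            0
i2      ld         7            (0, 1)            1
i3      add        3            (2, 5)            3
i4      sub        6            (2, 2)            0
i5      add        1            (2, 7)            5
i6      mult       5            (3, 3)            0
i7      add        2            (6, 6)            0
i8      st         1            (7, 7)            0
i9      st         0            (8, 8)            0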
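The path length annotations of this example can be reproduced with a short recursive sketch. The DAG edges are not recoverable from the figure, so the `SUCC` table below is an assumption derived from the operands of the instructions listed in this example's schedule, plus an assumed ordering edge from i8 to i9. Each edge carries the latency of its producing instruction, and the sink is given path length 0, as in the figure's annotation.

```python
LATENCY = {"ld": 2, "add": 1, "sub": 1, "st": 1, "mult": 3}

OP = {
    "i1": "ld", "i2": "ld", "i3": "add", "i4": "sub", "i5": "add",
    "i6": "mult", "i7": "add", "i8": "st", "i9": "st",
}

# Assumed successor lists (from operand use; i8 -> i9 is an assumed ordering edge)
SUCC = {
    "i1": ["i3", "i4"], "i2": ["i5", "i6"], "i3": ["i7"], "i4": ["i6"],
    "i5": ["i9"], "i6": ["i7"], "i7": ["i8"], "i8": ["i9"], "i9": [],
}


def path_length(n, memo=None):
    """Longest latency-weighted path from n to the sink (0 for the sink)."""
    if memo is None:
        memo = {}
    if n not in memo:
        memo[n] = max(
            (LATENCY[OP[n]] + path_length(m, memo) for m in SUCC[n]),
            default=0,
        )
    return memo[n]


lengths = {n: path_length(n) for n in OP}
print(lengths)
```

Under these assumptions the computed values agree with the left-hand annotations in the figure (8 for i1 down to 0 for i9).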
Simple List Scheduling - Example - 2 (contd.)
latencies: add, sub, store: 1 cycle; load: 2 cycles; mult: 3 cycles
2 Integer units and 1 Multiplication unit, all capable of load and store as well
Heuristic used: height of the node or slack
int1  int2  mult  Cycle #  Instr. No.  Instruction
1     1     0     0        i1, i2      t1 ← load a, t2 ← load b
1     1     0     1
1     1     0     2        i4, i3      t4 ← t1 − 2, t3 ← t1 + 4
1     0     1     3        i6, i5      t6 ← t4 ∗ t2, t5 ← t2 + 3
0     0     1     4
0     0     1     5
1     0     0     6        i7          t7 ← t3 + t6
1     0     0     7        i8          c ← st t7
1     0     0     8        i9          b ← st t5
Note (cycles 4 and 5): i6/i5 not scheduled in cycle 2 due to shortage of int units
Resource Usage Models - Reservation Table
Resource Usage Models - Global Reservation Table
      r0   r1   r2   ...  rM
t0    1    0    1    ...  0
t1    1    1    0    ...  1
t2    0    0    0    ...  1
...
tT
M: No. of resources in the machine
T: Length of the schedule
Resource Usage Models - Global Reservation Table
GRT is constructed as the schedule is built (cycle by cycle)
All entries of GRT are initialized to 0
GRT maintains the state of all the resources in the machine
GRTs can answer questions of the type: can an instruction of class I be scheduled in the current cycle (say $t_k$)?
  Answer is obtained by ANDing RT of I with the GRT starting from row $t_k$
  If the resulting table contains only 0s, then YES, otherwise NO
The GRT is updated after scheduling the instruction with a similar OR operation
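The AND test and OR update can be sketched with one integer bitmask per table row. The encoding (bit r set means resource r is busy), the function names and the two reservation tables are assumptions made for illustration.

```python
def can_schedule(grt, rt, k):
    """AND the reservation table rt with the GRT starting at row k.
    Rows beyond the current end of the GRT are still all zero."""
    return all(
        (grt[k + j] & row) == 0
        for j, row in enumerate(rt)
        if k + j < len(grt)
    )


def reserve(grt, rt, k):
    """OR rt into the GRT starting at row k, growing the GRT if needed."""
    while len(grt) < k + len(rt):
        grt.append(0)
    for j, row in enumerate(rt):
        grt[k + j] |= row


# Hypothetical machine with resources r0, r1, r2 mapped to bits 0, 1, 2
RT_LOAD = [0b001, 0b100]   # r0 in its first cycle, r2 in its second
RT_ADD = [0b001]           # r0 for one cycle

grt = []
reserve(grt, RT_LOAD, 0)
add_at_0 = can_schedule(grt, RT_ADD, 0)   # r0 already taken in row t0
add_at_1 = can_schedule(grt, RT_ADD, 1)   # row t1 only holds r2
reserve(grt, RT_ADD, 1)
print(add_at_0, add_at_1, [bin(row) for row in grt])
```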
Operation Scheduling
List scheduling discussed so far schedules instructions on a cycle-by-cycle basis
Operation scheduling attempts to schedule instructions one after another
Tries to find the first cycle at which each instruction can be scheduled
After choosing an operation i of highest priority, an attempt is made to schedule it at a time t between Estart(i) and Lstart(i) that does not have any resource conflict
This scheduling may affect the Estart and Lstart values of unscheduled instructions
Priorities may have to be recomputed for these instructions
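The slot search at the heart of operation scheduling can be sketched as follows. This is a deliberately simplified illustration: it assumes single-cycle operations on `capacity` identical units, keeps the (Estart, Lstart) windows fixed instead of recomputing them, and merely reports operations for which no slot exists. The names `operation_schedule` and `windows` are hypothetical.

```python
def operation_schedule(windows, capacity):
    """windows: name -> (estart, lstart).
    Places operations one after another in increasing-slack order."""
    busy = {}                 # cycle -> number of units already in use
    placed, unplaced = {}, []
    order = sorted(
        windows,
        key=lambda n: (windows[n][1] - windows[n][0], windows[n][0]),
    )
    for name in order:
        estart, lstart = windows[name]
        for t in range(estart, lstart + 1):
            if busy.get(t, 0) < capacity:     # first conflict-free cycle
                busy[t] = busy.get(t, 0) + 1
                placed[name] = t
                break
        else:
            unplaced.append(name)             # no slot inside the window
    return placed, unplaced


# Hypothetical windows
windows = {"a": (0, 0), "b": (0, 1), "c": (0, 0), "d": (1, 1)}
two_units = operation_schedule(windows, capacity=2)
one_unit = operation_schedule(windows, capacity=1)
print(two_units)
print(one_unit)
```

With two units every operation finds a slot inside its window; with one unit, c and b are left without a slot.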
Operation Scheduling
If no time slot as above can be found for instruction i, an already scheduled instruction j, which has resource conflicts with instruction i, is de-scheduled
Instruction i is placed in this slot and instruction j is placed in the ready list once again
In order to ensure that the algorithm does not get into an infinite loop (a group of instructions mutually de-schedule each other), a threshold on the number of de-scheduled instructions is kept
Once the threshold is crossed, the partial schedule is abandoned, the Lstart value of the sink node is increased, new values of Lstart are computed, and the whole process is restarted
Simple List Scheduling - Operation Scheduling
latencies: add, sub, store: 1 cycle; load: 2 cycles; mult: 3 cycles
2 Integer units and 1 Multiplication unit, all capable of load and store as well
[Figure: the dependence DAG of instructions i1 to i9 from Simple List Scheduling - Example - 2, each node annotated with its path length, (Estart, Lstart) and slack]
Simple List Scheduling - Operation Scheduling (contd.)
Instructions sorted on slack, with (Estart, Lstart) values
  slack 0: i1 (0, 0), i4 (2, 2), i6 (3, 3), i7 (6, 6), i8 (7, 7), i9 (8, 8)
  slack 1: i2 (0, 1)
  slack 3: i3 (2, 5)
  slack 5: i5 (2, 7)
Cycle #  Instr. No.  Instruction
0        i1, i2      t1 ← load a, t2 ← load b
1
2        i4, i3      t4 ← t1 − 2, t3 ← t1 + 4
3        i6, i5      t6 ← t4 ∗ t2, t5 ← t2 + 3
4
5
6        i7          t7 ← t3 + t6
7        i8          c ← st t7
8        i9          b ← st t5
Simple List Scheduling - Disadvantages
Checking resource constraints is inefficient here because it involves repeated ANDing and ORing of bit matrices for many instructions in each scheduling step
Space overhead may become considerable, but still manageable
Automaton Based Scheduling
Constructs a collision automaton which indicates whether it is legal to issue an instruction in a given cycle (i.e., no resource contentions)
Collision automaton recognises legal instruction sequences
Avoids extensive searching that is needed in list scheduling
Uses the same topological ordering and ready queue as in list scheduling, to handle precedence constraints
Automaton can be constructed offline using resource reservation tables
Collision Automaton
Uses a collision matrix for each state
  Size: #instruction classes × length of the longest pipeline
  S[i, j] = 1, iff the i-th instruction class creates a conflict with the current pipeline state S, if issued j cycles after the machine enters the current state S
Each instruction class I also has a similar collision matrix
  I[i, j] = 1, iff an instruction of class i would create a conflict with instruction class I in cycle j, if launched in the current cycle
These collision matrices are created using resource vectors
For the example, consider a dual issue machine
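Deriving a class collision matrix from resource usage vectors can be sketched as below. The usage vectors are assumptions for a dual issue machine (an integer class using an integer decoder `id`, a floating-point class using `fd`, and a load/store class using `id` and `mem` and then `mem` again). The function name is hypothetical, and column 0 of each Python list stands for the cycle in which the class is issued.

```python
def collision_matrix(usage, cls, length):
    """Row c, column j is 1 iff an instruction of class c, issued j cycles
    after one of class cls, needs a resource that cls holds in the same cycle."""
    held = usage[cls]
    matrix = {}
    for c, need in usage.items():
        matrix[c] = [
            int(any(
                held[j + t] & need[t]          # non-empty set intersection
                for t in range(len(need))
                if j + t < len(held)
            ))
            for j in range(length)
        ]
    return matrix


# Assumed resource usage vectors: one set of resources per pipeline cycle
USAGE = {
    "i": [{"id"}],
    "f": [{"fd"}],
    "ls": [{"id", "mem"}, {"mem"}],
}
LENGTH = max(len(steps) for steps in USAGE.values())   # longest pipeline
MATRICES = {cls: collision_matrix(USAGE, cls, LENGTH) for cls in USAGE}

for cls, matrix in MATRICES.items():
    print(cls, matrix)
```

Under these assumptions a load/store conflicts with another load/store issued in the same cycle or one cycle later, while integer and floating-point instructions never conflict with each other.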
Collision Automaton - Example
[Figure with three panels.
Resource Usage Vectors: resources id, fd and mem used by each instr class (i, f, ls) in pipeline cycles 0 and 1.
Collision Matrices: one matrix with rows i, f, ls and columns 0, 1 for each of int/inop (i class), fp/fnop (f class) and ld/st (ls class).
COLLISION AUTOMATON: states F0 to F5, each with its collision matrix, and transitions labelled i, f and ls.]
Transitions in a Collision Automaton
Given a state S and any instruction i from an instruction class I
  S[I, 1] = 0 implies that it is legal to issue i from S
  Only legal issues have edges in the automaton
  The collision matrix of the target state S' is produced by OR-ing collision matrices of S and I
When no instruction is legal to be issued from S, S is said to be cycle-advancing
In any state, a NOP instruction can be issued
  Such a state behaves as a cycle-advancing state only when a NOP is issued (not otherwise)
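A transition can be sketched as a legality test followed by an element-wise OR. The class matrices below are assumed values for a three-class machine (i, f, ls), and index 0 of each Python row plays the role of the first column, written S[I, 1] on the slide.

```python
# Assumed class collision matrices: CLASS_MATRIX[I][c] is the row for class c
CLASS_MATRIX = {
    "i":  {"i": [1, 0], "f": [0, 0], "ls": [1, 0]},
    "f":  {"i": [0, 0], "f": [1, 0], "ls": [0, 0]},
    "ls": {"i": [1, 0], "f": [0, 0], "ls": [1, 1]},
}


def is_legal(state, cls):
    """Legal iff the first column of the row for cls is 0."""
    return state[cls][0] == 0


def issue(state, cls):
    """Target state: OR of the state's matrix and the class matrix."""
    if not is_legal(state, cls):
        raise ValueError(f"class {cls} cannot be issued from this state")
    return {
        c: [s | m for s, m in zip(state[c], CLASS_MATRIX[cls][c])]
        for c in state
    }


def is_cycle_advancing(state):
    """True when no instruction class can legally be issued."""
    return not any(is_legal(state, c) for c in state)


start = {c: [0, 0] for c in CLASS_MATRIX}
after_ls = issue(start, "ls")
after_ls_f = issue(after_ls, "f")
print(after_ls_f, is_cycle_advancing(after_ls_f))
```

After a load/store and a floating-point instruction have been issued, every first-column entry is 1, so the resulting state is cycle-advancing.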
Cycle-advancing State
Collision matrix is produced by left-shifting, by one column, the collision matrix of S
Such a state represents the start of a new clock tick in all pipelines
In single instruction issue processors, all states are cycle-advancing
Start state is cycle-advancing
States in which NOP is issued behave like a cycle-advancing state
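The left shift can be sketched in one line per row. The state below is a hypothetical collision matrix with rows for three instruction classes and two columns.

```python
def advance_cycle(state):
    """Shift every row left by one column and fill the last column with 0."""
    return {c: row[1:] + [0] for c, row in state.items()}


# Hypothetical state in which no class can be issued in the current cycle
state = {"i": [1, 0], "f": [1, 0], "ls": [1, 1]}
next_state = advance_cycle(state)
after_two = advance_cycle(next_state)
print(next_state)
print(after_two)
```

After one shift only the load/store class is still blocked; after two shifts the matrix is all zeros, which is the start state.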
Instruction Scheduling with Collision Automaton
1 Start at the Start state of the automaton
2 Pick instructions one by one, in priority order from the ready list
3 If it is legal to issue the picked instruction in the current state (i.e., cycle), issue it; there is no advancement of the cycle counter
4 Change state, compute collision matrix, update ready list and repeat the steps 2-3-4
5 If no instructions in the ready list are legal to be issued in a state, then insert a NOP in the output and compute the collision matrix as explained above for cycle-advancing states, and advance the cycle counter; go to step 2
Note: If step 5 is executed repeatedly, start state will be reached at some point and in the start state, all resources will be available
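The steps above can be sketched as a loop that computes collision matrices on the fly instead of walking a prebuilt automaton. Several things here are assumptions: the class matrices, the four-instruction program, unit latencies (an instruction is ready once all its predecessors were issued in an earlier cycle), and the output format, in which closing an issue group stands in for the NOP of step 5 and an empty group is printed as `nop`.

```python
# Assumed class collision matrices for classes i, f and ls
CLASS_MATRIX = {
    "i":  {"i": [1, 0], "f": [0, 0], "ls": [1, 0]},
    "f":  {"i": [0, 0], "f": [1, 0], "ls": [0, 0]},
    "ls": {"i": [1, 0], "f": [0, 0], "ls": [1, 1]},
}


def schedule(instrs, preds):
    """instrs: list of (name, cls) in priority order.
    preds: name -> set of names that must be issued in an earlier cycle."""
    state = {c: [0, 0] for c in CLASS_MATRIX}        # step 1: start state
    issued_at, cycle = {}, 0
    groups, current = [], []
    pending = list(instrs)
    while pending:
        # steps 2-3: first ready instruction that is legal in this state
        pick = next(
            (
                (n, c) for n, c in pending
                if all(issued_at.get(p, cycle) < cycle for p in preds.get(n, ()))
                and state[c][0] == 0
            ),
            None,
        )
        if pick is not None:
            n, c = pick                               # step 4: OR in class matrix
            state = {
                k: [s | m for s, m in zip(state[k], CLASS_MATRIX[c][k])]
                for k in state
            }
            issued_at[n] = cycle
            current.append(n)
            pending.remove(pick)
        else:                                         # step 5: advance the cycle
            groups.append(current or ["nop"])
            current = []
            state = {k: row[1:] + [0] for k, row in state.items()}
            cycle += 1
    groups.append(current)
    return groups


program = [("ld_a", "ls"), ("ld_b", "ls"), ("add", "i"), ("fmul", "f")]
deps = {"add": {"ld_a", "ld_b"}}
result = schedule(program, deps)
print(result)
```

In this run the second load cannot follow the first within one cycle, because the load/store row of the state is still set after one shift, so cycle 1 is empty.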