Instr Sched 2

download Instr Sched 2

of 24

Transcript of Instr Sched 2

  • 7/28/2019 Instr Sched 2

    1/24

    Instruction Scheduling - Part 2

    Y.N. Srikant

    Department of Computer ScienceIndian Institute of Science

    Bangalore 560 012

    NPTEL Course on Compiler Design

    Y.N. Srikant Instruction Scheduling

    http://-/?-http://-/?-http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    2/24

    Instruction Scheduling

    Reordering of instructions so as to keep the pipelines offunctional units full with no stallsNP-Complete and needs heuristcsApplied on basic blocks (local)Global scheduling requires elongation of basic blocks(super-blocks)

    Y.N. Srikant Instruction Scheduling

    http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    3/24

    Basic Block Scheduling

    Basic block consists of micro-operation sequences (MOS),which are indivisibleEach MOS has several steps, each requiring resources

    Each step of an MOS requires one cycle for executionPrecedence constraints and resource constraints must besatised by the scheduled program

    PCs relate to data dependences and execution delaysRCs relate to limited availability of shared resources

    Y.N. Srikant Instruction Scheduling

    http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    4/24

    A Simple List Scheduling Algorithm

    Find the shortest schedule : V N , such that precedenceand resource constraints are satised. Holes are lled with

    NOPs.FUNCTION ListSchedule (V,E)BEGIN

    Ready = root nodes of V; Schedule = ;WHILE Ready = DOBEGIN

    v = highest priority node in Ready ;Lb = SatisfyPrecedenceConstraints (v , Schedule , ); (v) = SatisfyResourceConstraints (v , Schedule , , Lb );

    Schedule = Schedule + {v };Ready = Ready { v } + {u | NOT (u Schedule )AND (w , u ) E , w Schedule };

    ENDRETURN ;

    ENDY.N. Srikant Instruction Scheduling

    http://goforward/http://find/http://goback/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    5/24

    Constraint Satisfaction Functions

    FUNCTION SatisfyPrecedenceConstraint(v, Sched, )BEGIN

    RETURN ( maxu Sched

    (u ) + d (u , v ))

    END

    FUNCTION SatisfyResourceConstraint(v, Sched, , Lb)BEGIN

    FOR i := Lb TO DO

    IF 0 j < l (v ), v ( j ) +u Sched

    u (i + j (u )) R THENRETURN (i);

    END

    Y.N. Srikant Instruction Scheduling

    http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    6/24

    List Scheduling - Priority Ordering for Nodes

    1 Height of the node in the DAG ( i.e., longest path from thenode to a terminal node

    2 Estart , and Lstart , the earliest and latest start timesViolating Estart and Lstart may result in pipeline stallsEstart (v ) = max

    i = 1 , , k (Estart (u i ) + d (u i , v ))

    where u 1 , u 2 , , u k are predecessors of v . Estart value ofthe source node is 0.Lstart (u ) = min

    i = 1 , , k (Lstart (v i ) d (u , v i ))

    where v 1 , v 2 , , v k are successors of u . Lstart value of thesink node is set as its Estart value.

    Estart and Lstart values can be computed using atop-down and a bottom-up pass, respectively, eitherstatically (before scheduling begins), or dynamically duringscheduling

    Y.N. Srikant Instruction Scheduling

    http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    7/24

    List Scheduling - Slack

    1 A node with a lower Estart (or Lstart ) value has a higherpriority

    2

    Slack = Lstart Estart Nodes with lower slack are given higher priorityInstructions on the critical path may have a slack value ofzero and hence get priority

    Y.N. Srikant Instruction Scheduling

    http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    8/24

    Simple List Scheduling - Example - 1

    1

    3

    6

    2

    4

    5

    1

    0

    1

    1

    2 0

    1 2

    2

    1

    1

    4

    5

    1 2

    3

    node no.path length exec time

    LEGEND

    latency

    path length (n) = exec time (n) , if n is a leaf

    = max { latency (n,m) + path length (m) }m succ (n)

    Schedule = {3, 1, 2, 4, 6, 5}

    INSTRUCTION SCHEDULING - EXAMPLE

    3

    Y.N. Srikant Instruction Scheduling

    http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    9/24

    Simple List Scheduling - Example - 2

    latenciesadd,sub,store : 1 cycle; load : 2 cycles; mult : 3 cycles

    path length and slack are shown on the left side and rightside of the pair of numbers in parentheses

    ld

    sub

    mult st

    add

    st

    add

    ld

    add

    5(3, 3)0

    6(2, 2)0

    8(0, 0)0

    3(2, 5)3 1(2, 7)5

    7(0, 1)1

    0(8, 8)0

    2(6, 6)0

    1(7, 7)0

    i1

    i3 i4

    i7

    i8

    i6

    i2

    i5

    i9

    Y.N. Srikant Instruction Scheduling

    l h d l l ( d )

    http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    10/24

    Simple List Scheduling - Example - 2 (contd.)

    latenciesadd,sub,store : 1 cycle; load : 2 cycles; mult : 3 cycles

    2 Integer units and 1 Multiplication unit, all capable of loadand store as wellHeuristic used: height of the node or slack

    int1 int2 mult Cycle # Instr.No. Instruction1 1 0 0 i1, i2 t 1 load a , t 2 load b 1 1 0 11 1 0 2 i4, i3 t 4 t 1 2 , t 3 t 1 + 41 0 1 3 i6, i5 t 5 t 2 + 3 , t 6 t 4 t 2

    0 0 1 4 i6/i5 not sched. in cycle 20 0 1 5 due to shortage of int units1 0 0 6 i7 t 7 t 3 + t 61 0 0 7 i8 c st t 71 0 0 8 i9 b st t 5

    Y.N. Srikant Instruction Scheduling

    R U M d l R i T bl

    http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    11/24

    Resource Usage Models - Reservation Table

    Y.N. Srikant Instruction Scheduling

    R U M d l Gl b l R i T bl

    http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    12/24

    Resource Usage Models - Global Reservation Table

    r 0 r 1 r 2 r M t 0 1 0 1 0t 1 1 1 0 1t 2 0 0 0 1

    t T

    M: No. of resources in the machineT: Length of the schedule

    Y.N. Srikant Instruction Scheduling

    R U M d l Gl b l R ti T bl

    http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    13/24

    Resource Usage Models - Global Reservation Table

    GRT is constructed as the schedule is built (cycle by cycle)All entries of GRT are initialized to 0GRT maintains the state of all the resources in the machineGRTs can answer questions of the type:can an instruction of class I be scheduled in the currentcycle (say t k )?Answer is obtained by ANDing RT of I with the GRTstarting from row t k

    If the resulting table contains only 0s, then YES, otherwise

    NOThe GRT is updated after scheduling the instruction with asimilar OR operation

    Y.N. Srikant Instruction Scheduling

    O ti S h d li g

    http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    14/24

    Operation Scheduling

    List scheduling discussed so far schedules instructions ona cycle-by-cycle basisOperation scheduling attempts to schedule instructionsone after anotherTries to nd the rst cycle at which each instruction can be

    scheduledAfter choosing an operation i of highest priority, an attemptis made to schedule it at time t between Estart (i ) andLstart (i ) that does not have any resource conict

    This scheduling may affect the Estart and Lstart values ofunscheduled instructionsPriorities may have to be recomputed for these instructions

    Y.N. Srikant Instruction Scheduling

    Operation Scheduling

    http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    15/24

    Operation Scheduling

    If no time slot as above can be found for instruction i , analready scheduled instruction j , which has resourceconicts with instruction i is de-scheduled Instruction i is placed in this slot and instruction j is placedin the ready list once again

    In order to ensure that the algorithm does no get into aninnite loop (a group of instructions mutually de-scheduleeach other), a threshold on the number of de-scheduledinstructions is keptOnce the threshold is crossed, the partial schedule isabandoned, the Lstart value of the sink node is increased,new value of Lstart is computed, and the whole process isrestarted

    Y.N. Srikant Instruction Scheduling

    Simple List Scheduling Operation Scheduling

    http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    16/24

    Simple List Scheduling - Operation Scheduling

    latenciesadd,sub,store : 1 cycle; load : 2 cycles; mult : 3 cycles2 Integer units and 1 Multiplication unit, all capable of loadand store as well

    ld

    sub

    mult st

    add

    st

    add

    ld

    add

    5(3, 3)0

    6(2, 2)0

    8(0, 0)0

    3(2, 5)3 1(2, 7)5

    7(0, 1)1

    0(8, 8)0

    2(6, 6)0

    1 7 7 0

    i1

    i3 i4

    i7

    i8

    i6

    i2

    i5

    i9

    Y.N. Srikant Instruction Scheduling

    Simple List Scheduling Operation Scheduling

    http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    17/24

    Simple List Scheduling - Operation Scheduling(contd.)

    Instructions sorted on slack, with (Estart , Lstart ) valuesslack 0: i 1(0 , 0), i 4 (2 , 2), i 6 (3 , 3), i 7 (6 , 6), i 8(7 , 7), i 9 (8 , 8)slack 1: i 2(0 , 1), slack 3: i 3 (2 , 5), slack 5: i 5(2 , 7)

    Cycle # Instr.No. Instruction

    0 i1, i2 t 1 load a , t 2 load b 12 i4, i3 t 4 t 1 2 , t 3 t 1 + 43 i6, i5 t 5 t 2 + 3 , t 6 t 4 t 24

    56 i7 t 7 t 3 + t 67 i8 c st t 78 i9 b st t 5

    Y.N. Srikant Instruction Scheduling

    Simple List Scheduling Disadvantages

    http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    18/24

    Simple List Scheduling - Disadvantages

    Checking resource constraints is inefcient here because itinvolves repeated ANDing and ORing of bit matrices for

    many instructions in each scheduling stepSpace overhead may become considerable, but stillmanageable

    Y.N. Srikant Instruction Scheduling

    Automaton Based Scheduling

    http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    19/24

    Automaton Based Scheduling

    Constructs a collision automaton which indicates whether itis legal to issue an instruction in a given cycle ( i.e., noresource contentions)Collision automaton recognises legal instruction

    sequencesAvoids extensive searching that is needed in list schedulingUses the same topological ordering and ready queue as inlist scheduling, to handle precedence constraints

    Automaton can be constructed ofine using resourcereservation tables

    Y.N. Srikant Instruction Scheduling

    Collision Automaton

    http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    20/24

    Collision Automaton

    Uses a collision matrix for each state Size: #instruction classes length of the longest pipelineS [i , j ] = 1, iff i th instruction class creates a conict with thecurrent pipeline state S , if issued j cycles after the machineenters the current state S

    Each instruction class I also has a similar collision matrixI [i , j ] = 1, iff instruction of class i would create a conict withinstruction class I in cycle j , if launched in the current cycleThese collision matrices are created using resource vectors

    For the example, consider a dual issue machine

    Y.N. Srikant Instruction Scheduling

    Collision Automaton - Example

    http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    21/24

    Collision Automaton Example

    Resource Usage Vectors

    instr class0

    pipeline cycle1

    i

    f

    ls

    fd

    mem

    id

    id+mem

    Collision Matrices

    0 1 0 1 0 1

    ii i

    f

    ls ls ls

    f f 0 0

    1 0

    1 0

    1 0

    0 0

    0 0

    0 0

    1 1

    1 0

    int/inop(i class)

    fp/fnop(f class)

    ld/st(ls class)

    0 00 00 0

    0 01 00 0

    1 00 01 1

    0 01 01 0

    0 00 01 0

    1 00 01 0

    F3

    F5

    F4

    F2

    F0 F1

    f

    ii

    f ls

    f

    ls

    i

    f

    i

    COLLISION AUTOMATON

    Y.N. Srikant Instruction Scheduling

    Transitions in a Collision Automaton

    http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    22/24

    Transitions in a Collision Automaton

    Given a state S and any instruction i from an instructionclass I

    S [I , 1] = 0 implies that it is legal to issue i from S Only legal issues have edges in the automatonThe collision matrix of the target state S is produced byOR-ing collision matrices of S and I When no instruction is legal to be issued from S , S is saidto be cycle-advancing

    In any state, a NOP instruction can be issuedsuch a state behaves as a cycle-advancing state, onlywhen a NOP is issued (not otherwise)

    Y.N. Srikant Instruction Scheduling

    Cycle-advancing State

    http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    23/24

    Cycle advancing State

    Collision matrix is produced by left-shifting by one column,the collision matrix of S Such a state represents start of a new clock tick in allpipelines

    In single instruction issue processors, all states arecycle-advancing Start state is cycle-advancing States in which NOP is issued behave like a

    cycle-advancing state

    Y.N. Srikant Instruction Scheduling

    Instruction Scheduling with Collision Automaton

    http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/28/2019 Instr Sched 2

    24/24

    st uct o Sc edu g w t Co s o uto ato

    1 Start at the Start state of the automaton2 Pick instructions one by one, in priority order from the

    ready list3 If it is legal to issue the picked instruction in the current

    state (i.e., cycle), issue it; there is no advancement of thecycle counter

    4 Change state, compute collision matrix, update ready listand repeat the steps 2-3-4

    5 If no instructions in the ready list are legal to be issued in astate, then insert a NOP in the output and compute the

    collision matrix as explained above for cycle-advancingstates, and advance the cycle counter; goto to step 2

    Note: If step 5 is executed repeatedly, start state will bereached at some point and in the start state, all resources willbe available

    Y.N. Srikant Instruction Scheduling

    http://find/http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-