Lecture 5 Section A.8 Branch Hazards and Dynamic Scheduling via scoreboarding

Oct. 26, 2004 1 Lecture 5 Section A.8 Branch Hazards and Dynamic Scheduling via scoreboarding Instructor: L.N. Bhuyan CS 203A Advanced Computer Architecture


Lecture 5, Section A.8

Branch Hazards and Dynamic Scheduling via Scoreboarding

Instructor: L.N. Bhuyan

CS 203A Advanced Computer Architecture

Oct. 26, 2004

Control Hazards

• Branch problem:
  – Branches are resolved in the EX stage => 2-cycle penalty on taken branches.
  – Ideal CPI = 1. Assuming a 2-cycle penalty for all branches and 32% branch instructions => new CPI = 1 + 0.32*2 = 1.64.

• Solutions:
  – Reduce branch penalty: change the datapath (new adder needed in ID stage).
  – Fill branch delay slot(s) with a useful instruction.
  – Fixed branch prediction.
  – Static branch prediction.
  – Dynamic branch prediction.
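The CPI arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the formula CPI = ideal CPI + branch fraction × penalty; the function name `cpi_with_branch_stalls` is made up for illustration.

```python
def cpi_with_branch_stalls(ideal_cpi, branch_fraction, penalty_cycles):
    """Effective CPI when every branch pays a fixed stall penalty."""
    return ideal_cpi + branch_fraction * penalty_cycles


# Slide values: ideal CPI of 1, 32% branches, 2-cycle penalty on every branch.
cpi = cpi_with_branch_stalls(1.0, 0.32, 2)
print(f"CPI = {cpi:.2f}")  # CPI = 1.64
```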

Control Hazards – branch delay slots

• Reduced branch penalty:
  – Compute condition and target address in the ID stage: 1-cycle stall.
  – Target and condition are computed even when the instruction is not a branch.

• Branch delay slot filling: move an instruction into the slot right after the branch, hoping that its execution is necessary. Three alternatives (next slide).

• Limitations: restrictions on which instructions can be rescheduled; compile-time prediction of taken or untaken branches.

Example: Nondelayed vs. Delayed Branch

Nondelayed Branch:

      add M1, M2, M3
      sub M4, M5, M6
      beq M1, M4, Exit
      or  M8, M9, M10
      xor M10, M1, M11
Exit:

Delayed Branch:

      add M1, M2, M3
      sub M4, M5, M6
      beq M1, M4, Exit
      or  M8, M9, M10
      xor M10, M1, M11
Exit:

Control Hazards: Branch Prediction

• Idea: doing something is better than waiting around doing nothing.
  o Guess the branch target and start executing at the guessed position.
  o Execute the branch and verify (check) the guess.
  + Minimizes the penalty (to zero) if the guess is right.
  – May increase the penalty for wrong guesses.
  o Heavily researched area in the last 15 years.

• Fixed branch prediction. Each of these strategies must be applied to all branch instructions indiscriminately.
  – Predict not-taken (47% actually not taken): continue to fetch instructions without stalling; do not change any state (no register write); if the branch is taken, turn the fetched instruction into a no-op and restart fetch at the target address: 1-cycle penalty.
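As a rough illustration of what predict-not-taken buys, the sketch below combines numbers from the slides: 32% branch instructions (from the earlier CPI slide), 47% of branches not taken, and a 1-cycle penalty paid only by taken branches. Putting these figures together is an assumption made here for illustration; the slides do not state the resulting CPI.

```python
BRANCH_FRACTION = 0.32     # fraction of instructions that are branches (earlier slide)
NOT_TAKEN_FRACTION = 0.47  # fraction of branches actually not taken (this slide)
TAKEN_PENALTY = 1          # cycles lost when a predicted-not-taken branch is taken


def cpi_predict_not_taken(ideal_cpi=1.0):
    """Only taken branches pay the penalty under predict-not-taken."""
    taken_fraction = 1.0 - NOT_TAKEN_FRACTION
    return ideal_cpi + BRANCH_FRACTION * taken_fraction * TAKEN_PENALTY


def cpi_always_stall(ideal_cpi=1.0, penalty=1):
    """Baseline for comparison: every branch stalls for `penalty` cycles."""
    return ideal_cpi + BRANCH_FRACTION * penalty


print(f"always stall 1 cycle: {cpi_always_stall():.4f}")
print(f"predict not-taken:    {cpi_predict_not_taken():.4f}")
```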

Control Hazards: Branch Prediction

  – Predict taken (53%): more difficult, because the target must be known before the branch is decoded. No advantage in our simple 5-stage pipeline.

• Static branch prediction.
  – Opcode-based: prediction based on the opcode itself and the related condition. Examples: MC 88110, PowerPC 601/603.
  – Displacement-based prediction: if d < 0 predict taken, if d >= 0 predict not taken. Examples: Alpha 21064 (as option), PowerPC 601/603 for regular conditional branches.
  – Compiler-directed prediction: the compiler sets or clears a predict bit in the instruction itself. Examples: AT&T 9210 Hobbit, PowerPC 601/603 (predict bit reverses opcode or displacement predictions), HP PA 8000 (as option).
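The displacement-based rule above is simple enough to state as code. The function below is a sketch of that rule only; the name `predict_taken_by_displacement` and the sample displacements are made up for illustration.

```python
def predict_taken_by_displacement(displacement):
    """Static rule from the slide: backward branches (d < 0) are predicted taken,
    forward branches (d >= 0) are predicted not taken."""
    return displacement < 0


# A loop-closing branch jumps backward, a forward skip jumps ahead.
for d in (-16, 0, 8):
    guess = "taken" if predict_taken_by_displacement(d) else "not taken"
    print(f"d = {d:4d}: predict {guess}")
```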

Control Hazards: Branch Prediction

• Dynamic branch prediction
  – Based on the history of a particular branch (covered later).


MIPS R4000 pipeline

MIPS FP Pipe Stages

FP Instr          1   2     3           4      5   6   7     8   …
Add, Subtract     U   S+A   A+R         R+S
Multiply          U   E+M   M           M      M   N   N+A   R
Divide            U   A     R           D^28   …   D+A, D+R, D+R, D+A, D+R, A, R
Square root       U   E     (A+R)^108   …      A   R
Negate            U   S
Absolute value    U   S
FP compare        U   A     R

Stages:
A  Mantissa ADD stage
D  Divide pipeline stage
E  Exception test stage
M  First stage of multiplier
N  Second stage of multiplier
R  Rounding stage
S  Operand shift stage
U  Unpack FP numbers

R4000 Performance

• Not ideal CPI of 1:
  – Load stalls (1 or 2 clock cycles)
  – Branch stalls (2 cycles + unfilled slots)
  – FP result stalls: RAW data hazard (latency)
  – FP structural stalls: not enough FP hardware (parallelism)

[Figure: bar chart, vertical axis 0 to 4.5, one bar per benchmark (eqntott, espresso, gcc, li, doduc, nasa7, ora, spice2g6, su2cor, tomcatv), each broken down into Base, Load stalls, Branch stalls, FP result stalls, and FP structural stalls.]

FP Loop: Where are the Hazards?

Loop: LD   F0,0(R1)   ;F0=vector element
      ADDD F4,F0,F2   ;add scalar from F2
      SD   0(R1),F4   ;store result
      SUBI R1,R1,8    ;decrement pointer 8B (DW)
      BNEZ R1,Loop    ;branch R1!=zero
      NOP             ;delayed branch slot

Instruction          Instruction          Latency in
producing result     using result         clock cycles
FP ALU op            Another FP ALU op    3
FP ALU op            Store double         2
Load double          FP ALU op            1
Load double          Store double         0
Integer op           Integer op           0

• Where are the stalls?

FP Loop Showing Stalls

1 Loop: LD   F0,0(R1)   ;F0=vector element
2       stall
3       ADDD F4,F0,F2   ;add scalar in F2
4       stall
5       stall
6       SD   0(R1),F4   ;store result
7       SUBI R1,R1,8    ;decrement pointer 8B (DW)
8       BNEZ R1,Loop    ;branch R1!=zero
9       stall           ;delayed branch slot

Instruction          Instruction          Latency in
producing result     using result         clock cycles
FP ALU op            Another FP ALU op    3
FP ALU op            Store double         2
Load double          FP ALU op            1

• 9 clocks: Rewrite code to minimize stalls?

Minimizing Stalls Technique 1: Compiler Optimization

1 Loop: LD   F0,0(R1)
2       stall
3       ADDD F4,F0,F2
4       SUBI R1,R1,8
5       BNEZ R1,Loop    ;delayed branch
6       SD   8(R1),F4   ;altered when moved past SUBI

Instruction          Instruction          Latency in
producing result     using result         clock cycles
FP ALU op            Another FP ALU op    3
FP ALU op            Store double         2
Load double          FP ALU op            1

6 clocks. Swap BNEZ and SD by changing the address of SD.
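The 9-clock and 6-clock counts can be reproduced from the latency table with a small in-order issue model. This is a sketch under stated assumptions: one instruction issues per clock, a consumer must wait `latency` extra clocks after its producer, BNEZ is treated as an integer op, and any producer/consumer pair missing from the table is taken to have latency 0. The function and variable names are made up for illustration.

```python
# (producer kind, consumer kind) -> latency in clock cycles, from the slide's table
LATENCY = {
    ("fp_alu", "fp_alu"): 3,
    ("fp_alu", "store"): 2,
    ("load", "fp_alu"): 1,
    ("load", "store"): 0,
    ("int", "int"): 0,
}


def issue_cycles(program):
    """Return the clock in which each instruction issues (in order, one per clock)."""
    cycles = []
    last_writer = {}  # register -> index of the most recent instruction writing it
    for i, (_name, kind, dest, srcs) in enumerate(program):
        earliest = cycles[-1] + 1 if cycles else 1
        for reg in srcs:
            if reg in last_writer:
                p = last_writer[reg]
                lat = LATENCY.get((program[p][1], kind), 0)  # assumption: 0 if not listed
                earliest = max(earliest, cycles[p] + lat + 1)
        cycles.append(earliest)
        if dest is not None:
            last_writer[dest] = i
    return cycles


# Each entry: (mnemonic, kind, destination register, source registers)
LD = ("LD", "load", "F0", ("R1",))
ADDD = ("ADDD", "fp_alu", "F4", ("F0", "F2"))
SD = ("SD", "store", None, ("F4", "R1"))
SUBI = ("SUBI", "int", "R1", ("R1",))
BNEZ = ("BNEZ", "int", None, ("R1",))
SLOT = ("NOP", "nop", None, ())  # unfilled branch delay slot

original = [LD, ADDD, SD, SUBI, BNEZ, SLOT]
scheduled = [LD, ADDD, SUBI, BNEZ, SD]  # SD fills the delay slot

print(issue_cycles(original))   # last entry is the loop's clock count
print(issue_cycles(scheduled))
```

The last issue cycle of each list is the clock count per iteration, so the model can be compared directly against the two listings above.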

HW Schemes: Instruction Parallelism

• Compiler or static instruction scheduling can avoid some pipeline hazards.
  – e.g. filling the branch delay slot.

• Why in HW at run time?
  – Works when dependences can't be known at compile time. WAW can only be detected at run time.
  – Compiler is simpler.
  – Code for one machine runs well on another.

• Key idea: allow instructions behind a stall to proceed.

      DIVD F0,F2,F4
      ADDD F10,F0,F8
      SUBD F8,F8,F14

  – Enables out-of-order execution => out-of-order completion.
  – But both structural and data hazards are checked in MIPS: ADDD is stalled at ID, and SUBD cannot even proceed to ID.

HW Schemes: Instruction Parallelism

• Out-of-order execution divides the ID stage:
  1. Issue: decode instructions and check for structural hazards. Issue in order if the functional unit is free and there is no WAW hazard.
  2. Read operands (RO): wait until no data hazards, then read operands. ADDD would stall at RO, and SUBD could proceed with no stalls.

• Scoreboards allow an instruction to execute whenever 1 & 2 hold, not waiting for prior instructions.

Focusing on FP operations (assume no MEM stages):

  IF  ISSUE  …  RO  EX1 … EXm  WB?  (WAR?)
                RO  EX1 … EXn  WB?  (WAR?)
             …  RO  EX1 … EXp  WB

Scoreboard Implications

• Out-of-order completion => WAR, WAW hazards.

• Solutions for WAR:
  – CDC 6600: stall the write to allow reads to take place; read registers only during the Read Operands stage.
  – Tomasulo: register renaming.

• For WAW, must detect the hazard: stall in the Issue stage until the other instruction completes.

• Need to have multiple instructions in the execution phase => multiple execution units or pipelined execution units.

• Scoreboard replaces ID with 2 stages (Issue and RO).

• Scoreboard keeps track of dependencies and the state of operations:
  – Monitors every change in the hardware.
  – Determines when to read operands, when to execute, and when to write back.
  – Hazard detection and resolution is centralized.

Four Stages of Scoreboard Control

1. Issue: decode instructions and check for structural hazards (ID1).
   If a functional unit for the instruction is free and no other active instruction has the same destination register (WAW), the scoreboard issues the instruction to the functional unit and updates its internal data structure. If a structural or WAW hazard exists, then the instruction issue stalls, and no further instructions will issue until these hazards are cleared.

2. Read operands: wait until no data hazards, then read operands (ID2).
   A source operand is available if no earlier issued active instruction is going to write it, or if the register containing the operand is being written by a currently active functional unit. When the source operands are available, the scoreboard tells the functional unit to proceed to read the operands from the registers and begin execution. The scoreboard resolves RAW hazards dynamically in this step, and instructions may be sent into execution out of order.

Four Stages of Scoreboard Control

3. Execution: operate on operands (EX).
   The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.

4. Write result: finish execution (WB).
   Once the scoreboard is aware that the functional unit has completed execution, the scoreboard checks for WAR hazards. If there are none, it writes the result. If there is a WAR hazard, it stalls the instruction.

   Example:

      DIVD F0,F2,F4
      ADDD F10,F0,F8
      SUBD F8,F8,F14

   The CDC 6600 scoreboard would stall SUBD until ADDD reads its operands.
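The dependences in the DIVD/ADDD/SUBD example can be found mechanically by comparing destination and source registers. The sketch below classifies the hazards between an earlier and a later instruction; the tuple layout and the function name `hazards` are made up for illustration.

```python
def hazards(first, second):
    """Hazards between two instructions, where `first` precedes `second`
    in program order. Each instruction is (op, dest, src1, src2)."""
    _, d1, *s1 = first
    _, d2, *s2 = second
    found = []
    if d1 in s2:
        found.append(("RAW", d1))  # second reads what first writes
    if d2 in s1:
        found.append(("WAR", d2))  # second writes what first reads
    if d1 == d2:
        found.append(("WAW", d1))  # both write the same register
    return found


divd = ("DIVD", "F0", "F2", "F4")
addd = ("ADDD", "F10", "F0", "F8")
subd = ("SUBD", "F8", "F8", "F14")

print(hazards(divd, addd))  # ADDD waits for DIVD's F0
print(hazards(addd, subd))  # SUBD must not overwrite F8 before ADDD reads it
print(hazards(divd, subd))  # independent
```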

Three Parts of the Scoreboard

1. Instruction status: which of the 4 steps the instruction is in.

2. Functional unit status: indicates the state of the functional unit (FU). 9 fields for each functional unit:
   Busy: indicates whether the unit is busy or not
   Op: operation to perform in the unit (e.g., + or –)
   Fi: destination register
   Fj, Fk: source-register numbers
   Qj, Qk: functional units producing source registers Fj, Fk
   Rj, Rk: flags indicating when Fj, Fk are ready and not yet read. Set to No after operands are read.

3. Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
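One way to picture the functional unit status and register result status tables is as plain records. The sketch below is an illustrative Python layout, not the hardware encoding; the class name `FUStatus` and the choice of `None` for a blank entry are assumptions.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass
class FUStatus:
    """The 9 per-unit fields listed on the slide."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source register 1
    fk: Optional[str] = None   # source register 2
    qj: Optional[str] = None   # unit producing Fj (None if none pending)
    qk: Optional[str] = None   # unit producing Fk (None if none pending)
    rj: bool = False           # Fj ready and not yet read
    rk: bool = False           # Fk ready and not yet read


fu_status: Dict[str, FUStatus] = {
    name: FUStatus() for name in ("Integer", "Mult1", "Mult2", "Add", "Divide")
}
# Register result status: which unit will write each register (None = blank).
register_result: Dict[str, Optional[str]] = {f"F{n}": None for n in range(0, 32, 2)}

# Illustrative snapshot: MULTD F0,F2,F4 sits in Mult1 while a load into F2
# is still pending on the Integer unit.
fu_status["Mult1"] = FUStatus(True, "Mult", "F0", "F2", "F4",
                              qj="Integer", qk=None, rj=False, rk=True)
register_result["F0"] = "Mult1"
register_result["F2"] = "Integer"
print(fu_status["Mult1"])
```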

Detailed Scoreboard Pipeline Control

(A.55 on page A-76)

Instruction status: Issue
  Wait until:   Not Busy(FU) and not Result(D)   [WAW]
  Bookkeeping:  Busy(FU) <- Yes; Op(FU) <- op; Fi(FU) <- 'D'; Fj(FU) <- 'S1'; Fk(FU) <- 'S2'; Qj <- Result('S1'); Qk <- Result('S2'); Rj <- not Qj; Rk <- not Qk; Result('D') <- FU

Instruction status: Read operands
  Wait until:   Rj and Rk
  Bookkeeping:  Rj <- No; Rk <- No

Instruction status: Execution complete
  Wait until:   Functional unit done

Instruction status: Write result
  Wait until:   for all f: (Fj(f) != Fi(FU) or Rj(f) = No) & (Fk(f) != Fi(FU) or Rk(f) = No)   [WAR]
  Bookkeeping:  for all f: (if Qj(f) = FU then Rj(f) <- Yes); for all f: (if Qk(f) = FU then Rk(f) <- Yes); Result(Fi(FU)) <- 0; Busy(FU) <- No
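The "wait until" conditions translate almost directly into predicates over the functional unit status table. The sketch below uses plain dictionaries for that table; the function names and the sample state (ADDD finished in the Add unit while DIVD has not yet read F6) are chosen for illustration and are not part of the slide.

```python
def idle_unit():
    """An idle functional unit: blank fields, nothing ready."""
    return {"busy": False, "fi": None, "fj": None, "fk": None, "rj": False, "rk": False}


def can_issue(fu, dest, units, result):
    """Issue: unit is free and no active instruction will write `dest` (WAW)."""
    return not units[fu]["busy"] and result.get(dest) is None


def can_read_operands(fu, units):
    """Read operands: both source flags Rj and Rk are set."""
    return units[fu]["rj"] and units[fu]["rk"]


def can_write_result(fu, units):
    """Write result: no unit still has to read the old value of Fi(FU) (WAR)."""
    fi = units[fu]["fi"]
    return all(
        (f["fj"] != fi or not f["rj"]) and (f["fk"] != fi or not f["rk"])
        for f in units.values()
    )


units = {name: idle_unit() for name in ("Integer", "Mult1", "Mult2", "Add", "Divide")}
units["Mult1"] = dict(busy=True, fi="F0", fj="F2", fk="F4", rj=False, rk=False)
units["Add"] = dict(busy=True, fi="F6", fj="F8", fk="F2", rj=False, rk=False)
units["Divide"] = dict(busy=True, fi="F10", fj="F0", fk="F6", rj=False, rk=True)
result = {"F0": "Mult1", "F6": "Add", "F10": "Divide"}

print(can_write_result("Add", units))    # blocked: Divide has not read F6 yet
print(can_write_result("Mult1", units))  # nothing is waiting to read the old F0
print(can_issue("Mult2", "F6", units, result))  # blocked by WAW on F6
```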

Scoreboard Example

• The following latencies are chosen to illustrate behavior and are not representative:
  – LD: 1 cycle (compute address + data cache access)
  – ADDD and SUBD: 2 cycles
  – Multiply: 10 cycles
  – Divide: 40 cycles

Scoreboard Example

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2
LD     F2     45+  R3
MULTD  F0     F2   F4
SUBD   F8     F6   F2
DIVD   F10    F0   F6
ADDD   F6     F8   F2

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
-      FU    -      -        -    -        -     -       -         -

Scoreboard Example Cycle 1

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1
LD     F2     45+  R3
MULTD  F0     F2   F4
SUBD   F8     F6   F2
DIVD   F10    F0   F6
ADDD   F6     F8   F2

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F6   -    R2   -        -        -    Yes
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
1      FU    -      -        -    Integer  -     -       -         -

Scoreboard Example Cycle 2

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2
LD     F2     45+  R3
MULTD  F0     F2   F4
SUBD   F8     F6   F2
DIVD   F10    F0   F6
ADDD   F6     F8   F2

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F6   -    R2   -        -        -    No
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
2      FU    -      -        -    Integer  -     -       -         -

Note: Can't issue I2 because the Integer unit is busy. Can't issue the next instruction because issue is in order.

Scoreboard Example Cycle 3

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3
LD     F2     45+  R3
MULTD  F0     F2   F4
SUBD   F8     F6   F2
DIVD   F10    F0   F6
ADDD   F6     F8   F2

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F6   -    R2   -        -        -    No
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
3      FU    -      -        -    Integer  -     -       -         -

Scoreboard Example Cycle 4

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3
MULTD  F0     F2   F4
SUBD   F8     F6   F2
DIVD   F10    F0   F6
ADDD   F6     F8   F2

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F6   -    R2   -        -        -    No
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
4      FU    -      -        -    -        -     -       -         -

Scoreboard Example Cycle 5

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5
MULTD  F0     F2   F4
SUBD   F8     F6   F2
DIVD   F10    F0   F6
ADDD   F6     F8   F2

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F2   -    R3   -        -        -    Yes
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
5      FU    -      Integer  -    -        -     -       -         -

Now I2 is issued.

Scoreboard Example Cycle 6

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6
MULTD  F0     F2   F4   6
SUBD   F8     F6   F2
DIVD   F10    F0   F6
ADDD   F6     F8   F2

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F2   -    R3   -        -        -    No
-     Mult1    Yes   Mult  F0   F2   F4   Integer  -        No   Yes
-     Mult2    No
-     Add      No
-     Divide   No

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
6      FU    Mult1  Integer  -    -        -     -       -         -

Scoreboard Example Cycle 7

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7
MULTD  F0     F2   F4   6
SUBD   F8     F6   F2   7
DIVD   F10    F0   F6
ADDD   F6     F8   F2

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F2   -    R3   -        -        -    No
-     Mult1    Yes   Mult  F0   F2   F4   Integer  -        No   Yes
-     Mult2    No
-     Add      Yes   Sub   F8   F6   F2   -        Integer  Yes  No
-     Divide   No

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
7      FU    Mult1  Integer  -    -        Add   -       -         -

I3 is stalled at read operands because I2 isn't complete.

Scoreboard Example Cycle 8

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7                   8
MULTD  F0     F2   F4   6
SUBD   F8     F6   F2   7
DIVD   F10    F0   F6   8
ADDD   F6     F8   F2

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
-     Mult2    No
-     Add      Yes   Sub   F8   F6   F2   -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
8      FU    Mult1  -        -    -        Add   Divide  -         -

Scoreboard Example Cycle 9

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7                   8
MULTD  F0     F2   F4   6      9
SUBD   F8     F6   F2   7      9
DIVD   F10    F0   F6   8
ADDD   F6     F8   F2

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
10    Mult1    Yes   Mult  F0   F2   F4   -        -        No   No
-     Mult2    No
2     Add      Yes   Sub   F8   F6   F2   -        -        No   No
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
9      FU    Mult1  -        -    -        Add   Divide  -         -

Note: I3 and I4 read operands because F2 is now available. ADDD (I6) can't be issued because SUBD (I4) uses the adder.

Scoreboard Example Cycle 11

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7                   8
MULTD  F0     F2   F4   6      9
SUBD   F8     F6   F2   7      9              11
DIVD   F10    F0   F6   8
ADDD   F6     F8   F2

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
8     Mult1    Yes   Mult  F0   F2   F4   -        -        No   No
-     Mult2    No
0     Add      Yes   Sub   F8   F6   F2   -        -        No   No
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
11     FU    Mult1  -        -    -        Add   Divide  -         -

Note: Add takes 2 cycles, so nothing happens in cycle 10. MUL continues.

Scoreboard Example Cycle 12

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7                   8
MULTD  F0     F2   F4   6      9
SUBD   F8     F6   F2   7      9              11                  12
DIVD   F10    F0   F6   8
ADDD   F6     F8   F2

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
7     Mult1    Yes   Mult  F0   F2   F4   -        -        No   No
-     Mult2    No
-     Add      No
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
12     FU    Mult1  -        -    -        -     Divide  -         -

Scoreboard Example Cycle 13

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7                   8
MULTD  F0     F2   F4   6      9
SUBD   F8     F6   F2   7      9              11                  12
DIVD   F10    F0   F6   8
ADDD   F6     F8   F2   13

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
6     Mult1    Yes   Mult  F0   F2   F4   -        -        No   No
-     Mult2    No
-     Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
13     FU    Mult1  -        -    Add      -     Divide  -         -

Now ADDD is issued because SUBD has completed.

Scoreboard Example Cycle 14

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7                   8
MULTD  F0     F2   F4   6      9
SUBD   F8     F6   F2   7      9              11                  12
DIVD   F10    F0   F6   8
ADDD   F6     F8   F2   13     14

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
5     Mult1    Yes   Mult  F0   F2   F4   -        -        No   No
-     Mult2    No
2     Add      Yes   Add   F6   F8   F2   -        -        No   No
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
14     FU    Mult1  -        -    Add      -     Divide  -         -

Scoreboard Example Cycle 15

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7                   8
MULTD  F0     F2   F4   6      9
SUBD   F8     F6   F2   7      9              11                  12
DIVD   F10    F0   F6   8
ADDD   F6     F8   F2   13     14

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
4     Mult1    Yes   Mult  F0   F2   F4   -        -        No   No
-     Mult2    No
1     Add      Yes   Add   F6   F8   F2   -        -        No   No
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
15     FU    Mult1  -        -    Add      -     Divide  -         -

Note: ADDD takes 2 cycles, so no change.

Scoreboard Example Cycle 16

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7                   8
MULTD  F0     F2   F4   6      9
SUBD   F8     F6   F2   7      9              11                  12
DIVD   F10    F0   F6   8
ADDD   F6     F8   F2   13     14             16

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
3     Mult1    Yes   Mult  F0   F2   F4   -        -        No   No
-     Mult2    No
0     Add      Yes   Add   F6   F8   F2   -        -        No   No
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
16     FU    Mult1  -        -    Add      -     Divide  -         -

ADDD completes execution, but MULTD and DIVD go on.

Scoreboard Example Cycle 17

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7                   8
MULTD  F0     F2   F4   6      9
SUBD   F8     F6   F2   7      9              11                  12
DIVD   F10    F0   F6   8
ADDD   F6     F8   F2   13     14             16

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
2     Mult1    Yes   Mult  F0   F2   F4   -        -        No   No
-     Mult2    No
-     Add      Yes   Add   F6   F8   F2   -        -        No   No
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
17     FU    Mult1  -        -    Add      -     Divide  -         -

ADDD stalls: it can't write back because of the WAR hazard with DIVD. MULT and DIV continue.

Scoreboard Example Cycle 18

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7                   8
MULTD  F0     F2   F4   6      9
SUBD   F8     F6   F2   7      9              11                  12
DIVD   F10    F0   F6   8
ADDD   F6     F8   F2   13     14             16

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
1     Mult1    Yes   Mult  F0   F2   F4   -        -        No   No
-     Mult2    No
-     Add      Yes   Add   F6   F8   F2   -        -        No   No
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
18     FU    Mult1  -        -    Add      -     Divide  -         -

MULT and DIV continue.

Scoreboard Example Cycle 19

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7                   8
MULTD  F0     F2   F4   6      9              19
SUBD   F8     F6   F2   7      9              11                  12
DIVD   F10    F0   F6   8
ADDD   F6     F8   F2   13     14             16

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
0     Mult1    Yes   Mult  F0   F2   F4   -        -        No   No
-     Mult2    No
-     Add      Yes   Add   F6   F8   F2   -        -        No   No
-     Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
19     FU    Mult1  -        -    Add      -     Divide  -         -

MULT completes execution after 10 cycles.

Scoreboard Example Cycle 20

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7                   8
MULTD  F0     F2   F4   6      9              19                  20
SUBD   F8     F6   F2   7      9              11                  12
DIVD   F10    F0   F6   8
ADDD   F6     F8   F2   13     14             16

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      Yes   Add   F6   F8   F2   -        -        No   No
-     Divide   Yes   Div   F10  F0   F6   -        -        Yes  Yes

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
20     FU    -      -        -    Add      -     Divide  -         -

MULTD completes and writes to F0.

Scoreboard Example Cycle 21

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7                   8
MULTD  F0     F2   F4   6      9              19                  20
SUBD   F8     F6   F2   7      9              11                  12
DIVD   F10    F0   F6   8      21
ADDD   F6     F8   F2   13     14             16

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      Yes   Add   F6   F8   F2   -        -        No   No
-     Divide   Yes   Div   F10  F0   F6   -        -        No   No

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
21     FU    -      -        -    Add      -     Divide  -         -

Now DIVD reads its operands because F0 is available.

Scoreboard Example Cycle 22

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7                   8
MULTD  F0     F2   F4   6      9              19                  20
SUBD   F8     F6   F2   7      9              11                  12
DIVD   F10    F0   F6   8      21
ADDD   F6     F8   F2   13     14             16                  22

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   Yes   Div   F10  F0   F6   -        -        No   No

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
22     FU    -      -        -    -        -     Divide  -         -

ADDD writes its result because the WAR hazard is removed.

Scoreboard Example Cycle 61

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7                   8
MULTD  F0     F2   F4   6      9              19                  20
SUBD   F8     F6   F2   7      9              11                  12
DIVD   F10    F0   F6   8      21             61
ADDD   F6     F8   F2   13     14             16                  22

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   Yes   Div   F10  F0   F6   -        -        No   No

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
61     FU    -      -        -    -        -     Divide  -         -

DIVD completes execution.

Scoreboard Example Cycle 62

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7                   8
MULTD  F0     F2   F4   6      9              19                  20
SUBD   F8     F6   F2   7      9              11                  12
DIVD   F10    F0   F6   8      21             61                  62
ADDD   F6     F8   F2   13     14             16                  22

Functional unit status
(Fi: dest; Fj, Fk: sources S1, S2; Qj, Qk: FU for j, FU for k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
0     Divide   No

Register result status
Clock        F0     F2       F4   F6       F8    F10     F12  ...  F30
62     FU    -      -        -    -        -     -       -         -

Execution is finished.
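The cycle-by-cycle trace can be reproduced with a small simulator. This is a sketch under assumptions, not the CDC 6600 logic itself: each instruction advances at most one stage per clock, a result written in clock t becomes visible (and its unit free) in clock t+1, execution completes `latency` clocks after operands are read, and the unit pool is one Integer, two Mult, one Add, and one Divide unit as in the tables. All names (`Instr`, `simulate`, `build_program`) are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

LATENCY = {"LD": 1, "MULTD": 10, "SUBD": 2, "ADDD": 2, "DIVD": 40}
UNITS = {"Integer": 1, "Mult": 2, "Add": 1, "Divide": 1}  # units per kind


@dataclass
class Instr:
    op: str
    dest: str
    srcs: Tuple[str, ...]
    unit: str
    issue: Optional[int] = None
    read: Optional[int] = None
    done: Optional[int] = None
    write: Optional[int] = None


def pending(ins, t):
    """Issued, and its result is not yet visible at the start of clock t."""
    return ins.issue is not None and (ins.write is None or ins.write >= t)


def simulate(prog, max_cycles=200):
    for t in range(1, max_cycles + 1):
        for i, ins in enumerate(prog):
            earlier = prog[:i]
            if ins.issue is None:
                # Issue: need a free unit (structural) and no pending writer of dest (WAW).
                busy = sum(1 for e in earlier if e.unit == ins.unit and pending(e, t))
                waw = any(e.dest == ins.dest and pending(e, t) for e in earlier)
                if busy < UNITS[ins.unit] and not waw:
                    ins.issue = t
                break  # in-order issue: nothing behind this instruction may issue
            elif ins.read is None:
                # Read operands: wait for every pending producer (RAW).
                if not any(e.dest in ins.srcs and pending(e, t) for e in earlier):
                    ins.read = t
                    ins.done = t + LATENCY[ins.op]
            elif ins.write is None and t > ins.done:
                # Write result: wait until earlier readers of dest have read (WAR).
                war = any(ins.dest in e.srcs and (e.read is None or e.read >= t)
                          for e in earlier)
                if not war:
                    ins.write = t
        if all(ins.write is not None for ins in prog):
            return [(p.issue, p.read, p.done, p.write) for p in prog]
    raise RuntimeError("program did not finish")


def build_program():
    return [
        Instr("LD", "F6", ("R2",), "Integer"),
        Instr("LD", "F2", ("R3",), "Integer"),
        Instr("MULTD", "F0", ("F2", "F4"), "Mult"),
        Instr("SUBD", "F8", ("F6", "F2"), "Add"),
        Instr("DIVD", "F10", ("F0", "F6"), "Divide"),
        Instr("ADDD", "F6", ("F8", "F2"), "Add"),
    ]


times = simulate(build_program())
for ins, row in zip(build_program(), times):
    print(f"{ins.op:6s} {ins.dest:4s} issue/read/exec/write = {row}")
```

Each printed row can be compared against the instruction status table for cycle 62.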