September 15, 2000 Prof. John Kubiatowicz


CS252/Kubiatowicz Lec 5.1

9/15/00

CS252 Graduate Computer Architecture

Lecture 5

Software Scheduling around Hazards
Hardware Out-of-order Scheduling

September 15, 2000

Prof. John Kubiatowicz


ECE 313 Fall 2004

Lecture 19 - Pipelining 3

Techniques to Increase ILP

• Forwarding
• Branch Prediction
• Superpipelining
• Superscalar with Static Multiple Issue
• VLIW
• Superscalar with Dynamic Multiple Issue
• Superscalar with Speculation
• Superscalar with Simultaneous Multithreading (SMT)


Static Multiple Issue

Key idea: issue (decode & execute) multiple instructions in each clock cycle

Example: issue one load/store and one ALU/branch instruction per cycle in MIPS

Instruction type   Pipe stages
ALU or branch      IF  ID  EX  MEM WB
Load/Store         IF  ID  EX  MEM WB
ALU or branch          IF  ID  EX  MEM WB
Load/Store             IF  ID  EX  MEM WB
ALU or branch              IF  ID  EX  MEM WB
Load/Store                 IF  ID  EX  MEM WB
ALU or branch                  IF  ID  EX  MEM WB
Load/Store                     IF  ID  EX  MEM WB

(Fig. 6.44, old 6.57)


Example - A Static Multiple Issue MIPS

[Figure: static two-issue MIPS datapath (Fig. 6.45, old 6.58), with one path labeled "Executes ALU/Branch Instructions" and one labeled "Executes Load/Store Instructions".]


VLIW / EPIC Processors

• VLIW - Very Long Instruction Word
  – Functional units exposed in instruction word
  – Static scheduling by compiler
  – Pipeline is exposed; compiler must schedule delays to get the right result
  – Examples: Philips Trimedia, Texas Instruments C6000

• EPIC - Explicitly Parallel Instruction Computing
  – Three 41-bit instructions in each instruction packet
  – Compiler determines parallelism
  – Hardware checks dependencies and forwards/stalls
  – Examples: Intel Itanium, Itanium 2


Itanium Block Diagram

Source: Extreme Tech www.extremetech.com


Software Manipulation to Increase ILP

Software transformations can increase ILP:
– Code reordering to reduce stalls
– Loop unrolling

Example (p. 438)

Loop: lw   $t0, 0($s1)        # $t0 = array element
      addu $t0, $t0, $s2      # add scalar in $s2
      sw   $t0, 0($s1)        # store result
      addi $s1, $s1, -4       # decrement ptr
      bne  $s1, $zero, Loop

Goal: reorder to speed superscalar execution


Software Manipulation - Reordering Code

Note sparse utilization of superscalar pipeline!

End result: 5 instructions in 4 clocks
– CPI = 0.8
– IPC = 1.25

       ALU or branch instruction    Data transfer instruction    Clock
Loop:                               lw $t0, 0($s1)               1
       addi $s1, $s1, -4                                         2
       addu $t0, $t0, $s2                                        3
       bne $s1, $zero, Loop         sw $t0, 4($s1)               4
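The CPI and IPC figures for the reordered loop follow from counting filled issue slots. A minimal Python sketch, assuming a hypothetical `issue_stats` helper and a schedule written as one (ALU/branch, data transfer) pair per clock, with `None` for an empty slot:

```python
def issue_stats(schedule):
    """Return (IPC, CPI) for a two-issue schedule.

    `schedule` holds one (alu_or_branch, data_transfer) pair per clock;
    an unused issue slot is None. Hypothetical helper for illustration.
    """
    clocks = len(schedule)
    instructions = sum(slot is not None for pair in schedule for slot in pair)
    return instructions / clocks, clocks / instructions


# The reordered loop body: one row per clock.
reordered = [
    (None, "lw $t0, 0($s1)"),
    ("addi $s1, $s1, -4", None),
    ("addu $t0, $t0, $s2", None),
    ("bne $s1, $zero, Loop", "sw $t0, 4($s1)"),
]

ipc, cpi = issue_stats(reordered)
print(ipc, cpi)  # prints: 1.25 0.8
```

Only 5 of the 8 available issue slots are filled, which is the sparse utilization the slide points out.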


Software Manipulation - Loop Unrolling

Assume the loop count is a multiple of 4 and unroll.

End result: 4 loop iterations (14 instructions) in 8 clocks
– IPC = 14/8 = 1.75
– 2 clocks / iteration!

       ALU or branch instruction    Data transfer instruction    Clock
Loop:  addi $s1, $s1, -16           lw $t0, 0($s1)               1
                                    lw $t1, 12($s1)              2
       addu $t0, $t0, $s2           lw $t2, 8($s1)               3
       addu $t1, $t1, $s2           lw $t3, 4($s1)               4
       addu $t2, $t2, $s2           sw $t0, 16($s1)              5
       addu $t3, $t3, $s2           sw $t1, 12($s1)              6
                                    sw $t2, 8($s1)               7
       bne $s1, $zero, Loop         sw $t3, 4($s1)               8
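The unrolling transformation itself can be sketched in a higher-level language. The Python below is an illustration rather than a translation of the MIPS code: `add_scalar` and `add_scalar_unrolled` are hypothetical names, the index walks down from the end of the array as `$s1` does, and the length is assumed to be a multiple of 4.

```python
def add_scalar(a, s):
    """Rolled loop: one element and one pointer update per iteration."""
    out = list(a)
    i = len(out)
    while i != 0:
        i -= 1
        out[i] += s
    return out


def add_scalar_unrolled(a, s):
    """Same computation, unrolled 4x with a distinct temporary per element."""
    if len(a) % 4 != 0:
        raise ValueError("loop count must be a multiple of 4")
    out = list(a)
    i = len(out)
    while i != 0:
        i -= 4                                  # one pointer update per 4 elements
        t0, t1, t2, t3 = out[i + 3], out[i + 2], out[i + 1], out[i]   # four loads
        t0, t1, t2, t3 = t0 + s, t1 + s, t2 + s, t3 + s               # four adds
        out[i + 3], out[i + 2], out[i + 1], out[i] = t0, t1, t2, t3   # four stores
    return out


data = [1, 2, 3, 4, 5, 6, 7, 8]
print(add_scalar_unrolled(data, 10) == add_scalar(data, 10))  # prints: True
```

Using four separate temporaries (`t0` to `t3`, like `$t0` to `$t3`) is what makes the four loads, adds and stores independent of each other, so a scheduler is free to interleave them.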


Techniques to Increase ILP

• Forwarding
• Branch Prediction
• Superpipelining
• Superscalar with Static Multiple Issue
• VLIW
• Superscalar with Dynamic Multiple Issue
• Superscalar with Speculation
• Superscalar with Simultaneous Multithreading (SMT)


Dynamic Multiple Issue

Key ideas:
– "Look past" stalls for instructions that can execute:

      lw   $t0, 20($t2)
      addu $t1, $t0, $s2      # addu stalls until $t0 available
      sub  $s4, $s4, $s3      # sub is ready to execute but blocked by stall!
      slti $t5, $s4, 20

– Execute instructions out of order
– Use multiple functional units for parallel execution
– Forward results between functional units when necessary
– Update registers (in original order of execution)
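A rough sketch of what "looking past" a stall means, assuming a hypothetical `ready_instructions` helper that models only true (RAW) dependences and ignores WAR/WAW hazards and functional-unit limits:

```python
def ready_instructions(window, pending):
    """Return the instructions in `window` that could start executing now.

    window:  (text, dest, sources) tuples in program order, none executed yet
    pending: registers whose values are still being produced (e.g. by a load)
    """
    unavailable = set(pending)
    ready = []
    for text, dest, sources in window:
        if unavailable.isdisjoint(sources):
            ready.append(text)
        # Ready or not, this instruction's own result does not exist yet.
        unavailable.add(dest)
    return ready


# Instructions behind the lw; $t0 is still being loaded.
window = [
    ("addu $t1, $t0, $s2", "$t1", ("$t0", "$s2")),
    ("sub $s4, $s4, $s3", "$s4", ("$s4", "$s3")),
    ("slti $t5, $s4, 20", "$t5", ("$s4",)),
]
print(ready_instructions(window, pending={"$t0"}))  # prints: ['sub $s4, $s4, $s3']
```

An in-order pipeline would hold `sub` behind the stalled `addu`; a dynamically scheduled one can start it immediately, while `slti` still has to wait for the new `$s4`.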


Speculation

• Guess about the outcome of an instruction (e.g., branch or load)
• Based on guess, start executing instructions
• Cancel started instructions if guess is incorrect

Complicating factors:
– Must buffer instruction results until outcome known
– Exceptions in speculated instructions: how can you have an exception in an instruction that didn't execute?


Superscalar Dynamic Pipelining

[Figure: dynamically scheduled superscalar pipeline (Fig. 6.49, old 6.61). An instruction fetch and decode unit issues in order to four reservation stations, which feed the functional units (Integer, Integer, Floating point, Load/Store) that execute out of order; a commit unit then commits in order.]


Can HW reduce CPI to 1- or IPC to 1+?

• Why in HW/at run time?
  – Works when real dependences can't be known at compile time
  – Compiler simpler
  – Code for one machine runs well on another

• Key idea #1: Allow instructions behind a stall to proceed

      DIVD F0,F2,F4
      ADDD F10,F0,F8
      SUBD F12,F8,F14

  – Out-of-order execution => out-of-order completion?

• Key idea #2: Register Renaming

      Original              Renamed
      DIVD F0,F2,F4         DIVD F0,F2,F4
      ADDD F10,F0,F8        ADDD F10,F0,F8
      SUBD F0,F8,F14        SUBD F100,F8,F14
      MULD F6,F10,F0        MULD F6,F10,F100

  – Totally removes WAR and WAW hazards.
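Register renaming can be sketched as a single pass over the instruction stream. This is an illustration under assumptions, not the hardware mechanism: `rename` is a hypothetical helper, the fresh names `P0`, `P1`, ... are invented, and every destination is renamed, whereas the slide renames only the conflicting `F0` (to `F100`).

```python
from itertools import count


def rename(program):
    """Give every destination a fresh register; redirect later reads to it.

    program: list of (op, dest, src1, src2) in program order.
    """
    fresh = (f"P{n}" for n in count())
    current = {}                      # architectural name -> latest fresh name
    renamed = []
    for op, dest, src1, src2 in program:
        # Sources read whichever name currently holds the value (keeps RAW).
        src1, src2 = current.get(src1, src1), current.get(src2, src2)
        current[dest] = next(fresh)   # a new name per write removes WAR and WAW
        renamed.append((op, current[dest], src1, src2))
    return renamed


program = [
    ("DIVD", "F0", "F2", "F4"),
    ("ADDD", "F10", "F0", "F8"),
    ("SUBD", "F0", "F8", "F14"),
    ("MULD", "F6", "F10", "F0"),
]
for instr in rename(program):
    print(instr)
```

After renaming, `SUBD` writes `P2` instead of `F0`, so it no longer conflicts with `DIVD`'s write or `ADDD`'s read of `F0`, and `MULD` reads `P2`, preserving the true dependence.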


Moving beyond the five-stage pipeline:

• Why limit performance for slow/less frequent ops?
  – Variable latencies -> out-of-order execution desirable
• How do we prevent WAR and WAW hazards?
• How do we deal with variable latency?
  – Forwarding for RAW hazards will be harder.

                    Clock Cycle Number
Instruction         1     2     3     4     5     6     7     8     9     10    11    12    13    14    15    16    17
LD    F6,34(R2)     IF    ID    EX    MEM   WB
LD    F2,45(R3)           IF    ID    EX    MEM   WB
MULTD F0,F2,F4                  IF    ID    stall M1    M2    M3    M4    M5    M6    M7    M8    M9    M10   MEM   WB
SUBD  F8,F6,F2                        IF    ID    A1    A2    MEM   WB
DIVD  F10,F0,F6                             IF    ID    stall stall stall stall stall stall stall stall stall D1    D2
ADDD  F6,F8,F2                                    IF    ID    A1    A2    MEM   WB

Hazards marked on the slide:
– RAW (DIVD waits for F0 from MULTD)
– WAR (ADDD writes F6 while the stalled DIVD still needs the old F6)
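The RAW, WAR and WAW relationships in this instruction sequence can be enumerated mechanically. A minimal sketch, assuming a hypothetical `hazards` helper that compares every ordered pair of instructions by register name only; it models no pipeline timing and does not filter out intervening redefinitions, so it reports potential hazards rather than actual stalls:

```python
def hazards(program):
    """List (kind, earlier_index, later_index, register) dependences.

    program: list of (op, dest, sources) in program order.
    """
    found = []
    for i, (_, dest_i, src_i) in enumerate(program):
        for j in range(i + 1, len(program)):
            _, dest_j, src_j = program[j]
            if dest_i in src_j:
                found.append(("RAW", i, j, dest_i))   # later reads earlier's result
            if dest_j in src_i:
                found.append(("WAR", i, j, dest_j))   # later overwrites earlier's source
            if dest_i == dest_j:
                found.append(("WAW", i, j, dest_i))   # both write the same register
    return found


program = [
    ("LD", "F6", ("R2",)),
    ("LD", "F2", ("R3",)),
    ("MULTD", "F0", ("F2", "F4")),
    ("SUBD", "F8", ("F6", "F2")),
    ("DIVD", "F10", ("F0", "F6")),
    ("ADDD", "F6", ("F8", "F2")),
]
for kind, i, j, reg in hazards(program):
    print(f"{kind}: {program[i][0]} -> {program[j][0]} on {reg}")
```

Among the pairs reported are the RAW dependence of `DIVD` on `MULTD` through `F0` and the WAR dependence between `DIVD` and `ADDD` through `F6`.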


Scoreboard Architecture (CDC 6600)

[Figure: registers connected to the functional units (two FP Mult, FP Divide, FP Add, Integer) and to memory, with the SCOREBOARD controlling both the registers and the functional units.]


Basic Pipelined MIPS

[Figure: basic five-stage pipelined MIPS datapath with IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers, instruction memory, register file, sign extension, ALU, data memory, branch-target adder, and control signals.]


Four Stages of Scoreboard Control

• Issue: decode instructions & check for structural hazards (ID1)
  – Instructions issued in program order (for hazard checking)
  – Don't issue if structural hazard
  – Don't issue if instruction is output dependent on any previously issued but uncompleted instruction (WAW hazards)

• Read operands: wait until no data hazards, then read operands (ID2)
  – All real dependencies (RAW hazards) resolved in this stage, since we wait for instructions to write back data.
  – No forwarding of data in this model!


Four Stages of Scoreboard Control

• Execution: operate on operands (EX)
  – The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.

• Write result: finish execution (WB)
  – Stall until no WAR hazards with previous instructions:

      Example: DIVD F0,F2,F4
               ADDD F10,F0,F8
               SUBD F8,F8,F14

    CDC 6600 scoreboard would stall SUBD until ADDD reads operands


Three Parts of the Scoreboard

• Instruction status: which of 4 steps the instruction is in

• Functional unit status: indicates the state of the functional unit (FU). 9 fields for each functional unit:
    Busy:    Indicates whether the unit is busy or not
    Op:      Operation to perform in the unit (e.g., + or -)
    Fi:      Destination register
    Fj, Fk:  Source-register numbers
    Qj, Qk:  Functional units producing source registers Fj, Fk
    Rj, Rk:  Flags indicating when Fj, Fk are ready

• Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instructions will write that register


Scoreboard Example

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2
LD    F2    45+  R3
MULTD F0    F2   F4
SUBD  F8    F6   F2
DIVD  F10   F0   F6
ADDD  F6    F8   F2

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
       FU


Scoreboard Example: Cycle 1

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1
LD    F2    45+  R3
MULTD F0    F2   F4
SUBD  F8    F6   F2
DIVD  F10   F0   F6
ADDD  F6    F8   F2

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F6        R2                          Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
1      FU                       Integer


Detailed Scoreboard Pipeline Control

Issue
  Wait until:   Not Busy(FU) and not Result(D)
  Bookkeeping:  Busy(FU) ← yes; Op(FU) ← op; Fi(FU) ← 'D'; Fj(FU) ← 'S1'; Fk(FU) ← 'S2';
                Qj ← Result('S1'); Qk ← Result('S2'); Rj ← not Qj; Rk ← not Qk; Result('D') ← FU

Read operands
  Wait until:   Rj and Rk
  Bookkeeping:  Rj ← No; Rk ← No

Execution complete
  Wait until:   Functional unit done
  Bookkeeping:  (none)

Write result
  Wait until:   ∀f((Fj(f) ≠ Fi(FU) or Rj(f) = No) & (Fk(f) ≠ Fi(FU) or Rk(f) = No))
  Bookkeeping:  ∀f(if Qj(f) = FU then Rj(f) ← Yes); ∀f(if Qk(f) = FU then Rk(f) ← Yes);
                Result(Fi(FU)) ← 0; Busy(FU) ← No


Scoreboard Example: Cycle 2

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2
LD    F2    45+  R3
MULTD F0    F2   F4
SUBD  F8    F6   F2
DIVD  F10   F0   F6
ADDD  F6    F8   F2

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
1     Integer  Yes   Load  F6        R2                          No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
2      FU                       Integer

• Can we enter Issue for 2nd LD?


Scoreboard Example: Cycle 3

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3
LD    F2    45+  R3
MULTD F0    F2   F4
SUBD  F8    F6   F2
DIVD  F10   F0   F6
ADDD  F6    F8   F2

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
0     Integer  Yes   Load  F6        R2                          No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
3      FU                       Integer

• Issue MULT (in order)?


Scoreboard Example: Cycle 4

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3      4
LD    F2    45+  R3
MULTD F0    F2   F4
SUBD  F8    F6   F2
DIVD  F10   F0   F6
ADDD  F6    F8   F2

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
4      FU


Scoreboard Example: Cycle 5

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3      4
LD    F2    45+  R3   5
MULTD F0    F2   F4
SUBD  F8    F6   F2
DIVD  F10   F0   F6
ADDD  F6    F8   F2

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F2        R3                          Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
5      FU         Integer


Scoreboard Example: Cycle 6

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3      4
LD    F2    45+  R3   5      6
MULTD F0    F2   F4   6
SUBD  F8    F6   F2
DIVD  F10   F0   F6
ADDD  F6    F8   F2

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F2        R3                          No
      Mult1    Yes   Mult  F0   F2   F4   Integer           No   Yes
      Mult2    No
      Add      No
      Divide   No

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
6      FU  Mult1  Integer


Scoreboard Example: Cycle 7

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3      4
LD    F2    45+  R3   5      6      7
MULTD F0    F2   F4   6
SUBD  F8    F6   F2   7
DIVD  F10   F0   F6
ADDD  F6    F8   F2

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F2        R3                          No
      Mult1    Yes   Mult  F0   F2   F4   Integer           No   Yes
      Mult2    No
      Add      Yes   Sub   F8   F6   F2            Integer  Yes  No
      Divide   No

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
7      FU  Mult1  Integer                Add

• Read multiply operands?


Scoreboard Example: Cycle 8a

(First half of clock cycle)

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3      4
LD    F2    45+  R3   5      6      7
MULTD F0    F2   F4   6
SUBD  F8    F6   F2   7
DIVD  F10   F0   F6   8
ADDD  F6    F8   F2

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F2        R3                          No
      Mult1    Yes   Mult  F0   F2   F4   Integer           No   Yes
      Mult2    No
      Add      Yes   Sub   F8   F6   F2            Integer  Yes  No
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
8      FU  Mult1  Integer                Add   Divide


Scoreboard Example: Cycle 8b

(Second half of clock cycle)

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3      4
LD    F2    45+  R3   5      6      7      8
MULTD F0    F2   F4   6
SUBD  F8    F6   F2   7
DIVD  F10   F0   F6   8
ADDD  F6    F8   F2

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
      Mult2    No
      Add      Yes   Sub   F8   F6   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
8      FU  Mult1                         Add   Divide


Scoreboard Example: Cycle 9

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3      4
LD    F2    45+  R3   5      6      7      8
MULTD F0    F2   F4   6      9
SUBD  F8    F6   F2   7      9
DIVD  F10   F0   F6   8
ADDD  F6    F8   F2

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
10    Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
      Mult2    No
2     Add      Yes   Sub   F8   F6   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
9      FU  Mult1                         Add   Divide

Note: the Time column shows the remaining execution cycles.

• Read operands for MULT & SUB? Issue ADDD?


Scoreboard Example: Cycle 10

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3      4
LD    F2    45+  R3   5      6      7      8
MULTD F0    F2   F4   6      9
SUBD  F8    F6   F2   7      9
DIVD  F10   F0   F6   8
ADDD  F6    F8   F2

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
9     Mult1    Yes   Mult  F0   F2   F4                     No   No
      Mult2    No
1     Add      Yes   Sub   F8   F6   F2                     No   No
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
10     FU  Mult1                         Add   Divide


Scoreboard Example: Cycle 11

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3      4
LD    F2    45+  R3   5      6      7      8
MULTD F0    F2   F4   6      9
SUBD  F8    F6   F2   7      9      11
DIVD  F10   F0   F6   8
ADDD  F6    F8   F2

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
8     Mult1    Yes   Mult  F0   F2   F4                     No   No
      Mult2    No
0     Add      Yes   Sub   F8   F6   F2                     No   No
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
11     FU  Mult1                         Add   Divide


Scoreboard Example: Cycle 12

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3      4
LD    F2    45+  R3   5      6      7      8
MULTD F0    F2   F4   6      9
SUBD  F8    F6   F2   7      9      11     12
DIVD  F10   F0   F6   8
ADDD  F6    F8   F2

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
7     Mult1    Yes   Mult  F0   F2   F4                     No   No
      Mult2    No
      Add      No
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
12     FU  Mult1                               Divide

• Read operands for DIVD?


Scoreboard Example: Cycle 13

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3      4
LD    F2    45+  R3   5      6      7      8
MULTD F0    F2   F4   6      9
SUBD  F8    F6   F2   7      9      11     12
DIVD  F10   F0   F6   8
ADDD  F6    F8   F2   13

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
6     Mult1    Yes   Mult  F0   F2   F4                     No   No
      Mult2    No
      Add      Yes   Add   F6   F8   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
13     FU  Mult1                Add            Divide


Scoreboard Example: Cycle 14

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3      4
LD    F2    45+  R3   5      6      7      8
MULTD F0    F2   F4   6      9
SUBD  F8    F6   F2   7      9      11     12
DIVD  F10   F0   F6   8
ADDD  F6    F8   F2   13     14

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
5     Mult1    Yes   Mult  F0   F2   F4                     No   No
      Mult2    No
2     Add      Yes   Add   F6   F8   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
14     FU  Mult1                Add            Divide


Scoreboard Example: Cycle 15

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3      4
LD    F2    45+  R3   5      6      7      8
MULTD F0    F2   F4   6      9
SUBD  F8    F6   F2   7      9      11     12
DIVD  F10   F0   F6   8
ADDD  F6    F8   F2   13     14

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
4     Mult1    Yes   Mult  F0   F2   F4                     No   No
      Mult2    No
1     Add      Yes   Add   F6   F8   F2                     No   No
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
15     FU  Mult1                Add            Divide


Scoreboard Example: Cycle 16

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3      4
LD    F2    45+  R3   5      6      7      8
MULTD F0    F2   F4   6      9
SUBD  F8    F6   F2   7      9      11     12
DIVD  F10   F0   F6   8
ADDD  F6    F8   F2   13     14     16

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
3     Mult1    Yes   Mult  F0   F2   F4                     No   No
      Mult2    No
0     Add      Yes   Add   F6   F8   F2                     No   No
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
16     FU  Mult1                Add            Divide


Scoreboard Example: Cycle 17

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3      4
LD    F2    45+  R3   5      6      7      8
MULTD F0    F2   F4   6      9
SUBD  F8    F6   F2   7      9      11     12
DIVD  F10   F0   F6   8
ADDD  F6    F8   F2   13     14     16

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
2     Mult1    Yes   Mult  F0   F2   F4                     No   No
      Mult2    No
      Add      Yes   Add   F6   F8   F2                     No   No
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
17     FU  Mult1                Add            Divide

• Why not write result of ADD? WAR hazard!


Scoreboard Example: Cycle 18

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3      4
LD    F2    45+  R3   5      6      7      8
MULTD F0    F2   F4   6      9
SUBD  F8    F6   F2   7      9      11     12
DIVD  F10   F0   F6   8
ADDD  F6    F8   F2   13     14     16

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
1     Mult1    Yes   Mult  F0   F2   F4                     No   No
      Mult2    No
      Add      Yes   Add   F6   F8   F2                     No   No
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
18     FU  Mult1                Add            Divide


Scoreboard Example: Cycle 19

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3      4
LD    F2    45+  R3   5      6      7      8
MULTD F0    F2   F4   6      9      19
SUBD  F8    F6   F2   7      9      11     12
DIVD  F10   F0   F6   8
ADDD  F6    F8   F2   13     14     16

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
0     Mult1    Yes   Mult  F0   F2   F4                     No   No
      Mult2    No
      Add      Yes   Add   F6   F8   F2                     No   No
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
19     FU  Mult1                Add            Divide


Scoreboard Example: Cycle 20

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3      4
LD    F2    45+  R3   5      6      7      8
MULTD F0    F2   F4   6      9      19     20
SUBD  F8    F6   F2   7      9      11     12
DIVD  F10   F0   F6   8
ADDD  F6    F8   F2   13     14     16

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      Yes   Add   F6   F8   F2                     No   No
      Divide   Yes   Div   F10  F0   F6                     Yes  Yes

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
20     FU                       Add            Divide


Scoreboard Example: Cycle 21

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3      4
LD    F2    45+  R3   5      6      7      8
MULTD F0    F2   F4   6      9      19     20
SUBD  F8    F6   F2   7      9      11     12
DIVD  F10   F0   F6   8      21
ADDD  F6    F8   F2   13     14     16

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      Yes   Add   F6   F8   F2                     No   No
40    Divide   Yes   Div   F10  F0   F6                     Yes  Yes

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
21     FU                       Add            Divide

• WAR Hazard is now gone...


Scoreboard Example: Cycle 22

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3      4
LD    F2    45+  R3   5      6      7      8
MULTD F0    F2   F4   6      9      19     20
SUBD  F8    F6   F2   7      9      11     12
DIVD  F10   F0   F6   8      21
ADDD  F6    F8   F2   13     14     16     22

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
39    Divide   Yes   Div   F10  F0   F6                     No   No

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
22     FU                                      Divide


(skip a few cycles)


Scoreboard Example: Cycle 61

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3      4
LD    F2    45+  R3   5      6      7      8
MULTD F0    F2   F4   6      9      19     20
SUBD  F8    F6   F2   7      9      11     12
DIVD  F10   F0   F6   8      21     61
ADDD  F6    F8   F2   13     14     16     22

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
0     Divide   Yes   Div   F10  F0   F6                     No   No

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
61     FU                                      Divide


Scoreboard Example: Cycle 62

Instruction status:
                             Read   Exec   Write
Instruction j    k    Issue  Oper   Comp   Result
LD    F6    34+  R2   1      2      3      4
LD    F2    45+  R3   5      6      7      8
MULTD F0    F2   F4   6      9      19     20
SUBD  F8    F6   F2   7      9      11     12
DIVD  F10   F0   F6   8      21     61     62
ADDD  F6    F8   F2   13     14     16     22

Functional unit status:
                           dest S1   S2   FU       FU       Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status:
Clock      F0     F2       F4   F6       F8    F10     F12  ...  F30
62     FU
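The cycle-by-cycle walkthrough can be reproduced with a small simulator. The Python below is a sketch under assumptions, not the CDC 6600 logic: the names `Instr`, `simulate` and `example_program` are hypothetical; the latencies (load 1, add/sub 2, multiply 10, divide 40 cycles) are read off the Time column of the tables; one instruction issues per cycle; and every decision in a cycle uses the scoreboard state left by the previous cycle.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed latencies and unit assignment, inferred from the example tables.
LATENCY = {"LD": 1, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}
UNITS = {"LD": ("Integer",), "ADDD": ("Add",), "SUBD": ("Add",),
         "MULTD": ("Mult1", "Mult2"), "DIVD": ("Divide",)}


@dataclass(eq=False)
class Instr:
    op: str
    dest: str                    # Fi
    src1: Optional[str]          # Fj (None for a load's immediate)
    src2: Optional[str]          # Fk
    fu: Optional[str] = None
    qj: Optional[str] = None     # unit that will produce Fj, if any
    qk: Optional[str] = None
    rj: bool = False             # Fj ready and not yet read
    rk: bool = False
    issue: Optional[int] = None
    read: Optional[int] = None
    done: Optional[int] = None
    write: Optional[int] = None


def simulate(program):
    """Return (issue, read operands, exec complete, write result) per instruction."""
    busy = {fu: None for fus in UNITS.values() for fu in fus}
    result = {}                  # register result status: register -> unit
    pc = t = 0
    while any(i.write is None for i in program):
        t += 1
        active = [i for i in busy.values() if i is not None]
        # Decide all transitions from the state at the start of the cycle.
        writers = [w for w in active if w.done is not None and all(
            (f.src1 != w.dest or not f.rj) and (f.src2 != w.dest or not f.rk)
            for f in active if f is not w)]                  # no WAR hazard
        readers = [r for r in active if r.read is None and r.rj and r.rk]
        finishers = [d for d in active if d.read is not None
                     and d.done is None and t - d.read == LATENCY[d.op]]
        nxt = program[pc] if pc < len(program) else None
        unit = None
        if nxt is not None and nxt.dest not in result:       # no WAW hazard
            unit = next((u for u in UNITS[nxt.op] if busy[u] is None), None)
        # Apply the bookkeeping for each stage.
        for w in writers:
            w.write = t
            for f in active:
                if f.qj == w.fu:
                    f.qj, f.rj = None, True
                if f.qk == w.fu:
                    f.qk, f.rk = None, True
            result.pop(w.dest, None)
            busy[w.fu] = None
        for r in readers:
            r.read, r.rj, r.rk = t, False, False
        for d in finishers:
            d.done = t
        if unit is not None:
            nxt.fu, nxt.issue = unit, t
            nxt.qj, nxt.qk = result.get(nxt.src1), result.get(nxt.src2)
            nxt.rj, nxt.rk = nxt.qj is None, nxt.qk is None
            busy[unit], result[nxt.dest] = nxt, unit
            pc += 1
    return [(i.issue, i.read, i.done, i.write) for i in program]


def example_program():
    return [Instr("LD", "F6", None, "R2"), Instr("LD", "F2", None, "R3"),
            Instr("MULTD", "F0", "F2", "F4"), Instr("SUBD", "F8", "F6", "F2"),
            Instr("DIVD", "F10", "F0", "F6"), Instr("ADDD", "F6", "F8", "F2")]


for row in simulate(example_program()):
    print(row)
```

Under these assumptions the printed tuples match the instruction status table at cycle 62: (1, 2, 3, 4), (5, 6, 7, 8), (6, 9, 19, 20), (7, 9, 11, 12), (8, 21, 61, 62) and (13, 14, 16, 22). The WAR check in `writers` is what holds ADDD's write back from cycle 17 until cycle 22, after DIVD has read F6 in cycle 21.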


Review: Scoreboard Example: Cycle 62


• In-order issue; out-of-order execute & commit
