Advanced Pipelining and Instruction Level Parallelism (ILP)

F00: 1 CS/EE 5810 CS/EE 6810 Advanced Pipelining and Instruction Level Parallelism (ILP)



HW Schemes: Instruction Parallelism

• Why in HW at run time?

– Works when real dependences cannot be known at compile time

– Makes the compiler simpler

– Code for one machine runs well on another

• Key idea: Allow instructions behind a stall to proceed

DIVD F0,F2,F4

ADDD F10,F0,F8

SUBD F12,F8,F14

– ADDD must wait for F0 from DIVD, but SUBD depends on neither and can go ahead

– Enables out-of-order execution => out-of-order completion

– ID stage checked both for structural and data hazards

– Scoreboard dates to CDC 6600 in 1963
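The dependences in the three-instruction example above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the `Instr` tuple and the `hazards` helper are hypothetical names introduced only for illustration, and each instruction is assumed to have one destination and two source registers.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    op: str
    dest: str
    src1: str
    src2: str

def hazards(earlier: Instr, later: Instr) -> set[str]:
    """Return the register hazards `later` has with respect to `earlier`."""
    found = set()
    if earlier.dest in (later.src1, later.src2):
        found.add("RAW")   # later reads what earlier writes
    if later.dest in (earlier.src1, earlier.src2):
        found.add("WAR")   # later overwrites what earlier still reads
    if later.dest == earlier.dest:
        found.add("WAW")   # both write the same register
    return found

divd = Instr("DIVD", "F0", "F2", "F4")
addd = Instr("ADDD", "F10", "F0", "F8")
subd = Instr("SUBD", "F12", "F8", "F14")

print(hazards(divd, addd))  # ADDD reads F0, which DIVD writes
print(hazards(divd, subd))  # no shared registers
print(hazards(addd, subd))  # both only read F8
```

Because SUBD has no hazard against either earlier instruction, it is the one that can proceed while ADDD is stalled behind the long-running DIVD.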


Dynamic Scheduling

Compiler scheduling is static scheduling

• Idea: Dynamic HW control of hazard and issue

– Hazards handled are structural and data

» Control hazards are also handled, but ignore them for now

– Issue to EX stage only when things are clear

• Implementation: two approaches in the book

– Control-centric – Scoreboarding

– Data-centric – Tomasulo algorithm

• Variants of these basic schemes are also seen in today’s processors

– Issue queue – decentralized control-centric

– Also distributed versions of the data-centric approach


Scoreboarding

• Out-of-order execution divides ID stage:

1. Issue: decode instructions, check for structural hazards

2. Read operands: wait until no data hazards, then read operands

• Scoreboards allow an instruction to execute whenever 1 & 2 hold, not waiting for prior instructions

• CDC 6600: in-order issue, out-of-order execution, out-of-order commit (also called completion)


Scoreboard Implications

• Out-of-order completion => WAR, WAW hazards?

• Solutions for WAR

– Queue both the operation and copies of its operands

– Read registers only during Read Operands stage

• For WAW, must detect hazard: stall until other completes

• Need to have multiple instructions in execution phase => multiple execution units or pipelined execution units

• Scoreboard keeps track of dependencies and the state of operations

• Scoreboard replaces ID, EX, WB with 4 stages


Rules

Text is a bit underspecified due to ALU/FP focus

• In reality, issue rules are influenced by:

– Instruction window (how many instructions you see)

» Size affects how many instructions are available for issue

» Bigger tends to be better up to a point (bigger is always more expensive…)

» Fill rate – how many instructions can enter the window per cycle

» Issue rate – how many instructions can issue per cycle

– Execution units

» What and how many?

» Result latencies and initiation intervals have a clear influence

– Register file issues (FPRs and GPRs)

» Size and number of read and write ports available

– Bypass (forwarding) capabilities

» Coupled with execution latencies to determine operand availability


Issue Rules

Idea is to only issue when hazards clear

• Structural hazards

– Obvious – desired EX unit must be free

– Bus structure influences

» Sharing of decoded instruction or register file ports can limit issue

» E.g. FMPY and FDIV units may exist, but only one may issue per cycle

• Data Hazards

– More complex due to parallel multi-cycle EX units

– RAW – delay issue until producing instruction clears

– WAW – delay issue if pending Rd target is in EX

– WAR – delay issue of writing inst until reading inst clears operand read stage


Four Stages of Scoreboard Control

1. Issue: decode instructions & check for structural hazards (ID1)

If a functional unit for the instruction is free and no other active instruction has the same destination register (WAW), the scoreboard issues the instruction to the functional unit and updates its internal data structure. If a structural or WAW hazard exists, then the instruction issue stalls, and no further instructions will issue until these hazards are cleared.

2. Read operands: wait until no data hazards, then read operands (ID2)

A source operand is available if no earlier issued active instruction is going to write it, or if the register containing the operand is being written by a currently active functional unit. When the source operands are available, the scoreboard tells the functional unit to proceed to read the operands from the registers and begin execution. The scoreboard resolves RAW hazards dynamically in this step, and instructions may be sent into execution out of order.


Four Stages of Scoreboard Control

3. Execution: operate on operands (EX)

The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.

4. Write result: finish execution (WB)

Once the scoreboard is aware that the functional unit has completed execution, the scoreboard checks for WAR hazards. If there are none, it writes the result. If there is a WAR hazard, it stalls the instruction.

Example:

DIVD F0,F2,F4

ADDD F10,F0,F8

SUBD F8,F8,F14

CDC 6600 scoreboard would stall SUBD until ADDD reads operands
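The write-result check for this example can be sketched as below. This is an illustrative sketch only: the `Unit` class, the `can_write` helper, and the unit names `Add1` and `Add2` are assumptions made so that ADDD and SUBD can be active at the same time; they are not taken from the slides. `Rj`/`Rk` follow the scoreboard convention that Yes (here `True`) means the operand is ready but has not been read yet.

```python
from dataclasses import dataclass

@dataclass
class Unit:
    name: str
    Fi: str           # destination register
    Fj: str           # first source register
    Fk: str           # second source register
    Rj: bool = False  # True: Fj is ready and has not been read yet
    Rk: bool = False  # True: Fk is ready and has not been read yet

def can_write(fu: Unit, active: list[Unit]) -> bool:
    """True when no active unit still has to read the old value of fu.Fi."""
    for f in active:
        if (f.Fj == fu.Fi and f.Rj) or (f.Fk == fu.Fi and f.Rk):
            return False  # WAR: f would otherwise see the new value
    return True

# ADDD F10,F0,F8 is waiting for F0 from DIVD; F8 is ready but unread.
addd = Unit("Add1", Fi="F10", Fj="F0", Fk="F8", Rj=False, Rk=True)
# SUBD F8,F8,F14 has already read its operands and finished executing.
subd = Unit("Add2", Fi="F8", Fj="F8", Fk="F14")
units = [addd, subd]

print(can_write(subd, units))  # SUBD must stall: ADDD has not read F8
addd.Rj = addd.Rk = False      # ADDD reads its operands
print(can_write(subd, units))  # now SUBD may write F8
```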


Three Parts of the Scoreboard

1. Instruction status: which of the 4 steps the instruction is in

2. Functional unit status: indicates the state of the functional unit (FU). 9 fields for each functional unit:

Busy: indicates whether the unit is busy or not
Op: operation to perform in the unit (e.g., + or –)
Fi: destination register
Fj, Fk: source-register numbers
Qj, Qk: functional units producing source registers Fj, Fk
Rj, Rk: flags indicating when Fj, Fk are ready

3. Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register


Detailed Scoreboard Pipeline Control

Instruction status: Issue
Wait until: Not Busy(FU) and not Result(D)
Bookkeeping: Busy(FU)← yes; Op(FU)← op; Fi(FU)← 'D'; Fj(FU)← 'S1'; Fk(FU)← 'S2'; Qj← Result('S1'); Qk← Result('S2'); Rj← not Qj; Rk← not Qk; Result('D')← FU

Instruction status: Read operands
Wait until: Rj and Rk
Bookkeeping: Rj← No; Rk← No

Instruction status: Execution complete
Wait until: Functional unit done
Bookkeeping: (none)

Instruction status: Write result
Wait until: ∀f((Fj(f)≠Fi(FU) or Rj(f)=No) & (Fk(f)≠Fi(FU) or Rk(f)=No))
Bookkeeping: ∀f(if Qj(f)=FU then Rj(f)← Yes); ∀f(if Qk(f)=FU then Rk(f)← Yes); Result(Fi(FU))← 0; Busy(FU)← No
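The issue and write-result bookkeeping can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the CDC 6600 logic: `new_unit`, `try_issue`, and `write_result` are hypothetical helper names, a missing source operand (the load has no Fj) is treated as trivially ready, Qj/Qk are cleared on write as the example tables show, and `write_result` assumes the WAR wait condition has already been satisfied.

```python
def new_unit():
    """An idle functional-unit entry with the nine scoreboard fields."""
    return {"Busy": False, "Op": None, "Fi": None, "Fj": None, "Fk": None,
            "Qj": None, "Qk": None, "Rj": False, "Rk": False}

def try_issue(units, result, fu, op, d, s1, s2):
    """Issue `op d,s1,s2` to unit `fu`; return False on a structural or WAW stall."""
    u = units[fu]
    if u["Busy"] or result.get(d) is not None:   # not Busy(FU) and not Result(D)
        return False
    qj, qk = result.get(s1), result.get(s2)      # units still producing the sources
    u.update(Busy=True, Op=op, Fi=d, Fj=s1, Fk=s2, Qj=qj, Qk=qk,
             Rj=qj is None, Rk=qk is None)       # Rj <- not Qj; Rk <- not Qk
    result[d] = fu                               # Result(D) <- FU
    return True

def write_result(units, result, fu):
    """Write-result bookkeeping: wake up waiting units, free the register and unit."""
    for f in units.values():
        if f["Qj"] == fu:
            f["Qj"], f["Rj"] = None, True
        if f["Qk"] == fu:
            f["Qk"], f["Rk"] = None, True
    result[units[fu]["Fi"]] = None               # Result(Fi(FU)) <- 0
    units[fu] = new_unit()                       # Busy(FU) <- No, other fields cleared

units = {n: new_unit() for n in ("Integer", "Mult1", "Mult2", "Add", "Divide")}
result = {}
try_issue(units, result, "Integer", "Load", "F2", None, "R3")
try_issue(units, result, "Mult1", "Mult", "F0", "F2", "F4")
print(units["Mult1"]["Qj"], units["Mult1"]["Rj"])   # waiting on the Integer unit
print(try_issue(units, result, "Mult2", "Mult", "F0", "F2", "F4"))  # WAW on F0
write_result(units, result, "Integer")
print(units["Mult1"]["Qj"], units["Mult1"]["Rj"])   # F2 is now ready
```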


Scoreboarding Example

• Pick some functional unit latencies

– FP Add / Sub is 2 cycles

– FP Mult is 10 cycles

– FP divide is 40 cycles


Scoreboard Example

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2
LD    F2      45+  R3
MULTD F0      F2   F4
SUBD  F8      F6   F2
DIVD  F10     F0   F6
ADDD  F6      F8   F2

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
       FU


Scoreboard Example Cycle 1

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1
LD    F2      45+  R3
MULTD F0      F2   F4
SUBD  F8      F6   F2
DIVD  F10     F0   F6
ADDD  F6      F8   F2

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  Yes   Load  F6       R2                         Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
1      FU                      Integer


Scoreboard Example Cycle 2

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2
LD    F2      45+  R3
MULTD F0      F2   F4
SUBD  F8      F6   F2
DIVD  F10     F0   F6
ADDD  F6      F8   F2

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  Yes   Load  F6       R2                         Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
2      FU                      Integer

• Issue 2nd LD?


Scoreboard Example Cycle 3

• Issue MULT?

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3
LD    F2      45+  R3
MULTD F0      F2   F4
SUBD  F8      F6   F2
DIVD  F10     F0   F6
ADDD  F6      F8   F2

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  Yes   Load  F6       R2                         No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
3      FU                      Integer


Scoreboard Example Cycle 4

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3
MULTD F0      F2   F4
SUBD  F8      F6   F2
DIVD  F10     F0   F6
ADDD  F6      F8   F2

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
4      FU


Scoreboard Example Cycle 5

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5
MULTD F0      F2   F4
SUBD  F8      F6   F2
DIVD  F10     F0   F6
ADDD  F6      F8   F2

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  Yes   Load  F2       R3                         Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
5      FU         Integer


Scoreboard Example Cycle 6

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6
MULTD F0      F2   F4   6
SUBD  F8      F6   F2
DIVD  F10     F0   F6
ADDD  F6      F8   F2

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  Yes   Load  F2       R3                         Yes
      Mult1    Yes   Mult  F0   F2  F4  Integer           No   Yes
      Mult2    No
      Add      No
      Divide   No

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
6      FU  Mult1  Integer


Scoreboard Example Cycle 7

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7
MULTD F0      F2   F4   6
SUBD  F8      F6   F2   7
DIVD  F10     F0   F6
ADDD  F6      F8   F2

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  Yes   Load  F2       R3                         No
      Mult1    Yes   Mult  F0   F2  F4  Integer           No   Yes
      Mult2    No
      Add      Yes   Sub   F8   F6  F2           Integer  Yes  No
      Divide   No

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
7      FU  Mult1  Integer               Add

• Read multiply operands?


Scoreboard Example Cycle 8a

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7
MULTD F0      F2   F4   6
SUBD  F8      F6   F2   7
DIVD  F10     F0   F6   8
ADDD  F6      F8   F2

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  Yes   Load  F2       R3                         No
      Mult1    Yes   Mult  F0   F2  F4  Integer           No   Yes
      Mult2    No
      Add      Yes   Sub   F8   F6  F2           Integer  Yes  No
      Divide   Yes   Div   F10  F0  F6  Mult1             No   Yes

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
8      FU  Mult1  Integer               Add  Divide


Scoreboard Example Cycle 8b

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6
SUBD  F8      F6   F2   7
DIVD  F10     F0   F6   8
ADDD  F6      F8   F2

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  No
      Mult1    Yes   Mult  F0   F2  F4                    Yes  Yes
      Mult2    No
      Add      Yes   Sub   F8   F6  F2                    Yes  Yes
      Divide   Yes   Div   F10  F0  F6  Mult1             No   Yes

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
8      FU  Mult1                        Add  Divide


Scoreboard Example Cycle 9

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9
SUBD  F8      F6   F2   7      9
DIVD  F10     F0   F6   8
ADDD  F6      F8   F2

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  No
10    Mult1    Yes   Mult  F0   F2  F4                    Yes  Yes
      Mult2    No
2     Add      Yes   Sub   F8   F6  F2                    Yes  Yes
      Divide   Yes   Div   F10  F0  F6  Mult1             No   Yes

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
9      FU  Mult1                        Add  Divide

• Read operands for MULT & SUBD? Issue ADDD?


Scoreboard Example Cycle 11

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9
SUBD  F8      F6   F2   7      9              11
DIVD  F10     F0   F6   8
ADDD  F6      F8   F2

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  No
8     Mult1    Yes   Mult  F0   F2  F4                    No   No
      Mult2    No
0     Add      Yes   Sub   F8   F6  F2                    No   No
      Divide   Yes   Div   F10  F0  F6  Mult1             No   Yes

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
11     FU  Mult1                        Add  Divide


Scoreboard Example Cycle 12

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8
ADDD  F6      F8   F2

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  No
7     Mult1    Yes   Mult  F0   F2  F4                    No   No
      Mult2    No
      Add      No
      Divide   Yes   Div   F10  F0  F6  Mult1             No   Yes

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
12     FU  Mult1                             Divide

• Read operands for DIVD?


Scoreboard Example Cycle 13

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8
ADDD  F6      F8   F2   13

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  No
6     Mult1    Yes   Mult  F0   F2  F4                    No   No
      Mult2    No
      Add      Yes   Add   F6   F8  F2                    Yes  Yes
      Divide   Yes   Div   F10  F0  F6  Mult1             No   Yes

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
13     FU  Mult1               Add           Divide


Scoreboard Example Cycle 14

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8
ADDD  F6      F8   F2   13     14

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  No
5     Mult1    Yes   Mult  F0   F2  F4                    No   No
      Mult2    No
2     Add      Yes   Add   F6   F8  F2                    Yes  Yes
      Divide   Yes   Div   F10  F0  F6  Mult1             No   Yes

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
14     FU  Mult1               Add           Divide


Scoreboard Example Cycle 15

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8
ADDD  F6      F8   F2   13     14

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  No
4     Mult1    Yes   Mult  F0   F2  F4                    No   No
      Mult2    No
1     Add      Yes   Add   F6   F8  F2                    No   No
      Divide   Yes   Div   F10  F0  F6  Mult1             No   Yes

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
15     FU  Mult1               Add           Divide


Scoreboard Example Cycle 16

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8
ADDD  F6      F8   F2   13     14             16

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  No
3     Mult1    Yes   Mult  F0   F2  F4                    No   No
      Mult2    No
0     Add      Yes   Add   F6   F8  F2                    No   No
      Divide   Yes   Div   F10  F0  F6  Mult1             No   Yes

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
16     FU  Mult1               Add           Divide


Scoreboard Example Cycle 17

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8
ADDD  F6      F8   F2   13     14             16

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  No
2     Mult1    Yes   Mult  F0   F2  F4                    No   No
      Mult2    No
      Add      Yes   Add   F6   F8  F2                    No   No
      Divide   Yes   Div   F10  F0  F6  Mult1             No   Yes

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
17     FU  Mult1               Add           Divide

• Write result of ADDD?


Scoreboard Example Cycle 18

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8
ADDD  F6      F8   F2   13     14             16

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  No
1     Mult1    Yes   Mult  F0   F2  F4                    No   No
      Mult2    No
      Add      Yes   Add   F6   F8  F2                    No   No
      Divide   Yes   Div   F10  F0  F6  Mult1             No   Yes

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
18     FU  Mult1               Add           Divide


Scoreboard Example Cycle 19

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              19
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8
ADDD  F6      F8   F2   13     14             16

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  No
0     Mult1    Yes   Mult  F0   F2  F4                    No   No
      Mult2    No
      Add      Yes   Add   F6   F8  F2                    No   No
      Divide   Yes   Div   F10  F0  F6  Mult1             No   Yes

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
19     FU  Mult1               Add           Divide

Figure 4.5 in the book…


Scoreboard Example Cycle 20

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              19                  20
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8
ADDD  F6      F8   F2   13     14             16

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      Yes   Add   F6   F8  F2                    No   No
      Divide   Yes   Div   F10  F0  F6                    Yes  Yes

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
20     FU                      Add           Divide


Scoreboard Example Cycle 21

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              19                  20
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8      21
ADDD  F6      F8   F2   13     14             16

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      Yes   Add   F6   F8  F2                    No   No
      Divide   Yes   Div   F10  F0  F6                    Yes  Yes

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
21     FU                      Add           Divide


Scoreboard Example Cycle 22

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              19                  20
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8      21
ADDD  F6      F8   F2   13     14             16                  22

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
40    Divide   Yes   Div   F10  F0  F6                    No   No

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
22     FU                                    Divide


Scoreboard Example Cycle 61

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              19                  20
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8      21             61
ADDD  F6      F8   F2   13     14             16                  22

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
0     Divide   Yes   Div   F10  F0  F6                    No   No

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
61     FU                                    Divide


Scoreboard Example Cycle 62

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD    F6      34+  R2   1      2              3                   4
LD    F2      45+  R3   5      6              7                   8
MULTD F0      F2   F4   6      9              19                  20
SUBD  F8      F6   F2   7      9              11                  12
DIVD  F10     F0   F6   8      21             61                  62
ADDD  F6      F8   F2   13     14             16                  22

Functional unit status
                           dest S1  S2  FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
0     Divide   No

Register result status
Clock      F0     F2       F4  F6       F8   F10     F12  ...  F30
62     FU
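The cycle-by-cycle tables above can be reproduced with a small simulator. The sketch below is an illustration under stated assumptions rather than a model of the real CDC 6600: `simulate`, `PROGRAM`, `UNITS`, and `LATENCY` are hypothetical names, at most one instruction issues per cycle, every decision in a cycle is taken from the state at the start of that cycle, and the integer unit is given a 1-cycle latency because the tables show the loads completing one cycle after reading operands.

```python
LATENCY = {"Integer": 1, "Add": 2, "Mult": 10, "Divide": 40}   # cycles in EX
UNITS = {"Integer": "Integer", "Mult1": "Mult", "Mult2": "Mult",
         "Add": "Add", "Divide": "Divide"}                      # unit -> kind
PROGRAM = [  # (opcode, unit kind, dest, source j, source k)
    ("LD",    "Integer", "F6",  None, "R2"),
    ("LD",    "Integer", "F2",  None, "R3"),
    ("MULTD", "Mult",    "F0",  "F2", "F4"),
    ("SUBD",  "Add",     "F8",  "F6", "F2"),
    ("DIVD",  "Divide",  "F10", "F0", "F6"),
    ("ADDD",  "Add",     "F6",  "F8", "F2"),
]

def simulate(program):
    """Return one (issue, read, done, write) cycle tuple per instruction."""
    busy, result = {}, {}      # unit -> entry; register -> producing unit
    times = [{} for _ in program]
    nxt = cycle = 0
    while nxt < len(program) or busy:
        cycle += 1
        actions = []           # decided first, applied afterwards
        for name, u in busy.items():
            t = times[u["i"]]
            if "done" in t:    # wants to write: check for WAR hazards
                war = any((f["Fj"] == u["Fi"] and f["Rj"]) or
                          (f["Fk"] == u["Fi"] and f["Rk"])
                          for f in busy.values())
                if not war:
                    actions.append(("write", name))
            elif "read" in t:  # executing: finished after the unit latency
                if cycle - t["read"] >= LATENCY[UNITS[name]]:
                    actions.append(("done", name))
            elif u["Rj"] and u["Rk"]:
                actions.append(("read", name))
        if nxt < len(program):  # in-order issue: structural and WAW checks
            kind, d = program[nxt][1], program[nxt][2]
            free = [n for n, k in UNITS.items() if k == kind and n not in busy]
            if free and d not in result:
                actions.append(("issue", free[0]))
        for what, name in actions:
            if what == "issue":
                _, _, d, s1, s2 = program[nxt]
                qj, qk = result.get(s1), result.get(s2)
                busy[name] = {"i": nxt, "Fi": d, "Fj": s1, "Fk": s2,
                              "Qj": qj, "Qk": qk,
                              "Rj": qj is None, "Rk": qk is None}
                result[d] = name
                times[nxt]["issue"] = cycle
                nxt += 1
                continue
            u = busy[name]
            times[u["i"]][what] = cycle
            if what == "read":
                u["Rj"] = u["Rk"] = False
            elif what == "write":
                for f in busy.values():   # wake up units waiting on this result
                    if f["Qj"] == name:
                        f["Qj"], f["Rj"] = None, True
                    if f["Qk"] == name:
                        f["Qk"], f["Rk"] = None, True
                del result[u["Fi"]], busy[name]
    return [(t["issue"], t["read"], t["done"], t["write"]) for t in times]

for instr, row in zip(PROGRAM, simulate(PROGRAM)):
    print(instr[0], instr[2], row)
```

Under these assumptions the printed rows match the final instruction status table: ADDD finishes executing at cycle 16 but cannot write F6 until DIVD has read it at cycle 21, so it writes at cycle 22.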


CDC 6600 Scoreboard

• Speedup 1.7 from compiler; 2.5 by hand BUT slow memory (no cache) limits benefit

• Limitations of 6600 scoreboard:

– No forwarding hardware

– Limited to instructions in basic block (small window)

– Small number of functional units (structural hazards), especially integer/load store units

– Do not issue on structural hazards

– Wait for WAR hazards

– Prevent WAW hazards


Intrinsic Scoreboard Limits

• Amount of ILP

– Based on true dependencies (RAW hazards)

– Compiler can do a lot to reduce this, unrolling, scheduling, etc.

• Number of Scoreboard entries

– Window size

– Fill rate and issue rate

• Number and type of functional units

– More means fewer structural hazards

• Antidependencies and output dependencies

– WAR and WAW hazards

– Renaming (using virtual registers) and more registers will help

• Plus you can also speculate…
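To illustrate why renaming helps with WAR and WAW hazards, here is a minimal sketch. It assumes an unlimited pool of physical registers named `P0`, `P1`, ...; the `rename` function and these names are hypothetical and are not part of the scoreboard design described above.

```python
from itertools import count

def rename(program, arch_regs):
    """Give every destination a fresh physical register (a sketch of renaming)."""
    fresh = (f"P{n}" for n in count())
    mapping = {r: next(fresh) for r in arch_regs}  # current home of each register
    out = []
    for op, d, s1, s2 in program:
        s1, s2 = mapping[s1], mapping[s2]  # sources use the current mapping
        mapping[d] = next(fresh)           # destination gets a new register
        out.append((op, mapping[d], s1, s2))
    return out

# The WAR example from the scoreboard stages: SUBD overwrites F8, which ADDD reads.
program = [
    ("DIVD", "F0", "F2", "F4"),
    ("ADDD", "F10", "F0", "F8"),
    ("SUBD", "F8", "F8", "F14"),
]
for instr in rename(program, ["F0", "F2", "F4", "F8", "F10", "F14"]):
    print(instr)
```

After renaming, SUBD writes a register that no earlier instruction reads, so the WAR stall disappears, while the true dependence of ADDD on the result of DIVD is preserved through the shared physical register.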