Advanced Pipelining


Page 1: Advanced Pipelining

COMP25212

Advanced Pipelining

Out of Order Processors

Page 2: Advanced Pipelining

Overview and Learning Outcomes

• Find out how modern processors work

• Understand the evolution of processors

• Learn how out-of-order processors can improve processor performance

• Discover architectural solutions to support and improve out-of-order execution


Page 3: Advanced Pipelining

Remember from last week

• Classic 5-stage pipeline

• Control Hazards

• Data Hazards

• Instruction Level Parallelism

• Superscalar

• Out of order execution

Page 4: Advanced Pipelining

Classic 5-stage pipeline

[Figure: five pipeline stages in sequence, Fetch Logic → Decode Logic → Exec Logic → Mem Logic → Write Logic, together with an Inst Cache and a Data Cache]

• A single execution flow

Page 5: Advanced Pipelining

Modern Pipelines

[Figure: Fetch (from the Inst Cache) and Decode feed several parallel Functional Units, each ending in its own Write Back stage: Add1; Ld1 → Ld2; Mul1 → Mul2 → Mul3; Div1 → Div2 → Div3]

• Many execution flows

Page 6: Advanced Pipelining

In ARM Processors

In-order processor

Out of order processor

Page 7: Advanced Pipelining

Out of Order Execution

• The original order in a program is not preserved

• Processors execute instructions as input data becomes available

• Pipeline stalls due to conflicted instructions are avoided by processing instructions which are able to run immediately

• Take advantage of ILP

• Instructions per cycle increases

Page 8: Advanced Pipelining

Conflicted Instructions

• Cache misses: long wait before finishing execution

• Structural Hazard: the required resources are not available

• Data hazard: dependencies between instructions

Page 9: Advanced Pipelining

Structural Hazards

• Functional Units are typically not pipelined

• This means only one instruction can use them at once

• If all suitable Functional Units for executing an instruction are busy, then the instruction cannot be executed

Page 10: Advanced Pipelining

Modern Pipelines

Fetch

Decode

Add1

Ld2

Write

Back

Inst Cache

Write

Back

Ld1

Mul3

Write

Back

Mul1

Mul2

Div3

Write

Back

Div1

Div2

• Many execution flows

Functional Units

Page 11: Advanced Pipelining

Data dependencies

• True dependency: Read-after-write (RAW)

r1 <- r2 op r3

r4 <- r1 op r5

• Anti-dependency: Write-after-read (WAR)

r1 <- r2 op r3

r2 <- r4 op r5

• Output dependency: Write-after-write (WAW)

r1 <- r2 op r3

r1 <- r4 op r5
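The three dependency kinds can be checked mechanically by comparing destination and source registers. The following is a minimal sketch, not part of the original slides; the `Instr` tuple and the `hazards` function are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record: one destination, several sources."""
    dest: str
    srcs: Tuple[str, ...]


def hazards(first: Instr, second: Instr) -> Set[str]:
    """Dependencies of `second` on an earlier instruction `first`."""
    found = set()
    if first.dest in second.srcs:   # second reads what first writes
        found.add("RAW")
    if second.dest in first.srcs:   # second overwrites what first reads
        found.add("WAR")
    if first.dest == second.dest:   # both write the same register
        found.add("WAW")
    return found


# The three register patterns shown on the slide.
print(hazards(Instr("r1", ("r2", "r3")), Instr("r4", ("r1", "r5"))))  # {'RAW'}
print(hazards(Instr("r1", ("r2", "r3")), Instr("r2", ("r4", "r5"))))  # {'WAR'}
print(hazards(Instr("r1", ("r2", "r3")), Instr("r1", ("r4", "r5"))))  # {'WAW'}
```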

Page 12: Advanced Pipelining

• Key Idea: Allow instructions behind a stall to proceed => instructions execute in parallel. There are multiple execution units, so use them.

DIVD F0, F2, F4

ADDD F10, F0, F8

SUBD F12, F8, F14

– Enables out-of-order execution => out-of-order completion

Even though ADDD stalls, the SUBD has no dependencies and can run.

Dynamic Scheduling

Dynamic pipeline scheduling overcomes the limitations of in-order pipelined execution by allowing out-of-order instruction execution

Page 13: Advanced Pipelining

Out of Order Execution with Scoreboard

Page 14: Advanced Pipelining

Scoreboard

• The scoreboard is a centralized hardware mechanism

– Instructions are executed as soon as their operands are available and there are no hazard conditions

• It dynamically constructs the dependency graph by hardware for a window of instructions as they are issued in program order

• The scoreboard is a “data structure” that provides the information necessary for all pieces of the processor to work together

(In Appendix A.8) CDC6600 (1963)

Page 15: Advanced Pipelining

• Out-of-order execution divides the ID stage:

1. Issue: decode instructions, check for structural hazards

2. Read operands: wait until no data hazards, then read operands

• Scoreboards allow an instruction to execute whenever 1 & 2 hold, not waiting for prior instructions

• We will use in-order issue, out-of-order execution, out-of-order commit (also called completion)

The Key idea of Scoreboards

Page 16: Advanced Pipelining

Typical Scoreboard Structure

Page 17: Advanced Pipelining

[Figure: Issue → Read Operands → one of five parallel execute units (Integer, FP Multiplication, FP Multiplication, FP Add, FP Division), each followed by its own Write Back]

Stages of a Scoreboard Pipeline

Page 18: Advanced Pipelining

1. Issue —decode instructions & check for structural & WAW hazards (ID)

If a functional unit for the instruction is free (no structural hazards) and no other active instruction has the same destination register (no WAW), the scoreboard issues the instruction to the functional unit and updates its internal data structure.

If a structural or WAW hazard exists, then the instruction issue stalls, and no further instructions will issue until these hazards are cleared.

2. Read operands —wait until no data hazards, then read operands (RO)

A source operand is available if no earlier issued active instruction is going to write it, or if the register containing the operand is being written by a currently active functional unit (no RAW).

When the source operands are available, the scoreboard tells the functional unit to proceed to read the operands from the registers and begin execution. The scoreboard resolves RAW hazards dynamically in this step, and instructions may be sent into execution out of order.

Issue: always done in program order

Read operands: can be done out of program order

Stages of a Scoreboard Pipeline
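The two checks described for the Issue and Read operands stages can be sketched as small predicates. This is an illustrative sketch under assumptions, not the hardware logic itself: the function names, the `busy` map (one busy flag per unit of each kind) and the `reg_result` map (register to the functional unit that will write it) are hypothetical representations chosen for the example.

```python
from typing import Dict, List, Optional


def can_issue(kind: str, dest: str,
              busy: Dict[str, List[bool]],
              reg_result: Dict[str, Optional[str]]) -> bool:
    """Issue check: a unit of the right kind is free (no structural hazard)
    and no active instruction already targets `dest` (no WAW)."""
    has_free_unit = not all(busy[kind])
    no_waw = reg_result.get(dest) is None
    return has_free_unit and no_waw


def can_read_operands(qj: Optional[str], qk: Optional[str]) -> bool:
    """Read-operands check: neither source is still due to be produced
    by a functional unit (no RAW)."""
    return qj is None and qk is None


# Assumed state: a load to F6 occupies the only Integer unit.
busy = {"Integer": [True], "Mult": [False, False], "Add": [False], "Divide": [False]}
reg_result = {"F6": "Integer"}

print(can_issue("Integer", "F2", busy, reg_result))  # False: structural hazard
print(can_issue("Integer", "F6", {"Integer": [False]}, reg_result))  # False: WAW on F6
print(can_issue("Mult", "F0", busy, reg_result))     # True: unit free, no WAW
print(can_read_operands("Integer", None))            # False: source still pending
```

Note that `can_issue` only covers the hazard checks; because issue is in program order, an instruction that passes the check still waits behind any earlier instruction that has not issued.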

Page 19: Advanced Pipelining

3. Execution —operate on operands (EX)

The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.

4. Write result —finish execution (WB)

Once the scoreboard is aware of the fact that the functional unit has completed execution, the scoreboard checks for WAR hazards. If none, it writes results. If WAR, then it stalls the instruction.

Example:

DIVD F0,F2,F4

ADDD F10,F0,F8

SUBD F8,F8,F14

Scoreboard would stall SUBD until ADDD reads operands

Stages of a Scoreboard Pipeline
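The WAR check made before writing a result can be sketched as follows. This is a minimal illustration, assuming each earlier in-flight instruction is summarised as a pair of its source registers and a flag saying whether it has already read them; `can_write_result` is a hypothetical name.

```python
from typing import Iterable, Tuple


def can_write_result(dest: str,
                     earlier_in_flight: Iterable[Tuple[Tuple[str, ...], bool]]) -> bool:
    """A result may be written only if no earlier instruction still has to
    read the old value of `dest` (no WAR hazard)."""
    return not any(dest in srcs and not has_read
                   for srcs, has_read in earlier_in_flight)


# SUBD F8,F8,F14 has finished executing and wants to write F8.
# DIVD F0,F2,F4 has read its operands; ADDD F10,F0,F8 is still waiting for F0.
before = [(("F2", "F4"), True), (("F0", "F8"), False)]
after = [(("F2", "F4"), True), (("F0", "F8"), True)]

print(can_write_result("F8", before))  # False: ADDD has not read F8 yet
print(can_write_result("F8", after))   # True: ADDD has read its operands
```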

Page 20: Advanced Pipelining

1. Instruction status—which of 4 stages the instruction is in

2. Functional unit status—Indicates the state of the functional unit (FU). 9 fields for each functional unit

Busy—Indicates whether the unit is being used or not

Op—Operation to perform in the unit (e.g., + or –)

Fi—Destination register

Fj, Fk—Source-register numbers

Qj, Qk—Functional units producing source registers Fj, Fk

Rj, Rk—Flags indicating when Fj, Fk are ready. Set to No after operands are read.

3. Register result status—Indicates which functional unit will write each register, if one exists. Blank when no pending instructions will write that register

Information within the Scoreboard
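One way to picture the three parts of the scoreboard is as plain records. The sketch below is an assumption-laden illustration rather than a description of real hardware: the class names and the use of Python dictionaries are hypothetical, while the nine functional-unit fields follow the list above.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, Optional


@dataclass
class FunctionalUnitStatus:
    """The nine per-unit fields listed on the slide."""
    busy: bool = False
    op: Optional[str] = None   # operation to perform
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # first source register
    fk: Optional[str] = None   # second source register
    qj: Optional[str] = None   # unit producing Fj, if still pending
    qk: Optional[str] = None   # unit producing Fk, if still pending
    rj: bool = False           # is Fj ready?
    rk: bool = False           # is Fk ready?


@dataclass
class Scoreboard:
    # 1. Instruction status: instruction -> stage it is in
    instruction_status: Dict[str, str] = field(default_factory=dict)
    # 2. Functional unit status: unit name -> its nine fields
    fu_status: Dict[str, FunctionalUnitStatus] = field(default_factory=dict)
    # 3. Register result status: register -> unit that will write it
    register_result: Dict[str, Optional[str]] = field(default_factory=dict)


# Example state: a multiply has issued to Mult1 and waits for F2 from the Integer unit.
sb = Scoreboard()
sb.instruction_status["MULTD F0,F2,F4"] = "Issue"
sb.fu_status["Mult1"] = FunctionalUnitStatus(
    busy=True, op="Mult", fi="F0", fj="F2", fk="F4",
    qj="Integer", qk=None, rj=False, rk=True)
sb.register_result["F0"] = "Mult1"

print(len(fields(FunctionalUnitStatus)))  # 9
```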

Page 21: Advanced Pipelining

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2
LD     F2    45+  R3
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
       FU   -      -        -    -        -     -       -         -

Instruction column: the instruction stream

Instruction status: the scoreboard only records the status. We will show the times for each stage, for convenience

Information within the Scoreboard


Page 23: Advanced Pipelining

Information within the Scoreboard

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2
LD     F2    45+  R3
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
       FU   -      -        -    -        -     -       -         -

Time: FU countdown

Functional Units: 1 Integer, 2 Multiplication, 1 Addition, 1 Division

Fi, Fj, Fk: source and destination registers

Qj, Qk: which FU will produce each operand

Rj, Rk: operands ready?


Page 25: Advanced Pipelining

Information within the Scoreboard

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2
LD     F2    45+  R3
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
       FU   -      -        -    -        -     -       -         -

Clock: clock cycle counter

FU row: which FU will write in each register?

Page 26: Advanced Pipelining

A Scoreboard Example

The following code is run on a scoreboard pipeline with the functional units listed after it:

L.D F6, 34(R2)

L.D F2, 45(R3)

MUL.D F0, F2, F4

SUB.D F8, F6, F2

DIV.D F10, F0, F6

ADD.D F6, F8, F2

Functional Unit (FU)     # of FUs  EX cycles
Integer Mem              1         1
Floating Point Multiply  2         10
Floating Point Add       1         2
Floating Point Divide    1         40

Functional units are not pipelined

Page 27: Advanced Pipelining

Example Code

1  L.D    F6, 34(R2)
2  L.D    F2, 45(R3)
3  MUL.D  F0, F2, F4
4  SUB.D  F8, F6, F2
5  DIV.D  F10, F0, F6
6  ADD.D  F6, F8, F2

Dependency Graph For Example Code

[Figure: dependency graph with one node per instruction (1 to 6); edge types are Real Data Dependence (RAW), Anti-dependence (WAR) and Output Dependence (WAW)]

Data Dependence: (1, 4) (1, 5) (2, 3) (2, 4) (2, 6) (3, 5) (4, 6)

Output Dependence: (1, 6)

Anti-dependence: (5, 6)
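The dependence lists for the example code can be reproduced with a pairwise register comparison. This is a rough sketch, not part of the original slides: `PROGRAM` and `dependences` are hypothetical names, and the check is deliberately naive (it compares every earlier/later pair and ignores intervening writes, which does not matter for this six-instruction program). A naive pairwise check also reports (4, 6) as a WAR pair, because SUB.D reads F6 before ADD.D writes it; the slide lists only (5, 6).

```python
from itertools import combinations

# (opcode, destination, sources) for the six instructions, numbered from 1.
PROGRAM = [
    ("L.D",   "F6",  ("R2",)),
    ("L.D",   "F2",  ("R3",)),
    ("MUL.D", "F0",  ("F2", "F4")),
    ("SUB.D", "F8",  ("F6", "F2")),
    ("DIV.D", "F10", ("F0", "F6")),
    ("ADD.D", "F6",  ("F8", "F2")),
]


def dependences(program):
    """Return (raw, war, waw) lists of (earlier, later) instruction numbers."""
    raw, war, waw = [], [], []
    numbered = enumerate(program, start=1)
    for (i, (_, d1, s1)), (j, (_, d2, s2)) in combinations(numbered, 2):
        if d1 in s2:
            raw.append((i, j))   # later reads what earlier writes
        if d2 in s1:
            war.append((i, j))   # later overwrites what earlier reads
        if d1 == d2:
            waw.append((i, j))   # both write the same register
    return raw, war, waw


raw, war, waw = dependences(PROGRAM)
print("RAW:", raw)  # [(1, 4), (1, 5), (2, 3), (2, 4), (2, 6), (3, 5), (4, 6)]
print("WAR:", war)  # [(4, 6), (5, 6)]
print("WAW:", waw)  # [(1, 6)]
```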

Page 28: Advanced Pipelining

Scoreboard Example Cycle 1

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1
LD     F2    45+  R3
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  Yes   Load  F6    -    R2   -         -         -    Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
1      FU   -      -        -    Integer  -     -       -         -

Issue LD #1

Page 29: Advanced Pipelining

Scoreboard Example Cycle 2

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2
LD     F2    45+  R3
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  Yes   Load  F6    -    R2   -         -         -    Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
2      FU   -      -        -    Integer  -     -       -         -

LD #1 reads operands

LD #2 can’t issue since the integer unit is busy

MULT can’t issue because we require in-order issue

Pipeline stalls

Page 30: Advanced Pipelining

Scoreboard Example Cycle 3

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3
LD     F2    45+  R3
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  Yes   Load  F6    -    R2   -         -         -    Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
3      FU   -      -        -    Integer  -     -       -         -

LD #1 completes

Page 31: Advanced Pipelining

Scoreboard Example Cycle 4

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
4      FU   -      -        -    -        -     -       -         -

LD #1 writes back and frees the Integer FU and register F6

Page 32: Advanced Pipelining

Scoreboard Example Cycle 5

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  Yes   Load  F2    -    R3   -         -         -    Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
5      FU   -      Integer  -    -        -     -       -         -

Issue LD #2 since integer unit is now free.

Page 33: Advanced Pipelining

Scoreboard Example Cycle 6

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6
MULTD  F0    F2   F4   6
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  Yes   Load  F2    -    R3   -         -         -    Yes
      Mult1    Yes   Mult  F0    F2   F4   Integer   -         No   Yes
      Mult2    No
      Add      No
      Divide   No

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
6      FU   Mult1  Integer  -    -        -     -       -         -

Issue MULT.

Page 34: Advanced Pipelining

Scoreboard Example Cycle 7

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7
MULTD  F0    F2   F4   6
SUBD   F8    F6   F2   7
DIVD   F10   F0   F6
ADDD   F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  Yes   Load  F2    -    R3   -         -         -    Yes
      Mult1    Yes   Mult  F0    F2   F4   Integer   -         No   Yes
      Mult2    No
      Add      Yes   Sub   F8    F6   F2   -         Integer   Yes  No
      Divide   No

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
7      FU   Mult1  Integer  -    -        Add   -       -         -

MULT can’t read its operands (F2) because LD #2 hasn’t finished.

SUBD is issued

Page 35: Advanced Pipelining

Scoreboard Example Cycle 8a

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7
MULTD  F0    F2   F4   6
SUBD   F8    F6   F2   7
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  Yes   Load  F2    -    R3   -         -         -    Yes
      Mult1    Yes   Mult  F0    F2   F4   Integer   -         No   Yes
      Mult2    No
      Add      Yes   Sub   F8    F6   F2   -         Integer   Yes  No
      Divide   Yes   Div   F10   F0   F6   Mult1     -         No   Yes

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
8      FU   Mult1  Integer  -    -        Add   Divide  -         -

MULT and SUBD both waiting for F2.

DIVD issues.

Page 36: Advanced Pipelining

Scoreboard Example Cycle 8b

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6
SUBD   F8    F6   F2   7
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  No
      Mult1    Yes   Mult  F0    F2   F4   -         -         Yes  Yes
      Mult2    No
      Add      Yes   Sub   F8    F6   F2   -         -         Yes  Yes
      Divide   Yes   Div   F10   F0   F6   Mult1     -         No   Yes

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
8      FU   Mult1  -        -    -        Add   Divide  -         -

LD #2 writes F2.

Page 37: Advanced Pipelining

Scoreboard Example Cycle 9

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  No
10    Mult1    Yes   Mult  F0    F2   F4   -         -         Yes  Yes
      Mult2    No
2     Add      Yes   Sub   F8    F6   F2   -         -         Yes  Yes
      Divide   Yes   Div   F10   F0   F6   Mult1     -         No   Yes

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
9      FU   Mult1  -        -    -        Add   Divide  -         -

Now MULT and SUBD can both read F2.

Page 38: Advanced Pipelining

Scoreboard Example Cycle 10

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  No
9     Mult1    Yes   Mult  F0    F2   F4   -         -         Yes  Yes
      Mult2    No
1     Add      Yes   Sub   F8    F6   F2   -         -         Yes  Yes
      Divide   Yes   Div   F10   F0   F6   Mult1     -         No   Yes

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
10     FU   Mult1  -        -    -        Add   Divide  -         -

MULT and SUB continue operation (countdowns drop to 9 and 1)

Page 39: Advanced Pipelining

Scoreboard Example Cycle 11

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9              11
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  No
8     Mult1    Yes   Mult  F0    F2   F4   -         -         Yes  Yes
      Mult2    No
0     Add      Yes   Sub   F8    F6   F2   -         -         Yes  Yes
      Divide   Yes   Div   F10   F0   F6   Mult1     -         No   Yes

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
11     FU   Mult1  -        -    -        Add   Divide  -         -

ADDD cannot be issued because the add unit is busy.

SUBD completes

Page 40: Advanced Pipelining

Scoreboard Example Cycle 12

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  No
7     Mult1    Yes   Mult  F0    F2   F4   -         -         Yes  Yes
      Mult2    No
      Add      No
      Divide   Yes   Div   F10   F0   F6   Mult1     -         No   Yes

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
12     FU   Mult1  -        -    -        -     Divide  -         -

SUBD finishes.

DIVD waits for F0

Page 41: Advanced Pipelining

Scoreboard Example Cycle 13

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2   13

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  No
6     Mult1    Yes   Mult  F0    F2   F4   -         -         Yes  Yes
      Mult2    No
      Add      Yes   Add   F6    F8   F2   -         -         Yes  Yes
      Divide   Yes   Div   F10   F0   F6   Mult1     -         No   Yes

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
13     FU   Mult1  -        -    Add      -     Divide  -         -

ADDD issues.

Page 42: Advanced Pipelining

Scoreboard Example Cycle 14

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2   13     14

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  No
5     Mult1    Yes   Mult  F0    F2   F4   -         -         Yes  Yes
      Mult2    No
2     Add      Yes   Add   F6    F8   F2   -         -         Yes  Yes
      Divide   Yes   Div   F10   F0   F6   Mult1     -         No   Yes

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
14     FU   Mult1  -        -    Add      -     Divide  -         -

MULT and ADDD continue their operation

Page 43: Advanced Pipelining

Scoreboard Example Cycle 15

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2   13     14

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  No
4     Mult1    Yes   Mult  F0    F2   F4   -         -         Yes  Yes
      Mult2    No
1     Add      Yes   Add   F6    F8   F2   -         -         Yes  Yes
      Divide   Yes   Div   F10   F0   F6   Mult1     -         No   Yes

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
15     FU   Mult1  -        -    Add      -     Divide  -         -

Nearly there…

Page 44: Advanced Pipelining

Scoreboard Example Cycle 16

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2   13     14             16

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  No
3     Mult1    Yes   Mult  F0    F2   F4   -         -         Yes  Yes
      Mult2    No
0     Add      Yes   Add   F6    F8   F2   -         -         Yes  Yes
      Divide   Yes   Div   F10   F0   F6   Mult1     -         No   Yes

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
16     FU   Mult1  -        -    Add      -     Divide  -         -

ADDD completes execution

Page 45: Advanced Pipelining

Scoreboard Example Cycle 17

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2   13     14             16

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  No
2     Mult1    Yes   Mult  F0    F2   F4   -         -         Yes  Yes
      Mult2    No
      Add      Yes   Add   F6    F8   F2   -         -         Yes  Yes
      Divide   Yes   Div   F10   F0   F6   Mult1     -         No   Yes

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
17     FU   Mult1  -        -    Add      -     Divide  -         -

ADDD can’t write because of a WAR hazard with DIVD (DIVD has not yet read F6)

ADDD stalls write back

Page 46: Advanced Pipelining

Scoreboard Example Cycle 18

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2   13     14             16

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  No
1     Mult1    Yes   Mult  F0    F2   F4   -         -         Yes  Yes
      Mult2    No
      Add      Yes   Add   F6    F8   F2   -         -         Yes  Yes
      Divide   Yes   Div   F10   F0   F6   Mult1     -         No   Yes

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
18     FU   Mult1  -        -    Add      -     Divide  -         -

MULT still continues its execution

Page 47: Advanced Pipelining

Scoreboard Example Cycle 19

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9              19
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2   13     14             16

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  No
0     Mult1    Yes   Mult  F0    F2   F4   -         -         Yes  Yes
      Mult2    No
      Add      Yes   Add   F6    F8   F2   -         -         Yes  Yes
      Divide   Yes   Div   F10   F0   F6   Mult1     -         No   Yes

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
19     FU   Mult1  -        -    Add      -     Divide  -         -

MULT completes execution.

Page 48: Advanced Pipelining

Scoreboard Example Cycle 20

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9              19                  20
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2   13     14             16

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  No
      Mult1    No
      Mult2    No
      Add      Yes   Add   F6    F8   F2   -         -         Yes  Yes
      Divide   Yes   Div   F10   F0   F6   -         -         Yes  Yes

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
20     FU   -      -        -    Add      -     Divide  -         -

MULT writes and frees FU and register F0

Page 49: Advanced Pipelining

Scoreboard Example Cycle 21

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9              19                  20
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8      21
ADDD   F6    F8   F2   13     14             16

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  No
      Mult1    No
      Mult2    No
      Add      Yes   Add   F6    F8   F2   -         -         Yes  Yes
      Divide   Yes   Div   F10   F0   F6   -         -         Yes  Yes

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
21     FU   -      -        -    Add      -     Divide  -         -

DIVD can read operands

Page 50: Advanced Pipelining

Scoreboard Example Cycle 22

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9              19                  20
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8      21
ADDD   F6    F8   F2   13     14             16                  22

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
40    Divide   Yes   Div   F10   F0   F6   -         -         Yes  Yes

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
22     FU   -      -        -    -        -     Divide  -         -

Now ADDD can write since WAR removed

ADD FU and register F6 freed

Page 51: Advanced Pipelining

39 cycles later…

Page 52: Advanced Pipelining

Scoreboard Example Cycle 61

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9              19                  20
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8      21             61
ADDD   F6    F8   F2   13     14             16                  22

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
0     Divide   Yes   Div   F10   F0   F6   -         -         Yes  Yes

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
61     FU   -      -        -    -        -     Divide  -         -

DIVD completes execution

Page 53: Advanced Pipelining

Scoreboard Example Cycle 62

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9              19                  20
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8      21             61                  62
ADDD   F6    F8   F2   13     14             16                  22

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
0     Divide   No

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
62     FU   -      -        -    -        -     -       -         -

DIVD writes back and frees resources

Execution Complete

Page 54: Advanced Pipelining

Scoreboard Example Cycle 62

Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9              19                  20
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8      21             61                  62
ADDD   F6    F8   F2   13     14             16                  22

Functional unit status
Time  Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
                           dest  S1   S2   FU for j  FU for k  Fj?  Fk?
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
0     Divide   No

Register result status
Clock       F0     F2       F4   F6       F8    F10     F12  ...  F30
62     FU   -      -        -    -        -     -       -         -

In-order issue

Out-of-order execution

Out-of-order completion
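The whole walkthrough can be replayed with a small cycle-by-cycle model. The sketch below is an assumption-based illustration, not the scoreboard hardware: it tracks only the four stage times per instruction, treats a result written in cycle t as visible from cycle t+1, and uses the unit counts and latencies of the example. All names (`Instr`, `simulate`, `example_program`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

LATENCY = {"int": 1, "mul": 10, "add": 2, "div": 40}  # EX cycles per unit kind
UNITS = {"int": 1, "mul": 2, "add": 1, "div": 1}      # number of units per kind


@dataclass
class Instr:
    name: str
    unit: str
    dest: str
    srcs: Tuple[str, ...]
    issue: Optional[int] = None
    read: Optional[int] = None
    done: Optional[int] = None
    write: Optional[int] = None


def pending(when: Optional[int], t: int) -> bool:
    """True if an event has not happened before cycle t."""
    return when is None or when >= t


def simulate(program):
    pc, t = 0, 0
    while any(ins.write is None for ins in program):
        t += 1
        for idx, ins in enumerate(program):
            earlier = program[:idx]
            if ins.done is not None and ins.done < t and ins.write is None:
                # Write result: stall while an earlier instruction still has
                # to read the old value of our destination (WAR).
                if not any(ins.dest in e.srcs and pending(e.read, t) for e in earlier):
                    ins.write = t
            elif ins.issue is not None and ins.issue < t and ins.read is None:
                # Read operands: stall while an earlier instruction still has
                # to write one of our sources (RAW).
                if not any(e.dest in ins.srcs and pending(e.write, t) for e in earlier):
                    ins.read = t
                    ins.done = t + LATENCY[ins.unit]
        if pc < len(program):
            # Issue in program order: need a free unit and no WAW hazard.
            ins = program[pc]
            active = [e for e in program[:pc] if pending(e.write, t)]
            unit_free = sum(e.unit == ins.unit for e in active) < UNITS[ins.unit]
            no_waw = all(e.dest != ins.dest for e in active)
            if unit_free and no_waw:
                ins.issue = t
                pc += 1
    return program


def example_program():
    return [
        Instr("LD F6",    "int", "F6",  ("R2",)),
        Instr("LD F2",    "int", "F2",  ("R3",)),
        Instr("MULTD F0", "mul", "F0",  ("F2", "F4")),
        Instr("SUBD F8",  "add", "F8",  ("F6", "F2")),
        Instr("DIVD F10", "div", "F10", ("F0", "F6")),
        Instr("ADDD F6",  "add", "F6",  ("F8", "F2")),
    ]


for ins in simulate(example_program()):
    print(ins.name, ins.issue, ins.read, ins.done, ins.write)
```

Under these assumptions the printed stage times match the final instruction status table (for example, ADDD issues at 13 and writes at 22, while DIVD writes last at 62).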

Page 55: Advanced Pipelining

• Techniques to deal with data hazards in instruction pipelines:

– Result forwarding to reduce or eliminate RAW hazards

– Hazard detection hardware to stall the pipeline during hazards

– Compiler-based static scheduling to separate dependent instructions, minimizing actual hazard-prevention stalls in scheduled code (will be discussed in detail next week)

– Hardware-based dynamic scheduling to rearrange instruction execution order at runtime and reduce stalls

» Better dynamic exploitation of instruction-level parallelism (ILP)

Summary

Page 56: Advanced Pipelining

• The amount of parallelism available among the instructions (chosen from the same basic block)

• The number of scoreboard entries (the size of the scoreboard determines the size of the window)

• The number and types of functional units (Structural hazards increase when out of order execution is used)

• The presence of anti-dependences and output dependences, which lead to WAR and WAW stalls

Limitations of Scoreboard