EC 513 - ascslab.org · Department of Electrical & Computer Engineering Structural Hazard sub x8,...

38
Department of Electrical & Computer Engineering EC 513 Computer Architecture Prof. Michel A. Kinsy Hazard Resolution

Transcript of EC 513 - ascslab.org · Department of Electrical & Computer Engineering Structural Hazard sub x8,...

Department of Electrical & Computer Engineering

EC 513Computer Architecture

Prof. Michel A. Kinsy

Hazard Resolution

Department of Electrical & Computer Engineering

Instruction Interactions § An instruction in the pipeline may need a

resource being used by another instruction in the pipeline § Structural hazard

§ An instruction may depend on something produced by an earlier instruction§ Dependence may be for a data calculation

§ Data hazard§ Dependence may be for calculating the next

address§ Control hazard (branches, interrupts)

Department of Electrical & Computer Engineering

Structural Hazardsub x8, x6, x7 lw x1, 8(x2)add x5,x6,x7 sw x3, 24(x4)

Address

Inst[31-0]

PC

WriteData

Read Addr 1

Read Addr 2

WriteAddrRegisterFile

ReadData1

ReadData2

ALU

Overflow

zero

Address

WriteData

Read Data

MemWrite

MemRead

SignExtend32 64

MemtoReg

ALUSrc

Shiftleft1

ADD

PCSrc

ALUControl

1

1

00

0

1

ALUOp

ControlUnit

Branch

Memory

RegWrite

RegWrite

Instr[30, 14-12]

Instr[19-15]

Instr[24-20]

Instr[11-7]

Instr[31-21]

ADD

4

0

Department of Electrical & Computer Engineering

Multi-Stage RISC-V CPU

Address

Inst[31-0]

PC

WriteData

Read Addr 1

Read Addr 2

WriteAddrRegisterFile

ReadData1

ReadData2

ALU

Overflow

zero

Address

WriteData

Read Data

MemWrite

MemRead

SignExtend

32 64

MemtoReg

ALUSrc

Shiftleft1

ADD

PCSrc

ALUControl

1

1

00

0

1

ALUOp

ControlUnit

Branch

Memory

RegWrite

RegWrite

Instr[30, 14-12]

Instr[19-15]

Instr[24-20]

Instr[11-7]

Instr[31-21]

ADD

4

0

Department of Electrical & Computer Engineering

timet0 t1 t2 t3 t4 t5 t6 t7 . . .

IF I1 I2 I3 I3 I3 I3 I4 I5ID I1 I2 I2 I2 I2 I3 I4EX I1 nop nop nop I2 I3MA I1 nop nop nop I2WB I1 nop nop nop

Stalled Stages and Pipeline

timet0 t1 t2 t3 t4 t5 t6 t7 . .

(I1) r1 ß (r0) + 10 IF1 ID1 EX1 MA1 WB1

(I2) r4 ß (r1) + 17 IF2 ID2 ID2 ID2 ID2 EX2 MA2

(I3) IF3 IF3 IF3 IF3 ID3 EX3(I4) IF4 ID4

(I5) IF5stalled stages

Resource Usage

Department of Electrical & Computer Engineering

Resolving Data Hazards§ Strategy 1: Wait for the result to be available by

freezing earlier pipeline stages § Interlocks

§ Strategy 2: Route data as soon as possible after it is calculated to the earlier pipeline stage § Bypass

Department of Electrical & Computer Engineering

Resolving Data Hazards§ Strategy 3: Speculate on the dependence

§ Two cases:§ Guessed correctly

§ Do nothing

§ Guessed incorrectly § Kill and restart

Department of Electrical & Computer Engineering

Instruction Formats§ R-type instruction

§ I-type instruction & I-immediate (32 bits)

§ I-imm = signExtend(inst[31:20])§ S-type instruction & S-immediate (32 bits)

§ S-imm = signExtend({inst[31:25], inst[11:7]})

funct7 rs2 funct3rs1 rd opcode7 5 5 3 5 7

imm[11:0] funct3rs1 rd opcode12 5 3 5 7

imm[11:5] rs2 funct3rs1 imm[4:0] opcode7 5 5 3 5 7

Department of Electrical & Computer Engineering

Instruction Formats§ SB-type instruction & B-immediate (32 bits)

§ B-imm = signExtend({inst[31], inst[7], inst[30:25], inst[11:8], 1’b0})

§ U-type instruction & U-immediate (32 bits)

§ U-imm = signExtend({inst[31:12], 12’b0})

§ UJ-type instruction & J-immediate (32 bits)

§ J-imm = signExtend({inst[31], inst[19:12], inst[20], inst[30:21], 1’b0})

imm[12] imm[10:5] rs2 rs1 funct3 imm[4:1] imm[11] opcode1 6 5 5 3 4 1 7

rd opcode5 7

imm[31:12]20

imm[20] imm[10:1] rdimm[19:12]imm[11] opcode1 10 1 8 5 7

Department of Electrical & Computer Engineering

Instruction Formats

funct7 rs2 funct3rs1 rd opcode7 5 5 3 5 7

imm[11:0] funct3rs1 rd opcode

imm[11:5] rs2 funct3rs1 imm[4:0] opcode

imm[12] imm[10:5] rs2 rs1 funct3 imm[4:1-11] opcode

rd opcodeimm[31:12]

imm[20] imm[10:1] rdimm[19:12]imm[11] opcode

R-type

I-type

S-type

SB-type

U-type

UJ-type

Department of Electrical & Computer Engineering

Source and Destination Registersfunct7 rs2 funct3rs1 rd opcode

7 5 5 3 5 7

imm[11:0] funct3rs1 rd opcode

imm[11:5] rs2 funct3rs1 imm[4:0] opcode

imm[12] imm[10:5] rs2 rs1 funct3 imm[4:1-11] opcode

rd opcodeimm[31:12]

imm[20] imm[10:1] rdimm[19:12]imm[11] opcode

R-type

I-type

S-type

SB-type

U-type

UJ-type

source(s) destinationALU rd ß (rs1) [func3,func7] (rs2) rs1, rs2 rdALUi rd ß (rs1) [func3] I-imm rs1 rd

rd ß (rs1) [funct3, inst[30]] I-imm[4:0] rs1 rd

Department of Electrical & Computer Engineering

Source and Destination Registersfunct7 rs2 funct3rs1 rd opcode

7 5 5 3 5 7

imm[11:0] funct3rs1 rd opcode

imm[11:5] rs2 funct3rs1 imm[4:0] opcode

imm[12] imm[10:5] rs2 rs1 funct3 imm[4:1-11] opcode

rd opcodeimm[31:12]

imm[20] imm[10:1] rdimm[19:12]imm[11] opcode

R-type

I-type

S-type

SB-type

U-type

UJ-type

source(s) destinationALU rd ß (rs1) [func3,func7] (rs2) rs1, rs2 rdALUi rd ß (rs1) [func3] I-imm rs1 rd

rd ß (rs1) [funct3, inst[30]] I-imm[4:0] rs1 rdLW rd ß M [(rs1) + imm] rs1 rdSW M [(rs1) + imm] ß (rs2) rs1, rs2LUI rd ß U-imm rdAUIPC rd ß pc + U-imm rd

Department of Electrical & Computer Engineering

Source and Destination Registerssource(s) destination

ALU rd ß (rs1) [func3,func7] (rs2) rs1, rs2 rdALUi rd ß (rs1) [func3] I-imm rs1 rd

rd ß (rs1) [funct3, inst[30]] I-imm[4:0] rs1 rdLW rd ß M [(rs1) + imm] rs1 rdSW M [(rs1) + imm] ß (rs2) rs1, rs2LUI rd ß U-imm rdAUIPC rd ß pc + U-imm rdJAL rd ß pc + 4 rd

pc ß pc + J-immJALR rd ß pc + 4 rs1 rd

pc ß (rs1 + I-imm) & ~0x01BR pc ßcompare(funct3, rs1, rs2) ? rs1, rs2

pc + B-imm : pc + 4

Department of Electrical & Computer Engineering

Types of Data Hazards § Consider executing a sequence of

rk ß (ri) op (rj) type of instructions

Output-dependencer3 ß (r1) op (r2) Write-after-Write r3 ß (r6) op (r7) (WAW) hazard

Data-dependencer3 ß (r1) op (r2) Read-after-Write r5 ß (r3) op (r4) (RAW) hazard

Anti-dependencer3 ß (r1) op (r2) Write-after-Read r1 ß (r4) op (r5) (WAR) hazard

Department of Electrical & Computer Engineering

Detecting Data Hazards§ Range and Domain of instruction i

§ R(i) = Registers (or other storage) modified by instruction i

§ D(i) = Registers (or other storage) read by instruction

§ Suppose instruction j follows instruction i in the program order. Executing instruction j before the effect of instruction i has taken place can cause a§ RAW hazard if R(i) D(j) !=§ WAR hazard if D(i) R(j) !=§ WAW hazard if R(i) R(j) !=

UUU

OOO

Department of Electrical & Computer Engineering

Register vs. Memory Data Dependence

§ Data hazards due to register operands can be determined at the decode stage

§ Data hazards due to memory operands can be determined only after computing the effective address§ store M[(r1) + disp1] ß (r2) § load r3 ß M[(r4) + disp2]

§ Does (r1 + disp1) = (r4 + disp2) ?

Department of Electrical & Computer Engineering

Data Hazards: An ExampleI1 ADD x6, x6, x4

I2 LW x2, 44(x3)

I3 SUB x5, x2, x4

I4 AND x8, x6, x2

I5 SUB x10, x5, x6

I6 ADD x6, x8, x2

RAW Hazards

Department of Electrical & Computer Engineering

Data Hazards: An Example

RAW HazardsWAR Hazards

I1 ADD x6, x6, x4

I2 LW x2, 44(x3)

I3 SUB x5, x2, x4

I4 AND x8, x6, x2

I5 SUB x10, x5, x6

I6 ADD x6, x8, x2

Department of Electrical & Computer Engineering

Data Hazards: An Example

RAW HazardsWAR HazardsWAW Hazards

I1 ADD x6, x6, x4

I2 LW x2, 44(x3)

I3 SUB x5, x2, x4

I4 AND x8, x6, x2

I5 SUB x10, x5, x6

I6 ADD x6, x8, x2

Department of Electrical & Computer Engineering

Data Hazards: An Example

RAW HazardsWAR HazardsWAW Hazards

I6

I2

I4

I1

I5

I3

I1 ADD x6, x6, x4

I2 LW x2, 44(x3)

I3 SUB x5, x2, x4

I4 AND x8, x6, x2

I5 SUB x10, x5, x6

I6 ADD x6, x8, x2

Department of Electrical & Computer Engineering

Resolving Data Hazards§ Strategy 1: Wait for the result to be available by

freezing earlier pipeline stages § Interlocks

§ Strategy 2: Route data as soon as possible after it is calculated to the earlier pipeline stage § Bypass

Department of Electrical & Computer Engineering

Interlocks to resolve Data Hazards

IRIR IR31

PCA

B

Y

R

MD1 MD2

addrinst

InstMemory

0x4Add

IR

ImmExt

ALUrd1

GPRs

rs1rs2

wswd rd2

we

wdata

addr

wdata

rdataData Memory

we

nop

...x1 ß x2 + 10x4 ß x1 + 17...

Stall Condition

Department of Electrical & Computer Engineering

Cdest

we

re1 re2

Cre

ws we wsCdest Cdest

we

IRIR IR31

PCA

B

Y

R

MD1 MD2

addrinst

InstMemory

0x4Add

IR

ImmExt

ALUrd1

GPRs

rs1rs2

wswd rd2

we

wdata

addr

wdata

rdataData Memory

we

Interlock Control Logic

nop

stallCstall

ws

rsrt

Department of Electrical & Computer Engineering

Deriving the Stall SignalCdest

ws = Case opcodeALU , ALUi à rdLW, LUI à rdJAL, JALR, AUIPC à rd

we = Case opcodeALU, ALUi, LW, LUI à (ws != 0) JAL, JALR, AUIPC à on... à off

Crere1 = Case opcode

ALU, ALUi, LW, SW, BR, JALR à onLUI, JAL, AUIPC à off

re2 = Case opcodeALU, SW, BR à on

… à off

Cstall= ((rs1D =wsE).weE + (rs1D =wsM).weM + (rs1D =wsW).weW) . re1D +((rs2D =wsE).weE + (rs2D =wsM).weM + (rs2D =wsW).weW) . re2D

Department of Electrical & Computer Engineering

Cdest

we

re1 re2

Cre

ws we wsCdest Cdest

we

IRIR IR31

PCA

B

Y

R

MD1 MD2

addrinst

InstMemory

0x4Add

IR

ImmExt

ALUrd1

GPRs

rs1rs2

wswd rd2

we

wdata

addr

wdata

rdataData Memory

we

Interlock Control Logic

nop

stallCstall

ws

rsrt

Department of Electrical & Computer Engineering

Resolving Data Hazards§ Strategy 1: Wait for the result to be available by

freezing earlier pipeline stages § Interlocks

§ Strategy 2: Route data as soon as possible after it is calculated to the earlier pipeline stage § Bypass

Department of Electrical & Computer Engineering

Feedback to Resolve Hazards§ Later stages provide dependence information to earlier

stages which can stall (or kill) instructions § Controlling a pipeline in this manner works provided the

instruction at stage i+1 can complete without any interference from instructions in stages 1 to i§ Otherwise deadlocks may occur

FB1

stage1

stage2

stage3

stage4

FB2 FB3 FB4

Department of Electrical & Computer Engineering

Bypassing

§ Each stall or kill introduces a bubble in the pipeline à CPI > 1

timet0 t1 t2 t3 t4 t5 t6 t7 . .

(I1) r1 ß r0 + 10 IF1 ID1 EX1 MA1 WB1

(I2) r4 ß r1 + 17 IF2 ID2 EX2 MA2 WB2(I3) IF3 ID3 EX3 MA3 WB3

(I4) IF4 ID4 EX4 MA4

(I5) IF5 ID5 EX5

timet0 t1 t2 t3 t4 t5 t6 t7 . .

(I1) r1 ß (r0) + 10 IF1 ID1 EX1 MA1 WB1

(I2) r4 ß (r1) + 17 IF2 ID2 ID2 ID2 ID2 EX2 MA2

(I3) IF3 IF3 IF3 IF3 ID3 EX3(I4) IF4 ID4

(I5) IF5stalled stages

Department of Electrical & Computer Engineering

Adding a Bypass

IRIR IR31

PCA

B

Y

R

MD1 MD2

addrinst

InstMemory

0x4Add

IR

ImmExt

ALUrd1

GPRs

rs1rs2

wswd rd2

we

wdata

addr

wdata

rdataData Memory

we

nop

...x1 ß x2 + 10x4 ß x1 + 17...

Stall Condition

ASrc

Department of Electrical & Computer Engineering

Cstall= ((rs1D =wsE).weE + (rs1D =wsM).weM + (rs1D =wsW).weW) . re1D +((rs2D =wsE).weE + (rs2D =wsM).weM + (rs2D =wsW).weW) . re2D

Deriving the Stall Signal

ASrc = (rs1D=wsE).weE.re1D

Is this correct?

Cdestws = Case opcode

ALU , ALUi à rdLW, LUI à rdJAL, JALR, AUIPC à rd

we = Case opcodeALU, ALUi, LW, LUI à (ws != 0) JAL, JALR, AUIPC à on... à off

Crere1 = Case opcode

ALU, ALUi, LW, SW, BR, JALR à onLUI, JAL, AUIPC à off

re2 = Case opcodeALU, SW, BR à on

… à off

Department of Electrical & Computer Engineering

Deriving the Stall Signal

Load: lw x1, 10(x2)

x1 ß M[x2 + 10]x4 ß x1 + 17

Cstall= ((rs1D =wsE).weE + (rs1D =wsM).weM + (rs1D =wsW).weW) . re1D +((rs2D =wsE).weE + (rs2D =wsM).weM + (rs2D =wsW).weW) . re2D

ASrc = (rs1D=wsE).weE.re1D

Cdestws = Case opcode

ALU , ALUi à rdLW, LUI à rdJAL, JALR, AUIPC à rd

we = Case opcodeALU, ALUi, LW, LUI à (ws != 0) JAL, JALR, AUIPC à on... à off

Crere1 = Case opcode

ALU, ALUi, LW, SW, BR, JALR à onLUI, JAL, AUIPC à off

re2 = Case opcodeALU, SW, BR à on

… à off

Department of Electrical & Computer Engineering

Bypass and Stall Signals§ Split weE into two components: we-bypass, we-stall

we-bypassE = Case opcodeE

ALU, ALUi, AUIPC à (ws != 0) ... à off

ASrc = (rs1D =wsE).we-bypassE . re1D

stall = ((rs1D =wsE).we-stallE + (rs1D=wsM).weM +

(rs1D=wsW).weW). re1D+((rs2D = wsE).weE +

(rs2D = wsM).weM + (rs3D = wsW).weW). re2D

we-stallE = Case opcodeE

LW à (ws != 0) ... à off

Department of Electrical & Computer Engineering

Fully Bypassed Datapath

ASrcIRIR IR

PCA

B

Y

R

MD1 MD2

addrinst

InstMemory

0x4Add

IR ALU

ImmExt

rd1

GPRs

rs1rs2

wswdrd2

we

wdata

addr

wdata

rdataData Memory

we

31

nop

D

E M W

PC for JAL, ...

BSrc

Stall

Department of Electrical & Computer Engineering

Fully Bypassed Datapath

Is there still a need for the stall signal ?

ASrcIRIR IR

PCA

B

Y

R

MD1 MD2

addrinst

InstMemory

0x4Add

IR ALU

ImmExt

rd1

GPRs

rs1rs2

wswdrd2

we

wdata

addr

wdata

rdataData Memory

we

31

nop

D

E M W

PC for JAL, ...

BSrc

Stall

Department of Electrical & Computer Engineering

Fully Bypassed Datapath

stall = (rs1D=wsE). (opcodeE=LWE).(wsE != 0).re1D + (rs2D=wsE). (opcodeE=LWE).(wsE!= 0).re2D

ASrcIRIR IR

PCA

B

Y

R

MD1 MD2

addrinst

InstMemory

0x4Add

IR ALU

ImmExt

rd1

GPRs

rs1rs2

wswdrd2

we

wdata

addr

wdata

rdataData Memory

we

31

nop

D

E M W

PC for JAL, ...

BSrc

Stall

Department of Electrical & Computer Engineering

Why a program may have CPI >1§ Why an Instruction may not be dispatched every

cycle (CPI>1)?§ Full bypassing may be too expensive to

implement§ Typically all frequently used paths are provided§ Some infrequently used bypass paths may increase

cycle time and counteract the benefit of reducing CPI§ Loads have two cycle latency

§ Instruction after load cannot use load result§ MIPS-I ISA defined load delay slots, a software-visible

pipeline hazard (compiler schedules independent instruction or inserts NOP to avoid hazard) - Removed in MIPS-II

Department of Electrical & Computer Engineering

Resolving Data Hazards§ Strategy 1: Wait for the result to be available by

freezing earlier pipeline stages § Interlocks

§ Strategy 2: Route data as soon as possible after it is calculated to the earlier pipeline stage § Bypass

Department of Electrical & Computer Engineering

Next Class§ Complex Pipelining: Superscalar