Lecture’s Overview

Mohamed Younis CMCS 411, Computer Architecture 1 CMCS 411-101 Computer Architecture Lecture 21 Handling Pipeline Hazards April 16, 2001 www.csee.umbc.edu/~younis/CMSC411/ CMSC411.htm


Page 1: Lecture’s Overview

CMCS 411-101 Computer Architecture

Lecture 21

Handling Pipeline Hazards

April 16, 2001

www.csee.umbc.edu/~younis/CMSC411/ CMSC411.htm

Page 2: Lecture’s Overview

Lecture’s Overview

Previous Lecture:

• Designing a pipelined datapath
  - Standardized multi-stage instruction execution
  - Unique resources per stage

• Controlling pipeline operations
  - Simplified stage-based control
  - Detailed working example

This Lecture:

• Pipeline hazard detection

• Handling hazards in the pipeline design

• Supporting exceptions

Page 3: Lecture’s Overview

Pipelined Datapath

[Figure: pipelined datapath divided by the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers. Components shown: PC, instruction memory, PC + 4 adder, register file, sign-extend unit, shift-left-2 unit, branch-target adder, ALU, data memory and the associated multiplexers.]

Page 4: Lecture’s Overview

Standardized Five-Stage Instructions

Instruction  Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
Load         Ifetch   Reg/Dec  Exec     Mem      Wr
Store        Ifetch   Reg/Dec  Exec     Mem      Wr
R-type       Ifetch   Reg/Dec  Exec     Mem      Wr
Beq          Ifetch   Reg/Dec  Exec     Mem      Wr
Ori          Ifetch   Reg/Dec  Exec     Mem      Wr
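Because every instruction type uses the same five stages, the stage an instruction occupies in any cycle follows from its issue order alone. A minimal sketch of that bookkeeping, assuming one instruction is issued per cycle and no hazards occur (the function names are illustrative, not from the lecture):

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]

def pipeline_schedule(instructions):
    """Map each instruction name to {cycle: stage}, issuing one instruction per cycle."""
    schedule = {}
    for issue_index, name in enumerate(instructions):
        # Instruction i enters Ifetch in cycle i + 1 and advances one stage per cycle.
        schedule[name] = {issue_index + 1 + s: stage for s, stage in enumerate(STAGES)}
    return schedule

def total_cycles(instruction_count):
    """Cycles needed to finish N instructions in an ideal (hazard-free) pipeline."""
    if instruction_count == 0:
        return 0
    return len(STAGES) + instruction_count - 1

if __name__ == "__main__":
    program = ["Load", "Store", "R-type", "Beq", "Ori"]
    sched = pipeline_schedule(program)
    last = total_cycles(len(program))
    print("instr   " + " ".join(f"C{c:<7}" for c in range(1, last + 1)))
    for name in program:
        row = " ".join(f"{sched[name].get(c, ''):<8}" for c in range(1, last + 1))
        print(f"{name:<8}{row}")
```

With the five instruction types from the table, the last instruction writes back in cycle 9, compared with 25 cycles if each instruction had to finish before the next one started.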

Page 5: Lecture’s Overview

Data Stationary Control

The Main Control generates the control signals during Reg/Dec:

• Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
• Control signals for Mem (MemWr, Branch) are used 2 cycles later
• Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later

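[Figure: control signals travelling with the instruction through the pipeline registers. During Reg/Dec the Main Control produces ExtOp, ALUOp, RegDst, ALUSrc, Branch, MemWr, MemtoReg and RegWr and writes them into the ID/Ex register. Exec consumes ExtOp, ALUOp, RegDst and ALUSrc; the Ex/Mem register carries Branch, MemWr, MemtoReg and RegWr; Mem consumes Branch and MemWr; the Mem/Wr register carries MemtoReg and RegWr, which are consumed in Wr.]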

* Slide is courtesy of Dave Patterson
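The idea of data stationary control can be sketched in a few lines: the control word is produced once in Reg/Dec and then shrinks as it moves from one pipeline register to the next. This is only an illustration; the decode table values and the function names are assumptions, not taken from the slide.

```python
# Signals consumed by each later stage, as listed on the slide.
EXEC_SIGNALS = ("ExtOp", "ALUSrc", "ALUOp", "RegDst")
MEM_SIGNALS = ("MemWr", "Branch")
WR_SIGNALS = ("MemtoReg", "RegWr")

def main_control(opcode):
    """Illustrative decode table (assumed values) giving the full control word."""
    table = {
        "lw":  dict(ExtOp=1, ALUSrc=1, ALUOp="add", RegDst=0,
                    MemWr=0, Branch=0, MemtoReg=1, RegWr=1),
        "sw":  dict(ExtOp=1, ALUSrc=1, ALUOp="add", RegDst=0,
                    MemWr=1, Branch=0, MemtoReg=0, RegWr=0),
        "add": dict(ExtOp=0, ALUSrc=0, ALUOp="rtype", RegDst=1,
                    MemWr=0, Branch=0, MemtoReg=0, RegWr=1),
    }
    return table[opcode]

def subset(word, names):
    """Keep only the named signals; an empty pipeline slot stays None."""
    return {n: word[n] for n in names} if word else None

def clock(regs, opcode):
    """Advance one cycle.

    regs is (id_ex, ex_mem, mem_wr); opcode is the instruction currently in
    Reg/Dec, or None if no instruction is being decoded.
    """
    id_ex, ex_mem, _ = regs
    new_mem_wr = subset(ex_mem, WR_SIGNALS)               # used by Wr
    new_ex_mem = subset(id_ex, MEM_SIGNALS + WR_SIGNALS)  # used by Mem and Wr
    new_id_ex = main_control(opcode) if opcode else None  # used by Exec, Mem, Wr
    return (new_id_ex, new_ex_mem, new_mem_wr)

if __name__ == "__main__":
    regs = (None, None, None)
    for op in ["lw", None, None]:
        regs = clock(regs, op)
        print(regs)
```

After three clock edges the only signals of the `lw` still in flight are MemtoReg and RegWr, which is exactly what the Wr stage needs.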

Page 6: Lecture’s Overview

Datapath + Data Stationary Control

[Figure: the pipelined datapath with its control unit. The control signals are grouped into EX, M and WB fields that are carried through the ID/EX, EX/MEM and MEM/WB pipeline registers. Labeled signals include PCSrc, RegDst, ALUSrc, ALUOp, Branch and MemRead; the instruction fields Instruction[15–0], Instruction[20–16] and Instruction[15–11] feed the sign-extend unit and the write-register multiplexer.]

Page 7: Lecture’s Overview

Data Hazards

[Figure: pipeline timing diagram over clock cycles CC 1 to CC 9 for the instruction sequence below, in program execution order. Register $2 holds 10 through CC 4, changes from 10 to -20 during CC 5, and holds -20 from CC 6 on.]

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

• Generally caused by data dependence among instructions
• Can cause erroneous semantics if it goes undetected and is not avoided
• Raised when an instruction depends on the result of a prior instruction that is still in the pipeline and attempts to use the item before it is ready
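To see which instructions of the sequence above would read a stale $2, the timing can be worked out mechanically. The sketch below assumes a five-stage pipeline with one instruction issued per cycle, no forwarding, and a register file in which a write in a given cycle is visible to a read in the same cycle; the function name and tuple layout are illustrative.

```python
def stale_readers(producer_index, dest_reg, program):
    """Return the instructions that read dest_reg before the producer writes it.

    program is a list of (text, source_registers) tuples in program order.
    Instruction i (0-based) is fetched in cycle i + 1, reads its registers in
    cycle i + 2 (Reg/Dec) and writes back in cycle i + 5 (Wr).
    """
    write_cycle = producer_index + 5
    stale = []
    for i in range(producer_index + 1, len(program)):
        text, sources = program[i]
        read_cycle = i + 2
        if dest_reg in sources and read_cycle < write_cycle:
            stale.append(text)
    return stale

if __name__ == "__main__":
    program = [
        ("sub $2, $1, $3", {"$1", "$3"}),
        ("and $12, $2, $5", {"$2", "$5"}),
        ("or $13, $6, $2", {"$6", "$2"}),
        ("add $14, $2, $2", {"$2"}),
        ("sw $15, 100($2)", {"$15", "$2"}),
    ]
    print(stale_readers(0, "$2", program))
    # → ['and $12, $2, $5', 'or $13, $6, $2']
```

The `add` reads $2 in CC 5, the same cycle in which the `sub` writes it, so under the stated register-file assumption only the `and` and the `or` see the old value.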

Page 8: Lecture’s Overview

Dependency Detection

[Figure: four overlapping instructions, each passing through IFetch, Dec, Exec, Mem and WB, drawn against program flow (vertical) and time (horizontal), with the read-after-write data hazard marked.]

• A data hazard is caused by a backward dependency seen at the DEC stage
• The hazardous register is about to be updated by an earlier instruction that is in the EX, MEM or WB stage of the pipeline
• Register writing is assumed to take precedence over reading when both happen in the same clock cycle, so an instruction in the WB stage does not cause a data hazard

Data Hazard conditions:

1a. EX/MEM.Register-Rd = ID/EX.Register-Rs

1b. EX/MEM.Register-Rd = ID/EX.Register-Rt

2a. MEM/WB.Register-Rd = ID/EX.Register-Rs

2b. MEM/WB.Register-Rd = ID/EX.Register-Rt
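The four conditions are plain register-number comparisons between pipeline registers. A minimal sketch, assuming each pipeline register is modeled as a small record holding only the register numbers involved (the type and field names are illustrative):

```python
from collections import namedtuple

# Register numbers held in each pipeline register (names are illustrative).
IdEx = namedtuple("IdEx", ["rs", "rt"])
Writer = namedtuple("Writer", ["rd"])  # stands for EX/MEM or MEM/WB

def hazard_conditions(id_ex, ex_mem, mem_wb):
    """Return the labels of the slide's conditions (1a, 1b, 2a, 2b) that hold."""
    checks = [
        ("1a", ex_mem.rd == id_ex.rs),
        ("1b", ex_mem.rd == id_ex.rt),
        ("2a", mem_wb.rd == id_ex.rs),
        ("2b", mem_wb.rd == id_ex.rt),
    ]
    return [label for label, holds in checks if holds]

if __name__ == "__main__":
    # "and $12, $2, $5" in ID/EX while "sub $2, $1, $3" sits in EX/MEM.
    print(hazard_conditions(IdEx(rs=2, rt=5), Writer(rd=2), Writer(rd=None)))
    # → ['1a']
    # "or $13, $6, $2" in ID/EX: the "and" is in EX/MEM, the "sub" in MEM/WB.
    print(hazard_conditions(IdEx(rs=6, rt=2), Writer(rd=12), Writer(rd=2)))
    # → ['2b']
```

The comparisons alone do not check whether the earlier instruction actually writes a register; the forwarding conditions later in the lecture add the RegWrite test for that reason.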

Page 9: Lecture’s Overview

Data Forwarding

[Figure: pipeline timing diagram over clock cycles CC 1 to CC 9 for the sequence sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2), annotated with the values below.]

Clock cycle:           CC 1  CC 2  CC 3  CC 4  CC 5    CC 6  CC 7  CC 8  CC 9
Value of register $2:  10    10    10    10    10/-20  -20   -20   -20   -20
Value of EX/MEM:       X     X     X     -20   X       X     X     X     X
Value of MEM/WB:       X     X     X     X     -20     X     X     X     X

• Detecting the data hazard conditions allows intermediate results to be used directly from the pipeline registers
• Forward only when the earlier instruction actually writes a register in its WB stage (check the RegWrite signal)

Page 10: Lecture’s Overview

Forwarding Hardware

[Figure: EX stage with a forwarding unit. Multiplexers in front of both ALU inputs are controlled by ForwardA and ForwardB and select among the ID/EX register values, the EX/MEM result and the MEM/WB result. The forwarding unit receives Rs and Rt from ID/EX, EX/MEM.RegisterRd and MEM/WB.RegisterRd.]

EX Hazard:

if (EX/MEM.RegWrite & (EX/MEM.Register-Rd = ID/EX.Register-Rs)) ForwardA = EX/MEM

if (EX/MEM.RegWrite & (EX/MEM.Register-Rd = ID/EX.Register-Rt)) ForwardB = EX/MEM

MEM Hazard:

if (MEM/WB.RegWrite & (MEM/WB.Register-Rd = ID/EX.Register-Rs)) ForwardA = MEM/WB

if (MEM/WB.RegWrite & (MEM/WB.Register-Rd = ID/EX.Register-Rt)) ForwardB = MEM/WB

Page 11: Lecture’s Overview

Forwarding Register of Successive Use

[Figure: pipelined datapath with control and forwarding unit. The forwarding unit receives Rs and Rt from ID/EX (originating from IF/ID.RegisterRs, IF/ID.RegisterRt and IF/ID.RegisterRd), together with EX/MEM.RegisterRd and MEM/WB.RegisterRd.]

Example:

add $1, $1, $2;

add $1, $1, $3;

add $1, $1, $4;

if (MEM/WB.RegWrite & (EX/MEM.Register-Rd ≠ ID/EX.Register-Rs) &

(MEM/WB.Register-Rd = ID/EX.Register-Rs)) ForwardA = MEM/WB

if (MEM/WB.RegWrite & (EX/MEM.Register-Rd ≠ ID/EX.Register-Rt) &

(MEM/WB.Register-Rd = ID/EX.Register-Rt)) ForwardB = MEM/WB
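The priority rule can be sketched as a small function: when both EX/MEM and MEM/WB hold a result for the same register, the EX/MEM value is the more recent one and must win. The sketch below gets that effect by testing the EX hazard first, which is a slightly different formulation from the ≠ test in the conditions above; the function names and the (RegWrite, Rd) pair layout are assumptions made for illustration.

```python
def forward_select(operand_reg, ex_mem, mem_wb):
    """Choose the source of one ALU operand: "EX/MEM", "MEM/WB" or "REG".

    ex_mem and mem_wb are (reg_write, rd) pairs for the two older instructions.
    """
    ex_write, ex_rd = ex_mem
    mem_write, mem_rd = mem_wb
    if ex_write and ex_rd == operand_reg:
        return "EX/MEM"   # EX hazard: most recent result
    if mem_write and mem_rd == operand_reg:
        return "MEM/WB"   # MEM hazard: only if no EX hazard on this operand
    return "REG"          # no hazard: use the value read from the register file

def forwarding_unit(id_ex_rs, id_ex_rt, ex_mem, mem_wb):
    """Return (ForwardA, ForwardB) for the instruction currently in EX."""
    return (forward_select(id_ex_rs, ex_mem, mem_wb),
            forward_select(id_ex_rt, ex_mem, mem_wb))

if __name__ == "__main__":
    # Third instruction of the example, add $1, $1, $4, is in EX.
    # EX/MEM holds add $1, $1, $3 and MEM/WB holds add $1, $1, $2.
    print(forwarding_unit(1, 4, ex_mem=(True, 1), mem_wb=(True, 1)))
    # → ('EX/MEM', 'REG')
```

For the three successive adds, operand A of the third add is taken from EX/MEM (the result of the second add) even though MEM/WB also matches $1.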

Page 12: Lecture’s Overview

Data Forwarding Example

[Figure: pipelined datapath with forwarding unit, state at clock 3.]

Stage  Instruction
IF     or $4, $4, $2
ID     and $4, $2, $5
EX     sub $2, $1, $3
MEM    before<1>
WB     before<2>

Page 13: Lecture’s Overview

Example

[Figure: pipelined datapath with forwarding unit, state at clock 4.]

Stage  Instruction
IF     add $9, $4, $2
ID     or $4, $4, $2
EX     and $4, $2, $5
MEM    sub $2, . . .
WB     before<1>

Page 14: Lecture’s Overview

Example

[Figure: pipelined datapath with forwarding unit, state at clock 5.]

Stage  Instruction
IF     after<1>
ID     add $9, $4, $2
EX     or $4, $4, $2
MEM    and $4, . . .
WB     sub $2, . . .

Page 15: Lecture’s Overview

Example

[Figure: pipelined datapath with forwarding unit, state at clock 6.]

Stage  Instruction
IF     after<2>
ID     after<1>
EX     add $9, $4, $2
MEM    or $4, . . .
WB     and $4, . . .

Page 16: Lecture’s Overview

Data Hazard Caused by Load

[Figure: pipeline timing diagram over clock cycles CC 1 to CC 9 for the instruction sequence below, in program execution order.]

lw $2, 20($1)
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7

• Dependencies that point backward in time are hazards
• They cannot be solved with forwarding
• The instruction that depends on the load must be delayed (stalled)

Page 17: Lecture’s Overview

Hazard Detection Unit

[Figure: pipeline timing diagram over clock cycles CC 1 to CC 10 for the sequence lw $2, 20($1); and $4, $2, $5; or $8, $2, $6; add $9, $4, $2; slt $1, $6, $7. A bubble is inserted after the lw, so the and and the instructions behind it are delayed by one cycle.]

if (ID/EX.MemRead & ((ID/EX.Register-Rt = IF/ID.Register-Rs) |

(ID/EX.Register-Rt = IF/ID.Register-Rt))) stall the pipeline
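The stall condition can be exercised on the load example from the previous slide. The sketch below assumes each instruction is described by a small dictionary with the fields the condition needs; `must_stall` mirrors the condition above, while `schedule_with_stalls` is a simplified, hypothetical helper that only looks at adjacent instructions.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Load-use check: the instruction in ID needs the register a load in EX will write."""
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))

def schedule_with_stalls(program):
    """Return the issue order with a "bubble" inserted after each load-use pair.

    Each instruction is a dict with keys text, mem_read, rs, rt. For a load,
    rt is the destination register.
    """
    issued = []
    previous = None
    for instr in program:
        if previous and must_stall(previous["mem_read"], previous["rt"],
                                   instr["rs"], instr["rt"]):
            issued.append("bubble")
        issued.append(instr["text"])
        previous = instr
    return issued

if __name__ == "__main__":
    program = [
        dict(text="lw $2, 20($1)", mem_read=True, rs=1, rt=2),
        dict(text="and $4, $2, $5", mem_read=False, rs=2, rt=5),
        dict(text="or $8, $2, $6", mem_read=False, rs=2, rt=6),
        dict(text="add $9, $4, $2", mem_read=False, rs=4, rt=2),
        dict(text="slt $1, $6, $7", mem_read=False, rs=6, rt=7),
    ]
    for line in schedule_with_stalls(program):
        print(line)
```

Only one bubble is needed: after the one-cycle delay the loaded value can be forwarded from MEM/WB to the `and`, and the later instructions are far enough behind the `lw` not to need a stall.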

Page 18: Lecture’s Overview

Supporting Pipeline Stall

[Figure: pipelined datapath with a hazard detection unit next to the forwarding unit. The hazard detection unit receives ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs and IF/ID.RegisterRt, and controls a multiplexer that can replace the control values entering ID/EX with 0.]

De-asserting all the control lines in the EX, MEM and WB control fields of the ID/EX pipeline register stalls the pipeline: the instruction entering EX becomes a bubble that changes no state.

Page 19: Lecture’s Overview

Control Hazards

[Figure: pipeline timing diagram over clock cycles CC 1 to CC 9 for the instructions below, listed with their addresses in program execution order. The lw at address 72 is the branch target.]

40 beq $1, $3, 7
44 and $12, $2, $5
48 or $13, $6, $2
52 add $14, $2, $2
72 lw $4, 50($7)

• Must be carefully detected and handled to avoid semantic errors
• Can be solved by stalling the pipeline extensively, which is inefficient
• Two approaches: “delayed branching” and “branch prediction”
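The addresses in the example can be checked with a little arithmetic: the target of `beq $1, $3, 7` at address 40 is 40 + 4 + 7 × 4 = 72, and the number of instructions fetched down the wrong path depends on the stage in which the branch is resolved. A minimal sketch, assuming one instruction is fetched per cycle and the PC is redirected at the end of the resolving stage (function names are illustrative):

```python
def branch_target(branch_pc, offset_words):
    """MIPS-style target: address of the next instruction plus the word offset times 4."""
    return branch_pc + 4 + 4 * offset_words

def wrongly_fetched(branch_pc, resolve_stage):
    """Addresses fetched after a taken branch before the PC is redirected."""
    cycles_after_fetch = {"ID": 1, "EX": 2, "MEM": 3}[resolve_stage]
    return [branch_pc + 4 * k for k in range(1, cycles_after_fetch + 1)]

if __name__ == "__main__":
    print(branch_target(40, 7))        # → 72
    print(wrongly_fetched(40, "MEM"))  # → [44, 48, 52]
    print(wrongly_fetched(40, "ID"))   # → [44]
```

Resolving the branch in MEM leaves the three instructions at 44, 48 and 52 in the pipeline when the branch is taken; resolving it in ID would leave only the one at 44.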

Page 20: Lecture’s Overview

Delayed Branching

[Figure: pipelined datapath in which the branch-target adder (sign extend, shift left 2) and a register comparator (=) are placed in the ID stage, with an IF.Flush control line into the IF/ID register, alongside the hazard detection and forwarding units.]

• Moves the branch comparison to the ID stage to minimize worst-case pipeline flushing
• Interacts with the data hazard hardware to ensure the right register values are compared
• Flushes the pipeline by zeroing the control values and the instruction register

Page 21: Lecture’s Overview

Delayed Branching Example

[Figure: pipelined datapath with the branch resolved in ID, state at clock 3. The beq at address 40 compares $1 and $3; its target address is 72.]

Stage  Instruction
IF     and $12, $2, $5
ID     beq $1, $3, 7
EX     sub $10, $4, $8
MEM    before<1>
WB     before<2>

Page 22: Lecture’s Overview

Example

[Figure: pipelined datapath with the branch resolved in ID, state at clock 4. The instruction fetched after the beq has been turned into a bubble and the branch target at address 72 is being fetched.]

Stage  Instruction
IF     lw $4, 50($7)
ID     bubble (nop)
EX     beq $1, $3, 7
MEM    sub $10, . . .
WB     before<1>

Page 23: Lecture’s Overview

Compiler Optimization of Delays

[Figure: three ways of filling the branch delay slot.]

a. From before

   add $s1, $s2, $s3
   if $s2 = 0 then
       [delay slot]

   becomes

   if $s2 = 0 then
       add $s1, $s2, $s3      (in the delay slot)

b. From target

   sub $t4, $t5, $t6          (branch target)
   ...
   add $s1, $s2, $s3
   if $s1 = 0 then
       [delay slot]

   becomes

   add $s1, $s2, $s3
   if $s1 = 0 then
       sub $t4, $t5, $t6      (in the delay slot)

c. From fall through

   add $s1, $s2, $s3
   if $s1 = 0 then
       [delay slot]
   sub $t4, $t5, $t6

   becomes

   add $s1, $s2, $s3
   if $s1 = 0 then
       sub $t4, $t5, $t6      (in the delay slot)
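The "from before" case is the simplest to sketch: the instruction just ahead of the branch can move into the delay slot as long as the branch does not read the register it writes. The helper below is a deliberately simplified, hypothetical model that only considers the instruction immediately before the branch and falls back to a nop; filling from the target or from the fall-through path is not covered.

```python
def fill_delay_slot_from_before(block, branch):
    """Fill the branch delay slot with the instruction preceding the branch.

    block is a list of (text, dest_register) tuples ending just before the
    branch; branch is (text, source_registers). Returns the new instruction
    order, where the entry after the branch is the delay slot.
    """
    branch_text, branch_sources = branch
    if block and block[-1][1] not in branch_sources:
        moved_text = block[-1][0]
        kept = [text for text, _ in block[:-1]]
        return kept + [branch_text, moved_text]
    # The branch depends on the preceding instruction: leave it and use a nop.
    return [text for text, _ in block] + [branch_text, "nop"]

if __name__ == "__main__":
    add = ("add $s1, $s2, $s3", "$s1")
    print(fill_delay_slot_from_before([add], ("if $s2 = 0 then", {"$s2"})))
    # → ['if $s2 = 0 then', 'add $s1, $s2, $s3']
    print(fill_delay_slot_from_before([add], ("if $s1 = 0 then", {"$s1"})))
    # → ['add $s1, $s2, $s3', 'if $s1 = 0 then', 'nop']
```

When the branch tests $s1, the `add` cannot move, which is the situation where the compiler looks at the branch target or the fall-through path instead.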

Page 24: Lecture’s Overview

Dynamic Branch Prediction

[Figure: state diagram of a two-bit predictor with two "Predict taken" states and two "Predict not taken" states; transitions are labeled "Taken" and "Not taken".]

• Uses a history buffer, indexed by instruction address, to track branch instructions
• The simplest approach uses one bit per branch instruction to record the last outcome
• Branching differently from the prediction flips the history bit
• A one-bit buffer still mispredicts twice for a loop: once when the loop exits and once on the first iteration the next time the loop is entered
• A two-bit buffer better captures the history of the branch instruction
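The difference between one and two history bits shows up on a loop branch that is taken nine times and then falls through, with the loop executed several times. The sketch below counts mispredictions for both schemes; the two-bit version is written as a saturating counter, which is one common formulation and an assumption here, since the exact transitions of the slide's state diagram are not recoverable.

```python
def mispredictions_one_bit(outcomes, initial_prediction=True):
    """One history bit: predict whatever the branch did last time."""
    prediction, misses = initial_prediction, 0
    for taken in outcomes:
        if taken != prediction:
            misses += 1
            prediction = taken  # a wrong prediction flips the history bit
    return misses

def mispredictions_two_bit(outcomes, state=3):
    """Two-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""
    misses = 0
    for taken in outcomes:
        if taken != (state >= 2):
            misses += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses

if __name__ == "__main__":
    # Loop branch: taken 9 times, then not taken on exit; the loop runs 3 times.
    outcomes = ([True] * 9 + [False]) * 3
    print(mispredictions_one_bit(outcomes))  # → 5
    print(mispredictions_two_bit(outcomes))  # → 3
```

With one bit, every execution of the loop after the first costs two mispredictions (re-entry and exit); with two bits only the exit is mispredicted, because a single not-taken outcome is not enough to change the prediction.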

Page 25: Lecture’s Overview

Exception/Interrupts

• 5 instructions are executing in a 5-stage pipeline
  - How to stop the pipeline?
  - Restart?
  - Who caused the interrupt?

Stage  Problem interrupts occurring
IF     Page fault on instruction fetch; memory-protection violation
ID     Undefined or illegal op-code
EX     Arithmetic exception
MEM    Page fault on data fetch; memory-protection violation; memory error

• Load with data page fault, Add with instruction page fault?
  - Solution: interrupt vector/instruction, interrupt ASAP, restart everything incomplete

• Faults (within an instruction, can be restarted):
  - Force a trap instruction into IF
  - Disable writes till the trap hits WB
  - Must save multiple PCs or PC + state

• External interrupts:
  - Allow the pipeline to drain
  - Load the PC with the interrupt address

Page 26: Lecture’s Overview

Supporting Exception Handling

[Figure: pipelined datapath extended for exceptions. Added elements: an Except PC register, a Cause register, the handler address 40000040 as an extra input to the PC multiplexer, and the flush control lines IF.Flush, ID.Flush and EX.Flush, whose multiplexers replace control values with 0.]

Page 27: Lecture’s Overview

Example for Exception Handling

[Figure: pipelined datapath with exception support, state at clock 5.]

Stage  Instruction
IF     lw $16, 50($7)
ID     slt $15, $6, $7
EX     add $1, $2, $1
MEM    or $13, . . .
WB     and $12, . . .

The add instruction causes an overflow.

Page 28: Lecture’s Overview

Example

[Figure: pipelined datapath with exception support, state at clock 6. The PC now holds 40000044.]

Stage  Instruction
IF     sw $25, 1000($0)
ID     bubble (nop)
EX     bubble
MEM    bubble
WB     or $13, . . .

• The pipeline is drained after saving the address of the offending instruction
• The first instruction of the handler is fetched
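The transition from clock 5 to clock 6 can be sketched as a function that turns the offending instruction and everything younger into bubbles and redirects the fetch to the handler. This is only an illustration: the function name, the stage dictionary, the cause value and the address 0x4C used for the add are assumptions; the handler address 0x40000040 is the value shown in the datapath figure.

```python
HANDLER_ADDRESS = 0x40000040  # handler entry shown in the datapath figure

def flush_on_ex_exception(stages, saved_pc, cause, fetch):
    """State one cycle after an exception is detected in the EX stage.

    stages maps "IF", "ID", "EX", "MEM", "WB" to the instruction in each stage;
    fetch(address) returns the instruction stored at that address.
    """
    next_stages = {
        "IF": fetch(HANDLER_ADDRESS),  # first instruction of the handler
        "ID": "bubble",                # IF.Flush squashes the instruction from IF
        "EX": "bubble",                # ID.Flush squashes the instruction from ID
        "MEM": "bubble",               # EX.Flush squashes the offending instruction
        "WB": stages["MEM"],           # older instructions complete normally
    }
    registers = {
        "ExceptPC": saved_pc,          # where to resume, as saved by the hardware
        "Cause": cause,                # reason for the exception
        "PC": HANDLER_ADDRESS + 4,     # next fetch address after the handler entry
    }
    return next_stages, registers

if __name__ == "__main__":
    clock5 = {
        "IF": "lw $16, 50($7)",
        "ID": "slt $15, $6, $7",
        "EX": "add $1, $2, $1",
        "MEM": "or $13, . . .",
        "WB": "and $12, . . .",
    }
    handler = {HANDLER_ADDRESS: "sw $25, 1000($0)"}
    # 0x4C is an assumed address for the add instruction.
    clock6, regs = flush_on_ex_exception(clock5, 0x4C, "overflow", handler.__getitem__)
    print(clock6)
    print(hex(regs["PC"]))
```

The `or` in MEM is older than the `add`, so it is allowed to reach WB; the `add`, `slt` and `lw` are all replaced by bubbles.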

Page 29: Lecture’s Overview

Conclusion

Summary

• Pipeline hazard detection
  - Data hazard
  - Control hazard

• Handling hazards in the pipeline design
  - Data forwarding and branch prediction
  - Coordination between control and data hazards

• Supporting exceptions
  - Pipeline flushing
  - Invoking exception handling routines

Next Lecture

• The basics of cache memory
• Measuring and improving cache performance

Reading assignment includes sections 6.4 to 6.8 in the textbook