00 Review 3 Pipeline


8/3/2019 00 Review 3 Pipeline

http://slidepdf.com/reader/full/00-review-3-pipeline 1/35

Review: Computer Organization

Pipelining

Chansu Yu

Laundry Example

Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold.

- Washer takes 30 minutes
- Dryer takes 30 minutes
- “Folder” takes 30 minutes
- “Stasher” takes 30 minutes to put clothes into drawers

[Figure: the four loads, labeled A, B, C, D.]

Sequential Laundry

Sequential laundry takes 8 hours for 4 loads.

If they learned pipelining, how long would laundry take?

[Figure: timeline from 6 PM to 2 AM. Loads A, B, C, and D run back to back, each taking four 30-minute steps.]

Faster Laundry - Pipelining

Faster laundry takes 3.5 hours for 4 loads!

[Figure: timeline starting at 6 PM. Loads A, B, C, and D overlap, each starting 30 minutes after the previous one, for seven 30-minute slots in total.]
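The two laundry totals can be reproduced with a few lines of Python. This is a minimal sketch, assuming four equal 30-minute stages (washer, dryer, folder, stasher); the function names are illustrative and not part of the slides.

```python
def sequential_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """Each load finishes every stage before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """The first load needs `stages` slots; every later load adds one slot."""
    return (stages + loads - 1) * stage_minutes


seq = sequential_minutes(4, 4, 30)
pipe = pipelined_minutes(4, 4, 30)
print(seq / 60, pipe / 60)  # 8.0 3.5
```

The pipelined total is seven slots because the last load starts three slots after the first and then needs four slots of its own.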

5 Stages of MIPS

Actions per step, by instruction type (R-type, memory-reference, branch, jump):

Instruction fetch (all instructions)
    IR = Memory[PC]
    PC = PC + 4

Instruction decode/register fetch (all instructions)
    A = Reg[IR[25-21]]
    B = Reg[IR[20-16]]
    ALUOut = PC + (sign-extend(IR[15-0]) << 2)

Execution, address computation, branch/jump completion
    R-type:           ALUOut = A op B
    Memory-reference: ALUOut = A + sign-extend(IR[15-0])
    Branch:           if (A == B) then PC = ALUOut
    Jump:             PC = PC[31-28] || (IR[25-0] << 2)

Memory access or R-type completion
    R-type: Reg[IR[15-11]] = ALUOut
    Load:   MDR = Memory[ALUOut]
    Store:  Memory[ALUOut] = B

Memory read completion
    Load: Reg[IR[20-16]] = MDR

Single cycle vs. Multicycle

[Figure: timing diagram over clock cycles 0 to 18 comparing single-cycle and multicycle execution of add, load, add, load. An add uses instruction fetch, register read, ALU operation, and register write; a load additionally uses memory read.]

What are the advantages of a multicycle implementation?

What are the disadvantages of a multicycle implementation?

Multicycle vs. Pipelined

[Figure: timing diagram over clock cycles 0 to 18 comparing multicycle and pipelined execution of add, load, add, load, using the same steps (instruction fetch, register read, ALU operation, memory read, register write).]

What are the advantages of a pipelined implementation?

What are the disadvantages of a pipelined implementation?

Lessons from Pipelined Laundry

- Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
- Potential speedup = number of pipe stages
- Pipeline rate is limited by the slowest pipeline stage
- Unbalanced lengths of pipe stages reduce speedup
- Time to “fill” the pipeline and time to “drain” it reduce speedup
- Multiple tasks operate simultaneously using different resources: any dependencies, any conflicts???

[Figure: pipelined laundry timeline from 6 PM; loads A, B, C, and D overlap in 30-minute slots.]
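The speedup lessons can be checked with a short derivation. As a sketch, assume k pipe stages of equal length t and n independent tasks (the symbols k, t, and n are introduced here for illustration and do not appear on the slides):

```latex
T_{\text{seq}} = n\,k\,t, \qquad
T_{\text{pipe}} = (k + n - 1)\,t, \qquad
\text{Speedup}(n) = \frac{T_{\text{seq}}}{T_{\text{pipe}}} = \frac{n\,k}{k + n - 1}
\;\xrightarrow[\;n \to \infty\;]{}\; k
```

For the laundry example (n = 4, k = 4) this gives 16/7, about 2.3, which matches 8 hours versus 3.5 hours. The k - 1 extra slots are the fill and drain time, which is why the speedup only approaches the number of stages for long workloads. With unbalanced stages, t is the length of the slowest stage, so the speedup drops further.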

Can pipelining get us into trouble?

[Figure: pipeline diagram, instructions versus time step (clock cycle). Five instructions each pass through IF, ID, EX, MEM, and WB, each starting one cycle after the previous one.]

If any two stages use the same resource, there must be a conflict.

Hazards

Hazard = when an instruction’s stage is unable to execute during the current cycle.

- Can always resolve hazards by waiting
  - pipeline control must detect the hazard
  - take action (or delay action) to resolve hazards

[Figure: pipeline diagram, instructions versus time step (clock cycle), with a stall. Annotation: instruction #2, stage 3 unable to continue.]

Structural Hazards

A needed functional unit is busy executing a previous instruction.
(Attempt to use the same resource in two different ways at the same time.)

[Figure: pipeline diagram, instructions versus time step (clock cycle), with two stall cycles.]

Example:
- Our sample MIPS pipeline has none.
- What if the PC + 4 computation used the main ALU instead of a separate adder?

Control Hazards

While executing a previous branch, the next instruction address might not yet be known.
(Attempt to make a decision before the condition is evaluated.)

[Figure: pipeline diagram, instructions versus time step (clock cycle). The conditional branch calculates PC + 4 in IF, computes the branch target address in ID, and performs the branch test and sets the PC to the target in EX. The branch target instruction is stalled until then.]

Data Hazards

Needed data still being computed by previous instruction.
(Attempt to use an item before it is ready.)

add $s3,$s1,$s2
sw $s4,0($s3)
lw $s5,0($s3)
add $s7,$s5,$s6

[Figure: pipeline diagram of the four instructions versus time step (clock cycle), with stall cycles inserted; the diagram extends to clock cycle 12.]

Pipelined Approach

[Figure: pipeline diagram of five overlapped instructions (IF, ID, EX, MEM, WB) shown next to the laundry loads A, B, C, D.]

- Cycle time, no. of stages
- Resource conflict

Resource Conflicts (revisit)

[Table: the step-by-step action table from "5 Stages of MIPS" is repeated here.]

- ALU conflict
- Register file conflict (read or write)
- Memory conflict

Basic Pipeline

[Figure: MIPS datapath divided into five stages. IF: instruction fetch; ID: instruction decode/register file read; EX: execute/address calculation; MEM: memory access; WB: write back.]

Instructions and data move generally from left to right through the five stages as they complete execution, except in two cases:
- WB stage
- PC selection

Basic Pipeline

[Table: the step-by-step action table from "5 Stages of MIPS" is repeated here.]

Why do we still need 2 ALUs at the EX stage? (One for A-B and the other for PC+IR.)

Why move ?? ZF is available during EX stage, anyway.

Pipelined Datapath

For a store instruction:
(?) => ID/EX pipeline register => EX/MEM pipeline register => (?)

[Figure: pipelined datapath with the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB between the stages.]

Pipeline registers are added to the basic pipeline in order to actually split the datapath into stages. The information must be placed in a pipeline register; otherwise, it is lost when the next instruction enters that pipeline stage.

Content of Pipeline Registers

Which data should be passed through the stages? I.e., what are the contents of the pipeline registers?

- In IF/ID pipeline register: PC (32), Inst. (32)
- In ID/EX pipeline register: PC (32), Reg. data 1 (32), Reg. data 2 (32), Offset (32), Reg. no. 2 and 3 (10)
- In EX/MEM pipeline register: PC (32), ZF (1), ALUOut (32), Reg. data 2 (32), Reg. no. (5)
- In MEM/WB pipeline register: Memory data (32), ALUOut (32), Reg. no. (5)

Example

Five instructions go through the MIPS pipeline:

lw  $10, 20($1)    100011 00001 01010 0000 0000 0001 0100    (8c2a 0014)
sub $11, $2, $3    000000 00010 00011 01011 00000 100010     (0043 5822)
and $12, $4, $5    000000 00100 00101 01100 00000 100100     (0085 6024)
or  $13, $6, $7    000000 00110 00111 01101 00000 100101     (00c7 6825)
add $14, $8, $9    000000 01000 01001 01110 00000 100000     (0109 7020)

Register contents            Memory contents
$pc = 0000 0000 5000 0000    [0000 0000 0000 1000] = 0000 1000 0000 0000
$1  = 0000 0000 0000 1000    [0000 0000 0000 1004] = 0000 1004 0000 0000
...                          .....
$9  = 0000 0000 0000 9000
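The machine words for the five instructions can be derived field by field. The sketch below assumes the standard MIPS32 field values (R-type opcode 000000; funct add 100000, sub 100010, and 100100, or 100101; lw opcode 100011); the helper names `encode_r` and `encode_lw` are hypothetical and only serve this illustration.

```python
# Standard MIPS32 field values (an assumption about the intended ISA).
FUNCT = {"add": 0x20, "sub": 0x22, "and": 0x24, "or": 0x25}
LW_OPCODE = 0x23


def encode_r(name: str, rd: int, rs: int, rt: int) -> int:
    """R-type: opcode(6)=0 | rs(5) | rt(5) | rd(5) | shamt(5)=0 | funct(6)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | FUNCT[name]


def encode_lw(rt: int, offset: int, rs: int) -> int:
    """I-type load: opcode(6) | rs(5) | rt(5) | 16-bit offset."""
    return (LW_OPCODE << 26) | (rs << 21) | (rt << 16) | (offset & 0xFFFF)


program = [
    encode_lw(10, 20, 1),       # lw  $10, 20($1)
    encode_r("sub", 11, 2, 3),  # sub $11, $2, $3
    encode_r("and", 12, 4, 5),  # and $12, $4, $5
    encode_r("or", 13, 6, 7),   # or  $13, $6, $7
    encode_r("add", 14, 8, 9),  # add $14, $8, $9
]
for word in program:
    print(f"{word:08x}")
```

Note the operand order: assembly writes the destination first (`rd, rs, rt`), while the encoded word places `rs` and `rt` before `rd`.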

[Figure: pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB pipeline registers) with the five example instructions lw, sub, and, or, add in flight. Labels (a) through (z) mark the signal values to be determined.]

9 Control Signals

Instruction  Opcode  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     000000  1       0       0         1         0        0         0       1       0
lw           100011  0       1       1         1         1        0         0       0       0
sw           101011  X       1       X         0         0        1         0       0       0
beq          000100  X       0       X         0         0        0         1       0       1

- 4 multiplexor selectors (PCSrc)
- 3 write signals
- 2 ALU signals

Q1: In which stage is the control circuit?
Q2: The EX stage executes “and” and the WB stage executes “lw”. Is MemtoReg 1 or 0?

Pipeline Control

- Generate control signals all at once at the ID stage
- And pass them through the stages just like the data

Execution/Address Calculation stage control lines: RegDst, ALUOp1, ALUOp0, ALUSrc
Memory access stage control lines: Branch, MemRead, MemWrite
Write-back stage control lines: RegWrite, MemtoReg

Instruction  RegDst  ALUOp1  ALUOp0  ALUSrc  Branch  MemRead  MemWrite  RegWrite  MemtoReg
R-format     1       1       0       0       0       0        0         1         0
lw           0       0       0       1       0       1        0         1         1
sw           X       0       0       1       0       0        1         0         X
beq          X       0       1       0       1       0        0         0         X

[Figure: the Control unit in ID produces the EX, M, and WB control groups, which are carried through the ID/EX, EX/MEM, and MEM/WB pipeline registers; each stage consumes its own group.]
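One way to picture "pass them through the stages just like the data" is to model each pipeline register as holding only the control groups that later stages still need. This is a minimal sketch using the values from the pipeline control table; the function names `decode`, `after_ex`, and `after_mem` are hypothetical, and `None` stands for a don't-care (X).

```python
# Control values copied from the pipeline control table; None means don't care.
CONTROL = {
    "R-format": {
        "EX": {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
        "M": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 0},
    },
    "lw": {
        "EX": {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "M": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 1},
    },
    "sw": {
        "EX": {"RegDst": None, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "M": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
        "WB": {"RegWrite": 0, "MemtoReg": None},
    },
    "beq": {
        "EX": {"RegDst": None, "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
        "M": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 0, "MemtoReg": None},
    },
}


def decode(op: str) -> dict:
    """ID stage: generate all control groups at once (contents of ID/EX)."""
    return {group: dict(signals) for group, signals in CONTROL[op].items()}


def after_ex(id_ex: dict) -> dict:
    """EX consumes its group; M and WB move on into EX/MEM."""
    return {"M": id_ex["M"], "WB": id_ex["WB"]}


def after_mem(ex_mem: dict) -> dict:
    """MEM consumes its group; only WB moves on into MEM/WB."""
    return {"WB": ex_mem["WB"]}


# An R-format "and" is in EX while an older "lw" is in WB.
and_in_ex = decode("R-format")
lw_in_wb = after_mem(after_ex(decode("lw")))
print(and_in_ex["EX"]["ALUSrc"], lw_in_wb["WB"]["MemtoReg"])  # 0 1
```

Because each instruction carries its own WB group, the write-back multiplexor sees the value generated for the instruction that is actually in WB, not for the one currently being decoded or executed.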

Datapath with Control

[Figure: pipelined datapath with control. The Control unit decodes the instruction in ID; the EX signals (RegDst, ALUOp, ALUSrc), M signals (Branch, MemRead, MemWrite), and WB signals (RegWrite, MemtoReg) travel through ID/EX, EX/MEM, and MEM/WB. PCSrc selects the next PC.]

Graphically Representing Pipelines

- Can help with answering questions like:
  - how many cycles does it take to execute this code?
  - what is the ALU doing during cycle 4?
- Use this representation to help understand datapaths

[Figure: multiple-clock-cycle diagram, program execution order (in instructions) versus time (in clock cycles, CC 1 to CC 6). lw $10, 20($1) and sub $11, $2, $3 each pass through IM, Reg, ALU, DM, Reg, with sub starting one cycle later.]
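Both questions can be answered mechanically for a stall-free pipeline. The sketch below assumes one instruction is fetched per cycle with no hazards; `stage_in_cycle` and `total_cycles` are illustrative names, not part of the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_in_cycle(issue_index: int, cycle: int):
    """Stage occupied in 1-based `cycle` by the instruction fetched in
    cycle issue_index + 1, or None if it is not in the pipeline."""
    position = cycle - 1 - issue_index
    return STAGES[position] if 0 <= position < len(STAGES) else None


def total_cycles(instruction_count: int) -> int:
    """Cycles until the last instruction leaves WB (no stalls)."""
    return len(STAGES) + instruction_count - 1


program = ["lw $10, 20($1)", "sub $11, $2, $3"]
in_ex = [ins for i, ins in enumerate(program) if stage_in_cycle(i, 4) == "EX"]
print(total_cycles(len(program)), in_ex)  # 6 ['sub $11, $2, $3']
```

So the two-instruction example finishes in CC 6, and in cycle 4 the ALU is working on sub while lw is in its memory stage.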

Data Hazards

- Needed data still being computed by previous instruction

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

Assume $1=10, $2=10, $3=30

Data Hazards: Dependencies

- Problem with starting the next instruction before the first is finished
  - dependencies that “go backward in time” are data hazards

[Figure: multiple-clock-cycle diagram (CC 1 to CC 9) of sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2).]

Value of register $2: 10 (CC 1 to CC 4), 10/-20 (CC 5), -20 (CC 6 to CC 9)

- “and” has a problem
- “or” has a problem
- “add” ???
- “sw” is OK
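The four verdicts follow from comparing the cycle in which sub writes $2 with the cycle in which each later instruction reads it. This is a sketch without forwarding, assuming the register file is written in the first half of a cycle and read in the second half, so a read in the same cycle as the write sees the new value (that assumption is what settles the “add” case). The tuple layout and `stale_readers` are hypothetical.

```python
# (name, destination register or None, source registers)
PROGRAM = [
    ("sub", 2, (1, 3)),      # sub $2, $1, $3
    ("and", 12, (2, 5)),     # and $12, $2, $5
    ("or", 13, (6, 2)),      # or  $13, $6, $2
    ("add", 14, (2, 2)),     # add $14, $2, $2
    ("sw", None, (15, 2)),   # sw  $15, 100($2): reads $15 and $2
]


def stale_readers(program, producer=0):
    """Names of later instructions that read the producer's destination
    register in ID before the producer has written it back in WB."""
    dest = program[producer][1]
    wb_cycle = producer + 5          # IF in cycle producer + 1, WB four later
    stale = []
    for i in range(producer + 1, len(program)):
        name, _, sources = program[i]
        id_cycle = i + 2             # registers are read in ID
        if dest in sources and id_cycle < wb_cycle:
            stale.append(name)
    return stale


print(stale_readers(PROGRAM))  # ['and', 'or']
```

The add reads $2 in CC 5, the same cycle in which sub writes it, and sw reads it in CC 6, so neither sees the old value under the stated assumption.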

Data Hazards: Forwarding

While the result is not written back until WB:

sub $2,$1,$3
and $12,$2,$5

[Figure: pipeline diagram; "and" stalls for two cycles waiting for "sub" to write back.]

It is calculated earlier, in EX:

[Figure: pipeline diagram of the same two instructions with no stall.]

Add forwarding hardware to allow, e.g., EX’s output (located in the EX/MEM pipeline register) to be EX’s input.

- Actually available after EX stage (not WB)
- Actually needed at EX stage (not ID)

Forwarding: All 2 Cases

[Figure: multiple-clock-cycle diagram (CC 1 to CC 9) of sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2), with forwarding paths.]

Value of register $2: 10 (CC 1 to CC 4), 10/-20 (CC 5), -20 (CC 6 to CC 9)
Value of EX/MEM: -20 in CC 4, X in all other cycles
Value of MEM/WB: -20 in CC 5, X in all other cycles

- “and” has a problem -> fixed
- “or” has a problem -> fixed
- “add” ??? -> OK
- “sw” is OK

Data Hazards (again)

- Needed data still being computed by previous instruction

sub $11, $3, $2
and $12, $11, $4
or $13, $6, $11
add $14, $8, $9
sw $15, 100($2)

[Figure sequence: seven snapshots of the pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) as the five instructions enter the pipeline. The diagram labels (a) through (z) belong to the lost images; the surviving annotations are listed per snapshot.]

- Snapshot 1: sub $11, $3, $2 enters the pipeline.
- Snapshot 2: and $12, $11, $4 enters. Annotations: Rs=3, Rd=11, $Rs=300.
- Snapshot 3 (question): or $13, $6, $11 enters. Annotations: Rs=11, Rd=12, $Rs=???, ???, ???, Rd=11.
- Snapshot 4 (answer): same cycle. Annotations: Rs=11, Rd=12, $Rs=1100, 300, 100, Rd=11.
- Snapshot 5 (question): add $14, $8, $9 enters. Annotations: ???, Rd=12, Rd=11, ???.
- Snapshot 6 (answer): same cycle. Annotations: 100, Rd=12, Rd=11, 100.
- Snapshot 7 (question): sw $15, 100($2) enters. Annotations: ???, Rd=11, ???.

Forwarding: Implementation

[Figure: pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) without forwarding paths.]

- Additional datapath for forwarding?
- How to control the forwarding datapath?

Forwarding: Forwarding Unit

[Figure: pipelined datapath with a forwarding unit. The unit receives Rs and Rt from ID/EX together with EX/MEM.RegisterRd and MEM/WB.RegisterRd, and drives the multiplexors at the ALU inputs.]

Forwarding unit: 6-input, 2-output combinational circuit (HW#1, (5))

Forwarding Control

Control logic

ForwardA =
  10 if (EX/MEM.Rd = ID/EX.Rs)    <- get operand from EX/MEM
  01 if (MEM/WB.Rd = ID/EX.Rs)    <- get operand from MEM/WB
  00 otherwise                    <- get operand from ID/EX

ForwardB =
  10 if (EX/MEM.Rd = ID/EX.Rt)    <- get operand from EX/MEM
  01 if (MEM/WB.Rd = ID/EX.Rt)    <- get operand from MEM/WB
  00 otherwise                    <- get operand from ID/EX

Forwarding Control Circuit

ForwardA =
  10 if ((EX/MEM.Rd = ID/EX.Rs) && EX/MEM.RegWrite && (EX/MEM.Rd ≠ 0))
  01 if ((MEM/WB.Rd = ID/EX.Rs) && MEM/WB.RegWrite && (MEM/WB.Rd ≠ 0) && (EX/MEM.Rd ≠ ID/EX.Rs))
  00 otherwise

ForwardB =
  10 if ((EX/MEM.Rd = ID/EX.Rt) && EX/MEM.RegWrite && (EX/MEM.Rd ≠ 0))
  01 if ((MEM/WB.Rd = ID/EX.Rt) && MEM/WB.RegWrite && (MEM/WB.Rd ≠ 0) && (EX/MEM.Rd ≠ ID/EX.Rt))
  00 otherwise
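The forwarding conditions translate directly into a small function. This is a sketch of the logic as written on the slide, not a hardware description; `forward_select` is a hypothetical name, and the same function serves ForwardA (pass ID/EX.Rs) and ForwardB (pass ID/EX.Rt).

```python
def forward_select(src: int,
                   ex_mem_rd: int, ex_mem_regwrite: bool,
                   mem_wb_rd: int, mem_wb_regwrite: bool) -> int:
    """Return the 2-bit mux select for one ALU operand.

    0b10: take the operand from EX/MEM (result of the previous instruction)
    0b01: take the operand from MEM/WB (result from two instructions back)
    0b00: take the operand read from the register file (ID/EX)
    """
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src:
        return 0b10
    if (mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src
            and ex_mem_rd != src):
        return 0b01
    return 0b00


# "and $12, $2, $5" in EX while "sub $2, $1, $3" is in MEM.
forward_a = forward_select(2, ex_mem_rd=2, ex_mem_regwrite=True,
                           mem_wb_rd=0, mem_wb_regwrite=False)
forward_b = forward_select(5, ex_mem_rd=2, ex_mem_regwrite=True,
                           mem_wb_rd=0, mem_wb_regwrite=False)
print(format(forward_a, "02b"), format(forward_b, "02b"))  # 10 00
```

The `!= 0` checks keep register $0 from ever being forwarded, and the RegWrite checks skip instructions such as sw and beq that do not write a register. Testing EX/MEM first gives the most recent result priority when both pipeline registers hold the same destination.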

Data Hazards: All Considered???

... but forwarding doesn’t eliminate all data hazards:

lw $s5,0($s4)
add $s7,$s5,$s6

[Figure: pipeline diagram; add stalls for one cycle behind lw.]

... especially when we remember that memory access is really often much longer than a single cycle:

[Figure: pipeline diagram of the same two instructions; add stalls for additional cycles while lw is in MEM.]

Data Hazards: Stalling

- Stall the pipeline by keeping an instruction in the same stage

lw $2, 20($1)
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7

[Figure: multiple-clock-cycle diagram (CC 1 to CC 10), program execution order (in instructions) versus time (in clock cycles). A bubble is inserted after lw, delaying "and" and the instructions behind it by one cycle. Dependences marked: lw-and, lw-or.]

At CC5, MEM stage is empty!!!

Data Hazards: Stalling

- Stalling detection and control
  - Detects during the ID stage when a “lw” instruction is in the EX stage
  - The following two instructions are in the ID (“and”) and IF (“or”) stages, respectively
- If detected,
  - Stall the following instruction (in ID stage, “and”) so that it repeats the ID stage again => IF/ID pipeline register should not be changed
  - Stall the second instruction (in IF stage, “or”) so that it repeats the IF stage again => PC should not be changed

Data Hazards: Stalling

- Hazard detection
  - If (ID/EX.MemRead and ((ID/EX.Rt = IF/ID.Rs) or (ID/EX.Rt = IF/ID.Rt))) stall the pipeline
- Control signals generated from the hazard detection unit
  - IF/IDWrite to prevent the IF/ID register from changing
  - PCWrite to prevent the PC from changing
  - MUX control to delay forwarding control signals (pass “null” signals)
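The load-use check and its three outputs can be sketched as a single function. The output key `ZeroControl` is a hypothetical name for the MUX control that passes "null" signals; `PCWrite` and `IF/IDWrite` follow the slide.

```python
def hazard_detect(id_ex_memread: bool, id_ex_rt: int,
                  if_id_rs: int, if_id_rt: int) -> dict:
    """Load-use hazard detection, evaluated while the consumer is in ID."""
    stall = bool(id_ex_memread and
                 (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))
    return {
        "PCWrite": not stall,      # hold the PC so IF is repeated
        "IF/IDWrite": not stall,   # hold IF/ID so ID is repeated
        "ZeroControl": stall,      # send a bubble (all-zero control) into EX
    }


# lw $2, 20($1) is in EX (Rt = 2); and $4, $2, $5 is in ID (Rs = 2, Rt = 5).
first = hazard_detect(True, 2, if_id_rs=2, if_id_rt=5)
# One cycle later the bubble is in EX (MemRead = 0) and "and" is still in ID.
second = hazard_detect(False, 0, if_id_rs=2, if_id_rt=5)
print(first["ZeroControl"], second["ZeroControl"])  # True False
```

After the single stall cycle the lw result sits in MEM/WB, so the forwarding unit can supply it to "and" in EX.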

Stalling: Detection Unit

- Stall by letting an instruction that won’t write anything go forward

[Figure: pipelined datapath with forwarding unit and hazard detection unit. The hazard detection unit takes ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, and IF/ID.RegisterRt, and drives PCWrite, IF/IDWrite, and the multiplexor that zeroes the control signals.]

Hazard detection unit: 4-input, 3-output combinational circuit

Stalling: What happens in the pipeline?

lw $s5,0($s4)
add $s7,$s5,$s6

[Figure: pipeline diagram over CC1 to CC12. The stalled instructions repeat their ID and IF stages, and the resulting bubble has no EX stage, no MEM stage, and no WB stage.]

- ID stage is repeated at CC7 <- IF/IDWrite
- IF stage is repeated at CC7 <- PCWrite
- No EX at CC7, no MEM at CC8, and no WB at CC9 <- zero control signals

Branch (Control) Hazards

While executing a previous branch, the next instruction address might not yet be known.

[Figure: pipeline diagram, instructions versus time step (clock cycle). The conditional branch calculates PC + 4 in IF, computes the branch target address in ID, and performs the branch test and sets the PC to the target in EX. The branch target instruction is stalled until then.]

Branch Hazards

- We can stall the pipeline for every branch instruction
  - Too slow (3 instructions)
- Or, continue execution down the sequential instruction stream assuming that the branch will not be taken (predict “branch not taken”)
  - If the condition is not met, OK! (prediction is successful)
  - If the condition is met, (prediction is wrong)
    - Some unwanted instructions are in the pipeline!
    - Need to “flush” instructions
- How do you compare the above two?
  - If branches are taken half the time, and if it costs little to discard the instructions, the second approach halves the cost of control hazards
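The "halves the cost" claim is an expected-value calculation. The sketch below assumes a fixed penalty per stalled or flushed branch (3 cycles, following the "3 instructions" figure) and that discarding instructions costs nothing extra; both function names are illustrative.

```python
def stall_always_cost(penalty_cycles: int) -> float:
    """Average cycles lost per branch when every branch stalls."""
    return float(penalty_cycles)


def predict_not_taken_cost(penalty_cycles: int, taken_fraction: float) -> float:
    """Average cycles lost per branch when only taken branches are flushed."""
    return penalty_cycles * taken_fraction


always = stall_always_cost(3)
predicted = predict_not_taken_cost(3, taken_fraction=0.5)
print(always, predicted)  # 3.0 1.5
```

The saving scales with how often branches fall through: if branches were taken 80% of the time, predicting not taken would recover only a fifth of the stall cost.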

B r a n c h H a z a r d s  

Page 30: 00 Review 3 Pipeline

8/3/2019 00 Review 3 Pipeline

http://slidepdf.com/reader/full/00-review-3-pipeline 30/35

30

[email protected]

Stalling: What happens in the pipeline?

                   CC1  CC2  CC3         CC4         CC5         CC6         CC7
beq $1,$2,7        IF   ID   EX          MEM         WB
add $3,$4,$5            IF   Null (ID)   Null (EX)   Null (MEM)  Null (WB)
“target of beq”              IF          ID          EX          MEM         WB

• The flushed slot behaves as a null instruction (sll $0,$0,$0): the ID stage executes it at CC3, the EX stage at CC4, the MEM stage at CC5, and the WB stage at CC6.
• IF.Flush at CC3 will do.
• A new control signal, IF.Flush, is introduced to flush the instruction in the IF stage.
• It zeros the instruction field of the IF/ID pipeline register, which can in fact be decoded as “sll $0, $0, $0”.
• In fact, “nop” = “sll $0, $0, $0”.
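To see why a zeroed instruction field acts as a nop, it helps to split the 32-bit word into the standard MIPS R-type fields. The sketch below is illustrative (the helper names `decode_r_type` and `encode_sll` are not from the slides); it relies only on the fact that sll is an R-type instruction with opcode 0 and funct 0.

```python
def decode_r_type(word):
    """Split a 32-bit MIPS R-type instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


def encode_sll(rd, rt, shamt):
    """Encode 'sll rd, rt, shamt' (opcode 0, funct 0, rs unused and 0)."""
    return (rt << 16) | (rd << 11) | (shamt << 6)


flushed = 0x00000000  # what IF.Flush leaves in the IF/ID instruction field
print(decode_r_type(flushed))          # every field is 0
print(encode_sll(0, 0, 0) == flushed)  # sll $0, $0, $0 is the all-zero word
```

Because register $0 is hard-wired to zero, shifting it by 0 and writing the result back to $0 changes no state.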

Branch Hazards: Branch Delay Slots

While the next instruction address is being determined, go ahead and execute the instruction(s) that sequentially follow the branch.

[Figure: pipeline diagram, instructions vs. time step (clock cycle). Three instructions each pass through IF, ID, EX, MEM, WB: the conditional branch (annotated “Computes branch target address.” and “Performs branch test & sets PC to target.”), the branch delay instruction, and the branch target (annotated “Fetches correct target.”).]


Branch Hazards: Branch Delay Slots

• Advantage:
    • Can avoid one stall per delay slot.
• Disadvantages:
    • Makes assembly-language programming more difficult.
    • It can be difficult to find appropriate code for the slot.
    • Exposes an implementation detail that could change.
    • Later implementations without a stall must still emulate the slot.
• Most modern processors avoid delay slots.

Branch Hazards: Branch Prediction

Guess which instruction is next, and start executing it.

• What if the guess is wrong? Flush the pipeline.
• Simplest guesses: Always Taken or Never Taken.
• When is the prediction made?
    • Static prediction: by the compiler
    • Dynamic prediction: by the processor


Dynamic Branch Prediction

• Branch prediction buffer (branch history table)
    • A small memory that is indexed by the lower portion of the address of the branch instruction and that contains one or more bits indicating whether the branch was recently taken or not.
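The indexing scheme can be sketched in a few lines. This is only an illustration under stated assumptions: a 16-entry buffer with one bit per entry, and word-aligned instructions so that the two byte-offset bits of the PC are dropped before indexing. The class name and the example addresses are hypothetical.

```python
class BranchPredictionBuffer:
    """Tiny 1-bit branch prediction buffer indexed by low PC bits."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.taken = [False] * (1 << index_bits)  # one bit per entry

    def _index(self, pc):
        # Assumed: instructions are 4 bytes, so skip the 2 byte-offset bits.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.taken[self._index(pc)]

    def update(self, pc, was_taken):
        self.taken[self._index(pc)] = was_taken


bpb = BranchPredictionBuffer()
bpb.update(0x00400010, True)    # a branch at this address was taken
print(bpb.predict(0x00400010))  # same branch: predicted taken
print(bpb.predict(0x00400050))  # different branch, same low bits: shares the entry
print(bpb.predict(0x00400014))  # different entry: still predicted not taken
```

Because only the lower address bits are used, two different branches can map to the same entry and influence each other's predictions.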

[Figure: the PC addresses both the instruction memory and the BPB; the fetched instruction and the prediction (T or NT) feed the IF/ID pipeline register.]

Dynamic Branch Prediction

• 1-bit predictor
• Prediction accuracy
    • A loop that ends in a beq and runs 10 times => 1st: ?, 2nd: correct, 3rd: correct, ..., 9th: correct, 10th: incorrect => 80% accuracy
      (Because the first prediction is incorrect in the second execution of the same code.)

[Figure: state diagram of the 1-bit predictor with two states, “Predict taken” and “Predict not taken”. A taken branch (T) moves to or stays in “Predict taken”; a not-taken branch (NT) moves to or stays in “Predict not taken”.]
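The 80% figure can be reproduced with a short simulation. The sketch below assumes the loop-closing branch is taken 9 times and falls through on the 10th, and that the predictor starts in the "predict taken" state before the first execution; the function name is illustrative.

```python
def simulate_one_bit(outcomes, predict_taken):
    """Run a 1-bit predictor over branch outcomes (True = taken).

    Returns (number of correct predictions, final predictor state).
    """
    correct = 0
    for taken in outcomes:
        if predict_taken == taken:
            correct += 1
        predict_taken = taken  # 1-bit predictor: remember only the last outcome
    return correct, predict_taken


# Loop-closing branch: taken 9 times, then not taken on the 10th iteration.
loop = [True] * 9 + [False]

first_correct, state = simulate_one_bit(loop, predict_taken=True)  # assumed start state
second_correct, state = simulate_one_bit(loop, predict_taken=state)

print(first_correct / len(loop))   # first execution of the loop
print(second_correct / len(loop))  # second execution: 0.8
```

In the second execution the predictor still holds "not taken" from the previous loop exit, so both the first and the last iteration are mispredicted.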


Exceptions

• Another form of control hazard involves exceptions.
• When an arithmetic overflow occurs while executing “add $1, $2, $1”:
    • Transfer control to the exception routine (0x4000 0040).
    • This is the same as executing a branch instruction.
• Necessary actions are:
    • Stop executing the current instruction and start the exception routine.
    • Following instructions already in the pipe must be wiped out (flush pipeline registers).
    • Return to the offending instruction.

Flush Control Signals

• Similar to the taken branch, we need to flush pipeline registers. The question is which pipeline register(s).
    • Arithmetic overflow is detected at the end of the EX stage.
    • Thus, flushing takes place at the MEM stage (in the next cycle).
    • Since three following instructions are already in the pipeline (IF, ID, and EX stages), we need to flush those three instructions.
    • Therefore, we need ID.Flush and EX.Flush control signals in addition to IF.Flush.

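The flush described on the "Flush Control Signals" slide can be sketched as a small model. It mirrors the slide's timing (the flush happens in the cycle after detection, when the offending add has moved to MEM) and does not model saving the offending address or the Cause register. The pipeline contents i0..i3 and the function name are hypothetical.

```python
NOP = "nop"                # i.e. sll $0, $0, $0
EXCEPTION_PC = 0x40000040  # exception routine address used in these slides


def flush_on_overflow(pipeline):
    """Return (pipeline after flush, next PC) for the cycle after overflow detection.

    `pipeline` maps stage name -> instruction text; the offending instruction
    is assumed to have just moved from EX to MEM.
    """
    flushed = dict(pipeline)
    for stage in ("IF", "ID", "EX"):  # IF.Flush, ID.Flush, EX.Flush
        flushed[stage] = NOP
    return flushed, EXCEPTION_PC


# Hypothetical pipeline contents; i1..i3 follow the offending add.
before = {"IF": "i3", "ID": "i2", "EX": "i1", "MEM": "add $1, $2, $1", "WB": "i0"}
after, next_pc = flush_on_overflow(before)
print(after)
print(hex(next_pc))
```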

 

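[Figure: pipelined datapath extended for exception handling. Recoverable labels: PC, instruction memory, registers, sign extend, shift left 2, control, ALU, data memory, hazard detection unit, forwarding unit, and the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers. Added for exceptions: the IF.Flush, ID.Flush, and EX.Flush signals with muxes that select 0, the Cause and Except PC registers, the exception address 40000040 as a PC input, and the OF (overflow) signal. Callouts: IF.Flush is for the instruction in the IF stage, ID.Flush for the instruction in the ID stage, and EX.Flush for the instruction in the EX stage.]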

Challenges

• What if more than one instruction generates exceptions?
    • While “add” causes an overflow exception at CC5 in EX,
    • another instruction causes an invalid opcode exception at CC5 in IF.
    • It is not OK to generate all flushing signals.
• And how does the exception service routine correctly identify the instruction that caused the exception? => Imprecise exception


Precise and Imprecise Exceptions

• Precise exceptions
    • Hardware (CPU) correctly identifies the offending instruction
    • and makes sure all prior instructions complete.
    • No instruction following it is allowed to complete its execution or to modify the process state.
• Imprecise exceptions
    • Hardware does not guarantee this and leaves it up to the operating system to determine which instruction caused the problem.
    • Some instructions following the offending instruction are allowed to complete their execution and modify the process state.
• Most modern CPUs support precise exceptions.