00 Review 3 Pipeline

Post on 06-Apr-2018


Transcript of 00 Review 3 Pipeline

8/3/2019 00 Review 3 Pipeline

http://slidepdf.com/reader/full/00-review-3-pipeline 1/35

Review: Computer Organization

Pipelining

Chansu Yu

c.yu91@csuohio.edu

Laundry Example

  Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
  Washer takes 30 minutes
  Dryer takes 30 minutes
  “Folder” takes 30 minutes
  “Stasher” takes 30 minutes to put clothes into drawers

[Figure: four laundry loads labeled A, B, C, D]

Sequential Laundry

Sequential laundry takes 8 hours for 4 loads

If they learned pipelining, how long would laundry take?

[Figure: timeline from 6 PM to 2 AM; loads A, B, C, D run one after another, each as four 30-minute steps]

Faster Laundry - Pipelining

Faster laundry takes 3.5 hours for 4 loads!

[Figure: timeline starting at 6 PM; loads A, B, C, D overlap, each starting 30 minutes after the previous one, for seven 30-minute slots in total]
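The two laundry totals can be checked with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, and it assumes every step takes the same 30 minutes and that a new load can start as soon as the washer is free.

```python
def sequential_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """Each load finishes all of its stages before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """The first load needs `stages` slots; every further load adds one slot."""
    return (stages + loads - 1) * stage_minutes


seq = sequential_minutes(loads=4, stages=4, stage_minutes=30)
pipe = pipelined_minutes(loads=4, stages=4, stage_minutes=30)
print(seq / 60, "hours sequential,", pipe / 60, "hours pipelined")
```

With 4 loads and 4 stages this gives 480 minutes against 210 minutes, which matches the 8 hours and 3.5 hours quoted on the slides.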

5 Stages of MIPS

Step name: Instruction fetch
  All instructions: IR = Memory[PC]; PC = PC + 4

Step name: Instruction decode/register fetch
  All instructions: A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC + (sign-extend(IR[15-0]) << 2)

Step name: Execution, address computation, branch/jump completion
  R-type instructions: ALUOut = A op B
  Memory-reference instructions: ALUOut = A + sign-extend(IR[15-0])
  Branches: if (A == B) then PC = ALUOut
  Jumps: PC = PC[31-28] || (IR[25-0] << 2)

Step name: Memory access or R-type completion
  R-type instructions: Reg[IR[15-11]] = ALUOut
  Memory-reference instructions: Load: MDR = Memory[ALUOut] or Store: Memory[ALUOut] = B

Step name: Memory read completion
  Memory-reference instructions: Load: Reg[IR[20-16]] = MDR
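The bit ranges in the steps above (IR[25-21], IR[20-16], IR[15-11], IR[15-0], IR[25-0]) can be made concrete with a small field extractor. This is a sketch, not part of the original slides: the helper names are hypothetical, the PC passed in is assumed to be the already incremented PC (as in the decode step), and the example word 0x8C2A0014 is the lw $10, 20($1) encoding that appears later in these slides.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode_fields(ir: int) -> dict:
    """Split a 32-bit instruction word into the fields named in the table."""
    return {
        "opcode": (ir >> 26) & 0x3F,   # IR[31-26]
        "rs": (ir >> 21) & 0x1F,       # IR[25-21], read into A
        "rt": (ir >> 16) & 0x1F,       # IR[20-16], read into B
        "rd": (ir >> 11) & 0x1F,       # IR[15-11], R-type destination
        "imm": sign_extend16(ir),      # sign-extend(IR[15-0])
        "target": ir & 0x03FFFFFF,     # IR[25-0], jump target field
    }


def branch_target(pc_plus_4: int, ir: int) -> int:
    """ALUOut = PC + (sign-extend(IR[15-0]) << 2), wrapped to 32 bits."""
    return (pc_plus_4 + (sign_extend16(ir) << 2)) & 0xFFFFFFFF


def jump_target(pc_plus_4: int, ir: int) -> int:
    """PC = PC[31-28] || (IR[25-0] << 2)."""
    return (pc_plus_4 & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)


fields = decode_fields(0x8C2A0014)  # lw $10, 20($1)
print(fields)
```

For the lw word the extractor yields opcode 35 (binary 100011), rs = 1, rt = 10 and an immediate of 20, which is how the memory-reference row gets its base register, destination register and offset.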

Single cycle vs. Multicycle

[Figure: timing of the sequence add, load, add, load on a time axis from 0 to 18, under the single-cycle and the multicycle implementations; the steps shown are Instruction fetch, Reg. read, ALU operation, Memory read (load only), Reg. write]

What are the advantages of multicycle implementation ?

What are the disadvantages of multicycle implementation ?

Multicycle vs. Pipelined

[Figure: timing of the sequence add, load, add, load on a time axis from 0 to 18, under the multicycle and the pipelined implementations; the steps shown are Instruction fetch, Reg. read, ALU operation, Memory read (load only), Reg. write]

What are the advantages of pipelined implementation ?

What are the disadvantages of pipelined implementation ?

Lessons from Pipelined Laundry

  Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
  Potential speedup = number of pipe stages
  Pipeline rate is limited by the slowest pipeline stage
  Unbalanced lengths of pipe stages reduce speedup
  Time to “fill” the pipeline and time to “drain” it reduce speedup
  Multiple tasks operate simultaneously using different resources: any dependencies, any conflicts ???

[Figure: pipelined laundry timeline starting at 6 PM; loads A, B, C, D overlap in 30-minute steps]
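The lessons about the slowest stage, unbalanced stages, and fill/drain time can be put into one small formula. The sketch below is an illustration under simple assumptions (no hazards, the clock period equals the slowest stage, the function name and the unbalanced stage times are made up): it compares the unpipelined time with the pipelined time for a given number of tasks.

```python
def pipeline_speedup(stage_times, tasks):
    """Speedup of an ideal pipeline over running each task start to finish."""
    unpipelined = tasks * sum(stage_times)
    cycle = max(stage_times)                      # rate limited by the slowest stage
    pipelined = (len(stage_times) + tasks - 1) * cycle  # includes fill and drain
    return unpipelined / pipelined


balanced = [30, 30, 30, 30]
unbalanced = [30, 60, 15, 15]   # same total work, one slow stage

few_tasks = pipeline_speedup(balanced, 4)        # fill/drain dominates
many_tasks = pipeline_speedup(balanced, 1000)    # approaches the stage count
slow_stage = pipeline_speedup(unbalanced, 1000)  # capped by the 60-minute stage
print(few_tasks, many_tasks, slow_stage)
```

With four balanced stages the speedup for 4 tasks is well below 4, it approaches 4 as the number of tasks grows, and the unbalanced variant stays below 2 because every slot is as long as its slowest stage.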

Can pipelining get us into trouble?

[Figure: pipeline diagram, instructions against time step (clock cycle); five instructions each pass through IF, ID, EX, MEM, WB, each starting one cycle after the previous one]

If any two stages use the same resource, there must be a conflict.

Hazards

  Hazard = when an instruction’s stage is unable to execute during the current cycle.
  Can always resolve hazards by waiting
    pipeline control must detect the hazard
    take action (or delay action) to resolve hazards

[Figure: pipeline diagram, instructions against time step (clock cycle), with a Stall marked where instruction #2, stage 3, is unable to continue]

Structural Hazards

A needed functional unit is busy executing a previous instruction
(Attempt to use the same resource two different ways at the same time)

[Figure: pipeline diagram, instructions against time step (clock cycle), running to clock cycle 9, with two Stalls]

Example:
  – Our sample MIPS pipeline has none.
  – What if PC+4 computation used main ALU instead of separate adder?

Control Hazards

While executing a previous branch, next instruction address might not yet be known.
(attempt to make a decision before condition is evaluated)

[Figure: pipeline diagram, instructions against time step (clock cycle), running to clock cycle 8. Conditional branch: IF (calculates PC+4), ID (computes branch target address), EX (performs branch test & sets PC to target), MEM, WB. Branch target: Stall, Stall, then IF, ID, EX, MEM, WB]

Data Hazards

Needed data still being computed by previous instruction.
(attempt to use item before it is ready)

add $s3,$s1,$s2
sw $s4,0($s3)
lw $s5,0($s3)
add $s7,$s5,$s6

[Figure: pipeline diagram of the four instructions against time step (clock cycle), running to clock cycle 12, with Stalls inserted in the dependent instructions]

Pipelined Approach

[Figure: pipeline diagram of five overlapped instructions, each passing through IF, ID, EX, MEM, WB, shown beside the laundry loads A, B, C, D]

- Cycle time, No. stages
- Resource conflict

Resource Conflicts (revisit)

[Table: the step-by-step action table from “5 Stages of MIPS”, repeated with these annotations:]

  ALU conflict
  Register file conflict (read or write)
  Memory conflict

Basic Pipeline

[Figure: datapath (PC, instruction memory, adders, register file, sign extend, shift left 2, ALU, data memory, multiplexors) divided into five stages:]

  IF: Instruction fetch
  ID: Instruction decode/register file read
  EX: Execute/address calculation
  MEM: Memory access
  WB: Write back

Instructions and data move generally from left to right through the five stages as they complete execution, except in two cases:
- WB stage
- PC selection

Basic Pipeline

[Table: the step-by-step action table from “5 Stages of MIPS”, repeated.]

Why do we still need 2 ALUs at EX stage? (one for A-B and the other for PC+IR)

Why move ?? ZF is available during EX stage, anyway.

Pipelined Datapath

For store instruction,
(?) => ID/EX pipeline register => EX/MEM pipeline register => (?)

[Figure: the basic pipeline datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB between the stages]

Pipeline registers are added to the basic pipeline in order to actually split the datapath into stages.

The info. must be placed in a pipeline register; otherwise, it is lost when the next instruction enters that pipeline stage.

Content of Pipeline Registers

  Which data should be passed through stages? I.e., what are the contents of pipeline registers?
  In IF/ID pipeline register: PC (32), Inst. (32)
  In ID/EX pipeline register: PC (32), Reg. data 1 (32), Reg. data 2 (32), Offset (32), Reg. no. 2 and 3 (10)
  In EX/MEM pipeline register: PC (32), ZF (1), ALUOut (32), Reg. data 2 (32), Reg. no. (5)
  In MEM/WB pipeline register: Memory data (32), ALUOut (32), Reg. no. (5)
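The field lists above translate directly into a width for each pipeline register. The following sketch simply records the fields and bit counts given on the slide (the dictionary layout and the key names are illustrative choices, and control bits are not counted here) and adds them up.

```python
# Data fields and bit widths per pipeline register, as listed on the slide.
PIPELINE_REGISTERS = {
    "IF/ID": {"PC": 32, "Inst": 32},
    "ID/EX": {"PC": 32, "RegData1": 32, "RegData2": 32,
              "Offset": 32, "RegNo2and3": 10},
    "EX/MEM": {"PC": 32, "ZF": 1, "ALUOut": 32, "RegData2": 32, "RegNo": 5},
    "MEM/WB": {"MemData": 32, "ALUOut": 32, "RegNo": 5},
}


def width_bits(name: str) -> int:
    """Total number of data bits latched by the named pipeline register."""
    return sum(PIPELINE_REGISTERS[name].values())


widths = {name: width_bits(name) for name in PIPELINE_REGISTERS}
print(widths)
```

Under these field lists the registers hold 64, 138, 102 and 69 data bits respectively; the control bits introduced later in the slides come on top of that.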

Example

  Five instructions go through the MIPS pipeline:

lw  $10, 20($1)   100011 00001 01010 0000 0000 0001 0100   (8c2a 0014)
sub $11, $2, $3   000000 00010 00011 01011 00000 100010    (0043 5822)
and $12, $4, $5   000000 00100 00101 01100 00000 100100    (0085 6024)
or  $13, $6, $7   000000 00110 00111 01101 00000 100101    (00c7 6825)
add $14, $8, $9   000000 01000 01001 01110 00000 100000    (0109 7020)

Register contents:
  $pc = 0000 0000 5000 0000
  $1 = 0000 0000 0000 1000
  ...
  $9 = 0000 0000 0000 9000

Memory contents:
  [0000 0000 0000 1000] = 0000 1000 0000 0000
  [0000 0000 0000 1004] = 0000 1004 0000 0000
  ...


[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB; the five example instructions (lw $10, 20($1); sub $11, $2, $3; and $12, $4, $5; or $13, $6, $7; add $14, $8, $9) occupy the five stages, and the signal values labeled (a) through (z) are to be filled in]

9 Control Signals

Instruction  Opcode  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     000000  1       0       0         1         0        0         0       1       0
lw           100011  0       1       1         1         1        0         0       0       0
sw           101011  X       1       X         0         0        1         0       0       0
beq          000100  X       0       X         0         0        0         1       0       1

  4 multiplexor selectors: RegDst, ALUSrc, MemtoReg, Branch (PCSrc)
  3 write signals (the RegWrite, MemRead, MemWrite columns)
  2 ALU signals: ALUOp1, ALUOp0

Q1: In which stage is the control circuit?
Q2: EX stage executes “and” and WB stage executes “lw”. Is MemtoReg 1 or 0?

Pipeline Control

  Generate control signals all at once at ID stage
  And pass them through the stages just like the data

  Execution/Address Calculation stage control lines: RegDst, ALUOp1, ALUOp0, ALUSrc
  Memory access stage control lines: Branch, MemRead, MemWrite
  Write-back stage control lines: RegWrite, MemtoReg

Instruction  RegDst  ALUOp1  ALUOp0  ALUSrc  Branch  MemRead  MemWrite  RegWrite  MemtoReg
R-format     1       1       0       0       0       0        0         1         0
lw           0       0       0       1       0       1        0         1         1
sw           X       0       0       1       0       0        1         0         X
beq          X       0       1       0       1       0        0         0         X

[Figure: the control unit decodes the instruction held in IF/ID; ID/EX carries the EX, M, and WB control groups, EX/MEM carries M and WB, and MEM/WB carries WB]
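The idea of generating all control signals in ID and letting them ride along in the pipeline registers can be sketched as a tiny shift of control bundles. Everything below is an illustration rather than the slides' own design: the dictionary copies the table values (with X written as None), and the `clock` function and the dictionary layout are hypothetical names. Each call to `clock` models one clock edge at which a newly decoded instruction enters ID/EX.

```python
CONTROL = {
    "R-format": {"EX": {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
                 "M": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
                 "WB": {"RegWrite": 1, "MemtoReg": 0}},
    "lw":       {"EX": {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
                 "M": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
                 "WB": {"RegWrite": 1, "MemtoReg": 1}},
    "sw":       {"EX": {"RegDst": None, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
                 "M": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
                 "WB": {"RegWrite": 0, "MemtoReg": None}},
    "beq":      {"EX": {"RegDst": None, "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
                 "M": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
                 "WB": {"RegWrite": 0, "MemtoReg": None}},
}

EMPTY = {"ID/EX": None, "EX/MEM": None, "MEM/WB": None}


def clock(pipe, kind):
    """One clock edge: latch the decoded instruction, shift older bundles right."""
    id_ex, ex_mem = pipe["ID/EX"], pipe["EX/MEM"]
    return {
        "ID/EX": dict(CONTROL[kind]),                       # EX, M, WB groups
        "EX/MEM": None if id_ex is None
        else {group: id_ex[group] for group in ("M", "WB")},  # EX group consumed
        "MEM/WB": None if ex_mem is None else {"WB": ex_mem["WB"]},
    }


pipe = EMPTY
for kind in ("lw", "R-format", "R-format"):   # lw, sub, and
    pipe = clock(pipe, kind)

memtoreg_in_wb = pipe["MEM/WB"]["WB"]["MemtoReg"]
print(memtoreg_in_wb)
```

After lw, sub and and have been decoded, the bundle in ID/EX belongs to and (now in EX) while the bundle in MEM/WB still belongs to lw (now in WB), so in this sketch the MemtoReg value seen by the WB stage is the one generated for lw, not the one for and.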

Datapath with Control

[Figure: pipelined datapath with the control unit; the EX, M, and WB control groups travel through ID/EX, EX/MEM, and MEM/WB; labeled signals are RegDst, ALUSrc, ALUOp, Branch, PCSrc, MemRead, MemWrite, MemtoReg, RegWrite; the fields Instruction[15-0], Instruction[20-16], and Instruction[15-11] are carried forward from the ID stage]

Graphically Representing Pipelines

  Can help with answering questions like:
    how many cycles does it take to execute this code?
    what is the ALU doing during cycle 4?
  use this representation to help understand datapaths

[Figure: multiple-clock-cycle diagram, program execution order (in instructions) against time (in clock cycles), CC 1 to CC 6; lw $10, 20($1) and sub $11, $2, $3 each pass through IM, Reg, ALU, DM, Reg]
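Both questions on this slide can be answered mechanically once the diagram is treated as a grid. The sketch below assumes an ideal five-stage pipeline with no stalls, one instruction issued per cycle; the function names are made up for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_at(instr_index, cycle):
    """Stage occupied in 1-based clock cycle `cycle` by the instruction at
    0-based position `instr_index`, or None if it is not in the pipeline."""
    k = cycle - 1 - instr_index
    return STAGES[k] if 0 <= k < len(STAGES) else None


def total_cycles(instr_count):
    """Cycles needed to run `instr_count` instructions without stalls."""
    return len(STAGES) + instr_count - 1


def diagram(program):
    """Text version of the multiple-clock-cycle diagram."""
    rows = []
    for i, text in enumerate(program):
        cells = [(stage_at(i, c) or "").ljust(4)
                 for c in range(1, total_cycles(len(program)) + 1)]
        rows.append(f"{text:<18}" + " ".join(cells).rstrip())
    return "\n".join(rows)


program = ["lw $10, 20($1)", "sub $11, $2, $3"]
print(diagram(program))
in_ex_at_cc4 = [p for i, p in enumerate(program) if stage_at(i, 4) == "EX"]
```

For the two-instruction example this reports six clock cycles in total, and during cycle 4 the ALU is working on sub while lw is in its memory access stage.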

Data Hazards

  Needed data still being computed by previous instruction

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

Assume $1=10, $2=10, $3=30

Data Hazards: Dependencies

  Problem with starting next instruction before first is finished
    dependencies that “go backward in time” are data hazards

[Figure: multiple-clock-cycle diagram, program execution order (in instructions) against time (in clock cycles), CC 1 to CC 9, for sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2)]

                       CC 1  CC 2  CC 3  CC 4  CC 5    CC 6  CC 7  CC 8  CC 9
Value of register $2:  10    10    10    10    10/-20  -20   -20   -20   -20

“and” has a problem
“or” has a problem
“add” ???
“sw” is OK
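The four verdicts can be reproduced by comparing the cycle in which sub writes $2 (its WB cycle) with the cycle in which each later instruction reads $2 (its ID cycle). The sketch below assumes an unstalled five-stage pipeline without forwarding and assumes no later instruction overwrites $2, which holds for this code; the tuple layout and the `classify` helper are illustrative.

```python
# (mnemonic, destination register or None, source registers)
PROGRAM = [
    ("sub", 2, (1, 3)),
    ("and", 12, (2, 5)),
    ("or", 13, (6, 2)),
    ("add", 14, (2, 2)),
    ("sw", None, (15, 2)),
]


def classify(program, producer=0):
    """Classify each later reader of the producer's destination register."""
    dest = program[producer][1]
    wb_cycle = producer + 5            # IF at producer+1, so WB four cycles later
    verdicts = {}
    for i in range(producer + 1, len(program)):
        name, _, sources = program[i]
        if dest not in sources:
            continue
        id_cycle = i + 2               # IF at i+1, register read in ID at i+2
        if id_cycle < wb_cycle:
            verdicts[name] = "hazard"      # reads before the value is written
        elif id_cycle == wb_cycle:
            verdicts[name] = "same cycle"  # depends on the register file design
        else:
            verdicts[name] = "ok"
    return verdicts


verdicts = classify(PROGRAM)
print(verdicts)
```

In this model and and or read too early, sw reads after the write, and add reads in the very cycle of the write. Whether that last case is safe depends on the register file; a common assumption is that it writes in the first half of the cycle and reads in the second half, which would make add safe.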

Data Hazards: Forwarding

While result not written back until WB:

[Figure: sub $2,$1,$3 runs IF, ID, EX, MEM, WB; and $12,$2,$5 is held by two Stalls before it can use $2]

It is calculated earlier – in EX:

[Figure: sub $2,$1,$3 and and $12,$2,$5 each run IF, ID, EX, MEM, WB in consecutive cycles, with no stall]

Add forwarding hardware to allow, e.g., EX’s output (located in EX/MEM pipeline register) to be EX’s input.

  Actually available after EX stage (not WB)
  Actually needed at EX stage (not ID)

Forwarding: All 2 Cases

[Figure: multiple-clock-cycle diagram, program execution order (in instructions) against time (in clock cycles), CC 1 to CC 9, for sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2)]

                       CC 1  CC 2  CC 3  CC 4  CC 5    CC 6  CC 7  CC 8  CC 9
Value of register $2:  10    10    10    10    10/-20  -20   -20   -20   -20
Value of EX/MEM:       X     X     X     -20   X       X     X     X     X
Value of MEM/WB:       X     X     X     X     -20     X     X     X     X

“and” has a problem -> fixed
“or” has a problem -> fixed
“add” ??? -> OK
“sw” is OK

Data Hazards (again)

  Needed data still being computed by previous instruction

sub $11, $3, $2
and $12, $11, $4
or $13, $6, $11
add $14, $8, $9
sw $15, 100($2)

[Figures: the pipelined datapath (pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB, signal values labeled (a) through (z)) is redrawn once per step as the code above enters the pipeline. Annotations recoverable from each step:]

  Step 1: sub $11, $3, $2
  Step 2: and $12, $11, $4; sub $11, $3, $2 (Rs=3, Rd=11, $Rs=300)
  Step 3 (question): or $13, $6, $11; and $12, $11, $4 (Rs=11, Rd=12, $Rs=???); sub $11, $3, $2 (Rd=11, two values marked ???)
  Step 3 (answer): or $13, $6, $11; and $12, $11, $4 (Rs=11, Rd=12, $Rs=1100); sub $11, $3, $2 (Rd=11, values 300 and 100)
  Step 4 (question): add $14, $8, $9; or $13, $6, $11; and $12, $11, $4 (Rd=12); sub $11, $3, $2 (Rd=11); two values marked ???
  Step 4 (answer): the same instructions; both marked values are 100
  Step 5 (question): sw $15, 100($2); add $14, $8, $9; or $13, $6, $11; and $12, $11, $4; sub $11, $3, $2 (Rd=11); two values marked ???

Forwarding: Implementation

[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]

  Additional datapath for forwarding ?
  How to control the forwarding datapath ?


Forwarding: Forwarding Unit

[Figure: pipelined datapath with a forwarding unit in the EX stage. ID/EX carries the Rs, Rt, and Rd fields taken from IF/ID.RegisterRs, IF/ID.RegisterRt, and IF/ID.RegisterRd; the forwarding unit also receives EX/MEM.RegisterRd and MEM/WB.RegisterRd and drives the multiplexors at the ALU inputs]

Forwarding unit: 6-input, 2-output combinational circuit

HW#1, (5)

Forwarding Control

  Control logic
  ForwardA =
    10 if (EX/MEM.Rd = ID/EX.Rs) <- get operand from EX/MEM
    01 if (MEM/WB.Rd = ID/EX.Rs) <- get operand from MEM/WB
    00, otherwise <- get operand from ID/EX
  ForwardB =
    10 if (EX/MEM.Rd = ID/EX.Rt) <- get operand from EX/MEM
    01 if (MEM/WB.Rd = ID/EX.Rt) <- get operand from MEM/WB
    00, otherwise <- get operand from ID/EX

Forwarding Control Circuit

  ForwardA =
    10 if ((EX/MEM.Rd = ID/EX.Rs) && EX/MEM.RegWrite && (EX/MEM.Rd ≠ 0))
    01 if ((MEM/WB.Rd = ID/EX.Rs) && MEM/WB.RegWrite && (MEM/WB.Rd ≠ 0) && (EX/MEM.Rd ≠ ID/EX.Rs))
    00, otherwise
  ForwardB =
    10 if ((EX/MEM.Rd = ID/EX.Rt) && EX/MEM.RegWrite && (EX/MEM.Rd ≠ 0))
    01 if ((MEM/WB.Rd = ID/EX.Rt) && MEM/WB.RegWrite && (MEM/WB.Rd ≠ 0) && (EX/MEM.Rd ≠ ID/EX.Rt))
    00, otherwise
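The conditions above map one-to-one onto a small combinational function. The sketch below follows the slide's conditions literally; the Python function and parameter names are illustrative, register numbers are plain integers, and the RegWrite inputs are treated as booleans.

```python
def forward_select(src, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Mux select (0b10, 0b01, or 0b00) for one ALU operand whose source
    register number in ID/EX is `src`."""
    if ex_mem_rd == src and ex_mem_regwrite and ex_mem_rd != 0:
        return 0b10   # take the operand from EX/MEM
    if (mem_wb_rd == src and mem_wb_regwrite and mem_wb_rd != 0
            and ex_mem_rd != src):
        return 0b01   # take the operand from MEM/WB
    return 0b00       # take the operand from ID/EX (register file value)


def forwarding_unit(id_ex_rs, id_ex_rt,
                    ex_mem_rd, ex_mem_regwrite,
                    mem_wb_rd, mem_wb_regwrite):
    """Six inputs, two outputs: (ForwardA, ForwardB)."""
    forward_a = forward_select(id_ex_rs, ex_mem_rd, ex_mem_regwrite,
                               mem_wb_rd, mem_wb_regwrite)
    forward_b = forward_select(id_ex_rt, ex_mem_rd, ex_mem_regwrite,
                               mem_wb_rd, mem_wb_regwrite)
    return forward_a, forward_b


# and $12, $2, $5 in EX while sub $2, $1, $3 is in MEM
case_and = forwarding_unit(2, 5, ex_mem_rd=2, ex_mem_regwrite=True,
                           mem_wb_rd=0, mem_wb_regwrite=False)
# or $13, $6, $2 in EX while and is in MEM and sub is in WB
case_or = forwarding_unit(6, 2, ex_mem_rd=12, ex_mem_regwrite=True,
                          mem_wb_rd=2, mem_wb_regwrite=True)
print(case_and, case_or)
```

For the sub/and/or sequence from the data hazard slides, the first call selects the EX/MEM value for the first ALU operand of and, and the second selects the MEM/WB value for the second operand of or. The Rd ≠ 0 tests keep a write to $0 from ever being forwarded.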

Data Hazards: All Considered???

… but it doesn’t eliminate all data hazards:

lw $s5,0($s4)
add $s7,$s5,$s6

[Figure: lw runs IF, ID, EX, MEM, WB; add still needs one Stall before the loaded value is available]

… especially when we remember that memory access is really often much longer than a single cycle:

[Figure: the same lw/add pair, with additional Stalls (Stall, Stall) while the memory access completes]

Data Hazards: Stalling

  Stall the pipeline by keeping an instruction in the same stage

lw $2, 20($1)
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7

[Figure: multiple-clock-cycle diagram, program execution order (in instructions) against time (in clock cycles), CC 1 to CC 10; a bubble is inserted after lw, and the lw-and and lw-or dependencies are marked]

At CC5, MEM stage is empty !!!

Data Hazards: Stalling

  Stalling detection and control
    Detects during the ID stage when “lw” instruction is in EX stage
    The following two instructions are in ID (“and”) and IF (“or”) stages, respectively
  If detected,
    Stall the following instruction (in ID stage, “and”) so that it repeats the ID stage again => IF/ID pipeline register should not be changed
    Stall the second instruction (in IF stage, “or”) so that it repeats the IF stage again => PC should not be changed

Data Hazards: Stalling

  Hazard detection
    If (ID/EX.MemRead and ((ID/EX.Rt = IF/ID.Rs) or (ID/EX.Rt = IF/ID.Rt))) stall the pipeline
  Control signals generated from hazard detection unit
    IF/IDWrite to prevent IF/ID register from changing
    PCWrite to prevent PC from changing
    MUX control to delay forwarding control signals (pass “null” signals)
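The load-use test and its three outputs fit in a few lines. This sketch makes two assumptions that the slides do not state: the write enables are active high (1 means the register may change), and the multiplexor select is called ZeroControl here, a made-up name for the signal that passes “null” control values into ID/EX.

```python
def hazard_detection(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """Four inputs, three outputs: stall when the instruction in EX is a load
    whose destination (Rt) is a source of the instruction in ID."""
    stall = bool(id_ex_memread) and id_ex_rt in (if_id_rs, if_id_rt)
    return {
        "PCWrite": int(not stall),      # 0 holds the PC, so IF is repeated
        "IF/IDWrite": int(not stall),   # 0 holds IF/ID, so ID is repeated
        "ZeroControl": int(stall),      # 1 sends null control signals into ID/EX
    }


# lw $2, 20($1) in EX, and $4, $2, $5 in ID: load-use hazard
load_use = hazard_detection(id_ex_memread=1, id_ex_rt=2, if_id_rs=2, if_id_rt=5)
# sub $2, $1, $3 in EX is not a load: forwarding handles it, no stall
no_load = hazard_detection(id_ex_memread=0, id_ex_rt=3, if_id_rs=2, if_id_rt=5)
print(load_use, no_load)
```

In the lw/and example from the stalling slides the function freezes the PC and IF/ID for one cycle and injects a bubble; once lw has moved on to MEM, the forwarding unit can supply the loaded value.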

Stalling: Detection Unit

  Stall by letting an instruction that won’t write anything go forward

[Figure: pipelined datapath with forwarding unit and hazard detection unit. The hazard detection unit reads ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, and IF/ID.RegisterRt, and drives PCWrite, IF/IDWrite, and the multiplexor that zeroes the control signals entering ID/EX]

Hazard detection unit: 4-input, 3-output combinational circuit

Stalling: What happens in the pipeline?

lw $s5,0($s4)
add $s7,$s5,$s6

[Figure: pipeline diagram, CC1 to CC12; the dependent add repeats its ID stage (Stall (ID)), the instruction behind it repeats its IF stage (Stall (IF)), and the inserted bubble has no EX stage, no MEM stage, and no WB stage]

  ID stage is repeated at CC7 <- IF/IDWrite
  IF stage is repeated at CC7 <- PCWrite
  No EX at CC7, no MEM at CC8 and no WB at CC9 <- zero control signals

Branch (Control) Hazards

While executing a previous branch, next instruction address might not yet be known.

[Figure: pipeline diagram, instructions against time step (clock cycle), running to clock cycle 8. Conditional branch: IF (calculates PC+4), ID (computes branch target address), EX (performs branch test & sets PC to target), MEM, WB. Branch target: Stall, Stall, then IF, ID, EX, MEM, WB]

Branch Hazards

  We can stall the pipeline for every branch instruction
    Too slow (3 instructions)
  Or, continue execution down the sequential instruction stream assuming that the branch will not be taken (predict “branch not taken”)
    If the condition is not met, OK ! (prediction is successful)
    If the condition is met, (prediction is wrong)
      Some unwanted instructions are in the pipeline!
      Need to “flush” instructions
  How do you compare the above two ?
    If branches are taken half the time, and if it costs little to discard the instructions, the second approach halves the cost of control hazards
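The comparison between the two approaches comes down to expected cycles lost per branch. The sketch below assumes that a stalled or flushed branch costs the 3 instruction slots mentioned above, that a correct not-taken prediction costs nothing, and that discarding instructions is free; the function names are illustrative.

```python
def stall_always_cost(penalty_cycles):
    """Every branch waits until its outcome is known."""
    return penalty_cycles


def predict_not_taken_cost(penalty_cycles, taken_fraction):
    """Only taken branches pay the penalty (wrong-path instructions flushed)."""
    return taken_fraction * penalty_cycles


stall_cost = stall_always_cost(3)
predict_cost = predict_not_taken_cost(3, taken_fraction=0.5)
print(stall_cost, predict_cost)
```

With branches taken half the time the expected cost drops from 3 slots to 1.5 slots per branch, which is the halving stated on the slide; a lower taken fraction would favor prediction even more.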

Stalling: What happens in the pipeline?

beq $1,$2, 7
add $3,$4,$5
“target of beq”

[Figure: pipeline diagram, CC1 to CC7; beq runs IF, ID, EX, MEM, WB; the add fetched after it is replaced by a null instruction that passes through ID, EX, MEM, WB; the target of beq then runs IF, ID, EX, MEM, WB]

  ID stage executes a null instruction (sll $0,$0,$0) at CC3
  EX stage executes a null instruction (sll $0,$0,$0) at CC4
  MEM stage executes a null instruction (sll $0,$0,$0) at CC5
  WB stage executes a null instruction (sll $0,$0,$0) at CC6
  IF.Flush at CC3 will do.

  • A new control signal IF.Flush is introduced to flush the instruction in IF stage
  • It zeros the instruction field of the IF/ID pipeline register, which in fact can be decoded as “sll $0, $0, $0”
  • In fact, “nop” = “sll $0, $0, $0”
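Why an all-zero instruction word is harmless can be seen by decoding it as an R-type instruction. The sketch below uses the standard MIPS R-type field layout; the helper name is made up, and the fact that function code 0 under opcode 0 is sll is taken from the MIPS instruction set, not from these slides.

```python
def decode_r_type(word):
    """Split a 32-bit word into the six R-type fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


flushed = decode_r_type(0x00000000)          # what IF.Flush leaves in IF/ID
shifted = decode_r_type(0x000208C0)          # sll $1, $2, 3 for comparison
is_nop = all(value == 0 for value in flushed.values())
print(flushed, is_nop)
```

Every field of the zeroed word is 0: opcode 0 with function code 0 selects sll, the source and destination are both $0, and the shift amount is 0, so the instruction shifts zero by zero and writes the result to a register that always reads as zero.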

Branch Hazards: Branch Delay Slots

While determining next instruction address, go ahead and execute sequentially following instruction(s).

[Figure: pipeline diagram, instructions against time step (clock cycle). Conditional branch: IF, ID (computes branch target address), EX (performs branch test & sets PC to target), MEM, WB. Branch delay: IF, ID, EX, MEM, WB. Branch target: IF (fetches correct target), ID, EX, MEM, WB]

Branch Hazards: Branch Delay Slots

  Advantage:
    Can avoid one stall per delay slot.
  Disadvantages:
    Makes assembly-language programming more difficult.
    Can be difficult to find appropriate code for slot.
    Exposes implementation detail that could change.
    Later implementations without a stall must still emulate slot.
  Most modern processors avoid

Branch Hazards: Branch Prediction

Guess which instruction is next, & start executing it.

  What if guess is wrong? : Flush the pipeline
  Simplest guesses: Always Taken or Never Taken.
  When to do prediction?
    Static prediction: compiler
    Dynamic prediction: processor

Dynamic Branch Prediction

  Branch prediction buffer (branch history table)
    A small memory that is indexed by the lower portion of the address of the branch instruction and that contains one or more bits indicating whether the branch was recently taken or not.

[Figure: the PC addresses the instruction memory and the BPB; the instruction and the prediction (T or NT) go to IF/ID]

Dynamic Branch Prediction

  1-bit predictor
  Prediction accuracy
    A loop closed by beq runs 10 times => 1st: ?, 2nd: correct, 3rd: correct, ..., 9th: correct, 10th: incorrect => 80% accuracy
    (Because the first one is incorrect in the second execution of the same code.)

[Figure: two-state diagram with states “Predict taken” and “Predict not taken”; T (Taken) leads to “Predict taken”, N (Not taken) leads to “Predict not taken”]
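The 80% figure can be reproduced by simulating the one-bit scheme on the loop branch. The sketch assumes the branch is taken nine times and then falls through once per execution of the loop, and that the single prediction bit simply remembers the last outcome; the function name and the initial state are illustrative choices.

```python
def one_bit_accuracy(outcomes, initial_prediction):
    """Run a 1-bit predictor over a list of outcomes (True = taken).
    Returns (fraction predicted correctly, final prediction bit)."""
    prediction = initial_prediction
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        prediction = taken      # the bit always records the latest outcome
    return correct / len(outcomes), prediction


loop = [True] * 9 + [False]     # taken 9 times, not taken on loop exit

first_run, state = one_bit_accuracy(loop, initial_prediction=True)
second_run, _ = one_bit_accuracy(loop, initial_prediction=state)
print(first_run, second_run)
```

On the first execution the result depends on the initial bit (the “?” on the slide). From the second execution on, the bit left behind by the loop exit mispredicts the first iteration and the final iteration is mispredicted again, so 8 of 10 predictions are correct.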

Exceptions

  Another form of control hazard involves exceptions.
  When an arithmetic overflow occurs during executing “add $1, $2, $1”
    Transfer control to the exception routine (0x4000 0040)
    This is the same as executing a branch instruction
  Necessary actions are
    Stop executing the current instruction and start the exception routine.
    Following instructions already in the pipe must be wiped out (flush pipeline registers).
    Return to the offending instruction.

Flush Control Signals

  Similar to the taken-branch, we need to flush pipeline registers. Question is which pipeline register(s)?
    Arithmetic overflow is detected at the end of EX stage.
    And thus flushing takes place at MEM stage (at the next cycle).
    Since three following instructions are already in the pipeline (IF, ID and EX stages), we need to flush those three instructions.
    Therefore, we need ID.Flush and EX.Flush in addition to IF.Flush control signal.

[Figure: pipelined datapath with forwarding unit, hazard detection unit, and exception support: an overflow signal (OF) from the ALU, the Cause and ExceptPC registers, the exception address 40000040 as an extra PC input, and three flush signals: IF.Flush (for the instruction in IF stage), ID.Flush (for the instruction in ID stage), EX.Flush (for the instruction in EX stage)]

Challenges

  What if more than one instruction generates exceptions?
    While “add” causes an overflow exception at CC5 in EX,
    another causes an invalid opcode exception at CC5 in IF
    It is not OK to generate all flushing signals.
  And, how does the exception service routine correctly identify the instruction that causes the exception? => Imprecise exception

Precise and Imprecise Exceptions

  Precise exceptions
    Hardware (CPU) correctly identifies the offending instruction.
    And makes sure all prior instructions complete.
    All instructions following it are not allowed to complete their execution and have not modified the process state
  Imprecise exception
    Hardware does not guarantee it and leaves it up to the operating system to determine which instruction caused the problem.
    Some instructions following the offending instruction are allowed to complete their execution and modify the process state.
  Most modern CPUs support precise exceptions