00 Review 3 Pipeline

8/3/2019 00 Review 3 Pipeline


Review: Computer Organization

Pipelining

Chansu Yu

[email protected]

Laundry Example

Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold

Washer takes 30 minutes

Dryer takes 30 minutes

“Folder” takes 30 minutes

“Stasher” takes 30 minutes to put clothes into drawers

Sequential Laundry

Sequential laundry takes 8 hours for 4 loads

If they learned pipelining, how long would laundry take?

[Figure: task-order timeline from 6 PM to 2 AM; loads A, B, C, D run one after another, each using four 30-minute stages]

Faster Laundry - Pipelining

Faster laundry takes 3.5 hours for 4 loads!
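The two laundry totals can be checked with a little arithmetic. The sketch below is only an illustration; the function and constant names are made up for this example, and it assumes every stage takes exactly 30 minutes as stated on the slides.

```python
STAGE_MINUTES = 30   # washer, dryer, "folder", "stasher" each take 30 minutes
NUM_STAGES = 4
NUM_LOADS = 4        # Ann, Brian, Cathy, Dave


def sequential_minutes(loads, stages, stage_minutes):
    """Each load finishes all of its stages before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads, stages, stage_minutes):
    """A new load starts every stage time; the last load still needs all stages."""
    return (stages + loads - 1) * stage_minutes


seq = sequential_minutes(NUM_LOADS, NUM_STAGES, STAGE_MINUTES)
pipe = pipelined_minutes(NUM_LOADS, NUM_STAGES, STAGE_MINUTES)
print(seq / 60, "hours sequential")   # 8.0 hours sequential
print(pipe / 60, "hours pipelined")   # 3.5 hours pipelined
```

The pipelined total is (stages + loads - 1) time slots: three slots to fill the pipeline, then one load finishes per slot.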

[Figure: task-order timeline starting at 6 PM; loads A, B, C, D overlap, each starting 30 minutes after the previous one]

5 Stages of MIPS

Instruction fetch (all instructions)
    IR = Memory[PC]
    PC = PC + 4

Instruction decode/register fetch (all instructions)
    A = Reg[IR[25-21]]
    B = Reg[IR[20-16]]
    ALUOut = PC + (sign-extend(IR[15-0]) << 2)

Execution, address computation, branch/jump completion
    R-type instructions:            ALUOut = A op B
    Memory-reference instructions:  ALUOut = A + sign-extend(IR[15-0])
    Branches:                       if (A == B) then PC = ALUOut
    Jumps:                          PC = PC[31-28] || (IR[25-0] << 2)

Memory access or R-type completion
    R-type instructions:            Reg[IR[15-11]] = ALUOut
    Memory-reference instructions:  Load: MDR = Memory[ALUOut], or Store: Memory[ALUOut] = B

Memory read completion
    Memory-reference instructions:  Load: Reg[IR[20-16]] = MDR
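The branch and jump address formulas in the table can be tried out on concrete numbers. This is a minimal sketch, not part of the original slides: the function names are hypothetical, and it assumes the PC has already been incremented by 4 when the target is computed, as in the table.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc_plus_4, ir):
    """ALUOut = PC + (sign-extend(IR[15-0]) << 2), with PC already incremented."""
    offset = sign_extend16(ir & 0xFFFF)
    return (pc_plus_4 + (offset << 2)) & MASK32


def jump_target(pc_plus_4, ir):
    """PC = PC[31-28] || (IR[25-0] << 2)."""
    return (pc_plus_4 & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)


# A branch with offset field 3 skips 3 words past PC + 4.
print(hex(branch_target(0x1004, 0x10000003)))   # 0x1010
# A jump keeps the top four PC bits and replaces the rest.
print(hex(jump_target(0x50000004, 0x08000010)))  # 0x50000040
```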

Single cycle vs. Multicycle

[Figure: timing diagrams over clock cycles 0 to 18 for the sequence add, load, add, load. add goes through instruction fetch, reg. read, ALU operation, reg. write; load goes through instruction fetch, reg. read, ALU operation, memory read, reg. write]

What are the advantages of multicycle implementation?

What are the disadvantages of multicycle implementation?

Multicycle vs. Pipelined

[Figure: timing diagrams over clock cycles 0 to 18 for the sequence add, load, add, load, comparing multicycle execution with pipelined execution of the same steps (instruction fetch, reg. read, ALU operation, memory read, reg. write)]

What are the advantages of pipelined implementation?

What are the disadvantages of pipelined implementation?

Lessons from Pipelined Laundry

Pipelining doesn’t help latency of single task, it helps throughput of entire workload

Potential speedup = Number pipe stages

Pipeline rate limited by slowest pipeline stage

Unbalanced lengths of pipe stages reduce speedup

Time to “fill” pipeline and time to “drain” it reduce speedup

Multiple tasks operating simultaneously using different resources: any dependencies, any conflicts ???
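The first four lessons can be put into numbers. The sketch below is an illustration under stated assumptions: the function names are made up, the pipeline is clocked at the slowest stage, and the unbalanced stage times are hypothetical values chosen to have the same total work as four 30-minute stages.

```python
def sequential_time(n_tasks, stage_times):
    """Every task runs all stages alone, one task after another."""
    return n_tasks * sum(stage_times)


def pipelined_time(n_tasks, stage_times):
    """Every stage is clocked at the slowest stage; includes fill and drain."""
    cycle = max(stage_times)
    return (len(stage_times) + n_tasks - 1) * cycle


def speedup(n_tasks, stage_times):
    return sequential_time(n_tasks, stage_times) / pipelined_time(n_tasks, stage_times)


balanced = [30, 30, 30, 30]
unbalanced = [30, 60, 15, 15]   # hypothetical: same total work, one slow stage

print(speedup(4, balanced))       # fill and drain keep this well below 4
print(speedup(1000, balanced))    # approaches the number of stages
print(speedup(1000, unbalanced))  # limited by the 60-minute stage
```

With many tasks the balanced pipeline approaches a speedup of 4 (the number of stages), while the unbalanced one stays below 2 because the slowest stage sets the rate.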

[Figure: pipelined laundry timeline starting at 6 PM; loads A, B, C, D overlap in 30-minute stages]

Can pipelining get us into trouble?

[Figure: five instructions flowing through IF, ID, EX, MEM, WB over time steps (clock cycles) 1 to 9]

If any two stages use the same resource, there must be a conflict.
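To see why shared resources matter, it helps to list which instruction occupies each stage in a given cycle. The following sketch builds the ideal (hazard-free) five-instruction chart; the helper names are hypothetical and no stalls are modeled.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(n_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage used in cycle c + 1."""
    total_cycles = len(stages) + n_instructions - 1
    chart = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name   # instruction i enters IF in cycle i + 1
        chart.append(row)
    return chart


def stage_users(chart, cycle):
    """Map each busy stage to the instruction number using it in a 1-based cycle."""
    return {row[cycle - 1]: i + 1 for i, row in enumerate(chart) if row[cycle - 1]}


chart = pipeline_chart(5)
for number, row in enumerate(chart, start=1):
    print(number, " ".join(f"{cell:>3}" for cell in row))
print(stage_users(chart, 5))
```

In cycle 5 all five stages are busy with five different instructions, so any hardware unit needed by two stages would be requested twice in that cycle.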

Hazards

Hazard = when an instruction’s stage is unable to execute during the current cycle.

Can always resolve hazards by waiting

pipeline control must detect the hazard

take action (or delay action) to resolve hazards

[Figure: pipeline diagram over clock cycles 1 to 9 with a stall: instruction #2, stage 3 is unable to continue]

Structural Hazards

A needed functional unit is busy executing a previous instruction

(Attempt to use the same resource two different ways at the same time)

[Figure: pipeline diagram over clock cycles 1 to 9 in which a later instruction stalls for two cycles]

Example:
- Our sample MIPS pipeline has none.
- What if PC + 4 computation used main ALU instead of separate adder?

Control Hazards

While executing a previous branch, next instruction address might not yet be known.

(attempt to make a decision before condition is evaluated)

[Figure: pipeline diagram, clock cycles 1 to 8, for a conditional branch followed by the branch target. The branch calculates PC + 4 in IF (cycle 1), computes the branch target address in ID (cycle 2), and performs the branch test and sets PC to the target in EX (cycle 3); the branch target stalls until then]

Data Hazards

Needed data still being computed by previous instruction.

(attempt to use item before it is ready)

add $s3,$s1,$s2
sw $s4,0($s3)
lw $s5,0($s3)
add $s7,$s5,$s6

[Figure: pipeline diagram for the four instructions above over clock cycles 1 to 12, with stalls inserted while dependent instructions wait for their operands]

Pipelined Approach

[Figure: five instructions flowing through IF, ID, EX, MEM, WB over clock cycles 1 to 9, shown next to the laundry loads A, B, C, D]

- Cycle time, No. stages

- Resource conflict

Resource Conflicts (revisit)

[Table: the step-by-step action table (step name; actions for R-type instructions, memory-reference instructions, branches, and jumps), annotated with the conflicts below]

ALU conflict

Register file conflict (read or write)

Memory conflict

Basic Pipeline

[Figure: MIPS datapath (PC, instruction memory, registers, sign extend, ALU, data memory, adders and multiplexors) divided into five stages]

IF: Instruction fetch
ID: Instruction decode/register file read
EX: Execute/address calculation
MEM: Memory access
WB: Write back

Instructions and data move generally from left to right through the five stages as they complete execution, except in two cases:
- WB stage
- PC selection


Basic Pipeline

[Table: the step-by-step action table (step name; actions for R-type instructions, memory-reference instructions, branches, and jumps), repeated]

Why do we still need 2 ALUs at EX stage? (one for A-B and the other for PC+IR)

Why move ?? ZF is available during EX stage, anyway.

Pipelined Datapath

For store instruction,

(?) => ID/EX pipeline register => EX/MEM pipeline register => (?)

[Figure: the basic pipeline datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB inserted between the stages]

Pipeline registers are added to the basic pipeline in order to actually split the datapath into stages.

The info. must be placed in a pipeline register; otherwise, it is lost when the next instruction enters that pipeline stage.

Content of Pipeline Registers

Which data should be passed through stages? I.e., what are the contents of pipeline registers?

In IF/ID pipeline register: PC (32), Inst. (32)

In ID/EX pipeline register: PC (32), Reg. data 1 (32), Reg. data 2 (32), Offset (32), Reg. no. 2 and 3 (10)

In EX/MEM pipeline register: PC (32), ZF (1), ALUOut (32), Reg. data 2 (32), Reg. no. (5)

In MEM/WB pipeline register: Memory data (32), ALUOut (32), Reg. no. (5)
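Adding up the field widths listed above gives the data width of each pipeline register. This is a small bookkeeping sketch; the dictionary and function names are made up here, and control bits (introduced later in the slides) are deliberately left out.

```python
# Field widths in bits, copied from the list above (data fields only).
PIPELINE_REGISTERS = {
    "IF/ID": {"PC": 32, "Inst.": 32},
    "ID/EX": {"PC": 32, "Reg. data 1": 32, "Reg. data 2": 32,
              "Offset": 32, "Reg. no. 2 and 3": 10},
    "EX/MEM": {"PC": 32, "ZF": 1, "ALUOut": 32, "Reg. data 2": 32, "Reg. no.": 5},
    "MEM/WB": {"Memory data": 32, "ALUOut": 32, "Reg. no.": 5},
}


def width(register_name):
    """Total number of data bits held by one pipeline register."""
    return sum(PIPELINE_REGISTERS[register_name].values())


for name in PIPELINE_REGISTERS:
    print(name, width(name), "bits")
```

The widths shrink toward the end of the pipeline because later stages need fewer of the values produced earlier.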

Example

Five instructions go through the MIPS pipeline:

lw $10, 20($1) 100011 00001 01010 0000 0000 0001 0100 (8c2a 0014)

sub $11, $2, $3 010000 00010 00011 01011 00000 100100 (4043 5824)

and $12, $4, $5 010000 00100 00101 01100 00000 100110 (4085 6026)

or $13, $6, $7 010000 00110 00111 01101 00000 100111 (40c7 6827)

add $14, $8, $9 010000 01000 01001 01110 00000 100000 (4109 7020)

Register contents:
$pc = 0000 0000 5000 0000
$1 = 0000 0000 0000 1000
...
$9 = 0000 0000 0000 9000

Memory contents:
[0000 0000 0000 1000] = 0000 1000 0000 0000
[0000 0000 0000 1004] = 0000 1004 0000 0000
.....
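The lw encoding in the example can be reproduced by packing the I-type fields. The sketch below covers only lw; the helper names are hypothetical, and the field layout assumed is the standard MIPS I-type format (opcode 6 bits, rs 5, rt 5, immediate 16).

```python
def encode_i_type(opcode, rs, rt, imm16):
    """Pack opcode(6) | rs(5) | rt(5) | immediate(16) into a 32-bit word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm16 & 0xFFFF)


def decode_fields(word):
    """Split a 32-bit instruction word into the fields read by the ID stage."""
    return {
        "opcode": word >> 26,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "imm": word & 0xFFFF,
    }


LW_OPCODE = 0b100011

# lw $10, 20($1): base register $1 is rs, destination $10 is rt, offset is 20.
word = encode_i_type(LW_OPCODE, rs=1, rt=10, imm16=20)
print(f"{word:08x}")   # 8c2a0014
print(decode_fields(word))
```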

[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB and the five instructions (lw $10, 20($1); sub $11, $2, $3; and $12, $4, $5; or $13, $6, $7; add $14, $8, $9) in flight; signal points are labeled (a) through (z)]

9 Control Signals

Instruction  Opcode  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     000000  1       0       0         1         0        0         0       1       0
lw           100011  0       1       1         1         1        0         0       0       0
sw           101011  X       1       X         0         0        1         0       0       0
beq          000100  X       0       X         0         0        0         1       0       1

4 multiplexor selectors (including PCSrc), 3 write signals, 2 ALU signals

Q1: In which stage is the control circuit?

Q2: EX stage executes “and” and WB stage executes “lw”. Is MemtoReg 1 or 0?

Pipeline Control

Generate control signals all at once at ID stage

And pass them through stages just like the data

             Execution/Address Calculation      Memory access stage          Write-back stage
             stage control lines                control lines                control lines
Instruction  RegDst  ALUOp1  ALUOp0  ALUSrc     Branch  MemRead  MemWrite    RegWrite  MemtoReg
R-format     1       1       0       0          0       0        0           1         0
lw           0       0       0       1          0       1        0           1         1
sw           X       0       0       1          0       0        1           0         X
beq          X       0       1       0          1       0        0           0         X

[Figure: the Control unit decodes the instruction from IF/ID and produces the EX, M, and WB fields; ID/EX holds EX, M, WB; EX/MEM holds M, WB; MEM/WB holds WB]
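The idea of generating all control values in ID and dropping each group once its stage has used it can be sketched as a lookup table. This is an illustration only: the names CONTROL, FIELD_NAMES, CARRIED_BY, and carried are hypothetical, and None stands for the "X" (don't care) entries.

```python
# Control values per instruction class, grouped by the stage that consumes them.
CONTROL = {
    "R-format": {"EX": (1, 1, 0, 0), "M": (0, 0, 0), "WB": (1, 0)},
    "lw": {"EX": (0, 0, 0, 1), "M": (0, 1, 0), "WB": (1, 1)},
    "sw": {"EX": (None, 0, 0, 1), "M": (0, 0, 1), "WB": (0, None)},
    "beq": {"EX": (None, 0, 1, 0), "M": (1, 0, 0), "WB": (0, None)},
}

FIELD_NAMES = {
    "EX": ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc"),
    "M": ("Branch", "MemRead", "MemWrite"),
    "WB": ("RegWrite", "MemtoReg"),
}

# Which control groups each pipeline register still carries.
CARRIED_BY = {
    "ID/EX": ("EX", "M", "WB"),
    "EX/MEM": ("M", "WB"),
    "MEM/WB": ("WB",),
}


def carried(instruction, pipeline_register):
    """Control fields of `instruction` that are stored in `pipeline_register`."""
    return {
        group: dict(zip(FIELD_NAMES[group], CONTROL[instruction][group]))
        for group in CARRIED_BY[pipeline_register]
    }


# The WB stage uses the control bits of the instruction that is in WB.
print(carried("lw", "MEM/WB"))
```

Because each instruction carries its own control bits, the MemtoReg value seen in WB belongs to the instruction in WB, not to whatever instruction is in EX at the same time.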

Datapath with Control

[Figure: pipelined datapath with control. The Control unit generates the EX, M, and WB fields, which travel through ID/EX, EX/MEM, and MEM/WB; signals shown include RegDst, ALUSrc, ALUOp, Branch, PCSrc, MemRead, MemWrite, MemtoReg, and RegWrite]

Graphically Representing Pipelines

Can help with answering questions like:

how many cycles does it take to execute this code?

what is the ALU doing during cycle 4?

use this representation to help understand datapaths

[Figure: multi-cycle pipeline diagram, time in clock cycles CC 1 to CC 6, program execution order in instructions: lw $10, 20($1) and sub $11, $2, $3 each pass through IM, Reg, ALU, DM, Reg]

Data Hazards

Needed data still being computed by previous instruction

sub $2, $1, $3

and $12, $2, $5

or $13, $6, $2

add $14, $2, $2

sw $15, 100($2)

Assume $1=10, $2=10, $3=30

Data Hazards: Dependencies

Problem with starting next instruction before first is finished

dependencies that “go backward in time” are data hazards

[Figure: multi-cycle pipeline diagram, CC 1 to CC 9, for sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2)]

Value of register $2 (CC 1 to CC 9): 10, 10, 10, 10, 10/-20, -20, -20, -20, -20

“and” has a problem

“or” has a problem

“add” ???

“sw” is OK
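Which of the four followers of sub actually "go backward in time" can be checked by measuring how far each reader of $2 is from its producer. The sketch below is a simplification with made-up names; it assumes the register file is written in the first half of a cycle and read in the second half (suggested by the "10/-20" entry at CC 5), so only readers one or two instructions behind the writer are hazards, and it ignores later redefinitions of the same register.

```python
# (mnemonic, destination register or None, source registers)
program = [
    ("sub", "$2", ["$1", "$3"]),
    ("and", "$12", ["$2", "$5"]),
    ("or", "$13", ["$6", "$2"]),
    ("add", "$14", ["$2", "$2"]),
    ("sw", None, ["$15", "$2"]),   # sw reads both registers and writes none
]


def raw_hazards(instructions, hazard_window=2):
    """List (producer, consumer, register) pairs closer than the hazard window."""
    hazards = []
    for i, (producer, dest, _) in enumerate(instructions):
        if dest is None:
            continue
        last = min(i + hazard_window, len(instructions) - 1)
        for j in range(i + 1, last + 1):
            consumer, _, sources = instructions[j]
            if dest in sources:
                hazards.append((producer, consumer, dest))
    return hazards


print(raw_hazards(program))
```

Under these assumptions only and and or are reported; add reads $2 in the same cycle in which sub writes it, and sw reads it one cycle later still.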

Data Hazards: Forwarding

While result not written back until WB:

sub $2,$1,$3
and $12,$2,$5

[Figure: pipeline diagram in which and stalls until sub has written $2 back]

It is calculated earlier - in EX:

[Figure: pipeline diagram in which and follows sub through IF, ID, EX, MEM, WB without stalling]

Actually available after EX stage (not WB)

Actually needed at EX stage (not ID)

Add forwarding hardware to allow, e.g., EX’s output (located in EX/MEM pipeline register) to be EX’s input.

Forwarding: All 2 Cases

[Figure: multi-cycle pipeline diagram, CC 1 to CC 9, program execution order in instructions: sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2)]

Value of register $2 (CC 1 to CC 9): 10, 10, 10, 10, 10/-20, -20, -20, -20, -20
Value of EX/MEM (CC 1 to CC 9): X, X, X, -20, X, X, X, X, X
Value of MEM/WB (CC 1 to CC 9): X, X, X, X, -20, X, X, X, X

“and” has a problem -> fixed

“or” has a problem -> fixed

“add” ??? -> OK

“sw” is OK


Data Hazards (again)

Needed data still being computed by previous instruction

sub $11, $3, $2

and $12, $11, $4

or $13, $6, $11

add $14, $8, $9

sw $15, 100($2)

[Figures: seven snapshots of the pipelined datapath (pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB; signal points labeled (a) through (z)) as the five instructions above enter the pipeline. Instructions and annotations shown in each snapshot:]

1. sub $11, $3, $2

2. and $12, $11, $4; sub $11, $3, $2. Annotations: Rs=3, Rd=11, $Rs=300

3. or $13, $6, $11; and $12, $11, $4; sub $11, $3, $2. Annotations: Rs=11, Rd=12, $Rs=???, ???, ???, Rd=11

4. Same instructions as snapshot 3, with the values filled in. Annotations: Rs=11, Rd=12, $Rs=1100, 300, 100, Rd=11

5. add $14, $8, $9; or $13, $6, $11; and $12, $11, $4; sub $11, $3, $2. Annotations: ???, Rd=12, Rd=11, ???

6. Same instructions as snapshot 5, with the values filled in. Annotations: 100, Rd=12, Rd=11, 100

7. sw $15, 100($2); add $14, $8, $9; or $13, $6, $11; and $12, $11, $4; sub $11, $3, $2. Annotations: ???, Rd=11, ???

Forwarding: Implementation

[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]

Additional datapath for forwarding?

How to control the forwarding datapath?


Forwarding: Forwarding Unit

[Figure: pipelined datapath with a forwarding unit. Signals shown at the forwarding unit include the Rs and Rt register numbers, EX/MEM.RegisterRd, and MEM/WB.RegisterRd; multiplexors are placed in front of the ALU inputs]

Forwarding unit: 6-input, 2-output combinational circuit

HW#1, (5)

Forwarding Control

Control logic

ForwardA =

10 if (EX/MEM.Rd = ID/EX.Rs) <- get operand from EX/MEM

01 if (MEM/WB.Rd = ID/EX.Rs) <- get operand from MEM/WB

00, otherwise <- get operand from ID/EX

ForwardB =

10 if (EX/MEM.Rd = ID/EX.Rt) <- get operand from EX/MEM

01 if (MEM/WB.Rd = ID/EX.Rt) <- get operand from MEM/WB

00, otherwise <- get operand from ID/EX



Forwarding Control Circuit

ForwardA =

10 if ((EX/MEM.Rd = ID/EX.Rs) && EX/MEM.RegWrite && (EX/MEM.Rd ≠ 0))

01 if ((MEM/WB.Rd = ID/EX.Rs) && MEM/WB.RegWrite && (MEM/WB.Rd ≠ 0) && (EX/MEM.Rd ≠ ID/EX.Rs))

00, otherwise

ForwardB =

10 if ((EX/MEM.Rd = ID/EX.Rt) && EX/MEM.RegWrite && (EX/MEM.Rd ≠ 0))

01 if ((MEM/WB.Rd = ID/EX.Rt) && MEM/WB.RegWrite && (MEM/WB.Rd ≠ 0) && (EX/MEM.Rd ≠ ID/EX.Rt))

00, otherwise
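The forwarding conditions above translate almost line by line into code. This is a minimal sketch of the combinational logic, not the actual hardware; the function and parameter names are made up, and register numbers are plain integers.

```python
def forward_select(id_ex_reg, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Return the 2-bit mux select for one ALU operand (source register id_ex_reg)."""
    # 10: take the operand from the EX/MEM pipeline register.
    if ex_mem_rd == id_ex_reg and ex_mem_regwrite and ex_mem_rd != 0:
        return 0b10
    # 01: take the operand from the MEM/WB pipeline register.
    if (mem_wb_rd == id_ex_reg and mem_wb_regwrite and mem_wb_rd != 0
            and ex_mem_rd != id_ex_reg):
        return 0b01
    # 00: take the operand read from the register file (held in ID/EX).
    return 0b00


def forwarding_unit(id_ex_rs, id_ex_rt, ex_mem_rd, ex_mem_regwrite,
                    mem_wb_rd, mem_wb_regwrite):
    """Return (ForwardA, ForwardB) for the instruction currently in EX."""
    forward_a = forward_select(id_ex_rs, ex_mem_rd, ex_mem_regwrite,
                               mem_wb_rd, mem_wb_regwrite)
    forward_b = forward_select(id_ex_rt, ex_mem_rd, ex_mem_regwrite,
                               mem_wb_rd, mem_wb_regwrite)
    return forward_a, forward_b


# "and $12, $2, $5" in EX while "sub $2, $1, $3" is in MEM:
print(forwarding_unit(2, 5, ex_mem_rd=2, ex_mem_regwrite=True,
                      mem_wb_rd=0, mem_wb_regwrite=False))   # (2, 0)
# "or $13, $6, $2" in EX: "and" (Rd=12) is in MEM, "sub" (Rd=2) is in WB:
print(forwarding_unit(6, 2, ex_mem_rd=12, ex_mem_regwrite=True,
                      mem_wb_rd=2, mem_wb_regwrite=True))    # (0, 1)
```

The Rd ≠ 0 checks keep a write to $0 from being forwarded, and the RegWrite checks keep instructions such as sw or beq from being treated as producers.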

Data Hazards: All Considered ???

… but it doesn’t eliminate all data hazards:

lw $s5,0($s4)
add $s7,$s5,$s6

[Figure: pipeline diagram in which add stalls for one cycle behind lw]

… especially when we remember that memory access is really often much longer than a single cycle:

[Figure: the same pipeline diagram with additional stall cycles]

Data Hazards: Stalling

Stall the pipeline by keeping an instruction in the same stage

lw $2, 20($1)
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7

[Figure: multi-cycle pipeline diagram, time in clock cycles up to CC 10, with a bubble inserted after lw; the lw-and and lw-or dependences are marked]

At CC5, MEM stage is empty !!!


Stalling detection and control

Detects during the ID stage when “lw” instruction is in EX stage

The following two instructions are in ID (“and”) and IF (“or”) stages, respectively

If detected,

Stall the following instruction (in ID stage, “and”) so that it repeats the ID stage again => IF/ID pipeline register should not be changed

Stall the second instruction (in IF stage, “or”) so that it repeats the IF stage again => PC should not be changed


Hazard detection

If (ID/EX.MemRead and ((ID/EX.Rt = IF/ID.Rs) or (ID/EX.Rt = IF/ID.Rt))) stall the pipeline

(ID/EX.MemRead is set when the instruction in EX is lw)

Control signals generated from hazard detection unit

IF/IDWrite to prevent IF/ID register from changing

PCWrite to prevent PC from changing

MUX control to delay passing the control signals forward (pass “null” signals)
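The load-use check and its three outputs can be written as a small function. This is a sketch under stated assumptions: the function name and the "ZeroControl" key are made up for illustration, and the write enables are modeled as booleans that are True when the register may change.

```python
def hazard_detection(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """Return the three outputs of the hazard detection unit for one cycle."""
    stall = bool(id_ex_memread and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))
    return {
        "PCWrite": not stall,      # False keeps the PC, so IF is repeated
        "IF/IDWrite": not stall,   # False keeps IF/ID, so ID is repeated
        "ZeroControl": stall,      # True sends "null" control signals into ID/EX
    }


# lw $2, 20($1) is in EX (Rt = 2); and $4, $2, $5 is in ID (Rs = 2, Rt = 5).
print(hazard_detection(True, id_ex_rt=2, if_id_rs=2, if_id_rt=5))
# An R-format instruction in EX never triggers the load-use stall.
print(hazard_detection(False, id_ex_rt=2, if_id_rs=2, if_id_rt=5))
```

After the one-cycle stall, lw is in MEM and the forwarding path from MEM/WB can supply $2 to the dependent instruction.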

Stalling: Detection Unit

Stall by letting an instruction that won’t write anything go forward

[Figure: pipelined datapath with a hazard detection unit and a forwarding unit. Signals shown at the hazard detection unit include ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, IF/ID.RegisterRt, PCWrite, and IF/IDWrite, plus a multiplexor that can select 0 for the control signals]

Hazard detection unit: 4-input, 3-output combinational circuit

Stalling: What happens in the pipeline?

lw $s5,0($s4)
add $s7,$s5,$s6

[Figure: pipeline diagram, CC1 to CC12. One instruction repeats its ID stage (stall), the instruction behind it repeats its IF stage (stall), and a bubble with no EX stage, no MEM stage, and no WB stage moves down the pipeline]

ID stage is repeated at CC7 <- IF/IDWrite

IF stage is repeated at CC7 <- PCWrite

No EX at CC7, no MEM at CC8 and no WB at CC9 <- zero control signals


Branch (Control) Hazards

While executing a previous branch, next instruction address might not yet be known.

[Figure: pipeline diagram, clock cycles 1 to 8. The branch calculates PC + 4 in IF (cycle 1) and performs the branch test and sets PC to the target in EX (cycle 3); the following instruction stalls until then]

We can stall the pipeline for every branch instruction

Too slow (3 instructions)

Or, continue execution down the sequential instruction stream assuming that the branch will not be taken (predict “branch not taken”)

If the condition is not met, OK! (prediction is successful)

If the condition is met, (prediction is wrong)

Some unwanted instructions are in the pipeline!

Need to “flush” instructions

How do you compare the above two?

If branches are taken half the time, and if it costs little to discard the instructions, the second approach halves the cost of control hazards
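The comparison can be made concrete with expected penalties per branch. The sketch below uses made-up function names and assumes a 3-cycle penalty (the "3 instructions" above), that a wrong prediction costs the same 3 cycles, and that discarding the wrongly fetched instructions costs nothing extra.

```python
def stall_always_cost(penalty_cycles):
    """Every branch waits until its outcome is known."""
    return penalty_cycles


def predict_not_taken_cost(taken_fraction, penalty_cycles):
    """Only taken branches pay the penalty; untaken branches cost nothing."""
    return taken_fraction * penalty_cycles


PENALTY = 3
print(stall_always_cost(PENALTY))              # 3
print(predict_not_taken_cost(0.5, PENALTY))    # 1.5
```

With half of the branches taken, the average cost per branch drops from 3 cycles to 1.5 cycles, which is the halving mentioned above.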

Stalling: What happens in the pipeline?

beq $1,$2, 7
add $3,$4,$5
“target of beq”

[Figure: pipeline diagram, CC1 to CC7. beq is followed by add, which is fetched and then turned into a null instruction; the target of beq is fetched after it]

The null instruction (sll $0,$0,$0) is executed by the ID stage at CC3, the EX stage at CC4, the MEM stage at CC5, and the WB stage at CC6.

IF.Flush at CC3 will do.

• A new control signal IF.Flush is introduced to flush the instruction in IF stage

• It zeros the instruction field of the IF/ID pipeline register, which in fact can be decoded as “sll $0, $0, $0”

• In fact, “nop” = “sll $0, $0, $0”

Branch Hazards: Branch Delay Slots

While determining next instruction address, go ahead and execute sequentially following instruction(s).

[Figure: pipeline diagram, clock cycles 1 to 7, for a branch, the branch delay instruction, and the instruction after it. The branch performs the branch test and sets PC to the target in EX (cycle 3); the branch delay instruction is fetched in cycle 2; the next fetch gets the correct target]

Branch Hazards: Branch Delay Slots

Advantage:

Can avoid one stall per delay slot.

Disadvantages:

Makes assembly-language programming more difficult.

Can be difficult to find appropriate code for slot.

Exposes implementation detail that could change.

Later implementations without a stall must still emulate slot.

Most modern processors avoid

Branch Hazards: Branch Prediction

Guess which instruction is next, & start executing it.

What if guess is wrong? : Flush the pipeline

Simplest guesses: Always Taken or Never Taken.

When to do prediction?

Static prediction: compiler

Dynamic prediction: processor

Dynamic Branch Prediction

Branch prediction buffer (branch history table)

A small memory that is indexed by the lower portion of the address of the branch instruction and that contains one or more bits indicating whether the branch was recently taken or not.

[Figure: in the IF stage the PC addresses both the instruction memory and the BPB; the instruction and the prediction (T or NT) go to IF/ID]

Dynamic Branch Prediction

1-bit predictor

[Figure: two-state diagram with states “Predict taken” and “Predict not taken”; transitions are labeled T (Taken) and NT (Not taken)]

Prediction accuracy

A loop closed by beq, executed 10 times => 1st: ?, 2nd: correct, 3rd: correct, ..., 9th: correct, 10th: incorrect => 80% accuracy

(Because the first one is incorrect in the second execution of the same code.)
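The 80% figure can be reproduced by simulating a 1-bit predictor on the loop branch. This sketch uses made-up names and assumes the loop branch is taken on iterations 1 to 9 and not taken on the 10th, and that the same loop is executed twice.

```python
def one_bit_accuracy(outcomes, initial_prediction):
    """Run a 1-bit predictor over branch outcomes (True = taken).

    Returns (fraction predicted correctly, predictor state afterwards).
    """
    prediction = initial_prediction
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        prediction = taken   # the single bit remembers only the last outcome
    return correct / len(outcomes), prediction


loop = [True] * 9 + [False]   # taken nine times, then the loop exits

first_accuracy, state = one_bit_accuracy(loop, initial_prediction=True)
second_accuracy, _ = one_bit_accuracy(loop, initial_prediction=state)
print(first_accuracy)    # 0.9
print(second_accuracy)   # 0.8
```

On the second execution the predictor still remembers "not taken" from the previous loop exit, so it misses both the first and the last iteration.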

Exceptions

Another form of control hazard involves exceptions.

When an arithmetic overflow occurs while executing “add $1, $2, $1”

Transfer control to the exception routine (0x4000 0040)

This is the same as executing a branch instruction

Necessary actions are

Stop executing the current instruction and start the exception routine.

Following instructions already in the pipe must be wiped out (flush pipeline registers).

Return to the offending instruction.


Flush Control Signals

Similar to the taken-branch, we need to flush pipeline registers. Question is which pipeline register(s)?

Arithmetic overflow is detected at the end of EX stage.

And thus flushing takes place at MEM stage (at the next cycle).

Since three following instructions are already in the pipeline (IF, ID and EX stages), we need to flush those three instructions.

Therefore, we need ID.Flush and EX.Flush in addition to IF.Flush control signal.


[Figure: pipelined datapath with hazard detection unit, forwarding unit, and exception support. IF.Flush, ID.Flush, and EX.Flush are added together with multiplexors that select 0; Cause and ExceptPC registers are added; the address 40000040 is an input to the PC multiplexor; OF marks the overflow signal]

Challenges

What if more than one instruction generates exceptions?

While “add” causes an overflow exception at CC5 in EX, another causes an invalid opcode exception at CC5 in IF

It is not OK to generate all flushing signals.

And, how does the exception service routine correctly identify the instruction that causes the exception? => Imprecise exception

Precise and Imprecise Exceptions

Precise exceptions

Hardware (CPU) correctly identifies the offending instruction.

And makes sure all prior instructions complete.

All instructions following it are not allowed to complete their execution and have not modified the process state

Imprecise exception

Hardware does not guarantee it and leaves it up to the operating system to determine which instruction caused the problem.

Some instructions following the offending instruction are allowed to complete their execution and modify the process state.

Most modern CPUs support precise exceptions
