00 Review 3 Pipeline
Transcript of 00 Review 3 Pipeline
8/3/2019 00 Review 3 Pipeline
http://slidepdf.com/reader/full/00-review-3-pipeline 1/35
Review: Computer Organization
Pipelining
Chansu Yu
Laundry Example
Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold
Washer takes 30 minutes
Dryer takes 30 minutes
“Folder” takes 30 minutes
“Stasher” takes 30 minutes to put clothes into drawers
Sequential Laundry
Sequential laundry takes 8 hours for 4 loads
If they learned pipelining, how long would laundry take?
[Figure: timeline from 6 PM to 2 AM. Loads A, B, C, and D run one after another, each passing through four 30-minute stages.]
Faster Laundry - Pipelining
Faster laundry takes 3.5 hours for 4 loads!
[Figure: the same timeline with loads A, B, C, and D overlapped; the four loads occupy seven 30-minute slots in total.]
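The 8-hour and 3.5-hour figures can be checked with a short calculation. This is a minimal sketch; the function names are illustrative and not part of the slides, and it assumes every stage takes the same 30 minutes.

```python
def sequential_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """Each load finishes all of its stages before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """The first load needs `stages` slots; every later load adds one slot."""
    return (stages + loads - 1) * stage_minutes


# Four loads, four 30-minute stages (washer, dryer, folder, stasher).
print(sequential_minutes(4, 4, 30) / 60)  # 8.0 hours
print(pipelined_minutes(4, 4, 30) / 60)   # 3.5 hours
```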
5 Stages of MIPS
Step 1. Instruction fetch (all instructions):
    IR = Memory[PC]
    PC = PC + 4
Step 2. Instruction decode/register fetch (all instructions):
    A = Reg[IR[25-21]]
    B = Reg[IR[20-16]]
    ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Step 3. Execution, address computation, branch/jump completion:
    R-type: ALUOut = A op B
    Memory reference: ALUOut = A + sign-extend(IR[15-0])
    Branch: if (A == B) then PC = ALUOut
    Jump: PC = PC[31-28] || (IR[25-0] << 2)
Step 4. Memory access or R-type completion:
    R-type: Reg[IR[15-11]] = ALUOut
    Load: MDR = Memory[ALUOut]
    Store: Memory[ALUOut] = B
Step 5. Memory read completion:
    Load: Reg[IR[20-16]] = MDR
Single cycle vs. Multicycle
[Figure: timing diagram over time units 0 to 18 for the sequence add, load, add, load. Each add passes through instruction fetch, register read, ALU operation, and register write; each load additionally performs a memory read.]
What are the advantages of multicycle implementation?
What are the disadvantages of multicycle implementation?
Multicycle vs. Pipelined
[Figure: timing diagram over time units 0 to 18 for the sequence add, load, add, load. Each add passes through instruction fetch, register read, ALU operation, and register write; each load additionally performs a memory read.]
What are the advantages of pipelined implementation?
What are the disadvantages of pipelined implementation?
Lessons from Pipelined Laundry
Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
Potential speedup = number of pipe stages
Pipeline rate is limited by the slowest pipeline stage
Unbalanced lengths of pipe stages reduce speedup
Time to “fill” the pipeline and time to “drain” it reduce speedup
Multiple tasks operate simultaneously using different resources: any dependencies, any conflicts?
[Figure: pipelined laundry timeline starting at 6 PM, with loads A, B, C, and D overlapped across seven 30-minute slots.]
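The lessons above can be put into numbers. The sketch below is illustrative only: the function names and the unbalanced stage times are assumptions, not values from the slides. It models the pipeline clock as the slowest stage and charges the fill time once.

```python
def pipelined_time(n_tasks: int, stage_times: list[int]) -> int:
    """Total time when every stage is clocked at the slowest stage's length."""
    cycle = max(stage_times)
    return (len(stage_times) + n_tasks - 1) * cycle


def speedup(n_tasks: int, stage_times: list[int]) -> float:
    """Sequential time divided by pipelined time."""
    sequential = n_tasks * sum(stage_times)
    return sequential / pipelined_time(n_tasks, stage_times)


balanced = [30, 30, 30, 30]
unbalanced = [30, 60, 30, 30]   # hypothetical: one stage twice as slow

print(speedup(4, balanced))       # fill/drain keeps this well below 4
print(speedup(1000, balanced))    # approaches the number of stages
print(speedup(1000, unbalanced))  # the slow stage caps the gain
```

With only four loads the fill and drain time dominates; with many loads the balanced pipeline approaches a speedup of 4, while the unbalanced one stays near 2.5.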
Can pipelining get us into trouble?
[Figure: pipeline diagram, instructions vs. time step (clock cycle). Five instructions each pass through IF, ID, EX, MEM, and WB, one starting per cycle, over cycles 1 to 9.]
If any two stages use the same resource, there must be a conflict.
Hazards
Hazard = when an instruction’s stage is unable to execute during the current cycle.
Can always resolve hazards by waiting
pipeline control must detect the hazard
take action (or delay action) to resolve hazards
[Figure: pipeline diagram, instructions vs. time step (clock cycle), cycles 1 to 9, with a stall. Annotation: instruction #2, stage 3 unable to continue.]
Structural Hazards
A needed functional unit is busy executing a previous instruction
(Attempt to use the same resource two different ways at the same time)
[Figure: pipeline diagram, instructions vs. time step (clock cycle), cycles 1 to 9, with two stall slots.]
Example:
– Our sample MIPS pipeline has none.
– What if PC + 4 computation used main ALU instead of separate adder?
Control Hazards
While executing a previous branch, next instruction address might not yet be known.
(attempt to make a decision before condition is evaluated)
[Figure: pipeline diagram, instructions vs. time step (clock cycle), cycles 1 to 8. Conditional branch: IF (cycle 1) calculates PC + 4; ID (cycle 2) computes branch target address; EX (cycle 3) performs branch test & sets PC to target. The branch target instruction is stalled while the branch is resolved.]
Data Hazards
Needed data still being computed by previous instruction.
(attempt to use item before it is ready)
add $s3,$s1,$s2
sw $s4,0($s3)
lw $s5,0($s3)
add $s7,$s5,$s6
[Figure: pipeline diagram for the four instructions above over clock cycles 1 to 12, with stalls inserted where an instruction waits for a register value.]
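The 12-cycle total for this sequence can be reproduced with a small scheduling model. This is a sketch under stated assumptions, not the slide's method: there is no forwarding, and a register written in WB can be read by an ID stage in that same cycle (write in the first half, read in the second half). The names `PROGRAM` and `schedule` are illustrative.

```python
# (text, destination register or None, source registers)
PROGRAM = [
    ("add $s3,$s1,$s2", "$s3", ["$s1", "$s2"]),
    ("sw $s4,0($s3)", None, ["$s4", "$s3"]),
    ("lw $s5,0($s3)", "$s5", ["$s3"]),
    ("add $s7,$s5,$s6", "$s7", ["$s5", "$s6"]),
]


def schedule(program):
    """Return (text, ID cycle, WB cycle) per instruction, stalling in ID."""
    ready = {}    # register -> cycle in which its new value is written back
    prev_id = 1   # so that the first instruction decodes in cycle 2
    result = []
    for text, dest, sources in program:
        earliest = prev_id + 1  # in-order: one ID stage per cycle at most
        id_cycle = max([earliest] + [ready.get(reg, 0) for reg in sources])
        wb_cycle = id_cycle + 3  # ID -> EX -> MEM -> WB
        if dest is not None:
            ready[dest] = wb_cycle
        result.append((text, id_cycle, wb_cycle))
        prev_id = id_cycle
    return result


for text, id_cycle, wb_cycle in schedule(PROGRAM):
    print(f"{text:18} ID in cycle {id_cycle:2}, WB in cycle {wb_cycle:2}")
```

Under these assumptions the last instruction writes back in cycle 12, which agrees with the final cycle shown in the diagram.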
Pipelined Approach
[Figure: pipeline diagram with five instructions passing through IF, ID, EX, MEM, and WB over cycles 1 to 9, shown alongside laundry loads A, B, C, and D.]
- Cycle time, No. stages
- Resource conflict
Resource Conflicts (revisit)
Step 1. Instruction fetch (all instructions):
    IR = Memory[PC]
    PC = PC + 4
Step 2. Instruction decode/register fetch (all instructions):
    A = Reg[IR[25-21]]
    B = Reg[IR[20-16]]
    ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Step 3. Execution, address computation, branch/jump completion:
    R-type: ALUOut = A op B
    Memory reference: ALUOut = A + sign-extend(IR[15-0])
    Branch: if (A == B) then PC = ALUOut
    Jump: PC = PC[31-28] || (IR[25-0] << 2)
Step 4. Memory access or R-type completion:
    R-type: Reg[IR[15-11]] = ALUOut
    Load: MDR = Memory[ALUOut]
    Store: Memory[ALUOut] = B
Step 5. Memory read completion:
    Load: Reg[IR[20-16]] = MDR
Conflicts marked on the table:
ALU conflict
Register file conflict (read or write)
Memory conflict
Basic Pipeline
[Figure: datapath divided into five stages. Components shown: PC, instruction memory, two adders (one with the constant 4, one fed by shift left 2), registers, sign extend (16 to 32), ALU with Zero output, data memory, and multiplexors.]
IF: Instruction fetch
ID: Instruction decode/register file read
EX: Execute/address calculation
MEM: Memory access
WB: Write back
Instructions and data move generally from left to right through the five stages as they complete execution, except in two cases:
- WB stage
- PC selection
Basic Pipeline
Step 1. Instruction fetch (all instructions):
    IR = Memory[PC]
    PC = PC + 4
Step 2. Instruction decode/register fetch (all instructions):
    A = Reg[IR[25-21]]
    B = Reg[IR[20-16]]
    ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Step 3. Execution, address computation, branch/jump completion:
    R-type: ALUOut = A op B
    Memory reference: ALUOut = A + sign-extend(IR[15-0])
    Branch: if (A == B) then PC = ALUOut
    Jump: PC = PC[31-28] || (IR[25-0] << 2)
Step 4. Memory access or R-type completion:
    R-type: Reg[IR[15-11]] = ALUOut
    Load: MDR = Memory[ALUOut]
    Store: Memory[ALUOut] = B
Step 5. Memory read completion:
    Load: Reg[IR[20-16]] = MDR
Why do we still need 2 ALUs at EX stage? (one for A-B and the other for PC+IR)
Why move ?? ZF is available during EX stage, anyway.
Pipelined Datapath
For store instruction,
(?) => ID/EX pipeline register => EX/MEM pipeline register => (?)
[Figure: the basic pipeline datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB inserted between the stages.]
Pipeline registers are added to the basic pipeline in order to actually split the datapath into stages. The info. must be placed in a pipeline register; otherwise, it is lost when the next instruction enters that pipeline stage.
Content of Pipeline Registers
Which data should be passed through stages? I.e., what are the contents of pipeline registers?
In IF/ID pipeline register: PC (32), Inst. (32)
In ID/EX pipeline register: PC (32), Reg. data 1 (32), Reg. data 2 (32), Offset (32), Reg. no. 2 and 3 (10)
In EX/MEM pipeline register: PC (32), ZF (1), ALUOut (32), Reg. data 2 (32), Reg. no. (5)
In MEM/WB pipeline register: Memory data (32), ALUOut (32), Reg. no. (5)
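Adding up the field widths listed above gives the size of each pipeline register. The dictionary below is just a transcription of the slide's fields; the name `PIPELINE_REGISTERS` and the helper `width` are illustrative, and control bits are not counted.

```python
# Field widths in bits, copied from the list above (data fields only).
PIPELINE_REGISTERS = {
    "IF/ID": {"PC": 32, "Inst.": 32},
    "ID/EX": {"PC": 32, "Reg. data 1": 32, "Reg. data 2": 32,
              "Offset": 32, "Reg. no. 2 and 3": 10},
    "EX/MEM": {"PC": 32, "ZF": 1, "ALUOut": 32,
               "Reg. data 2": 32, "Reg. no.": 5},
    "MEM/WB": {"Memory data": 32, "ALUOut": 32, "Reg. no.": 5},
}


def width(name: str) -> int:
    """Total number of data bits held in one pipeline register."""
    return sum(PIPELINE_REGISTERS[name].values())


for name in PIPELINE_REGISTERS:
    print(name, width(name))
```

This gives 64, 138, 102, and 69 bits for IF/ID, ID/EX, EX/MEM, and MEM/WB respectively.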
Example
Five instructions go through the MIPS pipeline:
lw  $10, 20($1)   100011 00001 01010 0000 0000 0001 0100   (8c2a 0014)
sub $11, $2, $3   000000 00010 00011 01011 00000 100010    (0043 5822)
and $12, $4, $5   000000 00100 00101 01100 00000 100100    (0085 6024)
or  $13, $6, $7   000000 00110 00111 01101 00000 100101    (00c7 6825)
add $14, $8, $9   000000 01000 01001 01110 00000 100000    (0109 7020)
Register contents             Memory contents
$pc = 0000 0000 5000 0000     [0000 0000 0000 1000] = 0000 1000 0000 0000
$1 = 0000 0000 0000 1000      [0000 0000 0000 1004] = 0000 1004 0000 0000
... .....
$9 = 0000 0000 0000 9000
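The encoding of the first instruction can be reproduced by packing the I-type fields. This is a small sketch; `encode_i_type` is an illustrative helper, and it assumes the standard MIPS I-format layout (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate).

```python
def encode_i_type(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Pack a MIPS I-format instruction: opcode | rs | rt | 16-bit immediate."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


LW_OPCODE = 0b100011

# lw $10, 20($1): base register rs = 1, destination rt = 10, offset = 20
word = encode_i_type(LW_OPCODE, rs=1, rt=10, imm=20)
print(f"{word:08x}")  # 8c2a0014
```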
Five instructions go through the MIPS pipeline
lw  $10, 20($1)   100011 00001 01010 0000 0000 0001 0100   (8c2a 0014)
sub $11, $2, $3   000000 00010 00011 01011 00000 100010    (0043 5822)
and $12, $4, $5   000000 00100 00101 01100 00000 100100    (0085 6024)
or  $13, $6, $7   000000 00110 00111 01101 00000 100101    (00c7 6825)
add $14, $8, $9   000000 01000 01001 01110 00000 100000    (0109 7020)
Register contents             Memory contents
$pc = 0000 0000 5000 0000     [0000 0000 0000 1000] = 0000 1000 0000 0000
$1 = 0000 0000 0000 1000      [0000 0000 0000 1004] = 0000 1004 0000 0000
... .....
$9 = 0000 0000 0000 9000
[Figure: pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) with all five instructions in flight; points on the datapath are labeled (a) through (z) for the reader to fill in.]
9 Control Signals
Instruction  Opcode  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     000000  1       0       0         1         0        0         0       1       0
lw           100011  0       1       1         1         1        0         0       0       0
sw           101011  X       1       X         0         0        1         0       0       0
beq          000100  X       0       X         0         0        0         1       0       1
4 multiplexor selectors (PCSrc), 3 write signals, 2 ALU signals
Q1: In which stage is the control circuit?
Q2: EX stage executes “and” and WB stage executes “lw”. Is MemtoReg 1 or 0?
Generate control signals all at once at ID stage
and pass them through the stages just like the data
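One way to think about Q2 is that each instruction carries its own control signals from ID onward, so a signal's value in a given stage is the one belonging to the instruction currently in that stage. The sketch below encodes the slide's table as a lookup; the names `SIGNALS`, `CONTROL_TABLE`, and `decode` are illustrative, and `None` stands for a don't-care (X).

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemRead",
           "MemWrite", "Branch", "ALUOp1", "ALUOp0")

# Rows copied from the control table; None stands for "X" (don't care).
CONTROL_TABLE = {
    "R-format": (1, 0, 0, 1, 0, 0, 0, 1, 0),
    "lw":       (0, 1, 1, 1, 1, 0, 0, 0, 0),
    "sw":       (None, 1, None, 0, 0, 1, 0, 0, 0),
    "beq":      (None, 0, None, 0, 0, 0, 1, 0, 1),
}


def decode(kind: str) -> dict:
    """Generate all nine control signals at once, as the ID stage would."""
    return dict(zip(SIGNALS, CONTROL_TABLE[kind]))


# Each instruction carries its own signals down the pipeline.
in_ex = decode("R-format")  # the "and" currently in EX
in_wb = decode("lw")        # the "lw" currently in WB

# MemtoReg is consumed in WB, so the value travelling with lw is what matters.
print(in_wb["MemtoReg"])  # 1
```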
Pipeline Control
             Execution/address calculation    Memory access               Write-back
             stage control lines              stage control lines         stage control lines
Instruction  RegDst ALUOp1 ALUOp0 ALUSrc      Branch MemRead MemWrite     RegWrite MemtoReg
R-format     1      1      0      0           0      0       0            1        0
lw           0      0      0      1           0      1       0            1        1
sw           X      0      0      1           0      0       1            0        X
beq          X      0      1      0           1      0       0            0        X
[Figure: the Control unit output is split into EX, M, and WB fields. ID/EX carries EX, M, and WB; EX/MEM carries M and WB; MEM/WB carries WB.]
Datapath with Control
[Figure: pipelined datapath with the Control unit and the EX, M, and WB control fields carried through ID/EX, EX/MEM, and MEM/WB. Control signals shown: RegDst, ALUSrc, ALUOp, Branch, MemRead, MemWrite, RegWrite, MemtoReg, and PCSrc. Instruction[20–16] and Instruction[15–11] feed the write-register multiplexor; Instruction[15–0] is sign-extended from 16 to 32 bits.]
Graphically Representing Pipelines
Can help with answering questions like:
how many cycles does it take to execute this code?
what is the ALU doing during cycle 4?
use this representation to help understand datapaths
[Figure: multiple-clock-cycle diagram, CC 1 to CC 6, program execution order (in instructions) vs. time (in clock cycles). lw $10, 20($1) and sub $11, $2, $3 each pass through IM, Reg, ALU, DM, Reg.]
Data Hazards
Needed data still being computed by previous instruction
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Assume $1=10, $2=10, $3=30
Problem with starting next instruction before first is finished
dependencies that “go backward in time” are data hazards
Data Hazards: Dependencies
[Figure: multiple-clock-cycle diagram, CC 1 to CC 9, program execution order (in instructions) vs. time (in clock cycles), for sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2).]
Value of register $2: 10 in CC 1 to CC 4, 10/-20 in CC 5, -20 in CC 6 to CC 9
“and” has a problem
“or” has a problem
“add” ???
“sw” is OK
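The four verdicts can be reproduced by comparing the cycle in which each instruction reads $2 with the cycle in which sub writes it. This is a sketch under one assumption that the slide does not state explicitly: the register file is written in the first half of a cycle and read in the second half, so a read in CC 5 already sees the new value. The names are illustrative.

```python
PROGRAM = [
    "sub $2, $1, $3",
    "and $12, $2, $5",
    "or $13, $6, $2",
    "add $14, $2, $2",
    "sw $15, 100($2)",
]

SUB_WRITE_CYCLE = 5  # sub is fetched in CC 1, so its WB stage is CC 5


def reads_stale_value(position: int) -> bool:
    """True if the instruction at this 0-based position reads $2 too early."""
    decode_cycle = position + 2  # IF in CC position+1, ID in the next cycle
    return decode_cycle < SUB_WRITE_CYCLE


# With $1 = 10 and $3 = 30, sub produces 10 - 30 = -20.
print("new value of $2:", 10 - 30)
for position, text in enumerate(PROGRAM):
    if position == 0:
        continue  # sub itself is the producer
    verdict = "has a problem" if reads_stale_value(position) else "is OK"
    print(f"{text:18} {verdict}")
```

Under that assumption, “and” and “or” read the old value 10, while “add” and “sw” see -20.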
Data Hazards: Forwarding
While result not written back until WB:
sub $2,$1,$3     IF   ID     EX     MEM   WB
and $12,$2,$5         IF     Stall  Stall ID   EX   MEM   WB
It is calculated earlier, in EX:
sub $2,$1,$3     IF   ID     EX     MEM   WB
and $12,$2,$5         IF     ID     EX    MEM  WB
Add forwarding hardware to allow, e.g., EX’s output (located in EX/MEM pipeline register) to be EX’s input.
Actually available after EX stage (not WB)
Actually needed at EX stage (not ID)
Forwarding: All 2 Cases
[Figure: multiple-clock-cycle diagram, CC 1 to CC 9, program execution order (in instructions) vs. time (in clock cycles), for sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2).]
Value of register $2: 10 in CC 1 to CC 4, 10/-20 in CC 5, -20 in CC 6 to CC 9
Value of EX/MEM: -20 in CC 4, X otherwise
Value of MEM/WB: -20 in CC 5, X otherwise
“and” has a problem -> fixed
“or” has a problem -> fixed
“add” ??? -> OK
“sw” is OK
Data Hazards (again)
Needed data still being computed by previous instruction
sub $11, $3, $2
and $12, $11, $4
or $13, $6, $11
add $14, $8, $9
sw $15, 100($2)
[Figure: pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) with points labeled (a) through (z); sub $11, $3, $2 enters the pipeline.]
[Figure: pipelined datapath snapshot with sub $11, $3, $2 followed by and $12, $11, $4. Annotations: Rs=3, Rd=11, $Rs=300.]
[Figure: pipelined datapath snapshot with or $13, $6, $11 entering behind sub $11, $3, $2 and and $12, $11, $4. Annotations to fill in: Rs=11, Rd=12, $Rs=???, ???, ???, Rd=11.]
[Figure: the same snapshot with the values filled in. Annotations: Rs=11, Rd=12, $Rs=1100, 300, 100, Rd=11.]
[Figure: pipelined datapath snapshot with add $14, $8, $9 entering behind or $13, $6, $11, sub $11, $3, $2, and and $12, $11, $4. Annotations to fill in: ???, Rd=12, Rd=11, ???.]
[Figure: the same snapshot with the values filled in. Annotations: 100, Rd=12, Rd=11, 100.]
[Figure: pipelined datapath snapshot with sw $15, 100($2) entering behind add $14, $8, $9, or $13, $6, $11, sub $11, $3, $2, and and $12, $11, $4. Annotations to fill in: ???, Rd=11, ???.]
Forwarding: Implementation
[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB, before forwarding paths are added.]
Additional datapath for forwarding?
How to control the forwarding datapath?
Forwarding: Forwarding Unit
[Figure: pipelined datapath with a forwarding unit that controls multiplexors at the ALU inputs. Signals shown: IF/ID.RegisterRs, IF/ID.RegisterRt, and IF/ID.RegisterRd passed into ID/EX as Rs, Rt, and Rd; EX/MEM.RegisterRd; MEM/WB.RegisterRd.]
Forwarding unit: 6-input, 2-output combinational circuit (HW#1, (5))
Forwarding Control
Control logic
ForwardA =
    10 if (EX/MEM.Rd = ID/EX.Rs)   <- get operand from EX/MEM
    01 if (MEM/WB.Rd = ID/EX.Rs)   <- get operand from MEM/WB
    00, otherwise                  <- get operand from ID/EX
ForwardB =
    10 if (EX/MEM.Rd = ID/EX.Rt)   <- get operand from EX/MEM
    01 if (MEM/WB.Rd = ID/EX.Rt)   <- get operand from MEM/WB
    00, otherwise                  <- get operand from ID/EX
Forwarding Control Circuit
ForwardA =
    10 if ((EX/MEM.Rd = ID/EX.Rs) && EX/MEM.RegWrite && (EX/MEM.Rd ≠ 0))
    01 if ((MEM/WB.Rd = ID/EX.Rs) && MEM/WB.RegWrite && (MEM/WB.Rd ≠ 0) && (EX/MEM.Rd ≠ ID/EX.Rs))
    00, otherwise
ForwardB =
    10 if ((EX/MEM.Rd = ID/EX.Rt) && EX/MEM.RegWrite && (EX/MEM.Rd ≠ 0))
    01 if ((MEM/WB.Rd = ID/EX.Rt) && MEM/WB.RegWrite && (MEM/WB.Rd ≠ 0) && (EX/MEM.Rd ≠ ID/EX.Rt))
    00, otherwise
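The conditions above translate almost line for line into a function. This is a minimal software sketch of the combinational logic, not a hardware description; the function name `forward` and its parameter names are illustrative. The same function serves for ForwardA (pass ID/EX.Rs) and ForwardB (pass ID/EX.Rt).

```python
def forward(ex_mem_rd: int, ex_mem_regwrite: bool,
            mem_wb_rd: int, mem_wb_regwrite: bool,
            id_ex_reg: int) -> int:
    """Return the 2-bit forwarding select for one ALU operand."""
    # 10: take the operand from the EX/MEM pipeline register.
    if ex_mem_rd == id_ex_reg and ex_mem_regwrite and ex_mem_rd != 0:
        return 0b10
    # 01: take the operand from the MEM/WB pipeline register.
    if (mem_wb_rd == id_ex_reg and mem_wb_regwrite and mem_wb_rd != 0
            and ex_mem_rd != id_ex_reg):
        return 0b01
    # 00: take the operand read from the register file (ID/EX).
    return 0b00


# "and $12, $2, $5" in EX while "sub $2, $1, $3" is in MEM:
print(forward(2, True, 0, False, id_ex_reg=2))   # ForwardA -> 2 (binary 10)
print(forward(2, True, 0, False, id_ex_reg=5))   # ForwardB -> 0 (binary 00)

# "or $13, $6, $2" in EX, "and" in MEM (Rd = 12), "sub" in WB (Rd = 2):
print(forward(12, True, 2, True, id_ex_reg=2))   # ForwardB -> 1 (binary 01)
```

The checks for register 0 and for RegWrite keep the unit from forwarding a value that is never actually written.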
Data Hazards: All Considered???
… but forwarding doesn’t eliminate all data hazards:
lw $s5,0($s4)       IF   ID   EX     MEM   WB
add $s7,$s5,$s6          IF   ID     Stall EX   MEM   WB
… especially when we remember that memory access is really often much longer than a single cycle, which adds further stalls.
Data Hazards: Stalling
Stall the pipeline by keeping an instruction in the same stage
lw $2, 20($1)
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7
[Figure: multiple-clock-cycle diagram, CC 1 to CC 10, program execution order (in instructions) vs. time (in clock cycles), for the five instructions above. A bubble is inserted; the lw-and and lw-or dependences are marked.]
At CC5, MEM stage is empty!
Data Hazards: Stalling
Stalling detection and control
Detects during the ID stage when the “lw” instruction is in the EX stage
The following two instructions are in ID (“and”) and IF (“or”) stages, respectively
If detected,
Stall the following instruction (in ID stage, “and”) so that it repeats the ID stage again => IF/ID pipeline register should not be changed
Stall the second instruction (in IF stage, “or”) so that it repeats the IF stage again => PC should not be changed
Data Hazards: Stalling
Hazard detection
If (ID/EX.MemRead and ((ID/EX.Rt = IF/ID.Rs) or (ID/EX.Rt = IF/ID.Rt))) stall the pipeline
Control signals generated from hazard detection unit
IF/IDWrite to prevent IF/ID register from changing
PCWrite to prevent PC from changing
MUX control to delay forwarding control signals (pass “null” signals)
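The hazard detection condition and its three outputs can be sketched as a small function. This is an illustrative software model only; the function name `hazard_detection` and the output key `ZeroControl` (standing for the MUX control that passes “null” signals) are assumptions, not names from the slides.

```python
def hazard_detection(id_ex_memread: bool, id_ex_rt: int,
                     if_id_rs: int, if_id_rt: int) -> dict:
    """Return the three outputs of the hazard detection unit."""
    stall = bool(id_ex_memread
                 and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))
    return {
        "PCWrite": not stall,      # hold PC so the IF stage is repeated
        "IF/IDWrite": not stall,   # hold IF/ID so the ID stage is repeated
        "ZeroControl": stall,      # send null control signals into ID/EX
    }


# lw $2, 20($1) in EX (Rt = 2) and "and $4, $2, $5" in ID (Rs = 2, Rt = 5):
print(hazard_detection(True, 2, if_id_rs=2, if_id_rt=5))

# The same registers, but the instruction in EX is not a load:
print(hazard_detection(False, 2, if_id_rs=2, if_id_rt=5))
```

In the first case all three outputs request a stall; in the second nothing is held, because forwarding alone can supply the operand.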
Stalling: Detection Unit
Stall by letting an instruction that won’t write anything go forward
[Figure: pipelined datapath with a hazard detection unit and a forwarding unit. Hazard detection unit signals shown: ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, IF/ID.RegisterRt, PCWrite, IF/IDWrite, and a multiplexor that selects 0 for the control signals.]
Hazard detection unit: 4-input, 3-output combinational circuit
Stalling: What happens in the pipeline?
lw $s5,0($s4)
add $s7,$s5,$s6
[Figure: pipeline diagram, CC1 to CC12, showing the two instructions above among neighboring instructions. Stall (ID) and Stall (IF) are marked, and the resulting bubble is labeled “No EX stage”, “No MEM stage”, and “No WB stage” in successive cycles.]
ID stage is repeated at CC7 <- IF/ID.Write
IF stage is repeated at CC7 <- PCWrite
No EX at CC7, no MEM at CC8 and no WB at CC9 <- zero control signals
Branch (Control) Hazards
While executing a previous branch, next instruction address might not yet be known.
[Figure: pipeline diagram, instructions vs. time step (clock cycle), cycles 1 to 8. Conditional branch: IF (cycle 1) calculates PC + 4; ID (cycle 2) computes branch target address; EX (cycle 3) performs branch test & sets PC to target. The branch target instruction is stalled while the branch is resolved.]
Branch (Control) Hazards
We can stall the pipeline for every branch instruction
Too slow (3 instructions)
Or, continue execution down the sequential instruction stream assuming that the branch will not be taken (predict “branch not taken”)
If the condition is not met, OK! (prediction is successful)
If the condition is met, (prediction is wrong)
Some unwanted instructions are in the pipeline!
Need to “flush” instructions
How do you compare the above two?
If branches are taken half the time, and if it costs little to discard the instructions, the second approach halves the cost of control hazards
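The comparison can be made concrete with an expected-cost calculation. This is a rough sketch: it takes the 3-instruction penalty from the text, assumes discarding instructions costs nothing extra, and uses illustrative function names.

```python
def stall_always_cost(penalty: int = 3) -> float:
    """Cycles lost per branch when every branch stalls the pipeline."""
    return float(penalty)


def predict_not_taken_cost(taken_fraction: float, penalty: int = 3) -> float:
    """Cycles lost per branch when only taken branches pay the penalty."""
    return taken_fraction * penalty


print(stall_always_cost())          # 3.0 cycles per branch
print(predict_not_taken_cost(0.5))  # 1.5 cycles per branch, half the cost
```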
Branch Hazards
Stalling: What happens in the pipeline?
[Figure: pipeline diagram, CC1 to CC7. beq $1,$2, 7 passes through IF, ID, EX, MEM, and WB. The following instruction, add $3,$4,$5, is fetched and then replaced by a null instruction: Null (ID), Null (EX), Null (MEM), Null (WB). The “target of beq” then passes through IF, ID, EX, MEM, and WB.]
ID stage executes a null instruction (sll $0,$0,$0) at CC3
EX stage executes a null instruction (sll $0,$0,$0) at CC4
MEM stage executes a null instruction (sll $0,$0,$0) at CC5
WB stage executes a null instruction (sll $0,$0,$0) at CC6
IF.Flush at CC3 will do.
• A new control signal IF.Flush is introduced to flush the instruction in IF stage
• It zeros the instruction field of the IF/ID pipeline register, which in fact can be decoded as “sll $0, $0, $0”
• In fact, “nop” = “sll $0, $0, $0”
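Why an all-zero instruction field decodes as “sll $0, $0, $0” can be seen by packing the R-format fields. The helper `encode_r_type` is illustrative; it assumes the standard MIPS R-format layout, in which sll has opcode 0 and funct 0.

```python
def encode_r_type(rs: int, rt: int, rd: int, shamt: int, funct: int,
                  opcode: int = 0) -> int:
    """Pack a MIPS R-format instruction into a 32-bit word."""
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)


SLL_FUNCT = 0b000000

# sll $0, $0, $0: every field is zero, so the whole word is zero.
nop = encode_r_type(rs=0, rt=0, rd=0, shamt=0, funct=SLL_FUNCT)
print(f"{nop:08x}")  # 00000000
```

Zeroing the instruction field of IF/ID therefore leaves a valid instruction that writes only register $0 and so changes nothing.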
Branch Hazards: Branch Delay Slots
While determining next instruction address, go ahead and execute sequentially following instruction(s).
[Figure: pipeline diagram, instructions vs. time step (clock cycle), cycles 1 to 7. Conditional branch: IF in cycle 1; ID in cycle 2 computes branch target address, performs branch test & sets PC to target. Branch delay instruction: IF in cycle 2. Branch target: IF in cycle 3, fetches correct target.]
Branch Hazards: Branch Delay Slots
Advantage:
Can avoid one stall per delay slot.
Disadvantages:
Makes assembly-language programming more difficult.
Can be difficult to find appropriate code for slot.
Exposes implementation detail that could change.
Later implementations without a stall must still emulate slot.
Most modern processors avoid
Branch Hazards: Branch Prediction
Guess which instruction is next, & start executing it.
What if guess is wrong? : Flush the pipeline
Simplest guesses: Always Taken or Never Taken.
When to do prediction?
Static prediction: compiler
Dynamic prediction: processor
Dynamic Branch Prediction
Branch prediction buffer (branch history table)
A small memory that is indexed by the lower portion of the address of the branch instruction and that contains one or more bits indicating whether the branch was recently taken or not.
[Figure: the PC addresses both the instruction memory and the BPB; the instruction and the prediction (T or NT) go to IF/ID.]
Dynamic Branch Prediction
1-bit predictor
[Figure: two-state diagram with states “Predict taken” and “Predict not taken”; transitions are labeled T (Taken) and NT (Not taken).]
Prediction accuracy
A loop ending in beq that executes 10 times => 1st: ?, 2nd: correct, 3rd: correct, ..., 9th: correct, 10th: incorrect => 80% accuracy
(Because the first one is incorrect in the second execution of the same code.)
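The 80% figure can be checked by simulating a 1-bit predictor on a loop branch. This sketch assumes the loop branch is taken nine times and then falls through once per execution of the loop, and that the predictor starts in the “predict not taken” state; the function name is illustrative.

```python
def one_bit_accuracy(outcomes: list[bool],
                     initial_prediction: bool = False) -> float:
    """Fraction of branches predicted correctly by a 1-bit predictor."""
    prediction = initial_prediction
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        prediction = taken  # remember only the most recent outcome
    return correct / len(outcomes)


# One execution of the loop: taken 9 times, then not taken on exit.
loop = [True] * 9 + [False]

# Repeating the loop many times shows the steady-state behavior.
print(one_bit_accuracy(loop * 100))  # 0.8
```

Each pass through the loop mispredicts twice: once on the first iteration (the bit still says not taken from the previous exit) and once on the exit itself.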
Exceptions
Another form of control hazard involves exceptions.
When an arithmetic overflow occurs during executing “add $1, $2, $1”
Transfer control to the exception routine (0x4000 0040)
This is the same as executing a branch instruction
Necessary actions are
Stop executing the current instruction and start the exception routine.
Following instructions already in the pipe must be wiped out (flush pipeline registers).
Return to the offending instruction.
Flush Control Signals
Similar to the taken branch, we need to flush pipeline registers. Question is which pipeline register(s)?
Arithmetic overflow is detected at the end of EX stage.
And thus flushing takes place at MEM stage (at the next cycle).
Since three following instructions are already in the pipeline (IF, ID and EX stages), we need to flush those three instructions.
Therefore, we need ID.Flush and EX.Flush in addition to the IF.Flush control signal.
[Figure: pipelined datapath with hazard detection unit, forwarding unit, and exception support. Added elements: flush signals IF.Flush, ID.Flush, and EX.Flush (each selecting 0 through a multiplexor), the Cause and ExceptPC registers, the address 40000040 as a PC input, and the OF signal from the ALU. Slide labels on the flush signals: “For the instruction in IF stage”, “For the instruction in ID stage”, “For the instruction in ID stage”.]
Challenges
What if more than one instruction generates exceptions?
While “add” causes an overflow exception at CC5 in EX, another causes an invalid opcode exception at CC5 in IF
It is not OK to generate all flushing signals.
And, how does the exception service routine correctly identify the instruction that causes the exception? => Imprecise exception
Precise and Imprecise Exceptions
Precise exceptions
Hardware (CPU) correctly identifies the offending instruction.
And makes sure all prior instructions complete.
All instructions following it are not allowed to complete their execution and have not modified the process state
Imprecise exception
Hardware does not guarantee it and leaves it up to the operating system to determine which instruction caused the problem.
Some instructions following the offending instruction are allowed to complete their execution and modify the process state.
Most modern CPUs support precise exceptions