L04-Pipelining
-
Upload
samalmasry -
Category
Documents
-
view
213 -
download
0
Transcript of L04-Pipelining
-
8/6/2019 L04-Pipelining
1/33
CS 152 Computer Architecture andEngineering
Lecture 4 - Pipelining
Krste Asanovic
Electrical Engineering and Computer SciencesUniversity of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152
http://www.t-ram.com/http://www.t-ram.com/http://www.t-ram.com/http://www.t-ram.com/ -
8/6/2019 L04-Pipelining
2/33
2/3/2009 CS152-Spring09 2
Last time in Lecture 3
Microcoding became less attractive as gap betweenRAM and ROM speeds reduced
Complex instruction sets difficult to pipeline, sodifficult to increase performance as gate count grew
Iron-law explains architecture design space Trade instructions/program, cycles/instruction, and time/cycle
Load-Store RISC ISAs designed for efficientpipelined implementations Very similar to vertical microcode
Inspired by earlier Cray machines
-
8/6/2019 L04-Pipelining
3/33
2/3/2009 CS152-Spring09 3
Iron Law of ProcessorPerformance
Time = Instructions Cycles TimeProgram Program * Instruction * Cycle
Instructions per program depends on source code,compiler technology, and ISA
Cycles per instructions (CPI) depends upon theISA and the microarchitecture
Time per cycle depends upon themicroarchitecture and the base technology
Microarchitecture CPI cycle time
Microcoded >1 short
Single-cycle unpipelined 1 long
Pipelined 1 shortThis lecture
-
8/6/2019 L04-Pipelining
4/33
2/3/2009 CS152-Spring09 4
An Ideal Pipeline
All objects go through the same stages
No sharing of resources between any two stages
Propagation delay through all pipeline stages is equal
The scheduling of an object entering the pipelineis not affected by the objects in other stages
stage1
stage2
stage3
stage4
These conditions generally hold for industrialassembly lines.But can an instruction pipeline satisfy the last
condition?
-
8/6/2019 L04-Pipelining
5/33
2/3/2009 CS152-Spring09 5
Pipelined MIPS
To pipeline MIPS:
First build MIPS without pipelining with
CPI=1
Next, add pipeline registers to reduce
cycle time while maintaining CPI=1
-
8/6/2019 L04-Pipelining
6/33
2/3/2009 CS152-Spring09 6
Pipelined Datapath
Clock period can be reduced by dividing the execution of aninstruction into multiple cycles
tC > max {tIM, tRF , tALU , tDM , tRW } ( = tDM probably)
However, CPI will increase unless instructions are pipelined
write-back
phase
fetch
phase
execute
phase
decode & Reg-fetch
phase
memory
phase
addr
wdata
rdataData
Memory
we
ALU
ImmExt
0x4Add
addrrdata
Inst.Memory
rd1
GPRs
rs1rs2
wswd rd2
we
IRPC
-
8/6/2019 L04-Pipelining
7/332/3/2009 CS152-Spring09 7
Technology Assumptions
Thus, the following timing assumption is reasonable
A small amount of very fast memory (caches)backed up by a large, slower memory
Fast ALU (at least for integers)
Multiported Register files (slower!)
tIM tRF tALU tDM tRW
A 5-stage pipelined Harvard architecture will bethe focus of our detailed design
-
8/6/2019 L04-Pipelining
8/332/3/2009 CS152-Spring09 8
5-Stage Pipelined Execution
time t0 t1 t2 t3 t4 t5 t6 t7 . . . .instruction1 IF1 ID1 EX1 MA1 WB1instruction2 IF2 ID2 EX2 MA2 WB2instruction3 IF3 ID3 EX3 MA3 WB3instruction4 IF4 ID4 EX4 MA4 WB4
instruction5 IF5 ID5 EX5 MA5 WB5
Write-Back(WB)
I-Fetch(IF)
Execute(EX)
Decode, Reg. Fetch(ID)
Memory(MA)
addr
wdata
rdataData
Memory
we
ALU
ImmExt
0x4Add
addrrdata
Inst.
Memory
rd1
GPRs
rs1rs2
wswdrd2
we
IRPC
-
8/6/2019 L04-Pipelining
9/332/3/2009 CS152-Spring09 9
5-Stage Pipelined ExecutionResource Usage Diagram
time t0 t1 t2 t3 t4 t5 t6 t7 . . . .IF I1 I2 I3 I4 I5ID I1 I2 I3 I4 I5EX I1 I2 I3 I4 I5MA I1 I2 I3 I4 I5
WB I1 I2 I3 I4 I5
Resour
ces
Write-Back(WB)
I-Fetch(IF)
Execute(EX)
Decode, Reg. Fetch(ID)
Memory(MA)
addr
wdata
rdataData
Memory
we
ALU
ImmExt
0x4Add
addrrdata
Inst.
Memory
rd1
GPRs
rs1rs2
wswdrd2
we
IRPC
-
8/6/2019 L04-Pipelining
10/332/3/2009 CS152-Spring09 10
Pipelined Execution:ALU Instructions
IRIR IR
31
PCA
B
Y
R
MD1 MD2
addrinst
InstMemory
0x4
Add
IR
ImmExt
ALU
rd1
GPRs
rs1
rs2
wswd rd2
we
wdata
addr
wdata
rdataDataMemory
we
Not quite correct!
We need an Instruction Reg (IR) for each stage
-
8/6/2019 L04-Pipelining
11/332/3/2009 CS152-Spring09 11
Pipelined MIPS Datapathwithout jumps
IRIR IR
31
PCA
B
Y
R
MD1 MD2
addrinst
InstMemory
0x4
Add
IR
ImmExt
ALU
rd1
GPRs
rs1rs2
wswd rd2
we
DataMemory
wdata
addr
wdata
rdata
we
OpSel
ExtSel BSrc
WBSrcMemWrite
RegDstRegWrite
F D E M W
Control Points Needto Be Connected
-
8/6/2019 L04-Pipelining
12/332/3/2009 CS152-Spring09 12
How Instructions can Interact witheach other in a pipeline
An instruction in the pipeline may need aresource being used by another instruction
in the pipelinestructural hazard
An instruction may depend on somethingproduced by an earlier instruction Dependence may be for a data value
data hazard
Dependence may be for the next instructionsaddress
control hazard (branches, exceptions)
-
8/6/2019 L04-Pipelining
13/332/3/2009 CS152-Spring09 13
Data Hazards
...r1 r0 + 10r4 r1 + 17...
r1 is stale. Oops!
r1 r4 r1
IRIR IR31
PCA
B
Y
R
MD1 MD2
addrinst
InstMemory
0x4
Add
IR
ImmExt
ALU
rd1
GPRs
rs1
rs2
wswd rd2
we
wdata
addr
wdata
rdataDataMemory
we
-
8/6/2019 L04-Pipelining
14/332/3/2009 CS152-Spring09 14
CS152 Administrivia
PS 1 due Tuesday Feb 10 in class
Section covering PS 1 on Wednesday Feb 11 Room/time TBD
First Quiz on Thursday Feb 12
In class, closed-book, no computers or calculators Covers lectures 1-5 (this weeks lectures)
Lecture 7, Tuesday Feb 17 in 320 Soda
Lecture 8, Thursday Feb 19 back in 306 Soda
See website for full schedule
-
8/6/2019 L04-Pipelining
15/332/3/2009 CS152-Spring09 15
Resolving Data Hazards (1)
Strategy 1:
Wait for the result to be available by freezingearlier pipeline stages interlocks
-
8/6/2019 L04-Pipelining
16/332/3/2009 CS152-Spring09 16
Feedback to Resolve Hazards
Later stages provide dependence information to earlier stageswhich can stall (or kill) instructions
FB1
stage1
stage2
stage
3
stage
4
FB2 FB3 FB4
Controlling a pipeline in this manner works providedthe instruction at stage i+1 can complete withoutany interference from instructions in stages 1 to i
(otherwise deadlocks may occur)
-
8/6/2019 L04-Pipelining
17/332/3/2009 CS152-Spring09 17
IRIR IR
31
PCA
B
Y
R
MD1 MD2
addrinst
Inst
Memory
0x4
Add
IR
ImmExt
ALU
rd1
GPRs
rs1rs2
wswd rd2
we
wdata
addr
wdata
rdata
DataMemory
we
nop
Interlocks to resolve Data Hazards
...r1 r0 + 10r4 r1 + 17...
Stall Condition
-
8/6/2019 L04-Pipelining
18/332/3/2009 CS152-Spring09 18
stalled stages
timet0 t1 t2 t3 t4 t5 t6 t7 . . . .
IF I1 I2 I3 I3 I3 I3 I4 I5ID I1 I2 I2 I2 I2 I3 I4 I5EX I1 nop nop nop I2 I3 I4 I5MA I1 nop nop nop I2 I3 I4 I5WB I1 nop nop nop I2 I3 I4 I5
Stalled Stages and Pipeline Bubbles
timet0 t1 t2 t3 t4 t5 t6 t7 . . . .
(I1) r1 (r0) + 10 IF1 ID1 EX1 MA1 WB1(I2) r4 (r1) + 17 IF2 ID2 ID2 ID2 ID2 EX2 MA2 WB2(I3) IF3 IF3 IF3 IF3 ID3 EX3 MA3 WB3
(I4) IF4 ID4 EX4 MA4 WB4(I5) IF5 ID5 EX5 MA5 WB5
ResourceUsage
nop pipeline bubble
-
8/6/2019 L04-Pipelining
19/332/3/2009 CS152-Spring09 19
IRIR IR31
PCA
B
Y
R
MD1 MD2
addrinst
InstMemory
0x4
Add
IR
ImmExt
ALU
rd1
GPRs
rs1rs2
wswd rd2
we
wdata
addr
wdata
rdataDataMemory
we
nop
Interlock Control Logic
Compare the source registers of the instruction in the decodestage with the destination registerof the uncommitted
instructions.
stall
Cstall
ws
rsrt
?
-
8/6/2019 L04-Pipelining
20/332/3/2009 CS152-Spring09 20
Cdest
Interlock Control Logicignoring jumps & branches
Should we always stall if the rs field matches some rd?
IRIR IR
PCA
B
Y
R
MD1 MD2
addrinst
InstMemory
0x4
Add
IR
ImmExt
ALU
rd1
GPRs
rs1rs2
wswd rd2
we
wdata
addr
wdata
rdataDataMemory
we
31
nop
stall
Cstall
ws
rsrt
?
we
re1 re2
Cre
ws we ws
Cdest Cdest
we
not every instruction writes a register we
not every instruction reads a register re
-
8/6/2019 L04-Pipelining
21/332/3/2009 CS152-Spring09 21
Source & Destination Registers
source(s) destination
ALU rd (rs) func (rt) rs, rt rdALUi rt (rs) op imm rs rtLW rt M [(rs) + imm] rs rtSW M [(rs) + imm] (rt) rs, rtBZ cond(rs)
true: PC (PC) + imm rsfalse: PC (PC) + 4 rs
J PC (PC) + immJAL r31 (PC), PC (PC) + imm 31JR PC (rs) rs
JALR r31
(PC), PC
(rs) rs 31
R-type: op rs rt rd func
I-type: op rs rt immediate16
J-type: op immediate26
-
8/6/2019 L04-Pipelining
22/332/3/2009 CS152-Spring09 22
Deriving the Stall Signal
Cdest
ws = Case opcodeALU rdALUi, LW rtJAL, JALR R31
we = Case opcodeALU, ALUi, LW (ws 0)JAL, JALR on... off
Cre
re1 = Case opcodeALU, ALUi,
on off
re2 = Case opcode on off
LW, SW, BZ,JR, JALRJ, JAL
ALU, SW...
Cstall
stall = ((rsD =wsE).weE +(rsD =wsM).weM +
(rsD =wsW).weW) . re1D +
((rtD =wsE).weE +
(rtD =wsM).weM +
(rtD =wsW).weW) . re2D
This
isnot
thefu
llstory!
-
8/6/2019 L04-Pipelining
23/332/3/2009 CS152-Spring09 23
Hazards due to Loads & Stores
...M[(r1)+7] (r2)r4 M[(r3)+5]
...
IRIR IR
31
PCA
B
Y
R
MD1 MD2
addrinst
Inst
Memory
0x4
Add
IR
ImmExt
ALU
rd1
GPRs
rs1rs2
wswd rd2
we
wdata
addr
wdata
rdata
DataMemory
we
nop
Stall Condition
Is there any possible data hazardin this instruction sequence?
What if(r1)+7 = (r3)+5 ?
-
8/6/2019 L04-Pipelining
24/33
2/3/2009 CS152-Spring09 24
Load & Store Hazards
However, the hazard is avoided because ourmemory system completes writes in one cycle !
Load/Store hazards are sometimes resolved in the
pipeline and sometimes in the memory systemitself.
More on this later in the course.
...M[(r1)+7] (r2)r4 M[(r3)+5]...
(r1)+7 = (r3)+5 data hazard
-
8/6/2019 L04-Pipelining
25/33
2/3/2009 CS152-Spring09 25
Resolving Data Hazards (2)
Strategy 2:
Route data as soon as possible after it iscalculated to the earlier pipeline stage bypass
-
8/6/2019 L04-Pipelining
26/33
2/3/2009 CS152-Spring09 26
Bypassing
Each stall or killintroduces a bubble in the pipeline CPI > 1
time t0 t1 t2 t3 t4 t5 t6 t7 . . . .
(I1)r1
r0 + 10 IF1 ID1 EX1 MA1 WB1(I2) r4 r1 + 17 IF2 ID2 ID2 ID2 ID2 EX2 MA2 WB2(I3) IF3 IF3 IF3 IF3 ID3 EX3 MA3(I4) stalled stages IF4 ID4 EX4(I5) IF5 ID5
time t0 t1 t2 t3 t4 t5 t6 t7 . . . .(I1) r1 r0 + 10 IF1 ID1 EX1 MA1 WB1(I2) r4 r1 + 17 IF2 ID2 EX2 MA2 WB2(I3) IF3 ID3 EX3 MA3 WB3(I4) IF4 ID4 EX4 MA4 WB4
(I5) IF5 ID5 EX5 MA5 WB5
A new datapath, i.e., a bypass, can get the data fromthe output of the ALU to its input
-
8/6/2019 L04-Pipelining
27/33
2/3/2009 CS152-Spring09 27
Adding a Bypass
ASrc
...(I1) r1 r0 + 10
(I2) r4 r1 + 17
r4 r1... r1 ...
IRIR IR
PCA
B
Y
R
MD1 MD2
addrinst
InstMemory
0x4
Add
IR
ImmExt
ALU
rd1
GPRs
rs1rs2
wswd rd2
we
wdata
addr
wdata
rdataDataMemory
we
31
nop
stall
D
E M W
When does this bypass help?
r1 M[r0 + 10]r4 r1 + 17
JAL 500r4 r31 + 17
yes no no
-
8/6/2019 L04-Pipelining
28/33
2/3/2009 CS152-Spring09 28
The Bypass SignalDeriving it from the Stall Signal
ASrc = (rsD=wsE).weE.re1D
we = Case opcodeALU, ALUi, LW (ws 0)JAL, JALR on
... off
No because only ALU and ALUi instructions can benefit
from this bypass
Is this correct?
Split weE into two components: we-bypass, we-stall
stall = ( ((rsD =wsE).weE + (rsD =wsM).weM + (rsD =wsW).weW).re1D +((rtD =wsE).weE + (rtD =wsM).weM + (rtD =wsW).weW).re2D )
ws = Case opcodeALU rdALUi, LW rt
JAL, JALR R31
-
8/6/2019 L04-Pipelining
29/33
2/3/2009 CS152-Spring09 29
Bypass and Stall Signals
we-bypassE = Case opcodeEALU, ALUi (ws 0)... off
ASrc = (rsD =wsE).we-bypassE . re1D
Split weE into two components: we-bypass, we-stall
stall = ((rsD =wsE).we-stallE+
(rsD=wsM).weM + (rsD=wsW).weW). re1D
+((rtD = wsE).weE + (rtD = wsM).weM + (rtD = wsW).weW). re2D
we-stallE = Case opcodeELW (ws 0)JAL, JALR on... off
-
8/6/2019 L04-Pipelining
30/33
2/3/2009 CS152-Spring09 30
Fully Bypassed Datapath
ASrcIRIR IR
PCA
B
Y
R
MD1 MD2
addrinst
InstMemory
0x4
Add
IR ALU
ImmExt
rd1
GPRs
rs1rs2
wswd rd2
we
wdata
addr
wdata
rdataDataMemory
we
31
nop
stall
D
E M W
PC for JAL, ...
BSrcIs there stilla need for thestall signal ? stall = (rsD=wsE). (opcodeE=LWE).(wsE 0 ).re1D
+ (rtD=wsE). (opcodeE=LWE).(wsE 0 ).re2D
-
8/6/2019 L04-Pipelining
31/33
2/3/2009 CS152-Spring09 31
Resolving Data Hazards (3)
Strategy 3:
Speculate on the dependence. Two cases:
Guessed correctly do nothing
Guessed incorrectly kill and restart
-
8/6/2019 L04-Pipelining
32/33
2/3/2009 CS152-Spring09 32
Next Time: Control Hazards
Branches/Jumps
Exceptions/Interrupts
-
8/6/2019 L04-Pipelining
33/33
Acknowledgements
These slides contain material developed and
copyright by: Arvind (MIT)
Krste Asanovic (MIT/UCB)
Joel Emer (Intel/MIT)
James Hoe (CMU)
John Kubiatowicz (UCB)
David Patterson (UCB)
MIT material derived from course 6.823
UCB material derived from course CS252