Pipelining IV
Topics
  Implementing pipeline control
  Pipelining and performance analysis
Systems I
Implementing Pipeline Control
Combinational logic generates pipeline control signals
Action occurs at start of following cycle
[Figure: pipeline registers F, D, E, M, W and the pipe control logic block. Inputs: D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch. Outputs: F_stall, D_stall, D_bubble, E_bubble.]
Initial Version of Pipeline Control
bool F_stall =
    # Conditions for a load/use hazard
    E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } ||
    # Stalling at fetch while ret passes through pipeline
    IRET in { D_icode, E_icode, M_icode };

bool D_stall =
    # Conditions for a load/use hazard
    E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };

bool D_bubble =
    # Mispredicted branch
    (E_icode == IJXX && !e_Bch) ||
    # Bubble for ret
    IRET in { D_icode, E_icode, M_icode };

bool E_bubble =
    # Mispredicted branch
    (E_icode == IJXX && !e_Bch) ||
    # Load/use hazard
    E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };
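The four HCL expressions above can be mirrored in Python to check individual cases by hand. This is only a sketch: the string instruction codes and register names are placeholders chosen for illustration, and the function name initial_control is hypothetical, not part of any simulator.

```python
# Placeholder instruction codes (strings stand in for the real icode values).
IMRMOVL, IPOPL, IJXX, IRET, IOPL = "mrmovl", "popl", "jXX", "ret", "opl"


def initial_control(D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch):
    """Return the control signals of the initial (uncorrected) logic."""
    load_use = E_icode in (IMRMOVL, IPOPL) and E_dstM in (d_srcA, d_srcB)
    ret_in_flight = IRET in (D_icode, E_icode, M_icode)
    mispredict = E_icode == IJXX and not e_Bch
    return {
        "F_stall": load_use or ret_in_flight,
        "D_stall": load_use,
        "D_bubble": mispredict or ret_in_flight,
        "E_bubble": mispredict or load_use,
    }


# Load/use hazard: mrmovl in execute writes %eax, the instruction in decode reads it.
load_use_case = initial_control(
    D_icode=IOPL, E_icode=IMRMOVL, M_icode=IOPL,
    E_dstM="%eax", d_srcA="%eax", d_srcB="%ebx", e_Bch=False,
)
print(load_use_case)
```

For this load/use case the sketch stalls F and D and bubbles E, which matches the one-cycle stall described for that hazard.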
Control Combinations
Special cases that can arise on same clock cycle
Combination A
  Not-taken branch
  ret instruction at branch target
Combination B
  Instruction that reads from memory to %esp
  Followed by ret instruction
[Figure: pipeline states for each condition. Load/use: load in E, use in D. Mispredict: JXX in E. ret 1: ret in D. ret 2: ret in E, bubble in D. ret 3: ret in M, bubbles in E and D. Combination A pairs Mispredict with ret 1; Combination B pairs Load/use with ret 1.]
Control Combination A
Should handle as mispredicted branch
Stalls F pipeline register
But PC selection logic will be using M_valA anyhow
[Figure: Combination A. JXX in E (Mispredict) together with ret in D (ret 1).]
Condition            F       D       E       M       W
Processing ret       stall   bubble  normal  normal  normal
Mispredicted branch  normal  bubble  bubble  normal  normal
Combination          stall   bubble  bubble  normal  normal
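One way to read the Combination row is as a per-stage merge of the two condition rows. The sketch below does that merge in Python and reports any stage that ends up with more than one requested action. The names combine, RET and MISPREDICT are illustrative only.

```python
STAGES = ("F", "D", "E", "M", "W")

# Rows copied from the table for Combination A.
RET = dict(F="stall", D="bubble", E="normal", M="normal", W="normal")
MISPREDICT = dict(F="normal", D="bubble", E="bubble", M="normal", W="normal")


def combine(*conditions):
    """Collect, per stage, every non-normal action the conditions request."""
    merged = {}
    for stage in STAGES:
        actions = {cond[stage] for cond in conditions} - {"normal"}
        merged[stage] = actions or {"normal"}
    return merged


combo_a = combine(RET, MISPREDICT)
# A stage with two different actions would be a conflict; here there is none.
conflicts = [stage for stage in STAGES if len(combo_a[stage]) > 1]
print(combo_a, conflicts)
```

No stage receives both a stall and a bubble, so Combination A needs no special handling beyond treating it as a mispredicted branch.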
[Figure: PIPE datapath. Stages Fetch, Decode, Execute, Memory, Write back, separated by pipeline registers F, D, E, M, W. Select PC chooses among M_valA, W_valM, and the predicted PC to produce f_PC.]
Control Combination B
Would attempt to bubble and stall pipeline register D
Signaled by processor as pipeline error
[Figure: Combination B. Load in E and use in D (Load/use) together with ret in D (ret 1).]
Condition        F      D               E       M       W
Processing ret   stall  bubble          normal  normal  normal
Load/use hazard  stall  stall           bubble  normal  normal
Combination      stall  bubble + stall  bubble  normal  normal
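Why "bubble + stall" is an error becomes clearer from how a pipeline register reacts to its control inputs at the start of the next cycle. The following is a minimal sketch, assuming a register is modeled as a dictionary and a bubble as a nop. The function next_register_state and the ValueError are assumptions for illustration, not the simulator's actual interface.

```python
NOP = {"icode": "nop"}  # placeholder bubble contents for this sketch


def next_register_state(current, incoming, stall=False, bubble=False):
    """State a pipeline register holds at the start of the following cycle."""
    if stall and bubble:
        # The two requests contradict each other: keep the state vs. clear it.
        raise ValueError("pipeline error: register asked to stall and bubble")
    if stall:
        return current       # hold the present contents
    if bubble:
        return dict(NOP)     # inject a nop instead of loading the input
    return incoming          # normal operation: load the new input


d_reg = {"icode": "ret"}
fetched = {"icode": "addl"}

held = next_register_state(d_reg, fetched, stall=True)
squashed = next_register_state(d_reg, fetched, bubble=True)
try:
    next_register_state(d_reg, fetched, stall=True, bubble=True)
    conflict = None
except ValueError as err:
    conflict = str(err)
print(held, squashed, conflict)
```

In Combination B both D_stall and D_bubble are asserted, which corresponds to the third call.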
Handling Control Combination B
Load/use hazard should get priority
ret instruction should be held in decode stage for additional cycle
[Figure: Combination B. Load in E and use in D (Load/use) together with ret in D (ret 1).]
Condition        F      D       E       M       W
Processing ret   stall  bubble  normal  normal  normal
Load/use hazard  stall  stall   bubble  normal  normal
Combination      stall  stall   bubble  normal  normal
Corrected Pipeline Control Logic
Load/use hazard should get priority
ret instruction should be held in decode stage for additional cycle

Condition        F      D       E       M       W
Processing ret   stall  bubble  normal  normal  normal
Load/use hazard  stall  stall   bubble  normal  normal
Combination      stall  stall   bubble  normal  normal
bool D_bubble =
    # Mispredicted branch
    (E_icode == IJXX && !e_Bch) ||
    # Stalling at fetch while ret passes through pipeline
    IRET in { D_icode, E_icode, M_icode }
    # but not condition for a load/use hazard
    && !(E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB });
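To check that the corrected expression resolves Combination B, the decode-stage signals can be evaluated in Python for a load into %esp followed by ret. This is a sketch under assumptions: the string codes and register names are placeholders, and d_signals is a hypothetical helper.

```python
# Placeholder instruction codes (strings stand in for the real icode values).
IMRMOVL, IPOPL, IJXX, IRET, IOPL = "mrmovl", "popl", "jXX", "ret", "opl"


def d_signals(D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch):
    """Return (D_stall, D_bubble) using the corrected D_bubble condition."""
    load_use = E_icode in (IMRMOVL, IPOPL) and E_dstM in (d_srcA, d_srcB)
    ret_in_flight = IRET in (D_icode, E_icode, M_icode)
    mispredict = E_icode == IJXX and not e_Bch
    D_stall = load_use
    # && binds tighter than || in the HCL, so the load/use exclusion
    # applies only to the ret term.
    D_bubble = mispredict or (ret_in_flight and not load_use)
    return D_stall, D_bubble


# Combination B: mrmovl loads %esp in execute while ret (which reads %esp) is in decode.
combo_b = d_signals(
    D_icode=IRET, E_icode=IMRMOVL, M_icode=IOPL,
    E_dstM="%esp", d_srcA="%esp", d_srcB="%esp", e_Bch=False,
)
print(combo_b)
```

Only the stall is asserted, so ret is held in decode for one more cycle and is bubbled as usual once the load has moved on.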
Pipeline Summary
Data Hazards
  Most handled by forwarding
    No performance penalty
  Load/use hazard requires one cycle stall
Control Hazards
  Cancel instructions when a mispredicted branch is detected
    Two clock cycles wasted
  Stall fetch stage while ret passes through pipeline
    Three clock cycles wasted
Control Combinations
  Must analyze carefully
  First version had subtle bug
    Only arises with unusual instruction combination
Performance Analysis with Pipelining
Ideal pipelined machine: CPI = 1
  One instruction completed per cycle
  But much faster cycle time than unpipelined machine
However, hazards are working against the ideal
  Hazards resolved using forwarding are fine
  Stalling degrades performance and interrupts the instruction completion rate
CPI is a measure of "architectural efficiency" of design
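\[ \text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} \]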
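The CPU time relation shows why a pipelined design can win even when its CPI is above 1. The sketch below plugs in numbers that are invented purely for illustration. They are not measurements of any real machine, and cpu_time is a hypothetical helper.

```python
def cpu_time(instructions, cpi, cycle_time_s):
    """CPU time = instructions/program * cycles/instruction * seconds/cycle."""
    return instructions * cpi * cycle_time_s


INSTRUCTIONS = 1_000_000_000  # made-up program length

# Made-up designs: a slow-clock unpipelined machine vs. a fast-clock pipeline.
unpipelined = cpu_time(INSTRUCTIONS, cpi=1.0, cycle_time_s=5e-9)
pipelined = cpu_time(INSTRUCTIONS, cpi=1.3, cycle_time_s=1.25e-9)

print(f"unpipelined: {unpipelined:.3f} s, pipelined: {pipelined:.3f} s")
```

With these assumed values the shorter cycle time more than offsets the extra cycles per instruction.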
Computing CPI
CPI
  Function of useful instructions (Ci) and bubbles (Cb)
  Cb/Ci represents the pipeline penalty due to stalls
\[ \text{CPI} = \frac{C_i + C_b}{C_i} = 1.0 + \frac{C_b}{C_i} \]
Can reformulate to account for
  load penalties (lp)
  branch misprediction penalties (mp)
  return penalties (rp)
\[ \text{CPI} = 1.0 + lp + mp + rp \]
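The first form of the CPI formula can be evaluated directly from two counts. This is a small sketch; the counts are hypothetical, as if read from a simulator run, and cpi_from_counts is an illustrative name.

```python
def cpi_from_counts(useful_instructions, bubbles):
    """CPI = (Ci + Cb) / Ci = 1.0 + Cb / Ci."""
    if useful_instructions <= 0:
        raise ValueError("need at least one useful instruction")
    return 1.0 + bubbles / useful_instructions


# Hypothetical run: 1000 useful instructions and 310 injected bubbles.
cpi = cpi_from_counts(useful_instructions=1000, bubbles=310)
print(f"CPI = {cpi:.2f}")
```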
Computing CPI - II
So how do we determine the penalties?
  Depends on how often each situation occurs on average
    How often does a load occur and how often does that load cause a stall?
    How often does a branch occur and how often is it mispredicted?
    How often does a return occur?
We can measure these
  simulator
  hardware performance counters
We can estimate through historical averages
  Then use to make early design tradeoffs for architecture
Computing CPI - III
Cause       Name  Instruction Frequency  Condition Frequency  Stalls  Product
Load/Use    lp    0.30                   0.3                  1       0.09
Mispredict  mp    0.20                   0.4                  2       0.16
Return      rp    0.02                   1.0                  3       0.06
Total penalty                                                         0.31
CPI = 1 + 0.31 = 1.31, that is, 31% worse than ideal
This gets worse when:
  Accounting for non-ideal memory access latency
  Deeper pipelines (where stalls per hazard increase)
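Each penalty in the table is the product of instruction frequency, condition frequency, and stall cycles. The short Python sketch below recomputes the products and the resulting CPI from the table's numbers. The dictionary layout is just one convenient way to hold them.

```python
# (instruction frequency, condition frequency, stall cycles) from the table.
PENALTIES = {
    "lp": (0.30, 0.3, 1),  # load/use
    "mp": (0.20, 0.4, 2),  # mispredicted branch
    "rp": (0.02, 1.0, 3),  # return
}

products = {
    name: instr_freq * cond_freq * stalls
    for name, (instr_freq, cond_freq, stalls) in PENALTIES.items()
}
total_penalty = sum(products.values())
cpi = 1.0 + total_penalty

for name, value in products.items():
    print(f"{name}: {value:.2f}")
print(f"total penalty: {total_penalty:.2f}, CPI: {cpi:.2f}")
```

Changing a frequency or a stall count in PENALTIES shows how sensitive the CPI is to each hazard, for example when a deeper pipeline raises the misprediction penalty.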
Summary
Today
  Pipeline control logic
  Effect on CPI and performance
Next Time
  Further mitigation of branch mispredictions
  State machine design