CSL718 : Pipelined Processors
Anshul Kumar, CSE IITD
Improving Branch Performance – contd.
21st Jan, 2006
Improving Branch Performance
• Branch Elimination
  – replace branch with other instructions
• Branch Speed Up
  – reduce time for computing CC (condition code) and TIF (target instruction fetch)
• Branch Prediction
  – guess the outcome and proceed, undo if necessary
• Branch Target Capture
  – make use of history
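Branch Elimination
Use conditional/guarded instructions (predicated execution)
[Figure: the construct "if C is true (T) execute S, otherwise (F) skip it" becomes the guarded statement C : S]
With a branch (BC skips the ADD when CC = Z):
    OP1
    BC   CC = Z, +2
    ADD  R3, R2, R1
    OP2
With a guarded ADD (executes only when CC is not Z, i.e. NZ):
    OP1
    ADD  R3, R2, R1, NZ
    OP2
Examples: HP PA (all integer arithmetic/logical instructions); DEC Alpha, SPARC V9 (conditional move)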
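The two code sequences above can be checked for equivalence with a toy interpreter. This is a minimal Python sketch, not any real ISA: the tuple instruction format, the `BC_Z` opcode name, the `NZ` guard encoding and the operand order `ADD dst, src1, src2` are hypothetical; OP1/OP2 are modelled as no-ops; and CC is passed in as a fixed flag, since the slide does not show which instruction sets it.

```python
def run(program, regs, cc_zero):
    """Execute a toy program; each instruction is (opcode, operands, guard)."""
    regs = dict(regs)                    # work on a copy of the register file
    pc = 0
    while pc < len(program):
        op, args, guard = program[pc]
        if guard == "NZ" and cc_zero:    # guarded instruction squashed when CC = Z
            pc += 1
            continue
        if op == "BC_Z":                 # branch to pc + offset if CC = Z
            pc += args[0] if cc_zero else 1
            continue
        if op == "ADD":                  # ADD dst, src1, src2 (operand order assumed)
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        pc += 1                          # OP1 / OP2 fall through as no-ops


    return regs


with_branch = [("OP1", (), None), ("BC_Z", (2,), None),
               ("ADD", ("R3", "R2", "R1"), None), ("OP2", (), None)]
guarded = [("OP1", (), None), ("ADD", ("R3", "R2", "R1"), "NZ"), ("OP2", (), None)]

start = {"R1": 1, "R2": 2, "R3": 0}
for cc_zero in (True, False):
    assert run(with_branch, start, cc_zero) == run(guarded, start, cc_zero)
    print(cc_zero, run(guarded, start, cc_zero)["R3"])
```

The guarded version contains no control transfer, so there is nothing to predict or flush; the price is that the ADD still occupies a pipeline slot when its guard squashes it.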
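Branch Elimination - contd.
[Figure: pipeline timing of the branch version (OP1, BC, ADD/OP2) versus the guarded version (ADD(cond)); CC marks where the condition code is set]
    OP1:        IF IF IF D AG DF DF DF EX EX
    BC:         IF IF IF D AG TIF TIF TIF
    ADD/OP2:    IF IF IF D' D AG
    ADD(cond):  IF IF IF D AG DF DF DF EX EX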
Branch Speed Up : early target address generation
• Assume each instruction is a branch
• Generate the target address while decoding
• If the target is in the same page, omit address translation
• After decoding, discard the target address if the instruction is not a branch
    BC:  IF IF IF D TIF TIF TIF   (AG done in parallel with D)
Branch Speed Up : increase CC - branch gap
Increase the gap between the instruction that sets CC and the branch that tests it
• Early CC setting
• Delayed branch
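Summary - Branch Speed Up
Branch delay (cycles) as a function of n; T = branch goes to target, I = branch goes inline
Early CC setting (CC set n instructions before the branch):
            n=0  n=1  n=2  n=3  n=4  n=5
uncond       4    4    4    4    4    4
cond (T)     6    5    4    4    4    4
cond (I)     5    4    3    2    1    0
Delayed branch (n delay slots):
            n=0  n=1  n=2  n=3  n=4  n=5
uncond       4    3    2    1    0    0
cond (T)     6    5    4    3    2    1
cond (I)     5    4    3    2    1    0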
Delayed Branch with Nullification
(Also called annulment)
• The delay slot is used optionally
• The branch instruction specifies the option
• The option may be exercised based on the correctness of branch prediction
• Helps in better utilization of delay slots
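The nullification rule above can be sketched as a single decision: does the delay-slot instruction complete or not? This is a minimal sketch under the assumption that the slot is filled from the predicted path and nullified exactly when the prediction is wrong; the function name and flags are hypothetical, and real ISAs differ in detail. For instance, SPARC's annul bit on a conditional branch executes the delay slot only if the branch is taken, which corresponds to `predicted_taken=True` here.

```python
def delay_slot_completes(nullify_option, predicted_taken, actual_taken):
    """Does the instruction in a branch delay slot complete?

    nullify_option=False: plain delayed branch, the slot always completes.
    nullify_option=True:  the slot holds an instruction from the predicted
                          path and is nullified (annulled) when the
                          prediction turns out to be wrong.
    """
    if not nullify_option:
        return True
    return predicted_taken == actual_taken


# Loop-closing branch predicted taken, delay slot filled from the loop body (target path)
for actual in (True, False):
    print(actual, delay_slot_completes(True, predicted_taken=True, actual_taken=actual))
```

Because a wrong guess only costs the squashed slot, the compiler can fill the slot from the likely path even when that instruction would be harmful on the other path, which is what improves delay-slot utilization.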
Branch Prediction
• Treat conditional branches as unconditional branches / NOP
• Undo if necessary
Strategies:
– Fixed (always guess inline)
– Static (guess on the basis of instruction type / displacement)
– Dynamic (guess based on recent history)
Static Branch Prediction
Instr type   % of branches   Guess taken   Actually taken   Correct (% of all branches)
uncond       14.5            always        100%             14.5%
cond         58              never         54%              27%
loop         9.8             always        91%              9%
call/ret     17.7            always        100%             17.7%
Total                                                       68.2%
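The "Correct" column of the static-prediction table follows from the other columns: a class guessed "always" is right whenever the branch is taken, a class guessed "never" whenever it is not. A small Python sketch that recomputes it from the table values:

```python
# (frequency in % of all branches, guess_taken, fraction actually taken), from the table
table = {
    "uncond":   (14.5, True, 1.00),
    "cond":     (58.0, False, 0.54),
    "loop":     (9.8, True, 0.91),
    "call/ret": (17.7, True, 1.00),
}


def correct_pct(freq, guess_taken, p_taken):
    """Percentage of all branches that this class predicts correctly."""
    hit_rate = p_taken if guess_taken else 1.0 - p_taken
    return freq * hit_rate


total = 0.0
for name, (freq, guess, p) in table.items():
    c = correct_pct(freq, guess, p)
    total += c
    print(f"{name:9s} {c:6.2f}%")
print(f"total     {total:6.2f}%")
```

The exact sum is about 67.8%; the 68.2% total is obtained by adding the rounded entries 27% and 9%.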
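Threshold for Static prediction
Branch delay (cycles) for each guess / actual combination (T = target, I = inline):
                actual T   actual I
guess T            4          5
guess I            6          0
With p = probability that the branch goes to the target, guess target if
4p + 5(1 - p) < 6p + 0(1 - p), i.e. p > 5/7 ≈ 0.71
[Figure: pipeline timing assumed for these delays; CC marks where the condition code is set]
    I-1:  IF IF D AG AG DF DF EX EX
    I:    IF IF D AG AG TIF TIF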
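The threshold derivation generalizes to any 2 x 2 delay table: guessing target wins when p·d_TT + (1 − p)·d_TI < p·d_IT + (1 − p)·d_II, which rearranges to p > (d_TI − d_II) / ((d_TI − d_II) + (d_IT − d_TT)). A short Python sketch (the parameter names `d_tt` etc. are introduced here for illustration) reproduces the 5/7 bound exactly:

```python
from fractions import Fraction


def taken_threshold(d_tt, d_ti, d_it, d_ii):
    """Taken-probability p above which guessing 'target' has lower expected delay.

    d_xy = delay in cycles when the guess is x and the actual outcome is y,
    with T = target (taken) and I = inline (not taken).
    """
    loss_if_wrong_t = d_ti - d_ii   # extra cost of guessing T when the branch goes inline
    loss_if_wrong_i = d_it - d_tt   # extra cost of guessing I when the branch goes to target
    return Fraction(loss_if_wrong_t, loss_if_wrong_t + loss_if_wrong_i)


p_star = taken_threshold(d_tt=4, d_ti=5, d_it=6, d_ii=0)
print(p_star, float(p_star))  # 5/7 0.7142857142857143
```

Using `Fraction` keeps the bound exact; the formula assumes both "wrong guess" losses are positive, as they are in the table.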
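Dynamic Branch Prediction - basic idea
Predict based on the history of the previous execution of the same branch (taken / not taken last time)
    loop:  xxx
           xxx
           xxx
           xxx
           BC loop
2 mispredictions for every occurrence of the loop: one when the loop exits, and one on the first iteration the next time the loop is entered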
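The "2 mispredictions per loop occurrence" behaviour above can be reproduced with a one-bit history that simply repeats the last outcome. A minimal Python sketch, assuming a loop of `n_iter` iterations that is entered `n_exec` times and an entry that initially says "taken" (both parameters are hypothetical):

```python
def one_bit_misses_per_execution(n_iter, n_exec):
    """1-bit history: predict that the branch repeats its last outcome."""
    last = True            # assume the entry initially says 'taken'
    result = []
    for _ in range(n_exec):
        misses = 0
        # loop-closing BC: taken n_iter - 1 times, not taken at loop exit
        for taken in [True] * (n_iter - 1) + [False]:
            if taken != last:
                misses += 1
            last = taken   # only the most recent outcome is remembered
        result.append(misses)
    return result


print(one_bit_misses_per_execution(n_iter=4, n_exec=5))  # [1, 2, 2, 2, 2]
```

After the first execution, each loop occurrence costs one miss at the exit and one on re-entry, because the exit outcome overwrites the single history bit.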
Dynamic Branch Prediction - 2 bit prediction scheme
[Figure: four-state diagram (states 0, 1, 2, 3) with transitions labelled T (taken) and N (not taken); states 0/1 predict taken, states 3/2 predict not taken]
Dynamic Branch Prediction - second scheme
Predict based on the history of the previous n occurrences of the branch (majority decision)
e.g., if n = 3:
  3 branches taken: predict taken
  2 branches taken: predict taken
  1 branch taken: predict not taken
  0 branches taken: predict not taken
Dynamic Branch Prediction - Bimodal predictor
Maintain saturating counters
[Figure: 2-bit saturating counter with states 0, 1, 2, 3; each T (taken) or N (not taken) outcome moves the counter one state, saturating at 0 and 3]
One counter per branch, or one counter per cache line (merge the results if the line holds multiple branches)
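A bimodal table of 2-bit saturating counters can be sketched in a few lines. Assumptions not taken from the slides: values 2 and 3 predict taken, counters start at 2 (weakly taken), and a branch selects its counter by address modulo a hypothetical table size. Run on the loop-closing branch from the basic-idea slide, it misses only once per loop occurrence:

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by branch address.

    Assumed convention: counter values 2 and 3 predict taken, 0 and 1 not taken;
    counters start at 2 (weakly taken).
    """

    def __init__(self, entries=1024):
        self.entries = entries
        self.counters = [2] * entries

    def _index(self, pc):
        return pc % self.entries          # one counter per branch (modulo table size)

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)   # saturate at 3
        else:
            self.counters[i] = max(0, self.counters[i] - 1)   # saturate at 0


def misses_per_loop_execution(predictor, pc, n_iter, n_exec):
    """Mispredictions of a loop-closing branch, one count per loop execution."""
    result = []
    for _ in range(n_exec):
        misses = 0
        for taken in [True] * (n_iter - 1) + [False]:
            if predictor.predict(pc) != taken:
                misses += 1
            predictor.update(pc, taken)
        result.append(misses)
    return result


print(misses_per_loop_execution(BimodalPredictor(), pc=0x40, n_iter=4, n_exec=5))
# [1, 1, 1, 1, 1]
```

The single not-taken outcome at loop exit only moves the counter from 3 to 2, so the prediction on re-entry is still "taken".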
Dynamic Branch Prediction - History of last n occurrences
Each entry holds the outcome of the last three occurrences of this branch (0 : not taken, 1 : taken); prediction uses a majority decision.
    current entry:   1 1 0
    actual outcome:  taken
    updated entry:   1 1 1
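The majority decision and the entry update shown in the last two history slides can be sketched as follows. The list layout is an assumption chosen to match the example: the newest outcome is shifted in on the left and the oldest (rightmost) bit drops out, so `1 1 0` becomes `1 1 1` after a taken outcome.

```python
def majority_predict(history):
    """Predict taken if more than half of the recorded outcomes were taken."""
    return sum(history) * 2 > len(history)


def update_history(history, taken):
    """Shift in the newest outcome on the left; the oldest (rightmost) bit drops out."""
    return [1 if taken else 0] + history[:-1]


entry = [1, 1, 0]               # outcomes of the last three occurrences
print(majority_predict(entry))  # True (2 of 3 taken)
entry = update_history(entry, taken=True)
print(entry)                    # [1, 1, 1]
```

For n = 3 this reproduces the mapping on the "second scheme" slide: 3 or 2 taken predicts taken, 1 or 0 taken predicts not taken.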
Dynamic Branch Prediction - storing prediction counters
Store in a separate buffer, or store in the cache directory
[Figure: CACHE with directory and storage; a counter is kept alongside each cache line]
Correct guesses vs. history length
Correct guesses (%) for history length n, by workload:
n   Compiler   Business   Scientific   Supervisor
0   64.1       64.4       70.4         54.0
1   91.9       95.2       86.6         79.7
2   93.3       96.5       90.8         83.4
3   93.7       96.6       91.0         83.5
4   94.5       96.8       91.8         83.7
5   94.7       97.0       92.0         83.9
Two-Level Prediction
• Uses two levels of information to make a direction prediction
  – Branch History Table (BHT) - last n occurrences
  – Pattern History Table (PHT) - saturating 2 bit counters
• Captures patterned behavior of branches
  – Groups of branches are correlated
  – Particular branches have particular behavior
Correlation between branches
B1: if (x) ...
B2: if (y) ...
    z = x && y
B3: if (z) ...
• B3 can be predicted with 100% accuracy based on the outcomes of B1 and B2
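The correlation claim above can be checked with a small two-level predictor that uses a global history register. This is a sketch under stated assumptions: a branch counts as "taken" when its condition is true, the PHT index concatenates two branch-address bits with a 2-bit global history, counters start weakly taken, and `B1`, `B2`, `B3` are hypothetical addresses. Since the last two global outcomes before B3 are exactly B1 and B2, each history pattern fixes B3's outcome.

```python
import random


class TwoLevelGlobal:
    """Two-level predictor: global history register (GHR) + PHT of 2-bit counters.

    The PHT index concatenates branch-address bits with the GHR
    (one way of combining PC and history bits; the layout is an assumption).
    """

    def __init__(self, hist_bits=2, pc_bits=2):
        self.hist_bits = hist_bits
        self.pc_bits = pc_bits
        self.ghr = 0                                   # newest outcome in bit 0
        self.pht = [2] * (1 << (hist_bits + pc_bits))  # start weakly taken

    def _index(self, pc):
        pc_part = pc & ((1 << self.pc_bits) - 1)
        return (pc_part << self.hist_bits) | self.ghr

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)                            # index with the pre-update GHR
        self.pht[i] = min(3, self.pht[i] + 1) if taken else max(0, self.pht[i] - 1)
        mask = (1 << self.hist_bits) - 1
        self.ghr = ((self.ghr << 1) | int(taken)) & mask


rng = random.Random(1)
pred = TwoLevelGlobal()
B1, B2, B3 = 0, 1, 2                  # hypothetical branch addresses
misses_b3 = 0
for _ in range(1000):
    x, y = rng.random() < 0.5, rng.random() < 0.5
    z = x and y
    for pc, taken in ((B1, x), (B2, y), (B3, z)):
        if pc == B3 and pred.predict(pc) != taken:
            misses_b3 += 1
        pred.update(pc, taken)

print(misses_b3)  # at most 3: one warm-up miss per history pattern with z false
```

After warm-up B3 is never mispredicted, while B1 and B2 (random here) remain about 50% predictable.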
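Some Two-level Predictors
[Figure: Global predictor, where a global branch history register (GBHR, e.g. 1 0 1 1 0) indexes the PHT to produce a T/NT prediction; Local predictor, where the PC selects an entry of the BHT (per-branch histories such as 1 1 0 1 0) and that history indexes the PHT to produce T/NT]
Bits from the PC and the BHT can be combined to index the PHT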
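The local predictor in the figure can be sketched as a per-branch history table whose entry indexes a shared PHT of 2-bit counters. The table sizes, the PC-to-BHT mapping and the initial counter value are assumptions. A branch with the repeating pattern taken, taken, not taken (for example, an inner loop of three iterations) is learned after a single warm-up miss, whereas one 2-bit counter alone would miss every not-taken outcome.

```python
class LocalTwoLevel:
    """Local two-level predictor: per-branch history (BHT) indexing a shared PHT.

    Table sizes and the PC-to-BHT mapping are assumptions for illustration.
    """

    def __init__(self, hist_bits=3, bht_entries=64):
        self.hist_bits = hist_bits
        self.bht_entries = bht_entries
        self.bht = [0] * bht_entries          # per-branch history, newest outcome in bit 0
        self.pht = [2] * (1 << hist_bits)     # 2-bit counters, start weakly taken

    def _hist(self, pc):
        return self.bht[pc % self.bht_entries]

    def predict(self, pc):
        return self.pht[self._hist(pc)] >= 2

    def update(self, pc, taken):
        h = self._hist(pc)
        self.pht[h] = min(3, self.pht[h] + 1) if taken else max(0, self.pht[h] - 1)
        mask = (1 << self.hist_bits) - 1
        self.bht[pc % self.bht_entries] = ((h << 1) | int(taken)) & mask


pattern = [True, True, False] * 20    # taken twice, then not taken, repeated
p = LocalTwoLevel()
misses = []
for taken in pattern:
    misses.append(p.predict(0x40) != taken)
    p.update(0x40, taken)
print(sum(misses), sum(misses[30:]))  # 1 0
```

Each of the three steady-state histories (1 1 0, 1 0 1, 0 1 1) selects its own counter, so every position of the pattern gets a dedicated prediction.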
Two-level Predictor Classification
• Yeh and Patt 3-letter naming scheme
  – Type of history collected
    • G (global), P (per branch), S (per set)
  – PHT type
    • A (adaptive), S (static)
  – PHT organization
    • g (global), p (per branch), s (per set)
• Examples - GAs, PAp etc.
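The three-letter naming scheme above is positional, so a name can be expanded mechanically. A small Python sketch using only the letters listed on the slide (the wording of the expansions is an assumption):

```python
HISTORY = {"G": "global history", "P": "per-branch history", "S": "per-set history"}
PHT_TYPE = {"A": "adaptive PHT", "S": "static PHT"}
PHT_ORG = {"g": "one global PHT", "p": "per-branch PHTs", "s": "per-set PHTs"}


def describe(name):
    """Expand a Yeh-Patt style three-letter predictor name such as 'GAs'."""
    history, pht_type, pht_org = name      # exactly three characters expected
    return f"{HISTORY[history]}, {PHT_TYPE[pht_type]}, {PHT_ORG[pht_org]}"


for name in ("GAs", "PAp"):
    print(name, "->", describe(name))
```

In these terms, the global predictor in the earlier figure is a GAg-style design and the local one a PAg-style design when a single PHT is shared.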
Branch Target Capture
• Branch Target Buffer (BTB)
• Target Instruction Buffer (TIB)
Entry: instr addr | pred stats | target (target addr in a BTB, target instr in a TIB)
Probability of target change < 5%
BTB Performance
Decision                 Prob   Actual result   Prob   Delay
BTB hit: go to target    0.4    target          0.8    0
                                inline          0.2    5
BTB miss: go inline      0.6    target          0.2    4
                                inline          0.8    0
Expected delay = .4*.8*0 + .4*.2*5 + .6*.2*4 + .6*.8*0 = 0.88
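The expected-delay calculation above is a weighted sum over a two-level decision tree. A Python sketch that encodes the tree with the values from the slide:

```python
# decision -> (probability of the decision, [(probability of actual result, delay), ...])
btb_tree = {
    "hit: go to target": (0.4, [(0.8, 0), (0.2, 5)]),   # actual target / actual inline
    "miss: go inline":   (0.6, [(0.2, 4), (0.8, 0)]),   # actual target / actual inline
}


def expected_delay(tree):
    """Sum of P(decision) * P(result | decision) * delay over all leaves."""
    return sum(p_dec * p_out * delay
               for p_dec, outcomes in tree.values()
               for p_out, delay in outcomes)


print(round(expected_delay(btb_tree), 2))  # 0.88
```

Only the two "wrong direction" leaves contribute: 0.4 cycles from hits that go inline and 0.48 cycles from misses that go to the target.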
Dynamic information about branch
• Previous branch decisions
  – Explicit prediction
  – Stored in cache directory / Branch History Table, BHT
• Previous target address / instruction
  – Implicit prediction
  – Stored in separate buffer: Branch Target Buffer, BTB; Br Target Addr Cache, BTAC; Target Instr Buffer, TIB; Br Target Instr Cache, BTIC
These two can be combined
Storing prediction info
• In cache directory storage: a counter alongside each cache line
• In a separate buffer: instr addr | pred stats | target
Combined prediction mechanism
• Explicit : use history bits
• Implicit : use BTB hit/miss
  – hit go to target, miss go inline
• Combined : BTB hit/miss followed by explicit prediction using history bits. One of the following is commonly used:
  – hit go to target, miss explicit prediction
  – miss go inline, hit explicit prediction
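The two combined policies above differ only in which BTB result is trusted outright. A minimal Python sketch: the BTB and history bits are modelled as dicts, the policy names `"hit_target"` / `"miss_inline"` are hypothetical labels, counters of 2 or more are assumed to mean "taken", and an unknown branch defaults to weakly not taken.

```python
def combined_predict(pc, btb, counters, policy):
    """Combined BTB + history-bit prediction for a branch at address pc.

    btb:      dict branch address -> target address (hypothetical contents)
    counters: dict branch address -> 2-bit counter; >= 2 means 'taken' (assumed)
    policy:   "hit_target"  - BTB hit: go to target; miss: explicit prediction
              "miss_inline" - BTB miss: go inline;   hit: explicit prediction
    Returns (go_to_target, target); target is None on a BTB miss and would
    then have to be computed by the pipeline.
    """
    hit = pc in btb
    explicit_taken = counters.get(pc, 1) >= 2    # unknown branch: weakly not taken
    if policy == "hit_target":
        go_to_target = hit or explicit_taken
    elif policy == "miss_inline":
        go_to_target = hit and explicit_taken
    else:
        raise ValueError(f"unknown policy: {policy}")
    return go_to_target, btb.get(pc)


btb = {0x100: 0x200}                 # branch at 0x100 has a BTB entry
counters = {0x100: 1, 0x104: 3}      # history bits for two branches
for policy in ("hit_target", "miss_inline"):
    print(policy, combined_predict(0x100, btb, counters, policy),
          combined_predict(0x104, btb, counters, policy))
```

With `"hit_target"`, a BTB miss can still predict "taken" from the history bits, but the target address must then be computed; with `"miss_inline"`, a BTB hit can be overruled by history bits that say "not taken".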
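Combined prediction
[Figure: decision trees for the two combined schemes. In one, a BTB hit predicts T and a BTB miss falls back to explicit prediction (I or T); in the other, a BTB miss predicts I and a BTB hit falls back to explicit prediction (I or T). Each prediction then branches on the actual outcome (I or T).]
Prediction T: Target, I: Inline; Actual outcome T: Target, I: Inline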
Structure of Tables
Instruction fetch path with
• BHT
• BTAC
• BTIC
Compute/fetch scheme
[Figure: the IFAR holds the instruction fetch address (IFA), which indexes the I-cache (line A | I | I + 1 | I + 2 | I + 3); an adder (+) forms the next sequential address; the BTA is computed and used to fetch BTI, BTI+1, BTI+2, BTI+3]
(no dynamic branch prediction)
BHT (Branch History Table)
[Figure: a 16 K, 4-way set associative I-cache (128 x 4 lines, 8 instr/line) and a BHT (128 x 4 entries of 2 history bits) are both indexed by the instruction fetch address; the prediction logic outputs taken / not taken and the BTA for a taken guess; 4 instr/cycle flow into a decode queue (4 x 1 instr) and an issue queue (4 x 1 instr)]
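BTAC scheme
[Figure: as in the compute/fetch scheme, the IFA from the IFAR indexes the I-cache (A | I | I + 1 | I + 2 | I + 3) and an adder forms the next sequential address; in addition, a BTAC with entries (BA, BTA) supplies the BTA, from which BTI, BTI+1, BTI+2, BTI+3 are fetched]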
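The next-fetch-address choice in the BTAC scheme can be sketched as follows, assuming a fetch group of four 4-byte instructions, an aligned IFA, and a BTAC modelled as a dict from branch address (BA) to branch target address (BTA). A BTAC hit acts as an implicit "taken" prediction; all sizes and names are hypothetical.

```python
def next_fetch_address(ifa, btac, fetch_bytes=16, instr_bytes=4):
    """Choose the next IFAR value for one fetch cycle (BTAC scheme sketch).

    btac maps a branch address (BA) to its branch target address (BTA).
    The first instruction in the fetch group with a BTAC entry redirects fetch;
    with no hit, fetch continues at the next sequential address.
    """
    for addr in range(ifa, ifa + fetch_bytes, instr_bytes):
        if addr in btac:
            return btac[addr]
    return ifa + fetch_bytes


btac = {0x108: 0x400}                          # hypothetical: branch at 0x108 targets 0x400
print(hex(next_fetch_address(0x100, btac)))    # 0x400
print(hex(next_fetch_address(0x110, btac)))    # 0x120
```

Unlike the compute/fetch scheme, no target computation sits on the fetch path: the BTA is read out in the same cycle as the I-cache access.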
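BTIC scheme - 1
[Figure: the IFA from the IFAR indexes both the I-cache (A | I) and a BTIC with entries (BA, BTI, BTA+); BTI is sent to the decoder and BTA+ (the address following the target instruction) feeds the fetch address, while an adder forms the next sequential address]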
BTIC scheme - 2
[Figure: the BTIC entries hold (BA, BTI, BTI+1); both target instructions are sent to the decoder, while BTA+ is computed rather than stored; the I-cache supplies A | I | I+1]
Successor index in I-cache
[Figure: the IFA from the IFAR indexes the I-cache (A | I | I + 1 | I + 2 | I + 3, BTI | BTI+1 | BTI+2 | BTI+3); each line carries a successor index that provides the next address directly]
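With a successor index, fetch simply follows a pointer stored with each I-cache line instead of adding to the IFA or consulting a separate target table. A minimal Python sketch, assuming lines are identified by small integers, successor indices are held in a dict, and an index is rewritten once the observed next line differs (all illustrative assumptions):

```python
def fetch_lines(start, successor, n_lines):
    """Fetch n_lines cache lines, taking each next line from the stored successor index."""
    line, trace = start, []
    for _ in range(n_lines):
        trace.append(line)
        line = successor[line]       # no adder and no separate target lookup
    return trace


# Hypothetical layout: line 1 ends in a taken branch to line 5; line 6 branches back to 1.
successor = {0: 1, 1: 5, 5: 6, 6: 1}
print(fetch_lines(0, successor, 6))   # [0, 1, 5, 6, 1, 5]

# The branch in line 6 is later observed to fall through: record the new successor.
successor[6] = 7
successor[7] = 8
print(fetch_lines(5, successor, 3))   # [5, 6, 7]
```

The successor index thus acts as an implicit prediction stored in the cache itself, in the same spirit as keeping prediction counters in the cache directory.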