Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM...
Transcript of Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM...
Fall 2011
Prof. Hyesoon Kim
FE ID EX MEM WB
add r1, r2, r3 add
mul
mul
mul
add
sub r4, r1, r3 sub addsub add
addsubmul r5, r2, r3 mul sub
subsub add
add
add
Add: 2 cycles
add add
add
subsub
subsubmul
L L L L L
FE_stage
FE ID EX MEM WB
br0x800
br target 0x800
add r1, r2,r3 0x804
target sub r2,r3,r4 0x900
br0x804
br
br
br
0x804
0x804
0x900
PC (latch)
add
add
add
sub
0x904
1
cycle
2
3
4
5
6 add sub
FE_stage
• -Eliminate branches
– Predication (more on later)
• Delayed branch slot
– SPARC, MIPS
• Dual-path execution (more on later)
• Or predict?
• Branches are very frequent
– Approx. 20% of all instructions
• Can not wait until we know where it goes
– Long pipelines
• Branch outcome is known after B cycles
• No scheduling past the branch until outcome known
– Superscalars (e.g., 4-way)
• Branch every cycle or so!
• One cycle of work, then bubbles for ~B cycles?
• Predict Branches
– And predict them well!
• Fetch, decode, etc. on the predicted path
– Option 1: No execution until branch is resolved
– Option 2: Execute anyway (speculation)
• Recover from mispredictions
– Restart fetch from correct path
• Need to know two things
– Whether the branch is taken or not (direction)
– The target address if it is taken (target)
• Direct jumps, Function calls: unconditional
branches
– Direction known (always taken), target easy to
compute
• Conditional Branches (typically PC-relative)
– Direction difficult to predict, target easy to compute
• Indirect jumps, function returns
– Direction known (always taken), target difficult
• Needed for conditional branches
– Most branches are of this type
• Many, many kinds of predictors for this
– Static: fixed rule, or compiler annotation(e.g. br.bwh (branch whether hint. IA-64))
– Dynamic: hardware prediction
• Dynamic prediction usually history-based
– Example: predict direction is the sameas the last time this branch was executed
• Always predict NT
– easy to implement
– 30-40% accuracy … not so good
• Always predict T
– 60-70% accuracy
• BTFNT
– loops usually have a few iterations, so this is
like always predicting that the loop is taken
– don’t know target until decode
K bits of branch
instruction address
Index
Branch history
table of 2^K entries,
1 bit per entry
Use this entry to
predict this branch:
0: predict not taken
1: predict taken
When branch direction resolved,
go back into the table and
update entry: 0 if not taken, 1 if taken
0xDC08: for(i=0; i < 100000; i++)
{
0xDC44: if( ( i % 100) == 0 )
tick( );
0xDC50: if( (i & 1) == 1)
odd( );
}
T
N
• Example: short loop (8 iterations)
– Taken 7 times, then not taken once
– Not-taken mispredicted (was taken previously)
Act: TTTTTTTNTTTTTTNTTTTTTTNT…
Pred: XTTTTTTTNTTTTTTNTTTTTTTN
Corr: Xooooo MMooooooMMooooooMM
Misprediction rate: 2/8 = 25%
• Execute the same loop again
– First always mispredicted (previous outcome was not taken)
– Then 6 predicted correctly
– Then last one mispredicted again
• Each fluke/anomaly in a stable pattern results in two
mispredicts per loop
DC08: TTTTTTTTTTT ... TTTTTTTTTTNTTTTTTTTT …
100,000 iterations
How often is branch outcome != previous outcome?
2 / 100,000
TN
NT
DC44: TTTTT ... TNTTTTT … TNTTTTT …
2 / 100
DC50: TNTNTNTNTNTNTNTNTNTNTNTNTNTNT …
2 / 2
99.998%
Prediction
Rate98.0%
0.0%
0 1
FSM for Last-time
Prediction
0 1
2 3
FSM for 2bC
(2-bit Counter)
Predict NT
Predict T
Transistion on T outcome
Transistion on NT outcome
2
T
3
T
3
T
…3
N
N
1
T
0
0
T
1
T T T T…
T
1 1 1 1
T
1
T…1
0
T
1
T
2
T
3
T
3
T… 3
T
Initial Training/Warm-up1bC:
2bC:
Only 1 Mispredict per N branches now!
DC08: 99.999% DC44: 99.0%
0 1
2 3
We can
live with
these
These
are good
This is bad!
• 98% 99%
– Who cares?
– Actually, it’s 2% misprediction rate 1%
– That’s a halving of the number of mispredictions
• So what?
– If a pipeline can fetch 5 instructions at a cycle and the branch
resolution time is 20 cycles
– To Fetch 500 instructions
– 100 accuracy : 100 cycles
– 98 accuracy:
• 100 (correctly fetch) + 20 (misprediction)*10 = 300 cycles
– 99 accuracy
• 100 (correctly fetch) + 20 misprediction *5 = 200 cycles
1 1 ….. 1 0
BHR
(branch
history
register)
00 …. 00
00 …. 01
00 …. 10
11 …. 11
0 1
2 3
index
Pattern History Table
previous one
Yeh&patt’92
0 0 0 0 0 0
History length
Initialization value (0 or 1)
1 : branch is taken
0: branch is not-taken
Old history New history
New BHR = old BHR<<1 | (br_dir)
Example
BHR: 00000
Br1 : taken BHR 00001
Br 2: not-taken BHR 00010
Br 3: taken BHR 00101
20
• Yeh and Patt 3-letter naming scheme
– Type of history collected
• G (global), P (per branch), S (per set)
– PHT type
• A (adaptive), S (static)
– PHT organization
• g (global), p (per branch), s (per set)
GBHR
PC
PHT
GAp
BHR Table
PC
PAp
• Local Behavior
– What is the predicted direction of Branch A
given the outcomes of previous instances of
Branch A?
• Global Behavior
– What is the predicted direction of Branch Z
given the outcomes of all* previous branches
A, B, …, X and Y?
* number of previous branches tracked limited by the history length
• Branches are correlatedBranch X: if (cond1)
….
Branch Y: if (cond 2)
….
Branch Z : if (cond 1 and cond 2)
…….1 0
Branch
X
Branch
Y
Branch
Z
1 0 0
1 1 1
0 1 0
0 0 0
BHR
…….1 1
…….01
…….00
PHT
1 1 ….. 1 0
2bc
2bc
2bc
2bc
BHR
index
0x809000
PC
XOR
McFarling’93
Predictor size: 2^(history length)*2bit
predict_func(pc, actual_dir)
{
index = PC xor BHR
taken = 2bit_counters[index] > 2 ? 1 : 0
correctly_predictied = (actual_dir == taken) ? 1 : 0 // stats
}
updated_func(pc, actual_dir)
{
index = PC xor BHR
if (actual_dir) SAT_INC( 2bit_counter[index] )
else SAT_DEC ( 2bit_counter[index] )
BHR = BHR << 1 | actual_dir
}