Storage Free Confidence Estimator for the TAGE predictor
André Seznec
IRISA/INRIA
Why confidence estimation for branch predictors
• Energy/performance tradeoffs:
• Guiding fetch gating or fetch throttling
• Dynamic speculative structures resizing
• Controlling SMT resource allocation through fetch policies:
• Fetch the “most” useful instructions
• Dual Path execution
What is confidence estimation?
• Assign a confidence to a prediction: is it likely that the prediction is correct?
• Generally discriminate only low and high confidence predictions:
• High confidence: « very likely » to be correct
• Low confidence: « not so likely » to be correct
Confidence estimation for branch predictors
• 1981, Jim Smith:
• Predictions from weak counters are more likely to be mispredicted
• 1996, Jacobsen, Rotenberg, Smith (JRS): gshare-like table of 4-bit counters
Increment on correct prediction, reset on misprediction
low confidence < threshold ≤ high confidence
• 1998, Grunwald et al., enhanced JRS: use the prediction in the index
• A few other proposals:
• Self-confidence for perceptrons, ...
Most studies still use enhanced JRS confidence estimators
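The JRS scheme listed above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the table size, the XOR hash of PC and history, and the default threshold are hypothetical choices, and the `use_prediction` flag models the enhanced variant by adding the predicted direction to the index.

```python
class JRSEstimator:
    """Sketch of a JRS-style confidence estimator (sizes and hash are assumed)."""

    def __init__(self, log_size=12, ctr_bits=4, threshold=15, use_prediction=False):
        self.mask = (1 << log_size) - 1
        self.ctr_max = (1 << ctr_bits) - 1
        self.threshold = threshold            # assumed value, to be tuned per usage
        self.use_prediction = use_prediction  # True: enhanced JRS indexing
        self.table = [0] * (1 << log_size)

    def _index(self, pc, history, prediction):
        idx = pc ^ history                    # gshare-like hash (assumed form)
        if self.use_prediction:
            idx = (idx << 1) | int(prediction)
        return idx & self.mask

    def is_high_confidence(self, pc, history, prediction):
        return self.table[self._index(pc, history, prediction)] >= self.threshold

    def update(self, pc, history, prediction, outcome):
        i = self._index(pc, history, prediction)
        if prediction == outcome:
            # Increment on a correct prediction, saturating at the counter maximum.
            self.table[i] = min(self.table[i] + 1, self.ctr_max)
        else:
            # Reset on a misprediction.
            self.table[i] = 0
```

A lower threshold raises the fraction of predictions classified as high confidence, at the price of more mispredictions in that class.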
Metrics for confidence estimators (Grunwald et al. 1998)
• SENS, Sensitivity:
• Fraction of correct predictions classified as high confidence
• PVP, Predictive Value of a Positive test:
• Probability that a high-confidence prediction is correct
• SPEC, Specificity:
• Fraction of mispredictions classified as low confidence
• PVN, Predictive Value of a Negative test:
• Probability that a low-confidence prediction is mispredicted
Different qualities for different usages
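The four metrics above follow directly from four event counts. A minimal sketch, where the function and argument names are hypothetical:

```python
def confidence_metrics(high_correct, high_wrong, low_correct, low_wrong):
    """Return SENS, PVP, SPEC and PVN from the four outcome counts."""
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        # Correct predictions that were flagged high confidence.
        "SENS": ratio(high_correct, high_correct + low_correct),
        # High-confidence predictions that were correct.
        "PVP": ratio(high_correct, high_correct + high_wrong),
        # Mispredictions that were flagged low confidence.
        "SPEC": ratio(low_wrong, low_wrong + high_wrong),
        # Low-confidence predictions that were mispredicted.
        "PVN": ratio(low_wrong, low_wrong + low_correct),
    }
```

For instance, fetch gating mostly cares about PVN (do not throttle on predictions that would have been correct), whereas dual-path execution mostly cares about SPEC.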
The current limits of confidence prediction
• Discriminating between high and low confidence is insufficient:
• What is the misprediction rate on high and on low confidence?
• Malik et al.: use a probability for each counter value on an enhanced JRS
• Enhanced JRS and state-of-the-art branch predictors?
• Each predictor needs its own confidence estimator
This study
Cost-effective confidence estimator for TAGE
• No storage overhead
• Discriminate:
Low conf. pred.: ≈ 30 % misp. rate or more
Medium conf. pred.: 8-15 % misp. rate
High conf. pred.: < 1 % misp. rate
TAGE: multiple tables, global history predictor
$L(0) = 0$, $L(i) = \alpha^{i-1} L(1)$ for $i \ge 1$
The set of history lengths forms a geometric series
What is important: L(i) - L(i-1) increases drastically with i, so most of the storage is devoted to short histories!
{0, 2, 4, 8, 16, 32, 64, 128}
Capture correlation on very long histories
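The example series on this slide is consistent with L(0) = 0 and L(i) = α^(i-1)·L(1). A small sketch that generates such a series; the function name and the rounding to the nearest integer are assumptions made for the example:

```python
def history_lengths(l1, alpha, n_tagged):
    """Return [L(0), L(1), ..., L(n_tagged)] with L(0) = 0 and
    L(i) = alpha**(i - 1) * L(1), rounded to the nearest integer."""
    return [0] + [int(alpha ** (i - 1) * l1 + 0.5) for i in range(1, n_tagged + 1)]


print(history_lengths(2, 2, 7))  # [0, 2, 4, 8, 16, 32, 64, 128]
```

With a non-integer α the same function yields the irregular but still geometric series used to reach very long histories with few tables.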
[Figure: TAGE organization. A tagless base predictor is indexed with pc; each tagged table is indexed with a hash of pc and h[0:L1], h[0:L2] or h[0:L3], and each of its entries holds ctr, u and tag. One tag comparison (=?) per tagged table selects the prediction.]
TAGE: geometric history length + PPM-like + optimized update policy
[Figure: tag matching across the tagged components, with labels Hit, Hit, Miss, Pred and Altpred.]
Prediction computation
• General case:
• Longest matching component provides the prediction
• Special case:
• Many mispredictions on newly allocated entries (weak Ctr): on many applications, Altpred is more accurate than Pred
• Property dynamically monitored through a single 4-bit counter
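The selection rule above can be sketched as follows. Several details are assumptions not stated on the slide: counters are signed 3-bit values in [-4, 3] predicting taken when non-negative, Altpred comes from the next-longest matching component (or from the base predictor), and the 4-bit monitoring counter favors Altpred once it reaches a hypothetical threshold of 8.

```python
from typing import List, Optional


def is_weak(ctr: int) -> bool:
    """Weak states of a signed 3-bit counter (assumed encoding)."""
    return ctr in (-1, 0)


def predict(base_taken: bool, hits: List[Optional[int]],
            use_alt_on_weak: int, alt_threshold: int = 8):
    """hits[i] is the counter read from tagged component i on a tag match,
    or None on a miss; components are ordered from shortest to longest history.
    Returns (final prediction, pred, altpred, provider index), with provider
    index -1 when the base predictor provides the prediction."""
    matching = [i for i, ctr in enumerate(hits) if ctr is not None]
    if not matching:
        return base_taken, base_taken, base_taken, -1
    provider = matching[-1]                      # longest matching component
    pred = hits[provider] >= 0
    altpred = hits[matching[-2]] >= 0 if len(matching) > 1 else base_taken
    # Special case: weak provider counter and the monitor favors Altpred.
    if is_weak(hits[provider]) and use_alt_on_weak >= alt_threshold:
        return altpred, pred, altpred, provider
    return pred, pred, altpred, provider
```

Returning the provider index and both predictions alongside the final prediction exposes exactly the information a confidence estimator can observe for free.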
A tagged table entry
• Ctr: 3-bit prediction counter
• U: 2-bit useful counter
• Was the entry recently useful ?
• Tag: partial tag
Entry layout: Tag | Ctr | U
Updating the U counter
If (Altpred ≠ Pred) then
• Pred = taken: U = U + 1
• Pred ≠ taken: U = U - 1
(“taken” denotes the actual branch outcome)
Graceful aging: periodic shift of all U counters
• Implemented through the reset of a single bit
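A minimal sketch of this update rule, assuming a saturating 2-bit counter in [0, 3]. Aging is modeled as a plain right shift; the single-bit-reset hardware trick mentioned above is not modeled.

```python
U_MAX = 3  # 2-bit useful counter


def update_useful(u, pred, altpred, taken):
    """Update U only when Pred and Altpred disagree; taken is the branch outcome."""
    if pred != altpred:
        if pred == taken:
            u = min(u + 1, U_MAX)   # the entry was useful: Pred right, Altpred wrong
        else:
            u = max(u - 1, 0)       # the entry was harmful: Pred wrong, Altpred right
    return u


def age_useful(u_counters):
    """Graceful aging: shift every U counter right by one bit."""
    return [u >> 1 for u in u_counters]
```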
Allocating a new entry on a misprediction
• Find a single “useless” entry with a longer history:
• Privilege the shortest possible history, to minimize footprint
• But not too much, to avoid ping-pong phenomena
• Initialize Ctr as weak and U as zero
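One possible reading of this allocation policy is sketched below. The slide does not say how the preference for short histories is balanced, so the `skip_prob` parameter is a hypothetical knob: each useless candidate, starting from the shortest history, is accepted with probability 1 - skip_prob. The signed counter encoding (weak taken = 0, weak not-taken = -1) is also an assumption.

```python
import random
from dataclasses import dataclass


@dataclass
class Entry:
    tag: int = 0
    ctr: int = 0   # signed 3-bit counter, assumed encoding: taken if ctr >= 0
    u: int = 0     # 2-bit useful counter


def allocate(candidates, new_tag, taken, skip_prob=0.5, rng=random):
    """candidates: the entries selected in the components with a longer history
    than the provider, ordered from shortest to longest history.
    Returns the index of the allocated candidate, or None if none is useless."""
    useless = [i for i, e in enumerate(candidates) if e.u == 0]
    if not useless:
        return None
    choice = useless[-1]                 # fallback: longest useless candidate
    for i in useless[:-1]:
        if rng.random() >= skip_prob:    # usually keep the shortest history...
            choice = i
            break                        # ...but sometimes move on to a longer one
    entry = candidates[choice]
    entry.tag = new_tag
    entry.u = 0
    entry.ctr = 0 if taken else -1       # weak counter in the outcome direction
    return choice
```

Passing a seeded `random.Random` instance as `rng` makes the allocation reproducible in a trace-driven simulator.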
Confidence by observation on TAGE
• Apart from the prediction, the predictor delivers:
• The provider component and the value of the prediction counter
• High correlation with the quality of the predictions
• The history of mispredictions can also be observed:
• A burst of mispredictions might indicate predictor warm-up or a program phase change
Experimental framework
• 20 traces from the CBP-1 and 20 traces from the CBP-2
• 16Kbits TAGE : 5 tables, max hist 80 bits
• 64Kbits TAGE : 8 tables, max hist 130 bits
• 256Kbits TAGE : 9 tables, max hist 300 bits
• Probability of misprediction as a metric of confidence:
• Mispredictions per kilo-predictions (MKP)
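The MKP metric is a plain ratio scaled by 1000, so 10 MKP corresponds to a 1 % misprediction rate. A one-function sketch (the function name is hypothetical):

```python
def mkp(mispredictions, predictions):
    """Mispredictions per kilo-predictions."""
    return 1000.0 * mispredictions / predictions


print(mkp(30, 1000))  # 30.0, i.e. a 3 % misprediction rate
```

Unlike MPKI, which is normalized by instruction count, MKP is normalized by the number of predictions in the class being measured, which is what makes it usable as a per-class probability of misprediction.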
Bimodal as the provider component
• Provides many (often most) of the predictions:
• Allocation of a tagged table entry happens on a misprediction
• Generally, bimodal prediction = the bias of the branch
• 256Kbits TAGE, bimodal= very accurate prediction
• Often less than 1 MKP, always significantly lower than the global misprediction rate
• 16Kbits TAGE:
• Often bimodal= very accurate prediction
• On demanding apps: bimodal not better than average
Discriminating the bimodal predictions
• Weak counters:
• Systematically more than 250 MKP (generally more than 300 MKP)
• Can be classified as low confidence
• « Identify » conflicts due to limited predictor size:
• Was there a misprediction provided by the bimodal recently (10 last branches)?
• ≈ 80-150 MKP for 16Kbits, ≈ 50-70 MKP for 64Kbits
• Can be classified as medium confidence
• The remaining:
• High confidence: <10 MKP, generally much less
A tagged component as the provider
• Discriminate on the values of the prediction counter
Class              |2ctr+1|   TAGE 16Kbits   TAGE 256Kbits
Weak               1          340 MKP        325 MKP
Nearly Weak        3          313 MKP        312 MKP
Nearly Saturated   5          213 MKP        225 MKP
Saturated          7          29 MKP         17 MKP
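The |2ctr+1| value used above maps the eight states of a 3-bit counter onto four strength classes. The sketch below assumes a signed counter in [-4, 3], which the |2ctr+1| notation suggests but the slides do not state explicitly.

```python
CLASS_BY_MAGNITUDE = {
    1: "weak",
    3: "nearly weak",
    5: "nearly saturated",
    7: "saturated",
}


def counter_class(ctr):
    """Map a signed 3-bit counter in [-4, 3] to its |2*ctr + 1| class."""
    if not -4 <= ctr <= 3:
        raise ValueError("ctr must fit in a signed 3-bit counter")
    return CLASS_BY_MAGNITUDE[abs(2 * ctr + 1)]


# Print the class of every counter state.
for ctr in range(-4, 4):
    print(ctr, abs(2 * ctr + 1), counter_class(ctr))
```

The expression is symmetric around the taken/not-taken boundary: ctr = 0 and ctr = -1 both give 1, and ctr = 3 and ctr = -4 both give 7.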
Tagged component as provider: a more thorough analysis
• Weak, Nearly Weak, Nearly Saturated:
• For all benchmarks and for the three TAGE configurations, in the range of 200 MKP or higher
• Saturated:
• Slightly lower than the global misprediction rate of the applications
• Very high confidence for predictable applications (< 10 MKP)
• Not that high confidence for poorly predictable applications (> 50 MKP)
Problem: Saturated often represents more than 50 % of the predictions
Intermediate summary
• High confidence class:
• Bimodal saturated, with no recent misprediction by bimodal
• Low confidence class:
• Bimodal weak, and tagged not saturated
• Medium confidence class:
• Bimodal, with a recent misprediction by bimodal
• Tagged saturated:
• Depends on applications, predictor size, etc.
• Very large class
Tweaking the predictor to improve confidence
How to improve confidence on the tagged saturated class
• Widening the prediction counter?
• Not that good: slightly decreased accuracy, and only marginal improvement in accuracy on the saturated class
• Modifying the counter update:
• Transition to the saturated state with a very low probability (P = 1/128 in our experiments)
• Marginal accuracy loss (≈ 0.02 MPKI)
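The modified counter update can be sketched as below, assuming a signed 3-bit counter in [-4, 3]. Only the final step into either saturated state is made probabilistic; how the random draw is produced in hardware is not described on the slide, so a software random number generator stands in for it here.

```python
import random

CTR_MIN, CTR_MAX = -4, 3   # signed 3-bit counter (assumed encoding)
P_SATURATE = 1.0 / 128     # probability quoted on the slide


def update_ctr(ctr, taken, rng=random):
    """Saturating update in which the last step into a saturated state is
    taken only with probability P_SATURATE."""
    nxt = min(ctr + 1, CTR_MAX) if taken else max(ctr - 1, CTR_MIN)
    entering_saturation = nxt != ctr and nxt in (CTR_MIN, CTR_MAX)
    if entering_saturation and rng.random() >= P_SATURATE:
        return ctr             # stay nearly saturated most of the time
    return nxt
```

With this update a counter reaches saturation only after a long run of outcomes in the same direction, which is why a saturated tagged counter becomes a much stronger indicator of a correct prediction.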
Towards 3 confidence classes
• Tagged Saturated is high confidence:
            16 Kbits   64 Kbits   256 Kbits
Maximum     16 MKP     13 MKP     12 MKP
Average     4 MKP      2 MKP      2 MKP
• Nearly Saturated is enlarged and is medium confidence:
            16 Kbits   64 Kbits   256 Kbits
Maximum     169 MKP    173 MKP    174 MKP
Average     85 MKP     71 MKP     73 MKP
Towards 3 confidence classes
• Low confidence:
• Weak bimodal + Weak tagged + Nearly Weak tagged
• Medium confidence:
• Bimodal recently mispredicted + Nearly Saturated tagged
• High confidence:
• Bimodal saturated + Saturated tagged
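The three classes above reduce to a few comparisons on information the predictor already produces. In this sketch the argument names are hypothetical, the tagged counter is assumed to be a signed 3-bit value in [-4, 3], and giving a weak bimodal counter priority over the recent-misprediction flag is an assumption about how the two bimodal conditions combine.

```python
def confidence_class(provider_is_bimodal, bimodal_weak=False,
                     bimodal_recent_misp=False, tagged_ctr=0):
    """Return 'low', 'medium' or 'high' for one prediction.

    bimodal_recent_misp: the bimodal table provided a misprediction among the
    last few branches (the slides use the 10 last branches).
    tagged_ctr: prediction counter of the tagged provider entry."""
    if provider_is_bimodal:
        if bimodal_weak:
            return "low"
        return "medium" if bimodal_recent_misp else "high"
    magnitude = abs(2 * tagged_ctr + 1)
    if magnitude == 7:
        return "high"        # saturated tagged counter
    if magnitude == 5:
        return "medium"      # nearly saturated tagged counter
    return "low"             # weak or nearly weak tagged counter
```

Nothing here requires storage beyond what TAGE already holds, apart from the small amount of logic needed to remember whether the bimodal table mispredicted recently.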
Prediction and misprediction coverage
Each cell: prediction coverage-misprediction coverage (misprediction rate in MKP)
            high conf          medium conf         low conf
16 Kbits    0.740-0.093 (5)    0.209-0.466 (85)    0.051-0.439 (317)
64 Kbits    0.799-0.076 (3)    0.160-0.450 (71)    0.040-0.474 (316)
256 Kbits   0.813-0.050 (2)    0.148-0.455 (73)    0.036-0.491 (325)
Behavior examples, 64Kbits
Each cell: prediction coverage-misprediction coverage (misprediction rate in MKP)
                       high conf          medium conf          low conf
twolf (15.143 MPKI)    0.465-0.053 (13)   0.385-0.460 (137)    0.150-0.487 (390)
gcc (4.192 MPKI)       0.780-0.093 (3)    0.195-0.450 (51)     0.025-0.457 (295)
vortex (0.300 MPKI)    0.976-0.004 (0)    0.019-0.710 (110)    0.005-0.286 (207)
[Figure: predictions and mispredictions broken down into the low, medium and high confidence classes.]
Summary
• Many studies on applications of confidence estimation, but very few on confidence estimators.
• Each predictor requires a different confidence estimator
• A very cost-effective and efficient confidence estimator for TAGE
• Storage free, very limited logic
• Discriminates between 3 confidence classes:
• Medium + low conf.: > 90 % of the mispredictions
• High conf.: in the range of 1 % mispredictions or less
The End