Storage Free Confidence Estimator for the TAGE predictor


André Seznec

IRISA/INRIA

2

Why confidence estimation for branch predictors

• Energy/performance tradeoffs:

• Guiding fetch gating or fetch throttling

• Dynamic speculative structures resizing

• Controlling SMT resource allocation through fetch policies:

• Fetch the “most” useful instructions

• Dual Path execution

3

What is confidence estimation?

• Assign a confidence to a prediction: is it likely that the prediction is correct?

• Generally discriminate only between low and high confidence predictions:

• High confidence: “very likely” to be correct

• Low confidence: “not so likely” to be correct

4

Confidence estimation for branch predictors

• 1981, Jim Smith:

• Predictions from weak counters are more likely to be mispredicted

• 1996, Jacobsen, Rotenberg, Smith (JRS): gshare-like table of 4-bit counters

Increment on correct prediction, reset on misprediction

low confidence < threshold ≤ high confidence

• 1998, enhanced JRS, Grunwald et al.: use the prediction in the index

• A few other proposals:

• Self confidence for perceptrons ...

Most studies still use enhanced JRS confidence estimators
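The JRS scheme described above can be sketched as a small table of resetting counters. This is a minimal illustration, not the authors' implementation: the table size, the XOR index hash, the threshold value of 15, and the class and parameter names are all assumptions. With `enhanced=True`, the predicted direction is folded into the index, as in the enhanced JRS variant.

```python
class JRSConfidenceEstimator:
    """Table of 4-bit resetting counters, indexed gshare-style (sketch)."""

    def __init__(self, log_size=12, threshold=15, enhanced=False):
        self.mask = (1 << log_size) - 1
        self.threshold = threshold  # assumed value; the slide only names a threshold
        self.enhanced = enhanced
        self.table = [0] * (1 << log_size)

    def _index(self, pc, history, prediction):
        idx = pc ^ history  # assumed gshare-like hash of PC and global history
        if self.enhanced:
            # Enhanced JRS: the prediction itself is part of the index.
            idx = (idx << 1) | int(prediction)
        return idx & self.mask

    def is_high_confidence(self, pc, history, prediction):
        # low confidence < threshold <= high confidence
        return self.table[self._index(pc, history, prediction)] >= self.threshold

    def update(self, pc, history, prediction, correct):
        i = self._index(pc, history, prediction)
        # Increment (saturating at 15) on a correct prediction, reset on a misprediction.
        self.table[i] = min(self.table[i] + 1, 15) if correct else 0
```

A counter therefore only reaches high confidence after a run of consecutive correct predictions, and a single misprediction sends it back to low confidence.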

5

Metrics for confidence estimators (Grunwald et al. 1998)

• SENS, Sensitivity:

• Fraction of correct pred. classified as high conf.

• PVP, Predictive Value of a Positive test:

• Probability of high conf. to be correct

• SPEC, Specificity:

• Fraction of mispred. classified as low conf.

• PVN, Predictive Value of a Negative test:

• Probability of low conf. to be mispredicted

Different qualities for different usages
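The four metrics above follow directly from counting predictions by outcome (correct or mispredicted) and by confidence class (high or low). A small sketch, with hypothetical function and argument names and made-up counts:

```python
def confidence_metrics(correct_high, correct_low, wrong_high, wrong_low):
    """Return SENS, PVP, SPEC and PVN from the four prediction counts."""
    return {
        # Fraction of correct predictions classified as high confidence.
        "SENS": correct_high / (correct_high + correct_low),
        # Probability that a high confidence prediction is correct.
        "PVP": correct_high / (correct_high + wrong_high),
        # Fraction of mispredictions classified as low confidence.
        "SPEC": wrong_low / (wrong_high + wrong_low),
        # Probability that a low confidence prediction is mispredicted.
        "PVN": wrong_low / (correct_low + wrong_low),
    }


# Illustrative counts only, not measurements from the study.
m = confidence_metrics(correct_high=900, correct_low=50, wrong_high=10, wrong_low=40)
```

In this made-up example SPEC is 40/50 = 0.8 while PVN is only 40/90, which shows how an estimator can catch most mispredictions and still label many correct predictions as low confidence.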

6

The current limits of confidence prediction

• Discriminating between high and low confidence is insufficient:

• What is the misprediction rate on high and on low confidence predictions?

• Malik et al.: use a probability for each counter value on an enhanced JRS

• Enhanced JRS and state-of-the-art branch predictors?

• Each predictor needs its own confidence estimator

7

This study

Cost-effective confidence estimator for TAGE

• No storage overhead

• Discriminates three classes:

Low conf. pred.: ≈ 30 % misp. rate or more

Medium conf. pred.: 8-15 % misp. rate

High conf. pred.: < 1 % misp. rate

8

TAGE: multiple tables, global history predictor

$L(0) = 0$, $L(i) = \alpha^{i-1} L(1)$ for $i \geq 1$

The set of history lengths forms a geometric series

What is important: L(i) - L(i-1) increases drastically with i, so most of the storage is spent on short histories!

{0, 2, 4, 8, 16, 32, 64, 128}

Capture correlation on very long histories
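The example series {0, 2, 4, ..., 128} can be reproduced with a few lines, assuming L(0) = 0 for the base predictor and L(i) = α^(i-1) · L(1) for the tagged tables. The function name, the rounding to the nearest integer, and the parameter values α = 2 and L(1) = 2 are assumptions chosen to match that example.

```python
def history_lengths(l1, alpha, n_tagged):
    """L(0) = 0 for the base predictor, L(i) = alpha**(i-1) * L(1), rounded."""
    return [0] + [int(alpha ** (i - 1) * l1 + 0.5) for i in range(1, n_tagged + 1)]


lengths = history_lengths(l1=2, alpha=2, n_tagged=7)
# Distance between consecutive history lengths, L(i) - L(i-1).
gaps = [b - a for a, b in zip(lengths, lengths[1:])]
```

Here `lengths` is `[0, 2, 4, 8, 16, 32, 64, 128]` and `gaps` is `[2, 2, 4, 8, 16, 32, 64]`: most tables sit close together at the short end, while the last one alone reaches a 128-bit history.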

9

TAGE: geometric history length + PPM-like + optimized update policy

[Figure: TAGE block diagram. A tagless base predictor is indexed with pc. Each tagged table is accessed with hashes of pc and h[0:L1], h[0:L2], or h[0:L3]; each tagged entry holds ctr, u, and tag; the tag comparisons (=?) select the prediction.]

10

[Figure: tag-match example on three tagged tables (=?) with outcomes Hit, Hit, and Miss; the matching components supply Pred and Altpred.]

11

Prediction computation

• General case:

• Longest matching component provides the prediction

• Special case:

• Newly allocated entries (weak Ctr) cause many mispredictions

On many applications, Altpred is then more accurate than Pred

• Property dynamically monitored through a single 4-bit counter
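The selection rule above can be sketched as follows. This is an illustration under stated assumptions, not the reference TAGE code: the signed counter encoding in [-4, 3], the threshold of 8 on the 4-bit monitoring counter, and all names are hypothetical.

```python
def tage_predict(base_pred, tagged, use_alt_on_weak):
    """Select the TAGE prediction (sketch).

    tagged: list of (ctr, is_hit), ordered from shortest to longest history.
            ctr is assumed to be a signed 3-bit counter in [-4, 3]; ctr >= 0 means taken.
    use_alt_on_weak: the single 4-bit monitoring counter, in [0, 15].
    Returns (final_pred, pred, altpred, provider); provider is None for the base predictor.
    """
    hits = [i for i, (_, is_hit) in enumerate(tagged) if is_hit]
    if not hits:
        return base_pred, base_pred, base_pred, None
    provider = hits[-1]  # longest matching component
    ctr = tagged[provider][0]
    pred = ctr >= 0
    # Altpred comes from the next longest match, or from the base predictor.
    altpred = tagged[hits[-2]][0] >= 0 if len(hits) > 1 else base_pred
    weak = ctr in (-1, 0)
    # Assumed threshold: trust Altpred on weak entries once the counter reaches 8.
    final = altpred if (weak and use_alt_on_weak >= 8) else pred
    return final, pred, altpred, provider
```

For example, `tage_predict(False, [(3, True), (-1, True), (2, False)], 8)` picks table 1 as provider, sees a weak counter there, and falls back to the Altpred supplied by table 0.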

12

A tagged table entry

• Ctr: 3-bit prediction counter

• U: 2-bit useful counter

• Was the entry recently useful?

• Tag: partial tag

[Figure: entry layout with the fields Tag, Ctr, and U.]

13

Updating the U counter

If (Altpred ≠ Pred) then

• Pred = taken: U = U + 1

• Pred ≠ taken: U = U - 1

Graceful aging: periodic shift of all U counters

• implemented through the reset of a single bit
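The update rule for U can be written out as a short sketch. Here `taken` is the actual branch outcome, and the counter saturates in [0, 3] because U is 2 bits wide. Modeling the aging as a right shift by one is an assumption about what "periodic shift" means; the hardware trick of resetting a single bit is not modeled, and the function names are hypothetical.

```python
U_MAX = 3  # 2-bit useful counter


def update_useful(u, pred, altpred, taken):
    """Update U only when the provider and the alternate prediction disagree."""
    if altpred != pred:
        if pred == taken:
            return min(u + 1, U_MAX)  # the entry was useful
        return max(u - 1, 0)  # the entry was harmful
    return u


def age_useful(u_counters):
    """Graceful aging, modeled here as halving every counter."""
    return [u >> 1 for u in u_counters]
```

Entries whose prediction agrees with Altpred never change U, so only entries that actually change the outcome of the predictor are protected from replacement.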

14

Allocating a new entry on a misprediction

• Find a single “useless” entry with a longer history:

• Favor the shortest possible history, to minimize footprint

• But not too strongly, to avoid ping-pong phenomena

• Initialize Ctr as weak and U as zero
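One way to realize this policy is sketched below. The slide does not say how the preference for short histories is tempered; the 50 % bias used here, the initialization of the weak counter in the direction of the outcome, the signed counter encoding, and all names are assumptions made for illustration.

```python
import random


def pick_allocation_table(useful, provider, rng=random.random):
    """Choose the tagged table in which to allocate after a misprediction (sketch).

    useful[j]: U counter of the candidate entry in tagged table j
               (tables ordered from shortest to longest history).
    provider:  index of the providing table, or -1 if the base predictor provided.
    Returns a table index, or None if no candidate entry is useless (U == 0).
    """
    candidates = [j for j in range(provider + 1, len(useful)) if useful[j] == 0]
    for j in candidates[:-1]:
        # Assumed bias: take the shorter history half of the time, so the
        # shortest candidate is favored but not chosen systematically.
        if rng() < 0.5:
            return j
    return candidates[-1] if candidates else None


def new_entry(tag, taken):
    # Weak counter (0 = weakly taken, -1 = weakly not taken, assumed encoding), U = 0.
    return {"tag": tag, "ctr": 0 if taken else -1, "u": 0}
```

Passing a deterministic `rng` (for example `lambda: 0.0`) makes the choice reproducible, which is convenient when checking the policy in a simulator.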

15

Confidence by observation on TAGE

• Apart from the prediction, the predictor delivers:

• The provider component and the value of the prediction counter

High correlation with the quality of the predictions

• The history of mispredictions can also be observed

A burst of mispredictions might indicate predictor warm-up or a program phase change

16

Experimental framework

• 20 traces from CBP-1 and 20 traces from CBP-2

• 16Kbits TAGE: 5 tables, max hist 80 bits

• Probability of misprediction as a metric of confidence:

• Mispredictions Per Kilo-predictions (MKP)
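MKP is simply the misprediction probability scaled by 1000, so 300 MKP corresponds to a 30 % misprediction rate and 10 MKP to 1 %. A one-line helper (the name is hypothetical) makes the conversion explicit:

```python
def mkp(mispredictions, predictions):
    """Mispredictions per kilo-predictions for a class of predictions."""
    return 1000.0 * mispredictions / predictions
```

Unlike MPKI, which normalizes by instruction count, MKP is normalized by the number of predictions in the class being measured.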

17

Bimodal as the provider component

• Provides many (often most) of the predictions:

• Allocation of a tagged table entry happens on a misprediction

Generally, bimodal prediction = the bias of the branch

• 256Kbits TAGE: bimodal = very accurate prediction

• Often less than 1 MKP, always significantly lower than the global misprediction rate

• 16Kbits TAGE:

• Often bimodal = very accurate prediction

• On demanding apps: bimodal not better than average

18

Discriminating the bimodal predictions

• Weak counters:

• Systematically more than 250 MKP (generally more than 300 MKP)

Can be classified as low confidence

• “Identify” conflicts due to limited predictor size:

• Was there a misprediction provided by the bimodal recently (10 last branches)?

≈ 80-150 MKP for 16Kbits, ≈ 50-70 MKP for 64Kbits

Can be classified as medium confidence

• The remaining:

• High confidence: < 10 MKP, generally much less
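The "recent misprediction" test only needs a small saturating counter of branches seen since the last misprediction provided by the bimodal table. The sketch below is one possible realization; the class name, the method names, and the exact window bookkeeping are assumptions.

```python
class BimodalConfidence:
    WINDOW = 10  # "10 last branches" from the slide

    def __init__(self):
        # Branches since the last bimodal misprediction, saturating at WINDOW.
        self.since_misp = self.WINDOW

    def classify(self, ctr_is_weak):
        """Confidence of a prediction provided by the bimodal table."""
        if ctr_is_weak:
            return "low"
        if self.since_misp < self.WINDOW:
            return "medium"
        return "high"

    def update(self, bimodal_provided, mispredicted):
        """Call once per branch, after its outcome is known."""
        if bimodal_provided and mispredicted:
            self.since_misp = 0
        else:
            self.since_misp = min(self.since_misp + 1, self.WINDOW)
```

After a bimodal misprediction, the next ten bimodal predictions with a non-weak counter are classified as medium confidence, and the class returns to high once ten branches have passed without another one.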

19

A tagged component as the provider

• Discriminate on the value of the prediction counter

|2ctr + 1|          TAGE 16Kbits   TAGE 256Kbits
Weak: 1             340 MKP        325 MKP
Nearly Weak: 3      313 MKP        312 MKP
Nearly Sat.: 5      213 MKP        225 MKP
Saturated: 7        29 MKP         17 MKP
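The quantity |2ctr + 1| maps a 3-bit prediction counter onto four strength levels, independent of the predicted direction. The sketch assumes the counter is stored as a signed value in [-4, 3]; the function and dictionary names are hypothetical.

```python
def counter_strength(ctr):
    """ctr: signed 3-bit counter in [-4, 3]. Returns |2*ctr + 1| in {1, 3, 5, 7}."""
    return abs(2 * ctr + 1)


STRENGTH_NAMES = {1: "weak", 3: "nearly weak", 5: "nearly saturated", 7: "saturated"}

# Strength of every counter value, from strongly not taken (-4) to strongly taken (3).
strengths = {ctr: STRENGTH_NAMES[counter_strength(ctr)] for ctr in range(-4, 4)}
```

The two weak states are -1 and 0, and the two saturated states are -4 and 3, so taken and not-taken counters of equal strength fall in the same row of the table.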

20

Tagged component as provider: a more thorough analysis

• Weak, Nearly Weak, Nearly Saturated:

• For all benchmarks and for the three TAGE configurations: in the range of 200 MKP or higher

• Saturated:

• Slightly lower than the global misprediction rate of the application

Very high confidence for predictable applications (< 10 MKP)

Not that high confidence for poorly predictable applications (> 50 MKP)

Problem: Saturated often represents more than 50 % of the predictions

22

Intermediate summary

• High confidence class:

• (Bimodal saturated, no recent misprediction by bimodal)

• Low confidence class:

• Bimodal weak, and tagged not saturated

• Medium confidence class:

• (Bimodal and recent misprediction by bimodal)

• Tagged saturated:

• Depends on applications, predictor size, etc.

Very large class ...

23

Tweaking the predictor to improve confidence

24

How to improve confidence on the tagged counter saturated class

• Widening the prediction counter?

• Not that good: slightly decreased accuracy

Only marginal improvement in accuracy on the saturated class

• Modifying the counter update:

• Transition to the saturated state with a very low probability

P = 1/128 in our experiments

Marginal accuracy loss (≈ 0.02 MPKI)
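The modified counter update can be sketched as a saturating counter whose last step is taken only with probability 1/128. The signed encoding in [-4, 3], the software random source, and the names are assumptions; a hardware design would use some cheap pseudo-random signal instead.

```python
import random

P_SATURATE = 1 / 128  # probability given on the slide


def update_ctr(ctr, taken, rng=random.random):
    """Signed 3-bit counter in [-4, 3] with probabilistic entry into saturation."""
    nxt = ctr + 1 if taken else ctr - 1
    nxt = max(-4, min(3, nxt))
    entering_saturation = nxt in (-4, 3) and nxt != ctr
    if entering_saturation and rng() >= P_SATURATE:
        return ctr  # stay nearly saturated most of the time
    return nxt
```

A counter therefore sits in the nearly saturated state for a long run of correct predictions before it saturates, which is why reaching saturation becomes a much stronger signal of a reliable entry.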

25

Towards 3 confidence classes

• Tagged Saturated is high confidence

           16 Kbits   64 Kbits   256 Kbits
Maximum    16 MKP     13 MKP     12 MKP
Average    4 MKP      2 MKP      2 MKP

• Nearly Saturated is enlarged and is medium confidence

           16 Kbits   64 Kbits   256 Kbits
Maximum    169 MKP    173 MKP    174 MKP
Average    85 MKP     71 MKP     73 MKP

26

Towards 3 confidence classes

• Low confidence:

• Weak bimodal + Weak tagged + Nearly Weak tagged

• Medium confidence:

• Bimodal recently mispredicted + Nearly Saturated tagged

• High confidence:

• Bimodal saturated + Saturated tagged
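Put together, the three classes amount to a small decision function over signals the predictor already produces. The sketch below assumes a signed 3-bit counter in [-4, 3] for tagged providers and a signed 2-bit counter in [-2, 1] for the bimodal table; the encodings and names are illustrative assumptions.

```python
def classify_confidence(provider_is_bimodal, ctr, bimodal_recent_misp=False):
    """Return "low", "medium" or "high" for one TAGE prediction (sketch).

    ctr: counter of the provider entry.
         Tagged provider: signed 3-bit counter in [-4, 3] (assumed encoding).
         Bimodal provider: signed 2-bit counter in [-2, 1] (assumed encoding).
    bimodal_recent_misp: True if the bimodal table provided a misprediction recently.
    """
    if provider_is_bimodal:
        if ctr in (-1, 0):
            return "low"  # weak bimodal
        return "medium" if bimodal_recent_misp else "high"
    strength = abs(2 * ctr + 1)
    if strength == 7:
        return "high"  # saturated tagged
    if strength == 5:
        return "medium"  # nearly saturated tagged
    return "low"  # weak or nearly weak tagged
```

No table is read or written beyond what TAGE already accesses for the prediction itself, which is what makes the estimator storage free.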

27

Prediction and misprediction coverage

Each cell: prediction coverage-misprediction coverage (misprediction rate in MKP)

             high conf          medium conf        low conf
16 Kbits     0.740-0.093 (5)    0.209-0.466 (85)   0.051-0.439 (317)
64 Kbits     0.799-0.076 (3)    0.160-0.450 (71)   0.040-0.474 (316)
256 Kbits    0.813-0.050 (2)    0.148-0.455 (73)   0.036-0.491 (325)

28

Behavior examples, 64Kbits

Each cell: prediction coverage-misprediction coverage (misprediction rate in MKP)

                        high conf          medium conf         low conf
twolf (15.143 MPKI)     0.465-0.053 (13)   0.385-0.460 (137)   0.150-0.487 (390)
gcc (4.192 MPKI)        0.780-0.093 (3)    0.195-0.450 (51)    0.025-0.457 (295)
vortex (0.300 MPKI)     0.976-0.004 (0)    0.019-0.710 (110)   0.005-0.286 (207)

30

[Figure: distribution of predictions and of mispredictions across the low, medium, and high confidence classes.]

31

Summary

• Many studies on applications of confidence estimation, but very few on confidence estimators

• Each predictor requires a different confidence estimator

• A very cost-effective and efficient confidence estimator for TAGE

• Storage free, very limited logic

• Discriminates between 3 confidence classes:

Medium + low conf: > 90 % of the mispredictions

High conf: in the range of 1 % mispredictions or less

32

The End
