A Penalty-Sensitive Branch Predictor

Yue Hu, David M. Koppelman, Lu Peng. Department of Electrical and Computer Engineering, Louisiana State University


Page 1: A  Penalty-Sensitive Branch Predictor

A Penalty-Sensitive Branch Predictor

Yue Hu, David M. Koppelman, Lu Peng

Department of Electrical and Computer Engineering, Louisiana State University

Page 2: A  Penalty-Sensitive Branch Predictor

1. Motivation

Typical branch predictors aim to decrease the misprediction rate (MR), e.g. Two-level adaptive (Yeh & Patt), Neural (Vintan & Jimenez), and LTAGE (Seznec).

However, even if the total MR doesn't decrease, performance can still be improved.

[Figure: timelines of Run 1 and Run 2, the same program on the same computer but with different branch predictors. Bars show the time that a mispredicted branch is on the wrong path, for high-penalty (HP) and low-penalty (LP) branches.]

Why not favor HP branches to decrease their MR?
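The point of this slide can be checked with a little arithmetic. The sketch below is only an illustration: the misprediction counts are hypothetical, and the per-class penalties (212 and 121 cycles) are borrowed from the average penalties reported on the later Performance Analysis slide.

```python
# Average penalties (cycles) taken from the Performance Analysis slide.
HP_PENALTY = 212
LP_PENALTY = 121

def wrong_path_cycles(hp_mispredicts, lp_mispredicts):
    """Total cycles spent on the wrong path for a given mix of mispredictions."""
    return hp_mispredicts * HP_PENALTY + lp_mispredicts * LP_PENALTY

# Hypothetical runs: both have 100 mispredictions, so the MR is identical,
# but Run 2 shifts 10 mispredictions from HP branches to LP branches.
run1 = wrong_path_cycles(hp_mispredicts=50, lp_mispredicts=50)
run2 = wrong_path_cycles(hp_mispredicts=40, lp_mispredicts=60)

print(run1, run2, run1 - run2)
```

With the same number of mispredictions, the run with fewer HP mispredictions spends fewer cycles on the wrong path.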

Page 3: A  Penalty-Sensitive Branch Predictor

2. Design Overview

[Figure 1. Overall structure of our predictor. Blocks: penalty predictor (1), two-class TAGE predictor (2, main predictor), loop predictor (3, assistant predictor). Labeled inputs: PC, history, resolve cycles. A "Loop enabled?" (Yes/No) choice selects the final prediction.]

1: Predicts whether a branch is HP or LP.

2: Based on TAGE; can favor HP branches while providing only normal operation for LP branches.

3: Enabled only when beneficial.

Page 4: A  Penalty-Sensitive Branch Predictor

2.1 Penalty Predictor

Penalty table: each entry holds an 8-bit penalty counter (CNT) and a 1-bit penalty state (STA).

Update flow:

Initially: CNT = 0; STA = LP.

Penalty >= 120 cycles? Yes: CNT += 8. No: CNT--.

CNT >= 192? Yes: STA = HP.

Otherwise, CNT == 0? Yes: STA = LP. No: STA is unchanged.

The high-penalty state persists for at least hundreds of executions, so the following HP branches can benefit.
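The update flow above can be sketched as a small state machine. This is a minimal sketch, not the authors' implementation: the thresholds (120 cycles, +8, 192) come from the slide, while the saturation at 255 and at 0, the `PenaltyEntry` class, and the `update` function name are assumptions made for illustration.

```python
from dataclasses import dataclass

HP_PENALTY_CYCLES = 120  # from the slide: penalty threshold in cycles
CNT_STEP_UP = 8          # from the slide: increment on a high penalty
CNT_HP_THRESHOLD = 192   # from the slide: CNT value that switches STA to HP
CNT_MAX = 255            # assumption: the 8-bit counter saturates

@dataclass
class PenaltyEntry:
    cnt: int = 0      # 8-bit penalty counter (CNT)
    sta: str = "LP"   # 1-bit penalty state (STA): "LP" or "HP"

def update(entry: PenaltyEntry, penalty_cycles: int) -> str:
    """Apply one observed penalty to a penalty-table entry; return the new state."""
    if penalty_cycles >= HP_PENALTY_CYCLES:
        entry.cnt = min(CNT_MAX, entry.cnt + CNT_STEP_UP)
    else:
        entry.cnt = max(0, entry.cnt - 1)  # assumption: no wrap below zero
    if entry.cnt >= CNT_HP_THRESHOLD:
        entry.sta = "HP"
    elif entry.cnt == 0:
        entry.sta = "LP"
    return entry.sta
```

Under these assumptions, 24 consecutive high-penalty observations move an entry from its initial state to HP, and it then takes 192 consecutive low-penalty observations to fall back to LP.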

Page 5: A  Penalty-Sensitive Branch Predictor

2.2 Two-class TAGE Predictor

Prediction: [Only rough idea]

[Figure: Bank 0 to Bank 6, indexed from History and PC through per-bank hash functions. One bank is a 2-bit bimodal predictor; each tagged entry holds a 3-bit prediction counter (pred), a 2-bit use counter (U), and a [9-16]-bit tag. An example lookup marks each bank as hit (H) or miss (M) with its U value and leads to the final prediction.]

Index: Hash(His, PC) points directly to one entry in each bank;

Tag: check whether the entry hits (H) or misses (M);

Higher bank: longer history, wider tag -> more accurate.
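The lookup can be sketched as follows. The slide gives only a rough idea, so this sketch assumes the standard TAGE selection rule (the hitting bank with the longest history provides the prediction, with the bimodal table as fallback); the names `TaggedEntry` and `predict`, and the convention that bank 0 is the bimodal table, are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class TaggedEntry:
    tag: int     # partial tag stored in the entry
    pred: int    # 3-bit prediction counter, 0..7; >= 4 is read as "taken"
    u: int = 0   # 2-bit use counter (U)

def predict(bimodal_taken: bool,
            indexed_entries: List[Optional[TaggedEntry]],
            lookup_tags: List[int]) -> Tuple[bool, int]:
    """Return (taken, provider_bank).

    indexed_entries[i] is the entry selected by the hash in tagged bank i + 1,
    lookup_tags[i] is the tag computed for that bank. Provider 0 means the
    bimodal predictor supplied the prediction.
    """
    taken, provider = bimodal_taken, 0
    # Banks are visited from shortest to longest history, so the last hit wins.
    for bank, (entry, tag) in enumerate(zip(indexed_entries, lookup_tags), start=1):
        if entry is not None and entry.tag == tag:
            taken, provider = entry.pred >= 4, bank
    return taken, provider
```

For example, if banks 2 and 4 both hit, the entry in bank 4 decides the final prediction because it was indexed with the longer history.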

Page 6: A  Penalty-Sensitive Branch Predictor

2.2 Two-class TAGE Predictor

Update:

New entries are allocated in higher banks on a misprediction.

LP: only one entry is allocated.

HP: a second entry is allocated, with two limitations:

1. The bank has a useless entry;

2. The last two allocations in the bank were one-entry allocations.

[Figure: Bank 0 to Bank 6, indexed from History and PC through per-bank hash functions. Tagged banks in order: M U0 | H U2 | M U1 | M U0 | M U1 | M U0. The misprediction comes from the hitting entry (H U2). Above it, entries with U1 are occupied and not used; the first allocation goes to the next U0 entry, and the second allocation for HP goes to the following U0 entry.]

HP's double-entry allocation doesn't harm LP allocation too much.

Page 7: A  Penalty-Sensitive Branch Predictor

2.2 Two-class TAGE Predictor

Update:

[Figure: same allocation example as on the previous slide.]

Double-entry allocation favors HP branches so that their new entries can survive longer and establish their usefulness.

Two cases for U0:

1. The entry itself has not been recently useful, if ever;

2. The entry is a new allocation whose usefulness hasn't been established.
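The allocation policy on the two Update slides can be sketched as below. This is an illustration under stated assumptions, not the authors' code: bank 0 is taken to be the untagged bimodal table, limitation 2 is read as applying to the bank that would receive the second entry, and the function name `allocation_banks` and the `last_two_single` bookkeeping are hypothetical.

```python
from typing import List

def allocation_banks(provider_bank: int,
                     use_counters: List[int],
                     is_hp: bool,
                     last_two_single: List[bool]) -> List[int]:
    """Pick the banks that receive a new entry after a misprediction.

    use_counters[b] is the U value of the indexed entry in bank b.
    last_two_single[b] is True when the last two allocations in bank b
    were one-entry allocations (limitation 2 on the slide).
    """
    # Only banks above the provider whose indexed entry is useless (U0) qualify.
    candidates = [b for b in range(provider_bank + 1, len(use_counters))
                  if use_counters[b] == 0]
    if not candidates:
        return []
    chosen = [candidates[0]]          # LP and HP: first allocation
    if is_hp:
        for b in candidates[1:]:      # HP only: try a second allocation
            if last_two_single[b]:
                chosen.append(b)
                break
    return chosen

# Example mirroring the slide: provider in bank 2, U values for banks 0..6.
use = [0, 0, 2, 1, 0, 1, 0]
print(allocation_banks(2, use, is_hp=False, last_two_single=[True] * 7))
print(allocation_banks(2, use, is_hp=True, last_two_single=[True] * 7))
```

With the U values from the figure, an LP branch allocates in one bank only, while an HP branch also receives a second entry in the next bank that has a U0 entry and satisfies limitation 2.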

Page 8: A  Penalty-Sensitive Branch Predictor

3. Performance Analysis

3.1 Penalty Predictor

[Figure: per-trace percentages (CL01 to CL16 and further traces, plus the average); y-axis in %, from -10 to 100.]

1. Predicted to be HP: 50.2%;

2. Actual HP among all branches: 27%;

3. Predicted LP but turned out to be HP: 1.3%, so the predictor covers 98.7% of actual HP branches.

Average penalty of branches predicted LP: 121 cycles; predicted HP: 212 cycles.

Page 9: A  Penalty-Sensitive Branch Predictor

3. Performance Analysis

3.2 Two-class TAGE Predictor

[Figure: MR of LTAGE vs. PSLTAGE for storage budgets 8K, 16K, 32K, 64K, 128K, and 256K (y-axis 0.030 to 0.039), in two panels. High-penalty branches, MR changes as extracted: -5E-5, -4E-5, -6E-5, -4E-5, -8E-5, -7E-5 (all negative). Low-penalty branches, MR changes as extracted: +7E-5, +4E-5, +3E-6, +3E-5, +2E-5, -9E-5. The extraction order does not necessarily follow the budget order.]

1. MR of HP branches is about 10% higher;

2. The Penalty-Sensitive (PS) method effectively favors HP branches;

3. At 64KB: HP, -6E-5; LP, +3E-5. Overall, it is beneficial.

Loop branches; branches with cache misses.

Page 10: A  Penalty-Sensitive Branch Predictor

4. Summary

Our penalty-sensitive branch predictor works:

Penalty predictor: 50.2% of branches predicted HP; covers 98.7% of actual HP; average penalty (HP vs. LP) = 212 : 121 cycles.

Two-class TAGE predictor: favors HP branches; globally beneficial, but limited.

Limited favoring mechanism: double-entry allocation for HP branches increases the chance that their new entries survive long enough to establish usefulness. Future work: a more helpful favoring mechanism is needed.

Conclusion:

1. Mispredicted HP branches are more harmful;

2. Even if the total MR doesn't decrease, performance can still be improved by favoring HP branches;

3. The approach can be applied to any predictor once an effective favoring mechanism is found.

Page 11: A  Penalty-Sensitive Branch Predictor

Thanks!


Page 12: A  Penalty-Sensitive Branch Predictor

Backup Slides: Penalty Predictor

[Figure: average penalty in cycles (y-axis 0 to 300) of branches predicted LP (Lo_AvgPen) and predicted HP (Hi_AvgPen), for traces CL01 to CL16, INT01 to INT06, MM01 to MM07, SER01 to SER05, WS01 to WS06, and the average. Off-scale data labels as extracted: "1830317".]

Page 13: A  Penalty-Sensitive Branch Predictor

Backup Slides: Two-class TAGE Predictor (MR)

[Figure: same MR charts as in Section 3.2 (LTAGE vs. PSLTAGE, 8K to 256K, high-penalty and low-penalty branches), with two MR changes highlighted: -6E-5 and -4.7E-4.]

(-6E-5) / (-4.7E-4) = 12.8%

Penalty-Sensitive achieves 12.8% of the improvement in HP-branch MR that would be achieved by doubling the storage budget.

Page 14: A  Penalty-Sensitive Branch Predictor

Backup Slides: Loop Predictor

[Figure: MPPKI (y-axis 0 to 1000) of PSTAGE (without loop) vs. PSLTAGE, for traces Client01 to Client16, int01 to int06, mm01 to mm07, server01 to server05, ws01 to ws06, and the average. Off-scale data labels as extracted, in pairs: 1643/1643, 2208/2208, 2839/2839, 8204/7920, 6624/6630, 2592/2515, 3673/3596. Average: 1000/987.]

Average MPPKI is normalized to 1000: 1000 for PSTAGE (without loop) vs. 987 for PSLTAGE.

1.3% improvement with only 0.53KB. Very efficient.