Trace Substitution Hans Vandierendonck, Hans Logie, Koen De Bosschere Ghent University EuroPar 2003, Klagenfurt


Trace Substitution

Hans Vandierendonck, Hans Logie, Koen De Bosschere

Ghent University

EuroPar 2003, Klagenfurt


August 27, 2003 Euro-Par 2003 2

Instruction Fetch

• Wide-issue superscalar processors need to fetch multiple branches per cycle
  – IPC=8 implies fetching ~16 instructions/cycle and predicting ~3 branches/cycle
  – Multi-ported instruction cache?
• Trace cache:
  – Packs fetch groups in a trace
  – Trace tagged with PC, path, next fetch PC
  – Multiple branch predictor (MBP) predicts branch directions
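The tagging scheme above can be illustrated with a toy model in which a trace is found only when both the fetch PC and the predicted path match. This is a minimal sketch, not the hardware organisation: the `Trace` record, the `TraceCache` class and the tuple-of-booleans path encoding are illustrative assumptions, and capacity, sets and ways are ignored.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Assumed encoding: one taken/not-taken flag per conditional branch in the trace.
Path = Tuple[bool, ...]


@dataclass(frozen=True)
class Trace:
    start_pc: int                  # PC of the first instruction in the trace
    path: Path                     # branch directions followed inside the trace
    next_fetch_pc: int             # where fetching continues after the trace
    instructions: Tuple[str, ...]  # placeholder for the packed fetch groups


class TraceCache:
    """Toy trace cache with unlimited capacity (no sets, ways, or eviction)."""

    def __init__(self) -> None:
        self._traces: Dict[Tuple[int, Path], Trace] = {}

    def fill(self, trace: Trace) -> None:
        # Traces are built from executed instructions, so only executed paths appear.
        self._traces[(trace.start_pc, trace.path)] = trace

    def lookup(self, fetch_pc: int, predicted_path: Path) -> Optional[Trace]:
        # Hit only if both the fetch PC and the MBP's predicted path match the tag.
        return self._traces.get((fetch_pc, predicted_path))


tc = TraceCache()
tc.fill(Trace(0x1000, (True, False), 0x1040, ("add", "beq", "ld", "bne")))
hit = tc.lookup(0x1000, (True, False))   # same PC, same path: hit
miss = tc.lookup(0x1000, (True, True))   # same PC, different path: miss
```

In this model a wrong path prediction at a known fetch PC looks exactly like a miss, because the path is part of the tag.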


The Trace Cache

[Figure: fetch unit block diagram. Components: instruction cache, trace cache, MBP, MUX, fill unit. Signals: fetch address, next address, pred. path, pred. trace, pred. insn, hit (MUX select). Legend: fetch address, instructions, hit/miss. Annotation: only executed paths!]


Overview

• Observation
  – Trace cache misses are (sometimes) branch mispredictions
• Trace Substitution
  – How to make use of it
• Evaluation
  – Is it worth it?

• Conclusion


Observation

• Multiple branch predictor affects trace cache:
  – Non-perfect branch predictors reduce the trace cache hit rate
  – FIPA correlates better with TC hit rate than with MBP accuracy

TC: 16K traces, 4-way set-assoc., path associativity
MGAg, Mgshare: 12-bit history
repeat: 8Kbit hybrid, accessed 3x

[Figure: FIPA (axis 0 to 16) and hit rate in % (axis 70% to 100%) for the MGAg, Mgshare, repeat and perfect predictors, shown for gcc, vortex and avg. Series: FIPA, MBP hits, TC hits.]


TC Misses Are a Tell-Tale for MBP misses

• Trace cache misses coincide with branch mispredictions, e.g.:
  – 16K-entry trace cache, 12-bit MGAg (high accuracy, low coverage):
    • 84.9% of TC misses are also MBP misses
    • 37.6% of MBP misses are also TC misses
  – 256-entry trace cache, 12-bit MGAg (low accuracy, higher coverage):
    • 25.1% of TC misses are also MBP misses
    • 55.9% of MBP misses are also TC misses
• This work: use TC misses to detect MBP misses and fix them
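Each pair of percentages above counts the same joint events (accesses that are both a TC miss and an MBP miss), so together they fix the ratio of TC misses to MBP misses. The sketch below works that out. It assumes both percentages are measured over the same set of fetch accesses; the function name is hypothetical, and the labels "accuracy" and "coverage" follow the slide's annotations.

```python
def tc_to_mbp_miss_ratio(accuracy: float, coverage: float) -> float:
    """Return (#TC misses) / (#MBP misses).

    accuracy = fraction of TC misses that are also MBP misses
    coverage = fraction of MBP misses that are also TC misses
    Both describe the same joint events, so
        accuracy * n_tc_misses == coverage * n_mbp_misses
    """
    if not (0.0 < accuracy <= 1.0 and 0.0 <= coverage <= 1.0):
        raise ValueError("accuracy must be in (0, 1], coverage in [0, 1]")
    return coverage / accuracy


large_tc = tc_to_mbp_miss_ratio(0.849, 0.376)  # 16K-entry trace cache
small_tc = tc_to_mbp_miss_ratio(0.251, 0.559)  # 256-entry trace cache
print(f"16K-entry: {large_tc:.2f} TC misses per MBP miss")
print(f"256-entry: {small_tc:.2f} TC misses per MBP miss")
```

Under that assumption, the large trace cache misses less than half as often as the MBP mispredicts, while the small one misses more than twice as often. That fits the pattern of high accuracy with low coverage in the first case and the reverse in the second.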


Trace Substitution

• Assumption: TC miss implies MBP miss
  – Correlation between branches implies that some paths never occur
  – TC stores only those paths that do occur
• If the predicted path is wrong …
  – Fetch a different trace
  – Override MBP with MRU trace starting at fetch PC
    • Detect MRU trace from LRU bits stored in TC
    • No trace substitution applied if it does not exist
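The fetch policy described above can be sketched as follows. This is a toy model under stated assumptions, not the hardware design: the class and method names are hypothetical, capacity and eviction are ignored, and a per-PC `OrderedDict` stands in for the LRU bits from which a real trace cache would identify the MRU trace.

```python
from collections import OrderedDict
from typing import Dict, Optional, Tuple

# Assumed encoding: one taken/not-taken flag per conditional branch.
Path = Tuple[bool, ...]


class SubstitutingTraceCache:
    """Toy model: per fetch PC, traces are kept in recency order (last = MRU)."""

    def __init__(self) -> None:
        self._by_pc: Dict[int, OrderedDict] = {}

    def fill(self, fetch_pc: int, path: Path, next_fetch_pc: int) -> None:
        traces = self._by_pc.setdefault(fetch_pc, OrderedDict())
        traces[path] = next_fetch_pc
        traces.move_to_end(path)  # a freshly filled trace becomes MRU

    def fetch(self, fetch_pc: int, predicted_path: Path) -> Optional[Tuple[Path, int, bool]]:
        """Return (path, next_fetch_pc, substituted), or None on a plain miss."""
        traces = self._by_pc.get(fetch_pc)
        if not traces:
            # No trace starts at this PC: no substitution is applied.
            return None
        if predicted_path in traces:
            traces.move_to_end(predicted_path)  # ordinary hit updates recency
            return predicted_path, traces[predicted_path], False
        # TC miss on the predicted path: treat it as a likely MBP miss and
        # override the prediction with the MRU trace starting at this PC.
        mru_path = next(reversed(traces))
        return mru_path, traces[mru_path], True


tc = SubstitutingTraceCache()
tc.fill(0x1000, (True, False), 0x1040)
tc.fill(0x1000, (False, False), 0x1080)    # now the MRU trace at 0x1000
result = tc.fetch(0x1000, (True, True))    # predicted path was never filled
```

Here `result` carries the path and next fetch PC of the substituted trace, with the last field set to `True`. Whether a substitution should itself update recency is not stated in the slides; this sketch leaves the order unchanged.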


Implementation

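[Figure: fetch unit block diagram extended for trace substitution. Components: instruction cache, trace cache, MBP, MUX, fill unit. Signals: fetch address, next address, pred. path, pred. trace, pred. insn, hit, MRU, MRU hit, select. Legend: fetch address, instructions, hit/miss.]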


Evaluation Setup

• Benchmarks
  – SPECint95 (except compress, go), reference inputs
  – 500 million instructions from start of program
  – Compiled for Alpha ISA, Compaq C compiler, -O4
• Fetch Unit
  – TC: 1 trace = 16 instructions, 3 cond. branches, trace ends at system call, indirect jump
  – TC: 4-way set-assoc., path associativity
  – MBP: MGAg, varying history length
  – Instruction cache: 32K, 2-way, 32-byte blocks, LRU
• Metric
  – FIPA = fetched instructions per fetch unit access
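As a small illustration of the metric, FIPA is the total number of fetched instructions divided by the number of fetch unit accesses. The function and the sample log below are hypothetical. The slides do not say which instructions count as fetched (for example, whether wrong-path instructions are included), so this sketch simply averages whatever counts the caller supplies.

```python
from typing import Iterable


def fipa(instructions_per_access: Iterable[int]) -> float:
    """Fetched instructions per fetch unit access (FIPA)."""
    counts = list(instructions_per_access)
    if not counts:
        raise ValueError("need at least one fetch unit access")
    return sum(counts) / len(counts)


# Hypothetical log: instructions delivered by each of five fetch unit accesses.
example = [16, 12, 4, 16, 7]
print(fipa(example))  # 55 instructions over 5 accesses
```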


Evaluation (1)

• Observations:
  – Gap MGAg-perfect increases with TC size
  – 20-40% of gap filled with trace substitution
  – Only on TC miss, thus performance increase drops with TC size

TC: 4-way set-associative
MGAg: 12-bit history

[Figure: FIPA (axis 8 to 14) versus trace cache size in traces (64, 256, 1024, 4096, 16384). Series: perfect, MGAg+subst, MGAg.]


Evaluation (2)

• Observations:
  – Compensate poor branch predictor
  – No history + substitution ~ MGAg with 10-bit history
  – Improvement drops with more accurate predictor

TC: 256 traces, 4-way

[Figure: FIPA (axis 8.0 to 12.0) versus branch history length (0 to 16). Series: MGAg+subst, MGAg.]


Accuracy vs. Usage

• Definitions:
  – Usage = substitutions per fetch unit access
  – Accuracy = fraction of substitutions that are correct
• Note
  – Accuracy limited because correct-path trace is not always present!

[Figure: Usage and Accuracy (fraction of accesses, axis 0% to 45%) versus branch history length (0 to 16). TC: 256 traces, 4-way.]
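The two definitions on this slide can be written as simple counters. This is a minimal sketch. The `SubstitutionStats` class and its fields are hypothetical names, and a substitution is taken to be "correct" when the substituted trace matches the path actually executed, which is an assumed reading of the slide.

```python
from dataclasses import dataclass


@dataclass
class SubstitutionStats:
    accesses: int = 0        # fetch unit accesses
    substitutions: int = 0   # accesses where the MRU trace replaced the prediction
    correct: int = 0         # substitutions that matched the executed path

    def record(self, substituted: bool, was_correct: bool = False) -> None:
        self.accesses += 1
        if substituted:
            self.substitutions += 1
            self.correct += int(was_correct)

    @property
    def usage(self) -> float:
        # Usage = substitutions per fetch unit access.
        return self.substitutions / self.accesses if self.accesses else 0.0

    @property
    def accuracy(self) -> float:
        # Accuracy = fraction of substitutions that are correct.
        return self.correct / self.substitutions if self.substitutions else 0.0


stats = SubstitutionStats()
# Hypothetical event stream: (substituted, substitution was correct).
for substituted, ok in [(False, False), (True, True), (True, False), (False, False)]:
    stats.record(substituted, ok)
```

Accuracy is measured over substitutions only, so it can stay low even when usage is high: if the correct-path trace is not in the trace cache, the MRU trace cannot be the right one.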


Conclusion

• Proposed trace substitution
  – TC miss flags MBP miss
    • Not always correct, not all MBP misses found
    • Fetch MRU trace instead: cheap implementation
• Results in
  – Consistent performance improvement
    • No history+substitution ~ MGAg with 10-bit history
    • In other cases: 0.2 instructions/access or same performance as with 16 times smaller MBP
• Most effective when MBP or TC is small