
Electrical and Computer Engineering
University of Wisconsin - Madison

Prefetching Using a Global History Buffer

Kyle J. Nesbit and James E. Smith

February 2004

Outline

Motivation

Related Work

Global History Buffer Prefetching

Results

Conclusion


Motivation

D-Cache misses to main memory are of increasing importance
– Main memory is getting farther away (in clock cycles)
– Many demanding, memory intensive workloads

Computation is inexpensive compared to data accesses
– Good opportunity to reevaluate prefetching data structures
– Simple computation can supplement table information

We consider prefetches from main memory to the lowest level cache (L2 cache in this study)


Markov Prefetching

Markov prefetching forms address correlations
– Joseph and Grunwald (ISCA ’97)

Uses global memory addresses as states in the Markov graph

Correlation Table approximates Markov graph

Miss Address Stream: A B C A B C B C . . .

Markov Graph: A -> B (1), B -> C (1), C -> A (.5), C -> B (.5)

Correlation Table

miss address   1st predict.   2nd predict.
A              B
B              C
C              B              A
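As a rough illustration of how a correlation table approximates the Markov graph, the sketch below builds one from the miss address stream A B C A B C B C. The function name `build_correlation_table` and the most-recent-first ordering of predictions are assumptions made for this example; the slides only show that each entry holds a 1st and a 2nd prediction.

```python
def build_correlation_table(miss_stream, width=2):
    """Map each miss address to up to `width` successors, most recent first.

    The most-recent-first replacement policy is an assumption of this sketch.
    """
    table = {}
    for prev, nxt in zip(miss_stream, miss_stream[1:]):
        successors = table.setdefault(prev, [])
        if nxt in successors:
            successors.remove(nxt)   # move a repeated successor to the front
        successors.insert(0, nxt)
        del successors[width:]       # keep only `width` predictions per entry
    return table


table = build_correlation_table(list("ABCABCBC"))
for miss_address, predictions in sorted(table.items()):
    print(miss_address, predictions)
```

On a miss to C, this table would predict B first and A second, because C was followed by A once and by B once, with B being the more recent successor.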


Correlation Prefetching

Distance Prefetching forms delta correlations
– Kandiraju and Sivasubramaniam (ISCA ’02)

Delta-based prefetching leads to a much smaller table than “classical” Markov Prefetching

Delta-based prefetching can remove compulsory misses

Miss Address Stream: 27 28 29 27 28 29 28 29

Global Delta Stream: 1 1 -2 1 1 -1 1

[Figure: two correlation tables built from the streams above. Markov Prefetching: miss address (27, 28, 29), 1st predict., 2nd predict. Distance Prefetching: global delta (1, -1, -2), 1st predict., 2nd predict.]
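A minimal sketch of distance prefetching on the miss address stream 27 28 29 27 28 29 28 29 follows. The helper names and the most-recent-first table policy are assumptions for illustration, not the configuration evaluated in the talk. The example also shows why a delta-based table can remove compulsory misses: a predicted address is the current address plus a delta, so it may be an address that has never missed before.

```python
def delta_stream(miss_stream):
    """Differences between consecutive miss addresses."""
    return [b - a for a, b in zip(miss_stream, miss_stream[1:])]


def build_delta_table(deltas, width=2):
    """Map each delta to up to `width` following deltas, most recent first."""
    table = {}
    for prev, nxt in zip(deltas, deltas[1:]):
        followers = table.setdefault(prev, [])
        if nxt in followers:
            followers.remove(nxt)
        followers.insert(0, nxt)
        del followers[width:]
    return table


def predict(table, current_address, current_delta):
    """Prefetch addresses are the current address plus each predicted delta."""
    return [current_address + d for d in table.get(current_delta, [])]


misses = [27, 28, 29, 27, 28, 29, 28, 29]
deltas = delta_stream(misses)
table = build_delta_table(deltas)
predictions = predict(table, misses[-1], deltas[-1])
print(deltas)
print(table)
print(predictions)
```

Here the last prediction is address 30, which never appears in the miss stream; an address-keyed Markov table could not have produced it.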


Global History Buffer (GHB)

Holds miss address history in FIFO order

Linked lists within GHB connect related addresses
– Same static load
– Same global miss address
– Same global delta

Linked list walk is short compared with L2 miss latency

[Figure: the Load PC indexes the Index Table, whose entries point into the Global History Buffer, a FIFO of miss addresses]
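The structure can be sketched in software as a circular buffer plus a dictionary standing in for the Index Table. This is only an illustrative model: the class name, the use of absolute positions to detect overwritten entries, and the sample load PCs and addresses are all assumptions, not details from the talk.

```python
class GlobalHistoryBuffer:
    """FIFO of miss addresses plus an index table of linked-list heads."""

    def __init__(self, size):
        self.size = size
        self.entries = [None] * size   # each slot: (miss_address, link)
        self.count = 0                 # total number of misses inserted so far
        self.index_table = {}          # key -> absolute position of newest entry

    def _valid(self, pos):
        # A link is stale once the FIFO has overwritten the slot it points to.
        return pos is not None and self.count - self.size <= pos < self.count

    def insert(self, key, miss_address):
        link = self.index_table.get(key)   # previous entry with the same key
        self.entries[self.count % self.size] = (miss_address, link)
        self.index_table[key] = self.count
        self.count += 1

    def history(self, key, limit=None):
        """Walk the linked list for `key`, newest entry first."""
        out = []
        pos = self.index_table.get(key)
        while self._valid(pos) and (limit is None or len(out) < limit):
            address, pos = self.entries[pos % self.size]
            out.append(address)
        return out


ghb = GlobalHistoryBuffer(size=4)
# Hypothetical (load PC, miss address) pairs, keyed by load PC.
for pc, address in [(0x40, 100), (0x80, 500), (0x40, 108), (0x80, 540), (0x40, 116)]:
    ghb.insert(pc, address)
print(ghb.history(0x40))
print(ghb.history(0x80))
```

With four entries, the fifth insert overwrites the oldest miss (address 100), so the walk for load PC 0x40 stops after two addresses. Old history ages out of the FIFO without any explicit invalidation.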


GHB - Example

Miss Address Stream: 27 28 29 27 28 29 28

[Figure: Index Table (keyed by Global Miss Address) and Global History Buffer (fields: miss address, pointer) with its head pointer; the key marks the current miss (=> Current) and the addresses to prefetch (=> Prefetches)]


GHB – Deltas

Miss Address Stream: 27 28 36 44 45 49 53 54 62 70 71

Global Delta Stream: 1 8 8 1 4 4 1 8 8 1

[Figure: Markov Graph over the deltas 1, 4, and 8 with transition probabilities .3 and .7; the key marks the current delta (=> Current) and the deltas used to prefetch (=> Prefetches)]

Prefetches

Width:   71 + 8 => 79, 71 + 4 => 75
Depth:   71 + 8 => 79, 79 + 8 => 87
Hybrid:  71 + 8 => 79, 79 + 8 => 87, 71 + 4 => 75, 75 + 4 => 79
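The width, depth, and hybrid variants can be sketched with one function over the miss address stream 27 28 36 44 45 49 53 54 62 70 71. The function name `prefetch` and its `width` and `depth` parameters are assumptions for this example; a list scan stands in for the GHB linked-list walk.

```python
def prefetch(misses, width, depth):
    """Return prefetch addresses for the newest miss.

    width: how many earlier occurrences of the current delta to follow
    depth: how many deltas to replay after each occurrence
    """
    deltas = [b - a for a, b in zip(misses, misses[1:])]
    current = deltas[-1]
    # Earlier positions of the current delta, newest first.
    occurrences = [i for i in range(len(deltas) - 2, -1, -1) if deltas[i] == current]
    prefetches = []
    for i in occurrences[:width]:
        address = misses[-1]
        for d in deltas[i + 1 : i + 1 + depth]:
            address += d
            prefetches.append(address)
    return prefetches


misses = [27, 28, 36, 44, 45, 49, 53, 54, 62, 70, 71]
print("width ", prefetch(misses, width=2, depth=1))
print("depth ", prefetch(misses, width=1, depth=2))
print("hybrid", prefetch(misses, width=2, depth=2))
```

The current delta is 1 (71 - 70). Its most recent earlier occurrence was followed by 8, 8 and the one before that by 4, 4, which gives 79 and 75 for width, 79 and 87 for depth, and 79, 87, 75, 79 for hybrid. The sketch does not filter the duplicate 79; a real prefetcher would likely drop it.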


GHB – Hybrid Delta

Width prefetching suffers from poor accuracy and short look-ahead

Depth prefetching has good look-ahead, but may miss prefetch opportunities when a number of “next” addresses have similar probability

The hybrid method combines depth and width


GHB - Hybrid Example

Miss Address Stream: 27 28 36 44 45 49 53 54 62 70 71

Global Delta Stream: 1 8 8 1 4 4 1 8 8 1

[Figure: Index Table (keyed by Global Delta) and Global History Buffer (fields: miss address, pointer) with its head pointer; the key marks the current entry (=> Current) and the entries used to prefetch (=> Prefetches)]

Prefetches
71 + 8 => 79
79 + 8 => 87
71 + 4 => 75
75 + 4 => 79


Simulation Methodology

Simulated SPEC CPU2000 benchmarks

Fast forwarded 1 billion instructions and simulated 1 billion instructions

Used peak binaries compiled with -O4 optimization

Results include all benchmarks that have at least a 5% IPC improvement with an ideal L2 cache

Issue Width        4 Instructions
Load Store Queue   64 Entries
RUU Size           128 Entries
Level 1 D-Cache    16 KB, 2-way
Level 1 I-Cache    16 KB, 2-way
Level 2 Cache      512 KB, 4-way
Memory Latency     140 Cycles


Simulation Methodology

Table walk - one cycle per access

IT size reduces table conflicts

GHB size reflects prefetch history working set

In general, GHB prefetching requires less history

Prefetching Method                  Table Configuration                 Size
Conventional Distance Prefetching   512 Table Entries                   18 KB
GHB Distance Prefetching            512 IT Entries & 512 GHB Entries    8 KB


Results

Our results compare:
– IPC Improvement (harmonic mean) vs. Prefetch Degree
– Increase in Memory Traffic per instruction (arithmetic mean) vs. Prefetch Degree
– Prefetch Accuracy: the percent of prefetches that are used by the program


Distance Prefetching (Performance)

[Chart: IPC Improvement (axis 5% to 35%) vs. Prefetch Degree (1, 2, 4, 8, 16) for Table (width), GHB (width), GHB (depth), and GHB (hybrid)]


Distance Prefetching (Performance)

[Chart: IPC Improvement (axis -10% to 110%, with one off-scale bar labeled ~300%) per benchmark (ammp, art, wupwise, swim, lucas, mgrid, applu, galgel, apsi, mcf, twolf, vpr, parser, gap, bzip2) and the harmonic mean (hmean), for Table (width), GHB (width), GHB (depth), and GHB (hybrid)]


Distance Prefetching (Memory Traffic)

[Chart: Increase in Memory Traffic (axis 0% to 180%) vs. Prefetch Degree (1, 2, 4, 8, 16) for Table (width), GHB (width), GHB (depth), and GHB (hybrid)]


Age of Table History

[Chart: Prefetch Accuracy and Age Distribution (axis 0% to 60%) vs. Age in cycles (16, 256, 4K, 64K, 1M, 16M, 256M)]


Distance Prefetching (Memory Traffic)

[Chart: Increase in Memory Traffic (axis 0% to 180%) vs. Prefetch Degree (1, 2, 4, 8, 16) for Table (width), GHB (width), GHB (depth), and GHB (hybrid)]


Conclusions

More complete picture of history
– Allows width, depth, and hybrid prefetching
– Also can improve other prefetching methods (covered in depth in the paper)

Eliminates stale history in a natural way
– FIFO discards old history to make room for new history
– In a conventional table, old history can remain for a very long time and trigger inaccurate prefetches


Acknowledgements

This research was funded by:
– An Intel Undergraduate Research scholarship.
– A University of Wisconsin Hilldale Undergraduate Research fellowship.
– The National Science Foundation under grants CCR-0311361 and EIA-0071924.


Backup Slides


Prefetching Metrics

Accuracy is the percent of prefetches that are actually used.

Coverage is the percent of memory references prefetched rather than demand fetched.

Timeliness indicates if prefetched data arrives early enough to prevent the processor from stalling.
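Accuracy and coverage can be written as simple ratios. The sketch below is one reading of the definitions above; the function names and the counts in the example are made up for illustration and are not results from the study. Timeliness is left out because it depends on when each prefetch arrives, not on counts alone.

```python
def accuracy(used_prefetches, issued_prefetches):
    """Percent of issued prefetches that the program actually used."""
    return 100.0 * used_prefetches / issued_prefetches


def coverage(used_prefetches, demand_fetches):
    """Percent of fetched references served by a prefetch, not a demand fetch."""
    return 100.0 * used_prefetches / (used_prefetches + demand_fetches)


# Hypothetical counts: 200 prefetches issued, 150 used, 350 demand fetches.
print(accuracy(used_prefetches=150, issued_prefetches=200))
print(coverage(used_prefetches=150, demand_fetches=350))
```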


GHB – Deltas

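Miss Address Stream: 27 28 36 44 45 49 53 54 62 70 71

Global Delta Stream: 1 8 8 1 4 4 1 8 8 1

[Figure: Markov Graph over the deltas 1, 4, and 8 with transition probabilities .3, .7, and 1; the key marks the current delta (=> Current) and the deltas used to prefetch (=> Prefetches)]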


Prefetch Taxonomy

To simplify the discussion and illustrate the relation between prefetching methods, we introduce a consistent naming convention.

Each name is an X/Y pair.
– X is the key used for localizing the address stream.
– Y is the method for detecting address patterns.


Prefetch Taxonomy

We study two localizing methods
– No localization or global (G)
– Program Counter (PC)

And three pattern detection methods
– Address Correlation (AC)
– Delta Correlation (DC)
– Constant Stride (CS)


Prefetch Taxonomy

Markov Prefetching - G/AC

Distance Prefetching - G/DC

Stride Prefetching - PC/CS


Stride Prefetching

Table tracks the local history of loads.

If a constant stride is detected in a load’s local history, then n + s, n + 2s, …, n + ds are prefetched.
– n is the current target address
– s is the detected stride
– d is the prefetch degree or aggressiveness of the prefetching.
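A small sketch of table-based stride prefetching follows. The class `StrideTable` and its detection rule (prefetch once the same non-zero stride has been seen twice in a row for a load PC) are simplifying assumptions; they stand in for the state field of a real reference prediction table. The load PC and addresses are hypothetical.

```python
def stride_prefetches(n, s, d):
    """Addresses n + s, n + 2s, ..., n + ds."""
    return [n + k * s for k in range(1, d + 1)]


class StrideTable:
    """Per-load stride detector, keyed by the PC of the load."""

    def __init__(self, degree):
        self.degree = degree
        self.table = {}  # load PC -> (last address, last stride)

    def access(self, pc, address):
        last = self.table.get(pc)
        stride = None if last is None else address - last[0]
        self.table[pc] = (address, stride)
        # Prefetch only when the same non-zero stride repeats (an assumption).
        if last is not None and stride != 0 and stride == last[1]:
            return stride_prefetches(address, stride, self.degree)
        return []


rpt = StrideTable(degree=2)
results = [rpt.access(0x40, address) for address in (100, 164, 228)]
print(results)
```

The first two accesses only establish the stride of 64; the third confirms it and prefetches 292 and 356.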


Stride Prefetching

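[Figure: Reference Prediction Table, indexed by the PC of the load, with fields Tag, Last Address, Stride, and State; the Target Address and the Last Address feed a subtractor (sub), and an adder (add) produces the Prefetch Address]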


GHB – Stride Prefetching

GHB-Stride uses the PC to access the index table.

The linked lists contain the local history of each load.

Compare the last two local strides. If they are the same, prefetch n + s, n + 2s, …, n + ds.

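[Figure: the PC indexes the Index Table, whose pointer leads to the load's linked list in the Global History Buffer (fields: miss address, pointer; miss addresses A B C A B C B, with the FIFO head pointer); the two most recent local strides are compared (=?)]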


GHB – Local Delta Correlation

Form delta correlations within each load’s local history.

For example, consider the local miss address stream:

Addresses   0 1 2 64 65 66 128 129
Deltas      1 1 62 1 1 62 1

Correlation   Prefetch Predictions
(1, 1)        62 1 1
(1, 62)       1 1 62
(62, 1)       1 62 1
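One way to sketch local delta correlation on the example stream: take the last two deltas as the correlation key, search the load's local history backwards for an earlier occurrence of that pair, and replay the deltas that followed it. The function name and the list-based search are assumptions for illustration; in the GHB the local history would come from a linked-list walk.

```python
def local_delta_correlation(addresses, degree):
    """Prefetch by replaying the deltas that followed the last delta pair."""
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    if len(deltas) < 3:
        return []
    pair = (deltas[-2], deltas[-1])
    # Search backwards for an earlier occurrence of the same delta pair.
    for i in range(len(deltas) - 2, 0, -1):
        if (deltas[i - 1], deltas[i]) == pair:
            prefetches = []
            address = addresses[-1]
            for d in deltas[i + 1 : i + 1 + degree]:
                address += d
                prefetches.append(address)
            return prefetches
    return []


local_misses = [0, 1, 2, 64, 65, 66, 128, 129]
print(local_delta_correlation(local_misses, degree=3))
```

The last two deltas are (62, 1). The earlier occurrence of that pair was followed by 1, 62, 1, so the sketch prefetches 130, 192, and 193, matching the (62, 1) row of the table.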


PC Local (Memory Traffic)

[Chart: Increase in Memory Traffic (axis 0% to 20%) vs. Prefetch Degree (1, 2, 4, 8, 16) for Table Stride, GHB Stride, and GHB PC/DC]


PC Local (Performance)

[Chart: IPC Improvement (axis 15% to 40%) vs. Prefetch Degree (1, 2, 4, 8, 16) for Table Stride, GHB Stride, and GHB PC/DC]


PC Local (Performance)

[Chart: IPC Improvement (axis 0% to 100%, with one off-scale bar labeled ~500%) per benchmark (ammp, art, wupwise, swim, lucas, mgrid, applu, galgel, apsi, mcf, twolf, vpr, parser, gap, bzip2) and the harmonic mean (hmean), for Table Stride, GHB Stride, and GHB PC/DC]