Electrical and Computer Engineering, University of Wisconsin - Madison
Prefetching Using a Global History Buffer
Kyle J. Nesbit and James E. Smith
February 2004 2/19
Outline
Motivation
Related Work
Global History Buffer Prefetching
Results
Conclusion
Motivation
D-Cache misses to main memory are of increasing importance
– Main memory is getting farther away (in clock cycles)
– Many demanding, memory-intensive workloads
Computation is inexpensive compared to data accesses
– Good opportunity to reevaluate prefetching data structures
– Simple computation can supplement table information
We consider prefetches from main memory to the lowest-level cache (the L2 cache in this study)
Markov Prefetching
Markov prefetching forms address correlations
– Joseph and Grunwald (ISCA ‘97)
Uses global memory addresses as states in the Markov graph
Correlation Table approximates Markov graph
[Figure: the Markov Graph over states A, B and C (transition probabilities 1 and .5) and the Correlation Table (columns: miss address, 1st predict., 2nd predict.) built from the stream below]
Miss Address Stream: A B C A B C B C . . .
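As a rough illustration of how a correlation table approximates the Markov graph, the sketch below builds both from the miss address stream A B C A B C B C. The function names and the most-recent-successor-first ordering of the predictions are assumptions made for this example, not details taken from the slides.

```python
def build_correlation_table(miss_stream, width=2):
    """Map each miss address to up to `width` successors, newest first.

    Assumption: predictions are ordered most-recently-seen first.
    """
    table = {}
    for prev, nxt in zip(miss_stream, miss_stream[1:]):
        successors = table.setdefault(prev, [])
        if nxt in successors:
            successors.remove(nxt)
        successors.insert(0, nxt)   # newest successor becomes 1st prediction
        del successors[width:]      # keep only `width` predictions per entry
    return table


def markov_probabilities(miss_stream):
    """Transition probabilities of the Markov graph for the same stream."""
    counts = {}
    for prev, nxt in zip(miss_stream, miss_stream[1:]):
        row = counts.setdefault(prev, {})
        row[nxt] = row.get(nxt, 0) + 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }


stream = list("ABCABCBC")
table = build_correlation_table(stream)
probs = markov_probabilities(stream)
print(table)   # predictions per miss address
print(probs)   # A->B and B->C have probability 1; C splits .5/.5
```

The table keeps only a fixed number of successors per miss address, which is why it approximates the graph rather than storing it exactly.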
Correlation Prefetching
Distance Prefetching forms delta correlations
– Kandiraju and Sivasubramaniam (ISCA ‘02)
Delta-based prefetching leads to a much smaller table than “classical” Markov Prefetching
Delta-based prefetching can remove compulsory misses
[Figure: a Markov Prefetching table (miss address, 1st predict., 2nd predict.) beside a Distance Prefetching table (global delta, 1st predict., 2nd predict.), both built from the streams below]
Miss Address Stream: 27 28 29 27 28 29 28 29
Global Delta Stream: 1 1 -2 1 1 -1 1
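A minimal sketch of delta correlation, assuming a table keyed by the last global delta with most-recent-first predictions (both are choices made for this example). It also shows why a delta-based table can cover compulsory misses: a learned delta pattern applies to an address that has never been seen.

```python
def deltas(miss_stream):
    """Differences between consecutive miss addresses."""
    return [b - a for a, b in zip(miss_stream, miss_stream[1:])]


def build_delta_table(miss_stream, width=2):
    """Map each global delta to up to `width` following deltas, newest first."""
    table = {}
    ds = deltas(miss_stream)
    for prev, nxt in zip(ds, ds[1:]):
        successors = table.setdefault(prev, [])
        if nxt in successors:
            successors.remove(nxt)
        successors.insert(0, nxt)
        del successors[width:]
    return table


def predict(table, last_addr, last_delta):
    """Prefetch addresses: the current address plus each predicted delta."""
    return [last_addr + d for d in table.get(last_delta, [])]


stream = [27, 28, 29, 27, 28, 29, 28, 29]
delta_stream = deltas(stream)
delta_table = build_delta_table(stream)
print(delta_stream)                    # matches the Global Delta Stream above
print(delta_table)
print(predict(delta_table, 100, -2))   # address 100 never appeared in the stream
```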
Global History Buffer (GHB)
Holds miss address history in FIFO order
Linked lists within GHB connect related addresses
– Same static load
– Same global miss address
– Same global delta
Linked list walk is short compared with L2 miss latency
[Figure: an Index Table (labelled Load PC) pointing into the Global History Buffer, a FIFO of miss addresses]
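GHB - Example
[Figure: an Index Table keyed by Global Miss Address (27, 28, 29), each entry holding a head pointer into the Global History Buffer; every buffer entry holds a miss address and a pointer to the previous entry with the same address; a key marks the current miss and the resulting prefetches]
Miss Address Stream: 27 28 29 27 28 29 28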
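The structure in this example can be sketched as a circular buffer plus an index table. The class below is an illustrative model only: the class and method names, the use of sequence numbers as pointers, and the validity check are assumptions made for this sketch, not the hardware design from the talk.

```python
class GlobalHistoryBuffer:
    """FIFO of miss addresses with per-key linked lists (illustrative model)."""

    def __init__(self, size=8):
        self.size = size
        self.addr = [None] * size   # miss address stored in each slot
        self.link = [None] * size   # sequence number of previous entry, same key
        self.index_table = {}       # key -> sequence number of newest entry
        self.next_seq = 0

    def _valid(self, seq):
        # A pointer is stale once the FIFO has overwritten its slot.
        return seq is not None and self.next_seq - self.size <= seq < self.next_seq

    def insert(self, key, miss_addr):
        seq = self.next_seq
        slot = seq % self.size
        self.addr[slot] = miss_addr
        self.link[slot] = self.index_table.get(key)
        self.index_table[key] = seq
        self.next_seq += 1

    def chain(self, key):
        """Sequence numbers of entries with this key, newest first."""
        seq = self.index_table.get(key)
        while self._valid(seq):
            yield seq
            seq = self.link[seq % self.size]

    def successors(self, key, degree=2):
        """Addresses that followed earlier occurrences of `key`, newest first."""
        out = []
        for seq in self.chain(key):
            nxt = seq + 1
            if nxt < self.next_seq:
                candidate = self.addr[nxt % self.size]
                if candidate not in out:
                    out.append(candidate)
            if len(out) == degree:
                break
        return out


stream = [27, 28, 29, 27, 28, 29, 28, 29]

ghb = GlobalHistoryBuffer(size=8)
for a in stream:
    ghb.insert(key=a, miss_addr=a)   # key = global miss address
print(ghb.successors(29))            # addresses that followed earlier misses to 29

small = GlobalHistoryBuffer(size=4)
for a in stream:
    small.insert(key=a, miss_addr=a)
print(small.successors(27))          # history for 27 has aged out of the FIFO
```

With the smaller buffer, the only entry for address 27 has been overwritten, so its index table pointer is treated as stale and no prefetch is produced.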
GHB – Deltas
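[Figure: the Markov Graph over global deltas 1, 4 and 8 (edge probabilities .3 and .7), shown for Width, Depth and Hybrid prefetching; a key marks the current delta and the prefetches]
Miss Address Stream: 27 28 36 44 45 49 53 54 62 70 71
Global Delta Stream: 1 8 8 1 4 4 1 8 8 1 (the final 1 is the current delta)
Width prefetches: 71 + 8 => 79, 71 + 4 => 75
Depth prefetches: 71 + 8 => 79, 79 + 8 => 87
Hybrid prefetches: 71 + 8 => 79, 79 + 8 => 87, 71 + 4 => 75, 75 + 4 => 79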
GHB – Hybrid Delta
Width prefetching suffers from poor accuracy and short look-ahead
Depth prefetching has good look-ahead, but may miss prefetch opportunities when a number of “next” addresses have similar probability
The hybrid method combines depth and width
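The three policies can be compared with a small sketch over a global delta history. It assumes the key is the single most recent delta and that earlier occurrences are visited newest first; the function name and its `width`/`depth` parameters are illustrative. The input is the stream 27 28 36 44 45 49 53 54 62 70 71 from the GHB – Deltas example.

```python
def delta_prefetch(addrs, width, depth):
    """Prefetch addresses for the newest miss in `addrs`.

    width: how many distinct "next" deltas to follow
    depth: how many deltas to replay along each followed path
    """
    ds = [b - a for a, b in zip(addrs, addrs[1:])]
    if not ds:
        return []
    current = ds[-1]
    followed, out = set(), []
    # Visit earlier occurrences of the current delta, newest first.
    for i in range(len(ds) - 2, -1, -1):
        if ds[i] != current or ds[i + 1] in followed:
            continue
        followed.add(ds[i + 1])
        addr = addrs[-1]
        for d in ds[i + 1 : i + 1 + depth]:
            addr += d
            if addr not in out:    # avoid issuing the same prefetch twice
                out.append(addr)
        if len(followed) == width:
            break
    return out


misses = [27, 28, 36, 44, 45, 49, 53, 54, 62, 70, 71]
width_only = delta_prefetch(misses, width=2, depth=1)
depth_only = delta_prefetch(misses, width=1, depth=2)
hybrid = delta_prefetch(misses, width=2, depth=2)
print(width_only, depth_only, hybrid)
```

Under these assumptions, width prefetching yields 79 and 75, depth prefetching yields 79 and 87, and the hybrid covers both paths; address 79 is reached on both paths and is issued once.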
GHB - Hybrid Example
[Figure: an Index Table keyed by Global Delta (1, 4, 8) with head pointers into the Global History Buffer (miss address, pointer), which holds 27 28 36 44 45 49 53 54 62 70 and the current miss 71; a key marks the current miss and the prefetches]
Miss Address Stream: 27 28 36 44 45 49 53 54 62 70 71
Global Delta Stream: 1 8 8 1 4 4 1 8 8 1 (the final 1 is the current delta)
Prefetches: 71 + 8 => 79, 79 + 8 => 87, 71 + 4 => 75, 75 + 4 => 79
Simulation Methodology
Simulated SPEC CPU2000 benchmarks
Fast-forwarded 1 billion instructions, then simulated 1 billion instructions
Used peak binaries compiled with -O4 optimization
Results include all benchmarks that have at least a 5% IPC improvement with an ideal L2 cache
Issue Width: 4 Instructions
Load Store Queue: 64 Entries
RUU Size: 128 Entries
Level 1 D-Cache: 16 KB, 2-way
Level 1 I-Cache: 16 KB, 2-way
Level 2 Cache: 512 KB, 4-way
Memory Latency: 140 Cycles
Simulation Methodology
Table walk - one cycle per access
IT size reduces table conflicts
GHB size reflects prefetch history working set
In general, GHB prefetching requires less history
Prefetching Method: Table Configuration, Size
Conventional Distance Prefetching: 512 Table Entries, 18 KB
GHB Distance Prefetching: 512 IT Entries & 512 GHB Entries, 8 KB
Results
Our results compare:
– IPC Improvement (harmonic mean) vs. Prefetch Degree
– Increase in Memory Traffic per instruction (arithmetic mean) vs. Prefetch Degree
– Prefetch Accuracy: the percent of prefetches that are used by the program
Distance Prefetching (Performance)
[Figure: IPC Improvement (5% to 35%) vs. Prefetch Degree (1, 2, 4, 8, 16) for Table (width), GHB (width), GHB (depth) and GHB (hybrid)]
Distance Prefetching (Performance)
[Figure: IPC Improvement (-10% to 110%, with one off-scale value annotated ~300%) per benchmark (ammp, art, wupwise, swim, lucas, mgrid, applu, galgel, apsi, mcf, twolf, vpr, parser, gap, bzip2, hmean) for Table (width), GHB (width), GHB (depth) and GHB (hybrid)]
Distance Prefetching (Memory Traffic)
[Figure: Increase in Memory Traffic (0% to 180%) vs. Prefetch Degree (1, 2, 4, 8, 16) for Table (width), GHB (width), GHB (depth) and GHB (hybrid)]
Age of Table History
[Figure: Prefetch Accuracy and Age Distribution (0% to 60%) vs. Age in cycles (16, 256, 4K, 64K, 1M, 16M, 256M)]
Conclusions
More complete picture of history
– Allows width, depth, and hybrid
– Also can improve other prefetching methods (covered in depth in the paper)
Eliminates stale history in a natural way
– FIFO discards old history to make room for new history
– In a conventional table, old history can remain for a very long time and trigger inaccurate prefetches
Acknowledgements
This research was funded by:
– An Intel Undergraduate Research scholarship.
– A University of Wisconsin Hilldale Undergraduate Research fellowship.
– The National Science Foundation under grants CCR-0311361 and EIA-0071924.
Backup Slides
Prefetching Metrics
Accuracy is the percent of prefetches that are actually used.
Coverage is the percent of memory references prefetched rather than demand fetched.
Timeliness indicates if prefetched data arrives early enough to prevent the processor from stalling.
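As a worked example of the first two metrics, the sketch below computes accuracy and coverage from raw counts. The function names and the counts are hypothetical, chosen only to make the arithmetic concrete; they are not results from the study.

```python
def accuracy(used_prefetches, issued_prefetches):
    """Percent of issued prefetches that the program actually used."""
    if issued_prefetches == 0:
        return 0.0
    return 100.0 * used_prefetches / issued_prefetches


def coverage(used_prefetches, demand_fetches):
    """Percent of references served by a prefetch instead of a demand fetch."""
    total = used_prefetches + demand_fetches
    if total == 0:
        return 0.0
    return 100.0 * used_prefetches / total


# Hypothetical counts: 40 prefetches issued, 30 of them used,
# and 70 references that still had to be demand fetched.
acc = accuracy(used_prefetches=30, issued_prefetches=40)
cov = coverage(used_prefetches=30, demand_fetches=70)
print(acc, cov)
```

Timeliness is not captured by counts like these, since it depends on when each prefetched line arrives relative to its use.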
GHB – Deltas
[Figure: the Markov Graph over global deltas 1, 4 and 8 (edge probabilities .3 and .7); a key marks the current delta and the prefetches]
Miss Address Stream: 27 28 36 44 45 49 53 54 62 70 71
Global Delta Stream: 1 8 8 1 4 4 1 8 8 1 (the final 1 is the current delta)
Prefetch Taxonomy
To simplify the discussion and illustrate the relation between prefetching methods, we introduce a consistent naming convention.
Each name is an X/Y pair.
– X is the key used for localizing the address stream.
– Y is the method for detecting address patterns.
Prefetch Taxonomy
We study two localizing methods
– No localization or global (G)
– Program Counter (PC)
And three pattern detection methods
– Address Correlation
– Delta Correlation
– Constant Stride
Prefetch Taxonomy
Markov Prefetching - G/AC
Distance Prefetching - G/DC
Stride Prefetching - PC/CS
Stride Prefetching
Table tracks the local history of loads.
If a constant stride is detected in a load’s local history, then n + s, n + 2s, …, n + ds are prefetched.
– n is the current target address
– s is the detected stride
– d is the prefetch degree or aggressiveness of the prefetching.
Stride Prefetching
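[Figure: the Reference Prediction Table, indexed by the PC of the load, with fields Tag, Last Address, Stride and State; a subtractor (sub) and an adder (add) combine the Target Address with the table fields to produce the Prefetch Address]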
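A simplified sketch of table-based stride prefetching follows. It replaces the State field with a single "confirmed" flag that is set when the same non-zero stride is seen twice in a row; that simplification and all names here are assumptions made for the example.

```python
class StridePrefetcher:
    """Per-load stride detection with a simplified two-state confidence."""

    def __init__(self, degree=2):
        self.degree = degree
        self.table = {}   # load PC -> (last address, last stride)

    def access(self, pc, addr):
        """Record a load's target address and return addresses to prefetch."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = (addr, 0)
            return []
        last_addr, last_stride = entry
        stride = addr - last_addr                 # the "sub" step
        self.table[pc] = (addr, stride)
        confirmed = stride != 0 and stride == last_stride
        if not confirmed:
            return []
        # The "add" step: n + s, n + 2s, ..., n + ds
        return [addr + k * stride for k in range(1, self.degree + 1)]


rpt = StridePrefetcher(degree=2)
results = [rpt.access(pc=0x40, addr=a) for a in (100, 108, 116)]
print(results)   # prefetches start once the stride of 8 repeats
```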
GHB – Stride Prefetching
GHB-Stride uses the PC to access the index table.
The linked lists contain the local history of each load.
Compare the last two local strides. If they are the same, prefetch n + s, n + 2s, …, n + ds.
[Figure: an Index Table keyed by PC with head pointers into the Global History Buffer (miss address, pointer); the last two strides along a load's linked list are compared (=?)]
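The same stride check can be sketched on top of a PC-indexed history buffer. For brevity this model stores the buffer as a dictionary keyed by sequence number and evicts the oldest entry on each insert; that storage choice and the names are assumptions made for this example rather than the organization shown in the slides.

```python
class GHBStride:
    """PC-localized history in a FIFO, with a constant-stride check."""

    def __init__(self, size=256, degree=2):
        self.size = size
        self.degree = degree
        self.entries = {}       # seq -> (miss address, seq of previous miss, same PC)
        self.index_table = {}   # PC -> seq of that load's newest miss
        self.next_seq = 0

    def miss(self, pc, addr):
        seq = self.next_seq
        self.entries[seq] = (addr, self.index_table.get(pc))
        self.index_table[pc] = seq
        self.next_seq += 1
        self.entries.pop(seq - self.size, None)   # FIFO: discard the oldest entry

        # Walk the load's linked list to collect its last three miss addresses.
        history, cur = [], seq
        while cur in self.entries and len(history) < 3:
            a, cur = self.entries[cur]
            history.append(a)
        if len(history) < 3:
            return []
        n, prev, prev2 = history
        stride = n - prev
        if stride == 0 or stride != prev - prev2:
            return []
        return [n + k * stride for k in range(1, self.degree + 1)]


ghb = GHBStride(size=8, degree=2)
trace = [("A", 100), ("B", 7), ("A", 164), ("B", 9), ("A", 228), ("B", 50)]
out = [ghb.miss(pc, addr) for pc, addr in trace]
print(out)   # only load A shows a constant stride (64)
```

Because the two loads are interleaved in the global stream, the per-PC links are what recover each load's local history.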
GHB – Local Delta Correlation
Form delta correlations within each load’s local history.
For example, consider the local miss address stream:
Addresses: 0 1 2 64 65 66 128 129
Deltas: 1 1 62 1 1 62 1
Correlation: Prefetch Predictions
(1, 1): 62 1 1
(1, 62): 1 1 62
(62, 1): 1 62 1
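The local delta correlation example can be reproduced with a short sketch. It assumes the correlation key is the last two deltas of one load's history and that predictions replay the deltas that followed the most recent earlier occurrence of that pair; the function names are illustrative.

```python
def predict_deltas(deltas, degree=3):
    """Deltas that followed the most recent earlier occurrence of the last pair."""
    if len(deltas) < 3:
        return []
    key = (deltas[-2], deltas[-1])
    # i is the index of the second element of a candidate pair, newest first.
    for i in range(len(deltas) - 2, 0, -1):
        if (deltas[i - 1], deltas[i]) == key:
            return deltas[i + 1 : i + 1 + degree]
    return []


def pcdc_prefetch(addresses, degree=3):
    """Prefetch addresses for one load's local miss address stream."""
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    out, addr = [], addresses[-1]
    for d in predict_deltas(deltas, degree):
        addr += d
        out.append(addr)
    return out


local = [0, 1, 2, 64, 65, 66, 128, 129]
print(predict_deltas([1, 1, 62, 1, 1]))       # key (1, 1)
print(predict_deltas([1, 1, 62, 1, 1, 62]))   # key (1, 62)
print(pcdc_prefetch(local))                   # key (62, 1), current address 129
```

A constant-stride detector would not capture this pattern, since the delta 62 breaks the stride of 1 on every third miss.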
PC Local (Memory Traffic)
[Figure: Increase in Memory Traffic (0% to 20%) vs. Prefetch Degree (1, 2, 4, 8, 16) for Table Stride, GHB Stride and GHB PC/DC]
PC Local (Performance)
[Figure: IPC Improvement (15% to 40%) vs. Prefetch Degree (1, 2, 4, 8, 16) for Table Stride, GHB Stride and GHB PC/DC]
PC Local (Performance)
[Figure: IPC Improvement (0% to 100%, with one off-scale value annotated ~500%) per benchmark (ammp, art, wupwise, swim, lucas, mgrid, applu, galgel, apsi, mcf, twolf, vpr, parser, gap, bzip2, hmean) for Table Stride, GHB Stride and GHB PC/DC]