Exploiting Compressed Block Size as an Indicator of Future Reuse
Transcript of Exploiting Compressed Block Size as an Indicator of Future Reuse
![Page 1: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/1.jpg)
Exploiting Compressed Block Size as an Indicator of Future Reuse
Gennady Pekhimenko,
Tyler Huberty, Rui Cai,
Onur Mutlu, Todd C. Mowry
Phillip B. Gibbons,
Michael A. Kozuch
![Page 2: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/2.jpg)
Executive Summary
2
• In a compressed cache, compressed block size is an additional dimension
• Problem: How can we maximize cache performance utilizing both block reuse and compressed size?
• Observations: – Importance of the cache block depends on its compressed size
– Block size is indicative of reuse behavior in some applications
• Key Idea: Use compressed block size in making cache replacement and insertion decisions
• Results:– Higher performance (9%/11% on average for 1/2-core)
– Lower energy (12% on average)
![Page 3: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/3.jpg)
Potential for Data Compression
3Compression Ratio
Late
ncy
0.5 1.51.0 2.0
FPC
X
C-Pack
X
BDI
X
SC2
X
Frequent Pattern Compression (FPC)
[Alameldeen+, ISCA’04]
C-Pack [Chen+, Trans. on VLSI’10]
Base-Delta-Immediate (BDI)[Pekhimenko+, PACT’12]
Statictical Compression(SC2) [Arelakis+, ISCA’14]
5 cycles
9 cycles
1-2 cycles
8-14 cycles
Can we do better?All use LRU replacement policyAll these works showed significant
performance improvements
![Page 4: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/4.jpg)
Size-Aware Cache Management
1. Compressed block size matters
2. Compressed block size varies
3. Compressed block size can indicate reuse
4
![Page 5: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/5.jpg)
#1: Compressed Block Size Matters
5
A Y B
Cache:
Xmiss hit
A Yhit
Bmiss
Cmiss
C
time
Xmiss hit
B Ctime
A Ymiss hit hit
Belady’s OPT Replacement
Size-Aware Replacement
saved cycles
Size-aware policies could yield fewer misses
X A Y B CAccess stream:
large small
![Page 6: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/6.jpg)
#2: Compressed Block Size Varies
6
0%
20%
40%
60%
80%
100%
1 2 3 4 5 6 7 8 9
Cac
he
Co
vera
ge, %
Series1 Series2 Series3 Series4 Series5
Series6 Series7 Series8 Series9
Compressed block size varies within and between applications
Block size, bytes
BDI compression algorithm [Pekhimenko+, PACT’12]
![Page 7: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/7.jpg)
#3: Block Size Can Indicate Reuse
7
0
2000
4000
6000
1 8 16 20 24 34 36 40 64
Re
use
Dis
tan
ce
(# o
f m
em
ory
acce
sse
s) bzip2
Block size, bytes
Different sizes have different dominant reuse distances
![Page 8: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/8.jpg)
Code Example to Support Intuition
int A[N]; // small indices: compressible
double B[16]; // FP coefficients: incompressible
for (int i=0; i<N; i++) {
int idx = A[i];
for (int j=0; j<N; j++) {
sum += B[(idx+j)%16];
}
}
8
long reuse, compressible
short reuse, incompressible
Compressed size can be an indicator of reuse distance
![Page 9: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/9.jpg)
Size-Aware Cache Management
1. Compressed block size matters
2. Compressed block size varies
3. Compressed block size can indicate reuse
9
![Page 10: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/10.jpg)
Outline
10
• Motivation
• Key Observations
• Key Ideas:• Minimal-Value Eviction (MVE)
• Size-based Insertion Policy (SIP)
• Evaluation
• Conclusion
![Page 11: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/11.jpg)
Compression-Aware Management Policies (CAMP)
11
CAMP
MVE: Minimal-Value
Eviction
SIP:Size-based
Insertion Policy
![Page 12: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/12.jpg)
Define a value (importance) of the block to the cacheEvict the block with the lowest value
Minimal-Value Eviction (MVE): Observations
12
Block #0Block #1
Block #N
…
Highest priority: MRU
Lowest priority: LRU
Set:
Block #XBlock #X
… Importance of the block depends on both the likelihood of reference and the compressed block size
Highest value
Lowest value
![Page 13: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/13.jpg)
Minimal-Value Eviction (MVE): Key Idea
13
Value = Probability of reuse
Compressed block size
Probability of reuse:• Can be determined in many different ways• Our implementation is based on re-reference
interval prediction value (RRIP [Jaleel+, ISCA’10])
![Page 14: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/14.jpg)
Compression-Aware Management Policies (CAMP)
14
CAMP
MVE: Minimal-Value
Eviction
SIP:Size-based
Insertion Policy
![Page 15: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/15.jpg)
Size-based Insertion Policy (SIP):Observations• Sometimes there is a relation between the
compressed block size and reuse distance
15
compressed block sizedata
structure
• This relation can be detected through the compressed block size
• Minimal overhead to track this relation (compressed block information is a part of design)
reuse distance
![Page 16: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/16.jpg)
Size-based Insertion Policy (SIP):Key Idea
• Insert blocks of a certain size with higher priority if this improves cache miss rate
• Use dynamic set sampling [Qureshi, ISCA’06] to detect which size to insert with higher priority
16
set A
set B
set C
set D
set E
set G
set H
set F
set I
Main Tags
set A 8B
Auxiliary Tags
set D 64B
set F 8B
set I 64B
set A
set F
+ CTR8B -
…miss
set F 8B
set A 8B
…
miss
decides policyfor steady state
![Page 17: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/17.jpg)
Outline
17
• Motivation
• Key Observations
• Key Ideas:• Minimal-Value Eviction (MVE)
• Size-based Insertion Policy (SIP)
• Evaluation
• Conclusion
![Page 18: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/18.jpg)
Methodology
18
• Simulator– x86 event-driven simulator (MemSim [Seshadri+, PACT’12])
• Workloads– SPEC2006 benchmarks, TPC, Apache web server
– 1 – 4 core simulations for 1 billion representative instructions
• System Parameters– L1/L2/L3 cache latencies from CACTI
– BDI (1-cycle decompression) [Pekhimenko+, PACT’12]
– 4GHz, x86 in-order core, cache size (1MB - 16MB)
![Page 19: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/19.jpg)
Evaluated Cache Management Policies
19
Design Description
LRU Baseline LRU policy- size-unaware
RRIP Re-reference Interval Prediction [Jaleel+, ISCA’10] - size-unaware
![Page 20: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/20.jpg)
Evaluated Cache Management Policies
20
Design Description
LRU Baseline LRU policy- size-unaware
RRIP Re-reference Interval Prediction [Jaleel+, ISCA’10] - size-unaware
ECM Effective Capacity Maximizer [Baek+, HPCA’13]+ size-aware
![Page 21: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/21.jpg)
Evaluated Cache Management Policies
21
Design Description
LRU Baseline LRU policy- size-unaware
RRIP Re-reference Interval Prediction [Jaleel+, ISCA’10] - size-unaware
ECM Effective Capacity Maximizer [Baek+, HPCA’13]+ size-aware
CAMP Compression-Aware Management Policies(MVE + SIP)
![Page 22: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/22.jpg)
Size-Aware Replacement
22
• Effective Capacity Maximizer (ECM)
[Baek+, HPCA’13]– Inserts “big” blocks with lower priority
– Uses heuristic to define the threshold
• Shortcomings– Coarse-grained
– No relation between block size and reuse
– Not easily applicable to other cache organizations
![Page 23: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/23.jpg)
CAMP Single-Core
23
0.8
0.9
1
1.1
1.2
1.3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
No
rmal
ize
d IP
C
Series1 Series2 Series3
SPEC2006, databases, web workloads, 2MB L2 cache
CAMP improves performance over LRU, RRIP, ECM
31%
8%5%
![Page 24: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/24.jpg)
Multi-Core Results
• Classification based on the compressed block size distribution– Homogeneous (1 or 2 dominant block size(s))
– Heterogeneous (more than 2 dominant sizes)
• We form three different 2-core workload groups (20 workloads in each):– Homo-Homo
– Homo-Hetero
– Hetero-Hetero
24
![Page 25: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/25.jpg)
Multi-Core Results (2)
25
0.95
1.00
1.05
1.10
1 2 3 4
No
rmal
ize
d W
eig
hte
d S
pe
ed
up
Series1 Series2 Series3
CAMP outperforms LRU, RRIP and ECM policiesHigher benefits with higher heterogeneity in size
![Page 26: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/26.jpg)
Effect on Memory Subsystem Energy
26
0.7
0.8
0.9
1
1.1
1.2as
tar
cact
usA
DM gcc
gob
mk
h2
64
ref
sop
lex
zeu
smp
bzi
p2
Gem
sFD
TDgr
om
acs
lesl
ie3
dp
erlb
ench
sjen
gsp
hin
x3xa
lan
cbm
km
ilco
mn
etp
plib
qu
antu
mlb
mm
cftp
ch2
tpch
6ap
ach
e
GM
ean
Inte
nse
GM
ean
All
No
rmal
ize
d E
ne
rgy RRIP ECM CAMP
CAMP reduces energy consumption in the memory subsystem
5%
L1/L2 caches, DRAM, NoC, compression/decompression
![Page 27: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/27.jpg)
CAMP for Global Replacement
27
G-CAMP
G-MVE: Global Minimal-Value Eviction
G-SIP:Global Size-based
Insertion Policy
CAMP with V-Way [Qureshi+, ISCA’05] cache design
Reuse Replacement instead of RRIPSet Dueling instead of Set Sampling
![Page 28: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/28.jpg)
G-CAMP Single-Core
28
SPEC2006, databases, web workloads, 2MB L2 cache
G-CAMP improves performance over LRU, RRIP, V-Way
0.8
0.9
1
1.1
1.2
1.3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
No
rmal
ize
d IP
C
Series1 Series2 Series3 Series4 Series563%-97%
14%9%
![Page 29: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/29.jpg)
Storage Bit Count Overhead
29
Designs(with BDI)
LRU CAMP V-Way G-CAMP
tag-entry +14b +14b +15b +19b
data-entry +0b +0b +16b +32b
total +9% +11% +12.5% +17%
Over uncompressed cache with LRU
2% 4.5%
![Page 30: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/30.jpg)
Cache Size Sensitivity
30
1
1.1
1.2
1.3
1.4
1.5
1.6
1 2 3 4 5
No
rmal
ize
d IP
C Series1 Series2 Series3
CAMP/G-CAMP outperforms prior policies for all $ sizesG-CAMP outperforms LRU with 2X cache sizes
![Page 31: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/31.jpg)
Other Results and Analyses in the Paper
• Block size distribution for different applications
• Sensitivity to the compression algorithm
• Comparison with uncompressed cache
• Effect on cache capacity
• SIP vs. PC-based replacement policies
• More details on multi-core results
31
![Page 32: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/32.jpg)
Conclusion
32
• In a compressed cache, compressed block size is an additional dimension
• Problem: How can we maximize cache performance utilizing both block reuse and compressed size?
• Observations:– Importance of the cache block depends on its compressed size– Block size is indicative of reuse behavior in some applications
• Key Idea: Use compressed block size in making cache replacement and insertion decisions
• Two techniques: Minimal-Value Eviction and Size-based Insertion Policy
• Results:– Higher performance (9%/11% on average for 1/2-core)– Lower energy (12% on average)
![Page 33: Exploiting Compressed Block Size as an Indicator of Future Reuse](https://reader034.fdocuments.in/reader034/viewer/2022052406/58a1a1b01a28ab2a038b65e9/html5/thumbnails/33.jpg)
Exploiting Compressed Block Size as an Indicator of Future Reuse
Gennady Pekhimenko,
Tyler Huberty, Rui Cai,
Onur Mutlu, Todd C. Mowry
Phillip B. Gibbons,
Michael A. Kozuch