Cache Replacement Policy Using Map-based Adaptive Insertion
Yasuo Ishii (1,2), Mary Inaba (1), and Kei Hiraki (1)
(1) The University of Tokyo, (2) NEC Corporation
Introduction
Modern computers have a multi-level cache system
Improving LLC performance is key to achieving high performance
The LLC stores many dead blocks; eliminating dead blocks in the LLC improves system performance
[Figure: cache hierarchy with CORE, L1, L2, LLC (L3), and Memory]
Introduction
Many multi-core systems adopt a shared LLC
A shared LLC raises issues: thrashing by other threads and fairness of the shared resource
Dead-block elimination is more effective for multi-core systems
[Figure: CORE1 through COREN, each with private L1 and L2 caches, sharing the LLC (L3) in front of Memory]
Trade-offs of Prior Works
Replacement Algorithm                                 Dead-block Elimination   Additional HW Cost
LRU: insert to MRU                                    None                     None
DIP [2007 Qureshi+]: partially random insertion       Some                     Several counters (Light)
LRF [2009 Xiang+]: predicts from reference pattern    Strong                   Shadow tag, PHT (Heavy)
Problem of dead-block prediction: inefficient use of data structures (cf. shadow tag)
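Map-based Data Structure
[Figure: three accesses (ACCESS) within one zone of the memory address space, tracked by shadow tags (one 40-bit tag per line) versus a map (one 40-bit tag per zone plus one state bit per line; I = Init, A = Access)]
Shadow tag: 40 bit/tag; cost: 40 bit/line
Map-based history: 40 bit/tag + 1 bit/line; cost: 15.3 bit/line for a 6-line zone with 3 accessed lines
$$\frac{40\ \text{b} + 6 \times 1\ \text{b}}{3\ \text{lines}} \approx 15.3\ \text{bit/line}$$
A map-based data structure improves cost-efficiency when there is spatial locality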
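The per-line cost comparison above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the 40-bit tag comes from the slide, while the zone of 6 lines with 3 of them touched is read from the "6b / 3 line" term of the cost formula; the function names are hypothetical.

```python
def shadow_tag_bits_per_line(tag_bits: int = 40) -> float:
    """A shadow tag spends one full tag on every tracked line."""
    return float(tag_bits)


def map_bits_per_line(zone_lines: int, touched_lines: int, tag_bits: int = 40) -> float:
    """A map spends one tag per zone plus one state bit per line in the zone,
    amortized over the lines of the zone that were actually touched."""
    if touched_lines <= 0:
        raise ValueError("at least one line must be touched")
    return (tag_bits + zone_lines) / touched_lines


if __name__ == "__main__":
    print(shadow_tag_bits_per_line())           # 40.0
    print(round(map_bits_per_line(6, 3), 1))    # 15.3
```

The amortized cost falls as more lines of a zone are touched, which is why the map only pays off when there is spatial locality.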
Map-based Adaptive Insertion (MAIP)
Modifies the insertion position, ordered from low to high reuse possibility:
(1) Cache bypass
(2) LRU position
(3) Middle of MRU/LRU
(4) MRU position
Adopts a map-based data structure to track many memory accesses
Exploits two localities to estimate reuse possibility
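To make the four insertion choices concrete, here is a minimal software model of one cache set kept as a recency stack. It is an illustrative sketch, not the authors' hardware: it assumes a 16-way set with position 0 as MRU and position 15 as LRU, promotes hits to MRU as plain LRU does, and uses `None` to mean bypass. The class and constant names are hypothetical.

```python
from typing import List, Optional

MRU_POS = 0        # (4) MRU position
MIDDLE_POS = 7     # (3) middle of MRU/LRU in a 16-way set (assumed)
LRU_POS = 15       # (2) LRU position in a 16-way set (assumed)
BYPASS = None      # (1) cache bypass: do not allocate the block


class RecencySet:
    """One cache set as a recency stack; index 0 is MRU (illustrative only)."""

    def __init__(self, ways: int = 16) -> None:
        self.ways = ways
        self.stack: List[int] = []  # block addresses, MRU first

    def access(self, block: int, insert_pos: Optional[int]) -> bool:
        """Return True on a hit; on a miss, insert at insert_pos or bypass."""
        if block in self.stack:
            # Hit: promote to MRU, as plain LRU does.
            self.stack.remove(block)
            self.stack.insert(0, block)
            return True
        if insert_pos is None:
            return False  # bypass: the set is left unchanged
        if len(self.stack) == self.ways:
            self.stack.pop()  # evict the block at the LRU position
        # Clamp so an LRU insertion lands in the last occupied slot.
        self.stack.insert(min(insert_pos, len(self.stack)), block)
        return False
```

For example, in a 4-way set filled with blocks 1 to 4 inserted at MRU, `access(5, 3)` evicts block 1 and places block 5 at the LRU end, so it is the next victim unless it is reused soon.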
Hardware Implementation
Memory access map: collects memory access history and memory reuse history
Bypass filter table: collects the data reuse frequency of memory access instructions
Reuse possibility estimation: estimates reuse possibility from the information of the other components
[Figure: memory access information feeds the Memory Access Map and the Bypass Filter Table; the Estimation Logic combines their outputs and sends the insertion position to the Last Level Cache]
Memory Access Map (1)
[Figure: a map entry (MapTag, AccessCount, ReuseCount, and one state per line across the zone) and its state diagram: Init → Access on first touch, Access → Access on data reuse]
Detects one piece of information: (1) Data reuse
Was the accessed line previously touched?
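The two-state diagram above can be sketched per line as follows. This is a minimal model assuming one state per line of the zone; the names `LineState` and `touch` are hypothetical.

```python
from enum import Enum
from typing import List


class LineState(Enum):
    INIT = 0    # line not touched since the map entry was allocated
    ACCESS = 1  # line touched at least once


def touch(states: List[LineState], offset: int) -> bool:
    """Record an access to line `offset` of a zone; return True if it is data reuse."""
    reused = states[offset] is LineState.ACCESS
    # Init -> Access on first touch; Access stays Access on reuse.
    states[offset] = LineState.ACCESS
    return reused
```

Only one bit per line is needed, because the sole question asked of the map here is whether the line was touched before.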
Memory Access Map (2)
[Figure: access map states (I = Init, A = Access) with the Access Count and Reuse Count counters]
Detects one statistic: (2) Spatial locality
How often are the neighboring lines reused?
Attaches counters to detect spatial locality
$$\text{Data Reuse Metric} = \frac{\text{Reuse Count}}{\text{Access Count}}$$
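A map entry with its two counters might be modeled as below. This sketch assumes Access Count counts only first touches (Init → Access) and Reuse Count counts accesses to lines already in the Access state; that reading is consistent with the backup slide's `Cr > Ca` test, which could never hold if Access Count also included reuses. Real hardware counters have a fixed width; the sketch uses unbounded Python integers, and `MapEntry` is a hypothetical name.

```python
class MapEntry:
    """Hypothetical software model of one memory access map entry."""

    def __init__(self, lines: int = 256) -> None:
        self.touched = [False] * lines  # Init (False) / Access (True) per line
        self.access_count = 0  # first touches: Init -> Access
        self.reuse_count = 0   # accesses to lines already in the Access state

    def access(self, offset: int) -> bool:
        """Update the entry for one access; return True if it was data reuse."""
        if self.touched[offset]:
            self.reuse_count += 1
            return True
        self.touched[offset] = True
        self.access_count += 1
        return False

    def reuse_metric(self) -> float:
        """Reuse Count / Access Count; 0.0 before any line has been touched."""
        if self.access_count == 0:
            return 0.0
        return self.reuse_count / self.access_count
```

Accessing offsets 0, 1, 0, 1, 2 of a fresh entry gives three first touches and two reuses, so the metric is 2/3: neighboring lines in this zone tend to be reused.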
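Memory Access Map (3) Implementation
Maps are stored in a cache-like structure
Cost-efficiency: each entry has 256 states and tracks 16 KB of memory (16 KB = 64 B × 256 states)
Requires ~1.2 bits to track one cache line in the best case
[Figure: the memory address is split into MapTag (30 bits), MapIndex (4 bits), MapOffset (8 bits), and CacheOffset; MapIndex selects a set, stored tags are compared with MapTag, and a 256-to-1 MUX selects the line's state from the access map; each entry also holds Access Count and Reuse Count]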
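The address split in the figure can be sketched with shifts and masks. The 6-bit cache offset follows from 64 B lines and the 8-bit map offset from 256 states per entry; the 4-bit map index matches 192 entries / 12 ways = 16 sets from the evaluation configuration; the 30-bit tag width is a reading of the figure and is an assumption. `split_address` and `MapAddress` are hypothetical names.

```python
from typing import NamedTuple

CACHE_OFFSET_BITS = 6  # 64 B cache line
MAP_OFFSET_BITS = 8    # 256 line states per map entry
MAP_INDEX_BITS = 4     # 16 sets (192 entries / 12 ways)
MAP_TAG_BITS = 30      # read from the figure (assumed)


class MapAddress(NamedTuple):
    tag: int
    index: int
    offset: int  # which of the 256 line states in the entry


def split_address(addr: int) -> MapAddress:
    """Split a byte address into memory access map fields."""
    line = addr >> CACHE_OFFSET_BITS  # drop the byte offset within the line
    offset = line & ((1 << MAP_OFFSET_BITS) - 1)
    index = (line >> MAP_OFFSET_BITS) & ((1 << MAP_INDEX_BITS) - 1)
    tag = (line >> (MAP_OFFSET_BITS + MAP_INDEX_BITS)) & ((1 << MAP_TAG_BITS) - 1)
    return MapAddress(tag, index, offset)
```

Two addresses 64 bytes apart map to the same entry (same tag and index) and to adjacent states, which is what lets one entry summarize a whole 16 KB zone.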
Bypass Filter Table
Each entry is a saturating counter: count up on data reuse, count down on first touch
[Figure: the program counter indexes the Bypass Filter Table (8-bit × 512 entries); counter values map to the states BYPASS, USELESS, NORMAL, USEFUL, and REUSE, from rarely reused to frequently reused]
Detects one statistic: (3) Temporal locality
How often does the instruction reuse data?
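A table of 512 eight-bit saturating counters indexed by the program counter might look like this. The slide gives the table size, the counter width, the update rule, and the five state names; the PC hash (low-order bits), the initial counter value, and the equal-width thresholds mapping counter values to states are all hypothetical choices made only for this sketch.

```python
from enum import IntEnum

ENTRIES = 512
COUNTER_MAX = 255  # 8-bit saturating counter


class ReuseState(IntEnum):
    BYPASS = 0
    USELESS = 1
    NORMAL = 2
    USEFUL = 3
    REUSE = 4


class BypassFilterTable:
    def __init__(self) -> None:
        # Hypothetical initial value: the middle of the counter range.
        self.counters = [COUNTER_MAX // 2 + 1] * ENTRIES

    def _index(self, pc: int) -> int:
        return pc % ENTRIES  # hypothetical hash: low-order PC bits

    def update(self, pc: int, data_reuse: bool) -> None:
        """Count up on data reuse, count down on first touch, saturating."""
        i = self._index(pc)
        if data_reuse:
            self.counters[i] = min(self.counters[i] + 1, COUNTER_MAX)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)

    def state(self, pc: int) -> ReuseState:
        # Hypothetical thresholds: five equal bands over 0..255.
        c = self.counters[self._index(pc)]
        return ReuseState(min(c * 5 // (COUNTER_MAX + 1), 4))
```

An instruction whose accesses keep hitting untouched lines drives its counter toward BYPASS, while one that keeps reusing data climbs toward REUSE.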
Reuse Possibility Estimation Logic
Uses two localities and data reuse information:
Data reuse: hit/miss of the corresponding LLC lookup and the corresponding state of the memory access map
Spatial locality of data reuse: reuse frequency of neighboring lines
Temporal locality of memory access instructions: reuse frequency of the corresponding instruction
Combines this information to decide the insertion policy
Additional Optimization: Adaptive Dedicated Set Reduction (ADSR)
Enhancement of set dueling [2007 Qureshi+]
Reduces the number of dedicated sets when PSEL is strongly biased
[Figure: sets 0 to 7 assigned as LRU dedicated sets, MAIP dedicated sets, and follower sets, without and with ADSR; with ADSR some dedicated sets become additional followers]
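The idea can be sketched on top of DIP-style set dueling: misses in LRU-dedicated sets push a 10-bit PSEL counter up, misses in MAIP-dedicated sets push it down, and follower sets adopt whichever policy is missing less. Only the 10-bit PSEL width and the "fewer dedicated sets when PSEL is strongly biased" rule come from the slides; the leader placement, the bias threshold, and retiring every other dedicated pair are hypothetical choices for illustration.

```python
PSEL_BITS = 10
PSEL_MAX = (1 << PSEL_BITS) - 1
BIAS_MARGIN = PSEL_MAX // 8  # hypothetical "strongly biased" threshold


class SetDueling:
    def __init__(self, period: int = 32) -> None:
        # Hypothetical leader placement: in every group of `period` sets,
        # set 0 is LRU-dedicated and set 1 is MAIP-dedicated.
        self.period = period
        self.psel = (PSEL_MAX + 1) // 2

    def strongly_biased(self) -> bool:
        return self.psel < BIAS_MARGIN or self.psel > PSEL_MAX - BIAS_MARGIN

    def role(self, set_idx: int) -> str:
        group, pos = divmod(set_idx, self.period)
        if pos > 1:
            return "follower"
        # ADSR sketch: while PSEL is strongly biased, only every other
        # dedicated pair keeps dueling; the rest become additional followers.
        if self.strongly_biased() and group % 2 == 1:
            return "follower"
        return "lru" if pos == 0 else "maip"

    def on_miss(self, set_idx: int) -> None:
        r = self.role(set_idx)
        if r == "lru":
            self.psel = min(self.psel + 1, PSEL_MAX)
        elif r == "maip":
            self.psel = max(self.psel - 1, 0)

    def policy(self, set_idx: int) -> str:
        r = self.role(set_idx)
        if r != "follower":
            return r
        # More misses in LRU-dedicated sets push PSEL up, so followers use MAIP.
        return "maip" if self.psel > PSEL_MAX // 2 else "lru"
```

The benefit of the reduction is that, once one policy clearly wins, fewer sets are forced to run the losing policy just to keep the comparison going.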
Evaluation
Benchmark: SPEC CPU2006, compiled with GCC 4.2; evaluates 100M instructions after skipping 40G instructions
MAIP configuration (per-core resources):
  Memory Access Map: 192 entries, 12-way
  Bypass Filter: 512 entries, 8-bit counters
  Policy selection counter: 10 bits
Evaluates DIP and TADIP-F for comparison
Cache Miss Count (1-core)
MAIP reduces MPKI by 8.3% from LRU; OPT reduces MPKI by 18.2% from LRU
[Figure: misses per 1000 instructions for LRU, DIP, MAIP, and OPT on 18 SPEC CPU2006 benchmarks (400.perlbench through 483.xalancbmk) and the average]
Speedup (1-core & 4-core)
[Figure, 4-core result: weighted speedup of TADIP and MAIP for 15 four-benchmark SPEC CPU2006 mixes and the geometric mean]
[Figure, 1-core result: speedup of DIP and MAIP for each benchmark and the geometric mean]
Cost Efficiency of Memory Access Map
Requires 1.9 bits/line on average, ~20 times better than a shadow tag
Covers >1.00 MB (LLC) in 9 of 18 benchmarks
Covers >0.25 MB (MLC) in 14 of 18 benchmarks
[Figure: covered area (MB) of the memory access map for each benchmark and the average]
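Two numbers on this slide can be checked with simple arithmetic: the upper bound on the area a per-core map can cover (192 entries × 16 KB, assuming every entry is live and reading MB as MiB) and the cost ratio against a 40-bit-per-line shadow tag using the measured 1.9 bits/line average. The constant names are hypothetical.

```python
MAP_ENTRIES = 192             # per-core Memory Access Map entries
ZONE_BYTES = 64 * 256         # 16 KB tracked per entry
SHADOW_TAG_BITS_PER_LINE = 40
MAP_BITS_PER_LINE_AVG = 1.9   # measured average reported on this slide

# Upper bound: every entry live and covering its whole zone.
max_coverage_mb = MAP_ENTRIES * ZONE_BYTES / (1024 * 1024)
ratio = SHADOW_TAG_BITS_PER_LINE / MAP_BITS_PER_LINE_AVG

print(max_coverage_mb)   # 3.0
print(round(ratio, 1))   # 21.1
```

A ratio of about 21 is where the "~20 times" figure comes from; actual coverage is lower than the 3 MB bound whenever zones are only partly touched.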
Related Work
Uses spatial / temporal locality
  Using spatial locality [1997, Johnson+]
  Using different types of locality [1995, González+]
Prediction-based dead-block elimination
  Dead-block prediction [2001, Lai+]
  Less Reused Filter [2009, Xiang+]
Modified insertion policy
  Dynamic Insertion Policy [2007, Qureshi+]
  Thread-Aware DIP [2008, Jaleel+]
Conclusion
Map-based Adaptive Insertion Policy (MAIP)
  Map-based data structure: ~20× more cost-effective than a shadow tag
  Reuse possibility estimation exploiting spatial locality and temporal locality
  Improves performance over LRU/DIP
Evaluates MAIP with a simulation study
  Reduces cache miss count by 8.3% from LRU
  Improves IPC by 2.1% in 1-core and by 9.1% in 4-core
Comparison
Replacement Algorithm                                 Dead-block Elimination   Additional HW Cost
LRU: insert to MRU                                    None                     None
DIP [2007 Qureshi+]: partially random insertion       Some                     Several counters (Light)
LRF [2009 Xiang+]: predicts from reference pattern    Strong                   Shadow tag, PHT (Heavy)
MAIP: predicts based on two localities                Strong                   Mem access map (Medium)
Improves cost-efficiency with the map data structure
Improves prediction accuracy with two localities
Q & A
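How to Detect Insertion Position
// Sb: state of the bypass filter table entry; Ca / Cr: Access Count / Reuse Count of the memory access map entry
// Hm: presumably, data reuse detected by the memory access map; ins_pos: 0 = MRU, 15 = LRU (presumably)
function is_bypass()
  if (Sb == BYPASS) return true
  if (Ca > 16 * Cr) return true
  return false
endfunction

function get_insert_position()
  integer ins_pos = 15
  if (Hm) ins_pos = ins_pos / 2
  if (Cr > Ca) ins_pos = ins_pos / 2
  if (Sb == REUSE) ins_pos = 0
  if (Sb == USEFUL) ins_pos = ins_pos / 2
  if (Sb == USELESS) ins_pos = 15
  return ins_pos
endfunction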
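For experimenting in a software cache simulator, the pseudocode above can be transcribed directly into Python. This is a sketch under stated assumptions: `/ 2` is read as integer division, `hm` is taken to mean "the memory access map reports data reuse for this line", position 0 is MRU and 15 is LRU in a 16-way set, and the wrapper `insertion_decision` (which checks bypass first) is a hypothetical helper not shown on the slide.

```python
from enum import Enum, auto
from typing import Optional


class BypassState(Enum):
    BYPASS = auto()
    USELESS = auto()
    NORMAL = auto()
    USEFUL = auto()
    REUSE = auto()


LRU_POS = 15  # 0 = MRU, 15 = LRU in a 16-way set (assumed)


def is_bypass(sb: BypassState, ca: int, cr: int) -> bool:
    if sb is BypassState.BYPASS:
        return True
    # Far more first touches than reuses in this zone: skip allocation.
    return ca > 16 * cr


def get_insert_position(sb: BypassState, ca: int, cr: int, hm: bool) -> int:
    ins_pos = LRU_POS
    if hm:         # data reuse reported by the map (assumed meaning of Hm)
        ins_pos //= 2
    if cr > ca:    # zone shows strong spatial reuse
        ins_pos //= 2
    if sb is BypassState.REUSE:
        ins_pos = 0
    if sb is BypassState.USEFUL:
        ins_pos //= 2
    if sb is BypassState.USELESS:
        ins_pos = LRU_POS
    return ins_pos


def insertion_decision(sb: BypassState, ca: int, cr: int, hm: bool) -> Optional[int]:
    """Hypothetical wrapper: None means bypass, otherwise the insertion position."""
    return None if is_bypass(sb, ca, cr) else get_insert_position(sb, ca, cr, hm)
```

Each positive signal halves the distance from MRU, so a line with reuse, good spatial locality, and a USEFUL instruction lands at position 1, while a USELESS instruction always sends its line to the LRU position.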