A Row Buffer Locality-Aware Caching Policy for Hybrid Memories
description
Transcript of A Row Buffer Locality-Aware Caching Policy for Hybrid Memories
![Page 1: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/1.jpg)
A Row Buffer Locality-Aware Caching Policy for Hybrid Memories
HanBin YoonJustin Meza
Rachata AusavarungnirunRachael Harding
Onur Mutlu
![Page 2: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/2.jpg)
2
Overview• Emerging memories such as PCM offer higher
density than DRAM, but have drawbacks• Hybrid memories aim to achieve best of both• We identify row buffer locality (RBL) as a key
criterion for data placement– We develop a policy that caches to DRAM rows
with low RBL and high reuse• 50% perf. improvement over all-PCM memory• Within 23% perf. of all-DRAM memory
![Page 3: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/3.jpg)
Demand for Memory Capacity• Increasing cores and thread contexts
– Intel Sandy Bridge: 8 cores (16 threads)– AMD Abu Dhabi: 16 cores– IBM POWER7: 8 cores (32 threads)– Sun T4: 8 cores (64 threads)
• Modern data-intensive applications operate on huge datasets
3
![Page 4: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/4.jpg)
Emerging High Density Memory• DRAM density scaling becoming costly• Phase change memory (PCM)
+ Projected 3−12 denser than DRAM1
• However, cannot simply replace DRAM− Higher access latency (4−12 DRAM2)− Higher access energy (2−40 DRAM2)− Limited write endurance (108 writes2)
Use DRAM as a cache to PCM memory3 4[1Mohan HPTS’09; 2Lee+ ISCA’09; 3Qureshi+ ISCA’09]
![Page 5: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/5.jpg)
Hybrid Memory• Benefits from both DRAM and PCM
– DRAM: Low latency, high endurance– PCM: High capacity
• Key question: Where to place data between these heterogeneous devices?
• To help answer this question, let’s take a closer look at these technologies
5
![Page 6: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/6.jpg)
Hybrid Memory: A Closer Look
6
MC MC
DRAM(small capacity cache)
PCM(large capacity memory)
CPU
Memory channel
Bank Bank Bank Bank
Row buffer
![Page 7: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/7.jpg)
Row Buffers and Latency• Memory cells organized in columns and rows
• Row buffers store last accessed row– Hit: Access data from row buffer fast– Miss: Access data from cell array slow
7
![Page 8: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/8.jpg)
Key Observation• Row buffers exist in both DRAM and PCM
– Row buffer hit latency similar in DRAM & PCM2
– Row buffer miss latency small in DRAM– Row buffer miss latency large in PCM
• Place data in DRAM which– Frequently miss in row buffer (low row buffer
locality) miss penalty is smaller in DRAM– Are reused many times data worth the caching
effort (contention in mem. channel and DRAM)
8[2Lee+ ISCA’09]
![Page 9: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/9.jpg)
Data Placement Implications
9
Let’s say a processor accesses four rows
Row A Row B Row C Row D
![Page 10: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/10.jpg)
Data Placement Implications
10
Let’s say a processor accesses four rowswith different row buffer localities (RBL)
Row A Row B Row C Row D
Low RBL(Frequently miss
in row buffer)
High RBL(Frequently hitin row buffer)
![Page 11: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/11.jpg)
RBL-Unaware Policy
11
A row buffer locality-unaware policy couldplace these rows in the following manner
DRAM(High RBL)
PCM(Low RBL)
Row CRow D
Row ARow B
![Page 12: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/12.jpg)
RBL-Unaware Policy
12
DRAM(High RBL)
PCM(Low RBL)
A B
C DC C D D
A B A B
Accesses pattern to main memory:A (oldest), B, C, C, C, A, B, D, D, D, A, B (youngest)
Stall time: 6 PCM device accesses
time
![Page 13: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/13.jpg)
RBL-Aware Policy
13
A row buffer locality-aware policy wouldplace these rows in the following manner
DRAM(Low RBL)
PCM(High RBL)
Access data at lower row buffer miss latency of DRAM
Access data at low row buffer hit latency of PCM
Row ARow B
Row CRow D
![Page 14: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/14.jpg)
RBL-Aware Policy
14
DRAM(Low RBL)
PCM(High RBL)
A B
C DC C D D
A B A B
Accesses pattern to main memory:A (oldest), B, C, C, C, A, B, D, D, D, A, B (youngest)
Stall time: 6 DRAM device accesses
Saved cycles
time
![Page 15: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/15.jpg)
Our Mechanism: DynRBLA1. For a subset of recently used rows in PCM:
– Count row buffer misses as indicator of row buffer locality (RBL)
2. Cache to DRAM rows with misses threshold– Row buffer miss counts are periodically reset (only
cache rows with high reuse)3. Dynamically adjust threshold to adapt to
workload/system characteristics– Interval-based cost-benefit analysis
15
![Page 16: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/16.jpg)
Evaluation Methodology• Cycle-level x86 CPU-memory simulator
– CPU: 16 out-of-order cores, 32KB private L1 per core, 512KB shared L2 per core
– Memory: DDR3 1066 MT/s, 1GB DRAM, 16GB PCM, 1KB migration granularity
• Multi-programmed server & cloud workloads– Server (18): TPC-C, TPC-H– Cloud (18): Webserver, Video, TPC-C/H
16
![Page 17: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/17.jpg)
Comparison Points and Metrics• DynRBLA: Adaptive RBL-aware caching• RBLA: Row buffer locality-aware caching• Freq4: Frequency-based caching• DynFreq: Adaptive Freq.-based caching• Weighted speedup (performance) = sum
of speedups versus when run alone• Max slowdown (fairness) = highest
slowdown experienced by any thread17[4Jiang+ HPCA’10]
![Page 18: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/18.jpg)
Server Cloud Avg0
0.2
0.4
0.6
0.8
1
1.2
1.4
Freq DynFreq RBLA DynRBLA
Workload
Performance
18
Wei
ghte
d Sp
eedu
p (n
orm
.)
Benefit 1: Reduced memory bandwidth consumption due to stricter caching criteriaBenefit 2: Increased row buffer locality (RBL)
in PCM by moving low RBL data to DRAM
![Page 19: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/19.jpg)
Server Cloud Avg0
0.2
0.4
0.6
0.8
1
1.2
Freq DynFreq RBLA DynRBLA
Workload
Fairness
19
Max
imum
Slo
wdo
wn
(nor
m.)
Lower contention for row buffer in PCM & memory channel
![Page 20: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/20.jpg)
Server Cloud Avg0
0.2
0.4
0.6
0.8
1
1.2
1.4
Freq DynFreq RBLA DynRBLA
Workload
Energy Efficiency
20
Perfo
rman
ce p
er W
att (n
orm
.)
Increased performance & reduced data movement between DRAM and PCM
![Page 21: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/21.jpg)
0
0.2
0.4
0.6
0.8
1
1.2
Maximum Slowdown
00.20.40.60.8
11.21.41.61.8
22.2
Weighted Speedup
Compared to All-PCM/DRAM
21
Wei
ghte
d Sp
eedu
p (n
orm
.)
Max
imum
Slo
wdo
wn
(nor
m.)
Our mechanism achieves 50% better performance than all PCM, within 23% of all DRAM performance
![Page 22: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/22.jpg)
22
Conclusion• Demand for huge main memory capacity
– PCM offers greater density than DRAM– Hybrid memories achieve the best of both
• We identify row buffer locality (RBL) as a key metric for caching decisions
• We develop a policy that caches to DRAM rows with low RBL and high reuse
• Enables high-performance energy-efficient hybrid main memories
![Page 23: A Row Buffer Locality-Aware Caching Policy for Hybrid Memories](https://reader035.fdocuments.in/reader035/viewer/2022062813/5681661d550346895dd96cc7/html5/thumbnails/23.jpg)
Thank you! Questions?
23