18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 ·...
Transcript of 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 ·...
![Page 1: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/1.jpg)
18-447
Computer Architecture
Lecture 23: Memory Management
Prof. Onur Mutlu
Carnegie Mellon University
Spring 2015, 3/27/2015
![Page 2: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/2.jpg)
Where We Are in Lecture Schedule
The memory hierarchy
Caches, caches, more caches
Virtualizing the memory hierarchy: Virtual Memory
Main memory: DRAM
Main memory control, scheduling
Memory latency tolerance techniques
Non-volatile memory
Multiprocessors
Coherence and consistency
Interconnection networks
Multi-core issues (e.g., heterogeneous multi-core)
2
![Page 3: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/3.jpg)
Required Reading (for the Next Few Lectures)
Onur Mutlu, Justin Meza, and Lavanya Subramanian,"The Main Memory System: Challenges and Opportunities"Invited Article in Communications of the Korean Institute of Information Scientists and Engineers (KIISE), 2015.
http://users.ece.cmu.edu/~omutlu/pub/main-memory-system_kiise15.pdf
3
![Page 4: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/4.jpg)
Required Readings on DRAM
DRAM Organization and Operation Basics
Sections 1 and 2 of: Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,” HPCA 2013.
http://users.ece.cmu.edu/~omutlu/pub/tldram_hpca13.pdf
Sections 1 and 2 of Kim et al., “A Case for Subarray-Level Parallelism (SALP) in DRAM,” ISCA 2012.
http://users.ece.cmu.edu/~omutlu/pub/salp-dram_isca12.pdf
DRAM Refresh Basics
Sections 1 and 2 of Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012. http://users.ece.cmu.edu/~omutlu/pub/raidr-dram-refresh_isca12.pdf
4
![Page 5: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/5.jpg)
Memory Interference and Scheduling
in Multi-Core Systems
![Page 6: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/6.jpg)
6
Review: A Modern DRAM Controller
![Page 7: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/7.jpg)
(Un)expected Slowdowns in Multi-Core
7
Low priority
High priority
(Core 0) (Core 1)
Moscibroda and Mutlu, “Memory performance attacks: Denial of memory service in multi-core systems,” USENIX Security 2007.
![Page 8: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/8.jpg)
Memory Scheduling Techniques
We covered
FCFS
FR-FCFS
STFM (Stall-Time Fair Memory Access Scheduling)
PAR-BS (Parallelism-Aware Batch Scheduling)
ATLAS
TCM (Thread Cluster Memory Scheduling)
There are many more …
See your required reading (Section 7):
Mutlu et al., “The Main Memory System: Challenges and Opportunities,” KIISE 2015.
8
![Page 9: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/9.jpg)
Other Ways of
Handling Memory Interference
![Page 10: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/10.jpg)
Fundamental Interference Control Techniques
Goal: to reduce/control inter-thread memory interference
1. Prioritization or request scheduling
2. Data mapping to banks/channels/ranks
3. Core/source throttling
4. Application/thread scheduling
10
![Page 11: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/11.jpg)
Observation: Modern Systems Have Multiple Channels
A new degree of freedom
Mapping data across multiple channels
11
Channel 0Red App
Blue App
Memory Controller
Memory Controller
Channel 1
Memory
Core
Core
Memory
Muralidhara et al., “Memory Channel Partitioning,” MICRO’11.
![Page 12: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/12.jpg)
Data Mapping in Current Systems
12
Channel 0Red App
Blue App
Memory Controller
Memory Controller
Channel 1
Memory
Core
Core
Memory
Causes interference between applications’ requests
Page
Muralidhara et al., “Memory Channel Partitioning,” MICRO’11.
![Page 13: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/13.jpg)
Partitioning Channels Between Applications
13
Channel 0Red App
Blue App
Memory Controller
Memory Controller
Channel 1
Memory
Core
Core
Memory
Page
Eliminates interference between applications’ requests
Muralidhara et al., “Memory Channel Partitioning,” MICRO’11.
![Page 14: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/14.jpg)
Overview: Memory Channel Partitioning (MCP)
Goal
Eliminate harmful interference between applications
Basic Idea
Map the data of badly-interfering applications to different channels
Key Principles
Separate low and high memory-intensity applications
Separate low and high row-buffer locality applications
14Muralidhara et al., “Memory Channel Partitioning,” MICRO’11.
![Page 15: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/15.jpg)
Key Insight 1: Separate by Memory Intensity
High memory-intensity applications interfere with low memory-intensity applications in shared memory channels
15
Map data of low and high memory-intensity applications to different channels
12345Channel 0
Bank 1
Channel 1
Bank 0
Conventional Page Mapping
Red App
Blue App
Time Units
Core
Core
Bank 1
Bank 0
Channel Partitioning
Red App
Blue App
Channel 0Time Units
12345
Channel 1
Core
Core
Bank 1
Bank 0
Bank 1
Bank 0
Saved Cycles
Saved Cycles
![Page 16: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/16.jpg)
Key Insight 2: Separate by Row-Buffer Locality
16
High row-buffer locality applications interfere with low
row-buffer locality applications in shared memory channels
Conventional Page Mapping
Channel 0
Bank 1
Channel 1
Bank 0R1
R0R2R3R0
R4
Request Buffer State
Bank 1
Bank 0
Channel 1
Channel 0
R0R0
Service Order
123456
R2R3
R4
R1
Time units
Bank 1
Bank 0
Bank 1
Bank 0
Channel 1
Channel 0
R0R0
Service Order
123456
R2R3
R4R1
Time units
Bank 1
Bank 0
Bank 1
Bank 0
R0
Channel 0
R1
R2R3
R0
R4
Request Buffer State
Channel Partitioning
Bank 1
Bank 0
Bank 1
Bank 0
Channel 1
Saved CyclesMap data of low and high row-buffer locality applications
to different channels
![Page 17: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/17.jpg)
Memory Channel Partitioning (MCP) Mechanism
1. Profile applications
2. Classify applications into groups
3. Partition channels between application groups
4. Assign a preferred channel to each application
5. Allocate application pages to preferred channel
17
Hardware
System
Software
Muralidhara et al., “Memory Channel Partitioning,” MICRO’11.
![Page 18: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/18.jpg)
Interval Based Operation
18
time
Current Interval Next Interval
1. Profile applications
2. Classify applications into groups3. Partition channels between groups4. Assign preferred channel to applications
5. Enforce channel preferences
![Page 19: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/19.jpg)
Observations
Applications with very low memory-intensity rarely access memory Dedicating channels to them results in precious memory bandwidth waste
They have the most potential to keep their cores busy We would really like to prioritize them
They interfere minimally with other applications Prioritizing them does not hurt others
19
![Page 20: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/20.jpg)
Integrated Memory Partitioning and Scheduling (IMPS)
Always prioritize very low memory-intensity applications in the memory scheduler
Use memory channel partitioning to mitigate interference between other applications
20Muralidhara et al., “Memory Channel Partitioning,” MICRO’11.
![Page 21: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/21.jpg)
Hardware Cost
Memory Channel Partitioning (MCP)
Only profiling counters in hardware
No modifications to memory scheduling logic
1.5 KB storage cost for a 24-core, 4-channel system
Integrated Memory Partitioning and Scheduling (IMPS)
A single bit per request
Scheduler prioritizes based on this single bit
21Muralidhara et al., “Memory Channel Partitioning,” MICRO’11.
![Page 22: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/22.jpg)
Performance of Channel Partitioning
22
1%
5%
0.9
0.95
1
1.05
1.1
1.15
No
rmal
ize
d
Syst
em
Pe
rfo
rman
ce FRFCFS
ATLAS
TCM
MCP
IMPS
7%
11%
Better system performance than the best previous scheduler at lower hardware cost
Averaged over 240 workloads
![Page 23: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/23.jpg)
Combining Multiple Interference Control Techniques
Combined interference control techniques can mitigate interference much more than a single technique alone can do
The key challenge is:
Deciding what technique to apply when
Partitioning work appropriately between software and hardware
23
![Page 24: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/24.jpg)
Fundamental Interference Control Techniques
Goal: to reduce/control inter-thread memory interference
1. Prioritization or request scheduling
2. Data mapping to banks/channels/ranks
3. Core/source throttling
4. Application/thread scheduling
24
![Page 25: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/25.jpg)
Source Throttling: A Fairness Substrate
Key idea: Manage inter-thread interference at the cores (sources), not at the shared resources
Dynamically estimate unfairness in the memory system
Feed back this information into a controller
Throttle cores’ memory access rates accordingly
Whom to throttle and by how much depends on performance target (throughput, fairness, per-thread QoS, etc)
E.g., if unfairness > system-software-specified target thenthrottle down core causing unfairness &throttle up core that was unfairly treated
Ebrahimi et al., “Fairness via Source Throttling,” ASPLOS’10, TOCS’12.
25
![Page 26: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/26.jpg)
26
Runtime UnfairnessEvaluation
DynamicRequest Throttling
1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)3- Find app. causing most interference for App-slowest (App-interfering)
if (Unfairness Estimate >Target) {1-Throttle down App-interfering
(limit injection rate and parallelism)
2-Throttle up App-slowest}
FST
Unfairness Estimate
App-slowest
App-interfering
⎪ ⎨ ⎪ ⎧⎩
Slowdown Estimation
TimeInterval 1 Interval 2 Interval 3
Runtime UnfairnessEvaluation
DynamicRequest Throttling
Fairness via Source Throttling (FST) [ASPLOS’10]
![Page 27: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/27.jpg)
Core (Source) Throttling
Idea: Estimate the slowdown due to (DRAM) interference and throttle down threads that slow down others
Ebrahimi et al., “Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multi-Core Memory Systems,” ASPLOS 2010.
Advantages
+ Core/request throttling is easy to implement: no need to change the memory scheduling algorithm
+ Can be a general way of handling shared resource contention
+ Can reduce overall load/contention in the memory system
Disadvantages
- Requires interference/slowdown estimations difficult to estimate
- Thresholds can become difficult to optimize throughput loss
27
![Page 28: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/28.jpg)
Fundamental Interference Control Techniques
Goal: to reduce/control interference
1. Prioritization or request scheduling
2. Data mapping to banks/channels/ranks
3. Core/source throttling
4. Application/thread scheduling
Idea: Pick threads that do not badly interfere with each other to be scheduled together on cores sharing the memory system
28
![Page 29: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/29.jpg)
Interference-Aware Thread Scheduling
An example from scheduling in clusters (data centers)
Clusters can be running virtual machines
29
![Page 30: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/30.jpg)
Virtualized Cluster
30
Core0 Core1
Host
LLC
DRAM
VM
App
VM
App
Core0 Core1
Host
LLC
DRAM
VM
App
VM
App
How to dynamically schedule VMs onto
hosts?
Distributed Resource Management (DRM) policies
![Page 31: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/31.jpg)
Conventional DRM Policies
31
Core0 Core1
Host
LLC
DRAM
App App
Core0 Core1
Host
LLC
DRAM
VM
App
VM
App
VM
App
Memory Capacity
CPU
Based on operating-system-level metricse.g., CPU utilization, memory capacity demand
![Page 32: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/32.jpg)
Microarchitecture-level Interference
32
VM
App
Core0 Core1
Host
LLC
DRAM
VM
App
• VMs within a host compete for:
– Shared cache capacity
– Shared memory bandwidth
Can operating-system-level metrics capture the microarchitecture-level resource interference?
![Page 33: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/33.jpg)
Microarchitecture Unawareness
33
Core0 Core1
Host
LLC
DRAM
VM
App
VM
App
Core0 Core1
Host
LLC
DRAM
VM
App
VM
App
VMOperating-system-level metrics
CPU Utilization Memory Capacity
92% 369 MB
93% 348 MBApp
App
STREAM
gromacs
Microarchitecture-level metrics
LLC Hit Ratio Memory Bandwidth
2% 2267 MB/s
98% 1 MB/s
VM
App
Memory Capacity
CPU
![Page 34: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/34.jpg)
Impact on Performance
34
0.0
0.2
0.4
0.6
IPC (Harmonic
Mean)
Conventional DRM with Microarchitecture Awareness
Core0 Core1
Host
LLC
DRAM
VM
App
VM
App
Core0 Core1
Host
LLC
DRAM
VM
App
VM
App
App
STREAM
gromacs
VM
App
Memory Capacity
CPU SWAP
![Page 35: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/35.jpg)
Impact on Performance
35
0.0
0.2
0.4
0.6
IPC (Harmonic
Mean)
Conventional DRM with Microarchitecture Awareness
49%
Core0 Core1
Host
LLC
DRAM
VM
App
VM
App
Core0 Core1
Host
LLC
DRAM
VM
App
VM
App
App
STREAM
gromacs
VM
App
Memory Capacity
CPU
We need microarchitecture-level interference awareness in
DRM!
![Page 36: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/36.jpg)
A-DRM: Architecture-aware DRM
• Goal: Take into account microarchitecture-level shared resource interference– Shared cache capacity
– Shared memory bandwidth
• Key Idea:
– Monitor and detect microarchitecture-level shared resource interference
– Balance microarchitecture-level resource usage across cluster to minimize memory interference while maximizing system performance
36
![Page 37: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/37.jpg)
A-DRM: Architecture-aware DRM
37
OS+Hypervisor
VM
App
VM
App
A-DRM: Global Architecture –aware Resource Manager
Profiling Engine
Architecture-awareInterference Detector
Architecture-aware Distributed Resource Management (Policy)
Migration Engine
Hosts Controller
CPU/Memory Capacity
Profiler
Architectural Resource
•••
Architectural Resources
![Page 38: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/38.jpg)
More on Architecture-Aware DRM Optional Reading
Wang et al., “A-DRM: Architecture-aware Distributed Resource Management of Virtualized Clusters,” VEE 2015.
http://users.ece.cmu.edu/~omutlu/pub/architecture-aware-distributed-resource-management_vee15.pdf
38
![Page 39: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/39.jpg)
Interference-Aware Thread Scheduling
Advantages
+ Can eliminate/minimize interference by scheduling “symbiotic applications” together (as opposed to just managing the interference)
+ Less intrusive to hardware (no need to modify the hardware resources)
Disadvantages and Limitations
-- High overhead to migrate threads between cores and machines
-- Does not work (well) if all threads are similar and they interfere
39
![Page 40: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/40.jpg)
Summary: Fundamental Interference Control Techniques
Goal: to reduce/control interference
1. Prioritization or request scheduling
2. Data mapping to banks/channels/ranks
3. Core/source throttling
4. Application/thread scheduling
Best is to combine all. How would you do that?
40
![Page 41: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/41.jpg)
Handling Memory Interference
In Multithreaded Applications
![Page 42: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/42.jpg)
Multithreaded (Parallel) Applications
Threads in a multi-threaded application can be inter-dependent
As opposed to threads from different applications
Such threads can synchronize with each other
Locks, barriers, pipeline stages, condition variables, semaphores, …
Some threads can be on the critical path of execution due to synchronization; some threads are not
Even within a thread, some “code segments” may be on the critical path of execution; some are not
42
![Page 43: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/43.jpg)
Critical Sections
Enforce mutually exclusive access to shared data
Only one thread can be executing it at a time
Contended critical sections make threads wait threads
causing serialization can be on the critical path
43
Each thread:
loop {
Compute
lock(A)
Update shared data
unlock(A)
}
N
C
![Page 44: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/44.jpg)
Barriers
Synchronization point
Threads have to wait until all threads reach the barrier
Last thread arriving to the barrier is on the critical path
44
Each thread:
loop1 {
Compute
}
barrier
loop2 {
Compute
}
![Page 45: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/45.jpg)
Stages of Pipelined Programs
Loop iterations are statically divided into code segments called stages
Threads execute stages on different cores
Thread executing the slowest stage is on the critical path
45
loop {
Compute1
Compute2
Compute3
}
A
B
C
A B C
![Page 46: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/46.jpg)
Handling Interference in Parallel Applications
Threads in a multithreaded application are inter-dependent
Some threads can be on the critical path of execution due to synchronization; some threads are not
How do we schedule requests of inter-dependent threads to maximize multithreaded application performance?
Idea: Estimate limiter threads likely to be on the critical path and prioritize their requests; shuffle priorities of non-limiter threadsto reduce memory interference among them [Ebrahimi+, MICRO’11]
Hardware/software cooperative limiter thread estimation:
Thread executing the most contended critical section
Thread executing the slowest pipeline stage
Thread that is falling behind the most in reaching a barrier
46
![Page 47: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/47.jpg)
Prioritizing Requests from Limiter Threads
47
Critical Section 1 BarrierNon-Critical Section
Waiting for Sync
or Lock
Thread D
Thread C
Thread B
Thread A
Time
Barrier
Time
Barrier
Thread D
Thread C
Thread B
Thread A
Critical Section 2 Critical Path
Saved
Cycles Limiter Thread: DBCA
Most Contended
Critical Section: 1
Limiter Thread Identification
![Page 48: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/48.jpg)
More on Parallel Application Memory Scheduling
Optional reading
Ebrahimi et al., “Parallel Application Memory Scheduling,” MICRO 2011.
http://users.ece.cmu.edu/~omutlu/pub/parallel-memory-scheduling_micro11.pdf
48
![Page 49: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/49.jpg)
More on DRAM Management and
DRAM Controllers
![Page 50: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/50.jpg)
DRAM Power Management
DRAM chips have power modes
Idea: When not accessing a chip power it down
Power states
Active (highest power)
All banks idle
Power-down
Self-refresh (lowest power)
State transitions incur latency during which the chip cannot be accessed
50
![Page 51: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/51.jpg)
DRAM Refresh
![Page 52: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/52.jpg)
DRAM Refresh
DRAM capacitor charge leaks over time
The memory controller needs to refresh each row periodically to restore charge
Read and close each row every N ms
Typical N = 64 ms
Downsides of refresh
-- Energy consumption: Each refresh consumes energy
-- Performance degradation: DRAM rank/bank unavailable while refreshed
-- QoS/predictability impact: (Long) pause times during refresh
-- Refresh rate limits DRAM capacity scaling
52
![Page 53: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/53.jpg)
DRAM Refresh: Performance
Implications of refresh on performance
-- DRAM bank unavailable while refreshed
-- Long pause times: If we refresh all rows in burst, every 64ms the DRAM will be unavailable until refresh ends
Burst refresh: All rows refreshed immediately after one another
Distributed refresh: Each row refreshed at a different time, at regular intervals
53
![Page 54: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/54.jpg)
Distributed Refresh
Distributed refresh eliminates long pause times
How else can we reduce the effect of refresh on performance/QoS?
Does distributed refresh reduce refresh impact on energy?
Can we reduce the number of refreshes?
54
![Page 55: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/55.jpg)
Refresh Today: Auto Refresh
55
Columns
Row
s
Row Buffer
DRAM CONTROLLER
DRAM Bus
BANK 0 BANK 1 BANK 2 BANK 3
A batch of rows are
periodically refreshed
via the auto-refresh command
![Page 56: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/56.jpg)
Refresh Overhead: Performance
56
8%
46%
Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012.
![Page 57: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/57.jpg)
Refresh Overhead: Energy
57
15%
47%
Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012.
![Page 58: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/58.jpg)
Problem with Conventional Refresh
Today: Every row is refreshed at the same rate
Observation: Most rows can be refreshed much less often without losing data [Kim+, EDL’09]
Problem: No support in DRAM for different refresh rates per row
58
![Page 59: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/59.jpg)
Retention Time of DRAM Rows
Observation: Only very few rows need to be refreshed at the worst-case rate
Can we exploit this to reduce refresh operations at low cost?
59
![Page 60: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/60.jpg)
Reducing DRAM Refresh Operations
Idea: Identify the retention time of different rows and refresh each row at the frequency it needs to be refreshed
(Cost-conscious) Idea: Bin the rows according to their minimum retention times and refresh rows in each bin at the refresh rate specified for the bin
e.g., a bin for 64-128ms, another for 128-256ms, …
Observation: Only very few rows need to be refreshed very frequently [64-128ms] Have only a few bins Low HW
overhead to achieve large reductions in refresh operations
Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012.
60
![Page 61: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/61.jpg)
1. Profiling: Profile the retention time of all DRAM rows
can be done at DRAM design time or dynamically
2. Binning: Store rows into bins by retention time
use Bloom Filters for efficient and scalable storage
3. Refreshing: Memory controller refreshes rows in different bins at different rates
probe Bloom Filters to determine refresh rate of a row
RAIDR: Mechanism
61
1.25KB storage in controller for 32GB DRAM memory
![Page 62: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/62.jpg)
1. Profiling
62
![Page 63: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/63.jpg)
2. Binning
How to efficiently and scalably store rows into retention time bins?
Use Hardware Bloom Filters [Bloom, CACM 1970]
63Bloom, “Space/Time Trade-offs in Hash Coding with Allowable Errors”, CACM 1970.
![Page 64: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/64.jpg)
Bloom Filter
[Bloom, CACM 1970]
Probabilistic data structure that compactly represents set membership (presence or absence of element in a set)
Non-approximate set membership: Use 1 bit per element to indicate absence/presence of each element from an element space of N elements
Approximate set membership: use a much smaller number of bits and indicate each element’s presence/absence with a subset of those bits
Some elements map to the bits other elements also map to
Operations: 1) insert, 2) test, 3) remove all elements
64Bloom, “Space/Time Trade-offs in Hash Coding with Allowable Errors”, CACM 1970.
![Page 65: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/65.jpg)
Bloom Filter Operation Example
65Bloom, “Space/Time Trade-offs in Hash Coding with Allowable Errors”, CACM 1970.
![Page 66: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/66.jpg)
Bloom Filter Operation Example
66
![Page 67: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/67.jpg)
Bloom Filter Operation Example
67
![Page 68: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/68.jpg)
Bloom Filter Operation Example
68
![Page 69: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/69.jpg)
Bloom Filter Operation Example
69
![Page 70: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/70.jpg)
Bloom Filters
70
![Page 71: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/71.jpg)
Bloom Filters: Pros and Cons
Advantages
+ Enables storage-efficient representation of set membership
+ Insertion and testing for set membership (presence) are fast
+ No false negatives: If Bloom Filter says an element is not present in the set, the element must not have been inserted
+ Enables tradeoffs between time & storage efficiency & false positive rate (via sizing and hashing)
Disadvantages
-- False positives: An element may be deemed to be present in the set by the Bloom Filter but it may never have been inserted
Not the right data structure when you cannot tolerate false positives
71Bloom, “Space/Time Trade-offs in Hash Coding with Allowable Errors”, CACM 1970.
![Page 72: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/72.jpg)
Benefits of Bloom Filters as Refresh Rate Bins
False positives: a row may be declared present in the Bloom filter even if it was never inserted
Not a problem: Refresh some rows more frequently than needed
No false negatives: rows are never refreshed less frequently than needed (no correctness problems)
Scalable: a Bloom filter never overflows (unlike a fixed-size table)
Efficient: No need to store info on a per-row basis; simple hardware 1.25 KB for 2 filters for 32 GB DRAM system
72
![Page 73: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/73.jpg)
Use of Bloom Filters in Hardware
Useful when you can tolerate false positives in set membership tests
See the following recent examples for clear descriptions of how Bloom Filters are used
Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012.
Seshadri et al., “The Evicted-Address Filter: A Unified Mechanism to Address Both Cache Pollution and Thrashing,”PACT 2012.
73
![Page 74: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/74.jpg)
3. Refreshing (RAIDR Refresh Controller)
74
![Page 75: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/75.jpg)
3. Refreshing (RAIDR Refresh Controller)
75
Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012.
![Page 76: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/76.jpg)
RAIDR: Baseline Design
76
Refresh control is in DRAM in today’s auto-refresh systems
RAIDR can be implemented in either the controller or DRAM
![Page 77: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/77.jpg)
RAIDR in Memory Controller: Option 1
77
Overhead of RAIDR in DRAM controller:1.25 KB Bloom Filters, 3 counters, additional commands issued for per-row refresh (all accounted for in evaluations)
![Page 78: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/78.jpg)
RAIDR in DRAM Chip: Option 2
78
Overhead of RAIDR in DRAM chip:Per-chip overhead: 20B Bloom Filters, 1 counter (4 Gbit chip)
Total overhead: 1.25KB Bloom Filters, 64 counters (32 GB DRAM)
![Page 79: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/79.jpg)
RAIDR: Results and Takeaways System: 32GB DRAM, 8-core; SPEC, TPC-C, TPC-H workloads
RAIDR hardware cost: 1.25 kB (2 Bloom filters)
Refresh reduction: 74.6%
Dynamic DRAM energy reduction: 16%
Idle DRAM power reduction: 20%
Performance improvement: 9%
Benefits increase as DRAM scales in density
79
![Page 80: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/80.jpg)
DRAM Refresh: More Questions
What else can you do to reduce the impact of refresh?
What else can you do if you know the retention times of rows?
How can you accurately measure the retention time of DRAM rows?
Recommended reading:
Liu et al., “An Experimental Study of Data Retention Behavior in Modern DRAM Devices: Implications for Retention Time Profiling Mechanisms,” ISCA 2013.
80
![Page 81: 18-447 Computer Architecture Lecture 23: Memory …ece447/s15/lib/exe/fetch.php?...2015/03/27 · Onur Mutlu, Justin Meza, and Lavanya Subramanian, "The Main Memory System: Challenges](https://reader034.fdocuments.in/reader034/viewer/2022050516/5f9fe00d967c78716c3257c9/html5/thumbnails/81.jpg)
More Readings on DRAM Refresh
Liu et al., “An Experimental Study of Data Retention Behavior in Modern DRAM Devices: Implications for Retention Time Profiling Mechanisms,” ISCA 2013.
http://users.ece.cmu.edu/~omutlu/pub/dram-retention-time-characterization_isca13.pdf
Chang+, “Improving DRAM Performance by Parallelizing Refreshes with Accesses,” HPCA 2014.
http://users.ece.cmu.edu/~omutlu/pub/dram-access-refresh-parallelization_hpca14.pdf
81