Rethinking Memory System Design in the Nanoscale Many-Core Era
Onur Mutlu [email protected]
WAMT, 3/14/2010, © Onur Mutlu
Summary
Tech/App/Arch trends dictate new needs from the memory hierarchy
  Technology scalable (not DRAM)
  QoS and performance guarantees (not free-for-all)
  Energy and power efficient (not one-size-fits-all)
New memory hierarchy research challenges
  Tech. scalability: Enabling NVRAM (+ DRAM) memory
  QoS/performance: Reducing and controlling interference
  Energy efficiency: Customizability and minimal waste
Need fundamental research in HW/SW cooperation
  Memory architecture, organization, controllers, HW algorithms
  Low-level system software and HW/SW interface
Agenda
Technology, Application, Architecture Trends
Requirements from the Memory Hierarchy
Research Challenges and Possible Avenues
Summary
Modern Memory Subsystems (Multi-Core)
Technology Trends
DRAM does not scale well beyond N nm
  Memory scaling benefits: density, capacity, cost
Energy/power already key design limiters
  Memory hierarchy responsible for a large fraction of power
More transistors (cores) on chip (Moore’s Law)
  Pin bandwidth not increasing as fast as number of transistors
  Memory subsystem is a key shared resource among cores
  More pressure on the memory hierarchy
Application/System Trends
Many different threads/applications/virtual machines will share the memory system
  Cloud computing/servers: Many workloads consolidated on-chip to improve efficiency
  GP-GPUs: Many threads from multiple parallel applications
  Mobile: Interactive + non-interactive consolidation
Different applications with different requirements (SLAs)
  Some applications/threads require performance guarantees
  Modern hierarchies do not distinguish between applications
Different goals for different systems/users
  System throughput, fairness, per-application performance
  Modern hierarchies are not flexible/configurable
Architecture Trends
More cores and components
  More pressure on the memory hierarchy
Asymmetric cores: Performance asymmetry, CPU+GPUs, accelerators, …
  Motivated by energy efficiency and Amdahl’s Law
Different cores have different performance requirements
  Memory hierarchies do not distinguish between cores
Requirements from an Ideal Hierarchy
Traditional
  High system performance
  Enough capacity
  Low cost
New
  Technology scalability
  QoS support, performance guarantees, configurability
  Energy (and power, bandwidth) efficiency
Requirements from an Ideal Hierarchy
Traditional
  High system performance: Reduce inter-thread interference
  Enough capacity
  Low cost
New
  Technology scalability
    Emerging non-volatile memory technologies (PCM, MRAM) can help
  QoS support, performance guarantees, configurability
    Need HW mechanisms SW can use to satisfy QoS policies
  Energy (and power, bandwidth) efficiency
    One size fits all wastes energy, performance, bandwidth
Agenda
Technology, Application, Architecture Trends
Requirements from the Memory Hierarchy
Research Challenges and Possible Avenues
  Technology scalability
  QoS support: Inter-thread/application interference
  Energy/power/bandwidth efficiency
Summary
Technology Scalability Challenges
Problem: DRAM is not scalable beyond N nm, so memory capacity and cost may not continue to scale
Some emerging resistive memory technologies (NVRAM) are more scalable than DRAM
NVRAM will likely be a main component of the memory hierarchy, but we need to enable it:
  Redesign the hierarchy to mitigate NVRAM shortcomings
  Find the right way to place NVRAM in the subsystem
  Satisfy all other requirements in the presence of NVRAM
Emerging Memory Technologies
Phase Change Memory
Pros
  Better technology scaling
  Non-volatility
  Low idle power
Cons
  Higher latencies (especially write)
  Higher active energy
  Lower endurance
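To get a feel for why lower endurance matters, a back-of-the-envelope lifetime estimate can help. The sketch below assumes ideal wear leveling, meaning writes are spread evenly over every cell. The function name and every numeric value are illustrative assumptions, not figures from the talk.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
GIB = 2**30


def lifetime_years(capacity_bytes, endurance_writes, write_bandwidth_bytes_per_s):
    """Years until cells wear out, assuming writes are spread perfectly evenly."""
    if write_bandwidth_bytes_per_s <= 0:
        raise ValueError("write bandwidth must be positive")
    # Total bytes that can be written before every cell hits its write limit.
    total_writable_bytes = capacity_bytes * endurance_writes
    return total_writable_bytes / write_bandwidth_bytes_per_s / SECONDS_PER_YEAR


# Hypothetical numbers: 16 GiB of memory, 1e8 writes per cell,
# 1 GiB/s of sustained write traffic.
print(lifetime_years(16 * GIB, 1e8, 1 * GIB))
```

Under this idealized model, lifetime scales linearly with capacity and endurance and inversely with write bandwidth, so halving write traffic doubles the estimate. Real devices fall short of it because wear leveling is never perfect.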
NVRAM-based Hierarchy: Key Challenges
How should NVRAM-based (main) memory be organized?
Hybrid NVRAM+DRAM [Qureshi et al., ISCA’09]:
  How to partition/migrate data (energy, performance, endurance)
  How to design the memory controllers and system software
Exploit advantages, minimize disadvantages
NVRAM-based Hierarchy: Key Challenges
How should NVRAM-based (main) memory be organized?
Pure NVRAM main memory [Lee et al., ISCA’09, IEEE Micro’10]:
  How to redesign entire hierarchy (and cores) to overcome NVRAM shortcomings
  Latency, energy, endurance
Many Research Challenges: Hybrid Systems
Partitioning
  Should DRAM be a cache or main memory, or configurable?
  What fraction? How many controllers?
Data allocation/movement (energy, perf, lifetime, security)
  Access latency critical, heavy modifications -> DRAM
  Non-volatility critical, not accessed heavily -> PCM
  Who manages allocation/movement?
  What are good control algorithms?
Redesign of cache hierarchy, memory controllers
  How can NVRAM be exploited on chip?
Design of NVRAM/DRAM chips
  Rethink the design of PCM/DRAM with new requirements?
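The data allocation rule of thumb on this slide (latency-critical, heavily modified data in DRAM; non-volatility-critical, lightly accessed data in PCM) can be sketched as a simple placement function. This is only an illustration: `PageStats`, `place_page`, the per-page statistics, and the write threshold are all hypothetical, and the slide deliberately leaves open who would collect such statistics and who would act on them.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    writes_per_ms: float      # how heavily the page is modified
    latency_critical: bool    # accesses sit on the critical path
    needs_persistence: bool   # contents must survive power loss


def place_page(stats, write_threshold=1.0):
    """Return "DRAM" or "PCM" for one page (hypothetical policy)."""
    heavily_written = stats.writes_per_ms >= write_threshold
    if stats.needs_persistence and not heavily_written:
        return "PCM"
    if stats.latency_critical or heavily_written:
        # Assumption: a page that is both persistent and write-heavy is
        # kept in DRAM here; making it durable is outside this sketch.
        return "DRAM"
    # Cold, non-critical data: PCM's scaling and low idle power fit best.
    return "PCM"


print(place_page(PageStats(5.0, True, False)))
print(place_page(PageStats(0.1, False, True)))
```

A real policy would also have to weigh migration cost and PCM wear, which is exactly the control-algorithm question the slide raises.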
Memory System is the Major Shared Resource
[Figure: shared memory system; label: threads’ requests interfere]
Inter-Thread/Application Interference
Problem: Threads share the memory system, but the memory system does not distinguish between threads’ requests
  Memory system algorithms are thread-unaware and thread-unfair
Existing memory systems
  Free-for-all, demand-based sharing of the memory system
  Aggressive threads can deny service to others
  Do not try to reduce or control inter-thread interference
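How a thread-unaware policy lets one thread deny service to another can be shown with a toy model. The sketch below assumes a single DRAM bank with one open row and a row-hit-first, then oldest-first scheduler (commonly called FR-FCFS). The slide does not name a specific policy, so that choice, the `Request` type, and the request pattern are assumptions made for illustration. All requests are assumed to be queued already; `arrival` only encodes their age.

```python
from collections import namedtuple

Request = namedtuple("Request", "thread row arrival")


def frfcfs_order(requests, open_row=None):
    """Serve requests row-hit-first, then oldest-first; return service order."""
    pending = sorted(requests, key=lambda r: r.arrival)
    served = []
    while pending:
        # Requests to the currently open row are preferred, whoever sent them.
        hits = [r for r in pending if r.row == open_row]
        nxt = hits[0] if hits else pending[0]  # pending stays in age order
        pending.remove(nxt)
        open_row = nxt.row
        served.append(nxt)
    return served


# Thread "A" streams through row 0; thread "B" touches a different row each time.
reqs = []
for t in range(8):
    reqs.append(Request("A", 0, 2 * t))
    if t < 2:
        reqs.append(Request("B", 10 + t, 2 * t + 1))

print([r.thread for r in frfcfs_order(reqs)])
```

In this model the scheduler never looks at the `thread` field, so B's older requests wait behind every one of A's row hits. That is the kind of interference a thread-aware scheduler would try to bound.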
Problems due to Uncontrolled Interference
Unfair slowdown of different threads [MICRO’07, ISCA’08, ASPLOS’10]
System performance loss [MICRO’07, ISCA’08, HPCA’10]
Vulnerability to denial of service [USENIX Security’07]
Priority inversion: unable to enforce priorities/SLAs [MICRO’07]
Poor performance predictability (no performance isolation)
[Figure: Normalized Memory Stall-Time per core. Labels: memory performance hog, low priority, high priority. Annotations: "Cores make very slow progress"; "DRAM is the only shared resource".]
QoS-Aware Memory Systems: Challenges
How do we reduce inter-thread interference?
  Improve system performance and utilization
  Preserve the benefits of single-thread performance techniques
How do we control inter-thread interference?
  Provide fairness when needed
  Satisfy performance guarantees of threads when needed
  Provide mechanisms to enable system software to enforce a variety of QoS policies
  All the while providing high system performance
How do we make the memory system configurable/flexible?
  Enable flexible mechanisms that can achieve many goals
Hardware/Software Cooperation for Memory QoS
Hardware is good at fine-grained prioritization mechanisms -> high performance
Software is good at coarse-grained prioritization mechanisms
  Software is needed to decide the QoS policy in the memory system (e.g., system throughput vs. fairness, which app is more important)
Hardware provides configurable partitioning/prioritization and feedback mechanisms (fine-grained interference control)
Software configures the hardware mechanisms (coarse-grained)
Many challenges
  How to design flexible hardware resources?
  How to design the software/hardware interface?
  How should system software be written?
Designing QoS-Aware Memory Systems: Approaches
Smart resources: Design each shared resource to have a configurable fairness/QoS mechanism
  Fair/QoS-aware memory schedulers, interconnects, caches, arbiters
  Examples: fair memory schedulers [Mutlu MICRO 2007], parallelism-aware memory schedulers [Mutlu ISCA 2008], application-aware on-chip networks [Das et al. MICRO 2009, ISCA 2010, Grot et al. MICRO 2009]
Dumb resources: Keep each resource free-for-all, but control access to the memory system at the cores/sources
  Fairness via Source Throttling [Ebrahimi et al., ASPLOS 2010]
    Estimate thread slowdowns in the entire system and throttle cores that slow down others
  Coordinated Prefetcher Throttling [Ebrahimi et al., MICRO 2009]
Combined approaches are even more powerful
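The source-throttling idea (estimate slowdowns, then throttle cores that slow down others) can be sketched as one step of a control loop. This is a simplified illustration and not the mechanism from the cited paper: the unfairness metric (largest slowdown divided by smallest), the threshold, the step size, and the idea that an `interferer` core is identified elsewhere and passed in are all assumptions.

```python
def throttle_step(slowdowns, levels, interferer,
                  threshold=1.4, step=0.25, floor=0.25):
    """One control interval of a hypothetical source-throttling loop.

    slowdowns:  core id -> estimated slowdown versus running alone (>= 1.0)
    levels:     core id -> allowed request injection rate in (0, 1]
    interferer: core estimated to hurt the most-slowed-down core the most
    """
    victim = max(slowdowns, key=slowdowns.get)
    unfairness = max(slowdowns.values()) / min(slowdowns.values())
    new_levels = dict(levels)
    if unfairness > threshold and interferer != victim:
        # Throttle the interfering core down and let the victim speed up.
        new_levels[interferer] = max(floor, levels[interferer] - step)
        new_levels[victim] = min(1.0, levels[victim] + step)
    return unfairness, new_levels


unfairness, levels = throttle_step(
    slowdowns={0: 3.0, 1: 1.2}, levels={0: 0.5, 1: 1.0}, interferer=1)
print(unfairness, levels)
```

The memory system itself stays free-for-all here; fairness is enforced only by limiting how fast each core may inject requests.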
Energy-Efficiency in the Memory System
Problem: How to minimize energy/power consumption while satisfying performance requirements
Existing memory systems are wasteful
  Optimized for “general” behavior, suboptimal for particular access patterns
    E.g., fixed cache line size, fixed cache size
  A lot of data movement: Moving data can be inefficient
Can we design the memory hierarchy to be customizable?
Can we minimize (data) movement?
Configurability Enables Customization
Non-configurable: One size fits all
  Energy and performance suboptimal for different behaviors
Configurable: Enables tradeoffs and customization
  Processing requirements vary across applications and phases
  Execute code on best-fit resources (minimal energy, adequate perf.)
[Figure: three core-grid diagrams labeled Configurable, Non-configurable, and Configurable; core labels include C and C1 through C5]
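The "best-fit resources" idea (minimal energy, adequate performance) amounts to a small constrained selection. The sketch below assumes each candidate configuration comes with an energy and a performance estimate for the current application phase. The configuration names and all numbers are made up for illustration, and how such estimates would be obtained is one of the open monitoring questions.

```python
def best_fit(configs, min_performance):
    """Pick the lowest-energy configuration that still meets the performance need."""
    adequate = [c for c in configs if c["performance"] >= min_performance]
    if not adequate:
        return None  # no configuration is fast enough
    return min(adequate, key=lambda c: c["energy"])


# Hypothetical per-phase estimates, normalized to the largest configuration.
configs = [
    {"name": "small cache, short lines", "energy": 1.0, "performance": 0.60},
    {"name": "medium cache",             "energy": 1.8, "performance": 0.85},
    {"name": "large cache, long lines",  "energy": 3.0, "performance": 1.00},
]

choice = best_fit(configs, min_performance=0.8)
print(choice["name"] if choice else "no adequate configuration")
```

A one-size-fits-all design corresponds to always returning the same entry, which wastes energy whenever a phase would have been served adequately by a cheaper one.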
Customizable Memory Systems: Challenges
What type of access/communication patterns deserve customization?
How do we enable customization?
How should applications be mapped to the best-fit memory hierarchy resources?
Many design, monitoring, program characterization questions
Hardware and software should work cooperatively
Summary
Technology, application, architecture trends dictate fundamentally new needs from the memory system
A fresh look at re-designing the memory hierarchy
  Tech. scalability: Enabling NVRAM (+ DRAM) memory
  QoS/performance: Reducing and controlling interference
  Energy efficiency: Customizability and minimal waste
HW/SW cooperation essential
  Fundamental changes to architecture, uarch, software
  Many challenges and opportunities