Rethinking Memory System Design in the Nanoscale Many-Core Era

Onur Mutlu

WAMT, 3/14/2010, © Onur Mutlu

Summary
- Tech/App/Arch trends dictate new needs from the memory hierarchy:
  - Technology scalable (not DRAM)
  - QoS and performance guarantees (not free-for-all)
  - Energy and power efficient (not one size fits all)
- New memory hierarchy research challenges
  - Tech. scalability: Enabling NVRAM (+ DRAM) memory
  - QoS/performance: Reducing and controlling interference
  - Energy efficiency: Customizability and minimal waste
- Need fundamental research in HW/SW cooperation
  - Memory architecture, organization, controllers, HW algorithms
  - Low-level system software and HW/SW interface



Agenda

- Technology, Application, Architecture Trends
- Requirements from the Memory Hierarchy
- Research Challenges and Possible Avenues
- Summary



Modern Memory Subsystems (Multi-Core)



Technology Trends
- DRAM does not scale well beyond N nm
  - Memory scaling benefits: density, capacity, cost
- Energy/power already key design limiters
  - Memory hierarchy responsible for a large fraction of power
- More transistors (cores) on chip (Moore’s Law)
  - Pin bandwidth not increasing as fast as number of transistors
  - Memory subsystem is a key shared resource among cores
  - More pressure on the memory hierarchy



Application/System Trends
- Many different threads/applications/virtual machines will share the memory system
  - Cloud computing/servers: Many workloads consolidated on-chip to improve efficiency
  - GP-GPUs: Many threads from multiple parallel applications
  - Mobile: Interactive + non-interactive consolidation
- Different applications with different requirements (SLAs)
  - Some applications/threads require performance guarantees
  - Modern hierarchies do not distinguish between applications
- Different goals for different systems/users
  - System throughput, fairness, per-application performance
  - Modern hierarchies are not flexible/configurable
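The goals listed above are usually quantified from each application's slowdown: its IPC when running alone divided by its IPC when sharing the memory system. A minimal Python sketch follows; weighted speedup (throughput), harmonic speedup, and maximum slowdown (one common unfairness measure) are standard metrics from the multi-core literature rather than definitions given on this slide, and the IPC values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AppIPC:
    """Instructions per cycle of one application, alone vs. sharing memory."""
    name: str
    ipc_alone: float
    ipc_shared: float

    @property
    def slowdown(self) -> float:
        # Greater than 1.0 when interference slows the application down.
        return self.ipc_alone / self.ipc_shared


def weighted_speedup(apps: list[AppIPC]) -> float:
    # System throughput: sum of each application's normalized progress.
    return sum(a.ipc_shared / a.ipc_alone for a in apps)


def harmonic_speedup(apps: list[AppIPC]) -> float:
    # Balances throughput and fairness: N divided by the sum of slowdowns.
    return len(apps) / sum(a.slowdown for a in apps)


def max_slowdown(apps: list[AppIPC]) -> float:
    # Unfairness: the slowdown of the most-penalized application.
    return max(a.slowdown for a in apps)


if __name__ == "__main__":
    # Hypothetical IPCs: a memory-intensive "hog" and a victim application.
    apps = [AppIPC("hog", ipc_alone=0.5, ipc_shared=0.45),
            AppIPC("victim", ipc_alone=1.6, ipc_shared=0.4)]
    print(f"weighted speedup: {weighted_speedup(apps):.2f}")
    print(f"harmonic speedup: {harmonic_speedup(apps):.2f}")
    print(f"maximum slowdown: {max_slowdown(apps):.2f}")
```

With these numbers the hog barely slows down while the victim runs about 4x slower; a single throughput number hides that one application bears almost all of the interference, which is why different systems and users may want to optimize for different goals.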



Architecture Trends

- More cores and components
  - More pressure on the memory hierarchy
- Asymmetric cores: Performance asymmetry, CPU+GPUs, accelerators, …
  - Motivated by energy efficiency and Amdahl’s Law
- Different cores have different performance requirements
  - Memory hierarchies do not distinguish between cores





Requirements from an Ideal Hierarchy

- Traditional
  - High system performance
  - Enough capacity
  - Low cost
- New
  - Technology scalability
  - QoS support, performance guarantees, configurability
  - Energy (and power, bandwidth) efficiency



Requirements from an Ideal Hierarchy

- Traditional
  - High system performance: Reduce inter-thread interference
  - Enough capacity
  - Low cost
- New
  - Technology scalability
    - Emerging non-volatile memory technologies (PCM, MRAM) can help
  - QoS support, performance guarantees, configurability
    - Need HW mechanisms SW can use to satisfy QoS policies
  - Energy (and power, bandwidth) efficiency
    - One size fits all wastes energy, performance, bandwidth



Agenda

- Technology, Application, Architecture Trends
- Requirements from the Memory Hierarchy
- Research Challenges and Possible Avenues
  - Technology scalability
  - QoS support: Inter-thread/application interference
  - Energy/power/bandwidth efficiency
- Summary



Technology Scalability Challenges

- Problem: DRAM is not scalable beyond N nm, so memory capacity and cost may not continue to scale
- Some emerging resistive memory technologies (NVRAM) are more scalable than DRAM
- NVRAM will likely be a main component of the memory hierarchy, but we need to enable it:
  - Redesign the hierarchy to mitigate NVRAM shortcomings
  - Find the right way to place NVRAM in the subsystem
  - Satisfy all other requirements in the presence of NVRAM



Emerging Memory Technologies

- Phase Change Memory
  - Pros
    - Better technology scaling
    - Non-volatility
    - Low idle power
  - Cons
    - Higher latencies (especially write)
    - Higher active energy
    - Lower endurance
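Lower endurance matters because each PCM cell survives only a limited number of writes. A common back-of-envelope estimate, assuming writes are spread evenly over all cells (ideal wear leveling), is lifetime ≈ endurance × capacity / write bandwidth. The Python sketch below uses hypothetical parameter values, not measured PCM characteristics; the `leveled_fraction` parameter is also an assumption, added to show how lifetime collapses when writes concentrate on a small region.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def pcm_lifetime_years(endurance_writes: float,
                       capacity_bytes: float,
                       write_bytes_per_sec: float,
                       leveled_fraction: float = 1.0) -> float:
    """Back-of-envelope lifetime of a PCM main memory.

    endurance_writes:    writes each cell tolerates before wearing out
    capacity_bytes:      total PCM capacity
    write_bytes_per_sec: sustained write traffic reaching PCM
    leveled_fraction:    fraction of capacity over which writes are spread
                         (1.0 models ideal wear leveling)
    """
    writable_bytes = endurance_writes * capacity_bytes * leveled_fraction
    return writable_bytes / write_bytes_per_sec / SECONDS_PER_YEAR


if __name__ == "__main__":
    GiB = 2**30
    # Hypothetical parameters, chosen only to illustrate the arithmetic.
    ideal = pcm_lifetime_years(1e8, 16 * GiB, 1 * GiB)
    hotspot = pcm_lifetime_years(1e8, 16 * GiB, 1 * GiB, leveled_fraction=0.001)
    print(f"ideal wear leveling: {ideal:.1f} years")
    print(f"writes on 0.1% of capacity: {hotspot:.3f} years")
```

With these inputs the ideal case lasts about five decades, while concentrating all writes on 0.1% of the capacity cuts lifetime by a factor of 1000, to under three weeks.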



NVRAM-based Hierarchy: Key Challenges
- How should NVRAM-based (main) memory be organized?
  - Hybrid NVRAM+DRAM [Qureshi et al., ISCA’09]:
    - How to partition/migrate data (energy, performance, endurance)
    - How to design the memory controllers and system software
    - Exploit advantages, minimize disadvantages



NVRAM-based Hierarchy: Key Challenges
- How should NVRAM-based (main) memory be organized?
  - Pure NVRAM main memory [Lee et al., ISCA’09, IEEE Micro’10]:
    - How to redesign entire hierarchy (and cores) to overcome NVRAM shortcomings
      - Latency, energy, endurance



Many Research Challenges: Hybrid Systems
- Partitioning
  - Should DRAM be a cache or main memory, or configurable?
  - What fraction? How many controllers?
- Data allocation/movement (energy, perf, lifetime, security)
  - Access latency critical, heavy modifications → DRAM
  - Non-volatility critical, not accessed heavily → PCM
  - Who manages allocation/movement?
  - What are good control algorithms?
- Redesign of cache hierarchy, memory controllers
  - How can NVRAM be exploited on chip?
- Design of NVRAM/DRAM chips
  - Rethink the design of PCM/DRAM with new requirements?
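The allocation guidance above (latency-critical, heavily modified data in DRAM; persistent, rarely accessed data in PCM) can be written as a per-page placement heuristic. This Python sketch is illustrative only: the counters, hint flags, thresholds, and the `place`/`plan` helpers are hypothetical, and it leaves open who runs it (hardware, OS, or both), which the slide lists as an open question.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    DRAM = "DRAM"
    PCM = "PCM"


@dataclass
class PageStats:
    """Hypothetical per-page counters gathered over one monitoring epoch."""
    page_id: int
    reads: int
    writes: int
    latency_critical: bool = False   # assumed hint from software
    needs_persistence: bool = False  # assumed hint: data must survive power loss


def place(page: PageStats, write_threshold: int, access_threshold: int) -> Tier:
    heavily_written = page.writes >= write_threshold
    heavily_accessed = page.reads + page.writes >= access_threshold
    if page.latency_critical or heavily_written:
        # PCM writes are slow, energy-hungry, and consume limited endurance.
        return Tier.DRAM
    if page.needs_persistence or not heavily_accessed:
        # Cold or persistent data benefits from PCM's non-volatility and low idle power.
        return Tier.PCM
    # Assumption: hot, read-mostly pages also go to DRAM for its lower read latency.
    return Tier.DRAM


def plan(pages: list[PageStats], write_threshold: int = 100,
         access_threshold: int = 100) -> dict[int, Tier]:
    """Map each page to a tier for the next epoch."""
    return {p.page_id: place(p, write_threshold, access_threshold) for p in pages}
```

The sketch ignores DRAM capacity limits and the cost of migrating pages between tiers, both of which a real policy would have to weigh.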





Memory System is the Major Shared Resource

[Figure: threads’ requests interfere in the shared memory system]


Inter-Thread/Application Interference

- Problem: Threads share the memory system, but the memory system does not distinguish between threads’ requests
  - Memory system algorithms are thread-unaware and thread-unfair
- Existing memory systems
  - Free-for-all, demand-based sharing of the memory system
  - Aggressive threads can deny service to others
  - Do not try to reduce or control inter-thread interference
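How a thread-unaware algorithm becomes unfair can be seen with FR-FCFS (first-ready, first-come-first-serve), a widely used DRAM scheduling policy that this slide does not name. The minimal Python sketch below models a single bank with a static request queue and ignores timing; the request values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Request:
    thread: str
    row: int
    arrival: int  # smaller value = older request


def fr_fcfs_order(queue: list[Request], open_row: int | None) -> list[Request]:
    """Service order for one DRAM bank under FR-FCFS.

    Row-buffer hits are served first; ties (and the no-hit case) are
    broken by age. Threads are never considered.
    """
    pending = list(queue)
    order: list[Request] = []
    while pending:
        hits = [r for r in pending if r.row == open_row]
        nxt = min(hits or pending, key=lambda r: r.arrival)
        pending.remove(nxt)
        order.append(nxt)
        open_row = nxt.row  # serving a request leaves its row open
    return order


if __name__ == "__main__":
    # A streaming "hog" hitting one row, and an older request from a victim thread.
    queue = [Request("victim", row=3, arrival=0)]
    queue += [Request("hog", row=7, arrival=t) for t in range(1, 9)]
    print([r.thread for r in fr_fcfs_order(queue, open_row=7)])
```

Even though the victim's request is the oldest, it is served last because every hog request is a row-buffer hit; with a continuous stream of hog requests the victim could wait much longer, which is how an aggressive thread can deny service to others.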



Problems due to Uncontrolled Interference


- Unfair slowdown of different threads [MICRO’07, ISCA’08, ASPLOS’10]
- System performance loss [MICRO’07, ISCA’08, HPCA’10]
- Vulnerability to denial of service [USENIX Security’07]
- Priority inversion: unable to enforce priorities/SLAs [MICRO’07]
- Poor performance predictability (no performance isolation)

[Figure: normalized memory stall-time of a high-priority thread and a low-priority memory performance hog; the cores make very slow progress, and DRAM is the only shared resource]


QoS-Aware Memory Systems: Challenges
- How do we reduce inter-thread interference?
  - Improve system performance and utilization
  - Preserve the benefits of single-thread performance techniques
- How do we control inter-thread interference?
  - Provide fairness when needed
  - Satisfy performance guarantees of threads when needed
  - Provide mechanisms to enable system software to enforce a variety of QoS policies
  - All the while providing high system performance
- How do we make the memory system configurable/flexible?
  - Enable flexible mechanisms that can achieve many goals



Hardware/Software Cooperation for Memory QoS

- Hardware is good at fine-grained prioritization mechanisms → high performance
- Software is good at coarse-grained prioritization mechanisms
  - Software is needed to decide the QoS policy in the memory system (e.g., system throughput vs. fairness, which app is more important)
- Hardware provides configurable partitioning/prioritization and feedback mechanisms (fine-grained interference control)
- Software configures the hardware mechanisms (coarse-grained)
- Many challenges
  - How to design flexible hardware resources?
  - How to design the software/hardware interface?
  - How should system software be written?
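One way to picture this division of labor: system software writes a per-thread priority (the coarse-grained policy), the memory scheduler applies it to every request (the fine-grained mechanism), and per-thread service counters flow back as feedback. The `QoSScheduler` class and its priority table in this Python sketch are hypothetical, and the tie-breakers after priority (row-buffer hit, then age) are an assumed choice.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Request:
    thread: int
    row: int
    arrival: int  # smaller value = older request


@dataclass
class QoSScheduler:
    # Written by system software: thread id -> priority level (higher wins).
    priority: dict[int, int] = field(default_factory=dict)
    # Read by system software: requests served per thread (feedback).
    served: Counter = field(default_factory=Counter)
    open_row: int | None = None

    def set_priority(self, thread: int, level: int) -> None:
        """Software-visible knob; unset threads default to level 0."""
        self.priority[thread] = level

    def pick(self, queue: list[Request]) -> Request:
        """Hardware side: remove and return the next request to serve."""
        best = min(queue, key=lambda r: (-self.priority.get(r.thread, 0),
                                         r.row != self.open_row,
                                         r.arrival))
        queue.remove(best)
        self.open_row = best.row
        self.served[best.thread] += 1
        return best
```

Changing the policy, for example favoring one critical application instead of raw throughput, only requires software to rewrite the priority table; the hardware mechanism stays the same.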



Designing QoS-Aware Memory Systems: Approaches

- Smart resources: Design each shared resource to have a configurable fairness/QoS mechanism
  - Fair/QoS-aware memory schedulers, interconnects, caches, arbiters
  - Examples: fair memory schedulers [Mutlu MICRO 2007], parallelism-aware memory schedulers [Mutlu ISCA 2008], application-aware on-chip networks [Das et al. MICRO 2009, ISCA 2010, Grot et al. MICRO 2009]
- Dumb resources: Keep each resource free-for-all, but control access to the memory system at the cores/sources
  - Fairness via Source Throttling [Ebrahimi et al., ASPLOS 2010]
    - Estimate thread slowdowns in the entire system and throttle cores that slow down others
  - Coordinated Prefetcher Throttling [Ebrahimi et al., MICRO 2009]
- Combined approaches are even more powerful
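The source-throttling idea can be sketched as an interval-based control loop: estimate each application's slowdown, and if unfairness (here, maximum over minimum slowdown) exceeds a target, throttle down the application that interferes most with the slowest one and throttle up the slowest one. This Python sketch is a simplified illustration, not the published mechanism: the slowdown and interference estimates are taken as inputs, and the knob (a per-core quota of outstanding memory requests, halved or doubled) and its bounds are assumptions.

```python
from __future__ import annotations


def throttle_step(slowdown: dict[str, float],
                  interference: dict[str, dict[str, float]],
                  quota: dict[str, int],
                  unfairness_target: float,
                  min_quota: int = 1,
                  max_quota: int = 32) -> float:
    """Run one throttling interval; updates `quota` in place.

    slowdown[a]:        estimated slowdown of app a (alone vs. shared)
    interference[v][a]: estimated stall cycles app a caused victim v
    quota[a]:           outstanding memory requests core a may issue (assumed knob)
    Returns the unfairness measured in this interval.
    """
    slowest = max(slowdown, key=slowdown.get)
    unfairness = slowdown[slowest] / min(slowdown.values())
    if unfairness > unfairness_target:
        culprits = {a: c for a, c in interference.get(slowest, {}).items()
                    if a != slowest}
        if culprits:
            culprit = max(culprits, key=culprits.get)
            quota[culprit] = max(min_quota, quota[culprit] // 2)
        quota[slowest] = min(max_quota, quota[slowest] * 2)
    return unfairness


if __name__ == "__main__":
    # Hypothetical estimates for one interval.
    quota = {"hog": 16, "victim": 16}
    u = throttle_step(slowdown={"hog": 1.1, "victim": 4.0},
                      interference={"victim": {"hog": 900.0}},
                      quota=quota, unfairness_target=1.4)
    print(f"unfairness={u:.2f}, new quotas={quota}")
```

Because the loop only adjusts how aggressively each core injects requests, the shared resources themselves can stay free-for-all, which is the "dumb resources" approach on this slide.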





Energy-Efficiency in the Memory System

- Problem: How to minimize energy/power consumption while satisfying performance requirements
- Existing memory systems are wasteful
  - Optimized for “general” behavior, suboptimal for particular access patterns
    - E.g., fixed cache line size, fixed cache size
  - A lot of data movement: Moving data can be inefficient
- Can we design the memory hierarchy to be customizable?
- Can we minimize (data) movement?
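To make the fixed-cache-line-size point concrete, this Python sketch measures how many of the bytes moved from memory a program actually touches. It assumes an idealized cache that fetches each distinct line once and never evicts, and accesses that do not cross line boundaries; `fetch_efficiency` is a hypothetical helper, and the access patterns are made up.

```python
from __future__ import annotations

from collections.abc import Iterable


def fetch_efficiency(addresses: Iterable[int], access_bytes: int,
                     line_bytes: int) -> float:
    """Fraction of fetched bytes that are actually used.

    Assumes each distinct cache line is fetched exactly once (no evictions)
    and that no access crosses a line boundary.
    """
    used: set[int] = set()
    lines: set[int] = set()
    for addr in addresses:
        lines.add(addr // line_bytes)
        used.update(range(addr, addr + access_bytes))
    return len(used) / (len(lines) * line_bytes)


if __name__ == "__main__":
    dense = range(0, 4096, 8)     # sequential 8-byte accesses
    sparse = range(0, 4096, 256)  # one 8-byte access every 256 bytes
    for line in (16, 64):
        print(line, fetch_efficiency(dense, 8, line),
              fetch_efficiency(sparse, 8, line))
```

With 64-byte lines the sparse pattern uses only one byte in eight of the data it moves, while the dense pattern uses all of it; a single fixed line size cannot suit both.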



Configurability Enables Customization

- Non-configurable: One size fits all
  - Energy and performance suboptimal for different behaviors
- Configurable: Enables tradeoffs and customization
  - Processing requirements vary across applications and phases
  - Execute code on best-fit resources (minimal energy, adequate perf.)


[Figure: non-configurable vs. configurable core organizations; the non-configurable chip uses identical cores (C), while one configurable chip combines different resource types (C1 to C5)]
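"Execute code on best-fit resources" can be read as a constrained choice: among the available configurations, pick the lowest-energy one that still meets the performance requirement of the current application or phase. In this Python sketch the configuration names and their performance and energy figures are hypothetical; in practice such estimates would have to come from monitoring or prediction, which is itself an open question.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    perf: float    # estimated performance for the current phase (normalized)
    energy: float  # estimated energy for the current phase


def best_fit(configs: list[Config], required_perf: float) -> Config | None:
    """Lowest-energy configuration meeting the performance requirement,
    or None if no configuration is fast enough."""
    feasible = [c for c in configs if c.perf >= required_perf]
    return min(feasible, key=lambda c: c.energy, default=None)


if __name__ == "__main__":
    # Hypothetical per-phase estimates for three resource configurations.
    configs = [Config("small", perf=0.6, energy=2.0),
               Config("medium", perf=0.9, energy=3.0),
               Config("large", perf=1.0, energy=5.0)]
    for phase, need in [("phase A", 0.5), ("phase B", 0.8), ("phase C", 1.2)]:
        choice = best_fit(configs, need)
        print(phase, choice.name if choice else "no configuration qualifies")
```

Re-running the selection at phase boundaries is what lets requirements that vary across applications and phases map to different resources.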


Customizable Memory Systems: Challenges

- What type of access/communication patterns deserve customization?
- How do we enable customization?
- How should applications be mapped to the best-fit memory hierarchy resources?
- Many design, monitoring, program characterization questions
- Hardware and software should work cooperatively



Summary
- Technology, application, architecture trends dictate fundamentally new needs from the memory system
- A fresh look at re-designing the memory hierarchy
  - Tech. scalability: Enabling NVRAM (+ DRAM) memory
  - QoS/performance: Reducing and controlling interference
  - Energy efficiency: Customizability and minimal waste
- HW/SW cooperation essential
  - Fundamental changes to architecture, uarch, software
  - Many challenges and opportunities
