Scalable High Performance Main Memory System Using PCM Technology

Moinuddin K. Qureshi, Viji Srinivasan, and Jude Rivers

IBM T. J. Watson Research Center, Yorktown Heights, NY

International Symposium on Computer Architecture (ISCA-2009), Jul-28-09

© 2007 IBM Corporation



Main Memory Capacity Wall

More cores in system → More concurrency → Larger working set

Demand for main memory capacity continues to increase

Main memory systems built from DRAM are hitting three walls:
1. Cost wall: a major share of the cost of large servers is main memory
2. Scaling wall: scaling DRAM to smaller technology nodes is a challenge
3. Power wall: memory accounts for a large share of server power

IBM P670 Server           Processor    Memory
Small (4 proc, 16GB)      384 Watts    314 Watts
Large (16 proc, 128GB)    840 Watts    1223 Watts
Source: Lefurgy et al., IEEE Computer 2003

Need a practical solution to increase main-memory capacity


The Technology Hierarchy

More capacity by cheaper, denser, (slower) technology

[Figure: typical access latency in processor cycles (@ 4 GHz), on a scale from 2^1 to 2^23, for L1 (SRAM), EDRAM, DRAM, PCM, Flash, and HDD. Region labels: Memory System; High-Performance Disk System.]

Phase Change Memory (PCM) promising candidate for large capacity main memory


Outline

• Introduction
• What is PCM?
• Hybrid Memory System
• Evaluation
• Lifetime Analysis
• Summary


What is Phase Change Memory?

Phase change material (chalcogenide glass) exists in two states:
1. Amorphous: high resistivity
2. Crystalline: low resistivity

The material can be switched between states reliably, quickly, and a large number of times

PCM stores data in terms of resistance
•  Low resistance (SET state) = 1
•  High resistance (RESET state) = 0

[Figure: PCM cell array schematic with bit line and word lines.]


How does PCM work?

Switching by heating using electrical pulses

SET: sustained current to heat cell above Tcryst

RESET: cell heated above Tmelt and quenched

[Figure: temperature vs. time (ns) for the SET and RESET pulses, with Tcryst and Tmelt marked.]

[Figure: PCM cell showing the access device and the memory element. SET: low resistance (10^3-10^4 Ω), large current. RESET: high resistance (10^6-10^7 Ω), small current. Photo courtesy: Bipin Rajendran, IBM]
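The bit encoding (low resistance = 1, high resistance = 0) and the resistance ranges quoted for the two states suggest how a read could be decoded. The sketch below is only illustrative: the 10^5 Ω threshold is an assumption picked between the SET and RESET ranges, not a value from the talk, and `read_bit` is a hypothetical helper.

```python
# Illustrative sketch only. The threshold is an ASSUMPTION: it sits between the
# SET range (10^3-10^4 ohm) and the RESET range (10^6-10^7 ohm) from the slide.
READ_THRESHOLD_OHMS = 1e5  # hypothetical sense threshold


def read_bit(resistance_ohms: float) -> int:
    """Return 1 for a low-resistance (SET) cell, 0 for a high-resistance (RESET) cell."""
    return 1 if resistance_ohms < READ_THRESHOLD_OHMS else 0


set_cell_bit = read_bit(5e3)    # resistance inside the SET range
reset_cell_bit = read_bit(5e6)  # resistance inside the RESET range
```

The two ranges are separated by about two orders of magnitude, which is why a single threshold is enough for one bit per cell in this sketch.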


Key Characteristics of PCM

+ Scales better than DRAM, small cell size
  Prototypes as small as 3 nm x 20 nm fabricated and tested [Raoux+ IBMJRD’08]

+ Can store multiple bits/cell
  More density in the same area
  Prototypes with 2 bits/cell in ISSCC’08. >2 bits/cell expected soon.

+ Non-Volatile Memory Technology
  Data retention of 10 years
  Power implications, system implications

Challenges:
-  More latency compared to DRAM
-  Limited endurance (~10 million writes per cell)
-  Write bandwidth constrained, so better to write less often


Hybrid Memory System

Hybrid Memory System:
1.  DRAM as cache to tolerate PCM Rd/Wr latency and Wr bandwidth
2.  PCM as main-memory to provide large capacity at good cost/power

[Figure: hybrid memory organization: Processor, DRAM Buffer (DATA plus T), PCM Write Queue, PCM Main Memory (DATA plus W), and Flash or HDD. T = Tag-Store]


Lazy Write Architecture

Problem: Double PCM writes to dirty pages on install

[Figure: Processor, DRAM Buffer, PCM write queue (WRQ), PCM, and Flash/Disk.]

For example, DAXPY kernel: Y[i] = Y[i] + X[i]
Baseline has 2 writes for Y[i] and 1 for X[i]
Lazy write has 1 write for Y[i] and 1 for X[i]
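The DAXPY counts can be reproduced with a small per-page model. This is a sketch under an assumed reading of the slide: the baseline writes a fetched page into PCM on install and writes it again if it is evicted dirty, while lazy write installs the page only in DRAM and writes it to PCM once, on eviction. The function names are hypothetical.

```python
def pcm_page_writes(dirtied: bool, lazy_write: bool) -> int:
    """Count PCM writes for one page between its fetch and its DRAM eviction."""
    if lazy_write:
        # ASSUMED model: the page lives only in DRAM until eviction, so PCM is
        # written exactly once whether the page was modified or not.
        return 1
    # Baseline: one write to install the page in PCM, plus a write-back if dirty.
    return 1 + (1 if dirtied else 0)


def daxpy_writes(lazy_write: bool) -> dict:
    """Y[i] = Y[i] + X[i]: pages of Y are modified, pages of X are only read."""
    return {
        "Y": pcm_page_writes(dirtied=True, lazy_write=lazy_write),
        "X": pcm_page_writes(dirtied=False, lazy_write=lazy_write),
    }


baseline = daxpy_writes(lazy_write=False)
lazy = daxpy_writes(lazy_write=True)
```

Under this model the saving comes entirely from pages that are modified after install; read-only pages such as X cost one PCM write either way.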


Line Level Write Back

Problem: Not all lines in a dirty page are dirty
Solution: Line Level Write Back (LLWB): keep dirty bits per line in the DRAM buffer and write back only the dirty lines from DRAM to PCM

Problem: With LLWB, the lines in dirty pages are not written uniformly

[Figure: number of writes to each line (Mln) vs. Line_id (0-15) for db1 and db2, with the average marked.]
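A minimal sketch of per-line dirty tracking, assuming 16 lines per page (as suggested by the Line_id axis running from 0 to 15). The class and method names are hypothetical and only illustrate the idea of writing back dirty lines instead of whole pages.

```python
LINES_PER_PAGE = 16  # ASSUMPTION: taken from the Line_id range 0-15


class BufferedPage:
    """A page held in the DRAM buffer with one dirty bit per line."""

    def __init__(self) -> None:
        self.dirty = [False] * LINES_PER_PAGE

    def write_line(self, line_id: int) -> None:
        # A processor write marks only the touched line as dirty.
        self.dirty[line_id] = True

    def lines_to_write_back(self) -> list:
        # On eviction, only lines whose dirty bit is set go to PCM.
        return [i for i, is_dirty in enumerate(self.dirty) if is_dirty]


page = BufferedPage()
for line_id in (0, 1, 5):
    page.write_line(line_id)
written = page.lines_to_write_back()
```

Here a page-level write-back would send all 16 lines to PCM, while the line-level version sends 3.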


Fine Grained Wear Leveling

Solution: Fine Grained Wear Leveling (FGWL)
- When a page gets allocated, the page is rotated by a random shift value
- The rotate value remains constant while the page remains in memory
- On replacement of a page, a new random value is assigned for the new page
- Over time, the write traffic per line becomes uniform

FGWL makes writes across lines in a dirty page uniform

[Figure: number of writes to each line (Mln) vs. Line_id (0-15) for db1 and db2 with FGWL, with the average marked.]
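The rotation can be sketched as a modular shift of line indices. The simulation below is a toy under stated assumptions: 16 lines per page, one PCM page frame that is reallocated many times, and a workload that always dirties logical lines 0 and 1. None of these parameters come from the talk, and `simulate` is a hypothetical helper.

```python
import random
from collections import Counter

LINES_PER_PAGE = 16  # ASSUMPTION: taken from the Line_id range 0-15


def physical_line(logical_line: int, shift: int) -> int:
    """Map a logical line to its rotated position within the PCM page."""
    return (logical_line + shift) % LINES_PER_PAGE


def simulate(allocations: int, hot_lines=(0, 1), use_fgwl: bool = True, seed: int = 0) -> Counter:
    """Count writes per physical line of one frame across many page allocations."""
    rng = random.Random(seed)
    writes = Counter()
    for _ in range(allocations):
        # A new random shift is drawn per allocation and held until replacement.
        shift = rng.randrange(LINES_PER_PAGE) if use_fgwl else 0
        for line in hot_lines:
            writes[physical_line(line, shift)] += 1
    return writes


without_fgwl = simulate(16000, use_fgwl=False)
with_fgwl = simulate(16000, use_fgwl=True)
```

Without the shift, all writes land on physical lines 0 and 1. With it, the same traffic is spread over all 16 lines, and the per-line counts approach the average as the number of allocations grows.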


Evaluation Framework

Trace Driven Simulator:
16-core system (simple core), 8GB DRAM main-memory at 320 cycles
HDD (2 ms) with Flash (32 us), with Flash hit-rate of 99%

Workloads: Database workloads & Data parallel kernels

1. Database workloads: db1 and db2
2. Unix utilities: qsort and binary search
3. Data Mining: K-means and Gauss-Seidel
4. Streaming: DAXPY and Vector Dot Product

Assumption: PCM is 4X denser and 4X slower than DRAM: 32GB @ 1280-cycle read latency
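The storage and PCM parameters can be combined into a couple of derived numbers. This is a back-of-the-envelope sketch: it assumes a Flash miss costs only the HDD latency (the talk does not say whether the Flash lookup time is added), and the constant names are made up for illustration.

```python
FLASH_LATENCY_S = 32e-6  # Flash access time from the slide (32 us)
HDD_LATENCY_S = 2e-3     # HDD access time from the slide (2 ms)
FLASH_HIT_RATE = 0.99    # Flash hit rate from the slide

DRAM_READ_CYCLES = 320
PCM_SLOWDOWN = 4         # "4X slower than DRAM"
DRAM_CAPACITY_GB = 8
PCM_DENSITY_GAIN = 4     # "4X denser than DRAM"


def avg_page_fault_latency_s(hit_rate: float = FLASH_HIT_RATE) -> float:
    """Expected page-fault service time, ASSUMING a miss pays only the HDD latency."""
    return hit_rate * FLASH_LATENCY_S + (1.0 - hit_rate) * HDD_LATENCY_S


pcm_read_cycles = PCM_SLOWDOWN * DRAM_READ_CYCLES     # 1280 cycles
pcm_capacity_gb = PCM_DENSITY_GAIN * DRAM_CAPACITY_GB  # 32 GB
fault_latency_s = avg_page_fault_latency_s()           # about 51.7 microseconds
```

Under these assumptions the rare HDD accesses still contribute roughly 20 us of the ~52 us average, even at a 99% Flash hit rate.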


Reduction in Page Faults

[Figure: page faults per workload. Annotations: Benefit from capacity; Need >16GB; Streaming.]


Impact on Execution Time

PCM with DRAM buffer performs similarly to equal-capacity DRAM storage


Impact of PCM Latency

Hybrid memory system is relatively insensitive to PCM Latency

[Figure legend: DRAM-8GB, DRAM-32GB, PCM-32GB, HYBRID (1+32)GB]


Power Evaluations

Significant Power and Energy savings with PCM based hybrid memory system


Impact of Write Endurance

F = Frequency of system (4 GHz)
Y = Number of years (lifetime)
B = Bytes/cycle written to PCM
S = PCM capacity in bytes
$W_{max}$ = Max writes per PCM cell

There are ≈ $2^{25}$ seconds in a year

Num. cycles in Y years = $Y \cdot F \cdot 2^{25}$

Assuming uniform writes to PCM:

Endurance (in cycles) = $(S/B) \cdot W_{max}$

$$Y = \frac{(S/B) \cdot W_{max}}{F \cdot 2^{25}}$$

For a 4 GHz system, a 32GB PCM written at 1 byte per cycle:

$$Y = \frac{W_{max}}{4\ \text{million}}$$

If $W_{max}$ = 10 million, PCM will last for 2.5 years
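The lifetime formula can be checked numerically. The sketch below plugs in the slide's values, reading 32GB as 2^35 bytes and 4 GHz as 4e9 Hz (both readings are assumptions about units). The function name is hypothetical.

```python
SECONDS_PER_YEAR = 2 ** 25  # the slide's power-of-two approximation


def lifetime_years(capacity_bytes: float, bytes_per_cycle: float,
                   w_max: float, freq_hz: float) -> float:
    """Y = (S / B) * Wmax / (F * 2^25), assuming writes are spread uniformly."""
    endurance_cycles = (capacity_bytes / bytes_per_cycle) * w_max
    return endurance_cycles / (freq_hz * SECONDS_PER_YEAR)


# 32GB PCM, 1 byte/cycle, Wmax = 10 million, 4 GHz clock.
years = lifetime_years(capacity_bytes=2 ** 35, bytes_per_cycle=1.0,
                       w_max=10_000_000, freq_hz=4e9)  # about 2.56 years
```

With these inputs S / (F · 2^25) works out to 1 / 3.90625 million, which the slide rounds to 1 / 4 million, giving the quoted 2.5 years for Wmax = 10 million. Because Y is inversely proportional to B, halving the write traffic doubles the lifetime under the uniform-write assumption.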


Lifetime Results

Configuration                Avg. Bytes/Cycle    Avg. Lifetime
1GB DRAM + 32GB PCM          0.807               3.0 yrs
+ Lazy Write                 0.725               3.4 yrs
+ Line Level Write Back      0.316               7.6 yrs
+ Bypass Streaming Apps      0.247               9.7 yrs

Table shows average bytes per cycle written to PCM and average lifetime of PCM, assuming Wmax = 10 million

Proposed filtering techniques reduce write traffic to PCM by 3.2X, increasing its lifetime from 3 to 9.7 years


Summary

• Need more main memory capacity: DRAM hitting power, cost, scaling wall

• PCM is an emerging technology: 4x denser than DRAM, but with slower access time and limited write endurance

• We propose a Hybrid Memory System (DRAM+PCM) that provides significant power and performance benefits

• Proposed write filtering techniques reduce writes by 3x and increase PCM lifetime from 3 years to 9 years

Not touched in this talk but important: Exploiting non-volatile memories for system enhancement & related OS issues.


Thanks!