Page 1:

SOLVING THE DRAM SCALING CHALLENGE:
RETHINKING THE INTERFACE BETWEEN CIRCUITS, ARCHITECTURE, AND SYSTEMS

Samira Khan

Page 2:

MEMORY IN TODAY’S SYSTEM

Processor

Memory (DRAM)

Storage

DRAM is critical for performance

Page 3:

MAIN MEMORY CAPACITY

Gigabytes of DRAM

Increasing demand for high capacity

1. More cores
2. Data-intensive applications

Page 4:

DEMAND 1: INCREASING NUMBER OF CORES

[Timeline: 2012, 2013, 2014, 2015]

SPARC M5: 6 cores

SPARC M6: 12 cores

SPARC M7: 32 cores

More cores need more memory

Page 5:

DEMAND 2: DATA-INTENSIVE APPLICATIONS

MEMORY CACHING

IN-MEMORY DATABASE

More demand for memory

Page 6:

HOW DID WE GET MORE CAPACITY?

[Diagram: DRAM cells before and after technology scaling]

DRAM scaling enabled high capacity

Page 7:

DRAM SCALING TREND

[Plot: megabits per chip (log scale, 1 to 10,000) at start of mass production, 1985 to 2015]

Source: Flash Memory Summit 2013, Memcon 2014

DRAM scaling is getting difficult

Page 8:

DRAM SCALING CHALLENGE

[Diagram: DRAM cells before and after technology scaling]

Manufacturing reliable cells at low cost is getting difficult

Page 9:

WHY IS IT DIFFICULT TO SCALE?

DRAM Cells

To answer this, we need to take a closer look at a DRAM cell

Page 10:

WHY IS IT DIFFICULT TO SCALE?

A DRAM cell

[Two diagrams: LOGICAL VIEW and VERTICAL CROSS SECTION, labeled with capacitor, transistor, contact, and bitline]

Page 11:

WHY IS IT DIFFICULT TO SCALE?

[Diagram: DRAM cells before and after technology scaling]

Challenges in Scaling

1. Capacitor reliability
2. Cell-to-cell interference

Page 12:

SCALING CHALLENGE 1: CAPACITOR RELIABILITY

[Diagram: DRAM cells before and after technology scaling]

Capacitor is getting taller

Page 13:

CAPACITOR RELIABILITY

58 nm 140 m

Source: Flash Memory Summit, Hynix 2012

Results in failures while manufacturing

Page 14:

IMPLICATION: DRAM COST TREND

[Plot: cost/bit vs. year, 2000 to 2020, including a PROJECTION region]

Source: Flash Memory Summit, Hynix 2012

Cost is expected to go higher

Page 15:

SCALING CHALLENGE 2: CELL-TO-CELL INTERFERENCE

[Diagram: Less Interference vs. More Interference after technology scaling, with indirect paths between cells]

More interference results in more failures

Page 16:

IMPLICATION: DRAM ERRORS IN THE FIELD

1.52% of DRAM modules failed in Google servers

1.6% of DRAM modules failed in LANL

SIGMETRICS’09, SC’12

Page 17:

GOAL

ENABLE HIGH CAPACITY MEMORY WITHOUT SACRIFICING RELIABILITY

Page 18:

TWO DIRECTIONS

MAKE DRAM SCALABLE
DRAM: Difficult to scale
SIGMETRICS’14, DSN’15, ONGOING

LEVERAGE NEW TECHNOLOGIES
NEW TECHNOLOGIES: Predicted to be highly scalable
WEED’13, ONGOING

Page 19:

PAST AND FUTURE WORK

DRAM SCALING CHALLENGE
[Diagram: DRAM cells before and after technology scaling]

SYSTEM-LEVEL TECHNIQUES TO ENABLE DRAM SCALING
[Diagram: DRAM Cells, Detect and Mitigate, Reliable System]

NON-VOLATILE MEMORIES: UNIFIED MEMORY & STORAGE
[Diagram: Non-Volatile Memory, Storage, UNIFY]

Page 20:

TRADITIONAL APPROACH TO ENABLE DRAM SCALING

[Diagram labels: Unreliable DRAM Cells, Make DRAM Reliable, Reliable DRAM Cells, Reliable System; Manufacturing Time, System in the Field]

DRAM has strict reliability guarantee

Page 21:

MY APPROACH

[Diagram labels: Unreliable DRAM Cells, Make DRAM Reliable, Reliable DRAM Cells, Reliable System; Manufacturing Time, System in the Field]

Shift the responsibility to systems

Page 22:

VISION: SYSTEM-LEVEL DETECTION AND MITIGATION

[Diagram: Unreliable DRAM Cells, Detect and Mitigate, Reliable System]

Detect and mitigate errors after the system has become operational

ONLINE PROFILING

Reduces cost, increases yield, and enables scaling

Page 23:

SYSTEM-LEVEL TECHNIQUES TO ENABLE DRAM SCALING

[Diagram: DRAM Cells, Detect and Mitigate, Reliable System]

CHALLENGE: INTERMITTENT FAILURES

EFFICACY OF SYSTEM-LEVEL TECHNIQUES WITH REAL DRAM CHIPS

HIGH-LEVEL DESIGN

NEW SYSTEM-LEVEL TECHNIQUES

Page 24:

CHALLENGE: INTERMITTENT FAILURES

[Diagram: Unreliable DRAM Cells, Detect and Mitigate, Reliable System]

If failures were permanent, a simple boot-up test would have worked

What are the characteristics of intermittent failures?

Page 25:

DRAM RETENTION FAILURE

[Diagram: a DRAM cell drawn as a switch and a capacitor; leakage over time, with the retention time shown against the refresh interval]

Refreshed every 64 ms

Refresh Interval: 64 ms

Page 26:

INTERMITTENT RETENTION FAILURE

DRAM Cells

• Some retention failures are intermittent

• Two characteristics of intermittent retention failures

1. Data Pattern Sensitivity
2. Variable Retention Time

Page 27:

DATA PATTERN SENSITIVITY

[Diagram: three adjacent cells; with neighbors storing 0, NO FAILURE; with neighbors storing 1, INTERFERENCE leads to FAILURE]

Some cells can fail depending on the data stored in neighboring cells

JSSC’88, MDTD’02

Page 28:

VARIABLE RETENTION TIME

[Plot: retention time (ms, 0 to 640) vs. time]

Retention time changes randomly in some cells

IEDM’87, IEDM’92

Page 29:

CURRENT APPROACH TO CONTAIN INTERMITTENT FAILURES

Manufacturing Time Testing: PASS / FAIL

1. Manufacturers perform exhaustive testing of DRAM chips
2. Chips failing the tests are discarded

Page 30:

SCALING AFFECTING TESTING

Manufacturing Time Testing: PASS / FAIL

Longer Tests and More Failures

More interference in smaller technology nodes leads to lower yield and higher cost

Page 31:

SYSTEM-LEVEL ONLINE PROFILING

PASS / FAIL

1. Not fully tested during manufacture-time
2. Ship modules with possible failures
3. Detect and mitigate failures online

Increases yield, reduces cost, enables scaling

Page 32:

SYSTEM-LEVEL TECHNIQUES TO ENABLE DRAM SCALING

[Diagram: DRAM Cells, Detect and Mitigate, Reliable System]

CHALLENGE: INTERMITTENT FAILURES

EFFICACY OF SYSTEM-LEVEL TECHNIQUES WITH REAL DRAM CHIPS

HIGH-LEVEL DESIGN

NEW SYSTEM-LEVEL TECHNIQUES

Page 33:

EFFICACY OF SYSTEM-LEVEL TECHNIQUES

Can we leverage existing techniques?

1. Testing
2. Guardbanding
3. Error Correcting Code
4. Higher Strength ECC

We analyze the effectiveness of these techniques using experimental data from real DRAMs

Data set publicly available

Page 34:

METHODOLOGY

Evaluated 96 chips from three major vendors

FPGA-based testing infrastructure

Page 35:

1. TESTING

1. Write some pattern in the module
2. Wait until refresh interval
3. Read and verify

Repeat

Test each module with different patterns for many rounds

Zeros (0000), Ones (1111), Tens (1010), Fives (0101), Random
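The write, wait, read-and-verify loop above can be sketched in a few lines. This is a minimal simulation, not the FPGA infrastructure used in the talk: `SimulatedModule`, its 50% flip probability for weak cells, and the bit positions in the usage note are all hypothetical stand-ins chosen only to make the loop runnable.

```python
import random

# The four fixed 4-bit patterns named on the slide; "Random" is generated per round.
PATTERNS = {"zeros": 0b0000, "ones": 0b1111, "tens": 0b1010, "fives": 0b0101}


class SimulatedModule:
    """Toy stand-in for a DRAM module (assumption: not a real device model)."""

    def __init__(self, num_bits, weak_bits, seed=0):
        self.bits = [0] * num_bits
        self.weak_bits = set(weak_bits)
        self.rng = random.Random(seed)

    def write(self, bits):
        self.bits = list(bits)

    def wait_refresh_interval(self):
        # Assumption: each weak cell flips with probability 0.5 per wait,
        # which makes its failures intermittent rather than permanent.
        for i in self.weak_bits:
            if self.rng.random() < 0.5:
                self.bits[i] ^= 1

    def read(self):
        return list(self.bits)


def expand(pattern4, num_bits):
    """Repeat a 4-bit pattern (most significant bit first) across num_bits cells."""
    return [(pattern4 >> (3 - (i % 4))) & 1 for i in range(num_bits)]


def run_tests(module, num_bits, rounds, seed=1):
    """Return the set of bit positions that failed in any pattern of any round."""
    rng = random.Random(seed)
    failing = set()
    for _ in range(rounds):
        data_sets = [expand(p, num_bits) for p in PATTERNS.values()]
        data_sets.append([rng.randint(0, 1) for _ in range(num_bits)])
        for data in data_sets:
            module.write(data)                 # step 1: write some pattern
            module.wait_refresh_interval()     # step 2: wait
            observed = module.read()           # step 3: read and verify
            failing.update(
                i for i, (w, r) in enumerate(zip(data, observed)) if w != r
            )
    return failing
```

For example, `run_tests(SimulatedModule(64, {3, 17}), 64, rounds=20)` accumulates the failing positions across rounds, which is why repeated rounds matter for intermittent failures: a single pass may miss a weak cell that happened to hold its value.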

Page 36:

EFFICACY OF TESTING

[Plot: number of failing cells found (0 to 200,000) vs. number of rounds (0 to 1000), for patterns ZERO, ONE, TEN, FIVE, RAND, and All]

Only a few rounds can discover most of the failures

Even after hundreds of rounds, a small number of new cells keep failing

Conclusion: Testing alone cannot detect all possible failures

Page 37:

2. GUARDBANDING

[Diagram: Refresh Interval, 2X Guardband, 4X Guardband]

• Adding a safety margin on the refresh interval
• Can avoid VRT failures

Effectiveness depends on the difference between retention times of a cell
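A small sketch can make the dependence on the retention-time difference concrete. It assumes a simplified model that is not spelled out on the slide: a variable-retention-time cell switches between a few retention states, the profiling test runs at guardband times the 64 ms refresh interval, and the test happens to observe the cell in its longest-retention state. The function names and the millisecond values in the usage note are hypothetical.

```python
REFRESH_INTERVAL_MS = 64  # refresh interval quoted in the talk


def caught_by_guardband(observed_retention_ms, guardband):
    """A cell is flagged if it fails when tested at guardband x refresh interval."""
    return observed_retention_ms < guardband * REFRESH_INTERVAL_MS


def escapes(cell_states_ms, guardband):
    """True if a VRT cell passes the guardbanded test yet can still fail in the field.

    Assumption: the test sees the cell in its longest-retention state, while
    in the field the cell may drop to its shortest-retention state.
    """
    tested_state = max(cell_states_ms)
    worst_state = min(cell_states_ms)
    passes_test = not caught_by_guardband(tested_state, guardband)
    return passes_test and worst_state < REFRESH_INTERVAL_MS
```

With these assumptions, a cell alternating between 50 ms and 100 ms is caught by a 2X guardband (`escapes((50, 100), 2)` is `False`), while a cell alternating between 50 ms and 400 ms slips through (`escapes((50, 400), 2)` is `True`) and needs a much larger guardband to be caught.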

Pages 38-42:

EFFICACY OF GUARDBANDING

[Plot, repeated across these slides: number of failing cells (log scale, 1 to 1,000,000) vs. retention time (0 to 20 seconds)]

Most of the cells exhibit close-by retention times

There are few cells with large differences in retention times

Conclusion: Even a large guardband (5X) cannot detect 5-15% of the failing cells

Page 43:

3. ERROR CORRECTING CODE

• Error Correcting Code (ECC)

– Additional information to detect errors and correct data
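To illustrate what "additional information" means here, the sketch below implements a tiny SECDED code (single error correction, double error detection): a Hamming(7,4) code plus one overall parity bit. The 4-bit data width is an assumption chosen for readability; memory systems typically protect much wider words, and the talk does not specify the code construction.

```python
def encode(data):
    """Encode 4 data bits into 8 bits: index 0 is overall parity, 1..7 is Hamming(7,4)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    word = [p1, p2, d1, p4, d2, d3, d4]
    p0 = 0
    for bit in word:
        p0 ^= bit              # overall parity over the 7 Hamming bits
    return [p0] + word


def decode(code):
    """Return (data, status); data is None when a double error is detected."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos    # XOR of positions holding a 1
    overall = 0
    for bit in code:
        overall ^= bit
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Single error: the syndrome is its position (0 means the parity bit itself).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits are wrong.
        return None, "double-error detected"
    return [code[3], code[5], code[6], code[7]], status
```

Flipping any one of the eight stored bits still decodes to the original data with status `"corrected"`, while flipping any two is reported as `"double-error detected"` instead of being silently miscorrected.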

Pages 44-47:

EFFECTIVENESS OF ECC

[Plot, repeated across these slides: probability of new failure (log scale, 1E+00 down to 1E-18) vs. number of rounds (1 to 1000), for No ECC; SECDED; and SECDED, 2X Guardband]

SECDED code reduces error rate by 100 times

Adding a 2X guardband reduces error rate by 1000 times

Combination of techniques reduces error rate by 10^7 times

Conclusion: A combination of mitigation techniques is much more effective

Page 48:

4. HIGHER STRENGTH ECC (HI-ECC)

No testing, use strong ECC

But amortize cost of ECC over larger data chunk

Can potentially tolerate errors at the cost of higher strength ECC

Pages 49-53:

EFFICACY OF HI-ECC

[Plot, repeated across these slides: time to failure (in years, log scale, 1E-05 to 1E+25) vs. number of rounds (1 to 10000), for 4EC5ED, 3EC4ED, DECTED, and SECDED, each with 2X Guardband; a reference line marks 10 Years]

After starting with 4EC5ED, can reduce to 3EC4ED code after 2 rounds of tests

Can reduce to DECTED code after 10 rounds of tests

Can reduce to SECDED code after 7000 rounds of tests (4 hours)

Conclusion: Testing can help to reduce the ECC strength, but blocks memory for hours

Page 54:

CONCLUSIONS SO FAR

Key Observations:

• Testing alone cannot detect all possible failures

• Combination of ECC and other mitigation techniques is much more effective

• Testing can help to reduce the ECC strength
– Even when starting with a higher strength ECC
– But degrades performance

Page 55:

SYSTEM-LEVEL TECHNIQUES TO ENABLE DRAM SCALING

[Diagram: DRAM Cells, Detect and Mitigate, Reliable System]

CHALLENGE: INTERMITTENT FAILURES

EFFICACY OF SYSTEM-LEVEL TECHNIQUES WITH REAL DRAM CHIPS

HIGH-LEVEL DESIGN

NEW SYSTEM-LEVEL TECHNIQUES

Page 56:

TOWARDS AN ONLINE PROFILING SYSTEM

1. Initially Protect DRAM with Strong ECC
2. Periodically Test Parts of DRAM
3. Mitigate errors and reduce ECC

Run tests periodically after a short interval at smaller regions of memory
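One way to picture this design is a round-robin profiler that tests one small region per step and tracks which ECC strength each region currently needs. The sketch below is only an illustration under stated assumptions: the round thresholds are the ones quoted on the Hi-ECC slides for the measured chips, while `OnlineProfiler`, the per-region counters, and the policy of resetting a region to the strongest code after a new failure are hypothetical and not described in the talk.

```python
# (test rounds completed, ECC level allowed from then on), strongest first.
# Thresholds are the round counts quoted on the Hi-ECC slides; treat them as
# illustrative for those measurements, not as general constants.
ECC_SCHEDULE = [(0, "4EC5ED"), (2, "3EC4ED"), (10, "DECTED"), (7000, "SECDED")]


def ecc_for_rounds(clean_rounds):
    """Pick the weakest ECC level whose round threshold has been reached."""
    level = ECC_SCHEDULE[0][1]
    for needed, name in ECC_SCHEDULE:
        if clean_rounds >= needed:
            level = name
    return level


class OnlineProfiler:
    """Tests one memory region per step, cycling through all regions."""

    def __init__(self, num_regions):
        self.clean_rounds = [0] * num_regions
        self.next_region = 0

    def step(self, test_region):
        """Run one test round on the next region.

        test_region(region) is a caller-supplied function (hypothetical) that
        returns the set of failing cells found in that region.
        """
        region = self.next_region
        self.next_region = (region + 1) % len(self.clean_rounds)
        failing_cells = test_region(region)
        if failing_cells:
            # Assumed policy: a new failure sends the region back to the strongest ECC.
            self.clean_rounds[region] = 0
        else:
            self.clean_rounds[region] += 1
        return region, failing_cells, ecc_for_rounds(self.clean_rounds[region])
```

Because only one region is tested per step, the rest of memory stays available, which is the point of testing smaller regions after short intervals rather than blocking the whole module for hours.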

Page 57:

SYSTEM-LEVEL TECHNIQUES TO ENABLE DRAM SCALING

[Diagram: DRAM Cells, Detect and Mitigate, Reliable System]

CHALLENGE: INTERMITTENT FAILURES

EFFICACY OF SYSTEM-LEVEL TECHNIQUES WITH REAL DRAM CHIPS

HIGH-LEVEL DESIGN

NEW SYSTEM-LEVEL TECHNIQUES

Page 58:

WHY SO MANY ROUNDS OF TESTS?

DATA-DEPENDENT FAILURE

Fails when a specific pattern is stored in the neighboring cells

[Diagram: LINEAR ADDRESS X-1, X, X+1 (cells L, D, R holding 0 1 0); SCRAMBLED ADDRESS X-4, X, X+2]

Even many rounds of random patterns cannot detect all failures

Page 59:

DETERMINE THE LOCATION OF PHYSICALLY ADJACENT CELLS

[Diagram: SCRAMBLED ADDRESS X-?, X, X+? (cells L, D, R)]

NAÏVE SOLUTION

For a given failure X, test every combination of two bit addresses in the row

O(n^2): 8192*8192 tests, 49 days for a row with 8K cells

Our goal is to reduce the test time
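The 49-day figure can be reproduced with a short calculation, assuming each test costs one 64 ms refresh interval of waiting. That per-test cost is an assumption, not something the slide states, but it is consistent with the quoted total. The linear-pass number is included only for contrast.

```python
ROW_CELLS = 8192             # "a row with 8K cells"
REFRESH_INTERVAL_S = 0.064   # assumption: one 64 ms wait per test

# Naive search: every combination of two bit addresses in the row.
naive_tests = ROW_CELLS * ROW_CELLS
naive_days = naive_tests * REFRESH_INTERVAL_S / 86400

# For contrast: a single pass over every bit address in the row.
linear_tests = ROW_CELLS
linear_minutes = linear_tests * REFRESH_INTERVAL_S / 60

print(f"naive:  {naive_tests} tests, about {naive_days:.1f} days")
print(f"linear: {linear_tests} tests, about {linear_minutes:.1f} minutes")
```

Under this assumption the quadratic search takes just under 50 days per row, while a linear pass takes under 10 minutes, which shows why reducing the number of tests is the goal.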

Page 60:

STRONGLY VS. WEAKLY DEPENDENT CELLS

STRONGLY DEPENDENT: Fails even if only one neighbor data changes

WEAKLY DEPENDENT: Fails if both neighbor data change

Can detect neighbor location in strongly dependent cells by testing every bit address

0, 1, … , X, X+1, X+2, … n

Page 61:

PHYSICAL NEIGHBOR LOCATION TEST

Testing every bit address will detect only one neighbor

Run parallel tests in different rows

[Diagram: rows with cells labeled L, A, C, B, R; neighbor locations X-4 and X+2]

Aggregate the locations from different rows

Page 62:

PHYSICAL NEIGHBOR LOCATION TEST

LINEAR TESTING: 0 1 2 3 4 5 6 7

RECURSIVE TESTING:
0, 1, 2, 3 | 4, 5, 6, 7
0, 1 | 2, 3 | 4, 5 | 6, 7
0 1 2 3 4 5 6 7

[Diagram: SCRAMBLED ADDRESS, cells L and A, neighbor at X-4]
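The recursive idea can be sketched as a divide-and-conquer search: test a whole group of addresses at once, and only split groups in which the victim cell fails. This is a sketch of the general technique under assumptions, not the talk's exact procedure. `victim_fails(lo, hi)` is a hypothetical oracle that reports whether the victim fails when the aggressor data is written to every address in `[lo, hi)`, and the simulated oracle models a strongly dependent cell.

```python
def find_neighbors(victim_fails, lo, hi):
    """Return the addresses in [lo, hi) whose data makes the victim cell fail."""
    if not victim_fails(lo, hi):
        return []                      # no neighbor in this group: prune it
    if hi - lo == 1:
        return [lo]                    # narrowed down to a single address
    mid = (lo + hi) // 2
    return (find_neighbors(victim_fails, lo, mid)
            + find_neighbors(victim_fails, mid, hi))


def make_oracle(neighbors):
    """Build a simulated oracle plus a counter of how many tests it served.

    Assumption: the victim is strongly dependent, so it fails whenever at
    least one true neighbor lies inside the tested group.
    """
    calls = {"count": 0}

    def victim_fails(lo, hi):
        calls["count"] += 1
        return any(lo <= n < hi for n in neighbors)

    return victim_fails, calls
```

With hypothetical neighbors at addresses 1000 and 5000 in an 8192-cell row, `find_neighbors(oracle, 0, 8192)` returns `[1000, 5000]` after a few dozen group tests rather than 8192 individual ones. For very small rows the saving disappears, so the benefit comes from the logarithmic depth on long rows.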

Page 63:

PHYSICAL NEIGHBOR-AWARE TEST

                    A          B           C
EXTRA FAILURES      27%        11%         14%
NUM TEST REDUCED    745654X    1016800X    745654X

Detects more failures with small number of tests leveraging neighboring information

Page 64:

NEW SYSTEM-LEVEL TECHNIQUES

REDUCE FAILURE MITIGATION OVERHEAD

Mitigation for worst vs. common case: Reduces mitigation cost around 3X-10X

Variable refresh: Leverages adaptive refresh to mitigate failures

ONGOING

DSN’15

Page 65:

SUMMARY

[Diagram: Unreliable DRAM Cells, Detect and Mitigate, Reliable System]

Proposed online profiling to enable scaling

Analyzed efficacy of system-level detection and mitigation techniques

Found that combination of techniques is much more effective, but blocks memory

Proposed new system-level techniques to reduce detection and mitigation overhead

Page 66:

PAST AND FUTURE WORK

DRAM SCALING CHALLENGE
[Diagram: DRAM cells before and after technology scaling]

SYSTEM-LEVEL TECHNIQUES TO ENABLE DRAM SCALING
[Diagram: DRAM Cells, Detect and Mitigate, Reliable System]

NON-VOLATILE MEMORIES: UNIFIED MEMORY & STORAGE
[Diagram: Non-Volatile Memory, Storage, UNIFY]

Page 67:

TWO-LEVEL STORAGE MODEL

CPU

MEMORY (DRAM): VOLATILE, FAST, BYTE ADDR, accessed with Ld/St

STORAGE: NONVOLATILE, SLOW, BLOCK ADDR, accessed with FILE I/O

Page 68:

TWO-LEVEL STORAGE MODEL

CPU

MEMORY (DRAM): VOLATILE, FAST, BYTE ADDR, accessed with Ld/St

STORAGE: NONVOLATILE, SLOW, BLOCK ADDR, accessed with FILE I/O

NVM: PCM, STT-RAM

Non-volatile memories combine characteristics of memory and storage

Page 69:

VISION: UNIFY MEMORY AND STORAGE

[Diagram: CPU accesses PERSISTENT MEMORY (NVM) with Ld/St]

Provides an opportunity to manipulate persistent data directly

Page 70:

DRAM IS STILL FASTER

[Diagram labels: CPU, PERSISTENT MEMORY, MEMORY, Ld/St, NVM, DRAM]

A hybrid unified memory-storage system

Page 71:

CHALLENGE: DATA CONSISTENCY

[Diagram labels: CPU, PERSISTENT MEMORY, MEMORY, Ld/St]

System crash can result in permanent data corruption in NVM

Page 72:

CURRENT SOLUTIONS

Explicit interfaces to manage consistency

– NV-Heaps [ASPLOS’11], BPFS [SOSP’09], Mnemosyne [ASPLOS’11]

AtomicBegin {Insert a new node;} AtomicEnd;

Limits adoption of NVM

Have to rewrite code with clear partition between volatile and non-volatile

Burden on the programmers

Page 73:

OUR GOAL AND APPROACH

Goal: Provide efficient transparent consistency in hybrid systems

Periodic checkpointing of dirty data

Transparent to application and system

Hardware checkpoints and recovers data

[Timeline: Epoch 0 (Running, Checkpointing), Epoch 1 (Running, Checkpointing)]
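The epoch model above can be illustrated with a small software sketch: stores mark data dirty while an epoch is running, the checkpointing phase copies only the dirty data into a persistent image, and recovery after a crash returns to the last completed checkpoint. The talk describes a hardware mechanism; `EpochCheckpointer` and its dictionaries are hypothetical stand-ins, and the sketch ignores how the checkpoint itself is made atomic.

```python
class EpochCheckpointer:
    """Toy model of periodic checkpointing of dirty data (not the hardware design)."""

    def __init__(self):
        self.working = {}      # volatile working copy (stands in for DRAM contents)
        self.checkpoint = {}   # last consistent image (stands in for NVM contents)
        self.dirty = set()     # addresses written during the current epoch

    def store(self, addr, value):
        """A normal store during the Running phase; the application sees no API change."""
        self.working[addr] = value
        self.dirty.add(addr)

    def end_epoch(self):
        """Checkpointing phase: persist only what changed during this epoch."""
        for addr in self.dirty:
            self.checkpoint[addr] = self.working[addr]
        self.dirty.clear()

    def crash_and_recover(self):
        """After a crash, resume from the last completed checkpoint."""
        self.working = dict(self.checkpoint)
        self.dirty.clear()
```

For example, if `a = 1` is stored and checkpointed in epoch 0, then `a = 2` and `b = 3` are stored in epoch 1 and the system crashes before the next checkpoint, recovery yields `{"a": 1}`: the partial updates of the interrupted epoch are discarded rather than left half-applied.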

Page 74:

CHECKPOINTING GRANULARITY

[Diagram labels: DRAM, NVM]

PAGE: EXTRA WRITES, SMALL METADATA

DIRTY CACHE BLOCK: NO EXTRA WRITE, HUGE METADATA

High write locality pages in DRAM, low write locality pages in NVM

Page 75:

DUAL GRANULARITY CHECKPOINTING

DRAM NVM

GOOD FOR STREAMING WRITES

GOOD FOR RANDOM WRITES

Can adapt to access patterns

Page 76:

TRANSPARENT DATA CONSISTENCY

UNMODIFIED LEGACY CODE

DRAM NVM

-3.5% +2.7%

Cost of consistency compared to systems with zero-cost consistency

Provides consistency without significant performance overhead

Page 77:

PAST AND FUTURE WORK

DRAM SCALING CHALLENGE
[Diagram: DRAM cells before and after technology scaling]

SYSTEM-LEVEL TECHNIQUES TO ENABLE DRAM SCALING
[Diagram: DRAM Cells, Detect and Mitigate, Reliable System]

NON-VOLATILE MEMORIES: UNIFIED MEMORY & STORAGE
[Diagram: Non-Volatile Memory, Storage, UNIFY]

Page 78:

OTHER WORK

[Diagram: CPU, MEMORY, STORAGE]

IMPROVING CACHE PERFORMANCE
MICRO’10, PACT’10, ISCA’12, HPCA’12, HPCA’14

EFFICIENT LOW VOLTAGE PROCESSOR OPERATION
HPCA’13, INTEL TECH JOURNAL’14

NEW DRAM ARCHITECTURE
HPCA’15, ONGOING

ENABLING DRAM SCALING
SIGMETRICS’14, DSN’15, ONGOING

UNIFYING MEMORY & STORAGE
WEED’13, ONGOING

Page 79:

FUTURE WORK: TRENDS

[Diagram: CPU, MEMORY, STORAGE]

DRAM NVM LOGIC

Page 80:

APPROACH

DESIGN AND BUILD BETTER SYSTEMS WITH NEW CAPABILITIES BY REDEFINING FUNCTIONALITIES ACROSS DIFFERENT LAYERS IN THE SYSTEM STACK

Page 81:

RETHINKING STORAGE

[Diagram: APPLICATION, OPERATING SYSTEM, SSD (CPU, FLASH CONTROLLER, FLASH CHIPS)]

What is the best way to design a system to take advantage of SSDs?

APPLICATION, OPERATING SYSTEM, DEVICES

Page 82:

ENHANCING SYSTEMS WITH NON-VOLATILE MEMORY

[Diagram: Recover and migrate, NVM]

How to provide efficient instantaneous system recovery and migration?

PROCESSOR, FILE SYSTEM, DEVICE, NETWORK, MEMORY

Page 83:

DESIGNING SYSTEMS WITH MEMORY AS AN ACCELERATOR

[Diagram: SPECIALIZED CORES, MEMORY WITH LOGIC, MANAGE DATA FLOW]

How to manage data movement when applications run on different accelerators?

APPLICATION, PROCESSOR, MEMORY

Page 84:

THANK YOU

QUESTIONS?

Page 85:

SOLVING THE DRAM SCALING CHALLENGE:
RETHINKING THE INTERFACE BETWEEN CIRCUITS, ARCHITECTURE, AND SYSTEMS

Samira Khan