Improving Read Performance of PCM via Write Cancellation and Write Pausing

HPCA 2010. Moinuddin Qureshi, Michele Franceschini, and Luis Lastras. IBM T. J. Watson Research Center, Yorktown Heights, NY. © 2007 IBM Corporation

Page 1: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

© 2007 IBM Corporation

HPCA – 2010

Improving Read Performance of PCM via Write Cancellation and Write Pausing

Moinuddin Qureshi, Michele Franceschini, and Luis Lastras

IBM T. J. Watson Research Center, Yorktown Heights, NY

Page 2: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Introduction

More cores in system → more concurrency → larger working set

DRAM-based memory system hitting: power, cost, scaling wall

Phase Change Memory (PCM): Emerging technology, projected to be more scalable, higher density, power-efficient

Page 3: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

PCM Operation

[Figure: temperature vs. time for the RESET and SET programming pulses, with the Tmelt and Tcryst levels marked]

Switching by heating using electrical pulses

RESET state: amorphous (high resistance)
SET state: crystalline (low resistance)

[Figure: PCM cell with access device and memory element; labels: large current, small current, SET (low resistance), RESET (high resistance). Photo courtesy: Bipin Rajendran, IBM]

Read latency 2x-4x of DRAM. Write latency much higher

Page 4: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Problem of Contention from Slow Writes

PCM writes are 4x-8x slower than reads

Writes are not latency critical. Typical response: use large buffers and intelligent scheduling.

But once a write is scheduled to a bank, a later-arriving read waits

Write request causes contention for reads → increased read latency

Page 5: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Outline

• Introduction
• Quantifying the Problem
• Adaptive Write Cancellation
• Write Pausing
• Combining Cancellation & Pausing
• Summary

Page 6: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Configuration: Hybrid Memory

[Figure: Processor Chip, DRAM Cache (256MB), PCM-Based Main Memory]

Each bank has a separate RDQ and WRQ (32-entry)

Baseline uses read-priority scheduling if the WRQ is < 80% full. If the WRQ is > 80% full, an oldest-first policy applies (“forced write”; rare, <0.1%)
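
The baseline policy can be sketched for a single bank as follows. This is a minimal illustration, not the authors' simulator: the function name `pick_next_request`, the use of `deque` for the queues, and the reading of "oldest-first" as "service the oldest write ahead of pending reads" are all assumptions.

```python
from collections import deque

WRQ_CAPACITY = 32             # entries per bank, as stated on the slide
FORCED_WRITE_FRACTION = 0.8   # "80% full" threshold from the slide


def pick_next_request(rdq, wrq, capacity=WRQ_CAPACITY):
    """Pick the next request for one bank; queues are in arrival order.

    Returns a (kind, request) pair. Names are hypothetical.
    """
    if len(wrq) > FORCED_WRITE_FRACTION * capacity:
        # Assumed meaning of "forced write": the oldest write goes first,
        # even if reads are waiting.
        return "forced_write", wrq.popleft()
    if rdq:
        # Read priority: any pending read is serviced before a write.
        return "read", rdq.popleft()
    if wrq:
        return "write", wrq.popleft()
    return "idle", None


# A nearly empty WRQ: the read wins.
print(pick_next_request(deque(["r0"]), deque(["w0", "w1"])))
# 26 of 32 entries is above 80%: the oldest write is forced.
print(pick_next_request(deque(["r0"]), deque(f"w{i}" for i in range(26))))
```

With a 32-entry WRQ the forced mode starts at 26 entries, since 26 is the first count above 80% of 32.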

Page 7: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Problem

Writes significantly increase read latency (Problem only for asymmetric memories)

Read latency = 1K cycles, write latency = 8K cycles (sensitivity in paper)
12 workloads, each with 8 benchmarks from SPEC06

[Figure: Effective Read Latency (Cycles), y-axis 0 to 3000, x-axis labels 1 to 4; legend: Baseline, No Read Priority, Write Latency=1K, Write Latency=0]

[Figure: Norm. Execution Time, y-axis 0 to 1.2, x-axis labels 1 to 4]

Page 8: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Outline

• Introduction
• Problem: Writes Delaying Reads
• Adaptive Write Cancellation
• Write Pausing
• Combining Cancellation & Pausing
• Summary

Page 9: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Write Cancellation

Write Cancellation: “abort” the ongoing write to improve read latency

A cancelled line is left in a non-deterministic state: a read request that matches it is served from the WRQ

Perform write cancellation as soon as a read request arrives at a bank (as long as the write is not done in forced-mode)

Page 10: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Write Cancellation with Static Threshold

WCST: Cancel write request only if less than K% service done

Canceling a write request close to completion is wasteful and causes episodes of forced-writes (low performance)

[Figure: Effective Read Latency (Cycles), y-axis 1000 to 1600, for K=0% (NeverCancel), K=50%, K=65%, K=75%, K=90%, and K=100% (AlwaysCancel); data label: 2365]
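
The static-threshold rule above can be written as a one-line predicate. This is a sketch only: the function name and arguments are hypothetical, and progress is measured as elapsed cycles over the total write latency, which is an assumption about how "service done" is tracked.

```python
def wcst_should_cancel(elapsed_cycles, write_latency, k_percent, forced=False):
    """Return True if an in-flight write should be cancelled for a read.

    A write issued in forced mode is never cancelled. Otherwise the write
    is cancelled only if less than k_percent of its service is done.
    """
    if forced:
        return False
    percent_done = 100.0 * elapsed_cycles / write_latency
    return percent_done < k_percent


# 8K-cycle write, K=75%: cancel at 25% done, keep going at 87.5% done.
print(wcst_should_cancel(2000, 8000, 75))   # True
print(wcst_should_cancel(7000, 8000, 75))   # False
```

Under this definition K=0% never cancels and K=100% cancels any write that is still in progress, which matches the NeverCancel and AlwaysCancel endpoints of the chart.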

Page 11: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Adaptive Write Cancellation

Best threshold depends on the number of pending entries in the WRQ:
Fewer entries → higher threshold (best read latency)
More entries → lower threshold (reduces forced writes)

Write Cancellation with Adaptive Threshold (WCAT):
Threshold = 100 - (4*NumEntriesInWRQ)

[Figure: Threshold (0% to 100%) vs. Num Entries in WRQ (10, 20, 30); the threshold is high with few entries and low near the forced-write region]
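
A small sketch of the WCAT formula follows. The function names are hypothetical, and clamping the threshold at 0 for more than 25 entries is an assumption; the slide gives only the linear formula.

```python
def wcat_threshold(num_entries_in_wrq):
    """Adaptive threshold in percent: 100 - 4 * (entries in the WRQ).

    Clamped at 0 (an assumption) so that a nearly full WRQ never cancels.
    """
    return max(0, 100 - 4 * num_entries_in_wrq)


def wcat_should_cancel(elapsed_cycles, write_latency, num_entries_in_wrq):
    """Cancel the in-flight write only if its progress is below the threshold."""
    percent_done = 100.0 * elapsed_cycles / write_latency
    return percent_done < wcat_threshold(num_entries_in_wrq)


for entries in (0, 10, 25, 30):
    print(entries, wcat_threshold(entries))

# A write that is 50% done is cancelled with 10 pending writes
# (threshold 60%) but not with 15 pending writes (threshold 40%).
print(wcat_should_cancel(4000, 8000, 10), wcat_should_cancel(4000, 8000, 15))
```

The threshold reaches 0% at 25 entries, just below the point where a 32-entry WRQ crosses 80% occupancy.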

Page 12: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Adaptivity of WCAT

Num Entries in WRQ   Low (0-1)   Med (2-13)   High (14-25)   Forced (26+)
WCST (K=75%)         61.4%       29.8%        7.4%           1.43%
WCAT                 58.2%       35.4%        5.6%           0.72%

WCAT uses a higher threshold initially, when the WRQ is empty, but a lower threshold later, which reduces the episodes of forced writes

We sampled all WRQ every 2M cycles to measure occupancy

Page 13: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Results for WCAT

[Figure: Average Read Latency, y-axis 1000 to 1550 cycles, for Write Cancellation, WCST (K=75%), and WCAT. Baseline: 2365 cycles. Ideal: 1K cycles]

[Figure: Extra Write Cycles (%), y-axis 0 to 45, for Write Cancellation, WCST (K=75%), and WCAT]

Adaptive threshold reduces latency and incurs half the overhead

Page 15: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Iterative Write in PCM devices

In Multi-Level Cells (MLC), the programming precision requirement increases linearly with the number of levels

PCM cells respond differently to same programming pulse

Acknowledged solution to address uncertainty: Iterative writes

Each iteration consists of steps of: write-read-verify

[Figure: flowchart of one iteration with the steps Write, Read, and Verify; the branches are labeled “Not done” and “Done”]
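
The write-read-verify loop can be illustrated with a toy cell model. Everything about the cell here is an assumption made for illustration (the normalized value, the random pulse response, and the names `program_cell`, `target`, and `tolerance`); it is not the device model used in the talk.

```python
import random


def program_cell(target, tolerance, rng, max_iterations=100):
    """Iteratively program one cell; return the number of iterations used.

    Toy model: each pulse closes a random 50% to 100% of the remaining
    gap to the target, standing in for cell-to-cell response variation.
    """
    value = 0.0
    for iteration in range(1, max_iterations + 1):
        # Write: apply a pulse aimed at the remaining error.
        response = rng.uniform(0.5, 1.0)
        value += response * (target - value)
        # Read: sense the value the cell actually reached.
        sensed = value
        # Verify: stop once the cell is within tolerance of the target.
        if abs(sensed - target) <= tolerance:
            return iteration
    raise RuntimeError("cell did not converge")


rng = random.Random(0)
print([program_cell(1.0, 0.01, rng) for _ in range(8)])
```

Different cells finish after different numbers of iterations even for the same target, which is the uncertainty that iterative writes are meant to absorb.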

Page 16: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Model for Iterative Writes

We develop an analytical model to capture the number of iterations, in terms of bits/cell, number of levels written in one shot, and learning

Time required to write a line is worst-case of all cells in line

Avg number of iterations: 8.3 (consistent with MLC literature)

MLC: 3 bits/cell
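
The worst-case rule for a line can be shown with a few made-up per-cell iteration counts. The numbers and the function name are illustrative assumptions, not data from the talk.

```python
def line_write_iterations(cell_iterations):
    """A line is done only when its slowest cell is done."""
    return max(cell_iterations)


# Hypothetical iteration counts for the cells of one line.
cells = [3, 5, 4, 8, 2, 6]
print(line_write_iterations(cells))     # 8
print(sum(cells) / len(cells))          # per-cell average, for comparison
```

Because the line waits for its slowest cell, the line-level iteration count is at least as large as the per-cell average.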

Page 17: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Concept of Write Pausing

Iterative writes can be paused to service pending read requests

Reads can be performed at the end of each iteration (potential pause point)

[Figure: write timelines. Without pausing: Iter 1, Iter 2, Iter 3, Iter 4, with potential pause points between iterations. With pausing: Iter 1, Iter 2, Rd X, Iter 3, Iter 4]

Better read latency with negligible write overhead

We extend the iterative write algorithm of Nirschl et al. [IEDM’07] to support Write Pausing
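
How long a read waits under each policy can be sketched with simple arithmetic. The assumptions here are equal-length iterations and the hypothetical policy names `baseline`, `write_pause`, and `anytime_pause`; the time to service the read itself is not included.

```python
def read_wait_cycles(arrival, num_iterations, iteration_cycles, policy):
    """Cycles a read waits if it arrives `arrival` cycles into a write."""
    write_cycles = num_iterations * iteration_cycles
    if not 0 <= arrival < write_cycles:
        raise ValueError("read must arrive while the write is in progress")
    if policy == "baseline":
        # The read waits for the whole write to finish.
        return write_cycles - arrival
    if policy == "write_pause":
        # The read waits only until the next iteration boundary
        # (zero if it arrives exactly on a boundary).
        return (-arrival) % iteration_cycles
    if policy == "anytime_pause":
        # Idealized: the write can be paused at any cycle.
        return 0
    raise ValueError(f"unknown policy: {policy}")


# An 8K-cycle write split into 8 iterations; the read arrives at cycle 2300.
for policy in ("baseline", "write_pause", "anytime_pause"):
    print(policy, read_wait_cycles(2300, 8, 1000, policy))
```

In this example the read waits 5700 cycles in the baseline, 700 cycles with pausing at iteration boundaries, and 0 cycles with the idealized anytime pause.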

Page 18: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Results for Write Pausing

[Figure: Effective Read Latency, y-axis 1000 to 2400 cycles, for Baseline, Write Pause, and Anytime Pause]

Write Pausing at end of iteration gets 85% of benefit of “Anytime” Pause

Page 20: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Write Pausing + WCAT

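[Figure: three timelines of a four-iteration write (Iter 1 to Iter 4) with a read Rd X arriving during the write. First: the write runs without interruption. Second: Rd X is serviced at a pause point between Iter 2 and Iter 3. Third: Iter 2 is cancelled so that Rd X can be serviced]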

Only one iteration is cancelled, so “micro-cancellation” has low overhead
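
One way to combine the two mechanisms is sketched below. The slide does not say how the cancellation threshold is applied within an iteration, so this assumes the adaptive formula Threshold = 100 - (4*NumEntriesInWRQ) is evaluated against the progress of the current iteration; the function name and return shape are hypothetical.

```python
def pause_or_micro_cancel(arrival, iteration_cycles, num_entries_in_wrq):
    """Decide what to do when a read arrives `arrival` cycles into a write.

    Returns (action, read_wait_cycles, wasted_write_cycles).
    """
    threshold = max(0, 100 - 4 * num_entries_in_wrq)
    into_iteration = arrival % iteration_cycles
    if into_iteration == 0:
        # The read arrived exactly at a pause point.
        return "pause", 0, 0
    percent_done = 100.0 * into_iteration / iteration_cycles
    if percent_done < threshold:
        # Cancel only the current iteration; it is re-executed later.
        return "micro_cancel", 0, into_iteration
    # Iteration is almost done: let it finish, then pause.
    return "pause", iteration_cycles - into_iteration, 0


print(pause_or_micro_cancel(2300, 1000, 5))    # early in the iteration
print(pause_or_micro_cancel(2900, 1000, 5))    # late in the iteration
print(pause_or_micro_cancel(2300, 1000, 20))   # WRQ filling up
```

At most one iteration's worth of work is thrown away per read in this sketch, in contrast to cancelling the entire write.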

Page 21: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Results

[Figure: Effective Read Latency, y-axis 1000 to 1500 cycles, for Write Pause, Write Pause + Micro Cancellation, and Anytime Pause. Baseline: 2365 cycles. Ideal: 1K cycles]

[Figure: Speedup (wrt Baseline), y-axis 1 to 1.5, for Write Pause, Write Pause + Micro Cancellation, and Anytime Pause]

Write Pause + Micro Cancellation is very close to Anytime Pause (re-execution overhead of micro cancellation: <4% extra iterations)

Page 22: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Impact of Write Queue Size

We will need large buffers to best exploit the benefit of Pausing

[Figure: Speedup wrt Baseline (32-entry), y-axis 0 to 1.6, vs. Number of Entries in Each WRQ (8, 16, 32, 64, 128, 256, 512), for Baseline and Pause + Micro Cancellation]

Page 24: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Summary

Slow writes increase the effective read latency (2.3x)

Write Cancellation: cancel ongoing write to service read
• Threshold-based write cancellation
• Adaptive threshold: better performance, half the overhead

Write Pausing exploits iterative writes to service pending reads
• Write Pausing + Micro Cancellation close to optimal pause
• Effective read latency: from 2365 to 1330 cycles (1.45x speedup)

We will need large write buffers to exploit the benefit of Pausing

Page 25: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Questions

Page 26: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Write Pausing in Iterative Algorithms

(Nirschl+ IEDM’07)

Page 27: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Workloads and Figure of Merit

12 memory-intensive workloads from SPEC 2006:
• 6 rate-mode (eight copies of same benchmark)
• 6 mix-mode (two copies of four benchmarks)

Key metric: Effective Read Latency

Tin = Time at which read request enters RDQ
Tout = Time at which read request finishes service at memory

Effective Read Latency = Tout – Tin (average reported)
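
The metric can be computed directly from per-request timestamps. The request list below is made up for illustration, and the function name is hypothetical.

```python
def effective_read_latency(requests):
    """Average of (Tout - Tin) over (t_in, t_out) pairs, in cycles."""
    latencies = [t_out - t_in for t_in, t_out in requests]
    return sum(latencies) / len(latencies)


# Three hypothetical reads: the later ones queue behind a write.
reads = [(0, 1000), (500, 2500), (1000, 4000)]
print(effective_read_latency(reads))   # 2000.0
```

Queueing delay in the RDQ is part of the metric, so a read that waits behind a write counts its full waiting time.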

Page 28: Improving Read Performance of PCM via  Write Cancellation and Write Pausing

Sensitivity to Write Latency

At WriteLatency=4K, the speedup is 1.35x instead of 1.45x (at 8K latency)