Flikker: Saving DRAM Refresh-power through Critical Data Partitioning
A Case for Refresh Pausing in DRAM Memory Systems
description
Transcript of A Case for Refresh Pausing in DRAM Memory Systems
1
A Case for Refresh Pausing in DRAM Memory Systems
Prashant NairChia-Chen Chou
Moinuddin Qureshi
2
• Dynamic Random Access Memory (DRAM) used as main memory• DRAM stores data as charge on capacitor
Leakage
DRAM cells leak data!
DRAM Chip
1
DRAM is a volatile memory Charge leaks quickly
Introduction
3
DRAM maintains data by Refresh operations
DRAM Chip
RefreshRefreshRefreshRefresh
JEDEC specified DRAM retention time:64ms (< 85 C)32ms (> 85 C)
Charge on cells restored
DRAM relies on Refresh for data integrity
Refresh: Restoring Data in DRAM
Time between Refresh ≤ Retention Time
4
1Gb 2Gb 4Gb 8Gb
2.8%
9%7.7%
5.1%
Chip Density
~18%
~36%
16Gb 32Gb
Time spent in Refresh proportional to number of Rows
Increasing memory capacity More time spent in Refresh
The time for doing Refresh is increasing with chip density
Refresh: A Growing Problem
5
Memory unavailable for Read/Write during Refresh
timeNo Refresh
REFRESH
B
Interference due to Refresh time
Wait
Refresh blocks reads Higher read latency
Refresh Blocks Reads
A B
A B Serviced
6
8Gb 16Gb 32Gb0%
10%
20%
30%
40%
50%
60% Read Latency
Incr
ease
in R
ead
Late
ncy
8Gb 16Gb 32Gb0%5%
10%15%20%25%30%35%40%
Performance
Perf
orm
ance
Los
s
Our Goal: Reduce the Read Latency impact of Refresh
Impact of Refresh is significant, and increasing
Impact of Refresh
7
Introduction & Motivation
Refresh Operation: Background
Refresh Pausing
Evaluation
Alternative Proposals
Summary
Outline
8
Row 1Row 2Row 3Row 4Row 5
Row n-1Row n
A DRAM Bank
RefreshRefreshRefresh
Refresh operates on a Row granularity
Refresh Operation
9
• Burst Mode:
Memory unavailable until all rows finish refresh
• Distributed Mode:
8K refresh pulses in 64ms
Refresh
64ms
Refresh
64ms
Distributed mode reduces contention from Refresh
Refresh Modes
10
Every pulse refreshes a ‘Bundle of rows’
Chip Size Rows in a Refresh bundle (per bank)
512 Mb 11Gb 22Gb 4
4Gb or 8Gb (Twin 4Gb die) 8
Refresh Bundle currently have upto 8 rows, and increasing
Refresh Bundle
TRFC is the time to do refresh for every refresh pulse
11
TRFC
unavailable
available
8Gb
unavailable
TRFCavailable
16Gb
TRFC
unavailable
available
32Gb
High TRFC Read waits for refresh for long time
The Latency Wall of Refresh
Current 8Gb chips have TRFC of 350ns >> read latency
12
Introduction & Motivation
Refresh Operation: Background
Refresh Pausing
Evaluation
Alternative Proposals
Summary
Outline
13
A
time
Refresh
B
Request B arrives
Interrupted
Baseline system
Refresh (Cont.)
Refresh Pausing
B
A Refresh
time
Request B arrives
Insight: Make Refresh Operations Interruptible
Pausing Refresh reduces wait time for Reads
Pausing at arbitrary point can cause data loss
Refresh Pausing
14
Bank
Rows
Row Buffer
dcba
Refresh Pulse (4 rows in a bundle)
ChipWith Refresh Pausing
Pause
Read X
X
Without Refresh Pausing
X
Refresh Pausing at Row boundary to service read
Refresh Pausing: When to Pause?
15
• Memory Controller generates a Refresh Enable (RE) signal
• Pausing requires ‘active low’ detection of RE
• One way communication only
Memory ControllerRefresh Enable(RE) to DRAM
RE
1
0
Pause
Resume
Refresh Pausing: Interface Details
16
• Row Address Counter increments the addresses
• Stop the increment using a simple AND gate
• Active Low Refresh Enable as ‘Refresh Pause’
Address Generator
Row Address CounterEN
Incrementer
Refresh Bundle Addresses
DRAM
RE
Refresh Pausing: Track a Paused Row
17
• Scheduler schedules: Read, Write, and Refresh
• Responsible for Pausing Refresh for Read
• Keeps track of refresh time done before Pause
ProcessorBus
MemoryController
Scheduler Read Queue
Write Queue
Refresh Enable
DRAM
Refresh Pausing: Memory Scheduler
18
• Pausing can delay Refresh
• JEDEC allows delay of up-to 8 pending refresh
• If 8 pending refresh, then issue ‘Forced Refresh’
• Forced Refresh cannot be Paused
Reads/Writes Forced Refresh
Refresh Pulses
Refresh Issued
Refresh Not Issued
Forced Refresh for data integrity
Forced Refresh
19
Introduction & Motivation
Refresh Operation: Background
Refresh Pausing
Evaluation
Alternative Proposals
Summary
Outline
20
• Simulator: uSIMM from Memory Scheduling Championship (MSC)
• Workloads: MSC SuiteCOMMERCIAL(5), PARSEC(9), BIOBENCH(2) and SPEC(2)
• Configuration:Number of Cores 4Last Level Cache 1MBDRAM (DDR3) 8 Chips/Rank, 8Gb/Chip Channels, Ranks, Banks 4,2,8Refresh (Baseline) Distributed (JEDEC)
• Results presented for temperature > 85C (paper also has <85C)
Experimental Setup
21
- Refresh Pausing gives ~7% read latency reduction for an 8Gb chip
COMMERCIAL SPEC PARSEC BIOBENCH GMEAN0.75
0.80
0.85
0.90
0.95
1.00
Normalized Read LatencyRefresh Pausing No Refresh
Nor
mal
ized
Rea
d La
tenc
y
7%
Results: Read Latency
22
- Refresh Pausing gives ~5% performance improvement for an 8Gb chip
COM-MERCIAL
SPEC PARSEC BIOBENCH GMEAN1.021.031.041.051.061.071.081.091.101.11
Performance ComparisonRefresh Pausing No Refresh
Spee
dup
Results: Performance
23
Refresh Pausing more effective as chips density increases
8Gb 16Gb 32Gb1.0
1.1
1.2
1.3
1.4Impact of Density on Refresh PausingRefresh Pausing No Refresh
Spee
dup
Results: Impact of Chip Density
24
Introduction & Motivation
Refresh Operation: Background
Refresh Pausing
Evaluation
Alternative Proposals
Summary
Outline
25
• Elastic Refresh waits for idle period before issuing a refresh
• Estimates average inter-arrival time of memory request
3 unitsA
Request A
B
Request B
time
time
time
A
Request A
B
Request B4 units
Refresh
A
Request A
B
Request B
Refresh
7 units
Wait
No Refreshes
With Refreshes
Elastic Refresh
The “Wait and Watch” policy can increase wait times
Elastic Refresh for Scheduling Refresh [MICRO’10]
26
COM-MERCIAL
SPEC PARSEC BIOBENCH GMEAN0.90
0.95
1.00
1.05
1.10
1.15Comparision of Elastic RefreshElastic Refresh Refresh Pausing No Refresh
Spee
dup
Refresh Pausing outperforms Elastic Refresh
Comparison with Elastic Refresh
27
Reduce bundles size and have more bundles
TREFI TREFI/2
TRFC TRFC TRFC/2 TRFC/2 TRFC/2 TRFC/2
TREFI/2 TREFI/2
DDR4 x2 ModeDDR3 Distributed Mode
• In x2 mode, TREFI is reduced by 2 (x4 mode by 4)
• In x2 mode TRFC is reduced by 2 (x4 mode by 4)
Fine Grained Refresh to reduce contention of Refresh
DDR4 proposals: x2 and x4 modes
28
DDR4
x2
DDR4
x4
Paus
ing
No R
efre
sh
DDR4
x2
DDR4
x4
Paus
ing
No R
efre
sh
16Gb 32Gb
1.00
1.05
1.10
1.15
1.20
1.25
1.30
1.35
1.40
Spee
dup
DDR4 modes (x2 and x4) useful but not enough
Comparison with DDR4
29
Introduction & Motivation
Refresh Operation: Background
Refresh Pausing
Evaluation
Alternative Proposals
Summary
Outline
30
• DRAM relies on Refresh for data integrity
• Time for Refresh increases with chip density
• Refresh blocks read, increases read latency
• Refresh Pausing: make Refresh Interruptible
• Pausing provides 5% improvement for 8Gb, increases with higher density
• Applicable also to DDR4 (fine grained refresh)
Summary
31
THANK YOU
32
Refresh+Read• Reads operate on a rank• Refreshes may also operate on the same rank• DRAMs serve only a single request at a time
Scheduler
Read Queue
Reads
Rank
Refresh
33
Refresh Row Bundle
• TRFC : Time to refresh one bundle of rows • TREC : Current Recovery Time• TREFI : Time until next bundle refresh
Larger refresh-row bundle implies larger TRFC
TRFC
TRECREFRESH REFRESH
TREFI
Row
1
Row
n
Refresh Row Bundle
34
Hierarchically organized as Channels, Ranks and Banks
Chip
DRAM Organization
Rank 1
Rank 2
Channel
Banks
Rows
READ
35
Refresh ModesBurst and Distributed Mode
Bank
Rows
Rank
Chips
Refresh
Burst Mode
Refresh
In burst mode, all rows in all banks refresh simultaneously
Distributed Mode
Distributed mode: Only a few rows in all banks refresh; refresh is distributed in time
36
Transactions in DRAMsThree transactions of concern– Reads– Writes – Refreshes
DRAM
Processor Bus
ReadWrite
Refresh
Mismanagement of requests leads
to collisions!
A scheduler is needed to manage requests to DRAM
37
Temperature Sensitivity of Refresh Pausing
- Upto 22% increase in speedup for future chips
The savings of Refresh Pausing is higher whileoperating at high temperatures
<85C<85C
>85C>85C
No RefreshRefresh Pausing
No RefreshRefresh Pausing
0.00%5.00%
10.00%15.00%20.00%25.00%30.00%35.00%40.00%
Temperature Sensitivity
8Gb16Gb32Gb
38
Auto and Self Refresh• Special Refresh Modes for DRAMs
• Auto Refresh – Internal Counter issues pulses in distributed fashion (CBR and RAS only)
• Self Refresh – DRAM is internally refreshed at a power optimized rate (Activity == 0)
Self Refresh Modes are only used when DRAMs stay idle
39
Mitigating Penalty
• Pause a refresh bundle at row granularity
• TRPC = row cycle time + current recovery time• Current recovery time is small for individual rows
• Thus refreshes can be made interruptible
a. Maximum Refresh penalty without pausing is TRFC
b. Maximum Refresh penalty with pausing is to TRPC