Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech

Smart Refresh: An Enhanced Memory Controller Design for Reducing Energy in Conventional and 3D Die-Stacked DRAMs Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech

Page 1

Smart Refresh: An Enhanced Memory Controller Design for Reducing Energy in Conventional and 3D Die-Stacked DRAMs

Mrinmoy Ghosh

Hsien-Hsin S. Lee

School of Electrical and Computer Engineering

Georgia Tech

Page 2

Motivation

Increase in DRAM power consumption

• Increasing DRAM density

• Ability to put more DIMMs in a computing system

• Refresh is a major component of DRAM energy: up to 1/3 of DRAM energy [1]

DRAM energy is a major component of system energy (consumes up to 10 W)

[1] M. Viredaz and D. Wallach, “Power Evaluation of a Handheld Computer: A Case Study”, Technical Report, Compaq WRL, 2001.

Page 3

Outline

• Redundancy in conventional DRAM refresh techniques

• Smart Refresh architecture

• Our technique for 3D die-stacked DRAMs on processors

• Results

Page 4

Current Refresh Policies

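• Row Address Strobe (RAS) Only Refresh

[Figure: memory controller connected to a DRAM module through the Addr Bus, WE, CAS, and RAS lines. Steps shown: Assert RAS, Row Address, Refresh Row.]

• CAS Before RAS (CBR) Refresh

[Figure: memory controller connected to a DRAM module containing the RRAR, through the Addr Bus, WE, CAS, and RAS lines. Steps shown: Assert CAS, WE High, Assert RAS, Refresh Row, Increment RRAR.]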
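The difference between the two policies can be sketched with a toy model. This is an illustrative sketch only, not the presenters' implementation: the `DramModule` class, its method names, and the wrap-around of RRAR after the last row are assumptions made for the example. The point it shows is who chooses the row: the controller in RAS-only refresh, the module's own RRAR in CBR refresh.

```python
class DramModule:
    """Toy DRAM module that only logs which rows get refreshed."""

    def __init__(self, rows):
        self.rows = rows
        self.rrar = 0          # internal refresh row address register (RRAR)
        self.refreshed = []    # order in which rows were refreshed

    def ras_only_refresh(self, row_address):
        # RAS-only: the controller puts the row address on the address bus
        # and asserts RAS, so the controller chooses the row.
        self.refreshed.append(row_address)

    def cbr_refresh(self):
        # CAS-before-RAS: no address is sent; the module refreshes the row
        # held in RRAR and then increments RRAR (wrap-around is assumed).
        self.refreshed.append(self.rrar)
        self.rrar = (self.rrar + 1) % self.rows


dram = DramModule(rows=4)
dram.ras_only_refresh(2)   # controller picks row 2
dram.cbr_refresh()         # module picks row 0
dram.cbr_refresh()         # module picks row 1
print(dram.refreshed)      # [2, 0, 1]
```

Because only RAS-only refresh lets the controller name the row, it is the natural fit for a scheme that refreshes rows selectively.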

Page 5

Redundancy in Existing DRAM Refresh Techniques

Example: each row is accessed just as it is due to be refreshed

A row does not need to be refreshed if it has just been accessed, because the access itself restores the row

[Figure: timeline marking the refresh time for Rows 0 to 3, with a Mem access and a Mem Refresh shown for each row.]

Page 6

Smart Refresh

A countdown counter for each DRAM row

The counter decrements to zero just before the row needs refreshing

[Figure: memory controller containing the Update Counter Circuit, the Countdown Counters, and the Pending Refresh Request Queue, connected to the DRAM module.]
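A minimal sketch of the per-row countdown idea, assuming 2-bit counters (maximum value 3) and the rule that a counter is reset to its maximum after an access or a refresh. The class and method names are hypothetical and the model ignores timing, banks, and the request queue.

```python
class CountdownRefresh:
    """Toy model: one countdown counter per DRAM row."""

    def __init__(self, rows, counter_max=3):
        self.counter_max = counter_max
        self.counters = [counter_max] * rows   # assumed starting state
        self.refreshes = 0

    def access(self, row):
        # A read or write restores the row, so its counter restarts.
        self.counters[row] = self.counter_max

    def update(self, row):
        # Periodic counter update: refresh at zero, otherwise count down.
        if self.counters[row] == 0:
            self.refreshes += 1
            self.counters[row] = self.counter_max
        else:
            self.counters[row] -= 1


ctrl = CountdownRefresh(rows=2)
for _ in range(8):
    ctrl.access(0)             # row 0 is touched before every update
    for row in range(2):
        ctrl.update(row)
print(ctrl.refreshes)          # 2: only the idle row 1 was refreshed
```

In this toy run the frequently accessed row never reaches zero, so it is never refreshed, while the idle row is refreshed on its usual schedule.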

Page 7

Smart Refresh

Implemented using RAS-only refresh

Provides better energy savings than CBR refresh

[Figure: memory controller containing the Update Counter Circuit, the Countdown Counters, and the Pending Refresh Request Queue, connected to the DRAM module.]

Page 8

Naïve (Simultaneous) Counter Updates

Counters initialized to max after access/refresh

Refresh if counter = 0

Counter values over successive updates:

3 3 … 3

2 2 … 2

1 1 … 1

0 0 … 0

3 3 … 3

Simultaneous update causes burst refresh

Solution? Initialize the counters to different values

Page 9

Naïve (Simultaneous) Counter Updates

Counter values over successive updates when each counter starts from a different value:

0 1 … 3

3 0 … 2

2 3 … 1

1 2 … 0

0 1 … 3

One fourth of the counters simultaneously become zero => Burst refresh situation

Solution? Staggering of counter updates

Page 10

Staggered Counter Updates

At most K simultaneous refreshes, K = number of logical segments.

Correctness condition: Interval between two counter updates must be enough to handle K refresh operations.

Time      Segment 1    Segment 2    …    Segment 8
Index     1 2 … 16     1 2 … 16          1 2 … 16
T         0 2 … 0      0 2 … 0           0 2 … 0
T+1 ms    3 2 … 0      3 2 … 0           3 2 … 0
T+2 ms    3 1 … 0      3 1 … 0           3 1 … 0
T+16 ms   3 1 … 3      3 1 … 3           3 1 … 3

This example:

Refresh interval = 64 ms; all counters are updated once within 16 ms

Iterates over all the indices four times within 64 ms
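The staggering in this example can be sketched as follows, assuming 8 segments of 16 indices, 2-bit counters, and one update step per millisecond in which the same index is updated in every segment. The function name and the all-zero starting state (chosen as a worst case) are assumptions for illustration.

```python
def staggered_schedule(segments=8, rows_per_segment=16, counter_max=3, steps=64):
    """Return the number of refreshes issued at each update step."""
    # counters[s][i] is the counter for index i of segment s.
    # Hypothetical worst-case start: every counter is already zero.
    counters = [[0] * rows_per_segment for _ in range(segments)]
    per_step = []
    for step in range(steps):
        i = step % rows_per_segment    # only index i of each segment is updated
        issued = 0
        for seg in counters:
            if seg[i] == 0:
                issued += 1            # row is due: refresh and reset
                seg[i] = counter_max
            else:
                seg[i] -= 1
        per_step.append(issued)
    return per_step


per_step = staggered_schedule()
print(max(per_step))   # 8: never more than K = 8 refreshes in one step
print(sum(per_step))   # 128: every row refreshed once over 64 steps
```

Even when all 128 counters start at zero, the refreshes are spread over 16 steps of 8 each instead of arriving as one burst of 128.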

Page 11

3D Die Stacking

Why stack DRAM on top of processors?

– High density inter-die vias

– Short distance inter-die vias

– Lower power

– High throughput

[Figure: die stack showing the heat sink, processor, DRAM (thinned die), and die-to-die vias.]

Page 12

Smart Refresh for 3D DRAM Cache

• DRAM Cache Issues

– More accesses per cycle

– Higher temperature (90 °C) requires higher refresh rates

– Significant potential for Smart Refresh

[Figure: Core0 and Core1 with an L2 cache, tags, a 64 MB DRAM cache, and off-chip DRAM memory.]

Page 13

Other Applications of Smart Refresh

• Use programmable counters to keep rows off

• Implement Retention-aware DRAMs [HPCA-06]

• Change protocol to reduce address transmission overhead

Page 14

Experimental Framework

Simulation: Simics (full system functional simulator) feeds the instruction stream to Ruby (cache hierarchy simulator), which feeds memory references to DRAMsim (DRAM simulator)

Power model:

• DRAM: DRAMsim

• Counters: Artisan SRAM generator

Workload:

• Biobench

• Splash-2

• SpecInt 2000

Page 15

DRAM Configurations

Parameter            Conventional DRAM    3D die-stacked DRAM cache
Type                 DDR2                 DDR2
Size                 2 GB and 4 GB        64 MB
Rows                 16384                16384
Frequency            667 MHz              667 MHz
Number of banks      4 and 8              4
Number of ranks      2                    1
Number of columns    2048                 128
Data width           64                   64
Row buffer policy    Open page            Open page
Refresh interval     64 milliseconds      32 milliseconds
L2 cache size        1 MB                 1 MB

Page 16

# of Refreshes Per Second (4 GB DRAM)

Average reduction in number of refreshes per second = 40 %

[Figure: bar chart of millions of refreshes per second for each benchmark, grouped into Biobench, SPLASH2, SPECint2000, and 2 Processes (SPECint2000).]

Benchmarks: clustalw, fasta, hmmer, mummer, phylip, tiger (Biobench); barnes, cholesky, fft, fmm, lucontig, lunoncontig, ocean-contig, radix, water-nsquared, water-spatial (SPLASH2); eon, gcc, parser, perl, twolf, vpr (SPECint2000); gcc_parser, gcc_perl, gcc_twolf, parser_perl, parser_twolf, perl_twolf, vpr_gcc, vpr_parser, vpr_perl, vpr_twolf (2 Processes)

GMEAN = 2,453,055

Baseline = 4,096,000
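The baseline figure can be checked against the DRAM configuration table. The sketch below assumes that the 4 GB configuration is the 8-bank, 2-rank option and that "Rows" is the row count per bank; under those assumptions, refreshing every row once per 64 ms reproduces the baseline, and the GMEAN gives the reported reduction of about 40%.

```python
rows_per_bank = 16384
banks = 8                  # assumed: the 8-bank option is the 4 GB configuration
ranks = 2
refresh_interval_ms = 64

# Every row is refreshed once per refresh interval in the baseline.
total_rows = rows_per_bank * banks * ranks
baseline_per_sec = total_rows * 1000 // refresh_interval_ms

smart_refresh_gmean = 2_453_055
reduction = 1 - smart_refresh_gmean / baseline_per_sec

print(baseline_per_sec)        # 4096000
print(f"{reduction:.1%}")      # 40.1%
```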

Page 17

Refresh Energy Savings (4GB DRAM)

Average energy saving = 23.8%

[Figure: bar chart of refresh energy savings (axis from 0% to 45%) for each benchmark, grouped into Biobench, SPLASH2, SPECint2000, and 2 Processes (SPECint2000).]

GMEAN = 23.76%

Page 18

Total DRAM Energy Savings (4 GB DRAM)

Average energy saving = 9.1% (up to 21% in perl_twolf)

No performance degradation

[Figure: bar chart of total DRAM energy savings (axis from 0% to 25%) for each benchmark, grouped into Biobench, SPLASH2, SPECint2000, and 2 Processes (SPECint2000).]

GMEAN = 9.10%

Page 19

Total Energy Saving (64 MB 3D DRAM Cache)

Average energy saving = 6.9% (up to 12% in Tiger)

[Figure: bar chart of total energy savings (axis from 0% to 14%) for each benchmark, grouped into Biobench, SPLASH2, SPECint2000, and 2 Processes (SPECint2000).]

GMEAN = 6.87%

Page 20

Conclusions

• Redundant refresh operations cost significant energy

• Smart refresh eliminates unnecessary periodic refreshes

• 11% (up to 17%) energy savings in conventional DRAMs

• 7% energy savings in 3D DRAM caches

• No performance impact

Page 21

Thank You!

Georgia Tech ECE MARS Labs

http://arch.ece.gatech.edu

Page 22

Correctness of Smart Refresh

Page 23

No overflow of refresh queue

Typical Refresh Time = 70 ns

Counter update period = 8 ms / (16384 / 8) = 3906 ns

Number of refreshes possible = 56

Number of refreshes required = 8
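The arithmetic on this slide can be reproduced directly. The variable names are illustrative. The 8 ms sweep time is taken from the slide as given, and "refreshes required" is read as the number of logical segments K = 8, following the staggered-update slide.

```python
refresh_time_ns = 70
counter_sweep_ns = 8_000_000      # 8 ms to update every counter once
rows = 16384
segments = 8                      # K logical segments

# One update step touches the same index in all segments.
update_period_ns = counter_sweep_ns / (rows / segments)
refreshes_possible = update_period_ns / refresh_time_ns
refreshes_required = segments     # at most K refreshes per update step

print(update_period_ns)                            # 3906.25
print(round(refreshes_possible))                   # 56
print(refreshes_possible >= refreshes_required)    # True
```

Since about 56 refreshes fit in one update period and at most 8 are needed, the pending refresh queue drains before the next update step.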

Page 24

Area Overhead

Number of counters = 16384*2*4 = 131072

Space for 3-bit counters = 131072*3/(8*1024) = 48 KB

Ways to mitigate area overhead:

Use 2-bit counters.

Have DRAM module block for counters
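The storage estimate, and the effect of the 2-bit mitigation, can be worked out with a small helper. The function name is hypothetical; the inputs are the rows, ranks, and banks used on the slide.

```python
def counter_storage_kb(rows, ranks, banks, bits_per_counter):
    """Storage for one counter per row, in KB (1 KB = 1024 bytes)."""
    counters = rows * ranks * banks
    return counters * bits_per_counter / (8 * 1024)


print(counter_storage_kb(16384, 2, 4, 3))   # 48.0
print(counter_storage_kb(16384, 2, 4, 2))   # 32.0
```

Under these numbers, moving from 3-bit to 2-bit counters cuts the counter storage from 48 KB to 32 KB.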