Memory Power Management via Dynamic Voltage/Frequency Scaling

35
Memory Power Management via Dynamic Voltage/Frequency Scaling Howard David (Intel) Eugene Gorbatov (Intel) Ulf R. Hanebutte (Intel) Chris Fallin (CMU) Onur Mutlu (CMU)

description

Memory Power Management via Dynamic Voltage/Frequency Scaling. Howard David (Intel) Eugene Gorbatov (Intel) Ulf R. Hanebutte (Intel). Chris Fallin (CMU) Onur Mutlu (CMU). Memory Power is Significant. Power consumption is a primary concern in modern servers - PowerPoint PPT Presentation

Transcript of Memory Power Management via Dynamic Voltage/Frequency Scaling

Page 1: Memory Power Management via Dynamic Voltage/Frequency Scaling

Memory Power Management viaDynamic Voltage/Frequency

Scaling

Howard David (Intel)Eugene Gorbatov (Intel)Ulf R. Hanebutte (Intel)

Chris Fallin (CMU)Onur Mutlu (CMU)

Page 2: Memory Power Management via Dynamic Voltage/Frequency Scaling

2

Memory Power is Significant Power consumption is a primary concern in modern

servers Many works: CPU, whole-system or cluster-level

approach But memory power is largely unaddressed Our server system*: memory is 19% of system

power (avg) Some work notes up to 40% of total system power

Goal: Can we reduce this figure?

lbm

Gem

sFDT

Dm

ilcle

slie3

dlib

quan

tum

sopl

exsp

hinx

3m

cfca

ctus

ADM gc

cde

alII

tont

obz

ip2

gobm

ksje

ngca

lcul

ixpe

rlben

chh2

64re

fna

md

grom

acs

gam

ess

povr

ayhm

mer

0100200300400

System PowerMemory Power

Pow

er (W

)

*Dual 4-core Intel Xeon®, 48GB DDR3 (12 DIMMs), SPEC CPU2006, all cores active. Measured AC power, analytically modeled memory power.

Page 3: Memory Power Management via Dynamic Voltage/Frequency Scaling

3

Existing Solution: Memory Sleep States? Most memory energy-efficiency work uses sleep

states Shut down DRAM devices when no memory requests

active But, even low-memory-bandwidth workloads keep

memory awake Idle periods between requests diminish in multicore

workloads CPU-bound workloads/phases rarely completely cache-

resident

lbm

Gem

sFDT

D

milc

lesli

e3d

libqu

antu

m

sopl

ex

sphi

nx3

mcf

cact

usAD

M gcc

deal

II

tont

o

bzip

2

gobm

k

sjeng

calc

ulix

perlb

ench

h264

ref

nam

d

grom

acs

gam

ess

povr

ay

hmm

er

0%2%4%6%8%

Sleep State Residency

Tim

e Sp

ent i

n Sl

eep

St

ates

Page 4: Memory Power Management via Dynamic Voltage/Frequency Scaling

4

Memory Bandwidth Varies Widely Workload memory bandwidth requirements vary

widely

Memory system is provisioned for peak capacity often underutilized

lbm

GemsFD

TD milc

leslie

3d

libquan

tumsoplex

sphinx3 mcf

cactusA

DM gcc dealII

tontobzip

2go

bmksje

ng

calcu

lix

perlben

ch

h264refnam

d

gromacs

gamess

povray

hmmer0

2

4

6

Memory Bandwidth for SPEC CPU2006

Band

wid

th/c

hann

el (G

B/s)

Page 5: Memory Power Management via Dynamic Voltage/Frequency Scaling

5

Memory Power can be Scaled Down DDR can operate at multiple frequencies reduce

power Lower frequency directly reduces switching power Lower frequency allows for lower voltage Comparable to CPU DVFS

Frequency scaling increases latency reduce performance Memory storage array is asynchronous But, bus transfer depends on frequency When bus bandwidth is bottleneck, performance

suffers

CPU Voltage/Freq.

System Power

Memory Freq.

System Power

↓ 15% ↓ 9.9% ↓ 40% ↓ 7.6%

Page 6: Memory Power Management via Dynamic Voltage/Frequency Scaling

6

Observations So Far Memory power is a significant portion of total power

19% (avg) in our system, up to 40% noted in other works

Sleep state residency is low in many workloads Multicore workloads reduce idle periods CPU-bound applications send requests frequently

enoughto keep memory devices awake

Memory bandwidth demand is very low in some workloads

Memory power is reduced by frequency scaling And voltage scaling can give further reductions

Page 7: Memory Power Management via Dynamic Voltage/Frequency Scaling

7

DVFS for Memory Key Idea: observe memory bandwidth utilization,

then adjust memory frequency/voltage, to reduce power with minimal performance loss

Dynamic Voltage/Frequency Scaling (DVFS) for memory

Goal in this work: Implement DVFS in the memory system, by: Developing a simple control algorithm to exploit

opportunity for reduced memory frequency/voltage by observing behavior

Evaluating the proposed algorithm on a real system

Page 8: Memory Power Management via Dynamic Voltage/Frequency Scaling

8

Outline Motivation

Background and Characterization DRAM Operation DRAM Power Frequency and Voltage Scaling

Performance Effects of Frequency Scaling

Frequency Control Algorithm

Evaluation and Conclusions

Page 9: Memory Power Management via Dynamic Voltage/Frequency Scaling

9

Outline Motivation

Background and Characterization DRAM Operation DRAM Power Frequency and Voltage Scaling

Performance Effects of Frequency Scaling

Frequency Control Algorithm

Evaluation and Conclusions

Page 10: Memory Power Management via Dynamic Voltage/Frequency Scaling

10

DRAM Operation Main memory consists of DIMMs of DRAM devices Each DIMM is attached to a memory bus (channel) Multiple DIMMs can connect to one channel

Memory Bus (64 bits)

/8 /8 /8 /8 /8 /8 /8 /8to Memory Controller

Page 11: Memory Power Management via Dynamic Voltage/Frequency Scaling

11

Inside a DRAM Device

Bank 0

Sense AmpsColumn Decoder

Row

Deco

der ODT

Recie

ver

sDr

ive

rs

Regi

ster

s

Writ

e FI

FO

Banks• Independent

arrays• Asynchronous:

independent of memory bus speed

I/O Circuitry• Runs at bus speed• Clock sync/distribution• Bus drivers and receivers• Buffering/queueing

On-Die Termination• Required by bus electrical

characteristicsfor reliable operation

• Resistive element that dissipates power when bus is active

Page 12: Memory Power Management via Dynamic Voltage/Frequency Scaling

12

Effect of Frequency Scaling on Power Reduced memory bus frequency: Does not affect bank power:

Constant energy per operation Depends only on utilized memory bandwidth

Decreases I/O power: Dynamic power in bus interface and clock circuitry

reduces due to less frequent switching Increases termination power:

Same data takes longer to transfer Hence, bus utilization increases

Tradeoff between I/O and termination results in a net power reduction at lower frequencies

Page 13: Memory Power Management via Dynamic Voltage/Frequency Scaling

13

Effects of Voltage Scaling on Power Voltage scaling further reduces power because all

parts of memory devices will draw less current (at less voltage)

Voltage reduction is possible because stable operation requires lower voltage at lower frequency:

1333MHz 1066MHz 800MHz1

1.11.21.31.41.51.6

Minimum Stable Voltage for 8 DIMMs in a Real System

Vdd for Power Model

DIM

M V

olta

ge (V

)

Page 14: Memory Power Management via Dynamic Voltage/Frequency Scaling

14

Outline Motivation

Background and Characterization DRAM Operation DRAM Power Frequency and Voltage Scaling

Performance Effects of Frequency Scaling

Frequency Control Algorithm

Evaluation and Conclusions

Page 15: Memory Power Management via Dynamic Voltage/Frequency Scaling

15

How Much Memory Bandwidth is Needed?

lbm milc

libquantum

sphinx3

cactusA

DMdealII

bzip2

sjeng

perlbench

namd

gamess

hmmer01234567

Memory Bandwidth for SPEC CPU2006

Band

wid

th/c

hann

el (G

B/s)

Page 16: Memory Power Management via Dynamic Voltage/Frequency Scaling

16

Performance Impact of Static Frequency Scaling Performance impact is proportional to bandwidth

demand Many workloads tolerate lower frequency with

minimal performance droplb

mG

emsF

DTD

milc

lesli

e3d

libqu

antu

mso

plex

sphi

nx3

mcf

cact

usAD

M gcc

deal

IIto

nto

bzip

2go

bmk

sjen

gca

lcul

ixpe

rlben

chh2

64re

fna

md

grom

acs

gam

ess

povr

ayhm

mer

01020304050607080

Performance Loss, Static Frequency Scaling

1333->8001333->1066

Perf

orm

ance

Dro

p (%

)

lbm

Gem

sFD

TD milc

lesli

e3d

libqu

antu

mso

plex

sphi

nx3

mcf

cact

usAD

M gcc

deal

IIto

nto

bzip

2go

bmk

sjen

gca

lcul

ixpe

rlben

chh2

64re

fna

md

grom

acs

gam

ess

povr

ayhm

mer

0

2

4

6

8Performance Loss, Static Frequency Scaling

1333->8001333->1066

Perf

orm

ance

Dro

p (%

)

:::: :::: :::: :::: :::: :: :: ::

Page 17: Memory Power Management via Dynamic Voltage/Frequency Scaling

17

Outline Motivation

Background and Characterization DRAM Operation DRAM Power Frequency and Voltage Scaling

Performance Effects of Frequency Scaling

Frequency Control Algorithm

Evaluation and Conclusions

Page 18: Memory Power Management via Dynamic Voltage/Frequency Scaling

18

Memory Latency Under Load At low load, most time is in array access and bus

transfer small constant offset between bus-frequency latency curves

As load increases, queueing delay begins to dominate

bus frequency significantly affects latency

0 1000 2000 3000 4000 5000 6000 7000 80006090

120150180

Memory Latency as a Function of Bandwidth and Mem Frequency

Utilized Channel Bandwidth (MB/s)

Late

ncy

(ns)

Page 19: Memory Power Management via Dynamic Voltage/Frequency Scaling

19

Control Algorithm: Demand-Based Switching

After each epoch of length Tepoch:Measure per-channel bandwidth BWif BW < T800 : switch to 800MHzelse if BW < T1066 : switch to 1066MHzelse : switch to 1333MHz

0 1000 2000 3000 4000 5000 6000 7000 80006090

120150180

Memory Latency as a Function of Bandwidth and Mem Frequency800MHz 1067MHz 1333MHz 800-fit

Utilized Channel Bandwidth (MB/s)

Late

ncy

(ns)

T1066T800

Page 20: Memory Power Management via Dynamic Voltage/Frequency Scaling

20

Implementing V/F Switching Halt Memory Operations

Pause requests Put DRAM in Self-Refresh Stop the DIMM clock

Transition Voltage/Frequency Begin voltage ramp Relock memory controller PLL at new frequency Restart DIMM clock Wait for DIMM PLLs to relock

Begin Memory Operations Take DRAM out of Self-Refresh Resume requests

C Memory frequency already adjustable staticallyC Voltage regulators for CPU DVFS can work for memory DVFSC Full transition takes ~20µs

Page 21: Memory Power Management via Dynamic Voltage/Frequency Scaling

21

Outline Motivation

Background and Characterization DRAM Operation DRAM Power Frequency and Voltage Scaling

Performance Effects of Frequency Scaling

Frequency Control Algorithm

Evaluation and Conclusions

Page 22: Memory Power Management via Dynamic Voltage/Frequency Scaling

22

Evaluation Methodology Real-system evaluation

Dual 4-core Intel Xeon®, 3 memory channels/socket 48 GB of DDR3 (12 DIMMs, 4GB dual-rank, 1333MHz)

Emulating memory frequency for performance Altered memory controller timing registers (tRC,

tB2BCAS) Gives performance equivalent to slower memory

frequencies

Modeling power reduction Measure baseline system (AC power meter, 1s

samples) Compute reductions with an analytical model (see

paper)

Page 23: Memory Power Management via Dynamic Voltage/Frequency Scaling

23

Evaluation Methodology Workloads

SPEC CPU2006: CPU-intensive workloads All cores run a copy of the benchmark

Parameters Tepoch = 10ms Two variants of algorithm with different switching

thresholds: BW(0.5, 1): T800 = 0.5GB/s, T1066 = 1GB/s BW(0.5, 2): T800 = 0.5GB/s, T1066 = 2GB/s

More aggressive frequency/voltage scaling

Page 24: Memory Power Management via Dynamic Voltage/Frequency Scaling

24

Performance Impact of Memory DVFS Minimal performance degradation: 0.2% (avg), 1.7%

(max) Experimental error ~1%

lbm

Gem

sFDT

Dm

ilcle

slie3

dlib

quan

tum

sopl

exsp

hinx

3m

cfca

ctus

ADM gcc

deal

IIto

nto

bzip

2go

bmk

sjen

gca

lcul

ixpe

rlben

chh2

64re

fna

md

grom

acs

gam

ess

povr

ayhm

mer

AVG

-1

0

1

2

3

4

BW(0.5,1)BW(0.5,2)

Perf

orm

ance

Deg

rada

tion

(%)

Page 25: Memory Power Management via Dynamic Voltage/Frequency Scaling

25

Memory Frequency Distribution Frequency distribution shifts toward higher memory frequencies with more memory-intensive benchmarks

lbm milc

libquan

tum

sphinx3

cactusA

DMdea

lIIbzip

2sje

ng

perlben

chnam

d

gamess

hmmer0%

20%

40%

60%

80%

100%

13331066800

Page 26: Memory Power Management via Dynamic Voltage/Frequency Scaling

26

Memory Power Reduction Memory power reduces by 10.4% (avg), 20.5%

(max) lb

mGe

msF

DTD

milc

lesli

e3d

libqu

antu

mso

plex

sphi

nx3

mcf

cact

usAD

M gcc

deal

IIto

nto

bzip

2go

bmk

sjeng

calc

ulix

perlb

ench

h264

ref

nam

dgr

omac

sga

mes

spo

vray

hmm

erAV

G

0

5

10

15

20

25

BW(0.5,1)BW(0.5,2)

Mem

ory

Pow

er R

educ

tion

(%)

Page 27: Memory Power Management via Dynamic Voltage/Frequency Scaling

27

System Power Reductionlb

mGe

msF

DTD

milc

lesli

e3d

libqu

antu

mso

plex

sphi

nx3

mcf

cact

usAD

M gcc

deal

IIto

nto

bzip

2go

bmk

sjeng

calc

ulix

perlb

ench

h264

ref

nam

dgr

omac

sga

mes

spo

vray

hmm

erAV

G

00.5

11.5

22.5

33.5

4

BW(0.5,1)BW(0.5,2)

Syst

em P

ower

Red

uctio

n (%

)

As a result, system power reduces by 1.9% (avg), 3.5% (max)

Page 28: Memory Power Management via Dynamic Voltage/Frequency Scaling

28

System energy reduces by 2.4% (avg), 5.1% (max) System Energy Reduction

lbm

Gem

s...

milc

lesli

e3d

libqu

a...

sopl

exsp

hinx

3m

cfca

ctu.

..gc

cde

alII

tont

obz

ip2

gobm

ksje

ngca

lcul

ixpe

rlb...

h264

ref

nam

dgr

omac

sga

mes

spo

vray

hmm

erAV

G-1

0

1

2

3

4

5

6

BW(0.5,1)BW(0.5,2)

Syst

em E

nerg

y Re

ducti

on (%

)

Page 29: Memory Power Management via Dynamic Voltage/Frequency Scaling

29

Related Work MemScale [Deng11], concurrent work (ASPLOS 2011)

Also proposes Memory DVFS Application performance impact model to decide voltage

and frequency: requires specific modeling for a given system; our bandwidth-based approach avoids this complexity

Simulation-based evaluation; our work is a real-system proof of concept

Memory Sleep States (Creating opportunity with data placement [Lebeck00,Pandey06], OS scheduling [Delaluz02], VM subsystem [Huang05]; Making better decisions with better models [Hur08,Fan01])

Power Limiting/Shifting (RAPL [David10] uses memory throttling for thermal limits; CPU throttling for memory traffic [Lin07,08]; Power shifting across system [Felter05])

Page 30: Memory Power Management via Dynamic Voltage/Frequency Scaling

30

Conclusions Memory power is a significant component of system power

19% average in our evaluation system, 40% in other work

Workloads often keep memory active but underutilized Channel bandwidth demands are highly variable Use of memory sleep states is often limited

Scaling memory frequency/voltage can reduce memory power with minimal system performance impact 10.4% average memory power reduction Yields 2.4% average system energy reduction

Greater reductions are possible with wider frequency/voltage range and better control algorithms

Page 31: Memory Power Management via Dynamic Voltage/Frequency Scaling

Memory Power Management viaDynamic Voltage/Frequency

Scaling

Howard David (Intel)Eugene Gorbatov (Intel)Ulf R. Hanebutte (Intel)

Chris Fallin (CMU)Onur Mutlu (CMU)

Page 32: Memory Power Management via Dynamic Voltage/Frequency Scaling

32

Why Real-System Evaluation? Advantages:

Capture all effects of altered memory performance System/kernel code, interactions with IO and peripherals, etc

Able to run full-length benchmarks (SPEC CPU2006) rather than short instruction traces

No concerns about architectural simulation fidelity Disadvantages:

More limited room for novel algorithms and detailed measurements

Inherent experimental error due to background-task noise, real power measurements, nondeterministic timing effects

For a proof-of-concept, we chose to run on a real system in order to have results that capture all potential side-effects of altering memory frequency

Page 33: Memory Power Management via Dynamic Voltage/Frequency Scaling

33

CPU-Bound Applications in a DRAM-rich system We evaluate CPU-bound workloads with 12 DIMMs:

what about smaller memory, or IO-bound workloads?

12 DIMMs (48GB): are we magnifying the problem? Large servers can have this much memory, especially for database or

enterprise applications Memory can be up to 40% of system power [1,2], and reducing its

power in general is an academically interesting problem

CPU-bound workloads: will it matter in real life? Many workloads have CPU-bound phases (e.g., database scan or

business logic in server workloads) Focusing on CPU-bound workloads isolates the problem of varying

memory bandwidth demand while memory cannot enter sleep states, and our solution applies for any compute phase of a workload

[1] L. A. Barroso and U. Holzle. “The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.” Synthesis Lectures on Computer Architecture. Morgan & Claypool, 2009.[2] C. Lefurgy et al. “Energy Management for Commercial Servers.” IEEE Computer, pp. 39—48, December 2003.

Page 34: Memory Power Management via Dynamic Voltage/Frequency Scaling

34

Combining Memory & CPU DVFS? Our evaluation did not incorporate CPU DVFS:

Need to understand effect of single knob (memory DVFS) first

Combining with CPU DVFS might produce second-order effects that would need to be accounted for

Nevertheless, memory DVFS is effective by itself, and mostly orthogonal to CPU DVFS: Each knob reduces power in a different component Our memory DVFS algorithm has neligible performance

impact negligible impact on CPU DVFS CPU DVFS will only further reduce bandwidth demands

relative to our evaluations no negative impact on memory DVFS

Page 35: Memory Power Management via Dynamic Voltage/Frequency Scaling

35

Why is this Autonomic Computing? Power management in general is autonomic: a system

observes its own needs and adjusts its behavior accordingly Lots of previous work comes from architecture community, but crossover in ideas and approaches could be beneficial

This work exposes a new knob for control algorithms to turn, has a simple model for the power/energy effects of that knob, and observes opportunity to apply it in a simple way

Exposes future work for: More advanced control algorithms Coordinated energy efficiency across rest of system Coordinated energy efficiency across a cluster/datacenter,

integrated with memory DVFS, CPU DVFS, etc.