1/1/2011CPU Power and Thermals 1 CPU Power Performance and Thermals 101 Efi Rotem, Jan 2011.



1/1/2011 CPU Power and Thermals 1

CPU Power Performance and Thermals 101

Efi Rotem, Jan 2011


Compute Performance

[Figure: relative compute performance (log scale, 1 to 10,000) versus year (1978 to 2006), with points for the 8086, PC-XT, 286, 386, 486, Pentium, MMX, P-2, P-4, and Core-2 Duo. Source: Dave Patterson]


Compute Performance

[Figure: compute performance versus year chart, 1978 to 2006 (8086 through Core-2 Duo). Source: Dave Patterson]

• Compute performance has evolved our society
  – Industry, science, art, information, entertainment …
• Enabled compute in appliances and gadgets
• Semicon industry aligned to drive performance
  – Transistor density
  – Frequency
• Still need more performance, but:
  – Harder to get
  – Power is becoming a limiter


Compute Performance

[Figure: compute performance versus year chart, 1978 to 2006 (8086 through Core-2 Duo). Source: Dave Patterson]

• Single-thread performance is harder to get
• Looking at multi-threading / multi-processing:
  – CMP, SMT, GPGPU (OpenCL, CUDA, etc.)
• Power has already been a constraint for over a decade
• More performance → more transistors → more power
• Power killed the P-4 family

(Chart annotations: 1W, 100W)


Outline

• The basics of CPU power and energy

• Basics of cooling

• Power and thermal management

• “Turbo”

• Power constrained performance


Power & Energy

Energy
• “The capacity for doing work”
  – In a CPU, S/W activity translates into real H/W “work”
• Active and stand-by energy are important for
  – Electric bills: lower energy per task → lower bills
  – Battery life: lower energy per task → longer battery life
• Measured over time (Power * Time)

Power
• Energy / Time
• Important for
  – Heat and cooling
  – Power delivery
• Instantaneous or over “thermally significant” time
  – mSec for local heating to minutes over the entire platform
  – nSec for power delivery

Many technological implications
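The relation between the two quantities (Energy = Power * Time) can be sketched in a few lines. This is an illustrative sketch only; the task powers, durations, and the 50 Wh battery are hypothetical numbers, not values from the deck.

```python
def energy_joules(power_watts: float, seconds: float) -> float:
    """Energy for a constant power draw: E = P * t."""
    return power_watts * seconds


def battery_life_hours(battery_wh: float, average_power_watts: float) -> float:
    """Hours of operation from a battery capacity given in watt-hours."""
    return battery_wh / average_power_watts


# Hypothetical task run two ways: fast at high power, or slow at low power.
fast = energy_joules(20.0, 2.0)   # 20 W for 2 s
slow = energy_joules(10.0, 3.0)   # 10 W for 3 s

# Hypothetical 50 Wh battery at a 10 W average draw.
hours = battery_life_hours(50.0, 10.0)

print(f"fast: {fast} J, slow: {slow} J, battery life: {hours} h")
```

In this made-up case the slower run draws half the power yet takes only 1.5x as long, so it uses less energy per task; power and energy are separate constraints.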


Power-aware figures of merit

• Power (P): packaging / power delivery / cooling
• Energy (PD): battery life (mobile), electrical bills
• Energy-delay (PD^2): performance and low power
• Energy-delay^2 (PD^3): emphasis on performance

Power-aware ≠ low power

Recent studies and market directions also look at constrained systems

• Maximize performance in power constraints
• Maximize performance in thermal constraints
• Minimize energy while maintaining QoS requirements

• Etc.
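A small sketch of how the four figures of merit can rank the same two designs differently. The two design points (10 W at 2 s per task, 30 W at 1 s per task) are hypothetical and chosen only to show the shift in emphasis from power towards performance.

```python
def figures_of_merit(power_w: float, delay_s: float) -> dict:
    """Return P, PD, PD^2 and PD^3 for one design point (lower is better)."""
    return {
        "P": power_w,
        "PD": power_w * delay_s,
        "PD^2": power_w * delay_s ** 2,
        "PD^3": power_w * delay_s ** 3,
    }


# Hypothetical designs: A is low power but slow, B is high power but fast.
design_a = figures_of_merit(power_w=10.0, delay_s=2.0)
design_b = figures_of_merit(power_w=30.0, delay_s=1.0)


def winner(metric: str) -> str:
    """Name of the design with the lower (better) value for this metric."""
    return "A" if design_a[metric] < design_b[metric] else "B"


for metric in ("P", "PD", "PD^2", "PD^3"):
    print(metric, design_a[metric], design_b[metric], "->", winner(metric))
```

With these numbers the low-power design wins on P and PD, while the fast design wins once delay is weighted quadratically or cubically.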


Sources of power consumption

or: how computational work translates into real physical work


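Sources – Dynamic (Switch)
• Due to signal activity
• ~80% of dynamic component
• 4 ways to reduce dynamic power
  – Switched capacitance, power supply, activity, frequency

$$W_{DSW} = \tfrac{1}{2} \, C_L \, V_{DD}^2 \, f$$

[Figure: CMOS inverter between Vdd and Gnd driving load capacitance CL. With In = 0 the output goes to 1 and CL charges (current IC); with In = 1 the output goes to 0 and CL discharges (current ID). Waveforms show Vin, Vout, IC, and ID versus time t.]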
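The four knobs listed for this slide (switched capacitance, supply voltage, activity, frequency) can be compared with a short sketch. It assumes the switching-power form W = 1/2 * C_L * V_DD^2 * f; the explicit `activity` multiplier and all numeric values are illustrative assumptions.

```python
def switching_power(c_load_f: float, vdd_v: float, freq_hz: float,
                    activity: float = 1.0) -> float:
    """Switching power 1/2 * activity * C * Vdd^2 * f (activity is an assumed knob)."""
    return 0.5 * activity * c_load_f * vdd_v ** 2 * freq_hz


# Hypothetical baseline: 1 nF of switched capacitance at 1.0 V and 1 GHz.
base = switching_power(1e-9, 1.0, 1e9)

# Effect of a 10% reduction in each knob, relative to the baseline.
scaled = {
    "capacitance -10%": switching_power(0.9e-9, 1.0, 1e9) / base,
    "supply -10%": switching_power(1e-9, 0.9, 1e9) / base,
    "activity -10%": switching_power(1e-9, 1.0, 1e9, activity=0.9) / base,
    "frequency -10%": switching_power(1e-9, 1.0, 0.9e9) / base,
}

for knob, ratio in scaled.items():
    print(f"{knob}: {ratio:.2f}x baseline power")
```

Only the supply voltage enters quadratically, which is why voltage reduction is the most effective of the four.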


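Sources – Dynamic (Short)
• When both P and N transistors are “on”
• ~20% of dynamic
  – More if CL is small or freq is high
• Isc depends on
  – Temperature, transistor sizes, process technology, in/out slope ratio

$$W_{DST} = \tfrac{1}{2} \, t_{sc} \, V_{DD} \, I_{sc} \, f$$

[Figure: CMOS inverter between Vdd and Gnd with short-circuit current ISC flowing while the input switches. Waveforms show Vin, Vout, and ISC versus time t.]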


Dynamic power modeling - DVFS
• Dynamic power has capacitive behavior
• CPU operating frequency is a function of Vdd
  – Roughly linearly dependent on Vdd
    • Freq ~ K*Vdd
    • In a CPU, K also represents the activity factor: the average number of transistors that toggle every clock
      – A function of the type of application executed
  – Dynamic power is a function of Vdd^2
    • Pact = K * Vdd^2 * Cdyn * Freq
  – Combining the two results in a cubic dependency
    • Pact ~ Freq^3
• DVFS = Dynamic Voltage and Frequency Scaling
  – Going up or down the cubic curve of power vs. frequency
    • High cost to achieve frequency
    • Large power savings for some small frequency reduction

$$F = \frac{K \, (V_{dd} - V_t)^{a}}{V_{dd}}$$
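A minimal sketch of the cubic relationship, assuming the simplified model from this slide (Freq ~ K*Vdd and Pact ~ Vdd^2 * Freq, so Pact ~ Freq^3). It ignores leakage and any minimum-voltage floor, and the sample ratios are arbitrary.

```python
def relative_active_power(freq_ratio: float) -> float:
    """Active power relative to baseline when frequency is scaled by freq_ratio.

    Assumes voltage scales linearly with frequency (Freq ~ K * Vdd),
    so power scales as Vdd^2 * Freq = freq_ratio^3.
    """
    vdd_ratio = freq_ratio  # simplified linear voltage/frequency relation
    return vdd_ratio ** 2 * freq_ratio


for ratio in (1.0, 0.9, 0.8):
    print(f"frequency x{ratio:.1f} -> power x{relative_active_power(ratio):.3f}")
```

Under this model a 10% frequency reduction saves roughly 27% of active power, which is the "large power savings for some small frequency reduction" point made above.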


Sources - Static

• Main leakages
  – Subthreshold (ISUB)
  – Gate direct tunneling (IOX)
• Other
  – Reverse-biased junction
  – Hot carrier injection
  – Punchthrough
  – Oxide tunneling current

[Figure: MOSFET cross-section (gate, source, drain, bulk, oxide thickness TOX, channel VTH, gate-source voltage VGS) showing subthreshold leakage (ISUB), gate leakage (IOX), and other leakages]

$$W_S = I_{leak} \cdot V_{DD}$$

$$I_{leak} = I_{SUB} + I_{OX}$$

$$I_{leak} = K \sum_{i=1}^{N} Width_i$$

● Combined die's gate width is a measure of leakage
● Reducing width reduces Ileak but reduces performance

Power = Current * Voltage


Sources – Static (Subthreshold leak)

• Happens when transistor is “off” (VGS < VTH)
• Depends exponentially on VTH and temperature (through the thermal voltage Vθ)
• Depends on transistor width
• Very sensitive to parameter variation
• Two ways to reduce it
  – Set VGS to 0 (but info is lost)
  – Increase VTH
    • but performance is reduced, and
    • VDD (VGS) has to decrease with each process technology
      – To reduce dynamic power, but mainly ...
      – To make current delivery to the devices feasible
        » I = V/R => if V high and R low => I high => metal migration
  – Use SOI (Silicon-on-Insulator) process

$$I_{SU} = K_1 \cdot Width \cdot e^{-V_{TH}/(n V_\theta)} \left(1 - e^{-V_{GS}/V_\theta}\right)$$

[Chandrakasan et al, “Design of high-performance microprocessor circuits” IEEE Press, 01]

VTH determines how fast the transistor reaches saturation: the smaller VTH, the faster the transistor

$$f \propto \frac{(V_{DD} - V_{TH})^{\alpha}}{V_{DD}}, \quad \alpha \approx 1.3 \text{ for current processes}$$
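The trade-off on this slide (a higher VTH cuts leakage exponentially but slows the transistor) can be sketched numerically. The sketch assumes the forms I_SU = K1 * Width * exp(-VTH/(n*Vθ)) * (1 - exp(-VGS/Vθ)) and f ~ (VDD - VTH)^α / VDD; the values n = 1.5, Vθ = 26 mV, K1 = 1 and the voltages used are illustrative assumptions, not data from the deck.

```python
import math


def subthreshold_leakage(width: float, vth: float, vgs: float,
                         n: float = 1.5, v_theta: float = 0.026,
                         k1: float = 1.0) -> float:
    """Subthreshold leakage current in arbitrary units (n, v_theta, k1 assumed)."""
    return (k1 * width * math.exp(-vth / (n * v_theta))
            * (1.0 - math.exp(-vgs / v_theta)))


def relative_frequency(vdd: float, vth: float, alpha: float = 1.3) -> float:
    """Relative operating frequency, (VDD - VTH)^alpha / VDD."""
    return (vdd - vth) ** alpha / vdd


# Compare a hypothetical low-VTH device (0.3 V) with a high-VTH one (0.4 V) at 1.0 V.
leak_ratio = subthreshold_leakage(1.0, 0.3, 1.0) / subthreshold_leakage(1.0, 0.4, 1.0)
freq_ratio = relative_frequency(1.0, 0.3) / relative_frequency(1.0, 0.4)

print(f"low-VTH device leaks {leak_ratio:.1f}x more and runs {freq_ratio:.2f}x faster")
```

With these assumed parameters a 0.1 V change in VTH moves leakage by about an order of magnitude while changing speed by only a few tens of percent, which motivates using low-VTH devices only where timing demands it.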


Sources – Static (Subthreshold leak)
• SOI (Silicon On Insulator)
  – Use an insulator substrate as opposed to a silicon substrate
    • Lower substrate capacitances, denser structures (region boundaries are “crisper”)
    • Channel (VTH) can be made smaller
      – Increases performance
    • Or channel can be kept the same, and VDD is reduced
      – Reduces subthreshold leakage
• Dual VTH technology
  – Use low VTH for devices in critical path
  – Use high VTH for the rest
  – Needs more complex process technology and CAD tools
• Sleep transistors (multi-threshold CMOS or MTCMOS)
  – High VTH transistors isolate the logic
  – Increase area, delay
  – No data retention
  – For low voltage technologies it takes time to turn on/off


Power/Performance abstractions
• Low-level:
  – Hspice
  – PowerMill
• Medium-level:
  – RTL models
• Architecture-level:
  – PennState SimplePower
  – Intel Tempest
  – Princeton Wattch
  – IBM PowerTimer
  – Umich/Colorado PowerAnalyzer


Power Trends with technology


Process Tendencies

• New process, devices shrink
  – How does this affect the power of the chip?
  – Example: 130nm to 90nm shrink
    • Shrink by a factor of 0.7
    • X' = X*0.7
    • Y' = Y*0.7
    • Z' = Z*0.7
    • VDD' = VDD*0.7

[Figure: a device with dimensions X, Y, Z becomes X', Y', Z' after the process shrink]


Moore’s law for power performance

• Theoretical scaling of new process technology:
  – Linear dimensions: shrink by 0.7
  – Area: shrinks by 0.5
  – Capacitance: shrinks by 0.7
  – Voltage: scales down by 0.7
  – Frequency: scales up by 1/0.7
  – Power: scales down by 0.5
• There’s an additional 2X transistor and power budget for:
  – New features
  – Architectural extensions
  – Performance improvement

Half the area, half the power

Sustainable performance improvement at same power consumption

Power = C * V^2 * F + Leakage
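The 0.5x power figure follows directly from the scaling factors listed on this slide, applied to the dynamic term C * V^2 * F. The sketch below checks that arithmetic and, as a contrasting assumption, also evaluates the case where voltage stops scaling; leakage is ignored in both.

```python
S = 0.7  # linear shrink factor per process generation


def dynamic_power_scale(voltage_scale: float) -> float:
    """Scale factor of C * V^2 * F with C scaling by S and F by 1/S."""
    capacitance_scale = S
    frequency_scale = 1.0 / S
    return capacitance_scale * voltage_scale ** 2 * frequency_scale


area_scale = S * S                      # about 0.5

ideal = dynamic_power_scale(S)          # voltage also scales by 0.7
flat_voltage = dynamic_power_scale(1.0)  # assumption: voltage stays the same

print(f"ideal scaling: power x{ideal:.2f}, power density x{ideal / area_scale:.2f}")
print(f"flat voltage:  power x{flat_voltage:.2f}, "
      f"power density x{flat_voltage / area_scale:.2f}")
```

With ideal scaling, power halves along with area, so power density is unchanged. If voltage does not scale, the same design keeps its power in half the area, roughly doubling power density.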


Recent Reality

• Practical scaling factors:
  – Linear dimensions, area and active capacitance continue to shrink
  – Interconnect impact increases
  – Voltage: roughly the same!!!
  – Power = C * V^2 * F + Leakage: roughly the same!!!

Smaller area, ~same power → higher power density

• The power wall:
  – Higher power density just to enable the process shrink
    • Harder to cool
  – Any architectural additions come at the cost of higher power

Already got us to the power wall


Sources – Dynamic vs. Static
• Dynamic (active) component
  – Only when the circuit is doing “something”
    • Even if just being clocked
• Static (leakage) component
  – Always!!

In the past static was not a problem

Today it is becoming a significant portion of the total power


Product trends options

[Figure: relative performance (0% to 600%) by architecture generation (Banias, Dothan, Merom, Nehalem, Gesher, Gesher+, Gesher++, Gesher+++), with three curves: Perfect Moore's Law, Power Limited Performance, and Thermal Limited Performance]


Why is power a problem


Power and Energy implications
• Active energy
  – Electricity bills: data centers and home
• Average energy: increase battery life
  – High power → more energy → less battery time
• Power dissipation
  – Component and platform cooling
• Package cost
  – High power → expensive package
    • Heat dissipation and power delivery
• Reliability
  – High temperature → increased failure rate


Energy savings in the Data center

SAVING ENERGY AT THE DATA CENTERS THAT ENABLE OUR E-LIVES

In 2006, the United States spent nearly $4.5 billion in energy costs to power the servers, cooling, and auxiliary infrastructure equipment in data centers. However, recent chip advances bring new means for saving energy in all arenas of data center operation.

Data centers provide the massive computing power necessary to drive the Internet and the ever-expanding global communications network of satellites, fiber optics, and cell phones. Energy, however, is proving to be the biggest challenge facing data center advancement.

Source: http://www.sia-online.org

• Computation accounts for only 35% of the server energy consumed
• New chips reduce both active-time power and standby power
• Energy Star initiative targets reducing standby power


Motivation – Package & Electrical

• Package cost
  – 45-75W => ~$1/W
  – >75W => ~$2-$3/W
• Electrical bills
  – ~$1500/month/server rack
  – Office equipment in US: ~5% of total commercial energy use.

[Borkar, “Design challenges of technology scaling” IEEE Micro, 19(4), 99]
[Gunther et al, “Managing the impact of increasing microprocessor power” Intel Tech Journal, 1Q, 02]
[Barroso et al, “Web search for a planet: the Google cluster architecture” 23(2), 03]


Battery life
• Form factor is valued by notebook users
  – Computation demands increase day by day
  – Darwin's law: big, heavy, low performance or high maintenance species will die
  – Mac-Air and Lenovo X300 introduce ultra-thin and light form factors
  – NetBook is gaining market acceptance
  – Smart Internet & wireless handheld devices
• From SIA data
  – Battery-operated devices grew 70% from 1999 to 2005
  – However, improvement in battery technology was < 50% in the same period
  – Improvements in battery life need to come from power-efficient computers
• Significant overall weight
  – Batteries account for 20% to 33% of the total device weight

[Powers, “Batteries for low power electronics” Proc of the IEEE, v83, n4, 95]


Reliability

The Arrhenius equation:

$$F_r = A \cdot e^{-E_a / (K \cdot T)}$$

Fr: failure rate at temperature T (the mean time to failure scales as its inverse)
A: empirical constant
Ea: activation energy
K: Boltzmann’s constant
T: absolute temperature (K)

Failure mechanisms:
• Die metallization (corrosion, electromigration, contact spiking)
• Oxide (charge trapping, gate oxide breakdown, hot electrons)
• Device (ionic contamination, second breakdown, surface-charge)
• Die attach (fracture, thermal breakdown, adhesion fatigue)
• Interconnect (wirebond failure, flip-chip joint failure)
• Package (cracking, whisker and dendritic growth, lid seal failure)

Most of the above increase with T (Arrhenius)

Operating temperature limited to 85-100°C
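To give a feel for the temperature sensitivity, the sketch below evaluates an Arrhenius failure rate of the form A * exp(-Ea / (K * T)) at the two ends of the 85-100°C range mentioned above. The activation energy of 0.7 eV is a hypothetical value picked for illustration; real values depend on the failure mechanism.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann's constant in eV/K


def failure_rate(temp_c: float, activation_energy_ev: float = 0.7,
                 a: float = 1.0) -> float:
    """Arrhenius failure rate A * exp(-Ea / (K * T)), in units of A."""
    temp_k = temp_c + 273.15  # the equation needs absolute temperature
    return a * math.exp(-activation_energy_ev / (BOLTZMANN_EV_PER_K * temp_k))


# How much faster do failures accumulate at 100 C than at 85 C?
acceleration = failure_rate(100.0) / failure_rate(85.0)
print(f"failure rate at 100 C is {acceleration:.2f}x the rate at 85 C")
```

Because the empirical constant A cancels in the ratio, only the assumed activation energy and the two temperatures matter for this comparison.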


Reliability
• On-chip temperature gradient
  – Power is not distributed evenly over the chip
  – Differences in temperatures at different portions of the device caused by unbalanced power consumption
  – May produce mechanical stress
  – May produce malfunctions if a single point reaches the maximum power allowed

[Figure: surface plot of power density (per cm2, 0 to 300) over die position x (mm) and y (mm), each running from 0.2 to 9.7, with bands from 0-50 up to 250-300]


Power and cooling

Source: Tom’s Hardware


Heat mechanisms

• Conduction

• Convection

• Radiation

• Phase change

• Heat storage


Conduction

• Similar to electrical conduction (e.g. metals are good conductors)

• Heat flow from high energy to low energy

• Microscopic (vibration, adjacent molecules, electron transport)

• No major displacement of molecules

• Needs a material: typically in solids (fluids: distance between molecules)

• Typical example: thermal “slug”, spreader, heatsink

Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001



Conduction

Different materials conductivity

(not a strong function of temp.)

Si – more variation

Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001


Convection

• Macroscopic (bulk transport, mix of hot and cold, energy storage)

• Need material (typically in fluids, liquid, gas)

• Natural vs. forced (gas or liquid)

• Typical example: heatsink (fan), liquid cooling

Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001


Radiation

• Electromagnetic waves (can occur in vacuum)

• Negligible in typical applications

• Sometimes the only mechanism (e.g. in space)

Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001


Surface-to-surface contacts

• Not negligible, heat crowding

• Thermal greases (can “pump-out”)

• Phase Change Films (undergo a transition from solid to semi-solid with the application of heat)

Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001


Phase-change

Thermal solutions evolution:

• Natural air cooling

• Forced-air cooling

• Liquid cooling

• Phase change (e.g. heat pipe)

• Refrigeration

Phase change:

a. Solid changing to a liquid: fusion, or melting,

b. Liquid changing to a vapor: evaporation, also boiling,

c. Vapor changing to a liquid: condensation,

d. Liquid changing to a solid: crystallization, or freezing,

e. Solid changing to a vapor: sublimation,

f. Vapor changing to a solid: deposition.


Power and cooling

Source: Tom’s Hardware


CPU Cooling mechanisms
• Actually we transfer heat and not cool
  – Transfers heat from hot junction to a cooler ambient
• Consists of package, heat pipe, heat sink, fan and thermal attach

Cooling challenges:
• Junction to ambient heat transfer
• Thermodynamic limits
• Ergonomics constraints
  – Skin temp, exhaust temp, fan noise
• Total platform power increases

[Figure: cooling stack on a mobile main board: die (junction temperature Tj) on the package, TIM1, IHS, TIM2, case temperature Tc, CPU cooling with air inlet, platform cooling air, and ambient temperature Ta]

www.intel.com/cd/channel/reseller Technical reference - 97374


Simplistic steady state model

• Can be approximated as ΔT = P * Θ
  – Θ is “thermal resistance”
  – Similar to ΔV = I * R
  – T is measured in °C, P in Watts, Θ in °C/W
• As in resistors we see:
  – Serial: Θ = Θ1 + Θ2
  – Parallel: 1/Θ = 1/Θ1 + 1/Θ2
• Θ behaves like resistance
  – Grows with length
  – Reduced with area growth
  – For a piece of silicon, we can use an area-normalized resistance R in °C*cm2/W (Θ = R / area)
• Simplified model
  – Steady state modeling
  – Realistic models take into account power distribution maps

[Figure: two thermal resistances Θ1 and Θ2 carrying power P. In series: P = P1 = P2, Θ = Θ1 + Θ2, ΔT = ΔT1 + ΔT2. In parallel: P = P1 + P2, 1/Θ = 1/Θ1 + 1/Θ2, ΔT = ΔT1 = ΔT2.]


Some calculations…

• Only core active, ignore cache
  – Tj core is 90°C
  – Ambient temp is 35°C
  – Die size = 1.4 cm2
• Thermal resistance: Θja = Θca + Θjc / area = 1.2 + 0.5/1.4 = 1.56 °C/W
• Temperature difference: ΔT = 90 - 35 = 55°C
• Cooling capability: P = ΔT / Θja = 55/1.56 = 35W

[Figure: thermal stack from Tj core = 90°C (with Tj cache alongside) through the core, Θjc = 0.5 cm2*°C/W, to the case temperature Tc, then through the heat sink, Θca = 1.2 °C/W, to Ta = 35°C]

35W is the highest allowable power, the one that causes Tj max (90°C in this example)
Lower power → lower temperature
Higher power → higher temperature that violates spec.
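The worked example can be reproduced with a few helper functions. The function names are hypothetical; the numbers are the ones used on this slide (heat sink 1.2 °C/W, core 0.5 cm2*°C/W over a 1.4 cm2 die, Tj max 90°C, ambient 35°C).

```python
def series(*thetas: float) -> float:
    """Thermal resistances in series add up, like electrical resistors."""
    return sum(thetas)


def parallel(*thetas: float) -> float:
    """Thermal resistances in parallel combine as reciprocals."""
    return 1.0 / sum(1.0 / theta for theta in thetas)


def max_power(tj_max_c: float, t_ambient_c: float, theta_ja: float) -> float:
    """Highest steady-state power that keeps the junction at or below Tj max."""
    return (tj_max_c - t_ambient_c) / theta_ja


theta_jc = 0.5 / 1.4               # area-normalized core resistance over the die area
theta_ja = series(1.2, theta_jc)   # heat sink plus core, junction to ambient
power_limit = max_power(90.0, 35.0, theta_ja)

print(f"theta_ja = {theta_ja:.2f} C/W, max power = {power_limit:.1f} W")
```

The `parallel` helper is not needed for this single-path example; it is included only to mirror the series/parallel rules of the steady-state model.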


Real Power Map

[Figure: Pentium-M power map, a 21 x 21 grid of die locations shaded by power density in bands from 0-25 up to 275-300]

• Power is not uniformly distributed
  – Hot spots may determine thermal limitations: Tj = hot spot
• High power density is harder to cool
  – Effectively increases thermal resistance

Wattch, a framework for analyzing and optimizing microprocessor power dissipation at the architecture-le


Simplistic dynamic thermal model

Electrical-thermal duality

V ↔ temp (T)

I ↔ power (P)

R ↔ thermal resistance (Rth)

C ↔ thermal capacitance (Cth)

RC ↔ time constant

KCL

differential eq. I = C · dV/dt + V/R

difference eq. ΔV = I/C · Δt − V/RC · Δt

thermal domain ΔT = P/C · Δt − T/RC · Δt

(T = T_hot – T_amb)

[Figure: thermal RC network between T_hot and T_amb]
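The thermal-domain difference equation, ΔT = P/C · Δt − T/RC · Δt with T measured above ambient, can be stepped forward in time directly. The sketch below does that with forward-Euler steps; the resistance, capacitance, power, and time step are hypothetical values chosen so the time constant is R*C = 15 s.

```python
def simulate_temperature(power_w: float, r_th: float, c_th: float,
                         dt: float, steps: int, t_amb: float = 35.0) -> list:
    """Step the single-node RC thermal model and return the absolute temperature trace."""
    rise = 0.0  # T = T_hot - T_amb, starting at ambient
    trace = []
    for _ in range(steps):
        # Difference equation: heating from power minus leakage through Rth.
        rise += (power_w / c_th) * dt - (rise / (r_th * c_th)) * dt
        trace.append(t_amb + rise)
    return trace


# Hypothetical node: 1.5 C/W, 10 J/C, 35 W step, 0.1 s time step, 300 s total.
trace = simulate_temperature(power_w=35.0, r_th=1.5, c_th=10.0, dt=0.1, steps=3000)

print(f"after 0.1 s: {trace[0]:.2f} C, after 300 s: {trace[-1]:.2f} C")
```

After many time constants the trace settles at ambient plus P * Rth, which is the steady-state model; the thermal capacitance only determines how long it takes to get there.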


Platform Thermal Boundary Conditions

• Noise: fan noise 35 dBA
• Air flow: side or bottom air inlet and outlet vents with minimal grill obstruction
• Ambient temp: 35 ºC
• Skin temperatures:
  – Average: 15 ºC + ambient
  – Peak top: 20 ºC + ambient
  – Peak bottom: 25 ºC + ambient
• Platform features: power effects?

Small form factor dominated by thermodynamic limit


Power Management


Power Management Fundamentals

• Typical CPU usage varies over time
  – Bursts of high utilization and long idle periods
• CPU optimizes power and energy consumption
  – High power when high performance is needed
  – Low power at low activity or idle


Power Management Fundamentals

• CPU power management is used to control:
  – Active power and energy
    • Meet platform component cooling capability
    • Power supply and power delivery
    • Total platform thermals
    • Control fans (active) and passive cooling (throttle)
  – Active energy and battery life
    • Maximize platform operation time


ACPI Control

• Operating system feature

– Industry standard, supported by MS, Linux and others

– Manages energy, power and thermal

– Value and policy defined in BIOS

– Uses user preferences


ACPI Terminology

• Power, thermal and energy management
  – Sleep states – C states
    • C0 – Active state. CPU running and executing instructions
    • C1 – Halt
    • C3…C7 – deeper C states
  – Platform off states
    • S1-S5
  – Energy/Performance states – P states
    • P0 – Highest voltage and frequency
    • P1, P2, … – Lower voltage and frequency states
    • Pn – Lowest voltage point, the most energy efficient point
  – Thermal control – T states


C-states
• Make sure the CPU does nothing very well
• Whenever the OS has no more jobs to schedule
  – CPU enters sleep state
  – Tradeoff between power and latency
    • Deeper sleep → more power savings → longer to wake
• CPU needs really efficient sleep states
  – Turn off every unused transistor
  – Exit really fast
  – Algorithms to figure out
    • Complex platform dependencies


P state overview

• CPU frequency is dependent on operating voltage
• Operating frequencies are called P-states = Performance states
  – P0 is the highest frequency
  – P1, P2, P3… are lower frequencies
  – Pn is the min Vcc point = energy efficient point
• DVFS
  – Power = Cdyn * Vcc^2 * F
    • F = K * Vcc → Power ~ Vcc^3
  – Pn is the most energy efficient point
    • Execution time increases as frequency decreases (~1/F)
    • Power decreases with Vcc^3 → E ~ Vcc^2

[Figure: frequency versus Vcc, with operating points Pn, P2, P1, and P0 at increasing voltage and frequency]
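The step from Power ~ Vcc^3 to E ~ Vcc^2 can be checked numerically: energy for a fixed task is power times execution time, and execution time grows as 1/F. The sketch below assumes the simplified F = K * Vcc model (no leakage, no minimum-voltage floor) and uses three hypothetical P-state frequencies.

```python
def relative_energy(freq_ghz: float, p0_freq_ghz: float) -> float:
    """Energy per fixed task relative to P0 under the F = K * Vcc model."""
    ratio = freq_ghz / p0_freq_ghz
    power = ratio ** 3   # Power ~ Vcc^3, and Vcc tracks frequency
    time = 1.0 / ratio   # the same work takes longer at lower frequency
    return power * time  # works out to ratio^2


# Hypothetical P-state frequencies in GHz, highest (P0) first.
energies = {freq: relative_energy(freq, 3.0) for freq in (3.0, 2.8, 2.6)}

for freq, energy in energies.items():
    print(f"{freq:.1f} GHz: energy per task x{energy:.3f}")
```

In this idealized model every step down saves energy. In a real part the voltage cannot drop below a minimum, which is why the deck identifies Pn, the minimum-Vcc point, as the most energy efficient state.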


Operating system control - ACPI
• CPU exposes the two extreme frequency points
• BIOS builds a P-state table read by the OSPM
  – OSPM = Operating System Power Management kernel (MS)
• OSPM tracks CPU activity via C-state residency
  – Heuristics to move P state up or down
  – All OS algorithms on the market are biased towards performance
  – User interface to define policy
  – Controls P-state via WRMSR command
• An axiom: the CPU needs to honor the OS request
  – Unless emergency

P-state Command Frequency

P0 0x3f 3.0GHz

P1 0x3e 2.8GHz

P2 0x3b 2.6GHz
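A toy version of such a heuristic is sketched below. It is not the algorithm of any real OSPM: the thresholds, the function name, and the step-down/jump-up policy are assumptions made for illustration, and the sketch only selects a table entry rather than issuing any WRMSR. The table mirrors the three example rows above.

```python
# (name, command value, frequency in GHz), as in the example table.
P_STATES = [("P0", 0x3F, 3.0), ("P1", 0x3E, 2.8), ("P2", 0x3B, 2.6)]


def next_p_state(current_index: int, c0_residency: float,
                 up_threshold: float = 0.8, down_threshold: float = 0.3) -> int:
    """Pick the next P-state index from the fraction of time spent in C0.

    Thresholds are hypothetical. The policy is biased towards performance:
    high utilization jumps straight to P0, low utilization steps down one state.
    """
    if c0_residency > up_threshold:
        return 0
    if c0_residency < down_threshold:
        return min(current_index + 1, len(P_STATES) - 1)
    return current_index


index = 0
history = []
for residency in (0.9, 0.2, 0.1, 0.1, 0.5, 0.95):
    index = next_p_state(index, residency)
    history.append(P_STATES[index][0])

print(history)
```

The asymmetry (fast up, slow down) is one simple way to express the performance bias mentioned above.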


Putting it all together – Demand Based control

• CPU running at max power and frequency

• Periodically enters C1

[Figure: power (W, 0 to 20) versus time, with the CPU in C0 at P0 and periodic drops to C1]


Putting it all together – Demand Based control

• Going into idle period
  – Gradually enters deeper C states
  – Controlled by OS

[Figure: power (W, 0 to 20) versus time, stepping down from C0 at P0 through C1, C2, C3, and C4]


Putting it all together – Demand Based control

• Tracking CPU utilization history
  – OS identifies low activity
  – Switches CPU to lower P state

[Figure: power (W, 0 to 20) versus time: C0 at P0, then C1 through C4, then activity resumes in C0 at P1]


Putting it all together – Demand Based control

• CPU enters Idle state again

[Figure: power (W, 0 to 20) versus time: C0 at P0, C1 through C4, C0 at P1, then C2 through C4 again]


Putting it all together – Demand Based control

• Further lowering the P state

• DVD play runs at lowest P state

[Figure: power (W, 0 to 20) versus time: C0 at P0, C1 through C4, C0 at P1, C2 through C4, then C0 at P2]


Putting it all together - Summary

• CPU power consumption is well managed

– Intel® Core 2 max power equals 35W

– Average power as low as 1W at typical workload


ACPI Thermal Control

• Platform defines thermal zones
  – Manages active policy: fans
  – Passive policy: throttling
• Uses the temperature reading
  – DVS or clock throttling used for power control
    • Implements controller algorithm
  – Fan on/off and speed used for active cooling
    • Implements PD controller
  – Thermal equation
• User policy defines preference
  – Bias to more performance, battery life, quiet system

[Figure: thermometer scale from 25 to 95 with the current reading _TMP = 60 and the trip points _CRT, _PSV, _AC0, and _AC1 marked]
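A minimal sketch of how a thermal zone's trip points might map a temperature reading to cooling actions. The object names (_TMP, _AC0, _AC1, _PSV, _CRT) are the ACPI thermal-zone objects named on this slide; the trip temperatures and the action labels are hypothetical, since the actual values are platform-defined.

```python
# Hypothetical trip temperatures in degrees C; real values come from the platform.
TRIP_POINTS = {"_AC1": 45.0, "_AC0": 55.0, "_PSV": 75.0, "_CRT": 95.0}


def thermal_actions(tmp_c: float) -> list:
    """Return the cooling actions for a _TMP reading (illustrative policy only)."""
    if tmp_c >= TRIP_POINTS["_CRT"]:
        return ["critical_shutdown"]
    actions = []
    # Active cooling: _AC0 is the higher trip point and selects the faster fan speed.
    if tmp_c >= TRIP_POINTS["_AC0"]:
        actions.append("fan_high")
    elif tmp_c >= TRIP_POINTS["_AC1"]:
        actions.append("fan_low")
    # Passive cooling: throttle the CPU above the passive trip point.
    if tmp_c >= TRIP_POINTS["_PSV"]:
        actions.append("passive_throttle")
    return actions


for reading in (30.0, 50.0, 60.0, 80.0, 96.0):
    print(reading, thermal_actions(reading))
```

Placing the passive trip point above or below the active ones is how a platform expresses the user-policy bias towards performance, battery life, or quiet operation.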


Formal Feedback Control
• Regulatory control problem: hold value to a specified setpoint
  – Example: temperature
  – Proved that a PID controller will not allow temperature to exceed setpoint by more than 0.02°
    • Max power dissipation, thermal dynamics, sampling rate → max overshoot
    • This precision is excessive but illustrates the value of formal feedback control theory
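As a rough illustration of setpoint regulation, the sketch below closes a loop around a single-node RC thermal model. It uses a PI controller (no derivative term) rather than the full PID referred to above, and every number in it (gains, thermal resistance and capacitance, power limit, setpoint) is an assumption chosen for the example, so it says nothing about the 0.02° bound.

```python
def run_pi_thermal_control(setpoint_c: float = 80.0, t_amb_c: float = 35.0,
                           r_th: float = 1.5, c_th: float = 10.0,
                           p_max_w: float = 50.0, kp: float = 20.0,
                           ki: float = 2.0, dt: float = 0.1,
                           steps: int = 3000) -> list:
    """Simulate a PI controller that sets CPU power to hold a temperature setpoint."""
    temp = t_amb_c
    integral = 0.0
    trace = []
    for _ in range(steps):
        error = setpoint_c - temp
        unclamped = kp * error + ki * integral
        power = min(max(unclamped, 0.0), p_max_w)  # power budget limits
        if unclamped == power:
            # Anti-windup: accumulate error only while the output is not saturated.
            integral += error * dt
        # Plant: forward-Euler step of the RC thermal model.
        temp += (power / c_th - (temp - t_amb_c) / (r_th * c_th)) * dt
        trace.append(temp)
    return trace


trace = run_pi_thermal_control()
print(f"peak {max(trace):.2f} C, final {trace[-1]:.2f} C")
```

With these assumed gains the loop runs at the power limit until the temperature nears the setpoint, then backs off and settles at the setpoint; different gains or a slower sampling rate would change how closely it holds.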


Turbo Basics


Frequency headroom exists


Headroom exists

• Headroom exists in applications

– Less stressful workloads

– Fewer threads than cores (e.g. single thread on multi core)

• Headroom exists in system

– Better cooling, lower ambient temperature

– Normal distribution of parameters

– Concurrency of activities (GFX vs. IA, CPU vs. rest of platform)

• Headroom exists over time

– Heat capacity of cooling allows bursts of high power

– Benefits interactive work



Thermal Design Power (TDP)

• Power Virus – synthetic high power
• Most useful applications consume lower power
• Worst case design drives high system cost
• Most systems designed for real applications
• Highly integrated parts further increase dynamic range

Source: Bob Jackson

[Figure: “Power Distribution of Common Application”: application ratio (0% to 100% of the power virus) for applications sorted by power, with groups for simple multi-media, office apps & benchmarks, games, and the power virus (worst case design). Typical app ratio is ~80% of virus.]

TDP used to indicate worst case application AND cooling requirement


Living in the power wall
• P1 is the guaranteed frequency, defined by the “power wall”
  – Can always meet P1 at any workload condition
    • All cores + GPU at max load
    • Worst case part in worst case ambient
  – Going down and “hiding” the power wall
• Pn is the max energy efficient state
  – Driven to this point by OS to maximize battery life
  – Frequency below Pn is useful for thermal control
• OS controls Pn-P1 for performance
• If more performance is needed, request P0
  – Control handed to H/W
  – Only P1 is guaranteed
  – Usually will result in higher performance
  – Fully managed by CPU H/W

[Figure: frequency ladder with P0 (“Turbo”, H/W control) at the top, P1 down to Pn as the OS-visible states under OS control, and T-state & throttle below Pn]

Means to achieve ST performance in thermal constraints


Intel® Enhanced Speed Step Technology

[Figure: P-state power versus time, from C6 to C0 P0. Actual instantaneous power (brick and battery!) is bounded by a hard limit (max Icc). During “dynamic” turbo, power exceeds “TDP” (P > TDP) while the 5-second and 20-second exponential averages rise, after which power settles at the sustained level.]
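The idea of allowing power above TDP while an exponential average stays below it can be sketched as follows. This is a simplified illustration, not the actual turbo algorithm: the power levels, the 20-second time constant, the starting point after a long idle, and the on/off decision rule are all assumptions.

```python
import math


def turbo_trace(tdp_w: float = 35.0, turbo_w: float = 45.0, idle_w: float = 5.0,
                tau_s: float = 20.0, dt_s: float = 1.0, steps: int = 120) -> list:
    """Per-step power when turbo is allowed only while the running average is below TDP."""
    alpha = 1.0 - math.exp(-dt_s / tau_s)  # exponential-average weight per step
    average = idle_w                       # assume the part has been idle for a long time
    powers = []
    for _ in range(steps):
        power = turbo_w if average < tdp_w else tdp_w
        average += alpha * (power - average)
        powers.append(power)
    return powers


powers = turbo_trace()
burst_seconds = powers.count(45.0)
print(f"turbo burst lasts about {burst_seconds} s, then power holds at {powers[-1]} W")
```

The burst length depends on how cool the average was when the burst started, which matches the deck's point that the heat capacity of the cooling solution allows bursts of high power after idle periods.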


Turbo operation modes

• “Intel Dynamic Acceleration” (IDA): if 1 of 2 cores sleeps (<C-3), turbo the other up. Also works on 2 of 4 cores on QC parts.
• Work Load – Partial Core(s) and Multi thread: like IDA above, but increase frequency as long as Core Power < Core TDP Budget.
• Package Turbo: shares budget between CPU cores and other blocks like GPU.
• Dynamic – Full: makes use of platform, environmental conditions, and parts and platform variance. An external driver / SW-based application uses CPU capabilities to report parameters and control power.

(Row labels in the original figure: Intel® Core 2, i7, Platform OEMs)

Operate a symmetric architecture asymmetrically to maximize goodness

Meant to provide single thread performance in power constraints


Power constraint performance study