1

EE241 - Spring 2005

Advanced Digital Integrated Circuits

Lecture 20:

Thermal design

Guest Lecturer: Prof. Mircea Stan

ECE Dept., University of Virginia

2

Thermal Design

Why should you care about thermals?

What do we mean by thermals?

How do you model thermals?

What can you do about thermals?

Temperature-aware circuit design

Thermal sensors

References:

- Intel® Technology Journal: http://developer.intel.com/technology/itj/

- IBM Journal of Research and Development: http://www.research.ibm.com/journal/rd/

- IEEE Transactions on Components and Packaging Technologies

- IEEE Transactions on VLSI Systems

- IEEE Journal on Solid-State Circuits

3

Why should you care about thermals?

Temperature affects:

Circuit performance

Circuit power (especially leakage)

System reliability

IC and system packaging cost

“Environment”

4

Circuit Performance vs. Temperature

Temperature ↑ => Performance?

Source: E Long, WR Daasch, R Madge, B Benware, “Detection of Temperature Sensitive Defects Using ZTC”

VLSI Test Symposium, 2004

Temperature ↑ => Transistor threshold ↓ and carrier mobility ↓

$I_{DS} = \mu C_{ox} \frac{W}{2L} (V_{GS} - V_{Th})^{\alpha}$

5

Leakage vs. Temperature

$I_{ds} = \mu \frac{W}{L} \left(\frac{kT}{q}\right)^{2} e^{\frac{V_g - V_{Th}}{m\,kT/q}} \left(1 - e^{-\frac{V_{ds}}{kT/q}}\right)$

[Taur, Ning] EECS241 Lecture 3

[Plot: log I_DS [log A] (-9 to -3) vs. V_GS [V] (0 to 1.2)]

Subthreshold slope: S ≥ (kT/q) · ln 10

k = 1.38 × 10^-23 J/K

q = 1.6 × 10^-19 C

kT/q = 25.9 mV at 27°C (300 K)

= 23.5 mV at 0°C (273 K)

= 32 mV at 100°C (373 K)

S = (kT/q) · ln 10 · (1 + Cd/Ci)
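The thermal-voltage values above can be reproduced with a few lines of Python. This is a minimal sketch using the rounded constants quoted on the slide; the function names and the Cd/Ci argument are illustrative choices, not part of the lecture.

```python
import math

K_BOLTZMANN = 1.38e-23  # J/K, rounded value quoted on the slide
Q_ELECTRON = 1.6e-19    # C, rounded value quoted on the slide


def thermal_voltage(temp_kelvin):
    """Return kT/q in volts."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def subthreshold_slope(temp_kelvin, cd_over_ci=0.0):
    """Return S = (kT/q) * ln(10) * (1 + Cd/Ci) in volts per decade."""
    return thermal_voltage(temp_kelvin) * math.log(10) * (1.0 + cd_over_ci)


for temp_c in (0, 27, 100):
    temp_k = temp_c + 273  # same Celsius-to-Kelvin offset as the slide
    vt_mv = thermal_voltage(temp_k) * 1e3
    s_mv = subthreshold_slope(temp_k) * 1e3
    print(f"{temp_c:3d} C: kT/q = {vt_mv:.1f} mV, ideal S = {s_mv:.1f} mV/decade")
```

With Cd/Ci = 0 the function returns the ideal lower bound on S; any positive Cd/Ci makes the slope worse (larger).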

6

Leakage Power

Fraction of leakage power increasing:

exponentially with each generation

exponentially dependent on temperature

Increasing ratio for new technology nodes

Source: Sankaranarayanan et al, University of Virginia

[Plot: static power / dynamic power (percentage, 0 to 70) vs. temperature (298 K to 373 K) for the 180nm, 130nm, 100nm, 90nm, 80nm and 70nm nodes]

7

Reliability

The Arrhenius Equation: MTF = A · exp(Ea/(k · T))

MTF: mean time to failure at T

A: empirical constant

Ea: activation energy

k: Boltzmann’s constant

T: absolute temperature
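To get a feel for how strongly the Arrhenius relation depends on temperature, the sketch below compares the mean time to failure at two temperatures. The activation energy of 0.7 eV and the two temperatures are assumed example values, not numbers from the lecture; the constant A cancels in the ratio.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann's constant in eV/K


def mean_time_to_failure(a_const, ea_ev, temp_kelvin):
    """MTF = A * exp(Ea / (k * T)), with T as absolute temperature."""
    return a_const * math.exp(ea_ev / (K_BOLTZMANN_EV * temp_kelvin))


def mtf_ratio(ea_ev, temp_cool_k, temp_hot_k):
    """Return MTF(cool) / MTF(hot); the empirical constant A cancels."""
    return math.exp(ea_ev / K_BOLTZMANN_EV * (1.0 / temp_cool_k - 1.0 / temp_hot_k))


# Assumed example: Ea = 0.7 eV, 85 C versus 110 C junction temperature.
ratio = mtf_ratio(ea_ev=0.7, temp_cool_k=85 + 273.15, temp_hot_k=110 + 273.15)
print(f"Running 25 C cooler extends MTF by a factor of about {ratio:.1f}")
```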

Failure mechanisms:

Die metallization (Corrosion, Electromigration, Contact spiking)

Oxide (charge trapping, oxide breakdown, hot electrons)

Device (ionic contamination, second breakdown, surface-charge)

Die attach (fracture, thermal breakdown, adhesion fatigue)

Interconnect (wirebond failure, flip-chip joint failure)

Package (cracking, whisker and dendritic growth, lid seal failure)

8

System Packaging Cost

Today…

Grid computing: power plants co-located near compute farms

IBM S/390:

refrigeration

Source: R. R. Schmidt, B. D. Notohardjono “High-end server low temperature cooling”

IBM Journal of R&D

9

IC Packaging Cost

IBM S/390 processor subassembly: complex!

C4: Controlled Collapse Chip Connection (flip-chip)

Source: R. R. Schmidt, B. D. Notohardjono “High-end server low temperature cooling”

IBM Journal of R&D

10

Desktop processor, simpler, but still…

Pentium 4, Itanium

Source: Intel web site

11

“Environment”

Environmental Protection Agency (EPA): computers account for 10% of commercial electricity consumption

This includes peripherals, possibly also manufacturing

A DOE report suggested this percentage is much lower

No consensus, but it’s still a lot

Equivalent power (with only 30% efficiency) needed for air conditioning (AC)

CFCs used for refrigeration

Lap burn

Fan noise

12

Ultimate Effect: Thermal Runaway

Temperature ↑ => Leakage power ↑ => Temperature ↑ …

“Loop gain” > 1: trouble!

Source: Tom’s Hardware Guide, http://www6.tomshardware.com/cpu/01q3/010917/heatvideo-01.html
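The feedback loop can be illustrated with a small fixed-point iteration. This is only a sketch under stated assumptions: the exponential leakage model, its 30 C scale, the power values and the thermal resistances are all hypothetical, chosen so that one case settles and the other runs away.

```python
import math


def leakage_power(temp_c, p_leak_ref, temp_ref_c=25.0, temp_scale_c=30.0):
    """Assumed model: P_leak = P_ref * exp((T - T_ref) / T_scale)."""
    return p_leak_ref * math.exp((temp_c - temp_ref_c) / temp_scale_c)


def settle_temperature(p_dynamic, p_leak_ref, r_th, t_amb_c=45.0,
                       t_limit_c=150.0, max_iter=1000, tol=1e-6):
    """Iterate T = T_amb + Rth * (P_dyn + P_leak(T)).

    Returns (temperature, converged). converged is False when the
    temperature passes t_limit_c, which is treated as thermal runaway.
    """
    temp = t_amb_c
    for _ in range(max_iter):
        new_temp = t_amb_c + r_th * (p_dynamic + leakage_power(temp, p_leak_ref))
        if new_temp > t_limit_c:
            return new_temp, False
        if abs(new_temp - temp) < tol:
            return new_temp, True
        temp = new_temp
    return temp, False


for r_th in (0.5, 1.0):  # K/W, hypothetical package options
    temp, ok = settle_temperature(p_dynamic=40.0, p_leak_ref=5.0, r_th=r_th)
    status = "settles" if ok else "runs away"
    print(f"Rth = {r_th} K/W: {status} (last T = {temp:.1f} C)")
```

In this model the "loop gain" is Rth · dP_leak/dT. The iteration settles only where that product stays below 1, which is why the higher-Rth package diverges.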

13

Thermal Design

Why should you care about thermals?

What do we mean by thermals?

How do you model thermals?

What can you do about thermals?

Temperature-aware circuit design

Thermal sensors

14

What do we mean by thermals?

Anything that has to do with heat/temperature

Heat is a form of energy transfer

Temperature measures the average thermal energy of the particles; temperature differences determine the direction of heat flow

Source: http://www.iun.edu/~cpanhd/C101webnotes/matter-and-energy/specificheat.html

15

Heat mechanisms

Heat Conduction: phonons, vibrations

Heat Convection: fluid molecules movement

Heat Radiation: photons, EM waves

Phase change: boiling, sublimation, condensation, etc.

Heat storage: specific heat

Refrigeration: move heat “backwards”

Many other mechanisms…

16

Conduction

“Similar” to electrical conduction (e.g. metals are good conductors)

Heat flow from high temperature to low temperature

Microscopic (vibration, adjacent molecules, electron transport)

In a material: typically in solids (in fluids the distance between molecules is larger)

Typical example: thermal “slug”, spreader, heatsink

Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001

17

Convection

Macroscopic (bulk transport, mix of hot and cold, energy storage)

Need material (typically in fluids, liquid, gas)

Natural vs. forced (air or liquid)

Typical example: heatsink (fan), liquid cooling

Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001

18

Simplistic Thermal Model

Most thermal transfers: R = k/A

Power density matters!

Ohm’s law for thermals (steady-state)

ΔV = I · R -> ΔT = P · R

T_hot = P · Rth + T_amb

Ways to reduce T_hot:

- reduce P (power-aware)

- reduce Rth (packaging)

- reduce T_amb (move to Alaska?)

- maybe also take advantage of transients (Cth)
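The steady-state relation T_hot = P · Rth + T_amb makes it easy to compare the first three knobs listed above. The sketch below uses hypothetical numbers (50 W, 0.8 K/W, 45 C ambient) purely for illustration.

```python
def hot_temperature(power_w, r_th_k_per_w, t_amb_c):
    """Steady-state T_hot = P * Rth + T_amb (temperatures in deg C)."""
    return power_w * r_th_k_per_w + t_amb_c


# Hypothetical baseline: 50 W, 0.8 K/W, 45 C ambient.
baseline = hot_temperature(power_w=50.0, r_th_k_per_w=0.8, t_amb_c=45.0)
options = {
    "baseline": baseline,
    "reduce P by 20%": hot_temperature(40.0, 0.8, 45.0),
    "reduce Rth by 20%": hot_temperature(50.0, 0.64, 45.0),
    "reduce T_amb by 10 C": hot_temperature(50.0, 0.8, 35.0),
}
for name, t_hot in options.items():
    print(f"{name:22s} T_hot = {t_hot:.1f} C")
```

Because P and Rth enter as a product, a 20% cut in either one buys the same temperature reduction in this simple model.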

19

Simplistic Dynamic Model

Electrical-thermal duality

V ≅ temp (T)

I ≅ power (P)

R ≅ thermal resistance (Rth)

C ≅ thermal capacitance (Cth)

RC ≅ time constant

KCL

differential eq. I = C · dV/dt + V/R

difference eq. ΔV = I/C · Δt − V/(RC) · Δt

thermal domain ΔT = P/C · Δt − T/(RC) · Δt

(T = T_hot – T_amb)

One can compute stepwise changes in temperature for any granularity at which one can get P, T, R, C
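A stepwise computation of this kind can be sketched as a forward-Euler update of the single-RC model. Solving the KCL equation I = C · dV/dt + V/R for dV/dt and mapping it to the thermal domain gives dT/dt = P/C − T/(RC), which is what the code steps. The Rth, Cth, time step and power trace are hypothetical values.

```python
def step_temperature(t_rise, power_w, r_th, c_th, dt):
    """One forward-Euler step of C*dT/dt = P - T/R, with T = T_hot - T_amb."""
    return t_rise + (power_w / c_th - t_rise / (r_th * c_th)) * dt


def simulate(power_trace_w, r_th, c_th, dt, t_amb_c=45.0):
    """Return the absolute temperature after each step of the power trace."""
    t_rise = 0.0
    temps = []
    for power_w in power_trace_w:
        t_rise = step_temperature(t_rise, power_w, r_th, c_th, dt)
        temps.append(t_amb_c + t_rise)
    return temps


# Hypothetical values: Rth = 0.8 K/W, Cth = 100 J/K (time constant 80 s).
trace = [50.0] * 8000  # constant 50 W for 800 s at dt = 0.1 s
temps = simulate(trace, r_th=0.8, c_th=100.0, dt=0.1)
print(f"after 80 s: {temps[799]:.1f} C, after 800 s: {temps[-1]:.1f} C")
```

The time step must be small compared to the RC time constant for the forward-Euler update to stay accurate. After many time constants the result approaches the steady-state value P · Rth + T_amb.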

20

IC with die, package, heatsink

R = ΔT/Q

R = ΔV/I

Rja = Rjc + Rcs + Rsa = (Tj - Ta)/Q

Rsa = ((Tj - Ta)/Q) - Rjc - Rcs
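Rearranging Rja = Rjc + Rcs + Rsa = (Tj - Ta)/Q for the sink-to-ambient term gives the heatsink budget for a target junction temperature. The sketch below works one example; all numeric values are hypothetical.

```python
def required_sink_to_ambient(t_junction_max_c, t_amb_c, power_w, r_jc, r_cs):
    """Return the largest allowed Rsa (K/W) for a given junction limit.

    Rja = (Tj - Ta) / Q is the total junction-to-ambient budget; the
    junction-to-case (Rjc) and case-to-sink (Rcs) terms are subtracted.
    """
    r_ja = (t_junction_max_c - t_amb_c) / power_w
    return r_ja - r_jc - r_cs


# Hypothetical part: Tj,max = 100 C, Ta = 40 C, Q = 60 W,
# Rjc = 0.3 K/W, Rcs = 0.1 K/W.
r_sa = required_sink_to_ambient(100.0, 40.0, 60.0, r_jc=0.3, r_cs=0.1)
print(f"Heatsink must achieve Rsa <= {r_sa:.2f} K/W")
```

A negative result would mean that no passive heatsink can meet the target with that package and ambient.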

21

Hot spots in Power4

Temperature “landscape”: space and time

How to estimate early in the design cycle?

22

Trends in Power Density

[Plot: power density in Watts/cm2 (log scale, 1 to 1000) vs. process generation (1.5µ, 1µ, 0.7µ, 0.5µ, 0.35µ, 0.25µ, 0.18µ, 0.13µ, 0.1µ, 0.07µ); processors: i386, i486, Pentium®, Pentium® Pro, Pentium® II, Pentium® III, Pentium® 4; reference levels: hot plate, nuclear reactor, rocket nozzle]

Source: “New Microarchitecture Challenges in the Coming Generations of CMOS Process Technologies” – Fred Pollack, Intel Corp.

23

Thermals for low-power ICs

Different: little self-generated heat

But…

Cheaper packaging (higher Rth): challenge

More extreme ambient (freezing to hot)

Temporal thermal effects more important than spatial

24

Thermal Design

Why should you care about thermals?

What do we mean by thermals?

How do you model thermals?

What can you do about thermals?

Temperature-aware circuit design

Thermal sensors

25

How do you model thermals?

Source: “Electro-thermal circuit simulation using simulator coupling”, S. Wunsche, C. Clauss, P. Schwarz, F. Winkler, IEEE Transactions on VLSI Systems, Sep 1997

26

Why need to model thermals?

Power metrics are not acceptable proxy

Chip-wide average will not capture hot spots

Localized average will not capture lateral coupling

Different units have different power densities

27

Power electronics: long time ago!

“Integrated-circuit thermal modeling”, R. Castello, P. Antognetti, IEEE Journal of Solid-State Circuits, Jun 1978

28

Model (package)

“Vertical” heat flow

29

Model (die)

• Block granularity (architecture)

• Grid (circuits)

• Also lateral flow

30

Spatial behavior - Hot Spots

Source: W. Huang, S. Ghosh, K. Sankaranarayanan, K. Skadron, and M. R. Stan. “Compact Thermal Modeling for Temperature-Aware Design.”

41st ACM/IEEE Design Automation Conference (DAC), June 2004

31

Time-Varying Behavior – Hot Spots

mesa

32

Tool validation: on-chip measurements

M. R. Stan, K. Skadron, M. Barcella, W. Huang, K. Sankaranarayanan, and S. Velusamy. “HotSpot: A Dynamic Compact Thermal Model at the Processor-Architecture Level.”

Microelectronics Journal: Circuits and Systems, Dec. 2003

33

Dynamic validation: measurements

Micred test chip, transient vs. HotSpot

34

Thermal Design

Why should you care about thermals?

What do we mean by thermals?

How do you model thermals?

What can you do about thermals?

Temperature-aware circuit design

Thermal sensors

35

What can you do about thermals?

Better estimates of performance, power, reliability

Optimize at design time (e.g. package co-design)

Adapt at run-time

36

The Role of a Thermal Model

Helps close the loop for accurate design estimations:

static or dynamic

Thermal Model

Power Model

Performance Model

Reliability Model

37

Self-consistent leakage

38

Design flow: still work in progress!

39

Package co-design

For 200 traces (TPC-C, SPEC, Microsoft)

Thermal design point can be reduced to 75% of true “max power”

with minimal performance loss

Aggressive clock gating

Application variations

Underutilized resources

Source: Intel

40

Thermal Performance Graph

“How to select a heat sink”, Seri Lee, Aavid Thermal Technologies, http://www.electronics-cooling.com/Resources/EC_Articles/JUN95/jun95_01.htm

41

Adapt at run-time

[Plot: temperature vs. time, DTM disabled vs. DTM/response engaged; marked levels: designed for cooling capacity w/out DTM, DTM trigger level, designed for cooling capacity w/ DTM; annotation: system cost savings]

Source: David Brooks 2002
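The effect of a DTM trigger level can be sketched with the single-RC thermal model and a simple throttling response. Everything here is an assumption for illustration: the response (halving power while at or above the trigger), the trigger temperature, and the Rth/Cth values are hypothetical and do not describe a specific DTM scheme from the lecture.

```python
def simulate_dtm(power_w, r_th, c_th, dt, steps, t_amb_c=45.0,
                 trigger_c=None, throttle_factor=0.5):
    """Forward-Euler RC thermal model with an optional throttling response.

    When trigger_c is set and the current temperature is at or above it,
    the power for that step is scaled by throttle_factor.
    """
    t_rise = 0.0
    temps = []
    for _ in range(steps):
        temp_c = t_amb_c + t_rise
        throttled = trigger_c is not None and temp_c >= trigger_c
        p = power_w * throttle_factor if throttled else power_w
        t_rise += (p / c_th - t_rise / (r_th * c_th)) * dt
        temps.append(t_amb_c + t_rise)
    return temps


# Hypothetical system: 50 W, Rth = 0.8 K/W, Cth = 100 J/K, 800 s run.
no_dtm = simulate_dtm(50.0, 0.8, 100.0, 0.1, 8000)
with_dtm = simulate_dtm(50.0, 0.8, 100.0, 0.1, 8000, trigger_c=75.0)
print(f"peak without DTM: {max(no_dtm):.1f} C")
print(f"peak with DTM:    {max(with_dtm):.1f} C")
```

In this sketch the package only has to handle the trigger temperature rather than the worst-case peak, which is the cost-saving argument for DTM.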

42

Thermal Design

Why should you care about thermals?

What do we mean by thermals?

How do you model thermals?

What can you do about thermals?

Temperature-aware circuit design

Thermal sensors

43

Temperature-Aware circuit design

Power: first-order design constraint

- max power consumption: limits power delivery

- sustained power dissipation: limits thermal design/packaging

- average active power and idle power consumption: limit battery life, etc.

- fallacy: instantaneous power ≠ temperature

Power-aware design:

- maximize performance for given power

Low-power design:

- minimize power for required performance

Temperature-aware design:

- performance, power, reliability: function of T

- T function of power density, ambient T

- maximize performance for given thermal envelope

- related to Power Density

44

Performance and Leakage

Temperature (Berkeley PTM 70nm CMOS):

Transistor threshold and mobility

Subthreshold leakage, gate leakage

Ion, Ioff, delay

45

Temperature-aware circuits

Robustness constraint: sets Ion/Ioff ratio

Robustness and reliability: Ion/Igate ratio

70nm CMOS, 1.2V, 110°C

Ion/Ioff ~ 1000

Ion/Igate ~ 10000

Idea: keep ratio constant with T

Trade leakage for performance

Ref: Ghoshal et al. “Refrigeration Technologies…”, ISSCC 2000; Garrett et al. “T3…”, ISCAS 2001

46

Adaptive Ion/Ix control

Ion/Ioff = B/A = constant, through adaptive body biasing (ABB)

Temperature-aware circuits (TAC) patent (2004)

47

Resulting voltages

Wide range: -0.4V < Vbb < 0.4V; 1.2V < Vdd < 1.3V

Almost linear

Robust to inter-die parameter variations

Needs trimming for setpoint

Margin for intra-die parameter variations

Active cooling or natural thermal landscape

48

Resulting performance

25% extra performance (110°C to 0°C) – only NMOS

13% from low temperature alone

49

Temperature-Aware SRAM

[Figure: SRAM array with pre-charge, decoders, bitlines (data width of entries), wordlines (number of entries) and sense amps; cell detail with the two Bit lines, cell access transistors (N1) and number of ports]

Worst-case bitline leakage limits performance

50

SRAM Read time

Same circuit, different application

6T SRAM memory: “reverse application” (heating)

70nm process (200mV threshold)

Zero biasing at low temperature

51

SRAM bit-line sensing

Differential sensing (100mV bitline difference)

128 cells per bit line

Faster read even if higher RBB, smaller Ion

52

Electro-thermal simulations

“A rational formulation of thermal circuit models for electrothermal simulation. I. Finite element method [power electronic systems]”, Jia Tzer Hsu, L. Vu-Quoc, IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications

53

Also need electro-thermal models

“Electro-thermal circuit simulation using simulator coupling”, S. Wunsche, C. Clauss, P. Schwarz, F. Winkler, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Sep 1997

54

SOI circuits

“SOI thermal impedance extraction methodology and its significance for circuit simulation”, Wei Jin, Weidong Liu, S.K.H. Fung, P.C.H. Chan, Chenming Hu, IEEE Transactions on Electron Devices, Apr 2001

55

Refrigeration

“conventional” vs. thermo-electric (TEC)

Can get T < T_amb (Rth < 0!)

TEC: Peltier effect (can use for local cooling)

56

TEC electro-thermal model

57

Thermal Design

Why should you care about thermals?

What do we mean by thermals?

How do you model thermals?

What can you do about thermals?

Temperature-aware design

Thermal sensors

58

Sensors needed for run-time

Thermocouples – voltage output

Junction between wires of different materials; voltage at terminals is proportional to Tref – Tjunction

Often used for external measurements

Thermal diodes – voltage output

Biased p-n junction; voltage drop for a known current is temperature-dependent

Biased resistors (thermistors) – voltage output

Voltage drop for a known current is temperature dependent

You can also think of this as varying R

Example: 1 kΩ metal “snake”

BiCMOS, CMOS – voltage or current output

Rely on reference voltage or current generated from a reference band-gap circuit; or simple ring oscillators with no reference

Relative (just need to adapt) vs. Absolute sensors (need actual T)

May need a Reference – typically a Bandgap circuit

59

Typical Sensor Configuration

PTAT – Proportional to Absolute Temperature
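One common way to obtain a PTAT signal is the difference between two junction voltages operated at different current densities, ΔV = (kT/q) · ln(N). The sketch below illustrates that relation only; it is not necessarily the sensor configuration shown on the slide, and the ratio N = 8 is an assumed example value.

```python
import math

K_BOLTZMANN = 1.38e-23  # J/K
Q_ELECTRON = 1.6e-19    # C


def ptat_voltage(temp_kelvin, current_density_ratio):
    """Delta-V between two junctions: (kT/q) * ln(N), in volts."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON * math.log(current_density_ratio)


def temperature_from_ptat(delta_v, current_density_ratio):
    """Invert the PTAT relation to recover absolute temperature in kelvin."""
    return delta_v * Q_ELECTRON / (K_BOLTZMANN * math.log(current_density_ratio))


# Assumed example: current density ratio N = 8.
for temp_k in (273.0, 300.0, 373.0):
    dv_mv = ptat_voltage(temp_k, 8) * 1e3
    print(f"T = {temp_k:.0f} K -> delta-V = {dv_mv:.1f} mV")
```

The output is linear in absolute temperature and depends only on physical constants and the ratio N, which is what makes this kind of signal attractive for absolute sensors.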

60

Absolute Sensor

Delta Vgs Current Reference

Syal, Lee, Ivanov, Altet, Online Testing Workshop, 2001

Generator and Delay Cell

61

Sensors: Problem Issues

Poor control of CMOS transistor parameters

Noisy environment

Cross talk

Ground noise

Power supply noise

These can be reduced by making the sensor larger

This increases power dissipation

But we may want many sensors

62

Calibration

Accuracy vs. Precision

Analogous to mean vs. stdev

Calibration deals with accuracy

The main issue is to reduce inter-die variations in offset

Typically requires per-part testing and configuration

Basic idea: measure offset, store it, then subtract it from dynamic measurements
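The offset-calibration idea can be sketched in a few lines. The readings and the 25 C reference temperature below are made-up example data, and the function names are illustrative.

```python
import statistics


def measure_offset(raw_readings_c, reference_temp_c):
    """Per-part offset: mean sensor reading minus the known reference."""
    return statistics.mean(raw_readings_c) - reference_temp_c


def calibrate(raw_reading_c, stored_offset_c):
    """Subtract the stored offset from a run-time measurement."""
    return raw_reading_c - stored_offset_c


# Made-up test data: one sensor read five times at a known 25 C.
raw = [27.9, 28.1, 28.0, 28.2, 27.8]
offset = measure_offset(raw, reference_temp_c=25.0)
calibrated = [calibrate(r, offset) for r in raw]

print(f"stored offset:       {offset:.2f} C")
print(f"mean after cal.:     {statistics.mean(calibrated):.2f} C (accuracy)")
print(f"stdev before, after: {statistics.stdev(raw):.3f}, "
      f"{statistics.stdev(calibrated):.3f} C (precision)")
```

Subtracting the offset moves the mean onto the reference (accuracy) but leaves the spread of the readings unchanged (precision), matching the mean vs. stdev analogy.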

63

Recap: Thermal Design

Why should you care about thermals?

What do we mean by thermals?

How do you model thermals?

What can you do about thermals?

Temperature-aware circuit design

Thermal sensors

Questions?