EE241 - Spring 2005
Advanced Digital Integrated Circuits
Lecture 20:
Thermal design
Guest Lecturer: Prof. Mircea Stan
ECE Dept., University of Virginia
2
Thermal Design
Why should you care about thermals?
What do we mean by thermals?
How do you model thermals?
What can you do about thermals?
Temperature-aware circuit design
Thermal sensors
References:
- Intel® Technology Journal: http://developer.intel.com/technology/itj/
- IBM Journal of Research and Development: http://www.research.ibm.com/journal/rd/
- IEEE Transactions on Components and Packaging Technologies
- IEEE Transactions on VLSI Systems
- IEEE Journal of Solid-State Circuits
3
Why should you care about thermals?
Temperature affects:
Circuit performance
Circuit power (especially leakage)
System reliability
IC and system packaging cost
“Environment”
4
Circuit Performance vs. Temperature
Temperature ↑ => Performance?
Source: E. Long, W. R. Daasch, R. Madge, B. Benware, “Detection of Temperature Sensitive Defects Using ZTC,” VLSI Test Symposium, 2004
Temperature ↑ => Transistor threshold ↓ and carrier mobility ↓
$$I_{DS} = \frac{W}{2L}\,\mu\,C_{ox}\,(V_{GS} - V_{Th})^{\alpha}$$
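The two temperature effects pull in opposite directions: a lower threshold raises the overdrive (V_GS - V_Th), while lower mobility reduces current. A minimal sketch of that competition using the alpha-power law above follows. The linear threshold drift, the (T/T0)^-1.5 mobility law, and every numeric parameter are assumptions chosen for illustration, not values from the lecture.

```python
# Hypothetical parameters for illustration only; not taken from the lecture.
T0 = 300.0           # reference temperature [K]
VTH0 = 0.30          # threshold voltage at T0 [V]
DVTH_DT = 1e-3       # assumed threshold decrease per kelvin [V/K]
MOBILITY_EXP = 1.5   # assumed mobility model: mu ~ (T/T0)^-1.5
ALPHA = 1.3          # assumed alpha-power-law exponent


def drain_current(vgs: float, temp_k: float) -> float:
    """Alpha-power-law drain current in arbitrary units (W/2L * Cox dropped)."""
    vth = VTH0 - DVTH_DT * (temp_k - T0)          # threshold falls as T rises
    mobility = (temp_k / T0) ** (-MOBILITY_EXP)   # mobility falls as T rises
    overdrive = max(vgs - vth, 0.0)
    return mobility * overdrive ** ALPHA


def relative_current(vgs: float, temp_k: float) -> float:
    """I_DS(temp_k) / I_DS(T0); assumes vgs > VTH0 so the denominator is nonzero."""
    return drain_current(vgs, temp_k) / drain_current(vgs, T0)


for vgs in (0.5, 1.2):
    print(f"VGS = {vgs} V: I(373 K)/I(300 K) = {relative_current(vgs, 373.0):.2f}")
```

With these assumed numbers the current rises with temperature at low V_GS (threshold effect dominates) and falls at high V_GS (mobility effect dominates); the crossover between the two regimes is the zero-temperature-coefficient (ZTC) point named in the cited paper.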
5
Leakage vs. Temperature
$$I_{ds} = \mu_{eff}\,C_{ox}\,\frac{W}{L}\,(m-1)\left(\frac{kT}{q}\right)^{2} e^{q(V_g - V_{Th})/mkT}\left(1 - e^{-qV_{ds}/kT}\right)$$
[Taur, Ning] EECS241 Lecture 3
[Figure: log I_DS [log A] vs. V_GS [V], V_GS from 0 to 1.2 V]
Subthreshold slope S > (kT/q) ln10
k = 1.38×10^-23 J/K
q = 1.6×10^-19 C
kT/q = 25.9 mV at 27°C (300 K)
     = 23.5 mV at 0°C (273 K)
     = 32 mV at 100°C (373 K)
S = (kT/q) ln10 (1 + Cd/Ci)
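The kT/q values above, the ideal subthreshold slope, and the temperature sensitivity of off-current can be checked numerically. A minimal sketch, assuming m = 1 + Cd/Ci = 1.3 and Vth = 0.3 V (hypothetical values), and holding mu, Cox, and Vth fixed with temperature, which understates real leakage growth since Vth also drops as T rises:

```python
import math

K_BOLTZMANN = 1.38e-23  # J/K, value used on the slide
Q_ELECTRON = 1.6e-19    # C, value used on the slide


def thermal_voltage(temp_k: float) -> float:
    """kT/q in volts."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def subthreshold_slope(temp_k: float, m: float = 1.0) -> float:
    """S = (kT/q) ln10 (1 + Cd/Ci) in mV/decade, with m = 1 + Cd/Ci."""
    return 1e3 * thermal_voltage(temp_k) * math.log(10) * m


def off_current_ratio(t_hot: float, t_cold: float, vth: float, m: float = 1.3) -> float:
    """I_off(t_hot) / I_off(t_cold) from the subthreshold expression at Vg = 0 and
    Vds >> kT/q; mu, Cox, (m - 1), W/L cancel in the ratio. Vth is held fixed
    (an assumed simplification)."""
    def i_off(temp_k: float) -> float:
        vt = thermal_voltage(temp_k)
        return vt ** 2 * math.exp(-vth / (m * vt))
    return i_off(t_hot) / i_off(t_cold)


for t in (273.0, 300.0, 373.0):
    print(f"T = {t:.0f} K: kT/q = {1e3 * thermal_voltage(t):.1f} mV, "
          f"ideal S = {subthreshold_slope(t):.1f} mV/dec")
print(f"I_off(373 K) / I_off(300 K) = {off_current_ratio(373.0, 300.0, vth=0.3):.1f}")
```

Even with Vth held constant, going from 300 K to 373 K multiplies the off-current several times over, because temperature appears both in the (kT/q)^2 prefactor and inside the exponent.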
6
Leakage Power
Fraction of leakage power increasing:
exponentially with each generation
exponentially dependent on temperature
Increasing ratio for new technology nodes
Source: Sankaranarayanan et al, University of Virginia
[Figure: static power / dynamic power (percentage, 0 to 70%) vs. temperature (298 to 373 K) for 180nm, 130nm, 100nm, 90nm, 80nm, and 70nm technology nodes]
7
Reliability
The Arrhenius Equation: MTF = A·exp(Ea/(k·T))
MTF: mean time to failure at T
A: empirical constant
Ea: activation energy
k: Boltzmann’s constant
T: absolute temperature
Failure mechanisms:
Die metallization (corrosion, electromigration, contact spiking)
Oxide (charge trapping, oxide breakdown, hot electrons)
Device (ionic contamination, second breakdown, surface-charge)
Die attach (fracture, thermal breakdown, adhesion fatigue)
Interconnect (wirebond failure, flip-chip joint failure)
Package (cracking, whisker and dendritic growth, lid seal failure)
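Because the constant A cancels when comparing two temperatures, the Arrhenius equation is usually applied as an acceleration factor: how much shorter the MTF becomes when the die runs hotter. A minimal sketch, assuming an activation energy of 0.7 eV; Ea differs by failure mechanism, so this number is illustrative only:

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann's constant in eV/K


def mean_time_to_failure(temp_k: float, ea_ev: float, a: float = 1.0) -> float:
    """Arrhenius model MTF = A * exp(Ea / (k*T)); A is an empirical constant."""
    return a * math.exp(ea_ev / (K_BOLTZMANN_EV * temp_k))


def acceleration_factor(t_low_k: float, t_high_k: float, ea_ev: float) -> float:
    """How many times shorter the MTF is at t_high_k than at t_low_k (A cancels)."""
    return mean_time_to_failure(t_low_k, ea_ev) / mean_time_to_failure(t_high_k, ea_ev)


# Ea = 0.7 eV is an assumed, illustrative activation energy.
af = acceleration_factor(358.15, 383.15, ea_ev=0.7)  # 85 C vs. 110 C
print(f"MTF shrinks by about {af:.1f}x for a 25 K rise")
```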
8
System Packaging Cost
Today…
Grid computing: power plants co-located near compute farms
IBM S/390: refrigeration
Source: R. R. Schmidt, B. D. Notohardjono “High-end server low temperature cooling”
IBM Journal of R&D
9
IC Packaging Cost
IBM S/390 processor subassembly: complex!
C4: Controlled Collapse Chip Connection (flip-chip)
Source: R. R. Schmidt, B. D. Notohardjono “High-end server low temperature cooling”
IBM Journal of R&D
10
Desktop processor, simpler, but still…
Pentium 4, Itanium
Source: Intel web site
11
“Environment”
Environmental Protection Agency (EPA): computers account for 10% of commercial electricity consumption
This includes peripherals, possibly also manufacturing
A DOE report suggested this percentage is much lower
No consensus, but it’s still a lot
Equivalent power (with only 30% efficiency) for AC
CFCs used for refrigeration
Lap burn
Fan noise
12
Ultimate Effect: Thermal Runaway
Temperature ↑ => Leakage power ↑ => Temperature ↑ …
“Loop gain” > 1: trouble!
Source: Tom’s Hardware Guide, http://www6.tomshardware.com/cpu/01q3/010917/heatvideo-01.html
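The feedback loop can be made concrete by iterating the steady-state relation T = T_amb + Rth·(P_dyn + P_leak(T)). A minimal sketch, assuming a hypothetical leakage model P_leak(T) = P_leak0·exp((T - T_ref)/20 K) and made-up power and resistance values; the loop gain is Rth·dP_leak/dT at the operating point:

```python
import math


def settle_temperature(r_th, p_dyn, p_leak0, t_amb=45.0, t_ref=45.0,
                       t_efold=20.0, t_limit=200.0, max_iter=1000):
    """Fixed-point iteration of T = T_amb + R_th * (P_dyn + P_leak(T)),
    with the hypothetical leakage model P_leak(T) = p_leak0 * exp((T - t_ref) / t_efold).

    Returns (T, loop_gain) at convergence, or None if T exceeds t_limit
    (treated here as thermal runaway) or the iteration fails to settle."""
    def p_leak(t):
        return p_leak0 * math.exp((t - t_ref) / t_efold)

    t = t_amb
    for _ in range(max_iter):
        t_new = t_amb + r_th * (p_dyn + p_leak(t))
        if t_new > t_limit:
            return None
        if abs(t_new - t) < 1e-6:
            # Loop gain = dT_new/dT = R_th * dP_leak/dT at the operating point.
            return t_new, r_th * p_leak(t_new) / t_efold
        t = t_new
    return None


print(settle_temperature(r_th=0.2, p_dyn=60.0, p_leak0=10.0))  # settles, gain < 1
print(settle_temperature(r_th=0.5, p_dyn=60.0, p_leak0=10.0))  # runaway -> None
```

The same chip and workload settle with a good package (small Rth) but run away with a worse one: a higher Rth raises both the temperature and the loop gain until no stable operating point exists.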
13
Thermal Design
Why should you care about thermals?
What do we mean by thermals?
How do you model thermals?
What can you do about thermals?
Temperature-aware circuit design
Thermal sensors
14
What do we mean by thermals?
Anything that has to do with heat/temperature
Heat is a form of energy transfer
Temperature reflects the average thermal energy of a material's particles; temperature differences determine heat flow
Source: http://www.iun.edu/~cpanhd/C101webnotes/matter-and-energy/specificheat.html
15
Heat mechanisms
Heat Conduction: phonons, vibrations
Heat Convection: movement of fluid molecules
Heat Radiation: photons, EM waves
Phase change: boiling, sublimation, condensation, etc.
Heat storage: specific heat
Refrigeration: move heat “backwards”
Many other mechanisms…
16
Conduction
“Similar” to electrical conduction (e.g. metals are good conductors)
Heat flows from high temperature to low temperature
Microscopic (vibration of adjacent molecules, electron transport)
Within a material: typically in solids (in fluids, the larger distance between molecules limits conduction)
Typical example: thermal “slug”, spreader, heatsink
Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001
17
Convection
Macroscopic (bulk transport, mix of hot and cold, energy storage)
Needs a medium (typically a fluid: liquid or gas)
Natural vs. forced (air or liquid)
Typical example: heatsink (fan), liquid cooling
Source: CRC Press, R. Remsburg Ed. “Thermal Design of Electronic Equipment”, 2001
18
Simplistic Thermal Model
Most thermal transfers: R = k/A (resistance inversely proportional to area A; k depends on material and mechanism)
Power density matters!
Ohm’s law for thermals (steady-state)
ΔV = I · R -> ΔT = P · R
T_hot = P · Rth + T_amb
Ways to reduce T_hot:
- reduce P (power-aware)
- reduce Rth (packaging)
- reduce T_amb (move to Alaska?)
- maybe also take advantage of transients (Cth)
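The steady-state relation and the R = k/A dependence can be written as three one-line functions. A minimal sketch with hypothetical numbers (80 W, 0.5 K/W, 45°C ambient, and a normalized area); it shows why power density, not just total power, sets T_hot:

```python
def t_hot(power_w: float, r_th: float, t_amb: float) -> float:
    """Steady-state 'Ohm's law' for heat: T_hot = P * Rth + T_amb."""
    return power_w * r_th + t_amb


def max_sustained_power(t_max: float, r_th: float, t_amb: float) -> float:
    """Largest steady power that keeps T_hot at or below t_max."""
    return (t_max - t_amb) / r_th


def r_from_area(k: float, area: float) -> float:
    """R = k / A: squeezing the same heat through half the area doubles R."""
    return k / area


# Hypothetical numbers: 80 W through 0.5 K/W into a 45 C ambient.
print(t_hot(80.0, 0.5, 45.0))
# Same 80 W concentrated in half the (normalized) area, with k = 0.5 so R = 0.5 K/W at A = 1:
print(t_hot(80.0, r_from_area(0.5, 0.5), 45.0))
```

Halving the area doubles the temperature rise for the same total power, which is the “power density matters” point in numbers.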
19
Simplistic Dynamic Model
Electrical-thermal duality
V ≅ temp (T)
I ≅ power (P)
R ≅ thermal resistance (Rth)
C ≅ thermal capacitance (Cth)
RC ≅ time constant
KCL
differential eq.: I = C · dV/dt + V/R
difference eq.: ΔV = I/C · Δt − V/RC · Δt
thermal domain: ΔT = P/C · Δt − T/RC · Δt
(T = T_hot – T_amb)
One can compute stepwise changes in temperature at any granularity for which P, T, R, and C are available
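Stepping the thermal-domain difference equation is a forward-Euler integration of P = C·dT/dt + T/R. A minimal sketch, assuming hypothetical values R = 0.5 K/W and C = 2 J/K (time constant 1 s) and a 50 W power step; the decay term carries a minus sign, as it must for the KCL equation:

```python
def simulate_temperature(power_trace, r_th, c_th, dt, t_start=0.0):
    """Step dT = P/C*dt - T/(R*C)*dt, where T is the temperature rise above
    ambient (T_hot - T_amb). Forward Euler: dt should be well below the time
    constant R*C for accuracy (and below 2*R*C for stability)."""
    t = t_start
    history = []
    for p in power_trace:
        t += (p / c_th - t / (r_th * c_th)) * dt
        history.append(t)
    return history


# Hypothetical values: R = 0.5 K/W, C = 2 J/K (time constant 1 s), 50 W step.
trace = [50.0] * 1000  # 10 s sampled every 10 ms
rise = simulate_temperature(trace, r_th=0.5, c_th=2.0, dt=0.01)
print(f"after 1 tau: {rise[99]:.1f} K, after 10 tau: {rise[-1]:.2f} K (P*R = 25 K)")
```

After one time constant the rise is about 63% of its final value, and after ten it has reached the steady-state P·R; any measured or simulated power trace can be fed in place of the constant step.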
20
IC with die, package, heatsink
R = ΔT/Q
R = ΔV/I
Rja = Rjc + Rcs + Rsa = (Tj - Ta)/Q
Rsa = ((Tj - Ta)/Q) - Rjc - Rcs
(j: junction, c: case, s: sink, a: ambient; Q: heat flow)
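Heat sink selection turns the series-resistance equation around: given the allowed junction temperature, the ambient, the dissipated heat, and the package and interface resistances, solve for the largest sink-to-ambient resistance that still works. A minimal sketch with a hypothetical part (Tj,max = 100°C, Ta = 40°C, Q = 60 W, Rjc = 0.3 K/W, Rcs = 0.1 K/W):

```python
def required_rsa(t_j_max, t_a, q, r_jc, r_cs):
    """Maximum sink-to-ambient resistance that keeps the junction at t_j_max:
    Rja = (Tj - Ta)/Q and Rsa = Rja - Rjc - Rcs. A result <= 0 means no heat
    sink can meet the target with the given Rjc and Rcs."""
    r_ja = (t_j_max - t_a) / q
    return r_ja - r_jc - r_cs


# Hypothetical part: Tj,max = 100 C, Ta = 40 C, Q = 60 W, Rjc = 0.3 K/W, Rcs = 0.1 K/W.
print(f"Need a heat sink with Rsa <= {required_rsa(100.0, 40.0, 60.0, 0.3, 0.1):.2f} K/W")
```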
21
Hot spots in Power4
Temperature “landscape”: space and time
How to estimate early in the design cycle?
22
Trends in Power Density
[Figure: power density (Watts/cm², log scale, 1 to 1000) vs. process generation (1.5µ to 0.07µ) for i386, i486, Pentium®, Pentium® Pro, Pentium® II, Pentium® III, and Pentium® 4, with hot plate, nuclear reactor, and rocket nozzle levels for comparison]
Source: Fred Pollack, Intel Corp., “New Microarchitecture Challenges in the Coming Generations of CMOS Process Technologies”
23
Thermals for low-power ICs
Different: little self-generated heat
But…
Cheaper packaging (higher Rth): challenge
More extreme ambient (freezing to hot)
Temporal thermal effects more important than spatial
24
Thermal Design
Why should you care about thermals?
What do we mean by thermals?
How do you model thermals?
What can you do about thermals?
Temperature-aware circuit design
Thermal sensors
25
How do you model thermals?
Source: S. Wunsche, C. Clauss, P. Schwarz, F. Winkler, “Electro-thermal circuit simulation using simulator coupling,” IEEE Transactions on VLSI Systems, Sep. 1997
26
Why do we need to model thermals?
Power metrics are not an acceptable proxy
Chip-wide average will not capture hot spots
Localized average will not capture lateral coupling
Different units have different power densities
27
Power electronics: thermal modeling started long ago!
Source: R. Castello, P. Antognetti, “Integrated-circuit thermal modeling,” IEEE Journal of Solid-State Circuits, Jun. 1978
28
Model (package)
“Vertical” heat flow
29
Model (die)
• Block granularity (architecture)
• Grid (circuits)
• Also lateral flow
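The difference between a purely vertical model and one with lateral flow shows up even in a tiny example. A minimal sketch, not HotSpot itself: a hypothetical 1-D row of blocks, each with a vertical resistance to ambient and a lateral resistance to its neighbours, solved for steady state by Gauss-Seidel iteration. The function name and all values are made up for illustration:

```python
def solve_block_temperatures(powers, r_vertical, r_lateral=None,
                             tol=1e-9, max_iter=10000):
    """Steady-state temperature rise above ambient for a 1-D row of blocks.

    Block i dissipates powers[i]; it sheds heat vertically to ambient through
    r_vertical and exchanges heat with its left/right neighbours through
    r_lateral (None disables lateral flow). Solves, by Gauss-Seidel iteration,
        P_i = T_i / R_v + sum_j (T_i - T_j) / R_l.
    """
    n = len(powers)
    g_v = 1.0 / r_vertical
    g_l = 0.0 if r_lateral is None else 1.0 / r_lateral
    temps = [0.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n]
            new_t = (powers[i] + g_l * sum(temps[j] for j in neighbours)) / (
                g_v + g_l * len(neighbours))
            max_change = max(max_change, abs(new_t - temps[i]))
            temps[i] = new_t
        if max_change < tol:
            break
    return temps


powers = [1.0, 1.0, 10.0, 1.0, 1.0]  # hypothetical block powers [W]; block 2 is hot
isolated = solve_block_temperatures(powers, r_vertical=1.0)
coupled = solve_block_temperatures(powers, r_vertical=1.0, r_lateral=2.0)
print("no lateral flow:  ", [round(t, 2) for t in isolated])
print("with lateral flow:", [round(t, 2) for t in coupled])
```

Lateral coupling lowers the hot spot and warms its neighbours while the total heat leaving vertically still equals the total power, which is why a block-average power map alone misses both effects.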
30
Spatial behavior - Hot Spots
Source: W. Huang, S. Ghosh, K. Sankaranarayanan, K. Skadron, and M. R. Stan. “Compact Thermal Modeling for Temperature-Aware Design.”
41st ACM/IEEE Design Automation Conference (DAC), June 2004
31
Time-Varying Behavior – Hot Spots (mesa)
32
Tool validation: on-chip measurements
M. R. Stan, K. Skadron, M. Barcella, W. Huang, K. Sankaranarayanan, and S. Velusamy. “HotSpot: A Dynamic Compact Thermal Model at the Processor-Architecture Level.”
Microelectronics Journal: Circuits and Systems, Dec. 2003
33
Dynamic validation: measurements
Micred test chip, transient vs. HotSpot
34
Thermal Design
Why should you care about thermals?
What do we mean by thermals?
How do you model thermals?
What can you do about thermals?
Temperature-aware circuit design
Thermal sensors
35
What can you do about thermals?
Better estimates of performance, power, reliability
Optimize at design time (e.g. package co-design)
Adapt at run-time
36
The Role of a Thermal Model
Helps close the loop for accurate design estimations:
static or dynamic
Thermal Model
Power Model
Performance Model
Reliability Model
39
Package co-design
For 200 traces (TPC-C, SPEC, Microsoft)
Thermal design point can be reduced to 75% of true “max power”
with minimal performance loss
Aggressive clock gating
Application variations
Underutilized resources
Source: Intel
40
Thermal Performance Graph
Source: Seri Lee, “How to select a heat sink,” Aavid Thermal Technologies, http://www.electronics-cooling.com/Resources/EC_Articles/JUN95/jun95_01.htm
41
Adapt at run-time
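[Figure: temperature vs. time. With DTM disabled, the system must be designed for the cooling capacity needed without DTM; with DTM, the response engages at the DTM trigger level, so the system can be designed for a lower cooling capacity, yielding system cost savings]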
Source: David Brooks 2002
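The cost saving comes from capping the peak: with dynamic thermal management (DTM), the package only needs to handle the trigger temperature, not the worst-case workload. A minimal sketch, assuming a single thermal RC node, a crude on/off response that halves power whenever the trigger is reached, and hypothetical values (80 W workload, 40 W throttled, 0.5 K/W, 2 J/K, 45°C ambient, 75°C trigger):

```python
def simulate_dtm(p_full, p_throttled, r_th, c_th, dt, steps, t_amb, trigger=None):
    """Single thermal RC node in absolute temperature, driven at p_full.
    If `trigger` is set, a crude DTM response drops power to p_throttled
    whenever T is at or above the trigger.
    Returns (peak temperature, number of throttled steps)."""
    t = t_amb
    peak = t
    throttled = 0
    for _ in range(steps):
        if trigger is not None and t >= trigger:
            p = p_throttled
            throttled += 1
        else:
            p = p_full
        t += (p / c_th - (t - t_amb) / (r_th * c_th)) * dt
        peak = max(peak, t)
    return peak, throttled


# Hypothetical: 80 W workload, 40 W when throttled, R = 0.5 K/W, C = 2 J/K, 45 C ambient.
no_dtm = simulate_dtm(80.0, 40.0, 0.5, 2.0, 0.01, 2000, 45.0)
with_dtm = simulate_dtm(80.0, 40.0, 0.5, 2.0, 0.01, 2000, 45.0, trigger=75.0)
print(f"peak without DTM: {no_dtm[0]:.1f} C; with DTM: {with_dtm[0]:.1f} C "
      f"({with_dtm[1]} throttled steps)")
```

Real DTM responses (clock gating, frequency/voltage scaling, fetch throttling) trade performance for temperature more gradually; the on/off policy here is only the simplest stand-in.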
42
Thermal Design
Why should you care about thermals?
What do we mean by thermals?
How do you model thermals?
What can you do about thermals?
Temperature-aware circuit design
Thermal sensors
43
Temperature-Aware circuit design
Power: first-order design constraint
- max power consumption: limits power delivery
- sustained power dissipation: limits thermal design/packaging
- average active power and idle power consumption: limit battery life, etc.
- fallacy: instantaneous power ≠ temperature
Power-aware design:
- maximize performance for given power
Low-power design:
- minimize power for required performance
Temperature-aware design:
- performance, power, reliability: function of T
- T: function of power density, ambient T
- maximize performance for given thermal envelope
- related to power density
44
Performance and Leakage
Temperature (Berkeley PTM 70nm CMOS):
Transistor threshold and mobility
Subthreshold leakage, gate leakage
Ion, Ioff, delay
45
Temperature-aware circuits
Robustness constraint: sets Ion/Ioff ratio
Robustness and reliability: Ion/Igate ratio
70nm CMOS, 1.2V, 110°C
Ion/Ioff ~ 1000
Ion/Igate ~ 10000
Idea: keep ratio constant with T
Trade leakage for performance
Ref: Ghoshal et al., “Refrigeration Technologies…”, ISSCC 2000; Garrett et al., “T3…”, ISCAS 2001
46
Adaptive Ion/Ix control
Ion/Ioff = B/A = constant, through adaptive body bias (ABB)
Temperature-aware circuits (TAC) patent (2004)
47
Resulting voltages
Wide range: -0.4V < Vbb < 0.4V; 1.2V < Vdd < 1.3V
Almost linear
Robust to inter-die parameter variations
Needs trimming for setpoint
Margin for intra-die parameter variations
Active cooling or natural thermal landscape
48
Resulting performance
25% extra performance (110°C to 0°C) – only NMOS
13% from low temperature alone
49
Temperature-Aware SRAM
[Figure: SRAM array organization: pre-charge, decoders, wordlines (number of entries), bitlines (data width of entries), cells with access transistors (N1), sense amps, multiple ports]
Worst-case bitline leakage limits performance
50
SRAM Read time
Same circuit, different application
6T SRAM memory: “reverse application” (heating)
70nm process (200mV threshold)
Zero biasing at low temperature
51
SRAM bit-line sensing
Differential sensing (100mV bitline difference)
128 cells per bit line
Faster read even with higher reverse body bias (RBB) and smaller Ion
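Why a smaller Ion can still read faster follows from the worst-case bitline: the selected cell develops the differential while the other unselected cells on the column leak against it. A minimal sketch under a simplified model (an assumption, not the slide's circuit): the net differential current is Ion - (N - 1)·Ioff, with a hypothetical 100 fF bitline, the slide's 100 mV swing and 128 cells, and made-up Ion/Ioff values for a hot, zero-bias case and a cold, RBB case:

```python
import math


def bitline_sense_time(c_bitline, delta_v, i_on, i_off, cells_per_bitline):
    """Worst-case time to build a delta_v differential on a bitline pair.

    Simplified model (assumed): the accessed cell discharges one bitline with
    i_on while the other cells_per_bitline - 1 cells all leak i_off onto the
    complementary bitline, eating into the differential. Returns math.inf if
    leakage swamps the read current (the read cannot complete)."""
    i_net = i_on - (cells_per_bitline - 1) * i_off
    if i_net <= 0:
        return math.inf
    return c_bitline * delta_v / i_net


C_BL, DV, N = 100e-15, 0.1, 128  # hypothetical 100 fF bitline, 100 mV, 128 cells
hot = bitline_sense_time(C_BL, DV, i_on=50e-6, i_off=200e-9, cells_per_bitline=N)
cold_rbb = bitline_sense_time(C_BL, DV, i_on=40e-6, i_off=1e-9, cells_per_bitline=N)
print(f"hot, zero bias: {hot * 1e12:.0f} ps; cold, with RBB: {cold_rbb * 1e12:.0f} ps")
```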
52
Electro-thermal simulations
Source: Jia Tzer Hsu, L. Vu-Quoc, “A rational formulation of thermal circuit models for electrothermal simulation. I. Finite element method [power electronic systems],” IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications
53
Also need electro-thermal models
Source: S. Wunsche, C. Clauss, P. Schwarz, F. Winkler, “Electro-thermal circuit simulation using simulator coupling,” IEEE Transactions on VLSI Systems, Sep. 1997
54
SOI circuits
Source: Wei Jin, Weidong Liu, S. K. H. Fung, P. C. H. Chan, Chenming Hu, “SOI thermal impedance extraction methodology and its significance for circuit simulation,” IEEE Transactions on Electron Devices, Apr. 2001
55
Refrigeration
“conventional” vs. thermo-electric (TEC)
Can get T < T_amb (Rth < 0!)
TEC: Peltier effect (can use for local cooling)
56
TEC electro-thermal model
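A thermoelectric cooler is often described by a lumped model with three heat terms: Peltier pumping proportional to current (the term that lets the cold side sit below ambient), Joule heating that grows with the square of current, and conduction back-flow from the hot side. A minimal sketch using that common textbook model, which may differ in detail from the slide's figure; the module parameters are hypothetical:

```python
def tec_cold_side_heat(current, seebeck, resistance, conductance, t_cold, t_hot):
    """Heat pumped from the cold side [W] in the common lumped TEC model:
    Q_c = S*I*T_c - 0.5*I^2*R - K*(T_h - T_c)
    (Peltier pumping, minus half the Joule heating, minus conduction back-flow)."""
    return (seebeck * current * t_cold
            - 0.5 * current ** 2 * resistance
            - conductance * (t_hot - t_cold))


def tec_input_power(current, seebeck, resistance, t_cold, t_hot):
    """Electrical input power: P = S*I*(T_h - T_c) + I^2*R."""
    return seebeck * current * (t_hot - t_cold) + current ** 2 * resistance


# Hypothetical module: S = 0.05 V/K, R = 2 ohm, K = 0.5 W/K, T_c = 300 K, T_h = 320 K.
params = dict(seebeck=0.05, resistance=2.0, t_cold=300.0, t_hot=320.0)
i_best = params["seebeck"] * params["t_cold"] / params["resistance"]  # dQc/dI = 0
q_c = tec_cold_side_heat(i_best, conductance=0.5, **params)
p_in = tec_input_power(i_best, **params)
print(f"I = {i_best:.1f} A pumps {q_c:.1f} W using {p_in:.1f} W (COP = {q_c / p_in:.2f})")
```

Past the optimum current, Joule heating grows faster than Peltier pumping, so driving a TEC harder eventually heats the spot it is meant to cool.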
57
Thermal Design
Why should you care about thermals?
What do we mean by thermals?
How do you model thermals?
What can you do about thermals?
Temperature-aware design
Thermal sensors
58
Sensors needed for run-time
Thermocouples – voltage output
Junction between wires of different materials; voltage at terminals is proportional to Tref – Tjunction
Often used for external measurements
Thermal diodes – voltage output
Biased p-n junction; voltage drop for a known current is temperature-dependent
Biased resistors (thermistors) – voltage output
Voltage drop for a known current is temperature dependent
You can also think of this as varying R
Example: 1 kΩ metal “snake”
BiCMOS, CMOS – voltage or current output
Rely on reference voltage or current generated from a reference band-gap circuit; or simple ring oscillators with no reference
Relative (just need to adapt) vs. Absolute sensors (need actual T)
May need a Reference – typically a Bandgap circuit
59
Typical Sensor Configuration
PTAT – Proportional to Absolute Temperature
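One common way to get a PTAT signal from diodes or BJTs is to bias two matched junctions at a known current-density ratio N: the difference of their voltages is ΔV_BE = (kT/q)·ln N, independent of the saturation current to first order. A minimal sketch, assuming ideal junctions (ideality factor 1) and using the slide's values of k and q:

```python
import math

K_BOLTZMANN = 1.38e-23  # J/K (slide value)
Q_ELECTRON = 1.6e-19    # C (slide value)


def delta_vbe(temp_k: float, current_density_ratio: float) -> float:
    """PTAT voltage: dVBE = (kT/q) * ln(N) for two matched junctions biased at a
    current-density ratio N (ideality factor of 1 assumed)."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON * math.log(current_density_ratio)


def temperature_from_delta_vbe(dv: float, current_density_ratio: float) -> float:
    """Invert the PTAT relation to recover absolute temperature in kelvin."""
    return Q_ELECTRON * dv / (K_BOLTZMANN * math.log(current_density_ratio))


dv = delta_vbe(300.0, 8.0)
print(f"dVBE at 300 K, N = 8: {dv * 1e3:.2f} mV -> T = {temperature_from_delta_vbe(dv, 8.0):.1f} K")
```

A few tens of millivolts at room temperature, changing by roughly 0.18 mV/K for N = 8, is why the reference and readout circuits around such a sensor must be quiet and well matched.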
60
Absolute Sensor
[Figure: Delta Vgs current reference generator and delay cell]
Source: Syal, Lee, Ivanov, Altet, Online Testing Workshop, 2001
61
Sensors: Problem Issues
Poor control of CMOS transistor parameters
Noisy environment
Cross talk
Ground noise
Power supply noise
These can be reduced by making the sensor larger
This increases power dissipation
But we may want many sensors
62
Calibration
Accuracy vs. Precision
Analogous to mean vs. stdev
Calibration deals with accuracy
The main issue is to reduce inter-die variations in offset
Typically requires per-part testing and configuration
Basic idea: measure the offset, store it, then subtract it from dynamic measurements
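The offset procedure, and why it fixes accuracy but not precision, can be shown in a few lines. A minimal sketch, assuming a one-point calibration at a known test temperature and a hypothetical sensor that reads about 7 degrees high; averaging several readings keeps noise out of the stored offset:

```python
import statistics


def calibrate_offset(readings_at_ref, reference_temp):
    """One-point calibration: the per-part offset is the mean reading taken at a
    known reference temperature minus that temperature."""
    return statistics.mean(readings_at_ref) - reference_temp


def apply_offset(readings, offset):
    """Subtract the stored offset from run-time readings."""
    return [r - offset for r in readings]


# Hypothetical part whose sensor reads about 7 degrees high at a 45 C test point.
at_ref = [52.1, 51.9, 52.0]
offset = calibrate_offset(at_ref, 45.0)
corrected = apply_offset(at_ref, offset)
print(f"offset = {offset:.2f}, corrected mean = {statistics.mean(corrected):.2f}, "
      f"stdev before/after = {statistics.stdev(at_ref):.3f}/{statistics.stdev(corrected):.3f}")
```

The mean error (accuracy) goes to zero after subtraction, while the spread (precision) is unchanged; reducing the spread needs a larger or better-shielded sensor, not calibration.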