Thermal Modeling and Management for 3D MPSoCs with Active Cooling
Prof. David Atienza Alonso
Embedded Systems Laboratory (ESL), Institute of EE, Faculty of Engineering
© ESL/EPFL 2010
ARTIST Summer School 2010, Autrans (France)
Advantages of 3D vs. 2D Chips
Promises:
• Reduce average length of on-chip global wires
• Increase number of devices reachable in given time budget
• Greatly facilitate heterogeneous integration (e.g. logic-DRAM stacks)
Samsung Wafer Stack Package (WSP) memory
[Figures: Ray Yarema, Fermilab]
Thermal-Reliability Issues in 3D Chips
• Latest chips increase power density
• Non-uniform hot-spots in 2D chips
• In 3D chips, heat affects several layers! (even more "cool" components)
• Higher chances of thermal wear-outs and very short lifetimes!
[Figures: Sun 1.8 GHz Sparc v9 Microproc; Sun Niagara Broadband Processor; 3D stack, courtesy IBM and Irvine Sens.]
Run-Time Heat Spreading in 3D Chips
5-tier 3D stack: 10 heat sources and sensors
Inject between 4 W and 1.5 W
[Figure: thermal maps of Layer 2 to Layer 5 (2nd to 5th tier); axes: length (um) vs. width (um), 0 to 10000 um each; color-scale values between 390 and 420]
Large and non-uniform heat propagation! (up to 130 ºC on top tier)
NanoTera CMOSAIC Project: Design of 3D MPSoCs with Advanced Cooling
3D systems require novel electro-thermal co-design
• Academic partners: EPFL and ETHZ
• Industrial: IBM Zürich
3D stacked MPSoC chips: microchannels etched on back side to circulate liquid coolant
System-Level Active Cooling Manager:
• Adjustment of coolant flux (3D heat flow prediction)
• Task scheduling and execution control
Outline
• Introduction
• 3D chip thermal modeling framework
• Validation of 3D thermal model
• Liquid cooling modeling
• Liquid cooling model validation
• Closed-loop 3D MPSoCs thermal management with active cooling
• Experiments and conclusions
Compact RC-Based Tier Thermal Model
Gate-level thermal model
• RC network of Si/metal layer cells
• 2D tier modeled as heat flux moving between adjacent cells
• Each cell exchanges heat through its six faces: qb_top, qb_bottom, qb_left, qb_right, qb_front, qb_back
• Convective boundary conditions between layers in tier:
  qb_top = h_top A (Ta - Ttop)
  qb_bottom = h_bottom A (Ta - Tbottom)
[Figure: cell heat-flux diagram; layers I-1, I and I+1 exchange flux (qb_i) through face i]
[Atienza et al., TODAES 2007]
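The cell-based RC model above can be sketched as follows. This is a minimal illustration, not the library's actual code: the function names, the explicit Euler update, the single lateral conductance `g_lateral` and the per-cell capacitance `c_cell` are all assumptions introduced here. Only the boundary expression q_b = h A (Ta - T) comes from the slide.

```python
def convective_flux(h, area, t_ambient, t_surface):
    """Boundary heat flow q_b = h * A * (T_a - T_surface), as on the slide."""
    return h * area * (t_ambient - t_surface)


def step_tier(temps, power, g_lateral, h_top, h_bottom, area,
              t_ambient, c_cell, dt):
    """One explicit Euler step of a 2D grid of RC cells (hypothetical helper).

    temps, power: lists of rows (cell temperatures and injected power).
    g_lateral:    conductance between adjacent cells (assumed uniform).
    Returns a new grid; the input grid is left untouched.
    """
    rows, cols = len(temps), len(temps[0])
    new = [row[:] for row in temps]
    for r in range(rows):
        for c in range(cols):
            t = temps[r][c]
            q = power[r][c]  # heat generated inside the cell
            # Heat flux exchanged with the four in-plane neighbours.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    q += g_lateral * (temps[rr][cc] - t)
            # Convective boundary conditions on the top and bottom faces.
            q += convective_flux(h_top, area, t_ambient, t)
            q += convective_flux(h_bottom, area, t_ambient, t)
            new[r][c] = t + dt * q / c_cell
    return new
```

An explicit step like this is only stable for small `dt`; a production solver would more likely use an implicit scheme.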
Complete 3D Chip Thermal Modeling
Multi-level execution for thermal convergence in 3D
• Local (2D-tier), liquid channels and global (3D) propagation
Tier-level convergence (N iterations):
1. Evaluate local temperature for each cell
2. Update with neighbour temperature diffusion
3. Feed back temperature and repeat
Then go to next tier or microchannel
[Ayala et al., NanoNet 2009]
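A rough sketch of the multi-level loop: each tier is relaxed for N local iterations, then heat is exchanged between tiers in a global pass. Everything here is an illustrative assumption: the 1D tiers, the dimensionless coefficients `k_lateral` and `k_vertical`, the insulated boundaries and the function names. Microchannels are left out for brevity.

```python
def relax_tier(cells, power, k_lateral, n_iter):
    """Tier-level loop: N iterations of local update plus neighbour diffusion."""
    for _ in range(n_iter):
        prev = cells[:]  # feedback: temperatures from the previous iteration
        last = len(prev) - 1
        for i in range(len(cells)):
            left = prev[i - 1] if i > 0 else prev[i]      # insulated edge
            right = prev[i + 1] if i < last else prev[i]  # insulated edge
            cells[i] = (prev[i]
                        + k_lateral * (left + right - 2 * prev[i])
                        + power[i])
    return cells


def simulate_stack(tiers, powers, k_lateral, k_vertical, n_iter, n_global):
    """Global loop: relax every tier in turn, then propagate heat vertically."""
    top = len(tiers) - 1
    for _ in range(n_global):
        for tier, power in zip(tiers, powers):
            relax_tier(tier, power, k_lateral, n_iter)
        snapshot = [tier[:] for tier in tiers]
        for z, tier in enumerate(tiers):
            for i in range(len(tier)):
                above = snapshot[z + 1][i] if z < top else snapshot[z][i]
                below = snapshot[z - 1][i] if z > 0 else snapshot[z][i]
                tier[i] = (snapshot[z][i]
                           + k_vertical * (above + below - 2 * snapshot[z][i]))
    return tiers
```

With zero injected power and insulated boundaries the total heat in the stack is conserved, which gives a quick sanity check of the update rules.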
3D Chip Thermal Library Validation
Extensible set of layers in 3D stack
• Up to 9 tiers and heat spreader
• Pre-defined layers: silicon, copper (10 layers), glue, overmold, interposer, bump
Configurable nr. of cells and iterations per tier
• Initially 10 ms thermal interval (1000 iterat./tier)
Multi-tier test chip manufactured at EPFL:
• Three types of tiers
3D Thermal Library Validation: Creating Various 3D Thermal Maps
Flexibility for thermal characterization
• 10 heat sources and sensors per layer, accessible to be simultaneously activated
3D Thermal Library Validation: Correlation with 5-Tier 3D Stack
3D Chip, EPFL, Layer 3 characterization
• Blue curve: 3D current-heat model for D8
• Pink curve: heater current measured in D8
[Figure: sensor voltage (mV) vs. heater current (mA), applied to Dev 7]
3D Chip, EPFL, multi-tier characterization
• Blue/Pink curve: D7 (tier 1) and D8 (tier 4)
• Red curve: 3D current-heat model for D8
[Figure: sensor voltage (mV) vs. heater current (mA), applied to Dev 7]
Variations of less than 1.5% between 3D stack measurements and new 3D thermal model
[Ayala et al., Nano-Nets '09]
Modeling Through Silicon Vias (TSVs) in 3D Stacks
TSVs:
• Size: 5-10 um x 10-100 um
• TSVs change resistivity of interlayer material (IM)
Modeling granularities:
1. Homogeneous distribution, one R value for the IM
2. Different R value per unit (core, cache, etc.)
3. Exact locations of TSVs
Finer granularity: higher accuracy, higher complexity
[Figure: TSV, LSM-EPFL. Source: IBM Zürich and Y.Heights]
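One way to picture the first two granularities is a parallel-path estimate of the interlayer resistance, where the TSV area fraction raises the effective conductivity. This is only a sketch under that assumption; the slide does not state how R is computed, and the function names and all numeric values below are hypothetical.

```python
def interlayer_resistance(thickness, area, k_im, k_tsv, tsv_fraction):
    """Vertical thermal resistance (K/W) of an interlayer slab with TSVs.

    Assumes TSVs and interlayer material conduct heat in parallel, so the
    effective conductivity is the area-weighted mean of the two.
    """
    k_eff = tsv_fraction * k_tsv + (1.0 - tsv_fraction) * k_im
    return thickness / (k_eff * area)


def per_unit_resistances(units, thickness, k_im, k_tsv):
    """Granularity 2: one R value per unit, from its area and TSV fraction."""
    return {
        name: interlayer_resistance(thickness, area, k_im, k_tsv, fraction)
        for name, (area, fraction) in units.items()
    }


# Illustrative numbers only (SI units); none of them come from the slides.
K_IM, K_TSV, THICKNESS = 0.25, 400.0, 20e-6
homogeneous = interlayer_resistance(THICKNESS, 1e-4, K_IM, K_TSV, 0.01)
per_unit = per_unit_resistances(
    {"core": (1e-5, 0.0), "cache": (2e-5, 0.02)}, THICKNESS, K_IM, K_TSV)
```

Granularity 3 (exact TSV locations) would instead assign the TSV conductivity cell by cell in the thermal grid.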
TSV Modeling Accuracy in 3D Stacks
Chosen to model TSV groups in localized positions of 3D MPSoCs
Liquid Flux Model for Laminar Flow
Local junction temperature modeled as RC network:
Rtot = Rcond + Rconv + Rheat
Rtot = 1/(Gsi/t + 1/Rb) + A/(bAt) + A/(VPcp)
Quantities labeled on the slide: thermal resistance of Si, Si base thickness, thermal resistance of wiring, heating area, total area, flow rate and density, heat source, chip back-side temperature
Dependence of thermal resistance on liquid flux modeled as a quadratic form
• Variable value of coolant flux (Φ)
• ∆Rheat ≈ aΦ + bΦ^2 ; b << a
[Figure: microchannel with quantities q1, q2, T1, P1, P2]
[Atienza et al., THERMINIC '09 and DATE '10]
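The series resistance and the quadratic flux dependence can be evaluated with a few lines of code. This is a minimal sketch: the function names are introduced here, and the coefficients a and b would have to be fitted to measurements (the slide only states that b is much smaller than a).

```python
def r_total(r_cond, r_conv, r_heat):
    """Series combination Rtot = Rcond + Rconv + Rheat."""
    return r_cond + r_conv + r_heat


def delta_r_heat(flux, a, b):
    """Quadratic form from the slide: dRheat ~= a*flux + b*flux**2 (b << a)."""
    return a * flux + b * flux ** 2
```

Because b is much smaller than a, the linear term dominates for small flux changes and the quadratic term acts as a correction at larger ones.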
3D Thermal Model with Liquid Cooling
New set of layers in 3D stack
• 3D stack (up to 9 tiers)
• 1 microchannel and coolant flow per tier
5-tier stack with microchannels and manifold cooling seal manufactured at IBM/EPFL
• Enables different multi-tier liquid flux injection
[Figure: micro-heater, liquid, PCB, micro-channels. Source: IBM & ESL, EPFL]
Manufacturing of 5-Tier 3D Test Chip with Liquid Channels in Multiple Tiers
[Figure: front side and back side of the test chip. IBM & ESL, EPFL]
Adding multi-tier liquid cooling in-/out-lets
Multi-tier active cooling technology feasible for 3D-stacked chips
Correlation Results: Liquid Cooling and 3D Heat Transfer
Temperature evolution at the junction (Tj)
• Tested range: 0.015 to 0.15 L/min
• Similar accuracy results at different channels
• Avg. max. temp. error = 0.6%
Variations of less than 1% between measurements and RC-based 3D thermal model with liquid cooling
[Atienza et al., THERMINIC '09]
Complete 3D Chip Thermal Modeling Flow with Liquid Cooling
Scheduler (Reactive, Proactive)
Inputs:
• Workload information
• Floorplan, TSV areas, package
• Temperature (for dynamic policies)
Power Manager (DPM)
Inputs:
• Workload information
• Activity of cores
3D Thermal Simulator w. Liquid Cooling based on EPFL-IBM 3D chips (integrated within internal HotSpot tool version)
Inputs:
• Power trace for each unit
• Floorplan, package and die properties (Niagara-1), TSV area percentage/distribution
• Flow rate
Output: Transient Temperature Response for Each Unit
Run-Time HW/SW Thermal Modeling Framework for 3D Chips
Exploitation of both hardware and software benefits
• MPSoC behavior emulation on FPGA: zero-delay MPSoC architecture simulation (CPUs, SRAMs and I/O, each monitored by sniffers), running Multi-Proc. OS + DVFS + Task Migration with Sw app 1 ... Sw app N
• Software thermal model on host PC: detailed thermal analysis of 2D MPSoC layout [D. Atienza et al., TODAES 2007], extended to a 3D stack thermal model (1st tier to Nth tier) [D. Atienza, THERMINIC 2009]
• Standard Ethernet connection & dedicated HW monitor: energy of 2D/3D components sent to the thermal model, temperature (T) of 2D/3D components fed back to the emulation
Thermal Management for 3D-MPSoCs with Liquid Cooling
Active-Adapt3D: Combined policy manager (Best-Paper Award at IEEE/IFIP VLSI-SoC 2009)
• Predictive, floorplan-based task assignment and DVFS
• Closed-loop variable liquid cooling control: T ≥ 80°C: increment flow rate; T < 80°C: decrement flow rate
• Policy can be applied reactively or proactively
[Figure: control loop. REACTIVE: thermal sensors deliver temperature measurements of the system temperature dynamics to the flow rate tuner. PROACTIVE: an ARMA-based temperature predictor delivers a temperature forecast to the flow rate tuner]
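The threshold rule of the flow rate tuner can be written down directly. The sketch below assumes the flow rate is chosen from a small set of discrete settings indexed from 0; the function name and the indexing are assumptions, while the 80°C threshold and the increment/decrement rule come from the slide.

```python
def tune_flow(level, temperature, n_levels, threshold=80.0):
    """Return the next flow-rate setting index (hypothetical helper).

    temperature is the measured value in reactive mode, or the predictor's
    forecast in proactive mode; the rule itself is the same in both cases.
    """
    if temperature >= threshold:
        return min(level + 1, n_levels - 1)  # increment, saturate at maximum
    return max(level - 1, 0)                 # decrement, saturate at minimum
```

Called once per control interval, this gives the reactive policy; feeding it a temperature forecast instead of a sensor reading gives the proactive variant.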
Adaptive Thermal-Aware Task Assignment Policy for 3D MPSoCs
Cores on layers closer to the heat sink can be cooled faster in comparison to cores further away
Adapt-3D assigns a thermal index (α_i) to each core in order to distinguish the location of the cores
• Higher α_i: core more prone to hot spots
For cores at locations 1, 2 and 3:
[Figure: two-chip stack (Chip-1, Chip-2) with core locations 1, 2 and 3 marked]
[Coskun and Atienza, DATE '09]
Adaptive Thermal-Aware Task Assignment Policy for 3D MPSoCs
Probability of receiving workload at time t, for each core:
P_t = P_(t-1) + W
Weight:
• Cool core: W = β_inc · W_init · (1/α_i) if T_avg < T_preferred
• Hot core: W = β_dec · W_init · α_i if T_avg ≥ T_preferred
• W_init = T_preferred - T_avg
β_inc, β_dec: empirical constants; α_i: thermal index of core i; T_avg: measured by sensors; T_preferred: e.g., 80°C
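A sketch of the probability update for one core. It assumes the weight is the temperature margin T_preferred - T_avg, scaled by an empirical constant and divided by the core's thermal index for cool cores or multiplied by it for hot cores. The parameter names, the default constants and the clamping to [0, 1] are assumptions added for illustration.

```python
def update_probability(p_prev, t_avg, thermal_index,
                       t_preferred=80.0, beta_inc=0.01, beta_dec=0.01):
    """Update a core's probability of receiving workload (illustrative)."""
    w_init = t_preferred - t_avg  # positive for cool cores, negative for hot
    if t_avg < t_preferred:
        # Cool core: increase, but less for cores prone to hot spots.
        w = beta_inc * w_init / thermal_index
    else:
        # Hot core: decrease, and more for cores prone to hot spots.
        w = beta_dec * w_init * thermal_index
    # Clamp so the result stays a valid probability (assumption).
    return min(1.0, max(0.0, p_prev + w))
```

With this form, a core far from the heat sink (high thermal index) gains probability slowly when cool and loses it quickly when hot, which matches the intent stated on the slide.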
Experiments 3D Thermal Management: 3D MPSoCs with Microchannels
Target 3D systems based on 3D version of Sun UltraSPARC T1
• Power values and workloads from real traces measured in Sun platforms (multimedia players, web servers, databases, etc.)
Cores and caches in separate layers
Channels: width 400 um, depth 250 um. Four flow rate settings, default at 15 ml/min.
[Figure: stack configurations (EXP1-2), (EXP3), (EXP4)]
Thermal Management for 3D Chips: Active-Adapt3D Comparisons
Predictive task scheduling, active cooling and floorplan-aware DVFS achieve less than 5% hotspots
Promising figures for thermal control in 3D-MPSoCs
[Coskun and Atienza, DATE '10]
Thermal Management in 3D Chips: Active-Adapt3D Comparisons
Variable multi-tier flow control useful for 3D systems with 3+ layers. Proactive thermal management achieves:
• 75% reduction in spatial gradients on average, for fixed flow rate
• 97% reduction in spatial gradients on average, for variable flow rate
*LC: Multi-tier variable liquid cooling
Cooling power savings up to 67% with respect to worst-case flux [Coskun and Atienza, DATE '10]
Conclusions
Complexity of coming 3D MPSoC chips requires novel thermal modeling approaches
• Application of simple RC-based methods demonstrated, validated with 3D test chip
• Initial model of liquid cooling channels in 3D chips
  - Simple RC laminar flow model, works well with variable liquid fluxes (errors of less than 2%)
  - Integrated the compact model into custom HotSpot tool
New thermal management: feedback controller adjusts flow rate to allowed temperature with job assignment and DVFS
• Proactive control improves the hot spot reduction to 95% for systems with variable flow rates, and reduces thermal variations
• Dynamic flow rate adjustment is helpful in reducing the energy cost of the pump and overall system (67% power savings)
Key References and Bibliography
3D thermal modeling and FPGA-based emulation
• "3D-ICE: Compact transient thermal model for 3D ICs with liquid cooling via enhanced heat transfer cavity geometries", A. Sridhar, et al., Proc. of ICCAD 2010, USA, November 2010.
• "Transient Thermal Modeling of 2D/3D Systems-on-Chip with Active Cooling", David Atienza, Proc. of THERMINIC 2009, Belgium, October 2009.
Thermal management for 3D MPSoCs
• "Fuzzy Control for Enforcing Energy Efficiency in High-Performance 3D Systems", M. Sabry, Ayse K. Coskun, David Atienza, Proc. of ICCAD 2010, USA, November 2010.
• "Energy-Efficient Variable-Flow Liquid Cooling in 3D Stacked Architectures", Ayse K. Coskun, David Atienza, et al., Proc. of DATE 2010, Germany, March 2010.
• "Modeling and Dynamic Management of 3D Multicore Systems with Liquid Cooling", Ayse K. Coskun, et al., Proc. of VLSI-SoC 2009, Brazil, October 2009. (Best Paper Award)
• "Dynamic Thermal Management in 3D Multicore Architectures", Ayse K. Coskun, et al., Proc. of DATE 2009, France, April 2009.
QUESTIONS?
Nano-Tera.ch Swiss Engineering Programme
European Commission
Swiss National Science Foundation