On-chip power distribution in deep submicron technologies
Aida Todri
Electrical and Computer Engineering Department
University of California Santa Barbara

Outline
• Introduction
• Problem Statement and Formulation
• Electromigration (EM) Phenomena in Power Gated Networks
  • EM Analysis and Grid Optimization
• Decoupling Capacitor Efficiency in Power Networks
  • Metrics and Placement
• Power Supply Noise Reduction in Multi-core System
  • Power vs Performance Trade-offs
• Conclusions

Technology Scaling
Advantages:
• Increasing device count
• Higher transistor density
• Increasing logic switching speed
• Increasing clock frequencies
Disadvantages:
• Increasing internal capacitance
• Increasing leakage current
  • higher standby power
• Increasing dynamic power
  • larger transient currents

On-Chip Power Delivery Network
• Hierarchical mesh structure on several metal layers
  • Global grid occupies the top two layers of the chip
  • Local (block) grid occupies lower metal layers
• Must satisfy reliability constraints:
  • In DC (steady state) conditions:
    • Voltage drop (IR) must be within margins
    • Current density in power tracks must not exceed the allowed current density
  • In AC (transient) conditions:
    • Power supply noise must be within margins
    • Decaps may be inserted to suppress power supply noise and to lower the impedance of power tracks

Low-Power Strategy
• Idle blocks can be disconnected from the grid
  • Their static power can be eliminated
• A sleep transistor controls the wake-up or sleep mode of the gated block
[Figure: Power gating technique. Labels: Vdd, header switch, Sleep, logic block, Gnd.]

Power Gating Technique
[Figure: power-gated chip. Labels: Vdd, top layer, sleep transistors, gated blocks, ungated block, B1, B2, B3.]
• Top layer is the global grid
  • Designed to satisfy reliability constraints (EM and IR) when all circuits are switching
• Each block has its local power mesh
• Many power gating configurations exist

Research Topics of Interest - 1
Designing Power Grid for Power-Gated Chips
• Typically designed at the early stages of the design process
• Mostly over-designed, causing a large overhead in chip power consumption
• Power gating is not considered during the design of power grids
[Figure: current vs. time, with Ipeak, Ileakage, and time marks to, tp, tf, T.]

On-chip Power Delivery for Power Gated Chips
Objective: Deliver power to the circuit blocks while satisfying reliability constraints in the power grid when power gating is applied.
[Figure: power-gated chip. Labels: global power grid, Vdd, S1, S2, S3, local power grid, ungated block, gated blocks, sleep transistors, global vias, intermediate vias, local vias.]
• Power tracks are not ideal and have finite resistance
• Many possible configurations of operating blocks

Electromigration Mechanisms
• Transport of metal atoms under the force of an electron flux
  • High current density stress
• Depletion and accumulation of metal from the atomic flow can form voids and hillocks in metal lines, which lead to open-circuit and short-circuit faults
[Figure: metal line showing voids, grain boundaries, and hillocks. Photo courtesy of University of Notre Dame.]
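
Electromigration on Power Gated Grids
[Figure: Macro I and Macro II draw currents IA and IB from a Vdd grid with branch resistances R1, R2, R3. Before power gating the branch currents are I1, I2, I3, made up of the base currents IA1, IA2, IA3 and IB1, IB2, IB3; after power gating they are I'1, I'2, I'3.]
EM violations may occur only on those branches where base currents flow in opposite directions.
Before power gating: each branch current $I_k$ ($k = 1, 2, 3$) is the superposition of the base currents $I_{Ak}$ and $I_{Bk}$.
After power gating: $\tilde{I}_1 = I_{B1}$, $\tilde{I}_2 = I_{B2}$, $\tilde{I}_3 = I_{B3}$.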
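The claim that EM violations can appear on branches where the base currents oppose each other can be checked with a small superposition example. This is a minimal sketch under assumed values, not the circuit from the slide: the rail topology (Vdd pad, R1, node 1, R3, node 2, R2, Vdd pad), the function name `bridge_current`, the load currents, the track cross-section, and the limit `J_MAX` are all hypothetical.

```python
def bridge_current(i_a, i_b, r1=1.0, r2=1.0, r3=1.0):
    """Current in R3, positive from node 1 to node 2, by superposition.

    Hypothetical rail: Vdd pad --R1-- node1 --R3-- node2 --R2-- Vdd pad.
    Macro A draws i_a at node 1 and macro B draws i_b at node 2.
    """
    total = r1 + r2 + r3
    # Base current of B alone: the share of i_b that arrives through the
    # left pad crosses R3 from node 1 to node 2 (current divider).
    from_b = i_b * r2 / total
    # Base current of A alone: the share of i_a that arrives through the
    # right pad crosses R3 in the opposite direction.
    from_a = -i_a * r1 / total
    return from_a + from_b


J_MAX = 1.5e9   # A/m^2, assumed EM limit
AREA = 1.0e-12  # m^2, assumed track cross-section

for label, i_a in (("all blocks on", 3e-3), ("macro A gated", 0.0)):
    i3 = bridge_current(i_a, 6e-3)
    density = abs(i3) / AREA
    status = "EM violation" if density > J_MAX else "ok"
    print(f"{label}: I3 = {i3 * 1e3:.2f} mA, J = {density:.2e} A/m^2, {status}")
```

With both macros on, the opposing base currents partly cancel in R3. Gating macro A removes the cancelling component, so the current in R3 rises even though the total chip current falls.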

IR Drop Analysis for Power Gating
Theorem 1: The grid node voltages can only increase when a current source is turned off.
Corollary: IR drop can only decrease when power gating turns a source off.
Theorem 2: Uniform track resizing of a resistive grid does not change the current flow.
Corollary: Uniform upsizing does not change the currents on a grid, so we can always upsize tracks to meet EM and IR constraints.
Uniform upsizing by $J_{VB} / J_{max}$ guarantees that all EM and IR constraints are satisfied for all power gating configurations.
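Both theorems can be checked numerically on a tiny resistive rail. The sketch below assumes ideal current-source loads and a hypothetical two-node rail (Vdd pad, R1, n1, R3, n2, R2, Vdd pad) in which every track has the same conductance; `solve_rail` and all numeric values are illustrative assumptions.

```python
VDD = 1.0  # volts (assumed)


def solve_rail(i_a, i_b, scale=1.0):
    """Nodal analysis of the hypothetical rail pad --R1-- n1 --R3-- n2 --R2-- pad.

    Every track has conductance `scale` siemens; i_a and i_b are ideal
    current loads at n1 and n2. Returns ((v1, v2), (i1, i2, i3)).
    """
    g = scale
    # Nodal equations:  2g*v1 - g*v2 = g*VDD - i_a
    #                  -g*v1 + 2g*v2 = g*VDD - i_b
    det = (2 * g) * (2 * g) - g * g
    rhs1 = g * VDD - i_a
    rhs2 = g * VDD - i_b
    v1 = (rhs1 * 2 * g + g * rhs2) / det
    v2 = (rhs2 * 2 * g + g * rhs1) / det
    currents = ((VDD - v1) * g, (VDD - v2) * g, (v1 - v2) * g)
    return (v1, v2), currents


all_on, i_all_on = solve_rail(0.03, 0.06)
gated, _ = solve_rail(0.0, 0.06)                  # load A turned off
upsized, i_upsized = solve_rail(0.03, 0.06, 2.0)  # every track twice as wide

print("node voltages, all on :", all_on)
print("node voltages, A gated:", gated)      # Theorem 1: no node voltage drops
print("branch currents, 1x   :", i_all_on)
print("branch currents, 2x   :", i_upsized)  # Theorem 2: same branch currents
```

Because the branch currents stay fixed under uniform upsizing, current density and IR drop both scale as the inverse of the upsizing factor, which is what makes a single uniform factor sufficient.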

Power-Gating Aware Optimization
We reduce the complexity of the optimization problem by coarsening the grid granularity with the multi-grid technique.
Our optimization scheme has three main steps:
1. Reduce grid size by folding tracks
2. Optimize the reduced grid
3. Unfold the grid to its original granularity

1. Grid Folding
Identify a few neighboring tracks around each violation; these tracks remain unfolded.
[Figure: panels (a) to (d) showing the grid and its VDD pads during folding.]

2. Reduced Grid Optimization
A three-step iterative process (3-step LP):
1. Derive current and voltage sensitivities to grid sizing
2. Uniformly upsize the grid by fine-scale upsizing steps {ψ1, ψ2, …, ψr}
3. Shrink the selected tracks
The process is repeated until no violations exist.
[Figure: original grid, upsizing by ψi from {ψ1, ψ2, …, ψr}, then shrinking the selected tracks.]
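
LP Problem
Minimize the total resizing of the grid, subject to the three constraints:
• Current Density
• Voltage Drop
• Resizing Coefficients
[Equations: an objective of the form max($t_1 \ldots t_q$) over the resizing coefficients; a current-density constraint involving the branch current $I_i'$, the track dimensions, and $J_{VB}$; a voltage-drop constraint involving $V_{DD}$ and the node voltage $\tilde{V}_i$; the resizing coefficients $t_i$.]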

3-step Iterative LP Algorithm
[Flowchart: start from the initial grid optimized for all sources on; run the computations for the power gating configurations to find EM violations (JVB > Jmax) and IR violations (Vnode < 0.9 VDD); derive the upsizing coefficient ψ and the finer-scale coefficients ψi; upsize the grid by ψi; shrink the grid; if the grid is feasible (Y), stop, otherwise (N) repeat.]
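The first part of this flow (scan the power gating configurations for the worst EM and IR violation, then pick a uniform upsizing coefficient) can be sketched on a toy grid. This is an illustration under assumptions, not the LP from the slides: the two-node rail, `solve_rail`, `worst_case`, `uniform_upsizing_coefficient`, the track cross-section, and the limits are hypothetical, and the shrinking step is omitted.

```python
from itertools import product

VDD = 1.0        # volts (assumed)
J_MAX = 2.0e9    # A/m^2, assumed EM limit
AREA = 1.0e-11   # m^2, assumed cross-section of every track at scale 1


def solve_rail(i_a, i_b, scale):
    """Two-node rail pad --R1-- n1 --R3-- n2 --R2-- pad, conductance `scale` per track."""
    g = scale
    det = (2 * g) * (2 * g) - g * g
    rhs1, rhs2 = g * VDD - i_a, g * VDD - i_b
    v1 = (rhs1 * 2 * g + g * rhs2) / det
    v2 = (rhs2 * 2 * g + g * rhs1) / det
    return (v1, v2), ((VDD - v1) * g, (VDD - v2) * g, (v1 - v2) * g)


def worst_case(loads, scale):
    """Worst current density and lowest node voltage over all on/off configurations."""
    worst_j, min_v = 0.0, VDD
    for on in product((0, 1), repeat=2):
        volts, currents = solve_rail(loads[0] * on[0], loads[1] * on[1], scale)
        # Widening a track by `scale` scales its cross-section by `scale` too.
        worst_j = max(worst_j, max(abs(c) for c in currents) / (AREA * scale))
        min_v = min(min_v, min(volts))
    return worst_j, min_v


def uniform_upsizing_coefficient(loads):
    """Smallest uniform factor that clears both J > J_MAX and V < 0.9 VDD."""
    worst_j, min_v = worst_case(loads, 1.0)
    em_factor = worst_j / J_MAX
    ir_factor = (VDD - min_v) / (0.1 * VDD)
    return max(1.0, em_factor, ir_factor)


loads = (0.03, 0.06)
k = uniform_upsizing_coefficient(loads)
print("upsizing coefficient:", k)
print("worst case after upsizing:", worst_case(loads, k))
```

The coefficient works for every configuration at once because uniform upsizing leaves the branch currents unchanged, so the worst current density and the worst IR drop both shrink by the same factor.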

3. Grid Unfolding
Because we considered only worst-case violations on the grid, minor violations are possible after optimization and unfolding.
These violations are minuscule and can be fixed by greedily upsizing each violating track.
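The greedy fix for a leftover violation can be sketched as a loop that widens one track in small increments until its current density is back under the limit. The sketch assumes the track's current stays fixed while it is widened, which is only a reasonable approximation for the minor violations described here; the function name, step size, and numbers are hypothetical.

```python
def greedy_upsize(width, thickness, current, j_max, step=0.05):
    """Widen a single track by `step` (fractional) until I / (w * t) <= j_max.

    Assumes the branch current does not change while the track is widened.
    Returns the new width and the number of widening steps applied.
    """
    steps = 0
    while current / (width * thickness) > j_max:
        width *= 1.0 + step
        steps += 1
    return width, steps


new_width, steps = greedy_upsize(
    width=1.0e-6,       # m (assumed)
    thickness=0.5e-6,   # m (assumed)
    current=1.1e-3,     # A (assumed), 10% over the limit at the initial width
    j_max=2.0e9,        # A/m^2 (assumed)
)
print(f"widened to {new_width * 1e6:.4f} um in {steps} steps")
```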

Experiments - Floorplans
[Figure: floorplans of high (H), medium (M), and low (L) current density blocks, with their power gating configurations. Legend: low/medium current density blocks, high current density blocks, gated blocks.]
• High density blocks located in the center of the grid, with power gating configurations.
• Low/medium density blocks located in the center of the grid, with power gating configurations.

Results
Experiments to observe:
• Various current density blocks (high, medium, low)
• Various power grid granularities: 20x20, 30x30, 50x50, 100x100
• All vs. some power gating configurations
Area savings compared to uniform upsizing:
• Up to 48% area savings
• 100x100 granularity grid with high density blocks placed at the center of the grid

Decoupling Capacitor vs. PSN
• Inserted decoupling capacitors (decaps) can provide charge to the switching circuit to reduce power supply noise (PSN).
• Decaps consume power due to switching
• PSN suppression depends on decap efficiency
[Figure: global grid with Vdd pads.]

Research Topics of Interest - 2
How to Use Decoupling Capacitors Most Efficiently?
• A decoupling capacitor is a reservoir of charge
• Used to reduce voltage drop at the switching current load
• Amount of charge supplied depends on:
  • Parasitic conductance between decap and current load
  • Parasitic conductance between decap and power supply
  • Switching frequency of the current load
[Figure: capacitor supplying charge through the interconnect to the current load.]

Decoupling Capacitance Effectiveness
• Decoupling capacitors suppress power supply noise
• Decaps reduce the impedance of the power delivery system at high frequencies
• Efficacy of decoupling capacitors depends upon:
  • Impedance of the conductors connecting the capacitor to current loads and power sources
  • Ability to recharge after a transition is completed
[Figure: two circuit models of the power delivery path. Labels: Vdd, Rpkg, Lpkg, Cpkg, Rgrid, Lgrid, Rgrid1, Lgrid1, Rgrid2, Lgrid2, Cdecap, Ccircuit, Iswitching_circuit.]

Decap Effectiveness in Mesh Grids
[Figure: 4x4 mesh with nodes numbered 1 to 16. Panels: (a) original mesh, (b) mesh A circuit, (c) mesh B circuit, (d) mesh C circuit.]

Decap Effectiveness on Mesh Grids
Detrimental decoupling capacitance.

Decap Effectiveness in Mesh Grids
[Figure: panel (c), mesh B circuit on the 4x4 mesh.]
Ineffective decoupling capacitance.

Decap Effectiveness in Mesh Grids
[Figure: panel (d), mesh C circuit on the 4x4 mesh.]
Effective decoupling capacitance.

Mesh Analysis
Decap effectiveness depends upon:
• Zd: this impedance determines how fast Cdecap will be recharged
• Zs: this impedance determines how much voltage drop appears at the switching circuit
• Zsd: this impedance determines how much current (charge) Cdecap can provide to the switching circuit
• tr, tf, Ipeak: switching frequency and current magnitude
• Cdecap: decap size
[Figure: mesh model. Labels: Vdd, Zs, Zd, Zsd, Cdecap, Iswitching_circuit.]

Decap’s effectiveness metrics
[Figure: Region of Effectiveness for Decap Insertion, with distances a, b, and u.]
• a: effective distance between decap and Vdd pin
• b: effective distance between current source and decap
• u: minimum distance between decap and Vdd pin to avoid spurious switching.

Decap Effectiveness Model
[Figure: mesh model. Labels: Vdd, Zs, Zd, Zsd, Cdecap, Iswitching_circuit.]
[Figure: Circuit Current Profile, current (peak Ipeak) vs. t, divided into the amount of charge provided from the Vdd supply and non-switching circuit decap, and the amount of charge that should be provided from inserted decaps.]
$$Q_{switching\_circuit} = \sum_{i=1}^{\text{nr of Vdd supplies}} Q_{supply_i} + Q_{nsw} + \sum_{i=1}^{\text{nr of decaps}} Q_{decap_i}$$
[Figure: Region of Effectiveness for Decap Insertion, with distances a, b, and u.]
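The charge split described on this slide (part of the switching charge comes from the Vdd supply and the non-switching circuit capacitance, the rest must come from inserted decaps) can be turned into a first-order sizing estimate. This is a rough sketch under stated assumptions: the current profile is taken to be triangular, the decap is sized with C = Q / ΔV for an allowed droop ΔV, and every name and number below is hypothetical.

```python
def triangular_charge(i_peak, t_rise, t_fall):
    """Charge under an assumed triangular current pulse: area = 1/2 * base * height."""
    return 0.5 * i_peak * (t_rise + t_fall)


def decap_for_budget(q_switching, q_supply, q_nsw, allowed_droop):
    """Charge the inserted decaps must supply, and the capacitance that
    delivers it while sagging by at most `allowed_droop` volts (C = Q / dV)."""
    q_decap = max(0.0, q_switching - q_supply - q_nsw)
    return q_decap, q_decap / allowed_droop


q_sw = triangular_charge(i_peak=0.1, t_rise=50e-12, t_fall=50e-12)  # 100 mA peak
q_decap, c_decap = decap_for_budget(
    q_switching=q_sw,
    q_supply=2e-12,     # C, assumed share delivered by the Vdd supply
    q_nsw=1e-12,        # C, assumed share from non-switching circuit capacitance
    allowed_droop=0.1,  # V, assumed 10% of a 1.0 V supply
)
print(f"Q_switching = {q_sw * 1e12:.2f} pC")
print(f"Q_decap     = {q_decap * 1e12:.2f} pC, C_decap = {c_decap * 1e12:.1f} pF")
```

This ignores the impedances Zs, Zd, and Zsd, which determine how much of that charge a decap at a given location can actually deliver in time.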

Decap Budget: Optimization Function
LP optimization problem
Subject to :
1) Voltage drop margin
2) Charge transfer balance
3) Allowed cap constraint
4) Efficiency metrics constraints

Sequence of Linear Programs
Cdecapi depends on the node voltage Vi; both Cdecapi and Vi are variables.
Sequence of linear programs:
1. Perform an initial transient analysis with the existing decaps and solve for the Vi's.
2. Determine the decap budgets Cdecapi from the LP formulation, using the node voltages determined in step 1.
3. Repeat the transient analysis with Cdecapi to check the node voltages. Update the node voltages Vi.
4. Check if Vi > Vthresh.
   a. If Vi > Vthresh + σ, run the decap budget to reduce decaps (step 2).
   b. If Vi < Vthresh - σ, run the decap budget to allocate more decaps (step 2).
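The control flow of this iteration can be sketched with toy stand-ins for its two expensive steps. The sketch below is not the LP or the transient simulator from the slides: `transient_analysis` is a lumped charge-sharing formula, `decap_budget` is a simple rescaling rule that replaces the LP, and all constants are assumptions chosen only to show the loop converging into the band Vthresh ± σ.

```python
VDD, VTHRESH, SIGMA = 1.0, 0.9, 0.005  # volts (assumed)
Q_SWITCH = 5e-12                       # C drawn per switching event (assumed)
C_PARASITIC = 10e-12                   # F already present at the node (assumed)


def transient_analysis(c_decap):
    """Toy stand-in for a transient simulation: lumped charge sharing."""
    return VDD - Q_SWITCH / (C_PARASITIC + c_decap)


def decap_budget(c_decap, v_node):
    """Toy stand-in for the LP: rescale the decap by droop / allowed droop.

    The factor is below 1 when the node sits above Vthresh (reduce decaps)
    and above 1 when it sits below Vthresh (allocate more decaps).
    """
    return c_decap * (VDD - v_node) / (VDD - VTHRESH)


def budget_loop(c_decap, max_iterations=20):
    v_node = transient_analysis(c_decap)          # step 1
    for iteration in range(max_iterations):
        if abs(v_node - VTHRESH) <= SIGMA:        # step 4: inside the band
            return c_decap, v_node, iteration
        c_decap = decap_budget(c_decap, v_node)   # step 2
        v_node = transient_analysis(c_decap)      # step 3
    return c_decap, v_node, max_iterations


c_final, v_final, iterations = budget_loop(20e-12)
print(f"C_decap = {c_final * 1e12:.2f} pF, V = {v_final:.4f} V, "
      f"budget updates = {iterations}")
```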

Case Study
Courtesy of STMicroelectronics

Experiments
Total Decap Reduction
• Total amount of decap reduced on chip: 297 pF
• Percentage: 5.56%
• Number of Filler Cells Reduction (placed decaps): 297 pF out of 623 pF => 52%

Correlations
Case Study              Max IR Drop (mV)   Power (W)
Apache’s Redhawk        51.8               0.645
Our method (before)     43.1               0.660
Our method (after)      43.7               0.660

Multi-Core System
• Several cores integrated on a chip
  • Chips with several cores have been produced
  • Tens to hundreds of cores per chip are envisioned
• Physical design problems:
  • Thermal management
  • Power management
  • Power delivery
  • Noise control
  • …

Research Topics of Interest - 3
How to Suppress Power Supply Noise?
Sources:
• Fast transient currents of switching blocks
• Turn on/off of power gated blocks
• Parasitic impedance of power tracks (package)
Detrimental Effects:
• Circuit delay increase
• Logical faults due to increased delay
[Figure: circuit with In, Out, Cload, and Vdd; voltage vs. time with Vdd and 90% Vdd levels; drain current (Id) vs. drain-source voltage (Vds).]

Multi-Core Systems
• Shared global grid
• Uniform distribution of controlled collapse chip connections (C4s)
[Figure: cores and macros under a global grid with Vdd C4 bumps.]
Objective: Assign tasks to cores such that minimum power supply noise is generated.

PSN vs. Workload Assignments
[Table: columns Cores Assigned, Assignment Workloads, Power Supply Noise (V*ps). Cores assigned: 1-2-4 or 1-5-9. Workloads: W3-W3-W3, W2-W2-W2, W1-W2-W3. PSN values as extracted: 2.56, 0.06, 1.98, 0.06, 1.82, 1.83.]
[Figure: 3x3 array of cores numbered 1 to 9, shown for each assignment.]
• PSN vs. proximity between working cores
• PSN vs. available decap
• PSN vs. operating frequencies

Grid Models
[Figure: core grid of nine cores (Core1 to Core9), global grid with Vdd C4 bumps, and base grid with four Vdd connections.]

Circuit Reduction
[Figure: (a) base grid with nodes 1 to 9 and four Vdd connections; (b) simplified model with Vdd, r, R, L, C, Ceff, and Iload.]
• Reducing base grid (a) to a simplified model (b)
• Circuit voltage response is maintained for the worst-case voltage drop
• Assumption: the worst-case voltage drop is on node 5

Power Supply Noise Aware Assignment
We apply a simulated annealing (SA) based algorithm to minimize PSN.
• A workload can be assigned to any core
• Task assignments on cores will vary due to:
  • Location: same task at different locations
  • Frequency: same location but varying workloads
  • Location and Frequency
[Figure: workloads labeled wh, wm, wl.]
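A simulated-annealing assignment of workloads to cores can be sketched as follows. The slides do not give the cost function or the move set, so everything here is an assumption: `psn_proxy` is a made-up noise proxy in which high-current workloads on nearby cores cost more, a move relocates one workload to a free core, and the schedule parameters are arbitrary. In the actual flow the cost would come from the reduced grid model.

```python
import math
import random


def psn_proxy(assignment, currents, n):
    """Hypothetical noise proxy: sum of I_a * I_b / Manhattan distance over
    all workload pairs placed on an n x n array of cores."""
    cost = 0.0
    for a in range(len(assignment)):
        for b in range(a + 1, len(assignment)):
            ra, ca = divmod(assignment[a], n)
            rb, cb = divmod(assignment[b], n)
            cost += currents[a] * currents[b] / (abs(ra - rb) + abs(ca - cb))
    return cost


def anneal(currents, n, start, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over one-workload-per-core assignments.

    Requires fewer workloads than cores so a free core always exists.
    Returns the best assignment seen and its cost.
    """
    rng = random.Random(seed)
    current = list(start)
    cost = psn_proxy(current, currents, n)
    best, best_cost = list(current), cost
    temp = t0
    for _ in range(steps):
        candidate = list(current)
        free = [c for c in range(n * n) if c not in candidate]
        candidate[rng.randrange(len(candidate))] = rng.choice(free)
        new_cost = psn_proxy(candidate, currents, n)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new_cost < cost or rng.random() < math.exp((cost - new_cost) / temp):
            current, cost = candidate, new_cost
            if cost < best_cost:
                best, best_cost = list(current), cost
        temp *= cooling
    return best, best_cost


currents = [1.0, 2.0, 3.0]  # relative workload currents (assumed)
start = [0, 1, 3]           # zero-based cores, i.e. cores 1-2-4 of a 3x3 array
best, best_cost = anneal(currents, 3, start)
print("start cost:", psn_proxy(start, currents, 3))
print("best assignment:", best, "cost:", best_cost)
```

Keeping the best assignment seen, rather than the final state, guarantees the result is never worse than the initial assignment.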

Assignment Heuristics
Current Demand-Based Assignment (CDA)
• Workloads are assigned to cores that are farther away from large-current workloads to minimize noise propagation.
[Figure: workloads W1 and W2 placed away from a large current workload.]
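One way to read the CDA heuristic is as a greedy placement: place the largest-current workload first, then put each remaining workload on the free core that is farthest from the cores already in use. The sketch below follows that reading only; the ordering rule, the Manhattan distance metric, the choice of the first core, and the function names are assumptions, not details given in the slides.

```python
def manhattan(core_a, core_b, n):
    """Manhattan distance between two cores of an n x n array (row-major indices)."""
    ra, ca = divmod(core_a, n)
    rb, cb = divmod(core_b, n)
    return abs(ra - rb) + abs(ca - cb)


def cda_assign(currents, n, first_core=0):
    """Greedy current-demand-based placement (one hypothetical interpretation).

    Workloads are handled in order of decreasing current. Each one goes to the
    free core whose nearest occupied core is as far away as possible.
    Returns a list mapping workload index to core index.
    """
    order = sorted(range(len(currents)), key=lambda w: currents[w], reverse=True)
    assignment = [None] * len(currents)
    used = []
    for workload in order:
        if not used:
            core = first_core
        else:
            free = [c for c in range(n * n) if c not in used]
            core = max(free, key=lambda c: min(manhattan(c, u, n) for u in used))
        assignment[workload] = core
        used.append(core)
    return assignment


# Three workloads with relative currents 1, 3, 2 on a 3x3 array of cores.
print(cda_assign([1.0, 3.0, 2.0], 3))
```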

Experiments
Experiments to observe:
• Various core granularities: 3x3, 5x5, 7x7, 10x10
• Various operating frequencies
• Various core sizes
• Impact of initial task assignment on the multicore system
Results:
• No initial assignment: up to 30% less PSN compared to the CDA method
• With initial assignment: up to 37% less PSN compared to the CDA method

Conclusions
On-chip power distribution for low-power applications
• Power gating induced electromigration issues in the power networks
  • Analysis and optimization of power network
• Analysis of decoupling capacitance efficiency in power grids
  • Decoupling capacitance placement in power networks
• Low power supply noise task assignment for multicore systems
  • Analysis of multicore systems power network
  • Task assignment optimization for low power noise