Chapter 8
Optimizing Power @ Standby – Circuits and Systems
Jan M. Rabaey
Slide 8.1
Chapter Outline
Why Sleep Mode Management?
Dynamic power in standby
– Clock gating
Static power in standby
– Transistor sizing
– Power gating
– Body biasing
– Supply voltage ramping
Slide 8.2
Arguments for Sleep Mode Management
Many computational applications operate in burst modes, interchanging active and non-active modes
– General-purpose computers, cell phones, interfaces, embedded processors, consumer applications, etc.
Prime concept: Power dissipation in standby should be absolutely minimum, if not zero
Sleep mode management has gained importance with increasing leakage
Clock gating
Leakage elimination
Slide 8.3
Standby Power Was Not a Concern in Earlier Days
Pentium-1: 15 W (5 V - 66 MHz)
Pentium-2: 8 W (3.3 V - 133 MHz)
Floating-point unit and cache powered down when not in use
Processor in idle mode!
[Source: Intel]
Slide 8.4
Dynamic Power – Clock Gating
Turn off clocks to idle modules
– Ensure that spurious activity is set to zero
Must ensure that data inputs to the module are in stable mode
– Primary inputs are from gated latches or registers
– Or, disconnected from interconnect network
Can be done at different levels of system hierarchy
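The effect of gating can be sketched with a small behavioral model. The class and its names (`GatedRegister`, `tick`, `clock_events`) are illustrative assumptions, not part of the slides: when the enable is low, the clock edge never reaches the register, so it neither toggles nor loses its state.

```python
class GatedRegister:
    """Behavioral model of a register behind a clock gate (hypothetical names)."""

    def __init__(self, initial=0):
        self.q = initial          # stored state
        self.clock_events = 0     # clock edges that actually reached the register

    def tick(self, d, enable):
        """Advance one clock cycle; the edge is delivered only if enable is set."""
        if enable:
            self.clock_events += 1
            self.q = d
        return self.q


def run(trace):
    """Apply a list of (data, enable) pairs and return the register afterwards."""
    reg = GatedRegister()
    for d, enable in trace:
        reg.tick(d, enable)
    return reg


# Module is active for 3 of 10 cycles; the data input keeps changing while idle.
trace = [(i, i < 3) for i in range(10)]
reg = run(trace)
print(reg.clock_events, reg.q)
```

In this trace the register receives only the edges of the active cycles and holds its last value through the idle ones. The model ignores the power of the gating logic itself.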
Slide 8.5
Clock Gating
Turning off the clock to non-active components
[Figure: register file and logic module, with Clk gated by Enable]
Disconnecting the inputs
[Figure: logic module connected to the bus through Enable]
Slide 8.6
Clock Gating Efficiently Reduces Power
MPEG-4 decoder
90% of FFs clock-gated.
70% power reduction by clock gating alone.
[Figure: power (mW) of the decoder, 30.6 mW without clock gating and 8.5 mW with clock gating; block labels DSP/HIF, DEU, MIF, VDE, 896 Kb SRAM]
[Ref: M. Ohashi, ISSCC’02]
© IEEE 2002
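The quoted saving follows from the two power figures on this slide. A minimal check, using only the 30.6 mW and 8.5 mW values (the variable names are illustrative):

```python
# Total decoder power from the slide, in mW.
p_without_gating_mw = 30.6
p_with_gating_mw = 8.5

# Fraction of power removed by clock gating.
reduction = 1 - p_with_gating_mw / p_without_gating_mw
print(f"{reduction:.1%}")
```

The result is a little over 70%, consistent with the rounded figure on the slide.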
Slide 8.7
Clock Gating
Challenges to skew management and clock distribution (load on clock network varies dynamically)
Fortunately state-of-the-art design tools are starting to do a better job
– For example, physically aware clock gating inserts gaters in clock tree based on timing constraints and physical layout
[Figure: clock tree with clock gaters (CG) at different levels, annotated “Simpler skew management, less area” and “Power savings”]
Slide 8.8
Clock Hierarchy and Clock Gating
Example: Clock distribution of dual-core Intel Montecito processor
“Gaters” provided at lower clock-tree levels
Automatic skew compensation
[Ref: T. Fischer, ISSCC’05]
© IEEE 2005
Slide 8.9
Trade-Off Between Sleep Modes and Sleep Time
Typical operation modes
– Active mode: normal processing
– Standby mode: fast resume, high passive power
– Sleep mode: slower resume, low passive power
Resume time from clock gating determined by the time it takes to turn on the clock distribution network
Standby options:
– Just gate the clock to the module in question
– Turn off phase-locked loop(s)
– Turn off clock completely
Slide 8.10
Sleep Modes in μProcessors and μControllers
[Ref: S. Gary, Springer’95]
[Ref: TI’06]
TI MSP430™
• 0.1-μA power down
• 0.8-μA standby
• 250-μA/MIPS @ 3 V
From standby to active in 1 μs using dual clock system
Slide 8.11
The Standby Design Exploration Space
[Figure: design space with axes Standby Power and Wake-up Delay; operating points Standby, Sleep, Nap, Doze]
Trade-off between different operational modes
Should blend smoothly with runtime optimizations
Slide 8.12
Also the Case for Peripheral Devices
Wireless LAN Card
Hard disk
          Psleep (W)   Pactive (W)   Tsleep (sec)   Tactive (sec)
IBM       0.75         3.48          0.51           6.97
Fujitsu   0.13         0.95          0.67           1.61
[Ref: T. Simunic, Kluwer’02]
Slide 8.13
The Leakage Challenge – Power in Standby
With clock gating employed in most designs, leakage power has become the dominant standby power source
With no activity in module, leakage power should be minimized as well
– Remember constant ratio between dynamic and static power …
Challenge – how to disable unit most effectively given that no ideal switches are available
Slide 8.14
Standby Static Power Reduction Approaches
Transistor stacking
Power gating
Body biasing
Supply voltage ramping
Slide 8.15
Transistor Stacking
Off-current reduced in complex gates (see leakage power reduction @ design time)
Some input patterns more effective than others in reducing leakage
Effective standby power reduction strategy:
– Select input pattern that minimizes leakage current of combinational logic module
– Force inputs of module to correspond to that pattern during standby
Pros: Little overhead, fast transition
Con: Limited effectiveness
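The pattern-selection step can be sketched as an exhaustive search over the input vectors of a small module. Everything here is an assumption made for illustration: the three-gate NAND circuit, the relative per-pattern leakage numbers in `NAND2_LEAKAGE`, and the function names. Real flows would take leakage values from library characterization and use heuristics for wide modules.

```python
from itertools import product

# Hypothetical relative leakage of a 2-input NAND for each input pattern (a, b).
# (0, 0) is lowest because both stacked NMOS devices are off (stack effect).
NAND2_LEAKAGE = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 3.0, (1, 1): 9.0}


def nand(a, b):
    return 1 - (a & b)


def circuit_leakage(vector):
    """Total leakage of a toy module: two NANDs feeding a third NAND."""
    a, b, c = vector
    n1 = nand(a, b)
    n2 = nand(b, c)
    return (NAND2_LEAKAGE[(a, b)]
            + NAND2_LEAKAGE[(b, c)]
            + NAND2_LEAKAGE[(n1, n2)])


def min_leakage_vector(n_inputs, leakage_fn):
    """Try every input pattern and return the one with the lowest leakage."""
    return min(product((0, 1), repeat=n_inputs), key=leakage_fn)


best = min_leakage_vector(3, circuit_leakage)
print(best, circuit_leakage(best))
```

During standby, the input latches would be forced to `best`. The search is exponential in the number of inputs, which is one reason the technique is usually applied per module rather than chip-wide.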
Slide 8.16
Transistor Stacking
[Figure: combinational module between input and output latches, with Clk and Standby control signals]
[Ref: S. Narendra, ISLPED’01]
Slide 8.17
Forced Transistor Stacking
Useful for reducing leakage in non-critical shallow gates (in addition to high VTH)
[Ref: S. Narendra, ISLPED’01]
Slide 8.18
Power Gating
Disconnect module from supply rail(s) during standby
Footer or header transistor, or both
Most effective when high-VTH transistors are available
Easily introduced in standard design flows
But … impact on performance
Very often called “MTCMOS” (when using high- and low-threshold devices)
[Figure: logic block with sleep-controlled header and footer transistors]
[Ref: T. Sakata, VLSI’93; S. Mutoh, ASIC’93]
Slide 8.19
Power Gating – Concept
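Leakage current reduces because
– Increased resistance in leakage path
– Stacking effect introduces source biasing (similar effect at PMOS side)
[Figure: gate with IN = 0 between VDD and the sleep transistor; the sleep transistor is modeled as a resistance RS below M1, so the leakage current Ileak raises the source of M1 to VS = Ileak × RS; annotations: VTH shift, extra resistance]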
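The relation VS = Ileak × RS is self-referential: a higher VS reverse-biases the source of M1 and cuts Ileak, which in turn lowers VS. A rough sketch of the resulting operating point is given below. The exponential leakage model, the lumped sensitivity `k`, and all numeric values are assumptions for illustration, not data from the slides.

```python
def leakage_current(v_s, i0, k=1.5, swing=0.1):
    """Subthreshold leakage (A) of M1 when its source sits at v_s volts.

    i0    : leakage with the source grounded (assumed value)
    k     : hypothetical factor lumping negative VGS, body effect and reduced DIBL
    swing : subthreshold swing in V/decade (assumed value)
    """
    return i0 * 10 ** (-k * v_s / swing)


def virtual_node_voltage(i0, r_s, iters=60):
    """Solve v = leakage_current(v) * r_s by bisection on [0, i0 * r_s]."""
    lo, hi = 0.0, i0 * r_s
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if leakage_current(mid, i0) * r_s > mid:
            lo = mid   # resistor drop exceeds guess: true solution is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


i0 = 1e-6    # 1 uA leakage without a sleep transistor (illustrative)
r_s = 1e6    # 1 Mohm off-resistance of the sleep transistor (illustrative)
v_s = virtual_node_voltage(i0, r_s)
print(v_s, leakage_current(v_s, i0) / i0)
```

With these numbers the source settles at a few tens of millivolts and the leakage drops by roughly an order of magnitude, which illustrates why even a modest source bias is effective.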
Slide 8.20
Power Gating Options
[Figure: three power-gating options for a low-VTH logic block with sleep-controlled switches: footer + header, footer only, header only]
NMOS sleeper transistor more area-efficient than PMOS
Leakage reduction more effective (under all input patterns) when both footer and header transistors are present
Slide 8.21
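Other Option: Boosted-Gate MOS (BGMOS)
CMOS logic
– low VTH
– thin TOX
Leak cut-off Switch (LS)
– high VTH
– thick TOX (eliminates tunneling)
[Figure: CMOS logic between VDD and virtual GND, with the LS below virtual GND; LS gate levels 0 V, VDD, and VBOOST shown for <Standby> and <Active>]
[Ref: T. Inukai, CICC’00]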
Slide 8.22
Other Option: Boosted-Sleep MOS
CMOS logic
– low VTH
– thin TOX
Leak cut-off Switch (LS)
– normal (or high) VTH
– normal TOX
Area-efficient
(also called Super-Cutoff CMOS or SCCMOS)
[Figure: CMOS logic between VDD and virtual GND, with the LS below virtual GND; LS gate levels –Vboost, 0 V, and VDD shown for <Standby> and <Active>]
[Ref: T. Inukai, CICC’00]
Slide 8.23
Virtual Supplies
[Figure: virtual VDD and virtual GND rails between VDD and GND. Active mode: sleep transistors ON, noise on virtual supplies. Standby mode: sleep transistors OFF, virtual supply collapse]
[Ref: J. Tschanz, JSSC’03]
© IEEE 2003
Slide 8.24
Decoupling Capacitor Placement
[Figure: decap on supply rails versus decap on virtual rails, compared on performance, convergence time, and oxide leakage savings]
[Ref: J. Tschanz, JSSC’03]
© IEEE 2003
Slide 8.25
Leakage Power Savings versus Decap
[Figure: normalized leakage power in idle mode (0 to 1) versus idle time (10 ns to 10 ms) at 1.32 V, 75°C, for no decap on virtual VCC and for a low-leakage 133 nF decap on virtual VCC; annotations at 90% and 40%]
[Ref: J. Tschanz, JSSC’03]
© IEEE 2003
Slide 8.26
How to Size the Sleep Transistor?
Sleep transistor is not free – it will degrade the performance in active mode
Circuits in active mode see the sleep transistor as extra power-line resistance
– The wider the sleep transistor, the better
Wide sleep transistors cost area
– Minimize the size of the sleep transistor for given ripple (e.g., 5%)
– Need to find the worst-case vector
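The sizing rule can be sketched as a first-order calculation: the allowed ripple fixes the maximum on-resistance, and the on-resistance fixes the width. The resistance-per-width value, the per-vector currents, and the function name below are hypothetical, chosen only to make the arithmetic concrete; the model treats the sleep transistor as a linear resistor.

```python
def sleep_transistor_width_um(vdd, ripple_fraction, i_peak_a, r_on_ohm_um):
    """Minimum sleep-transistor width (um) for a given virtual-rail ripple.

    r_on_ohm_um is the on-resistance of a 1 um wide device in ohm*um
    (an assumed technology parameter); resistance scales as 1 / width.
    """
    v_drop_max = ripple_fraction * vdd      # allowed drop on the virtual rail
    r_max = v_drop_max / i_peak_a           # largest tolerable on-resistance
    return r_on_ohm_um / r_max


# Hypothetical peak supply currents (A) of the gated block for three input vectors.
vector_currents_a = {"vec_a": 0.04, "vec_b": 0.10, "vec_c": 0.07}
i_worst = max(vector_currents_a.values())   # size for the worst-case vector

width = sleep_transistor_width_um(vdd=1.0, ripple_fraction=0.05,
                                  i_peak_a=i_worst, r_on_ohm_um=1000.0)
print(width)
```

Sizing for any vector other than the worst case would violate the ripple budget whenever the worst-case vector occurs, which is why finding that vector matters.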
Slide 8.27
Sleep Transistor Sizing
High-VTH transistor must be very large for low resistance in linear region
Low-VTH transistor needs less area for same resistance.
[Ref: R. Krishnamurthy, ESSCIRC’02]
Slide 8.28
Preserving State
Virtual supply collapse in sleep mode causes the loss of state in registers
Keeping the registers at nominal VDD preserves the state
– These registers leak …
Can lower the VDD in sleep
– Some impact on robustness, noise, and soft-error immunity
Slide 8.29
Latch-Retaining State During Sleep
[Figure: latch with input D, output Q, Clk-controlled transmission gate, and sleep-controlled devices]
Black-shaded devices use low-VTH transistors
All others are high-VTH.
[Ref: S. Mutoh, JSSC’95]
Slide 8.30
MTCMOS Derivatives Preventing State Loss
[Figure: two MTCMOS derivatives, labeled Clamping and Retention. Each shows low-VTH logic with a high-VTH sleep transistor, one on virtual VDD and one on virtual GND; additional labels: High-VTH (small W), HVT, Vretain]
Reduce voltage and retain state
Slide 8.31
Sleep Transistor Placement
[Figure: standard cell rows with “strapper” cells. No sleep transistors: VDD and GND rails only. With headers and footers: virtual rails VDD′ and GND′ added next to VDD and GND, using M3 and M4]
Slide 8.32
Dynamic Body Biasing
Increase thresholds of transistors during sleep using reverse body biasing
– Can be combined with forward body biasing in active mode
No delay penalty
But
Requires triple-well technology
Limited range of threshold adjustments (<100 mV)
– Not improving with technology scaling
Limited leakage reduction (<10x)
Energy cost of charging/discharging the substrate capacitance
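The two limits above are linked. In the subthreshold regime, leakage falls by one decade for every subthreshold swing S of threshold increase, so the achievable threshold shift bounds the leakage reduction. The sketch below assumes S = 100 mV/decade, an illustrative value that is not given in the slides.

```python
def leakage_reduction_factor(delta_vth_v, swing_v_per_decade=0.1):
    """Factor by which subthreshold leakage drops for a VTH increase.

    swing_v_per_decade is the subthreshold swing S in V/decade
    (0.1 V/decade is an assumed, typical-looking value).
    """
    return 10 ** (delta_vth_v / swing_v_per_decade)


factor = leakage_reduction_factor(0.1)   # 100 mV threshold shift
print(factor)
```

Under this assumption a 100 mV shift gives about one decade, in line with the "<100 mV" and "<10x" limits quoted on the slide.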
Slide 8.34
Dynamic Body Biasing
Active mode: Forward Body Bias (FBB)
– Low threshold, high performance
Standby mode: Reverse Body Bias (RBB)
– High threshold, low leakage
[Figure: PMOS and NMOS bodies driven by PMOS bias and NMOS bias lines between VDD and GND; in standby mode the bias levels are VHIGH and VLOW]
Can also be used to compensate for threshold variations
[Refs: T. Kuroda, ISSCC’96; J. Tschanz, JSSC’03]
© IEEE 2003
Slide 8.35
The Dynamics of Dynamic Body Bias
Needs level-shifting and voltage switch circuitry
[Figure: level shifter and voltage switch (transistors M1–M5, control signal CE, nodes V1–V4, capacitances CE and CW). Supplies: VNBB (4 V), VDD (2 V), VSS (0 V), VPBB (–2 V). VNwell is 2 or 4 V; VPwell is 0 or –2 V]
[Figure: waveforms of V1–V4, VNwell, and VPwell, voltage (–2 to 4 V) versus time (0 to 200 ns), covering the standby → active and active → standby transitions, with CE on and CE off marked]
[Ref: K. Seta, ISSCC’95]
© IEEE 1995
Slide 8.36
Body Bias Layout
[Figure: ALU layout showing the ALU core LBGs and the sleep transistor LBGs]
LBG: Local bias generator
[Ref: J. Tschanz, JSSC’03]
Slide 8.37
DBB for Standby Leakage Reduction - Example
Application-specific processor (SH-mobile)
– 250 nm technology
– core at 1.8 V
– I/O at 3.3 V
– 3.3M transistors
[Figure label: VBC (0.13 mm²)]
[Ref: M. Miyazaki, Springer’06]
© Springer 2006
Slide 8.38
Effectiveness of Dynamic Body Biasing
[Figure: VTH (V), 0 to 0.6, versus VBS (V), –2 to 2, with reverse VBS and forward VBS regions marked]
Practical VTH tuning range less than 150 mV in 90 nm technology
Slide 8.39
Supply Voltage Ramping (SVR)
Reduce supply voltage of modules in sleep mode
– Can go to 0 V if no state retention is necessary
– Down to state retention voltage otherwise (see Memory in next chapter), or move state to persistent memory before power-down
Most effective leakage reduction technique
– Reduces current and voltage
But
Needs controllable voltage regulator
– Becoming present more often in modern integrated system designs
Longer reactivation time
Simplified version switches between VDD and GND (or VDDL)
[Ref: M. Sheets, VLSI’06]
Slide 8.40
Supply Ramping
Standby power = VDD(standby) × Ileak(standby)
Modules must be isolated from neighbors
Creating “voltage islands”
[Figure: full power-down (module supply switched between VDD and 0) and power-down with data retention (module supply switched between VDD and DRV)]
Slide 8.41
Supply Ramping – Impact
[Figure: leakage power (0 to 4 × 10^–9) as a function of the supply voltage (0 to 1 V) in 90 nm, for an inverter and a NAND4; annotation: factor 8.5]
Because of DIBL, dropping supply voltage causes dramatic reduction in leakage
– Can go as low as 300 mV before data retention is lost
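Lowering the supply helps twice: standby power is the product of supply voltage and leakage current, and DIBL makes the leakage current itself fall with the supply. A first-order sketch is given below. The DIBL coefficient, the subthreshold swing, and the reference current are hypothetical, so the resulting ratio is not expected to reproduce the factor shown in the figure.

```python
def leakage_current(vdd, i0=1e-9, vdd_nom=1.0, dibl=0.1, swing=0.1):
    """Leakage current (A) at supply voltage vdd.

    i0      : leakage at the nominal supply vdd_nom (assumed)
    dibl    : threshold shift per volt of drain bias, V/V (assumed)
    swing   : subthreshold swing, V/decade (assumed)
    A lower vdd raises VTH through DIBL, and each `swing` volts of VTH
    increase cuts the current by 10x.
    """
    return i0 * 10 ** (dibl * (vdd - vdd_nom) / swing)


def standby_power(vdd):
    """Standby power = VDD(standby) x Ileak(standby)."""
    return vdd * leakage_current(vdd)


# Compare nominal supply with a 300 mV retention level.
ratio = standby_power(1.0) / standby_power(0.3)
print(ratio)
```

With these assumed parameters, roughly a factor of 3 comes from the voltage term and a factor of 5 from the current term.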
Slide 8.42
Integration in Standard-Cell Layout Methodology
Power switch cell easily incorporated into standard design flow
– Cell has same pitch as existing components
– No changes required to cell library from foundry
Switch design can be independent of block size
[Figure: power switch cell with signals Awake and Awake_buf, rails VDDH, VDDL (RV), and GND, and switched rail VvDD; integration into the power grid with VDDH, VDDL, and GND straps]
Slide 8.43
Some Long-Term Musings
Ideal power-off switch should have zero leakage current (S = 0 mV/decade)
Hard to accomplish with traditional electronic devices
Maybe possible using MEMS – mechanical switches have a long-standing reputation for good isolation
[Ref: N. Abele, IEDM’05]
Slide 8.45
Summary and Perspectives
Today’s designs are not leaky enough to be truly power–performance optimal! Yet, when not switching, circuits should not leak!
Clock gating effectively eliminates dynamic power in standby
Effective standby power management techniques are essential in sub-100 nm design
– Power gating the most popular and effective technique
– Can be supplemented with body biasing and transistor stacking
– Voltage ramping probably the most effective technique in the long range (if gate leakage becomes a bigger factor)
Emergence of “voltage or power” domains
Slide 8.46
References
Books and Book Chapters
V. De et al., “Techniques for Leakage Power Reduction,” in A. Chandrakasan et al., Design of High-Performance Microprocessor Circuits, Ch. 3, IEEE Press, 2001.
K. Roy et al., “Circuit Techniques for Leakage Reduction,” in C. Piguet, Low-Power Electronics Design, Ch. 13, CRC Press, 2005.
S. Narendra and A. Chandrakasan, Leakage in Nanometer CMOS Technologies, Springer, 2006.
Articles
N. Abele, R. Fritschi, K. Boucart, F. Casset, P. Ancey, and A.M. Ionescu, “Suspended-gate MOSFET: bringing new MEMS functionality into solid-state MOS transistor,” Proc. Electron Devices Meeting, 2005. IEDM Technical Digest. IEEE International, pp. 479–481, Dec. 2005.
T. Fischer et al., “A 90-nm variable frequency clock system for a power-managed Itanium® architecture processor,” IEEE J. Solid-State Circuits, pp. 217–227, Feb. 2006.
S. Gary, “Low-Power Microprocessor Design,” in Low Power Design Methodologies, Ed. J. Rabaey and M. Pedram, Chapter 9, pp. 255–288, Kluwer Academic, 1995.
T. Inukai et al., “Boosted Gate MOS (BGMOS): Device/Circuit Cooperation Scheme to Achieve Leakage-Free Giga-Scale Integration,” CICC, pp. 409–412, May 2000.
H. Kam et al., “A new nano-electro-mechanical field effect transistor (NEMFET) design for low-power electronics,” IEDM Tech. Digest, pp. 463–466, Dec. 2005.
R. Krishnamurthy et al., “High-performance and low-power challenges for sub-70 nm microprocessor circuits,” 2002 IEEE ESSCIRC Conf., pp. 315–321, Sep. 2002.
T. Kuroda et al., “A 0.9 V 150 MHz 10 mW 4 mm² 2-D discrete cosine transform core processor with variable-threshold-voltage scheme,” JSSC, 31(11), pp. 1770–1779, Nov. 1996.
M. Miyazaki et al., “Case study: Leakage reduction in Hitachi/Renesas microprocessors,” in S. Narendra and A. Chandrakasan, Leakage in Nanometer CMOS Technologies, Ch. 10, Springer, 2006.
T. Simunic, “Dynamic Management of Power Consumption,” in Power Aware Computing, edited by R. Graybill, R. Melhem, Kluwer Academic Publishers, 2002.
Slide 8.47
References (cont.)
S. Mutoh et al., “1V high-speed digital circuit technology with 0.5 µm multi-threshold CMOS,” Proc. Sixth Annual IEEE ASIC Conference and Exhibit, pp. 186–189, Sep. 1993.
S. Mutoh et al., “1-V power supply high-speed digital circuit technology with multithreshold-voltage CMOS,” IEEE Journal of Solid-State Circuits, 30, pp. 847–854, Aug. 1995.
S. Narendra et al., “Scaling of stack effect and its application for leakage reduction,” ISLPED, pp. 195–200, Aug. 2001.
M. Ohashi et al., “A 27MHz 11.1mW MPEG-4 video decoder LSI for mobile application,” ISSCC, pp. 366–367, Feb. 2002.
T. Sakata, M. Horiguchi and K. Itoh, “Subthreshold-current reduction circuits for multi-gigabit DRAM's,” Symp. VLSI Circuits Dig., pp. 45–46, May 1993.
K. Seta, H. Hara, T. Kuroda, M. Kakumu and T. Sakurai, “50% active-power saving without speed degradation using standby power reduction (SPR) circuit,” IEEE International Solid-State Circuits Conference, XXXVIII, pp. 318–319, Feb. 1995.
M. Sheets et al., “A Power-Managed Protocol Processor for Wireless Sensor Networks,” Digest of Technical Papers 2006 Symposium on VLSI Circuits, pp. 212–213, June 15–17, 2006.
TI MSP430 Microcontroller family, http://focus.ti.com/lit/Slab034n/slab034n.pdf
J. W. Tschanz, S. G. Narendra, Y. Ye, B. A. Bloechel, S. Borkar and V. De, “Dynamic sleep transistor and body bias for active leakage power control of microprocessors,” IEEE Journal of Solid-State Circuits, 38, pp. 1838–1845, Nov. 2003.
Slide 8.48