Post on 21-Dec-2015
C. H. Ziesler et al., 2003
Energy Recovering Computers
1 Advanced Computer Architecture LaboratoryUniversity of Michigan
Conrad H. Ziesler 1 Joohee Kim 1 Suhwan Kim 1 2
Marios C. Papaefthymiou 1
2 T. J. Watson Research CenterIBM Research Division
C. H. Ziesler et al., 2003
Conventional CMOS Operation
_+
Vdd
Vss
R
CR
time
• Constant supply voltage• One-way street to ground• High voltage drops across conducting devices• Energy dissipation grows with CV2
Vdd
IR
VR
PR
VC
ER
C. H. Ziesler et al., 2003
Charge Recovery
Vpc
R
C
time
• Time-varying voltage supply Vpc• Charge recovered from load C • Charge transfer through resistor is spread over available time• Energy dissipation grows with (RC/T) CV2
~ Vpc
IR
VR PR
ER
Vdd
VC
C. H. Ziesler et al., 2003
Quick Glance at History
● Physics (1970s)
– Logical reversibility of computation– Connection to thermodynamics (“adiabatic” computing)– No absolute minimum to energy dissipation, if computing
is arbitrarily slow. (Does not have to be slow, however.)
● Engineering (1990s)
– Logic circuitry– Energy recycling circuitry– VLSI prototyping
C. H. Ziesler et al., 2003
The Design Space
Complex web of intertwined issues
1. Which capacitance to recover from?
2. How to store/reuse recovered energy?
3. What circuits to do the recovery?
1. 2. 3.
C. H. Ziesler et al., 2003
• The case for reversible computation P. Solomon and D. Frank – SLPE'94
• Asymptotically zero-energy split-level charge recovery logic S.G. Younis and T. F. Knight, Jr. – SPLE’94
• Clock-powered CMOS: A hybrid adiabatic logic style for energy-efficient computing N. Tzartzanis and W. C. Athas – ARVLSI'99
And many, many others.... Few real working chips, however.
Sample Design Points from’90s
C. H. Ziesler et al., 2003
Our Contributions
• Multiplier chip (2001) Custom dynamic circuitry with fine-grain recovery from outputs of individual gates
• Discrete wavelet transform chip (2003) Standard-cell ASIC design with recovery of clock power
• Guiding principles– Simplicity– Minimal control and hardware overhead
• Key attributes– Single-phase sinusoidal waveform for power and synchronization– Integrated power-clock generators with inductive elements– Relatively high operating speeds (200MHz+ in silicon)– Higher energy efficiency than conventional counterparts
C. H. Ziesler et al., 2003
•Minimalist approach– Simple tools: magic and spice– Low-cost standard CMOS process: HP 0.5m, 40-pin DIP package, through MOSIS.
•Operational chip demonstrates practicality of energy-recovering circuit design
– Non-trivial size (8-bit operands, on-chip clock, self-test)– High throughput– Low energy dissipation
Multiplier Chip
• Suhwan Kim (while still at UM) and Conrad Ziesler received First Prize in VLSI Design Contest, DAC 2001.
C. H. Ziesler et al., 2003
• Working 1st silicon• Simple yet effective experimental setup.• Correct operation verified @ 130 MHz.• Energy efficient operation validated by measurements.
Low-Energy High-Speed Operation
C. H. Ziesler et al., 2003
Energy Comparison
2.7V
1.9V
2.2V1.9V
1.6V
2.9V
50 100 200
Frequency (MHz)
100
200
300
400
500
Dis
sipa
tion
per
Cyc
le (
pJ)
0
140
Energy recovery
4 stage static CMOS
2.3V
3.0V
3.0V
2.0V
8 stage static CMOS
2 stage static CMOS
4x
C. H. Ziesler et al., 2003
Simulations vs. Measurements
020406080
40 50 60 70 90 100 110 120 130
Frequency (MHz)
Dis
crep
ancy
(%
)
• Good correspondence up to 100MHz• Simulations gave consistently higher dissipation
C. H. Ziesler et al., 2003
Test Chip Overview
•Two multipliers with self-test per chip (minimum size die)• Integrated power-clock generator•Resonant LC oscillator
C. H. Ziesler et al., 2003
Input BILBO (self-test)
Product array
Multiplicand buffers
Result summation
Result buffers
Self-test Control
Output BILBO (self-test)
Multiplier and Self-Test
• 9,048 devices in multiplier array, 2,806 devices in self-test circuitry• Implemented entirely in energy-recovering dynamic logic family.
C. H. Ziesler et al., 2003
Energy-Recovering Dynamic Logic
• Sense amplifier• Precharge diodes• Current switches• Evaluation tree• Current tail• Power clock
Vss
bias
at
btaf bf
afot
NMOS NAND gatePower clock
• Single-phase sinusoidal power-clock• Minimum-size low-swing evaluation tee• Built-in state element• Dual-rail noise-tolerant design
C. H. Ziesler et al., 2003
Cascade Operation
• NMOS and PMOS gates are cascaded like domino logic.
• Animation shows tokens propagating through buffer chain.
C. H. Ziesler et al., 2003
• Zero-voltage switching
• LC- resonant clock generation
• External/bondwire inductor L
• Resistive/capacitive adiabatic load
• Compact: 170 x 115 m
Single-Phase Power-Clock Generator
Power Clock
Pulse Gen
Ring Osc
Gate Drive
Power Switches: S1, S2
25 tr. 19 tr. 10 tr.
RingOsc.
PulseGen.
GateDrive
_+_+
loadS2
S1
PCL
Vdd
Vss
Vbp
Vbn
C. H. Ziesler et al., 2003
Switch Timings
• Inductor current builds linearly when switches are on.
• Peak switch current less than peak inductor current.
• Switch S1 turned on at positive voltage peak.
• Switch S2 turned on at negative voltage peak.
• Fixed “on-window” controlled by pulse generator.
Indu
ctor
cur
rent
Out
put v
olta
ge
C. H. Ziesler et al., 2003
• Single-phase sinusoidal waveform @140MHz• ~60pF load, ~10nH external inductor• One DC supply (Vdd, Vss), two DC biases (NMOS, PMOS)
Power-Clock Waveform
Vdd
Vss
PMOS Bias
NMOS Bias
Power-Clock
C. H. Ziesler et al., 2003
Resonant Clock ASIC
• Compatible with ASIC flow
• Synthesized by Conrad Ziesler and Joohee Kim using in-house standard-cells library and commercial tools
• Energy recovering clock tree and SRAM word/bit lines
• Low-cost bulk CMOS process: TSMC 0.25m, 108-pin PGA package, through MOSIS
• High frequency (300MHz)
• Low voltage (1.0-1.5V)
C. H. Ziesler et al., 2003
The Energy Recovering Flip-Flop
• Clock signal: Single-phase resonant sinusoid• Probe activates state element only if next state differs from present state.• Low voltage operation at high speeds• Delay similar to conventional flip-flops• Fully compatible with standard-cell ASIC flow
16 transistors84 m2
probe state element
C. H. Ziesler et al., 2003
Flip-Flop Power Characterization
Order of magnitude difference between idle (D, Q constant) and active (D, Q changing) dissipation.
200MHz 300MHz 500MHz1
10
100
1000Flip-Flop Energy
activeidle
frequency
ene
rgy
(fJ)
frequency (MHz)
Flip-Flip Energy per Cycle1000
100
10
1
ener
gy (
fJ)
200 300 500
C. H. Ziesler et al., 2003
Resonant Clock Generator
Resonate entire clock capacitance with small inductor
Pump resonant system with NMOS switch at appropriate times
NMOS switch only conducts incremental losses whenever “on”
NMOS Switch
Control
Pre-driver
Driver
C. H. Ziesler et al., 2003
Recovering vs. Conventional Hardware
Synthesized dual-mode ASIC– conventional– energy recovery
Dual-mode flip-flop cell
Conventional clock treewith conventional flip-flop
Resonant clock tree withenergy-recovering flip-flop
Direct comparison of dissipation at target throughput using identical hardware structures
C. H. Ziesler et al., 2003
ASIC Statistics
Discrete wavelet transform (DWT)
3,897 gates, 413 ffs
15,571 transistors
400m x 900m
13.6 pF , 21 nH
300 MHz , 1.5V
0.25m logic process
Dual-modeDWT
Clock generator
C. H. Ziesler et al., 2003
Simulation Results
Total system dissipation
Conventional mode includes clock tree
Energy recovery mode includes on-chip clock generator
300MHz , 1.5V Energy Recovery Conventional
Idle 6.74pJ 29.72pJ
Active 68.47pJ 78.28pJ
Per Cycle Energy Dissipation
C. H. Ziesler et al., 2003
Correct Function
• 115MHz• 5nH high-Q external inductor
signature output(conventional)
signature output(energy recovery)
C. H. Ziesler et al., 2003
Self-Resonance at 225MHz
• No external inductor!• Clock resonates using parasitic inductance of package• Lossy parasitics limit power savings
200,000 samples@200ps intervals
signature output(energy recovery)
buffered power clock
8ns/div
4s/div
C. H. Ziesler et al., 2003
Preliminary Power Measurements
115MHz , 1.5V Energy Recovery Conventional
Idle 2.6mW 4.3mW
Active 8.9mW 11.7mW
Total system measurements by monitoring DC supplies
– Conventional mode includes clock tree
– Energy recovery mode includes on-chip clock generator
Not included: I/O supply, test circuitry, ring oscillator
Energy recovery data is conservative (obtained when simultaneously driving two DWT cores, one always in idle)
C. H. Ziesler et al., 2003
• Working silicon that realizes the potential of energy recovery– power savings over conventional CMOS– operating speeds in the 100s of Mhz– low control and hardware overhead– integrated power-clock generators
• Multiplier chip in 0.5m CMOS– Fine-grain recovery at the level of individual gate output– True singe-phase energy recovering dynamic logic– 2—4x energy savings over static CMOS in 50—200MHz
• DWT chip in 0.25m CMOS– ASIC-compatible design flow– Correct operation all the way up to self-resonance at 225MHz– Direct hardware comparison with dual-mode chip:
• 60—75% of conventional at 115MHz (measurement) • 1.2—4x savings at 300MHz (simulation)
Summary
C. H. Ziesler et al., 2003
Ongoing Research
• Continue chip testing– Break 300MHz barrier by bonding chip onto PCB– Test SRAM
• Experimentally assess EM noise – Clock EM power is centered around the resonant
frequency of the sinusoid
• Scaling– How large a clock network with single-phase sinusoid?– Skew/jitter?
• Ultra high speed (multiple GHz)– How fast with current clocking scheme?– Recovery circuitry with self-resonating clock network.
C. H. Ziesler et al., 2003
Links and Acknowledgments
Web Sitehttp://www.eecs.umich.edu/acal/energyrecovery
Energy recovery groupSuhwan Kim (emeritus)
Conrad ZieslerJoohee KimJuang-Ying ChuehVisvesh Sathe
Funded in part by U.S. Army Research Office and DARPADAAG-55-97-1-0250, DAAD-19-99-1-0340, N66001-02-C-8059