IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 21, NO. 6, JUNE 2013 1175
Minimizing Energy of Integer Unit by Higher Voltage
Flip-Flop: VDDmin-Aware Dual Supply Voltage Technique
Hiroshi Fuketa, Koji Hirairi, Tadashi Yasufuku, Makoto Takamiya, Masahiro Nomura, Hirofumi Shinohara, and Takayasu Sakurai
Abstract: To achieve the most energy-efficient operation, this brief presents a circuit design technique, called the higher voltage FF (HVFF), that separates the power supply voltage (VDD) of flip-flops (FFs) from that of combinational circuits. Although VDD scaling can reduce the energy, the minimum operating voltage (VDDmin) of FFs prevents operation at the optimum supply voltage that minimizes the energy, because the VDDmin of FFs is higher than that optimum. In HVFF, the VDD of the combinational logic gates is reduced below the VDDmin of the FFs while the VDD of the FFs is kept at their VDDmin. This makes it possible to minimize the energy without power or delay penalties at the nominal supply voltage (1.2 V) and without FF topological modifications. A 16-bit integer unit with HVFF was fabricated in a 65-nm CMOS process, and measurement results show that HVFF reduces the minimum energy by 13% compared with conventional operation; this minimum energy is about one-tenth of the energy at the nominal supply voltage.
Index Terms: Minimum operating voltage, subthreshold circuits, variations.
I. INTRODUCTION
With the growing market for mobile devices, energy-efficient LSIs are in strong demand. Since reducing the supply voltage (VDD) is one of the most effective methods for improving the energy efficiency of logic circuits, many studies on sub- and near-threshold logic circuits have been reported [1]-[6]. VDD scaling, however, degrades throughput. Therefore, ultradynamic voltage scaling [7], which uses the nominal supply voltage when high performance is required and reduces VDD when low performance is acceptable, is a promising approach to optimizing the energy of applications with varying workloads.
As VDD is reduced, the dynamic energy decreases quadratically with VDD, whereas the leakage energy, which is the product of the leakage power and the delay, increases dramatically in the subthreshold region because the delay grows exponentially there. The total energy therefore has a minimum, and the VDD at which the energy is minimized is typically around 0.3 V for a logic circuit [1]. This voltage is thus the target for energy-efficient LSIs.
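To make the existence of this minimum concrete, the sketch below uses a toy first-order model; it is an assumption for illustration, not the model of [1] or of this brief. Dynamic energy is taken as alpha*C*VDD^2 and leakage energy as leakage current x VDD x cycle time, with subthreshold (exponential) on and leakage currents. Every parameter value (slope factor, activity, logic depth) is made up, and the model is only qualitative above threshold.

```python
import math

# Toy first-order model; every parameter value here is an illustrative assumption.
N_VT = 1.5 * 0.026   # subthreshold slope factor n times thermal voltage kT/q (V)
ACTIVITY = 0.1       # average switching activity (alpha)
LOGIC_DEPTH = 50     # gate delays per clock cycle
C_TOTAL = 1.0        # total switched capacitance (normalized)


def energy_per_op(vdd: float) -> tuple:
    """Return (dynamic, leakage) energy per operation in normalized units.

    Dynamic: alpha * C * VDD^2.
    Leakage: I_leak * VDD * T_cycle. With subthreshold currents, I_leak / I_on equals
    exp(-VDD / (n*vT)) and T_cycle = LOGIC_DEPTH * C * VDD / I_on, so the product
    reduces to LOGIC_DEPTH * C * VDD^2 * exp(-VDD / (n*vT)).
    """
    e_dyn = ACTIVITY * C_TOTAL * vdd ** 2
    e_leak = LOGIC_DEPTH * C_TOTAL * vdd ** 2 * math.exp(-vdd / N_VT)
    return e_dyn, e_leak


def minimum_energy_point(v_lo: float = 0.15, v_hi: float = 1.2, steps: int = 2000):
    """Grid search for the VDD that minimizes dynamic + leakage energy."""
    best_v, best_e = v_lo, float("inf")
    for i in range(steps + 1):
        v = v_lo + (v_hi - v_lo) * i / steps
        e = sum(energy_per_op(v))
        if e < best_e:
            best_v, best_e = v, e
    return best_v, best_e


if __name__ == "__main__":
    v_opt, e_min = minimum_energy_point()
    print(f"minimum-energy VDD ~ {v_opt:.3f} V (normalized energy {e_min:.4f})")
```

With these assumed numbers the search lands near 0.28 V. A larger logic depth or a lower activity makes leakage relatively heavier and pushes the optimum upward, so the exact location depends on the circuit.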
VDD scaling is, however, limited by the minimum operating voltage (VDDmin), which is the minimum power supply voltage at which circuits operate without functional errors [8], [9]. Fig. 1 shows the measured VDDmins of various gate chains [10]. The VDDmin of the flip-flop (FF) chain is much higher than those of the other logic gates, such as the two-input NAND and NOR. This implies that FFs determine the VDDmin of sequential circuits, which might make it impossible to operate at the most energy-efficient supply voltage of around 0.3 V. Reducing the VDDmin of FFs is therefore essential for ultralow-voltage circuits.

[Fig. 1 plot: VDDmin (mV, 0 to 400) vs. number of gates (1 to 10k); measured in 65-nm CMOS for flip-flop, NAND, and NOR chains.]
Fig. 1. Measured VDDmins of two-input NAND, NOR, and FF chains as a function of the number of gates in a 65-nm CMOS process [10].

[Fig. 2 schematic: master and slave latches with input D, clocked by CK, CKB, and CK2; at the wired-OR point the drive current competes with the leakage current.]
Fig. 2. Schematic of an FF. The contention occurring at the wired-OR point increases the VDDmin of the FF.

Manuscript received December 15, 2011; revised April 6, 2012; accepted June 3, 2012. Date of publication July 10, 2012; date of current version May 20, 2013. This work was supported in part by the Extremely Low Power Project supported by the Ministry of Economy, Trade and Industry, and the New Energy and Industrial Technology Development Organization.
H. Fuketa, T. Yasufuku, M. Takamiya, and T. Sakurai are with the Institute of Industrial Science, University of Tokyo, Tokyo 153-8505, Japan (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).
K. Hirairi, M. Nomura, and H. Shinohara are with the Semiconductor Technology Academic Research Center, Yokohama 222-0033, Japan (e-mail: [email protected]; [email protected]; [email protected]).
Digital Object Identifier 10.1109/TVLSI.2012.2203834
1063-8210/$31.00 © 2012 IEEE
The increase in the VDDmin of FFs is caused by contention at the wired-OR point in the FFs, as shown in Fig. 2 [11]. Two techniques have been proposed to reduce the VDDmin of FFs: 1) the sizes of the transistors in the FF are adjusted to mitigate the contention [2], [12]; and 2) the FF architecture is changed to eliminate the contention [11]. These techniques, however, incur power and delay penalties as well as the additional cost of re-laying out the FFs.
In this brief, the higher voltage FF (HVFF), a VDDmin-aware circuit design technique that separates the VDD of FFs from that of combinational circuits, is proposed. HVFF requires no FF architectural modifications: the VDD of the combinational logic gates is reduced below the VDDmin of the FFs while the VDD of the FFs is kept at their VDDmin, which makes it possible to reduce the minimum energy without power and delay penalties at the nominal supply voltage. HVFF is applied to a 16-bit integer unit (IU), and the measurement results indicate that the minimum energy can be reduced by 13%. The physical implementation of HVFF is also discussed. This brief shows that the implementation is easily carried out with an existing P&R tool and that, although HVFF increases the total wire length, no area, delay, or power overhead is observed in our design.

[Fig. 3 plot: failure rate of a single FF (1 to 10^-7, log scale) vs. VDD(FF) (0 to 600 mV); Monte Carlo SPICE points and fitted curve PF(VDD(FF)); the failure rates required for correct operation of 50 and 5000 FFs mark VDDmin = 432 mV and 514 mV.]
Fig. 3. Simulated functional failure rate of a single FF as a function of the supply voltage of the FF (VDD(FF)). The PF(VDD(FF)) curve is fitted to the results of Monte Carlo SPICE simulations. The VDDmins of 50 and 5000 FFs are estimated to be 432 and 514 mV, respectively, for a yield of 99% in a 65-nm CMOS process.
Several studies on dual-VDD techniques have been reported. Clustered voltage scaling was proposed in [13]: a reduced supply voltage is applied to circuits on noncritical paths while the nominal supply voltage is applied to circuits on critical paths, which reduces the power dissipation while maintaining the overall circuit performance. In [14] and [15], separating the VDD of FFs from that of combinational circuits has been reported. That technique, however, is used for power gating of the combinational circuits: during operation, the same voltage is supplied to both the FFs and the combinational circuits, and the VDDmin of the FFs is not considered. To the best of our knowledge, this brief is the first to propose a dual-VDD technique that considers the VDDmin of FFs.
The remainder of this brief is organized as follows. Section II discusses the VDDmin of FFs and the effectiveness of the proposed HVFF technique. Section III presents the application of HVFF to a 16-bit IU and the measurement results. Finally, Section IV concludes this brief.
II. VDDmin OF FFs AND HVFF
To estimate the VDDmin of FFs, the dependence of the functional failure rate of a single FF on VDD is obtained by Monte Carlo SPICE simulations (5000 trials) with within-die threshold voltage variation. Fig. 3 shows the simulated failure rate as a function of the supply voltage of the FF (VDD(FF)). When PF(VDD(FF)) is defined as the failure rate of a single FF at VDD(FF), N FFs all operate correctly at VDDmin with yield Y when (1) holds, so the VDDmin of N FFs can be expressed as (2):

$$\bigl(1 - P_F(V_{DDmin})\bigr)^{N} = Y \tag{1}$$

$$V_{DDmin} = P_F^{-1}\bigl(1 - Y^{1/N}\bigr) \tag{2}$$

where N is the number of FFs, Y is the yield, and P_F^{-1} is the inverse function of PF.

[Fig. 4 plot: failure rate of a single FF (1 to 10^-7, log scale) vs. VDD(LOGIC) (0 to 600 mV), with VDD(LOGIC) < VDD(FF), for VDD(FF) = 432 mV (VDDmin of 50 FFs) and 514 mV (VDDmin of 5000 FFs); Monte Carlo SPICE points and fitted PF(VDD(LOGIC)) curves; the failure rate stays flat for VDD(LOGIC) reductions of 100 mV and 140 mV, respectively.]
Fig. 4. Simulated failure rate of a single FF as a function of the supply voltage of the combinational logic gates (VDD(LOGIC)) when the supply voltages of the FFs and the combinational logic gates are separated and VDD(FF) is kept at the VDDmin of the FFs (HVFF), without level converters, in a 65-nm CMOS process.
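As shown in Fig. 3, PF(VDD(FF)) is derived by fitting the dependence of the simulated failure rate on VDD(FF) and extrapolating it; extrapolation is necessary because 5000 Monte Carlo trials cannot directly resolve failure rates far below 1/5000, whereas (2) requires PF of about 2 x 10^-6 for 5000 FFs at Y = 99%. The VDDmin of the FFs is then calculated from (2) using PF(VDD(FF)). For example, the VDDmins of 50 and 5000 FFs are estimated to be 432 and 514 mV, respectively, when the yield Y is 99%.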
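Equation (2) can be evaluated numerically once a PF curve is available. The fitted curve of Fig. 3 is not given in closed form, so the sketch below assumes a hypothetical stand-in: a straight line in log10(PF) whose two parameters are pinned to the two VDDmins quoted above. The inverse PF^-1 is computed by bisection, which only requires PF to decrease monotonically with VDD(FF); the names `required_pf`, `pf_hypothetical`, and `vddmin` are illustrative.

```python
import math


def required_pf(n_ffs: int, yield_target: float) -> float:
    """Per-FF failure rate allowed by Eq. (1): (1 - PF)^N = Y  =>  PF = 1 - Y**(1/N)."""
    # -expm1(x) equals 1 - exp(x) but stays accurate when the result is tiny.
    return -math.expm1(math.log(yield_target) / n_ffs)


# Hypothetical stand-in for the fitted PF(VDD(FF)) curve of Fig. 3 (NOT the paper's fit):
# a straight line in log10(PF), pinned to the two VDDmins reported in the text.
_V_A, _LOG_PF_A = 0.432, math.log10(required_pf(50, 0.99))
_V_B, _LOG_PF_B = 0.514, math.log10(required_pf(5000, 0.99))
_SLOPE = (_LOG_PF_B - _LOG_PF_A) / (_V_B - _V_A)  # decades per volt (negative)


def pf_hypothetical(vdd_ff: float) -> float:
    """Assumed single-FF failure probability at supply vdd_ff (V), capped at 1."""
    return min(1.0, 10 ** (_LOG_PF_A + _SLOPE * (vdd_ff - _V_A)))


def vddmin(pf, n_ffs: int, yield_target: float,
           lo: float = 0.1, hi: float = 1.2, tol: float = 1e-6) -> float:
    """Evaluate Eq. (2), VDDmin = PF^-1(1 - Y^(1/N)), by bisection on [lo, hi].

    Assumes pf decreases monotonically with VDD on that interval.
    """
    target = required_pf(n_ffs, yield_target)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pf(mid) > target:
            lo = mid  # single-FF failure rate still too high: VDDmin lies above mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    print(f"PF required for 5000 FFs at Y = 99%: {required_pf(5000, 0.99):.2e}")
    for n in (50, 500, 5000):
        print(f"N = {n:5d}: VDDmin = {vddmin(pf_hypothetical, n, 0.99) * 1e3:.0f} mV")
```

Under this stand-in curve, 500 FFs come out at about 473 mV, halfway between the two reported points on the log scale; that interpolated value is a property of the assumed curve, not a result from this brief.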
These estimated VDDmins are much higher than the most energy-efficient supply voltage of around 0.3 V. In conventional circuits with a single power supply, all logic gates must operate at the VDDmin of the FFs even though the combinational logic gates can operate at a supply voltage much lower than that, which limits the energy reduction achievable by VDD scaling. The energy can therefore be reduced further if different supply voltages are applied to the FFs and the combinational circuits.

On the other hand, the voltage difference between the FFs and the combinational circuits might worsen the VDDmin of the FFs when no level converters are used. Fig. 4 shows the dependence of the failure rate of a single FF on the supply voltage of the combinational logic gates, VDD(LOGIC), when VDD(FF) is kept at the VDDmin of the FFs. The lines represent the fitted PF (failure rate) curves. If PF increases, the VDDmin of the FFs rises according to (2). In Fig. 4, however, the PF curves do not increase even when VDD(LOGIC) is reduced by 100 and 140 mV for VDD(FF) of 432 and 514 mV, respectively. This indicates that level converters are not required if the voltage difference between the FFs and the combinational circuits is around 100 mV.

[Fig. 5 block diagram: integer unit (IU) in which input FFs (inputs with carry-in) feed a 1-bit L/R shifter and an arithmetic and logic operation unit, whose results go to output FFs (outputs with carry-out); the FFs use VDD(FF) (1.2 to 0.4 V) and the combinational logic uses VDD(LOGIC) (1.2 to 0.25 V). Commands implemented in the 16-bit ALU: ADD/SUB with/without saturation, universal logic operation, MIN/MAX operation, absolute difference.]
Fig. 5. Block diagram of the developed 16-bit IU implemented with the proposed HVFF.

When VDD(FF) is 514 mV (the case of 5000 FFs), the energy of the combinational circuits is reduced by at most 47%. Note that the VDDmin of the combinational circuits is not considered in this discussion, since it is much lower than that of the FFs, as shown in Fig. 1. If the HVFF technique is applied to random logic circuits such as processor cores, where FFs are used for purposes other than the inputs and outputs of the circuit, the delay and power overheads would become larger.
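The 47% upper bound is consistent with a simple CV^2 estimate: if VDD(LOGIC) can sit 140 mV below VDD(FF) = 514 mV (Fig. 4), the dynamic energy of the combinational logic scales by (374/514)^2, or about 0.53. The sketch below reproduces that arithmetic. Reading the bound this way is our interpretation, and it ignores leakage energy, which rises as the cycle time stretches at lower VDD(LOGIC).

```python
def dynamic_energy_reduction(vdd_ff: float, delta_v: float) -> float:
    """Fractional C*V^2 energy saving when the logic supply sits delta_v below VDD(FF).

    Assumes the combinational-logic energy is purely dynamic (E proportional to C*V^2);
    the leakage energy and the longer cycle time at the lower voltage are ignored.
    """
    vdd_logic = vdd_ff - delta_v
    return 1.0 - (vdd_logic / vdd_ff) ** 2


if __name__ == "__main__":
    # The two operating points read off Fig. 4: (VDD(FF), tolerated drop of VDD(LOGIC)).
    for vdd_ff, delta in [(0.514, 0.140), (0.432, 0.100)]:
        saving = dynamic_energy_reduction(vdd_ff, delta)
        print(f"VDD(FF) = {vdd_ff * 1e3:.0f} mV, VDD(LOGIC) = {(vdd_ff - delta) * 1e3:.0f} mV: "
              f"about {saving:.0%} less dynamic energy")
```

This prints about 47% for the 514 mV case, matching the text, and about 41% for the 432 mV case.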
III. EXPERIMENTAL RESULTS
A. Overview of IU With HVFF
Fig. 5 shows the block diagram of the developed 16-bit IU, which implements popular media processing commands [11]. In this brief, HVFF, the proposed circuit design technique for separating the supply voltage of FFs (VDD(FF)) from that of combinational circuits (VDD(LOGIC)), is applied to the IU. The IU consists of 13K gates, including 50 FFs. Level converters are not used.
B. Physical Design of HVFF
One concern with HVFF is its physical implementation, that is, how to separate the VDD of the FFs from that of the combinational circuits. In this brief, the so-called row-by-row architecture [13] shown in Fig. 6 is used. In this architecture, each row is assigned to either FFs or combinational logic gates. The advantages of the row-by-row architecture are as follows: 1) the implementation is easily carried out using existing P&R tools; and 2) the standard-cell modifications described in [14] and [15] are not required.

[Fig. 6 layout: standard-cell rows, each assigned either to FFs (VDD(FF)) or to combinational logic (VDD(LOGIC)), with VDD(FF), VDD(LOGIC), and VSS power/ground lines.]
Fig. 6. HVFF implementation with the row-by-row architecture [13].

TABLE I
COMPARISON BETWEEN IUs WITH AND WITHOUT HVFF

                                  IU without HVFF (Conv.)             IU with HVFF (row-by-row architecture)
Layout (70 µm x 100 µm)           FFs and combinational logic gates   FFs in dedicated rows (2 rows and 3 rows);
                                  placed together between data        combinational logic gates in the other rows;
                                  inputs and data outputs             data inputs and data outputs
Dual VDD                          No                                  Yes
Total wire length                 13.8 mm                             14.1 mm (+2.4%)
Max. clock frequency, power (*)
  at 1.2 V                        1.4 GHz, 5.6 mW                     1.4 GHz, 5.6 mW
  at 0.5 V                        110 MHz, 76 µW                      110 MHz, 76 µW
Minimum energy (*)                0.54 pJ                             0.45 pJ (-17%)
                                  @ VDD(FF) = VDD(LOGIC) = 432 mV     @ VDD(LOGIC) = 330 mV, VDD(FF) = 432 mV

(*) Obtained by simulations.
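The row-by-row rule can be illustrated with a toy packing routine. This is a hypothetical sketch, not the P&R flow used in this brief: cells are given as (name, domain, width) tuples, rows have a fixed width, and a greedy first-fit policy is assumed. The only property taken from the text is that every row serves exactly one supply domain, so FFs and combinational gates never share a row.

```python
from dataclasses import dataclass


@dataclass
class Row:
    index: int
    domain: str              # supply net feeding this row, e.g. "VDD(FF)" or "VDD(LOGIC)"
    used_width: float = 0.0


def assign_rows(cells, row_width):
    """Greedy first-fit packing of (name, domain, width) cells into placement rows.

    A row only accepts cells of its own supply domain (the row-by-row constraint);
    first-fit order and a fixed row width are simplifying assumptions.
    Returns (rows, placement), where placement maps cell name -> row index.
    """
    rows, placement = [], {}
    for name, domain, width in cells:
        if width > row_width:
            raise ValueError(f"cell {name} is wider than a row")
        # First existing row of the same domain with enough free width, if any.
        target = next((r for r in rows
                       if r.domain == domain and r.used_width + width <= row_width), None)
        if target is None:   # no compatible row has space: open a new single-domain row
            target = Row(index=len(rows), domain=domain)
            rows.append(target)
        target.used_width += width
        placement[name] = target.index
    return rows, placement
```

For example, five 2-unit FFs and three 1-unit gates in 4-unit rows give three VDD(FF) rows and one VDD(LOGIC) row; the partly filled FF row shows the kind of placement freedom a single-domain row gives up.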
Table I shows the overhead of HVFF with the row-by-row architecture. P&R was performed with a fixed area of 70 µm x 100 µm in a 65-nm CMOS process. Compared with the IU without HVFF, the IU with HVFF has a 2.4% longer total wire length, yet its delay and power do not increase, and its minimum energy is reduced by 17% in our design. This is because the IU is well suited to HVFF with the row-by-row architecture: it contains a relatively large number of FFs, and the FFs are used only for the inputs and outputs.
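Table I reports power at the maximum clock frequency, so dividing one by the other gives the energy per clock cycle, E = P/f. The short script below does this for the simulated entries of Table I; equating one cycle with one operation is our assumption, not something the table states.

```python
def energy_per_cycle(power_w: float, freq_hz: float) -> float:
    """Energy per clock cycle when dissipating power_w at clock frequency freq_hz: E = P / f."""
    return power_w / freq_hz


if __name__ == "__main__":
    # Simulated values from Table I (identical with and without HVFF at these voltages).
    for label, p, f in [("1.2 V", 5.6e-3, 1.4e9), ("0.5 V", 76e-6, 110e6)]:
        print(f"{label}: {energy_per_cycle(p, f) * 1e12:.2f} pJ/cycle")

    # Relative change of the simulated minimum energy, 0.54 pJ -> 0.45 pJ.
    print(f"minimum-energy change: {0.45 / 0.54 - 1:+.0%}")
```

It prints 4.00 pJ/cycle at 1.2 V and 0.69 pJ/cycle at 0.5 V, both above the 0.54 pJ simulated minimum of the conventional IU at 432 mV, and a -17% change for the minimum energy, matching the table.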
[Fig. 7 shmoo plots: clock frequency (5 to 100 MHz) vs. supply voltage (350 to 550 mV), pass/fail regions. (a) VDD(LOGIC) = VDD(FF): VDDmin = 450 mV; the IU fails at any frequency below 450 mV. (b) VDD(FF) = VDD(LOGIC) above 450 mV and VDD(FF) = 450 mV below: further VDD scaling is possible.]
Fig. 7. Measured shmoo plot of the IU. (a) VDD(FF) is equal to VDD(LOGIC) (conventional). (b) VDD(FF) and VDD(LOGIC) are separated (proposed HVFF).
C. Measurement Results
A 16-bit IU with HVFF was fabricated in a 65-nm CMOS process. Fig. 7(a) shows the measured shmoo plot of the IU with the conventional implementation (VDD(FF) equal to VDD(LOGIC)), while Fig. 7(b) shows the shmoo plot with HVFF (VDD(FF) and VDD(LOGIC) separated). In Fig. 7(a), the IU does not operate correctly below 450 mV even when the clock frequency is reduced from 35 MHz to 10 kHz, which indicates that the VDDmin of the IU on this chip is 450 mV. In contrast, in Fig. 7(b), VDD(FF) is fixed at the FF VDDmin of 450 mV and only VDD(LOGIC) is reduced when VDD(LOGIC) is below 450 mV. In this case, the IU with HVFF is still functional even when VDD(LOGIC) is reduced to 350 mV.
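The rule used to read Fig. 7(a), that VDDmin is the lowest voltage at which the IU passes at some clock frequency, and the maximum-frequency curve of Fig. 8 can both be extracted from shmoo data. The sketch below assumes a hypothetical dictionary layout for the pass/fail grid and that the passing voltages form a contiguous range; a real shmoo with isolated passes would need extra filtering.

```python
from typing import Dict, Optional

# Hypothetical data layout: supply voltage (mV) -> {clock frequency (Hz): passed?}
Shmoo = Dict[float, Dict[float, bool]]


def vddmin_from_shmoo(shmoo: Shmoo) -> Optional[float]:
    """Lowest supply voltage at which at least one tested frequency passes.

    Below VDDmin the IU fails at every frequency, so lowering the clock cannot
    rescue it; that is what separates VDDmin from an ordinary timing failure.
    Returns None if no voltage passes.
    """
    passing = [v for v, results in shmoo.items() if any(results.values())]
    return min(passing) if passing else None


def fmax_per_voltage(shmoo: Shmoo) -> Dict[float, float]:
    """Highest passing clock frequency at each voltage that passes at all."""
    return {v: max(f for f, ok in results.items() if ok)
            for v, results in shmoo.items() if any(results.values())}
```

With a made-up grid (not measured data) such as `{400: {1e4: False, 1e6: False}, 450: {1e4: True, 1e6: True, 1e7: False}, 500: {1e4: True, 1e6: True, 1e7: True}}`, `vddmin_from_shmoo` returns 450 and `fmax_per_voltage` returns `{450: 1e6, 500: 1e7}`.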
Fig. 8 shows the measured VDD(LOGIC) dependence of the maximum clock frequency and the power dissipation of the IU with HVFF. The VDDmin of this chip is 400 mV. The power dissipation was measured while the add operation was performed with random input patterns. As the voltage difference between the FFs and the combinational circuits increases, the power dissipation of the FFs increases, since level converters are not used. However, when the voltage difference is 100 mV (VDD(FF) = 400 mV and VDD(LOGIC) = 300 mV), the leakage power of the FFs increases by 29%, whereas the total power of the combinational logic gates and FFs is reduced by 85% at 25 °C. Although a rise in temperature increases the leakage power, the reduction of the total power is still estimated by simulation to be 62% at 85 °C. Therefore, the increase in the leakage power of the FFs due to the voltage difference between the FFs and the combinational circuits is not a critical problem in actual use.

[Fig. 8 plot: maximum clock frequency (1 kHz to 2 GHz) and power dissipation (100 nW to 100 mW) vs. VDD(LOGIC) (0.2 to 1.2 V), with VDD(FF) = 400 mV below 400 mV and VDD(FF) = VDD(LOGIC) above; the leakage power of the FFs and its increase due to the voltage difference between combinational logic gates and FFs are marked.]
Fig. 8. Measured VDD(LOGIC) dependence of the maximum clock frequency and power dissipation of the IU, which consists of combinational logic gates and FFs, at 25 °C. The measured chip is different from the chip shown in Fig. 7, and the VDDmin of the FFs on this chip is 400 mV.

[Fig. 9 plot: energy (0 to 6.0 pJ/instruction) vs. VDD(LOGIC) (0.2 to 1.2 V) for conventional operation (VDD(FF) = VDD(LOGIC)) and HVFF (VDD(FF) = 400 mV below 400 mV); the minimum energy is 1/10 of that at 1.2 V; an inset (0.25 to 0.45 V, 0.6 to 0.8 pJ) shows the -13% reduction.]
Fig. 9. Measured energy of the IU. The proposed IU with HVFF (VDD(LOGIC) = 320 mV and VDD(FF) = 400 mV) achieves a minimum energy of 0.61 pJ, which is 13% smaller than that of conventional operation (VDD(LOGIC) = VDD(FF)). The VDDmin of the FFs is 400 mV.
Fig. 9 shows the measured energies of the IU with conventional operation and with HVFF. The minimum energy of the IU with conventional operation is 0.70 pJ/instruction at VDD(LOGIC) = VDD(FF) = 400 mV, which is equal to the VDDmin of the FFs. On the other hand, a minimum energy of 0.61 pJ/instruction is achieved by the IU with HVFF when VDD(LOGIC) = 320 mV and VDD(FF) = 400 mV; this is 13% smaller than that obtained with conventional operation and about one-tenth of the energy at the nominal supply voltage of 1.2 V. In addition, HVFF has no energy penalty above 400 mV. This indicates that HVFF can reduce the minimum energy without any energy penalty at the nominal supply voltage. Finally, the chip micrograph is shown in Fig. 10.

[Fig. 10 micrograph: 1.2 mm x 0.9 mm chip with the IU (100 µm x 70 µm) outlined.]
Fig. 10. Chip micrograph fabricated in a 65-nm CMOS process. The IU occupies 70 µm x 100 µm.
IV. CONCLUSION
To achieve the most energy-efficient operation, HVFF, a VDDmin-aware circuit design technique that separates the VDD of FFs from that of combinational circuits, was proposed. Because the VDDmin of FFs is much higher than that of combinational logic gates, HVFF reduces the VDD of the combinational logic gates below the VDDmin of the FFs while keeping the VDD of the FFs at their VDDmin, thereby reducing the energy. HVFF was applied to a 16-bit IU. Measurement results in a 65-nm CMOS process showed that HVFF reduces the minimum energy by 13% compared with conventional operation, to about one-tenth of the energy at the nominal supply voltage, without power or delay penalties at the nominal supply voltage and without FF topological modifications.
REFERENCES
[1] A. W. Wang, B. H. Calhoun, and A. P. Chandrakasan, Sub-Threshold Design for Ultralow Power Systems. New York: Springer-Verlag, 2006.
[2] H. Kaul, M. Anders, S. Mathew, S. Hsu, A. Agarwal, R. Krishnamurthy, and S. Borkar, "A 320 mV 56 µW 411 GOPS/Watt ultralow voltage motion estimation accelerator in 65 nm CMOS," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2008, pp. 316-317.
[3] Y. Pu, J. P. Gyvez, H. Corporaal, and H. Yajun, "An ultralow energy/frame multi-standard JPEG co-processor in 65 nm CMOS with sub/near-threshold power supply," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2009, pp. 146-147.
[4] H. Kaul, M. A. Anders, S. K. Mathew, S. K. Hsu, A. Agarwal, R. K. Krishnamurthy, and S. Borkar, "A 300 mV 494 GOPS/W reconfigurable dual-supply 4-way SIMD vector processing accelerator in 45 nm CMOS," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2009, pp. 260-261.
[5] A. Agarwal, S. K. Mathew, S. K. Hsu, M. A. Anders, H. Kaul, F. Sheikh, R. Ramanarayanan, S. Srinivasan, R. Krishnamurthy, and S. Borkar, "A 320 mV-to-1.2 V on-die fine-grained reconfigurable fabric for DSP/media accelerators in 32 nm CMOS," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 2010, pp. 328-329.
[6] J. Kwong, Y. Ramadass, N. Verma, M. Koesler, K. Huber, H. Moormann, and A. Chandrakasan, "A 65 nm sub-Vt microcontroller with integrated SRAM and switched capacitor DC-DC converter," IEEE J. Solid-State Circuits, vol. 44, no. 1, pp. 115-126, Jan. 2009.
[7] B. H. Calhoun and A. P. Chandrakasan, "Ultradynamic voltage scaling (UDVS) using sub-threshold operation and local voltage dithering," IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 238-245, Jan. 2006.
[8] T. Yasufuku, S. Iida, H. Fuketa, K. Hirairi, M. Nomura, M. Takamiya, and T. Sakurai, "Investigation of determinant factors of minimum operating voltage of logic gates in 65-nm CMOS," in Proc. Int. Symp. Low Power Electron. Design, Aug. 2011, pp. 117-122.
[9] H. Fuketa, S. Iida, T. Yasufuku, M. Takamiya, M. Nomura, H. Shinohara, and T. Sakurai, "A closed-form expression for estimating minimum operating voltage (VDDmin) of CMOS logic gates," in Proc. Design Autom. Conf., Jun. 2011, pp. 984-989.
[10] T. Yasufuku, K. Hirairi, Y. Pu, Y. F. Zheng, R. Takahashi, M. Sasaki, H. Fuketa, A. Muramatsu, M. Nomura, H. Shinohara, M. Takamiya, and T. Sakurai, "245% power reduction by post-fabrication dual supply voltage control of 64 voltage domains in VDDmin limited ultralow voltage logic circuits," in Proc. Int. Symp. Quality Electron. Design, 2012, pp. 586-591.
[11] H. Fuketa, K. Hirairi, T. Yasufuku, M. Takamiya, M. Nomura, H. Shinohara, and T. Sakurai, "A 12.7-times energy efficiency increase of 16-bit integer unit by power supply voltage (VDD) scaling from 1.2 V to 310 mV enabled by contention-less flip-flops (CLFF) and separated VDD between flip-flops and combinational logics," in Proc. Int. Symp. Low Power Electron. Design, Aug. 2011, pp. 163-168.
[12] B. H. Calhoun, A. Wang, and A. Chandrakasan, "Modeling and sizing for minimum energy operation in subthreshold circuits," IEEE J. Solid-State Circuits, vol. 40, no. 9, pp. 1778-1786, Sep. 2005.
[13] K. Usami, M. Igarashi, F. Minami, T. Ishikawa, M. Kanazawa, M. Ichida, and K. Nogami, "Automated low-power technique exploiting multiple supply voltages applied to a media processor," IEEE J. Solid-State Circuits, vol. 33, no. 3, pp. 463-472, Mar. 1998.
[14] J. S. Wang, H. Y. Li, C. Yeh, and T. F. Chen, "Design techniques for single-low-VDD CMOS systems," IEEE J. Solid-State Circuits, vol. 40, no. 5, pp. 1157-1165, May 2005.
[15] J. S. Wang, J. S. Chen, Y. M. Wang, and C. Yeh, "A 230 mV-to-500 mV 375 KHz-to-16 MHz 32b RISC core in 0.18 µm CMOS," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 2007, pp. 294-295.