
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 21, NO. 6, JUNE 2013 1175

    Minimizing Energy of Integer Unit by Higher Voltage Flip-Flop: VDDmin-Aware Dual Supply Voltage Technique

    Hiroshi Fuketa, Koji Hirairi, Tadashi Yasufuku, Makoto Takamiya, Masahiro Nomura, Hirofumi Shinohara, and Takayasu Sakurai

    Abstract: To achieve the most energy-efficient operation, this brief presents a circuit design technique, called the higher voltage FF (HVFF), that separates the power supply voltage (VDD) of flip-flops (FFs) from that of combinational circuits. Although VDD scaling can reduce the energy, the minimum operating voltage (VDDmin) of FFs prevents operation at the optimum supply voltage that minimizes the energy, because the VDDmin of FFs is higher than the optimum supply voltage. In HVFF, the VDD of combinational logic gates is reduced below the VDDmin of FFs while the VDD of FFs is kept at their VDDmin. This makes it possible to minimize the energy without power and delay penalties at the nominal supply voltage (1.2 V) and without FF topological modifications. A 16-bit integer unit with HVFF is fabricated in a 65-nm CMOS process, and measurement results show that HVFF reduces the minimum energy by 13% compared with the conventional operation, to about one-tenth of the energy at the nominal supply voltage.

    Index Terms: Minimum operating voltage, subthreshold circuits, variations.

    I. INTRODUCTION

    With the growing markets of mobile devices, energy-efficient LSIs are strongly required. Since reducing the supply voltage (VDD) is one of the most effective methods for improving the energy efficiency of logic circuits, many research studies on sub- or near-threshold logic circuits have been reported [1]-[6]. VDD scaling, however, degrades throughput. Therefore, ultradynamic voltage scaling [7], which uses a nominal supply voltage when high performance is required and reduces VDD when low performance is allowed, is a promising approach to optimizing the energy of applications with various workloads.

    As VDD is reduced, the dynamic energy decreases quadratically with VDD, while the leakage energy, which is the product of the leakage power and the delay, increases dramatically in the subthreshold region owing to the increase in delay. Therefore, the total energy has a minimum, and the VDD at which the energy is minimized is typically around 0.3 V in a logic circuit [1]. Thus, this voltage is the target for energy-efficient LSIs.
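The trade-off between dynamic and leakage energy can be illustrated with a minimal numerical sketch. The model below is a simplified subthreshold energy model and is not taken from the brief: the dynamic term scales as VDD squared, and the leakage term scales as VDD squared times exp(-VDD / (n * vT)), because the leakage-to-drive current ratio falls exponentially with VDD. The constants `N_VT`, `ACTIVITY`, and `LEAK_FACTOR` are hypothetical values picked only so that the minimum lands near 0.3 V; they do not describe the measured chip, and the exponential delay term is only meaningful in the subthreshold region (above it, the leakage term is negligible anyway).

```python
import math

# Hypothetical parameters, chosen only so the sketch has a minimum near 0.3 V.
N_VT = 1.5 * 0.026   # subthreshold slope factor n times thermal voltage (V)
ACTIVITY = 0.1       # fraction of the capacitance switched per cycle
LEAK_FACTOR = 80.0   # lumps logic depth and the leakage/drive current ratio


def energy_per_op(vdd: float) -> float:
    """Normalized energy per operation (in units of C_eff * 1 V^2)."""
    dynamic = ACTIVITY * vdd ** 2
    # Leakage energy = leakage power * delay. In subthreshold the delay grows
    # roughly exponentially as vdd drops, so this term dominates at low vdd.
    leakage = LEAK_FACTOR * vdd ** 2 * math.exp(-vdd / N_VT)
    return dynamic + leakage


def optimum_vdd(v_min: float = 0.15, v_max: float = 1.2, step: float = 0.005) -> float:
    """Grid-search the supply voltage that minimizes energy_per_op."""
    n = int(round((v_max - v_min) / step))
    grid = [v_min + i * step for i in range(n + 1)]
    return min(grid, key=energy_per_op)


v_opt = optimum_vdd()
print(f"optimum VDD ~ {v_opt:.3f} V")
print(f"E(1.2 V) / E(optimum) ~ {energy_per_op(1.2) / energy_per_op(v_opt):.1f}")
```

With these assumed constants, the energy falls as VDD is lowered from 1.2 V, reaches a minimum in the subthreshold region, and rises again once the leakage term takes over, which is the qualitative behavior described above.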

    VDD scaling is, however, obstructed by the minimum operating voltage (VDDmin), which is the minimum power supply voltage at which circuits operate without functional errors [8], [9]. Fig. 1 shows the measured VDDmins of various gate chains [10]. The VDDmin of the flip-flop (FF) chain is much higher than those of the other logic gates, such as the two-input NAND and NOR. This implies that FFs determine the VDDmins of sequential circuits, which might make it impossible to operate at the most energy-efficient supply voltage of around 0.3 V. Therefore, it is essential to reduce the VDDmin of FFs for ultralow-voltage circuits.

    Fig. 1. Measured VDDmins of two-input NAND, NOR, and FF chains as a function of number of gates in a 65-nm CMOS process [10]. (Plot: VDDmin, 0 to 400 mV, versus number of gates, 1 to 10k.)

    Fig. 2. Schematic of an FF. The contention occurring at the wired-OR point increases the VDDmin of FF. (Schematic labels: master and slave latches; input D; clocks CK, CKB, and CK2; drive current and leakage current at the wired-OR point.)

    Manuscript received December 15, 2011; revised April 6, 2012; accepted June 3, 2012. Date of publication July 10, 2012; date of current version May 20, 2013. This work was supported in part by the Extremely Low Power Project supported by the Ministry of Economy, Trade and Industry, and the New Energy and Industrial Technology Development Organization.

    H. Fuketa, T. Yasufuku, M. Takamiya, and T. Sakurai are with the Institute of Industrial Science, University of Tokyo, Tokyo 153-8505, Japan (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).

    K. Hirairi, M. Nomura, and H. Shinohara are with the Semiconductor Technology Academic Research Center, Yokohama 222-0033, Japan (e-mail: [email protected]; [email protected]; [email protected]).

    Digital Object Identifier 10.1109/TVLSI.2012.2203834

    1063-8210/$31.00 © 2012 IEEE

    The increase in the VDDmin of FFs is induced by a contention that occurs at the wired-OR point in the FFs, as shown in Fig. 2 [11]. To reduce the VDDmin of FFs, the following two techniques have been proposed. 1) The sizes of transistors contained in the FF are adjusted to mitigate the contention [2], [12]. 2) The architecture of FFs is changed to eliminate the contention [11]. These techniques, however, result in power and delay penalties as well as additional costs to re-layout the FFs.

    In this brief, higher voltage FF (HVFF), a VDDmin-aware circuit design technique for separating the VDD of FFs from that of combinational circuits, is proposed. In HVFF, no FF architectural modifications are required, and the VDD of combinational logic gates is reduced below the VDDmin of FFs while the VDD of FFs is kept at their VDDmin, which makes it possible to reduce the minimum energy without power and delay penalties at the nominal supply voltage. HVFF is applied to a 16-bit integer unit (IU), and the measurement results indicate that the minimum energy can be reduced by 13%. In addition, the physical implementation of HVFF is discussed. This brief reveals that the implementation is easily carried out using an existing P&R tool and that, although the total wire length increases with HVFF, no area, delay, or power overheads are observed in our design.

    Fig. 3. Simulated functional failure rate of a single FF as a function of supply voltage of FF (VDD(FF)). The PF(VDD(FF)) curve is fitted to the results obtained by Monte Carlo SPICE simulations. The VDDmins of 50 and 5000 FFs are estimated to be 432 and 514 mV, respectively, when the yield is 99% in a 65-nm CMOS process. (Plot: failure rate of a single FF, 1 to 10^-7, versus VDD(FF), 0 to 600 mV; Monte Carlo SPICE points and fitted curve; markers show the failure rates required to ensure correct operation of 50 and 5000 FFs.)

    There are several research studies on the dual-VDD technique. Clustered voltage scaling has been proposed in [13]. In this technique, the reduced power supply voltage is applied to the circuit on noncritical paths, whereas the nominal supply voltage is applied to the circuit on critical paths, which makes it possible to reduce the power dissipation while keeping the entire circuit performance. In [14] and [15], the use of the technique for separating the VDD of FFs from that of combinational circuits has been reported. This technique, however, is used for the power gating of combinational circuits. During the operation mode, the same voltage is supplied to both FFs and combinational circuits, and the VDDmin of FFs is not considered. To the best of our knowledge, this brief is the first to propose a dual-VDD technique that considers the VDDmin of FFs.

    The remainder of this brief is organized as follows. In Section II, the VDDmin of FFs and the effectiveness of the proposed HVFF technique are discussed. In Section III, the application of HVFF to a 16-bit IU and the measurement results are shown. Finally, Section IV concludes this brief.

    II. VDDmin OF FFs AND HVFF

    To estimate the VDDmin of FFs, the dependence of the functional failure rate of a single FF on VDD is obtained by Monte Carlo SPICE simulations (5000 trials) with within-die threshold voltage variation. Fig. 3 shows the simulated failure rate as a function of the supply voltage of the FF (VDD(FF)). When PF(VDD(FF)) is defined as the failure rate of a single FF at VDD(FF), the VDDmin of N FFs can be expressed as

    $$\left(1 - P_F(V_{DDmin})\right)^{N} = Y \tag{1}$$

    $$V_{DDmin} = P_F^{-1}\left(1 - Y^{1/N}\right) \tag{2}$$

    where N is the number of FFs and Y is the yield.

    Fig. 4. Simulated failure rate of a single FF as a function of supply voltage of combinational logic gates (VDD(LOGIC)) when supply voltages of FFs and combinational logic gates are separated and VDD(FF) is kept at VDDmin of FFs (HVFF) without level converters in a 65-nm CMOS process. (Plot: failure rate of a single FF, 1 to 10^-7, versus VDD(LOGIC), 0 to 600 mV, with VDD(LOGIC) < VDD(FF); Monte Carlo SPICE points and fitted curves for VDD(FF) = 432 mV, the VDDmin of 50 FFs, and VDD(FF) = 514 mV, the VDDmin of 5000 FFs, annotated with -100 mV and -140 mV, respectively; the failure rate required for 5000 FFs is marked.)

    As shown in Fig. 3, PF(VDD(FF)) is derived from the dependence of the simulated failure rate on VDD(FF) by extrapolation, and the VDDmin of FFs is calculated from (2) using PF(VDD(FF)). For example, the VDDmins of 50 and 5000 FFs are estimated to be 432 and 514 mV, respectively, when the yield Y is 99%.
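The use of (1) and (2) can be sketched in a few lines of Python. The function `required_failure_rate` evaluates the argument of the inverse in (2), 1 - Y^(1/N), and `vddmin_mv` inverts a monotonically decreasing failure-rate curve by bisection. The fitted PF curve itself is not given numerically in the text, so `toy_p_f` is a hypothetical stand-in: a straight line in log10(PF) drawn through the two reported operating points (432 mV for 50 FFs and 514 mV for 5000 FFs). It is calibrated to those values purely so the example is self-checking and should not be read as the actual fitted curve.

```python
import math
from typing import Callable


def required_failure_rate(n_ffs: int, yield_target: float) -> float:
    """Per-FF failure rate allowed by (1): 1 - Y**(1/N)."""
    # expm1 keeps precision when the result is tiny.
    return -math.expm1(math.log(yield_target) / n_ffs)


def vddmin_mv(p_f: Callable[[float], float], n_ffs: int, yield_target: float,
              lo_mv: float = 0.0, hi_mv: float = 1200.0) -> float:
    """Invert a monotonically decreasing P_F by bisection, as in (2)."""
    target = required_failure_rate(n_ffs, yield_target)
    for _ in range(60):
        mid = 0.5 * (lo_mv + hi_mv)
        if p_f(mid) > target:
            lo_mv = mid   # still failing too often: a higher supply is needed
        else:
            hi_mv = mid
    return hi_mv


# Hypothetical stand-in for the fitted curve: a straight line in log10(P_F)
# through the two operating points reported in the text (assumption).
_P50 = required_failure_rate(50, 0.99)
_P5000 = required_failure_rate(5000, 0.99)
_SLOPE = (math.log10(_P5000) - math.log10(_P50)) / (514.0 - 432.0)


def toy_p_f(vdd_mv: float) -> float:
    """Toy failure rate of a single FF at vdd_mv, clipped to 1."""
    return min(1.0, 10.0 ** (math.log10(_P50) + _SLOPE * (vdd_mv - 432.0)))


print(f"required P_F, 50 FFs:   {required_failure_rate(50, 0.99):.3e}")
print(f"required P_F, 5000 FFs: {required_failure_rate(5000, 0.99):.3e}")
print(f"VDDmin, 50 FFs:   {vddmin_mv(toy_p_f, 50, 0.99):.0f} mV")
print(f"VDDmin, 5000 FFs: {vddmin_mv(toy_p_f, 5000, 0.99):.0f} mV")
```

For a yield of 99%, the allowed single-FF failure rate is roughly 2.0e-4 for 50 FFs and 2.0e-6 for 5000 FFs, which is why increasing the FF count by a factor of 100 pushes the estimated VDDmin upward.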

    These estimated VDDmins are much higher than the most energy-efficient supply voltage of around 0.3 V. In conventional circuits with a single power supply, all logic gates must operate at the VDDmin of FFs even though combinational logic gates can operate at a much lower supply voltage, which limits the energy reduction achievable by VDD scaling. Thus, the energy can be further reduced if different supply voltages are supplied to FFs and combinational circuits.

    On the other hand, without level converters, the voltage difference between FFs and combinational circuits may worsen the VDDmin of FFs. Fig. 4 shows the dependence of the failure rate of a single FF on the supply voltage of combinational logic gates VDD(LOGIC), when VDD(FF) is kept at the VDDmin of FFs. The lines represent the fitted PF (failure rate) curves. If PF increases, the VDDmin of FFs rises according to (2). In Fig. 4, however, the PF curves do not increase even if VDD(LOGIC) is reduced by 100 and 140 mV when VDD(FF) is 432 and 514 mV, respectively. This indicates that level converters are not required if the voltage difference between FFs and combinational circuits is around 100 mV.

    Fig. 5. Block diagram of the developed 16-bit IU implemented with the proposed HVFF. (Diagram labels: clock; FFs on VDD(FF); combinational logic on VDD(LOGIC), comprising an arithmetic and logic operation unit and a 1-bit L/R shifter; inputs with carry-in and outputs with carry-out; bus widths 16 and 17; supply ranges 1.2 to 0.4 V and 1.2 to 0.25 V; HVFF: separated VDD between FFs and combinational logics. Implemented commands in the 16-bit ALU: ADD/SUB with/without saturation, universal logic operation, MIN/MAX operation, absolute difference.)

    When VDD(FF) is 514 mV (the case of 5000 FFs), the energy of combinational circuits is reduced by at most 47%. Note that the VDDmin of combinational circuits is not considered in this discussion, since it is much lower than that of FFs, as shown in Fig. 1. If the HVFF technique is applied to random logic circuits such as processor cores, where FFs are used for purposes other than circuit inputs and outputs, the delay and power overheads would become larger.
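A reduction of about 47% is what a purely quadratic energy model gives. As an assumption of this sketch (the brief does not spell out the calculation), take the energy of the combinational circuits to be dominated by dynamic energy proportional to the square of VDD(LOGIC), and lower VDD(LOGIC) from 514 mV by the 140 mV shown in Fig. 4:

```latex
\[
1 - \left(\frac{V_{DD(\mathrm{LOGIC})}}{V_{DD(\mathrm{FF})}}\right)^{2}
  = 1 - \left(\frac{514 - 140}{514}\right)^{2}
  = 1 - \left(\frac{374}{514}\right)^{2}
  \approx 1 - 0.529
  \approx 0.47
\]
```

Under the same assumption, the 50-FF case (432 mV lowered by 100 mV) would give 1 - (332/432)^2, or about 0.41.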

    III. EXPERIMENTAL RESULTS

    A. Overview of IU With HVFF

    Fig. 5 shows the block diagram of the developed 16-bit IU implemented with popular media processing commands [11]. In this brief, HVFF, which is the proposed circuit design technique for separating the supply voltage of FFs (VDD(FF)) from that of combinational circuits (VDD(LOGIC)), is applied to the IU. The IU consists of 13K gates including 50 FFs. Level converters are not used.

    B. Physical Design of HVFF

    One of the concerns in HVFF is its physical implementation to separate the VDD of FFs from that of combinational circuits. In this brief, the so-called row-by-row architecture [13] shown in Fig. 6 is used. In this architecture, each row is assigned to either FFs or combinational logic gates. The advantages of the row-by-row architecture are as follows: 1) implementation is easily carried out using existing P&R tools and 2) the modifications of standard cells described in [14] and [15] are not required.

    Fig. 6. HVFF implementation with row-by-row architecture [13]. (Diagram: standard-cell rows assigned either to FFs on VDD(FF) or to combinational logics on VDD(LOGIC), with VDD(FF), VDD(LOGIC), and VSS power/ground lines.)

    TABLE I
    COMPARISON BETWEEN IUs WITH AND WITHOUT HVFF

    Item                                   | IU without HVFF (Conv.)                  | IU with HVFF (row-by-row architecture)
    Layout (70 µm × 100 µm)                | FFs and combinational logic gates        | FFs (2 rows), combinational logic gates, FFs (3 rows)
    Dual VDD                               | No                                       | Yes
    Total wire length                      | 13.8 mm                                  | 14.1 mm (+2.4%)
    Max. clock frequency and power (*)     |                                          |
      at 1.2 V                             | 1.4 GHz, 5.6 mW                          | 1.4 GHz, 5.6 mW
      at 0.5 V                             | 110 MHz, 76 µW                           | 110 MHz, 76 µW
    Minimum energy (*)                     | 0.54 pJ                                  | 0.45 pJ (-17%)
      operating point                      | VDD(FF) = VDD(LOGIC) = 432 mV            | VDD(LOGIC) = 330 mV, VDD(FF) = 432 mV

    (*) Obtained by simulations.

    Table I shows the overhead of HVFF with the row-by-row architecture. The P&R was performed with a fixed area of 70 µm × 100 µm in a 65-nm CMOS process. Compared with the IU without HVFF, the total wire length of the IU with HVFF is increased by 2.4%, yet the delay and power are not increased and the minimum energy is reduced by 17% in our design. This is because the IU is suitable for HVFF with the row-by-row architecture, since the number of FFs included in the IU is relatively large and the FFs are only used for inputs and outputs.
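The simulated entries of Table I can be cross-checked with a short sketch. It relies on two assumptions that are not stated in the brief: the 0.5 V power entry is read as 76 µW (the unit prefix is lost in the extracted table), and energy is computed per clock cycle as power divided by clock frequency. The helper names are illustrative.

```python
def energy_per_cycle(power_w: float, freq_hz: float) -> float:
    """Energy consumed in one clock period: E = P / f."""
    return power_w / freq_hz


def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after` (negative means a reduction)."""
    return (after - before) / before


# Simulated values from Table I (identical with and without HVFF).
e_nominal = energy_per_cycle(5.6e-3, 1.4e9)    # 1.2 V: 5.6 mW at 1.4 GHz
e_half_volt = energy_per_cycle(76e-6, 110e6)   # 0.5 V: 76 uW at 110 MHz (assumed unit)

# Simulated minimum energies from Table I, in pJ.
min_energy_change = relative_change(0.54, 0.45)

print(f"energy per cycle at 1.2 V: {e_nominal * 1e12:.2f} pJ")
print(f"energy per cycle at 0.5 V: {e_half_volt * 1e12:.2f} pJ")
print(f"minimum energy change with HVFF: {min_energy_change * 100:.0f}%")
```

The per-cycle energy drops from about 4 pJ at 1.2 V to about 0.69 pJ at 0.5 V in both designs, and the change in minimum energy from 0.54 pJ to 0.45 pJ rounds to the -17% listed in the table.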

    Fig. 7. Measured shmoo plot of IU. (a) VDD(FF) is equal to VDD(LOGIC) (conventional). (b) VDD(FF) and VDD(LOGIC) are separated (proposed HVFF). (Plots: pass/fail regions versus clock frequency, 5 to 100 MHz, and supply voltage, 350 to 550 mV. In (a), the horizontal axis is VDD(LOGIC) = VDD(FF), VDDmin = 450 mV, and the IU fails at any frequency below 450 mV. In (b), the horizontal axis is VDD(LOGIC), with VDD(FF) = 450 mV below 450 mV and VDD(FF) = VDD(LOGIC) above it; further VDD scaling is possible.)

    C. Measurement Results

    A 16-bit IU with HVFF is fabricated in a 65-nm CMOS process. Fig. 7(a) shows the measured shmoo plot of the IU with the conventional implementation [VDD(FF) is equal to VDD(LOGIC)], while Fig. 7(b) shows the shmoo plot with HVFF [VDD(FF) and VDD(LOGIC) are separated]. In Fig. 7(a), the IU does not operate correctly below 450 mV even if the clock frequency is reduced from 35 MHz to 10 kHz, which indicates that the VDDmin of the IU of this chip is 450 mV. In contrast, in Fig. 7(b), VDD(FF) is fixed at the FF VDDmin of 450 mV and only VDD(LOGIC) is reduced when VDD(LOGIC) is less than 450 mV. In this case, the IU with HVFF is still functional even when VDD(LOGIC) is reduced to 350 mV.

    Fig. 8 shows the measured VDD(LOGIC) dependence of the maximum clock frequency and the power dissipation of the IU with HVFF. The VDDmin of the chip is 400 mV. The power dissipation was measured while the add operation was performed with random input patterns. As the voltage difference between FFs and combinational circuits increases, the power dissipation of FFs increases, since level converters are not used. However, when the voltage difference is 100 mV (VDD(FF) = 400 mV and VDD(LOGIC) = 300 mV), the leakage power of FFs increases by 29%, whereas the total power of combinational logic gates and FFs is reduced by 85% at 25 °C. Although a rise in temperature increases the leakage power, the reduction of the total power is estimated by simulation to be still 62% at 85 °C. Therefore, the increase in the leakage power of FFs due to the voltage difference between FFs and combinational circuits is not a critical problem in actual use.

    Fig. 8. Measured VDD(LOGIC) dependence of maximum clock frequency and power dissipation of IU, which consists of combinational logic gates and FFs, at 25 °C. The measured chip is different from the chip shown in Fig. 7, and the VDDmin of FFs in this chip is 400 mV. (Plot: maximum clock frequency, 1 kHz to 2 GHz, and power dissipation, 100 nW to 100 mW, versus VDD(LOGIC), 0.2 to 1.2 V; regions VDD(FF) = 400 mV and VDD(FF) = VDD(LOGIC); the leakage power of FFs is plotted separately, with its increase due to the voltage difference between combinational logic gates and FFs marked.)

    Fig. 9. Measured energy of IU. The proposed IU with HVFF (VDD(LOGIC) = 320 mV and VDD(FF) = 400 mV) can achieve a minimum energy of 0.61 pJ, which is 13% smaller than that of conventional operation (VDD(LOGIC) = VDD(FF)). The VDDmin of FFs is 400 mV. (Plot: energy, 0 to 6.0 pJ/instruction, versus VDD(LOGIC), 0.2 to 1.2 V, for conventional operation and proposed HVFF; regions VDD(FF) = 400 mV and VDD(FF) = VDD(LOGIC); inset covering 0.25 to 0.45 V and 0.6 to 0.8 pJ; annotations: -13% and 1/10.)

    Fig. 9 shows the measured energies of the IUs with the conventional operation and HVFF. The minimum energy of the IU with the conventional operation is 0.70 pJ/instruction at VDD(LOGIC) = VDD(FF) = 400 mV, which is equal to the VDDmin of FFs. On the other hand, a minimum energy of 0.61 pJ/instruction is achieved in the IU with HVFF when VDD(LOGIC) = 320 mV and VDD(FF) = 400 mV, which is 13% smaller than that obtained with conventional operation and about one-tenth of the energy at the nominal supply voltage of 1.2 V. In addition, HVFF has no energy penalties above 400 mV. This indicates that HVFF can reduce the minimum energy without any energy penalties at the nominal supply voltage.

    Finally, the chip micrograph is shown in Fig. 10.

    Fig. 10. Chip micrograph fabricated in a 65-nm CMOS process. The IU occupies 70 µm × 100 µm. (Micrograph dimension labels: 0.9 mm and 1.2 mm.)
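The 13% figure quoted for the measured minimum energy follows directly from the two values reported for Fig. 9, as this one-line check shows:

```latex
\[
\frac{E_{\mathrm{conv}} - E_{\mathrm{HVFF}}}{E_{\mathrm{conv}}}
  = \frac{0.70\ \mathrm{pJ} - 0.61\ \mathrm{pJ}}{0.70\ \mathrm{pJ}}
  \approx 0.129 \approx 13\%
\]
```

Here the symbols E_conv and E_HVFF are shorthand introduced for this check, denoting the minimum energies per instruction with conventional operation and with HVFF, respectively.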

    IV. CONCLUSION

    To achieve the most energy-efficient operation, HVFF, a VDDmin-aware circuit design technique for separating the VDD of FFs from that of combinational circuits, was proposed. In HVFF, the VDD of combinational logic gates is reduced below the VDDmin of FFs while the VDD of FFs is kept at their VDDmin to reduce the energy, since the VDDmin of FFs is much higher than that of combinational logic gates. HVFF was applied to a 16-bit IU. The measurement results in a 65-nm CMOS process showed that HVFF can reduce the minimum energy by 13% compared with conventional operation, to about one-tenth of the energy at the nominal supply voltage, without power and delay penalties at the nominal supply voltage and without FF topological modifications.

    REFERENCES

    [1] A. W. Wang, B. H. Calhoun, and A. P. Chandrakasan, Sub-Threshold Design for Ultralow Power Systems. New York: Springer-Verlag, 2006.

    [2] H. Kaul, M. Anders, S. Mathew, S. Hsu, A. Agarwal, R. Krishnamurthy, and S. Borkar, "A 320 mV 56 µW 411 GOPS/Watt ultralow voltage motion estimation accelerator in 65 nm CMOS," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2008, pp. 316-317.

    [3] Y. Pu, J. P. Gyvez, H. Corporaal, and H. Yajun, "An ultralow energy/frame multi-standard JPEG co-processor in 65 nm CMOS with sub/near-threshold power supply," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2009, pp. 146-147.

    [4] H. Kaul, M. A. Anders, S. K. Mathew, S. K. Hsu, A. Agarwal, R. K. Krishnamurthy, and S. Borkar, "A 300 mV 494 GOPS/W reconfigurable dual-supply 4-way SIMD vector processing accelerator in 45 nm CMOS," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2009, pp. 260-261.

    [5] A. Agarwal, S. K. Mathew, S. K. Hsu, M. A. Anders, H. Kaul, F. Sheikh, R. Ramanarayanan, S. Srinivasan, R. Krishnamurthy, and S. Borkar, "A 320 mV-to-1.2 V on-die fine-grained reconfigurable fabric for DSP/media accelerators in 32 nm CMOS," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 2010, pp. 328-329.

    [6] J. Kwong, Y. Ramadass, N. Verma, M. Koesler, K. Huber, H. Moormann, and A. Chandrakasan, "A 65 nm sub-Vt microcontroller with integrated SRAM and switched capacitor DC-DC converter," IEEE J. Solid-State Circuits, vol. 44, no. 1, pp. 115-126, Jan. 2009.

    [7] B. H. Calhoun and A. P. Chandrakasan, "Ultradynamic voltage scaling (UDVS) using sub-threshold operation and local voltage dithering," IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 238-245, Jan. 2006.

    [8] T. Yasufuku, S. Iida, H. Fuketa, K. Hirairi, M. Nomura, M. Takamiya, and T. Sakurai, "Investigation of determinant factors of minimum operating voltage of logic gates in 65-nm CMOS," in Proc. Int. Symp. Low Power Electron. Design, Aug. 2011, pp. 117-122.

    [9] H. Fuketa, S. Iida, T. Yasufuku, M. Takamiya, M. Nomura, H. Shinohara, and T. Sakurai, "A closed-form expression for estimating minimum operating voltage (VDDmin) of CMOS logic gates," in Proc. Design Autom. Conf., Jun. 2011, pp. 984-989.

    [10] T. Yasufuku, K. Hirairi, Y. Pu, Y. F. Zheng, R. Takahashi, M. Sasaki, H. Fuketa, A. Muramatsu, M. Nomura, H. Shinohara, M. Takamiya, and T. Sakurai, "24% power reduction by post-fabrication dual supply voltage control of 64 voltage domains in VDDmin limited ultralow voltage logic circuits," in Proc. Int. Symp. Quality Electron. Design, 2012, pp. 586-591.

    [11] H. Fuketa, K. Hirairi, T. Yasufuku, M. Takamiya, M. Nomura, H. Shinohara, and T. Sakurai, "A 12.7-times energy efficiency increase of 16-bit integer unit by power supply voltage (VDD) scaling from 1.2 V to 310 mV enabled by contention-less flip-flops (CLFF) and separated VDD between flip-flops and combinational logics," in Proc. Int. Symp. Low Power Electron. Design, Aug. 2011, pp. 163-168.

    [12] B. H. Calhoun, A. Wang, and A. Chandrakasan, "Modeling and sizing for minimum energy operation in subthreshold circuits," IEEE J. Solid-State Circuits, vol. 40, no. 9, pp. 1778-1786, Sep. 2005.

    [13] K. Usami, M. Igarashi, F. Minami, T. Ishikawa, M. Kanazawa, M. Ichida, and K. Nogami, "Automated low-power technique exploiting multiple supply voltages applied to a media processor," IEEE J. Solid-State Circuits, vol. 33, no. 3, pp. 463-472, Mar. 1998.

    [14] J. S. Wang, H. Y. Li, C. Yeh, and T. F. Chen, "Design techniques for single-low-VDD CMOS systems," IEEE J. Solid-State Circuits, vol. 40, no. 5, pp. 1157-1165, May 2005.

    [15] J. S. Wang, J. S. Chen, Y. M. Wang, and C. Yeh, "A 230 mV-to-500 mV 375 KHz-to-16 MHz 32b RISC Core in 0.18 µm CMOS," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 2007, pp. 294-295.