Reducing Energy at the Minimum Energy Operating Point Via Statistical Error Compensation

1328 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 22, NO. 6, JUNE 2014


Rami A. Abdallah, Member, IEEE, and Naresh R. Shanbhag, Fellow, IEEE

Abstract: This paper demonstrates that statistical error compensation reduces the energy consumption Emin at the minimum energy operating point (MEOP), which is known to occur in the subthreshold regime. In particular, the impact of algorithmic noise-tolerance (ANT) [1], in conjunction with frequency overscaling (FOS) and voltage overscaling, is studied in the context of an eight-tap finite impulse response (FIR) filter in a 45-nm CMOS process. At the nominal process corner and using low-Vt devices, we show that the ANT-based FIR filter achieves a 20%–47% reduction in Emin and a 1.8×–2.25× increase in the frequency of operation over a conventional (error free) filter operating at its MEOP. This result is achieved via the ability of ANT to compensate for a precompensation error rate of 70%–85%. With high-Vt devices, the reduction in Emin is limited to 10%, owing to the reduced effectiveness of FOS and the increased sensitivity of delay to voltage variations. In the presence of process variations, the ANT-based FIR filter reduces Emin by 54% over a transistor up-sized design while meeting a fixed throughput constraint and a parametric yield of 99.7%.

Index Terms: Error resiliency, minimum energy, subthreshold, ultralow power (ULP), voltage overscaling.

I. INTRODUCTION

SUBTHRESHOLD designs have been proposed for ultralow-power (ULP) applications such as portable medical devices, medical implants, distributed sensor networks, and active radio frequency identification devices, where energy is of utmost concern and throughput requirements are relaxed. ULP systems typically operate at their minimum energy operating point (MEOP), defined via the tuple (Vdd,opt, fopt, Emin) (see the conventional MEOP (MEOPC) in Fig. 1), where Emin, Vdd,opt, and fopt are the minimum achievable energy and its corresponding supply voltage and frequency of operation, respectively. The MEOP balances dynamic/active energy and leakage energy and is a well-studied topic [2]–[4]. Several integrated circuits (ICs) operating at the MEOP have been designed for hearing aids [5], fast Fourier transforms (FFTs) [6], and embedded processors [7]–[9]. These are, however, designed assuming worst case process, voltage, and temperature (PVT) variations, which leads to significant energy overhead, especially at the MEOP. PVT variations pose a major challenge in the design of subthreshold systems. For example, in a 65-nm CMOS process, process variations in the subthreshold regime increase the 3σ gate delay by three orders of magnitude from the nominal case [10]. Circuit-level techniques, such as body biasing [11], nonratioed logic styles [12], and transistor sizing [12], and architectural techniques, such as increased pipelining depth [13], have been proposed to mitigate the effect of PVT variations. These techniques, however, result in up to 50% energy overhead from the nominal case.

Manuscript received June 10, 2012; revised November 21, 2012 and March 25, 2013; accepted June 11, 2013. Date of publication July 19, 2013; date of current version May 20, 2014.

R. A. Abdallah is with the Visual and Parallel Computing Group, Intel Corporation, Hillsboro, OR 97006 USA (e-mail: [email protected]).

N. R. Shanbhag is with the Coordinated Science Laboratory and Electrical and Computer Engineering (ECE) Department, University of Illinois, Urbana-Champaign, IL 61801 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TVLSI.2013.2271838

Fig. 1. Energy in the subthreshold regime of conventional and ANT-based designs. [Plot of energy versus Vdd for the conventional and ANT-based designs, marking MEOPC at (Vdd,opt, Emin) and MEOPANT at (VANT,opt, Emin,ANT), with Vth marked on the Vdd axis and a 20%-54% energy reduction due to ANT.]

Error-resiliency techniques based on statistical error compensation (SEC) (see Fig. 2), such as algorithmic noise-tolerance (ANT) [1], have been effective in reducing energy in digital signal processing (DSP) kernels by permitting intermittent errors at the circuit level and correcting these at the algorithmic level. This is done with a low overhead by exploiting the input and intermediate data statistics as well as techniques from estimation and detection theory. Voltage overscaling (VOS), where the supply voltage is reduced beyond the critical supply voltage without reducing the clock frequency, is employed to trade off energy against robustness. SEC techniques are then employed to approximately correct the resulting timing violations, and thus maintain a specific application performance metric such as signal-to-noise ratio (SNR). ANT is able to handle very high precompensation error rates pη (the percentage of clock cycles in which the main block output is in error). ANT in conjunction with VOS in the superthreshold regime was shown in IC measurements to provide up to 70% power savings for finite-impulse response (FIR) filters while tolerating an error rate pη > 70% [14], [15].

1063-8210 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


Fig. 2. SEC (a) block diagram and (b) SNR performance at different precompensation error rates pη.

ANT and other error-resiliency techniques [16]–[21], however, have not been applied in the subthreshold regime. In this paper, we focus on SEC techniques because microarchitectural techniques, such as RAZOR [16] and EDS [17], can tolerate only error rates below 0.1%. This limits their applicability in the subthreshold regime, which is characterized by increased sensitivity to PVT variations.

This paper studies the impact of ANT on the energy consumption Emin at the MEOP for an eight-tap FIR filter. In superthreshold systems, VOS is employed to reduce energy because dynamic energy is the major component of the total system energy. In the subthreshold regime, leakage energy is significant, and thus frequency overscaling (FOS), where the operating frequency is increased beyond its critical value, can be employed as well to reduce energy. In [22], we presented preliminary results describing the application of ANT to reduce the Emin of an eight-tap subthreshold MAC unit in a 130-nm CMOS process. The research in [22] was limited to an application of either VOS or FOS at MEOPC, and was conducted at the nominal process corner assuming no process variations. In [23], an ANT-based electrocardiogram (ECG) processor in a 45-nm CMOS process was designed to operate in the subthreshold regime. The use of ANT was shown to reduce energy at the MEOP by 28% in measurements. In this paper, we explore the reduction in Emin by optimizing across the entire (Vdd, f) plane to achieve a global optimum. Furthermore, this research is conducted for an eight-tap direct-form FIR filter in a 45-nm CMOS process node using two different threshold voltages, Vth: a high-Vth (HVT) process and a low-Vth (LVT) process, and illustrates the impact of ANT on the MEOP in the presence of process variations. The use of two different threshold voltages changes the contribution of leakage energy at the MEOP and the delay sensitivity to voltage variations, thus impacting the energy efficiency and robustness of SEC techniques.

Fig. 1 summarizes the results of this paper. We show that:

1) the application of ANT results in a new MEOP (MEOPANT);

2) Emin is reduced by up to 47% and 10% in the 45-nm LVT and HVT processes, respectively, at a precompensation error rate pη of 85%;

3) the operating frequency fopt is increased by 2.5× and 1.2× in the 45-nm LVT and HVT processes, respectively, at a precompensation error rate pη of 85%;

4) in the presence of process variations in an LVT process, ANT reduces Emin by 54% on average while maintaining a parametric yield of 99.7%.

This paper is organized as follows. Section II presents the background on MEOP and SEC techniques, in particular, ANT. Section III studies the effectiveness of ANT in the subthreshold region where the MEOP typically occurs. Section IV presents simulation results on an FIR filter in a 45-nm commercial CMOS process, which demonstrate the impact of ANT on the MEOP energy consumption in the presence of voltage and process variations. Finally, Section V summarizes the conclusions and presents future directions.

II. BACKGROUND

A. Minimum Energy Operating Point

The dominant sources of energy consumption in subthreshold are dynamic energy (Edyn) and leakage energy (Elkg), which are described by

$$E_o = E_{dyn} + E_{lkg}, \qquad E_{dyn} = \alpha N C V_{dd}^2, \qquad E_{lkg} = \frac{N I_{OFF} V_{dd}}{f} \tag{1}$$

where α is the switching activity factor, N is the number of gates, C is the output load capacitance per gate, f is the operating frequency, Vdd is the supply voltage, and IOFF is the OFF-state average leakage current per gate. Here, each gate is modeled as a single equivalent pull-up and an equivalent pull-down device. The subthreshold current, as a function of gate-to-source and drain-to-source voltage, is given by

$$I_{SUB}(V_{GS}, V_{DS}) = I_o \, 10^{(V_{GS} - V_{th} - \gamma V_{DS})/S} \left(1 - e^{-V_{DS}/V_T}\right) \tag{2}$$

where Io is an average (per gate) reference current, which is proportional to the transistor W/L ratio, S = mVT is the swing factor, γ is the DIBL coefficient, Vth is the threshold voltage, and VT is the thermal voltage. Using (2), the ON-state and OFF-state currents for an nMOS transistor are ION = ISUB(Vdd, Vdd) and IOFF = ISUB(0, Vdd), respectively.

Fig. 3. ANT (a) framework and (b) error distributions.

Assuming the critical path of the computational kernel has a logic depth of L gates, the operating frequency f is given by

$$f = \frac{I_{ON}}{\beta L C V_{dd}} \tag{3}$$

where C is the output load capacitance per gate, and β is a fitting parameter needed to match the finite signal rise and fall times. The frequency of operation decreases exponentially as Vdd is reduced in the subthreshold regime because of the exponential dependence of ION on Vdd in (2). This leads to an exponential increase in leakage energy, as seen by substituting (3) in (1) to obtain the following:

$$E_{lkg} = \beta N L C V_{dd}^2 \, \frac{I_{OFF}}{I_{ON}} = \beta N L C V_{dd}^2 \, 10^{-V_{dd}/S} \tag{4}$$

and the total subthreshold energy consumption is given by

$$E_o = \alpha N C V_{dd}^2 + \beta N L C V_{dd}^2 \, 10^{-V_{dd}/S}. \tag{5}$$

Therefore, reducing Vdd in the subthreshold region decreases Edyn but increases Elkg exponentially, so that a minimum energy operating point, MEOPC in Fig. 1, exists.
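The existence of a minimum in the total energy model (5) can be illustrated numerically. The following is a minimal sketch, not the authors' code: α = 0.1 and N = 6528 are taken from Section IV, whereas the values of C, β, L, and S are placeholder assumptions chosen only to make the tradeoff visible. Because (5) is a subthreshold model that tends to zero as Vdd approaches zero, the search is restricted to an assumed lower bound of 0.1 V.

```python
def total_energy(vdd, alpha=0.1, n_gates=6528, c_load=1e-15,
                 beta=1.0, depth=50, swing=0.09):
    """Total energy per cycle from (5), in joules.

    c_load (C), beta, depth (L), and swing (S) are assumed placeholder
    values, not parameters reported in the paper.
    """
    e_dyn = alpha * n_gates * c_load * vdd ** 2
    e_lkg = beta * n_gates * depth * c_load * vdd ** 2 * 10 ** (-vdd / swing)
    return e_dyn + e_lkg


def find_meop(v_lo=0.1, v_hi=1.0, steps=9000):
    """Grid search for the supply voltage that minimizes total_energy."""
    grid = [v_lo + i * (v_hi - v_lo) / steps for i in range(steps + 1)]
    v_opt = min(grid, key=total_energy)
    return v_opt, total_energy(v_opt)


v_opt, e_min = find_meop()
print(f"Vdd,opt ~ {v_opt:.3f} V, Emin ~ {e_min * 1e15:.1f} fJ")
```

With these placeholder values the minimum falls in the interior of the search range, where the falling dynamic term and the rising leakage term balance. The absolute numbers are not comparable to the measured MEOP values reported later.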

B. Algorithmic Noise-Tolerance

SEC (see Fig. 2(a)) exploits statistical knowledge of the circuit/device fabric error, the input, and the intermediate data processed by the main block to correct the output error approximately at low overhead, such that an application performance metric, such as SNR, is maintained within an application-dependent tolerance limit ΔSNR. The statistics of the input and the intermediate data enable SEC to incur a low overhead.

VOS [1] is a simple and effective technique to trade off energy against robustness of the main block. In VOS, the supply voltage is scaled below the critical voltage Vdd,crit needed for error-free operation at a fixed operating frequency, i.e., Vdd = KVOS Vdd,crit and f = fcrit, where KVOS < 1 is the VOS factor. When the supply voltage is lower than Vdd,crit, the circuit operates slower than targeted, and thus timing violations occur. SEC can be employed to compensate for timing violations (see Fig. 2) such that $SNR^* \approx SNR_o$ and $p_\eta^* \gg p_\eta^o$, where $SNR^*$ is the SNR of the error-compensated system at an error rate $p_\eta^*$, and $SNR_o$ is the SNR of the conventional system at an error rate $p_\eta \approx p_\eta^o$.

SEC was first introduced in the form of algorithmic noise-tolerance [1] (see Fig. 3). ANT incorporates a main block and an estimator. The main block is permitted to make errors, but not the estimator. The estimator is a low-complexity block (typically 5%–20% of the main block complexity) generating a statistical estimate of the correct main block output

ya = yo + η (6)

ye = yo + e (7)

where ya is the actual main block output, yo is the error-free main block output, η is the hardware error, ye is the estimator output, and e is the estimation error. The estimator exhibits an estimation error e because it is an approximate version of the main block. ANT exploits the difference in the statistics of η and e [see Fig. 3(b)]. As most computations are least significant bit first, timing violations because of VOS generally produce large-magnitude most significant bit (MSB) errors. To enhance robustness, it is necessary that when η ≠ 0, η be large compared with e. In addition, the probability of the event η ≠ 0 must be small. The final/corrected output of an ANT system y is obtained via the following decision rule:

$$y = \begin{cases} y_a, & \text{if } |y_a - y_e| < \tau \\ y_e, & \text{otherwise} \end{cases} \tag{8}$$

where τ is an application-dependent parameter chosen to maximize the performance of ANT. Under the conditions outlined, it is possible to show that

$$SNR_{uc} \ll SNR_e \ll SNR_{ANT} \approx SNR_o \tag{9}$$

where SNRuc, SNRe, SNRANT, and SNRo are the SNRs of the uncorrected main block (η dominates), the estimator (e dominates), the ANT system, and the error-free main block (ideal), respectively. Thus, ANT detects and corrects errors approximately, but does so in a manner that satisfies an application-level performance specification on SNR. The decision block is designed to be timing error free at all process corners and voltages because it is a critical block that directly impacts the SNR performance, and it typically constitutes less than 5% of the main block complexity. Several low-overhead estimation techniques have been proposed by exploiting data correlation, system architecture, and statistical signal processing techniques. ANT has been shown to achieve up to 3× energy savings, while tolerating high error rates (pη > 70%), in theory and in practice via prototype IC design [14] for FIR filters in superthreshold applications.
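The ANT decision rule in (8) is simple enough to state in a few lines. The sketch below is illustrative only: the function name `ant_decide` and the numeric values (error-free output, estimation error, MSB error magnitude, and threshold τ) are hypothetical and are not taken from the paper.

```python
def ant_decide(y_a, y_e, tau):
    """Decision rule (8): keep the main block output y_a unless it
    deviates from the estimate y_e by tau or more."""
    return y_a if abs(y_a - y_e) < tau else y_e


# Hypothetical values for illustration.
y_o = 1000          # error-free main block output
y_e = y_o + 3       # estimator output with a small estimation error e
tau = 64            # assumed threshold, larger than |e|

# No hardware error: the main block output passes through unchanged.
clean = ant_decide(y_o, y_e, tau)

# Large-magnitude MSB timing error: the estimate replaces the output,
# leaving only the small residual error e.
corrected = ant_decide(y_o + 2 ** 15, y_e, tau)
print(clean, corrected)
```

The rule relies on the separation described above: the hardware error η is large when it occurs, while the estimation error e stays small, so a single threshold can tell them apart.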

III. IMPACT OF ANT ON THE MEOP

In this section, we study the energy behavior of ANT in the subthreshold regime and its impact on the MEOP. In the past, we have shown that ANT provides energy efficiency in superthreshold when errors in the main block are induced by either VOS or a better than worst case (BTWC) design philosophy, where a nominal case design is subjected to a worst case process corner. In the subthreshold regime, FOS is an effective technique as well. In FOS, Vdd is kept fixed while f is increased beyond fcrit, i.e., Vdd = Vdd,crit and f = KFOS fcrit, where KFOS > 1 is the FOS factor. FOS results in an error distribution that is similar, but not identical, to that of VOS or BTWC. Unlike VOS, FOS does not lead to a reduction in dynamic energy, which limits its energy efficiency in the superthreshold regime. FOS can be employed in superthreshold primarily to achieve a higher throughput. In the subthreshold regime, the leakage energy contributes significantly to the total energy, and thus the reduced clock period under FOS leads not only to a throughput increase but also to energy savings.
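The way FOS turns timing slack into errors can be sketched with a simplified model. The code below assumes, as a simplification that is not stated in the paper, that a clock cycle is in error whenever the delay of the path excited in that cycle exceeds the clock period, and that FOS shortens the period to Tcrit/KFOS. The function name and the delay values are hypothetical.

```python
def error_rate_fos(path_delays, t_crit, k_fos):
    """Fraction of cycles with a timing violation under FOS.

    path_delays: delay of the path excited in each cycle (same unit as t_crit).
    t_crit:      critical (error-free) clock period.
    k_fos:       FOS factor, f = k_fos * f_crit, so the period is t_crit / k_fos.
    """
    t_clk = t_crit / k_fos
    violations = sum(1 for d in path_delays if d > t_clk)
    return violations / len(path_delays)


# Hypothetical per-cycle excited path delays, normalized to t_crit = 1.0.
delays = [0.2, 0.5, 0.8, 1.0]
for k in (1.0, 1.5, 2.5):
    print(k, error_rate_fos(delays, 1.0, k))
```

In this model the error rate depends only on the normalized path delay distribution, which is a property of the architecture and the input data.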

Fig. 4. Eight-tap FIR filter: (a) direct-form architecture, and (b) path delay distribution.

For a specified application-level performance metric, e.g., maximum SNR loss, we can apply VOS and FOS simultaneously to save energy in the subthreshold regime. The ANT estimator [see Fig. 3(a)] is designed such that it operates error free at all voltages and frequencies. This is easy to achieve because of its lower complexity compared with the main block. The energy of a subthreshold ANT system EANT is given by

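$$E_{ANT} = K_{VOS}^2 \left(1 + \frac{\alpha_{est} N_{est}}{\alpha N}\right) E_{o,dyn} + \frac{K_{VOS}}{K_{FOS}} \left(1 + \frac{N_{est}}{N}\right) \frac{I_{OFF,V_{dd}}}{I_{OFF,V_{dd,crit}}} \, E_{o,lkg} \tag{10}$$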

where Nest is the number of additional gates needed to implement the estimator and the decision block (ANT overhead), and αest is the average switching activity of the Nest nodes. A large class of estimators operate on the high-order bits of the input, which usually have a lower switching activity factor than the low-order bits, and thus αest < α. Several design factors, such as the hardware error rate of the main block (pη), the application-level error tolerance, and the estimator complexity, determine the total system energy consumption. It will be shown that a new MEOP, MEOPANT, characterized by the tuple (VANT,opt, fANT,opt, EANT,min), exists, where VANT,opt ≤ Vdd,opt, fANT,opt > fopt, and EANT,min < Emin. The MEOP energy EANT,min depends on the estimator overhead, the application error tolerance, and the delay sensitivity to voltage variations (process threshold voltage). Next, we illustrate the energy savings and the different tradeoffs involved in subthreshold ANT-based designs using an FIR filter as a case study.
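The structure of the ANT energy model can be made concrete with a small function. This is a sketch under stated assumptions: the leakage term is scaled by KVOS/KFOS, which is what follows from Elkg = N IOFF Vdd / f in (1) when Vdd = KVOS Vdd,crit and f = KFOS fcrit; the OFF-current ratio is passed in as a plain number rather than computed from (2); and all names and numeric values are hypothetical.

```python
def ant_energy(e_dyn, e_lkg, k_vos, k_fos, n_ratio, alpha_ratio, ioff_ratio=1.0):
    """Energy of an ANT system relative to the error-free main block.

    e_dyn, e_lkg: dynamic and leakage energy of the main block at (Vdd,crit, fcrit).
    k_vos, k_fos: VOS factor (<= 1) and FOS factor (>= 1).
    n_ratio:      Nest / N, the gate-count overhead of estimator and decision block.
    alpha_ratio:  alpha_est / alpha, the relative switching activity of the overhead.
    ioff_ratio:   IOFF at the scaled Vdd divided by IOFF at Vdd,crit (assumed input).
    """
    dyn = k_vos ** 2 * (1 + alpha_ratio * n_ratio) * e_dyn
    lkg = (k_vos / k_fos) * (1 + n_ratio) * ioff_ratio * e_lkg
    return dyn + lkg


# Hypothetical leakage-dominated design (Elkg = 4 Edyn) with 10% overhead.
baseline = ant_energy(1.0, 4.0, 1.0, 1.0, 0.0, 0.0)
with_ant = ant_energy(1.0, 4.0, 0.9, 2.0, 0.1, 1.0)
print(baseline, with_ant)
```

The example shows why FOS matters when leakage dominates: doubling the frequency roughly halves the leakage term, which more than pays for the assumed 10% overhead.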

IV. SIMULATIONS AND RESULTS

We study the impact of ANT on the minimum energy Emin at MEOP for a 23-b output, eight-tap FIR filter (see Fig. 4(a)) operating in the subthreshold regime. The FIR filter operates at an error-free critical supply voltage and frequency (Vdd,crit, fcrit), and computes $y[n] = \sum_{i=0}^{7} x[n-i]\,h_i$, where x[n] is a 10-b input signal, the $h_i$ are the 10-b filter coefficients, and n is the clock cycle/time index. The FIR filter achieves an SNR of 17.1 dB when operating error free. The filter employs a ripple-carry-based arithmetic unit architecture and its overall synthesized-gate complexity is 6528 NAND2 gates. The path delay distribution is shown in Fig. 4(b). The ripple-carry adder architecture is the favored adder architecture for MEOP operation because of its low gate complexity and the relaxed timing requirements in ULP applications. In addition, the ripple-carry adder exhibits VOS/FOS-friendly timing slacks [24]. The filter is simulated in the LVT and HVT 45-nm commercial CMOS processes. The simulation procedure employed in this paper is described next.

Fig. 5. Validation of energy and frequency models for the eight-tap FIR filter in 45-nm CMOS processes with different Vth: (a) energy, and (b) frequency models.
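The filter computation and its word lengths can be checked with a short behavioral model. The sketch below is an illustration, not the synthesized design: it assumes signed two's-complement 10-b inputs and coefficients and zero initial state, and the function name `fir8` is hypothetical.

```python
def fir8(x, h):
    """Direct-form eight-tap FIR filter: y[n] = sum_{i=0}^{7} x[n-i] * h[i].

    Samples before the start of x are taken as zero (assumed initial state).
    """
    assert len(h) == 8
    y = []
    for n in range(len(x)):
        acc = 0
        for i in range(8):
            if n - i >= 0:
                acc += x[n - i] * h[i]
        y.append(acc)
    return y


# An impulse at the input reproduces the coefficients at the output.
h = [1, -2, 3, -4, 5, -6, 7, -8]
print(fir8([1] + [0] * 7, h))

# Worst case for signed 10-b operands: every product is (-512) * (-512).
worst = fir8([-512] * 8, [-512] * 8)[-1]
print(worst, worst <= 2 ** 22 - 1)
```

Each 10-b by 10-b product needs 20 bits, and summing eight of them adds three more, which is consistent with the 23-b output precision stated above.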

A. Simulation Procedure

We employ the following simulation procedure to compute pη and the total energy, and to study the effect of timing errors on the application metric (SNR).

1) Circuit Simulation: We employ circuit simulations using HSPICE in the 45-nm commercial CMOS process to characterize the worst case delay and power of a limited-size gate library (1-bit adder, AND gate, OR gate, inverter, and so on) at different voltages and PVT corners. We fit the parameters of the analytical models presented in (3) and (5) to the delay and power values obtained for each gate via circuit simulations.

2) Back-Annotated Gate-Level Simulation: We employ gate-level simulations of the synthesized filter netlist to generate the erroneous output y[n], using the individual gate delays obtained from step 1 at Vdd = KVOS Vdd,crit and operating at f = KFOS fcrit.

3) We determine the main block precompensation error rate pη and the SNR of the filter under timing errors by comparing the erroneous RTL output y to the output signal component ys. Here, the hardware error-free output yo = ys + no is contaminated only by the filtered signal noise no at the output. The SNR is computed as follows:

$$SNR = 10 \log_{10}\left(\frac{\sigma_s^2}{\sigma_{n_o}^2 + \sigma_e^2}\right) = 10 \log_{10} \frac{E[(y_s - E[y_s])^2]}{E[(e_T - E[e_T])^2]} \tag{11}$$

where $\sigma_s^2$, $\sigma_{n_o}^2$, and $\sigma_e^2$ are the output signal variance, signal-noise variance, and timing-error variance, respectively, and eT = y − ys is the total output (signal and timing) noise.

4) Power Estimation: We compute the overall system leakage and dynamic energy/power by summing the leakage and dynamic power estimates obtained from step 1, while taking into consideration the average activity factor α of each gate from step 2.
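The metrics computed in step 3 can be sketched as follows. This is a minimal illustration rather than the authors' tooling: it assumes the error rate is counted as the fraction of cycles in which the simulated output differs from the error-free hardware output yo, and the function names and sample values are hypothetical.

```python
import math


def variance(values):
    """Population variance, E[(v - E[v])^2]."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def snr_db(y, y_s):
    """SNR from (11): signal variance over the variance of e_T = y - y_s."""
    e_t = [a - b for a, b in zip(y, y_s)]
    return 10 * math.log10(variance(y_s) / variance(e_t))


def error_rate(y, y_o):
    """Fraction of cycles in which y differs from the error-free output y_o
    (assumed definition of the precompensation error rate)."""
    return sum(1 for a, b in zip(y, y_o) if a != b) / len(y)


# Hypothetical short sequences for illustration.
y_s = [0, 10, 0, -10]
y = [1, 9, -1, -9]
print(round(snr_db(y, y_s), 2))
print(error_rate([1, 2, 3, 4], [1, 2, 0, 4]))
```

Because eT includes both the signal noise and the timing error, the same function gives the error-free SNR when y = yo and the degraded SNR when y contains timing errors.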

B. MEOP of (Error Free) Filter

We employ steps 1 and 4 of the simulation procedure to estimate the energy consumption of the conventional (error free) FIR filter. We validate the energy consumption by comparing the estimated values to HSPICE simulations of the complete FIR filter with an average switching activity factor α = 0.1 in the HVT and LVT 45-nm CMOS processes. Fig. 5(a) and (b) shows that the filter energy consumption and throughput estimates from the analytical model approximate circuit simulations very well. Several observations can be made when comparing the energy and throughput of the filter in the LVT process to that in the HVT process. These are attributed to the threshold voltage difference between the two processes:

1) Elkg in the LVT process is significantly larger (20× in near/superthreshold) than Elkg in the HVT process, whereas Edyn is almost the same;

2) the total filter energy in the LVT process is dominated by leakage even for near/superthreshold supply voltages (Elkg ≈ 4Edyn), whereas Elkg in the HVT process starts to dominate the total energy only in the subthreshold regime;

3) the filter in the LVT process achieves a higher operating frequency than that in the HVT process. The filter operating frequency starts to decrease significantly for supply voltages less than 0.38 V in the LVT process and 0.5 V in the HVT process, and the corresponding Elkg starts to increase at similar voltages in each process;

4) the conventional MEOPs, i.e., MEOPC, in the LVT and HVT processes are given by (Vdd,opt = 0.38 V, fopt = 240 MHz, Emin = 1022 fJ) and (Vdd,opt = 0.48 V, fopt = 80 MHz, Emin = 335 fJ), respectively. In other words, the minimum energy consumption Emin in the HVT process is lower than that of the LVT process even though it occurs at a higher supply voltage because of the significant decrease in Elkg. The optimum frequency for the HVT process is lower than that in the LVT process because of the exponential dependence of delay on threshold voltage.

Fig. 6. Iso-pη contours in the Vdd–f plane in the 45-nm LVT (solid lines) and HVT (dotted lines) processes.

C. Energy Versus pη and pη Versus SNR

In this section, we characterize the voltage–frequency operating points (Vdd, f) required to achieve a given precompensation error rate pη in the filter main block and compare the effect of FOS versus VOS on pη. The simulation procedure in Section IV-A is employed to characterize the (Vdd, f) pairs to obtain the iso-pη contours in the 45-nm LVT and HVT processes (see Fig. 6). A vertical translation (fixed Vdd) in the Vdd–f plane from the pη = 0 contour corresponds to an application of FOS. Similarly, a horizontal translation (fixed f) in the Vdd–f plane from the pη = 0 contour corresponds to an application of VOS. Arbitrary translations correspond to a joint application of VOS and FOS. We can see that as the critical supply voltage reduces, the horizontal and vertical gaps between different pη curves reduce because of the increased sensitivity of delay as the supply voltage approaches the threshold voltage. In addition, the HVT process has larger delay sensitivity compared with the LVT process because of its higher threshold voltage.

To further compare FOS and VOS at MEOPC, Fig. 7(a)–(c) shows pη at the output of the main block as well as the main block energy consumption under VOS and FOS in the LVT and HVT processes. Under FOS, pη is the same for the LVT and HVT processes because errors due to FOS are only a function of the architecture and are independent of the underlying circuit and process fabric. Under VOS, however, the LVT process has a lower pη than the HVT process for the same KVOS. This is because |Vdd,opt − Vth| at the MEOPC for the LVT process is less than that for the HVT process, which leads to a lower percentage increase in delay because of VOS in the LVT process. This can be seen in Fig. 5(b), where the rate of change (slope) of frequency at the MEOPC in the LVT process is less than that in the HVT process. This also causes the energy savings due to VOS to exceed those due to FOS at the same pη in the LVT process but not in the HVT process, as shown in Fig. 7(c). In addition, FOS is more robust than VOS at a given pη, as shown in Fig. 7(a) by comparing the slopes of the respective pη curves. Small variations in KVOS lead to large variations in pη, unlike variations in KFOS. This is due to the exponential relation between voltage and delay in the subthreshold regime.

Fig. 7. Precompensation error rate and energy characterization of the eight-tap FIR filter under VOS (x-axis ≤ 1) and FOS (x-axis ≥ 1): (a) precompensation error rate pη versus KVOS and KFOS, (b) normalized energy versus KVOS and KFOS, and (c) normalized energy versus pη (error-compensation overhead is not included).

The percentage reduction in Emin at the MEOPC because of VOS is equal in the LVT and HVT processes for the same KVOS [see Fig. 7(b)] because the energy reduction percentage depends on KVOS and is independent of the supply or threshold voltage, unlike the absolute energy. Under FOS, however, the percentage energy reduction is larger for the LVT process than for the HVT process for the same KFOS because the total energy in the LVT process at MEOPC is more heavily dominated by Elkg than in the HVT process (as shown in Fig. 5(a) and discussed in Section IV-B), and FOS reduces only Elkg.

To see the effect of ANT on the application performance metric SNR, we employ a reduced-precision redundancy (RPR) version of the main block filter as an estimator [25], as shown in Fig. 8(a). The main filter output precision is 23-b with 10-b input and coefficients, and its architecture is shown in Fig. 4(a), whereas the RPR estimator block has an architecture similar to the main filter while processing only the Be MSBs of the 10-b inputs and coefficients of the main filter (Be < 10) to generate an estimated output having 2Be + 3 bits.

Fig. 8. SNR versus error rate for the eight-tap RPR ANT-based filter with different estimator precisions (Be): (a) architecture and (b) SNR performance versus pη (dotted lines show the SNR achieved by the estimator only).

The ANT estimation and correction circuits are operated at the same voltage and frequency as the main block. They, however, do not have the same timing error rates as the main block because of their reduced complexity. In this case, the estimator has lower precision than the main block, resulting in greater timing slack. We follow the simulation procedure outlined in Section IV-A to estimate the SNR under hardware errors. Fig. 8(b) shows the SNR of the uncorrected (conventional) filter and that of the ANT-based filter with different values of estimator precision Be. The conventional filter's SNR drops catastrophically as pη increases above 0.1%, while the SNR of the ANT-based filter remains within 0.8 dB of the error-free output for pη values up to 70% for Be = 5 (point B). Higher precision estimators, e.g., (ANT, Be = 6), start performing worse than lower precision estimators (ANT, Be = 3, 4, 5) for pη > 0.5, as shown in Fig. 8. This is because high-precision estimators start making more errors than their low-precision counterparts because of their longer critical path delay.

Next, we study the energy consumption of the ANT-based filter while accounting for both estimator and main block energy.

D. MEOP of ANT-Based Filter at Nominal Process Corner

The pη-optimal (in the sense of maximizing pη) ANT configuration (pη, Best) for a specific value of SNR loss can be read from Fig. 8. For example, for an SNR loss of 0, 0.2, 0.8, and 3.5 dB, the pη-optimal ANT configurations are Conventional (0, 0), A (0.4, 6), B (0.7, 5), and C (0.85, 4), respectively. Fig. 9 shows the total energy, including the overhead of error compensation, for each pη-optimal ANT configuration using the (Vdd, f) pairs from the iso-pη contours of Fig. 6 for both the LVT and HVT processes. The MEOP parameters of each configuration are shown in Tables I and II for the LVT and HVT processes, respectively. From Fig. 9 and Tables I and II, we conclude the following:

1) ANT reduces the energy consumption Emin at the MEOP for both LVT and HVT processes. The reduction in Emin is typically greater in the LVT process than in the HVT process.

2) The (Vdd,opt, fopt) pairs at MEOPANT are different for each configuration (pη, Best), and different from that of the conventional system. This shows that MEOPANT is a unique operating point.

3) The supply voltage Vdd,opt at MEOPANT is lower than Vdd,opt at MEOPC, whereas fopt at MEOPANT is greater than fopt at MEOPC. This implies that the ANT-based filter not only reduces energy at its MEOP, but does so by operating at a reduced supply voltage and higher operating frequency (and hence throughput). The throughput enhancement factor is 1.8× and 2.25× at pη = 0.7 and 0.85, respectively, in the LVT process.

4) Relative to the conventional system operating at a fixed SNR of 17.1 dB, ANT achieves up to 38% and 47% reduction in Emin for an SNR loss of 0.8 and 3.5 dB, respectively, in the LVT process. For the HVT process, the maximum reduction in Emin is only 10%.

5) For identical SNRs (no SNR loss), ANT achieves an Emin reduction ranging from 4% to 29% in the LVT process. The maximum reduction in Emin is only 3.9% in the HVT process.

Fig. 9. Total energy consumption (including error-compensation overhead) of the ANT-based FIR filter in the (a) LVT 45-nm CMOS process and (b) HVT 45-nm CMOS process.

TABLE I
COMPARISON OF MEOP PARAMETERS FOR CONVENTIONAL AND ANT-BASED FILTERS IN THE 45-nm LVT PROCESS

TABLE II
COMPARISON OF MEOP PARAMETERS FOR CONVENTIONAL AND ANT-BASED FILTERS IN THE HVT 45-nm CMOS PROCESS

At pη = 0.4 in the HVT process, ANT results in an 11% energy overhead at the MEOP because the error correction overhead offsets the energy savings obtained by VOS and FOS. As the voltage reduces below 0.43 V, all ANT configurations show energy savings compared with the conventional design (see Fig. 9(b)) because leakage energy becomes more dominant, allowing FOS in conjunction with VOS to show more savings.

An important observation in Fig. 9 is that the energy curves under ANT are flatter than those of the conventional error-free design, indicating that ANT-based designs are less sensitive to Vdd variations. This shows that statistical error compensation saves considerable energy and enhances robustness in energy-constrained subthreshold applications.
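The choice of a pη-optimal configuration described at the start of this subsection amounts to a small lookup. The sketch below uses the four (pη, Best, SNR loss) points quoted in the text; treating the SNR loss as a budget (any configuration at or below the budget is admissible) is an assumption made for illustration, and the function name is hypothetical.

```python
# (name, p_eta, B_est, SNR loss in dB), as quoted in the text for Fig. 8.
CONFIGS = [
    ("Conventional", 0.0, 0, 0.0),
    ("A", 0.4, 6, 0.2),
    ("B", 0.7, 5, 0.8),
    ("C", 0.85, 4, 3.5),
]


def p_eta_optimal(max_snr_loss_db):
    """Return the configuration with the largest tolerable error rate
    whose SNR loss does not exceed the given budget."""
    feasible = [c for c in CONFIGS if c[3] <= max_snr_loss_db]
    return max(feasible, key=lambda c: c[1])


for budget in (0.0, 0.2, 0.8, 3.5):
    print(budget, p_eta_optimal(budget)[0])
```

A larger tolerable pη leaves more room for VOS and FOS, which is why the configuration is chosen to maximize pη for the allowed SNR loss before the energy is minimized over (Vdd, f).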

E. MEOP of ANT-Based Filter Under Process Variations

It is well known that MEOP designs suffer from process variations. Timing errors induced by process variations severely affect the output SNR. Delay variations are induced by within-die (WID) or die-to-die (D2D) variations. Global speed binning together with global supply voltage and body biasing are common techniques to deal with D2D and WID variations [26]. While these techniques have been effective in the face of D2D variations at low overhead because of the global nature of the variations, dealing with WID variations remains a challenge as they require local process monitors and various voltage and body bias knobs to be locally adjusted. In this paper, we evaluate the robustness benefits of SEC in the face of delay variations because of WID variations. A large portion of WID variations are due to random dopant fluctuations (RDF), which cause large variations in threshold voltage [27]. Increasing the transistor sizes will reduce RDF at the expense of an increased energy consumption. Using minimum-size (Wmin) transistors will guarantee the lowest energy consumption at MEOP but will incur a loss of yield if the nominal/target performance is not met.

Fig. 10. FIR filter frequency distribution under process variations in the LVT 45-nm CMOS process using: (a) minimum-size (Wmin), and (b) 1.6-Wmin transistors.

To simulate the effect of process variations on performance and energy, the delay distributions of the various gates used in the filter were obtained via Monte Carlo simulations in the 45-nm LVT commercial CMOS process with WID variations enabled. These delay distributions are sampled to obtain different instances of the filter. The instances are then simulated using back-annotated gate-level simulation to determine the error-free operating frequency of each filter instance. Fig. 10 shows the frequency distribution of the filter under process variations at different supply voltages and transistor widths. At the MEOP of the conventional design in Fig. 10(a), there is around 50% spread in the filter frequency of operation because of WID variations. To guarantee an operating frequency equal to the nominal operating frequency of the minimum-sized design, fopt,nom = 240 MHz, at the MEOP under WID process variations, the transistor sizes on the critical paths have to be increased by at least 60% to maintain a constant parametric yield of 99.7%. That is, a conventional filter design whose critical-path transistors have been up-sized by a factor of 1.6 achieves, under process variations, the same parametric yield as one employing minimum-sized transistors and operating at the nominal corner. We refer to this up-sized design as the 1.6-Wmin design.

Fig. 11. Energy under process variations for the eight-tap FIR filter using up-sized (1.6-Wmin) design and minimum-sized (Wmin) ANT-based design.
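The Monte Carlo flow described above (sample gate delays, build filter instances, find each instance's error-free frequency, then read off the parametric yield) can be illustrated with a toy model. The sketch below is not the HSPICE and gate-level flow used in this paper. It assumes, purely for illustration, that each gate delay is lognormally distributed, that an instance consists of `n_paths` independent critical paths of `gates_per_path` gates, and that up-sizing by `width_factor` shrinks the log-delay spread as 1/sqrt(width) while leaving the nominal delay unchanged. All function names and parameter values are hypothetical.

```python
import math
import random


def sample_instance_frequency(rng, n_paths, gates_per_path, sigma_ln):
    """Return the error-free clock frequency of one sampled filter instance.

    Delays are normalized so that a nominal gate has delay 1.0.
    """
    worst_delay = 0.0
    for _ in range(n_paths):
        # Independent per-gate draws stand in for WID variation (assumed lognormal).
        path_delay = sum(
            rng.lognormvariate(0.0, sigma_ln) for _ in range(gates_per_path)
        )
        worst_delay = max(worst_delay, path_delay)
    # The slowest path sets the maximum error-free frequency.
    return 1.0 / worst_delay


def sample_frequencies(n_instances, width_factor=1.0, base_sigma_ln=0.3,
                       n_paths=8, gates_per_path=20, seed=0):
    """Sample error-free frequencies for many filter instances."""
    rng = random.Random(seed)
    # Assumption: the delay spread shrinks as 1/sqrt(width) when transistors are up-sized.
    sigma_ln = base_sigma_ln / math.sqrt(width_factor)
    return [
        sample_instance_frequency(rng, n_paths, gates_per_path, sigma_ln)
        for _ in range(n_instances)
    ]


def parametric_yield(frequencies, f_target):
    """Fraction of instances whose error-free frequency meets f_target."""
    return sum(f >= f_target for f in frequencies) / len(frequencies)


def frequency_at_yield(frequencies, yield_target=0.997):
    """Highest sampled frequency that at least yield_target of instances reach."""
    ordered = sorted(frequencies)
    n_fail_allowed = int((1.0 - yield_target) * len(ordered))
    return ordered[n_fail_allowed]


if __name__ == "__main__":
    f_target = 0.9 / 20  # 90% of the variation-free frequency of a 20-gate path
    for width in (1.0, 1.6):
        freqs = sample_frequencies(2000, width_factor=width)
        print(width, parametric_yield(freqs, f_target), frequency_at_yield(freqs))
```

In this toy model the up-sized design meets a given target frequency in a larger fraction of instances only because its delay spread is narrower; the actual flow captures additional effects (drive strength, loading, and switching energy) that the sketch ignores.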

Fig. 11 shows the energy consumption of the up-sized (1.6-Wmin) conventional filter design and of the minimum-sized ANT-based filter design, including error compensation overhead. These energies are computed from HSPICE power estimates for each constituent gate of each filter instance, taking into account the switching activity factor of each gate. The corresponding energy distributions at the MEOP are shown in Fig. 12.

On average, there is a 4.5% increase in Emin to guarantee an operating frequency of fopt,nom under process variations in a conventional filter design. On the other hand, a minimum-sized filter, which employs ANT with FOS to meet throughput and correct for timing violations, achieves 39% and 54% average reduction in Emin when Be = 5 and Be = 4, respectively (see Fig. 12). These results show the benefits of SEC in reducing Emin under process variations in the subthreshold regime while guaranteeing a desired parametric yield.
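To put these percentages on a common scale, the short calculation below normalizes the Emin of the nominal minimum-sized design to 1.0. It assumes that the 39% and 54% reductions are measured relative to the up-sized design, as the abstract's wording ("over a transistor up-sized design") suggests; the helper name `apply_percent_change` is hypothetical.

```python
def apply_percent_change(baseline, percent):
    """Scale a baseline energy by a signed percentage change."""
    return baseline * (1.0 + percent / 100.0)


E_NOMINAL = 1.0  # normalized Emin of the nominal minimum-sized design

# Up-sized (1.6-Wmin) conventional design: 4.5% more energy on average.
e_upsized = apply_percent_change(E_NOMINAL, +4.5)

# Minimum-sized ANT-based designs, assumed relative to the up-sized design.
e_ant_be5 = apply_percent_change(e_upsized, -39.0)
e_ant_be4 = apply_percent_change(e_upsized, -54.0)

if __name__ == "__main__":
    print(e_upsized, e_ant_be5, e_ant_be4)
```

Under that assumption, the ANT-based designs land at roughly 0.64 (Be = 5) and 0.48 (Be = 4) of the nominal design's Emin.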


Fig. 12. Energy distributions at the MEOP of the minimum-sized (nominal) filter design, up-sized design, and ANT-based minimum-sized filter designs with Be = 4 and 5 (error compensation overhead is included).

V. CONCLUSION

This paper has demonstrated that SEC reduces the energy at the MEOP. It motivates the need for a systematic energy optimization framework incorporating not just the standard design variables (supply voltage, clock frequency, architectural parameters, and others), but also error-resiliency variables (error rate, error probability mass functions/distribution, and others). Such research needs to evaluate the robustness versus energy tradeoffs, explicitly taking into consideration the interconnect voltage-delay relation in subthreshold, soft errors, and the transistor minimum voltage, especially given that the optimal supply voltage dictated by SEC is lower than that in conventional designs. Nevertheless, our recent prototype IC for electrocardiogram processing in subthreshold [23] shows via measurements that the achievable energy savings at MEOP via SEC are within the range presented in this paper.

REFERENCES

[1] R. Hegde and N. R. Shanbhag, “Soft digital signal processing,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 9, no. 6, pp. 813–823, Dec. 2001.

[2] B. H. Calhoun, A. Wang, and A. Chandrakasan, “Modeling and sizing for minimum energy operation in subthreshold circuits,” IEEE J. Solid-State Circuits, vol. 40, no. 9, pp. 1778–1786, Sep. 2005.

[3] B. Zhai, D. Blaauw, D. Sylvester, and K. Flautner, “The limit of dynamic voltage scaling and insomniac dynamic voltage scaling,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 11, pp. 1239–1252, Nov. 2005.

[4] J. Kwong and A. P. Chandrakasan, “Variation-driven device sizing for minimum energy sub-threshold circuits,” in Proc. ISLPED, 2006, pp. 8–13.

[5] C. H.-I. Kim, H. Soeleman, and K. Roy, “Ultra-low-power DLMS adaptive filter for hearing aid applications,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, no. 6, pp. 352–357, Dec. 2003.

[6] A. Wang and A. Chandrakasan, “A 180-mV subthreshold FFT processor using a minimum energy design methodology,” IEEE J. Solid-State Circuits, vol. 40, no. 1, pp. 310–319, Jan. 2005.

[7] B. H. Calhoun and A. P. Chandrakasan, “Ultra-dynamic voltage scaling (UDVS) using sub-threshold operation and local voltage dithering,” IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 238–245, Jan. 2006.

[8] S. Hanson, M. Seok, Y.-S. Lin, Z. Y. Foo, D. Kim, Y. L. Lee, N. Liu, D. Sylvester, and D. Blaauw, “A low-voltage processor for sensing applications with picowatt standby mode,” IEEE J. Solid-State Circuits, vol. 44, no. 4, pp. 1145–1155, Apr. 2009.

[9] B. Zhai, S. Pant, L. Nazhandali, S. Hanson, J. Olson, A. Reeves, M. Minuth, R. Helfand, T. Austin, D. Sylvester, and D. Blaauw, “Energy-efficient subthreshold processor design,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 8, pp. 1127–1137, Aug. 2009.

[10] S. Hanson, B. Zhai, D. Blaauw, D. Sylvester, A. Bryant, and X. Wang, “Energy optimality and variability in subthreshold design,” in Proc. ISLPED, 2006, pp. 363–365.

[11] T. Chen and S. Naffziger, “Comparison of adaptive body bias (ABB) and adaptive supply voltage (ASV) for improving delay and leakage under the presence of process variation,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, no. 5, pp. 888–899, Oct. 2003.

[12] N. Verma, J. Kwong, and A. P. Chandrakasan, “Nanometer MOSFET variation in minimum energy subthreshold circuits,” IEEE Trans. Electron Devices, vol. 55, no. 1, pp. 163–174, Jan. 2008.

[13] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester, “Analysis and mitigation of variability in subthreshold design,” in Proc. Int. Symp. Low Power Electron. Design, 2005, pp. 20–25.

[14] R. Hegde and N. R. Shanbhag, “A voltage overscaled low-power digital filter IC,” IEEE J. Solid-State Circuits, vol. 39, no. 2, pp. 388–391, Feb. 2004.

[15] E. P. Kim, D. J. Baker, S. Narayanan, D. L. Jones, and N. R. Shanbhag, “Low power and error resilient PN code acquisition filter via statistical error compensation,” in Proc. IEEE CICC, Sep. 2011, pp. 1–4.

[16] S. Das, D. Roberts, S. Lee, S. Pant, D. Blaauw, T. Austin, K. Flautner, and T. Mudge, “A self-tuning DVS processor using delay-error detection and correction,” IEEE J. Solid-State Circuits, vol. 41, no. 4, pp. 792–804, Apr. 2006.

[17] K. A. Bowman, J. W. Tschanz, N.-S. Kim, J. C. Lee, C. B. Wilkerson, S.-L. L. Lu, T. Karnik, and V. K. De, “Energy-efficient and metastability-immune resilient circuits for dynamic variation tolerance,” IEEE J. Solid-State Circuits, vol. 44, no. 1, pp. 49–63, Jan. 2009.

[18] M. R. Choudhury and K. Mohanram, “Approximate logic circuits for low overhead, non-intrusive concurrent error detection,” in Proc. Conf. Design, Autom. Test Eur., 2008, pp. 903–908.

[19] V. Papirla, A. Jain, and C. Chakrabarti, “Low power robust signal processing,” in Proc. 14th ACM/IEEE Int. Symp. Low Power Electron. Design, Aug. 2009, pp. 303–306.

[20] G. Karakonstantis, G. Panagopoulos, and K. Roy, “HERQULES: System level cross-layer design exploration for efficient energy-quality trade-offs,” in Proc. 16th ACM/IEEE Int. Symp. Low Power Electron. Design, Aug. 2010, pp. 117–122.

[21] P. Gupta and R. K. Gupta, “Underdesigned and opportunistic computing,” in Proc. 20th IEEE Asian Test Symp., Sep. 2011, pp. 498–499.

[22] R. A. Abdallah and N. R. Shanbhag, “Minimum-energy operation via error resiliency,” IEEE Embedded Syst. Lett., vol. 2, no. 4, pp. 115–118, Dec. 2010.

[23] R. A. Abdallah and N. R. Shanbhag, “A 14.5 fJ/cycle/k-gate, 0.33 V ECG processor in 45-nm CMOS using statistical error compensation,” in Proc. IEEE CICC, Sep. 2012, pp. 1–4.

[24] Y. Liu, T. Zhang, and K. Parhi, “Computation error analysis in digital signal processing systems with overscaled supply voltage,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 4, pp. 517–526, Apr. 2010.

[25] B. Shim, S. R. Sridhara, and N. R. Shanbhag, “Reliable low-power digital signal processing via reduced precision redundancy,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 5, pp. 497–510, May 2004.

[26] J. Tschanz, S. Narendra, R. Nair, and V. De, “Effectiveness of adaptive supply voltage and body bias for reducing impact of parameter variations in low power and high performance microprocessors,” IEEE J. Solid-State Circuits, vol. 38, no. 5, pp. 826–829, May 2003.

[27] H. Mahmoodi, S. Mukhopadhyay, and K. Roy, “Estimation of delay variations due to random-dopant fluctuations in nanoscale CMOS circuits,” IEEE J. Solid-State Circuits, vol. 40, no. 9, pp. 1787–1796, Sep. 2005.


Rami A. Abdallah (M’12) received the B.Eng. degree with the highest distinction from the American University of Beirut (AUB), Beirut, Lebanon, in 2006, and the M.S. and Ph.D. degrees from the University of Illinois at Urbana-Champaign, Urbana, IL, USA, in 2008 and 2012, respectively, all in electrical and computer engineering.

He has been with the Visual and Parallel Computing Group, Intel Corporation, as a Silicon Architecture Engineer, on the design of many-integrated core chips, since June 2012. From 2006 to 2012, he was a Research Assistant with the Coordinated Science Laboratory. From 2007 to 2009, he was with Texas Instruments, Inc., Dallas, TX, USA, with the applications and systems research and development labs, where he was involved in the design of communication receivers for long-term evolution and WiMAX and on-chip DC-DC conversion. His current research interests include the design of integrated circuits and systems for communications, digital signal processing, and general purpose computing. He was on the Dean’s honor list at AUB from 2002 to 2006, and was selected among the world’s top students to participate in the Research Science Institute, Massachusetts Institute of Technology, Cambridge, MA, USA, in 2001.

Dr. Abdallah was a recipient of the Charli S. Korban Outstanding Undergraduate Award in 2006, the HKN Honor Society Scholarship Award in 2009, the Mac Van Valkenburg Outstanding Researcher Award in 2012, and the 2012 Yi-Min Wang and Pi-Yu Chung Outstanding Student Award.

Naresh R. Shanbhag (F’06) received the B.Tech. degree from the Indian Institute of Technology, New Delhi, India, in 1988, the M.S. degree from Wright State University, Dayton, OH, USA, in 1990, and the Ph.D. degree from the University of Minnesota, Minneapolis, MN, USA, in 1993, all in electrical engineering.

He was with the AT&T Bell Laboratories, Murray Hill, NJ, USA, from 1993 to 1995, where he was the Lead Chip Architect for AT&T’s 51.84 Mb/s transceiver chips over twisted-pair wiring for asynchronous transfer mode (ATM)-LAN and very high-speed digital subscriber line chip-sets. Since August 1995, he has been with the Department of Electrical and Computer Engineering, and the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL, USA, where he is currently a Jack Kilby Professor of electrical and computer engineering. He was with National Taiwan University, Taipei, Taiwan, in 2007. He has authored more than 200 publications. He holds 12 U.S. patents. He is the co-author of the research monograph Pipelined Adaptive Digital Filters (Kluwer Academic Publishers, 1994). His current research interests include the design of robust and energy-efficient integrated circuits and systems for communications including VLSI architectures for error-control coding, and equalization, noise-tolerant integrated circuit design, error-resilient architectures and systems, and system-assisted mixed-signal design.

Dr. Shanbhag received the 2010 Richard Newton GSRC Industrial Impact Award, the 2006 IEEE Journal of Solid-State Circuits Best Paper Award, the 2001 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS Best Paper Award, the 1999 IEEE Leon K. Kirchmayer Best Paper Award, the 1999 Xerox Faculty Award, the Distinguished Lecturership from the IEEE Circuits and Systems Society in 1997, the National Science Foundation CAREER Award in 1996, and the 1994 Darlington Best Paper Award from the IEEE Circuits and Systems Society. He served as an Associate Editor for the IEEE TRANSACTION ON CIRCUITS AND SYSTEMS—PART II: EXPRESS BRIEFS from 1997 to 1999 and the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS from 1999 to 2002 and 2009 to 2011. He is the General Co-Chair of the 2012 IEEE International Symposium on Low-Power Design (ISLPED), was the Technical Program Co-Chair of the 2010 ISLPED, and served on the technical program (wireline subcommittee) committee of the International Solid-State Circuits Conference from 2007 to 2011. He led the Alternative Computational Models in the Post-Si Era research theme, in the DOD and Semiconductor Research Corporation (SRC) sponsored Microelectronics Advanced Research Corporation Center under their Focus Center Research Program from 2006 to 2012. Since January 2013, he has been the Founding Director of the Systems on Nanoscale Information fabriCs Center, a five-year multi-university center funded by DARPA and SRC under the STARnet phase of FCRP. In 2000, he co-founded and served as the Chief Technology Officer with Intersymbol Communications, Inc., a venture-funded fabless semiconductor start-up that provides DSP-enhanced mixed-signal ICs for electronic dispersion compensation of OC-192 optical links. In 2007, he was with Intersymbol Communications, Inc., which was acquired by Finisar Corporation, Inc.
