
Synthesizing Parsimonious Inexact Circuits through Probabilistic Design Techniques

AVINASH LINGAMNENI, Rice University and CSEM SA
CHRISTIAN ENZ, CSEM SA and EPFL
KRISHNA PALEM, Rice University and Nanyang Technological University
CHRISTIAN PIGUET, CSEM SA

The domain of inexact circuit design, in which the accuracy of a circuit can be exchanged for substantial cost (energy, delay, and/or area) savings, has been gaining prominence of late owing to a growing desire to reduce the energy consumption of systems, particularly in the domain of embedded and (portable) multimedia applications. Most previous approaches to realizing inexact circuits relied on scaling circuit parameters (such as supply voltage), taking advantage of an application’s error tolerance to achieve the cost and accuracy trade-offs, and thus suffered from considerable implementation overheads that significantly reduced the gains. In this article, two novel design approaches called Probabilistic Pruning and Probabilistic Logic Minimization are proposed to realize inexact circuits with zero hardware overhead. Extensive simulations on various architectures of critical datapath elements demonstrate that each of the techniques can independently achieve normalized gains as large as 2x–9.5x in energy-delay-area product for relative error magnitudes as low as $10^{-4}$%–8% compared to the corresponding conventional correct circuits.

Categories and Subject Descriptors: B.8.0 [Performance and Reliability]: General

General Terms: Reliability, Algorithms

Additional Key Words and Phrases: Inexact circuit design, error-tolerant systems, probabilistic pruning, probabilistic logic minimization, energy-accuracy trade-off, VLSI design, low power/energy

ACM Reference Format:
Lingamneni, A., Enz, C., Palem, K., and Piguet, C. 2013. Synthesizing parsimonious inexact circuits through probabilistic design techniques. ACM Trans. Embed. Comput. Syst. 12, 2s, Article 93 (May 2013), 26 pages.
DOI: http://dx.doi.org/10.1145/2465787.2465795

1. INTRODUCTION

The notion of exact computation, where outputs of the computational element (circuit) have precise deterministic values, has been pervasive in the computing domain for many decades owing to the overwhelming success of integrated circuit design using reliable transistors, particularly in Complementary Metal-Oxide-Semiconductor (CMOS) technology. However, it is facing serious challenges today [Borkar 2005], as diminishing transistor sizes driven by Moore’s law are leading to increasing process variations, arising as lithographic scaling lags behind device scaling, and to increasing parameter variations owing to perturbations such as (thermal) noise [Kish 2002]. While one obvious way to counter the antagonistic effects of this scaling-induced inexactness is through error-correction mechanisms [Ernst et al. 2003; Ray et al. 2001],

Author’s address: A. Lingamneni; email: [email protected]
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or [email protected]
© 2013 ACM 1539-9087/2013/05-ART93 $15.00
DOI: http://dx.doi.org/10.1145/2465787.2465795

ACM Transactions on Embedded Computing Systems, Vol. 12, No. 2s, Article 93, Publication date: May 2013.


Fig. 1. Illustrative example showing the benefits of inexact design in an SAR imaging application: (a) image processed by conventional correct electronics; (b) image processed by inexact circuit design; (c) image processed by value of information-based inexact circuit design. (Courtesy: George et al. [2006])

a radically different approach was developed by Palem [2003a], which we now refer to as inexact circuit design.¹ In our sense, inexact circuit design refers to an approach to realizing information processing frameworks (transistors, gates, datapath elements, or more macroscopic engines) that are deliberately designed to be erroneous and used as such, without adding any error-correction or compensatory mechanisms, in return for savings in energy, performance, and/or area. This approach has been receiving increasing prominence of late as a consequence of the variations/perturbations just stated and, significantly, the quest for ultra-low-energy systems emanating from the desire for a longer battery life. To understand this need for mitigating the rising Information Technology (IT) sector’s carbon footprint, see Bronk et al. [2010]. Inexact circuits are parsimonious in terms of (physical) implementation and cost much less (in terms of energy, delay, and/or area) than their conventional correct counterparts. Implementing such parsimonious inexact circuits through innovative design approaches is the central focus of this article. We show that incorporating the notion of value of information, or “significance,” of computational blocks in realizing such inexact circuits helps glean significant savings in cost. An illustrative example demonstrating the benefits of inexact-circuit-based systems is shown in Figure 1, where the output of a Synthetic Aperture Radar (SAR) imaging application for various design approaches is presented [George et al. 2006]. As evident from these figures, the output of the inexact circuit design guided by the principle of value of information is barely distinguishable from the output generated by conventional correct circuit design, while achieving an energy gain of a multiplicative factor of 5.

The main contributions of this article are summarized below.

— A comprehensive retrospect of existing inexact circuit design techniques taking advantage of the principle of trading accuracy for cost (energy, area, and/or delay) savings, solely using physical-level (voltage) overscaling techniques, while highlighting their significant drawbacks (refer to Section 1.3).

— Two novel zero-overhead and technology-independent design approaches called Probabilistic Pruning and Probabilistic Logic Minimization are proposed to realize inexact circuits and are shown to overcome all of the drawbacks of the existing design techniques while offering significantly more savings than any of the conventional approaches.

¹ The phrase inexact circuits used in this article is an umbrella term for the previously proposed probabilistic circuits [Chakrapani et al. 2007; George et al. 2006], approximate circuits [Chakrapani et al. 2008], and stochastic circuits [Narayanan et al. 2010].



— A novel synthesis-based CAD framework incorporating the proposed techniques for designing inexact circuits, achieving a faster time to design and fabricate than the full-custom design flow needed for most of the previous works in the literature.

— Extensive experimental results conclusively demonstrating and validating the potential savings in energy, delay, and area obtained by the proposed techniques are described. In the context of various datapath elements such as adders and multipliers, savings as large as 2x–9.5x in energy-delay-area product with corresponding relative error magnitude percentages as low as $10^{-4}$%–8% have been achieved.

To the best of our knowledge, this is one of the first attempts to focus exclusively on innovations at the architecture and logic-level abstractions for inexact circuits. We wish to clarify that the focus of this article is to demonstrate the prospects at these layers of abstraction through preliminary (yet fairly rigorous and conclusive) experimental simulations, while the detailed mathematical optimization framework along with a physical validation of a bigger system would be the focus of our subsequent articles. These techniques can be viewed as partial inroads to achieving the ultimate goal of developing a more general and rigorous framework for designing optimal application-specific inexact circuits.

1.1. A Retrospect of Inexact Circuit Design

The underlying principle for this domain of inexact circuit design is as follows. In applications that can tolerate or benefit from it, error in the circuits can be viewed as a commodity that can be traded for substantial cost (energy, delay, area, etc.) savings, as opposed to being viewed as an impediment. This shift towards inexact circuit design has been more noticeable in the field of embedded, multimedia, and DSP systems and in application domains of growing interest, such as recognition and data mining [Dubey 2005]. In such domains, the accuracy of the output of circuits can be relaxed depending on the error tolerance or resilience of the application, which can be attributed to several factors: (a) the “cognitive filling” capabilities of the end systems (such as human sensory systems) that consume the output; these systems have an underlying architecture that aids them in realizing useful results even from unreliable or imprecise components; and (b) the underlying algorithms, which are often aggregative in that they inherently possess a mechanism through which any output within a particular bound is as acceptable as the single “golden” output.

In general, applications can be broadly classified into three types.

(1) Applications that benefit from the errors (or perturbations), in particular, probabilistic algorithms such as randomized tests for primality; for example, see Rabin [1976] and Ding and Rabin [2002].

(2) Applications that can tolerate but do not benefit from errors, in particular, building blocks of most DSP and multimedia systems.

(3) Applications that cannot tolerate any errors, in particular, control logic of safety-critical applications used in automobiles or aviation.

Traditional circuit design methodology has mainly focused on designing error-free systems with correct outputs or, following in the rich tradition of von Neumann [1956], on innovative signal-processing approaches employing error-correction/compensation schemes [Ernst et al. 2003; Hegde and Shanbhag 1999, 2001] when necessary to overcome the perturbations (or errors) in the circuits. In sharp contrast to this over-engineered circuit design, Palem [2003a] advocated the philosophy of designing adequately engineered systems which use erroneous circuits for applications which




Fig. 2. Timeline of important papers and innovations in the domain of inexact circuit designs.

can tolerate/benefit from such error in exchange for significant cost (typically energy) savings. Basically, Palem [2003b, 2005] posited a connection between the error in a probabilistic computing setting and the energy consumption through the principles of thermodynamics.

A CMOS realization of this principle called PCMOS (Probabilistic CMOS) and an analytical model for Energy-Probability (E-P) relationships of PCMOS devices were given in Cheemalavagu et al. [2004] and Korkmaz et al. [2006]. It was later extended to realize system-level applications through an SoC architecture [Chakrapani et al. 2007] and through a programmable multi-core architecture called ERSA (Error Resilient System Architecture) [Leem et al. 2010] that combines one reliable processor core with a large number of unreliable counterparts. We observe that inexact design in the preceding sense is different from the concepts used in approximate signal processing [Ludwig et al. 1996; Nawab et al. 1997] in that the latter uses approximate (algorithmic) processing techniques to implement low-cost DSP primitives using error-free circuits, whereas the central focus of our approach is on gleaning cost benefits by using error-prone or inexact circuits.

The essential principle behind inexact design is to use elements that are imprecise or erroneous without correcting or compensating for error, invariably in return for significant savings. Thus, building on the foundations of Palem [2003a, 2003b], this idea has been applied with success both when the source of error is due to probabilistic noisy contexts [Chakrapani and Palem 2010; Cheemalavagu et al. 2004; George et al. 2006; Palem et al. 2009b], and through deterministic mechanisms induced typically by varying the clock speeds of critical elements through voltage overscaling [Banerjee et al. 2007; Chakrapani et al. 2008]. Since the advent and the demonstration of the promise




Fig. 3. Classification of innovations at various design-level abstractions for inexact system implementations.

of this idea, several other impressive results have since been achieved [Chippa et al. 2010; Hoffmann et al. 2011; Kim et al. 2009; Mohapatra et al. 2009; Varatkar and Shanbhag 2006].

A timeline capturing important publications that shaped the current domain of inexact circuit design is shown in Figure 2.

The rest of the article is organized as follows: Section 1 continues to discuss and classify the extensive literature in the domain of inexact system design into different layers of abstraction, highlighting the novelty of the proposed techniques and the drawbacks of the existing design methodologies. In Section 2, useful metrics for characterizing and analyzing inexact designs are presented, including the error metrics and the notion of the value of information metric quantified through the significance of a node or circuit component. The proposed architectural-level technique, probabilistic pruning, and logic-level technique, probabilistic logic minimization, are presented in Section 3 and Section 4, respectively, along with detailed algorithms and illustrative examples. In Section 5, the proposed synthesis-based experimental framework is presented, and the results of the proposed techniques are shown. Also, a detailed analysis of the characteristics and advantages of the proposed techniques over the conventional techniques is given. Section 6 concludes the article and presents a brief overview of possible future directions in the design and implementation of inexact systems.

1.2. Innovations in Inexact Circuit Design at Various Layers of Abstraction

In general, the hardware implementation of a system/application can be divided into three layers of abstraction: architecture, logic, and physical, as shown in Figure 3. An optimal implementation of a system involves optimizations at each layer of abstraction. However, reiterating that an inexact design in our sense involves designing and using a circuit without compensating or correcting for error, most of the implementations of inexact circuits realized so far tried to optimize designs at the physical level using operational parameter-scaling approaches (such as scaling the supply voltage Vdd) guided by the error tolerance/resilience of the application. This disparity becomes more evident




when we try to group the extensive research on inexact design into the various layers of abstraction.

— Logic layer. Chakrapani and Palem [2010].²

— Physical layer. George et al. [2006], Chong and Ortega [2007], Chakrapani et al. [2008], Mohapatra et al. [2009], Palem et al. [2009a], Narayanan et al. [2010], and many more.

It is evident from this classification that the striking aspect of inexact circuit implementation research is the lack of any substantial innovations at the architecture or logic levels, and it is exactly the innovations at these layers which form the prime focus of this article. Given that the existing physical-level techniques (which are solely based on some form of supply voltage overscaling) have significant drawbacks (outlined in the next section) that considerably reduce possible gains in inexact systems, there is a pressing need for innovations at the architecture and logic levels that could translate to substantial gains for the entire system.

1.3. Drawbacks of Existing Implementations of Inexact Circuits

As mentioned previously, most of the existing efforts to realize inexact systems are concentrated at the physical abstraction level using variants of parameter (particularly supply voltage) scaling. The drawbacks of such physical-level parameter-scaling-based approaches are multifold.

(1) Accurate fine-tuning of supply voltage at runtime based on the application requirements might not be feasible due to inherent variations present in the power supply routing [Alioto and Palumbo 2006] and due to the large overhead generally required to ensure such accurate fine-tuning, which is necessitated by the possibility of massive failures that can occur in circuits beyond a critical voltage scaling point [Narayanan et al. 2010].

(2) One physical realization, referred to as Biased Voltage Scaling (BiVOS) [Chakrapani et al. 2008; George et al. 2006], is seriously impeded since it involves the significant overheads of routing multiple voltage planes and, by necessity, of level shifters.

(3) Varying supply voltage during circuit operation, coupled with the inherent power supply variations, might also increase the possibility of timing failures due to metastable conditions and might require metastable-tolerant flip-flops or latches, adding to the already increasing overhead.

Based on these drawbacks, applying conventional voltage-scaling-based approaches alone might not be a wise option to realize inexact circuits (in particular, datapath elements), as their overheads tend to considerably reduce the gains that can be obtained by the accuracy trade-off. This highlights the growing need to move away from voltage-scaling-based optimizations at the physical level to higher abstraction levels to continue to glean substantial gains from the accuracy trade-off.

In this article, we overcome these drawbacks of conventional approaches to designing inexact circuits through architecture and logic-level design techniques called Probabilistic Pruning and Probabilistic Logic Minimization, respectively, and show that

² It should be noted that the probabilistic boolean logic proposed in Chakrapani and Palem [2010] is for probabilistic circuits when gates are rendered probabilistic due to inherent device perturbations/variations and hence might be only useful for future technology nodes. On the other hand, the probabilistic techniques proposed in this article (both probabilistic pruning and probabilistic logic minimization) are targeted at the currently widespread deterministic circuits but can be used mutatis mutandis for probabilistic circuits as well.




these zero-overhead and technology-independent design approaches yield significant gains across all three dimensions (energy, area, and delay) for acceptable error.

We wish to point out that while the proposed pruning technique was intended for smaller datapath elements, such as adders, in Lingamneni et al. [2011a], in this article we observe that it would be more beneficial if the probabilistic pruning technique is employed at the architectural level of a more complex system (such as an FFT employing a network of adders and multipliers) and the probabilistic logic minimization technique is utilized at the logic level to design smaller elements, such as the adders or multipliers. This hinges on the intuition that logic-level optimization is computationally intensive, as it involves fine-tuning or reducing the circuit complexity (by minimizing its equation), and hence is well suited for smaller circuits. On the other hand, probabilistic pruning uses a higher-level binary approach wherein a node is either pruned or not, and hence it is less computationally intensive, making it ideal for more complex circuits.

2. USEFUL METRICS FOR DESIGNING AND ANALYZING INEXACT CIRCUITS

In this section, we propose some useful metrics needed to analyze and compare the gains obtained by the inexact designs, along with a heuristic to guide the architectural optimizations through the notion of value of information or significance assigned to portions of the (inexact) system (generally guided by the application algorithm).

2.1. Defining Error Metrics

We can broadly classify error-resilient applications into two types: ones which have a bound on the total number of erroneous computations (such as the number of incorrect memory address computations in a microprocessor) and others (such as the computation of the value of a pixel by a graphics processor) which have bounds on the magnitude of error. While in the former type of applications each output receives equal importance or “significance” and errors are quantified through the error rate metric, the outputs in the latter applications have a certain importance or weight depending on the magnitude of error and are quantified through the relative error magnitude metric, similar to the ones proposed in Chong et al. [2006] and widely used in Chakrapani et al. [2008] and Palem et al. [2009a].

$$\text{Error Rate} = \frac{\text{Number of Erroneous Computations}}{\text{Total Number of Computations}} = \frac{V'}{V};$$

$$\text{Relative Error Magnitude} = \frac{1}{V}\sum_{k=1}^{V}\frac{|O_k - O'_k|}{O_k},$$

where $V$ is the total number of simulation cycles or test vectors given to the circuit, $O_k$ is the expected correct output vector, and $O'_k$ is the obtained erroneous output vector for the $k$th input vector. Given the dominance of inexact applications with unequal output significance, in this article we focus our efforts on quantifying the gains in inexact systems through the relative error magnitude metric, although it can be conveniently replaced with the error rate metric or any other error metric (such as average error or maximum error) as deemed necessary.
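The two metrics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: outputs are treated as non-negative integers, the function names are hypothetical, and test vectors whose correct output is zero are skipped in the relative metric because the ratio is undefined there (the article does not say how such vectors are handled).

```python
from typing import Sequence


def error_rate(expected: Sequence[int], observed: Sequence[int]) -> float:
    """Fraction of test vectors whose output is wrong, i.e. V'/V."""
    if len(expected) != len(observed) or not expected:
        raise ValueError("need two non-empty sequences of equal length")
    wrong = sum(1 for e, o in zip(expected, observed) if e != o)
    return wrong / len(expected)


def relative_error_magnitude(expected: Sequence[int],
                             observed: Sequence[int]) -> float:
    """(1/V) * sum_k |O_k - O'_k| / O_k over the usable test vectors.

    Assumption: vectors with O_k == 0 are dropped, and V counts only
    the vectors that remain.
    """
    if len(expected) != len(observed):
        raise ValueError("sequences must have equal length")
    pairs = [(e, o) for e, o in zip(expected, observed) if e != 0]
    if not pairs:
        raise ValueError("no test vector with a non-zero expected output")
    return sum(abs(e - o) / e for e, o in pairs) / len(pairs)


if __name__ == "__main__":
    correct = [8, 10, 4, 20]   # outputs of the exact circuit
    inexact = [8, 9, 4, 18]    # outputs of a hypothetical inexact circuit
    print(error_rate(correct, inexact))                # 0.5
    print(relative_error_magnitude(correct, inexact))  # about 0.05
```

The example shows why the two metrics can disagree: half of the outputs are wrong, yet the average relative deviation is only about 5%.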

2.2. Computing the Value of Information or Significance Value

The notion of assigning significance based on the value of information principle to a circuit node is one of the guiding principles for achieving an optimal inexact circuit design. It should be noted that the significance value is generally derived from the application’s algorithm and the type of circuit implementation (circuit topology) chosen.




Hence, the proposed architectural redesign techniques take this assignment of significance as a parameter which can be modified based on various heuristics to obtain varying (yet significant) amounts of savings. For the sake of completeness, we present a simple heuristic to assign significance to circuit nodes depending on the amount of error they can cause at the circuit outputs, assuming the rest of the circuit nodes operate correctly. Note that this is a circuit-topology-based heuristic and is not limiting, in the sense that it could just as well be combined with the application algorithm’s assignment to realize more optimal designs.

We consider the case of a single node that produces an error (averaged over the application’s test vectors). Let us consider that for some test vectors, a circuit node $i$ can cause an error at an output node $O_t$ for $t \in \{1, 2, \ldots, N_O\}$, where $N_O$ is the total number of output nodes. Let $Er(i)$ and $Er(O_t)$ be the errors at the output of node $i$ and at the corresponding output nodes $O_t$. Then, we define the significance of node $i$ as $\sigma(i)$, computed as follows.

$$\sigma(i) = \frac{\sum_{t=1}^{N_O} Er(O_t)}{Er(i)}.$$

The heuristic to assign significance described here can be implemented using a mathematical model for simple circuits, such as a ripple carry adder, and using simulation-based assignment for more complex circuits, such as a multiplier.
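As a rough illustration of simulation-based significance assignment, the sketch below injects an error at one node of a small ripple-carry adder and measures its effect at the outputs. Several choices here are assumptions rather than details from the article: the fault model (inverting the carry entering a bit position on every test vector, so the node error is 1), the use of the magnitude error of the whole output word in place of the per-output sum, and the function names.

```python
from itertools import product
from typing import Optional


def ripple_add(a: int, b: int, n: int,
               flip_carry_into: Optional[int] = None) -> int:
    """Bit-level n-bit ripple-carry addition returning an (n+1)-bit result.

    If flip_carry_into is a bit position, the carry entering that position
    is inverted, which models an erroneous node (illustrative fault model).
    """
    carry, result = 0, 0
    for i in range(n):
        if flip_carry_into == i:
            carry ^= 1
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i          # sum bit uses incoming carry
        carry = (ai & bi) | (carry & (ai ^ bi))   # carry out of position i
    return result | (carry << n)


def significance(node: int, n: int) -> float:
    """sigma(node) = average output error / average node error (exhaustive)."""
    vectors = list(product(range(1 << n), repeat=2))
    output_error = sum(
        abs(ripple_add(a, b, n) - ripple_add(a, b, n, flip_carry_into=node))
        for a, b in vectors
    ) / len(vectors)
    node_error = 1.0  # assumption: the carry is wrong on every test vector
    return output_error / node_error


if __name__ == "__main__":
    print([significance(i, 4) for i in range(4)])  # [1.0, 2.0, 4.0, 8.0]
```

Under these assumptions the significance doubles with each bit position, since a wrong carry into position i shifts the result by exactly 2^i.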

3. PROPOSED ARCHITECTURAL-LEVEL APPROACH: PROBABILISTIC PRUNING

Probabilistic Pruning is an architecture-level design technique wherein we systematically prune or delete components and their associated wires along the paths of the circuit that have a lower probability of being active during circuit operation, while staying within the error boundaries dictated by the application. As this approach is carried out during the design phase, it can be realized with zero overhead on the circuit hardware. In this section, we introduce a formal mathematical formulation of the proposed pruning technique along with a detailed algorithm to implement it.

3.1. A Formal Mathematical Formulation of Probabilistic Pruning

A circuit can be represented as a directed acyclic graph³ whose nodes are components such as gates, inputs, or outputs and whose edges are wires. Given a circuit $G$ with $N_C$ components, $N_I$ inputs, $N_O$ outputs, and $N_W$ wires, our goal is to prune components in the paths such that the energy, area, and delay are reduced while maintaining a bound on error, say $\sigma$. Let $I$ be the set of all input nodes, $O$ be the set of output nodes, $C$ be the set of all components, and $W$ be the set of all wires. For each component $C_j$, we denote the cost of the component by $\varepsilon_j$ (note that the cost can be defined as the energy, delay, area, or a combination of these depending on the type of gains sought).

We now formulate an optimization problem of computing a circuit $G'$, which is a subgraph of $G$ such that it has a subset of inputs $I' \subseteq I$ and outputs $O' \subseteq O$, with components $C'$, where $N_{C'} \le N_C$, and wires $W'$, where $N_{W'} \le N_W$, as follows.

Optimization Problem. Given $G$ and $V$ randomly chosen inputs (testbench), find $G'$ to minimize

$$\sum_{C_j \in G'} \varepsilon_j$$

³ This mathematical formulation does not take into account circuits with feedback paths.




such that

$$Er(G') = \frac{1}{V}\sum_{k=1}^{V} p_k \times |O'_k - O_k| \le \sigma, \qquad (1)$$

where $O_k$ and $O'_k$ correspond to the values of the final output vectors $\langle O_{k,1}, O_{k,2}, \ldots, O_{k,n} \rangle$ and $\langle O'_{k,1}, O'_{k,2}, \ldots, O'_{k,n} \rangle$ of circuits $G$ and $G'$, respectively, for a given $n$-bit input vector $I_k$ which occurs with a probability $p_k$ for $1 \le k \le V$. Without loss of generality, we could assign a weight $\eta_j$ to the $j$th output bit $O_j$.

Output. A pruned $G'$ that is optimal in that there is no other $G''$ satisfying the preceding conditions such that $\varepsilon'' < \varepsilon'$.

The average error metric just used is not limiting: it can be conveniently replaced by any other error metric, based on the application requirements, when using the probabilistic pruning approach. However, our main goal in this article is to demonstrate the value of applying probabilistic pruning to circuit design. Therefore, we will not emphasize the algorithmic nuances in this article but will rather use a simple (almost) brute-force heuristic, which is shown in Figure 4. As shown there, the probabilistic pruning algorithm consists of two main functions.

— Ranking Function. The goal of the ranking function is to rank the nodes based on their value as determined by the Significance-Activity Product (SAP) metric. The SAP is computed as the product of the significance (assigned as discussed in Section 2.2) and the activity of a node, where activity is the probability of a transition at the node, determined through a benchmark simulation or a mathematical model. It should be noted that the complexity of a node used for this technique can vary from a small component, such as an adder, to a more complex component, such as an FFT block.

— Pruning Function. The goal of the pruning function is to iteratively prune the nodes with the lowest SAP values until the target error bound is reached. In this article, we use an iterative greedy heuristic for the pruning function, which verifies the error bound (through benchmark simulation) after each pruning step. Although greedy algorithms generally do not guarantee an optimal circuit, they typically provide good-enough solutions at a much lower computational cost [Cormen et al. 2001].
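The ranking and pruning functions described above can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names, the dictionary-based circuit description, and the stopping rule (stop at the first node whose removal would exceed the bound) are assumptions, and the `error_of` callback stands in for the benchmark simulation that a real flow would run after each pruning step.

```python
from typing import Callable, Dict, FrozenSet, List, Set


def rank_by_sap(significance: Dict[str, float],
                activity: Dict[str, float]) -> List[str]:
    """Return node names sorted by ascending significance-activity product."""
    return sorted(significance, key=lambda n: significance[n] * activity[n])


def greedy_prune(ranked_nodes: List[str],
                 error_of: Callable[[FrozenSet[str]], float],
                 error_bound: float) -> Set[str]:
    """Prune the lowest-ranked nodes one at a time while the error bound holds.

    error_of(pruned) must return the circuit error when the given nodes
    are removed; here it is a placeholder for a benchmark simulation.
    """
    pruned: Set[str] = set()
    for node in ranked_nodes:
        candidate = frozenset(pruned | {node})
        if error_of(candidate) > error_bound:
            break  # assumption: stop at the first violation of the bound
        pruned = set(candidate)
    return pruned


if __name__ == "__main__":
    # Hypothetical four-node circuit; values are illustrative only.
    sig = {"n0": 1.0, "n1": 2.0, "n2": 4.0, "n3": 8.0}
    act = {"n0": 0.5, "n1": 0.25, "n2": 0.5, "n3": 0.5}
    order = rank_by_sap(sig, act)

    # Toy error model: the error is the summed SAP of the pruned nodes.
    def toy_error(pruned: FrozenSet[str]) -> float:
        return sum(sig[n] * act[n] for n in pruned)

    print(order)                                  # ['n0', 'n1', 'n2', 'n3']
    print(sorted(greedy_prune(order, toy_error, error_bound=1.5)))  # ['n0', 'n1']
```

Keeping the error evaluation behind a callback means the same loop works with any of the error metrics discussed in Section 2.1.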

3.2. An Example to Demonstrate the Probabilistic Pruning Technique

As previously mentioned, we will generally refrain from applying the probabilistic pruning technique to smaller circuit blocks such as adders but will, however, use their structure as an example to illustrate the principle of the probabilistic pruning technique. Given the wide variety of prefix networks [Harris 2003], the different carry chain networks in adders provide an ideal stereotype to model a broad range of circuits with similar structures of nodes and interconnects. The two major steps necessary for the ranking function are described next.

3.2.1. Computing Probability of Being Active or Activity of a Node. We next calculate the path probabilities in each adder and apply the probabilistic pruning technique, as shown in Figure 4. We use some of the results derived in Pippenger [2002] to model the carry propagation path probabilities, which will form the basis for obtaining the activity value of each circuit node. For notational convenience, we will use the symbols S, A, and B to denote the Sum (output) and the two binary inputs to the adder.




Fig. 4. Flowchart for the probabilistic pruning technique.

As all the paths between an output $S_i$ and an input $A_j$ or $B_j$ ($\forall j \neq i$ and $0 \le j \le N$) existing in an N-bit adder are due to the propagation of carry bits, we compute the various path probabilities in an adder using a variation of the carry path propagation results derived in Pippenger [2002] to form the basis of the pruning technique. A bit position $i$ is said to generate a carry if both $A_i$ and $B_i$ are equal to 1 and to propagate a carry if exactly one of $A_i$ or $B_i$ is equal to 1. Hence, a sum output $S_i$ is affected by an input $A_j$ or $B_j$ (where $j < i$) only if a carry is generated at $j$ and the remaining $i - j - 1$ bit positions between $j$ and $i$ propagate the carry. For example, if the summands A and B are chosen uniformly at random, the probability that bit position $j$ generates a carry is $1/4$ and the probability that the remaining $i - j - 1$ positions propagate the carry is $1/2^{i-j-1}$. Hence, the probability of any particular path from an input $A_j$ or $B_j$ to a sum output $S_i$ being active is $1/2^{i-j+1}$.

Due to the regular structure of the prefix networks, all components at the same level (or row) are on paths with an equal probability of being active. For example, the components on the fourth level of the Kogge-Stone adder propagate carry information from inputs $A_i$ and $B_i$ to output $S_{i+8}$, while the components on the third level propagate the carry information from inputs $A_i$ and $B_i$ to output $S_{i+4}$, the path probabilities of which are $1/2^9$ and $1/2^5$, respectively.
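The path probability stated above can be checked by brute force. The sketch below enumerates every combination of operand bits at positions j through i − 1, assuming uniformly random operands as in the text, and counts the cases in which position j generates a carry and every position in between propagates it. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def path_probability(i: int, j: int) -> Fraction:
    """Exact probability that a carry generated at bit j reaches sum bit i.

    Only operand bits j .. i-1 matter: bit j must generate (A=B=1) and
    bits j+1 .. i-1 must each propagate (exactly one of A, B is 1).
    """
    if not 0 <= j < i:
        raise ValueError("require 0 <= j < i")
    width = i - j
    hits = total = 0
    for a_bits in product((0, 1), repeat=width):
        for b_bits in product((0, 1), repeat=width):
            total += 1
            generates = a_bits[0] & b_bits[0]
            propagates = all(a ^ b for a, b in zip(a_bits[1:], b_bits[1:]))
            if generates and propagates:
                hits += 1
    return Fraction(hits, total)


if __name__ == "__main__":
    print(path_probability(4, 0))  # 1/32,  i.e. 1/2^5 (span of 4 positions)
    print(path_probability(8, 0))  # 1/512, i.e. 1/2^9 (span of 8 positions)
```

The two printed values match the third-level and fourth-level Kogge-Stone figures quoted in the text.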

3.2.2. Assigning Significance to the Nodes. We observed that the optimal assignment of significance is dependent on the target application (or the error metric used to quantify it). To demonstrate this, we present three different assignments of significance, each chosen to obtain the minimum error (metric) value for a given energy-delay product gain, as summarized in Figure 5.

— For the currently widespread DSP and multimedia applications where the end result is quantified through some variant of the signal-to-noise ratio (SNR) metric, it is the average or relative error metrics that will be useful for quantifying such systems. The assignment of significance for minimizing the relative and average




Fig. 5. Example of different variations of the probabilistic pruning technique for minimizing different error metrics on a Kogge-Stone type network of nodes. The results for each structure are given when the lowest-ranked nodes by SAP (indicated by numbers) in these structures are pruned.

errors is shown in Figure 5(a), where each bit position has a significance two times higher than that of the previous bit position (starting from the LSB and moving towards the MSB), and the nodes in each column are assigned the significance of the output bit of that column.

— For applications requiring minimum error rate, we assign the same significance value to all the nodes in the circuit, as shown in Figure 5(b).


Table I. Comparison of Probabilistic Pruning and Precision Reduction for 16-bit Input Data over a Kogge-Stone Carry Network and a Ripple Carry Adder

| Precision reduced to | Avg. Error | Rel. Error | Pruned adder (extra) EDAP gains: 16-bit Kogge-Stone, (PG) block-level pruning | Pruned adder (extra) EDAP gains: 16-bit Ripple Carry, gate-level pruning |
|---|---|---|---|---|
| 14-bit | ∼3 | 0.007% | 1.334X | 1.383X |
| 13-bit | ∼7 | 0.015% | 1.239X | 1.362X |
| 12-bit | ∼15 | 0.032% | 1.328X | 1.437X |
| 11-bit | ∼31 | 0.065% | 1.383X | 1.21X |

— For applications needing a bound on the maximum permissible error, we observed that assigning significance based on the precision reduction or truncation scheme resulted in the lowest maximum error, as shown in Figure 5(c).

It is evident from Figure 5 that the assignment of significance to circuit nodes largely influences the targeted error metric for similar cost benefits. After ranking the nodes using the significance-activity product (SAP), we execute the iterative pruning function on the adder network following Figure 4.
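The ranking step can be sketched as follows. This is an illustrative model under stated assumptions, not the authors' implementation: the Kogge-Stone network is reduced to (level, column) positions, the activity of a level is taken from the path probabilities given earlier, only the first two significance assignments are modeled (the truncation-based one is not specified numerically in the text), and all function and scheme names are hypothetical.

```python
def kogge_stone_nodes(n_bits):
    """(level, column) positions of a simplified Kogge-Stone carry network:
    level l holds one node per column i >= 2^(l-1)."""
    nodes = []
    level, span = 1, 1
    while span < n_bits:
        nodes.extend((level, col) for col in range(span, n_bits))
        level, span = level + 1, span * 2
    return nodes


def activity(level):
    """Path probability for a node that carries information across
    2^(l-1) bit positions: 1/2^(span+1)."""
    span = 2 ** (level - 1)
    return 1.0 / 2 ** (span + 1)


def significance(col, scheme):
    """'weighted': doubles per bit position (relative/average error);
    'uniform': the same value everywhere (error rate)."""
    if scheme == "weighted":
        return 2.0 ** col
    if scheme == "uniform":
        return 1.0
    raise ValueError(f"unknown scheme: {scheme}")


def sap(node, scheme):
    level, col = node
    return significance(col, scheme) * activity(level)


def rank_by_sap(n_bits, scheme):
    """Nodes sorted from lowest to highest SAP; pruning starts at the front."""
    return sorted(kogge_stone_nodes(n_bits), key=lambda node: sap(node, scheme))


uniform_order = rank_by_sap(16, "uniform")
weighted_order = rank_by_sap(16, "weighted")
```

With uniform significance the ranking depends on activity alone, so the deepest level is pruned first; with weighted significance, low-order columns compete with deep levels, which is why the two schemes prune different nodes for the same cost budget.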

We also show that the gains obtained through the pruning technique depend only on the ratio of the nodes pruned, and hence, using an adder carry network or a bigger circuit (such as an FIR/FFT) with an equivalent network of nodes (the nodes can be different) will yield similar gains for a similar ratio of nodes pruned. Hence, in the interest of simulating and validating a wide range of networks of nodes, we restrict ourselves to analyzing the gains obtained in a variety of adder carry chain networks. We hope to extend this to a few select networks of nodes, such as an FIR filter or FFT datapath, in our future work.

3.3. Comparison to Precision Reduction or Bit-Width Truncation

As shown previously, precision reduction (utilizing datapath elements with lower precision) and bit-width truncation (truncating some of the output bits) can be viewed as special cases of the probabilistic pruning algorithm in which (a) the significance of the truncated nodes is assigned as zero, or (b) the activity of the truncated nodes is zero. Our results, as shown in Table I, establish that the probabilistic pruning technique outperforms bit-width truncation by achieving 20–40% more cumulative gains in energy-delay-area product for comparable relative error magnitude and average error. Another drawback of the precision reduction technique is that it yields unacceptable results when the error rate is used as the comparison metric, as even a two-bit truncation gave a >90% error rate. However, as a special case of the pruning algorithm, it does achieve the lowest maximum error value when compared with other heuristics (as demonstrated in Figure 5) for comparable energy-delay-area product points.

An important observation from Table I is that while the nodes in the Kogge-Stone adder are chosen at the granularity of blocks (Propagate-Generate (PG) blocks), we choose gates as the nodes in the case of a Ripple Carry adder. This is because choosing nodes at the block level (that is, at the granularity of a full adder) in a Ripple Carry adder would lead to simple bit-width truncation owing to the serial network, while a finer granularity allows more trade-off possibilities. In general, the granularity at which a node is chosen depends on the complexity and structure of the circuit under consideration.
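One possible reading of the "Avg. Error" column of Table I is sketched below. It rests on an assumption that is not stated in the text: reducing the precision from 16 to 16 − k bits is modeled as clearing the k least significant bits of both operands, with operands drawn uniformly at random. Under that assumption the mean error of the sum is $2^k - 1$, which agrees with the ∼3, ∼7, ∼15, and ∼31 entries. The function name is hypothetical.

```python
from fractions import Fraction


def average_truncation_error(k):
    """Exact mean of (a + b) - (trunc(a) + trunc(b)) when trunc() clears the
    k LSBs. The error depends only on the low k bits of each operand, so it
    is enough to enumerate those (assumes uniformly random operands)."""
    mask = (1 << k) - 1
    total = sum((a & mask) + (b & mask)
                for a in range(1 << k)
                for b in range(1 << k))
    return Fraction(total, (1 << k) ** 2)


# k = 2, 3, 4, 5 corresponds to 14-, 13-, 12-, and 11-bit precision.
errors = {16 - k: average_truncation_error(k) for k in (2, 3, 4, 5)}
```

The same enumeration also shows why the error rate of truncation is high: the sum is exact only when the discarded bits of both operands are all zero.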


Fig. 6. Demonstrative examples of a few architectures of the conventional and minimized full adder cells.

4. PROPOSED LOGIC-LEVEL APPROACH: PROBABILISTIC LOGIC MINIMIZATION

Probabilistic logic minimization is a logic-level technique wherein we systematically minimize circuit components (or nodes) guided by the significance and the input combination probabilities of those nodes while staying within the error boundaries dictated by the application [Lingamneni et al. 2011b]. Similar to probabilistic pruning, it is carried out at the design level and hence incurs zero hardware overhead. The proposed algorithm takes advantage of the notion of introducing bit flips in the minterms of Boolean functions, which was proposed in Choudhury and Mohanram [2008] in the context of fast error detection and subsequently used in Shin and Gupta [2010] for enhancing circuit yield and in Bharghava et al. [2010] for designing low-power inexact circuits. However, this use of bit-flip-based techniques in the synthesis of inexact circuits has mostly been ad hoc, with limited insights into general circuits (especially datapath circuits that dominate the energy consumption in inexact systems, such as the motion estimation block in video encoders [Varatkar and Shanbhag 2006]), and lacks a general guiding algorithm to attune circuits to specific error-tolerant applications to glean further cost (energy, delay, and area) gains. We hope to address these issues through the technique proposed in this article.

4.1. Logic Minimization Through Bit Flips in Karnaugh Maps of Boolean Functions

The key to the probabilistic logic minimization algorithm is the notion of introducing bit flips in the minterms of Boolean functions to further minimize them (as demonstrated in Choudhury and Mohanram [2008]), thereby achieving gains (energy/area/delay) through literal reduction while causing an error due to such bit flip(s). However, not all bit flips of minterms would result in expanding the prime implicant (PI) cubes, and some of them might result in negative gains. Hence, it is important to identify the “favorable” bit flips (the bit flips which further minimize the function) and discard the unfavorable ones. To illustrate through an example, Figure 7(a) shows a function (Carry logic) that is widely prevalent in most datapath elements. Assuming that the application can tolerate at most one bit flip at this logic function (probability of error = 1/8 for uniformly distributed inputs), Figures 7(b) and 7(c) give examples of favorable 0 to 1 and 1 to 0 bit flips, respectively, as they minimize the logic function, whereas Figure 7(d) shows an unfavorable bit flip leading to increased logic function complexity. An alternative for addressing unfavorable minterms is to use don’t care (DC) conditions in the Karnaugh maps [Bharghava et al. 2010]. Hence, we can conclude that the introduction of favorable bit flips leads to further minimization of a logic function owing to the expansion of PI cubes, thereby achieving cost (energy, area, and delay) gains at the expense of error, which is proportional to the number of such bit flips introduced.
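The distinction between favorable and unfavorable bit flips can be illustrated on the carry function with a small brute-force minimizer. This is a sketch under assumptions: cost is approximated by the literal count of a minimum sum-of-products cover (the article estimates gains through synthesis tools instead), the inputs are assumed to be ordered (A, B, Cin), and all names are hypothetical.

```python
from itertools import combinations, product

N_VARS = 3  # inputs assumed to be ordered (A, B, Cin)


def covered(cube):
    """Minterm indices covered by a cube such as '1-1' ('-' matches 0 or 1)."""
    return frozenset(
        m for m in range(2 ** N_VARS)
        if all(c in ("-", bit) for c, bit in zip(cube, format(m, f"0{N_VARS}b")))
    )


def min_sop_literals(onset):
    """Literal count of a minimum sum-of-products cover, found by brute force
    over subsets of prime implicants (only practical for tiny functions)."""
    onset = frozenset(onset)
    if not onset:
        return 0
    cubes = ["".join(c) for c in product("01-", repeat=N_VARS)]
    implicants = [c for c in cubes if covered(c) <= onset]
    primes = [p for p in implicants
              if not any(covered(p) < covered(q) for q in implicants)]
    best = None
    for size in range(1, len(primes) + 1):
        for chosen in combinations(primes, size):
            if frozenset().union(*(covered(c) for c in chosen)) == onset:
                cost = sum(len(c) - c.count("-") for c in chosen)
                best = cost if best is None else min(best, cost)
    return best


# Carry of a full adder: 1 whenever at least two of the three inputs are 1.
CARRY = {0b011, 0b101, 0b110, 0b111}


def flip_gains():
    """Literal savings (positive = favorable) for flipping each single minterm."""
    base = min_sop_literals(CARRY)
    return {format(m, "03b"): base - min_sop_literals(CARRY ^ {m})
            for m in range(2 ** N_VARS)}


gains = flip_gains()
```

Under this cost proxy, the 0 to 1 flip at ‘001’ turns the carry into Cin + A·B and the 1 to 0 flip at ‘011’ turns it into A·B + A·Cin, both cheaper than the original, while flips at ‘000’ or ‘111’ add an isolated minterm and increase the cost.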

While the benefit of this imprecise minimization cannot be denied, it gives rise to another interesting and important question: Given a circuit node with many favorable bit-flip possibilities, each with similar cost gains, how do we select the right minimization for the node? In other words, is the error introduced by each of the bit flips equal? While conventional wisdom calls for an assumption of uniform input combination probabilities, this is not the case in most applications, especially in multimedia applications where inputs are highly correlated. Our proposed technique therefore takes advantage of such correlation to guide the minimization algorithm and glean further savings.

Assuming uniform input probability values, the architectures of some of the full adder cells obtained for various application benchmarks are shown in Figure 6.

In general, given a circuit node with n inputs, there are $2^n$ possible minterms, of which we could flip the bits of at most k minterms (depending on the application’s error tolerance) to derive the minimum cost function. We propose a probabilistic extension to the minimization scheme wherein all the favorable bit flips are ranked based on their input combination probabilities, and the bit flip(s) with the lowest corresponding input combination probabilities are applied. For example, in Figures 7(b) and 7(c), the minimized functions have the same gains (a function with two ORs and three ANDs is reduced to one OR and one AND). But if the probability of the input to the logic function being ‘001’ is higher than that of the input being ‘011’, then a bit flip at ‘011’ would cause an error with a lower probability. In short, a bit flip occurring at the least likely input combination results in less error for the same amount of savings. With this as background, we propose a general algorithm for applying the probabilistic logic minimization technique in the following section.
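The ranking rule described above can be sketched in a few lines. The probabilities are the audio-benchmark ripple carry adder row of Table II; the set of favorable flips is an assumption for the full-adder carry function (every single-minterm flip except ‘000’ and ‘111’ shortens a sum-of-products form), and the function name is hypothetical. The sketch also assumes that flips remain favorable when applied together, which would need to be re-checked for k > 1.

```python
# Input combination probabilities: Table II, audio benchmark, ripple carry adder.
AUDIO_RCA = {"000": 0.258, "001": 0.02, "010": 0.133, "011": 0.12,
             "100": 0.129, "101": 0.121, "110": 0.014, "111": 0.205}

# Assumed favorable single-minterm flips for the carry function.
FAVORABLE = ["001", "010", "100", "011", "101", "110"]


def select_flips(probabilities, favorable, k):
    """Pick the k favorable flips with the lowest input probability and return
    them with the resulting probability of an erroneous output at this node."""
    ranked = sorted(favorable, key=lambda combo: probabilities[combo])
    chosen = ranked[:k]
    return chosen, sum(probabilities[combo] for combo in chosen)


one_flip, one_error = select_flips(AUDIO_RCA, FAVORABLE, k=1)
two_flips, two_error = select_flips(AUDIO_RCA, FAVORABLE, k=2)
```

For this row, the least likely favorable combination is ‘110’, so a single flip there costs an error probability of 0.014 at the node, far below the 1/8 that a uniform-input assumption would predict.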

4.2. Probabilistic Logic Minimization Based Datapath Elements

We choose datapath elements as the first platform for the application of our probabilistic logic minimization technique, as they are among the most energy-consuming blocks in the targeted error-tolerant applications (e.g., the datapath elements account for up to 75% of the total power consumption of the motion estimation block [Varatkar and Shanbhag 2006]). The main datapath elements commonly used in most applications are arithmetic adders and multipliers, and hence, they will be the prime focus of our study.

Fig. 7. Example of K-maps of the (a) initial correct function (Carry logic of a full adder), (b) function with a favorable 0 to 1 bit flip, (c) function with a favorable 1 to 0 bit flip, (d) function with an unfavorable 0 to 1 bit flip.

To select the optimal bit flips, we use the input combination probabilities at the full adder nodes. The results obtained from simulating different benchmarks on various full adder nodes in datapath elements are shown in Table II. The audio and image benchmarks have been obtained from NCH Software (http://www.nch.com.au/acm/index.html) and MediaBench (http://euler.slu.edu/fritts/mediabench), respectively. It should be noted that not all full adders used to construct a datapath element, such as an array multiplier, have similar input transition characteristics. The full adders receiving inputs directly from the circuit inputs depend on the application’s input correlation, while the full adders receiving inputs from the outputs of other adders depend more on the topology of the circuit (specifically the sum and/or carry propagation paths). For the array multiplier, the full adders receiving their inputs directly from the partial products (AND-ed inputs) are denoted as external, the full adders inside the partial product reduction matrix are denoted as internal, and the full adders in the final carry propagate stage are denoted as CPA. Hence, the probability of an input combination occurring at a node is generally (a) dependent only on the input test vectors (such as the full adders in a ripple carry adder and the external full adders in an array multiplier), (b) dependent only on the circuit topology (such as the internal full adders of an array multiplier), or (c) dependent on a combination of both (such as the CPA full adders in an array multiplier).
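The way such input combination probabilities could be collected for a ripple carry adder is sketched below. This is an illustrative bit-level model, not the functional simulation flow used in the article: the combination string is assumed to be ordered (A, B, Cin), counts are pooled over all bit positions, and the function name, seed, and vector count are hypothetical choices.

```python
import random
from collections import Counter


def full_adder_input_histogram(vectors, n_bits):
    """Relative frequency of each (A_i, B_i, Cin_i) combination seen by the
    full adders of an n_bits ripple carry adder over the given (a, b) pairs."""
    counts = Counter()
    for a, b in vectors:
        carry = 0
        for i in range(n_bits):
            a_i, b_i = (a >> i) & 1, (b >> i) & 1
            counts[f"{a_i}{b_i}{carry}"] += 1
            # Carry out: generated at this bit, or propagated from below.
            carry = (a_i & b_i) | (carry & (a_i ^ b_i))
    total = sum(counts.values())
    return {format(c, "03b"): counts[format(c, "03b")] / total for c in range(8)}


# Uniform random test vectors (fixed seed so that runs are repeatable).
rng = random.Random(0)
uniform_vectors = [(rng.getrandbits(16), rng.getrandbits(16)) for _ in range(2000)]
uniform_probs = full_adder_input_histogram(uniform_vectors, 16)
```

Replacing `uniform_vectors` with operand pairs extracted from an application trace would yield an application-specific profile of the kind reported in Table II; the exact values depend on the benchmark and on how the combinations are ordered and pooled.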

In general, this technique can be extended to higher-order counters/compressors [Song and De Micheli 1991] with relative ease. When using parallel prefix adders [Harris 2003], the choice of nodes can be varied between the XOR gates in the initial propagate blocks and the PG blocks in the prefix network tree.

Table II. Input Combination Probabilities of Full Adders in Various Datapath Elements

| Test Vector Suite | Datapath Element | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
|---|---|---|---|---|---|---|---|---|---|
| Uniform | Ripple Carry Adder | 0.129 | 0.121 | 0.125 | 0.121 | 0.125 | 0.126 | 0.13 | 0.124 |
| Uniform | Array Multiplier (External) | 0.542 | 0.024 | 0.167 | 0.017 | 0.145 | 0.04 | 0.025 | 0.041 |
| Uniform | Array Multiplier (Internal) | 0.349 | 0.047 | 0.079 | 0.048 | 0.314 | 0.051 | 0.056 | 0.055 |
| Uniform | Array Multiplier (CPA) | 0.388 | 0.088 | 0.082 | 0.01 | 0.218 | 0.092 | 0.091 | 0.03 |
| Audio | Ripple Carry Adder | 0.258 | 0.02 | 0.133 | 0.12 | 0.129 | 0.121 | 0.014 | 0.205 |
| Audio | Array Multiplier (External) | 0.5207 | 0.0004 | 0.2209 | 0.0002 | 0.2556 | 0.0005 | 0.0003 | 0.0014 |
| Audio | Array Multiplier (Internal) | 0.394 | 0.029 | 0.041 | 0.048 | 0.309 | 0.032 | 0.038 | 0.109 |
| Audio | Array Multiplier (CPA) | 0.272 | 0.073 | 0.148 | 0.001 | 0.291 | 0.126 | 0.085 | 0.004 |
| Image | Ripple Carry Adder | 0.355 | ∼0 | 0.382 | ∼0 | 0.148 | ∼0 | 0.115 | ∼0 |
| Image | Array Multiplier (External) | 0.846 | ∼0 | 0.132 | ∼0 | 0.024 | ∼0 | ∼0 | ∼0 |
| Image | Array Multiplier (Internal) | 0.38 | 0.027 | 0.192 | 0.015 | 0.298 | 0.033 | 0.037 | 0.019 |
| Image | Array Multiplier (CPA) | 0.867 | 0.03 | 0.013 | ∼0 | 0.081 | 0.007 | 0.003 | ∼0 |

Some of the key observations from Table II are as follows. (a) There is a strong correlation between the input vectors and the type of minimization that can be done at the node. (b) The amount of logic minimization (or the number of bit flips) that can be performed on a node can be determined by grouping input combination probabilities that are close to each other. For example, we could group all the input combinations with values less than one or two standard deviations from the mean of the group, and favorable bit flips can then be applied within this group to obtain the greatest amount of minimization. In the audio benchmark, for instance, input combinations {‘001’, ‘011’, ‘101’, ‘110’} for the external full adder of the array multiplier can be grouped together and favorable bit flips identified among them. (c) There can be a strong correlation between the application benchmarks and the possible minimizations. For example, for the image benchmark, input combinations {‘001’, ‘011’, ‘101’, ‘111’} can be flipped for full adders in the ripple carry adder and array multiplier (external) without any error in the output.

4.3. A General Algorithm for Probabilistic Logic Minimization

A circuit can be represented as a directed acyclic graph with nodes representing components, such as gates (or even bigger blocks like full adders), inputs, or outputs, and with edges representing interconnects. Let a graph G represent a circuit with N nodes and W edges. For any given node i in the graph, we have the following.

— node.function(i) denotes the function computed by node i.
— node.significance(i) denotes the significance of node i, the assignment of which is described in Section 2.2.
— node.fanin(i) denotes the fanin of node i.
— node.inputprobability(i)(j) denotes the transition probability of input combination j occurring at node i. The range of values of j is 0 to $2^{fanin} - 1$.
— node.functionmin(i) denotes the minimized function of the node.
— node.valuemin(i) denotes the value or normalized cost gains of the minimized function.

While the function, significance, and fanin values of all nodes are derived from the graph structure and user input, inputprobability, functionmin, and valuemin are computed during the algorithm execution.

5. EXPERIMENTAL RESULTS AND ANALYSIS OF THE PROPOSED PROBABILISTIC TECHNIQUES

5.1. Methodology and Framework

The proposed logic-synthesis-based CAD methodology for applying the probabilistic pruning and probabilistic logic minimization techniques is based on Figure 8. The central object of interest in this CAD methodology is the probabilistic pruner/logic minimizer, which has been integrated into the traditional, established CAD flow used to design and fabricate integrated circuits.

ALGORITHM 1: Pseudocode for the Probabilistic Logic Minimization (PLM) Algorithm on a Circuit or Graph G

    // Main function in the algorithm
    function PLM(MaxError)
        // Compute the probability of each input transition at all nodes
        Benchmark();
        // Compute the most cost-effective minimization at each node
        for all i ← 1 to N do
            ComputeMinimization(node(i));
        end for
        // Iteratively minimize each node based on their "value" until the error bound is reached
        while Error ≤ MaxError do
            NodetoMinimize = FindMinimum(node.significance × node.valuemin);
            MinimizeNode(NodetoMinimize);
        end while
    end function

    function BENCHMARK
        RunBenchmark();
        for all i ← 1 to N do
            for all j ← 0 to 2^fanin − 1 do
                node.inputprobability(i)(j) = ComputeInputProbability;
            end for
        end for
    end function

    function COMPUTEMINIMIZATION(node)
        for all j ← 0 to 2^fanin − 1 do
            // Estimate the gains obtained by the bit flip at the input sequence j in the K-map through synthesis tools
            costgain(j) = EstimateCostGain(bitflip(j));
            if costgain(j) > 0 then
                ValueofBitFlip(j) = costgain(j) / inputprobability(j);
            else
                ValueofBitFlip(j) = 0;
            end if
        end for
        MaxValue = FindMaximum(ValueofBitFlip(j));
        functionmin ← ComputeFunction(MaxValue);
        valuemin ← MaxValue;
    end function

    function MINIMIZENODE(node)
        function ← functionmin;
    end function
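The control flow of Algorithm 1 can be sketched in runnable form as follows. This is an illustration under several assumptions, not the authors' tool: cost gains are supplied as a table instead of being estimated by synthesis, the error added by minimizing a node is modeled as its significance times the probability of the flipped input combination, and the loop stops before the error bound would be exceeded. The node-selection rule follows the pseudocode as printed. All class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    name: str
    significance: float
    input_probability: Dict[str, float]   # combination -> probability
    cost_gain: Dict[str, float]           # combination -> gain of flipping it
    flip_min: Optional[str] = None        # stands in for functionmin
    value_min: float = 0.0
    minimized: bool = False


def compute_minimization(node: Node) -> None:
    """Keep the bit flip with the highest gain-to-probability ratio."""
    best_value, best_flip = 0.0, None
    for combo, gain in node.cost_gain.items():
        if gain <= 0:
            continue  # unfavorable flip, value 0
        p = node.input_probability.get(combo, 0.0)
        value = float("inf") if p == 0 else gain / p
        if value > best_value:
            best_value, best_flip = value, combo
    node.value_min, node.flip_min = best_value, best_flip


def plm(nodes: List[Node], max_error: float) -> Tuple[List[Tuple[str, str]], float]:
    for node in nodes:
        compute_minimization(node)
    error = 0.0
    applied = []
    candidates = [n for n in nodes if n.flip_min is not None]
    while candidates:
        # Selection rule as printed in Algorithm 1.
        node = min(candidates, key=lambda n: n.significance * n.value_min)
        # Assumed error model: significance x probability of the flipped input.
        added = node.significance * node.input_probability[node.flip_min]
        if error + added > max_error:
            break
        node.minimized = True
        error += added
        applied.append((node.name, node.flip_min))
        candidates.remove(node)
    return applied, error


# Two full-adder carry nodes; probabilities from Table II (audio, ripple carry
# adder); gains are illustrative literal savings, not synthesis results.
P = {"000": 0.258, "001": 0.02, "010": 0.133, "011": 0.12,
     "100": 0.129, "101": 0.121, "110": 0.014, "111": 0.205}
GAIN = {"000": -3, "001": 3, "010": 3, "011": 2,
        "100": 3, "101": 2, "110": 2, "111": -3}
fa0 = Node("fa0", 1.0, dict(P), dict(GAIN))
fa3 = Node("fa3", 8.0, dict(P), dict(GAIN))
applied, error = plm([fa0, fa3], max_error=0.1)
```

In this toy run only the low-significance node is minimized, because applying the same flip at the higher-significance node would push the modeled error past the bound.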

In this CAD flow, the circuits are described in a hardware description language (typically VHDL or Verilog) and then synthesized using industrial logic synthesizers, such as Synopsys Design Compiler or Cadence RTL Compiler. The synthesized design is then sent to the probabilistic pruner/logic minimizer, which implements the respective algorithms. This pruner/logic minimizer interacts with an error estimator (implemented in C, C++, or Matlab) and a functional simulator (such as ModelSim) with an application-specific benchmark to determine the final pruned or logic-minimized circuit. The application-specific benchmarks used in our simulations include uniform random distributions from Matlab (for generic applications), image test vectors from Mediabench, and audio test vectors from NCH Software. The resulting pruned or minimized circuit is synthesized once again to glean further savings (if any) and then sent to the place-and-route tools (such as Cadence SoC Encounter) to generate the final layout and the GDSII file that would be sent to the foundry for fabrication. The post-layout analysis of the resulting circuit is done by back-annotation of the final netlist and parasitics using Synopsys Power Compiler.

Fig. 8. Synthesis-based CAD flow integrating the proposed architectural techniques.

To establish the technology-independent nature of our architecture-level design techniques, we have implemented the inexact circuits in a variety of CMOS technology libraries, including TSMC 65nm (high Vt), IBM 90nm (normal Vt), and TSMC 180nm (low power). All the circuits are operated at the nominal supply voltage of their respective technology nodes, as indicated by the foundries: 1.8V for TSMC 180nm, 1.2V for IBM 90nm, and 1.2V for TSMC 65nm. Also, the designs were synthesized separately for the highest frequency of operation and for the lowest power consumption (or a loose target frequency) in order to analyze the gains achieved in each case.

5.2. Simulation Results of the Proposed Techniques

The normalized gains (conventional/pruned) for different metrics, namely energy (computed as energy/operation), energy-delay product (EDP), and energy-delay-area product (EDAP), obtained by applying the probabilistic pruning technique to various 64-bit adder carry networks are summarized in Figure 9. While we have considered 11 unique adder network structures in our simulations, only the best and worst two cases are shown here to provide the range of savings and error percentages that could be achieved through the pruning technique. As evident from Figure 9, the probabilistic pruning technique achieves 2x–7.7x savings in the energy-delay-area product metric for reasonable relative error magnitude values.
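For reference, the normalized gain metrics can be computed as in the short sketch below. The energy, delay, and area figures are made-up placeholders chosen only to illustrate the arithmetic; they are not measurements from this article, and the function names are hypothetical.

```python
def edp(energy, delay):
    """Energy-delay product."""
    return energy * delay


def edap(energy, delay, area):
    """Energy-delay-area product."""
    return energy * delay * area


def normalized_gain(conventional, inexact):
    """Gain as used in the text: conventional metric / pruned (or minimized) metric."""
    return conventional / inexact


# Placeholder values, normalized to the conventional design (not measured data).
conventional = {"energy": 1.0, "delay": 1.0, "area": 1.0}
pruned = {"energy": 0.5, "delay": 0.8, "area": 0.5}

energy_gain = normalized_gain(conventional["energy"], pruned["energy"])
edp_gain = normalized_gain(edp(1.0, 1.0), edp(pruned["energy"], pruned["delay"]))
edap_gain = normalized_gain(edap(**conventional), edap(**pruned))
```

Because the three factors multiply, moderate individual savings compound: halving energy and area while cutting delay by a fifth already gives a fivefold EDAP gain in this placeholder case.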


Fig. 9. Normalized gains vs. relative error percentage of various probabilistic pruned 64-bit adders.

The normalized gains (conventional/proposed) for different error metrics obtained by applying the probabilistic logic minimization technique to a 16-bit ripple carry adder and a 16-bit array multiplier for different application benchmarks are given in Figure 10. The choice of bit-width here is governed by the fact that most of the targeted multimedia applications [Mohapatra et al. 2009; Varatkar and Shanbhag 2006] generally use bit-widths of 16 bits or less for datapath elements. However, we have also implemented other types of adders, such as Carry-Select, Kogge-Stone, and Sklansky, and multipliers, such as Wallace-tree and Dadda multipliers, with varying bit-widths (up to 64 bits) and for different application benchmarks, and have obtained similar gains. As evident from the results, the probabilistic logic minimization approach results in highly energy-, delay-, and area-efficient datapath elements. For the uniform test vectors, in the case of ripple carry adders, probabilistic minimization yields savings of up to 8x with a relative error of less than 1% compared to their conventional correct counterparts, while in the case of the array multiplier, it results in savings of about 7x with a relative error of less than 6.5%. With application-specific test vectors (such as audio and image), the savings increase (up to 9.5x in the case of ripple carry adders and up to 8.25x in the case of array multipliers) with comparable error values.

Fig. 10. Normalized gains vs. relative error percentage of minimized ripple carry adder and array multiplier for different benchmarks.

Fig. 11. Graphs showing the technology independence of the proposed techniques through the energy-delay-area product metric.

To summarize, one of the key inferences from the simulation results is that the significant gains achieved in a circuit through the proposed probabilistic techniques are technology independent, bit-width independent, and proportional only to the amount of circuit nodes pruned/minimized.

5.3. A Summary of Characteristics of the Proposed Techniques

Some of the important observations from the experimental results of the proposed techniques are summarized next.

— Figure 11 outlines the results obtained by applying probabilistic pruning to two different adder networks and the probabilistic logic minimization technique to an array multiplier in three different technology libraries. From this, we can conclude that for similar operating conditions, the gains achieved in the probabilistic pruned or probabilistic logic minimized circuits are proportional to the ratio of circuit nodes pruned/minimized to the original circuit and are largely independent of the process technology being used.

— We can also observe that both the proposed approaches are design-level techniques which do not involve varying circuit parameters during operation. Hence, unlike with physical-level techniques, the amount of error in a probabilistic pruned or probabilistic logic minimized circuit is independent of varying parameters (such as Vdd), and such circuits are as robust as conventional circuits to process variations. The amount of such error is generally fixed at design time based on application requirements. The proposed probabilistic approaches can be used in conjunction with techniques such as adaptive body bias [Tschanz et al. 2002] to address the effects of parameter variations in the more significant portions of the circuits.

— Another observation regarding the probabilistic pruned or minimized circuits is that the error (both error rate and relative error magnitude) rises sharply beyond a critical amount of pruning/minimization, akin to the critical voltage scaling point problem mentioned in Narayanan et al. [2010]. We anticipate that this can be fixed by combining a physical-level or algorithm-level approach with the proposed techniques.

— The gains achieved through our proposed techniques are relative in that they can be combined with standard techniques that achieve energy or performance gains, or both, through absolute approaches. Specifically, this means that any technique that uses either physical-level or algorithm-level innovations and yields correct or slightly incorrect results can be extended through the insights in this article to yield additional gains simultaneously along the energy, delay, and area dimensions.

5.4. Advantages of the Proposed Architecture-Level Approaches

The advantages of the proposed probabilistic techniques are manifold and are summarized next.

(1) As the techniques are used to realize or synthesize inexact circuit architectures at the design level, they have zero overhead on the circuit hardware in terms of energy, delay, and area. In other words, the proposed techniques obtain savings in all three dimensions (energy, delay, and area) when compared to their conventional correct counterparts.

(2) Since they are design approaches, the proposed techniques guarantee a bound on the error (average or worst case) for the inexact circuit realization, unlike the physical-level scaling techniques (such as voltage scaling). This can be attributed to the fact that the proposed techniques are independent of variations in operational circuit parameters (such as supply voltage) and hence are resilient to metastability or timing failures and do not have a critical (voltage-scaled) point that might cause massive failures [Narayanan et al. 2010].

(3) The proposed techniques do not have the hardware overheads of level shifters, multiple voltage planes, or metastability-tolerant latches typically needed for the operation of circuit-level voltage-scaling-based design techniques.

(4) The proposed techniques are technology independent, as the gains are proportional only to the number of nodes pruned or minimized and do not depend on the process technology parameters, unlike the voltage-scaling-based schemes in which the gains are limited by the process technology constraints. For example, in present-day deep submicron CMOS technology nodes (45nm and below), the supply voltage is typically around 1V and the threshold voltage is around 0.3–0.5V. Hence, the amount of voltage scaling that can be done is very limited, and so are the gains.

(5) Another advantage of applying the proposed probabilistic logic minimization approach to the XOR-dominated (datapath) circuits widely prevalent in most applications is that traditional logic synthesizers perform poorly at minimizing XORs [Verma and Ienne 2007], whereas through our logic minimization algorithm, the logic synthesizers can extract further savings, as the minimized function is most likely to have primitive gates as opposed to costly XORs.

(6) For present-day and future deep submicron CMOS technologies (45nm and below), leakage power forms a significant portion of the total power consumption. The proposed techniques reduce the absolute leakage power of a circuit by virtue of their significant reduction in the total number of leaky transistors.

(7) Lastly, the proposed techniques can be integrated easily into traditional system-based CAD flows, thereby reducing the design effort and time, as opposed to some of the physical-level design techniques, in particular the Biased Voltage Scaling (BiVOS) proposed in George et al. [2006] and Chakrapani et al. [2008], which requires a custom design flow for (physical) implementation.


6. CONCLUSION AND FUTURE DIRECTIONS

To the best of our knowledge, this is the first attempt at innovations at the architecture and logic levels for inexact circuit design. As substantiation, we show through extensive simulations that the proposed architecture-level Probabilistic Pruning and logic-level Probabilistic Logic Minimization techniques achieve significant savings across all three dimensions (energy, delay, and area) for modest error trade-offs while avoiding the drawbacks associated with conventional physical-level voltage overscaling schemes. Other benefits of the proposed zero-overhead techniques include technology independence and the ability to operate within the error bounds of the application specified at the design level (with no scope for timing errors or metastable states due to supply voltage variations that might lead to massive failures).

Some of the future directions of research that can be pursued building upon this work are outlined next.

— Mathematical framework for developing optimization models for the proposed techniques. While this article concentrated on proposing heuristics to demonstrate the benefits that could be achieved by the architecture-level and logic-level probabilistic techniques, it should be noted that the resulting circuits are not optimal in terms of achieving the best cost (energy, delay, and/or area) versus accuracy trade-off, as can be expected from a heuristic-guided algorithm rather than an optimization-based algorithm. The exploration and evaluation of optimization algorithms for realizing optimal inexact circuits at the various levels of abstraction, building on the efforts of Kedem et al. [2010, 2011], will be one of the focuses of our future research.

— Design of complex error-tolerant systems. Complex systems, such as motion estimation for video encoding/decoding and hearing aids, built using the inexact building blocks and techniques proposed in this article will form the application focus of our future research efforts as well.

— Dynamic quality-cost trade-offs. The implementations of the proposed probabilistic techniques have been static so far owing to the large overheads involved in using dynamic schemes for small arithmetic components. We would like to extend the proposed techniques to accommodate dynamic error-cost trade-offs at runtime.

— Cross-layer co-design of inexact circuits. Realizing a cross-layer co-design framework will be of significant interest as well. We view this work as an early validation of a very general principle for datapath design with the potential to enable novel applications. To start with, we believe that conventional algorithms for (computer) arithmetic and concomitant designs for signal processing will have to be revisited (particularly at the algorithm level) and will result in innovations if inexact design is considered. Second, we also expect architectural research building on the work of Kaul et al. [2008], wherein an SoC approach to designing specialized media co-processors is outlined. We anticipate such co-processors as being eminently suited to being designed using the principles outlined in this article for any general application.

ACKNOWLEDGMENTS

We would like to thank and acknowledge Jean-Luc Nagel and Marc Morgan of CSEM SA for their valuable help and support in establishing the simulation framework for this article. We would also like to acknowledge the contributions of Lakshmi Chakrapani and Kirthi Krishna Muntimadugu for their collaborative work in this domain, which helped influence and shape the article. The concept for Figure 2 is inspired by Richard Karp’s Turing award lecture [Karp 1986]. We would also like to extend our sincere gratitude to the anonymous reviewers for their valuable feedback and comments.


REFERENCES

Alioto, M. and Palumbo, G. 2006. Impact of supply voltage variations on full adder delay: Analysis andcomparison. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 14, 12, 1322–1335.

Banerjee, N., Karakonstantis, G., and Roy, K. 2007. Process variation tolerant low power DCT architecture.In Proceedings of the Design, Automation and Test in Europe Conference. 1–6.

Bharghava, R., Abinesh, R., Purini, S., and Govindatajulu, R. 2010. Design of low power systems usinginexact logic circuits. J. Low Power Electron. 6, 3, 401–414.

Borkar, S. 2005. Designing reliable systems from unreliable components: The challenges of transistor vari-ability and degradation. IEEE Micro 25, 6, 10–16.

Bronk, C., Lingamneni, A., and Palem, K. 2010. Innovation for sustainability in information and commu-nication technologies (ICT). Tech. rep., James A. Baker III Institute for Public Policy, Rice University,Houston, TX.

Chakrapani, L. N. and Palem, K. V. 2010. A probabilistic boolean logic for energy efficient circuit and systemdesign. In Proceedings of the 15th Asia South Pacific Design Automation Conference.

Chakrapani, L. N. B., Korkmaz, P., Akgul, B. E. S., and Palem, K. V. 2007. Probabilistic system-on-a-chiparchitectures. ACM Trans. Des. Autom. Electron. Syst. 12, 3, 1–28.

Chakrapani, L. N. B., Muntimadugu, K. K., Lingamneni, A., George, J., and Palem, K. V. 2008. Highlyenergy and performance efficient embedded computing through approximately correct arithmetic: Amathematical foundation and preliminary experimental validation. In Proceedings of the IEEE/ACMInternational Conference on Compilers, Architecture, and Synthesis of Embedded Systems.

Cheemalavagu, S., Korkmaz, P., and Palem, K. V. 2004. Ultra low-energy computing via probabilistic algorithms and devices: CMOS device primitives and the energy-probability relationship. In Proceedings of the International Conference on Solid State Devices and Materials. 402–403.

Chippa, V. K., Mohapatra, D., Raghunathan, A., Roy, K., and Chakradhar, S. T. 2010. Scalable effort hardware design: Exploiting algorithmic resilience for energy efficiency. In Proceedings of the 47th Design Automation Conference. 555–560.

Chong, I. and Ortega, A. 2007. Dynamic voltage scaling algorithm for power constrained motion estimation. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing.

Chong, I., Cheong, H., and Ortega, A. 2006. New quality metric for multimedia compression using faulty hardware. In Proceedings of the International Workshop on Video Processing and Quality Metrics for Consumer Electronics.

Choudhury, M. and Mohanram, K. 2008. Approximate logic circuits for low overhead, non-intrusive concurrent error detection. In Proceedings of the Design, Automation and Test in Europe. 903–908.

Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. 2001. Introduction to Algorithms 2nd Ed. The MIT Press, Cambridge, MA.

Ding, Y. Z. and Rabin, M. O. 2002. Hyper-encryption and everlasting security. In Proceedings of the 19th Annual Symposium on Theoretical Aspects of Computer Science. Lecture Notes in Computer Science, vol. 2285. 1–26.

Dubey, P. 2005. A platform 2015 workload model recognition, mining and synthesis moves computers to the era of tera. White paper, Intel Corp.

Ernst, D., Kim, N. S., Das, S., Pant, S., Pham, T., Rao, R., Ziesler, C., Blaauw, D., Austin, T., and Mudge, T. 2003. Razor: A low-power pipeline based on circuit-level timing speculation. In Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 7–18.

George, J., Marr, B., Akgul, B. E. S., and Palem, K. 2006. Probabilistic arithmetic and energy efficient embedded signal processing. In Proceedings of the IEEE/ACM International Conference on Compilers, Architecture, and Synthesis for Embedded Systems. 158–168.

Harris, D. 2003. A taxonomy of parallel prefix networks. In Proceedings of the Asilomar Conference on Signals, Systems and Computers 2, 2213–2217.

Hegde, R. and Shanbhag, N. R. 1999. Energy-efficient signal processing via algorithmic noise-tolerance. In Proceedings of the International Symposium on Low Power Electronics and Design. 30–35.

Hegde, R. and Shanbhag, N. R. 2001. Soft digital signal processing. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 9, 6, 813–823.

Hoffmann, H., Sidiroglou, S., Carbin, M., Misailovic, S., Agarwal, A., and Rinard, M. 2011. Dynamic knobs for responsive power-aware computing. In Proceedings of Architectural Support for Programming Languages and Operating Systems (ASPLOS). 199–212.

Karp, R. M. 1986. Combinatorics, complexity, and randomness. Commun. ACM 29, 2, 98–109.




Kaul, H., Anders, M., Mathew, S., Hsu, S., Agarwal, A., Krishnamurthy, R., and Borkar, S. 2008. A 320 mV 56 μW 411 GOPS/Watt ultra-low voltage motion estimation accelerator in 65 nm CMOS. IEEE J. Solid-State Circuits. 107–114.

Kedem, Z., Mooney, V., Muntimadugu, K. K., and Palem, K. 2011. An approach to energy-error tradeoffs in approximate ripple carry adders. In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED).

Kedem, Z. M., Mooney, V. J., Muntimadugu, K. K., Palem, K. V., Devarasetty, A., and Parasuramuni, P. D. 2010. Optimizing energy to minimize errors in dataflow graphs using approximate adders. In Proceedings of CASES. 177–186.

Kim, S. H., Mukohopadhyay, S., and Wolf, W. 2009. Experimental analysis of sequence dependence on energy saving for error tolerant image processing. In Proceedings of the 14th ACM/IEEE International Symposium on Low Power Electronics and Design.

Kish, L. B. 2002. End of Moore’s law: Thermal (noise) death of integration in micro and nano electronics. Physics Letters A 305, 144–149.

Korkmaz, P., Akgul, B. E. S., Chakrapani, L. N., and Palem, K. V. 2006. Advocating noise as an agent for ultra low-energy computing: Probabilistic CMOS devices and their characteristics. Japan. J. Appl. Physics 45, 4B, 3307–3316.

Leem, L., Cho, H., Bau, J., Jacobson, Q., and Mitra, S. 2010. ERSA: Error resilient system architecture for probabilistic applications. In Proceedings of the Design, Automation Test in Europe Conference (DATE). 1560–1565.

Lingamneni, A., Enz, C., Nagel, J.-L., Palem, K., and Piguet, C. 2011a. Energy parsimonious circuit design through probabilistic pruning. In Proceedings of the 14th Design, Automation and Test in Europe. 764–769.

Lingamneni, A., Enz, C., Palem, K., and Piguet, C. 2011b. Parsimonious circuit design for error-tolerant applications through probabilistic logic minimization. In Proceedings of the 21st International Workshop on Power and Timing Modeling, Optimization and Simulation. 204–213.

Ludwig, J., Nawab, S., and Chandrakasan, A. 1996. Low-power digital filtering using approximate processing. IEEE J. Solid-State Circuits 31, 3, 395–400.

Mohapatra, D., Karakonstantis, G., and Roy, K. 2009. Significance driven computation: A voltage-scalable, variation-aware, quality-tuning motion estimator. In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED).

Narayanan, S., Sartori, J., Kumar, R., and Jones, D. 2010. Scalable stochastic processors. In Proceedings of the Design, Automation and Test in Europe.

Nawab, S. H., Oppenheim, A. V., Chandrakasan, A. P., Winograd, J. M., and Ludwig, J. T. 1997. Approximate signal processing. J. VLSI Signal Process. 15, 177–200.

Palem, K. V. 2003a. Energy aware algorithm design via probabilistic computing: From algorithms and models to Moore’s law and novel (semiconductor) devices. In Proceedings of the IEEE/ACM International Conference on Compilers, Architecture and Synthesis for Embedded Systems. 113–117.

Palem, K. V. 2003b. Proof as experiment: Probabilistic algorithms from a thermodynamic perspective. In Proceedings of the International Symposium on Verification (Theory and Practice).

Palem, K. V. 2005. Energy aware computing through probabilistic switching: A study of limits. IEEE Trans. Comput. 54, 9, 1123–1137.

Palem, K. V., Chakrapani, L. N., Kedem, Z. M., Lingamneni, A., and Muntimadugu, K. K. 2009a. Sustaining Moore’s law in embedded computing through probabilistic and approximate design: Retrospects and prospects. In Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems. 1–10.

Palem, K. V., Korkmaz, P., Yeo, K.-S., and Kong, Z.-H. 2009b. Probabilistic CMOS (PCMOS) logic for nanoscale circuit design. In Proceedings of the International Solid State Circuits Conference: Advanced Solid-State Circuits Forum.

Pippenger, N. 2002. Analysis of carry propagation in addition: An elementary approach. J. Algorithms 42, 317–313.

Rabin, M. O. 1976. Probabilistic algorithms. In Algorithms and Complexity, In New Directions and Recent Trends, J. F. Traub Ed., Academic Press, Waltham, MA. 29–39.

Ray, J., Hoe, J. C., and Falsafi, B. 2001. Dual use of superscalar datapath for transient-fault detection and recovery. In Proceedings of the 34th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 214–224.

Shin, D. and Gupta, S. 2010. Approximate logic synthesis for error tolerant applications. In Proceedings of the Design, Automation and Test in Europe Conference (DATE). 957–960.




Song, P. and De Micheli, G. 1991. Circuit and architecture trade-offs for high-speed multiplication. IEEE J. Solid-State Circuits 26, 9, 1184–1198.

Tschanz, J. W., Kao, J. T., Narendra, S. G., Nair, R., Antoniadis, D. A., Chandrakasan, A. P., and De, V. 2002. Adaptive body bias for reducing impacts of die-to-die and within-die parameter variations on microprocessor frequency and leakage. IEEE J. Solid-State Circuits, 1396–1402.

Varatkar, G. V. and Shanbhag, N. R. 2006. Energy-efficient motion estimation using error-tolerance. In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED).

Verma, A. and Ienne, P. 2007. Improving XOR-dominated circuits by exploiting dependencies between operands. In Proceedings of the Asia and South Pacific Design Automation Conference.

von Neumann, J. 1956. Probabilistic logics and the synthesis of reliable organisms from unreliable components. In Automata Studies, C. E. Shannon and J. McCarthy Eds., Princeton Univ. Press, Princeton, N.J.

Received June 2011; revised September 2011; accepted November 2011

