
418 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 11, NO. 3, JUNE 2003

Minimization of Switching Activities of Partial Products for Designing Low-Power Multipliers

Oscal T.-C. Chen, Sandy Wang, and Yi-Wen Wu

Abstract—This work presents low-power 2’s complement multipliers by minimizing the switching activities of partial products using the radix-4 Booth algorithm. Before computation for two input data, the one with a smaller effective dynamic range is processed to generate Booth codes, thereby increasing the probability that the partial products become zero. By employing the dynamic-range determination unit to control input data paths, the multiplier with a column-based adder tree of compressors or counters is designed. To further reduce power consumption, the two multipliers based on row-based and hybrid-based adder trees are realized with operations on effective dynamic ranges of input data. Functional blocks of these two multipliers can preserve their previous input states for noneffective dynamic data ranges and thus reduce the number of their switching operations. To illustrate the proposed multipliers exhibiting low-power dissipation, the theoretical analyses of switching activities of partial products are derived. The proposed 16 × 16-bit multiplier with the column-based adder tree conserves more than 31.2%, 19.1%, and 33.0% of power consumed by the conventional multiplier, in applications of the ADPCM audio, G.723.1 speech, and wavelet-based image coders, respectively. Furthermore, the proposed multipliers with row-based and hybrid-based adder trees reduce power consumption by over 35.3%, 25.3%, and 39.6%, and 33.4%, 24.9%, and 36.9%, respectively. When considering product factors of hardware areas, critical delays, and power consumption, the proposed multipliers can outperform the conventional multipliers. Consequently, the multipliers proposed herein can be broadly used in various media processing to yield low-power consumption at limited hardware cost or little slowing of speed.

Index Terms—Adder-tree, arithmetic, digital, low-power design, switching activity.

I. INTRODUCTION

Advances in microelectronic technology have led to more effective encoding of data, more reliable transmission of information, and more embedded intelligence in systems. In particular, to meet the increasing market demand for portable applications, these microelectronic devices consume very low power. Consequently, various digital signal processing chips are now designed with low-power dissipation [1], [2]. In such systems, a multiplier is a fundamental arithmetic unit.

Manuscript received July 4, 2000; revised April 2, 2002. This work was supported in part by the Computer and Communication Research Laboratories, ITRI, Taiwan, under Contract TI-89024, and in part by the National Science Council, Taiwan, under Contract 88-2736-L-194-003.

The authors are with the Department of Electrical Engineering, Signal and Media Laboratories, National Chung Cheng University, Chia-Yi, 621, Taiwan R.O.C. (e-mail: [email protected]).

Digital Object Identifier 10.1109/TVLSI.2003.810788

The computation of a multiplier manipulates two input data to generate many partial products for subsequent addition operations, which, in CMOS circuit design, require many switching activities [3]. Thus, switching activities within the functional units of a multiplier account for the majority of the power dissipation of a multiplier, as given in the following:

$$P = \alpha C_L V_{DD}^{2} f \qquad (1)$$

where $\alpha$ is the switching activity parameter, $C_L$ is the loading capacitance, $V_{DD}$ is the operating voltage, and $f$ is the operating frequency. $\alpha C_L$ can also be viewed as the effective switching capacitance of the transistors’ nodes on charging and discharging. Therefore, minimizing switching activities can effectively reduce power dissipation without impacting the circuit’s operational performance.
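As a rough numerical illustration of the dynamic-power relation above (power proportional to switching activity, load capacitance, supply voltage squared, and frequency), the following sketch evaluates it at a made-up operating point. The function name, argument names, and all numeric values are assumptions chosen for illustration only; they are not taken from the paper.

```python
def dynamic_power(alpha, c_load, v_dd, freq):
    """Dynamic switching power alpha * C * V^2 * f, in watts."""
    return alpha * c_load * v_dd ** 2 * freq


# Hypothetical operating point (illustrative values, not measured data).
base = dynamic_power(alpha=0.50, c_load=1e-12, v_dd=2.5, freq=50e6)

# Halving only the switching activity, with C, V and f unchanged.
reduced = dynamic_power(alpha=0.25, c_load=1e-12, v_dd=2.5, freq=50e6)

ratio = reduced / base
print(f"base = {base:.4e} W, reduced = {reduced:.4e} W, ratio = {ratio:.2f}")
```

Because the relation is linear in the switching activity, any fractional reduction in activity gives the same fractional reduction in dynamic power while voltage and clock frequency stay fixed, which is why the speed of the circuit is not affected.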

Many researchers have elucidated various approaches that use modified algorithms, architectures, and circuits to reduce power consumption [4]–[9]. Abu-Khater et al. developed circuit techniques for low-power, high-performance multiplier designs [4]. Moshnyaga et al. analyzed the algorithmic, structural, and circuit levels, and used sign generation and 4–2 compressors to minimize switching activities [5]. Angel and Swartzlander suggested using an efficient sign extension scheme to process the sign bits [6], allowing the multiplier to bypass processing sign extensions, thus reducing power dissipation. Yu et al. reorganized a Booth-encoded carry-save adder array in a multiplier design to reduce power consumption [7]. Goldovsky et al. developed modified radix-4 Booth encoders to generate partial products that are summed by (3,2), (5,3), and (7,4) counters in an array with reducing sum and carry vectors [8]. Mahant-Shetti et al. employed a bottom-up temporal tiling approach to design a leapfrog array multiplier that minimized spurious transition activity [9].

1063-8210/03$17.00 © 2003 IEEE

CHEN et al.: MINIMIZATION OF SWITCHING ACTIVITIES OF PARTIAL PRODUCTS 419

Fig. 1. The proposed multipliers. (a) The column-based multiplier. (b) The row-based or hybrid-based multiplier.

In this work, low-power multipliers are investigated by minimizing switching activities of partial products according to effective dynamic ranges of input data. In designing the proposed low-power multipliers, the radix-4 Booth algorithm is utilized to reduce the complexity of implementation. For every two input data, the one with a smaller effective dynamic range is processed to yield several Booth codes. According to the Booth codes, the other datum is multiplied by −2, −1, 0, 1, or 2 to generate partial products that are then shifted and summed in parallel to yield the final result. Hence, these partial products have a greater chance of equaling zero, because the datum that is Booth encoded is the one with the smaller effective dynamic range. Furthermore, the switching activities of partial products decrease, implying a decline in power dissipation. To realize the proposed multipliers, the dynamic-range determination units can be easily designed in front of the Booth decoders and adder trees, to switch or pass input data flows, where the adder trees can be implemented by the column-based, row-based, and hybrid-based structures. The proposed multipliers using column-based, row-based, and hybrid-based adder trees are named the proposed column-based, row-based, and hybrid-based multipliers, respectively.
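To make the zero-partial-product argument concrete, the sketch below computes the eight radix-4 Booth digits of a 16-bit operand and counts how many are zero. It uses the textbook recoding rule (digit i equals b(2i−1) + b(2i) − 2·b(2i+1), with b(−1) = 0); the function names and the sample operands are assumptions for illustration and do not come from the paper.

```python
def booth_radix4_digits(x, width=16):
    """Radix-4 Booth digits of x (each in -2..2), least significant first."""
    u = x & ((1 << width) - 1)                       # two's complement bit pattern
    # bits[0] is the implicit b_{-1} = 0; bits[k + 1] is b_k.
    bits = [0] + [(u >> k) & 1 for k in range(width)]
    return [bits[2 * i] + bits[2 * i + 1] - 2 * bits[2 * i + 2]
            for i in range(width // 2)]


def zero_partial_products(x, width=16):
    """Number of Booth digits equal to zero, i.e. all-zero partial products."""
    return booth_radix4_digits(x, width).count(0)


small, large = 5, 12345                              # hypothetical operand pair
print(booth_radix4_digits(small), zero_partial_products(small))
print(booth_radix4_digits(large), zero_partial_products(large))
```

For this pair, encoding the small operand leaves six of the eight partial products at zero, whereas encoding the large operand leaves only three, which is the effect the operand selection is meant to exploit.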

When only the dynamic-range determination unit is used in front of the conventional multiplier that uses counters and compressors, such a multiplier is denoted as the proposed column-based multiplier, as shown in Fig. 1(a). The conventional Booth-algorithm multiplier adds partial products in the column direction. Although partial products are more likely to be zero in the proposed column-based multiplier than in the conventional one, some compressors or counters which sum these zero products may consume power because they add the switched sum or carry-out bit of neighboring compressors or counters. To improve upon this, additions of partial products in the row direction are proposed to reduce the number of partial products connected to each adder unit, and the number of intermediate accumulation results connected to each adder unit. With this multiplier, only some functional units are activated, according to whichever of the two input data has the smaller effective dynamic range [10], [11]. Switching activities of the unused functional blocks are minimized because the input bits of unused functional blocks remain unaltered. However, to preserve the previous states for unused functional blocks, the proposed row-based multiplier requires more flip-flops than the proposed column-based one. The states of input data stored in the flip-flops can be changed in groups of bits, such as 4, 6, or 8 bits, to reduce the number of flip-flops. On the other hand, the critical delay of the proposed row-based multiplier is also longer than that of the proposed column-based multiplier because of adding in the row direction. This situation is improved by developing the hybrid-based adder tree, which integrates column-based and row-based structures in the proposed hybrid-based multiplier. These two multipliers include master-stage flip-flops, a dynamic-range determination unit, slave-stage flip-flops, Booth decoders, a row-based adder tree or a hybrid-based adder tree, and a sign-extension unit, as depicted in Fig. 1(b).

In this study, the low-power 2’s complement Booth-algorithm multipliers based on column-based, row-based, and hybrid-based adder trees are implemented using TSMC 0.25 μm CMOS technology. The proposed column-based multiplier increases the probability that partial products become zero for power reduction. The proposed row-based and hybrid-based multipliers not only reduce the bit switching of partial products, but also minimize the power consumption of functional units for noneffective bits. Moreover, in the Appendix, equations are derived to demonstrate that the proposed multipliers exhibit partial products with low switching activities. Multiplication operations on practical data are analyzed, showing that the proposed multipliers consume less power than the conventional multipliers. Consequently, the multipliers proposed herein are very well suited to low-power multimedia processing at reasonable hardware cost or little reduction of speed.

Fig. 2. The proposed column-based 16 × 16-bit multipliers.

II. PROPOSED LOW-POWER MULTIPLIERS

Figs. 2–4 show the proposed column-based, row-based, and hybrid-based 16 × 16-bit multipliers, respectively, to demonstrate the fact that the proposed multipliers have low-power consumption. In these three kinds of multipliers, radix-4 Booth encoding is performed, resulting in eight partial products for summation in the column-based, row-based, and hybrid-based adder trees. The functional units of the proposed low-power multipliers are described as follows:

Master-Stage and Slave-Stage Flip-Flops

The master-stage and slave-stage flip-flops are realized using the true single-phase edge-triggered circuit, as shown in Fig. 3(a) [12]. This type of circuit design has both high-speed and low-power dissipation characteristics. The master-stage flip-flops latch input data for the dynamic-range determination unit to decide the input data flow and generate control signals. The slave-stage flip-flops store the updated input data or retain previous data.

Dynamic-Range Determination Unit

The dynamic-range determination unit detects effective dynamic ranges of input data, and then generates control signals. In the proposed column-based multiplier, these control signals determine the data flows between the master-stage and slave-stage flip-flops. In the proposed row-based and hybrid-based multipliers, the control signals not only select the data flows but also manipulate slave-stage flip-flops to maintain noneffective bits in their previous states, and thus ensure that the functional units addressed by these data do not consume switching power. Additionally, these control signals are used to control the data path of an adder tree and the sign extension operation.
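The benefit of holding noneffective bits in their previous states can be illustrated with a small behavioral model that counts how many register bits toggle on an update. This is only a sketch: the function names, the 16-bit register width, and the sample values are assumptions, and the model ignores all circuit-level detail.

```python
def toggles(old, new, width=16):
    """Number of bit positions that differ between two register states."""
    return bin((old ^ new) & ((1 << width) - 1)).count("1")


def gated_update(prev, new, eff_bits, width=16):
    """Load only the low eff_bits of new; the upper bits keep their old state."""
    low_mask = (1 << eff_bits) - 1
    return ((prev & ~low_mask) | (new & low_mask)) & ((1 << width) - 1)


prev_state = 0x1234      # hypothetical previous register contents
new_datum = 0x0005       # hypothetical new datum with a 4-bit effective range

full = toggles(prev_state, new_datum)                             # every bit reloaded
held = toggles(prev_state, gated_update(prev_state, new_datum, 4))
print(full, held)
```

In this example a full reload flips five register bits, while the gated update flips one. The stale upper bits make the upper part of the result invalid, which is why a later sign-extension step is needed.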

Fig. 3. The proposed row-based 16 × 16-bit multipliers. (a) Mode I. (b) Mode II. (c) Mode III. (d) Mode IV.

Fig. 4. The proposed hybrid-based 16 × 16-bit multipliers. (a) Mode III. (b) Mode IV.

Fig. 5. The dynamic-range determination unit.

The effective dynamic range detection can be realized using groups of bits to simplify the implementation. In this study, a basic group is based upon two bits for detection, since a partial product is determined by an average of two bits of an input datum in the radix-4 Booth encoding. Fig. 5 shows the functional blocks of the dynamic-range determination unit that includes comparators, logic gates, multiplexors, and latches. Data detection begins from the most significant bits, and the comparators examine each 3-bit group, but not the four least significant bits. If these three bits are all either zero or one, then a control signal output is 1; otherwise it is 0. An overlapped bit in the neighboring two groups is used to support a continual comparison. For a 16-bit datum, six groups are compared to determine the effective dynamic ranges of 4, 6, 8, 10, 12, 14, and 16 bits. Herein, a 16 × 16-bit multiplier has two input data, the effective dynamic ranges of which are determined by 12 3-bit comparators. The one of two input data having a smaller effective dynamic range can be determined by logic operations on the signals from the comparators. The signal, indicating a datum with a smaller effective dynamic range, is generated to control multiplexors in the switcher of the dynamic-range determination unit to manipulate the input data flow. Furthermore, this and other signals, indicating the effective dynamic ranges of input data, address the control signal generator of the dynamic-range determination unit to yield the control signals that manipulate the slave-stage flip-flops, multiplexors of a row-based or hybrid-based adder tree, and a sign-extension unit.

Four bits or more can constitute a basic group of data that are either changed or unchanged in slave-stage flip-flops together, to reduce the number of the slave-stage flip-flops in the proposed row-based multipliers, because each of the functional units after the slave-stage flip-flops requires at least two partial products to be computed. Herein, when effective dynamic ranges of input data randomly occur between 1 and 16 bits, four operational modes are considered for analysis simplification: 1) mode I, which operates on 4, 8, 12, and 16 bits; 2) mode II, which operates on 8, 12, and 16 bits; 3) mode III, which operates on 8 and 16 bits; and 4) mode IV, which operates on 12 and 16 bits. Only modes III and IV of the proposed hybrid-based multipliers are explored by considering the reduction of processing speed.
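The detection procedure described above can be sketched in software as follows. The grouping is an assumption inferred from the text: six 3-bit groups at bits [15:13], [13:11], ..., [5:3], each sharing one bit with its neighbor, scanned from the most significant end until a group is not all zeros or all ones. The function names, the `MODES` table layout, and the tie-breaking rule in `order_operands` are likewise hypothetical.

```python
# Ranges supported by each operational mode, as listed in the text.
MODES = {"I": (4, 8, 12, 16), "II": (8, 12, 16), "III": (8, 16), "IV": (12, 16)}


def effective_range(x, width=16):
    """Effective dynamic range (4, 6, ..., width bits) of a two's complement value."""
    u = x & ((1 << width) - 1)
    matched = 0
    # Top bit of each group: 15, 13, 11, 9, 7, 5 for a 16-bit datum.
    for top in range(width - 1, 3, -2):
        group = (u >> (top - 2)) & 0b111
        if group not in (0b000, 0b111):      # comparator output would be 0
            break
        matched += 1                         # two more redundant sign bits found
    return width - 2 * matched


def mode_range(eff_bits, mode):
    """Smallest range supported by the mode that still covers eff_bits."""
    return min(w for w in MODES[mode] if w >= eff_bits)


def order_operands(a, b):
    """Return (booth_encoded, other): the narrower operand is Booth encoded."""
    return (a, b) if effective_range(a) <= effective_range(b) else (b, a)


for value in (7, 8, -9, 255, -32768):
    r = effective_range(value)
    print(value, r, mode_range(r, "I"), mode_range(r, "IV"))
print(order_operands(12345, -3))
```

Because consecutive groups overlap by one bit, a run of matching groups necessarily shares a single sign value, so no separate sign comparison between groups is needed in this model.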

A Booth Decoder

The radix-4 Booth decoder can generate five possible values of −2, −1, 0, 1, and 2 times the input datum. The proposed radix-4 Booth decoder, shown in Fig. 3(a), includes a 3-to-1 multiplexor and simple logic gates to select the decoded value of 0, 1, or 2 times the input datum, or to invert the output value.

An Adder Tree

The carry-save adders, (3,2), (5,3), and (7,4) counters, and a leapfrog adder array applied in Yu’s, Goldovsky’s, and Mahant-Shetti’s multipliers [7]–[9], respectively, are applied in the adder trees of the proposed column-based multipliers for comparison. The proposed row-based multipliers require seven ripple adders and multiplexors that are arranged in four operational structures, as shown in Fig. 3. The hybrid-based adders, shown in Fig. 4, include the row-based adders and the column-based adders using Yu’s approach. The eight partial products are grouped into two parts which are individually summed in the column-based adders, and the results from these two parts are added by using the row-based adder.

Sign-Extension Unit

The sign-extension unit is used only in the proposed row-based and hybrid-based multipliers. By using the control signals of the dynamic-range determination unit, only input bits in the effective dynamic range are allowed to move to the slave-stage flip-flops. Input bits in the noneffective dynamic range remain in their previous states such that no switching activities consume power. Here, the effective dynamic ranges of input data are determined by a group of bits as a basis, such that the detected effective dynamic-range values may exceed the actual ones. After an adder tree performs addition, the results in the effective and noneffective dynamic ranges have correct and incorrect values, respectively. Sign extension must be assigned to the output result in the noneffective dynamic range to restore the correct value in the final step. Fig. 6 shows the functional blocks of the sign-extension unit in four different operational modes. Herein, multiplexors are used to decide which bits are signs and which are values.
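The role of the final sign-extension step can be modeled as follows. The sketch assumes that when the encoded operand has an effective range of r bits and the other operand has 16 bits, only the low 16 + r bits of the 32-bit result are valid and the upper bits hold stale values left over from earlier operations. That bit-width split, the function names, and the stale patterns are assumptions made for illustration; the paper's unit is built from multiplexors controlled by the dynamic-range determination unit.

```python
def sign_extend(raw, valid_bits, width=32):
    """Overwrite the bits above valid_bits with copies of the sign bit."""
    low = raw & ((1 << valid_bits) - 1)
    if (low >> (valid_bits - 1)) & 1:        # sign bit of the valid field is set
        low -= 1 << valid_bits
    return low & ((1 << width) - 1)


def stale_product(x, y, eff_bits, stale_high, width=16):
    """Model an adder-tree output whose upper bits are left over from earlier data."""
    valid = width + eff_bits
    low = (x * y) & ((1 << valid) - 1)       # only this field is computed correctly
    return (stale_high << valid) | low, valid


raw, valid = stale_product(x=-3, y=1000, eff_bits=4, stale_high=0xABC)
fixed = sign_extend(raw, valid)
print(hex(raw), hex(fixed), hex((-3 * 1000) & 0xFFFFFFFF))
```

The split is safe in this model because an r-bit value times a 16-bit value always fits in r + 16 bits of two's complement, so the discarded upper bits carry no information beyond the sign.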

III. POWER ANALYSES

The proposed 16 × 16-bit 2’s complement Booth-algorithm multipliers using the column-based, row-based, and hybrid-based adder trees are implemented by the Cadence tool, using TSMC 0.25 μm CMOS technology to generate their layouts. These layouts are extracted, and post-simulated by the Power-mill and Time-mill tools. Here, the widths/lengths of the pMOS and nMOS transistors are 2.5 μm/0.25 μm and 1.0 μm/0.25 μm, respectively, for most circuit cells in the conventional and proposed multipliers, except in the slave-stage flip-flops and the carry propagation adder in the last stage of the adder tree. Considering the driving capabilities of slave-stage flip-flops and the processing speed of the carry propagation adder, their transistors are sufficiently enlarged for use in both the proposed and conventional multipliers.

Fig. 6. The sign-extension unit. (a) Mode I. (b) Mode II. (c) Mode III. (d) Mode IV.

Adaptive differential pulse code modulation (ADPCM) audio, G.723.1 speech, and wavelet-based image coders are employed in practical power analyses. Their multiplication operations are performed using either the proposed or the conventional multiplier. In the ADPCM audio coder, a 0.125-second segment of audio is analyzed, in which the multiplication operations of low-pass and high-pass band-splitting, and signal prediction involve 17 367 input vectors. In the G.723.1 speech coder, the multiplication operations involved in autocorrelation of linear prediction coding for 0.05-second speech signals sampled at 8 kHz have 26 697 input vectors. In the wavelet-based image coder, one fortieth of the multiplication operations of the 512 × 512-pixel Lenna image through the 5-tap low-pass and 3-tap high-pass filtering of the wavelet filters are performed and involve 19 117 input vectors. Fig. 7 shows the histograms of effective dynamic ranges of input data for multiplication in these three applications.

Table I lists the power consumption, areas and critical delays of the conventional and proposed column-based multipliers in these three applications. The proposed column-based multipliers that use the approaches of Yu, Goldovsky, and Mahant-Shetti consume less power than the conventional Yu’s, Goldovsky’s and Mahant-Shetti’s multipliers. Goldovsky’s multiplier requires a larger hardware area than the other two conventional multipliers since it uses the conditional-sum adder in the last stage of its adder tree. In Mahant-Shetti’s multiplier, the sum output of a full adder is linked to the sum input of the subsequent adder using a leapfrog connection, such that this multiplier requires more full adders to realize its adder tree than Yu’s multiplier. Yu’s multiplier includes the adder array for adding from the most to the least significant bits. Here, a further modification connects the sum and carry outputs of a carry save adder to the carry and sum inputs of the subsequent carry save adder, respectively. The proposed column-based multiplier, using Yu’s approach, consumes less power than the other two proposed column-based multipliers. Additionally, it uses 31.2%, 19.1%, and 33.0% less power than Yu’s multiplier, to realize the ADPCM audio, G.723.1 speech and wavelet-based image coders, respectively. Here, the dynamic-range determination unit consumes 7.1% less power consumption in the proposed column-based multiplier using Yu’s approach. Fig. 7 shows that the effective dynamic ranges of input data from the wavelet-based image coder vary less and are smaller than 9 bits. Hence, the proposed column-based multipliers computing the wavelet-based image coder can effectively switch or pass the input data flow to encode input data of which effective dynamic ranges are smaller than 9 bits, and thus they consume less power than those computing the ADPCM audio and G.723.1 speech coders.

Fig. 7. The histograms of effective dynamic ranges of input data for multiplication in the three practical applications. (a) ADPCM audio coder. (b) G.723.1 speech coder. (c) Wavelet-based image coder.

Table II lists the power consumption, areas and critical delays of the proposed row-based and hybrid-based multipliers for these three applications. The proposed row-based and hybrid-based multipliers in modes III and IV consume less power than the proposed column-based multipliers. Additionally, Tables I and II illustrate that the row-based multiplier in mode IV consumes the least power. The proposed row-based and hybrid-based multipliers in mode IV save more than 35.3%, 25.3%, and 39.6%, and 33.4%, 24.9%, and 36.9% of the power in Yu’s multiplier to realize the ADPCM audio, G.723.1 speech and wavelet-based image coders, respectively. Nevertheless, the proposed column-based, row-based, and hybrid-based multipliers have critical delays that are 0.0%, 21.3%, and 2.5% longer, and hardware areas that are 12.6%, 14.8%, and 12.6% larger, than those of Yu’s multiplier, respectively, when operational mode IV is utilized. The power dissipation of the dynamic-range determination unit and sign-extension unit is less than 8.8% of those of the row-based and hybrid-based multipliers in mode IV. When considering the factor of multiplying power consumption, areas and critical delays, the proposed hybrid-based multiplier in mode IV performs best in these three applications and the second best performer is the proposed column-based multiplier.

TABLE I. POWER CONSUMPTION, AREAS AND CRITICAL DELAYS OF THE CONVENTIONAL AND PROPOSED COLUMN-BASED MULTIPLIERS

TABLE II. POWER CONSUMPTION, AREAS AND CRITICAL DELAYS OF THE PROPOSED ROW-BASED AND HYBRID-BASED MULTIPLIERS

TABLE III. POWER CONSUMPTION OF THE PROPOSED COLUMN-BASED, ROW-BASED AND HYBRID-BASED, AND YU’S 16 × 16-BIT MULTIPLIERS FOR EFFECTIVE DYNAMIC RANGES OF INPUT DATA WITH UNIFORM DISTRIBUTIONS

The proposed column-based multiplier that follows Yu’s approach, the proposed row-based and hybrid-based multipliers in modes III and IV, and Yu’s conventional multiplier are chosen from Tables I and II for a comparison that involves the effective dynamic ranges of input data with uniform and Gaussian distributions. Here, each distribution case involves 15 000 input vectors, where the signs of the input data are randomly generated. Table III lists the power consumption of the proposed and conventional multipliers for uniformly distributed effective dynamic ranges of input data. The saving ratios of power consumption of the proposed column-based multiplier against that of the conventional multiplier increase with the effective dynamic ranges of the input data. The proposed row-based and hybrid-based multipliers in mode III have the largest power saving ratios for effective dynamic ranges of input data between 1 and 8 bits, whereas these two multipliers in mode IV have the largest power saving ratios for effective dynamic ranges of input data between 1 and 12 bits. This effect reveals that operational modes III and IV can match the effective dynamic ranges of input data from 1 to 8 bits and from 1 to 12 bits, respectively. Table IV specifies the power consumption of the proposed and conventional multipliers when effective dynamic ranges of input data follow the Gaussian distributions with different means and standard deviations. The effective dynamic ranges of input data increase with the mean, increasing power consumption. However, for a given mean, larger standard deviations facilitate increased power savings because of an increased probability of encoding the data with smaller effective dynamic ranges. Tables III and IV reveal that the proposed row-based or hybrid-based multipliers in modes III or IV consume the least power for various effective dynamic-range distributions. Therefore, the multipliers proposed herein consume less power by reducing the switching activities of partial products to realize various low-power multimedia applications.

TABLE IV. POWER CONSUMPTION OF THE PROPOSED COLUMN-BASED, ROW-BASED AND HYBRID-BASED, AND YU’S 16 × 16-BIT MULTIPLIERS FOR EFFECTIVE DYNAMIC RANGES OF INPUT DATA WITH GAUSSIAN DISTRIBUTIONS

TABLE V. PROBABILITIES OF THE BOOTH DECODED VALUES BEING −2Y, −Y, 0, Y, AND 2Y

The results of the previous 16 × 16-bit proposed and conventional multipliers are analyzed to effectively utilize the proposed column-based, row-based, and hybrid-based multipliers. The proposed column-based, row-based, and hybrid-based multipliers have 1.00, 1.21, and 1.03 times the critical delay, and 1.13, 1.15, and 1.13 times the hardware area of Yu’s conventional multiplier, respectively, when operational mode IV is utilized. Furthermore, the proposed row-based and hybrid-based multipliers can conserve more power than the proposed column-based multiplier. When neighboring input data have similar effective dynamic ranges and the same sign, the proposed column-based multiplier can be cost-effective. When two neighboring input data have a large dynamic-range difference, the proposed row-based and hybrid-based multipliers can effectively save power when their operational modes are selected to match the effective dynamic-range distribution of input data. In addition, the proposed row-based multiplier may consume less power but has a longer delay than the proposed hybrid-based multiplier. Users can thus determine a proposed multiplier that is suited to their applications by considering the chip area, speed, power consumption, and data type.
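The trade-off can be checked with a back-of-the-envelope calculation of the power × area × delay product, normalized to Yu's conventional multiplier. The sketch below uses only figures quoted in the text: the delay and area ratios for mode IV and the power savings for the three coders. The savings are stated as lower bounds ("more than"), so treating them as exact values is an assumption, as are the dictionary layout and function names.

```python
# Ratios relative to Yu's multiplier; savings ordered as (ADPCM, G.723.1, wavelet).
RELATIVE = {
    "column": {"delay": 1.00, "area": 1.13, "saving": (0.312, 0.191, 0.330)},
    "row":    {"delay": 1.21, "area": 1.15, "saving": (0.353, 0.253, 0.396)},
    "hybrid": {"delay": 1.03, "area": 1.13, "saving": (0.334, 0.249, 0.369)},
}
APPS = ("ADPCM audio", "G.723.1 speech", "wavelet image")


def product_factor(design, app_index):
    """Normalized power * area * delay; below 1.0 beats the conventional design."""
    d = RELATIVE[design]
    return (1.0 - d["saving"][app_index]) * d["area"] * d["delay"]


ranking = {app: sorted(RELATIVE, key=lambda name: product_factor(name, i))
           for i, app in enumerate(APPS)}

for i, app in enumerate(APPS):
    factors = {name: round(product_factor(name, i), 3) for name in RELATIVE}
    print(app, factors, ranking[app])
```

Under these assumptions the hybrid-based design has the smallest product in all three applications and the column-based design is second, in line with the ranking reported in the text, while the row-based design pays for its power saving with its longer critical delay.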

IV. CONCLUSION

The three proposed Booth-algorithm multipliers are demonstrated to dissipate less power than conventional ones. These three multipliers are equipped with dynamic-range determination (DRD) units to add partial products in the column-based, row-based, and hybrid-based adder trees. The DRD unit selects whichever of the two input data has the smaller effective dynamic range for Booth encoding, minimizing the switching activities of partial products. Additionally, the DRD unit of the proposed row-based and hybrid-based multipliers controls the slave-stage flip-flops to store effective dynamic-range bits of an input datum, manipulates the data flow of an adder tree, and determines the operation of the sign-extension unit for further power reduction. The power analyses of multiplication operations on practical input data confirmed that the proposed 16 × 16-bit column-based, row-based, and hybrid-based multipliers dissipate less power than Yu’s conventional multiplier. The proposed hybrid-based multiplier is the best and the proposed column-based multiplier is the second best in terms of the product factors of hardware areas, critical delays, and power consumption. Consequently, the proposed low-power multipliers can be used in various practical applications with a small increase in hardware complexity or critical delay. Finally, power consumption, hardware complexity, processing speed, and data types are the most important considerations in the cost-effective selection of the proposed column-based, row-based, or hybrid-based multiplier.

TABLE VI
PROBABILITIES OF BOOTH DECODED VALUES AT DIFFERENT EFFECTIVE DYNAMIC RANGES

APPENDIX

THEORETICAL ANALYSES OF SWITCHING ACTIVITIES

The theoretical foundation is derived to illustrate the reduction of switching activities for the partial products of the proposed 2’s complement multiplication. The radix-4 Booth algorithm is usually applied to encode one of two input data, $X$ and $Y$ [13]. If a series of data is used for Booth encoding, then each datum, $X$, is partitioned into several 3-bit groups, each of which has one bit that is overlapped with the previous group. Hence, the 2’s complement of $X$ with a word length of $N$ can be represented by

$$X = \sum_{k=0}^{N/2-1} \left( x_{2k-1} + x_{2k} - 2x_{2k+1} \right) 2^{2k} \tag{2}$$

where $x_k$ is the $k$th digit of $X$, and $x_{-1}$ equals zero. Here, $N$ is assumed to be an even number. Multiplying the other input datum, $Y$, by $X$, (2) is modified to

$$X \cdot Y = \sum_{k=0}^{N/2-1} \left( x_{2k-1} + x_{2k} - 2x_{2k+1} \right) Y \, 2^{2k} \tag{3}$$

According to (3), $(x_{2k-1} + x_{2k} - 2x_{2k+1})\,Y$ is an intermediate product that can be represented by five different values of $-2Y$, $-Y$, 0, $Y$, and $2Y$. Table V lists occurrence probabilities of these five values when three bits of $X$, except for $x_{-1}$, are uniformly distributed as 0 or 1.
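The recoding described above can be checked numerically. The Python sketch below computes radix-4 Booth digits with the standard rule (digit k equals bit 2k−1 plus bit 2k minus twice bit 2k+1, with the bit below position 0 taken as zero) and enumerates the eight equally likely 3-bit groups. The function names are hypothetical and the code is only an illustration of the textbook algorithm, not the authors' hardware.

```python
from collections import Counter
from itertools import product


def booth_radix4_digits(x, n):
    """Radix-4 Booth digits of the n-bit 2's complement integer x (n even).

    Digit k is bit[2k-1] + bit[2k] - 2*bit[2k+1], with bit[-1] = 0, so every
    digit lies in {-2, -1, 0, 1, 2}. Digits are returned least significant first.
    """
    if n % 2:
        raise ValueError("word length must be even")
    if not -(1 << (n - 1)) <= x < (1 << (n - 1)):
        raise ValueError("x does not fit in n bits")
    pattern = x & ((1 << n) - 1)  # 2's complement bit pattern of x

    def bit(i):
        return 0 if i < 0 else (pattern >> i) & 1

    return [bit(2 * k - 1) + bit(2 * k) - 2 * bit(2 * k + 1) for k in range(n // 2)]


def booth_value(digits):
    """Recombine Booth digits: sum of digit_k * 4**k."""
    return sum(d * 4 ** k for k, d in enumerate(digits))


def digit_distribution():
    """Digit probabilities when all three bits of a group are uniform."""
    counts = Counter(lo + mid - 2 * hi for hi, mid, lo in product((0, 1), repeat=3))
    return {d: counts[d] / 8 for d in (-2, -1, 0, 1, 2)}
```

For groups whose three bits are all uniformly distributed, the enumeration gives probabilities of 1/8, 1/4, 1/4, 1/4, and 1/8 for the digits −2, −1, 0, 1, and 2. The least significant group differs because its lowest bit is fixed at zero.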

Table VI presents the occurrence probabilities of the Booth decoded values, obtained by considering the effective dynamic ranges of the Booth-encoded datum. For each effective dynamic range, the table gives the probabilities associated with the intermediate product being −2Y, −Y, 0, Y, and 2Y, respectively. If the effective dynamic ranges of the Booth-encoded datum follow a given probability distribution, then the probability that each partial product is zero can be derived as the following:

for (4)

The change of a partial product between two consecutive input data can be classified into four cases: 1) from zero to zero, 2) from zero to nonzero values, 3) from nonzero values to zero, and 4) from nonzero to nonzero values. Switching activities occur in cases 2), 3), and 4). The average switching activity of the partial product can be approximately given by the following:

(5)

where the output value from the radix-4 Booth decoder has a word length of bits. Here, input data are assumed to be uncorrelated and switch simultaneously. In addition, neighboring partial products are independent and simultaneously change their states without glitching, and thus have an average of one half of the bits with switching. Furthermore, the average switching activity of all partial products is

(6)

According to (6), the switching activity can be reduced by increasing the probabilities that the partial products are zero. From Table VI, the probability of a zero Booth decoded value is fixed for a given effective dynamic range. Hence, altering the distribution of effective dynamic ranges can effectively increase the probability that a partial product is zero, minimizing switching activities. Table VI, together with (4) and (6), reveals that the minimum average switching activity occurs when the effective dynamic range of the Booth-encoded datum is only 1 bit. In this case, the probability that a partial product is zero is 0.5 for the least significant Booth digit and 1 for all higher digits.
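The link between a small effective dynamic range and zero partial products can be sketched in a few lines of Python. The sketch assumes that the effective dynamic range is the minimum 2's complement word length that holds the value, sign bit included, which is consistent with a 1-bit range covering only 0 and −1; this definition and the function names are assumptions rather than the paper's exact formulation.

```python
def effective_dynamic_range(x):
    """Minimum 2's complement word length that holds x (sign bit included).

    Assumed definition: 0 and -1 need 1 bit, 1 and -2 need 2 bits, and so on.
    """
    return (x if x >= 0 else ~x).bit_length() + 1


def booth_radix4_digits(x, n):
    """Radix-4 Booth digits of the n-bit 2's complement integer x, LSB first."""
    pattern = x & ((1 << n) - 1)

    def bit(i):
        return 0 if i < 0 else (pattern >> i) & 1

    return [bit(2 * k - 1) + bit(2 * k) - 2 * bit(2 * k + 1) for k in range(n // 2)]


def guaranteed_zero_digits(r, n):
    """Number of Booth digits that see only sign-extension bits.

    Digit k reads bits 2k-1, 2k, 2k+1. For a range of r bits, all bits at
    positions >= r-1 equal the sign bit, so digits with k >= ceil(r/2) are zero.
    """
    return n // 2 - (r + 1) // 2
```

For a 16-bit word and a 1-bit effective dynamic range, `guaranteed_zero_digits(1, 16)` is 7, so only the least significant of the eight partial products can be nonzero.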

TABLE VII
AVERAGE SWITCHING ACTIVITIES OF PARTIAL PRODUCTS OF THE PROPOSED AND CONVENTIONAL BOOTH-ALGORITHM MULTIPLIERS FOR EFFECTIVE DYNAMIC RANGES OF INPUT DATA WITH UNIFORM DISTRIBUTIONS

Equation (6) represents the average switching activity of partial products of the conventional multiplication, using the radix-4 Booth algorithm. The partial products from Booth decoders that operate on the most significant bits of input data are more likely to become zero when the proposed column-based multiplication, as shown in Fig. 1(a), is employed by Booth encoding whichever of the two input data has the smaller effective dynamic range. Additionally, the dynamic-range determination unit has a detection resolution of 2 bits and determines effective dynamic ranges larger than 4 bits. Accordingly, the probability that the input datum used for Booth encoding has a given effective dynamic range can be formulated as

for

for

an odd number

for

an even number.

(7)

Substituting the distribution in (7) into Eq. (4) yields the probability that the partial product from the Booth decoder is zero:

for (8)

The average switching activity of all partial products within the proposed column-based multiplication is then represented by

(9)
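The operand selection behind the column-based scheme can be illustrated with a short Python sketch: the operand with the smaller effective dynamic range is Booth encoded, and the other serves as the multiplicand. The sketch assumes exact range detection, so the 2-bit detection resolution and 4-bit minimum of the dynamic-range determination unit are not modeled, and all function names are hypothetical.

```python
def effective_dynamic_range(x):
    """Minimum 2's complement word length that holds x (assumed definition)."""
    return (x if x >= 0 else ~x).bit_length() + 1


def booth_radix4_digits(x, n):
    """Radix-4 Booth digits of the n-bit 2's complement integer x, LSB first."""
    pattern = x & ((1 << n) - 1)

    def bit(i):
        return 0 if i < 0 else (pattern >> i) & 1

    return [bit(2 * k - 1) + bit(2 * k) - 2 * bit(2 * k + 1) for k in range(n // 2)]


def select_booth_operand(a, b):
    """Return (encoded, multiplicand); the smaller-range operand is encoded."""
    if effective_dynamic_range(b) < effective_dynamic_range(a):
        return b, a
    return a, b


def count_nonzero_digits(x, n):
    """Number of nonzero Booth digits, i.e. nonzero partial products."""
    return sum(1 for d in booth_radix4_digits(x, n) if d != 0)


def multiply(a, b, n):
    """Multiply by summing partial products digit * multiplicand * 4**k."""
    encoded, multiplicand = select_booth_operand(a, b)
    digits = booth_radix4_digits(encoded, n)
    return sum(d * multiplicand * 4 ** k for k, d in enumerate(digits))
```

For example, with the 16-bit operands 12345 and 3, encoding 12345 yields five nonzero Booth digits, whereas encoding 3 yields two, and the product is unchanged. Swapping guarantees more all-zero upper digits, although the total count of nonzero digits is not reduced for every operand pair.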

Only partial products from the effective dynamic range of an input datum for Booth encoding are switched and the others remain in their previous states. These additional reduced switching activities come primarily from the changes of effective dynamic ranges of two neighboring input data for Booth encoding, from large to small. In addition to Booth encoding the smaller dynamic-range number, the proposed row-based and hybrid-based multipliers, shown in Fig. 1(b), perform additions of partial products at effective dynamic data ranges to save power. Several grouped data ranges are allowed for preserving the previous states to reduce the number of the slave-stage flip-flops. Hence, their switching activities occur when partial products, within the grouped effective dynamic ranges, change from zero to nonzero values, from nonzero to nonzero values, and from nonzero values to zero. Thus, the switching activity of each partial product can be formulated as

(10)

where the lower limit is the least number in the predetermined data range to which the associated value belongs. For example, the proposed hybrid-based 16 × 16-bit multiplier in mode III has two predetermined data ranges, the first covering 1 to 8 bits and the second covering 9 to 16 bits. When the partial-product index is 5, the associated value equals 11 and thus belongs to the second range; the least number of that range, 9, is then used in Eq. (10). Consequently, the average switching activity of all partial products for the proposed row-based or hybrid-based multiplication is

(11)
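The range lookup in the mode III example can be sketched as follows. The two ranges (1 to 8 bits and 9 to 16 bits) come from the text; the mapping from partial-product index k to the value 2k + 1 is an assumption, chosen only because it maps 5 to 11 as in the example, and the function names are hypothetical.

```python
# Predetermined data ranges for mode III, as given in the example in the text.
MODE_III_RANGES = [(1, 8), (9, 16)]


def associated_value(k):
    """Assumed mapping from partial-product index k to a bit count (2k + 1)."""
    return 2 * k + 1


def range_lower_bound(value, ranges):
    """Least number of the predetermined data range that contains value."""
    for low, high in ranges:
        if low <= value <= high:
            return low
    raise ValueError(f"{value} is outside every predetermined range")


if __name__ == "__main__":
    value = associated_value(5)
    print(value, range_lower_bound(value, MODE_III_RANGES))
```

Grouping ranges this way trades precision for hardware: every value in a group shares one lower bound, so fewer slave-stage flip-flop control groups are needed than with per-bit tracking.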

TABLE VIII
AVERAGE SWITCHING ACTIVITIES OF PARTIAL PRODUCTS OF THE PROPOSED AND CONVENTIONAL BOOTH-ALGORITHM MULTIPLIERS FOR EFFECTIVE DYNAMIC RANGES OF INPUT DATA WITH THE GAUSSIAN DISTRIBUTIONS

According to (6), (9), and (11), the average switching activities of partial products for the conventional and proposed multiplication can be analyzed for various effective dynamic ranges of input data. Here, 16 × 16-bit multiplication is used as an example in which the two input data are assumed to have the same dynamic-range distribution. Table VII illustrates average switching activities of partial products for effective dynamic ranges of input data with uniform distributions. According to Table VII, a larger effective dynamic range of input data implies greater switching activities. With the proposed multipliers, saving ratios tend to increase with effective dynamic ranges because larger differences between the effective dynamic ranges of two input data enable the proposed multipliers to encode input data with smaller effective dynamic ranges. Table VIII presents the average switching activities of the conventional and proposed multipliers for effective dynamic ranges of input data with the Gaussian distributions. When the standard deviation increases, variations in effective dynamic ranges of two input data increase, thereby increasing the saving ratios of switching activities. In contrast, when the mean increases, minimizing switching activities becomes increasingly difficult. That is, an increase in the effective dynamic range decreases the probability that the partial products become zero, making a reduction in switching activities more difficult. According to Tables VII and VIII, the proposed row-based or hybrid-based multipliers in modes III or IV can exhibit the least switching activity, since they use smaller effective dynamic-range numbers for Booth encoding and control values of partial products in part of the noneffective dynamic range to remain unchanged. The variation characteristics of the results in Tables VII and VIII are quite consistent with those in Tables III and IV, respectively. However, when effective dynamic ranges of input data span a small range or have a low standard deviation, the power conserved from reduction of switching activities cannot compensate for the power consumed by the overhead hardware components in the proposed multipliers. Consequently, the proposed multipliers consume slightly more power than the conventional multiplier in these cases.
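The qualitative trends with mean and standard deviation can be reproduced with a simplified proxy. The Python sketch below does not evaluate (6), (9), or (11); instead it counts the expected number of Booth digits that can be nonzero, assuming two independent operands whose effective dynamic ranges follow the same Gaussian-shaped distribution truncated to 1 to 16 bits. The proxy, the discretization, and all names are assumptions for illustration only.

```python
import math


def range_distribution(mean, std, n=16):
    """Gaussian-shaped probabilities for effective dynamic ranges 1..n."""
    weights = [math.exp(-0.5 * ((r - mean) / std) ** 2) for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def active_digits(r):
    """Booth digits that can be nonzero for an r-bit effective dynamic range."""
    return (r + 1) // 2


def expected_active_digits(p, choose_smaller):
    """Expected count of possibly nonzero partial products.

    p[r - 1] is the probability of range r for each of two independent
    operands. With choose_smaller, the smaller-range operand is encoded.
    """
    n = len(p)
    total = 0.0
    for r1 in range(1, n + 1):
        for r2 in range(1, n + 1):
            r = min(r1, r2) if choose_smaller else r1
            total += p[r1 - 1] * p[r2 - 1] * active_digits(r)
    return total


def saving_ratio(mean, std):
    """Relative reduction in possibly nonzero partial products."""
    p = range_distribution(mean, std)
    conventional = expected_active_digits(p, choose_smaller=False)
    proposed = expected_active_digits(p, choose_smaller=True)
    return 1.0 - proposed / conventional
```

Under this proxy, `saving_ratio(8, 3)` exceeds `saving_ratio(8, 1)`, and `saving_ratio(4, 1)` exceeds `saving_ratio(12, 1)`, which matches the direction of the trends described above. The absolute values are not comparable with Table VIII, since bit-level switching and hardware overhead are ignored.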

ACKNOWLEDGMENT

Valuable comments and suggestions from reviewers are highly appreciated. Dr. Bing J. Sheu, Nassda Corp., Santa Clara, USA, is also commended for his valuable suggestions on low-power circuit design. Nan-Ying Shen, Dept. of Electrical Engineering, National Chung Cheng University, Chia-Yi, Taiwan, helped with the circuit layouts and simulations.

REFERENCES

[1] A. P. Chandrakasan and R. W. Brodersen, Low-Power CMOS Design. Piscataway, NJ: IEEE Press, 1998.

[2] G. K. Yeap, Practical Low-Power Digital VLSI Design. Norwell, MA: Kluwer, 1998.

[3] A. P. Chandrakasan and R. W. Brodersen, “Minimizing power consumption in digital CMOS circuits,” Proc. IEEE, vol. 83, no. 4, pp. 498–523, Apr. 1995.

[4] I. S. Abu-Khater, A. Bellaouar, and M. Elmasry, “Circuit techniques for CMOS low-power high-performance multipliers,” IEEE J. Solid-State Circuits, vol. 31, pp. 1535–1546, Oct. 1996.

[5] V. G. Moshnyaga and K. Tamaru, “A comparative study of switching activity reduction techniques for design of low-power multipliers,” in Proc. IEEE Int. Symp. Circuits Syst., Apr. 1995, pp. 1560–1563.

[6] E. Angel and E. E. Swartzlander, Jr., “Low power parallel multipliers,” in Proc. IEEE Workshop Very Large Scale Integration (VLSI) Signal Processing, 1996, pp. 199–208.

[7] Z. Yu, L. Wasserman, and A. Willson, Jr., “A painless way to reduce power dissipation by over 18% in Booth-encoded carry-save array multipliers for DSP,” in Proc. IEEE Workshop Signal Processing Syst., 2000, pp. 571–580.

[8] A. Goldovsky, B. Patel, M. Schulte, and R. Kolagotla, “Design and implementation of a 16 by 16 low-power two’s complement multiplier,” in Proc. IEEE Int. Symp. Circuits Syst., vol. 5, 2000, pp. 345–348.

[9] S. Mahant-Shetti, P. Balsara, and C. Lemonds, “High performance low power array multiplier using temporal tiling,” IEEE Trans. VLSI Syst., vol. 7, pp. 121–124, Mar. 1999.

[10] R. Sheen, S. Wang, O. T.-C. Chen, and R.-L. Ma, “Power consumption of a 2’s complement adder minimized by effective dynamic data ranges,” in Proc. IEEE Int. Symp. Circuits Syst., vol. I, May 1999, pp. 266–269.

[11] S. Wang, Y. Wu, O. T.-C. Chen, and R. Ma, “Low-power multipliers by minimizing inter-data switching activities,” in Proc. IEEE 43rd Midwest Symp. Circuits Systems, vol. 1, Aug. 2000, pp. 88–92.

[12] J. Yuan and C. Svensson, “New single-clock CMOS latches and flipflops with improved speed and power savings,” IEEE J. Solid-State Circuits, vol. 32, pp. 62–69, Jan. 1997.

[13] O. T.-C. Chen, W.-L. Liu, H.-C. Hsieh, and J.-Y. Wang, “A highly-scaleable FIR using the Radix-4 Booth algorithm,” in Proc. IEEE Int. Conf. Acoustic, Speech, and Signal Processing, vol. 3, May 1998, pp. 1765–1768.

Oscal T.-C. Chen (S’89–M’94) was born in Taiwan, R.O.C., in 1965. He received the B.S. degree in electrical engineering from National Taiwan University in 1987, and the M.S. and Ph.D. degrees in electrical engineering from University of Southern California at Los Angeles, in 1990 and 1994, respectively.

From 1994 to 1995, he was with the Computer Processor Architecture Department of Computer Communication and Research Labs. (CCL), Industrial Technology Research Institute (ITRI), Hsinchu, Taiwan, as System Design Engineer, Project Leader, and Section Chief. He contributed significantly to many industrial applications including the fuzzy chip, neural networks, speech recognition system, and digital signal processor. Since September 1995, he has been an Associate Professor in the Department of Electrical Engineering, National Chung Cheng University (NCCU), Chiayi, Taiwan. Currently, he is also Director of the Academic Development Division, Office of Research and Development, NCCU. He has also served as a Technical Consultant with the Institute for Information Industry, Center for Aviation and Space Technology and CCL, ITRI. His research interests include analog/digital circuit design, video/audio processing, DSP processors, VLSI systems, RF IC, microsensors, and communication systems.

Dr. Chen was an Associate Editor of IEEE Circuits and Devices Magazine from July 1995 to March 1999, and a Founding Member of the multimedia systems and applications technical committee of IEEE Circuits and Systems Society. He participated in the Technical Program Committee of the IEEE International Conference on Multimedia and Expo, 2000–2002. He was the corecipient of the Best Paper Award of IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS in 1995. He is a Life Member of Chinese Fuzzy Systems Association.


Sandy (Li Yueh) Wang was born in Taiwan, R.O.C., in 1974. She received the B.S. and M.S. degrees in electrical engineering from National Chung Cheng University, Taiwan, R.O.C., in 1997 and 1999, respectively.

In 2000, she joined Winbond Corporation, Hsinchu, Taiwan, R.O.C. Her research interests include operational amplifiers, RF circuit modules, and low-power CMOS integrated circuits for consumer electronics.

Yi-Wen Wu was born in Yunlin, Taiwan, in 1976. She received the B.S. degree in electrical engineering from National Taiwan Ocean University at Keelung, Taiwan, R.O.C., in 1999, and the M.S. degree in electrical engineering from National Chung Cheng University at Chiayi, Taiwan, R.O.C., in 2001.

Currently, she is an Integrated Circuit Design Engineer at Etrend Electronics, Inc., Tainan, Taiwan, where she works in the field of very large scale integration (VLSI) circuit design and system analysis. Her research interests include digital circuit design, board-level development, and system integration.