59. Radix-4 and Radix-8 Booth Encoded Multi-Modulus Multipliers

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I: REGULAR PAPERS

Radix-4 and Radix-8 Booth Encoded Multi-Modulus Multipliers

Ramya Muralidharan, Member, IEEE, and Chip-Hong Chang, Senior Member, IEEE

Abstract: Novel multi-modulus designs capable of performing the desired modulo operation for more than one modulus in a Residue Number System (RNS) are explored in this paper to lower the hardware overhead of residue multiplication. Two multi-modulus multipliers that reuse the hardware resources amongst the modulo $2^n-1$, modulo $2^n$ and modulo $2^n+1$ multipliers by virtue of their analogous number theoretic properties are proposed. The former employs the radix-$2^2$ Booth encoding algorithm and the latter employs the radix-$2^3$ Booth encoding algorithm. In the proposed radix-$2^2$ and radix-$2^3$ Booth encoded multi-modulus multipliers, the modulo-reduced products for the moduli $2^n-1$, $2^n$ and $2^n+1$ are computed successively. With the basis of the radix-$2^2$ Booth encoded modulo $2^n-1$ and radix-$2^3$ Booth encoded modulo $2^n-1$ and modulo $2^n+1$ multiplier architectures, new Booth encoded modulo $2^n$ multipliers are proposed to maximally share the hardware resources in the multi-modulus architectures. Our experimental results on $\{2^n-1, 2^n, 2^n+1\}$ based RNS multiplication show that the proposed radix-$2^2$ and radix-$2^3$ Booth encoded multi-modulus multipliers save nearly 60% of area over the corresponding single-modulus multipliers. The proposed radix-$2^2$ and radix-$2^3$ Booth encoded multi-modulus multipliers increase the delay of the corresponding single-modulus multipliers by 18% and 13%, respectively, in the worst case. Compared to the single-modulus multipliers, the proposed multi-modulus multipliers incur a minor power dissipation penalty of 5%.

Index Terms: Booth algorithm, multi-modulus architectures, multiplier, residue number system (RNS).

I. INTRODUCTION

Residue number system (RNS) has been shown to be an advantageous alternative to the traditional Two's Complement System (TCS) for accelerating certain computations in high dynamic range (DR) applications such as signal processing, communication, and cryptographic algorithms [1]-[3]. RNS is defined by a set of $N$ pair-wise co-prime moduli $\{m_1, m_2, \ldots, m_N\}$ called the base. For unambiguous representation, the DR of the RNS is given by the product of all moduli comprising the base, i.e., $M = \prod_{i=1}^{N} m_i$. An integer $X$ within the DR is represented by a unique $N$-tuple $(x_1, x_2, \ldots, x_N)$, where $x_i$ is a residue defined by $X$ modulo $m_i$ or $|X|_{m_i}$. The operation $Z = X \circ Y$ is implemented as $(z_1, \ldots, z_N) = (|x_1 \circ y_1|_{m_1}, \ldots, |x_N \circ y_N|_{m_N})$, where each residue $z_i$ of $Z$ is independently computed from only the residues, $x_i$ and $y_i$, of the operands, $X$ and $Y$, respectively in a modulo channel corresponding to the modulus $m_i$ [4]. The absence of inter-channel carry propagation and the reduced length of intra-channel carry propagation chains lead to faster implementation in RNS when compared to its TCS counterpart. However, the benefits of RNS may not translate well into area and power constrained platforms due to the significant extra hardware required for the conversion between TCS and RNS as well as the concurrent computations in several modulo channels.

Manuscript received July 20, 2012; revised December 03, 2012; accepted February 07, 2013. This paper was recommended by Associate Editor M.-D. Shieh. The authors are with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore (e-mail: [email protected]; [email protected]). Digital Object Identifier 10.1109/TCSI.2013.2252642

A factor that directly influences the hardware complexity of RNS is the choice of the moduli that comprise the RNS base. Compared to the general moduli, special moduli of forms $2^n-1$, $2^n$ and $2^n+1$ that possess good number theoretic properties are preferred. By exploiting these properties, full-adder based binary-to-residue and residue-to-binary converters for special moduli sets have been developed [5]-[8]. Hardware efficient modulo $2^n-1$ and modulo $2^n+1$ arithmetic circuits such as adders and multipliers have also been designed with these properties as the basis [9]-[23].

To lower the RNS area and power dissipation, system level design methodologies like multi-function and multi-modulus designs have also been suggested [24], [25]. Architectures that perform the same modulo operation for more than one modulus are called multi-modulus architectures (MMAs) [24]. MMA is classified into two types: fixed multi-modulus architecture (FMA) and variable multi-modulus architecture (VMA). The modulo operations corresponding to the different moduli are performed concurrently in an FMA but serially in a VMA. In contrast to the single-modulus architecture (SMA), MMA exploits hardware reuse amongst the modulo channels of RNS. Hence, both FMA and VMA result in considerable savings in area as well as power dissipation. MMA is widely used in reconfigurable and fault-tolerant RNSs where the moduli of the base are selectively used. In reconfigurable RNS, the DR of RNS is varied by selecting a subset of moduli from the base to fulfill the DR requirement of the application. The channels corresponding to the unselected moduli are turned off, thereby reducing the system power dissipation [26], [27]. In fault-tolerant RNS, additional moduli are included in the base such that the DR of RNS far exceeds the DR requirement of the application. These redundant moduli are used to detect and isolate faults in computations. Once the faulty residue digit is located, the corresponding modulus of the base is discarded [28], [29]. Another specific application of VMA is high DR RNSs based on imbalanced word-length moduli sets [5], [8]. In such RNSs, a VMA can be employed for the non-critical moduli to substantially lower the hardware requirement without compromising the system delay.

Consequently, MMAs for the three special moduli, $2^n-1$, $2^n$ and $2^n+1$ have been explored in recent times. In [30], a variable multi-modulus two-operand parallel-prefix adder that computes the sum modulo $2^n-1$, modulo $2^n$ and modulo $2^n+1$ was proposed. By extending the number of operands, a variable multi-modulus multi-operand adder for the moduli $2^n-1$, $2^n$ and $2^n+1$ based on a regularly structured carry save adder (CSA) tree was described in [31]. By making use of the variable multi-modulus multi-operand adder, a variable multi-modulus multiplier for the three special moduli was suggested in [32]. Recently, a dual-modulus multiplier based on radix-4 Booth encoding for the moduli $2^n-1$ and $2^n+1$ was proposed [33]. For the moduli $2^n-1$, $2^n$ and $2^n+1$, non-encoded fixed and variable multi-modulus squarers were proposed in [34]. In [35], the VMA for squarer was further improved by employing the radix-4 Booth encoding algorithm.

In this paper, the Booth encoding technique is applied to multi-modulus multiplication. Two new multi-modulus multipliers for the moduli $2^n-1$, $2^n$ and $2^n+1$ are proposed. The former uses the radix-$2^2$ Booth encoding scheme while the latter uses the radix-$2^3$ Booth encoding scheme. In the radix-$2^2$ Booth encoded multi-modulus multiplier, the FMA approach is applicable only to the partial product generation stage and not applicable to the bias generation and partial product addition stages. In the radix-$2^3$ Booth encoded multi-modulus multiplier, the FMA approach is relevant only to the encoding of the multiplier bits. Owing to the restricted applicability of the FMA in encoded multipliers, only VMA is considered in this paper.

For the proposed radix-$2^2$ Booth encoded multi-modulus multiplier, the modulo $2^n-1$ multiplier of [13] is used as the baseline architecture. The preliminary modulo $2^n+1$ multiplier architecture we proposed in [20] is shown to be advantageous for sharing resources with [13]. Furthermore, a new radix-$2^2$ Booth encoded modulo $2^n$ multiplier that maximally reuses the hardware blocks of [13] and [20] is proposed. The equivalences in the radix-$2^3$ Booth encoded modulo $2^n-1$ and modulo $2^n+1$ multiplier architectures initially presented in [23] are exploited to design the proposed radix-$2^3$ Booth encoded multi-modulus multiplier. The structure of the hard multiple generator (HMG) is modified to employ dual prefix operators for handling the complemented and non-complemented modified-generate and modified-propagate signal pair. A new radix-$2^3$ Booth encoded modulo $2^n$ multiplier that is analogous to [23] is also introduced. In both the proposed radix-$2^2$ and radix-$2^3$ Booth encoded multi-modulus multipliers, an efficient technique to include the bias associated with modulo negation is also proposed. The suggested modulo $2^n$ bias generation involves only hardwiring of the Booth encoder outputs and does not incur any additional hardware overhead.

This paper is organized as follows. The properties of modulo $2^n-1$, modulo $2^n$ and modulo $2^n+1$ arithmetic are discussed in Section II. In Sections III and IV, the proposed radix-$2^2$ and radix-$2^3$ Booth encoded multi-modulus multipliers are described with emphasis on multi-modulus partial product, hard multiple and bias generations as well as multi-modulus partial product accumulation. In Section V, the performances of the proposed multi-modulus multipliers are evaluated and compared against $\{2^n-1, 2^n, 2^n+1\}$ based RNS multipliers that employ the same encoding scheme. Conclusions are drawn in Section VI.
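As a concrete illustration of the channel-wise arithmetic described in the introduction, the following Python sketch converts two integers to residues, multiplies them independently in each modulo channel and recovers the product. The function names and the Chinese Remainder Theorem reconstruction are illustrative assumptions and are not taken from the paper; the base with $n = 4$ is chosen only as an example.

```python
from math import prod

def to_rns(x, base):
    """Forward conversion: one residue per modulus of the base."""
    return tuple(x % m for m in base)

def rns_mul(xr, yr, base):
    """Each channel multiplies its own residues; no carries cross channels."""
    return tuple((a * b) % m for a, b, m in zip(xr, yr, base))

def from_rns(residues, base):
    """Reverse conversion via the Chinese Remainder Theorem (assumed method)."""
    M = prod(base)                       # dynamic range of the RNS
    total = 0
    for r, m in zip(residues, base):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # modular inverse, Python 3.8+
    return total % M

n = 4
base = (2**n - 1, 2**n, 2**n + 1)        # pairwise co-prime special moduli
x, y = 37, 52                            # x * y must stay below prod(base)
z = from_rns(rns_mul(to_rns(x, base), to_rns(y, base), base), base)
```

The result is only unambiguous while the true product stays within the dynamic range, which is why the example operands are kept small.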

II. PRELIMINARIES

In this section, the fundamental properties of modulo $2^n-1$, modulo $2^n$ and modulo $2^n+1$ arithmetic that will be employed in the design of radix-$2^2$ and radix-$2^3$ Booth encoded multi-modulus multipliers are reviewed [4], [11], [15], [18], [19], [23]. All equations are assumed to be modulo-reduced by $m$, where $m \in \{2^n-1, 2^n, 2^n+1\}$, unless otherwise specified.

Let $X = \sum_{i=0}^{n-1} x_i 2^i$, $Y = \sum_{i=0}^{n-1} y_i 2^i$ and $P = \sum_{i=0}^{n-1} p_i 2^i$ be the $n$-bit multiplicand, multiplier and product, respectively, of each modulo multiplier of the RNS. Specifically, the residues $X$, $Y$ and $P$ are in binary representation with dual zeros, i.e., $00\ldots0$ and $11\ldots1$, for the modulo $2^n-1$ channel and in diminished-1 representation for the modulo $2^n+1$ channel. Conversions to and from these representations will be handled by the RNS forward and reverse converters, respectively. Therefore, for the modulo $2^n-1$ and modulo $2^n$ multiplications, the binary product $P = |X \cdot Y|_m$ is to be computed from the binary represented inputs, $X$ and $Y$, whereas for the modulo $2^n+1$ multiplication, the diminished-1 product $P = |X \cdot Y + X + Y|_m$ is to be computed directly from the diminished-1 inputs $X$ and $Y$. The multi-modulus multiplication can be expressed as follows.

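$$P = \begin{cases} |X \cdot Y|_m & \text{if } m = 2^n-1 \text{ or } m = 2^n \\ |X \cdot Y + X + Y|_m & \text{if } m = 2^n+1 \end{cases} \tag{1}$$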
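The three cases of the multi-modulus multiplication can be checked against a plain integer reference model. The sketch below is a behavioural model only; the function name and the convention of passing the modulus as an offset `c` (so that the modulus is $2^n + c$) are assumptions made for illustration.

```python
def multi_modulus_product(x, y, n, c):
    """Reference product for the modulus 2**n + c, with c in {-1, 0, +1}.

    For c = -1 and c = 0 the operands are ordinary binary residues.
    For c = +1 the operands and the result are diminished-1 words, i.e.
    the stored word x stands for the residue x + 1.
    """
    m = (1 << n) + c
    if c == 1:
        # (x + 1) * (y + 1) - 1 = x*y + x + y, reduced modulo 2**n + 1
        return (x * y + x + y) % m
    return (x * y) % m

# Same stored words, three different channels (n = 4):
p_minus = multi_modulus_product(5, 9, 4, -1)   # modulo 15
p_zero = multi_modulus_product(5, 9, 4, 0)     # modulo 16
p_plus = multi_modulus_product(5, 9, 4, 1)     # modulo 17, diminished-1
```

In the diminished-1 case the stored words 5 and 9 stand for 6 and 10, whose product 60 reduces to 9 modulo 17 and is stored as 8.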

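In the radix-$2^k$ Booth encoding algorithm, $k$ adjacent multiplier bits $y_{ki+k-1}, \ldots, y_{ki}$ and an overlap multiplier bit $y_{ki-1}$ are scanned and converted to a signed digit $d_i$, $d_i \in [-2^{k-1}, 2^{k-1}]$. Thus, the $n$ multiplier bits are encoded as $\lfloor n/k \rfloor + 1$ multiplier digits as follows.

$$Y = \sum_{i=0}^{\lfloor n/k \rfloor} 2^{ki} d_i, \qquad d_i = y_{ki-1} + \sum_{j=0}^{k-2} 2^j y_{ki+j} - 2^{k-1} y_{ki+k-1} \tag{2}$$

where $y_{-1} = 0$ and $y_j = 0$ for $j \ge n$.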
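The digit recoding can be sketched in a few lines of Python. The helper below assumes a zero overlap bit $y_{-1}$ and zero extension of the multiplier above bit $n-1$, which is one common convention and not necessarily the exact one used in the paper; the function name is hypothetical.

```python
def booth_digits(y, n, k):
    """Radix-2**k Booth digits of the n-bit unsigned multiplier y.

    Bits outside 0..n-1 (the overlap bit y[-1] and the zero extension)
    are taken as 0, so the digits reproduce y exactly.
    """
    def bit(j):
        return (y >> j) & 1 if 0 <= j < n else 0

    digits = []
    for i in range(n // k + 1):
        d = bit(k * i - 1)                                # overlap bit
        d += sum(bit(k * i + j) << j for j in range(k - 1))
        d -= bit(k * i + k - 1) << (k - 1)                # negatively weighted msb
        digits.append(d)
    return digits

radix4 = booth_digits(0b10110111, 8, 2)   # k = 2: digits in [-2, 2]
radix8 = booth_digits(0b10110111, 8, 3)   # k = 3: digits in [-4, 4]
```

Summing the digits with weights $2^{ki}$ gives back the multiplier, which is the property the partial product generation relies on.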

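The translation of $Y$ to $d_i$ as well as the generation and the addition of the modulo-reduced partial products can be simplified for efficient hardware implementation by the following number theoretic properties of modulo $m$ arithmetic.

Property 1: Let $\bar{x}_i$ denote the bit complement of $x_i$. The modulo $m$ reduction of the binary weight of $x_i$ in excess of the modulus is given by

$$|2^{n+i} x_i|_m = \begin{cases} 2^i x_i & \text{if } m = 2^n-1 \\ 0 & \text{if } m = 2^n \\ |2^i \bar{x}_i - 2^i|_m & \text{if } m = 2^n+1 \end{cases} \tag{3}$$

Property 2: Let $\bar{X}$ denote the one's complement of $X$. By the definition of additive inverse, the modulo $m$ negation of $X$ is given by

$$|-X|_m = \begin{cases} \bar{X} & \text{if } m = 2^n-1 \\ |\bar{X} + 1|_m & \text{if } m = 2^n \\ |\bar{X} + 2|_m & \text{if } m = 2^n+1 \end{cases} \tag{4}$$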



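Property 3: Let $LS(X, j)$ denote the left shift of $X$ by $j$ bit positions where the resultant $j$ least significant bits (lsbs) are zeros. Furthermore, let the circular-left-shift $CLS(X, j)$ and complementary-circular-left-shift $CCLS(X, j)$ operations denote the circular left shift of $X$ by $j$ bit positions, where the resultant $j$ lsbs are not inverted and inverted, respectively. As a corollary of Property 1, the modulo $m$ multiplication of $X$ by a positive power-of-two term $2^j$ is given by

$$|2^j X|_m = \begin{cases} CLS(X, j) & \text{if } m = 2^n-1 \\ LS(X, j) & \text{if } m = 2^n \\ |CCLS(X, j) - 2^j + 1|_m & \text{if } m = 2^n+1 \end{cases} \tag{5}$$

Property 4: By Properties 2 and 3, the modulo $m$ multiplication of $X$ by $-2^j$ is given by

$$|-2^j X|_m = \begin{cases} CLS(\bar{X}, j) & \text{if } m = 2^n-1 \\ |LS(\bar{X}, j) + 2^j|_m & \text{if } m = 2^n \\ |CCLS(\bar{X}, j) + 2^j + 1|_m & \text{if } m = 2^n+1 \end{cases} \tag{6}$$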
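Properties 3 and 4 can be exercised numerically. The following sketch models the three shift operations on $n$-bit words and checks the power-of-two multiplications against ordinary modular arithmetic. The bias constants used for the $2^n$ and $2^n+1$ cases are derived here from the definitions of the shifts and should be read as an assumption about the form of the properties, not as a quotation of the paper's equations.

```python
def ls(x, j, n):
    """Left shift by j; the j vacated lsbs are zeros, overflow is dropped."""
    return (x << j) & ((1 << n) - 1)

def cls(x, j, n):
    """Circular left shift of the n-bit word x by j positions (0 <= j < n)."""
    return ((x << j) | (x >> (n - j))) & ((1 << n) - 1)

def ccls(x, j, n):
    """Circular left shift whose j wrapped-around lsbs are inverted."""
    return cls(x, j, n) ^ ((1 << j) - 1)

def check_power_of_two_products(n):
    mask = (1 << n) - 1
    for x in range(1 << n):
        nx = ~x & mask                                   # one's complement of x
        for j in range(n):
            w = 1 << j
            # modulus 2**n - 1 (all-ones is the second representation of zero)
            assert cls(x, j, n) % mask == (w * x) % mask
            assert cls(nx, j, n) % mask == (-w * x) % mask
            # modulus 2**n
            assert ls(x, j, n) == (w * x) % (mask + 1)
            assert (ls(nx, j, n) + w) % (mask + 1) == (-w * x) % (mask + 1)
            # modulus 2**n + 1, x treated as a plain n-bit word
            assert (ccls(x, j, n) - w + 1) % (mask + 2) == (w * x) % (mask + 2)
            assert (ccls(nx, j, n) + w + 1) % (mask + 2) == (-w * x) % (mask + 2)
    return True

ok = check_power_of_two_products(5)
```

The check is exhaustive for a small word length, which is usually enough to catch a wrong sign or an off-by-one in a bias constant.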

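Property 5: The modulo $m$ addition of two operands, $X$ and $Y$, is given by

$$\begin{cases} |X + Y|_{2^n-1} = |X + Y + c_{out}|_{2^n} & \text{if } m = 2^n-1 \\ |X + Y|_{2^n} = |X + Y|_{2^n} & \text{if } m = 2^n \\ |X + Y + 1|_{2^n+1} = |X + Y + \bar{c}_{out}|_{2^n} & \text{if } m = 2^n+1 \end{cases} \tag{7}$$

where $c_{out}$ is the carry-out bit from the $n$-bit addition of $X$ and $Y$ and the binary weight of $c_{out}$ is modulo-reduced using Property 1. It must be highlighted that in modulo $2^n+1$ addition, the diminished-1 result is the sum of not only the diminished-1 addends, $X$ and $Y$, but also a constant one. Furthermore, modulo $2^n-1$, modulo $2^n$ and modulo $2^n+1$ additions are equivalent to $n$-bit end-around-carry (EAC), carry-ignore and complementary-end-around-carry (CEAC) additions, respectively.

Property 6: The modulo $m$ addition of multiple operands is given by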

if

if

(8)
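The three two-operand additions of Property 5 differ only in what is fed back into the carry input. A minimal behavioural sketch, assuming the modulus is passed as an offset `c` (a hypothetical convention) and that a diminished-1 zero result is handled elsewhere:

```python
def multi_modulus_add(x, y, n, c):
    """n-bit addition for the modulus 2**n + c, c in {-1, 0, +1}.

    c = -1: end-around-carry (EAC), the carry-out is added back.
    c =  0: carry-ignore, the carry-out is dropped.
    c = +1: complementary EAC (CEAC) on diminished-1 words, the inverted
            carry-out is added back, which also adds the constant one.
    """
    mask = (1 << n) - 1
    s = x + y
    cout = s >> n
    feedback = {-1: cout, 0: 0, 1: 1 - cout}[c]
    return (s + feedback) & mask

eac = multi_modulus_add(9, 12, 4, -1)    # (9 + 12) mod 15
ignore = multi_modulus_add(9, 12, 4, 0)  # (9 + 12) mod 16
ceac = multi_modulus_add(9, 12, 4, 1)    # (9 + 12 + 1) mod 17
```

In the CEAC case a result congruent to $2^n$ cannot be held in $n$ bits; it corresponds to a true sum of zero, which diminished-1 systems treat separately.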

III. RADIX-$2^2$ BOOTH ENCODED MULTI-MODULUS MULTIPLIER

A. Multi-Modulus Partial Product Generation

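By substituting $k = 2$ in (2), the radix-$2^2$ Booth encoded multiplier is given by

$$Y = 2^n y_{n-1} + \sum_{i=0}^{n/2-1} 2^{2i} \left( y_{2i-1} + y_{2i} - 2 y_{2i+1} \right) \tag{9}$$

where $n$ is even and $y_{-1} = 0$. The term $2^n y_{n-1}$ can be modulo-reduced using Property 1 as follows.

$$|2^n y_{n-1}|_m = \begin{cases} y_{n-1} & \text{if } m = 2^n-1 \\ 0 & \text{if } m = 2^n \\ |\bar{y}_{n-1} - 1|_m & \text{if } m = 2^n+1 \end{cases} \tag{10}$$

Fig. 1. (a) Proposed multi-modulus radix-$2^2$ Booth encoder. (b) Proposed 3:1 multiplexer (MUX3).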

Hence, (9) can be simplified to

if

if

if

if(11)

where

(12)

The radix-$2^2$ Booth encoded digit $d_i$ is formatted using three bits: a sign bit and two one-hot encoded magnitude bits. The proposed multi-modulus radix-$2^2$ Booth encoder using radix-$2^2$ Booth Encoder (BE2) slices and one MUX3 block is shown in Fig. 1(a). To select the appropriate bit $y_{-1}$ corresponding to the modulus $2^n-1$, $2^n$ or $2^n+1$, a custom 3:1 multiplexer (MUX3) with a 2-bit select input depicted in Fig. 1(b) is used. The input corresponding to the modulus $2^n-1$, $2^n$ or $2^n+1$ is selected when the select input is 00, 01, or 10, respectively.

From Fig. 1(b), it can be easily shown that the MUX3 output equals the XOR of the data bit with one select bit, gated by the complement of the other select bit, and hence MUX3 can be simplified to an XOR-AND implementation. In what follows, all illustrations of the MUX3 block are assumed to include the select inputs.

Using the radix-$2^2$ Booth encoded multiplier form of (11), the multi-modulus multiplication of (1) becomes

if

if

(13)



TABLE I. MODULO-REDUCED PARTIAL PRODUCTS FOR RADIX-$2^2$ BOOTH ENCODING

By Property 3, modulo $2^n-1$, modulo $2^n$ and modulo $2^n+1$ multiplications of $X$ by $2^j$ can be implemented respectively by $CLS(X, j)$ and $LS(X, j)$ operations, and a $CCLS(X, j)$ operation with a bias of $-2^j+1$ on $X$. Similarly by Property 4, modulo $2^n-1$, modulo $2^n$ and modulo $2^n+1$ multiplications of $X$ by $-2^j$ can be implemented respectively by a $CLS$ operation, an $LS$ operation with a bias of $2^j$ and a $CCLS$ operation with a bias of $2^j+1$ on $\bar{X}$. Each modulo-reduced partial product is shown to be equivalent to a vector $PP_i$ for the modulus $2^n-1$, the sum of a vector $PP_i$ and a constant bias $K_i$ for the modulus $2^n$, and the sum of a vector $PP_i$ and a constant bias $K_i$ for the modulus $2^n+1$ in Table I. In Table I, # denotes the concatenation operator.

For the modulo-reduced partial products calculated based on Properties 3 and 4, the most significant bits of $PP_i$ corresponding to $d_i = -0$ and 0 as well as $d_i = -1$ and 1 are bitwise complement of each other. However, for $d_i = -2$ and 2, only a smaller number of the most significant bits of $PP_i$ are bitwise complement of each other as shown below.

(14)

(15)

The bias corresponding to is suitably modified suchthat the most significant bits of for arebitwise complement of for .

(16)

Therefore in Table I, for the modulus and , theexpressions of and derived in (16) are employed.

Fig. 2. Proposed multi-modulus partial product generation for radix-$2^2$ Booth encoding.

The radix-$2^2$ Booth encoded multi-modulus multiplication of (13) can then be reformulated as

(17)

In the following, the generation of the $PP_i$ for the three moduli, $2^n-1$, $2^n$ and $2^n+1$ is described. The standard circuit implementation of a bit slice of the radix-$2^2$ Booth Selector (BS2) generates a single bit of $PP_i$ by selecting a bit of either the multiplicand or the one-bit shifted multiplicand and conditionally inverting it. From Table I, the least significant bits of $PP_i$ for moduli $2^n-1$, $2^n$ and $2^n+1$ are a circularly shifted bit, 0 and the complement of that bit, respectively. Hence, MUX3 of Fig. 1(b) can be used at the output of the BS2 blocks in these least significant bit positions to select from the BS2 output, 0 or its complement. Furthermore, for the one-bit shifted multiplicand, the bit wrapped around into the least significant position at the input to the BS2 block is $x_{n-1}$, 0 or $\bar{x}_{n-1}$ for modulus $2^n-1$, $2^n$ or $2^n+1$, respectively. Thus this input to the BS2 block is also selected using a MUX3. The proposed multi-modulus generation of the $PP_i$ using BS2 and MUX3 blocks is shown in Fig. 2.
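The partial product generation described above can be mimicked with a behavioural model in which each non-zero Booth digit contributes one shifted (and possibly complemented) copy of the multiplicand plus a small constant bias. The sketch below is an assumption-laden illustration: the bias constants and the extra multiplier term added in the $2^n+1$ channel are derived here from Properties 3 and 4 and the diminished-1 product, the 2-bit select encoding follows the 00/01/10 convention of the text, and none of the function names come from the paper. It does not reproduce the bit-level layout of Table I.

```python
def ls(x, j, n):
    return (x << j) & ((1 << n) - 1)

def cls(x, j, n):
    return ((x << j) | (x >> (n - j))) & ((1 << n) - 1)

def ccls(x, j, n):
    return cls(x, j, n) ^ ((1 << j) - 1)

def mux3(bit, sel):
    """XOR-AND form: sel 0b00 -> bit, 0b01 -> 0, 0b10 -> complement of bit."""
    return (bit ^ (sel >> 1)) & ~sel & 1

def radix4_multiply(x, y, n, sel):
    """Radix-4 Booth product for sel = 0b00 (2^n-1), 0b01 (2^n), 0b10 (2^n+1).

    n must be even. For sel = 0b10, x, y and the result are diminished-1.
    """
    mask = (1 << n) - 1
    modulus = (1 << n) + (-1, 0, 1)[sel]
    shift = (cls, ls, ccls)[sel]
    y_minus1 = mux3((y >> (n - 1)) & 1, sel)           # modulus-dependent overlap bit
    bits = [y_minus1] + [(y >> j) & 1 for j in range(n)]   # bits[j + 1] = y_j
    acc = y if sel == 0b10 else 0                      # assumed extra term for 2^n+1
    for i in range(n // 2):
        d = bits[2 * i] + bits[2 * i + 1] - 2 * bits[2 * i + 2]
        if d == 0:
            continue
        s = 2 * i + abs(d) - 1                         # total left-shift amount
        operand = x if d > 0 else ~x & mask            # negative digits use the complement
        if sel == 0b00:
            bias = 0
        elif sel == 0b01:
            bias = 0 if d > 0 else 1 << s
        else:
            bias = 1 - (1 << s) if d > 0 else 1 + (1 << s)
        acc += shift(operand, s, n) + bias
    return acc % modulus

products = [radix4_multiply(5, 9, 4, sel) for sel in (0b00, 0b01, 0b10)]
```

The same loop serves all three moduli; only the shift flavour, the overlap bit and the bias change, which is the kind of reuse the multi-modulus architecture exploits in hardware.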

B. Multi-Modulus Bias Generation

Analogous to the generation of $PP_i$ using BS2 blocks, the bias $K_i$ can be generated by decoding $d_i$ appropriately. This technique, while being straightforward, has an obvious limitation. By considering each $K_i$ as an $n$-bit word, the biases double the number of partial products to be added. Hence, the reduction in the number of partial products achieved by the radix-$2^2$ Booth encoding technique is essentially nullified. An efficient technique to include $K_i$ with minimal hardware circuitry based on the properties of modulo $2^n$ and modulo $2^n+1$ arithmetic is proposed.

From Table I, $d_i$ and $K_i$ for the modulus $2^n$ are tabulated

in Table II. In order to determine the relation between and, is represented in terms of , and while is

expressed as a function of the bit .From the three-variable Karnaugh map with

dont care minterms 011 and 111, it can be shown that



TABLE II. BIAS FOR THE MODULUS $2^n$ FOR RADIX-$2^2$ BOOTH ENCODING

TABLE III. BIAS FOR THE MODULUS $2^n+1$ FOR RADIX-$2^2$ BOOTH ENCODING

Note: and.

. Hence, the aggregate bias is given by the simplified ex-pression and can be generated by merely hardwiring theoutput of the BE2 blocks to appropriate bit positions.

(18)

From Table I, and for the modulus are tabulatedin Table III. Furthermore, is divided into a static biasand a dynamic bias . isspecific to the radix of the Booth encoding but independentof . On the other hand, is dependent on. is represented in terms of , and while

is expressed as , where, to determine the relation between and .

The dynamic bias can be minimized by the three-variableKarnaugh map, which results in ,

and , where the operator denotes BooleanAND if its operands are Boolean variables.

(19)

By applying Property 1 to the negatively weighted term, (19)becomes

(20)

The next step entails modulo-reduced addition ofpartial products. From Property 6, the modulo additionof operands inherently increments the result by .

To negate this effect, the constant is subtracted from theaggregate bias. The result is

(21)

where

(22)

Both and can be implemented by hardwiring the BE2outputs, and as well as logic one to appropriate bit posi-tions via at most one level of AND gates.For , the aggregate bias is composed of two -bit

words, and given by (21). For , the aggregatebias is composed of one -bit word given by (18). For

, the aggregate bias equals zero as shown in Table I. Thesethree aggregate biases are generalized in (23) as a sum ofand given by (24) and (25), respectively, such that and

for as well as for are zero.

(23)

if

if

if

(24)



Fig. 3. Proposed multi-modulus bias generation for radix-$2^2$ Booth encoding.

Fig. 4. (a) Proposed multi-modulus partial product addition for radix-$2^2$ Booth encoding. (b) Implementation of parallel-prefix adder components.

Fig. 5. Proposed multi-modulus radix-$2^3$ Booth encoder.

if

if

(25)

Equations (23)(25) are implemented as shown in Fig. 3 for. and are simply hardwired to and

hence, are not illustrated in Fig. 3.

C. Multi-Modulus Partial Product Addition

By replacing in (17) with (23), the radix-Booth encoded multi-modulus multiplication is given by (26).

In (26), an -bit all-zeros partial product is included for themoduli and so that the number of partial productsto be added is for all three moduli.

if

or

if

(26)

From Property 5, a multi-modulus addition can be implemented using a MUX3 in the carry feedback path that selects from $c_{out}$, 0 or $\bar{c}_{out}$. The multi-modulus addition of the partial products in a CSA tree and a parallel-prefix two-operand adder is illustrated in Fig. 4(a). In Fig. 4(a), the parallel-prefix adder is constructed from the pre-processing (PP), prefix and post-processing blocks and the implementation of these blocks is shown in Fig. 4(b). One MUX3 block is needed in each carry feedback path of the multi-modulus partial product addition.
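The carry feedback selection can be illustrated with a small carry-save model. In the sketch below each carry-save adder (CSA) and the final two-operand adder re-inserts $c_{out}$, 0 or $\bar{c}_{out}$ at the least significant position, chosen by a selector `sel` (0, 1, 2 for $2^n-1$, $2^n$, $2^n+1$). The linear chaining of CSAs, the ripple-style final adder and all names are simplifying assumptions; the paper uses a CSA tree and a parallel-prefix adder.

```python
import random

def csa(a, b, c, n, sel):
    """3:2 carry-save step with modulus-dependent carry feedback."""
    mask = (1 << n) - 1
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    cout = carry >> n
    feedback = (cout, 0, 1 - cout)[sel]          # role of the MUX3
    return s, (carry & mask) | feedback

def final_add(x, y, n, sel):
    """Two-operand EAC, carry-ignore or CEAC addition."""
    s = x + y
    cout = s >> n
    return (s + (cout, 0, 1 - cout)[sel]) & ((1 << n) - 1)

def add_operands(operands, n, sel):
    """Reduce at least two n-bit operands to one n-bit result."""
    ops = list(operands)
    while len(ops) > 2:
        s, c = csa(ops[0], ops[1], ops[2], n, sel)
        ops = ops[3:] + [s, c]
    return final_add(ops[0], ops[1], n, sel)

rng = random.Random(1)
operands = [rng.randrange(1 << 8) for _ in range(6)]
sums = [add_operands(operands, 8, sel) for sel in (0, 1, 2)]
```

With inverted feedback every adder in the chain contributes a constant one, so six operands pick up a surplus of five in the $2^n+1$ channel; a correction constant in the bias is the natural way to cancel it.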

IV. RADIX-$2^3$ BOOTH ENCODED MULTI-MODULUS MULTIPLIER

A. Multi-Modulus Partial Product Generation

By substituting $k = 3$ in (2), the radix-$2^3$ Booth encoded multiplier is given by

$$Y = \sum_{i=0}^{\lfloor n/3 \rfloor} 2^{3i} \left( y_{3i-1} + y_{3i} + 2 y_{3i+1} - 4 y_{3i+2} \right) \tag{27}$$

where $y_{-1} = 0$ and $y_j = 0$ for $j \ge n$.

The radix-$2^3$ Booth encoded multiplier digit $d_i$ is formatted using five bits: a sign bit and four one-hot encoded magnitude bits. $\lfloor n/3 \rfloor + 1$ radix-$2^3$ Booth Encoder (BE3) blocks are used to encode the $n$-bit multiplier as shown in Fig. 5.

Using the radix-$2^3$ Booth encoded multiplier form of (27), the multi-modulus multiplication of (1) becomes

if

if(28)

By Properties 3 and 4, the modulo-reduced partial products are simplified and tabulated in Table IV.

TABLE IV. MODULO-REDUCED PARTIAL PRODUCTS FOR RADIX-$2^3$ BOOTH ENCODING

In Table IV, all partial products except those for $d_i = +3$ and $d_i = -3$ are shown to be generated by $CLS$, $LS$ and $CCLS$ operations on $X$ or $\bar{X}$ when $m$ is $2^n-1$, $2^n$ and $2^n+1$, respectively. In the subsequent section, a multi-modulus HMG is proposed whose output is the hard multiple for the selected modulus $2^n-1$, $2^n$ or $2^n+1$. Thus the partial products corresponding to $d_i = \pm 3$ can be generated by $CLS$, $LS$ and $CCLS$ operations on the hard multiple or its complement for the moduli $2^n-1$, $2^n$ and $2^n+1$, respectively, as shown in Table IV.

The radix-$2^3$ Booth encoded multi-modulus multiplication of (28) can then be reformulated as

(29)

Fig. 6 shows the generation of $PP_i$ using radix-$2^3$ Booth Selector (BS3) blocks, where each bit slice of BS3 selects a bit from the multiplicand, the one-bit shifted multiplicand, the hard multiple or the two-bit shifted multiplicand followed by a conditional bit inversion. According to Table IV, MUX3 blocks are used at the output of the BS3 blocks in the least significant bit positions to select from a bit, 0 or the complement of that bit when the modulus is $2^n-1$, $2^n$ or $2^n+1$, respectively. MUX3 blocks are likewise used at those BS3 inputs that receive bits wrapped around by the shifts, again to select from a bit, 0 or its complement.

Fig. 6. Proposed multi-modulus partial product generation for radix-$2^3$ Booth encoding.
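To make the role of the hard multiple concrete, the sketch below recodes the multiplier into radix-8 digits and accumulates the product from the multiples $0$, $\pm X$, $\pm 2X$, $\pm 3X$ and $\pm 4X$. It is an arithmetic reference model only: plain integer shifts stand in for the $CLS$, $LS$ and $CCLS$ hardware, the multiple $3X$ is computed by a single addition as a stand-in for the HMG, and the extra `x + y` term in the $2^n+1$ channel comes from the diminished-1 product of Section II rather than from the paper's bias equations. All names are hypothetical.

```python
def booth8_digits(y, n):
    """Radix-8 Booth digits of the n-bit multiplier y (zero overlap/extension)."""
    def bit(j):
        return (y >> j) & 1 if 0 <= j < n else 0
    return [bit(3 * i - 1) + bit(3 * i) + 2 * bit(3 * i + 1) - 4 * bit(3 * i + 2)
            for i in range(n // 3 + 1)]

def radix8_product(x, y, n, c):
    """Product for the modulus 2**n + c; diminished-1 words when c = +1."""
    m = (1 << n) + c
    hard = (x + 2 * x) % m                      # the only multiple needing an adder
    multiples = {0: 0, 1: x, 2: 2 * x, 3: hard, 4: 4 * x}
    acc = (x + y) if c == 1 else 0              # diminished-1 correction (assumed form)
    for i, d in enumerate(booth8_digits(y, n)):
        term = multiples[abs(d)] << (3 * i)
        acc += term if d >= 0 else -term
    return acc % m

digits = booth8_digits(0b10110111, 8)
p = [radix8_product(5, 9, 4, c) for c in (-1, 0, 1)]
```

Only the digit values $\pm 3$ require an addition; every other multiple is a shift of the multiplicand, which is why the HMG is the single extra arithmetic block in a radix-8 design.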

B. Multi-Modulus Hard Multiple Generation

Application-specific adders known as HMGs that computeonly the sum of and to generate the hard multiple

were proposed in [23] for the moduli and .The HMGs were designed by reformulating the carry equationsof modulo and modulo additions using the bitcorrelation between the addends, and . The carry equa-tions for modulo and modulo HMGs are givenby (30) and (31) for even and odd cases of , respectively.

if

if(30)

if

if(31)

where and the modi-fied-generate and modified-propagate pair is defined asfollows.

if

if



Fig. 7. Proposed multi-modulus HMG.

if

(32)

As modulo addition is equivalent to carry-ignore addition,the carry equation for modulo HMG becomes

(33)where is defined as follows.

(34)

From (30), (31), and (33), the parallel-prefix implementationof multi-modulus HMG is illustrated for in Fig. 7. Itconsists of three stages: pre-processing, prefix computation andpost-processing. In the preprocessing stage, each operator computes using AND-OR and OR-AND gates. Thebit is also computed by the XOR operation on the -th bits ofthe two addends. Two MUX3 blocks are required at the input ofthe pre-processing operator to select from , 0, or andfrom , 0 or for computing and . Inthe prefix-computation stage, is computed using the prefixoperators in only levels. To implement the multi-modulus carry generation, dual prefix operators are used in thefirst level such that the input to one prefix operator iswhile the input to the other prefix operator is selected from

, (0,0) or using MUX3 blocks. The number ofMUX3 blocks required is and the number of dual prefixoperators in each prefix level is , where

. In the post-processing stage, each operator computes a bit of the hard multiple by . When

, a MUX3 is employed at the input of the post-pro-cessing operator to select from , 0 or to implementEAC, carry-ignore, and CEAC additions, respectively.
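A behavioural view of the multi-modulus hard multiple generation is sketched below. It adds the multiplicand to its one-position shifted copy, with the wrapped-around bit and the carry feedback each chosen as a bit, 0 or the complement of that bit. This is a ripple-level model written for illustration; it does not reproduce the parallel-prefix structure or the modified generate and propagate signals of Fig. 7, and the diminished-1 reading of the $2^n+1$ result (stored value congruent to $3X + 2$) is an assumption derived from Properties 3 and 5.

```python
def hard_multiple(x, n, sel):
    """Hard multiple of the n-bit word x; sel = 0, 1, 2 for 2^n-1, 2^n, 2^n+1."""
    mask = (1 << n) - 1
    msb = x >> (n - 1)
    wrapped = (msb, 0, 1 - msb)[sel]            # MUX3 on the wrapped-around bit
    doubled = ((x << 1) & mask) | wrapped       # CLS, LS or CCLS by one position
    s = x + doubled
    cout = s >> n
    feedback = (cout, 0, 1 - cout)[sel]         # EAC, carry-ignore or CEAC
    return (s + feedback) & mask

h = [hard_multiple(5, 4, sel) for sel in (0, 1, 2)]
```

For `x = 5` and `n = 4` the three results are the all-ones word (a zero of modulo 15, since 15 is congruent to 0), 15 (that is, 15 modulo 16) and 0 (the diminished-1 code of 1, since 3 times 6 is 18, congruent to 1 modulo 17).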

TABLE V. BIAS FOR THE MODULUS $2^n$ FOR RADIX-$2^3$ BOOTH ENCODING

TABLE VI. BIAS FOR THE MODULUS $2^n+1$ FOR RADIX-$2^3$ BOOTH ENCODING

Note: and.

C. Multi-Modulus Bias Generation

From Table IV, and for the modulus are tabulated inTable V. From the five-variable Kar-naugh map, it can be shown that and the aggregate biascan be simplified to

(35)

From Table IV, and for the modulus are tabulatedin Table VI. is divided into a static biasthat depends on the radix of the Booth encoding and a dynamicbias that depends on the multiplier digit .Using a five-variable Karnaugh

map, the aggregate of the static bias, the dynamic bias and theconstant that negates the constant incurred fromthe modulo addition of partial products, isreduced to

(36)



where

(37)

The operators and denote Boolean OR and AND,respectively when their operands are Boolean variables. For

, the aggregate bias is composed of three -bitwords, , and given by (36). For , the ag-gregate bias is composed of one -bit word given by (35).For , the aggregate bias equals zero as shown inTable IV. They can be generalized in (38) as a sum of ,and given by (39), (40), and (41), respectively, such that, and for as well as and for

are zero. (See equation at bottom of page.)

Fig. 8. Proposed multi-modulus bias generation for radix-$2^3$ Booth encoding.

Equations (38)(41) are implemented as shown in Fig. 8 for. , , and are hardwired to logic 0 and

hence, are not illustrated in Fig. 8.

D. Multi-Modulus Partial Product AdditionBy replacing in (29) with (38), the radix-

Booth encoded multi-modulus multiplication is given by (42).Two -bit all-zeros partial products are included for the moduli

and so that the number of partial products to be addedis for all three moduli. (See equation at bottom ofpage.)The multi-modulus addition of partial products

in a CSA tree and a two-operand adder with MUX3 blocks in

(38)

(39)

(40)

(41)

if

if(42)



Fig. 9. Proposed multi-modulus partial product addition for radix-$2^3$ Booth encoding.

the carry feedback paths is illustrated in Fig. 9 for . Thenumber of MUX3 blocks needed is .

V. PERFORMANCE COMPARISON

In this section, the performances of the proposed radix-$2^2$ and radix-$2^3$ Booth encoded multi-modulus multipliers are evaluated. The efficiency of the proposed multi-modulus multiplier over a triad of single-modulus multipliers is demonstrated in reconfigurable and redundant modulo channels, where the computation of only a single modulo-reduced product is desired. The proposed radix-$2^k$ Booth encoded multi-modulus multiplier is compared with the $\{2^n-1, 2^n, 2^n+1\}$ based RNS multiplier that consists of three radix-$2^k$ Booth encoded single-modulus multipliers, i.e., modulo $2^n-1$, modulo $2^n$ and modulo $2^n+1$ multipliers. Four configurations of $\{2^n-1, 2^n, 2^n+1\}$ based RNS multipliers are compared. In Type A RNS multiplier, all three single-modulus multipliers are radix-$2^3$ Booth encoded while in Types B-D RNS multipliers, all three single-modulus multipliers are radix-$2^2$ Booth encoded. Hence, the proposed radix-$2^3$ Booth encoded multi-modulus multiplier is compared with Type A RNS multiplier while the proposed radix-$2^2$ Booth encoded multi-modulus multiplier is compared with Types B-D RNS multipliers.

Type A RNS multiplier consists of the modulo $2^n-1$ and modulo $2^n+1$ multipliers described in [23]. In Types B-D RNS multipliers, the multiplier of [13] is employed for the modulus $2^n-1$. The multipliers of [15], [18] and [20] are employed for the modulus $2^n+1$ in Type B, Type C, and Type D, respectively. As radix-$2^2$ and radix-$2^3$ Booth encoded modulo $2^n$ multipliers have not been proposed in literature thus far, the modulo $2^n$ multipliers described in Sections IV and III are employed in Type A and Types B-D, respectively.

The multipliers were specified in VHDL and synthesized by Synopsys Design Compiler (version D-2010.03) under nominal synthesis environment, i.e., 25 °C and 1.8 V, based on the TSMC 0.18 µm 1.8 V CMOS standard-cell library. The designs were optimized for area and delay independently. The average dynamic power dissipation is also estimated based on the Monte Carlo method with 99.9% confidence that the error is bounded below 3%. The minimum achievable area from area-constrained optimization, critical path delay from delay-constrained optimization and average total power dissipation of the proposed radix-$2^k$ Booth encoded multi-modulus multipliers for the computation of a single modulo-reduced product are tabulated in Table VII for the evaluated values of $n$.

TABLE VII. AREA, DELAY, AND TOTAL POWER DISSIPATION OF PROPOSED RADIX-$2^k$ BOOTH ENCODED MULTI-MODULUS MULTIPLIERS

TABLE VIII. AREA OF RADIX-$2^k$ BOOTH ENCODED MODULO $2^n-1$, MODULO $2^n$, AND MODULO $2^n+1$ MULTIPLIERS

The areas, delays, and total power dissipations of the radix-$2^k$ Booth encoded single-modulus modulo $2^n-1$, modulo $2^n$, and modulo $2^n+1$ multipliers that comprise Types A-D RNS multipliers are tabulated in Tables VIII, IX, and X, respectively. The area, delay and total power dissipation of Types A-D RNS multipliers for the computation of a single modulo-reduced product are tabulated in Table XI. The unused single-modulus multipliers are assumed to be gated off. In this table, the areas of Types A-D RNS multipliers are computed as the sum of the areas of the constituent single-modulus modulo $2^n-1$, modulo $2^n$, and modulo $2^n+1$ multipliers from Table VIII. The critical path delays of Types A-D RNS multipliers are determined as the maximum value amongst the delays of the constituent single-modulus modulo $2^n-1$, modulo $2^n$, and modulo $2^n+1$ multipliers from Table IX. The total power dissipations of Types A-D RNS multipliers are computed from Table X as the sum of the following two terms: a) the maximum dynamic power dissipation amongst the modulo $2^n-1$, modulo $2^n$ and modulo $2^n+1$ multipliers and b) the leakage power dissipations of the modulo $2^n-1$, modulo $2^n$, and modulo $2^n+1$ multipliers.

TABLE IX. DELAY OF RADIX-$2^k$ BOOTH ENCODED MODULO $2^n-1$, MODULO $2^n$, AND MODULO $2^n+1$ MULTIPLIERS

TABLE X. TOTAL POWER DISSIPATION OF RADIX-$2^k$ BOOTH ENCODED MODULO $2^n-1$, MODULO $2^n$, AND MODULO $2^n+1$ MULTIPLIERS

TABLE XI. AREA, DELAY, AND TOTAL POWER DISSIPATION OF RADIX-$2^k$ BOOTH ENCODED $\{2^n-1, 2^n, 2^n+1\}$ BASED RNS MULTIPLIERS

TABLE XII. PERCENTAGE SAVINGS IN AREA, DELAY, AND TOTAL POWER DISSIPATION OF PROPOSED MULTI-MODULUS MULTIPLIERS OVER RNS MULTIPLIERS

From Tables VII and XI, the percentage savings or increase in area, delay and power dissipation of the proposed radix-$2^3$ Booth encoded multi-modulus multiplier over Type A RNS multiplier and of the proposed radix-$2^2$ Booth encoded multi-modulus multiplier over Types B-D RNS multipliers are tabulated in Table XII.
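The aggregation rules used for the single-modulus baselines (sum of areas, maximum of delays, maximum dynamic power plus summed leakage) and the percentage saving can be written down directly. The sketch below uses made-up placeholder numbers, not values from the tables, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Metrics:
    area: float
    delay: float
    dynamic_power: float
    leakage_power: float

    @property
    def total_power(self):
        return self.dynamic_power + self.leakage_power

def combine_single_modulus(channels):
    """Metrics of an RNS multiplier built from single-modulus multipliers,
    of which only one is active at a time (the others are gated off)."""
    return Metrics(
        area=sum(ch.area for ch in channels),
        delay=max(ch.delay for ch in channels),
        dynamic_power=max(ch.dynamic_power for ch in channels),
        leakage_power=sum(ch.leakage_power for ch in channels),
    )

def percent_saving(baseline, proposed):
    """Positive values are savings, negative values are penalties."""
    return 100.0 * (baseline - proposed) / baseline

# Placeholder figures for illustration only (arbitrary units).
baseline = combine_single_modulus([
    Metrics(100.0, 2.0, 1.0, 0.25),
    Metrics(80.0, 1.5, 0.5, 0.25),
    Metrics(120.0, 2.5, 1.5, 0.25),
])
area_saving = percent_saving(baseline.area, 120.0)
delay_penalty = percent_saving(baseline.delay, 3.0)
```

With these placeholders the combined baseline has area 300, delay 2.5 and total power 2.25, so a multi-modulus design of area 120 and delay 3.0 would show a 60% area saving and a 20% delay penalty.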

From Table XII, by replacing the three single-modulus mul-tipliers in the based RNS multiplier bythe proposed multi-modulus multiplier, an average reduction of60% in area can be observed. However the proposed radix-Booth encoded multi-modulus multiplier increases the criticalpath delay of Type A RNS multiplier by 13% in the worst case.The delay penalties of the proposed radix- Booth encodedmulti-modulus multiplier over Types B-D RNS multipliers for

are 8%, 18%, and 16%, respectively in the worst case. The maximum delay penalty is incurred by the proposed radix- Booth encoded multi-modulus multiplier over the Type C RNS multiplier for . This can be attributed to the difference in the number of CSAs in the critical path as determined by the Dadda depth function. The proposed radix-4 and radix-8 Booth encoded multi-modulus multipliers incur a power dissipation penalty of no more than 4 5% over Type A and Type B RNS multipliers, respectively, which can be directly attributed to the additional power dissipated by the MUX3 blocks.

It should be noted that power gating affects design architecture and requires sizing of sleep transistors at transistor level as well as additional time delays to assure safe entrance and exit of power gated mode. If clock gating is used to gate off the unused single-modulus multipliers, some area, delay, and power overheads will be incurred in replacing the input registers by integrated clock gating cells and the insertion of an $n$-bit wide 3:1 multiplexer to select the output from one of the three modulo multipliers. These overheads are not included for Types A to D RNS multipliers in Table XI. Furthermore, owing to its simpler architecture, the proposed radix- Booth encoded multi-modulus multiplier lowers the power dissipation of Types C and D RNS multipliers maximally by 37% and 49%, respectively.

TABLE XIII
UNIT-GATE AREA AND DELAY COMPARISON OF RADIX- BOOTH ENCODED TRIPLE AND DUAL MODULUS MULTIPLIERS

The most recent dual-modulus multiplier [33] uses only radix- Booth-encoding and does not incorporate hardware reuse for the modulus. Based on the unit-gate model of [36], the area and delay of this dual-modulus radix- Booth encoded multiplier are compared with our proposed triple-modulus radix- Booth encoded multiplier in Table XIII. In this model, the area and delay of AND, NAND, OR, and NOR gates are assumed to be one unit each while the area and delay of XOR and XNOR gates are assumed to be two units each. In Table XIII, is the Dadda depth function. Since the final parallel-prefix adder is common to both multipliers, it is excluded from Table XIII. From Table XIII, the proposed multiplier employs additional unit-gates than [33], of which the overhead of unit-gates can be solely attributed to the use of MUX3 for triple-modulus selection in the proposed design instead of XOR for dual-modulus selection in [33]. Though in the proposed radix- multi-modulus multiplier, one additional partial product has to be generated and accumulated using AND gates, FAs, and 1 MUX3 of unit-gates, the overhead has been limited to only unit-gates. Since is a logarithmic function, it can be verified that the and are the same for all $n$ in the range from 36 to 64 except 52. The additional one unit-delay per CSA level, i.e., instead of , in the proposed multiplier over [33] is also due to the use of MUX3 as opposed to XOR for moduli selection. However, it should be noted that our proposed multiplier is more agile in that it can perform multi-modulus multiplications for more comprehensive RNS made of all three types of moduli. Moreover, for the same dynamic range, the value of $n$ of the dual-modulus multiplier [33] for the RNS multiplication is 1.5 times that of our proposed triple-modulus multiplier for the RNS multiplication.
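The unit-gate accounting and the depth-based reasoning used in this comparison can be illustrated with a minimal Python sketch. It is not the authors' evaluation flow. The function names are hypothetical, the full-adder decomposition (two XOR, two AND, one OR) is an assumed gate-level realization, the depth function is assumed to be the number of 3:2 CSA levels needed to reduce a given number of operands to two, and the moduli sets in the dynamic-range check, {2^n - 1, 2^n, 2^n + 1} for the triple-modulus case and {2^m - 1, 2^m + 1} for the dual-modulus case, are likewise assumptions because the symbols are not legible in this text.

```python
# Unit-gate costs as stated in the text: AND/NAND/OR/NOR cost 1, XOR/XNOR cost 2.
UNIT_GATE_COST = {"AND": 1, "NAND": 1, "OR": 1, "NOR": 1, "XOR": 2, "XNOR": 2}


def unit_gate_area(gate_counts):
    """Total area in unit-gates for a {gate_name: count} mapping."""
    return sum(UNIT_GATE_COST[gate] * count for gate, count in gate_counts.items())


def unit_gate_delay(critical_path):
    """Total delay in unit-gates along a list of gate names."""
    return sum(UNIT_GATE_COST[gate] for gate in critical_path)


def dadda_depth(operands):
    """Assumed definition: 3:2 CSA levels needed to reduce `operands` addends to two."""
    if operands < 1:
        raise ValueError("operands must be positive")
    levels, capacity = 0, 2
    while capacity < operands:
        capacity = capacity * 3 // 2  # 2, 3, 4, 6, 9, 13, 19, 28, 42, 63, ...
        levels += 1
    return levels


def dynamic_range_bits(moduli):
    """Bit length of the product of the moduli (the RNS dynamic range)."""
    product = 1
    for m in moduli:
        product *= m
    return product.bit_length()


if __name__ == "__main__":
    # Assumed full-adder decomposition; the sum output passes through two XORs.
    full_adder = {"XOR": 2, "AND": 2, "OR": 1}
    print("FA area:", unit_gate_area(full_adder))
    print("FA delay:", unit_gate_delay(["XOR", "XOR"]))

    # The depth grows slowly: it changes only when the operand count crosses
    # one of the capacities listed above, so k and k + 1 usually share a depth.
    print([k for k in range(3, 40) if dadda_depth(k) != dadda_depth(k + 1)])

    # Assumed moduli sets: a triple-modulus base with n = 8 and a dual-modulus
    # base with m = 1.5 * n = 12 cover dynamic ranges of the same bit length.
    n = 8
    m = 3 * n // 2
    print(dynamic_range_bits([2**n - 1, 2**n, 2**n + 1]),
          dynamic_range_bits([2**m - 1, 2**m + 1]))
```

Under these assumptions, adding one operand to a CSA tree increases its depth only at a few isolated operand counts, which is consistent with the observation that the two depths coincide over most of the stated range.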

VI. CONCLUSION

The equivalences in operations central to modulo multiplication, i.e., modulo negation, modulo reduction of binary weight, modulo multiplication by powers-of-two, and two-operand modulo addition for the three special moduli, $2^n-1$, $2^n$, and $2^n+1$, were demonstrated. New radix-4 and radix-8 Booth encoded modulo $2^n$ multipliers with architectures comparable to those of the corresponding modulo $2^n-1$ and modulo $2^n+1$ multipliers were introduced. With the correlation among modulo $2^n-1$, modulo $2^n$ and modulo $2^n+1$ operations as the basis, radix-4 and radix-8 Booth encoded multi-modulus multipliers that perform modulo multiplication for the three special moduli successively were developed. The proposed multi-modulus multipliers were compared against RNS multipliers employing Booth encoding of the same radix.
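The four equivalences named above can be checked numerically with a short Python sketch. It assumes that the three special moduli are 2^n - 1, 2^n and 2^n + 1, and it is an illustration rather than the proposed hardware. The selector `kind` (-1, 0 or +1) and all function names are hypothetical. The sketch relies on the identity 2^n = -kind (mod 2^n + kind), and the final `%` in each function stands in for the small correction step that a hardware adder would perform.

```python
def modulus(n, kind):
    """Return 2^n + kind, where kind is -1, 0 or +1 (hypothetical selector)."""
    return (1 << n) + kind


def fold(value, n, kind):
    """Reduce a non-negative integer modulo 2^n + kind.

    Each n-bit chunk above the lowest is re-weighted by -kind per wrap:
    +1 (end-around carry) for 2^n - 1, 0 (truncation) for 2^n, and
    -1 (inverted end-around carry) for 2^n + 1.
    """
    mask = (1 << n) - 1
    total, weight = 0, 1
    while value:
        total += weight * (value & mask)
        value >>= n
        weight *= -kind
    return total % modulus(n, kind)


def negate(x, n, kind):
    """Return (-x) mod (2^n + kind) for an n-bit x via bitwise complement."""
    complement = ~x & ((1 << n) - 1)  # equals 2^n - 1 - x
    return (complement + kind + 1) % modulus(n, kind)


def add(a, b, n, kind):
    """Two-operand modulo addition expressed through the same folding rule."""
    return fold(a + b, n, kind)


def times_pow2(x, k, n, kind):
    """Modulo multiplication by 2^k: shift left, then fold the overflow bits."""
    return fold(x << k, n, kind)


if __name__ == "__main__":
    n = 4
    for kind in (-1, 0, 1):
        m = modulus(n, kind)
        for x in range(1 << n):
            assert negate(x, n, kind) == (-x) % m
            for k in range(2 * n):
                assert times_pow2(x, k, n, kind) == (x << k) % m
            for y in range(1 << n):
                assert add(x, y, n, kind) == (x + y) % m
    print("all equivalences hold for n =", n)
```

Only the re-weighting factor and the negation constant differ among the three moduli in this sketch, which is the kind of regularity that a shared datapath with a modulus selector can exploit.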

REFERENCES

[1] R. Conway and J. Nelson, "Improved RNS FIR filter architectures," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 51, no. 1, pp. 26–28, Jan. 2004.

[2] A. S. Madhukumar and F. Chin, "Enhanced architecture for residue number system-based CDMA for high rate data transmission," IEEE Trans. Wireless Commun., vol. 3, no. 5, pp. 1363–1368, Sep. 2004.

[3] D. M. Schinianakis et al., "An RNS implementation of an elliptic curve point multiplier," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 6, pp. 1202–1213, Jun. 2009.

[4] N. S. Szabo and R. I. Tanaka, Residue Arithmetic and its Application to Computer Technology. New York: McGraw-Hill, 1967.

[5] B. Cao, C. H. Chang, and T. Srikanthan, "An efficient reverse converter for the 4-moduli set based on the New Chinese Remainder Theorem," IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 50, no. 10, pp. 1296–1303, Oct. 2003.

[6] P. V. A. Mohan and A. B. Premkumar, "RNS-to-binary converters for two four-moduli sets and ," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 6, pp. 1245–1254, Jun. 2007.

[7] P. V. A. Mohan, "RNS-to-binary converter for a new three-moduli set ," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 9, pp. 775–779, Sep. 2007.

[8] A. S. Molahosseini, K. Navi, C. Dadkhah, O. Kavehei, and S. Timarchi, "Efficient reverse converter designs for the new 4-moduli sets and based on New CRTs," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 4, pp. 823–835, Apr. 2010.

[9] R. Zimmermann, "Efficient VLSI implementation of modulo addition and multiplication," in Proc. 14th IEEE Symp. Comput. Arithmetic, Adelaide, Australia, Apr. 1999, pp. 158–167.

[10] L. Kalampoukas, D. Nikolos, C. Efstathiou, H. T. Vergos, and J. Kalamatianos, "High-speed parallel-prefix modulo adders," IEEE Trans. Comput., vol. 49, no. 7, pp. 673–680, Jul. 2000.

[11] S. J. Piestrak, "Design of squarers modulo A with low-level pipelining," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 49, no. 1, pp. 31–41, Jan. 2002.

[12] H. T. Vergos, C. Efstathiou, and D. Nikolos, "Diminished-one modulo adder design," IEEE Trans. Comput., vol. 51, no. 12, pp. 1389–1399, Dec. 2002.

[13] C. Efstathiou, H. T. Vergos, and D. Nikolos, "Modified Booth modulo multiplier," IEEE Trans. Comput., vol. 53, no. 3, pp. 370–374, Mar. 2004.

[14] C. Efstathiou, H. T. Vergos, G. Dimitrakopoulos, and D. Nikolos, "Efficient diminished-1 modulo multipliers," IEEE Trans. Comput., vol. 54, no. 4, pp. 491–496, Apr. 2005.

[15] L. Sousa and R. Chaves, "A universal architecture for designing efficient modulo multipliers," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 6, pp. 1166–1178, Jun. 2005.

[16] H. T. Vergos and C. Efstathiou, "Diminished-1 modulo squarer design," IEE Comput. Digit. Tech., vol. 152, no. 5, pp. 561–566, Sep. 2005.

[17] H. T. Vergos and C. Efstathiou, "Design of efficient modulo multipliers," IET Comput. Digit. Tech., vol. 1, no. 1, pp. 49–57, Jan. 2007.

[18] J. W. Chen and R. H. Yao, "Efficient modulo multipliers for diminished-1 representation," IET Circuits, Devices, Syst., vol. 4, no. 4, pp. 291–300, Jul. 2010.

[19] R. Muralidharan and C. H. Chang, "Radix-8 Booth encoded modulo multipliers with adaptive delay for high dynamic range Residue Number System," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 5, pp. 982–993, May 2011.

[20] R. Muralidharan and C. H. Chang, "A simple radix-4 Booth encoded modulo multiplier," in Proc. IEEE Int. Symp. Circuits Syst., Rio de Janeiro, Brazil, May 2011, pp. 1163–1166.

[21] C. Efstathiou, K. Pekmestzi, and N. Axelos, "On the design of modulo multipliers," in Proc. 14th Euromicro Conf. Digit. Syst. Design, Oulu, Finland, Aug. 2011, pp. 453–459.

[22] J. W. Chen, R. H. Yao, and W. J. Wu, "Efficient modulo multipliers," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 12, pp. 2149–2157, Dec. 2011.

[23] R. Muralidharan and C. H. Chang, "Area-power efficient modulo and modulo multipliers for based RNS," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 59, no. 10, pp. 2263–2274, Oct. 2012.

[24] V. Paliouras and T. Stouraitis, "Multifunction architectures for RNS processors," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 46, no. 8, pp. 1041–1054, Aug. 1999.

[25] D. Adamidis and H. T. Vergos, "RNS multiplication/sum-of-squares units," IET Comput. Digit. Tech., vol. 1, no. 1, pp. 38–48, Jan. 2007.

[26] G. C. Cardarilli, A. Del Re, A. Nannarelli, and M. Re, "Residue Number System reconfigurable datapath," in Proc. IEEE Int. Symp. Circuits Syst., Scottsdale, AZ, USA, May 2002, vol. 2, pp. 756–759.

[27] W. K. Jenkins and A. J. Mansen, "Variable word length DSP using serial-by-modulus residue arithmetic," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Minneapolis, MN, USA, Apr. 1993, vol. 3, pp. 89–92.

[28] W. K. Jenkins and B. A. Schnaufer, "Fault Tolerant architectures for efficient realization of common DSP kernels," in Proc. 35th Midwest Symp. Circuits Syst., Washington, DC, USA, Aug. 1992, vol. 2, pp. 1320–1323.



[29] W. K. Jenkins, B. A. Schnaufer, and A. J. Mansen, "Combined system level redundancy and modular arithmetic for fault tolerant digital signal processing," in Proc. 11th Symp. Comput. Arithmetic, Windsor, Canada, Jun. 1993, pp. 28–35.

[30] H. T. Vergos and D. Bakalis, "Area-time efficient multi-modulus adders and their applications," Microprocessors Microsyst., vol. 36, no. 5, pp. 409–419, Jul. 2012.

[31] C. H. Chang, S. Menon, B. Cao, and T. Srikanthan, "A configurable dual-moduli multi-operand modulo adder," in Proc. IEEE Int. Symp. Circuits Syst., Kobe, Japan, May 2005, vol. 2, pp. 1630–1633.

[32] S. Menon and C. H. Chang, "A reconfigurable multi-modulus modulo multiplier," in Proc. IEEE Asia-Pacific Conf. Circuits Syst., Singapore, Dec. 2006, pp. 1168–1171.

[33] E. Vassalos, D. Bakalis, and H. T. Vergos, "Configurable Booth-encoded modulo multipliers," in Proc. 8th Conf. Ph.D. Res. Microelectron. Electron., Aachen, Germany, Jun. 2012, pp. 1–4.

[34] R. Muralidharan and C. H. Chang, "Fixed and variable multi-modulus squarer architectures for triple moduli base of RNS," in Proc. IEEE Int. Symp. Circuits Syst., Taipei, Taiwan, May 2009, pp. 441–444.

[35] D. Bakalis and H. T. Vergos, "Area-efficient multi-moduli squarer for RNS," in Proc. 13th Euromicro Conf. Digit. Syst. Design, Lille, France, Sep. 2010, pp. 408–411.

[36] A. Tyagi, "A reduced-area scheme for carry-select adders," IEEE Trans. Comput., vol. 42, no. 10, pp. 1163–1170, Oct. 1993.

Ramya Muralidharan (S'07–M'13) received the B.E. degree from Anna University, Chennai, India, in 2006 and the Ph.D. degree from Nanyang Technological University, Singapore, in 2012. She is currently working as a Hardware System Engineer in San Diego, CA, USA. Her research interests include residue number system, computer arithmetic circuits and digital IC design.

Chip-Hong Chang (S'92–M'98–SM'03) received the B.Eng. (Hons.) degree from National University of Singapore in 1989, and the M.Eng. and Ph.D. degrees from Nanyang Technological University (NTU), Singapore, in 1993 and 1998, respectively. He served as a Technical Consultant in industry prior to joining the School of Electrical and Electronic Engineering (EEE), NTU, in 1999, where he is currently an Associate Professor. He holds joint appointments with the university as Assistant Chair of Alumni of the School of EEE since June 2008, Deputy Director of the Center for High Performance Embedded Systems since 2000, and the Program Director of the Center for Integrated Circuits and Systems from 2003 to 2009. He has coedited one book, published four book chapters and more than 180 research papers in refereed international journals and conferences. His current research interests include low power arithmetic circuits, residue number system, digital filter design, and digital watermarking for VLSI intellectual property protection.

Dr. Chang has served as Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I in 2010-2012, the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS since 2011 and the IEEE Access Journal since 2013, the Editorial Advisory Board Member of the Open Electrical and Electronic Engineering Journal since 2007 and the Editorial Board Member of the JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING since 2008. He also guest edited several special issues for the Journal of Circuits, Systems and Computers in 2011 and 2012, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-PART I: REGULAR PAPERS and the JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING in 2012, and served in many international conference advisory and technical program committees. He is a Fellow of the IET.
