High Radix Multiplier Dividers


High-Radix Multiplier-Dividers: Theory, Design, and Hardware

Alaaeldin Amin, Member, IEEE, and M. Waleed Shinwari, Member, IEEE

Abstract: This paper describes the theory and design of digital high-radix multiplier-dividers (Patent Pending). The theory of high-radix division is extended to high-radix multiplier-dividers that can perform fused multiplication and division operations using a single recurrence relation. With the fused implementation of multiplication and division, the two operations can be executed using a single instruction, implying only a single rounding operation. The recurrence relation is described, the quotient digit selection function derived, and important design parameters together with their optimal values and relations are defined. An efficient design procedure and implementation hardware are described and important system parameter values for various radix systems computed. Compared to pure dividers, the multiplier-divider requires a slightly more complex data path and quotient digit selection function.

Index Terms: Computer arithmetic, division, quotient digit selection, SRT, multiplier-divider.

1 INTRODUCTION

To achieve high performance in application-specific processors, current System-on-Chip (SoC) technology generally augments the programmable general-purpose cores with application-specific functional units (AFUs) or hardware accelerators. This is a cost-effective way to simultaneously speed up execution and reduce energy consumption through delegating time-consuming tasks of applications to dedicated hardware accelerators, leaving less critical tasks to traditional software execution. A recent trend in SoC technology is to allow the extension of existing instruction sets by special instructions for performance-critical operations [1]. This is achieved by adding application-specific instruction set extensions (ISEs) to the processor instruction set for executing the critical portions of the application on the AFUs [2]. This has led microprocessor intellectual property (IP) vendors to license configurable and extensible processor cores to their customers [3]. For example, an optimized multiply-and-accumulate (MAC) unit that can compute $a \times b + c \times d$ using only one 4-operand instruction has been recently reported [4]. Another work [5] reported a $GF(2^m)$ three-operand computation of $ab/c$ performing both multiplication and inversion.

This paper describes new formulas, an algorithm, and hardware which can efficiently compute $A \times B \div D$ by performing simultaneous multiplication and division operations, thus requiring only a single rounding operation. Such a unit, which will be referred to as a multiplier-divider, requires a three-operand instruction and can be implemented as part of a special floating-point unit (FPU) or as a stand-alone AFU for compute-intensive applications which utilize this operation. The multiplier-divider can perform either a single multiplication operation, a single division operation, or a simultaneous combined multiplication and division operation. All operations have the same execution time with one digit of the result produced each cycle starting with the most significant digit. Zurawski and Gosling [6] reported a more restricted approach to build a radix-4 unit for multiply-divide and square root. Ercegovac and Lang [7] have also reported a module that can perform radix-2 multiplication, division, and square root. Compared to the multiplier-divider reported here, this module works only for radix 2 and can only perform one of the three operations but none of their combinations. Likewise, McIlhenny and Ercegovac [8] have proposed a three-operand module that can perform two simultaneous multiplications $A \times B \times C$. Further, Antelo et al. [9] reported a very high radix processor that computes combined division and square root operations ($\sqrt{X/d}$).

There exists quite extensive literature that describes the theory and design of high-speed multiplication and division algorithms [10], [11]. The theory of high-radix multiplier-dividers may be considered as an extension of the high-radix digit recurrence division algorithm. Based on the different hardware operations used in their implementations, e.g., multiplication, subtraction, and table lookup, division algorithms are divided into five classes [12]: digit recurrence, functional iteration, very high radix, table lookup, and variable latency. Digit recurrence is the oldest class of high-speed division algorithms and, as a result, a significant number of publications can be found in the literature proposing digit recurrence algorithms, implementations, and techniques. The most common implementation of digit recurrence division in modern processors has been the SRT method [10].

Digit recurrence division algorithms use iterative methods to calculate quotients one digit per iteration. One quotient digit ($m$ bits) is retired at each iteration using a quotient-digit selection function [13], [14], [15]. Typically, for a system with radix $r$, the quotient digits are selected from a redundant signed digit set $D = \{-\alpha, -\alpha+1, \ldots, -1, 0, 1, \ldots, \beta-1, \beta\}$ whose size $(\alpha + \beta + 1)$ is greater than $r$, with both negative and positive digits. It is fairly common to choose a symmetric digit set where $\alpha = \beta$, in which case the size of the digit set is $(2\alpha + 1) > r$, implying that $\alpha \ge \lceil r/2 \rceil$.

The degree of redundancy is measured by the redundancy factor $h$, where $h = \alpha/(r-1)$. Redundancy is maximal when $\alpha = r - 1$, in which case $h = 1$, while it is minimal when $\alpha = r/2$ (i.e., $1/2 < h \le 1$).

IEEE TRANSACTIONS ON COMPUTERS, VOL. 59, NO. 8, AUGUST 2010, p. 1009

- A. Amin is with the Computer Engineering Department, King Fahd University of Petroleum & Minerals, Dhahran 31262, Saudi Arabia. E-mail: [email protected].

- M.W. Shinwari is with the Electrical Engineering Department, McMaster University, 1280 Main Street West, Hamilton, ON L8S 4K1, Canada. E-mail: [email protected].

Manuscript received 9 July 2008; revised 12 Jan. 2009; accepted 1 Oct. 2009; published online 6 Apr. 2010. Recommended for acceptance by E. Antelo. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TC-2008-07-0338.

Digital Object Identifier no. 10.1109/TC.2010.78. 0018-9340/10/$26.00 © 2010 IEEE. Published by the IEEE Computer Society.

The fundamental choices in the design of digit recurrence dividers are the radix, the allowed quotient digits, and the representation method of the partial remainder (residue). The radix determines the number of quotient bits retired per iteration, which determines the required number of iterations. Larger radices can reduce the latency, but increase the time for each iteration. Judicious choice of the allowed quotient digits can reduce the time for each iteration, but with a corresponding increase in complexity and hardware. Similarly, different representations of the partial remainder (residue) can reduce iteration time, with a corresponding increase in complexity. This paper describes a high-radix digit recurrence algorithm and formulas which can efficiently compute $S = AB/D$ by performing simultaneous accumulation of partial products and subtraction of a proper divisor multiple in each iteration.

The structure of the paper is as follows: In Section 2, we present the digit recurrence relation of the multiplier-divider for both the integer and fractional formats, define constraints imposed on input operands, and determine the size of the multiplier-divider processor and the required number of iterations. In Section 3, the recurrence relation is derived, upper and lower bounds of the residue computed, and the quotient digit selection function defined. In Section 4, the optimal multiplier-divider design parameters are derived. Section 5 outlines the overall design procedure while Section 6 provides hardware implementation of high-radix multiplier-dividers. In Section 7, we discuss results with conclusions provided in Section 8.

2 HIGH-RADIX MULTIPLIER-DIVIDERS

This work presents the design of a digital multiplier-divider unit which can efficiently compute $S = A \times B \div D$, where the multiplicand $A$, the multiplier $B$, and the divisor $D$ are unsigned numbers of equal bit width. Computing $S$ yields a quotient $Q$ of the same bit width and a remainder $R$ such that

$$AB = QD + R, \tag{1}$$

and

$$|R| < |D|. \tag{2}$$

Conventionally, $S$ is computed using two independent operations: a multiplication operation and a division operation. Recurrence relations for these two operations have been proposed and are in common use by digital processors. In this work, we propose a general radix single recurrence relation to perform the multiplication and division operations in a fused manner which allows efficient computation of $S$. The recurrence relation is used as a basis to extend the theory of high-radix division to multiplication-division. The quotient digit selection function for this case is presented, a design procedure is outlined, major design parameters for these systems are defined, and their optimal values and relations are derived. Restricted versions and/or utilization of equivalent recurrence relations, however, have reportedly been used in different contexts. For example, to compute the modular product, Takagi [16] has used a radix-4 equivalent version of the recurrence relation while a software implementation by Tang [17] computes the modular multiplication of long multiprecision integers using a similar recurrence relation in its inner loop.

2.1 Multiplier-Divider Recurrence Relation

To speed up the computation of $S = A \times B \div D$, the proposed recurrence relation uses a high radix $r = 2^m$, where $m \ge 1$. Initially, consider the operands $A$, $B$, and $D$ to be $n$-digit integers, i.e., $A = (a_{n-1}, \ldots, a_1, a_0)$, $B = (b_{n-1}, \ldots, b_1, b_0)$, and $D = (d_{n-1}, \ldots, d_1, d_0)$, where $n$ is the operand bit width divided by $m$ and rounded up, and $a_i$, $b_i$, and $d_i$ are radix $r$ digits. The proposed multiply-divide recurrence relation is given by

$$R_j = rR_{j-1} - q_{n-j} D r^n + b_{n-j-1} A r^{n-1}, \quad j = 1, 2, \ldots, n, \tag{3}$$

where

- $q_i$ is the $i$th quotient digit,
- $b_i$ is the $i$th digit of $B$,
- $b_{-1} = 0$,
- $R_j$ is the $j$th running partial remainder, and
- $R_0 = b_{n-1} A r^{n-1}$.

The final results are the quotient $Q$ and the remainder $R$, where

$$Q = q_{n-1} q_{n-2} \ldots q_2 q_1 q_0 = \sum_{j=0}^{n-1} q_j r^j, \tag{4}$$

$$R = R_n r^{-n}. \tag{5}$$

If $R_n < 0$, then the following correction step should be performed:

- $Q = Q - ulp$, where ulp is a unit in the least position, and
- $R_n = R_n + D$ with $R = R_n r^{-n}$.

2.1.1 Proof of the Recurrence Relation

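Executing the $n$ iterations of the proposed recurrence relation yields the desired $Q$ and $R$ values as defined by (1) and (2):

$$R_j = rR_{j-1} - q_{n-j} D r^n + b_{n-j-1} A r^{n-1},$$

$$R_0 = b_{n-1} A r^{n-1},$$

$$R_1 = r^n b_{n-1} A - q_{n-1} D r^n + r^{n-1} b_{n-2} A,$$

$$R_2 = A\left(r^{n+1} b_{n-1} + r^n b_{n-2}\right) - D r^{n+1} q_{n-1} - q_{n-2} D r^n + b_{n-3} A r^{n-1}$$
$$\quad\; = A\left(r^{n+1} b_{n-1} + r^n b_{n-2} + r^{n-1} b_{n-3}\right) - D\left(r^{n+1} q_{n-1} + r^n q_{n-2}\right),$$

$$R_3 = A\left(r^{n+2} b_{n-1} + r^{n+1} b_{n-2} + r^n b_{n-3} + r^{n-1} b_{n-4}\right) - D\left(r^{n+2} q_{n-1} + r^{n+1} q_{n-2} + r^n q_{n-3}\right),$$

$$\vdots$$

$$R_n = A \sum_{i=1}^{n+1} r^{2n-i} b_{n-i} - D \sum_{j=1}^{n} r^{2n-j} q_{n-j}.$$

With $b_{-1} = 0$, $R_n = r^n (AB - DQ)$.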


Thus, $R = R_n r^{-n} = AB - DQ$, and $AB = DQ + R$.

If the digits of $Q$ are chosen such that the magnitude of the partial residue $R_j$ is maintained less than the magnitude of $D$, then $Q$ is effectively the required quotient of the operation $AB/D$. Since $AB = DQ + R$ and $|R| < |D|$, $R$ is indeed the final remainder.

2.1.2 Fractional Form of the Recurrence Relation

The recurrence relation of (3) can be rewritten assuming $A$, $B$, $D$, and $Q$ to be normalized fractions of the form $D = 0.d_1 d_2 \cdots d_n$ and $Q = 0.q_1 q_2 \cdots q_n$, with a minimum value ($D_{min}$ or $Q_{min}$) of 0.5. The integer operations can be readily mapped to the fractional form. The fractional formulas are, however, more convenient in mathematical representation since they are readily adaptable to floating-point representations. The fractional form is obtained from the integer form as follows:

$$A = A_{integer} \times r^{-n}, \quad B = B_{integer} \times r^{-n}, \quad D = D_{integer} \times r^{-n}, \quad \text{and} \quad R = R_{integer} \times r^{-2n}. \tag{6}$$

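Following is the modified fractional multiply-divide recurrence relation:

$$R_j = rR_{j-1} - q_j D + b_{j+1} A r^{-1}, \quad j = 1, 2, \ldots, n, \tag{7}$$

where

$$R_0 = b_1 A r^{-1}, \qquad b_i = 0 \text{ for } i > n, \qquad R_n = r^n R.$$

The final quotient $Q$ and remainder $R$ are given by

$$Q = 0.q_1 q_2 \cdots q_n = \sum_{j=1}^{n} q_j r^{-j}, \tag{8}$$

$$R = R_n r^{-n}. \tag{9}$$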

The above multiplier-divider recurrence relation may alternately be used with $R_0 = 0$, in which case an extra iteration step is needed. Thus,

$$R_j = rR_{j-1} - q_{j-1} D + b_j A r^{-1}, \quad j = 1, 2, \ldots, n+1, \tag{10}$$

where

$$R_0 = 0, \qquad q_0 = 0, \qquad b_j = 0 \text{ for } j > n, \qquad R_{n+1} = r^n R.$$

In this case, the quotient $Q$ and the remainder $R$ are given by

$$Q = 0.q_1 q_2 \cdots q_n = \sum_{j=1}^{n} q_j r^{-j}, \tag{11}$$

$$R = R_{n+1} r^{-n}. \tag{12}$$

The above formulas will also support operands that follow the IEEE 754 mantissa/significand format, in which case the operand minimum value (e.g., $D_{min}$) is 1.0.

Example. $r = 2$, $n = 8$, $A = (43)_{decimal} = (00101011)_{binary}$, $B = (173)_{decimal} = (10101101)_{binary}$, and $D = (93)_{decimal} = (01011101)_{binary}$.

The solution steps are shown in Fig. 1. Note that the integer operands were mapped to their fractional form. Throughout this work, we assume fractional operands, with the understanding that integer operations can be directly mapped to the fractional form. The width of the operations is 11 bits: 8 bits for the original operand size, 1 bit for the sign, 1 bit to accommodate the left-shift operation, and 1 bit for the right shift of the term $b_j A r^{-1}$. The quotient digits (bits, in this case) were chosen such that the remainder will always lie in the range $(-D, D)$ and not just $[0, D)$. Allowing negative remainders can cause the selected quotient digit to be negative. A final remark to be made about the example of Fig. 1 is that since all operands ($A$, $B$, and $D$) are positive, the final remainder should be positive as well. Since we are allowing negative remainders (with magnitude less than $D$), we may require a final correction step if the final remainder turns out to be negative. The correction step would add the divisor $D$ to the partial remainder $R_n$ with a corresponding correction to the quotient value by subtracting a ulp.

Fig. 1. Example of multiplier-divider operation.

As in the case of division, $AB < D$ is the condition to guarantee that no overflow may occur ($AB = DQ + R$). The following analysis assumes the use of (7)-(9) with the understanding that similar analysis holds true if the alternative formulas of (10)-(12) are used instead.
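As a rough software illustration of the fractional recurrence (7)-(9) and the final correction step, the following Python sketch reruns the example operands ($A = 43$, $B = 173$, $D = 93$, $r = 2$, $n = 8$). It is not the hardware procedure of Fig. 1: the function name `multiply_divide` is hypothetical, exact rational arithmetic replaces the 11-bit datapath, and the quotient digit is picked by rounding $P/D$ to the nearest allowed digit, which is an assumption made here in place of a table-based selection function.

```python
from fractions import Fraction


def multiply_divide(a_int, b_int, d_int, n, r=2, alpha=1):
    """Sketch of recurrence (7): returns (Q, R) with a*b == Q*d + R, 0 <= R < d.

    Assumes a_int < d_int and digits q_j in [-alpha, alpha]; the rounding-based
    digit choice is only argued here for r = 2, alpha = 1.
    """
    scale = r ** n
    A = Fraction(a_int, scale)          # fractional operands, as in (6)
    D = Fraction(d_int, scale)
    # Radix-r digits b_1 .. b_n of B (most significant first), then b_{n+1} = 0.
    b = [(b_int // r ** (n - i)) % r for i in range(1, n + 1)]
    b.append(0)

    R = b[0] * A / r                    # R_0 = b_1 * A / r
    Q = Fraction(0)
    for j in range(1, n + 1):
        P = r * R + b[j] * A / r        # shifted residue plus b_{j+1} * A / r
        q = max(-alpha, min(alpha, round(P / D)))  # assumed selection rule
        R = P - q * D                   # recurrence (7)
        Q += Fraction(q, r ** j)        # accumulate Q as in (8)

    if R < 0:                           # correction step for a negative residue
        Q -= Fraction(1, scale)         # subtract one ulp
        R += D

    # Map back to integers: Q_int = Q * r^n and R_int = R_n * r^n.
    return int(Q * scale), int(R * scale)


print(multiply_divide(43, 173, 93, 8))
```

For these operands the uncorrected recurrence ends with a slightly negative residue, so the correction step is exercised before the integer quotient and remainder are returned.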

2.2 Preprocessing and Operand Value Constraint

Referring to the recurrence relation of (7), each iteration consists of the following steps:

1. Determination of the next quotient digit $q_j$ using some quotient digit selection function ($q_j = SEL(rR_{j-1}, D)$). The selection function may typically be implemented as a lookup table.
2. Generate the product $q_j D$.
3. Perform the triple addition of $rR_{j-1}$, $(-q_j D)$, and $b_{j+1} A r^{-1}$. The resulting partial residue ($R_j$) must guarantee that $|R_j| < |D|$. The condition $|R_j| < |D|$ depends on the proper choice of the quotient digit $q_j$.

When performing a multiply-divide operation, we are adding a multiple of the input operand $A$ in each step. The resulting residue ($R_j$) cannot be known as predictably as in the case of high-radix division. However, we may still restrict the value range of $R_j$ by placing some restrictions on the value of $A$. One possible restriction is to impose the constraint $|A| < |D|$. Thus, we assume that $A = \omega D$, where $\omega < 1$.

Assuming $\max(b_i) = \alpha_B^+$, (7) yields

$$R_j(\max)|_{q_j} = rR_{j-1} - q_j D + \frac{\alpha_B^+}{r}\,\omega D = rR_{j-1} - q_j D + \omega T D,$$

and

$$R_j(\max)|_{q_j} = R_j|_{division} + \omega T D,$$

where $T = \alpha_B^+/r < 1$, $\omega < 1$, and $R_j|_{division}$ is the residue of regular high-radix division. This shows that the deviation in the remainder curve of the Robertson diagram [10], [11] from the case of pure division can be as high as $\omega T D$.

The upper bound of $\omega = A/D$, which equals $A_{max}/D_{min}$, must be less than one. To guarantee satisfaction of this constraint, a preprocessing step shifting $A$ by $Z$ bits to the right is performed. Thus, if the input operand is $A'$, processing is actually performed on $A = A'/2^Z$ rather than the input operand $A'$ itself. Accordingly, our proposed methodology computes $S = AB/D$ and a postprocessing step computes $S' = A'B/D = S \times (A'/A) = S \times 2^Z$. For the adopted operand fractional formats ($D_{min} = 0.5$ or $D_{min} = 1.0$), an $x$-bit normalized significand has a ratio of $A_{max}/D_{min}$ which equals $2 - 2^{-(x-1)}$ and accordingly, the upper bound of $\omega$ is given by

$$\omega \le \frac{A_{max}/D_{min}}{2^Z} = \frac{2\left(1 - 2^{-nm}\right)}{2^Z}.$$

Since, for typical values of $n$ and $m$, $2^{-nm} \ll 1$, we define the parameter $\omega_{max} = 2^{1-Z}$ as the upper bound for $\omega$, such that $\omega < \omega_{max}$, where

$$\omega_{max} = 2^{1-Z} \le 1. \tag{13}$$

2.3 Size of the Multiplier-Divider Processor

The multiply-divide recurrence relation can be implemented in hardware using shift and add operations. Although the problem size is $nm$ bits, the minimum possible size in radix $r$ implementations is $(n + 2)m + Z + 1$ bits, where $r = 2^m$. Referring to the high-radix multiplier-divider recurrence relation (7), a total of $n$ digits are needed to accommodate the input operand size, two more digits are needed to account for the left and right shifts ($rR_{j-1}$ and $b_{j+1} A r^{-1}$, respectively), $Z$ extra bits are needed since computations are performed on the constrained parameter $A$ ($A = A'/2^Z$) rather than the multiplicand $A'$, and a sign bit is required since the partial residue $R_j$ may be either positive or negative.

2.4 Postprocessing and the Number of Iterations

Due to the preprocessing step where the input multiplicand $A'$ is shifted right by $Z$ bit positions, i.e., $A = A'/2^Z$, a postprocessing step where the result $S$ is shifted left by $Z$ bit positions is needed, i.e., $S' = S \cdot 2^Z$. In other words, since the resulting quotient and remainder values ($Q$ and $R$) satisfy the relation $AB = QD + R$, i.e., $(A' 2^{-Z}) B = QD + R$, the true quotient $Q'$ and remainder $R'$ which satisfy $A'B = Q'D + R'$ are computed in a postprocessing step as $Q' = Q \cdot 2^Z$ and $R' = R \cdot 2^Z$. Thus, the first $Z$ bits of the resulting quotient ($Q$) are expected to be zeros. Accordingly, if $n$ significant digits of $Q'$ are needed, the number of required iterations of the recurrence relation (7) must be raised to $n + \lceil Z/m \rceil$. Thus, zero output digits are produced for the first $\lfloor Z/m \rfloor$ clock cycles. This initial delay is quite similar to the online delay of online arithmetic [10], [18]. As such, this algorithm carries some resemblance to an online division algorithm where the dividend is received in an online digit-serial manner while the divisor is available in parallel.
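The sizing rules of Sections 2.3 and 2.4 can be collected in a small Python sketch. The function names are hypothetical and introduced only for illustration; the formulas are the datapath width $(n + 2)m + Z + 1$, the iteration count $n + \lceil Z/m \rceil$, and the $\lfloor Z/m \rfloor$ initial zero-digit cycles quoted above.

```python
def datapath_bits(n, m, Z):
    """Minimum datapath width: n digits + 2 shift digits (m bits each) + Z + sign."""
    return (n + 2) * m + Z + 1


def iterations(n, m, Z):
    """Iterations needed for n significant digits of Q' after a Z-bit pre-shift."""
    return n + -(-Z // m)   # integer ceiling of Z / m


def leading_zero_cycles(m, Z):
    """Clock cycles that produce zero quotient digits because of the pre-shift."""
    return Z // m


# The radix-2 example of Fig. 1 (n = 8, m = 1, no pre-shift) uses 11 bits.
print(datapath_bits(8, 1, 0), iterations(8, 1, 0), leading_zero_cycles(1, 0))
```

With $Z = 0$ the width matches the 11-bit operations of the radix-2 example; larger $Z$ lengthens both the datapath and the iteration count.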

3 QUOTIENT DIGIT SELECTION

To define the quotient digit selection function, we need to determine the upper and lower bounds of the shifted partial residue ($P = rR_{j-1}$) for which a given quotient digit value may be selected such that $|R_j| < |D|$. The assumptions under which these bounds will be derived are:

1. $R_j$ is kept bounded, i.e., $|R_j| < |D|$, by defining the negative and positive range limiting factors $h^-$ and $h^+$ such that $-h^- D \le R_j \le h^+ D$, where $h^-, h^+ < 1$.
2. The radix $r$ is a power of 2, i.e., $r = 2^m$.
3. The multiplier-divider operand $A$ is obtained by shifting the input operand $A'$ by $Z$ bits to the right, i.e., $A = A'/2^Z$.
4. The magnitude of the multiplicand $A$ is smaller than the magnitude of the divisor $D$ ($A = \omega D$, where $\omega < \omega_{max} = 2^{1-Z} \le 1$).
5. The multiplier $B$ is represented by radix $r$ digits $b_i$ either in a $[0, r-1]$ nonredundant digit set, or in a $[-\alpha_B^-, \alpha_B^+]$ redundant signed digit set. In the following analysis, we will use a generalized signed digit set [19], where $b_i$ falls in the range $[-\alpha_B^-, \alpha_B^+]$.
6. For the quotient digits, we use a redundant balanced signed digit set $D_q = \{-\alpha, -\alpha+1, \ldots, -1, 0, 1, \ldots, \alpha\}$, with $r/2 \le \alpha \le r - 1$.
7. For the $j$th iteration, an acceptable choice of $q_j$ is one which satisfies the condition $-h^- D \le R_j \le h^+ D$.

3.1 Range Limiting Factors

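For a feasible implementation of the high-radix multiplier-divider recurrence relation (7), when the shifted partial residue $rR_{j-1}$ equals its maximum value ($rh^+D$) and $b_{j+1}$ is also maximum ($= \alpha_B^+$), a value of $q_j = \alpha$ should guarantee that $R_j \le h^+ D$; thus,

$$rh^+D - \alpha D + \frac{\alpha_B^+}{r}\,\omega D \le h^+ D,$$

thus,

$$h^+ \le \frac{\alpha}{r-1} - \frac{\alpha_B^+}{r-1}\cdot\frac{\omega}{r}.$$

Replacing $\omega$ by $\omega_{max}$ in the above equation, we obtain a lower bound expression for $h^+$ which guarantees that $R_j \le h^+ D$. Thus, $h^+$ is taken as

$$h^+ = \frac{\alpha}{r-1} - \frac{\alpha_B^+}{r-1}\cdot\frac{\omega_{max}}{r}, \tag{14}$$

$$h^+ = h - \frac{\omega_{max}}{r}\,h_B^+, \tag{15}$$

where

$$h_B^+ = \frac{\alpha_B^+}{r-1}. \tag{16}$$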

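Likewise, when the shifted partial residue $rR_{j-1}$ equals its minimum value ($-rh^-D$) and $b_{j+1}$ is also minimum ($= -\alpha_B^-$), a value of $q_j = -\alpha$ should guarantee that $R_j \ge -h^- D$; thus,

$$-rh^-D + \alpha D - \frac{\alpha_B^-}{r}\,\omega D \ge -h^- D, \quad \text{or}$$

$$h^- \le \frac{\alpha}{r-1} - \frac{\alpha_B^-}{r-1}\cdot\frac{\omega}{r}.$$

Replacing $\omega$ by $\omega_{max}$ in the above equation, we obtain a lower bound expression for $h^-$ which guarantees that $-h^- D \le R_j$. Thus, $h^-$ is taken as

$$h^- = \frac{\alpha}{r-1} - \frac{\alpha_B^-}{r-1}\cdot\frac{\omega_{max}}{r}, \tag{17}$$

$$h^- = h - \frac{\omega_{max}}{r}\,h_B^-, \tag{18}$$

where

$$h_B^- = \frac{\alpha_B^-}{r-1}. \tag{19}$$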
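To make (13)-(19) concrete, the sketch below evaluates the range limiting factors with exact rationals. The function name `range_limiting_factors` and its parameter names are hypothetical, and the symbols $\alpha_B^+$ and $\alpha_B^-$ (bounds of the multiplier digit set) follow the notation assumed in this section.

```python
from fractions import Fraction


def range_limiting_factors(r, alpha, alpha_b_plus, alpha_b_minus, Z):
    """Return (h_plus, h_minus) from (14)-(19) for a pre-shift of Z >= 1 bits."""
    omega_max = Fraction(2) ** (1 - Z)           # (13): omega_max = 2^(1-Z)
    h = Fraction(alpha, r - 1)                   # quotient redundancy factor
    h_b_plus = Fraction(alpha_b_plus, r - 1)     # (16)
    h_b_minus = Fraction(alpha_b_minus, r - 1)   # (19)
    h_plus = h - omega_max / r * h_b_plus        # (15)
    h_minus = h - omega_max / r * h_b_minus      # (18)
    return h_plus, h_minus


# Radix 4, alpha = 2, balanced multiplier digits in [-2, 2], Z = 6.
print(range_limiting_factors(4, 2, 2, 2, 6))
# Radix 4, alpha = 2, nonredundant multiplier digits in [0, 3], Z = 6.
print(range_limiting_factors(4, 2, 3, 0, 6))
```

A balanced multiplier digit set gives equal factors, while the nonredundant set $[0, r-1]$ leaves $h^-$ equal to $h$ and lowers only $h^+$.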

3.2 Recurrence Revisited

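Assuming fractional multiplication-division, the error in the resulting quotient must be bounded by

$$|\varepsilon_q| = \left|\frac{AB}{D} - Q\right| \le r^{-n}. \tag{20}$$

For a meaningful multiplication-division operation, the partial products have to be accumulated ahead of the division process by at least one iteration. Thus, the quotient error at the $j$th iteration is defined by

$$\varepsilon_q[j] = \frac{A\,B[j+1]}{D} - Q[j], \tag{21}$$

where $B[j+1] = 0.b_1 b_2 \cdots b_{j+1}$ and $Q[j] = 0.q_1 q_2 \cdots q_j$. It can be easily shown that

$$\varepsilon_q[j+1] = \varepsilon_q[j] + r^{-(j+1)}\left\{\frac{A}{D}\cdot\frac{b_{j+2}}{r} - q_{j+1}\right\}. \tag{22}$$

For convergence, the quotient digit $q_{j+1}$ should be chosen to reduce the error $\varepsilon_q[j+1]$ even under worst-case values of $b_{j+2}$. Assuming a multiplier digit set in the range $[-\alpha_B^-, \alpha_B^+]$, convergence worst-case values of $b_{j+2}$ are $\alpha_B^+$ if $\varepsilon_q[j] > 0$, and $-\alpha_B^-$ otherwise. In either case, the worst-case value of $A/D$ is $\omega_{max}$. Thus, for convergence, $\varepsilon_q[j]$ can be expressed in terms of the error after the $n$th iteration $\varepsilon_q[n]$ under worst case as

$$\varepsilon_q[j] = \varepsilon_q[n] - \sum_{i=j+1}^{n} r^{-i}\,\frac{\omega_{max}\,\alpha_B^+}{r} + \max\left(\sum_{i=j+1}^{n} q_i r^{-i}\right). \tag{23}$$

Assuming a balanced quotient digit set in the range $[-\alpha, \alpha]$, (23) becomes

$$\varepsilon_q[j] = \varepsilon_q[n] + \sum_{i=j+1}^{n} r^{-i}\left(\alpha - \frac{\omega_{max}\,\alpha_B^+}{r}\right), \tag{24}$$

$$\varepsilon_q[j] = \varepsilon_q[n] + \left(\alpha - \omega_{max}\frac{\alpha_B^+}{r}\right)\frac{r^{-j} - r^{-n}}{r-1}, \tag{25}$$

$$\varepsilon_q[j] = \varepsilon_q[n] + \left(h - \frac{\omega_{max}}{r}\,h_B^+\right)\left(r^{-j} - r^{-n}\right), \tag{26}$$

$$\varepsilon_q[j] = \varepsilon_q[n] + h^+\left(r^{-j} - r^{-n}\right), \tag{27}$$

where $h_B^+$ and $h^+$ are those defined in Section 3.1. From (20), since $|\varepsilon_q[n]| \le r^{-n}$, (27) yields an upper bound solution of $\varepsilon_q[j] \le h^+ r^{-j}$ for all possible values of $h$ and $h_B^+$. Likewise, it can be shown that the lower bound solution is $\varepsilon_q[j] \ge -h^- r^{-j}$, with $h^-$ as defined in Section 3.1. Thus, for iterations to converge to a solution, the iteration quotient error must be bounded by

$$-h^- r^{-j} \le \varepsilon_q[j] \le h^+ r^{-j}, \tag{28}$$

or

$$-h^- r^{-j} \le \frac{A\,B[j+1]}{D} - Q[j] \le h^+ r^{-j}. \tag{29}$$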




We define the residual $R_j$ such that its upper and lower bounds are independent of $j$, as

$$R_j = r^j\left(A\,B[j+1] - D\,Q[j]\right). \tag{30}$$

Equation (30) yields the recurrence relation

$$R_j = rR_{j-1} - q_j D + b_{j+1} A r^{-1}, \tag{31}$$

such that

$$-h^- D \le R_j \le h^+ D. \tag{32}$$

The initial value $R_0$ is obtained from (30) as

$$R_0 = A\,B[1] = A b_1 r^{-1}. \tag{33}$$

As indicated by (28), the final error may be negative, which would require a correction step where $D$ is added to $R_n$ and $Q$ is decremented by a ulp.
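The equivalence between the closed form (30) and the recurrence (31) with initial value (33) can be checked numerically. The sketch below is only a sanity check under stated assumptions: the function name `check_residual_identity` is hypothetical, operands and digits are drawn at random, and the quotient digits are arbitrary, since the identity holds for any digit sequence and not only for digits that keep $R_j$ within the bounds (32).

```python
import random
from fractions import Fraction


def check_residual_identity(r=4, n=6, seed=0):
    """Verify R_j from (31) equals r^j (A*B[j+1] - D*Q[j]) from (30) for all j."""
    rng = random.Random(seed)
    A = Fraction(rng.randrange(1, r ** n), r ** n)
    D = Fraction(rng.randrange(r ** n // 2, r ** n), r ** n)
    b = [rng.randrange(r) for _ in range(n)] + [0]   # b_1 .. b_n, then b_{n+1} = 0
    alpha = r - 1

    R = b[0] * A / r                 # (33): R_0 = A * b_1 / r
    B_partial = Fraction(b[0], r)    # B[1] = 0.b_1
    Q_partial = Fraction(0)          # Q[0] = 0
    for j in range(1, n + 1):
        q = rng.randint(-alpha, alpha)               # arbitrary quotient digit
        R = r * R - q * D + b[j] * A / r             # (31), b[j] holds b_{j+1}
        B_partial += Fraction(b[j], r ** (j + 1))    # extend to B[j+1]
        Q_partial += Fraction(q, r ** j)             # extend to Q[j]
        if R != r ** j * (A * B_partial - D * Q_partial):
            return False
    return True


print(all(check_residual_identity(seed=s) for s in range(10)))
```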

3.3 P-D Diagrams

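Here, we determine the selection interval defined by the upper ($U_k$) and lower ($L_k$) bounds of the shifted partial residue ($P = rR_{j-1}$) for which a given quotient digit value ($q_j = k$) may be selected such that the next partial residue ($R_j$) satisfies $-h^- D \le R_j \le h^+ D$. From (7), we can write $P = rR_{j-1} = R_j + q_j D - b_{j+1} A r^{-1}$. Thus,

$$U_k = (rR_{j-1})_{max} = (R_j)_{max} + kD - \mathrm{MAX}\{b_{j+1} A r^{-1}\}, \quad \text{i.e.,}$$

$$U_k = h^+ D + kD - \frac{\alpha_B^+}{r}\,\omega_{max} D = \left(k + h^+ - \frac{\alpha_B^+}{r}\,\omega_{max}\right) D;$$

thus,

$$U_k(D) = (k + \delta^+)\,D, \tag{34}$$

where

$$\delta^+ = h^+ - \frac{\alpha_B^+}{r}\,\omega_{max}, \quad \text{or} \quad \delta^+ = h - \omega_{max}\,h_B^+. \tag{35}$$

Likewise,

$$L_k(D) = (k - \delta^-)\,D, \tag{36}$$

where

$$\delta^- = h - \omega_{max}\,h_B^-. \tag{37}$$

Equations (35) and (37) clearly show that $\delta^+ = \delta^-$ if a balanced signed digit set is used for $B$. In this case, the P-D diagram will be symmetric with $U_k(D) = -L_{-k}(D)$, which would allow the utilization of only the first quadrant of the P-D diagram, considerably reducing the storage requirements of the quotient digit selection function [14], [15].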

3.4 The Selection Function

Using all bits of $P$ and $D$ ($2(nm + 2m + Z + 1)$ bits) as input to the quotient digit selection function $SEL(P, D)$ requires huge ROM or PLA sizes. Accordingly, it is advantageous to minimize the number of input bits to the quotient digit selection function. Thus, we use truncated values of $P$ and $D$ as input to the quotient digit selection function. Let these truncated values be $P_t$ and $D_t$ and let the number of fractional bits of these parameters be $n_P$ and $n_D$, respectively. Thus, the maximum truncation error values for $P$ and $D$ are $2^{-n_P}$ and $2^{-n_D}$, respectively. Using a 2's complement representation, the introduced truncation errors are always positive, i.e., $P \ge P_t$ and $D \ge D_t$. Accordingly, any given value of $P_t$ represents a range of $P$ that is defined by $P_t \le P < P_t + 2^{-n_P}$. Likewise, a given value of $D_t$ represents a range of $D$ defined by $D_t \le D < D_t + 2^{-n_D}$. $P_t$ and $D_t$ are the only inputs to the selection function, with a total number of bits ($m + n_P + n_D + 2\lfloor D_{min} \rfloor$). For the same system, i.e., same $r$ and $\alpha$, the number of input bits to the selection function is the same independent of the representation format of the input fractions (i.e., whether $D_{min} = 0.5$ or $D_{min} = 1.0$). Note that the values of $n_P$ and $n_D$ in case $D_{min} = 1.0$ are smaller by one bit compared to their values for the case where $D_{min} = 0.5$. To reduce the hardware complexity of the selection function, ($n_P + n_D$), henceforth designated as $n_{Tot}$, should be minimized. The selection function defines, for each interval of the divisor $D$, $[D_i, D_{i+1})$, where $D_{i+1} = D_i + 2^{-n_D}$, comparison constants $m_k(i)$ within the overlap regions for all values of $k$ such that:

- The set of comparison constants for each range of $D$ is determined such that a given value of $P_t$ is compared to these constants, based on which a proper value of $q_j$ is chosen, e.g., $m_k(i) \le P_t < m_{k+1}(i) \Rightarrow q_j = k$.
- If a symmetric multiplier digit set is used ($\alpha_B^+ = \alpha_B^-$), symmetry of the P-D diagram can be utilized [14] and only comparison constants $m_k(i)$ for the first quadrant may be defined, i.e., for $k = 0, 1, 2, \ldots, \alpha$.
- The comparison constants $m_k(i)$ are chosen within the overlap regions where a choice of a $q_j$ value of either $k$ or $k - 1$ satisfies the constraint $-h^- D \le R_j \le h^+ D$.
- Since any value within the overlap region may be used as a comparison constant, the choice is made such that ($n_P + n_D$) is minimized.

For the $i$th selection interval $[D_i, D_{i+1})$, when determining the comparison constant $m_k(i)$, two conditions must be satisfied [10], [11]:

1. Containment: $L_k \le m_k(i) \le U_k$, and
2. Continuity: If $P_t = m_k(i) - 2^{-n_P}$, then $q_j$ must equal $k - 1$, which implies that $m_k(i) - 2^{-n_P} \le U_{k-1}$. Written differently, we must have $m_k(i) \le U_{k-1} + 2^{-n_P}$ as well as satisfy the containment constraint.

Accordingly, $m_k(i)$ should satisfy $L_k \le m_k(i) \le U_{k-1} + 2^{-n_P}$.

For the $i$th selection interval $[D_i, D_{i+1})$, the uncertainty in the value of $P$ for a given value of $P_t$ has an upper bound of $\Delta P = 2^{-n_P}$, i.e., $P_t \le P < P_t + \Delta P$. Accordingly, the upper bound of the comparison constant $m_k(i)$ must be reduced by $\Delta P$, and hence, $m_k(i)$ should satisfy

$$L_k \le m_k(i) \le U_{k-1}. \tag{38}$$

4 OPTIMAL DESIGN PARAMETERS

The objective of this section is to derive expressions for optimal values of $n_P$, $n_D$, and $Z$. Using the 2's complement binary system, we derive these expressions for two cases: one with the shifted partial residue ($P = rR_{j-1}$) represented in a nonredundant binary format, and another in a redundant carry-save format.

4.1 Using Nonredundant Binary Representation

For a feasible $m_k(i)$ value, the height of the overlap region ($\Delta y$) at a given divisor value ($D$) must be greater than the minimum grid ($2^{-n_P}$); thus,

$$\Delta y = U_{k-1} - L_k = (\delta^+ + \delta^- - 1)\,D.$$

At $D = D_{min}$, the height of the overlap region is minimum: $\Delta y_{min} = U_{k-1} - L_k = (\delta^+ + \delta^- - 1)\,D_{min}$. Accordingly, the minimum value of $n_P$ ($n_{P(min)}$) is the smallest integer satisfying

$$2^{-n_{P(min)}} < (\delta^+ + \delta^- - 1)\,D_{min}. \tag{39}$$

The lower bound of $n_{P(min)}$ is reached at very high values of $Z$ ($Z \to \infty$), leading to $\omega_{max} \to 0$, in which case $\delta^+ = \delta^- = h$. This lower bound is the smallest integer value satisfying the following equation:

$$2^{-n_{P(Low\ Bound)}} < (2h - 1)\,D_{min}, \tag{40}$$

where $h = \alpha/(r-1)$ is the quotient digit set redundancy factor. Defining $Z_1$ as the value of $Z$ at which $n_{P(min)}$ is equal to its lower bound value, (39) shows that $Z_1$ is the minimum integer satisfying

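$$2^{Z_1} > \frac{2\left(h_B^+ + h_B^-\right)}{2h - 1 - \dfrac{2^{-n_{P(Low\ Bound)}}}{D_{min}}}. \tag{41}$$

To exploit symmetry of the P-D diagram, we adopt the approach outlined in [14], [15], where the comparison constants $m_k(i)$ have been defined to stress the symmetric nature of the diagram. This makes it possible to utilize only the first quadrant of the P-D diagram, significantly reducing the hardware complexity of the quotient digit selection logic. In case of pure division, Fig. 2 shows the P-D diagram for $r = 4$, $\alpha = 2$, $D_{min} = 0.5$, with $P$ represented in nonredundant format. The comparison constants are defined as follows:

1. $m_0(i) = 0\ \forall i$, i.e., for all ranges of $D$.
2. For $P > 0$:
   - For $1 \le k \le \alpha$, define $U_k(D) = (k + h)D$, $L_k(D) = (k - h)D$.
   - $P_{lower} \le m_k(i) \le P_{upper}$, where
     - $P_{upper} = U_{k-1}(D) = (k - 1 + \delta^+)\,D$.
     - $P_{lower} = L_k(D + 2^{-n_D}) = (k - \delta^-)(D + 2^{-n_D})$.
   - $m_k(i) \le P_t < m_{k+1}(i) \Rightarrow q_j = k$.
3. For $P < 0$:
   - For $1 \le k \le \alpha$, define $U_{-k}(D) = (-k + h)D = -(k - h)D = -L_k(D)$, $L_{-k}(D) = (-k - h)D = -(k + h)D = -U_k(D)$.
   - $P_{lower} \le m_{-k}(i) \le P_{upper}$.
   - $P_{upper} = U_{-k}(D + 2^{-n_D}) = (-k + \delta^+)(D + 2^{-n_D})$.
   - $P_{lower} = L_{-(k-1)}(D) = -(k - 1 + \delta^-)\,D$.
   - $m_{-(k+1)}(i) < P_t \le m_{-k}(i) \Rightarrow q_j = -k$.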

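In this context, the overlap regions for a given divisor value, $\Delta y^+$ for $P > 0$ and $\Delta y^-$ for $P < 0$, are given by

$$\Delta y^+ = P_{upper} - P_{lower} = (\delta^+ + \delta^- - 1)\,D - (k - \delta^-)\,2^{-n_D} > 0, \tag{42}$$

$$\Delta y^- = P_{upper} - P_{lower} = (\delta^+ + \delta^- - 1)\,D - (k - \delta^+)\,2^{-n_D} > 0. \tag{43}$$

Notes.

1. The overlap range ($\Delta y^+$ or $\Delta y^-$) is smaller for smaller values of $D$.
2. Higher values of $k$ yield smaller overlap regions (for both $\Delta y^+$ and $\Delta y^-$).
3. For worst-case analysis, the smallest values of $\Delta y^+$ ($\Delta y^+_{min}$) and of $\Delta y^-$ ($\Delta y^-_{min}$) occur at $D = D_{min}$ and $k = \alpha$. Thus,

$$\Delta y^+_{min} = (\delta^+ + \delta^- - 1)\,D_{min} - (\alpha - \delta^-)\,2^{-n_D} > 0, \tag{44}$$

$$\Delta y^-_{min} = (\delta^+ + \delta^- - 1)\,D_{min} - (\alpha - \delta^+)\,2^{-n_D} > 0. \tag{45}$$

4. With a balanced multiplier digit set ($\alpha_B^+ = \alpha_B^-$), $\Delta y^+ = \Delta y^-$. However, if a $[0, r-1]$ nonredundant digit set is used, $\alpha_B^+ = r - 1$ is greater than $\alpha_B^- = 0$, in which case $\Delta y^-_{min} < \Delta y^+_{min}$, and worst-case analysis should be performed on $\Delta y^-_{min}$.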

Fig. 2. Division P-D diagram ($r = 4$, $\alpha = 2$).




Equation (45) shows that the minimum value of nD(nDmin) is the smallest integer value satisfying

2nDmin 0.

ymin 1Dmin

2nD 2nP > 0:54

This yields the following constraint on the allowedminimum value of nD:

2nD 2

h

B h

B2h 1 2

nPLow Bound

Dmin

: 58Multiplying both sides of (55) by 2nP, we obtain

2nPnD

1

1Dmin: 61

It is clear that the computed value of nPopt (61) is higherby 1 than the minimum nP value defined by (56). Further, by using (55), it is seen that the optimum nD value is theminimum integer value satisfying

2nDopt1

2hB hB

2h 1 2nPLow Bound

Dmin

:

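3. Compute $g$ as the minimum integer satisfying

$$2^{-g} < \frac{(2h - 1)\,D_{min}}{\alpha - h}.$$

4. Compute $Z_2$ as the minimum integer value satisfying

$$2^{Z_2} > \frac{2\left\{h_B^+\left(1 + \dfrac{2^{-g}}{D_{min}}\right) + h_B^-\right\}}{2h - 1 - \dfrac{2^{-g}}{D_{min}}\,(\alpha - h)}.$$

5. Compute $Z = \mathrm{MAX}(Z_1, Z_2)$.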
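The computation of $Z$ can be sketched in Python as follows. This is a reading of the procedure under stated assumptions, not a definitive implementation: the earlier steps are assumed to be the evaluation of the $n_P$ lower bound (40) and of $Z_1$ from (41), the function names are hypothetical, and the sketch requires $r \ge 4$ so that $\alpha - h > 0$.

```python
from fractions import Fraction


def smallest_exp_below(bound):
    """Smallest integer e >= 0 with 2**(-e) < bound (bound must be positive)."""
    e = 0
    while Fraction(1, 2 ** e) >= bound:
        e += 1
    return e


def smallest_exp_above(bound):
    """Smallest integer e >= 0 with 2**e > bound."""
    e = 0
    while 2 ** e <= bound:
        e += 1
    return e


def compute_z(r, alpha, alpha_b_plus, alpha_b_minus, d_min=Fraction(1, 2)):
    """Return (Z1, Z2, Z) with Z = max(Z1, Z2)."""
    h = Fraction(alpha, r - 1)
    hb_p = Fraction(alpha_b_plus, r - 1)
    hb_m = Fraction(alpha_b_minus, r - 1)

    # Assumed earlier steps: n_P lower bound (40), then Z1 from (41).
    n_p_low = smallest_exp_below((2 * h - 1) * d_min)
    p_ratio = Fraction(1, 2 ** n_p_low) / d_min
    z1 = smallest_exp_above(2 * (hb_p + hb_m) / (2 * h - 1 - p_ratio))

    # Step 3: g, then step 4: Z2.
    g = smallest_exp_below((2 * h - 1) * d_min / (alpha - h))
    g_ratio = Fraction(1, 2 ** g) / d_min
    z2 = smallest_exp_above(
        2 * (hb_p * (1 + g_ratio) + hb_m) / (2 * h - 1 - g_ratio * (alpha - h))
    )

    # Step 5.
    return z1, z2, max(z1, z2)


# Radix 4, alpha = 2, balanced multiplier digit set [-2, 2].
print(compute_z(4, 2, 2, 2))
```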

In special cases, analytical expressions for the values of $Z_1$, $Z_2$, and $Z$ can be derived. For example, for the case where $r = 2^m \ge 4$ and $\alpha = r - 1$: $h = 1$, $2^{-n_{P(Low\ Bound)}}/D_{min} = 0.5$, $2^{-g}/D_{min} = 1/r$, $Z_1 = 3$, and $Z = Z_2 = m + 1$ for the minimally redundant symmetric multiplier digit set ($\alpha_B^+ = \alpha_B^- = r/2$) as well as the nonredundant multiplier digit set ($\alpha_B^+ = r - 1$, $\alpha_B^- = 0$).

Similarly, for the case where $\alpha = r/2$: $h = 0.5\left(1 + \frac{1}{r-1}\right)$, $2^{-n_{P(Low\ Bound)}}/D_{min} = 1/r$, $2^{-g}/D_{min} = 2/r^2$, $Z = Z_1 = 2m + 2$, and $Z_2 = 2m + 1$ for the minimally redundant symmetric multiplier digit set, while for the nonredundant multiplier digit set $Z = Z_1 = 2m + 1$ and $Z_2 = 2m$.

For values of $\alpha = r/2$, $\alpha = r - 2$, and $\alpha = r - 1$, Figs. 3 and 4 show the values of $Z$ for both the nonredundant and the minimally redundant balanced multiplier digit sets, respectively.

5 HIGH-RADIX MULTIPLIER-DIVIDER DESIGN PROCEDURE

Optimal values of $n_P$, $n_D$, and $Z$ are required for the design of a multiplier-divider of a given $r$, $\alpha$, $D_{min}$, and a $[-\alpha_B^-, \alpha_B^+]$ multiplier digit set. To determine if a given set of $n_P$, $n_D$, and $Z$ values represents a valid solution, as well as to determine the values of the comparison constants $m_k(i)$ for this solution, we follow the same approach detailed in [14]. This approach defines a validity function that can test for the validity of a given solution as well as an almost closed form to define the comparison constants of a given solution. We will define these expressions for multiplier-dividers in the general case, which can be easily mapped to the more feasible special case of a balanced multiplier digit set to exploit features of symmetry in the P-D diagram.

The $i$th truncated divisor value $D_i$ and the comparison constants $m_k(i)$ of the $i$th divisor range $[D_i, D_{i+1})$ are integer multiples of $2^{-n_D}$ and $2^{-n_P}$, respectively. These two parameters are represented as follows:

$$D_i = d_i \times 2^{-n_D}, \tag{66}$$

$$m_k(i) = m_{k,i} \times 2^{-n_P}, \quad \text{for } P > 0, \tag{67}$$

$$m_{-k}(i) = -m_{-k,i} \times 2^{-n_P}, \quad \text{for } P < 0, \tag{68}$$


Fig. 3. Values of Z for different radices and nonredundant multiplier digit set.

Fig. 4. Values of Z for different radices and minimally redundant symmetric multiplier digit set.




where the coefficients mk;i, mk;i, and di are positive integervalues with Dmin2nD di < Dmin2nD

1. Fo r P > 0, thecomparison constants have to satisfy

Plower mki Pupper; or

k di 12nD mk;i2

nP

k 1

di2

nD

2

nP

;or

d2nPnD k di 1e mk;i

b2nPnD k 1di 1c:

69

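As in [14], (69) is used as the basis for defining the $m_{k,i}^{+}$ coefficients and the validity function $V(n_P, n_D, d_i)$, which verifies the validity of a given solution of $n_P$, $n_D$, and $Z$, as follows:

$$m_{k,i}^{+} = \left\lceil 2^{n_P - n_D}(k - \rho^{-})(d_i + 1) \right\rceil, \quad \text{where } D_{\min} 2^{n_D} \le d_i < D_{\min} 2^{n_D + 1}; \tag{70}$$

$$V(n_P, n_D, d_i) = \left\lfloor 2^{n_P - n_D}(\alpha - 1 + \rho^{+})\,d_i - 1 \right\rfloor - \left\lceil 2^{n_P - n_D}(\alpha - \rho^{-})(d_i + 1) \right\rceil \quad \text{(redundant carry-save format)}; \tag{71}$$

$$V(n_P, n_D, d_i) = \left\lfloor 2^{n_P - n_D}(\alpha - 1 + \rho^{+})\,d_i \right\rfloor - \left\lceil 2^{n_P - n_D}(\alpha - \rho^{-})(d_i + 1) \right\rceil \quad \text{(nonredundant binary format)}. \tag{72}$$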
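As a rough illustration of how a closed form such as (70) turns into lookup-table entries, the following Python sketch evaluates the first-quadrant coefficients for one truncated divisor value. It is only a sketch under stated assumptions: the function name and arguments are hypothetical, and the redundancy term is set to the pure-divider value $h = \alpha/(r-1)$, so the numbers it prints are not those of Table 1.

```python
from fractions import Fraction
from math import ceil


def positive_coefficients(n_p, n_d, d_i, alpha, rho):
    """Integer coefficients m_{k,i} (in units of 2**-n_p) for k = 1..alpha.

    rho is the effective redundancy term of the lower selection bound.
    Passing h = alpha / (r - 1) is the pure-divider limit, which is an
    assumption of this sketch and not the multiplier-divider value.
    """
    scale = Fraction(2) ** (n_p - n_d)  # exact 2^(n_P - n_D)
    return [ceil(scale * (k - rho) * (d_i + 1)) for k in range(1, alpha + 1)]


# r = 4 and alpha = 2 give h = 2/3; d_i = 8 is the first divisor
# interval for D_min = 0.5 and n_D = 4.
h = Fraction(2, 3)
coeffs = positive_coefficients(n_p=5, n_d=4, d_i=8, alpha=2, rho=h)
print(coeffs)  # [6, 24], i.e. thresholds 6/32 and 24/32
```

Exact rational arithmetic is used so that the ceiling is never affected by floating-point rounding when a bound falls exactly on an integer.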

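Likewise, for $P < 0$, the $m_{k,i}^{-}$ coefficients and the validity functions are defined as

$$m_{k,i}^{-} = \left\lceil 2^{n_P - n_D}(k - \rho^{+})(d_i + 1) + 1 \right\rceil, \quad \text{where } D_{\min} 2^{n_D} \le d_i < D_{\min} 2^{n_D + 1}; \tag{73}$$

$$V(n_P, n_D, d_i) = \left\lfloor 2^{n_P - n_D}(\alpha - 1 + \rho^{-})\,d_i - 1 \right\rfloor - \left\lceil 2^{n_P - n_D}(\alpha - \rho^{+})(d_i + 1) \right\rceil \quad \text{(redundant carry-save format)}; \tag{74}$$

$$V(n_P, n_D, d_i) = \left\lfloor 2^{n_P - n_D}(\alpha - 1 + \rho^{-})\,d_i \right\rfloor - \left\lceil 2^{n_P - n_D}(\alpha - \rho^{+})(d_i + 1) \right\rceil \quad \text{(nonredundant binary format)}. \tag{75}$$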

A solution (a given set of $n_P$, $n_D$, and $Z$) is considered feasible if:

1. $V(n_P, n_D, D_{\min} 2^{n_D}) \ge 1$, or
2. $V(n_P, n_D, d_i) = 0$ for $d_i = D_{\min} 2^{n_D}, \ldots, d_1 - 1$ and $V(n_P, n_D, d_1) \ge 1$.
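The two feasibility conditions above can be sketched as a short scan over the truncated divisor values. The Python below is a minimal sketch, not the authors' VHDL model: the function names are hypothetical, the validity function follows the floor-minus-ceiling structure of the Kornerup-style test in [14], and the two effective redundancy terms are passed in by the caller. In the usage lines they are both set to the pure-divider value $h = \alpha/(r-1)$, which is an assumption made only so the example is self-contained.

```python
from fractions import Fraction
from math import ceil, floor


def validity(n_p, n_d, d_i, alpha, rho_upper, rho_lower, carry_save=True):
    """Validity value for one truncated divisor d_i (first quadrant).

    rho_upper / rho_lower are the redundancy terms of the upper and lower
    selection bounds (hypothetical argument names).
    """
    scale = Fraction(2) ** (n_p - n_d)
    upper = scale * (alpha - 1 + rho_upper) * d_i
    if carry_save:
        upper -= 1  # one extra unit of truncation error for carry-save form
    lower = scale * (alpha - rho_lower) * (d_i + 1)
    return floor(upper) - ceil(lower)


def is_feasible(n_p, n_d, d_min, alpha, rho_upper, rho_lower, carry_save=True):
    """Apply conditions 1 and 2: zeros may precede the first value >= 1."""
    first = int(d_min * 2 ** n_d)
    for d_i in range(first, 2 * first):  # D_min*2^nD <= d_i < D_min*2^(nD+1)
        v = validity(n_p, n_d, d_i, alpha, rho_upper, rho_lower, carry_save)
        if v >= 1:
            return True
        if v != 0:
            return False
    return False


h = Fraction(2, 3)  # r = 4, alpha = 2, pure-divider limit (assumption)
print(validity(3, 7, 64, 2, h, h))                 # -1
print(is_feasible(3, 7, Fraction(1, 2), 2, h, h))  # False
print(is_feasible(5, 4, Fraction(1, 2), 2, h, h))  # True
```

The scan stops at the first nonzero value, which is what makes the test cheap: a negative value rejects the candidate immediately and a value of one or more accepts it.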

In the case of a $[0, r - 1]$ nonredundant multiplier digit set, the P-D diagram is not symmetric and $y_{\min}^{-} < y_{\min}^{+}$, which means that (74) or (75) should be used to verify the validity of a given solution. If, however, a balanced multiplier digit set is used, then $B^{+} = B^{-}$, $h_B^{+} = h_B^{-} = h_B$,

leading to a symmetric P-D diagram. In this case, the symmetry of the P-D diagram can be exploited to allow for the use of only the first quadrant, which leads to a significant reduction in the hardware complexity of the digit selection logic [14], [15]. It should also be noted that the larger the multiplier digit set, the narrower the overlap regions will be, which means that a minimally redundant digit set is the best choice for the multiplier B.

Example. Let $r = 4$, $\alpha = 2$, $D_{\min} = 0.5$, and $B^{+} = B^{-} = 2$. We compute $Z = 6$ (Section 4.3). This yields $n_{P\min} = 3$ with a corresponding $n_D = 7$, which is not a feasible solution since $V(3, 7, 2^6) = -1$. However, a feasible solution exists at $Z = 6$, $n_P = 5$, and $n_D = 4$, with $V(5, 4, d_i) = 0$ for $d_i = 8, 9$ but with $V(5, 4, 10) = 1$ for $d_i = 10$. The coefficients ($m_{k,i}$) of the selection constants for this solution are given in Table 1.

To reduce the area of the lookup table of the quotient digit selection function, the total number of address bits of this table must be reduced. Accordingly, an important objective of the design process of multiplier-dividers is to minimize $n_{Tot} = n_P + n_D$. To come up with an optimal set of $n_P$, $n_D$, and $Z$, some choices and trade-offs are needed. In addition to the fundamental choices that are essential for the design of digit-recurrence dividers, e.g., the radix, the quotient digit set, and the representation method of the partial remainder [12], other choices are needed for the design of multiplier-dividers. For example, higher values of $Z$ increase the number of iterations as well as the processor size, while at the same time yielding lower $n_P$ and $n_{Tot}$ values, which would reduce the delay of each iteration as well as the area of the quotient digit selection function. The proposed method to compute $Z$, as outlined in Section 4.3, invariably yields solutions optimized for lower values of $Z$ and $n_{Tot}$. However, slightly higher values of $Z$ may, in some cases, yield solutions with the same $n_{Tot}$ but with lower $n_P$ values. Lower $n_P$ values have somewhat lower iteration delays and hardware complexity. For example, in the case of $r = 4$, $\alpha = 2$, $D_{\min} = 0.5$, and $B^{+} = B^{-} = 2$, the computed $Z$ value is 6, which yields the solution $n_P = 5$ and $n_D = 4$. With $Z$ increased by one bit to 7, we obtain another solution having the same $n_{Tot}$ but with a lower $n_P = 4$. In other cases, values lower than the computed $Z$ by one bit may yield the same solution, allowing for a lower value of $Z$ to be used. For example, in the case of $r = 64$, $\alpha = 32$, and $B^{+} = B^{-} = 32$, the computed value of $Z = 14$ yields a solution of $n_P = 9$ and $n_D = 13$. With $Z$ smaller by one bit ($Z = 13$), the same solution is obtained. Thus, if the target system parameters are to be optimized, it is advisable to investigate solutions for the computed $Z$ as well as for $Z - 1$ and $Z + 1$.

Another choice for multiplier-dividers is the $[-B^{-}, B^{+}]$ range of the multiplier digit set. If a nonredundant digit set in the $[0, r - 1]$ range is used, generation of ($A/r$)-multiples would require a number of precomputations for $r \ge 4$. If, however, a minimally redundant symmetric digit set ($B^{+} = B^{-} = r/2$) is used, not only would that lead to a smaller number of such precomputations, but it will also yield a


TABLE 1
First Quadrant Comparison Constants Coefficients ($m_{k,i}$) versus $d_i$ for $r = 4$, $\alpha = 2$, $D_{\min} = 0.5$, $B^{+} = B^{-} = 2$, $Z = 6$, $n_P = 5$, and $n_D = 4$




symmetric P-D diagram that allows for the use of only the first quadrant, significantly reducing the hardware complexity of the selection function. It should also be pointed out that the height of the overlap regions is slightly larger in the case of a nonredundant multiplier digit set. However, with a minimally redundant digit set, the height reduction is almost insignificant and has practically no effect on the optimal $n_P$ and $n_D$ parameter values.

Based on the above, given the system radix $r$, the quotient digit set parameter $\alpha$, and the multiplier digit set parameters $B^{+}$ and $B^{-}$, the optimal parameters for the high-radix multiplier-divider may be determined as follows:

1. Compute $h = \alpha/(r - 1)$, $h_B^{+} = B^{+}/(r - 1)$, and $h_B^{-} = B^{-}/(r - 1)$.

2. Determine $Z$ as detailed in Section 4.3.

3. For $Z_i = Z - 1$ to $Z + 1$, do

   a. Compute $\omega_{\max} = 2^{1 - Z_i}$.

   b. Compute $n_{P\min}$ ((39) or (56)).

   c. For $n_P = n_{P\min}$ to $n_{P\min} + 2$, do

      i. Compute the current $n_D$ value ($n_{DC}$) corresponding to the current $n_P$ value ((46) or (55)).

      ii. For $n_D = n_{DC}$ to $n_{DC} + 2$, do: if the current $n_P$, $n_D$, and $Z_i$ constitute a feasible solution, then store $Z_i$, $n_P$, $n_D$, and $n_{Tot} = n_P + n_D$.

4. Select the solution which yields the smallest $n_{Tot}$. In case of more than one solution having the same $n_{Tot}$, choose the one with the smallest $n_P$.

5. Generate the comparison constants $m_k(i)$ for the selected solution ((70) and (73)).
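The search in steps 3 and 4 above can be sketched as three nested loops followed by a lexicographic minimum. In the Python below, `n_p_min_of`, `n_d_of`, and `feasible` are hypothetical callables standing in for the design equations ((39)/(56), (46)/(55)) and the validity test, none of which are reproduced here. Breaking remaining ties by the smaller $Z$ is an assumption of this sketch, and the stub values in the usage lines are made up purely to exercise the loop.

```python
def search_design(z, n_p_min_of, n_d_of, feasible):
    """Scan Z-1..Z+1, three n_P values and three n_D values per candidate.

    Returns the feasible solution with the smallest n_Tot, preferring the
    smallest n_P on ties, or None when nothing is feasible.
    """
    solutions = []
    for z_i in (z - 1, z, z + 1):
        n_p_min = n_p_min_of(z_i)
        for n_p in range(n_p_min, n_p_min + 3):
            n_dc = n_d_of(z_i, n_p)
            for n_d in range(n_dc, n_dc + 3):
                if feasible(z_i, n_p, n_d):
                    # Tuple order encodes the selection priority of step 4.
                    solutions.append((n_p + n_d, n_p, z_i, n_d))
    if not solutions:
        return None
    n_tot, n_p, z_i, n_d = min(solutions)
    return {"Z": z_i, "n_P": n_p, "n_D": n_d, "n_Tot": n_tot}


# Made-up stubs: two candidates with the same n_Tot but different n_P.
feasible_set = {(6, 5, 4), (7, 4, 5)}
best = search_design(
    6,
    n_p_min_of=lambda z_i: 3,
    n_d_of=lambda z_i, n_p: 4,
    feasible=lambda z_i, n_p, n_d: (z_i, n_p, n_d) in feasible_set,
)
print(best)  # {'Z': 7, 'n_P': 4, 'n_D': 5, 'n_Tot': 9}
```

Storing each candidate as a tuple ordered by priority lets a single `min` call implement both the primary criterion and the tie-break.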

The described design procedure has been used to compute the design parameters for multiplier-dividers of various values of $r$ and $\alpha$. For a minimally redundant symmetric multiplier digit set, Figs. 4, 5, 6, and 7 show the derived values of $Z$, $n_P$, $n_D$, and $n_{Tot} = n_P + n_D$, respectively, versus different radices for $\alpha = r/2$, $\alpha = r - 2$, and $\alpha = r - 1$.

It should be pointed out that the described multiplier-divider design procedure will generally yield an optimal $n_{Tot}$ value which is higher than its corresponding value for pure divider hardware. Consider, for example, the case of a balanced multiplier digit set, where the expressions for $U_k$ and $L_k$ are identical to those of the pure divider case only if $\rho^{+} = \rho^{-} = h$. Theoretically, however, $\rho^{+}$ and $\rho^{-}$ will equal $h$ only at $Z = \infty$. When $Z_1$ and $Z_2$ have feasible values, the proposed method to compute $Z$ will only ensure that the values of $n_{P\min}$ and $n_{D\min}$ of the multiplier-divider equal those of the corresponding pure divider. This yields solutions which have either the same value of $n_{Tot}$ or a value higher by just one bit at reasonable values of $Z$. In some cases where the multiplier-divider has a larger $n_{Tot}$ value, solutions having the same $n_{Tot}$ as pure dividers are possible to obtain, but only at impractically large values of $Z$. This is clear in the case where $r = 4$, $\alpha = 2$, where the multiplier-divider design procedure yields a solution of $n_{Tot} = 9$ at $Z = 5$, whereas for a pure divider, $n_{Tot} = 8$. If, however, the value of $Z$ is raised to 48 or higher, a solution exists where $n_{Tot} = 8$. In such a case, even though the reduced $n_{Tot}$ yields a smaller lookup table, the sizable increase in $Z$ results in an unacceptably large number of iterations as well as processor size. Section 7 gives the major design parameter values ($Z$, $n_P$, and $n_D$) of various multiplier-divider systems.

As shown in Section 4.3, $Z$ is highest ($2m + 2$) when minimally redundant digit sets are used for both the quotient and the multiplier. Thus, as a worst case, $Z$ may assume values as high as $2m + 3$, since the above-described design procedure investigates values up to $Z + 1$. Thus, the initial delay of


Fig. 5. Values of $n_P$ for different radices and a minimally redundant symmetric multiplier digit set.

Fig. 6. Values of $n_D$ for different radices and a minimally redundant symmetric multiplier digit set.

Fig. 7. Values of $n_{Tot} = n_D + n_P$ for different radices and a minimally redundant symmetric multiplier digit set.




$\lfloor Z/m \rfloor$ clock cycles has a theoretical upper bound value of three clock cycles for $r \in \{4, 8\}$ and two cycles for $r > 8$.
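The bound just stated follows from substituting the worst-case value $Z = 2m + 3$ into $\lfloor Z/m \rfloor$ with $m = \log_2 r$. A minimal Python check, assuming a power-of-two radix and using a hypothetical function name:

```python
def worst_case_initial_delay(radix):
    """floor(Z/m) at the worst-case Z = 2m + 3, for a power-of-two radix."""
    m = radix.bit_length() - 1  # m = log2(r) when r is a power of two
    z_max = 2 * m + 3
    return z_max // m


for r in (4, 8, 16, 64):
    print(r, worst_case_initial_delay(r))
# 4 3
# 8 3
# 16 2
# 64 2
```

Since $\lfloor (2m + 3)/m \rfloor = 2 + \lfloor 3/m \rfloor$, the extra cycle appears only while $m \le 3$, i.e., for $r = 4$ and $r = 8$.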

6 HARDWARE IMPLEMENTATION FOR REDUNDANT REPRESENTATION

Fig. 8 shows a possible hardware implementation of our proposed high-radix multiplier-divider with a balanced multiplier digit set for the case of $D_{\min} = 0.5$. It has the following features:

. A counter is used to hold the number of iterations tobe performed (niterate d

Zm e).

. The most significant bits of the B register are passed to combinational logic that generates the desired digit set, e.g., a balanced signed-digit set; the output is the current digit ($b_{j-1}$) of B. In each iteration, register B is shifted left by $m$ bits.

. Utilizing the symmetry of the P-D diagram, depending on the sign of $P_t$, $m + n_P$ bits of either $P_t$ or its 1's complement are passed as the address to the lookup table of the quotient digit selection function.

. The selection function is implemented either as a ROM or a PLA, where the truncated values of $P$ and $D$ ($P_t$ and $D_t$) are the input to this ROM (or PLA) for a total of ($m + n_{Tot} - 1 + 2D_{\min}$) bits. The output of the ROM/PLA is a first-quadrant quotient digit of $\lceil \log_2(\alpha + 1) \rceil \le m$ bits.

. The value of $P = rR_j$ uses a redundant representation in the form of a SUM component ($P_S$) and a CARRY component ($P_C$), which are held in the registers PSR and PCR, respectively. Accordingly, there are four quantities that need to be added in each iteration: $P_S$, $P_C$, $-q_jD$, and ($b_{j-1}A/r$).

. The multiplexer MU Xa generates the m bits ofjbj1jA

0 or its 1s complement depending on the signof bj1. This is sign-extended by 1 Z m bits. Togenerate the signed 2s complement of (bj1A=r),where A A0=2z, the sign ofbj1 is fed as the leastsignificant bit of the left-shifted copy of PC.

. The output m 1 bits of MU Xd is either (jqjjD)or its signed 1s complement depending on the signof qj. The output of MU Xd is left appended byZ m bits each having a value that equals the signbit ofMU Xd output (sign-bit extension). To add the

signed 2s complement of qjD, the carry-in ofthe (4:2) compressor is fed with the sign of qjD.

. A Carry Lookahead Adder (CLA) is used to add the ($1 + m + n_P$) most significant bits of the sum and carry components of the shifted partial residue ($P_S$ and $P_C$). The resulting summation is the truncated $P_t$ value used as input to the ROM/PLA.

. Adding the four quantities $P_S$, $P_C$, $-q_jD$, and ($b_{j-1}A'$) is done using a (4:2) compressor yielding two outputs: a partial sum component (Sum) and a partial carry component (Cry).

. An $m$-bit left-shifted version of Sum and Cry is stored in two registers (PSR and PCR) to represent ($rR_j$). The outputs of PSR and PCR are fed back as input to the (4:2) compressor, representing the shifted partial residue ($rR_j$), while the ($1 + m + n_P$) most significant bits of PSR and PCR are added using the CLA to yield the value of $P_t$.

. At the last iteration, a CPA is used to assimilate the sum and carry components of the shifted partial residue ($P_S$ and $P_C$) to yield the value of $P$. This CPA may (or may not) utilize the ($1 + m + n_P$)-bit CLA to yield the ($1 + m + n_P$) most significant bits of the result, as shown in Fig. 8.

. Using higher radices results in larger implementation areas. If the selection function is implemented as a ROM, its total capacity would be $2^{m + n_{Tot} - 1 + 2D_{\min}} \cdot m$ bits. Referring to Fig. 7, under a worst-case scenario, $n_{Tot}$ increases linearly with $m$, so that the ROM capacity grows as a fixed power of $r$ multiplied by $m$. Thus, assuming a minimally redundant multiplier digit set, for $\alpha = r/2$, where the slope is about 3 and the constant term about 4, the ROM capacity is of order $O(r^4 \log_2 r)$, while for $\alpha = r - 1$, where the slope is about 1 and the constant term about 5, the ROM capacity is of order $O(r^2 \log_2 r)$. The sizes of other data path modules, e.g., the (4:2) compressor, registers, adders, and multiplexers, depend on $m$, $Z$, or $n_P$. Since, under a worst-case scenario, $Z$ and $n_P$ increase linearly with $m$ as shown in Figs. 3, 4, and 5, the area complexity of these multiplier-divider data path modules is $O(\log_2 r)$ under a worst-case scenario.
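To give a feel for the ROM sizes discussed in the last feature, the Python sketch below evaluates the capacity for the $D_{\min} = 0.5$ case of Fig. 8. It rests on two assumptions that should be checked against the full paper: that the table address is $m + n_{Tot}$ bits wide for $D_{\min} = 0.5$, and that each word holds an $m$-bit digit. The function name is hypothetical.

```python
def rom_capacity_bits(radix, n_tot):
    """ROM size in bits for D_min = 0.5, assuming m + n_Tot address bits
    and m output bits per word (both assumptions of this sketch)."""
    m = radix.bit_length() - 1      # m = log2(r) for a power-of-two radix
    address_bits = m + n_tot
    return (2 ** address_bits) * m


# Radix 4 with the n_Tot = 9 solution quoted in the text.
print(rom_capacity_bits(4, 9))  # 4096
```

Because the capacity is exponential in $n_{Tot}$, each bit saved in $n_{Tot}$ halves the table, which is why the design procedure ranks candidate solutions by $n_{Tot}$ first.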


Fig. 8. Hardware implementation of the high-radix multiplier-divider.




7 RESULTS AND DISCUSSION

The described multiplier-divider design procedure and hardware have been modeled and verified using VHDL. The model has been used to compute the major design parameters for various values of $r$ and $\alpha$, as well as for a minimally redundant balanced multiplier digit set and a $[0, r - 1]$ nonredundant digit set. Table 2 lists the obtained results, where $Z_{bal}$ is the value of $Z$ when a balanced minimally redundant digit set is used for B, while $Z_{un}$ is the value of $Z$ when a $[0, r - 1]$ nonredundant digit set is used.

For all tested combinations of $r$, $\alpha$, and $D_{\min}$, with the exception of the $r = 2$ case, it was found that both the balanced minimally redundant and the nonredundant multiplier digit sets yield the same $n_P$ and $n_D$ values. However, the values of $Z$ are generally slightly lower in the nonredundant case ($Z_{un}$), as can be seen from Table 2. This is not unexpected since the overlap region height is slightly larger in this case compared to the case when a balanced digit set is used. Parameter values were also obtained for the case where $D_{\min} = 1.0$ and, as expected, exactly the same results were obtained as listed in Table 2 but with $n_P$ and $n_D$ lower by one bit.

The case of $r = 4$, $\alpha = 3$ yields $n_P$ and $n_D$ values that are identical to those of a pure divider. For the case of $r = 4$, $\alpha = 2$, the multiplier-divider design procedure described in Section 5 yields a solution of $Z = 5$ and $n_{Tot} = 9$, higher by one bit than that of a pure divider [15]. In the same case, however, the same values of $n_P$ and $n_D$ as those of a pure divider are obtained only at $Z \ge 48$.

For the case of $r = 2$ and $D_{\min} = 0.5$, even though the design equation of $Z_2$ (64) yields infeasible values, a solution is possible by assigning $Z$ the computed value of $Z_1$ in this case. For a balanced minimally redundant multiplier digit set, $Z = Z_1 = 4$ and we obtain a solution at $n_P = 3$ and $n_D = 1$. A better solution is obtained for a $[0, r - 1]$ nonredundant binary multiplier digit set, where $Z = Z_1 = 3$, $n_P = 2$, and $n_D = 1$, with comparison constants at $m_0(i) = -0.5$ and $m_1(i) = 0$.

The last column of Table 2 shows the computed initial delay ($\lfloor Z/m \rfloor$) for a balanced minimally redundant multiplier digit set as well as a nonredundant set. It is clear that for $r > 4$, this delay is only either one or two clock cycles, depending on whether the quotient digit set is a maximally or minimally redundant balanced set, respectively. It may be worthwhile to point out that a multiplier-divider that is built by cascading an online divider to an online multiplier will have an overall online delay that is equal to the sum of the individual online delays of the multiplier and the divider. While online multipliers have a typical online delay of three [10], the online delay of online dividers varies based on the radix used, the quotient digit set, and the selection function. Depending on the complexity of the implementation, various online delays for dividers, ranging from as small as three [20] to as large as eight [21], have been reported. Accordingly, a cascaded online multiplier-divider would have an online delay in the range of 6-11 clock cycles.

8 CONCLUSION

A general radix recurrence relation and a design methodology which describes how to efficiently design fast digital multiplier-dividers have been developed. The proposed single recurrence relation performs simultaneous multiplication and division. With the fused implementation of multiplication and division, the two operations can be executed using a single instruction, implying only a single rounding. The design methodology is based on an analytical model with a set of equations fully defining and relating lower bounds and optimal values for the number of fractional bits of both the divisor and the shifted partial remainder to be used as input to the quotient digit selection logic. The developed design strategy and method have been modeled using VHDL and used in the design of multiplier-dividers for a variety of radix and quotient digit set values. A proposed hardware implementation has also been modeled and verified.

ACKNOWLEDGMENTS

The authors would like to acknowledge the support of the Computer Engineering Department of King Fahd University of Petroleum & Minerals (KFUPM) and King Abdul-Aziz City for Science & Technology (KACST). This work was partially supported by a grant from King Abdul-Aziz City for Science & Technology under grant AR 22-17. The authors would also like to acknowledge the reviewers of the paper as well as the associate editor, Professor Elisardo Antelo, for their insightful comments and suggestions that have further improved the quality of the work presented here.


TABLE 2
Multiplier-Divider Parameters for $D_{\min} = 0.5$




REFERENCES

[1] K. Kucukcakar, "An ASIP Design Methodology for Embedded Systems," Proc. Seventh Int'l Symp. Hardware/Software Codesign (CODES '99), pp. 17-21, May 1999.

[2] P. Biswas, N.D. Dutt, L. Pozzi, and P. Ienne, "Introduction of Architecturally Visible Storage in Instruction Set Extensions," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 3, pp. 435-446, Mar. 2007.

[3] R.E. Gonzalez, "Xtensa: A Configurable and Extensible Processor," IEEE Micro, vol. 20, no. 2, pp. 60-70, Mar./Apr. 2000.

[4] J. Groschadl, "Instruction Set Extension for Long Integer Modulo Arithmetic on RISC-Based Smart Cards," Proc. 14th Symp. Computer Architecture and High Performance Computing (SBAC-PAD '02), pp. 13-19, 2002.

[5] E.M. Popovici and P. Fitzpatrick, "Algorithm and Architecture for a Galois Field Multiplicative Arithmetic Processor," IEEE Trans. Information Theory, vol. 49, no. 12, pp. 3303-3307, Dec. 2003.

[6] J.H.P. Zurawski and J.B. Gosling, "Design of a High-Speed Square Root Multiply and Divide Unit," IEEE Trans. Computers, vol. 36, no. 1, pp. 13-23, Jan. 1987.

[7] M.D. Ercegovac and T. Lang, "Implementation of Module Combining Multiplication, Division, and Square Root," Proc. IEEE Int'l Symp. Circuits and Systems (ISCAS '89), pp. 150-153, 1989.

[8] R. McIlhenny and M.D. Ercegovac, "On the Implementation of a Three-Operand Multiplier," Conf. Record of the 31st Asilomar Conf. Signals, Systems & Computers, vol. 2, nos. 2-5, pp. 1168-1172, Nov. 1997.

[9] E. Antelo, T. Lang, and J.D. Bruguera, "Computation of $\sqrt{X/d}$ in a Very High Radix Combined Division/Square-Root Unit with Scaling and Selection by Rounding," IEEE Trans. Computers, vol. 47, no. 2, pp. 152-161, Feb. 1998.

[10] M.D. Ercegovac and T. Lang, Digital Arithmetic. Morgan Kaufmann, 2004.

[11] M.D. Ercegovac and T. Lang, Division and Square Root: Digit-Recurrence Algorithms and Implementations. Kluwer Academic Publishers, 1994.

[12] S.F. Obermann and M.J. Flynn, "Division Algorithms and Implementations," IEEE Trans. Computers, vol. 46, no. 8, pp. 833-854, Aug. 1997.

[13] D.E. Atkins, "Higher-Radix Division Using Estimates of the Divisor and Partial Remainders," IEEE Trans. Computers, vol. 17, no. 10, pp. 925-934, Oct. 1968.

[14] P. Kornerup, "Revisiting SRT Quotient Digit Selection," Proc. 16th IEEE Symp. Computer Arithmetic, pp. 38-45, June 2003.

[15] P. Kornerup, "Digit Selection for SRT Division and Square Root," IEEE Trans. Computers, vol. 54, no. 3, pp. 294-303, Mar. 2005.

[16] N. Takagi, "A Radix-4 Modular Multiplication Hardware Algorithm for Modular Exponentiation," IEEE Trans. Computers, vol. 41, no. 8, pp. 949-956, Aug. 1992.

[17] P.T.P. Tang, "Modular Multiplication Using Redundant Digit Division," Proc. 18th IEEE Symp. Computer Arithmetic, pp. 217-224, June 2007.

[18] D. Lau, A. Schneider, M.D. Ercegovac, and J. Villasenor, "A FPGA-Based Library for On-Line Signal Processing," J. VLSI Signal Processing Systems, vol. 28, nos. 1-2, pp. 129-143, May-June 2001.

[19] B. Parhami, "Generalized Signed-Digit Number Systems: A Unifying Framework for Redundant Number Representations," IEEE Trans. Computers, vol. 39, no. 1, pp. 89-98, Jan. 1990.

[20] P.K.-G. Tu, "On-Line Arithmetic Algorithms for Efficient Implementation," PhD thesis, Univ. of California, Los Angeles, Sept. 1990.

[21] P.K.-G. Tu and M.D. Ercegovac, "A Radix-4 On-Line Division Algorithm," Proc. IEEE Eighth Symp. Computer Arithmetic, pp. 181-187, 1987.

Alaaeldin Amin received the BS degree in electrical engineering and the MS degree in electronic circuits from Cairo University in 1974 and 1977, respectively. He received the PhD degree from the University of Utah in 1987. From 1980 to 1988, he was a member of the MOS Memory R&D group of National Semiconductor Corporation before joining the Computer Engineering Department of King Fahd University of Petroleum & Minerals in September 1988. He holds four US patents with three more pending applications. His research interests include computer arithmetic, VLSI circuit design, digital system testing and design for testability, digital system modeling and synthesis, memory design and testing, design automation, and computer architecture. He is a member of the IEEE.

M. Waleed Shinwari received the BSc degree with a double major in electrical engineering and computer engineering from King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia, in 2003, and the master's degree in electrical engineering from McMaster University, Hamilton, Ontario, Canada. He is currently working toward the PhD degree in electrical engineering at McMaster University, Hamilton, Ontario, Canada. During the summer of 2002, he worked as an engineering trainee at Advanced Electronics Company, Riyadh, Saudi Arabia. Later, in January 2003, he joined the company as a design engineer, where he was involved in embedded controller programming, real-time instrumentation algorithms, SCADA systems, and telemetry solutions development and installation for industrial and power systems installations. His main research interests include analog and digital signal processing, computer arithmetic circuits, electronic biosensors, and physical modeling of devices. He is a member of the IEEE.
