
ECE152B AU 1

Multiplication for 2’s Complement System – Booth Algorithm

Consider an unsigned five-bit number B = B4B3B2B1B0:

B = B4×16 + B3×8 + B2×4 + B1×2 + B0×1

For a 2’s complement number:

B(2’s comp) = B4×(−16) + B3×8 + B2×4 + B1×2 + B0×1

which can be re-expressed as:

B(2’s comp) = (−16)×B4 + (16−8)×B3 + (8−4)×B2 + (4−2)×B1 + (2−1)×B0
            = −16×(B4−B3) − 8×(B3−B2) − 4×(B2−B1) − 2×(B1−B0) − 1×(B0−0)

Each value in parentheses is the difference of two consecutive bits, which can be +1, 0, or −1.

ECE152B AU 2

Example: Use the Booth’s algorithm recoding scheme to perform the multiplication 25₁₀ × (−19₁₀).

0 1 1 0 0 1    A = 25₁₀, multiplicand
1 0 1 1 0 1    B = −19₁₀, multiplier (bits B5B4B3B2B1B0)

P = A×B = −32×(B5−B4)×A − 16×(B4−B3)×A − 8×(B3−B2)×A − 4×(B2−B1)×A − 2×(B1−B0)×A − 1×(B0−0)×A
        = −32×A + 16×A + 0×A − 4×A + 2×A − 1×A
        = −19×A = −475
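The recoding above can be sketched in a few lines of Python. This is only an illustration: the function names `booth_digits` and `booth_multiply` are hypothetical, and Python integers stand in for the hardware registers.

```python
def booth_digits(b: int, n: int) -> list[int]:
    """Booth digits d_i = B(i-1) - B(i) for i = 0..n-1, with B(-1) = 0.

    b is an n-bit two's-complement value; d_i is the coefficient of 2**i,
    i.e. the negated bit difference (B(i) - B(i-1)) used on the slide.
    """
    digits = []
    prev = 0
    for i in range(n):
        bit = (b >> i) & 1          # bit i of the two's-complement pattern
        digits.append(prev - bit)   # +1, 0, or -1
        prev = bit
    return digits


def booth_multiply(a: int, b: int, n: int) -> int:
    """Multiply a by the n-bit two's-complement multiplier b."""
    return sum(d * (a << i) for i, d in enumerate(booth_digits(b, n)))


print(booth_digits(-19, 6))        # [-1, 1, -1, 0, 1, -1]
print(booth_multiply(25, -19, 6))  # -475
```

Reading the digit list from the most significant end gives −32, +16, 0, −4, +2, −1, the same coefficients as in the example.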

ECE152B AU 3

[Figure: block diagram of the Booth multiplier. Labels recoverable from the figure: an 8-bit Multiplicand register (PCAND_CLK), an 8-bit Multiplier shift register (PIER_CLK, PIER_LD), an 8-bit product register (prod_clr, prod_clk), an 8-bit ALU with select inputs, a 1-bit flip-flop, and combinational logic driven by the multiplier bits Bi and Bi-1.]

ECE152B AU 4

Plier Bits      ALU control Set
B1-H  B0-H      S3-H  S2-H  S1-H  S0-H      Function
 0     0         0     0     0     0        pass product value
 0     1         1     0     0     1        product plus multiplicand
 1     0         0     1     1     0        product minus multiplicand
 1     1         0     0     0     0        pass product value

The logic for ALU Select lines is implemented by two NAND gates.
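The table reduces to two product terms, one for "plus" and one for "minus". A small sketch, assuming B1 is the current multiplier bit and B0 the previous one (the function name `alu_select` is hypothetical):

```python
def alu_select(b1: int, b0: int) -> tuple[int, int, int, int]:
    """Return (S3, S2, S1, S0) for the multiplier bit pair (B1, B0)."""
    plus = (1 - b1) & b0     # pattern 01: product plus multiplicand
    minus = b1 & (1 - b0)    # pattern 10: product minus multiplicand
    return (plus, minus, minus, plus)


for b1 in (0, 1):
    for b0 in (0, 1):
        print(b1, b0, alu_select(b1, b0))
```

Only two distinct signals are needed because S3 = S0 and S2 = S1 in every row of the table.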

ECE152B AU 5

Multiplication

                    A3    A2    A1    A0
  ×                 B3    B2    B1    B0

                    R0,3  R0,2  R0,1  R0,0
              R1,3  R1,2  R1,1  R1,0
        R2,3  R2,2  R2,1  R2,0
  R3,3  R3,2  R3,1  R3,0

  Sum of partial products
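The array can be generated directly. The sketch below assumes the usual reading of the notation, namely that Ri,j is the AND of multiplier bit Bi with multiplicand bit Aj and that row i is shifted left by i positions; the function name is hypothetical.

```python
def partial_product_rows(a: int, b: int, n: int = 4) -> list[int]:
    """Row i is the multiplicand ANDed with bit B_i, shifted left by i."""
    rows = []
    for i in range(n):
        b_i = (b >> i) & 1
        rows.append((a if b_i else 0) << i)
    return rows


rows = partial_product_rows(13, 11)   # 1101 x 1011
print(rows)        # [13, 26, 0, 104]
print(sum(rows))   # 143
```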

ECE152B AU 6

Using adders to add rows

ECE152B AU 7

Multiplication Using Adders

[Figure: the 4 × 4 partial-product array (A3 A2 A1 A0 × B3 B2 B1 B0, rows R0 to R3) with the 1st level adder.]

ECE152B AU 8


[Figure: the 4 × 4 partial-product array (A3 A2 A1 A0 × B3 B2 B1 B0, rows R0 to R3) with the 2nd level adder.]

ECE152B AU 9


[Figure: the 4 × 4 partial-product array (A3 A2 A1 A0 × B3 B2 B1 B0, rows R0 to R3) with the 3rd level adder.]

ECE152B AU 10

Row Reduction Method for Multiplication

[Figure: the 4 × 4 partial-product array (A3 A2 A1 A0 × B3 B2 B1 B0, rows R0 to R3).]


ECE152B AU 11

Using Carry Save Adders

ECE152B AU 12


[Figure: the 4 × 4 partial-product array (A3 A2 A1 A0 × B3 B2 B1 B0, rows R0 to R3) with the 1st level adder (row-reduction unit).]

ECE152B AU 13


[Figure: rows R0, R1, and R2 of the partial-product array (A3 A2 A1 A0 × B3 B2 B1 B0) enter the 1st level adder (row-reduction unit).]

Outputs of 1st level adder:
  F5 F4 F3 F2 F1 F0
  C5 C4 C3 C2 C1 C0

ECE152B AU 14

Row Reduction Method for Multiplication

                    A3    A2    A1    A0
  ×                 B3    B2    B1    B0

                    R0,3  R0,2  R0,1  R0,0
              R1,3  R1,2  R1,1  R1,0
        R2,3  R2,2  R2,1  R2,0

Outputs of 1st level adder:
  F5 F4 F3 F2 F1 F0
  C5 C4 C3 C2 C1 C0

These two rows and the remaining row R3,3 R3,2 R3,1 R3,0 enter the 2nd level adder (row-reduction unit).

Outputs of 2nd level adder:
  F6 F5 F4 F3 F2 F1 F0
  C6 C5 C4 C3 C2 C1 C0

Use a regular adder to add these two rows.
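A minimal sketch of this two-level reduction, assuming each row-reduction unit is a 3-2 carry-save adder working on whole rows (held here as Python integers). The function names are hypothetical.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Reduce three rows to a sum row F and a carry row C, with x + y + z == F + C."""
    f = x ^ y ^ z                              # per-column sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1     # per-column carry, weight doubled
    return f, c


def row_reduce_multiply(a: int, b: int) -> int:
    """4 x 4 multiply: two carry-save levels, then one regular addition."""
    rows = [((a if (b >> i) & 1 else 0) << i) for i in range(4)]
    f, c = carry_save_add(rows[0], rows[1], rows[2])   # 1st level: R0, R1, R2
    f, c = carry_save_add(f, c, rows[3])               # 2nd level: F, C, R3
    return f + c                                       # regular adder


print(row_reduce_multiply(13, 11))   # 143
```

Carries propagate only in the final addition; the two carry-save levels each cost one full-adder delay regardless of word length.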

ECE152B AU 15

Generalized row reduction method:

[Figure: MULTIPLICAND × MULTIPLIER → Partial Product Array → Intermediate Partial Product Sums, Formed in Parallel then Summed → PRODUCT]

ECE152B AU 16

Example: design a high speed multiplier
• 56×56 bit
• longest available row reduction unit: 15-4
• the final stage is a LACA with 8-bit basic adders.

[Figure: the 56 PARTIAL PRODUCT ROWS pass through row reduction units labeled 15-4 (five of them), 7-3, and 3-2 (FA) to form the PRODUCT.]

ECE152B AU 17

DELAY: 1 + 2 × T(15-4 rru) + T(7-3 rru) + T(3-2 rru) + T(LACA)

• 1: to form partial product
• rru: row reduction unit
• T(LACA) = 2 + 4×(⌈log8(112)⌉ − 1) = 2 + 4×(3 − 1) = 10 gate delays
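The LACA term can be checked numerically. The sketch below evaluates the formula 2 + 4×(⌈log8(width)⌉ − 1) with integer arithmetic to avoid floating-point rounding in the logarithm; the function names are hypothetical, and the 112-bit width is the product width of the 56×56 example.

```python
def laca_levels(width: int, group: int = 8) -> int:
    """Smallest k with group**k >= width, i.e. ceil(log_group(width))."""
    levels, span = 1, group
    while span < width:
        span *= group
        levels += 1
    return levels


def laca_delay(width: int, group: int = 8) -> int:
    """Gate delays of the look-ahead carry adder per the slide's formula."""
    return 2 + 4 * (laca_levels(width, group) - 1)


print(laca_levels(112))   # 3
print(laca_delay(112))    # 10
```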

ECE152B AU 18

Multiplication with Sectioning

Design an 8×8 multiplier using 4×4 multipliers.

                          X3    X2    X1    X0
  ×                       Y3    Y2    Y1    Y0

                          R0,3  R0,2  R0,1  R0,0
                    R1,3  R1,2  R1,1  R1,0
              R2,3  R2,2  R2,1  R2,0
        R3,3  R3,2  R3,1  R3,0

  P7    P6    P5    P4    P3    P2    P1    P0

ECE152B AU 19

    X7 X6 X5 X4 X3 X2 X1 X0
  × Y7 Y6 Y5 Y4 Y3 Y2 Y1 Y0

[Figure: the 8 × 8 partial-product array (rows R0,7 … R0,0, R1,7 … R1,0, and so on) divided into four 4 × 4 sections: P1 = X3-0 × Y3-0, P2 = X7-4 × Y3-0, P3, and P4.]

The four 8-bit section products:

P1,7 P1,6 P1,5 P1,4 P1,3 P1,2 P1,1 P1,0
P2,7 P2,6 P2,5 P2,4 P2,3 P2,2 P2,1 P2,0
P3,7 P3,6 P3,5 P3,4 P3,3 P3,2 P3,1 P3,0
P4,7 P4,6 P4,5 P4,4 P4,3 P4,2 P4,1 P4,0
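A sketch of how the four section products combine. P1 and P2 follow the definitions given with the figure. The assignment P3 = X3-0 × Y7-4 and P4 = X7-4 × Y7-4 is an assumption made here for illustration, since the transcript does not spell it out. `mul4` is a hypothetical stand-in for a 4×4 hardware multiplier.

```python
def mul4(x: int, y: int) -> int:
    """Stand-in for a 4 x 4 multiplier block (8-bit result)."""
    assert 0 <= x < 16 and 0 <= y < 16
    return x * y


def mul8_sectioned(x: int, y: int) -> int:
    """8 x 8 multiply built from four 4 x 4 products."""
    x_lo, x_hi = x & 0xF, x >> 4
    y_lo, y_hi = y & 0xF, y >> 4
    p1 = mul4(x_lo, y_lo)   # weight 2**0
    p2 = mul4(x_hi, y_lo)   # weight 2**4
    p3 = mul4(x_lo, y_hi)   # weight 2**4 (assumed assignment)
    p4 = mul4(x_hi, y_hi)   # weight 2**8 (assumed assignment)
    return p1 + ((p2 + p3) << 4) + (p4 << 8)


print(mul8_sectioned(200, 123))   # 24600
```

In hardware the three shifted terms would be summed with row-reduction units and a final adder rather than with the `+` operator used here.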

ECE152B AU 20

ECE152B AU 21

Division

DD = Q × DS + R

where DD is the dividend, Q the quotient, DS the divisor, and R the remainder.

The most straightforward method is to mimic the operations of paper-and-pencil long division for positive numbers.

ECE152B AU 22

Example:

                      1011      Quotient, Q
  Divisor, DS  101 ) 111010     ← dividend, DD
                     101        Q3×Ds
                      10010     R>Ds, continue
                      000       Q2×Ds, shifted
                      10010     R>Ds, continue
                       101      Q1×Ds, shifted
                       1000     R>Ds, continue
                        101     Q0×Ds, shifted
                         11     R<Ds, done

ECE152B AU 23

A block diagram for such a divider:

[Figure: registers DS, R, and Q, with INPUT lines into DS and Q, an ALU (subtract), and a bit from control.]

Termination: Quotient in Q, Remainder in R

The division process involves repetitive shifts and subtraction operations.

ECE152B AU 24

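Flowchart of the division process:

1. Load Ds and Q; clear count; clear R.
2. Begin: shift Q,R left one bit.
3. Increment count.
4. Is R − Ds non-negative?
   • Yes: prepare 1 for Q reg and set R = R − Ds.
   • No: prepare 0 for Q reg.
5. Is count = N?
   • No: shift Q,R left one bit and go back to step 3.
   • Yes: done.

[Figure: the divider block diagram again: registers DS, R, and Q, INPUT lines, and an ALU (subtract).]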
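The flowchart can be sketched as follows. This is a minimal model under some assumptions: the dividend is loaded into Q, R and Q are shifted as one combined register so that the top bit of Q moves into R, and N is the dividend width. The function name is hypothetical.

```python
def shift_subtract_divide(dd: int, ds: int, n: int) -> tuple[int, int]:
    """Divide the n-bit dividend dd by ds; return (quotient, remainder)."""
    if ds == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    q, r = dd, 0                      # load Q with the dividend, clear R
    mask = (1 << n) - 1
    for _ in range(n):                # count runs from 1 to N
        # Shift the R,Q pair left one bit: the MSB of Q enters R.
        r = (r << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        if r - ds >= 0:               # trial subtraction succeeds
            r -= ds
            q |= 1                    # quotient bit 1
        # otherwise the quotient bit stays 0 and R is left unchanged
    return q, r


print(shift_subtract_divide(0b111010, 0b101, 6))   # (11, 3)
```

The printed result corresponds to Q = 1011 and R = 11 in binary, matching the long-division example.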

ECE152B AU 25

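Fig 6.10 Parallel Array Divider

[Figure: an array of identical cells. Each cell has inputs D, d, bi, and c and outputs R and bo, with

  R := (c → D : ¬c → (D − d − bi) mod 2)

Borrow always computed. Array inputs: d1, d2, …, dm and D1, D2, …, Dm, Dm+1, …, D2m. Array outputs: q1, q2, …, qm and r1, r2, …, rm.]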

ECE152B AU 26

Division by Repeated Multiplication

Cost-effective if the system contains a high-speed multiplier.

Q = DD / DS

• In each iteration, a factor fi is generated and used to multiply both divisor DS and dividend DD.
• Q = (DD×f0×f1×f2 …)/(DS×f0×f1×f2 …)
• fi is chosen so that DS×f0×f1×f2 … converges rapidly toward 1.
• If the denominator converges toward 1, the numerator converges toward Q.

ECE152B AU 27

For simplicity, assume DD and DS are positive normalized fractions: DS = 1 − x, where x < 1.

Set f0 = 1 + x
=> DS×f0 = 1 − x^2 (closer to 1 than DS)
=> Q = (DD×(1 + x)) / (1 − x^2)

Set f1 = 1 + x^2
=> DS×f0×f1 = 1 − x^4 (even closer to 1)
=> Q = (DD×(1 + x)×(1 + x^2)) / (1 − x^4)

f0 = 1 + x = 1 + (1 − DS) = 2 − DS (2’s complement of DS)
f1 = 1 + x^2 = 1 + (1 − DS×f0) = 2 − DS×f0 = 2 − DS0
=> fi = 2 − DS×f0×…×f(i−1) = 2 − DS(i−1)

where DS(i−1) denotes the product DS×f0×…×f(i−1).
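The iteration is short enough to sketch directly. This assumes floating-point operands with 0 < DS < 1 and a caller-chosen iteration count; the function name is hypothetical.

```python
def divide_by_repeated_multiplication(dd: float, ds: float, iterations: int) -> float:
    """Approximate dd / ds by driving the denominator toward 1."""
    for _ in range(iterations):
        f = 2.0 - ds      # factor: 2's complement of the current denominator
        dd *= f           # numerator converges toward Q
        ds *= f           # denominator converges toward 1
    return dd


print(divide_by_repeated_multiplication(0.4, 0.7, 6))
print(divide_by_repeated_multiplication(0.1, 0.15, 6))
```

With DS = 0.7 (x = 0.3) six iterations are ample. With DS = 0.15 (x = 0.85) the error after six iterations is still about x^64, which is why the number of iterations depends on the value of DS.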

ECE152B AU 28

Example: (1). 0.4/0.7:

(In these tables DD0 and DS0 are the original operands; each row computes fi = 2 − DSi, then DD(i+1) = DDi×fi and DS(i+1) = DSi×fi.)

 i    DDi          DSi          fi
 0    0.4000000    0.7000000    1.3000000
 1    0.5200000    0.9099999    1.0900000
 2    0.5668000    0.9918999    1.0081000
 3    0.5713911    0.9999344    1.0000656
 4    0.5714286    0.9999999    1.0000000
 5    0.5714286    1.0000000    1.0000000
 6    0.5714286    1.0000000

ECE152B AU 29

(2). 0.7/0.4:

 i    DDi          DSi          fi
 0    0.7000000    0.4000000    1.5999999
 1    1.1199999    0.6400000    1.3599999
 2    1.5231999    0.8704000    1.1295999
 3    1.7206066    0.9832038    1.0167962
 4    1.7495062    0.9997178    1.0002821
 5    1.7499998    0.9999999    1.0000001
 6    1.7499999    1.0000000

(3). 0.1/0.15:

 i    DDi          DSi          fi
 0    0.1000000    0.1500000    1.8499999
 1    0.1850000    0.2775000    1.7224999
 2    0.3186625    0.4779938    1.5220062
 3    0.4850063    0.7275094    1.2724905
 4    0.6171659    0.9257489    1.0742511
 5    0.6629912    0.9944868    1.0055132
 6    0.6666464    0.9999696

ECE152B AU 30

The # of iterations required is determined by the value of DS.

It’s better to use a fixed # of iterations.
• To assure that the process converges to the correct answer for all data, instead of using 2 − DS to calculate f0, use a ROM to find an appropriate value for f0.
• It can then guarantee correct results after a fixed # of iterations.

ECE152B AU 31

Suppose ROM has 2^8 words.

(a) If DS is 8-bit, one iteration is sufficient
    => f0 = 1/DS

(b) If DS is > 8-bit, more than one iteration is required:
    DS•f0 = 1 − x and x < 2^−8
• At the 2nd iteration, Ds•f0•f1 = 1 − x^2; the difference from 1 is < 2^−16.
• At the ith iteration (i > 2), Ds•f0•f1•…•f(i−1) = Ds(i−1) = 1 − x^(2^(i−1)); the difference from 1 is < (2^−8)^(2^(i−1)).
  (3rd iteration error < 2^−32)
  (4th iteration error < 2^−64)
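The error bounds can be checked with exact rational arithmetic. The sketch below makes several assumptions that are not in the slides: DS is normalized to 0.5 ≤ DS < 1, the ROM is indexed by the 8 bits that follow the leading 1, and each word holds the exact reciprocal of the upper end of its interval (a real ROM would hold a rounded word). All names are hypothetical.

```python
from fractions import Fraction

ROM_BITS = 8
HALF = Fraction(1, 2)

# One seed factor per interval of width 2**-9 in [0.5, 1).
ROM = [1 / (HALF + Fraction(index + 1, 1 << (ROM_BITS + 1)))
       for index in range(1 << ROM_BITS)]


def seed_factor(ds: Fraction) -> Fraction:
    """Look up f0 from the 8 bits of DS after the leading 1."""
    index = int((ds - HALF) * (1 << (ROM_BITS + 1)))
    return ROM[index]


def denominator_error(ds: Fraction, iterations: int) -> Fraction:
    """Return 1 - DS*f0*...*f(iterations-1), with f0 taken from the ROM."""
    d = ds * seed_factor(ds)
    for _ in range(iterations - 1):
        d *= 2 - d                    # f_i = 2 - current denominator
    return 1 - d


ds = Fraction(3, 5)
for i in (1, 2, 3):
    print(i, float(denominator_error(ds, i)))
```

Under these assumptions the first error stays below 2^−8 and each further iteration squares it, which reproduces the 2^−16 and 2^−32 bounds.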

ECE152B AU 32

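[Figure: hardware for division by repeated multiplication. DS addresses a ROM that supplies f0; 2’s complement (2’s comp) blocks produce f1 and f2; five multiplier (mult) blocks apply the factors to DD and DS; the output is Q.]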

ECE152B AU 33

Fig 6.14 Floating-Point Number Format

  | s (Sign): 1 bit | e (Exponent): me bits | f (Fraction): mf bits |      m bits in total

1 + me + mf = m, Value(s, e, f) = (−1)^s × f × 2^e

s is sign, e is exponent, and f is significand (mantissa). We will assume a fraction mantissa, but some representations have used integers.

ECE152B AU 34

Floating Point Arithmetic

Floating point addition

The difficulty when adding two floating point numbers stems from the fact that the mantissas, in general, have different significance.

A = B + C
  = MB × rS^EB + MC × rS^EC

Before the two numbers can be properly added together, the mantissas must be aligned.

A = (MB × rS^(EB−EC) + MC) × rS^EC   (assume |B| < |C|)

ECE152B AU 35

This involves determining which operand value is smaller, and then aligning the mantissa of that operand appropriately with the mantissa of the larger operand.

The alignment is accomplished by shifting the mantissa of the smaller operand to line up with the digits of the same significance in the larger operand.

The amount of the alignment, i.e. the # of positions to shift, is determined by the difference in the exponents.
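The alignment step can be sketched with exact fractions. This is a simplified model, not the hardware: mantissas are `Fraction` values, the radix `r` defaults to 10 for readability, no digits are lost in the shift, and the result is returned before post-normalization. The function name is hypothetical.

```python
from fractions import Fraction


def fp_add(mb: Fraction, eb: int, mc: Fraction, ec: int, r: int = 10):
    """Add B = mb * r**eb and C = mc * r**ec; return (mantissa, exponent)."""
    if eb > ec:
        # Make B the operand with the smaller exponent.
        mb, eb, mc, ec = mc, ec, mb, eb
    shift = ec - eb                        # number of digit positions to shift
    aligned = mb / Fraction(r) ** shift    # shift B's mantissa right
    return aligned + mc, ec


m, e = fp_add(Fraction(5, 10), 0, Fraction(25, 100), 2)   # 0.5 + 25
print(m, e)   # 51/200 2, i.e. 0.255 x 10**2
```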

ECE152B AU 36

A block diagram:

[Figure: inputs Exponent B, Exponent C, Mantissa B, and Mantissa C; blocks Exponent Compare, Select, Select and align, Add/Subtract, Exponent Adjust, and Post Normalization; outputs Result Exponent and Result Mantissa.]

ECE152B AU 37

The selection of the appropriate mantissa to be aligned is made based on a comparison of the magnitude of the two exponents.

The result of the addition/subtraction is provided to the “Post-Normalization” unit.

Examples:

    0.8045    Input A is normalized
  + 0.7133    Input B is normalized
    1.5178    Result is not normalized

    0.8045
  − 0.8032
    0.0013

ECE152B AU 38

The post-normalization unit must be able to shift the mantissa by one or more positions and adjust the exponent to reflect the normalization.

Floating point addition, then, requires many more operations, and hence more hardware, than its integer counterpart.

ECE152B AU 39

Floating-point adder of IBM System/360 Model 91

[Figure: inputs E1, E2, M1, and M2 from the Input Bus; blocks Adder 1, Adder 2, Adder 3, Shifter 1, Shifter 2, Zero digit checker, and R; signal E1−E2; outputs E3 and M3 to the Output Bus. The stages are labeled Exponent Comparison and Mantissa Alignment, Mantissa addition-subtraction, and Result normalization.]

ECE152B AU 40

Design a network to align the smaller mantissa to be added to the larger mantissa.
• Assume that the mantissa is 24 bits.
• The alignment network must be capable of shifting any number of bits, from 0 to 24 (shift left).
• Assume the adders used to compare the exponents provide a binary number (size: 0 to 24; hence 5 bits S4S3S2S1S0) which indicates how far the number needs to be shifted in the alignment process.

ECE152B AU 41

Fig 6.11 An N × N Bit Crossbar Design for Barrel Rotator

[Figure: the Shift count drives a Decoder whose outputs select crossbar connections between the inputs x0 … x5 and the outputs y0 … y5 (x: input, y: output).]

ECE152B AU 42

Properties of the Crossbar Barrel Shifter

• There is a 2-gate delay for any length shift.
• Each output line is effectively an n-way multiplexer for shifts of up to n bits.
• There are n^2 3-state drivers for an n-bit shifter. For n = 32, this means 1024 3-state drivers.
• For 32 bits, the decoder is 5 bits (1 out of 32).
• The minimum delay but large number of gates in the crossbar prompts a compromise: the logarithmic barrel shifter.

ECE152B AU 43

Logarithmic Barrel Shifter

[Figure: the Input word x0 x1 x2 … x29 x30 x31 passes through rows of shift/bypass cells (bypass/shift 1 bit right, bypass/shift 2 bits right, and so on), each row controlled by one bit of the Shift count s4 s3 s2 s1 s0, producing the Output word y0 y1 y2 … y29 y30 y31.]

ECE152B AU 44

The LSB of this number is used by the first level of MUXs to shift the number by 1 bit (the 1 condition), or provide no shift at all (the 0 condition).

Similarly, the second LSB controls the 2nd set of MUXs to shift the number by 2 more or not to shift.

Similarly, the MSB controls the 5th set of MUXs to shift the number by 16 more or not to shift.
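The five MUX levels can be modeled as five conditional shifts. The sketch assumes a 32-bit word held in a Python integer and shifts right, as labeled in the barrel shifter figure; the function name is hypothetical.

```python
def log_barrel_shift(word: int, count: int, width: int = 32, stages: int = 5) -> int:
    """Shift word right by count using one shift/bypass level per count bit."""
    mask = (1 << width) - 1
    word &= mask
    for level in range(stages):
        if (count >> level) & 1:            # control bit s_level is 1: shift
            word >>= (1 << level)           # by 1, 2, 4, 8, or 16 positions
        # control bit is 0: bypass, word passes through unchanged
    return word


print(hex(log_barrel_shift(0x80000000, 21)))   # 0x400
```

A shift of 21 = 10101 in binary uses the 1-bit, 4-bit, and 16-bit levels and bypasses the other two. The n-bit word passes through log2(n) levels of 2-way MUXs instead of n^2 crossbar drivers.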

ECE152B AU 45

Floating Point Multiplication

A = B × C
  = MB × rS^EB × MC × rS^EC
  = MB × MC × rS^(EB+EC)

[Figure: block diagram with blocks Exponent Add, Exponent Adjust, Multiply, and Post Normalization; outputs Result Exponent and Result Mantissa.]

ECE152B AU 46

The post-normalization unit only needs to shift the result by at most one bit position. Consider two extreme cases:

Largest × largest
     Base 2      Base 10
     0.1111      0.9999
   × 0.1111    × 0.9999
     0.1110      0.9998     Aligned properly => no postnormalization.

Smallest × smallest
     Base 2      Base 10
     0.1000      0.1000
   × 0.1000    × 0.1000
     0.0100      0.0100     Not aligned properly => postnormalization of one digit position.
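The one-position bound follows because two normalized mantissas in [1/r, 1) have a product in [1/r^2, 1). A small sketch with exact fractions, assuming normalized inputs and no rounding of the product (the function name is hypothetical):

```python
from fractions import Fraction


def fp_multiply(mb: Fraction, eb: int, mc: Fraction, ec: int, r: int = 2):
    """Multiply B = mb * r**eb by C = mc * r**ec; return a normalized (m, e)."""
    m = mb * mc          # multiply mantissas
    e = eb + ec          # add exponents
    if m != 0 and m < Fraction(1, r):
        m *= r           # post-normalize: shift left one digit
        e -= 1           # and adjust the exponent
    return m, e


print(fp_multiply(Fraction(15, 16), 0, Fraction(15, 16), 0))   # no shift needed
print(fp_multiply(Fraction(1, 2), 0, Fraction(1, 2), 0))       # one-digit shift
```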

ECE152B AU 47

Floating Point Division

A = B / C
  = MB × rS^EB / (MC × rS^EC)
  = (MB / MC) × rS^(EB−EC)

[Figure: block diagram with blocks Exponent subtract, Exponent Adjust, Divide, and Post-Normalization.]

ECE152B AU 48

The result of the mantissa division may require post-normalization by at most one bit position, in the opposite direction from multiplication.

Largest / smallest
     Base 2      Base 10
     0.1111      0.9999
   ÷ 0.1000    ÷ 0.1000
     1.1110      9.9990     Not aligned properly => postnormalization is required.

Smallest / largest
     Base 2      Base 10
     0.1000      0.1000
   ÷ 0.1111    ÷ 0.9999
     0.1000      0.1000     Aligned properly => no postnormalization.
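For division the quotient of two normalized mantissas lies strictly between 1/r and r, so at most one right shift is needed. A sketch under the same assumptions as before (exact fractions, normalized inputs, hypothetical function name):

```python
from fractions import Fraction


def fp_divide(mb: Fraction, eb: int, mc: Fraction, ec: int, r: int = 2):
    """Divide B = mb * r**eb by C = mc * r**ec; return a normalized (m, e)."""
    m = Fraction(mb) / Fraction(mc)   # divide mantissas
    e = eb - ec                       # subtract exponents
    if m >= 1:
        m /= r                        # post-normalize: shift right one digit
        e += 1                        # and adjust the exponent
    return m, e


print(fp_divide(Fraction(15, 16), 0, Fraction(1, 2), 0))    # shifted: 15/16, exponent 1
print(fp_divide(Fraction(1, 2), 0, Fraction(15, 16), 0))    # no shift: 8/15, exponent 0
```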
