This report describes the work we carried out during Fall 2014 at Texas Tech University. The
purpose of the project is to build a multiplier based on Booth's algorithm in Verilog, together
with supporting work done in Cadence.

Booth's algorithm is used here for the simulation and development of a digital multiplier. It is a
powerful algorithm for signed-number multiplication that treats positive and negative numbers
uniformly, and it needs fewer additions and subtractions than more straightforward shift-and-add
algorithms. This work evaluates the performance of the design in terms of delay, power, and their
product, both by hand using logical effort and through a custom design written in Verilog with the
Xilinx ISE 14.2 tool.



1. Introduction:

Booth's multiplication algorithm is a multiplication algorithm that multiplies two signed binary

numbers in two's complement notation. The algorithm was invented by Andrew Donald Booth in

1950 while doing research on crystallography at Birkbeck College in Bloomsbury, London. Booth

used desk calculators that were faster at shifting than adding and created the algorithm to increase

their speed. Booth's algorithm is of interest in the study of computer architecture.

Multiplication is more complicated than addition because it is implemented by shifting as well as
adding. It amounts to generating partial products and accumulating them. Because of the partial
products involved in most multiplication algorithms, more time and more circuit area are required
to compute, allocate, and sum the partial products to obtain the multiplication result.


A Booth multiplier is a hardware multiplier that multiplies two signed (two's complement) binary
integers. The radix-4 form of Booth's algorithm, which encodes a binary number one bit-pair at a
time into the signed-digit set S = {-2, -1, 0, 1, 2}, is often used to encode one of the multiplier
inputs to reduce the number of partial products that need to be added.

Signed multiplication requires more care than unsigned multiplication. With unsigned operands there
is no sign to take into consideration. With signed operands the same process cannot be applied
directly, because the numbers are in two's complement form and multiplying their bit patterns as if
they were unsigned yields an incorrect result. This is where Booth's algorithm comes in: it
preserves the sign of the result.
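The difference between the two cases can be seen with a small numeric example. The following Python sketch is only an illustration (the helper names `to_pattern` and `to_signed` are assumptions made for this example, not part of the project code): it multiplies the 4-bit patterns of -3 and 2 as if they were unsigned, and then again after sign extension to 8 bits.

```python
def to_pattern(value, bits):
    """Two's complement bit pattern of value, as a non-negative integer."""
    return value & ((1 << bits) - 1)


def to_signed(pattern, bits):
    """Read the low `bits` bits of pattern as a two's complement value."""
    pattern &= (1 << bits) - 1
    if pattern & (1 << (bits - 1)):
        return pattern - (1 << bits)
    return pattern


BITS = 4
a, b = -3, 2

# Multiplying the raw 4-bit patterns as unsigned numbers: 1101 * 0010.
naive = to_signed(to_pattern(a, BITS) * to_pattern(b, BITS), 2 * BITS)

# Sign-extending both operands to 8 bits first, then keeping the low 8 bits.
extended = to_signed(to_pattern(a, 2 * BITS) * to_pattern(b, 2 * BITS), 2 * BITS)

print(naive, extended)  # 26 -6
```

The unsigned product of the raw patterns is 26, while the correct signed product is -6.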

Booth multiplication is a technique that allows for smaller, faster multiplication circuits, by recoding

the numbers that are multiplied. This approach uses fewer additions and subtractions than more

straightforward algorithms.

1.1 Algorithm:

Booth's algorithm examines adjacent pairs of bits of the N-bit multiplier Y in signed two's
complement representation, including an implicit bit below the least significant bit, y_{-1} = 0.
For each bit y_i, for i running from 0 to N-1, the bits y_i and y_{i-1} are considered. Where these
two bits are equal, the product accumulator P is left unchanged. Where y_i = 0 and y_{i-1} = 1, the
multiplicand times 2^i is added to P; and where y_i = 1 and y_{i-1} = 0, the multiplicand times 2^i
is subtracted from P. The final value of P is the signed product.

The representations of the multiplicand and product are not specified; typically, these are both
also in two's complement representation, like the multiplier, but any number system that supports
addition and subtraction will work as well. As stated here, the order of the steps is not
determined. Typically, it proceeds from LSB to MSB, starting at i = 0; the multiplication by 2^i is
then typically replaced by incremental shifting of the P accumulator to the right between steps;
low bits can be shifted out, and subsequent additions and subtractions can then be done just on the
highest N bits of P.[1] There are many variations and optimizations on these details.

The algorithm is often described as converting strings of 1's in the multiplier to a high-order +1 and a

low-order –1 at the ends of the string. When a string runs through the MSB, there is no high-order +1,

and the net effect is interpretation as a negative of the appropriate value.
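The rule above can be written down directly as a software sketch. The Python below is a minimal illustration, not the hardware design; the function names `booth_digits` and `booth_product` are assumptions made for this example. Each digit is y_{i-1} - y_i, so it is +1 where the text says "add", -1 where it says "subtract", and 0 where the two bits are equal.

```python
def booth_digits(y, n):
    """Booth digits d_i = y_{i-1} - y_i for i = 0..n-1, with y_{-1} = 0.

    y is a signed integer that fits in n bits; digits are returned LSB first.
    """
    pattern = y & ((1 << n) - 1)      # two's complement bit pattern of y
    digits = []
    prev = 0                          # implicit bit below the LSB
    for i in range(n):
        bit = (pattern >> i) & 1
        digits.append(prev - bit)     # +1: add, -1: subtract, 0: no change
        prev = bit
    return digits


def booth_product(m, y, n):
    """Accumulate P by adding or subtracting m * 2^i according to the digits."""
    p = 0
    for i, d in enumerate(booth_digits(y, n)):
        p += d * m * (1 << i)
    return p


print(booth_digits(6, 4))    # [0, -1, 0, 1]
print(booth_digits(-2, 4))   # [0, -1, 0, 0]
print(booth_product(5, -2, 4))  # -10
```

For 6 (0110) the string of 1's becomes a +1 above it and a -1 at its low end. For -2 (1110) the string runs through the MSB, so there is no +1 and the value is negative.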

1.2 Implementation:

Booth's algorithm can be implemented by repeatedly adding (with ordinary unsigned binary addition)

one of two predetermined values A and S to a product P, then performing a rightward arithmetic shift

on P. Let m and r be the multiplicand and multiplier, respectively; and let x and y represent the

number of bits in m and r.

1. Determine the values of A and S, and the initial value of P. All of these numbers should have a
   length equal to (x + y + 1).
   A: Fill the most significant (leftmost) x bits with the value of m. Fill the remaining (y + 1)
      bits with zeros.
   S: Fill the most significant x bits with the value of (−m) in two's complement notation. Fill
      the remaining (y + 1) bits with zeros.
   P: Fill the most significant x bits with zeros. To the right of this, append the value of r.
      Fill the least significant (rightmost) bit with a zero.
2. Determine the two least significant (rightmost) bits of P.
   If they are 01, find the value of P + A. Ignore any overflow.
   If they are 10, find the value of P + S. Ignore any overflow.
   If they are 00, do nothing. Use P directly in the next step.
   If they are 11, do nothing. Use P directly in the next step.
3. Arithmetically shift the value obtained in step 2 by a single place to the right. Let P now
   equal this new value.
4. Repeat steps 2 and 3 until they have been done y times.
5. Drop the least significant (rightmost) bit from P. This is the product of m and r.
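The procedure above can be sketched in software as follows. This Python function is an illustrative model only (the name `booth_multiply` is an assumption of this example). It keeps A, S, and P as (x + y + 1)-bit integers and assumes that m is not the most negative x-bit value, since the negation of that value does not fit in x bits.

```python
def booth_multiply(m, r, x, y):
    """Multiply m (x bits) by r (y bits) using the A / S / P procedure.

    Assumption: m is not the most negative x-bit value.
    """
    width = x + y + 1
    mask = (1 << width) - 1
    a = (m & ((1 << x) - 1)) << (y + 1)      # A: m in the top x bits
    s = (-m & ((1 << x) - 1)) << (y + 1)     # S: -m in the top x bits
    p = (r & ((1 << y) - 1)) << 1            # P: zeros, then r, then a 0 bit

    for _ in range(y):
        low2 = p & 0b11
        if low2 == 0b01:
            p = (p + a) & mask               # ignore overflow
        elif low2 == 0b10:
            p = (p + s) & mask               # ignore overflow
        msb = p & (1 << (width - 1))
        p = (p >> 1) | msb                   # arithmetic shift right by one

    p >>= 1                                  # drop the least significant bit
    bits = x + y
    return p - (1 << bits) if p & (1 << (bits - 1)) else p


print(booth_multiply(3, -4, 4, 4))  # -12
```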


1.3 Flow Chart:


1.4 Example:

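The technique above fails when the multiplicand is the most negative representable value (here -8
in 4 bits), because its negation does not fit in x bits. One remedy is to add one extra bit to the
left of A, S, and P. We demonstrate this by multiplying -8 by 2 using 4 bits for the multiplicand
and the multiplier:

A = 1 1000 0000 0

S = 0 1000 0000 0

P = 0 0000 0010 0

Perform the loop four times: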

P = 0 0000 0010 0. The last two bits are 00.

P = 0 0000 0001 0. Right shift.

P = 0 0000 0001 0. The last two bits are 10.

P = 0 1000 0001 0. P = P + S.

P = 0 0100 0000 1. Right shift.

P = 0 0100 0000 1. The last two bits are 01.

P = 1 1100 0000 1. P = P + A.

P = 1 1110 0000 0. Right shift.

P = 1 1110 0000 0. The last two bits are 00.

P = 1 1111 0000 0. Right shift.

The product is 11110000 (after discarding the first and the last bit) which is −16.
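The trace above can be reproduced with a short Python sketch. It is an illustration under the assumption that A, S, and P carry one extra bit on the left, as in the example; the function name `booth_trace` is hypothetical. It returns the value of P before the loop and after each right shift, together with the final product.

```python
def booth_trace(m, r, x, y):
    """Run the A / S / P loop with one extra leading bit and record P.

    Returns (trace, product), where trace holds P as bit strings:
    the initial value, then the value after each right shift.
    """
    width = x + y + 2                         # one extra bit on the left
    mask = (1 << width) - 1
    xmask = (1 << (x + 1)) - 1                # m and -m held in x + 1 bits
    a = (m & xmask) << (y + 1)
    s = (-m & xmask) << (y + 1)
    p = (r & ((1 << y) - 1)) << 1

    def fmt(v):
        return format(v, "0{}b".format(width))

    trace = [fmt(p)]
    for _ in range(y):
        low2 = p & 0b11
        if low2 == 0b01:
            p = (p + a) & mask
        elif low2 == 0b10:
            p = (p + s) & mask
        p = (p >> 1) | (p & (1 << (width - 1)))   # arithmetic shift right
        trace.append(fmt(p))

    bits = x + y
    q = (p >> 1) & ((1 << bits) - 1)          # discard the first and last bit
    product = q - (1 << bits) if q & (1 << (bits - 1)) else q
    return trace, product


trace, product = booth_trace(-8, 2, 4, 4)
print(trace[-1], product)  # 1111100000 -16
```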

2.0 Multiplication of two 4 bit signed binary numbers:

Having discussed the flowchart and an example of Booth's algorithm, implementing it for wider
operands is straightforward. A 4-bit signed binary number ranges from -8 to 7. An input greater
than 7 therefore wraps around: the program reads the bit pattern as a two's complement value and
treats it as a negative number. The product of two 4-bit signed binary numbers is an 8-bit result.
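The operand range and the wrap-around of out-of-range inputs can be checked with a few lines of Python. This is a sketch for illustration only; `signed_range` and `wrap` are names assumed for this example.

```python
def signed_range(bits):
    """Smallest and largest value of a two's complement number of this width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def wrap(value, bits):
    """Value read back from a signed register of this width after writing value."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value


print(signed_range(4))   # (-8, 7)
print(wrap(12, 4))       # -4
print(signed_range(8))   # (-128, 127)

# Every product of two 4-bit signed operands fits in 8 bits.
lo, hi = signed_range(4)
products = [a * b for a in range(lo, hi + 1) for b in range(lo, hi + 1)]
print(min(products), max(products))  # -56 64
```

The largest product, 64, comes from -8 times -8, and it still lies inside the 8-bit signed range.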


2.0.1 Verilog code for 4 bit Binary number:

module Multi4bit(X,Y,Z);
input signed [3:0] X,Y;
output signed [7:0] Z;
reg signed [7:0] Z;
reg [1:0] temp_check;
integer i;
reg checkBit;
reg [3:0] Y1;
//Combinational block: Z is recomputed whenever X or Y changes.
//Note: Y = -8 is not handled, because -Y does not fit in 4 bits.
always @ (X,Y)
begin
    Z = 8'd0;
    checkBit = 1'b0; //Implicit bit below the LSB of X
    //Number of shifts is equal to number of bits of operation
    for (i=0 ; i<4 ; i=i+1)
    begin
        temp_check = {X[i],checkBit};
        Y1 = -Y;
        case (temp_check)
            2'd2 : begin
                //If temp_check is 10, subtract Y from Z, i.e., add Z and Y1
                Z[7:4] = Z[7:4]+Y1;
            end
            2'd1 : begin
                //If temp_check is 01, add Y to Z
                Z[7:4] = Z[7:4]+Y;
            end
            default : begin //If temp_check is 00 or 11, do nothing
            end
        endcase
        //After add or sub or default case, right shift the Z by 1
        Z = Z>>1;
        //Restore the sign bit.
        Z[7] = Z[6];
        //New check bit is equal to current X bit
        checkBit = X[i];
    end
end
endmodule
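The shift-and-add loop of the module can be checked against a small software model. The Python sketch below is an assumption-laden illustration, not generated from the Verilog: it assumes that Y (or its negation Y1) is added to the upper N bits of the 2N-bit register Z, that the shift is logical, and that the sign bit is then restored from the bit below it. The name `shift_add_model` is hypothetical.

```python
def shift_add_model(x, y, n):
    """Software model of the loop: X is the multiplier, Y the multiplicand."""
    nmask = (1 << n) - 1
    z = 0
    check_bit = 0                                 # implicit bit below X[0]
    for i in range(n):
        x_bit = (x >> i) & 1
        pair = (x_bit << 1) | check_bit           # temp_check = {X[i], checkBit}
        upper = z >> n
        if pair == 0b10:
            upper = (upper + (-y & nmask)) & nmask    # upper half of Z plus Y1
        elif pair == 0b01:
            upper = (upper + (y & nmask)) & nmask     # upper half of Z plus Y
        z = (upper << n) | (z & nmask)
        z >>= 1                                   # logical right shift
        z |= (z & (1 << (2 * n - 2))) << 1        # restore the sign bit
        check_bit = x_bit
    return z - (1 << (2 * n)) if z & (1 << (2 * n - 1)) else z


print(shift_add_model(2, 3, 4))    # 6
print(shift_add_model(-3, 5, 4))   # -15
print(shift_add_model(1, -8, 4))   # 8
```

Under these assumptions the model agrees with ordinary multiplication for every 4-bit X and every Y from -7 to 7. The last call shows the one failing case: with Y = -8 the negation overflows 4 bits and the model returns 8 instead of -8.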





2.0.2 Test Bench

module tb_Multi4bit;
// Inputs (declared signed so that %d prints negative values correctly)
reg signed [3:0] X;
reg signed [3:0] Y;
// Outputs
wire signed [7:0] Z;
// Instantiate the Unit Under Test (UUT)
Multi4bit uut (
    .X(X),
    .Y(Y),
    .Z(Z) );
initial begin
    // Initialize Inputs
    X = 4'd2;
    Y = 4'd3;
    $monitor ("X=%d, NegX=%d, Y=%d , Z=%d, NegZ=%d",X,-X, Y, Z, -Z );
    #50; // Add stimulus here
end
endmodule




2.0.3 Results:

Fig 1. 4-bit output

2.0.4 Synthesis Report:

2.0.5 Schematic:

Fig 2. Schematic - 4 bit

2.0.6 Power calculation:

Fig 3. Power calculation - 4 bit


2.1 Multiplication of two 8 bit signed binary numbers

The same algorithm extends directly to 8-bit operands. An 8-bit signed binary number ranges from
-128 to 127. An input greater than 127 therefore wraps around: the program reads the bit pattern as
a two's complement value and treats it as a negative number. The product of two 8-bit signed binary
numbers is a 16-bit result.

2.1.1 Verilog code for 8 bit:

module Multi8bit(X,Y,Z);
input signed [7:0] X,Y;
output signed [15:0] Z;
reg signed [15:0] Z;
reg [1:0] temp_check;
integer i;
reg checkBit;
reg [7:0] Y1;
//Combinational block: Z is recomputed whenever X or Y changes.
//Note: Y = -128 is not handled, because -Y does not fit in 8 bits.
always @ (X,Y)
begin
    Z = 16'd0;
    checkBit = 1'b0; //Implicit bit below the LSB of X
    //Number of shifts is equal to number of bits of operation
    for (i=0 ; i<8 ; i=i+1)
    begin
        temp_check = {X[i],checkBit};
        Y1 = -Y;
        case (temp_check)
            2'd2 : begin
                //If temp_check is 10, subtract Y from Z, i.e., add Z and Y1
                Z[15:8] = Z[15:8]+Y1;
            end
            2'd1 : begin
                //If temp_check is 01, add Y to Z
                Z[15:8] = Z[15:8]+Y;
            end
            default : begin //If temp_check is 00 or 11, do nothing
            end
        endcase
        //After add or sub or default case, right shift the Z by 1
        Z = Z>>1;
        //Restore the sign bit.
        Z[15] = Z[14];
        //New check bit is equal to current X bit
        checkBit = X[i];
    end
end
endmodule





2.1.2 Test Bench

module tb_Multi8bit;
// Inputs (declared signed so that %d prints negative values correctly)
reg signed [7:0] X;
reg signed [7:0] Y;
// Outputs
wire signed [15:0] Z;
// Instantiate the Unit Under Test (UUT)
Multi8bit uut (
    .X(X),
    .Y(Y),
    .Z(Z) );
initial begin
    // Initialize Inputs
    X = 8'd2;
    Y = -8'd12;
    $monitor ("X=%d, NegX=%d, Y=%d , Z=%d, NegZ=%d",X,-X, Y, Z, -Z );
    #50; // Add stimulus here
end
endmodule



2.1.3 Test Results:


Fig 4. 8-bit output

2.1.4 HDL Synthesis Report

Macro Statistics

# Adders/Subtractors : 15

8-bit adder : 15

# Multiplexers : 64

1-bit 4-to-1 multiplexer : 64


2.1.5 Schematic:

Fig 5. Schematic - 8 bit

2.1.6 Time delay:

Fig 6. Total time delay

The total time delay for the 8-bit multiplier is 36.640 ns.

2.1.7 Total Power:

Fig 7. Total power

The total power consumed is 34 mW.

2.2 Verilog Code for 16 bit multiplier:

module BoothAlgthm16bit_VCode(X,Y,Z);
input signed [15:0] X,Y;
output signed [31:0] Z;
reg signed [31:0] Z;
reg [1:0] temp_check;
integer i;
reg checkBit;
reg [15:0] Y1;
//Combinational block: Z is recomputed whenever X or Y changes.
//Note: Y = -32768 is not handled, because -Y does not fit in 16 bits.
always @ (X,Y)
begin
    Z = 32'd0;
    checkBit = 1'b0; //Implicit bit below the LSB of X
    //Number of shifts is equal to number of bits of operation
    for (i=0 ; i<16 ; i=i+1)
    begin
        temp_check = {X[i],checkBit};
        Y1 = -Y;
        case (temp_check)
            2'd2 : begin
                //If temp_check is 10, subtract Y from Z, i.e., add Z and Y1
                Z[31:16] = Z[31:16]+Y1;
            end
            2'd1 : begin
                //If temp_check is 01, add Y to Z
                Z[31:16] = Z[31:16]+Y;
            end
            default : begin
                //If temp_check is 00 or 11, do nothing
            end
        endcase
        //After add or sub or default case, right shift the Z by 1
        Z = Z>>1;
        //Restore the sign bit.
        Z[31] = Z[30];
        //New check bit is equal to current X bit
        checkBit = X[i];
    end
end
endmodule





2.2.1 Test Bench

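module tb_Multi;
// Inputs (declared signed so that %d prints negative values correctly)
reg signed [15:0] X;
reg signed [15:0] Y;
// Outputs
wire signed [31:0] Z;
// Instantiate the Unit Under Test (UUT)
BoothAlgthm16bit_VCode uut (
    .X(X),
    .Y(Y),
    .Z(Z) );
initial begin
    // Initialize Inputs
    X = 16'd2555;
    Y = -16'd2;
    $monitor ("X=%d, NegX=%d, Y=%d , Z=%d, NegZ=%d",X,-X, Y, Z, -Z );
    #50;
end
endmodule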




2.2.2 Test Results:

Fig 8. 16-bit output

2.2.3 Total Power:

Fig 9. Total power

The total power consumed is 34 mW.

2.2.4 Time delay

Fig 10. Total time delay

The total time delay for the circuit is 61.251 ns.

2.2.5 HDL Synthesis Report

Macro Statistics

# Adders/Subtractors : 31

16-bit adder : 31

# Multiplexers : 256

1-bit 4-to-1 multiplexer : 256


2.3 Total number of modules used:

Module      4 bit               8 bit               16 bit
Adders      7 (8-bit adders)    15 (8-bit adders)   31 (16-bit adders)
Mux 4:1     19                  64                  256

Table 2.1 Total number of modules

2.4 Power and Delay Comparison:

Parameter     4 bit     8 bit     16 bit
Power (mW)    13.859    34        34
Delay (ns)    27        36.64     61.259

Table 2.2 Power and Delay Comparison
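Since the design is also meant to be judged by the product of power and delay, that figure can be derived from the table. The Python sketch below only multiplies the tabulated values; the products are computed here and were not measured, and the 4-bit delay is assumed to be in nanoseconds like the other two. A power in mW times a delay in ns gives an energy in pJ.

```python
# (power in mW, delay in ns), copied from the power and delay table.
# Assumption: the 4-bit delay of 27 is in ns, like the 8-bit and 16-bit values.
results = {
    "4 bit": (13.859, 27.0),
    "8 bit": (34.0, 36.64),
    "16 bit": (34.0, 61.259),
}


def power_delay_product(power_mw, delay_ns):
    """Power-delay product in picojoules (mW * ns = pJ)."""
    return power_mw * delay_ns


pdp = {name: power_delay_product(p, d) for name, (p, d) in results.items()}
for name, value in pdp.items():
    print("{}: {:.3f} pJ".format(name, value))
```

With these inputs the products come out at roughly 374 pJ, 1246 pJ, and 2083 pJ for the 4-bit, 8-bit, and 16-bit designs.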

2.5 Future Work:

The standard Booth algorithm speeds up the multiplication process. For a further improvement in
speed, the modified Booth algorithm can be used: radix-4 Booth recoding halves the number of
partial products. The basic idea is that, instead of shifting and adding for every column of the
multiplier and multiplying by 1 or 0, only every second column is taken and the multiplicand is
multiplied by ±1, ±2, or 0 to obtain the same result. A radix-4 Booth encoder selects the multiple
of the multiplicand based on the multiplier bits, examining 3 bits at a time with one bit of
overlap between adjacent groups.
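Radix-4 recoding can be sketched in a few lines of Python. This is an illustrative model of the standard recoding rule, not part of the implemented design; the function names are assumptions of this example. Each group of three overlapping bits (y_{2k+1}, y_{2k}, y_{2k-1}) produces the digit -2*y_{2k+1} + y_{2k} + y_{2k-1}, which lies in {-2, -1, 0, 1, 2}.

```python
def radix4_booth_digits(y, n):
    """Recode an n-bit two's complement y (n even) into n/2 radix-4 digits.

    Digits are returned least significant first, each in {-2, -1, 0, 1, 2}.
    """
    if n % 2:
        raise ValueError("n must be even")
    digits = []
    prev = 0                              # implicit bit below the LSB
    for i in range(0, n, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(prev + b0 - 2 * b1)  # overlapping 3-bit window
        prev = b1
    return digits


def radix4_multiply(m, y, n):
    """Sum one partial product per radix-4 digit (half as many as radix-2)."""
    return sum(d * m * 4 ** k for k, d in enumerate(radix4_booth_digits(y, n)))


print(radix4_booth_digits(6, 4))     # [-2, 2]
print(radix4_multiply(2, -12, 8))    # -24
```

An 8-bit multiplier needs only four partial products this way, compared with eight in the radix-2 loop.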

Power estimation is a limitation of the Verilog flow, so in future it would be better to implement
the same design in Cadence in order to obtain more accurate power and delay figures.


3.0 Modules implemented in Cadence:

The following basic modules are implemented using Cadence:

1. Inverter

2. Nor

3. Nand

4. Xor

5. Half Adder

6. Full Adder

7. Decoder

8. Multiplexer

9. D Flip-flop

10. Shifter

11. Adder-Subtractor
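To show how several of these modules fit together, the Python sketch below models a half adder, a full adder, and a ripple-carry adder-subtractor at gate level. The schematics themselves are not reproduced in the text, so this follows the textbook construction and is an assumption, not necessarily the circuits drawn in Cadence; all function names are hypothetical.

```python
def half_adder(a, b):
    """Return (sum, carry) built from one XOR and one AND gate."""
    return a ^ b, a & b


def full_adder(a, b, cin):
    """Two half adders plus an OR gate for the carry out."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, cin)
    return s2, c1 | c2


def adder_subtractor(a_bits, b_bits, sub):
    """Ripple-carry add (sub = 0) or subtract (sub = 1); bits are LSB first.

    For subtraction each b bit is inverted by an XOR gate and the carry in
    is 1, which adds the two's complement of b.
    """
    carry = sub
    out = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b ^ sub, carry)
        out.append(s)
    return out, carry


def to_bits(value, n):
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


total, _ = adder_subtractor(to_bits(5, 4), to_bits(3, 4), 0)
diff, _ = adder_subtractor(to_bits(5, 4), to_bits(3, 4), 1)
print(from_bits(total), from_bits(diff))  # 8 2
```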

Nand Schematic:

Fig 11. Nand schematic

Nand Delay:

Fig 12. Nand delay

Delay: 78.4239 ps

Nand Power:

Fig 13. Nand power

Power: 3.16 uW

Nor Schematic:

Fig 14. Nor schematic

Nor Delay:

Fig 15. Nor delay

Delay: 78.423 ps

Nor Power:

Fig 16. Nor power

Power: 6.802 uW

Xor Schematic:

Fig 17. XOR schematic


Xor Output:

Fig 18. XOR output

Half Adder Schematic:

Fig 19. Half adder schematic


Half Adder Output:

Fig 20. Half adder output

Full Adder Schematic:

Fig 21. Full Adder schematic


Full Adder Output:

Fig 22. Full Adder output

Multiplexer schematic:

Fig 23. Multiplexer schematic


Multiplexer TB:

Fig 24. Multiplexer TB

Multiplexer Output:

Fig 25. Multiplexer output


Decoder Schematic:

Fig 26. Decoder Schematic

Decoder TB:

Fig 27. Decoder TB

Decoder Output:

Fig 28. Decoder output


D Flip-flop Schematic:

Fig 29. D Flip-flop schematic

D Flip-flop TB:

Fig 30. D Flip-flop TB

D Flip-flop Output:

Fig 31. D Flip-flop output

Adder-Subtractor Schematic:

Fig 32. Adder-Subtractor schematic

Adder-Subtractor TB:

Fig 33. Adder-Subtractor TB

Adder-Subtractor Output:

Fig 34. Adder-Subtractor output

4.0 Conclusion

The multiplication technique reviewed in this report reduces the maximum height of the partial
product array, which simplifies the partial product reduction tree in terms of delay and regularity
of the layout. Among the different techniques and designs for a MAC unit, pipelined Booth
multiplication gives good performance in terms of speed, while SPST and block-enabling techniques
are better for low power consumption and area.

5.0 References

1. International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)

2. M. Morris Mano and Michael D. Ciletti, Digital Design, 4th edition.

3. Douglas A. Pucknell, Introduction to VLSI.

4. Fabrizio Lamberti, Nikolaos Andrikos, Elisardo Antelo, Paolo Montuschi, "Speeding-up
   Booth Encoded Multipliers by Reducing the Size of Partial Product Array," INTERNAL

5. Sandeep Shrivastava, Jaikaran Singh, and Mukesh Tiwari, "Implementation of Radix-2
   Booth Multiplier and Comparison with Radix-4 Encoder Booth Multiplier," International
   Journal on Emerging Technologies 2(1): 14-16 (2011), ISSN: 0975-8364.

6. Ravi Shankar Mishra, Puran Gour, Braj Bihari Soni, "Design and Implements of
   Booth and Robertson's multipliers algorithm on FPGA," International Journal of Engineering
   Research and Applications (IJERA), ISSN: 2248-9622.