Multiplier using Booth Algorithm
Project Report
Submitted
in the partial fulfillment of the requirements for
the award of ECE -5382
MASTERS
In
Electronics and Computer Engineering
By
Santhosh Kumar Vempati (R11344923)
Yaswanth Popuri (R11358263)
Under the Guidance of
Dr. Tooraj Nikoubin
DEPARTMENT OF ELECTRONICS AND COMPUTER ENGINEERING
TEXAS TECH UNIVERSITY
FALL 2014
ACKNOWLEDGEMENT
To discover, analyse and present something new is to venture on an
untrodden path towards an unexplored destination, an arduous adventure unless one
finds a true torchbearer to show the way. We would never have succeeded in
completing our task without the cooperation, encouragement and help provided to us
by various people. Words are often too few to reveal one's deep regard. We take
this opportunity to express our profound sense of gratitude and respect to all those
who helped us through the duration of this work. We acknowledge with gratitude
and humility our indebtedness to Dr. Tooraj Nikoubin, ECE, Texas Tech University,
under whose guidance we had the privilege to complete this project. We wish to
express our deep gratitude to him for providing individual guidance and support
throughout the work.
Santhosh Kumar Vempati R11344923
Yashwanth Popuri R11358263
ABSTRACT
This report describes the work we carried out during Fall 2014 at Texas Tech University. The
purpose of the project is to build a multiplier based on the Booth algorithm in Verilog, together
with work done in Cadence.
The Booth algorithm is used here for the simulation and development of a digital multiplier. It is a
powerful algorithm for signed-number multiplication that treats positive and negative
numbers uniformly. It performs multiplication with a small number of additions and shift
operations, using fewer additions and subtractions than more straightforward algorithms.
This work evaluates the performance of the design in terms of delay, power and their product,
by hand with logical effort and through a custom design written in Verilog using the
Xilinx ISE 14.2 tool.
INDEX 1
Acknowledgement
Abstract
Index 1
Index 2: List of Figures
Index 3: List of Tables
1. Introduction
1.1 Algorithm
1.2 Implementation
1.3 Flow Chart
1.4 Example
2.0 Multiplication of two 4 bit signed numbers
2.0.1 Verilog Code for 4 bit binary numbers
2.0.2 Test bench
2.0.3 Results
2.0.4 Synthesis Report
2.0.5 Schematic
2.0.6 Power Calculation
2.1 Multiplication of two 8 bit signed numbers
2.1.1 Verilog Code for 8 bit binary numbers
2.1.2 Test bench
2.1.3 Results
2.1.4 Synthesis Report
2.1.5 Schematic
2.1.6 Power Calculation
2.1.7 Delay
2.2 Multiplication of two 16 bit signed numbers
2.2.1 Verilog Code for 16 bit binary numbers
2.2.2 Test bench
2.2.3 Results
2.2.4 Synthesis Report
2.2.5 Schematic
2.2.6 Power Calculation
2.2.7 Delay
2.3 Total number of modules used
2.4 Power delay comparison
2.5 Future Work
3.0 Modules Implemented in Cadence
INDEX-II
LIST OF FIGURES
S.No  FIGURE No  TITLE
1     Figure 1   4-bit Output
2     Figure 2   4-bit Schematic
3     Figure 3   Power Calculation 4 bit
4     Figure 4   8-bit Output
5     Figure 5   8-bit Schematic
6     Figure 6   Total Time delay
7     Figure 7   Total Power
8     Figure 8   16-bit Output
9     Figure 9   Total Power - 16 bit
10    Figure 10  Time Delay - 16 bit
11    Figure 11  Nand Schematic
12    Figure 12  Nand delay
13    Figure 13  Nand Power
14    Figure 14  Nor Schematic
15    Figure 15  Nor delay
16    Figure 16  XOR schematic
17    Figure 17  XOR output
18    Figure 18  Half adder schematic
19    Figure 19  Half adder output
20    Figure 20  Full Adder schematic
21    Figure 21  Full Adder output
22    Figure 22  Multiplexer schematic
23    Figure 23  Multiplexer TB
24    Figure 24  Multiplexer Output
25    Figure 25  Decoder Schematic
26    Figure 26  Decoder TB
27    Figure 27  Decoder Output
28    Figure 28  D Flip-flop schematic
29    Figure 29  D Flip-flop TB
30    Figure 30  D Flip-flop output
31    Figure 31  Adder Subtractor schematic
32    Figure 32  Adder Subtractor TB
33    Figure 33  Adder Subtractor output
INDEX-III
LIST OF TABLES
S.No  TABLE No   TITLE
1     Table 2.1  Total number of modules
2     Table 2.2  Power and Delay Comparison
1. Introduction:
Booth's multiplication algorithm multiplies two signed binary numbers in two's complement
notation. The algorithm was invented by Andrew Donald Booth in 1950 while doing research on
crystallography at Birkbeck College in Bloomsbury, London. Booth used desk calculators that were
faster at shifting than adding and created the algorithm to increase their speed. Booth's algorithm
is of interest in the study of computer architecture.
Multiplication is more complicated than addition, being implemented by shifting as well as
addition. In hardware, multiplication amounts to generating partial products and accumulating them.
Because of the partial products involved in most multiplication algorithms, more time and more
circuit area are required to compute, allocate, and sum the partial products to obtain the
multiplication result.
A Booth multiplier is a hardware multiplier that performs multiplication of two signed (two's
complement) binary integers. Booth recoding, which encodes a binary number one bit-pair at a
time into the signed-digit set S = {-2, -1, 0, 1, 2}, is often used to encode one of the multiplier
inputs to reduce the number of partial products that need to be added.
Signed multiplication requires care. With unsigned multiplication there is no need to take the
sign of the number into consideration. In signed multiplication the same process cannot be
applied directly, because the signed number is in two's complement form and would yield an
incorrect result if multiplied in the same fashion as an unsigned number. That is where Booth's
algorithm comes in: it preserves the sign of the result.
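As a small illustration of why the unsigned procedure cannot simply be reused, the following simulation-only sketch multiplies the same 4 bit patterns once as unsigned and once as signed values. The module and signal names are illustrative assumptions and are not part of the project code.

```verilog
// Illustrative only: same bit patterns, unsigned vs. signed interpretation.
module signed_vs_unsigned_demo;
  reg        [3:0] a_u, b_u;   // operands read as unsigned
  reg signed [3:0] a_s, b_s;   // same bit patterns read as two's complement
  reg        [7:0] p_u;
  reg signed [7:0] p_s;
  initial begin
    a_u = 4'b1110; b_u = 4'b0011;   // 14 and 3
    a_s = 4'b1110; b_s = 4'b0011;   // -2 and 3
    p_u = a_u * b_u;                // operands are zero-extended to 8 bits
    p_s = a_s * b_s;                // operands are sign-extended to 8 bits
    $display("unsigned: %b (%0d)", p_u, p_u);
    $display("signed  : %b (%0d)", p_s, p_s);
  end
endmodule
```

The unsigned product is 00101010 (42) and the signed product is 11111010 (-6). The low four bits agree, but the upper bits differ, and those are the bits a signed multiplier has to get right.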
Booth multiplication is a technique that allows for smaller, faster multiplication circuits, by recoding
the numbers that are multiplied. This approach uses fewer additions and subtractions than more
straightforward algorithms.
1.1 Algorithm:
Booth's algorithm examines adjacent pairs of bits of the N-bit multiplier Y in signed two's
complement representation, including an implicit bit below the least significant bit, y_(-1) = 0. For each
bit y_i, for i running from 0 to N-1, the bits y_i and y_(i-1) are considered. Where these two bits are equal,
the product accumulator P is left unchanged. Where y_i = 0 and y_(i-1) = 1, the multiplicand times 2^i is
added to P; and where y_i = 1 and y_(i-1) = 0, the multiplicand times 2^i is subtracted from P. The final
value of P is the signed product.
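The rule can also be written as a single sum. Writing M for the multiplicand (a symbol introduced here only for illustration), the three cases correspond to the factor (y_(i-1) - y_i), and the sum telescopes to the two's complement value of Y:

```latex
\begin{align*}
P &= M \sum_{i=0}^{N-1} (y_{i-1} - y_i)\,2^{i}, \qquad y_{-1} = 0,\\
\sum_{i=0}^{N-1} (y_{i-1} - y_i)\,2^{i}
  &= \sum_{i=0}^{N-2} y_i\,2^{i+1} - \sum_{i=0}^{N-1} y_i\,2^{i}
   = -\,y_{N-1}\,2^{N-1} + \sum_{i=0}^{N-2} y_i\,2^{i}.
\end{align*}
```

The right-hand side is exactly the value of Y read as an N-bit two's complement number, so P = M x Y.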
The representations of the multiplicand and product are not specified; typically, these are both
also in two's complement representation, like the multiplier, but any number system that supports
addition and subtraction will work as well. As stated here, the order of the steps is not determined.
Typically, it proceeds from LSB to MSB, starting at i = 0; the multiplication by 2^i is then typically
replaced by incremental shifting of the P accumulator to the right between steps; low bits can be
shifted out, and subsequent additions and subtractions can then be done just on the highest N bits
of P.[1] There are many variations and optimizations on these details.
The algorithm is often described as converting strings of 1's in the multiplier to a high-order +1 and a
low-order –1 at the ends of the string. When a string runs through the MSB, there is no high-order +1,
and the net effect is interpretation as a negative of the appropriate value.
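Two small cases, with values chosen here purely for illustration, show this view. The 8 bit multiplier 0011 1110 has one string of 1's from bit 1 to bit 5, and the 4 bit multiplier 1110 has a string that runs through the MSB:

```latex
\begin{align*}
0011\,1110_2 &= 2^5 + 2^4 + 2^3 + 2^2 + 2^1 = 62 = 2^{6} - 2^{1},\\
1110_2 \ (\text{4 bit, two's complement}) &= -2^{1} = -2 .
\end{align*}
```

In the first case five additions are replaced by one addition and one subtraction. In the second case there is no high-order +1, so only the subtraction remains.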
1.2 Implementation:
Booth's algorithm can be implemented by repeatedly adding (with ordinary unsigned binary addition)
one of two predetermined values A and S to a product P, then performing a rightward arithmetic shift
on P. Let m and r be the multiplicand and multiplier, respectively; and let x and y represent the
number of bits in m and r.
1. Determine the values of A and S, and the initial value of P. All of these numbers should have a
length equal to (x + y + 1).
A: Fill the most significant (leftmost) bits with the value of m. Fill the remaining (y + 1) bits
with zeros.
S: Fill the most significant bits with the value of (-m) in two's complement notation. Fill the
remaining (y + 1) bits with zeros.
P: Fill the most significant x bits with zeros. To the right of this, append the value of r. Fill the
least significant (rightmost) bit with a zero.
2. Determine the two least significant (rightmost) bits of P.
If they are 01, find the value of P + A. Ignore any overflow.
If they are 10, find the value of P + S. Ignore any overflow.
If they are 00, do nothing. Use P directly in the next step.
If they are 11, do nothing. Use P directly in the next step.
3. Arithmetically shift the value obtained in the 2nd step by a single place to the right. Let P now
equal this new value.
4. Repeat steps 2 and 3 until they have been done y times.
5. Drop the least significant (rightmost) bit from P. This is the product of m and r.
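The procedure above can be sketched as a behavioral Verilog module. This is an illustrative model, not the project code listed later: the module name, the parameter N (with x = y = N) and the signal names are assumptions. It also adds one guard bit on the left of A, S and P, so the vectors are (x + y + 2) bits long, which lets the most negative multiplicand be negated without overflow.

```verilog
// Behavioral sketch of steps 1 to 5; names and the guard bit are assumptions.
module booth_asp_sketch #(parameter N = 4) (
  input  signed [N-1:0]       m,       // multiplicand
  input  signed [N-1:0]       r,       // multiplier
  output reg signed [2*N-1:0] product
);
  localparam W = 2*N + 2;      // guard bit + N bits of m + N bits of r + 1 low bit
  reg signed [N:0] m_ext;      // m sign-extended by one guard bit
  reg signed [N:0] m_neg;      // -m, always representable in N+1 bits
  reg [W-1:0] A, S, P;
  integer k;

  always @* begin
    m_ext = m;
    m_neg = -m_ext;
    // Step 1: build A, S and the initial P
    A = {m_ext, {(N+1){1'b0}}};
    S = {m_neg, {(N+1){1'b0}}};
    P = {{(N+1){1'b0}}, r, 1'b0};
    for (k = 0; k < N; k = k + 1) begin
      // Step 2: inspect the two rightmost bits (overflow is ignored)
      case (P[1:0])
        2'b01:   P = P + A;
        2'b10:   P = P + S;
        default: ;             // 00 or 11: leave P unchanged
      endcase
      // Step 3: arithmetic right shift by one place
      P = {P[W-1], P[W-1:1]};
    end
    // Step 5: drop the guard bit and the rightmost bit
    product = P[2*N:1];
  end
endmodule
```

With N = 4, m = -8 and r = 2, the initial vectors are A = 1 1000 0000 0, S = 0 1000 0000 0 and P = 0 0000 0010 0.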
1.4 Example:
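We demonstrate the technique by multiplying -8 by 2 using 4 bits for the multiplicand and the
multiplier. Because +8, the negation of the multiplicand, cannot be represented in 4 bits, one extra
bit is added on the left of A, S and P, so each is 10 bits long: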
A = 1 1000 0000 0
S = 0 1000 0000 0
P = 0 0000 0010 0
Perform the loop four times :
P = 0 0000 0010 0. The last two bits are 00.
P = 0 0000 0001 0. Right shift.
P = 0 0000 0001 0. The last two bits are 10.
P = 0 1000 0001 0. P = P + S.
P = 0 0100 0000 1. Right shift.
P = 0 0100 0000 1. The last two bits are 01.
P = 1 1100 0000 1. P = P + A.
P = 1 1110 0000 0. Right shift.
P = 1 1110 0000 0. The last two bits are 00.
P = 1 1111 0000 0. Right shift.
The product is 11110000 (after discarding the first and the last bit) which is −16.
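As a cross-check of the trace (a worked calculation added for illustration), the multiplier 0010 contains a single string of 1's at bit 1, so it recodes to one subtraction at weight 2^1 and one addition at weight 2^2, matching the single P + S and single P + A steps above:

```latex
\begin{align*}
r &= 0010_2 = 2^{2} - 2^{1},\\
m \cdot r &= (-8)\cdot 2^{2} - (-8)\cdot 2^{1} = -32 + 16 = -16,\\
1111\,0000_2 &= -2^{7} + 2^{6} + 2^{5} + 2^{4} = -128 + 112 = -16 .
\end{align*}
```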
2.0 Multiplication of two 4 bit signed binary numbers:
Having discussed the flowchart and an example of the Booth algorithm, it is now straightforward
to implement the algorithm for wider operands. A 4 bit signed binary number ranges from -8 to 7
in decimal. So whenever we give an input greater than 7, the program will interpret it as a two's
complement value and treat it as a negative number. The product of two 4 bit signed binary
numbers is an 8 bit result.
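The reinterpretation of a large input as a negative number can be seen in a short simulation-only sketch. The module and signal names are illustrative assumptions.

```verilog
// Illustrative only: one 4 bit pattern read as unsigned and as signed.
module twos_complement_demo;
  reg        [3:0] u;   // unsigned view
  reg signed [3:0] s;   // two's complement view of the same bits
  initial begin
    u = 4'd13;          // bit pattern 1101
    s = u;              // same bits, now interpreted as signed
    $display("bits=%b unsigned=%0d signed=%0d", u, u, s);
  end
endmodule
```

The pattern 1101 prints as 13 when read as unsigned and as -3 when read as signed, since -8 + 4 + 1 = -3.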
2.0.1 Verilog code for 4 bit Binary number:
module Multi4bit(X,Y,Z);
  input signed [3:0] X,Y;
  output signed [7:0] Z;
  reg signed [7:0] Z;
  reg [1:0] temp_check;
  integer i;
  reg checkBit;
  reg [3:0] Y1;
  always @ (X,Y)
  begin
    Z=8'd0;
    checkBit=1'd0;
    //Number of shifts is equal to number of bits of operation
    for (i=0 ; i<4 ; i=i+1)
    begin
      temp_check= {X[i],checkBit};
      Y1= -Y;
      case(temp_check)
        2'd2 : begin
          //If temp_check is 10, subtract Y from the upper 4 bits of Z, i.e., add Y1
          Z[7:4]= Z[7:4]+Y1;
        end
        2'd1 : begin
          //If temp_check is 01, add Y to the upper 4 bits of Z
          Z[7:4]= Z[7:4]+Y;
        end
        default : begin //If temp_check is 00 or 11, do nothing
        end
      endcase
      //After add or sub or default case, right shift the Z by 1
      Z = Z>>1;
      //Restore the sign bit.
      Z[7]= Z[6];
      //New check bit is equal to current X bit
      checkBit=X[i];
    end
  end
endmodule
2.0.2 Test Bench
module tb_Multi4bit;
  // Inputs
  reg [3:0] X;
  reg [3:0] Y;
  // Outputs
  wire [7:0] Z;
  // Instantiate the Unit Under Test (UUT)
  Multi4bit uut (
    .X(X),
    .Y(Y),
    .Z(Z) );
  initial begin
    // Initialize Inputs
    X= 4'd2;
    Y= 4'd3;
    $monitor ("X=%d, NegX=%d, Y=%d , Z=%d, NegZ=%d",X,-X, Y, Z, -Z );
    #50; // Add stimulus here
  end
endmodule
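A single stimulus only checks one product. As a hedged sketch of a more thorough check, the following self-checking test bench sweeps the input range and compares Z against the built-in multiplication. The test bench name, the loop variables and the decision to skip Y = -8 are assumptions made here: Y = -8 is left out because its negation (+8) does not fit in 4 bits, which is the case the example in section 1.4 handles with an extra bit.

```verilog
// Self-checking sketch; assumes a Multi4bit module is compiled alongside.
module tb_Multi4bit_check;
  reg  signed [3:0] X, Y;
  wire signed [7:0] Z;
  integer xi, yi, errors;

  Multi4bit uut (.X(X), .Y(Y), .Z(Z));

  initial begin
    errors = 0;
    for (xi = -8; xi <= 7; xi = xi + 1) begin
      // Y = -8 is skipped: its negation (+8) does not fit in 4 bits
      for (yi = -7; yi <= 7; yi = yi + 1) begin
        X = xi;
        Y = yi;
        #10;  // let the combinational logic settle
        if (Z !== xi * yi) begin
          errors = errors + 1;
          $display("MISMATCH: X=%0d Y=%0d Z=%0d expected=%0d", X, Y, Z, xi * yi);
        end
      end
    end
    $display("Done, %0d mismatches", errors);
    $finish;
  end
endmodule
```

Declaring X, Y and Z as signed makes $display print negative values directly, so the NegX and NegZ columns of the original test bench are not needed here.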
2.0.3 Results:
Fig:1 4-bit output
2.0.4 Synthesis Report:
Release 14.2 - xst P.28xd (nt)
Copyright (c) 1995-2012 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to xst/projnav.tmp
Total REAL time to Xst completion: 0.00 secs
Total CPU time to Xst completion: 0.12 secs
--> Parameter xsthdpdir set to xst
Total REAL time to Xst completion: 0.00 secs
Total CPU time to Xst completion: 0.12 secs
--> Reading design: Multi4bit.prj
TABLE OF CONTENTS
1) Synthesis Options Summary
2) HDL Compilation
3) Design Hierarchy Analysis
4) HDL Analysis
5) HDL Synthesis
5.1) HDL Synthesis Report
6) Advanced HDL Synthesis
6.1) Advanced HDL Synthesis Report
7) Low Level Synthesis
8) Partition Report
9) Final Report
9.1) Device utilization summary
9.2) Partition Resource Summary
9.3) TIMING REPORT
========================================================================
=
* Synthesis Options Summary *
========================================================================
=
---- Source Parameters
Input File Name : "Multi4bit.prj"
Input Format : mixed
Ignore Synthesis Constraint File : NO
---- Target Parameters
Output File Name : "Multi4bit"
Output Format : NGC
Target Device : xc3s100e-4-vq100
---- Source Options
Top Module Name : Multi4bit
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
Safe Implementation : No
FSM Style : LUT
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : Yes
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
ROM Style : Auto
Mux Extraction : Yes
Resource Sharing : YES
Asynchronous To Synchronous : NO
Multiplier Style : Auto
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 500
Add Generic Clock Buffer(BUFG) : 24
Register Duplication : YES
Slice Packing : YES
Optimize Instantiated Primitives : NO
Use Clock Enable : Yes
Use Synchronous Set : Yes
Use Synchronous Reset : Yes
Pack IO Registers into IOBs : Auto
Equivalent register Removal : YES
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : No
Netlist Hierarchy : As_Optimized
RTL Output : Yes
Global Optimization : AllClockNets
Read Cores : YES
Write Timing Constraints : NO
Cross Clock Analysis : NO
Hierarchy Separator : /
Bus Delimiter : <>
Case Specifier : Maintain
Slice Utilization Ratio : 100
BRAM Utilization Ratio : 100
Verilog 2001 : YES
Auto BRAM Packing : NO
Slice Utilization Ratio Delta : 5
========================================================================
=
* HDL Compilation *
========================================================================
=
Compiling verilog file "bit_4.v" in library work
Module <Multi4bit> compiled
No errors in compilation
Analysis of file <"Multi4bit.prj"> succeeded.
========================================================================
=
* Design Hierarchy Analysis *
========================================================================
=
Analyzing hierarchy for module <Multi4bit> in library <work>.
========================================================================
=
* HDL Analysis *
========================================================================
=
Analyzing top module <Multi4bit>.
Module <Multi4bit> is correct for synthesis.
========================================================================
=
* HDL Synthesis *
========================================================================
=
Performing bidirectional port resolution...
Synthesizing Unit <Multi4bit>.
Related source file is "bit_4.v".
WARNING:Xst:646 - Signal <temp_check> is assigned but never used. This unconnected signal will
be trimmed during the optimization process.
WARNING:Xst:646 - Signal <checkBit> is assigned but never used. This unconnected signal will be
trimmed during the optimization process.
WARNING:Xst:646 - Signal <Y1> is assigned but never used. This unconnected signal will be
trimmed during the optimization process.
Found 5-bit adder for signal <$add0000> created at line 41.
Found 5-bit adder for signal <$add0001> created at line 45.
Found 5-bit adder for signal <$add0002> created at line 41.
Found 5-bit adder for signal <$add0003> created at line 45.
Found 5-bit adder for signal <$add0004> created at line 41.
Found 5-bit adder for signal <$add0005> created at line 45.
Found 1-bit 4-to-1 multiplexer for signal <Z$mux0000> created at line 38.
Found 1-bit 4-to-1 multiplexer for signal <Z$mux0001> created at line 38.
Found 1-bit 4-to-1 multiplexer for signal <Z$mux0002> created at line 38.
Found 1-bit 4-to-1 multiplexer for signal <Z$mux0003> created at line 38.
Found 1-bit 4-to-1 multiplexer for signal <Z$mux0004> created at line 38.
Found 1-bit 4-to-1 multiplexer for signal <Z$mux0005> created at line 38.
Found 1-bit 4-to-1 multiplexer for signal <Z$mux0006> created at line 38.
Found 1-bit 4-to-1 multiplexer for signal <Z$mux0007> created at line 38.
Found 1-bit 4-to-1 multiplexer for signal <Z$mux0008> created at line 38.
Found 1-bit 4-to-1 multiplexer for signal <Z$mux0009> created at line 38.
Found 1-bit 4-to-1 multiplexer for signal <Z$mux0010> created at line 38.
Found 1-bit 4-to-1 multiplexer for signal <Z$mux0011> created at line 38.
Found 1-bit 4-to-1 multiplexer for signal <Z$mux0012> created at line 38.
Found 1-bit 4-to-1 multiplexer for signal <Z$mux0013> created at line 38.
Found 1-bit 4-to-1 multiplexer for signal <Z$mux0014> created at line 38.
Found 1-bit 4-to-1 multiplexer for signal <Z$mux0015> created at line 38.
Found 1-bit 4-to-1 multiplexer for signal <Z$mux0016> created at line 38.
Found 1-bit 4-to-1 multiplexer for signal <Z$mux0017> created at line 38.
Found 1-bit 4-to-1 multiplexer for signal <Z$mux0018> created at line 38.
Found 1-bit 4-to-1 multiplexer for signal <Z$mux0019> created at line 38.
Summary:
inferred 7 Adder/Subtractor(s).
inferred 20 Multiplexer(s).
Unit <Multi4bit> synthesized.
========================================================================
=
HDL Synthesis Report
Macro Statistics
# Adders/Subtractors : 7
5-bit adder : 6
8-bit adder : 1
# Multiplexers : 20
1-bit 4-to-1 multiplexer : 20
========================================================================
=
* Advanced HDL Synthesis *
========================================================================
Advanced HDL Synthesis Report
Macro Statistics
# Adders/Subtractors : 7
5-bit adder : 7
# Multiplexers : 19
1-bit 4-to-1 multiplexer : 19
Optimizing unit <Multi4bit> ...
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block Multi4bit, actual ratio is 5.
========================================================================
=
* Final Report *
========================================================================
=
Final Results
RTL Top Level Output File Name : Multi4bit.ngr
Top Level Output File Name : Multi4bit
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : No
Design Statistics
# IOs : 16
Cell Usage :
# BELS : 120
# GND : 1
# LUT2 : 8
# LUT3 : 24
# LUT4 : 42
# MULT_AND : 4
# MUXCY : 12
# MUXF5 : 14
# XORCY : 15
# IO Buffers : 16
# IBUF : 8
# OBUF : 8
========================================================================
=
Device utilization summary:
---------------------------
Selected Device : 3s100evq100-4
Number of Slices: 42 out of 960 4%
Number of 4 input LUTs: 74 out of 1920 3%
Number of IOs: 16
Number of bonded IOBs: 16 out of 66 24%
---------------------------
Partition Resource Summary:
---------------------------
No Partitions were found in this design.
---------------------------
========================================================================
=
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.
FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
GENERATED AFTER PLACE-and-ROUTE.
Clock Information:
------------------
No clock signals found in this design
Asynchronous Control Signals Information:
----------------------------------------
No asynchronous control signals found in this design
Timing Summary:
---------------
Speed Grade: -4
Minimum period: No path found
Minimum input arrival time before clock: No path found
Maximum output required time after clock: No path found
Maximum combinational path delay: 22.571ns
Timing Detail:
--------------
All values displayed in nanoseconds (ns)
========================================================================
=
Timing constraint: Default path analysis
Total number of paths / destination ports: 25181 / 8
-------------------------------------------------------------------------
Delay: 22.571ns (Levels of Logic = 19)
Source: Y<1> (PAD)
Destination: Z<7> (PAD)
Data Path: Y<1> to Z<7>
Gate Net
Cell:in->out fanout Delay Delay Logical Name (Net Name)
---------------------------------------- ------------
IBUF:I->O 16 1.218 1.209 Y_1_IBUF (Y_1_IBUF)
LUT2:I0->O 3 0.704 0.531 Madd__old_Y1_2_Madd_xor<1>11 (Z_mux0003_mand)
MULT_AND:I1->LO 0 0.741 0.000 Z_mux0003_mand (Z_mux0003_mand1)
MUXCY:DI->O 1 0.888 0.000 Madd__add0000_cy<0> (Madd__add0000_cy<0>)
XORCY:CI->O 1 0.804 0.424 Madd__add0000_xor<1> (_add0000<1>)
LUT4:I3->O 3 0.704 0.535 Mmux_Z_mux000221 (X<0>_mmx_out12)
LUT4:I3->O 2 0.704 0.622 Mmux_Z_mux000841 (Z_mux0008)
LUT2:I0->O 1 0.704 0.000 Madd__add0002_lut<0> (Madd__add0002_lut<0>)
MUXCY:S->O 1 0.464 0.000 Madd__add0002_cy<0> (Madd__add0002_cy<0>)
MUXCY:CI->O 1 0.059 0.000 Madd__add0002_cy<1> (Madd__add0002_cy<1>)
XORCY:CI->O 1 0.804 0.420 Madd__add0002_xor<2> (_add0002<2>)
MUXF5:S->O 4 0.739 0.622 Mmux_Z_mux00055_f5 (X<1>_mmx_out4)
LUT3:I2->O 1 0.704 0.499 Mmux_Z_mux00101221 (Z_mux0012)
LUT4:I1->O 3 0.704 0.566 Madd__add0005_cy<1>11 (Madd__add0005_cy<1>)
LUT3:I2->O 1 0.704 0.455 Madd__add0005_cy<2>11 (Madd__add0005_cy<2>)
LUT4:I2->O 1 0.704 0.595 Mmux_Z_mux00101219 (Mmux_Z_mux00101219)
LUT3:I0->O 1 0.704 0.000 Mmux_Z_mux00101258_G (N45)
MUXF5:I1->O 2 0.321 0.447 Mmux_Z_mux00101258 (Z_6_OBUF)
OBUF:I->O 3.272 Z_7_OBUF (Z<7>)
----------------------------------------
Total 22.571ns (15.646ns logic, 6.925ns route)
(69.3% logic, 30.7% route)
========================================================================
Total REAL time to Xst completion: 3.00 secs
Total CPU time to Xst completion: 3.70 secs
Total memory usage is 200684 kilobytes
Number of errors : 0 ( 0 filtered)
Number of warnings : 0 ( 0 filtered)
Number of infos : 0 ( 0 filtered)
2.1 Multiplication of two 8 bit signed binary numbers
Having discussed the flowchart and an example of the Booth algorithm, it is now straightforward
to implement the algorithm for wider operands. An 8 bit signed binary number ranges from -128
to 127 in decimal. So whenever we give an input greater than 127, the program will interpret it as
a two's complement value and treat it as a negative number. The product of two 8 bit signed
binary numbers is a 16 bit result.
2.1.1 Verilog code for 8 bit:
module Multi8bit(X,Y,Z);
  input signed [7:0] X,Y;
  output signed [15:0] Z;
  reg signed [15:0] Z;
  reg [1:0] temp_check;
  integer i;
  reg checkBit;
  reg [7:0] Y1;
  always @ (X,Y)
  begin
    Z=16'd0;
    checkBit=1'd0;
    //Number of shifts is equal to number of bits of operation
    for (i=0 ; i<8 ; i=i+1)
    begin
      temp_check= {X[i],checkBit};
      Y1= -Y;
      case(temp_check)
        2'd2 : begin
          //If temp_check is 10, subtract Y from Z, i.e., add Z and Y1
          Z[15:8]= Z[15:8]+Y1;
        end
        2'd1 : begin
          //If temp_check is 01, add Y to Z
          Z[15:8]= Z[15:8]+Y;
        end
        default : begin //If temp_check is 00 or 11, do nothing
        end
      endcase
      //After add or sub or default case, right shift the Z by 1
      Z = Z>>1;
      //Restore the sign bit.
      Z[15]= Z[14];
      //New check bit is equal to current X bit
      checkBit=X[i];
    end
  end
endmodule
2.1.2 Test Bench
module tb_Multi8bit;
  // Inputs
  reg [7:0] X;
  reg [7:0] Y;
  // Outputs
  wire [15:0] Z;
  // Instantiate the Unit Under Test (UUT)
  Multi8bit uut (
    .X(X),
    .Y(Y),
    .Z(Z)
  );
  initial begin
    // Initialize Inputs
    X= 8'd2;
    Y= -8'd12;
    $monitor ("X=%d, NegX=%d, Y=%d , Z=%d, NegZ=%d",X,-X, Y, Z, -Z );
    #50;
    // Add stimulus here
  end
endmodule
2.1.3 Test Results:
Output:
Fig 4: Output 8-bit
2.1.4 HDL Synthesis Report
Macro Statistics
# Adders/Subtractors : 15
8-bit adder : 15
# Multiplexers : 64
1-bit 4-to-1 multiplexer : 64
2.1.5 Schematic:
Fig 5: Schematic -8 bit
2.1.6 Time delay:
Fig 6 : Total Time delay
The total time delay for the 8 bit multiplier is 36.640 ns.
2.1.7 Total Power:
Fig 7. Total Power
The total power consumed is 34 mW.
2.2 Verilog Code for 16 bit multiplier:
module BoothAlgthm16bit_VCode(X,Y,Z);
  input signed [15:0] X,Y;
  output signed [31:0] Z;
  reg signed [31:0] Z;
  reg [1:0] temp_check;
  integer i;
  reg checkBit;
  reg [15:0] Y1;
  always @ (X,Y)
  begin
    Z=32'd0;
    checkBit=1'd0;
    //Number of shifts is equal to number of bits of operation
    for (i=0 ; i<16 ; i=i+1)
    begin
      temp_check= {X[i],checkBit};
      Y1= -Y; //Two's complement negation of Y, used for subtraction
      case(temp_check)
        2'd2 : begin
          //If temp_check is 10, subtract Y from Z, i.e., add Z and Y1
          Z[31:16]= Z[31:16]+Y1;
        end
        2'd1 : begin
          //If temp_check is 01, add Y to Z
          Z[31:16]= Z[31:16]+Y;
        end
        default : begin
          //If temp_check is 00 or 11, do nothing
        end
      endcase
      //After add or sub or default case, right shift the Z by 1
      Z = Z>>1;
      //Restore the sign bit.
      Z[31]= Z[30];
      //New check bit is equal to current X bit
      checkBit=X[i];
    end
  end
endmodule
2.2.1 Test Bench
module tb_Multi;
  reg [15:0] X;
  reg [15:0] Y;
  wire [31:0] Z;
  // Instantiate the Unit Under Test (UUT)
  BoothAlgthm16bit_VCode uut (
    .X(X),
    .Y(Y),
    .Z(Z)
  );
  initial begin
    // Initialize Inputs
    X= 16'd2555;
    Y= -16'd2;
    $monitor ("X=%d, NegX=%d, Y=%d , Z=%d, NegZ=%d",X,-X, Y, Z, -Z );
    #50;
  end
endmodule
2.2.2 Test Results:
Fig 8 :16 bit Output
2.2.3 Total Power:
Fig 9.Total Power
The total power consumed is 34mW.
2.2.4 Time delay
Fig 10.Total Time Delay
The total time delay for the circuit is 61.251 ns.
2.2.5 HDL Synthesis Report
Macro Statistics
# Adders/Subtractors : 31
16-bit adder : 31
# Multiplexers : 256
1-bit 4-to-1 multiplexer : 256
2.3 Total no of modules used:
Module    4 bit              8 bit        16 bit
Adders    7 (8 bit adders)   15 (adders)  31 (16 bit adders)
Mux 4:1   19                 64           256
Table 2.1 Total number of modules
2.4 Power and Delay Comparison:
Parameter    4 bit    8 bit   16 bit
Power (mW)   13.859   34      34
Delay (ns)   27       36.64   61.259
Table 2.2 Power and Delay Comparison
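The abstract refers to the product of power and delay. Assuming the delays in Table 2.2 are in nanoseconds, as in the delay figures quoted in sections 2.1.6 and 2.2.4, the power-delay product (PDP) follows directly from the table. The values below are derived here from the tabulated numbers and are not results reported by the authors:

```latex
\begin{align*}
\mathrm{PDP}_{4}  &= 13.859\,\mathrm{mW} \times 27\,\mathrm{ns} \approx 374.2\,\mathrm{pJ},\\
\mathrm{PDP}_{8}  &= 34\,\mathrm{mW} \times 36.64\,\mathrm{ns} \approx 1245.8\,\mathrm{pJ},\\
\mathrm{PDP}_{16} &= 34\,\mathrm{mW} \times 61.259\,\mathrm{ns} \approx 2082.8\,\mathrm{pJ}.
\end{align*}
```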
2.5 Future Work:
The standard Booth algorithm helps speed up the multiplication process. To improve the speed of
multiplication further, engineers have introduced the modified Booth algorithm. It reduces the
number of partial products by half by using the technique of Radix-4 Booth recoding. The basic
idea is that, instead of shifting and adding for every column of the multiplier term and multiplying
by 1 or 0, we take only every second column and multiply by +/-1, +/-2 or 0 to obtain the same
result. The Radix-4 Booth encoder selects a multiple of the multiplicand based on the multiplier
bits. It examines 3 bits at a time, with adjacent groups overlapping by one bit.
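The recoding described above can be sketched in Verilog as follows. This is only an illustrative behavioral model and not a design from this project: the module name, the parameter N (assumed even) and the signal names are all assumptions introduced here. Each group of three overlapping multiplier bits selects 0, +/-1 or +/-2 times the multiplicand, so only N/2 partial products are accumulated.

```verilog
// Radix-4 Booth recoding sketch; names and parameter N (even) are assumptions.
module booth_radix4_sketch #(parameter N = 4) (
  input  signed [N-1:0]       m, r,     // multiplicand and multiplier
  output reg signed [2*N-1:0] product
);
  reg        [N:0]     r_ext;    // multiplier with the implicit 0 appended on the right
  reg signed [2*N-1:0] m_wide;   // multiplicand sign-extended to the product width
  integer j;

  always @* begin
    r_ext   = {r, 1'b0};
    m_wide  = m;
    product = 0;
    for (j = 0; j < N/2; j = j + 1) begin
      // Bits 2j+1, 2j and 2j-1 of r (shifted up by one position in r_ext)
      case ({r_ext[2*j+2], r_ext[2*j+1], r_ext[2*j]})
        3'b001, 3'b010: product = product + (m_wide <<< (2*j));     // +1 x m
        3'b011:         product = product + (m_wide <<< (2*j + 1)); // +2 x m
        3'b100:         product = product - (m_wide <<< (2*j + 1)); // -2 x m
        3'b101, 3'b110: product = product - (m_wide <<< (2*j));     // -1 x m
        default: ;                                                  // 000 or 111: 0 x m
      endcase
    end
  end
endmodule
```

For N = 4, m = -8 and r = 2 the two groups are 100 and 001, giving -2 x m at weight 1 and +1 x m at weight 4, that is 16 - 32 = -16 with two partial products instead of four.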
A limitation of the Verilog flow is the calculation of power, so in future it would be better to
implement the same design in Cadence in order to obtain accurate power and delay figures.
3.0 Modules implemented in Cadence:
Following basic modules are implemented using Cadence
1. Inverter
2. Nor
3. Nand
4. Xor
5. Half Adder
6. Full Adder
7. Decoder
8. Multiplexer
9. D Flip-flop
10. Shifter
11. Adder-Subtractor
Nand schematic:
Fig 11. Nand Schematic
Decoder Schematic:
Fig 25. Decoder Schematic
Decoder TB:
Fig 26. Decoder TB
Decoder Output:
Fig 27. Decoder Output
D Flip-flop Schematic:
Fig 28. D Flip-flop schematic
D Flip-flop TB:
Fig 29. D Flip-flop TB
D Flip-flop Output:
Fig 30. D Flip-flop output
Adder – Subtractor Schematic:
Fig 31. Adder Subtractor schematic
Adder Subtractor TB:
Fig 32. Adder Subtractor TB
Adder Subtractor Output:
Fig 33. Adder Subtractor output.
4.0 Conclusion
The multiplication technique reviewed in this report reduces the maximum height of the partial
product array, which simplifies the partial product reduction tree in terms of delay and regularity
of the layout. Among the different techniques for designing a MAC unit, pipelined Booth
multiplication gives good performance in terms of speed, while SPST and block-enabling
techniques are better for low power consumption and area.
5.0 References
1. International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
2. Digital Design by M. Morris Mano, Michael D. Ciletti, 4th edition.
3. Introduction to VLSI by Douglas A. Pucknell
4. Fabrizio Lamberti, Nikolaos Andrikos, Elisardo Antelo, Paolo Montuschi, "Speeding-up
Booth Encoded Multipliers by Reducing the Size of Partial Product Array," Internal
Report, DAUIN/DELEN-Politecnico di Torino and University of Santiago de
Compostela, 2009
5. Sandeep Shrivastava, Jaikaran Singh and Mukesh Tiwari, "Implementation of Radix-2
Booth Multiplier and Comparison with Radix-4 Encoder Booth Multiplier," International
Journal on Emerging Technologies 2(1): 14-16 (2011), ISSN: 0975-8364
6. Dr. Ravi Shankar Mishra, Prof. Puran Gour, Braj Bihari Soni, "Design and Implements of
Booth and Robertson's multipliers algorithm on FPGA," International Journal of Engineering
Research and Applications (IJERA), ISSN: 2248-9622.