Multiplier using Booth Algorithm

Project Report

Submitted

in the partial fulfillment of the requirements for

the award of ECE -5382

MASTERS

In

Electronics and Computer Engineering

By

Santhosh Kumar Vempati (R11344923)

Yaswanth Popuri (R11358263)

Under the Guidance of

Dr. Tooraj Nikoubin

DEPARTMENT OF ELECTRONICS AND COMPUTER ENGINEERING

TEXAS TECH UNIVERSITY

FALL 2014


ACKNOWLEDGEMENT

To discover, analyse, and present something new is to venture on an untrodden path towards an unexplored destination; it is an arduous adventure unless one finds a true torchbearer to show the way. We would never have succeeded in completing our task without the cooperation, encouragement, and help provided to us by various people. Words are often too few to reveal one's deep regard. We take this opportunity to express our profound gratitude and respect to all those who helped us throughout this project. We acknowledge with gratitude and humility our indebtedness to Dr. Tooraj Nikoubin, ECE, Texas Tech University, under whose guidance we had the privilege to complete this project. We wish to express our deep gratitude to him for providing individual guidance and support throughout the work.

Santhosh Kumar Vempati R11344923

Yashwanth Popuri R11358263


ABSTRACT

This report describes the work we carried out during Fall 2014 at Texas Tech University. The purpose of the project is to build a multiplier based on the Booth algorithm in Verilog, with supporting modules implemented in Cadence.

The Booth algorithm is used here for the simulation and development of a digital multiplier. It is a powerful algorithm for signed-number multiplication that treats positive and negative numbers uniformly. It performs multiplication with a small number of additions, subtractions, and shift operations, fewer than more straightforward algorithms require. This work evaluates the performance of the design in terms of delay, power, and their product, both by hand using logical effort and through a custom design written in Verilog with the Xilinx ISE 14.2 tool.


INDEX 1

Acknowledgement
Abstract
Index 1
Index 2: List of Figures
Index 3: List of Tables

1. Introduction
1.1 Algorithm
1.2 Implementation
1.3 Flow Chart
1.4 Example

2.0 Multiplication of two 4 bit signed numbers
2.0.1 Verilog Code for 4 bit binary numbers
2.0.2 Test bench
2.0.3 Results
2.0.4 Synthesis Report
2.0.5 Schematic
2.0.6 Power Calculation

2.1 Multiplication of two 8 bit signed numbers
2.1.1 Verilog Code for 8 bit binary numbers
2.1.2 Test bench
2.1.3 Results
2.1.4 Synthesis Report
2.1.5 Schematic
2.1.6 Power Calculation
2.1.7 Delay

2.2 Multiplication of two 16 bit signed numbers
2.2.1 Verilog Code for 16 bit binary numbers
2.2.2 Test bench
2.2.3 Results
2.2.4 Synthesis Report
2.2.5 Schematic
2.2.6 Power Calculation
2.2.7 Delay

2.3 Total number of modules used
2.4 Power delay comparison
2.5 Future Work

3.0 Modules Implemented in Cadence

INDEX-II

LIST OF FIGURES

Figure 1: 4-bit Output
Figure 2: 4-bit Schematic
Figure 3: Power Calculation, 4 bit
Figure 4: 8-bit Output
Figure 5: 8-bit Schematic
Figure 6: Total Time Delay
Figure 7: Total Power
Figure 8: 16-bit Output
Figure 9: Total Power, 16 bit
Figure 10: Time Delay, 16 bit
Figure 11: Nand Schematic
Figure 12: Nand Delay
Figure 13: Nand Power
Figure 14: Nor Schematic
Figure 15: Nor Delay
Figure 16: XOR Schematic
Figure 17: XOR Output
Figure 18: Half Adder Schematic
Figure 19: Half Adder Output
Figure 20: Full Adder Schematic
Figure 21: Full Adder Output
Figure 22: Multiplexer Schematic
Figure 23: Multiplexer TB
Figure 24: Multiplexer Output
Figure 25: Decoder Schematic
Figure 26: Decoder TB
Figure 27: Decoder Output
Figure 28: D Flip-flop Schematic
Figure 29: D Flip-flop TB
Figure 30: D Flip-flop Output
Figure 31: Adder Subtractor Schematic
Figure 32: Adder Subtractor TB
Figure 33: Adder Subtractor Output

INDEX-III

LIST OF TABLES

Table 2.1: Total number of modules
Table 2.2: Power and Delay Comparison

1. Introduction:

Booth's multiplication algorithm multiplies two signed binary numbers in two's complement notation. The algorithm was invented by Andrew Donald Booth in 1950 while doing research on crystallography at Birkbeck College in Bloomsbury, London. Booth used desk calculators that were faster at shifting than adding and created the algorithm to increase their speed. Booth's algorithm is of interest in the study of computer architecture.

Multiplication is more complicated than addition, since it is implemented by shifting as well as adding. In essence, multiplication consists of generating partial products and accumulating them. Because of the partial products involved in most multiplication algorithms, more time and more circuit area are required to compute, allocate, and sum the partial products to obtain the multiplication result.

A Booth multiplier is a hardware multiplier that performs multiplication of two signed (two's complement) binary integers. Booth recoding, which in its radix-4 form encodes a binary number one bit pair at a time into the signed-digit set S = {-2, -1, 0, 1, 2}, is often used to encode one of the multiplier inputs to reduce the number of partial products that need to be added.

Signed multiplication requires care. With unsigned multiplication there is no need to take the sign of the number into consideration. In signed multiplication the same process cannot be applied, because a signed number is in two's complement form and would yield an incorrect result if multiplied in the same fashion as an unsigned number. That is where Booth's algorithm comes in: it preserves the sign of the result.

Booth multiplication is a technique that allows for smaller, faster multiplication circuits by recoding the numbers that are multiplied. This approach uses fewer additions and subtractions than more straightforward algorithms.

1.1 Algorithm:

Booth's algorithm examines adjacent pairs of bits of the N-bit multiplier Y in signed two's complement representation, including an implicit bit below the least significant bit, y_{-1} = 0. For each bit y_i, for i running from 0 to N-1, the bits y_i and y_{i-1} are considered. Where these two bits are equal, the product accumulator P is left unchanged. Where y_i = 0 and y_{i-1} = 1, the multiplicand times 2^i is added to P; and where y_i = 1 and y_{i-1} = 0, the multiplicand times 2^i is subtracted from P. The final value of P is the signed product.

The representations of the multiplicand and product are not specified; typically, these are both also in two's complement representation, like the multiplier, but any number system that supports addition and subtraction will work as well. As stated here, the order of the steps is not determined. Typically, it proceeds from LSB to MSB, starting at i = 0; the multiplication by 2^i is then typically replaced by incremental shifting of the P accumulator to the right between steps; low bits can be shifted out, and subsequent additions and subtractions can then be done just on the highest N bits of P.[1] There are many variations and optimizations on these details.

The algorithm is often described as converting strings of 1's in the multiplier to a high-order +1 and a low-order -1 at the ends of the string. When a string runs through the MSB, there is no high-order +1, and the net effect is interpretation as a negative of the appropriate value.
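The rule above can be read as a sum of signed multiples of the multiplicand: each bit pair contributes +M·2^i, -M·2^i, or nothing. The following is a minimal behavioral sketch of that reading, not the design synthesized in this report. The module name booth_digit_sum, its ports, and the parameter N are illustrative assumptions.

```verilog
// Sketch only: booth_digit_sum, m, y, p and N are hypothetical names.
module booth_digit_sum #(parameter N = 4) (
    input  signed [N-1:0]       m,   // multiplicand
    input  signed [N-1:0]       y,   // multiplier
    output reg signed [2*N-1:0] p    // signed product
);
    integer i;
    reg prev;                        // holds y[i-1]; y[-1] is taken as 0
    reg signed [2*N-1:0] term;       // multiplicand times 2^i, sign-extended

    always @(m, y) begin
        p    = 0;
        prev = 1'b0;
        for (i = 0; i < N; i = i + 1) begin
            term = m;                // signed assignment sign-extends m
            term = term <<< i;
            case ({y[i], prev})
                2'b01:   p = p + term;   // y_i = 0, y_{i-1} = 1: add
                2'b10:   p = p - term;   // y_i = 1, y_{i-1} = 0: subtract
                default: ;               // equal bits: leave p unchanged
            endcase
            prev = y[i];
        end
    end
endmodule

// Exhaustive 4 bit comparison against the built-in multiply operator.
module tb_booth_digit_sum;
    reg  signed [3:0] m, y;
    wire signed [7:0] p;
    integer a, b, errors;

    booth_digit_sum #(.N(4)) dut (.m(m), .y(y), .p(p));

    initial begin
        errors = 0;
        for (a = -8; a <= 7; a = a + 1)
            for (b = -8; b <= 7; b = b + 1) begin
                m = a; y = b; #1;
                if (p !== a * b) errors = errors + 1;
            end
        $display("mismatches: %0d", errors);
    end
endmodule
```

Because the accumulator here is the full 2N bits wide and nothing is shifted out, this form makes the recoding easy to check, at the cost of a wider adder than the shifting implementation needs.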

1.2 Implementation:

Booth's algorithm can be implemented by repeatedly adding (with ordinary unsigned binary addition) one of two predetermined values A and S to a product P, then performing a rightward arithmetic shift on P. Let m and r be the multiplicand and multiplier, respectively; and let x and y represent the number of bits in m and r.

1. Determine the values of A and S, and the initial value of P. All of these numbers should have a length equal to (x + y + 1).
   A: Fill the most significant (leftmost) bits with the value of m. Fill the remaining (y + 1) bits with zeros.
   S: Fill the most significant bits with the value of (-m) in two's complement notation. Fill the remaining (y + 1) bits with zeros.
   P: Fill the most significant x bits with zeros. To the right of this, append the value of r. Fill the least significant (rightmost) bit with a zero.

2. Determine the two least significant (rightmost) bits of P.
   If they are 01, find the value of P + A. Ignore any overflow.
   If they are 10, find the value of P + S. Ignore any overflow.
   If they are 00, do nothing. Use P directly in the next step.
   If they are 11, do nothing. Use P directly in the next step.

3. Arithmetically shift the value obtained in the 2nd step by a single place to the right. Let P now equal this new value.

4. Repeat steps 2 and 3 until they have been done y times.

5. Drop the least significant (rightmost) bit from P. This is the product of m and r.
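The five steps can be sketched directly as a behavioral Verilog module. This is an illustration under stated assumptions rather than the code synthesized later in this report: the module name booth_asp, the port names, and the parameters X and Y (standing for the bit counts x and y) are hypothetical. With registers of exactly x + y + 1 bits, the procedure does not handle the most negative multiplicand, so the test loop below leaves that value out.

```verilog
// Behavioral sketch of the A/S/P procedure; all names are hypothetical.
module booth_asp #(parameter X = 4, parameter Y = 4) (
    input  signed [X-1:0]       m,        // multiplicand, x bits
    input  signed [Y-1:0]       r,        // multiplier, y bits
    output reg signed [X+Y-1:0] product
);
    localparam W = X + Y + 1;             // common length of A, S and P
    reg signed [W-1:0] A, S, P;
    integer k;

    always @(m, r) begin
        A = {m,  {(Y+1){1'b0}}};          // step 1: m in the top x bits
        S = {-m, {(Y+1){1'b0}}};          //         -m in the top x bits
        P = {{X{1'b0}}, r, 1'b0};         //         zeros, then r, then a 0
        for (k = 0; k < Y; k = k + 1) begin   // step 4: y iterations
            case (P[1:0])                 // step 2: inspect the two LSBs
                2'b01:   P = P + A;       // overflow is ignored
                2'b10:   P = P + S;
                default: ;                // 00 or 11: use P directly
            endcase
            P = P >>> 1;                  // step 3: arithmetic right shift
        end
        product = P[W-1:1];               // step 5: drop the rightmost bit
    end
endmodule

// Compares against the built-in multiply for every pair except m = -8.
module tb_booth_asp;
    reg  signed [3:0] m, r;
    wire signed [7:0] product;
    integer a, b, errors;

    booth_asp #(.X(4), .Y(4)) dut (.m(m), .r(r), .product(product));

    initial begin
        errors = 0;
        for (a = -7; a <= 7; a = a + 1)
            for (b = -8; b <= 7; b = b + 1) begin
                m = a; r = b; #1;
                if (product !== a * b) errors = errors + 1;
            end
        $display("mismatches: %0d", errors);
    end
endmodule
```

The arithmetic shift relies on P being declared signed; on an unsigned register the >>> operator would shift in zeros instead of copies of the sign bit.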

1.3 Flow Chart:

1.4 Example:

We demonstrate the technique by multiplying -8 by 2 using 4 bits for the multiplicand and the multiplier. Because -8 is the most negative 4 bit value, its negation (+8) does not fit in 4 bits, so A, S, and P are each extended by one extra leading bit:

A = 1 1000 0000 0
S = 0 1000 0000 0
P = 0 0000 0010 0

Perform the loop four times:

1. P = 0 0000 0010 0. The last two bits are 00.
   P = 0 0000 0001 0. Arithmetic right shift.

2. P = 0 0000 0001 0. The last two bits are 10.
   P = 0 1000 0001 0. P = P + S.
   P = 0 0100 0000 1. Arithmetic right shift.

3. P = 0 0100 0000 1. The last two bits are 01.
   P = 1 1100 0000 1. P = P + A.
   P = 1 1110 0000 0. Arithmetic right shift.

4. P = 1 1110 0000 0. The last two bits are 00.
   P = 1 1111 0000 0. Arithmetic right shift.

The product is 11110000 (after discarding the first and the last bit), which is -16.
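The trace above can be reproduced in simulation with a short test bench. This is a sketch for checking the worked example only; the module name tb_booth_trace and the 10 bit register layout (one extra bit, 4 bits for the multiplicand field, 4 bits for the multiplier, one trailing bit) are assumptions taken from the example's grouping.

```verilog
// Sketch: replays the -8 x 2 example; tb_booth_trace is a hypothetical name.
module tb_booth_trace;
    reg signed [9:0] A, S, P;   // 1 extra bit + 4 + 4 + 1 trailing bit
    reg signed [4:0] m;         // multiplicand widened from 4 to 5 bits
    reg        [3:0] r;         // multiplier
    integer k;

    initial begin
        m = -5'sd8;
        r = 4'd2;
        A = {m,  5'b00000};     // 1 1000 0000 0
        S = {-m, 5'b00000};     // 0 1000 0000 0
        P = {5'b00000, r, 1'b0};// 0 0000 0010 0
        $display("A = %b", A);
        $display("S = %b", S);
        $display("P = %b", P);
        for (k = 0; k < 4; k = k + 1) begin
            case (P[1:0])
                2'b01:   P = P + A;
                2'b10:   P = P + S;
                default: ;      // 00 or 11: no addition
            endcase
            P = P >>> 1;        // arithmetic right shift
            $display("after step %0d: P = %b", k + 1, P);
        end
        // Discard the first and the last bit, then read as signed.
        $display("product = %0d", $signed(P[8:1]));
    end
endmodule
```

The value printed after each step corresponds to the "Arithmetic right shift" line of the same step in the example, and the final line gives the product as a signed decimal.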

2.0 Multiplication of two 4 bit signed binary numbers:

Having covered the flowchart and an example of the Booth algorithm, it is now straightforward to implement the algorithm for wider operands. A 4 bit signed binary number in two's complement form ranges from -8 to 7. Whenever an input greater than 7 is given, the program therefore interprets the bit pattern as a two's complement value and treats it as a negative number. The product of two 4 bit signed binary numbers is an 8 bit result.
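The wrap-around described above can be seen with a few assignments. This is a small illustrative test bench, with tb_signed_range and v as hypothetical names; the comments give the bit pattern and the value a 4 bit signed variable holds in each case.

```verilog
// Sketch: how unsigned literals land in a 4 bit signed variable.
module tb_signed_range;
    reg signed [3:0] v;   // hypothetical 4 bit signed variable

    initial begin
        v = 4'd7;  $display("%b -> %0d", v, v);   // 0111 -> 7
        v = 4'd8;  $display("%b -> %0d", v, v);   // 1000 -> -8
        v = 4'd12; $display("%b -> %0d", v, v);   // 1100 -> -4
        v = 4'd15; $display("%b -> %0d", v, v);   // 1111 -> -1
    end
endmodule
```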

2.0.1 Verilog code for 4 bit Binary number:

module Multi4bit(X,Y,Z);
  input signed [3:0] X,Y;   // X: multiplier, Y: multiplicand
  output signed [7:0] Z;    // 8 bit signed product
  reg signed [7:0] Z;
  reg [1:0] temp_check;     // {current X bit, previous X bit}
  integer i;
  reg checkBit;             // previous X bit, 0 before the first step
  reg [3:0] Y1;             // two's complement of Y

  always @ (X,Y)
  begin
    Z=8'd0;
    checkBit=1'd0;
    //Number of shifts is equal to number of bits of operation
    for (i=0 ; i<4 ; i=i+1)
    begin
      temp_check= {X[i],checkBit};
      Y1= -Y;
      case(temp_check)
        2'd2 : begin
          //If temp_check is 10 , subtract Y from the upper half of Z, i.e., add Y1
          Z[7:4]= Z[7:4]+Y1;
        end
        2'd1 : begin
          //If temp_check is 01 , add Y to the upper half of Z
          Z[7:4]= Z[7:4]+Y;
        end
        default : begin //If temp_check is 00 or 11 , do nothing
        end
      endcase
      //After add or sub or default case, right shift the Z by 1
      Z = Z>>1;
      //Restore the sign bit.
      Z[7]= Z[6];
      //New check bit is equal to current X bit
      checkBit=X[i];
    end
  end
endmodule

2.0.2 Test Bench

module tb_Multi4bit;
  // Inputs (declared signed so that %d prints negative values as such)
  reg signed [3:0] X;
  reg signed [3:0] Y;
  // Outputs
  wire signed [7:0] Z;

  // Instantiate the Unit Under Test (UUT)
  Multi4bit uut (
    .X(X),
    .Y(Y),
    .Z(Z) );

  initial begin
    // Initialize Inputs
    X= 4'd2;
    Y= 4'd3;
    $monitor ("X=%d, NegX=%d, Y=%d , Z=%d, NegZ=%d",X,-X, Y, Z, -Z );
    #50; // Add stimulus here
  end
endmodule

2.0.3 Results:

Fig:1 4-bit output

2.0.4 Synthesis Report:

Release 14.2 - xst P.28xd (nt)

Copyright (c) 1995-2012 Xilinx, Inc. All rights reserved.

--> Parameter TMPDIR set to xst/projnav.tmp

Total REAL time to Xst completion: 0.00 secs

Total CPU time to Xst completion: 0.12 secs

--> Parameter xsthdpdir set to xst

Total REAL time to Xst completion: 0.00 secs

Total CPU time to Xst completion: 0.12 secs

--> Reading design: Multi4bit.prj

TABLE OF CONTENTS

1) Synthesis Options Summary

2) HDL Compilation

3) Design Hierarchy Analysis

4) HDL Analysis

5) HDL Synthesis


5.1) HDL Synthesis Report

6) Advanced HDL Synthesis

6.1) Advanced HDL Synthesis Report

7) Low Level Synthesis

8) Partition Report

9) Final Report

9.1) Device utilization summary

9.2) Partition Resource Summary

9.3) TIMING REPORT

=========================================================================

* Synthesis Options Summary *

=========================================================================

---- Source Parameters

Input File Name : "Multi4bit.prj"

Input Format : mixed

Ignore Synthesis Constraint File : NO

---- Target Parameters

Output File Name : "Multi4bit"

Output Format : NGC

Target Device : xc3s100e-4-vq100

---- Source Options

Top Module Name : Multi4bit

Automatic FSM Extraction : YES

FSM Encoding Algorithm : Auto

Safe Implementation : No

FSM Style : LUT


RAM Extraction : Yes

RAM Style : Auto

ROM Extraction : Yes

Mux Style : Auto

Decoder Extraction : YES

Priority Encoder Extraction : Yes

Shift Register Extraction : YES

Logical Shifter Extraction : YES

XOR Collapsing : YES

ROM Style : Auto

Mux Extraction : Yes

Resource Sharing : YES

Asynchronous To Synchronous : NO

Multiplier Style : Auto

Automatic Register Balancing : No

---- Target Options

Add IO Buffers : YES

Global Maximum Fanout : 500

Add Generic Clock Buffer(BUFG) : 24

Register Duplication : YES

Slice Packing : YES

Optimize Instantiated Primitives : NO

Use Clock Enable : Yes

Use Synchronous Set : Yes

Use Synchronous Reset : Yes

Pack IO Registers into IOBs : Auto

Equivalent register Removal : YES


---- General Options

Optimization Goal : Speed

Optimization Effort : 1

Keep Hierarchy : No

Netlist Hierarchy : As_Optimized

RTL Output : Yes

Global Optimization : AllClockNets

Read Cores : YES

Write Timing Constraints : NO

Cross Clock Analysis : NO

Hierarchy Separator : /

Bus Delimiter : <>

Case Specifier : Maintain

Slice Utilization Ratio : 100

BRAM Utilization Ratio : 100

Verilog 2001 : YES

Auto BRAM Packing : NO

Slice Utilization Ratio Delta : 5

=========================================================================

* HDL Compilation *

=========================================================================

Compiling verilog file "bit_4.v" in library work

Module <Multi4bit> compiled

No errors in compilation

Analysis of file <"Multi4bit.prj"> succeeded.

=========================================================================

* Design Hierarchy Analysis *

=========================================================================

Analyzing hierarchy for module <Multi4bit> in library <work>.

=========================================================================

* HDL Analysis *

=========================================================================

Analyzing top module <Multi4bit>.

Module <Multi4bit> is correct for synthesis.

=========================================================================

* HDL Synthesis *

=========================================================================

Performing bidirectional port resolution...

Synthesizing Unit <Multi4bit>.

Related source file is "bit_4.v".

WARNING:Xst:646 - Signal <temp_check> is assigned but never used. This unconnected signal will

be trimmed during the optimization process.

WARNING:Xst:646 - Signal <checkBit> is assigned but never used. This unconnected signal will be

trimmed during the optimization process.

WARNING:Xst:646 - Signal <Y1> is assigned but never used. This unconnected signal will be

trimmed during the optimization process.

Found 5-bit adder for signal <$add0000> created at line 41.


Found 5-bit adder for signal <$add0001> created at line 45.

Found 5-bit adder for signal <$add0002> created at line 41.

Found 5-bit adder for signal <$add0003> created at line 45.

Found 5-bit adder for signal <$add0004> created at line 41.

Found 5-bit adder for signal <$add0005> created at line 45.

Found 1-bit 4-to-1 multiplexer for signal <Z$mux0000> created at line 38.

Found 1-bit 4-to-1 multiplexer for signal <Z$mux0001> created at line 38.

Found 1-bit 4-to-1 multiplexer for signal <Z$mux0002> created at line 38.

Found 1-bit 4-to-1 multiplexer for signal <Z$mux0003> created at line 38.

Found 1-bit 4-to-1 multiplexer for signal <Z$mux0004> created at line 38.

Found 1-bit 4-to-1 multiplexer for signal <Z$mux0005> created at line 38.

Found 1-bit 4-to-1 multiplexer for signal <Z$mux0006> created at line 38.

Found 1-bit 4-to-1 multiplexer for signal <Z$mux0007> created at line 38.

Found 1-bit 4-to-1 multiplexer for signal <Z$mux0008> created at line 38.

Found 1-bit 4-to-1 multiplexer for signal <Z$mux0009> created at line 38.

Found 1-bit 4-to-1 multiplexer for signal <Z$mux0010> created at line 38.

Found 1-bit 4-to-1 multiplexer for signal <Z$mux0011> created at line 38.

Found 1-bit 4-to-1 multiplexer for signal <Z$mux0012> created at line 38.

Found 1-bit 4-to-1 multiplexer for signal <Z$mux0013> created at line 38.

Found 1-bit 4-to-1 multiplexer for signal <Z$mux0014> created at line 38.

Found 1-bit 4-to-1 multiplexer for signal <Z$mux0015> created at line 38.

Found 1-bit 4-to-1 multiplexer for signal <Z$mux0016> created at line 38.

Found 1-bit 4-to-1 multiplexer for signal <Z$mux0017> created at line 38.

Found 1-bit 4-to-1 multiplexer for signal <Z$mux0018> created at line 38.

Found 1-bit 4-to-1 multiplexer for signal <Z$mux0019> created at line 38.

Summary:

inferred 7 Adder/Subtractor(s).

inferred 20 Multiplexer(s).

Unit <Multi4bit> synthesized.

=========================================================================

HDL Synthesis Report

Macro Statistics

# Adders/Subtractors : 7

5-bit adder : 6

8-bit adder : 1

# Multiplexers : 20

1-bit 4-to-1 multiplexer : 20

=========================================================================

* Advanced HDL Synthesis *

=========================================================================

Advanced HDL Synthesis Report

Macro Statistics

# Adders/Subtractors : 7

5-bit adder : 7

# Multiplexers : 19

1-bit 4-to-1 multiplexer : 19

Optimizing unit <Multi4bit> ...

Mapping all equations...

Building and optimizing final netlist ...

Found area constraint ratio of 100 (+ 5) on block Multi4bit, actual ratio is 5.

=========================================================================

* Final Report *

=========================================================================

Final Results

RTL Top Level Output File Name : Multi4bit.ngr

Top Level Output File Name : Multi4bit

Output Format : NGC

Optimization Goal : Speed

Keep Hierarchy : No

Design Statistics

# IOs : 16

Cell Usage :

# BELS : 120

# GND : 1

# LUT2 : 8

# LUT3 : 24

# LUT4 : 42

# MULT_AND : 4

# MUXCY : 12

# MUXF5 : 14

# XORCY : 15

# IO Buffers : 16

# IBUF : 8

# OBUF : 8

=========================================================================

Device utilization summary:

---------------------------

Selected Device : 3s100evq100-4


Number of Slices: 42 out of 960 4%

Number of 4 input LUTs: 74 out of 1920 3%

Number of IOs: 16

Number of bonded IOBs: 16 out of 66 24%

---------------------------

Partition Resource Summary:

---------------------------

No Partitions were found in this design.

---------------------------

=========================================================================

TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.

FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

GENERATED AFTER PLACE-and-ROUTE.

Clock Information:

------------------

No clock signals found in this design

Asynchronous Control Signals Information:

----------------------------------------

No asynchronous control signals found in this design


Timing Summary:

---------------

Speed Grade: -4

Minimum period: No path found

Minimum input arrival time before clock: No path found

Maximum output required time after clock: No path found

Maximum combinational path delay: 22.571ns

Timing Detail:

--------------

All values displayed in nanoseconds (ns)

=========================================================================

Timing constraint: Default path analysis

Total number of paths / destination ports: 25181 / 8

-------------------------------------------------------------------------

Delay: 22.571ns (Levels of Logic = 19)

Source: Y<1> (PAD)

Destination: Z<7> (PAD)

Data Path: Y<1> to Z<7>

Gate Net

Cell:in->out fanout Delay Delay Logical Name (Net Name)

---------------------------------------- ------------

IBUF:I->O 16 1.218 1.209 Y_1_IBUF (Y_1_IBUF)

LUT2:I0->O 3 0.704 0.531 Madd__old_Y1_2_Madd_xor<1>11 (Z_mux0003_mand)


MULT_AND:I1->LO 0 0.741 0.000 Z_mux0003_mand (Z_mux0003_mand1)

MUXCY:DI->O 1 0.888 0.000 Madd__add0000_cy<0> (Madd__add0000_cy<0>)

XORCY:CI->O 1 0.804 0.424 Madd__add0000_xor<1> (_add0000<1>)

LUT4:I3->O 3 0.704 0.535 Mmux_Z_mux000221 (X<0>_mmx_out12)

LUT4:I3->O 2 0.704 0.622 Mmux_Z_mux000841 (Z_mux0008)

LUT2:I0->O 1 0.704 0.000 Madd__add0002_lut<0> (Madd__add0002_lut<0>)

MUXCY:S->O 1 0.464 0.000 Madd__add0002_cy<0> (Madd__add0002_cy<0>)

MUXCY:CI->O 1 0.059 0.000 Madd__add0002_cy<1> (Madd__add0002_cy<1>)

XORCY:CI->O 1 0.804 0.420 Madd__add0002_xor<2> (_add0002<2>)

MUXF5:S->O 4 0.739 0.622 Mmux_Z_mux00055_f5 (X<1>_mmx_out4)

LUT3:I2->O 1 0.704 0.499 Mmux_Z_mux00101221 (Z_mux0012)

LUT4:I1->O 3 0.704 0.566 Madd__add0005_cy<1>11 (Madd__add0005_cy<1>)

LUT3:I2->O 1 0.704 0.455 Madd__add0005_cy<2>11 (Madd__add0005_cy<2>)

LUT4:I2->O 1 0.704 0.595 Mmux_Z_mux00101219 (Mmux_Z_mux00101219)

LUT3:I0->O 1 0.704 0.000 Mmux_Z_mux00101258_G (N45)

MUXF5:I1->O 2 0.321 0.447 Mmux_Z_mux00101258 (Z_6_OBUF)

OBUF:I->O 3.272 Z_7_OBUF (Z<7>)

----------------------------------------

Total 22.571ns (15.646ns logic, 6.925ns route)

(69.3% logic, 30.7% route)

========================================================================

Total REAL time to Xst completion: 3.00 secs

Total CPU time to Xst completion: 3.70 secs

Total memory usage is 200684 kilobytes

Number of errors : 0 ( 0 filtered)

Number of warnings : 0 ( 0 filtered)

Number of infos : 0 ( 0 filtered)


2.0.5 Schematic:

Fig 2. Schematic -4 bit

2.0.6 Power calculation:

Fig 3. Power Calculation -4 bit

2.1 Multiplication of two 8 bit signed binary numbers

Having covered the flowchart and an example of the Booth algorithm, it is now straightforward to implement the algorithm for wider operands. An 8 bit signed binary number in two's complement form ranges from -128 to 127. Whenever an input greater than 127 is given, the program therefore interprets the bit pattern as a two's complement value and treats it as a negative number. The product of two 8 bit signed binary numbers is a 16 bit result.

2.1.1 Verilog code for 8 bit:

module Multi8bit(X,Y,Z);
  input signed [7:0] X,Y;   // X: multiplier, Y: multiplicand
  output signed [15:0] Z;   // 16 bit signed product
  reg signed [15:0] Z;
  reg [1:0] temp_check;     // {current X bit, previous X bit}
  integer i;
  reg checkBit;             // previous X bit, 0 before the first step
  reg [7:0] Y1;             // two's complement of Y

  always @ (X,Y)
  begin
    Z=16'd0;
    checkBit=1'd0;
    //Number of shifts is equal to number of bits of operation
    for (i=0 ; i<8 ; i=i+1)
    begin
      temp_check= {X[i],checkBit};
      Y1= -Y;
      case(temp_check)
        2'd2 : begin
          //If temp_check is 10 , subtract Y from the upper half of Z, i.e., add Y1
          Z[15:8]= Z[15:8]+Y1;
        end
        2'd1 : begin
          //If temp_check is 01 , add Y to the upper half of Z
          Z[15:8]= Z[15:8]+Y;
        end
        default : begin //If temp_check is 00 or 11 , do nothing
        end
      endcase
      //After add or sub or default case, right shift the Z by 1
      Z = Z>>1;
      //Restore the sign bit.
      Z[15]= Z[14];
      //New check bit is equal to current X bit
      checkBit=X[i];
    end
  end
endmodule

2.1.2 Test Bench

module tb_Multi8bit;
  // Inputs (declared signed so that %d prints negative values as such)
  reg signed [7:0] X;
  reg signed [7:0] Y;
  // Outputs
  wire signed [15:0] Z;

  // Instantiate the Unit Under Test (UUT)
  Multi8bit uut (
    .X(X),
    .Y(Y),
    .Z(Z)
  );

  initial begin
    // Initialize Inputs
    X= 8'd2;
    Y= -8'd12;
    $monitor ("X=%d, NegX=%d, Y=%d , Z=%d, NegZ=%d",X,-X, Y, Z, -Z );
    #50;
    // Add stimulus here
  end
endmodule

2.1.3 Test Results:

Output:

Fig 4: Output 8-bit

2.1.4 HDL Synthesis Report

Macro Statistics

# Adders/Subtractors : 15

8-bit adder : 15

# Multiplexers : 64

1-bit 4-to-1 multiplexer : 64


2.1.5 Schematic:

Fig 5: Schematic -8 bit

2.1.6 Time delay:

Fig 6 : Total Time delay

The total time delay for the 8 bit multiplier is 36.640 ns.

2.1.7 Total Power:

Fig 7. Total Power

The total power consumed is 34 mW.

2.2 Verilog Code for 16 bit multiplier:

module BoothAlgthm16bit_VCode(X,Y,Z);
  input signed [15:0] X,Y;  // X: multiplier, Y: multiplicand
  output signed [31:0] Z;   // 32 bit signed product
  reg signed [31:0] Z;
  reg [1:0] temp_check;     // {current X bit, previous X bit}
  integer i;
  reg checkBit;             // previous X bit, 0 before the first step
  reg [15:0] Y1;            // two's complement of Y

  always @ (X,Y)
  begin
    Z=32'd0;
    checkBit=1'd0;
    //Number of shifts is equal to number of bits of operation
    for (i=0 ; i<16 ; i=i+1)
    begin
      temp_check= {X[i],checkBit};
      Y1= -Y;
      case(temp_check)
        2'd2 : begin
          //If temp_check is 10 , subtract Y from the upper half of Z, i.e., add Y1
          Z[31:16]= Z[31:16]+Y1;
        end
        2'd1 : begin
          //If temp_check is 01 , add Y to the upper half of Z
          Z[31:16]= Z[31:16]+Y;
        end
        default : begin
          //If temp_check is 00 or 11 , do nothing
        end
      endcase
      //After add or sub or default case, right shift the Z by 1
      Z = Z>>1;
      //Restore the sign bit.
      Z[31]= Z[30];
      //New check bit is equal to current X bit
      checkBit=X[i];
    end
  end
endmodule

2.2.1 Test Bench

module tb_Multi;
  // Inputs (declared signed so that %d prints negative values as such)
  reg signed [15:0] X;
  reg signed [15:0] Y;
  // Outputs
  wire signed [31:0] Z;

  // Instantiate the Unit Under Test (UUT)
  BoothAlgthm16bit_VCode uut (
    .X(X),
    .Y(Y),
    .Z(Z)
  );

  initial begin
    // Initialize Inputs
    X= 16'd2555;
    Y= -16'd2;
    $monitor ("X=%d, NegX=%d, Y=%d , Z=%d, NegZ=%d",X,-X, Y, Z, -Z );
    #50;
  end
endmodule
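The 4, 8, and 16 bit listings differ only in their operand width, so the same loop can be written once with a width parameter. The following is a sketch of that idea and not one of the modules that were synthesized for this report: the name BoothMultN and the parameter N are hypothetical, and the arithmetic shift operator >>> is used in place of the logical shift followed by restoring the sign bit. Like the fixed-width versions, it keeps the running sum in N bits, so the most negative multiplicand is left out of the check below.

```verilog
// Sketch: width-parameterized form of the report's loop (hypothetical names).
module BoothMultN #(parameter N = 8) (
    input  signed [N-1:0]       X,   // multiplier
    input  signed [N-1:0]       Y,   // multiplicand
    output reg signed [2*N-1:0] Z    // product
);
    integer i;
    reg checkBit;                    // previous X bit, 0 before the first step

    always @(X, Y) begin
        Z        = 0;
        checkBit = 1'b0;
        for (i = 0; i < N; i = i + 1) begin
            case ({X[i], checkBit})
                2'b10:   Z[2*N-1:N] = Z[2*N-1:N] - Y;   // subtract Y
                2'b01:   Z[2*N-1:N] = Z[2*N-1:N] + Y;   // add Y
                default: ;                              // 00 or 11
            endcase
            Z        = Z >>> 1;      // arithmetic shift keeps the sign bit
            checkBit = X[i];
        end
    end
endmodule

// Self-checking bench for N = 4, skipping the multiplicand Y = -8.
module tb_BoothMultN;
    reg  signed [3:0] X, Y;
    wire signed [7:0] Z;
    integer a, b, errors;

    BoothMultN #(.N(4)) uut (.X(X), .Y(Y), .Z(Z));

    initial begin
        errors = 0;
        for (a = -8; a <= 7; a = a + 1)
            for (b = -7; b <= 7; b = b + 1) begin
                X = a; Y = b; #1;
                if (Z !== a * b) errors = errors + 1;
            end
        $display("mismatches: %0d", errors);
    end
endmodule
```

A self-checking loop of this kind covers every operand pair at small widths, which a single $monitor stimulus does not.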

2.2.2 Test Results:

Fig 8 :16 bit Output


2.2.3 Total Power:

Fig 9. Total Power

The total power consumed is 34 mW.

2.2.4 Time delay

Fig 10. Total Time Delay

The total time delay for the circuit is 61.251 ns.

2.2.5 HDL Synthesis Report

Macro Statistics

# Adders/Subtractors : 31

16-bit adder : 31

# Multiplexers : 256

1-bit 4-to-1 multiplexer : 256

2.3 Total no of modules used:

Module      4 bit               8 bit          16 bit
Adders      7 (8 bit adders)    15 (adders)    31 (16 bit adders)
Mux 4:1     19                  64             256

Table 2.1 Total number of modules

2.4 Power and Delay Comparison:

Parameter     4 bit     8 bit     16 bit
Power (mW)    13.859    34        34
Delay (ns)    27        36.64     61.259

Table 2.2 Power and Delay Comparison
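Since the abstract names the product of power and delay as a figure of merit, the entries of Table 2.2 can be combined as follows. This is a derived calculation, not a measured result: it takes the table's values at face value and assumes the delays are in nanoseconds, as in the timing figures quoted in the text, so that mW times ns gives pJ.

```latex
\begin{align*}
\mathrm{PDP} &= P \times t_d \\
\mathrm{PDP}_{4\,\text{bit}}  &= 13.859\,\mathrm{mW} \times 27\,\mathrm{ns} \approx 374.2\,\mathrm{pJ} \\
\mathrm{PDP}_{8\,\text{bit}}  &= 34\,\mathrm{mW} \times 36.64\,\mathrm{ns} \approx 1245.8\,\mathrm{pJ} \\
\mathrm{PDP}_{16\,\text{bit}} &= 34\,\mathrm{mW} \times 61.259\,\mathrm{ns} \approx 2082.8\,\mathrm{pJ}
\end{align*}
```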

2.5 Future Work:

The standard Booth algorithm helps speed up the multiplication process. To improve the speed of multiplication further, engineers introduced the modified Booth algorithm: with Radix-4 Booth recoding, the number of partial products can be reduced by half. The basic idea is that, instead of shifting and adding for every column of the multiplier term and multiplying by 1 or 0, we take only every second column and multiply by +/- 1, +/- 2, or 0 to obtain the same result. The Radix-4 Booth encoder selects a multiple of the multiplicand based on the multiplier bits. It examines 3 bits at a time, with adjacent groups overlapping by one bit.
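As an illustration of the Radix-4 recoding described above, the sketch below forms one partial product per pair of multiplier bits, so an N bit multiplier needs N/2 additions or subtractions instead of N. It is a behavioral model written for this explanation, not part of the implemented design; the module name booth_radix4, its ports, and the requirement that N be even are assumptions.

```verilog
// Sketch of Radix-4 Booth recoding; names are hypothetical, N must be even.
module booth_radix4 #(parameter N = 8) (
    input  signed [N-1:0]       m,   // multiplicand
    input  signed [N-1:0]       y,   // multiplier
    output reg signed [2*N-1:0] p    // product
);
    integer i;
    reg [2:0]            grp;        // {y[2i+1], y[2i], y[2i-1]}
    reg [N:0]            yext;       // y with the implicit 0 below its LSB
    reg signed [2*N-1:0] mx;         // sign-extended multiplicand

    always @(m, y) begin
        p    = 0;
        mx   = m;
        yext = {y, 1'b0};
        for (i = 0; i < N/2; i = i + 1) begin
            // Overlapping 3 bit group: the top bit of one group is the
            // bottom bit of the next.
            grp = {yext[2*i+2], yext[2*i+1], yext[2*i]};
            case (grp)
                3'b001, 3'b010: p = p + (mx <<< (2*i));       // digit +1
                3'b011:         p = p + (mx <<< (2*i + 1));   // digit +2
                3'b100:         p = p - (mx <<< (2*i + 1));   // digit -2
                3'b101, 3'b110: p = p - (mx <<< (2*i));       // digit -1
                default: ;                                    // 000, 111: 0
            endcase
        end
    end
endmodule

// Exhaustive 4 bit comparison against the built-in multiply operator.
module tb_booth_radix4;
    reg  signed [3:0] m, y;
    wire signed [7:0] p;
    integer a, b, errors;

    booth_radix4 #(.N(4)) dut (.m(m), .y(y), .p(p));

    initial begin
        errors = 0;
        for (a = -8; a <= 7; a = a + 1)
            for (b = -8; b <= 7; b = b + 1) begin
                m = a; y = b; #1;
                if (p !== a * b) errors = errors + 1;
            end
        $display("mismatches: %0d", errors);
    end
endmodule
```

The multiples +/- 2 are obtained with one extra shift position, so no additional adder is needed to form them.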

A limitation of the Verilog flow is the calculation of power, so in future it would be better to implement the same design in Cadence in order to obtain accurate power and delay figures.

3.0 Modules implemented in Cadence:

The following basic modules were implemented in Cadence:

1. Inverter

2. Nor

3. Nand

4. Xor

5. Half Adder

6. Full Adder

7. Decoder

8. Multiplexer

9. D Flip-flop

10. Shifter

11. Adder-Subtractor

Nand Schematic:

Fig 11. Nand Schematic

Nand Delay:

Fig 12. Nand delay

Delay: 78.4239 ps

Fig 13. Nand Power

Power: 3.16 uW

Nor Schematic:

Fig 14. Nor Schematic

Nor Delay:

Delay: 78.423 ps

Fig 15. NOR delay

Nor Power:

Fig 16. Nor power

Power: 6.802 uW

Xor Schematic:

Fig 17. XOR schematic

Xor Output:

Fig 18. XOR output

Half Adder Schematic:

Fig 19. Half adder schematic

Half Adder Output:

Fig 20. Half adder output

Full Adder Schematic:

Fig 21. Full Adder schematic

Full Adder Output:

Fig 22. Full Adder output

Multiplexer Schematic:

Fig 23. Multiplexer schematic

Multiplexer TB:

Fig 24. Multiplexer TB

Multiplexer Output:

Fig 25. Multiplexer output

Decoder Schematic:

Fig 26. Decoder Schematic

Decoder TB:

Fig 27. Decoder TB

Decoder Output:

Fig 28. Decoder Output

D Flip-flop Schematic:

Fig 29. D Flip-flop schematic

D Flip-flop TB:

Fig 30. D Flip-flop TB

D Flip-flop Output:

Fig 31. D Flip-flop output

Adder Subtractor Schematic:

Fig 31. Adder Subtractor schematic

Adder Subtractor TB:

Fig 32. Adder Subtractor TB

Adder Subtractor Output:

Fig 33. Adder Subtractor output

4.0 Conclusion

The multiplication technique reviewed in this report reduces the maximum height of the partial product array, which simplifies the partial product reduction tree in terms of delay and regularity of the layout. Among the different techniques and designs for a MAC unit, pipelined Booth multiplication gives good performance in terms of speed, while SPST and block-enabling techniques are better for low power consumption and area.

5.0 References

1. International Journal of Emerging Trends & Technology in Computer Science (IJETTCS).

2. M. Morris Mano and Michael D. Ciletti, Digital Design, 4th edition.

3. Douglas A. Pucknell, Introduction to VLSI.

4. Fabrizio Lamberti, Nikolaos Andrikos, Elisardo Antelo, and Paolo Montuschi, "Speeding-up Booth Encoded Multipliers by Reducing the Size of Partial Product Array," Internal Report DAUIN/DELEN, Politecnico di Torino and University of Santiago de Compostela, 2009.

5. Sandeep Shrivastava, Jaikaran Singh, and Mukesh Tiwari, "Implementation of Radix-2 Booth Multiplier and Comparison with Radix-4 Encoder Booth Multiplier," International Journal on Emerging Technologies 2(1): 14-16 (2011), ISSN 0975-8364.

6. Ravi Shankar Mishra, Puran Gour, and Braj Bihari Soni, "Design and Implements of Booth and Robertson's multipliers algorithm on FPGA," International Journal of Engineering Research and Applications (IJERA), ISSN 2248-9622.