16-bit Booth Multiplier
with 32-bit Accumulate
Marc Mosko
CMPE223 Independent Study
CMPE223 Booth Multiplier Marc Mosko
Table of Contents
Introduction
Basic Design
Performance Estimates
Booth Multiplier
VHDL Source Code
Code Overview
I/O Register Design
Example Register Access
Source Code
Source Code Hierarchy
VHDL Code Versions
Overflow Logic
Magic Layout
Design Hierarchy
RSIM Calibration
Optimization
References
VHDL Source Code
Addcell.vhd
Adder.vhd
Booth.vhd
claN.vhd
driverN.vhd
latch.vhd
mult.vhd
mult_cla.vhd
mult_pipe.vhd
Invchain
Mcell
Ppmux
Ppmuxfa
Rwire
Wiring cells (passive)
Introduction
This report presents three main topics we investigated as part of a project to build a Booth
encoded multiply/accumulate VLSI chip. The original scope of work included synthesizing
VHDL code using the Mentor Graphics tools. Exemplar was the VHDL compiler. Leonardo
Spectrum was the synthesizer. Since my team, which included Kevin Delaney, did not meet a
MOSIS deadline, our chip funding was lost. Since we did not actually fabricate a chip, we cannot
discuss the success of our results. Likewise, VHDL synthesis using the Exemplar tools was not
very successful, so we do not discuss synthesis results except in passing. The main points we
cover are the basic architecture, our VHDL code, and a Magic layout in place of logic synthesis.
The work presented here, except as cited, is almost entirely my own. Teamwork with Kevin
Delaney had some influence on the VHDL code, since he was primarily working on the synthesis
portion of the project.
Due to length considerations, we have not included all VHDL code or any test suites. We have
Basic Design
The goal of the multiplier is to compute X[15:0] * Y[15:0] + W[31:0] = Z[31:0] and OVRFLW.
OVRFLW is the multiply-accumulate overflow. We discuss OVRFLW in more detail below. It
is not simply the carry-out of the final addition.
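As a point of reference, the arithmetic above can be modeled in a few lines of Python. This is only a behavioral sketch, not the VHDL or Magic design. The function names are hypothetical, operands are taken as two's-complement bit patterns, and OVRFLW is assumed here to mean that the exact signed result does not fit in 32 bits.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac16(x, y, w):
    """Behavioral model of Z = X * Y + W (hypothetical helper, not from the report).

    x and y are 16-bit patterns, w is a 32-bit pattern.
    Returns (z, ovrflw), where z is the 32-bit result pattern.
    Assumption: ovrflw is set when the exact signed result lies outside
    the signed 32-bit range.
    """
    exact = to_signed(x, 16) * to_signed(y, 16) + to_signed(w, 32)
    z = exact & 0xFFFFFFFF
    ovrflw = not (-(1 << 31) <= exact < (1 << 31))
    return z, ovrflw
```

For example, `mac16(3, 0xFFFE, 10)` computes 3 * (-2) + 10 and returns `(4, False)`. Under the stated overflow assumption, the flag depends on the signs and magnitudes of all three operands, not only on the carry out of the final 32-bit addition.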
Our multiplier is based on the Booth-encoded array multiplier design in [3,4]. The 32-bit adder we
use for the final addition is from [1,2,4]. We used a Carry-Select Adder (CSA) since it has a fairly
regular layout and good performance.
The VHDL design is a 3-stage pipeline with I/O registers and a common 16-bit I/O bus. A
complete transaction takes 7 complete cycles: load X, load Y, load W_H, load W_L, Multiply,
read Z_H, read Z_L. Our design can pipeline the multiply with loading a value, such as the next
operation's X, so in a stream we are down to 6 cycles. The 6 or 7 cycle length is a limitation of
improperly sized transistors that did not pass 1 or sometimes 0 with enough force to drive the
whole CPL NMOS chain. [3] also uses cross-coupled minimum sized PMOS latches to restore
the swing to output inverters. RSIM did not correctly simulate the swing restore, so we had to
remove the cross-coupled latches.
We have verified correct operation of both the VHDL and Magic circuits with several boundary
cases and 10,000 random multiply/accumulates. The VHDL test cases ran through the I/O
registers while the Magic cases were raw arithmetic computations. The Magic layout had many
problems, particularly with the carry-select adder design, which uses pass-logic. As of the
writing of this report, we verified 10,000 random cases on the Magic layout with one error. We
have fixed that error, but do not have time to rerun the whole batch. It takes about 8 hours to run
all cases (we used four machines at 2 hours each).
Performance Estimates
Based on timing estimates from Leonardo Spectrum, we believe the VHDL system will run at
about 48 MHz on a 2-phase clock (about 7ns per phase, 3 phases). We do not believe these
down to 0.1ns or less. There are a handful of 0.2ns transitions in the critical path. The critical
path has 61 transitions.
Because of uncertainty in both the Leonardo timing and the RSIM calibration, it is possible that
both results are substantially off. The section RSIM Calibration describes our approach to
calibrating RSIM for the AMI C5N 0.5 µm process.
The original fast adder in the Magic layout was a 2-2-4-4-4-8-8 CSA adder. This design, based
on [2], assumes that all inputs arrive at about the same time. That is not the case here.
Generally, the last bit to the adder is around Z[16], so one might wish to experiment with a 2-2-
4-4-2-2-4-4-4 or other variant. Please see our comments on the CSA adder in the Optimization
section below.
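To make the block notation concrete, the following Python sketch models a carry-select adder functionally. It assumes the block widths are listed from the least significant end (so 2-2-4-4-4-8-8 starts with two 2-bit blocks at bits 0 to 3); the function name is hypothetical. It says nothing about timing, which is what actually distinguishes the variants discussed here.

```python
def carry_select_add(a, b, blocks, cin=0):
    """Add two unsigned integers with a carry-select structure.

    blocks lists the block widths from the least significant end,
    e.g. [2, 2, 4, 4, 4, 8, 8] for a 32-bit adder.
    Returns (sum, carry_out).
    """
    result, shift, carry = 0, 0, cin
    for width in blocks:
        mask = (1 << width) - 1
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        # Each block precomputes both conditional results ...
        sum0 = a_blk + b_blk        # assuming carry-in = 0
        sum1 = a_blk + b_blk + 1    # assuming carry-in = 1
        # ... and the incoming carry only selects between them.
        chosen = sum1 if carry else sum0
        result |= (chosen & mask) << shift
        carry = chosen >> width
        shift += width
    return result, carry
```

Any partition whose widths sum to 32 gives the same arithmetic result. Changing the partition only changes how long each block waits for its select signal, which is why late-arriving middle bits can favor smaller blocks in the middle of the chain.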
We had time to try a 2-2-4-4-4-4-4-8 adder, and our maximum time dropped from 7ns to 6.1ns
for 4EF9 x E1DC + 287CF2D0 (we have since reduced the time even further). Our intuition
present schematics of each component in a later section with the Magic cell layouts. For now,
we wish to present only a high-level floorplan.
Figure 2 shows the floorplan of a 6-bit Booth multiplier with 12-bit accumulate. It is essentially
the same as the 16-bit multiplier. We will use the 6-bit version in our present discussion, since
the floorplan fits on a single page. The vertical dashed lines are continuations of the X inputs.
We used dashed lines to make it easier to see the regular wiring pattern.
The 12-bit accumulate requires a set of full adders as shown. The first five bits of the
accumulate, W[4:0], use the first Booth row for addition. W5 cannot be added with X5. X5 is a
sign bit but W5 is not. Therefore, we must add W5 with an adder on the outside of the array. A
standard array multiplier has a fast adder outside the array. Along the bottom of the array, a sum
bit is $Z_i = C_j + S_{j+1}$. S and C are the sum and carry outputs of the bottom Booth row ppfa
components. In our case of a 6-bit multiplier, j=i-5. To add in a third bit, $W_i$, we use a full adder
to compute the sum S and carry C from $C_j + S_{j+1} + W_i$. We may then use a fast adder to
single full adders have no ripple carry, this style seems to work well.
The design in [3,4] uses a sign extension mechanism between Booth-encoded rows. There is no
constant offset, as in the current Kestrel multiplier. The sign extension uses the partial product
output (pp_out) from the left-most column of the previous row. The pp_out output is the output
of the partial product mux before the adder. Using this and the ff output of the previous row's
sgn component, the sign extender computes new outputs to carry the sign to the next row. This
technique uses one additional column in the array.
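Each Booth-encoded row corresponds to one signed digit of the Y operand. The sketch below illustrates this, assuming the conventional radix-4 (modified Booth) recoding, which is consistent with the three Y inputs per encoder and the 16 x 4 row structure described elsewhere in this report. The function names are hypothetical, and the model ignores the hardware sign-extension mechanism by using unbounded Python integers.

```python
def booth_radix4_digits(y, bits=16):
    """Recode a two's-complement multiplier into radix-4 Booth digits.

    Each digit is in {-2, -1, 0, 1, 2} and is derived from the three bits
    y[2i+1], y[2i], y[2i-1], with y[-1] taken as 0. Adjacent digits
    share one bit. Returns the digits least significant first.
    """
    y &= (1 << bits) - 1
    padded = y << 1  # append the implicit y[-1] = 0
    digits = []
    for i in range(0, bits, 2):
        triplet = (padded >> i) & 0b111
        hi, mid, lo = (triplet >> 2) & 1, (triplet >> 1) & 1, triplet & 1
        digits.append(-2 * hi + mid + lo)
    return digits


def booth_multiply(x, y, bits=16):
    """Signed multiply by summing one partial product per Booth digit."""
    x &= (1 << bits) - 1
    x_signed = x - (1 << bits) if x >> (bits - 1) else x
    return sum(d * x_signed * (4 ** i)
               for i, d in enumerate(booth_radix4_digits(y, bits)))
```

A 16-bit Y yields eight digits, so eight partial-product rows. A negative digit selects the inverted X (or 2X) and relies on a separate +1 to complete the two's complement.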
[Figure 2. Floorplan of the 6-bit Booth multiplier with 12-bit accumulate. Recoverable labels: rows of ppfa cells ending in addcell; inputs x0 to x4 and w0 to w11; gnd; one HA and several FA cells; a 12-bit CSA adder drawn in two halves.]
VHDL Source Code
This section presents the most recent VHDL source code for a pipelined Booth encoded
multiplier. The code presented below may be found in the directory
http://www.cse.ucsc.edu/~mmosko/cmpe223/report2/vhdl. There are several other versions
under /projects/kestrel/users/mult/marc/vhdl/booth-1. The code presented here is mostly based
on the leo directory (short for Leonardo, the synthesizer). The last part of this section makes
brief comments on the other versions.
We used C++ to model the Booth multiply/accumulate before writing the VHDL code. The link
is http://www.cse.ucsc.edu/~mmosko/cmpe223/report2/cpp. There are three versions. The first
is an 8-bit adder using 4:2 compressors. The second is a 16-bit also with 4:2 compressors. The
third is a 16-bit with a fast adder. We shall not discuss this code any further in the interests of
space.
The VHDL code mirrors the design in Fig. 4 as closely as possible. We made some abstractions.
An abstract data type in VHDL replaces the one-hot Booth encoding. This allows the synthesizer
to use whatever technique it chooses. The adders are abstract + signs, not actual fast adder
implementations. The synthesizer may then use whatever style is appropriate.
Referring to Fig. 4, there are two main sections to the multiplier. The top section is the chip I/O
consisting of six 16-bit registers, an overflow register, a 16-bit common I/O bus, and control
signals. The bottom section is the pipelined Booth multiplier. The multiplier begins
Signal       Direction  Purpose
busio_s2h    In/Out     16-bit input/output from multiplier.
ovrflw_s2h   Out        OVRFLW output
bussel_s2h   In         3-bit multiplexed register selection.
bs_s2h       In         Input from bus (H) or multiplier (L). Applies to Z registers.
rw_s2h       In         Read from bus (H) or write to bus (L).
me_s2h       In         Multiplier Enable (perform the calculation)
rst_s2h      In         Reset all registers to 0.
clk          In         phi_1H, phi_1L, phi_2H, phi_2L
The second pipeline stage consists of four more Booth encoders reading from latched Y values.
The computation begins on phi_1 and the results are latched at the end of the phase. There is
another array multiplier, which continues the multiplication process. There is also an unsigned
8-bit adder to sum the results from the first pipeline section. Leonardo synthesized an Inverted
Nibble adder. Since this addition is independent of the results in the second pipeline stage, we
can perform this addition with little overhead.
According to Leonardo timing estimates, the 8-bit adder is not necessarily for free. The 8-bit
adder takes a comparable amount of time to the 4-row Booth multiplier. In fact, in some timing
runs the 8-bit adder took longer than the multiply, indicating that it might not be a good idea to
try the addition in this pipeline stage. Because of problems we had with the Leonardo timing
estimates, we did not finish an analysis of this question. We would surmise that since all
pipeline stages have the same period, the second pipeline stage with 8-bit accumulate would still
take less time than the 24-bit accumulate in the third stage.
I/O Register Design
The I/O registers follow the schematic in Fig. 3. The signal medly_s2h is used to clock in the
value from multin_v2h. It only applies to the Z registers. The multiplier generates the delayed
multiplier enable signal, medly, as part of the pipeline. The output signal store_s2h feeds both
multout_s2h and busout_s2h. The tri-state drivers for the I/O bus are located in different VHDL
code because of problems we had with the VHDL compiler. The register drives the bus when
sel_s2h and not(rw_s2h) is true; otherwise it is tri-state.
[Figure 3. Multiplier I/O Register. Recoverable labels: a DFF with D, Q, CLK, and RST; two muxes with inputs 0 and 1; signals bs_s2h, multin_v2h, busin_s2h, w2_s2h, store_s2h, csel_s2h, csel_q2h, phi_2h, rst_q2h, rst_s2h, medly_s2h, rden_s2h, sel_s2h, rw_s2h.]
Line-----A-Bus output-------Bus input--------BBB-C-D-E-F
00000122 1011111001000011 1011111001000011 000 1 1 1 0
Column A is the expected carry out, which is set when reading from the multiplier. Bus Output
is the expected bus output. Bus input is the external driver to the bus. Column A is ovrflw_s2h.
Column B is the bussel_s2h signal. Column C is the bs_s2h. Column D is the rw_s2h signal.
Note that rw_s2h only affects the Z registers. Column E is the me_s2h signal. Column F is
rst_s2h.
Prior to line 122, values were loaded into the X, Y, and W registers. On line 122, we enable
me_s2h, which latches the X, Y, and W register values into the multiplier. Because of our I/O
register design, we may simultaneously load a new value into a register while reading the
register. Line 122 loads a new value 1011111001000011 (BE43) into the X register by
selecting register 0 via bussel_s2h and asserting rw_s2h.
Line 123 loads a new value into the Y register and simultaneously stores the multiplier output
into the Z registers. The new value 1010111101100101 (AF65) is stored in the Y register
by selecting register 1 via bussel_s2h and asserting rw_s2h. The multiplier result is stored in
from a D-flipflop. In our test stimulus file, lines other than 124, 125, 12a, and 12b are - don't
care.
Lines 126 and 127 load the W values (2D04417F). Lines 128 and 129 are similar to lines 122 and
123. They load the next X and Y values and compute BE43 * AF65 + 2D04417F = 41B71EEE.
We read the Z values in lines 12a and 12b.
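The arithmetic of this example can be checked independently of the hardware. The sketch below treats the bus words as two's-complement values, taking X as 1011111001000011 and Y as 1010111101100101 (the bit patterns loaded on lines 122 and 123) and W as 2D04417F. The helper name is hypothetical. Note that the binary pattern for Y is AF65 in hexadecimal.

```python
def signed(value, bits):
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


x = int("1011111001000011", 2)   # X word from stimulus line 122
y = int("1010111101100101", 2)   # Y word from stimulus line 123
w = 0x2D04417F                   # W_H and W_L from lines 126 and 127

# Signed 16 x 16 multiply plus 32-bit accumulate, truncated to 32 bits.
z = (signed(x, 16) * signed(y, 16) + signed(w, 32)) & 0xFFFFFFFF

print(f"{x:04X} * {y:04X} + {w:08X} = {z:08X}")
# prints: BE43 * AF65 + 2D04417F = 41B71EEE
```

Both operands are negative here (-16829 and -20635), so the product is positive and the sum stays within the signed 32-bit range.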
[Figure: block diagram of the multiplier with I/O registers. Recoverable labels: BUSIO_S2H[15:0]; a 3:8 demux on bussel_s2h[2:0]; registers reg 0 = x[15:0], reg 1 = y[15:0], reg 2 = w[31:16], reg 3 = w[15:0], reg 4 = Z[31:16], reg 5 = Z[15:0]; an Ovrflw DFF driving ovrflw_s2h; inputs rw_s2h, rst_s2h, bs_s2h, me_s2h, and clk[3:0] (clk = phi1_h/l, phi2_h/l); two 16 x 4 Booth multiplier stages fed by y[8:0] (9 bits) and y[15:8]; pipeline registers; an 8-bit unsigned add; outputs z[7:0] and z[15:8]; ovrflw logic.]
Source Code
The table below lists the 59 files that are part of the VHDL code. Generally, there are three or
four files associated with each major component. For the component foo, there would be
foo.vhd, which is the instantiation of the entity and architecture. foo_test.vhd is a test script that
uses foo.vhd as a component (UUT). The test script reads stimulus from foo.txt. Sometimes
there will be a foogen.{pl|cc} to generate the stimulus.
Fileno  File Name          Description
1       addcell.txt        Test file stimulus
2       addcell.vhd        Implements +1 when booth sign negative
3       addcell_test.vhd   Test script for addcell
4       adder.txt          Test file stimulus
5       adder.vhd          Single bit and N bit full adder
6       adder_test.vhd
7       adder15.txt        Test file stimulus
8       adder15gen.pl      Generates stimulus for exhaustive 15-bit adder, incorrect carry out
9       adderN_test.vhd    A 15-bit adder test using "adder.vhd"
10      adk.vhd            Cell library for Leonardo
11      booth.txt          Test file stimulus
12      booth.vhd          Booth type (abstracts one-hot), booth encoder, sign propagation
13      booth_test.vhd
14      claN.vhd           An n-bit carry-lookahead adder (abstract "plusN" and
29      modelsim.ini           Ini file for Exemplar VHDL
30      mult.txt               Test file stimulus
31      mult.vhd               The whole multiplier with I/O registers
32      mult_test.vhd          Test for "mult.vhd"
33      mult_test_1.vhd        Test for synthesized "mult.vhd" (uses std_ulogic)
34      mult_cla.vhd           The CLA adder used by the multiplier
35      mult_framegen          Generates test cases (boundary and random)
36      mult_framegen.cc       C++ source code for mult_framegen
37      mult_frame.tcl         A timing analysis file example
38      mult_frame.txt         Test stimulus
39      mult_frame.vhd         The booth array multiplier and CLA adder
40      mult_frame_test.vhd    Test for "mult_frame"
41      mult_frame_test_u.vhd  Test for "mult_frame" (synthesized)
42      mult_pipe.txt          Test stimulus
43      mult_pipe.vhd          The booth array multiplier
44      mult_pipe_test.vhd     Old test script -- out of date
45      multgen.cc             Generates test cases for "mult.vhd"
46      multreg.vhd            N-bit multiplier register using "dffr_fall" and an input buffer
47      multregN.txt           Test stimulus for 4-bit register, non-exhaustive, out of date (for latch, not dff)
48      multreg_test.vhd       Test script for 4-bit register
49      Mymake                 CSH script to create everything
50      plusN_test.vhd         Tests abstract N-bit adder
51      pp.vhd                 Partial-product cells (ppmux, ppfa, ppfapp)
52      ppfa.txt               Test cases for "ppfa"
53      ppfagen.pl             Generates test cases for "ppfa"
computes bus_wr_h[7:0] as a one-hot control signal to a set of 16-bit tri-state buffers for each
I/O register's busout_s2h signal. Finally, the code computes the ovrflw signal.
The component multregn instantiates an N-bit I/O register, as described above. It uses the
components dffrN_fall (file 25) and buf (file 22). DffrN_fall is an N-bit D-flipflop with reset
clocked on the falling edge. Buf is a 1-bit buffer. We had to play some tricks with signal
buffering to ensure proper fan-out. Leonardo had trouble with our source code and generating
proper fan-out. We believe the problem was that we did not follow a strict hierarchy structure of
combinatorial logic followed by registers.
The component mult_cla instantiates a 24-bit fast adder. It uses the component plusN (file 14).
PlusN is an abstracted + operation in VHDL with some added logic to compute the carry. We
used to have mult_cla and mult_pipe in the same source file as part of the same component.
We separated them at some point because of timing simulation problems with Leonardo.
component sgn (file 51) implements the sign extender of Fig. 2. The components dffr_fall,
dffrN_fall, gdffr_fall, and gdffrN_fall (all file 25) implement single bit and N-bit D-flipflops
with reset. The g versions are gated and have a tri-state Enable input (no longer used).
Inside mult_pipe, we used to drive each pipeline stage from a gated transparent latch. By using
a gated latch, we could conserve power by eliminating spurious transitions while computing the
previous pipeline stage. At the end of the first pipeline stage, for instance, we would latch the
data at the end of phi_2 and enable the tri-state output at the beginning of phi_1. We used the
components glatchrN, etc. When we switched to the DFF, there was no reason to continue
using a gated version, since the flipflop is not transparent. Thus, the gdff and dff components
are identical except for an extra Enable signal that does nothing. We preserved the Enable input
so that there were no changes to our code semantics.
VHDL Code Versions
There are six versions of the VHDL code. The code that best synthesizes is in a directory called
leo under /projects/kestrel/users/mult/marc/vhdl/booth-1. The leo code was the basis for the
double-rail nature and had better synthesis results. Kevin Delaney found a cell library for
Leonardo, the synthesis tool. The cell library is called ADK. We began using the ADK cell
library in the source tree adk. adk is a non-pipelined multiplier. adk-pipe is a pipelined
multiplier. adk-pipe-cla is a pipelined multiplier with carry-look-ahead adder. We hard-coded
the CLA structure with a behavioral description. In our final version, we steered away from
being so specific and just use a + sign.
We learned several things from these many versions and our efforts at synthesis. In our opinion,
one should try to be as abstract as possible and let the synthesizer figure out the specifics. One
must be aware of automatic register generation and what sort of statements will not synthesize.
Apart from those concerns, we would recommend staying away from gate-level specifics. When
one tries to enforce a specific structure, there is usually competition with the synthesizer and no
one wins. There are directives to give the synthesizer guidelines for specific modules, but we did
not have much success with them.
Overflow Logic
A multiply-accumulate where all words are n-bit does not have overflow. Our architecture,
however, does have the potential for overflow since the accumulate is twice the word size of the
multiplier/multiplicand. We compute a signed overflow from the following two assertions for
Z[m:0]=X[n:0] * Y[n:0] + W[m:0], where in our case n=15 and m=31. There is overflow if (1)
x*y > 0, w > 0 and z
Magic Layout
The table below lists the 53 files that make up the Magic layout. In general, there are three types
of files, similar to the VHDL directory structure. For the component foo, the file foo.mag is the
Magic cell. foo.cmd is the RSIM command file that runs a test suite. Some components will
have a foo.{pl|c|cc} program to generate the test cases. Sometimes, there is a foo_head.cmd file
with the header portion of the CMD file independent of the test cases. There is also a csa
subdirectory with a VHDL model of the CSA adder. To view these files with the recompiled
Magic, set the environment variable CAD_HOME=/projects/kestrel/users/mult/tools and
execute Magic as magic -TSCN3ME_SUBM.30 from $CAD_HOME/bin.
Fileno  File Name      Description
60      Addcell.cmd    RSIM command file w/ exhaustive stimulus
61      Addcell.mag    Generates the +1 for negative Booth encoding
62      broute.mag     A wiring channel
63      bth.cmd        RSIM command file w/ exhaustive stimulus
64      bth.mag        Booth encoding and sign propagation
65      bthbuf.mag     Inverter chain for Booth lines
66      bthroute.mag   Wire routing for "bth" cell
67      bwire.mag      Wiring channel
Fileno  File Name          Description
85      csa_8.mag          CSA 8-bit chain
86      csa_cond.cmd       RSIM command file w/ exhaustive stimulus
87      csa_cond.mag       CSA conditional input section
88      csa_first.mag      CSA first cell in multi-bit chain
89      csa_last.mag       CSA last cell in multi-bit chain
90      csa_mid.cmd        RSIM command file w/ exhaustive stimulus
91      csa_mid.mag        CSA middle cell in multi-bit chain
92      csa_wire.mag       Used in CSA_32
93      fa.cmd             RSIM command file w/ exhaustive stimulus
94      fa.mag             Full adder CPL style
95      fa_cmos.mag        Full adder CMOS style
96      fa_tg.cmd          RSIM command file w/ exhaustive stimulus
97      fa_tg.mag          Full adder w/ 1 level deep TG style for ppfa cell
98      fa_tg2.mag         Full adder w/ 1 level deep TG style for W sum
99      invchain.mag       Single-rail to double-rail inverter chain
100     invtop.mag         Top row inverter chains for X and W
101     mcell.cmd          RSIM command file w/ exhaustive stimulus
102     mcell.mag          Multiplier cell (ppmuxfa and wiring)
103     mult_head.cmd      Header file for RSIM (no test cases)
104     mult_add.cmd       RSIM file with random tests
105     mult_add.mag       16x16 Booth multiplier with 32-bit accumulate
106     mult_add_head.cmd  RSIM file header
107     multgen.cc         C++ program to generate "mult_add" test cases
108     ppmux.cmd          RSIM command file w/ exhaustive stimulus
109     ppmux.mag          TG style partial product mux
110     ppmuxfa.mag        ppmux with full adder (fa_tg)
111     rwire.mag          Wiring channel and inverters to drive CSA
The top-level cell is mult_add.mag. This cell has some glue wiring and all the raw input/output.
The X input is via the cell invtop[15:0]/X_H. The W input connects directly to the wires
Wn_H, where n ranges from 15 to 31 and to the cells invtop[14:0]/X_H. The Y input connects
directly to the wires Yn_H, where n ranges from 0 to 15. The output Z connects to the Sn_H
outputs of various CSA cells. The OVRFLW output connects to ovrflw_0/ovrflw_h.
The X and W[14:0] inputs pass through the cell array invtop. These are inverter chains along
the top of the multiplier to generate the proper drive for the long X wires. The W inverters are
small, since those signals only drive the adder in the top row of the multiplier. The X signals
must drive about 0.450 pF. The cell invtop connects directly to the multiplier array cells, mcell.
The Y input connects to the cell bth along the left side of the multiplier. The bth cell produces
the 5-bit one-hot Booth encoding of the Y word [3]. The bth cell also computes the sign
propagation [3]. There are three Y inputs per bth cell, with one input common between two
cells. Each bth cell generates a double-rail Y signal with a small inverter chain. The output of
The main array cell is mcell. It contains three components: ppmux, fa_tg, and wroute. Ppmux
is a pass-logic multiplexer to select the proper X input based on the Booth encoding for the row
[3]. The cell fa_tg is a double-rail transmission-gate based full adder [3]. It calculates the sum
and carry in parallel. There are four output inverters for the sum and carry-out. We added four
input inverters for the B_H/B_L inputs, one pair of inverters for each of the carry and sum logic.
We found there was too much back-pressure from the transmission gates and it caused
uncertainty in RSIM about who was driving whom. Wroute is a wire channel routing cell to
pass horizontal and vertical signals. The sum out connects two columns to the right while the
carry-out connects one column to the right. The X signals pass directly down.
Along the right side of the mcell array is a column of addcell. Addcell checks the row's Booth
encoding and generates a double-rail 0 or 1 output [3]. If the Booth encoding is negative, it
generates the 1 output. The cell also passes the sum and carry outputs from mcell through to the
next column. Addcell connects to a column of rwire, which is a vertical wiring channel to
connect Addcell to the fast adder in the right hand column. Rwire has a pair of inverters to drive
The basic CSA blocks are csa_cond, csa_first, csa_mid, and csa_last [4]. Csa_cond is a sub-
component of the other three. It is a double-rail pass-transistor mux to compute the conditional
sum and carry bits. One must always use csa_first and csa_last. For a three or more bit adder,
one inserts the necessary number of csa_mid cells. We created three adder sizes, csa_2, csa_4
(and csa_4b), and csa_8. Each of these cells has a 2-inverter driver chain for the double-rail
carry-in input. This is necessary, since load varies widely between the three cells. The RSIM
estimates are 0.081pF, 0.133pF, and 0.243pF for the 2, 4, and 8-bit cells (see the Optimization
section below). The cells csa_2 and csa_4 are designed for use along the right side of the
multiplier. The cells csa_4b and csa_8 are designed for the bottom of the multiplier.
We had to make many substantial changes to the CSA designs in [1,2,4]. The original designs
used extensive pass-logic. RSIM showed many unknown errors in our original layouts. We
corrected some by inserting intermediate inverters. Other errors, which we originally thought
were problems with RSIM and pass logic, ended up being insufficient 1 drive from fa_tg for
The bottom row of mcell connects downward to a row of bwire, a wire routing channel. Below
the channel is a row of fa_tg2. These full adders sum the carry-out, sum-out, and W values for
each output bit. The output of the full adders then passes through the wiring channel broute and
drives the bottom 16 bits of the CSA adder. The last 16 bits of the CSA adder are made up of a 4-4-8
design using csa_4b and csa_8.
1. capm2a .00003 ; 2nd metal cap -- area, pf/sq-micron
2. capm2p .00020 ; 2nd metal cap -- perimeter, pf/micron
3. capma .00006 ; 1st metal cap -- area, pf/sq-micron
4. capmp .00020 ; 1st metal cap -- perimeter, pf/micron
5. cappa .00005 ; poly cap -- area, pf/sq-micron
6. cappp .00020 ; poly cap -- perimeter, pf/micron
7. capda .00030 ; n-diffusion cap -- area, pf/sq-micron
8. capdp .00040 ; n-diffusion cap -- perimeter, pf/micron
9. cappda .00050 ; p-diffusion cap -- area, pf/sq-micron
10. cappdp .00040 ; p-diffusion cap -- perimeter, pf/micron
11. capga .00215 ; gate cap -- area, pf/sq-micron
12. lambda 0.3 ; microns/lambda
13. lowthresh 0.4 ; logic low threshold as a normalized voltage
14. highthresh 0.6 ; logic high threshold as a normalized voltage
15. cntpullup 0
16. diffperim 0
17. subparea 0
18. diffext 0
shown below. We also used SPICE parameters to calculate the gate capacitance. Items 13-18
above were left as-is from the original PRM file. Items 19-26 came from MOSIS.
We calculated the gate capacitance and drain capacitance following the SPICE calculations
presented in [5, pp. 188ff]. Gate capacitance has two components, the intrinsic and the extrinsic,
which are summed for the total: $C_{gin} = W L C_{ox}$ and
$C_{gex} = W C_{gso} + W C_{gdo} + 2 L C_{gbo}$. The parameters for the gate-source, gate-drain, and
gate-body capacitances came from the Hspice parameters. They are, respectively, 1.93 x 10^-10
F/m, 1.93 x 10^-10 F/m, and 1.00 x 10^-9 F/m. The gate oxide thickness is 1.38 x 10^-6 m. Since RSIM
uses a unit measurement per area, we set W and L to 1. The drain capacitance is given by the
following, where CJ, VJ, PB, MJ, CJSW, and MJSW are SPICE parameters. Their values are
4.22E-4, 2.5, 0.984, 3.49E-10, 1.20E-1. We used an area of 1 and a perimeter of 4.
$$C_j = \mathit{Area} \cdot CJ \left(1 + \frac{VJ}{PB}\right)^{-MJ} + \mathit{Perim} \cdot CJSW \left(1 + \frac{VJ}{PB}\right)^{-MJSW}$$
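As a rough worked instance of the extrinsic gate term, take the overlap parameters quoted above (1.93 x 10^-10 F/m, 1.93 x 10^-10 F/m, and 1.00 x 10^-9 F/m) and the expression $C_{gex} = W C_{gso} + W C_{gdo} + 2 L C_{gbo}$. The unit of W and L is not recoverable from the text, so W = L = 1 micron is assumed here purely for illustration.

```latex
\begin{align*}
C_{gex} &= W\,C_{gso} + W\,C_{gdo} + 2L\,C_{gbo} \\
        &= (10^{-6})(1.93\times10^{-10}) + (10^{-6})(1.93\times10^{-10})
           + 2\,(10^{-6})(1.00\times10^{-9})\ \mathrm{F} \\
        &= 1.93\times10^{-16} + 1.93\times10^{-16} + 2.00\times10^{-15}\ \mathrm{F} \\
        &\approx 2.39\times10^{-15}\ \mathrm{F} \approx 0.0024\ \mathrm{pF}
\end{align*}
```

Under this assumption the gate-body overlap term dominates the extrinsic part; the intrinsic term $W L C_{ox}$ still has to be added for the total.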
RSIM. Long wires, such as the booth-encoded selectors, could range between 0.5 pF and 0.6 pF.
We generally fixed n based on layout considerations.
When generating double-rail signals from single-rail inputs, we usually use 2-inverter/3-inverter
trees or 3-inverter/4-inverter trees. Sometimes this was sub-optimal, since we used fewer but
larger inverters based on layout restrictions. The layout restrictions came from the standard cell
size we selected early in the design process.
The CSA adder is designed as a 2-2-4-4-4-8-8 chain, based on [2]. Using Magic estimates of
input capacitance for the carry-in, we designed an input driver for each of csa_2, csa_4, and
csa_8 to optimize the performance of each element. The component csa_last generates the
car_h, car_l carry outputs with a 6/6 inverter that then drives a 3/3 transmission gate for the
carry select. Thus, csa_last has low drive ability.
From Magic, the input capacitances of csa_2, csa_4, csa_8 are, respectively, 0.081 pF, 0.133 pF,
The 2-2-4-4-4-8-8 design assumes that all inputs arrive at the same time. In our multiplier case,
that is not true. The input to the first 8 bit adder actually arrives last. One might experiment
with different designs, such as 2-2-4-8-2-2-4-8.
We found that the ff output of the cell bth drove about 0.139 pF but only had a 12/16 output
inverter. We redesigned it as a 2-inverter chain of 4/6 and 12/18. Using a 4/6 rather than a 3/5
reduced the size of the second inverter by 2 . Going from a 28 of input capacitance down to
10 also helped. This one change improved performance by approximately 15% overall.
References
1. Abu-Khater, I.S.; Bellaouar, A.; Elmasry, M.I.; Yan, R.H., Circuit/architecture
for low-power high-performance 32-bit adder, Fifth Great Lakes Symposium on
VLSI, Buffalo, NY, USA, 16-18 March 1995, pp. 74-7.
2. Abu-Khater, I.S.; Yan, R.H.; Bellaouar, A.; Elmasry, M.I., A 1-V low-power high-
performance 32-bit conditional sum adder, Symposium on Low Power
Electronics. Digest of Technical Papers, San Diego, CA, USA, 10-12 Oct. 1994,
pp. 66-7.
3. Abu-Khater, I.S.; Yan, R.H.; Bellaouar, A.; Elmasry, M.I., Circuit Techniques for
CMOS Low-Power High-Performance Multipliers, IEEE Journal of Solid-State
Circuits, v. 31, no. 10, Oct. 1996, pp. 1535-1546.
4. Bellaouar, A. and M.I. Elmasry, Low-Power Digital VLSI Design. Circuits and
Systems, Kluwer Academic Publishers, Boston: 1995.
5. Weste, N.H.E. and K. Eshraghian, Principles of CMOS VLSI Design. A systems
VHDL Source Code
Addcell.vhd
------------------------------------------------------------------------
-- Add Cell from "Low-power Digital VLSI Design" by
-- Bellaouar and Elmasry.
-- Returns 1 if Booth encoding is negative else 0
------------------------------------------------------------------------
library IEEE;
use IEEE.std_logic_1164.all;
use work.bth_types.all;

entity addcell is
  port (bth : in  std_ulogic_vector(4 downto 0);
        sum : out std_ulogic);
end addcell;


-- description of adder using concurrent signal assignments
architecture rtl of addcell is
begin
  sum
Adder.vhd
------------------------------------------------------------------------
-- Single-bit adder
------------------------------------------------------------------------

library IEEE, adk;
use IEEE.std_logic_1164.all;

entity adder is
  port ( a_h   : in  std_ulogic;
         b_h   : in  std_ulogic;
         c_h   : in  std_ulogic;
         sum_h : out std_ulogic;
         car_h : out std_ulogic);
end adder;

architecture rtl of adder is


  component fadd1 is
    port (
      A  : in  STD_LOGIC;
      B  : in  STD_LOGIC;
      CI : in  STD_LOGIC;
      S  : out STD_LOGIC;
      CO : out STD_LOGIC
    );
  end component;

  signal a : std_logic;
  signal b : std_logic;
  signal c : std_logic;
  signal s : std_logic;
  signal t : std_logic;

begin
  a
         sum_h : out std_ulogic_vector(N downto 1);
         car_h : out std_ulogic);
end adderN;

-- structural implementation of the N-bit adder
architecture ripple of adderN is
  component adder
    port (a_h   : in  std_ulogic;
          b_h   : in  std_ulogic;
          c_h   : in  std_ulogic;
          sum_h : out std_ulogic;
          car_h : out std_ulogic);
  end component;

  signal carry : std_ulogic_vector(0 to N);
begin
  carry(0) b_h(I),
               c_h   => carry(I - 1),
               sum_h => sum_h(I),
               car_h => carry(I));
  end generate;
end ripple;
Booth.vhd
------------------------------------------------------------------------
-- Constants used by Booth functions
------------------------------------------------------------------------
library IEEE;
use IEEE.std_logic_1164.all;

package bth_types is
  constant bth_m1 : integer := 4;
  constant bth_m2 : integer := 3;
  constant bth_p2 : integer := 2;
  constant bth_p1 : integer := 1;
  constant bth_z0 : integer := 0;
end bth_types;


------------------------------------------
-- Booth encoder for row j
------------------------------------------
library IEEE;
use IEEE.std_logic_1164.all;
use work.bth_types.all;

entity booth_encode is
  port( in_h  : in  std_ulogic_vector (2 downto 0);
        bth_h : out std_ulogic_vector (4 downto 0));
end booth_encode;

architecture rtl of booth_encode is
begin
  -- input "in_h" is Y(2i+1) Y(2i) Y(2i-1) MSB order
  -- See bth.vhd for booth types
  bth_h
end rtl;
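For orientation, a radix-4 Booth encoder with a five-line one-hot output can be sketched as follows. This is a minimal illustration, not the original architecture body: the entity name `booth_encode_sketch` is hypothetical, and it assumes that the constants bth_m1, bth_m2, bth_p2, bth_p1, and bth_z0 name the output lines for -1x, -2x, +2x, +1x, and 0 respectively, with exactly one line high at a time.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical illustration of a one-hot radix-4 Booth encoder.
-- Assumed bit positions: 4 = -1x, 3 = -2x, 2 = +2x, 1 = +1x, 0 = zero.
entity booth_encode_sketch is
  port ( in_h  : in  std_ulogic_vector(2 downto 0);   -- Y(2i+1) Y(2i) Y(2i-1)
         bth_h : out std_ulogic_vector(4 downto 0));
end booth_encode_sketch;

architecture rtl of booth_encode_sketch is
begin
  with in_h select
    bth_h <= "00001" when "000" | "111",   --  0
             "00010" when "001" | "010",   -- +1x
             "00100" when "011",           -- +2x
             "01000" when "100",           -- -2x
             "10000" when "101" | "110",   -- -1x
             "XXXXX" when others;          -- metavalue on any input bit
end rtl;
```

With this encoding, a cell such as addcell could decide whether a row is negative by OR-ing the two negative lines; whether the original does exactly that is not visible in the surviving listing.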
claN.vhd
------------------------------------------------------------------------
-- N-bit Carry-Lookahead adder
-- The width of the adder is determined by generic N
-- From Altera examples
------------------------------------------------------------------------
library IEEE;
use IEEE.std_logic_1164.all;
use work.adder;

entity claN is
  generic(N : positive);
  port (a_h   : in  std_ulogic_vector(N-1 downto 0);
        b_h   : in  std_ulogic_vector(N-1 downto 0);
        c_h   : in  std_ulogic;
        sum_h : out std_ulogic_vector(N-1 downto 0);
        car_h : out std_ulogic);
end claN;

architecture behavioral of claN is
  signal h_sum      : std_ulogic_vector(N-1 downto 0);
  signal car_gen    : std_ulogic_vector(N-1 downto 0);
  signal car_prop   : std_ulogic_vector(N-1 downto 0);
  signal car_intern : std_ulogic_vector(N-1 downto 1);

begin
  h_sum
architecture behavioral of plusN is
  signal x : std_logic_vector(N-1 downto 0);
  signal y : std_logic_vector(N-1 downto 0);

  signal w : std_logic_vector(N-1 downto 0);
  signal z : std_logic_vector(N-1 downto 0);
  signal a : signed (N-1 downto 0);
  signal b : signed (N-1 downto 0);
  signal c : signed (N-1 downto 0);
  signal s : signed (N-1 downto 0);

  signal t4_h : std_ulogic;
  signal t5_h : std_ulogic;
begin
  x
  signal w : std_logic_vector(N downto 0);
  signal z : std_logic_vector(N downto 0);
  signal a : unsigned (N downto 0);
  signal b : unsigned (N downto 0);
  signal c : unsigned (N downto 0);
  signal s : unsigned (N downto 0);

begin
  x(N-1 downto 0)
driverN.vhd
------------------------------------------------------------------------
-- N-bit driver
------------------------------------------------------------------------
library IEEE;
use IEEE.std_logic_1164.all;

entity buf is
  port ( signal Q : out std_ulogic;
         signal D : in  std_ulogic);
end buf;

architecture behavior of buf is
begin
  Q
latch.vhd
------------------------------------------------------------------------
-- N-bit LATCH with reset
-- The width of the latch is determined by generic N
------------------------------------------------------------------------

library IEEE;
use IEEE.std_logic_1164.all;

entity dffr_fall is
  port ( Rst : in std_ulogic;
         Clk : in std_ulogic;
         signal D : in  std_ulogic;
         signal Q : out std_ulogic);
end dffr_fall;

architecture behavior of dffr_fall is
begin
  process(Rst, Clk, D)
  begin
    if Rst = '1' then
      Q
         signal D : in  std_ulogic;
         signal Q : out std_ulogic);
end dffr_rise;

architecture behavior of dffr_rise is
begin
  process(Rst, Clk, D)
  begin
    if Rst = '1' then
      Q
  end component;

begin
  gen: for j in 0 to N-1 generate
    dffgen: dffr_fall port map (Rst=> Rst, Clk=> Clk, D=> D(j), Q=> Q(j));
  end generate;
end behavior;

------------------------------------------------------

library IEEE;
use IEEE.std_logic_1164.all;

entity dffrN_rise is
  generic(N : positive);
  port ( Rst : in std_ulogic;
         Clk : in std_ulogic;
         signal D : in  std_ulogic_vector(N-1 downto 0);
         signal Q : out std_ulogic_vector(N-1 downto 0));
end dffrN_rise;

architecture behavior of dffrN_rise is
  component dffr_rise is
    port ( Rst : in std_ulogic;
           Clk : in std_ulogic;
           signal D : in  std_ulogic;
           signal Q : out std_ulogic);
  end component;

begin
  gen: for j in 0 to N-1 generate
    dffgen: dffr_rise port map (Rst=> Rst, Clk=> Clk, D=> D(j), Q=> Q(j));
  end generate;
end behavior;

library IEEE;
use IEEE.std_logic_1164.all;

entity latchr is
  port ( Rst : in std_ulogic;
         Clk : in std_ulogic;
         signal D : in  std_ulogic_vector(N-1 downto 0);
         signal Q : out std_ulogic_vector(N-1 downto 0));
end latchrN;

architecture behavior of latchrN is
  component latchr is
    port ( Rst : in std_ulogic;
           Clk : in std_ulogic;
           signal D : in  std_ulogic;
           signal Q : out std_ulogic);
  end component;

  signal my_clk : std_logic_vector(N/8 downto 0);
  signal my_rst : std_logic_vector(N/8 downto 0);

begin
  process (Clk)
  begin
    clk_buf: for i in 0 to N/8 LOOP
      my_clk(i) my_clk(j/8), D=> D(j),
                Q=> Q(j));
  end generate;
end behavior;

------------------------------------------------------------------------
-- N-bit dff with reset : NON-TRANSPARENT ON GATED BUFFER
-- The width of the dff is determined by generic N
begin
  dff: latchr port map ( Rst=> Rst, Clk=> Clk, D=> D, Q=> w );
  Q
mult.vhd
------------------------------------------------------------------------
-- N-bit multiplier Multiplier
-- This is a phi-2 device.
--
-- BusIO_S2H is the pad i/o bus
-- Ovrflw_s2h is the overflow output. Should be made an InOut for carryin
-- BusSEL_S2H is a chip select, encoded active high
-- BS_S2H is the input select (bus high, mult low)
-- RW_S2H is the Read/Write select (read high, write low)
-- ME_S2H is the Multiplier Enable
-- Rst_S2H is a reset signal. It is clocked with PHI_2 to ensure
--         that it does not muck with stuff when it is not supposed to
--         Reset is immediate. There is no 1 cycle delay, like
--         with regular signals.
------------------------------------------------------------------------
library IEEE;
use IEEE.std_logic_1164.all;
--use work.converts.all;

entity mult is
  port ( BusIO_S2H  : inout std_logic_vector(15 downto 0);
         Ovrflw_S2H : out std_ulogic;
         BusSEL_S2H : in  std_ulogic_vector(2 downto 0);
         BS_S2H     : in  std_ulogic;
         RW_S2H     : in  std_ulogic;
         ME_S2H     : in  std_ulogic;
         Rst_S2H    : in  std_ulogic;
         PHI_1H     : in  std_ulogic;
         PHI_2H     : in  std_ulogic);
end mult;

architecture structural of mult is

  -- A multiplier register of width N
  component multregn is
    generic(N : positive );
    port ( BusOUT_S2H : out std_ulogic_vector(N-1 downto 0);
  component mult_pipe is
    port( z_v2h      : out std_ulogic_vector(7 downto 0);
          a_v2h      : out std_ulogic_vector(23 downto 0);
          b_v2h      : out std_ulogic_vector(23 downto 0);
          c_v2h      : out std_ulogic;
          ovrflw_v2h : out std_ulogic_vector(2 downto 0);
          medly_s2h  : out std_ulogic;
          x_s2h      : in  std_ulogic_vector(15 downto 0);
          y_s2h      : in  std_ulogic_vector(15 downto 0);
          w_s2h      : in  std_ulogic_vector(31 downto 0);
          me_s2h     : in  std_ulogic;
          PHI_1H     : in  std_ulogic;
          PHI_2H     : in  std_ulogic;
          Rst_s2h    : in  std_ulogic
    );
  end component;

  -- single bit D flip flop
  component dffr_fall is
    port ( Rst : in std_ulogic;
           Clk : in std_ulogic;
           signal D : in  std_ulogic;
           signal Q : out std_ulogic);
  end component;

  component buf is
    port ( Q : out std_ulogic;
           D : in  std_ulogic);
  end component;

  -- Buses to/from the multiplier from the registers
  signal bus_x : std_ulogic_vector(15 downto 0);
  signal bus_y : std_ulogic_vector(15 downto 0);
  signal bus_w : std_ulogic_vector(31 downto 0);
  signal bus_z : std_ulogic_vector(31 downto 0);

  -- wiring from multiplier to CLA unit
  signal bus_a : std_ulogic_vector(23 downto 0);
  signal bus_b : std_ulogic_vector(23 downto 0);
  signal bus_c : std_ulogic;

  -- temporary signals used to compute overflow
  signal t1_h : std_ulogic;
  signal t2_h : std_ulogic;
  signal t3_h : std_ulogic;
  signal t4_h : std_ulogic;
  signal t5_h : std_ulogic;

  -- outputs from registers
  signal feed_r0 : std_ulogic_vector(15 downto 0);
  signal feed_r1 : std_ulogic_vector(15 downto 0);
  signal feed_r2 : std_ulogic_vector(15 downto 0);
  signal feed_r3 : std_ulogic_vector(15 downto 0);
  signal feed_r4 : std_ulogic_vector(15 downto 0);
  signal feed_r5 : std_ulogic_vector(15 downto 0);

  -- buffered clocks
  signal phi_a_1h : std_ulogic_vector(6 downto 0);
  signal phi_a_2h : std_ulogic_vector(6 downto 0);

begin
  ---------------------------------------------------------------
  -- Decode the input register select
  bus_sel_h
    port map ( BusOUT_S2H  => feed_r0,
               BusIN_S2H   => To_StdULogicVector(busio_s2h),
               MultOut_S2H => bus_x,
               MultIn_V2H  => Gnd_16,
               Sel_s2h     => bus_sel_h(0),
               BS_S2H      => Vdd,
               RW_S2H      => RW_S2H,
               MEDLY_S2H   => MEDLY_Q2H,
               RST_S2H     => RST_S2H,
               PHI_1H      => PHI_1H,
               PHI_2H      => PHI_2H);

  -- R1 is the Y register
  -- R1 never reads from the multiplier (BS = Vdd, MultIn = GND)
  reg_1: multregN
    generic map (16)
    port map ( BusOUT_S2H  => feed_r1,
               BusIN_S2H   => To_StdULogicVector(busio_s2h),
               MultOut_S2H => bus_y,
               MultIn_V2H  => Gnd_16,
               Sel_s2h     => bus_sel_h(1),
               BS_S2H      => Vdd,
               RW_S2H      => RW_S2H,
               MEDLY_S2H   => MEDLY_Q2H,
               RST_S2H     => RST_S2H,
               PHI_1H      => PHI_1H,
               PHI_2H      => PHI_2H);

  -- R2 is the W(31:16) register
  -- R2 never reads from the multiplier (BS = Vdd, MultIn = GND)
  reg_2: multregN
    generic map (16)
    port map ( BusOUT_S2H  => feed_r2,
               BusIN_S2H   => To_StdULogicVector(busio_s2h),
               MultOut_S2H => bus_w(31 downto 16),
               MultIn_V2H  => Gnd_16,
               Sel_s2h     => bus_sel_h(2),
               BS_S2H      => Vdd,
               RW_S2H      => RW_S2H,
               MEDLY_S2H   => MEDLY_Q2H,
    generic map (16)
    port map ( BusOUT_S2H => feed_r4,
               BusIN_S2H  => To_StdULogicVector(busio_s2h),
               MultIn_V2H => bus_z(31 downto 16),
               Sel_s2h    => bus_sel_h(4),
               BS_S2H     => BS_S2H,
               RW_S2H     => RW_S2H,
               MEDLY_S2H  => MEDLY_Q2H,
               RST_S2H    => RST_S2H,
               PHI_1H     => PHI_1H,
               PHI_2H     => PHI_2H);

  -- R5 is the Z(15:0) register
  -- R4 & R5 have no MultOut connections
  reg_5: multregN
    generic map (16)
    port map ( BusOUT_S2H => feed_r5,
               BusIN_S2H  => To_StdULogicVector(busio_s2h),
               MultIn_V2H => bus_z(15 downto 0),
               Sel_s2h    => bus_sel_h(5),
               BS_S2H     => BS_S2H,
               RW_S2H     => RW_S2H,
               MEDLY_S2H  => MEDLY_Q2H,
               RST_S2H    => RST_S2H,
               PHI_1H     => PHI_1H,
               PHI_2H     => PHI_2H);

  ---------------------------------------------------------------
  -- Storage for the Overflow output
  ---------------------------------------------------------------
  Rst_q2h ovrflw_v2h, Clk=> MEDLY_Q2H, Rst=> Rst_q2h);

  -- allows us to monitor ovrflw_s2h without using a buffered I/O pin
  ovrflw_s2h

  cla_0 : mult_cla
    generic map (24)
    port map (
      z_v2h   => bus_z(31 downto 8),
      car_v2h => car_out,
      a_v2h   => bus_a,
      b_v2h   => bus_b,
      c_v2h   => bus_c
    );

  ---------------------------------------------------------------
  -- Compute the overflow
  -- An overflow is defined when
  -- 1) x*y > 0 and w > 0 and z < 0 or
  -- 2) x*y < 0 and w < 0 and z > 0
  --
  ----------------------------------------------------------------

  t1_h
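The two overflow conditions in the comment above can be expressed with sign bits alone. The block below is a minimal sketch under stated assumptions, not the original t1_h..t5_h logic, which is not visible in the surviving listing: the entity `ovf_sketch` and its port names are hypothetical, the sign of x*y is approximated by x(15) xor y(15), and the signs of w and z are taken from w(31) and z(31).

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical illustration of sign-based overflow detection for z = x*y + w.
entity ovf_sketch is
  port ( x_sign_h : in  std_ulogic;   -- x(15)
         y_sign_h : in  std_ulogic;   -- y(15)
         w_sign_h : in  std_ulogic;   -- w(31)
         z_sign_h : in  std_ulogic;   -- z(31)
         ovf_h    : out std_ulogic);
end ovf_sketch;

architecture rtl of ovf_sketch is
  signal p_sign_h : std_ulogic;       -- approximate sign of the product x*y
begin
  p_sign_h <= x_sign_h xor y_sign_h;

  -- Case 1: product and w both non-negative, result negative.
  -- Case 2: product and w both negative, result non-negative.
  ovf_h <= (not p_sign_h and not w_sign_h and z_sign_h)
        or (p_sign_h and w_sign_h and not z_sign_h);
end rtl;
```

When one operand is zero and the other negative, p_sign_h reads as negative although the product is zero. That does not raise a false overflow, because adding zero to a negative w leaves z negative and case 2 cannot fire.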
mult_cla.vhd
------------------------------------------------------------------------
-- 24-bit CLA as separate entity for synthesis
--
------------------------------------------------------------------------

library IEEE;
use IEEE.std_logic_1164.all;

entity mult_cla is
  generic (N : positive );

  port( z_v2h   : out std_ulogic_vector(N-1 downto 0);
        car_v2h : out std_ulogic;
        a_v2h   : in  std_ulogic_vector(N-1 downto 0);
        b_v2h   : in  std_ulogic_vector(N-1 downto 0);
        c_v2h   : in  std_ulogic
  );
end mult_cla;

architecture rtl of mult_cla is
  component plusN is
    generic( N : positive);
    port ( a_h   : in  std_ulogic_vector(N-1 downto 0);
           b_h   : in  std_ulogic_vector(N-1 downto 0);
           c_h   : in  std_ulogic;
           sum_h : out std_ulogic_vector(N-1 downto 0);
           car_h : out std_ulogic);
  end component;

  component claN is
    generic( N : positive);
    port ( a_h   : in  std_ulogic_vector(N-1 downto 0);
           b_h   : in  std_ulogic_vector(N-1 downto 0);
           c_h   : in  std_ulogic;
           sum_h : out std_ulogic_vector(N-1 downto 0);
           car_h : out std_ulogic);
  end component;
December 1, 2000 Page 54
mult_pipe.vhd
------------------------------------------------------------------------
-- Booth encoded carry-save-adder array
--
-- From "Low-power Digital VLSI Design" by Bellaouar and Elmasry.
-- and
-- "Circuit Techniques for CMOS Low-Power High-Performance Multipliers"
-- by Abu-Khater, Bellaouar, Elmasry in IEEE J. Solid-State Circuits v.31 (10)
-- Oct 1996 pp. 1535ff
--
-- z_v2h       Multiply accumulate output (x * y + w) (only low-order 8 bits)
-- a_v2h       goes to fast adder for high-order 24-bits
-- b_v2h
-- c_v2h
-- ovrflow_v2h 3-bits to compute overflow (w[31] x[31] y[31])
-- medly_s2h   Output good at end of phase (see me_s2h, this is delayed)
-- x_s2h       multiplicand
-- y_s2h       multiplier (gets booth encoded)
-- w_s2h       accumulate
-- me_s2h      multiplier enable
-- PHI_1H      clock
-- PHI_2H      clock
-- Rst_s2h     Reset internal registers to 0
--
-- The Y inputs are booth encoded then gated until ME_S2H & PHI_2H.
-- The Y inputs should be applied first to give the booth encoders time
-- to settle. The Y inputs must remain valid until MEDLY_S2H (actually
-- until a 1/2 cycle before...)
------------------------------------------------------------------------
------------------------------------------------------------------------
-- Variables are generally named as follows:
-- name_PtCl
--
-- P = pipe line stage (1, 2, or 3)
-- t = type (s,q,v)
-- C = clock phase (1 or 2)
-- l = logic (L or H)
--
-- examples:
-- sum_0_1v2h = row 0 sum 1st pipe stage, V timing, Phi-2, active high
--
-- Rules:
-- Variables can only be assigned if P and C the same:
-- x_1v2h
--
-- To go between phases/stages you need to use a storage device:
--
-- gdffr_fall(Q=> x_2v1h, D=> x_1v2h, Clk=> mdly_q2h, Enable=> mdly_q1h)
-- This clocks in x_1v2h on mdly_q2h and
-- enables the output to x_2v1h on mdly_q1h
--
------------------------------------------------------------------------
library IEEE;
use IEEE.std_logic_1164.all;
--use work.converts.all;

-- We use a fixed width / height for simplicity.
-- Overflow = x*y + w out of range

entity mult_pipe is
  port( z_v2h      : out std_ulogic_vector(7 downto 0);
        a_v2h      : out std_ulogic_vector(23 downto 0);
        b_v2h      : out std_ulogic_vector(23 downto 0);
        c_v2h      : out std_ulogic;
        ovrflw_v2h : out std_ulogic_vector(2 downto 0);
        medly_s2h  : out std_ulogic;
        x_s2h      : in  std_ulogic_vector(15 downto 0);
        y_s2h      : in  std_ulogic_vector(15 downto 0);
        w_s2h      : in  std_ulogic_vector(31 downto 0);
        me_s2h     : in  std_ulogic;
        PHI_1H     : in  std_ulogic;
        PHI_2H     : in  std_ulogic;
        Rst_s2h    : in  std_ulogic
  );
end mult_pipe;

architecture rtl of mult_pipe is
  constant COL : integer := 16;
  constant ROW : integer := 8;

  -- AddCell will add a 0/1 to each row depending on the sign
  -- of the booth encoding.
  component addcell is
    port ( bth : in  std_ulogic_vector(4 downto 0);
           sum : out std_ulogic);
  end component;

  -- A standard full adder
  component adder is
    port ( a_h   : in  std_ulogic;
           b_h   : in  std_ulogic;
           c_h   : in  std_ulogic;
           sum_h : out std_ulogic;
           car_h : out std_ulogic);
  end component;

  -- unsigned addition
  component uplusN is
    generic( N : positive);
    port ( a_h   : in  std_ulogic_vector(N-1 downto 0);
           b_h   : in  std_ulogic_vector(N-1 downto 0);
           c_h   : in  std_ulogic;
           sum_h : out std_ulogic_vector(N-1 downto 0);
           car_h : out std_ulogic);
  end component;

  -- A standard full adder 15 bits wide
  component adderN is
    generic( N : positive);
    port ( a_h   : in  std_ulogic_vector(N-1 downto 0);
           b_h   : in  std_ulogic_vector(N-1 downto 0);
           c_h   : in  std_ulogic;
           sum_h : out std_ulogic_vector(N-1 downto 0);
           car_h : out std_ulogic);
  end component;

  -- Generate a 5-line demultiplexed booth encoding of 3 input bits
  component booth_encode is
    port( in_h  : in  std_ulogic_vector (2 downto 0);
          bth_h : out std_ulogic_vector (4 downto 0));
  end component;

  -- Partial product generator with full adder
  -- Has only SUM (and carry) out
  component ppfa is
    port ( bth   : in  std_ulogic_vector(4 downto 0);
           x1_h  : in  std_ulogic;
           x2_h  : in  std_ulogic;
           s0_h  : in  std_ulogic;
           c0_h  : in  std_ulogic;
           sum_h : out std_ulogic;
           ca1_h : out std_ulogic);
  end component;

  -- Partial product generator with full adder
  -- Has both PP out and SUM (and carry) out
  component ppfapp is
    port ( bth   : in  std_ulogic_vector(4 downto 0);
           x1_h  : in  std_ulogic;
           x2_h  : in  std_ulogic;
           s0_h  : in  std_ulogic;
           c0_h  : in  std_ulogic;
           pp_h  : out std_ulogic;
           sum_h : out std_ulogic;
           ca1_h : out std_ulogic);
  end component;

  -- Sign extender. Computes sign bits to pass to next row.
  -- Adds 2 bits per row. "ff" is the "flag" bit.
  component sgn is
    port ( pp_h     : in  std_ulogic;
           ff_h     : in  std_ulogic;
           pp_out_h : out std_ulogic;
           ff_out_h : out std_ulogic);
  end component;

  -- D flip flop with reset
  component dffr_fall is
    port ( Rst : in std_ulogic;
           Clk : in std_ulogic;
           signal D : in  std_ulogic;
           signal Q : out std_ulogic);
  end component;

  component dffrN_fall is
    generic(N : positive );
    port ( Rst : in std_ulogic;
           Clk : in std_ulogic;
           signal D : in  std_ulogic_vector(N-1 downto 0);
           signal Q : out std_ulogic_vector(N-1 downto 0));
  end component;

  -- a gated flipflop
  component gdffr_fall is
    port ( Rst    : in std_ulogic;
           Clk    : in std_ulogic;
           Enable : in std_ulogic;
           signal D : in  std_ulogic;
           signal Q : out std_ulogic);
  end component;

  -- an N-bit gated flipflop
  component gdffrN_fall is
    generic(N : positive );
    port ( Rst    : in std_ulogic;
           Clk    : in std_ulogic;
           Enable : in std_ulogic;
           signal D : in  std_ulogic_vector(N-1 downto 0);
           signal Q : out std_ulogic_vector(N-1 downto 0));
  end component;

  -- These are the outputs from the sign extenders
  -- one for each row
  -- pp15 is the pp output of the 15th column of each row
  -- we need 9 sets of wires since we have inputs to row 0 and outputs from row 7

  -- v2 signals in 1st pipe stage, v1 signals in 2nd
  -- (pp1 = 1st stage, pp2 = 2nd, pp3 = 3rd)

  -- There is some overlap here, since in PHI2 we generate pp1_v2h(4) which
  -- is then latched to PHI1
  signal pp_1v2h   : std_ulogic_vector(4 downto 0);
  signal ff_1v2h   : std_ulogic_vector(4 downto 0);
  signal pp15_1v2h : std_ulogic_vector(4 downto 0);

  signal pp_2v1h   : std_ulogic_vector(8 downto 4);
  signal ff_2v1h   : std_ulogic_vector(8 downto 4);
  signal pp15_2v1h : std_ulogic_vector(7 downto 4);

  signal pp_3v2h   : std_ulogic_vector(8 downto 8);

  -- each row has an output from the addcell
  signal add_1v2h : std_ulogic_vector(3 downto 0);
  signal add_2v1h : std_ulogic_vector(7 downto 0);

  -- these are a cycle later
  signal add_3v2h : std_ulogic_vector(7 downto 4);

  -- each row gets own array. Don't try 2-dimension array.
  -- sum_x_h is the sum output of each column in row X.
  -- ca1_x_h is the carry output of each column in row X.
  -- pre_A_h is the booth encoding for row A before the gate
  -- bth_A_h is the booth encoding for row A after the gate

  -- The V2H signals are outputs from the multiplier body
  -- the S1H signals are outputs from the 1st pipeline registers
  -- the V1H signals are outputs from the 1st pipeline gates

  signal sum_0_1v2h  : std_ulogic_vector(COL downto 0);
  signal car_0_1v2h  : std_ulogic_vector(COL downto 0);
  signal sum_0_2v1h  : std_ulogic_vector(1 downto 0);
  signal car_0_2v1h  : std_ulogic;
  signal bth_pre_0_h : std_ulogic_vector(4 downto 0);
  signal bth_0_1v2h  : std_ulogic_vector(4 downto 0);

  signal sum_1_1v2h  : std_ulogic_vector(COL downto 0);
  signal car_1_1v2h  : std_ulogic_vector(COL downto 0);
  signal sum_1_2v1h  : std_ulogic_vector(1 downto 0);
  signal car_1_2v1h  : std_ulogic;
  signal bth_pre_1_h : std_ulogic_vector(4 downto 0);
  signal bth_1_1v2h  : std_ulogic_vector(4 downto 0);

  signal sum_2_1v2h  : std_ulogic_vector(COL downto 0);
  signal car_2_1v2h  : std_ulogic_vector(COL downto 0);
  signal sum_2_2v1h  : std_ulogic_vector(1 downto 0);
  signal car_2_2v1h  : std_ulogic;
  signal bth_pre_2_h : std_ulogic_vector(4 downto 0);
  signal bth_2_1v2h  : std_ulogic_vector(4 downto 0);

  signal sum_3_1v2h  : std_ulogic_vector(COL downto 0);
  signal car_3_1v2h  : std_ulogic_vector(COL downto 0);
  signal sum_3_2v1h  : std_ulogic_vector(COL downto 0);
  signal car_3_2v1h  : std_ulogic_vector(COL downto 0);
  signal bth_pre_3_h : std_ulogic_vector(4 downto 0);
  signal bth_3_1v2h  : std_ulogic_vector(4 downto 0);

  -- The V1H signals are outputs from the multiplier body
  -- the S2H signals are outputs from the 2nd pipeline registers
  -- the V2H signals are outputs from the 2nd pipeline gates

  signal sum_4_2v1h  : std_ulogic_vector(COL downto 0);
  signal car_4_2v1h  : std_ulogic_vector(COL downto 0);
  signal sum_4_3v2h  : std_ulogic_vector(1 downto 0);
  signal car_4_3v2h  : std_ulogic;
  signal bth_pre_4_h : std_ulogic_vector(4 downto 0);
  signal bth_4_2v1h  : std_ulogic_vector(4 downto 0);

  signal sum_5_2v1h  : std_ulogic_vector(COL downto 0);
  signal car_5_2v1h  : std_ulogic_vector(COL downto 0);
  signal sum_5_3v2h  : std_ulogic_vector(1 downto 0);
  signal car_5_3v2h  : std_ulogic;
  signal bth_pre_5_h : std_ulogic_vector(4 downto 0);
  signal bth_5_2v1h  : std_ulogic_vector(4 downto 0);

  signal sum_6_2v1h  : std_ulogic_vector(COL downto 0);
  signal car_6_2v1h  : std_ulogic_vector(COL downto 0);
  signal sum_6_3v2h  : std_ulogic_vector(1 downto 0);
  signal car_6_3v2h  : std_ulogic;
  signal bth_pre_6_h : std_ulogic_vector(4 downto 0);
  signal bth_6_2v1h  : std_ulogic_vector(4 downto 0);

  signal sum_7_2v1h  : std_ulogic_vector(COL downto 0);
  signal car_7_2v1h  : std_ulogic_vector(COL downto 0);
  signal sum_7_3v2h  : std_ulogic_vector(COL downto 0);
  signal car_7_3v2h  : std_ulogic_vector(COL downto 0);
  signal bth_pre_7_h : std_ulogic_vector(4 downto 0);
  signal bth_7_2v1h  : std_ulogic_vector(4 downto 0);

  -- The first 15 bits go into a full adder array.
  -- The last 17 bits go into a 4:2 compressor array with W()
  --
  -- These are the a_h() and b_h() inputs and the carry output
  signal fa_a_2v1h   : std_ulogic_vector(7 downto 0);
  signal fa_b_2v1h   : std_ulogic_vector(7 downto 0);
  signal fa_car_2v1h : std_ulogic;

  -- these feed the 24-bit CLA
  -- fa_a_3 is (32 - 8) to accommodate an extra carry bit that we do not use
  signal fa_a_3v2h    : std_ulogic_vector(32 downto 8);
  signal fa_b_3v2h    : std_ulogic_vector(31 downto 8);
  signal fa1_car_3v2h : std_ulogic;

  -- The carry outputs of bit 16's compressor (no longer use 4:2 compressors, but
  -- the name is the same...)
  --signal comp_ca1_3v2h: std_ulogic;
  --signal comp_ca2_3v2h: std_ulogic;

  -- b input and Carry outputs of the 4:2 compressor array
  -- cout_out_h is the output of the 4:2 compressors (since z_v2h
  -- is not inout or buffered) no longer use 4:2 compressors, but name is the same.
  signal comp_b_3v2h   : std_ulogic_vector(15 downto 0);
  signal comp_out_3v2h : std_ulogic_vector(31 downto 0);

  -- some miscellaneous signals used to compute the overflow

  constant GND : std_ulogic := '0';
  constant VDD : std_ulogic := '1';

  -- a modified version of x_s2h to align with the times 2 needed for booth
  -- Use a tempx as the bit-sliced version then assign whole to myx_v2h
  -- A ModelSim technote said this was the way to do it....
  -- We need to pad with a "0" on the right and duplicate x_s2h(15) on left
  -- myx_v2h is also gated on ME_Q2H
  -- myx_s1h/v1h is latched/gated on MDLY_Q1H in the 2nd pipeline stage
  signal myx_1v2h : std_ulogic_vector(COL+1 downto 0);
  signal myx_2v1h : std_ulogic_vector(COL+1 downto 0);

  signal myx_3v2h: std_ulogic; -- needed in 3rd pipeline stage

  signal myy_2s1h: std_ulogic_vector(COL-1 downto 7);
  signal myy_3v2h: std_ulogic; -- needed in 3rd pipeline stage

  signal tempx : std_ulogic_vector(COL+1 downto 0);

  -- The W signal is gated in three places.
  -- W[14:0] is gated on ME_Q2H
  -- W[15] is gated on MDLY_Q1H
  -- W[31:16] is gated on MDLY_Q2H
  -- the array indices are to keep them the same as w_s2h

  signal w_1v2h : std_ulogic_vector(31 downto 0);
  signal w_2v1h : std_ulogic_vector(31 downto 15);
  signal w_3v2h : std_ulogic_vector(31 downto 15);


  -- a temp signal array for the Y input to row 0 booth encoder.
  signal y0_in : std_ulogic_vector(2 downto 0);

  -- timing signals for pipeline registers and gates
  signal me_1q2h : std_ulogic;
  signal me_2s1h : std_ulogic;
  signal me_2q1h : std_ulogic;
  signal me_3s2h : std_ulogic;
  signal me_3q2h : std_ulogic;

  -- Internally guarded RESET on PHI_2
  signal rst_q2h : std_ulogic;

  -- The 1st 8 bits of z are generated in the 2nd pipeline stage
  signal z_2v1h : std_ulogic_vector(7 downto 0);

  signal DBG_EN : std_ulogic := '0';

begin

  -- Generate the internal reset signal
  rst_q2h rst_q2h);
  dff_clk1: dffr_fall port map( D => me_2s1h, Q=> me_3s2h, CLK=> phi_1h, Rst=> rst_q2h);

  medly_s2h w_s2h(31 downto 15), Clk=> me_1q2h, Rst=> Rst_q2h);

  wlatch_2: dffrN_fall generic map(17)
      port map (Q=> w_3v2h, D=> w_2v1h(31 downto 15), Clk=> me_2q1h, Rst=> Rst_q2h);

  -- ff_h(0) is always 0
  ff_1v2h(0)
  ----------------------------------------------------------------

  ----------------------------------------------------------------
  -- 1) Generate the sign extender cells, one cell per row
  ----------------------------------------------------------------
  -- There is one sign cell per row
  COLGEN1: for i in 0 to 3 generate
    sgncell : sgn port map( pp_h => pp15_1v2h(i), ff_h => ff_1v2h(i),
        pp_out_h => pp_1v2h(i+1), ff_out_h => ff_1v2h(i+1) );
  end generate;

  pipe_pp2: gdffr_fall port map ( Q=> pp_2v1h(4), D=> pp_1v2h(4),
      Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);
  pipe_ff2: gdffr_fall port map ( Q=> ff_2v1h(4), D=> ff_1v2h(4),
      Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

  COLGEN2: for i in 4 to 7 generate
    sgncell : sgn port map( pp_h => pp15_2v1h(i), ff_h => ff_2v1h(i),
        pp_out_h => pp_2v1h(i+1), ff_out_h => ff_2v1h(i+1) );
  end generate;

  pipe_pp3: gdffr_fall port map ( Q=> pp_3v2h(8), D=> pp_2v1h(8),
      Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);
  --pipe_ff3: gdffr_fall port map ( Q=> ff_3v2h(8), D=> ff_2v1h(8),
  --    Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);


  ----------------------------------------------------------------
  -- 2) The booth encoders, one cell per row
  ----------------------------------------------------------------
  -- Generate the Booth encoders, one per row. Note that row 0 is special
  -- and pads a "0" as LSB.
  y0_in(2 downto 1) bth_pre_0_h);
  bth_1 : booth_encode port map ( in_h => y_s2h(3 downto 1), bth_h => bth_pre_1_h);
  bth_2 : booth_encode port map ( in_h => y_s2h(5 downto 3), bth_h => bth_pre_2_h);
  bth_3 : booth_encode port map ( in_h => y_s2h(7 downto 5), bth_h => bth_pre_3_h);

  -- Delay y_s2h(15 downto 7) until stage 2

  bth_4 : booth_encode port map ( in_h => myy_2s1h(9 downto 7), bth_h => bth_pre_4_h);
  bth_5 : booth_encode port map ( in_h => myy_2s1h(11 downto 9), bth_h => bth_pre_5_h);
  bth_6 : booth_encode port map ( in_h => myy_2s1h(13 downto 11), bth_h => bth_pre_6_h);
  bth_7 : booth_encode port map ( in_h => myy_2s1h(15 downto 13), bth_h => bth_pre_7_h);

  -- Pass the booth encoding through the gated drivers
  --bth_0_1v2h '0');
  --bth_1_1v2h '0');
  --bth_2_1v2h '0');
  --bth_3_1v2h '0');
  --bth_4_2v1h '0');
  --bth_5_2v1h '0');
  --bth_6_2v1h '0');
  --bth_7_2v1h '0');
  bth_0_1v2h Rst_q2h);

  ----------------------------------------------------------------
  -- 3) The add cells, one per row
  ----------------------------------------------------------------
  -- The Add Cells get mixed up on the indices, since booth encoding is
  -- not a row array. Easiest to just declare each one outside a generate loop
  addcell_0 : addcell port map ( bth => bth_0_1v2h, sum => add_1v2h(0) );
  addcell_1 : addcell port map ( bth => bth_1_1v2h, sum => add_1v2h(1) );
  addcell_2 : addcell port map ( bth => bth_2_1v2h, sum => add_1v2h(2) );
  addcell_3 : addcell port map ( bth => bth_3_1v2h, sum => add_1v2h(3) );
  addcell_4 : addcell port map ( bth => bth_4_2v1h, sum => add_2v1h(4) );
  addcell_5 : addcell port map ( bth => bth_5_2v1h, sum => add_2v1h(5) );
  addcell_6 : addcell port map ( bth => bth_6_2v1h, sum => add_2v1h(6) );
  addcell_7 : addcell port map ( bth => bth_7_2v1h, sum => add_2v1h(7) );

  -- Delay the first 4 to 2nd stage
  gadd1: gdffrN_fall generic map(4)
      port map( Q=> add_2v1h(3 downto 0), D=>add_1v2h,
      Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

  -- Delay the last 4 to 3rd stage
  gadd2: gdffrN_fall generic map(4)
      port map( Q=> add_3v2h(7 downto 4), D=>add_2v1h(7 downto 4),
      Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);

  ----------------------------------------------------------------
  -- 4) The Multiplier body, 16 columns by 8 rows
  ----------------------------------------------------------------
  -- i is the column
  ROWGEN: for i in 0 to COL generate
    -- for the PPFA cells, columns 0 to 14 use regular PPFA cells
    -- column 15 uses the PPFAPP cell which has a tap on the PP output of
    -- the mux. This is needed to do the sign extension.
    --
    -- So, ppfa_0(5), for example, would be column 5 of row 0

    -- The first 15 columns get sum/carry inputs from previous row
    -- Columns 15 and 16 get special wiring from the sign extenders
    -- Column 16 also uses the PPFAPP cells

    G0: if( i < COL-1 ) generate
      -- Row 0 is special and gets W() inputs
      ppfa_0: ppfa port map( bth => bth_0_1v2h,
          x1_h => myx_1v2h(i+1),
          x2_h => myx_1v2h(i),
          s0_h => w_1v2h(i),
          c0_h => GND,
          sum_h => sum_0_1v2h(i),
          ca1_h => car_0_1v2h(i));

      -- All other rows get s0_h from 2 columns left and
      -- c0_h from 1 column left from the previous row.

      ppfa_1: ppfa port map( bth => bth_1_1v2h,
          x1_h => myx_1v2h(i+1),
          x2_h => myx_1v2h(i),
          s0_h => sum_0_1v2h(i+2),
          c0_h => car_0_1v2h(i+1),
          sum_h => sum_1_1v2h(i),
          ca1_h => car_1_1v2h(i));

      ppfa_2: ppfa port map( bth => bth_2_1v2h,
          x1_h => myx_1v2h(i+1),
          x2_h => myx_1v2h(i),
          s0_h => sum_1_1v2h(i+2),
          c0_h => car_1_1v2h(i+1),
          sum_h => sum_2_1v2h(i),
          ca1_h => car_2_1v2h(i));

      ppfa_3: ppfa port map( bth => bth_3_1v2h,
          x1_h => myx_1v2h(i+1),
          x2_h => myx_1v2h(i),
          s0_h => sum_2_1v2h(i+2),
          c0_h => car_2_1v2h(i+1),
          sum_h => sum_3_1v2h(i),
          ca1_h => car_3_1v2h(i));

      p00_sum1 : gdffr_fall port map
          ( Q=> sum_3_2v1h(i), D=> sum_3_1v2h(i), Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

      p00_car1 : gdffr_fall port map
          ( Q=> car_3_2v1h(i), D=> car_3_1v2h(i), Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

      -- use the value before the tri-state
      p00_x1 : gdffr_fall port map
          ( Q=> myx_2v1h(i), D=> tempx(i), Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

      ppfa_4: ppfa port map( bth => bth_4_2v1h,
          x1_h => myx_2v1h(i+1),
          x2_h => myx_2v1h(i),
          s0_h => sum_3_2v1h(i+2),
          c0_h => car_3_2v1h(i+1),
          sum_h => sum_4_2v1h(i),
          ca1_h => car_4_2v1h(i));

      ppfa_5: ppfa port map( bth => bth_5_2v1h,
          x1_h => myx_2v1h(i+1),
          x2_h => myx_2v1h(i),
          s0_h => sum_4_2v1h(i+2),
          c0_h => car_4_2v1h(i+1),
          sum_h => sum_5_2v1h(i),
          ca1_h => car_5_2v1h(i));

      ppfa_6: ppfa port map( bth => bth_6_2v1h,
          x1_h => myx_2v1h(i+1),
          x2_h => myx_2v1h(i),
          s0_h => sum_5_2v1h(i+2),
          c0_h => car_5_2v1h(i+1),
          sum_h => sum_6_2v1h(i),
          ca1_h => car_6_2v1h(i));

      ppfa_7: ppfa port map( bth => bth_7_2v1h,
          x1_h => myx_2v1h(i+1),
          x2_h => myx_2v1h(i),
          s0_h => sum_6_2v1h(i+2),
          c0_h => car_6_2v1h(i+1),
          sum_h => sum_7_2v1h(i),
          ca1_h => car_7_2v1h(i));

      p00_sum2 : gdffr_fall port map
          ( Q=> sum_7_3v2h(i), D=> sum_7_2v1h(i), Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);

      p00_car2 : gdffr_fall port map
          ( Q=> car_7_3v2h(i), D=> car_7_2v1h(i), Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);

    end generate G0;

    -- In column 15, the s0_h input is the "pp" output of the sign extender
    -- pp_h() is indexed by row number.

    G15: if( i = COL-1 ) generate
      ppfa15_0: ppfa port map( bth => bth_0_1v2h,
          x1_h => myx_1v2h(i+1),
          x2_h => myx_1v2h(i),
          s0_h => GND,
          c0_h => GND,
          sum_h => sum_0_1v2h(i),
          ca1_h => car_0_1v2h(i));

      ppfa15_1: ppfa port map( bth => bth_1_1v2h,
          x1_h => myx_1v2h(i+1),
          x2_h => myx_1v2h(i),
          s0_h => pp_1v2h(1),
          c0_h => car_0_1v2h(i+1),
          sum_h => sum_1_1v2h(i),
          ca1_h => car_1_1v2h(i));

      ppfa15_2: ppfa port map( bth => bth_2_1v2h,
          x1_h => myx_1v2h(i+1),
          x2_h => myx_1v2h(i),
          s0_h => pp_1v2h(2),
          c0_h => car_1_1v2h(i+1),
          sum_h => sum_2_1v2h(i),
          ca1_h => car_2_1v2h(i));

      ppfa15_3: ppfa port map( bth => bth_3_1v2h,
          x1_h => myx_1v2h(i+1),
          x2_h => myx_1v2h(i),
          s0_h => pp_1v2h(3),
          c0_h => car_2_1v2h(i+1),
          sum_h => sum_3_1v2h(i),
          ca1_h => car_3_1v2h(i));

      p15_sum1 : gdffr_fall port map
          ( Q=> sum_3_2v1h(i), D=> sum_3_1v2h(i), Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

      p15_car1 : gdffr_fall port map
          ( Q=> car_3_2v1h(i), D=> car_3_1v2h(i), Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

      -- use value before the tri-state (don't use myx_1v2h)
      p15_x1 : gdffr_fall port map
          ( Q=> myx_2v1h(i), D=> tempx(i), Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

      ppfa15_4: ppfa port map( bth => bth_4_2v1h,
          x1_h => myx_2v1h(i+1),
          x2_h => myx_2v1h(i),
          s0_h => pp_2v1h(4),
          c0_h => car_3_2v1h(i+1),
          sum_h => sum_4_2v1h(i),
          ca1_h => car_4_2v1h(i));

      ppfa15_5: ppfa port map( bth => bth_5_2v1h,
          x1_h => myx_2v1h(i+1),
          x2_h => myx_2v1h(i),
          s0_h => pp_2v1h(5),
          c0_h => car_4_2v1h(i+1),
          sum_h => sum_5_2v1h(i),
          ca1_h => car_5_2v1h(i));

      ppfa15_6: ppfa port map( bth => bth_6_2v1h,
          x1_h => myx_2v1h(i+1),
          x2_h => myx_2v1h(i),
          s0_h => pp_2v1h(6),
          c0_h => car_5_2v1h(i+1),
          sum_h => sum_6_2v1h(i),
          ca1_h => car_6_2v1h(i));

      ppfa15_7: ppfa port map( bth => bth_7_2v1h,
          x1_h => myx_2v1h(i+1),
          x2_h => myx_2v1h(i),
          s0_h => pp_2v1h(7),
          c0_h => car_6_2v1h(i+1),
          sum_h => sum_7_2v1h(i),
          ca1_h => car_7_2v1h(i));
      p15_sum2 : gdffr_fall port map
          ( Q=> sum_7_3v2h(i), D=> sum_7_2v1h(i), Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);

      p15_car2 : gdffr_fall port map
          ( Q=> car_7_3v2h(i), D=> car_7_2v1h(i), Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);

    end generate G15;

    -- In column 16, the s0_h input is the "ff" output of the sign extender
    -- The c0_h input is 0.

    G16: if( i = COL ) generate
      ppfapp_0: ppfapp port map( bth => bth_0_1v2h,
          x1_h => myx_1v2h(i+1),
          x2_h => myx_1v2h(i),
          s0_h => GND,
          c0_h => GND,
          pp_h => pp15_1v2h(0),
          sum_h => sum_0_1v2h(i),
          ca1_h => car_0_1v2h(i));

      ppfapp_1: ppfapp port map( bth => bth_1_1v2h,
          x1_h => myx_1v2h(i+1),
          x2_h => myx_1v2h(i),
          s0_h => ff_1v2h(1),
          c0_h => GND,
          pp_h => pp15_1v2h(1),
          sum_h => sum_1_1v2h(i),
          ca1_h => car_1_1v2h(i));

      ppfapp_2: ppfapp port map( bth => bth_2_1v2h,
          x1_h => myx_1v2h(i+1),
          x2_h => myx_1v2h(i),
          s0_h => ff_1v2h(2),
          c0_h => GND,
          pp_h => pp15_1v2h(2),
          sum_h => sum_2_1v2h(i),
          ca1_h => car_2_1v2h(i));

      ppfapp_3: ppfapp port map( bth => bth_3_1v2h,
          x1_h => myx_1v2h(i+1),
          x2_h => myx_1v2h(i),
          s0_h => ff_1v2h(3),
          c0_h => GND,
          pp_h => pp15_1v2h(3),
          sum_h => sum_3_1v2h(i),
          ca1_h => car_3_1v2h(i));

      p16_sum1 : gdffr_fall port map
          ( Q=> sum_3_2v1h(i), D=> sum_3_1v2h(i), Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

      p16_car1 : gdffr_fall port map
          ( Q=> car_3_2v1h(i), D=> car_3_1v2h(i), Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

      -- don't use myx_1v2h, use tempx from before the tristate
      p16_x1 : gdffr_fall port map
          ( Q=> myx_2v1h(i), D=> tempx(i), Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

      ppfapp_4: ppfapp port map( bth => bth_4_2v1h,
          x1_h => myx_2v1h(i+1),
          x2_h => myx_2v1h(i),
          s0_h => ff_2v1h(4),
          c0_h => GND,
          pp_h => pp15_2v1h(4),
          sum_h => sum_4_2v1h(i),
          ca1_h => car_4_2v1h(i));

      ppfapp_5: ppfapp port map( bth => bth_5_2v1h,
          x1_h => myx_2v1h(i+1),
          x2_h => myx_2v1h(i),
          s0_h => ff_2v1h(5),
          c0_h => GND,
          pp_h => pp15_2v1h(5),
          sum_h => sum_5_2v1h(i),
          ca1_h => car_5_2v1h(i));

      ppfapp_6: ppfapp port map( bth => bth_6_2v1h,
          x1_h => myx_2v1h(i+1),
          x2_h => myx_2v1h(i),
          s0_h => ff_2v1h(6),
          c0_h => GND,
          pp_h => pp15_2v1h(6),
          sum_h => sum_6_2v1h(i),
          ca1_h => car_6_2v1h(i));

      ppfapp_7: ppfapp port map( bth => bth_7_2v1h,
          x1_h => myx_2v1h(i+1),
          x2_h => myx_2v1h(i),
          s0_h => ff_2v1h(7),
          c0_h => GND,
          pp_h => pp15_2v1h(7),
          sum_h => sum_7_2v1h(i),
          ca1_h => car_7_2v1h(i));
      p16_sum2 : gdffr_fall port map
          ( Q=> sum_7_3v2h(i), D=> sum_7_2v1h(i), Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);

      p16_car2 : gdffr_fall port map
          ( Q=> car_7_3v2h(i), D=> car_7_2v1h(i), Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);

    end generate G16;

  end generate;

  -- need to latch bit 17 of "myx", since that is not in the generates above
  glatch_x17 : gdffr_fall port map ( Q=> myx_2v1h(17), D=> myx_1v2h(17),
      Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

  -- need bit 16 (=x_in(15)) in 3rd pipeline stage for overflow
  glatch_x15 : gdffr_fall port map ( Q=> myx_3v2h, D=> myx_2v1h(16),
      Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);

  -- These are the tri-state latched outputs going to the adder
  gsum_0_2: gdffrN_fall generic map (2) port map ( Q=> sum_0_2v1h, D=> sum_0_1v2h(1 downto 0),
      Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);
  gsum_1_2: gdffrN_fall generic map (2) port map ( Q=> sum_1_2v1h, D=> sum_1_1v2h(1 downto 0),
      Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);
  gsum_2_2: gdffrN_fall generic map (2) port map ( Q=> sum_2_2v1h, D=> sum_2_1v2h(1 downto 0),
      Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

  gca1_0_2: gdffr_fall port map ( Q=> car_0_2v1h, D=> car_0_1v2h(0),
      Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);
  gca1_1_2: gdffr_fall port map ( Q=> car_1_2v1h, D=> car_1_1v2h(0),
      Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);
  gca1_2_2: gdffr_fall port map ( Q=> car_2_2v1h, D=> car_2_1v2h(0),
      Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

  gsum_4_3: gdffrN_fall generic map (2) port map ( Q=> sum_4_3v2h, D=> sum_4_2v1h(1 downto 0),
      Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);
  gsum_5_3: gdffrN_fall generic map (2) port map ( Q=> sum_5_3v2h, D=> sum_5_2v1h(1 downto 0),
      Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);
  gsum_6_3: gdffrN_fall generic map (2) port map ( Q=> sum_6_3v2h, D=> sum_6_2v1h(1 downto 0),
      Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);

  gca1_4_3: gdffr_fall port map ( Q=> car_4_3v2h, D=> car_4_2v1h(0),
      Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);
  gca1_5_3: gdffr_fall port map ( Q=> car_5_3v2h, D=> car_5_2v1h(0),
      Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);
  gca1_6_3: gdffr_fall port map ( Q=> car_6_3v2h, D=> car_6_2v1h(0),
      Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);