  • 7/30/2019 10.1.1.1.4951

    16-bit Booth Multiplier

    with 32-bit Accumulate

    Marc Mosko

    CMPE223 Independent Study

    CMPE223 Booth Multiplier Marc Mosko

    Table of Contents

    Introduction
    Basic Design
    Performance Estimates
    Booth Multiplier
    VHDL Source Code
    Code Overview
    I/O Register Design
    Example Register Access
    Source Code
    Source Code Hierarchy
    VHDL Code Versions
    Overflow Logic
    Magic Layout
    Design Hierarchy
    RSIM Calibration
    Optimization
    References
    VHDL Source Code
    Addcell.vhd
    Adder.vhd
    Booth.vhd
    claN.vhd
    driverN.vhd
    latch.vhd
    mult.vhd
    mult_cla.vhd
    mult_pipe.vhd

    Invchain
    Mcell
    Mcell
    Ppmux
    Ppmuxfa
    Rwire
    Wiring cells (passive)

    Introduction

    This report presents three main topics we investigated as part of a project to build a Booth

    encoded multiply/accumulate VLSI chip. The original scope of work included synthesizing

    VHDL code using the Mentor Graphics tools. Exemplar was the VHDL compiler. Leonardo

    Spectrum was the synthesizer. Because my team, which included Kevin Delaney, did not meet a

    MOSIS deadline, our chip funding was lost. Since we did not actually fabricate a chip, we cannot

    discuss the success of our results. Likewise, VHDL synthesis using the Exemplar tools was not

    very successful, so we do not discuss synthesis results except in passing. The main points we

    cover are the basic architecture, our VHDL code, and a Magic layout in place of logic synthesis.

    The work presented here, except as cited, is almost entirely my own. Teamwork with Kevin

    Delaney had some influence on the VHDL code, since he was primarily working on the synthesis

    portion of the project.

    Due to length considerations, we have not included all VHDL code or any test suites. We have

    Basic Design

    The goal of the multiplier is to compute X[15:0] * Y[15:0] + W[31:0] = Z[31:0] and OVRFLW.

    OVRFLW is the multiply-accumulate overflow. We discuss OVRFLW in more detail below. It

    is not simply the carry-out of the final addition.
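    The arithmetic goal above can be sketched as a small reference model. This is only an illustration: the function names are ours, and the overflow test used here (the exact signed result does not fit in 32 bits) is an assumed, generic definition rather than the rule developed in the Overflow Logic section. It does show why OVRFLW differs from the adder carry-out.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac(x, y, w):
    """Return (Z, OVRFLW, carry_out) for 16-bit x, y and 32-bit w bit patterns."""
    product = to_signed(x, 16) * to_signed(y, 16)
    exact = product + to_signed(w, 32)
    z = exact & 0xFFFFFFFF
    # ASSUMPTION: overflow means the exact signed result does not fit in 32 bits.
    ovrflw = not (-(1 << 31) <= exact <= (1 << 31) - 1)
    # Carry out of the plain 32-bit addition of the sign-extended product and W.
    carry_out = ((product & 0xFFFFFFFF) + (w & 0xFFFFFFFF)) >> 32
    return z, ovrflw, carry_out


# (-1 * 1) + 1 = 0: the adder carries out, yet the signed result is in range.
print(mac(0xFFFF, 0x0001, 0x00000001))  # (0, False, 1)

# 0x7FFF * 0x7FFF + 0x7FFFFFFF exceeds 2**31 - 1 without producing a carry-out.
z, ovrflw, carry_out = mac(0x7FFF, 0x7FFF, 0x7FFFFFFF)
print(hex(z), ovrflw, carry_out)  # 0xbfff0000 True 0
```

    The two calls show both directions of the mismatch: a carry-out without overflow, and an overflow without a carry-out.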

    Our multiplier is based on a Booth-encoded array multiplier design in [3,4]. The 32-bit adder we

    use for the final addition is from [1,2,4]. We used a Carry-Select Adder (CSA) since it has fairly

    regular layout and good performance.

    The VHDL design is a 3-stage pipeline with I/O registers and common 16-bit I/O bus. A

    complete transaction takes 7 complete cycles: load X, load Y, load W_H, load W_L, Multiply,

    read Z_H, read Z_L. Our design can pipeline the multiply with loading a value, such as the next

    operation's X, so in a stream we are down to 6 cycles. The 6 or 7 cycle length is a limitation of

    improperly sized transistors that did not pass 1 or sometimes 0 with enough force to drive the

    whole CPL NMOS chain. [3] also uses cross-coupled minimum sized PMOS latches to restore

    the swing to output inverters. RSIM did not correctly simulate the swing restore, so we had to

    remove the cross-coupled latches.

    We have verified correct operation of both the VHDL and Magic circuits with several boundary

    cases and 10,000 random multiply/accumulates. The VHDL test cases ran through the I/O

    registers while the Magic cases were raw arithmetic computations. The Magic layout had many

    problems, particularly with the carry-select adder design, which uses pass-logic. As of the

    writing of this report, we verified 10,000 random cases on the Magic layout with one error. We

    have fixed that error, but do not have time to rerun the whole batch. It takes about 8 hours to run

    all cases (we used four machines at 2 hours each).

    Performance Estimates

    Based on timing estimates from Leonardo Spectrum, we believe the VHDL system will run at

    about 48 MHz on a 2-phase clock (about 7ns per phase, 3 phases). We do not believe these

    down to 0.1ns or less. There are a handful of 0.2ns transitions in the critical path. The critical

    path has 61 transitions.

    Because of uncertainty in both the Leonardo timing and the RSIM calibration, it is possible that

    both results are substantially off. The section RSIM Calibration describes our approach to

    calibrating RSIM for the AMI C5N 0.5 µm process.

    The original fast adder in the Magic layout was a 2-2-4-4-4-8-8 CSA adder. This design, based

    on [2], assumes that all inputs arrive at about the same time. That is not the case here.

    Generally, the last bit to the adder is around Z[16], so one might wish to experiment with a 2-2-

    4-4-2-2-4-4-4 or other variant. Please see our comments on the CSA adder in the Optimization

    section below.
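    To make the block notation concrete, here is a minimal functional sketch of a carry-select adder whose block widths are given as a list, least significant block first. It is our own illustration, not the Magic design: it models only the logic (two conditional sums per block and a select by the incoming carry), not the arrival times that motivate the different block sizes.

```python
def carry_select_add(a, b, blocks, carry_in=0):
    """Add two unsigned words with a carry-select structure.

    `blocks` lists the block widths from the least significant block upward.
    Returns (sum, carry_out).
    """
    result, shift, carry = 0, 0, carry_in
    for width in blocks:
        mask = (1 << width) - 1
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        # Both candidates are formed without waiting for the incoming carry.
        sum_if_0 = a_blk + b_blk
        sum_if_1 = a_blk + b_blk + 1
        chosen = sum_if_1 if carry else sum_if_0  # the select mux
        result |= (chosen & mask) << shift
        carry = chosen >> width
        shift += width
    return result, carry


ORIGINAL = [2, 2, 4, 4, 4, 8, 8]    # the 2-2-4-4-4-8-8 adder
VARIANT = [2, 2, 4, 4, 4, 4, 4, 8]  # the 2-2-4-4-4-4-4-8 adder
assert sum(ORIGINAL) == sum(VARIANT) == 32

total, carry = carry_select_add(0x14B2DD6F, 0x2D04417F, VARIANT)
print(hex(total), carry)  # 0x41b71eee 0
```

    Both partitions give the same sum for every input; they differ only in how long each block waits for its select signal, which this model does not capture.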

    We had time to try a 2-2-4-4-4-4-4-8 adder, and our maximum time dropped from 7ns to 6.1ns

    for 4EF9 x E1DC + 287CF2D0 (we have since reduced the time even further). Our intuition

    present schematics of each component in a later section with the Magic cell layouts. For now,

    we wish to present only a high-level floorplan.

    Figure 2 shows the floorplan of a 6-bit Booth multiplier with 12-bit accumulate. It is essentially

    the same as the 16-bit multiplier. We will use the 6-bit version in our present discussion, since

    the floorplan fits on a single page. The vertical dashed lines are continuations of the X inputs.

    We used dashed lines to make it easier to see the regular wiring pattern.

    The 12-bit accumulate requires a set of full adders as shown. The first five bits of the

    accumulate, W[4:0], use the first Booth row for addition. W5 cannot be added with X5. X5 is a

    sign bit but W5 is not. Therefore, we must add W5 with an adder on the outside of the array. A

    standard array multiplier has a fast adder outside the array. Along the bottom of the array, a sum

    bit is $Z_i = C_j + S_{j+1}$. S and C are the sum and carry outputs of the bottom Booth row ppfa

    components. In our case of a 6-bit multiplier, j = i - 5. To add in a third bit, $W_i$, we use a full adder

    to compute the sum S and carry C from $C_j + S_{j+1} + W_i$. We may then use a fast adder to

    single full adders have no ripple carry, this style seems to work well.

    The design in [3,4] uses a sign extension mechanism between Booth-encoded rows. There is no

    constant offset, as in the current Kestrel multiplier. The sign extension uses the partial product

    output (pp_out) from the left-most column of the previous row. The pp_out output is the output

    of the partial product mux before the adder. Using this and the ff output of the previous row's

    sgn component, the sign extender computes new outputs to carry the sign to the next row. This

    technique uses one additional column in the array.

    [Figure 2: floorplan of the 6-bit Booth multiplier with 12-bit accumulate. Cell labels: ppfa, fa, addcell, HA, FA, and the 12-bit CSA adder (drawn in two halves). Input labels: x0 to x4, w0 to w11, gnd.]

    VHDL Source Code

    This section presents the most recent VHDL source code for a pipelined Booth encoded

    multiplier. The code presented below may be found in the directory

    http://www.cse.ucsc.edu/~mmosko/cmpe223/report2/vhdl. There are several other versions

    under /projects/kestrel/users/mult/marc/vhdl/booth-1. The code presented here is mostly based

    on the leo directory (short for Leonardo, the synthesizer). The last part of this section makes

    brief comments on the other versions.

    We used C++ to model the Booth multiply/accumulate before writing the VHDL code. The link

    is http://www.cse.ucsc.edu/~mmosko/cmpe223/report2/cpp. There are three versions. The first

    is an 8-bit adder using 4:2 compressors. The second is a 16-bit also with 4:2 compressors. The

    third is a 16-bit with a fast adder. We shall not discuss this code any further in the interests of

    space.

    The VHDL code mirrors the design in Fig. 4 as closely as possible. We made some abstractions.

    An abstract data type in VHDL replaces the one-hot Booth encoding. This allows the synthesizer

    to use whatever technique it chooses. The adders are abstract + signs, not actual fast adder

    implementations. The synthesizer may then use whatever style is appropriate.
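    As an illustration of what the abstract Booth type stands for, the sketch below recodes a 16-bit word into eight radix-4 Booth digits, each one of the five values -2, -1, 0, +1, +2. This is the textbook radix-4 recoding over overlapping 3-bit groups, written by us for this discussion; the function name is hypothetical and plain integers stand in for the VHDL type.

```python
def booth_digits(y, bits=16):
    """Radix-4 Booth recoding of a two's-complement word.

    Returns bits/2 digits in {-2, -1, 0, 1, 2}, least significant digit first.
    Digit k is formed from y[2k+1], y[2k], y[2k-1], with y[-1] taken as 0.
    """
    y &= (1 << bits) - 1
    padded = y << 1  # append the implicit y[-1] = 0 below bit 0
    digits = []
    for i in range(0, bits, 2):
        group = (padded >> i) & 0b111
        y_hi, y_mid, y_lo = group >> 2, (group >> 1) & 1, group & 1
        digits.append(-2 * y_hi + y_mid + y_lo)
    return digits


digits = booth_digits(0xAF65)
print(digits)  # [1, 1, -2, 2, -1, 0, -1, -1]

# The digits, weighted by powers of 4, give back the signed value of the word.
print(sum(d * 4 ** k for k, d in enumerate(digits)))  # -20635
```

    Each digit selects one partial-product row, so a 16-bit multiplier needs eight rows instead of sixteen.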

    Referring to Fig. 4, there are two main sections to the multiplier. The top section is the chip I/O

    consisting of six 16-bit registers, an overflow register, a 16-bit common I/O bus, and control

    signals. The bottom section is the pipelined Booth multiplier. The multiplier begins

    Signal      Direction  Purpose
    busio_s2h   In/Out     16-bit input/output from multiplier.
    ovrflw_s2h  Out        OVRFLW output
    bussel_s2h  In         3-bit multiplexed register selection.
    bs_s2h      In         Input from bus (H) or multiplier (L). Applies to Z registers.
    rw_s2h      In         Read from bus (H) or write to bus (L).
    me_s2h      In         Multiplier Enable (perform the calculation)
    rst_s2h     In         Reset all registers to 0.
    clk         In         phi_1H, phi_1L, phi_2H, phi_2L

    The second pipeline stage consists of four more Booth encoders reading from latched Y values.

    The computation begins on phi_1 and the results are latched at the end of the phase. There is

    another array multiplier, which continues the multiplication process. There is also an unsigned

    8-bit adder to sum the results from the first pipeline section. Leonardo synthesized an Inverted

    Nibble adder. Since this addition is independent of the results in the second pipeline stage, we

    can perform this addition with little overhead.

    According to Leonardo timing estimates, the 8-bit adder is not necessarily free. The 8-bit

    adder takes a comparable amount of time to the 4-row Booth multiplier. In fact, in some timing

    runs the 8-bit adder took longer than the multiply, indicating that it might not be a good idea to

    try the addition in this pipeline stage. Because of problems we had with the Leonardo timing

    estimates, we did not finish an analysis of this question. We would surmise that since all

    pipeline stages have the same period, the second pipeline stage with 8-bit accumulate would still

    take less time than the 24-bit accumulate in the third stage.

    I/O Register Design

    The I/O registers follow the schematic

    in Fig. 3. The signal medly_s2h is

    used to clock in the value from

    multin_v2h. It only applies to the Z

    registers. The multiplier generates the

    delayed multiplier enable signal,

    medly, as part of the pipeline. The

    output signal store_s2h feeds both

    multout_s2h and busout_s2h. The tri-

    state drivers for the I/O bus are located

    in different VHDL code because of problems we had with the VHDL compiler. The register

    drives the bus when sel_s2h and not(rw_s2h) is true, otherwise it is tri-state.

    [Figure 3. Multiplier I/O Register. Elements shown: a D flip-flop (DFF) with CLK and RST inputs and mux inputs labeled 0 and 1. Signals shown: bs_s2h, multin_v2h, busin_s2h, w2_s2h, store_s2h, csel_s2h, csel_q2h, phi_2h, rst_q2h, rst_s2h, medly_s2h, rden_s2h, sel_s2h, rw_s2h.]

    Line-----A-Bus output-------Bus input--------BBB-C-D-E-F
    00000122 1011111001000011 1011111001000011 000 1 1 1 0

    Column A is the expected carry out, which is set when reading from the multiplier. Bus Output

    is the expected bus output. Bus input is the external driver to the bus. Column A is ovrflw_s2h.

    Column B is the bussel_s2h signal. Column C is the bs_s2h. Column D is the rw_s2h signal.

    Note that rw_s2h only affects the Z registers. Column E is the me_s2h signal. Column F is

    rst_s2h.

    Prior to line 122, values were loaded into the X, Y, and W registers. On line 122, we enable

    me_s2h, which latches the X, Y, and W register values into the multiplier. Because of our I/O

    register design, we may simultaneously load a new value into a register while reading the

    register. Line 122 loads a new value 1011111001000011 (BE43) into the X register by

    selecting register 0 via bussel_s2h and asserting rw_s2h.

    Line 123 loads a new value into the Y register and simultaneously stores the multiplier output

    into the Z registers. The new value 1010111101100101 (AF65) is stored in the Y register

    by selecting register 1 via bussel_s2h and asserting rw_s2h. The multiplier result is stored in

    from a D-flipflop. In our test stimulus file, lines other than 124, 125, 12a, and 12b are "-" (don't

    care).

    Lines 126 and 127 load the W values (2D04417F). Lines 128 and 129 are similar to lines 122 and

    123. They load the next X and Y value and compute BE43 * AF65 + 2D04417F = 41B71EEE.

    We read the Z values in lines 12a and 12b.
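    The register sequence in this example can be checked with a small behavioral sketch. It is not cycle-accurate and does not model the control signals; the class and method names are hypothetical. The register numbering (0 = X, 1 = Y, 2 = W[31:16], 3 = W[15:0], 4 = Z[31:16], 5 = Z[15:0]) follows the block diagram, and Y is taken from the binary stimulus 1010111101100101, which is AF65 in hex.

```python
REG_X, REG_Y, REG_W_H, REG_W_L, REG_Z_H, REG_Z_L = range(6)


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


class MacChipModel:
    """Behavioral stand-in for the six 16-bit I/O registers and the multiplier."""

    def __init__(self):
        self.regs = [0] * 6

    def load(self, reg, word):
        """One bus cycle that writes a 16-bit word into an input register."""
        self.regs[reg] = word & 0xFFFF

    def multiply(self):
        """Compute X * Y + W and store the 32-bit result in the Z registers."""
        x = to_signed(self.regs[REG_X], 16)
        y = to_signed(self.regs[REG_Y], 16)
        w = to_signed((self.regs[REG_W_H] << 16) | self.regs[REG_W_L], 32)
        z = (x * y + w) & 0xFFFFFFFF
        self.regs[REG_Z_H], self.regs[REG_Z_L] = z >> 16, z & 0xFFFF

    def read(self, reg):
        """One bus cycle that reads a 16-bit register."""
        return self.regs[reg]


chip = MacChipModel()
chip.load(REG_X, 0xBE43)
chip.load(REG_Y, 0xAF65)
chip.load(REG_W_H, 0x2D04)
chip.load(REG_W_L, 0x417F)
chip.multiply()
print(f"{chip.read(REG_Z_H):04X}{chip.read(REG_Z_L):04X}")  # 41B71EEE
```

    The seven calls correspond to the seven bus cycles of a complete transaction: four loads, the multiply, and two reads.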

    [Figure: block diagram of the I/O registers and the pipelined Booth multiplier. Upper part: reg 0 x[15:0], reg 1 y[15:0], reg 2 w[31:16], reg 3 w[15:0], reg 4 Z[31:16], reg 5 Z[15:0], and the Ovrflw DFF on BUSIO_S2H[15:0], with a 3:8 demux on bussel_s2h[2:0], the inputs rw_s2h, rst_s2h, bs_s2h, me_s2h, and clk[3:0] (clk = phi1_h/l, phi2_h/l), and the output ovrflw_s2h. Lower part labels: 16 x 4 Booth Multiplier (twice), Booth (9 bits), 20-bit outputs, pipeline registers, 8-bit unsigned add, y[8:0], y[15:8], z[7:0], z[15:8], ovrflw logic.]

    Source Code

    The table below lists the 59 files that are part of the VHDL code. Generally, there are three or

    four files associated with each major component. For the component foo, there would be

    foo.vhd, which is the instantiation of the entity and architecture. foo_test.vhd is a test script that

    uses foo.vhd as a component (UUT). The test script reads stimulus from foo.txt. Sometimes

    there will be a foogen.{pl|cc} to generate the stimulus.

    Fileno  File Name              Description
    1       addcell.txt            Test file stimulus
    2       addcell.vhd            Implements +1 when booth sign negative
    3       addcell_test.vhd       Test script for addcell
    4       adder.txt              Test file stimulus
    5       adder.vhd              Single bit and N bit full adder
    6       adder_test.vhd
    7       adder15.txt            Test file stimulus
    8       adder15gen.pl          Generates stimulus for exhaustive 15-bit adder, incorrect carry out
    9       adderN_test.vhd        A 15-bit adder test using "adder.vhd"
    10      adk.vhd                Cell library for Leonardo
    11      booth.txt              Test file stimulus
    12      booth.vhd              Booth type (abstracts one-hot), booth encoder, sign propagation
    13      booth_test.vhd
    14      claN.vhd               An n-bit carry-lookahead adder (abstract "plusN" and

    29      modelsim.ini           Ini file for Exemplar VHDL
    30      mult.txt               Test file stimulus
    31      mult.vhd               The whole multiplier with I/O registers
    32      mult_test.vhd          Test for "mult.vhd"
    33      mult_test_1.vhd        Test for synthesized "mult.vhd" (uses std_ulogic)
    34      mult_cla.vhd           The CLA adder used by the multiplier
    35      mult_framegen          Generates test cases (boundary and random)
    36      mult_framegen.cc       C++ source code for mult_framegen
    37      mult_frame.tcl         A timing analysis file example
    38      mult_frame.txt         Test stimulus
    39      mult_frame.vhd         The booth array multiplier and CLA adder
    40      mult_frame_test.vhd    Test for "mult_frame"
    41      mult_frame_test_u.vhd  Test for "mult_frame" (synthesized)
    42      mult_pipe.txt          Test stimulus
    43      mult_pipe.vhd          The booth array multiplier
    44      mult_pipe_test.vhd     Old test script -- out of date
    45      multgen.cc             Generates test cases for "mult.vhd"
    46      multreg.vhd            N-bit multiplier register using "dffr_fall" and an input buffer
    47      multregN.txt           Test stimulus for 4-bit register, non-exhaustive, out of date (for latch, not dff)
    48      multreg_test.vhd       Test script for 4-bit register
    49      Mymake                 CSH script to create everything
    50      plusN_test.vhd         Tests abstract N-bit adder
    51      pp.vhd                 Partial-product cells (ppmux, ppfa, ppfapp)
    52      ppfa.txt               Test cases for "ppfa"
    53      ppfagen.pl             Generates test cases for "ppfa"

    computes bus_wr_h[7:0] as a one-hot control signal to a set of 16-bit tri-state buffers for each

    I/O register's busout_s2h signal. Finally, the code computes the ovrflw signal.

    The component multregn instantiates an N-bit I/O register, as described above. It uses the

    components dffrN_fall (file 25) and buf (file 22). DffrN_fall is an N-bit D-flipflop with reset

    clocked on the falling edge. Buf is a 1-bit buffer. We had to play some tricks with signal

    buffering to ensure proper fan-out. Leonardo had trouble with our source code and generating

    proper fan-out. We believe the problem was that we did not follow a strict hierarchy structure of

    combinatorial logic followed by registers.

    The component mult_cla instantiates a 24-bit fast adder. It uses the component plusN (file 14).

    PlusN is an abstracted + operation in VHDL with some added logic to compute the carry. We

    used to have mult_cla and mult_pipe in the same source file as part of the same component.

    We separated them at some point because of timing simulation problems with Leonardo.

    component sgn (file 51) implements the sign extender of Fig. 2. The components dffr_fall,

    dffrN_fall, gdffr_fall, and gdffrN_fall (all file 25) implement single bit and N-bit D-flipflops

    with reset. The g versions are gated and have a tri-state Enable input (no longer used).

    Inside mult_pipe, we used to drive each pipeline stage from a gated transparent latch. By using

    a gated latch, we could conserve power by eliminating spurious transitions while computing the

    previous pipeline stage. At the end of the first pipeline stage, for instance, we would latch the

    data at the end of phi_2 and enable the tri-state output at the beginning of phi_1. We used the

    components glatchrN, etc. When we switched to the DFF, there was no reason to continue

    using a gated version, since the flipflop is not transparent. Thus, the gdff and dff components

    are identical except for an extra Enable signal that does nothing. We preserved the Enable input

    so that there were no changes to our code semantics.

    VHDL Code Versions

    There are six versions of the VHDL code. The code that best synthesizes is in a directory called

    leo under /projects/kestrel/users/mult/marc/vhdl/booth-1. The leo code was the basis for the

    double-rail nature and had better synthesis results. Kevin Delaney found a cell library for

    Leonardo, the synthesis tool. The cell library is called ADK. We began using the ADK cell

    library in the source tree adk. adk is a non-pipelined multiplier. adk-pipe is a pipelined

    multiplier. adk-pipe-cla is a pipelined multiplier with carry-look-ahead adder. We hard-coded

    the CLA structure with a behavioral description. In our final version, we steered away from

    being so specific and just use a + sign.

    We learned several things from these many versions and our efforts at synthesis. In our opinion,

    one should try to be as abstract as possible and let the synthesizer figure out the specifics. One

    must be aware of automatic register generation and what sort of statements will not synthesize.

    Apart from those concerns, we would recommend staying away from gate-level specifics. When

    one tries to enforce a specific structure, there is usually competition with the synthesizer and no

    one wins. There are directives to give the synthesizer guidelines for specific modules, but we did

    not have much success with them.

    Overflow Logic

    A multiply-accumulate where all words

    are n-bit does not have overflow. Our

    architecture, however, does have the

    potential for overflow since the

    accumulate is twice the word size of the multiplier/multiplicand. We compute a signed overflow

    from the following two assertions for Z[m:0]=X[n:0] * Y[n:0] + W[m:0], where in our case

    n=15 and m=31. There is overflow if (1) x*y > 0, w > 0 and z

    Magic Layout

    The table below lists the 53 files that make up the Magic layout. In general, there are three types

    of files, similar to the VHDL directory structure. For the component foo, the file foo.mag is the

    Magic cell. foo.cmd is the RSIM command file that runs a test suite. Some components will

    have a foo.{pl|c|cc} program to generate the test cases. Sometimes, there is a foo_head.cmd file

    with the header portion of the CMD file independent of the test cases. There is also a csa

    subdirectory with a VHDL model of the CSA adder. To view these files with the recompiled

    Magic, set the environment variable CAD_HOME=/projects/kestrel/users/mult/tools and

    execute Magic as magic -TSCN3ME_SUBM.30 from $CAD_HOME/bin.

    Fileno  File Name          Description
    60      Addcell.cmd        RSIM command file w/ exhaustive stimulus
    61      Addcell.mag        Generates the +1 for negative Booth encoding
    62      broute.mag         A wiring channel
    63      bth.cmd            RSIM command file w/ exhaustive stimulus
    64      bth.mag            Booth encoding and sign propagation
    65      bthbuf.mag         Inverter chain for booth lines
    66      bthroute.mag       Wire routing for "bth" cell
    67      bwire.mag          Wiring channel

    Fileno  File Name          Description
    85      csa_8.mag          CSA 8-bit chain
    86      csa_cond.cmd       RSIM command file w/ exhaustive stimulus
    87      csa_cond.mag       CSA conditional input section
    88      csa_first.mag      CSA first cell in multi-bit chain
    89      csa_last.mag       CSA last cell in multi-bit chain
    90      csa_mid.cmd        RSIM command file w/ exhaustive stimulus
    91      csa_mid.mag        CSA middle cell in multi-bit chain
    92      csa_wire.mag       Used in CSA_32
    93      fa.cmd             RSIM command file w/ exhaustive stimulus
    94      fa.mag             Full adder CPL style
    95      fa_cmos.mag        Full adder CMOS style
    96      fa_tg.cmd          RSIM command file w/ exhaustive stimulus
    97      fa_tg.mag          Full adder w/ 1 level deep TG style for ppfa cell
    98      fa_tg2.mag         Full adder w/ 1 level deep TG style for W sum
    99      invchain.mag       Single-rail to double-rail inverter chain
    100     invtop.mag         Top row inverter chains for X and W
    101     mcell.cmd          RSIM command file w/ exhaustive stimulus
    102     mcell.mag          Multiplier cell (ppmuxfa and wiring)
    103     mult_head.cmd      Header file for RSIM (no test cases)
    104     mult_add.cmd       RSIM file with random tests
    105     mult_add.mag       16x16 Booth multiplier with 32-bit accumulate
    106     mult_add_head.cmd  RSIM file header
    107     multgen.cc         C++ program to generate "mult_add" test cases
    108     ppmux.cmd          RSIM command file w/ exhaustive stimulus
    109     ppmux.mag          TG style partial product mux
    110     ppmuxfa.mag        ppmux with full adder (fa_tg)
    111     rwire.mag          Wiring channel and inverters to drive CSA

    The top-level cell is mult_add.mag. This cell has some glue wiring and all the raw input/output.

    The X input is via the cell invtop[15:0]/X_H. The W input connects directly to the wires

    Wn_H, where n ranges from 15 to 31 and to the cells invtop[14:0]/X_H. The Y input connects

    directly to the wires Yn_H, where n ranges from 0 to 15. The output Z connects to the Sn_H

    outputs of various CSA cells. The OVRFLW output connects to ovrflw_0/ovrflw_h.

    The X and W[14:0] inputs pass through the cell array invtop. These are inverter chains along

    the top of the multiplier to generate the proper drive for the long X wires. The W inverters are

    small, since those signals only drive the adder in the top row of the multiplier. The X signals

    must drive about 0.450 pF. The cell invtop connects directly to the multiplier array cells, mcell.

    The Y input connects to the cell bth along the left side of the multiplier. The bth cell produces

    the 5-bit one-hot Booth encoding of the Y word [3]. The bth cell also computes the sign

    propagation [3]. There are three Y inputs per bth cell, with one input common between two

    cells. Each bth cell generates a double-rail Y signal with a small inverter chain. The output of

    The main array cell is mcell. It contains three components: ppmux, fa_tg, and wroute. Ppmux

    is a pass-logic multiplexer to select the proper X input based on the Booth encoding for the row

    [3]. The cell fa_tg is a double-rail transmission-gate based full adder [3]. It calculates the sum

    and carry in parallel. There are four output inverters for the sum and carry-out. We added four

    input inverters for the B_H/B_L inputs, one pair of inverters for each of the carry and sum logic.

    We found there was too much back-pressure from the transmission gates and it caused

    uncertainty in RSIM about who was driving whom. Wroute is a wire channel routing cell to

    pass horizontal and vertical signals. The sum out connects two columns to the right while the

    carry-out connects one column to the right. The X signals pass directly down.
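    The division of labor between the partial-product mux and the +1 that Addcell supplies for negative encodings can be sketched behaviorally. The code below is our own model, not the cell logic: for each Booth digit it selects 0, X, or 2X, inverts the bits for a negative digit, and adds 1 to complete the two's-complement negation. The rows are summed with ordinary integer addition rather than with the carry-save array, and the names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def booth_digits(y, bits=16):
    """Radix-4 Booth digits in {-2..2}, least significant first (y[-1] = 0)."""
    padded = (y & ((1 << bits) - 1)) << 1
    return [-2 * ((padded >> (i + 2)) & 1) + ((padded >> (i + 1)) & 1)
            + ((padded >> i) & 1) for i in range(0, bits, 2)]


def partial_product(x_signed, digit, width):
    """Mux output for one row: (row bits, plus_one) for a given Booth digit."""
    mask = (1 << width) - 1
    magnitude = {0: 0, 1: x_signed, 2: x_signed << 1}[abs(digit)] & mask
    if digit < 0:
        # Invert the selected bits; the separate +1 completes the negation.
        return ~magnitude & mask, 1
    return magnitude, 0


def booth_multiply(x, y, bits=16):
    """Signed bits x bits multiply, returned as a 2*bits-wide bit pattern."""
    width = 2 * bits
    total = 0
    for row, digit in enumerate(booth_digits(y, bits)):
        pp, plus_one = partial_product(to_signed(x, bits), digit, width)
        total += (pp + plus_one) << (2 * row)  # each row is shifted two bits
    return total & ((1 << width) - 1)


print(hex(booth_multiply(0xBE43, 0xAF65)))  # 0x14b2dd6f
```

    Adding the accumulate value 0x2D04417F to this product gives 0x41B71EEE.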

    Along the right side of the mcell array is a column of addcell. Addcell checks the row's Booth

    encoding and generates a double-rail 0 or 1 output [3]. If the Booth encoding is negative, it

    generates the 1 output. The cell also passes the sum and carry outputs from mcell through to the

    next column. Addcell connects to a column of rwire, which is a vertical wiring channel to

    connect Addcell to the fast adder in the right hand column. Rwire has a pair of inverters to drive

    The basic CSA blocks are csa_cond, csa_first, csa_mid, and csa_last [4]. Csa_cond is a sub-component
    of the other three. It is a double-rail pass-transistor mux that computes the conditional
    sum and carry bits. One must always use csa_first and csa_last. For an adder of three or more bits,
    one inserts the necessary number of csa_mid cells. We created three adder sizes: csa_2, csa_4
    (and csa_4b), and csa_8. Each of these cells has a 2-inverter driver chain for the double-rail
    carry-in input. This is necessary, since the load varies widely between the three cells. The RSIM
    estimates are 0.081 pF, 0.133 pF, and 0.243 pF for the 2-, 4-, and 8-bit cells (see the Optimization
    section below). The cells csa_2 and csa_4 are designed for use along the right side of the
    multiplier. The cells csa_4b and csa_8 are designed for the bottom of the multiplier.

    We had to make many substantial changes to the CSA designs in [1,2,4]. The original designs
    used extensive pass-logic. RSIM showed many unknown errors in our original layouts. We
    corrected some by inserting intermediate inverters. Other errors, which we originally thought
    were problems with RSIM and pass logic, ended up being insufficient 1 drive from fa_tg for

    The bottom row of mcell connects downward to a row of bwire, a wire routing channel. Below
    the channel is a row of fa_tg2. These full adders sum the carry-out, sum-out, and W values for
    each output bit. The output of the full adders then passes through the wiring channel broute and
    drives the bottom 16 bits of the CSA adder. The last 16 bits of the CSA adder are made up of a 4-4-8
    design using csa_4b and csa_8.

    1. capm2a .00003 ; 2nd metal cap -- area, pf/sq-micron
    2. capm2p .00020 ; 2nd metal cap -- perimeter, pf/micron
    3. capma .00006 ; 1st metal cap -- area, pf/sq-micron
    4. capmp .00020 ; 1st metal cap -- perimeter, pf/micron
    5. cappa .00005 ; poly cap -- area, pf/sq-micron
    6. cappp .00020 ; poly cap -- perimeter, pf/micron
    7. capda .00030 ; n-diffusion cap -- area, pf/sq-micron
    8. capdp .00040 ; n-diffusion cap -- perimeter, pf/micron
    9. cappda .00050 ; p-diffusion cap -- area, pf/sq-micron
    10. cappdp .00040 ; p-diffusion cap -- perimeter, pf/micron
    11. capga .00215 ; gate cap -- area, pf/sq-micron
    12. lambda 0.3 ; microns/lambda
    13. lowthresh 0.4 ; logic low threshold as a normalized voltage
    14. highthresh 0.6 ; logic high threshold as a normalized voltage
    15. cntpullup 0
    16. diffperim 0
    17. subparea 0
    18. diffext 0
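As a rough illustration of how the area and perimeter entries in the listing combine, the sketch below estimates the capacitance of a rectangular wire as area times the area coefficient plus perimeter times the perimeter coefficient. The coefficients and lambda are copied from the listing; the helper name, the rectangular-wire simplification, and the example dimensions are assumptions for illustration, and the extraction tools may apply the parameters differently.

```python
# (area coefficient in pF/sq-micron, perimeter coefficient in pF/micron),
# copied from items 1-6 of the PRM listing.
PRM = {
    "metal1": (0.00006, 0.00020),  # capma, capmp
    "metal2": (0.00003, 0.00020),  # capm2a, capm2p
    "poly": (0.00005, 0.00020),    # cappa, cappp
}
LAMBDA_UM = 0.3  # microns per lambda, item 12 of the listing


def wire_cap_pf(layer, width_lambda, length_lambda):
    """Area-plus-perimeter capacitance estimate (pF) for a rectangular wire."""
    area_coeff, perim_coeff = PRM[layer]
    w_um = width_lambda * LAMBDA_UM
    l_um = length_lambda * LAMBDA_UM
    area = w_um * l_um            # square microns
    perimeter = 2 * (w_um + l_um) # microns
    return area * area_coeff + perimeter * perim_coeff


# Hypothetical example: a metal-1 wire 3 lambda wide and 1000 lambda long.
example_pf = wire_cap_pf("metal1", 3, 1000)
```

For long, narrow wires the perimeter term dominates with these coefficients, which is why wire length matters more than width in the estimates.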

    shown below. We also used SPICE parameters to calculate the gate capacitance. Items 13-18
    above were left as-is from the original PRM file. Items 19-26 came from MOSIS.

    We calculated the gate capacitance and drain capacitance following the SPICE calculations
    presented in [5, pp. 188ff]. Gate capacitance has two components, the intrinsic and extrinsic,
    which are summed for the total: $C_{gin} = W L C_{ox}$ and
    $C_{gex} = W C_{gso} + W C_{gdo} + 2 L C_{gbo}$. The parameters for the gate-source, gate-drain, and
    gate-body capacitances came from the Hspice parameters. They are, respectively, $1.93 \times 10^{-10}$
    F/m, $1.93 \times 10^{-10}$ F/m, and $1.00 \times 10^{-9}$ F/m. The gate oxide thickness is $1.38 \times 10^{-6}$ m. Since RSIM
    uses a unit measurement per area, we set W and L to 1. The drain capacitance is given by the
    following, where CJ, VJ, PB, MJ, CJSW, and MJSW are SPICE parameters. Their values are
    4.22E-4, 2.5, 0.984, 3.49E-10, 1.20E-1. We used an area of 1 and a perimeter of 4.

    $$C_j = Area \cdot CJ \left(1 + \frac{VJ}{PB}\right)^{-MJ} + Perim \cdot CJSW \left(1 + \frac{VJ}{PB}\right)^{-MJSW}$$

    RSIM. Long wires, such as the Booth-encoded selectors, could range between 0.5 pF and 0.6 pF.
    We generally fixed n based on layout considerations.
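When the number of stages n is fixed in advance, the textbook way to size an inverter chain is a geometric taper: every stage drives a load f times its own input capacitance, with f equal to the n-th root of the ratio of load to input capacitance. The sketch below shows that rule only. The report does not state here which sizing rule was actually applied, so the function names and the example values are assumptions.

```python
def chain_stage_ratio(c_in_pf, c_load_pf, n):
    """Per-stage fan-out f for an n-stage chain with geometric taper."""
    return (c_load_pf / c_in_pf) ** (1.0 / n)


def chain_input_caps(c_in_pf, c_load_pf, n):
    """Input capacitance (pF) of each of the n stages, first stage first."""
    f = chain_stage_ratio(c_in_pf, c_load_pf, n)
    return [c_in_pf * f ** k for k in range(n)]


# Hypothetical example: a 0.01 pF first stage driving a 0.64 pF wire in 3 stages.
ratio = chain_stage_ratio(0.01, 0.64, 3)
caps = chain_input_caps(0.01, 0.64, 3)
```

Fixing n for layout reasons, as described above, means the resulting ratio f may be larger or smaller than the delay-optimal value, which trades some speed for a cell that fits the standard cell size.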

    When generating double-rail signals from single-rail inputs, we usually used 2-inverter/3-inverter
    trees or 3-inverter/4-inverter trees. Sometimes this was sub-optimal, since we used fewer but
    larger inverters based on layout restrictions. The layout restrictions came from the standard cell
    size we selected early in the design process.

    The CSA adder is designed as a 2-2-4-4-4-8-8 chain, based on [2]. Using Magic estimates of
    input capacitance for the carry-in, we designed an input driver for each of csa_2, csa_4, and
    csa_8 to optimize the performance of each element. The component csa_last generates the
    car_h, car_l carry outputs with a 6/6 inverter that then drives a 3/3 transmission gate for the
    carry select. Thus, csa_last has low drive ability.

    From Magic, the input capacitances of csa_2, csa_4, csa_8 are, respectively, 0.081 pF, 0.133 pF,

    The 2-2-4-4-4-8-8 design assumes that all inputs arrive at the same time. In our multiplier,
    that is not true. The input to the first 8-bit adder actually arrives last. One might experiment
    with different designs, such as 2-2-4-8-2-2-4-8.
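A functional model makes it easy to confirm that any partition of the 32 bits gives the same sum, so that alternative partitions can be compared on timing alone. The Python sketch below is illustrative: each block computes its result for both possible carry-ins and the incoming carry selects one, which mirrors the conditional-sum idea but says nothing about circuit delay. The function name and structure are assumptions, not the report's design.

```python
def block_chain_add(a, b, blocks=(2, 2, 4, 4, 4, 8, 8), carry_in=0):
    """Add a and b block by block, least-significant block first.

    Returns (sum masked to the total width, carry out).
    """
    result, carry, shift = 0, carry_in, 0
    for width in blocks:
        mask = (1 << width) - 1
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        # Conditional results for carry-in 0 and carry-in 1.
        cond = [a_blk + b_blk + c for c in (0, 1)]
        # The real carry-in selects which conditional result is used.
        chosen = cond[carry]
        result |= (chosen & mask) << shift
        carry = chosen >> width
        shift += width
    return result, carry


default_chain = block_chain_add(0x12345678, 0x0FEDCBA8)
alternative_chain = block_chain_add(0x12345678, 0x0FEDCBA8,
                                    blocks=(2, 2, 4, 8, 2, 2, 4, 8))
```

Both partitions cover 32 bits and return identical results; only the arrival time of each block's carry-in differs in hardware, which is the property the alternative ordering tries to exploit.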

    We found that the ff output of the cell bth drove about 0.139 pF but only had a 12/16 output
    inverter. We redesigned it as a 2-inverter chain of 4/6 and 12/18. Using a 4/6 rather than a 3/5
    reduced the size of the second inverter by 2λ. Going from 28λ of input capacitance down to
    10λ also helped. This one change improved performance by approximately 15% overall.

    References

    1. Abu-Khater, I.S.; Bellaouar, A.; Elmasry, M.I.; Yan, R.H., Circuit/architecture
    for low-power high-performance 32-bit adder, Fifth Great Lakes Symposium on
    VLSI, Buffalo, NY, USA, 16-18 March 1995, pp. 74-7.

    2. Abu-Khater, I.S.; Yan, R.H.; Bellaouar, A.; Elmasry, M.I., A 1-V low-power
    high-performance 32-bit conditional sum adder, Symposium on Low Power
    Electronics. Digest of Technical Papers, San Diego, CA, USA, 10-12 Oct. 1994,
    pp. 66-7.

    3. Abu-Khater, I.S.; Yan, R.H.; Bellaouar, A.; Elmasry, M.I., Circuit Techniques for
    CMOS Low-Power High-Performance Multipliers, IEEE Journal of Solid-State
    Circuits, v. 31, no. 10, Oct. 1996, pp. 1535-1546.

    4. Bellaouar, A. and M.I. Elmasry, Low-Power Digital VLSI Design. Circuits and
    Systems, Kluwer Academic Publishers, Boston: 1995.

    5. Weste, N.H.E. and K. Eshraghian, Principles of CMOS VLSI Design. A systems

    VHDL Source Code

    Addcell.vhd

    ------------------------------------------------------------------------
    -- Add Cell from "Low-power Digital VLSI Design" by
    -- Bellaouar and Elmasry.
    -- Returns 1 if Booth encoding is negative else 0
    ------------------------------------------------------------------------
    library IEEE;
    use IEEE.std_logic_1164.all;
    use work.bth_types.all;

    entity addcell is
      port (bth : in  std_ulogic_vector(4 downto 0);
            sum : out std_ulogic);
    end addcell;


    -- description of adder using concurrent signal assignments
    architecture rtl of addcell is
    begin
      sum

    Adder.vhd

    ------------------------------------------------------------------------
    -- Single-bit adder
    ------------------------------------------------------------------------

    library IEEE, adk;
    use IEEE.std_logic_1164.all;

    entity adder is
      port ( a_h   : in  std_ulogic;
             b_h   : in  std_ulogic;
             c_h   : in  std_ulogic;
             sum_h : out std_ulogic;
             car_h : out std_ulogic);
    end adder;

    architecture rtl of adder is


      component fadd1 is
        port (
          A  : in  STD_LOGIC;
          B  : in  STD_LOGIC;
          CI : in  STD_LOGIC;
          S  : out STD_LOGIC;
          CO : out STD_LOGIC
        );
      end component;

      signal a : std_logic;
      signal b : std_logic;
      signal c : std_logic;
      signal s : std_logic;
      signal t : std_logic;

    begin
      a

            sum_h : out std_ulogic_vector(N downto 1);
            car_h : out std_ulogic);
    end adderN;

    -- structural implementation of the N-bit adder
    architecture ripple of adderN is
      component adder
        port (a_h   : in  std_ulogic;
              b_h   : in  std_ulogic;
              c_h   : in  std_ulogic;
              sum_h : out std_ulogic;
              car_h : out std_ulogic);
      end component;

      signal carry : std_ulogic_vector(0 to N);
    begin
      carry(0) b_h(I),
                c_h => carry(I - 1),
                sum_h => sum_h(I),
                car_h => carry(I));
      end generate;
    end ripple;

    Booth.vhd

    ------------------------------------------------------------------------
    -- Constants used by Booth functions
    ------------------------------------------------------------------------
    library IEEE;
    use IEEE.std_logic_1164.all;

    package bth_types is
      constant bth_m1 : integer := 4;
      constant bth_m2 : integer := 3;
      constant bth_p2 : integer := 2;
      constant bth_p1 : integer := 1;
      constant bth_z0 : integer := 0;
    end bth_types;


    ------------------------------------------
    -- Booth encoder for row j
    ------------------------------------------
    library IEEE;
    use IEEE.std_logic_1164.all;
    use work.bth_types.all;

    entity booth_encode is
      port( in_h  : in  std_ulogic_vector (2 downto 0);
            bth_h : out std_ulogic_vector (4 downto 0));
    end booth_encode;

    architecture rtl of booth_encode is
    begin
      -- input "in_h" is Y(2i+1) Y(2i) Y(2i-1) MSB order
      -- See bth.vhd for booth types
      bth_h

    end rtl;

    claN.vhd

    ------------------------------------------------------------------------
    -- N-bit Carry-Lookahead adder
    -- The width of the adder is determined by generic N
    -- From Altera examples
    ------------------------------------------------------------------------
    library IEEE;
    use IEEE.std_logic_1164.all;
    use work.adder;

    entity claN is
      generic(N : positive);
      port (a_h   : in  std_ulogic_vector(N-1 downto 0);
            b_h   : in  std_ulogic_vector(N-1 downto 0);
            c_h   : in  std_ulogic;
            sum_h : out std_ulogic_vector(N-1 downto 0);
            car_h : out std_ulogic);
    end claN;

    architecture behavioral of claN is
      signal h_sum      : std_ulogic_vector(N-1 downto 0);
      signal car_gen    : std_ulogic_vector(N-1 downto 0);
      signal car_prop   : std_ulogic_vector(N-1 downto 0);
      signal car_intern : std_ulogic_vector(N-1 downto 1);

    begin
      h_sum

    architecture behavioral of plusN is
      signal x : std_logic_vector(N-1 downto 0);
      signal y : std_logic_vector(N-1 downto 0);

      signal w : std_logic_vector(N-1 downto 0);
      signal z : std_logic_vector(N-1 downto 0);
      signal a : signed (N-1 downto 0);
      signal b : signed (N-1 downto 0);
      signal c : signed (N-1 downto 0);
      signal s : signed (N-1 downto 0);

      signal t4_h : std_ulogic;
      signal t5_h : std_ulogic;
    begin
      x

      signal w : std_logic_vector(N downto 0);
      signal z : std_logic_vector(N downto 0);
      signal a : unsigned (N downto 0);
      signal b : unsigned (N downto 0);
      signal c : unsigned (N downto 0);
      signal s : unsigned (N downto 0);

    begin
      x(N-1 downto 0)

    driverN.vhd

    ------------------------------------------------------------------------
    -- N-bit driver
    ------------------------------------------------------------------------
    library IEEE;
    use IEEE.std_logic_1164.all;

    entity buf is
      port ( signal Q : out std_ulogic;
             signal D : in  std_ulogic);
    end buf;

    architecture behavior of buf is
    begin
      Q

    latch.vhd

    ------------------------------------------------------------------------
    -- N-bit LATCH with reset
    -- The width of the latch is determined by generic N
    ------------------------------------------------------------------------

    library IEEE;
    use IEEE.std_logic_1164.all;

    entity dffr_fall is
      port ( Rst : in std_ulogic;
             Clk : in std_ulogic;
             signal D : in  std_ulogic;
             signal Q : out std_ulogic);
    end dffr_fall;

    architecture behavior of dffr_fall is
    begin
      process(Rst, Clk, D)
      begin
        if Rst = '1' then
          Q

             signal D : in  std_ulogic;
             signal Q : out std_ulogic);
    end dffr_rise;

    architecture behavior of dffr_rise is
    begin
      process(Rst, Clk, D)
      begin
        if Rst = '1' then
          Q

      end component;

    begin
      gen: for j in 0 to N-1 generate
        dffgen: dffr_fall port map (Rst=> Rst, Clk=> Clk, D=> D(j), Q=> Q(j));
      end generate;
    end behavior;

    ------------------------------------------------------

    library IEEE;
    use IEEE.std_logic_1164.all;

    entity dffrN_rise is
      generic(N : positive);
      port ( Rst : in std_ulogic;
             Clk : in std_ulogic;
             signal D : in  std_ulogic_vector(N-1 downto 0);
             signal Q : out std_ulogic_vector(N-1 downto 0));
    end dffrN_rise;

    architecture behavior of dffrN_rise is
      component dffr_rise is
        port ( Rst : in std_ulogic;
               Clk : in std_ulogic;
               signal D : in  std_ulogic;
               signal Q : out std_ulogic);
      end component;

    begin
      gen: for j in 0 to N-1 generate
        dffgen: dffr_rise port map (Rst=> Rst, Clk=> Clk, D=> D(j), Q=> Q(j));
      end generate;
    end behavior;

    library IEEE;
    use IEEE.std_logic_1164.all;

    entity latchr is
      port ( Rst : in std_ulogic;

             Clk : in std_ulogic;
             signal D : in  std_ulogic_vector(N-1 downto 0);
             signal Q : out std_ulogic_vector(N-1 downto 0));
    end latchrN;

    architecture behavior of latchrN is
      component latchr is
        port ( Rst : in std_ulogic;
               Clk : in std_ulogic;
               signal D : in  std_ulogic;
               signal Q : out std_ulogic);
      end component;

      signal my_clk : std_logic_vector(N/8 downto 0);
      signal my_rst : std_logic_vector(N/8 downto 0);

    begin
      process (Clk)
      begin
        clk_buf: for i in 0 to N/8 LOOP
          my_clk(i) my_clk(j/8), D=> D(j),
                                 Q=> Q(j));
      end generate;
    end behavior;

    ------------------------------------------------------------------------
    -- N-bit dff with reset : NON-TRANSPARENT ON GATED BUFFER
    -- The width of the dff is determined by generic N

    begin
      dff: latchr port map ( Rst=> Rst, Clk=> Clk, D=> D, Q=> w );
      Q

    mult.vhd

    ------------------------------------------------------------------------
    -- N-bit multiplier
    -- This is a phi-2 device.
    --
    -- BusIO_S2H is the pad i/o bus
    -- Ovrflw_s2h is the overflow output. Should be made an InOut for carryin
    -- BusSEL_S2H is a chip select, encoded active high
    -- BS_S2H is the input select (bus high, mult low)
    -- RW_S2H is the Read/Write select (read high, write low)
    -- ME_S2H is the Multiplier Enable
    -- Rst_S2H is a reset signal. It is clocked with PHI_2 to ensure
    --         that it does not muck with stuff when it is not supposed to
    --         Reset is immediate. There is no 1 cycle delay, like
    --         with regular signals.
    ------------------------------------------------------------------------
    library IEEE;
    use IEEE.std_logic_1164.all;
    --use work.converts.all;

    entity mult is
      port ( BusIO_S2H  : inout std_logic_vector(15 downto 0);
             Ovrflw_S2H : out std_ulogic;
             BusSEL_S2H : in  std_ulogic_vector(2 downto 0);
             BS_S2H     : in  std_ulogic;
             RW_S2H     : in  std_ulogic;
             ME_S2H     : in  std_ulogic;
             Rst_S2H    : in  std_ulogic;
             PHI_1H     : in  std_ulogic;
             PHI_2H     : in  std_ulogic);
    end mult;

    architecture structural of mult is

      -- A multiplier register of width N
      component multregn is
        generic(N : positive );
        port ( BusOUT_S2H : out std_ulogic_vector(N-1 downto 0);

      component mult_pipe is
        port( z_v2h      : out std_ulogic_vector(7 downto 0);
              a_v2h      : out std_ulogic_vector(23 downto 0);
              b_v2h      : out std_ulogic_vector(23 downto 0);
              c_v2h      : out std_ulogic;
              ovrflw_v2h : out std_ulogic_vector(2 downto 0);
              medly_s2h  : out std_ulogic;
              x_s2h      : in  std_ulogic_vector(15 downto 0);
              y_s2h      : in  std_ulogic_vector(15 downto 0);
              w_s2h      : in  std_ulogic_vector(31 downto 0);
              me_s2h     : in  std_ulogic;
              PHI_1H     : in  std_ulogic;
              PHI_2H     : in  std_ulogic;
              Rst_s2h    : in  std_ulogic
            );
      end component;

      -- single bit D flip flop
      component dffr_fall is
        port ( Rst : in std_ulogic;
               Clk : in std_ulogic;
               signal D : in  std_ulogic;
               signal Q : out std_ulogic);
      end component;

      component buf is
        port ( Q : out std_ulogic;
               D : in  std_ulogic);
      end component;

      -- Buses to/from the multiplier from the registers
      signal bus_x : std_ulogic_vector(15 downto 0);
      signal bus_y : std_ulogic_vector(15 downto 0);
      signal bus_w : std_ulogic_vector(31 downto 0);
      signal bus_z : std_ulogic_vector(31 downto 0);

      -- wiring from multiplier to CLA unit
      signal bus_a : std_ulogic_vector(23 downto 0);
      signal bus_b : std_ulogic_vector(23 downto 0);
      signal bus_c : std_ulogic;


      -- temporary signals used to compute overflow
      signal t1_h : std_ulogic;
      signal t2_h : std_ulogic;
      signal t3_h : std_ulogic;
      signal t4_h : std_ulogic;
      signal t5_h : std_ulogic;

      -- outputs from registers
      signal feed_r0 : std_ulogic_vector(15 downto 0);
      signal feed_r1 : std_ulogic_vector(15 downto 0);
      signal feed_r2 : std_ulogic_vector(15 downto 0);
      signal feed_r3 : std_ulogic_vector(15 downto 0);
      signal feed_r4 : std_ulogic_vector(15 downto 0);
      signal feed_r5 : std_ulogic_vector(15 downto 0);

      -- buffered clocks
      signal phi_a_1h : std_ulogic_vector(6 downto 0);
      signal phi_a_2h : std_ulogic_vector(6 downto 0);

    begin
      ---------------------------------------------------------------
      -- Decode the input register select
      bus_sel_h

        port map ( BusOUT_S2H  => feed_r0,
                   BusIN_S2H   => To_StdULogicVector(busio_s2h),
                   MultOut_S2H => bus_x,
                   MultIn_V2H  => Gnd_16,
                   Sel_s2h     => bus_sel_h(0),
                   BS_S2H      => Vdd,
                   RW_S2H      => RW_S2H,
                   MEDLY_S2H   => MEDLY_Q2H,
                   RST_S2H     => RST_S2H,
                   PHI_1H      => PHI_1H,
                   PHI_2H      => PHI_2H);

      -- R1 is the Y register
      -- R1 never reads from the multiplier (BS = Vdd, MultIn = GND)
      reg_1: multregN
        generic map (16)
        port map ( BusOUT_S2H  => feed_r1,
                   BusIN_S2H   => To_StdULogicVector(busio_s2h),
                   MultOut_S2H => bus_y,
                   MultIn_V2H  => Gnd_16,
                   Sel_s2h     => bus_sel_h(1),
                   BS_S2H      => Vdd,
                   RW_S2H      => RW_S2H,
                   MEDLY_S2H   => MEDLY_Q2H,
                   RST_S2H     => RST_S2H,
                   PHI_1H      => PHI_1H,
                   PHI_2H      => PHI_2H);

      -- R2 is the W(31:16) register
      -- R2 never reads from the multiplier (BS = Vdd, MultIn = GND)
      reg_2: multregN
        generic map (16)
        port map ( BusOUT_S2H  => feed_r2,
                   BusIN_S2H   => To_StdULogicVector(busio_s2h),
                   MultOut_S2H => bus_w(31 downto 16),
                   MultIn_V2H  => Gnd_16,
                   Sel_s2h     => bus_sel_h(2),
                   BS_S2H      => Vdd,
                   RW_S2H      => RW_S2H,
                   MEDLY_S2H   => MEDLY_Q2H,

        generic map (16)
        port map ( BusOUT_S2H => feed_r4,
                   BusIN_S2H  => To_StdULogicVector(busio_s2h),
                   MultIn_V2H => bus_z(31 downto 16),
                   Sel_s2h    => bus_sel_h(4),
                   BS_S2H     => BS_S2H,
                   RW_S2H     => RW_S2H,
                   MEDLY_S2H  => MEDLY_Q2H,
                   RST_S2H    => RST_S2H,
                   PHI_1H     => PHI_1H,
                   PHI_2H     => PHI_2H);

      -- R5 is the Z(15:0) register
      -- R4 & R5 have no MultOut connections
      reg_5: multregN
        generic map (16)
        port map ( BusOUT_S2H => feed_r5,
                   BusIN_S2H  => To_StdULogicVector(busio_s2h),
                   MultIn_V2H => bus_z(15 downto 0),
                   Sel_s2h    => bus_sel_h(5),
                   BS_S2H     => BS_S2H,
                   RW_S2H     => RW_S2H,
                   MEDLY_S2H  => MEDLY_Q2H,
                   RST_S2H    => RST_S2H,
                   PHI_1H     => PHI_1H,
                   PHI_2H     => PHI_2H);

      ---------------------------------------------------------------
      -- Storage for the Overflow output
      ---------------------------------------------------------------
      Rst_q2h ovrflw_v2h, Clk=> MEDLY_Q2H, Rst=> Rst_q2h);

      -- allows us to monitor ovrflw_s2h without using a buffered I/O pin
      ovrflw_s2h


      cla_0 : mult_cla
        generic map (24)
        port map (
          z_v2h   => bus_z(31 downto 8),
          car_v2h => car_out,
          a_v2h   => bus_a,
          b_v2h   => bus_b,
          c_v2h   => bus_c
        );

      ---------------------------------------------------------------
      -- Compute the overflow
      -- An overflow is defined when
      -- 1) x*y > 0 and w > 0 and z < 0 or
      -- 2) x*y < 0 and w < 0 and z > 0
      --
      ----------------------------------------------------------------

      t1_h

    mult_cla.vhd

    ------------------------------------------------------------------------
    -- 24-bit CLA as separate entity for synthesis
    --
    ------------------------------------------------------------------------

    library IEEE;
    use IEEE.std_logic_1164.all;

    entity mult_cla is
      generic (N : positive );

      port( z_v2h   : out std_ulogic_vector(N-1 downto 0);
            car_v2h : out std_ulogic;
            a_v2h   : in  std_ulogic_vector(N-1 downto 0);
            b_v2h   : in  std_ulogic_vector(N-1 downto 0);
            c_v2h   : in  std_ulogic
          );
    end mult_cla;

    architecture rtl of mult_cla is
      component plusN is
        generic( N : positive);
        port ( a_h   : in  std_ulogic_vector(N-1 downto 0);
               b_h   : in  std_ulogic_vector(N-1 downto 0);
               c_h   : in  std_ulogic;
               sum_h : out std_ulogic_vector(N-1 downto 0);
               car_h : out std_ulogic);
      end component;

      component claN is
        generic( N : positive);
        port ( a_h   : in  std_ulogic_vector(N-1 downto 0);
               b_h   : in  std_ulogic_vector(N-1 downto 0);
               c_h   : in  std_ulogic;
               sum_h : out std_ulogic_vector(N-1 downto 0);
               car_h : out std_ulogic);
      end component;

    December 1, 2000 Page 54

    mult_pipe.vhd

    ------------------------------------------------------------------------
    -- Booth encoded carry-save-adder array
    --
    -- From "Low-power Digital VLSI Design" by Bellaouar and Elmasry.
    -- and
    -- "Circuit Techniques for CMOS Low-Power High-Performance Multipliers"
    -- by Abu-Khater, Bellaouar, Elmasry in IEEE J. Solid-State Circuits v.31 (10)
    -- Oct 1996 pp. 1535ff
    --
    -- z_v2h        Multiply accumulate output (x * y + w) (only low-order 8 bits)
    -- a_v2h        goes to fast adder for high-order 24-bits
    -- b_v2h
    -- c_v2h
    -- ovrflow_v2h  3-bits to compute overflow (w[31] x[31] y[31])
    -- medly_s2h    Output good at end of phase (see me_s2h, this is delayed)
    -- x_s2h        multiplicand
    -- y_s2h        multiplier (gets booth encoded)
    -- w_s2h        accumulate
    -- me_s2h       multiplier enable
    -- PHI_1H       clock
    -- PHI_2H       clock
    -- Rst_s2h      Reset internal registers to 0
    --
    -- The Y inputs are booth encoded then gated until ME_S2H & PHI_2H.
    -- The Y inputs should be applied first to give the booth encoders time
    -- to settle. The Y inputs must remain valid until MEDLY_S2H (actually
    -- until a 1/2 cycle before...)
    ------------------------------------------------------------------------
    ------------------------------------------------------------------------
    -- Variables are generally named as follows:
    -- name_PtCl
    --
    -- P = pipe line stage (1, 2, or 3)
    -- t = type (s,q,v)
    -- C = clock phase (1 or 2)
    -- l = logic (L or H)
    --
    -- examples:
    -- sum_0_1v2h = row 0 sum 1st pipe stage, V timing, Phi-2, active high
    --
    -- Rules:
    -- Variables can only be assigned if P and C the same:
    -- x_1v2h

    --
    -- To go between phases/stages you need to use a storage device:
    --
    -- gdffr_fall(Q=> x_2v1h, D=> x_1v2h, Clk=> mdly_q2h, Enable=> mdly_q1h)
    -- This clocks in x_1v2h on mdly_q2h and
    -- enables the output to x_2v1h on mdly_q1h
    --
    ------------------------------------------------------------------------
    library IEEE;
    use IEEE.std_logic_1164.all;
    --use work.converts.all;

    -- We use a fixed width / height for simplicity.
    -- Overflow = x*y + w out of range

    entity mult_pipe is
      port( z_v2h      : out std_ulogic_vector(7 downto 0);
            a_v2h      : out std_ulogic_vector(23 downto 0);
            b_v2h      : out std_ulogic_vector(23 downto 0);
            c_v2h      : out std_ulogic;
            ovrflw_v2h : out std_ulogic_vector(2 downto 0);
            medly_s2h  : out std_ulogic;
            x_s2h      : in  std_ulogic_vector(15 downto 0);
            y_s2h      : in  std_ulogic_vector(15 downto 0);
            w_s2h      : in  std_ulogic_vector(31 downto 0);
            me_s2h     : in  std_ulogic;
            PHI_1H     : in  std_ulogic;
            PHI_2H     : in  std_ulogic;
            Rst_s2h    : in  std_ulogic
          );
    end mult_pipe;

    architecture rtl of mult_pipe is
      constant COL : integer := 16;
      constant ROW : integer := 8;

      -- AddCell will add a 0/1 to each row depending on the sign
      -- of the booth encoding.
      component addcell is
        port ( bth : in  std_ulogic_vector(4 downto 0);
               sum : out std_ulogic);
      end component;

      -- A standard full adder
      component adder is
        port ( a_h   : in  std_ulogic;

               b_h   : in  std_ulogic;
               c_h   : in  std_ulogic;
               sum_h : out std_ulogic;
               car_h : out std_ulogic);
      end component;

      -- unsigned addition
      component uplusN is
        generic( N : positive);
        port ( a_h   : in  std_ulogic_vector(N-1 downto 0);
               b_h   : in  std_ulogic_vector(N-1 downto 0);
               c_h   : in  std_ulogic;
               sum_h : out std_ulogic_vector(N-1 downto 0);
               car_h : out std_ulogic);
      end component;

      -- A standard full adder 15 bits wide
      component adderN is
        generic( N : positive);
        port ( a_h   : in  std_ulogic_vector(N-1 downto 0);
               b_h   : in  std_ulogic_vector(N-1 downto 0);
               c_h   : in  std_ulogic;
               sum_h : out std_ulogic_vector(N-1 downto 0);
               car_h : out std_ulogic);
      end component;

      -- Generate a 5-line demultiplexed booth encoding of 3 input bits
      component booth_encode is
        port( in_h  : in  std_ulogic_vector (2 downto 0);
              bth_h : out std_ulogic_vector (4 downto 0));
      end component;

      -- Partial product generator with full adder
      -- Has only SUM (and carry) out
      component ppfa is
        port ( bth   : in  std_ulogic_vector(4 downto 0);
               x1_h  : in  std_ulogic;
               x2_h  : in  std_ulogic;
               s0_h  : in  std_ulogic;
               c0_h  : in  std_ulogic;
               sum_h : out std_ulogic;
               ca1_h : out std_ulogic);
      end component;

      -- Partial product generator with full adder
      -- Has both PP out and SUM (and carry) out

      component ppfapp is
        port ( bth   : in  std_ulogic_vector(4 downto 0);
               x1_h  : in  std_ulogic;
               x2_h  : in  std_ulogic;
               s0_h  : in  std_ulogic;
               c0_h  : in  std_ulogic;
               pp_h  : out std_ulogic;
               sum_h : out std_ulogic;
               ca1_h : out std_ulogic);
      end component;

      -- Sign extender. Computes sign bits to pass to next row.
      -- Adds 2 bits per row. "ff" is the "flag" bit.
      component sgn is
        port ( pp_h     : in  std_ulogic;
               ff_h     : in  std_ulogic;
               pp_out_h : out std_ulogic;
               ff_out_h : out std_ulogic);
      end component;

      -- D flip flop with reset
      component dffr_fall is
        port ( Rst : in std_ulogic;
               Clk : in std_ulogic;
               signal D : in  std_ulogic;
               signal Q : out std_ulogic);
      end component;

      component dffrN_fall is
        generic(N : positive );
        port ( Rst : in std_ulogic;
               Clk : in std_ulogic;
               signal D : in  std_ulogic_vector(N-1 downto 0);
               signal Q : out std_ulogic_vector(N-1 downto 0));
      end component;

      -- a gated flipflop
      component gdffr_fall is
        port ( Rst    : in std_ulogic;
               Clk    : in std_ulogic;
               Enable : in std_ulogic;
               signal D : in  std_ulogic;
               signal Q : out std_ulogic);
      end component;

      -- an N-bit gated flipflop

      component gdffrN_fall is
        generic(N : positive );
        port ( Rst    : in std_ulogic;
               Clk    : in std_ulogic;
               Enable : in std_ulogic;
               signal D : in  std_ulogic_vector(N-1 downto 0);
               signal Q : out std_ulogic_vector(N-1 downto 0));
      end component;

      -- These are the outputs from the sign extenders
      -- one for each row
      -- pp15 is the pp output of the 15th column of each row
      -- we need 9 sets of wires since we have inputs to row 0 and outputs from row 7

      -- v2 signals in 1st pipe stage, v1 signals in 2nd
      -- (pp1 = 1st stage, pp2 = 2nd, pp3 = 3rd)

      -- There is some overlap here, since in PHI2 we generate pp1_v2h(4) which
      -- is then latched to PHI1
      signal pp_1v2h   : std_ulogic_vector(4 downto 0);
      signal ff_1v2h   : std_ulogic_vector(4 downto 0);
      signal pp15_1v2h : std_ulogic_vector(4 downto 0);

      signal pp_2v1h   : std_ulogic_vector(8 downto 4);
      signal ff_2v1h   : std_ulogic_vector(8 downto 4);
      signal pp15_2v1h : std_ulogic_vector(7 downto 4);

      signal pp_3v2h   : std_ulogic_vector(8 downto 8);

      -- each row has an output from the addcell
      signal add_1v2h : std_ulogic_vector(3 downto 0);
      signal add_2v1h : std_ulogic_vector(7 downto 0);

      -- these are a cycle later
      signal add_3v2h : std_ulogic_vector(7 downto 4);

      -- each row gets own array. Don't try 2-dimension array.
      -- sum_x_h is the sum output of each column in row X.
      -- ca1_x_h is the carry output of each column in row X.
      -- pre_A_h is the booth encoding for row A before the gate
      -- bth_A_h is the booth encoding for row A after the gate

      -- The V2H signals are outputs from the multiplier body
      -- the S1H signals are outputs from the 1st pipeline registers
      -- the V1H signals are outputs from the 1st pipeline gates


    signal sum_0_1v2h : std_ulogic_vector(COL downto 0);
    signal car_0_1v2h : std_ulogic_vector(COL downto 0);
    signal sum_0_2v1h : std_ulogic_vector(1 downto 0);
    signal car_0_2v1h : std_ulogic;
    signal bth_pre_0_h : std_ulogic_vector(4 downto 0);
    signal bth_0_1v2h : std_ulogic_vector(4 downto 0);

    signal sum_1_1v2h : std_ulogic_vector(COL downto 0);
    signal car_1_1v2h : std_ulogic_vector(COL downto 0);
    signal sum_1_2v1h : std_ulogic_vector(1 downto 0);
    signal car_1_2v1h : std_ulogic;
    signal bth_pre_1_h : std_ulogic_vector(4 downto 0);
    signal bth_1_1v2h : std_ulogic_vector(4 downto 0);

    signal sum_2_1v2h : std_ulogic_vector(COL downto 0);
    signal car_2_1v2h : std_ulogic_vector(COL downto 0);
    signal sum_2_2v1h : std_ulogic_vector(1 downto 0);
    signal car_2_2v1h : std_ulogic;
    signal bth_pre_2_h : std_ulogic_vector(4 downto 0);
    signal bth_2_1v2h : std_ulogic_vector(4 downto 0);

    signal sum_3_1v2h : std_ulogic_vector(COL downto 0);
    signal car_3_1v2h : std_ulogic_vector(COL downto 0);
    signal sum_3_2v1h : std_ulogic_vector(COL downto 0);
    signal car_3_2v1h : std_ulogic_vector(COL downto 0);
    signal bth_pre_3_h : std_ulogic_vector(4 downto 0);
    signal bth_3_1v2h : std_ulogic_vector(4 downto 0);

    -- The V1H signals are outputs from the multiplier body
    -- the S2H signals are outputs from the 2nd pipeline registers
    -- the V2H signals are outputs from the 2nd pipeline gates

    signal sum_4_2v1h : std_ulogic_vector(COL downto 0);
    signal car_4_2v1h : std_ulogic_vector(COL downto 0);
    signal sum_4_3v2h : std_ulogic_vector(1 downto 0);
    signal car_4_3v2h : std_ulogic;
    signal bth_pre_4_h : std_ulogic_vector(4 downto 0);
    signal bth_4_2v1h : std_ulogic_vector(4 downto 0);

    signal sum_5_2v1h : std_ulogic_vector(COL downto 0);
    signal car_5_2v1h : std_ulogic_vector(COL downto 0);
    signal sum_5_3v2h : std_ulogic_vector(1 downto 0);
    signal car_5_3v2h : std_ulogic;
    signal bth_pre_5_h : std_ulogic_vector(4 downto 0);
    signal bth_5_2v1h : std_ulogic_vector(4 downto 0);

    signal sum_6_2v1h : std_ulogic_vector(COL downto 0);
    signal car_6_2v1h : std_ulogic_vector(COL downto 0);
    signal sum_6_3v2h : std_ulogic_vector(1 downto 0);
    signal car_6_3v2h : std_ulogic;
    signal bth_pre_6_h : std_ulogic_vector(4 downto 0);
    signal bth_6_2v1h : std_ulogic_vector(4 downto 0);

    signal sum_7_2v1h : std_ulogic_vector(COL downto 0);
    signal car_7_2v1h : std_ulogic_vector(COL downto 0);
    signal sum_7_3v2h : std_ulogic_vector(COL downto 0);
    signal car_7_3v2h : std_ulogic_vector(COL downto 0);
    signal bth_pre_7_h : std_ulogic_vector(4 downto 0);
    signal bth_7_2v1h : std_ulogic_vector(4 downto 0);

    -- The first 15 bits go into a full adder array.
    -- The last 17 bits go into a 4:2 compressor array with W()
    --
    -- These are the a_h() and b_h() inputs and the carry output
    signal fa_a_2v1h : std_ulogic_vector(7 downto 0);
    signal fa_b_2v1h : std_ulogic_vector(7 downto 0);
    signal fa_car_2v1h : std_ulogic;

    -- these feed the 24-bit CLA
    -- fa_a_3 is (32 - 8) to accommodate an extra carry bit that we do not use
    signal fa_a_3v2h : std_ulogic_vector(32 downto 8);
    signal fa_b_3v2h : std_ulogic_vector(31 downto 8);
    signal fa1_car_3v2h : std_ulogic;

    -- The carry outputs of bit 16's compressor (no longer use 4:2 compressors, but
    -- the name is the same...)
    --signal comp_ca1_3v2h: std_ulogic;
    --signal comp_ca2_3v2h: std_ulogic;

    -- b input and Carry outputs of the 4:2 compressor array
    -- cout_out_h is the output of the 4:2 compressors (since z_v2h
    -- is not inout or buffered) no longer use 4:2 compressors, but name is the same.
    signal comp_b_3v2h : std_ulogic_vector(15 downto 0);
    signal comp_out_3v2h: std_ulogic_vector(31 downto 0);

    -- some miscellaneous signals used to compute the overflow

    constant GND : std_ulogic := '0';
    constant VDD : std_ulogic := '1';

    -- a modified version of x_s2h to align with the times 2 needed for booth
    -- Use a tempx as the bit-sliced version then assign whole to myx_v2h

    -- A ModelSim technote said this was the way to do it....
    -- We need to pad with a "0" on the right and duplicate x_s2h(15) on left
    -- myx_v2h is also gated on ME_Q2H
    -- myx_s1h/v1h is latched/gated on MDLY_Q1H in the 2nd pipeline stage
    signal myx_1v2h : std_ulogic_vector(COL+1 downto 0);
    signal myx_2v1h : std_ulogic_vector(COL+1 downto 0);

    signal myx_3v2h: std_ulogic; -- needed in 3rd pipeline stage

    signal myy_2s1h: std_ulogic_vector(COL-1 downto 7);
    signal myy_3v2h: std_ulogic; -- needed in 3rd pipeline stage

    signal tempx : std_ulogic_vector(COL+1 downto 0);

    -- The W signal is gated in three places.
    -- W[14:0] is gated on ME_Q2H
    -- W[15] is gated on MDLY_Q1H
    -- W[31:16] is gated on MDLY_Q2H
    -- the array indices are to keep them the same as w_s2h

    signal w_1v2h : std_ulogic_vector(31 downto 0);
    signal w_2v1h : std_ulogic_vector(31 downto 15);
    signal w_3v2h : std_ulogic_vector(31 downto 15);


    -- a temp signal array for the Y input to row 0 booth encoder.
    signal y0_in : std_ulogic_vector(2 downto 0);

    -- timing signals for pipeline registers and gates
    signal me_1q2h : std_ulogic;
    signal me_2s1h : std_ulogic;
    signal me_2q1h : std_ulogic;
    signal me_3s2h : std_ulogic;
    signal me_3q2h : std_ulogic;

    -- Internally guarded RESET on PHI_2
    signal rst_q2h : std_ulogic;

    -- The 1st 8 bits of z are generated in the 2nd pipeline stage
    signal z_2v1h : std_ulogic_vector(7 downto 0);

    signal DBG_EN : std_ulogic := '0';

    begin

    -- Generate the internal reset signal

    rst_q2h [...] rst_q2h);
    dff_clk1: dffr_fall port map( D => me_2s1h, Q=> me_3s2h, CLK=> phi_1h, Rst=> rst_q2h);

    medly_s2h [...] w_s2h(31 downto 15), Clk=> me_1q2h, Rst=> Rst_q2h);

    wlatch_2: dffrN_fall generic map(17)
        port map (Q=> w_3v2h, D=> w_2v1h(31 downto 15), Clk=> me_2q1h, Rst=> Rst_q2h);

    -- ff_h(0) is always 0
    ff_1v2h(0) [...]

    ----------------------------------------------------------------

    ----------------------------------------------------------------
    -- 1) Generate the sign extender cells, one cell per row
    ----------------------------------------------------------------
    -- There is one sign cell per row
    COLGEN1: for i in 0 to 3 generate
      sgncell : sgn port map( pp_h => pp15_1v2h(i), ff_h => ff_1v2h(i),
              pp_out_h => pp_1v2h(i+1), ff_out_h => ff_1v2h(i+1) );
    end generate;

    pipe_pp2: gdffr_fall port map ( Q=> pp_2v1h(4), D=> pp_1v2h(4),
            Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);
    pipe_ff2: gdffr_fall port map ( Q=> ff_2v1h(4), D=> ff_1v2h(4),
            Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

    COLGEN2: for i in 4 to 7 generate
      sgncell : sgn port map( pp_h => pp15_2v1h(i), ff_h => ff_2v1h(i),
              pp_out_h => pp_2v1h(i+1), ff_out_h => ff_2v1h(i+1) );
    end generate;

    pipe_pp3: gdffr_fall port map ( Q=> pp_3v2h(8), D=> pp_2v1h(8),
            Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);
    --pipe_ff3: gdffr_fall port map ( Q=> ff_3v2h(8), D=> ff_2v1h(8),
    -- Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);


    ----------------------------------------------------------------
    -- 2) The booth encoders, one cell per row
    ----------------------------------------------------------------
    -- Generate the Booth encoders, one per row. Note that row 0 is special
    -- and pads a "0" as LSB.
    y0_in(2 downto 1) [...] bth_pre_0_h);
    bth_1 : booth_encode port map ( in_h => y_s2h(3 downto 1), bth_h => bth_pre_1_h);
    bth_2 : booth_encode port map ( in_h => y_s2h(5 downto 3), bth_h => bth_pre_2_h);
    bth_3 : booth_encode port map ( in_h => y_s2h(7 downto 5), bth_h => bth_pre_3_h);

    -- Delay y_s2h(15 downto 7) until stage 2

    bth_4 : booth_encode port map ( in_h => myy_2s1h(9 downto 7), bth_h => bth_pre_4_h);
    bth_5 : booth_encode port map ( in_h => myy_2s1h(11 downto 9), bth_h => bth_pre_5_h);
    bth_6 : booth_encode port map ( in_h => myy_2s1h(13 downto 11), bth_h => bth_pre_6_h);
    bth_7 : booth_encode port map ( in_h => myy_2s1h(15 downto 13), bth_h => bth_pre_7_h);
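
    -- Illustrative note (not part of the original listing): a sketch of the
    -- radix-4 Booth recoding that the overlapping 3-bit windows above suggest.
    -- Assumption: each booth_encode cell maps its window (y(2i+1), y(2i), y(2i-1))
    -- to the digit d = -2*y(2i+1) + y(2i) + y(2i-1). How that digit is packed
    -- into the 5-bit bth_h bus is not shown in this section.
    --   window  000  001  010  011  100  101  110  111
    --   digit     0   +1   +1   +2   -2   -1   -1    0
    -- Worked example for y = 6 (binary ...0110), with the "0" padded below bit 0:
    --   row 0: y(1) y(0) "0"  = 100 -> -2  (weight 4^0)
    --   row 1: y(3) y(2) y(1) = 011 -> +2  (weight 4^1)
    --   rows 2 to 7:            000 ->  0
    --   total: (-2)*1 + (+2)*4 = 6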


    -- Pass the booth encoding through the gated drivers
    --bth_0_1v2h [...] '0');
    --bth_1_1v2h [...] '0');
    --bth_2_1v2h [...] '0');
    --bth_3_1v2h [...] '0');
    --bth_4_2v1h [...] '0');
    --bth_5_2v1h [...] '0');
    --bth_6_2v1h [...] '0');
    --bth_7_2v1h [...] '0');
    bth_0_1v2h [...] Rst_q2h);

    ----------------------------------------------------------------
    -- 3) The add cells, one per row
    ----------------------------------------------------------------
    -- The Add Cells get mixed up on the indices, since booth encoding is
    -- not a row array. Easiest to just declare each one outside a generate loop
    addcell_0 : addcell port map ( bth => bth_0_1v2h, sum => add_1v2h(0) );
    addcell_1 : addcell port map ( bth => bth_1_1v2h, sum => add_1v2h(1) );
    addcell_2 : addcell port map ( bth => bth_2_1v2h, sum => add_1v2h(2) );
    addcell_3 : addcell port map ( bth => bth_3_1v2h, sum => add_1v2h(3) );
    addcell_4 : addcell port map ( bth => bth_4_2v1h, sum => add_2v1h(4) );
    addcell_5 : addcell port map ( bth => bth_5_2v1h, sum => add_2v1h(5) );
    addcell_6 : addcell port map ( bth => bth_6_2v1h, sum => add_2v1h(6) );
    addcell_7 : addcell port map ( bth => bth_7_2v1h, sum => add_2v1h(7) );

    -- Delay the first 4 to 2nd stage
    gadd1: gdffrN_fall generic map(4)
            port map( Q=> add_2v1h(3 downto 0), D=>add_1v2h,
            Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

    -- Delay the last 4 to 3rd stage

    gadd2: gdffrN_fall generic map(4)
            port map( Q=> add_3v2h(7 downto 4), D=>add_2v1h(7 downto 4),
            Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);

    ----------------------------------------------------------------
    -- 4) The Multiplier body, 16 columns by 8 rows
    ----------------------------------------------------------------
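    -- Illustrative note (not part of the original listing): why the ppfa wiring
    -- in this section takes s0_h from sum(i+2) and c0_h from car(i+1) of the
    -- previous row. Assuming column i of row r carries weight 2^(i + 2r), i.e.
    -- each Booth row sits two bit positions above the row before it:
    --   sum of row r-1, column i+2:       2^((i+2) + 2(r-1)) = 2^(i + 2r)
    --   carry of row r-1, column i+1: 2 * 2^((i+1) + 2(r-1)) = 2^(i + 2r)
    -- so both inputs line up with column i of row r.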

    -- i is the column
    ROWGEN: for i in 0 to COL generate
      -- for the PPFA cells, columns 0 to 14 use regular PPFA cells
      -- column 15 uses the PPFAPP cell which has a tap on the PP output of
      -- the mux. This is needed to do the sign extension.
      --
      -- So, ppfa_0(5), for example, would be column 5 of row 0

      -- The first 15 columns get sum/carry inputs from previous row
      -- Columns 15 and 16 get special wiring from the sign extenders
      -- Column 16 also uses the PPFAPP cells

      G0: if( i < COL-1 ) generate
        -- Row 0 is special and gets W() inputs
        ppfa_0: ppfa port map( bth => bth_0_1v2h,
                x1_h => myx_1v2h(i+1),
                x2_h => myx_1v2h(i),
                s0_h => w_1v2h(i),
                c0_h => GND,
                sum_h => sum_0_1v2h(i),
                ca1_h => car_0_1v2h(i));

        -- All other rows get s0_h from 2 columns left and
        -- c0_h from 1 column left from the previous row.

        ppfa_1: ppfa port map( bth => bth_1_1v2h,
                x1_h => myx_1v2h(i+1),
                x2_h => myx_1v2h(i),
                s0_h => sum_0_1v2h(i+2),
                c0_h => car_0_1v2h(i+1),
                sum_h => sum_1_1v2h(i),
                ca1_h => car_1_1v2h(i));

        ppfa_2: ppfa port map( bth => bth_2_1v2h,
                x1_h => myx_1v2h(i+1),
                x2_h => myx_1v2h(i),
                s0_h => sum_1_1v2h(i+2),
                c0_h => car_1_1v2h(i+1),
                sum_h => sum_2_1v2h(i),

                ca1_h => car_2_1v2h(i));

        ppfa_3: ppfa port map( bth => bth_3_1v2h,
                x1_h => myx_1v2h(i+1),
                x2_h => myx_1v2h(i),
                s0_h => sum_2_1v2h(i+2),
                c0_h => car_2_1v2h(i+1),
                sum_h => sum_3_1v2h(i),
                ca1_h => car_3_1v2h(i));

        p00_sum1 : gdffr_fall port map
                ( Q=> sum_3_2v1h(i), D=> sum_3_1v2h(i), Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

        p00_car1 : gdffr_fall port map
                ( Q=> car_3_2v1h(i), D=> car_3_1v2h(i), Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

        -- use the value before the tri-state
        p00_x1 : gdffr_fall port map
                ( Q=> myx_2v1h(i), D=> tempx(i), Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

        ppfa_4: ppfa port map( bth => bth_4_2v1h,
                x1_h => myx_2v1h(i+1),
                x2_h => myx_2v1h(i),
                s0_h => sum_3_2v1h(i+2),
                c0_h => car_3_2v1h(i+1),
                sum_h => sum_4_2v1h(i),
                ca1_h => car_4_2v1h(i));

        ppfa_5: ppfa port map( bth => bth_5_2v1h,
                x1_h => myx_2v1h(i+1),
                x2_h => myx_2v1h(i),
                s0_h => sum_4_2v1h(i+2),
                c0_h => car_4_2v1h(i+1),
                sum_h => sum_5_2v1h(i),
                ca1_h => car_5_2v1h(i));

        ppfa_6: ppfa port map( bth => bth_6_2v1h,
                x1_h => myx_2v1h(i+1),
                x2_h => myx_2v1h(i),
                s0_h => sum_5_2v1h(i+2),
                c0_h => car_5_2v1h(i+1),
                sum_h => sum_6_2v1h(i),
                ca1_h => car_6_2v1h(i));

        ppfa_7: ppfa port map( bth => bth_7_2v1h,
                x1_h => myx_2v1h(i+1),

                x2_h => myx_2v1h(i),
                s0_h => sum_6_2v1h(i+2),
                c0_h => car_6_2v1h(i+1),
                sum_h => sum_7_2v1h(i),
                ca1_h => car_7_2v1h(i));

        p00_sum2 : gdffr_fall port map
                ( Q=> sum_7_3v2h(i), D=> sum_7_2v1h(i), Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);

        p00_car2 : gdffr_fall port map
                ( Q=> car_7_3v2h(i), D=> car_7_2v1h(i), Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);

      end generate G0;

      -- In column 15, the s0_h input is the "pp" output of the sign extender
      -- pp_h() is indexed by row number.

      G15: if( i = COL-1 ) generate
        ppfa15_0: ppfa port map( bth => bth_0_1v2h,
                x1_h => myx_1v2h(i+1),
                x2_h => myx_1v2h(i),
                s0_h => GND,
                c0_h => GND,
                sum_h => sum_0_1v2h(i),
                ca1_h => car_0_1v2h(i));

        ppfa15_1: ppfa port map( bth => bth_1_1v2h,
                x1_h => myx_1v2h(i+1),
                x2_h => myx_1v2h(i),
                s0_h => pp_1v2h(1),
                c0_h => car_0_1v2h(i+1),
                sum_h => sum_1_1v2h(i),
                ca1_h => car_1_1v2h(i));

        ppfa15_2: ppfa port map( bth => bth_2_1v2h,
                x1_h => myx_1v2h(i+1),
                x2_h => myx_1v2h(i),
                s0_h => pp_1v2h(2),
                c0_h => car_1_1v2h(i+1),
                sum_h => sum_2_1v2h(i),
                ca1_h => car_2_1v2h(i));

        ppfa15_3: ppfa port map( bth => bth_3_1v2h,
                x1_h => myx_1v2h(i+1),
                x2_h => myx_1v2h(i),
                s0_h => pp_1v2h(3),

                c0_h => car_2_1v2h(i+1),
                sum_h => sum_3_1v2h(i),
                ca1_h => car_3_1v2h(i));

        p15_sum1 : gdffr_fall port map
                ( Q=> sum_3_2v1h(i), D=> sum_3_1v2h(i), Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

        p15_car1 : gdffr_fall port map
                ( Q=> car_3_2v1h(i), D=> car_3_1v2h(i), Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

        -- use value before the tri-state (don't use myx_1v2h)
        p15_x1 : gdffr_fall port map
                ( Q=> myx_2v1h(i), D=> tempx(i), Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

        ppfa15_4: ppfa port map( bth => bth_4_2v1h,
                x1_h => myx_2v1h(i+1),
                x2_h => myx_2v1h(i),
                s0_h => pp_2v1h(4),
                c0_h => car_3_2v1h(i+1),
                sum_h => sum_4_2v1h(i),
                ca1_h => car_4_2v1h(i));

        ppfa15_5: ppfa port map( bth => bth_5_2v1h,
                x1_h => myx_2v1h(i+1),
                x2_h => myx_2v1h(i),
                s0_h => pp_2v1h(5),
                c0_h => car_4_2v1h(i+1),
                sum_h => sum_5_2v1h(i),
                ca1_h => car_5_2v1h(i));

        ppfa15_6: ppfa port map( bth => bth_6_2v1h,
                x1_h => myx_2v1h(i+1),
                x2_h => myx_2v1h(i),
                s0_h => pp_2v1h(6),
                c0_h => car_5_2v1h(i+1),
                sum_h => sum_6_2v1h(i),
                ca1_h => car_6_2v1h(i));

        ppfa15_7: ppfa port map( bth => bth_7_2v1h,
                x1_h => myx_2v1h(i+1),
                x2_h => myx_2v1h(i),
                s0_h => pp_2v1h(7),
                c0_h => car_6_2v1h(i+1),
                sum_h => sum_7_2v1h(i),
                ca1_h => car_7_2v1h(i));
        p15_sum2 : gdffr_fall port map

                ( Q=> sum_7_3v2h(i), D=> sum_7_2v1h(i), Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);

        p15_car2 : gdffr_fall port map
                ( Q=> car_7_3v2h(i), D=> car_7_2v1h(i), Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);

      end generate G15;

      -- In column 16, the s0_h input is the "ff" output of the sign extender
      -- The c0_h input is 0.

      G16: if( i = COL ) generate
        ppfapp_0: ppfapp port map( bth => bth_0_1v2h,
                x1_h => myx_1v2h(i+1),
                x2_h => myx_1v2h(i),
                s0_h => GND,
                c0_h => GND,
                pp_h => pp15_1v2h(0),
                sum_h => sum_0_1v2h(i),
                ca1_h => car_0_1v2h(i));

        ppfapp_1: ppfapp port map( bth => bth_1_1v2h,
                x1_h => myx_1v2h(i+1),
                x2_h => myx_1v2h(i),
                s0_h => ff_1v2h(1),
                c0_h => GND,
                pp_h => pp15_1v2h(1),
                sum_h => sum_1_1v2h(i),
                ca1_h => car_1_1v2h(i));

        ppfapp_2: ppfapp port map( bth => bth_2_1v2h,
                x1_h => myx_1v2h(i+1),
                x2_h => myx_1v2h(i),
                s0_h => ff_1v2h(2),
                c0_h => GND,
                pp_h => pp15_1v2h(2),
                sum_h => sum_2_1v2h(i),
                ca1_h => car_2_1v2h(i));

        ppfapp_3: ppfapp port map( bth => bth_3_1v2h,
                x1_h => myx_1v2h(i+1),
                x2_h => myx_1v2h(i),
                s0_h => ff_1v2h(3),
                c0_h => GND,
                pp_h => pp15_1v2h(3),
                sum_h => sum_3_1v2h(i),
                ca1_h => car_3_1v2h(i));


        p16_sum1 : gdffr_fall port map
                ( Q=> sum_3_2v1h(i), D=> sum_3_1v2h(i), Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

        p16_car1 : gdffr_fall port map
                ( Q=> car_3_2v1h(i), D=> car_3_1v2h(i), Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

        -- don't use myx_1v2h, use tempx from before the tristate
        p16_x1 : gdffr_fall port map
                ( Q=> myx_2v1h(i), D=> tempx(i), Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

        ppfapp_4: ppfapp port map( bth => bth_4_2v1h,
                x1_h => myx_2v1h(i+1),
                x2_h => myx_2v1h(i),
                s0_h => ff_2v1h(4),
                c0_h => GND,
                pp_h => pp15_2v1h(4),
                sum_h => sum_4_2v1h(i),
                ca1_h => car_4_2v1h(i));

        ppfapp_5: ppfapp port map( bth => bth_5_2v1h,
                x1_h => myx_2v1h(i+1),
                x2_h => myx_2v1h(i),
                s0_h => ff_2v1h(5),
                c0_h => GND,
                pp_h => pp15_2v1h(5),
                sum_h => sum_5_2v1h(i),
                ca1_h => car_5_2v1h(i));

        ppfapp_6: ppfapp port map( bth => bth_6_2v1h,
                x1_h => myx_2v1h(i+1),
                x2_h => myx_2v1h(i),
                s0_h => ff_2v1h(6),
                c0_h => GND,
                pp_h => pp15_2v1h(6),
                sum_h => sum_6_2v1h(i),
                ca1_h => car_6_2v1h(i));

        ppfapp_7: ppfapp port map( bth => bth_7_2v1h,
                x1_h => myx_2v1h(i+1),
                x2_h => myx_2v1h(i),
                s0_h => ff_2v1h(7),
                c0_h => GND,
                pp_h => pp15_2v1h(7),
                sum_h => sum_7_2v1h(i),
                ca1_h => car_7_2v1h(i));

        p16_sum2 : gdffr_fall port map
                ( Q=> sum_7_3v2h(i), D=> sum_7_2v1h(i), Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);

        p16_car2 : gdffr_fall port map
                ( Q=> car_7_3v2h(i), D=> car_7_2v1h(i), Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);

      end generate G16;

    end generate;

    -- need to latch bit 17 of "myx", since that is not in the generates above
    glatch_x17 : gdffr_fall port map ( Q=> myx_2v1h(17), D=> myx_1v2h(17),
            Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

    -- need bit 16 (=x_in(15)) in 3rd pipeline stage for overflow
    glatch_x15 : gdffr_fall port map ( Q=> myx_3v2h, D=> myx_2v1h(16),
            Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);

    -- These are the tri-state latched outputs going to the adder
    gsum_0_2: gdffrN_fall generic map (2) port map ( Q=> sum_0_2v1h, D=> sum_0_1v2h(1 downto 0),
            Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);
    gsum_1_2: gdffrN_fall generic map (2) port map ( Q=> sum_1_2v1h, D=> sum_1_1v2h(1 downto 0),
            Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);
    gsum_2_2: gdffrN_fall generic map (2) port map ( Q=> sum_2_2v1h, D=> sum_2_1v2h(1 downto 0),
            Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

    gca1_0_2: gdffr_fall port map ( Q=> car_0_2v1h, D=> car_0_1v2h(0),
            Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);
    gca1_1_2: gdffr_fall port map ( Q=> car_1_2v1h, D=> car_1_1v2h(0),
            Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);
    gca1_2_2: gdffr_fall port map ( Q=> car_2_2v1h, D=> car_2_1v2h(0),
            Clk=> me_1q2h, Enable=> me_2q1h, Rst=> Rst_q2h);

    gsum_4_3: gdffrN_fall generic map (2) port map ( Q=> sum_4_3v2h, D=> sum_4_2v1h(1 downto 0),
            Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);
    gsum_5_3: gdffrN_fall generic map (2) port map ( Q=> sum_5_3v2h, D=> sum_5_2v1h(1 downto 0),
            Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);
    gsum_6_3: gdffrN_fall generic map (2) port map ( Q=> sum_6_3v2h, D=> sum_6_2v1h(1 downto 0),
            Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);

    gca1_4_3: gdffr_fall port map ( Q=> car_4_3v2h, D=> car_4_2v1h(0),
            Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);
    gca1_5_3: gdffr_fall port map ( Q=> car_5_3v2h, D=> car_5_2v1h(0),
            Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);
    gca1_6_3: gdffr_fall port map ( Q=> car_6_3v2h, D=> car_6_2v1h(0),
            Clk=> me_2q1h, Enable=> me_3q2h, Rst=> Rst_q2h);
