VLSI Arithmetic Lecture 10: Multipliers

VLSI Arithmetic

Lecture 10:Multipliers

Prof. Vojin G. OklobdzijaUniversity of California

http://www.ece.ucdavis.edu/acsel

April 22, 2023 2

Multiplication Algorithm*

*from Parhami

April 22, 2023 3

*from Parhami

April 22, 2023 4

*from Parhami

April 22, 2023 5

*from Parhami

April 22, 2023 6

Multiplication*

*from Parhami

April 22, 2023 7

Multiplication*

*from Parhami

April 22, 2023 8

*from Parhami

April 22, 2023 9

*from Parhami

April 22, 2023 10

Multiplier Recoding**from Parhami

April 22, 2023 11

*from Parhami

April 22, 2023 12

Multiplication by Constants

*from Parhami

April 22, 2023 13

Multiplication by Constants

*from Parhami

April 22, 2023 14

Fast Multipliers

*from Parhami

April 22, 2023 15

Using Higher Radix Multiplier

*from Parhami

April 22, 2023 16

Using Higher Radix Multiplier

*from Parhami

April 22, 2023 17

Higher Radix Multiplier

*from Parhami

April 22, 2023 18

*from Parhami

April 22, 2023 19

Booth’s Recoding

*from Parhami

April 22, 2023 20

Booth’s Recoding

*from Parhami

April 22, 2023 21

Booth’s Recoding

*from Parhami

April 22, 2023 22

*from Parhami

April 22, 2023 23

Higher Radix Multipliers

*from Parhami

April 22, 2023 24

Tree and Array Multipliers

*from Parhami

April 22, 2023 25

Tree and Array Multipliers

*from Parhami

April 22, 2023 26

Generating Partial Products

*from G. Bewick

April 22, 2023 27

Generating Partial Products*from G. Bewick

April 22, 2023 28

Generating Partial Products using Booth’s Recoding

*from G. Bewick

April 22, 2023 29

Generating Partial Products using Booth’s Recoding

*from G. Bewick

April 22, 2023 30

Booth Partial Product Selector Logic

*from G. Bewick

April 22, 2023 31

Tree Multipliers

*from Parhami

April 22, 2023 32

April 22, 2023 33

April 22, 2023 34

April 22, 2023 35

April 22, 2023 36

April 22, 2023 37

Tree Multipliers

*from Parhami

April 22, 2023 38

Tree Multipliers

*from Parhami

April 22, 2023 39

Tree Multipliers

*from Parhami

April 22, 2023 40

April 22, 2023 41

April 22, 2023 42

April 22, 2023 43

Reduction using 4:2 Compressors

*from G. Bewick

44 Multiplier DesignApril 22, 2023

A Method for Generation of FastParallel Multipliers

Vojin G. OklobdzijaDavid VillegerSimon S. Liu

Electrical and Computer EngineeringUniversity of California

Multiplier DesignApril 22, 2023 45

Fast Parallel MultipliersObjectiveImproved Speed of Parallel Multiplier via:

• Improvements in Partial-Product Bit Reduction Techniques

• Optimization of the Final Adder for the Uneven Signal Arrival Profile from the Multiplier Tree

Multiplication

Algorithm:

i ryXryXXYP

0 p)(0

)(1)1(j

njj Xyrpr

p for j=0,....,n-1

initially

p(n)=XY after n steps

Parallel Multipliers

Step 0

S tep 1

S tep 2

S tep 3

S tep 4

Minimum Number of Stages (Dada’s Rule)

Their Schemes

Use of 4:2 Compressors

A. Weinberger 1981M. Santoro 1988

4:2 Compressor

I4 I1I2I3

Critical Signal Path in a 4:2 Compressor Tree

Re-designed 4:2 Compressor with 3 XOR Delay (Nagamatsu, Toshiba)C inI1

I2I3I4

Critical Path in a 4:2 Compressor

Signal Arrival Profile for RWT (3:2) and MWT (4:2)

Using 9:2 Compressors

(P. Song, G. De Michelli 1991)

Compressor Tree Implemented with 9:2 Compressors

9:2 Compressor Structure

0 10 20 30 40 50 60 70 80 90 100

Multiplier Width (bits)

Critical Path: (Equivalent XOR Gate Delays)

3,2 Counter

9to2 Compressor (Redesigned)

4:2 Compressor (Redesigned)

Delay Expressed as No. of XOR Gate Delays

Use of Higher-Order Compressors

D. Villeger, V.G. Oklobdzija 1993

Design of a 13:2 Compressor from a 9:2 Compressor

Delay Profile of a 24:2 Compressor Tree

Compressor Family Characteristics

Using Carry-Propagate Adders

(G. Bewick 1993) (D. Villeger & V. G. Oklobdzija 1993)

Column Compression Tree Consisting of 4-bit Adders

Bit Reduction Using 4-bit Adders (24X24)

Idea !!!!!

A Method for Speed Optimized Partial Product Reduction and Generation of

Fast Parallel Multipliers Using an Algorithmic Approach – TDM

(Oklobdzija, Villeger, Liu, 1994)

Carry Propagate Adder

Vertical Slices

Horizontal PropagationCarry and Sum Connection to the Final Adder

Partial Product Martix Divided into Vertical Compressor Slices

Example of a12 X 12 Multiplication

1 0 1 1 0 1 0 1 0 1 0 01 0 1 1 0 1 0 1 0 1 0 0

0 0 0 0 0 0 0 0 0 0 0 01 0 1 1 0 1 0 1 0 1 0 0

1 0 1 1 0 1 0 1 0 1 0 00 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0

1 0 1 1 0 1 0 1 0 1 0 01 0 1 1 0 1 0 1 0 1 0 0

0 0 0 0 0 0 0 0 0 0 0 01 0 1 1 0 1 0 1 0 1 0 0

Vertical Compressor Slice - VCS

(Partial Product for X*Y =B54 * B1B)

0 0 1 1 0 1 0

3-Dimensional View of Partial Product Reduction

Final Adder

Cin Sum

Signal Delays in a Full Adder(3,2) Counter

Fast Input

Fast Output

Cin Sum

Signal Delays in a Full Adder(3,2) Counter

Fast Input

Fast Output

Three-Dimensional optimization Method: TDM(Oklobdzija, Villeger, Liu, 1996)

ABCin Sum

I2I3I4

Cin 3 XORdelays

Method

cina b

TDM ArrangementWorst Case

cina b

Two cases of signals passing through the next level

cina b

Best Balanced Case Average Case

cina b

2 221 1 0

Example of a Optimized Interconnection

cina b

bit (n-1) positionbit (n) position

2 xor0 xor

3 xor3 xor

Example of a not Optimized Interconnection

cina b

bit (n-1) positionbit (n) position

2 xor0 xor

4 xor3 xor

Example of Delay Optimization

The 9th Vertical Compressor Slice of a Multiplier

A B Cin

0 0 0 0 0 0 0 0 0 .5 1 1 2 3

.5 1 11 2 22 2.5

3 3 3.5 4

87 Multiplier DesignApril 22, 2023Computer Tools

cina b

Signals withDifferent Delay Levels

Bypass Signals,No Delay Added

cina b

Method for Optimal Interconnection

Signals with short delays

Signals with intermediate delays

Add slow input delay

Add fast input delay

Level i

Level i+1

Signals with long delays

Signals with long delaysSignals with

short delaysSignals with intermediate delays

cina b

Design of Parallel MultipliersAlgorithm for Automatic Generation of Partial Product Array.

Initialize:

Form 2N-1 lists Li ( i = 0, 2N-2 ) each consisting of pi elements where:

p i = i+1 for i N-1 and p i = 2N-1-i for i N

An element of a list Li ( j = 0,...,pi-1 ) is a pair: <nj, Dj>i where:

nj : is a unique node identifying name

Dj : is a delay associated with that node representing a delay of a signal arriving to the node nj with respect to some reference point.

For i = 0,1 and 2N-2: connect nodes from the corresponding lists Li directly to the CPA.

Delays

Delay(S) = MAX {Delay(A) + DA-S, Delay(B) + DB-S, Delay(Cin) + DCin-S}

Delay(C) = MAX {Delay(A) + DA-C, Delay(B) + DB-C, Delay(Cin) + DCin-C}

In our case the delays in a FA are :

FAA S = FAB S = 2 XOR delays FACin S = FAA C = FAB C = FACin C = 1 XOR delay.

In a HA:

HAA S = HAB S = 1 XOR delay while HAA C = HAB C = 0.5 XOR delay.

For i=2 to i=2N-3 {Partial Product Array Generation} Begin For if length of Li is even Then Begin If sort the elements of Li in ascending order by the values of delay j

connect an HA to the first 2 elements of Li starting with the slowest input

Ds =max {A+A-S, B+B-S} Dc =max {A+A-C, B+B-C} remove 2 elements from Li

insert the pair <Ds,NetName> into Li

insert the pair <Dc,NetName> into Li+1

decrement the length of Li

increment the length of Li+1

End If;

while length of Li > 3 Begin While sort the elements of Li in ascending order by the values of delay j connect an FA to the first 3 elements of Li starting with the slowest input of the FA:

Ds =max {cA+cA-S, cB+cB-S, cCi+cCi-S} Dc = max {cA+cA-C, cB+cB-C, cCi+cCi-C}

remove 3 elements from Li insert the pair <Ds,NetName> into Li insert the pair <Dc,NetName> into Li+1 subtract 2 from the length of Li increment the length of Li+1

End While;

sort the elements of Li connect an FA to the last 3 nodes of Li connect the S and C to the bit i and i+1 of the CPA

End For;End Method;

Competing Approaches

Comparison between TDM and other representative schemes, in XOR levels.

MultiplierWord-length

Wallace Tree [7] 4:2 Tree [11] Fadavi-Ardekani [16]

3 2 2 2 24 4 3 3 36 6 6 5 58 8 6 7 59 8 8 7 6

11 10 9 8 712 10 9 8 716 12 9 10 819 12 12 11 924 14 12 12 1032 16 12 13 1142 16 15 14 1253 18 15 15 1364 20 15 16 1495 20 18 17 15

oC, VCritical Path Delay [CMOS: Leff=1 , T=25 cc=5V]

N = 24-bits 4:2 Design 9:2 Design Fadavi-Ardekani TDM DesignDelay [nS] 14.0 13.0 11.7 10.5

0 20 40 60 80 100

Multiplier Width

Equivalent XOR Delays

Fadavi-Ardekani

Algorithm for Implementation of Fast Parallel Multipliers

[1] V. G. Oklobdzija, D. Villeger, and S. S. Liu, “A Method For Speed Optimized Partial Product Reduction And Generation Of Fast Parallel Multipliers Using An Algorithmic Approach,” IEEE Transactions on Computers, Vol 45, No.3, March, 1996.

[2] V. G. Oklobdzija and D. Villeger, “Improving Multiplier Design By Using Improved Column Compression Tree And Optimized Final Adder In CMOS Technology,” IEEE Transactions on VLSI Systems, Vol.3, No.2, June, 1995, 25 pages.

[3] V. G. Oklobdzija and D. Villeger, “Multiplier Design Utilizing Improved Column Compression Tree And Optimized Final Adder In CMOS Technology,” Proceedings of the 1993 International Symposium on VLSI Technology, Systems and Applications, pp. 209-212, 1993.

[4] P. Stelling, C. Martel, V. G. Oklobdzija, R. Ravi, “Optimal Circuits for Parallel Multipliers,” IEEE Transaction on Computers, Vol. 47, No.3, pp. 273-285, March, 1998.

Organization of Hitachi's DPL multiplier

4-2 4-2

54 bit 54 bit

Booth's Encoder

108-b CLA Adder

108 bit

Walace's tree

Conditional Carry Selection (CCS)

Hitachi's 4:2 compressor structure

3 GATES

DPL multiplexer circuit

Addition Under Non-equal Signal Arrival Profile

Assumption

P. Stelling , V. G. Oklobdzija, "Design Strategies for Optimal Hybrid Final Adders in a Parallel Multiplier", special issue on VLSI Arithmetic, Journal of VLSI Signal Processing, Kluwer Academic Publishers, Vol.14, No.3, December 1996

Signal Arrival Profile form the Parallel Multiplier Partial-Product Recuction Tree

Oklobdzija, Villeger, IEEE Transactions on VLSI Systems, June, 1995

Oklobdzija and Villeger, IEEE Transactions on VLSI Systems, June, 1995

Performing Multiply-Add Operation in the Multiply Time

P. Stelling, V. G. Oklobdzija, " Achieving Multiply-Accumulate Operation in the Multiply Time", Thirteenth International Symposium on Computer Arithmetic, Pacific Grove, California, July 5 - 9, 1997.

Final Adder: Implementation

RECOMENDATIONS

Fast Parallel Multipliers• Different Counter and Compressor Families were

compared. The best way is to build a compressor of the maximal size (i.e. the entire size of the multiplier)

• The Essence of the optimal tree is optimal wiring and NOT the use of counter/compressor family

• The use of Carry-Propagate Adders is advantageous for larger size multipliers in the first stage and for particular technology

• Tuning of the Final Adder into the signal arrival profile is more important than the speed of the Final Adder.

THEEND

Hollywood

VLSI Arithmetic Lecture 10: Multipliers

Documents

Transcript of VLSI Arithmetic Lecture 10: Multipliers

Prof. V.G. OklobdzijaVLSI Arithmetic1 VLSI Arithmetic Adders & Multipliers Prof. Vojin G. Oklobdzija University of California .

VLSI Arithmetic Adders & Multipliers Prof. Vojin G. Oklobdzija University of California .

EIT120 - Introduction to the Structured VLSI Design (Fall ... · EIT120 - Introduction to the Structured VLSI Design (Fall 2009) Arithmetic Logic Unit (ALU) FIR Filter v.1.0.0 Abstract

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: … IEEE PAPERS/vlsi... · High-Throughput Finite Field Multipliers Using Redundant Basis for FPGA and ASIC ... MULTIPLIERS USING REDUNDANT

Chapter 12 Arithmetic Circuits in CMOS VLSI

Techniques for Architecture Design for Binary Arithmetic ...dl.ifip.org/db/conf/vlsi/vlsisoc2009/DepraB09.pdf · Techniques for Architecture Design for Binary Arithmetic Decoder Engines

arithmetic circuits-2people.ee.duke.edu/.../Lectures/ArithmeticCircuits_2.pdf · 2011. 12. 1. · Arithmetic Circuits-2 • Multipliers – Array multipliers • Shifters ... •

VLSI Arithmetic - University of California, Davisvojin/CLASSES/EEC180A/... · Parallel Multipliers Step 0 Step 1 Step 2 Step 3 Step 4. 11 May 2004 Multiplier Design 51. 11 May 2004

High-Speed VLSI Arithmetic Units: Adders and Multipliers

CALTECH CONFERENCE ON VLSI,caltechconf.library.caltech.edu/229/1/KYLiu.pdf · efficients is used here to save multipliers. III. Symbol-Slice VLSI RS Encoder Architecture A finite

DESIGN AND CLOCKING OF VLSI MULTIPLIERSvlsiweb.stanford.edu/people/alum/pdf/8910_Santoro___Design_And... · DESIGN AND CLOCKING OF VLSI MULTIPLIERS Mark Ronald Santoro Technical Report

DESIGN AND CLOCKING OF VLSI MULTIPLIERS - · PDF fileDESIGN AND CLOCKING OF VLSI MULTIPLIERS ... This work was supported by the Defense Advanced Project Research Agency ... An 8 Bit

VLSI Architectures and Arithmetic Operations with Application to the Fermat Number Transform

VLSI Arithmetic - University of California, Davisvojin/CLASSES/EPFL/Lectures/VLSI... · Lecture 5. Prefix Adders and Parallel Prefix Adders. Oklobdzija 2004 Computer Arithmetic 4

-1- UC San Diego / VLSI CAD Laboratory Accuracy-Configurable Adder for Approximate Arithmetic Designs Andrew B. Kahng, Seokhyeong Kang VLSI CAD LABORATORY,

Arithmetic Circuits & Multipliersweb.mit.edu/6.111/www/f2017/handouts/L08.pdfArithmetic Circuits & Multipliers • Addition, subtraction • Performance issues-- ripple carry-- carry

VLSI Arithmetic-Lect-5

M. Tech. VLSI Design - jecrcuniversity.edu.in MTech VLSI Engg.pdf · Arithmetic design- Superscalar and super pipeline design- Multiprocessor system interconnects-Message passing

Lecture 8: Sequential Multipliers - George Mason …ece.gmu.edu/~dhwang/ece645/spring2008/lectures/ece645_lecture8.pdf · Lecture 8: Sequential Multipliers ECE 645—Computer Arithmetic

Chapter 3 Arithmetic for Computers - VLSI Signal …twins.ee.nctu.edu.tw/courses/co_09/lecture/Chapter 3...Chapter 3 — Arithmetic for Computers — 10 Detecting Overflow No overflow