Post on 08-Feb-2016
description
VLSI Arithmetic
Lecture 10:Multipliers
Prof. Vojin G. OklobdzijaUniversity of California
http://www.ece.ucdavis.edu/acsel
44 Multiplier DesignApril 22, 2023
A Method for Generation of FastParallel Multipliers
by
Vojin G. OklobdzijaDavid VillegerSimon S. Liu
Electrical and Computer EngineeringUniversity of California
Davis
Multiplier DesignApril 22, 2023 45
Fast Parallel MultipliersObjectiveImproved Speed of Parallel Multiplier via:
• Improvements in Partial-Product Bit Reduction Techniques
• Optimization of the Final Adder for the Uneven Signal Arrival Profile from the Multiplier Tree
Multiplier DesignApril 22, 2023 46
Multiplication
Algorithm:
in
i
iin
i
i ryXryXXYP
1
0
1
0
0 p)(0
)(1)1(j
njj Xyrpr
p for j=0,....,n-1
initially
p(n)=XY after n steps
Multiplier DesignApril 22, 2023 60
Re-designed 4:2 Compressor with 3 XOR Delay (Nagamatsu, Toshiba)C inI1
I2I3I4
0
1
S
C
Cout
Multiplier DesignApril 22, 2023 66
Title
0
2
4
6
8
10
12
14
16
18
20
22
24D
elay
(XO
R G
ates
)
0 10 20 30 40 50 60 70 80 90 100
Multiplier Width (bits)
Critical Path: (Equivalent XOR Gate Delays)
3,2 Counter
9to2 Compressor (Redesigned)
4:2 Compressor (Redesigned)
Multiplier DesignApril 22, 2023 68
Use of Higher-Order Compressors
D. Villeger, V.G. Oklobdzija 1993
Multiplier DesignApril 22, 2023 72
Using Carry-Propagate Adders
(G. Bewick 1993) (D. Villeger & V. G. Oklobdzija 1993)
Multiplier DesignApril 22, 2023 76
A Method for Speed Optimized Partial Product Reduction and Generation of
Fast Parallel Multipliers Using an Algorithmic Approach – TDM
(Oklobdzija, Villeger, Liu, 1994)
Multiplier DesignApril 22, 2023 77
Carry Propagate Adder
Vertical Slices
Horizontal PropagationCarry and Sum Connection to the Final Adder
Partial Product Martix Divided into Vertical Compressor Slices
Multiplier DesignApril 22, 2023 78
Example of a12 X 12 Multiplication
1 0 1 1 0 1 0 1 0 1 0 01 0 1 1 0 1 0 1 0 1 0 0
0 0 0 0 0 0 0 0 0 0 0 01 0 1 1 0 1 0 1 0 1 0 0
1 0 1 1 0 1 0 1 0 1 0 00 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0
1 0 1 1 0 1 0 1 0 1 0 01 0 1 1 0 1 0 1 0 1 0 0
0 0 0 0 0 0 0 0 0 0 0 01 0 1 1 0 1 0 1 0 1 0 0
Vertical Compressor Slice - VCS
(Partial Product for X*Y =B54 * B1B)
FA FA
FA
FA
0 0 1 1 0 1 0
FA
3-Dimensional View of Partial Product Reduction
Time
Final Adder
Multiplier DesignApril 22, 2023 79
AB
Cin Sum
Carry
Signal Delays in a Full Adder(3,2) Counter
Fast Input
Fast Output
Multiplier DesignApril 22, 2023 80
AB
Cin Sum
Carry
Signal Delays in a Full Adder(3,2) Counter
Fast Input
Fast Output
Multiplier DesignApril 22, 2023 81
Three-Dimensional optimization Method: TDM(Oklobdzija, Villeger, Liu, 1996)
Sum
Carry
ABCin Sum
Carry
ABCin
I1
I2I3I4
Cout
Cin 3 XORdelays
Multiplier DesignApril 22, 2023 83
sc
cina b
TDM ArrangementWorst Case
4
4
21
44
sc
cina b
24 3
6 1
1
6
3
Multiplier DesignApril 22, 2023 84
Two cases of signals passing through the next level
0 0 0
sc
cina b
sc
cina b
sc
cina b
1
2 11
100 0
02 2
Best Balanced Case Average Case
sc
cina b
2 221 1 0
3 433
Multiplier DesignApril 22, 2023 85
Example of a Optimized Interconnection
sc
cina b
sc
cina b
sc
cina b
sc
cina b
bit (n-1) positionbit (n) position
2 xor0 xor
1 xor
3 xor3 xor
Example of a not Optimized Interconnection
sc
cina b
sc
cina b
sc
cina b
sc
cina b
bit (n-1) positionbit (n) position
2 xor0 xor
1 xor
4 xor3 xor
Example of Delay Optimization
Multiplier DesignApril 22, 2023 86
The 9th Vertical Compressor Slice of a Multiplier
A B
C S
A B Cin
C S
A B Cin
C S
A B Cin
C S
A B Cin
C S
A B Cin
C S
A B Cin
C S
0 0 0 0 0 0 0 0 0 .5 1 1 2 3
.5 1 11 2 22 2.5
3 3 3.5 4
5 5
Multiplier DesignApril 22, 2023 88
sc
cina b
Signals withDifferent Delay Levels
Signals withDifferent Delay Levels
Bypass Signals,No Delay Added
sc
cina b
sc
cina b
Method for Optimal Interconnection
Signals with short delays
Signals with intermediate delays
Add slow input delay
Add fast input delay
Level i
Level i+1
Signals with long delays
Signals with long delaysSignals with
short delaysSignals with intermediate delays
sc
cina b
Multiplier DesignApril 22, 2023 89
Design of Parallel MultipliersAlgorithm for Automatic Generation of Partial Product Array.
Initialize:
Form 2N-1 lists Li ( i = 0, 2N-2 ) each consisting of pi elements where:
p i = i+1 for i N-1 and p i = 2N-1-i for i N
An element of a list Li ( j = 0,...,pi-1 ) is a pair: <nj, Dj>i where:
nj : is a unique node identifying name
Dj : is a delay associated with that node representing a delay of a signal arriving to the node nj with respect to some reference point.
For i = 0,1 and 2N-2: connect nodes from the corresponding lists Li directly to the CPA.
Multiplier DesignApril 22, 2023 90
Delays
Delay(S) = MAX {Delay(A) + DA-S, Delay(B) + DB-S, Delay(Cin) + DCin-S}
Delay(C) = MAX {Delay(A) + DA-C, Delay(B) + DB-C, Delay(Cin) + DCin-C}
In our case the delays in a FA are :
FAA S = FAB S = 2 XOR delays FACin S = FAA C = FAB C = FACin C = 1 XOR delay.
In a HA:
HAA S = HAB S = 1 XOR delay while HAA C = HAB C = 0.5 XOR delay.
Multiplier DesignApril 22, 2023 91
For i=2 to i=2N-3 {Partial Product Array Generation} Begin For if length of Li is even Then Begin If sort the elements of Li in ascending order by the values of delay j
connect an HA to the first 2 elements of Li starting with the slowest input
Ds =max {A+A-S, B+B-S} Dc =max {A+A-C, B+B-C} remove 2 elements from Li
insert the pair <Ds,NetName> into Li
insert the pair <Dc,NetName> into Li+1
decrement the length of Li
increment the length of Li+1
End If;
92 Multiplier DesignApril 22, 2023
while length of Li > 3 Begin While sort the elements of Li in ascending order by the values of delay j connect an FA to the first 3 elements of Li starting with the slowest input of the FA:
Ds =max {cA+cA-S, cB+cB-S, cCi+cCi-S} Dc = max {cA+cA-C, cB+cB-C, cCi+cCi-C}
remove 3 elements from Li insert the pair <Ds,NetName> into Li insert the pair <Dc,NetName> into Li+1 subtract 2 from the length of Li increment the length of Li+1
End While;
sort the elements of Li connect an FA to the last 3 nodes of Li connect the S and C to the bit i and i+1 of the CPA
End For;End Method;
94 Multiplier DesignApril 22, 2023
Comparison between TDM and other representative schemes, in XOR levels.
MultiplierWord-length
Wallace Tree [7] 4:2 Tree [11] Fadavi-Ardekani [16]
TDM
3 2 2 2 24 4 3 3 36 6 6 5 58 8 6 7 59 8 8 7 6
11 10 9 8 712 10 9 8 716 12 9 10 819 12 12 11 924 14 12 12 1032 16 12 13 1142 16 15 14 1253 18 15 15 1364 20 15 16 1495 20 18 17 15
Multiplier DesignApril 22, 2023 95
oC, VCritical Path Delay [CMOS: Leff=1 , T=25 cc=5V]
N = 24-bits 4:2 Design 9:2 Design Fadavi-Ardekani TDM DesignDelay [nS] 14.0 13.0 11.7 10.5
Multiplier DesignApril 22, 2023 96
0
2
4
6
8
10
12
14
16
18
20
22
24
Del
ay (X
OR
Lev
els)
0 20 40 60 80 100
Multiplier Width
Equivalent XOR Delays
TDM
Fadavi-Ardekani
9:2
4:2
3,2
Multiplier DesignApril 22, 2023 97
Algorithm for Implementation of Fast Parallel Multipliers
[1] V. G. Oklobdzija, D. Villeger, and S. S. Liu, “A Method For Speed Optimized Partial Product Reduction And Generation Of Fast Parallel Multipliers Using An Algorithmic Approach,” IEEE Transactions on Computers, Vol 45, No.3, March, 1996.
[2] V. G. Oklobdzija and D. Villeger, “Improving Multiplier Design By Using Improved Column Compression Tree And Optimized Final Adder In CMOS Technology,” IEEE Transactions on VLSI Systems, Vol.3, No.2, June, 1995, 25 pages.
[3] V. G. Oklobdzija and D. Villeger, “Multiplier Design Utilizing Improved Column Compression Tree And Optimized Final Adder In CMOS Technology,” Proceedings of the 1993 International Symposium on VLSI Technology, Systems and Applications, pp. 209-212, 1993.
[4] P. Stelling, C. Martel, V. G. Oklobdzija, R. Ravi, “Optimal Circuits for Parallel Multipliers,” IEEE Transaction on Computers, Vol. 47, No.3, pp. 273-285, March, 1998.
Multiplier DesignApril 22, 2023 98
Organization of Hitachi's DPL multiplier
4-2 4-2
4-2
4-2 4-2
4-2
4-2 4-2
4-2
4-2
4-2
4-2
4-2
54 bit 54 bit
Booth's Encoder
108-b CLA Adder
108 bit
Walace's tree
Conditional Carry Selection (CCS)
Multiplier DesignApril 22, 2023 99
Hitachi's 4:2 compressor structure
MUX
MUX
MUX
MUX
I4
I3
I1
I2
MUX
MUX
I1
I3
I4
Ci
Ci
Co
C
S
3 GATES
Multiplier DesignApril 22, 2023 100
DPL multiplexer circuit
L
HMUX
D0
D1
D0
D1
S S
OUT
OUT
OUT
S
D1
D0
Multiplier DesignApril 22, 2023 101
Addition Under Non-equal Signal Arrival Profile
Assumption
P. Stelling , V. G. Oklobdzija, "Design Strategies for Optimal Hybrid Final Adders in a Parallel Multiplier", special issue on VLSI Arithmetic, Journal of VLSI Signal Processing, Kluwer Academic Publishers, Vol.14, No.3, December 1996
Multiplier DesignApril 22, 2023 102
Signal Arrival Profile form the Parallel Multiplier Partial-Product Recuction Tree
Multiplier DesignApril 22, 2023 103
Oklobdzija, Villeger, IEEE Transactions on VLSI Systems, June, 1995
Multiplier DesignApril 22, 2023 104
Oklobdzija and Villeger, IEEE Transactions on VLSI Systems, June, 1995
Multiplier DesignApril 22, 2023 113
Performing Multiply-Add Operation in the Multiply Time
P. Stelling, V. G. Oklobdzija, " Achieving Multiply-Accumulate Operation in the Multiply Time", Thirteenth International Symposium on Computer Arithmetic, Pacific Grove, California, July 5 - 9, 1997.
Multiplier DesignApril 22, 2023 120
Fast Parallel Multipliers• Different Counter and Compressor Families were
compared. The best way is to build a compressor of the maximal size (i.e. the entire size of the multiplier)
• The Essence of the optimal tree is optimal wiring and NOT the use of counter/compressor family
• The use of Carry-Propagate Adders is advantageous for larger size multipliers in the first stage and for particular technology
• Tuning of the Final Adder into the signal arrival profile is more important than the speed of the Final Adder.