A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal (BCD) Codes
Xiaoping Cui, Weiqiang Liu* and Wenwen Dong
College of Electronic and Information Engineering,
Nanjing University of Aeronautics and Astronautics, Nanjing, China
Fabrizio Lombardi
Department of Electrical and Computer Engineering,
Northeastern University, Boston, USA
Outline
The Proposed Partial Product Tree
Motivation
Review of BCD Representations and Decimal Multiplier
2
The Proposed Partial Product Tree
Evaluation and Comparison
Conclusions
Motivation
Why Decimal Arithmetic is Needed?
� Binary arithmetic introduces conversion and rounding errors
� Decimal arithmetic is highly demanded in many applications
(financial, commercial and so on) that cannot tolerant errors.
3
(financial, commercial and so on) that cannot tolerant errors.
� Decimal specification has been added to the revised IEEE
754-2008 standard.
� High performance decimal arithmetic circuits are required.
Introduction
� Partial Production Generation
Sign Digit (SD) Radix-10 recoding
Redundant BCD excess-3 (XS-3)
Overloaded decimal digit set (ODDS) code
Double BCD recoding
Generation of Multiplier
5X 4X 3X 2X 1X
XS-3 digits in [-3,12]
Selection of Multiples (MUX-5)
PP[0] … PP[k] … PP[d-1] PP[d]
ODDS digits in [0,15]
Radix-10
recoder
4d
X(BCD)
4d
Y(BCD)
6d
.
.
.
.
.
.Yb0
YbK
Ybd-1 6
6
6
PPG
Block
4(d+1) 4(d+1) 4(d+1) 4(d+1) 4d
4(d+1) 4(d+1) 4(d+1) 4d... ...
� Partial Product Compression
Decimal 3:2 CSA
Binary compressor
� Final Decimal Adder
Parallel prefix/carry select adder
4
(d+1):2 PPR tree
ODDS digits in [0,15]
A(BCD XS-6) B(BCD-8421)
BCD Adder (2d digits)
8d 8d
8d
P(BCD)
[7] A. Vazquez, E. Antelo, and J. Bruguera, “Fast Radix-10 Multiplication
Using Redundant BCD codes”, IEEE Transactions on Computers, vol. 63,
no. 8, pp. 1902–1914, Aug. 2014.
Partial Production Generation
� SD Radix-10 recoding schemeSD Radix-10 Recoding
yi,3yi,2yi,1yi,0 y5i y4i y3iy2iy1i
(ysi-1=0)ysi y5i y4i y3iy2iy1i
(ysi-1=1)ysi
0000 00000 0 00001 0
0001 00001 0 00010 01
5&& 5
1 5&& 5
i i iY Y Y
Y Y Y
−
< < + < ≥
5
0010 00010 0 00100 0
0011 00100 0 01000 0
0100 01000 0 10000 0
0101 10000 0 01000 1
0110 01000 1 00100 1
0111 00100 1 00010 1
1000 00010 1 00001 1
1001 00001 1 00000 1
1
1
1
1 5&& 5
(10 ) 5&& 5
(10 ) 1 5&& 5
i i i
i
i i i
i i i
Y Y YYb
Y Y Y
Y Y Y
−
−
−
+ < ≥=
− − ≥ <− − + ≥ ≥
Partial Production Generation
Redundant BCD Codes
6
XS-3 Recoding (Redundant Odds [0, 15])
Partial Production Generation
N*Xd-1+3…………… N*Xi+3 N*Xi-1+3 ……… N*X0+3
4444
X0Xi-1XiXd-1
D0T0Di-1Ti-1 Ti-2DiTiDd-1 Td-2Td-1
……………… ……………
Digit-set[0,9]
STEP 1: DigitMappings
[3,9N+3]
Carry-out
X(BCD)
4d
╳5╳5 ╳4 ╳3 ╳2 Xi+3
5X 4X 3X 2X 1X
Digits in XS-3[-3,12]
4(d+1) 4(d+1) 4(d+1) 4(d+1) 4d
+
D0T0Di-1Ti-1 Ti-2DiTiDd-1 Td-2Td-1
++
444
……… 4…
NX0NXi-1NXiNXd-1
Carry-out
STEP 2:Carry assimilation
[-3,12]
Digits in XS-3[-3,12]
MUX-5
4 4 4 4 4
5Xi 4Xi 3Xi 2Xi 1Xi
1 1 1 1
1 1 1 1
SD Radix-101digit encoding
Y5kY4k Y3kY2kY1k
4
Yk(BCD)
YskYsk-1
Ysk
Ysk*
Digit in XS-3[-3,12]
Digits in ODDS[0,15]
Convert the XS-3 digits to ODDS by
adding pre-computed correction term:
fc(16)=1032 +
07407407407407417037037037037037
Advantage of XS-3 Codes: difficult
multiples (such as 3X) can be obtained in
a carry-free manner
7
Decimal PP Compression
Using ODDS Generation of Multiplier
5X 4X 3X 2X 1X
XS-3 digits in [-3,12]
Selection of Multiples (MUX-5)
PP[0] … PP[k] … PP[d-1] PP[d]
Radix-10
recoder
4d
X(BCD)
4d
Y(BCD)
6d
.
.
.
.
.
.Yb0
YbK
Ybd-1 6
6
6
PPG
Block
4(d+1) 4(d+1) 4(d+1) 4(d+1) 4d
Partial Product Compression
The (d+1:2) PP Reduction (PPR):
(1) A regular binary CSA tree
(2) A binary counter is used to count PP[0] … PP[k] … PP[d-1] PP[d]
(d+1):2 PPR tree
ODDS digits in [0,15]
A(BCD XS-6) B(BCD-8421)
BCD Adder (2d digits)
Yb0
4(d+1) 4(d+1) 4(d+1) 4d... ...
8d 8d
8d
P(BCD)
(3) The ODDS partial products in
(1) and (2) are added by the
binary CSA tree and the decimal
digit 3:2 compressor
(2) A binary counter is used to count
carries generated between the digit
columns in the binary CSA tree
8
� Decimal 3:2 CSADecimal PP Compression
Based on BCD-4221/5211
ci,jbi,jai,j
Partial Product (PP) Compression
s i,jhi,j
9
Proposed Design
A
B
P
Partial
productcompression
Partial
product
generation
Final
decimal
adder
New Design of PPR
Tree Block
10
Proposed Design: A New PPR Tree
Proposed PPR
(reduction) Tree
(d+1):2 Binary
CSA Tree
(ODDS to 4221)
BCD-4221 Sum
Correction Block
(Decimal Counter)
A Decimal Digit
3:2 Compressor
(BCD-4221)
11
A PPR Tree FOR 16*16-digit multiplier
17:2
Binary
PPR tree
8-bit
counter
3-bit counter
3-bit
counter
·
·
·4
18
1
3
3
2
PP i[0] PP i[k] PPi[16]
. . . . . .
Ci-1[0]
·
·
·
·
·
·
3
1
ui,3; ui,2; ui,1
vi,1
Ci[0]
·
·
·
Ci[7]
Ci[8]
Ci[9]
Ci[10]
Ci[11]
C [12]
Ci-1[K]
Ci-1[13]
4
2
BCD-4221 Sum
Correction Block
p1 p2 p3 p4
cout2 cin
p1 p2 p3 p4
cout2 cin
p1 p2 p3 p4
cout2 cin
p1 p2 p3 p4
cout2 cin
+6
c [13]ci-1[13]
� The No. of PP rows in the 1st,
2nd, 3rd and 4th stages are 17, 9,
6 and 4, respectively.
counter
6:2 Decimal PPR Tree
44
2
4221
4221
2*4221
x6
x6
x6
4221 4221 4221 4221 4*4221 55
1
1
3
1
4 4
Bi Ai
zi,1
Ci[12]
Ci[13]
Hi(4221) Si(4221)
x2 x1
ui-1,3; ui-1,2; ui-1,1
vi-1,1
zi-1,1
cout2 cin
cout1 sum
cout2 cin
cout1 sum
cout2 cin
cout1 sum
cout2 cin
cout1 sum
(2) (1)(4) (2)(8) (4)(16) (8)
si,3ci,3
ci[13]
si,2ci,2 si,1ci,1 si,0ci,0
ci-1[13]
12
� 4-bit binary 4:2 compressor
in last compression stage
the Proposed PPR Tree
3:2 3:2HA
3:23:2
ui,3ui,2ui,1ui-1,1 ui,0ui,0ui-1,3ui-1,2
3:2
3:23:2
C[4] C[0] C[5] C[6] C[7]C[1] C[2] C[3]
C[8]C[9]C[10] C[11]C[12]C[13]
4 4 4 4
ui,31 ui,2
1ui,1
1ui,0
1vi,1
1vi,0
1
vi,0vi,0vi,1vi-1,1
zi,11
zi,01
zi,0zi,0zi,1zi-1,1
BCD-4221 Sum
Correction Block
F
8-bit counter
3-bit counter
BCD-4221 8-bit and
3-bit counter correction
� The 8-bit BCD-4221 counter is faster
than a binary counter (only two 3:2 CSA
delay).
� 3-bit counters are used to generate a
BCD-4221 decimal correction digit by 3:2
3:2
x2
3:2
3:2
x2
x2
42214221
x2
x2
4 4
6:2 Decimal
PPR Tree Block
Ai(4221) Bi(4*4221)
Hi(4221) Si(4221)
4 4x2 x1
F
F
F
6:2 decimal PPR tree block
13
BCD-4221 decimal correction digit by
using only one 3:2 compressor.
� To balance the paths in the decimal 6:2
PPR tree and reduce the critical path.
Using Hybrid (Multiple) BCD Codes
4*4221
4221
BCD-4221
PPR Tree
BCD-
8421
PPG
XS-3
2*4221
4221
ODDSBCD-4221 sum
correction block
Binary
PPR Tree
Adder
set
excess-6
BCD-8421
Decimal
Adder
BCD-
8421
14
Advantages of the proposed PPR tree
1 A BCD-4221 counter is faster than a binary counter (a 8-bit counter has
two 3:2 CSA stages, and 3-bit counter has one 3:2 CSA stage.)
2 A non-fixed size BCD-4221 counter correction block is used to
3
2 A non-fixed size BCD-4221 counter correction block is used to
balance the paths and reduce the critical path delay of decimal 6:2 PPR
tree.
The final two PP rows are generated using a decimal PPR tree
based on BCD-4221 that is easy to be converted to BCD-8421.
15
Evaluation
BlockDelay
#FO4
Area
#NAND2
PPG Stage 10.2 14900
25.3 14306
Area and Delay (LE-Based Model) for the Proposed 16×16-digit Multipliers.
PPR Tree 25.3 14306
Adder Setup 3.2 1050
Decimal Adder 11.5 2400
Total50.2
32656
16
Evaluation
DesignDelay
#FO4
Area
#NAND2
Non-Redundant [13] 58.3 35750
Area and Delay (LE-Based) Comparision for Different BCD Multiplier Designs.
Redundant [7] 51.4 30600
Proposed
50.2
Compared with [13] -13.89%
Compared with [7] -2.23%
32656
Compared with [13] -8.65%
Compared with [7] +6.05%[7] A. Vazquez, E. Antelo, and J. Bruguera, “Fast Radix-10 Multiplication Using Redundant BCD codes”, IEEE Transactions on
Computers, vol. 63, no. 8, pp. 1902–1914, Aug. 2014.
[13] A. Vazquez, E. Antelo and P. Montuschi, “Improved Design of High-Performance Parallel Decimal Multipliers”, IEEE Transactions
on Computers, vol. 59, no. 5, pp. 679–693, May 2010.
17
Evaluation
DesignDelay(ns)
Ratio Area(μm2) Ratio
Proposed 3.21 1 43053.5 1
Area and Delay Comparison Using NanGate 45nm open cell library
� The proposed design reduces the delay by 12.30% and the area by 10.9%
compared with [13].
� [7] reduces the delay by 10.75% and the area by 11.1% compared with [13] (no
direct comparison as some parts of PPR circuit of [7] are not provided in detail).
Non-Redundant
[13]3.66 1.14 48326.1 1.12
18
Conclusion
� Design of parallel decimal multiplier is studied
� A parallel decimal multiplier based on a new PPR tree is proposed by using:
� A BCD-4221 sum correction block with non-fixed size counters,
� A decimal PPR tree based on BCD-4221 decimal digit 3:2 compressor.
19
� A decimal PPR tree based on BCD-4221 decimal digit 3:2 compressor.
� The proposed parallel decimal multiplier is faster than previous best designs.
Thank you!Thank you!
Questions?
Top Related