A Parallel Decimal Multiplier Using Hybrid Binary Coded...

20
A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal (BCD) Codes Xiaoping Cui, Weiqiang Liu* and Wenwen Dong College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China Fabrizio Lombardi Department of Electrical and Computer Engineering, Northeastern University, Boston, USA

Transcript of A Parallel Decimal Multiplier Using Hybrid Binary Coded...

Page 1: A Parallel Decimal Multiplier Using Hybrid Binary Coded ...arith23.gforge.inria.fr/slides/Cui-Liu-Dong-Lombardi.pdf · A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal

A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal (BCD) Codes

Xiaoping Cui, Weiqiang Liu* and Wenwen Dong

College of Electronic and Information Engineering,

Nanjing University of Aeronautics and Astronautics, Nanjing, China

Fabrizio Lombardi

Department of Electrical and Computer Engineering,

Northeastern University, Boston, USA

Page 2: A Parallel Decimal Multiplier Using Hybrid Binary Coded ...arith23.gforge.inria.fr/slides/Cui-Liu-Dong-Lombardi.pdf · A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal

Outline

The Proposed Partial Product Tree

Motivation

Review of BCD Representations and Decimal Multiplier

2

The Proposed Partial Product Tree

Evaluation and Comparison

Conclusions

Page 3: A Parallel Decimal Multiplier Using Hybrid Binary Coded ...arith23.gforge.inria.fr/slides/Cui-Liu-Dong-Lombardi.pdf · A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal

Motivation

Why Decimal Arithmetic is Needed?

� Binary arithmetic introduces conversion and rounding errors

� Decimal arithmetic is highly demanded in many applications

(financial, commercial and so on) that cannot tolerant errors.

3

(financial, commercial and so on) that cannot tolerant errors.

� Decimal specification has been added to the revised IEEE

754-2008 standard.

� High performance decimal arithmetic circuits are required.

Page 4: A Parallel Decimal Multiplier Using Hybrid Binary Coded ...arith23.gforge.inria.fr/slides/Cui-Liu-Dong-Lombardi.pdf · A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal

Introduction

� Partial Production Generation

Sign Digit (SD) Radix-10 recoding

Redundant BCD excess-3 (XS-3)

Overloaded decimal digit set (ODDS) code

Double BCD recoding

Generation of Multiplier

5X 4X 3X 2X 1X

XS-3 digits in [-3,12]

Selection of Multiples (MUX-5)

PP[0] … PP[k] … PP[d-1] PP[d]

ODDS digits in [0,15]

Radix-10

recoder

4d

X(BCD)

4d

Y(BCD)

6d

.

.

.

.

.

.Yb0

YbK

Ybd-1 6

6

6

PPG

Block

4(d+1) 4(d+1) 4(d+1) 4(d+1) 4d

4(d+1) 4(d+1) 4(d+1) 4d... ...

� Partial Product Compression

Decimal 3:2 CSA

Binary compressor

� Final Decimal Adder

Parallel prefix/carry select adder

4

(d+1):2 PPR tree

ODDS digits in [0,15]

A(BCD XS-6) B(BCD-8421)

BCD Adder (2d digits)

8d 8d

8d

P(BCD)

[7] A. Vazquez, E. Antelo, and J. Bruguera, “Fast Radix-10 Multiplication

Using Redundant BCD codes”, IEEE Transactions on Computers, vol. 63,

no. 8, pp. 1902–1914, Aug. 2014.

Page 5: A Parallel Decimal Multiplier Using Hybrid Binary Coded ...arith23.gforge.inria.fr/slides/Cui-Liu-Dong-Lombardi.pdf · A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal

Partial Production Generation

� SD Radix-10 recoding schemeSD Radix-10 Recoding

yi,3yi,2yi,1yi,0 y5i y4i y3iy2iy1i

(ysi-1=0)ysi y5i y4i y3iy2iy1i

(ysi-1=1)ysi

0000 00000 0 00001 0

0001 00001 0 00010 01

5&& 5

1 5&& 5

i i iY Y Y

Y Y Y

< < + < ≥

5

0010 00010 0 00100 0

0011 00100 0 01000 0

0100 01000 0 10000 0

0101 10000 0 01000 1

0110 01000 1 00100 1

0111 00100 1 00010 1

1000 00010 1 00001 1

1001 00001 1 00000 1

1

1

1

1 5&& 5

(10 ) 5&& 5

(10 ) 1 5&& 5

i i i

i

i i i

i i i

Y Y YYb

Y Y Y

Y Y Y

+ < ≥=

− − ≥ <− − + ≥ ≥

Page 6: A Parallel Decimal Multiplier Using Hybrid Binary Coded ...arith23.gforge.inria.fr/slides/Cui-Liu-Dong-Lombardi.pdf · A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal

Partial Production Generation

Redundant BCD Codes

6

Page 7: A Parallel Decimal Multiplier Using Hybrid Binary Coded ...arith23.gforge.inria.fr/slides/Cui-Liu-Dong-Lombardi.pdf · A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal

XS-3 Recoding (Redundant Odds [0, 15])

Partial Production Generation

N*Xd-1+3…………… N*Xi+3 N*Xi-1+3 ……… N*X0+3

4444

X0Xi-1XiXd-1

D0T0Di-1Ti-1 Ti-2DiTiDd-1 Td-2Td-1

……………… ……………

Digit-set[0,9]

STEP 1: DigitMappings

[3,9N+3]

Carry-out

X(BCD)

4d

╳5╳5 ╳4 ╳3 ╳2 Xi+3

5X 4X 3X 2X 1X

Digits in XS-3[-3,12]

4(d+1) 4(d+1) 4(d+1) 4(d+1) 4d

+

D0T0Di-1Ti-1 Ti-2DiTiDd-1 Td-2Td-1

++

444

……… 4…

NX0NXi-1NXiNXd-1

Carry-out

STEP 2:Carry assimilation

[-3,12]

Digits in XS-3[-3,12]

MUX-5

4 4 4 4 4

5Xi 4Xi 3Xi 2Xi 1Xi

1 1 1 1

1 1 1 1

SD Radix-101digit encoding

Y5kY4k Y3kY2kY1k

4

Yk(BCD)

YskYsk-1

Ysk

Ysk*

Digit in XS-3[-3,12]

Digits in ODDS[0,15]

Convert the XS-3 digits to ODDS by

adding pre-computed correction term:

fc(16)=1032 +

07407407407407417037037037037037

Advantage of XS-3 Codes: difficult

multiples (such as 3X) can be obtained in

a carry-free manner

7

Page 8: A Parallel Decimal Multiplier Using Hybrid Binary Coded ...arith23.gforge.inria.fr/slides/Cui-Liu-Dong-Lombardi.pdf · A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal

Decimal PP Compression

Using ODDS Generation of Multiplier

5X 4X 3X 2X 1X

XS-3 digits in [-3,12]

Selection of Multiples (MUX-5)

PP[0] … PP[k] … PP[d-1] PP[d]

Radix-10

recoder

4d

X(BCD)

4d

Y(BCD)

6d

.

.

.

.

.

.Yb0

YbK

Ybd-1 6

6

6

PPG

Block

4(d+1) 4(d+1) 4(d+1) 4(d+1) 4d

Partial Product Compression

The (d+1:2) PP Reduction (PPR):

(1) A regular binary CSA tree

(2) A binary counter is used to count PP[0] … PP[k] … PP[d-1] PP[d]

(d+1):2 PPR tree

ODDS digits in [0,15]

A(BCD XS-6) B(BCD-8421)

BCD Adder (2d digits)

Yb0

4(d+1) 4(d+1) 4(d+1) 4d... ...

8d 8d

8d

P(BCD)

(3) The ODDS partial products in

(1) and (2) are added by the

binary CSA tree and the decimal

digit 3:2 compressor

(2) A binary counter is used to count

carries generated between the digit

columns in the binary CSA tree

8

Page 9: A Parallel Decimal Multiplier Using Hybrid Binary Coded ...arith23.gforge.inria.fr/slides/Cui-Liu-Dong-Lombardi.pdf · A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal

� Decimal 3:2 CSADecimal PP Compression

Based on BCD-4221/5211

ci,jbi,jai,j

Partial Product (PP) Compression

s i,jhi,j

9

Page 10: A Parallel Decimal Multiplier Using Hybrid Binary Coded ...arith23.gforge.inria.fr/slides/Cui-Liu-Dong-Lombardi.pdf · A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal

Proposed Design

A

B

P

Partial

productcompression

Partial

product

generation

Final

decimal

adder

New Design of PPR

Tree Block

10

Page 11: A Parallel Decimal Multiplier Using Hybrid Binary Coded ...arith23.gforge.inria.fr/slides/Cui-Liu-Dong-Lombardi.pdf · A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal

Proposed Design: A New PPR Tree

Proposed PPR

(reduction) Tree

(d+1):2 Binary

CSA Tree

(ODDS to 4221)

BCD-4221 Sum

Correction Block

(Decimal Counter)

A Decimal Digit

3:2 Compressor

(BCD-4221)

11

Page 12: A Parallel Decimal Multiplier Using Hybrid Binary Coded ...arith23.gforge.inria.fr/slides/Cui-Liu-Dong-Lombardi.pdf · A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal

A PPR Tree FOR 16*16-digit multiplier

17:2

Binary

PPR tree

8-bit

counter

3-bit counter

3-bit

counter

·

·

·4

18

1

3

3

2

PP i[0] PP i[k] PPi[16]

. . . . . .

Ci-1[0]

·

·

·

·

·

·

3

1

ui,3; ui,2; ui,1

vi,1

Ci[0]

·

·

·

Ci[7]

Ci[8]

Ci[9]

Ci[10]

Ci[11]

C [12]

Ci-1[K]

Ci-1[13]

4

2

BCD-4221 Sum

Correction Block

p1 p2 p3 p4

cout2 cin

p1 p2 p3 p4

cout2 cin

p1 p2 p3 p4

cout2 cin

p1 p2 p3 p4

cout2 cin

+6

c [13]ci-1[13]

� The No. of PP rows in the 1st,

2nd, 3rd and 4th stages are 17, 9,

6 and 4, respectively.

counter

6:2 Decimal PPR Tree

44

2

4221

4221

2*4221

x6

x6

x6

4221 4221 4221 4221 4*4221 55

1

1

3

1

4 4

Bi Ai

zi,1

Ci[12]

Ci[13]

Hi(4221) Si(4221)

x2 x1

ui-1,3; ui-1,2; ui-1,1

vi-1,1

zi-1,1

cout2 cin

cout1 sum

cout2 cin

cout1 sum

cout2 cin

cout1 sum

cout2 cin

cout1 sum

(2) (1)(4) (2)(8) (4)(16) (8)

si,3ci,3

ci[13]

si,2ci,2 si,1ci,1 si,0ci,0

ci-1[13]

12

� 4-bit binary 4:2 compressor

in last compression stage

Page 13: A Parallel Decimal Multiplier Using Hybrid Binary Coded ...arith23.gforge.inria.fr/slides/Cui-Liu-Dong-Lombardi.pdf · A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal

the Proposed PPR Tree

3:2 3:2HA

3:23:2

ui,3ui,2ui,1ui-1,1 ui,0ui,0ui-1,3ui-1,2

3:2

3:23:2

C[4] C[0] C[5] C[6] C[7]C[1] C[2] C[3]

C[8]C[9]C[10] C[11]C[12]C[13]

4 4 4 4

ui,31 ui,2

1ui,1

1ui,0

1vi,1

1vi,0

1

vi,0vi,0vi,1vi-1,1

zi,11

zi,01

zi,0zi,0zi,1zi-1,1

BCD-4221 Sum

Correction Block

F

8-bit counter

3-bit counter

BCD-4221 8-bit and

3-bit counter correction

� The 8-bit BCD-4221 counter is faster

than a binary counter (only two 3:2 CSA

delay).

� 3-bit counters are used to generate a

BCD-4221 decimal correction digit by 3:2

3:2

x2

3:2

3:2

x2

x2

42214221

x2

x2

4 4

6:2 Decimal

PPR Tree Block

Ai(4221) Bi(4*4221)

Hi(4221) Si(4221)

4 4x2 x1

F

F

F

6:2 decimal PPR tree block

13

BCD-4221 decimal correction digit by

using only one 3:2 compressor.

� To balance the paths in the decimal 6:2

PPR tree and reduce the critical path.

Page 14: A Parallel Decimal Multiplier Using Hybrid Binary Coded ...arith23.gforge.inria.fr/slides/Cui-Liu-Dong-Lombardi.pdf · A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal

Using Hybrid (Multiple) BCD Codes

4*4221

4221

BCD-4221

PPR Tree

BCD-

8421

PPG

XS-3

2*4221

4221

ODDSBCD-4221 sum

correction block

Binary

PPR Tree

Adder

set

excess-6

BCD-8421

Decimal

Adder

BCD-

8421

14

Page 15: A Parallel Decimal Multiplier Using Hybrid Binary Coded ...arith23.gforge.inria.fr/slides/Cui-Liu-Dong-Lombardi.pdf · A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal

Advantages of the proposed PPR tree

1 A BCD-4221 counter is faster than a binary counter (a 8-bit counter has

two 3:2 CSA stages, and 3-bit counter has one 3:2 CSA stage.)

2 A non-fixed size BCD-4221 counter correction block is used to

3

2 A non-fixed size BCD-4221 counter correction block is used to

balance the paths and reduce the critical path delay of decimal 6:2 PPR

tree.

The final two PP rows are generated using a decimal PPR tree

based on BCD-4221 that is easy to be converted to BCD-8421.

15

Page 16: A Parallel Decimal Multiplier Using Hybrid Binary Coded ...arith23.gforge.inria.fr/slides/Cui-Liu-Dong-Lombardi.pdf · A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal

Evaluation

BlockDelay

#FO4

Area

#NAND2

PPG Stage 10.2 14900

25.3 14306

Area and Delay (LE-Based Model) for the Proposed 16×16-digit Multipliers.

PPR Tree 25.3 14306

Adder Setup 3.2 1050

Decimal Adder 11.5 2400

Total50.2

32656

16

Page 17: A Parallel Decimal Multiplier Using Hybrid Binary Coded ...arith23.gforge.inria.fr/slides/Cui-Liu-Dong-Lombardi.pdf · A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal

Evaluation

DesignDelay

#FO4

Area

#NAND2

Non-Redundant [13] 58.3 35750

Area and Delay (LE-Based) Comparision for Different BCD Multiplier Designs.

Redundant [7] 51.4 30600

Proposed

50.2

Compared with [13] -13.89%

Compared with [7] -2.23%

32656

Compared with [13] -8.65%

Compared with [7] +6.05%[7] A. Vazquez, E. Antelo, and J. Bruguera, “Fast Radix-10 Multiplication Using Redundant BCD codes”, IEEE Transactions on

Computers, vol. 63, no. 8, pp. 1902–1914, Aug. 2014.

[13] A. Vazquez, E. Antelo and P. Montuschi, “Improved Design of High-Performance Parallel Decimal Multipliers”, IEEE Transactions

on Computers, vol. 59, no. 5, pp. 679–693, May 2010.

17

Page 18: A Parallel Decimal Multiplier Using Hybrid Binary Coded ...arith23.gforge.inria.fr/slides/Cui-Liu-Dong-Lombardi.pdf · A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal

Evaluation

DesignDelay(ns)

Ratio Area(μm2) Ratio

Proposed 3.21 1 43053.5 1

Area and Delay Comparison Using NanGate 45nm open cell library

� The proposed design reduces the delay by 12.30% and the area by 10.9%

compared with [13].

� [7] reduces the delay by 10.75% and the area by 11.1% compared with [13] (no

direct comparison as some parts of PPR circuit of [7] are not provided in detail).

Non-Redundant

[13]3.66 1.14 48326.1 1.12

18

Page 19: A Parallel Decimal Multiplier Using Hybrid Binary Coded ...arith23.gforge.inria.fr/slides/Cui-Liu-Dong-Lombardi.pdf · A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal

Conclusion

� Design of parallel decimal multiplier is studied

� A parallel decimal multiplier based on a new PPR tree is proposed by using:

� A BCD-4221 sum correction block with non-fixed size counters,

� A decimal PPR tree based on BCD-4221 decimal digit 3:2 compressor.

19

� A decimal PPR tree based on BCD-4221 decimal digit 3:2 compressor.

� The proposed parallel decimal multiplier is faster than previous best designs.

Page 20: A Parallel Decimal Multiplier Using Hybrid Binary Coded ...arith23.gforge.inria.fr/slides/Cui-Liu-Dong-Lombardi.pdf · A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal

Thank you!Thank you!

Questions?