Download - A Parallel Decimal Multiplier Using Hybrid Binary Coded ...arith23.gforge.inria.fr/slides/Cui-Liu-Dong-Lombardi.pdf · A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal

Page 1

A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal (BCD) Codes

Xiaoping Cui, Weiqiang Liu* and Wenwen Dong

College of Electronic and Information Engineering,

Nanjing University of Aeronautics and Astronautics, Nanjing, China

Fabrizio Lombardi

Department of Electrical and Computer Engineering,

Northeastern University, Boston, USA

Page 2

Outline

Motivation

Review of BCD Representations and Decimal Multiplier

The Proposed Partial Product Tree

Evaluation and Comparison

Conclusions

Page 3

Motivation

Why Is Decimal Arithmetic Needed?

• Binary arithmetic introduces conversion and rounding errors.

• Decimal arithmetic is in high demand in many applications (financial, commercial and so on) that cannot tolerate errors.

• A decimal specification has been added to the revised IEEE 754-2008 standard.

• High-performance decimal arithmetic circuits are required.
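The conversion and rounding errors mentioned above can be seen in a minimal software sketch. It uses Python's standard `decimal` module purely as an illustration; it is not part of the slides and says nothing about the hardware design.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so the sum is not exactly 0.3.
binary_sum = 0.1 + 0.2
binary_is_exact = (binary_sum == 0.3)

# Decimal arithmetic on the decimal strings keeps the values exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_is_exact = (decimal_sum == Decimal("0.3"))

# Converting the binary float 0.1 to decimal exposes the conversion error.
conversion_error = Decimal(0.1) - Decimal("0.1")

print(binary_is_exact, decimal_is_exact, conversion_error != 0)
```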

Page 4

Introduction

• Partial Product Generation
  Signed-Digit (SD) Radix-10 recoding
  Redundant BCD excess-3 (XS-3)
  Overloaded decimal digit set (ODDS) code
  Double BCD recoding

• Partial Product Compression
  Decimal 3:2 CSA
  Binary compressor

• Final Decimal Adder
  Parallel prefix/carry select adder

[Figure: block diagram of a parallel decimal multiplier, after [7]. The operands X(BCD) and Y(BCD) are 4d bits each. The PPG block generates the multiples 5X, 4X, 3X, 2X, 1X (XS-3 digits in [-3,12]; 4(d+1) bits each, 4d bits for 1X). A radix-10 recoder turns Y into 6d selection bits (6 bits per digit, Yb0 … Ybd-1) that drive the selection of multiples (MUX-5). The partial products PP[0] … PP[d] (ODDS digits in [0,15]) enter a (d+1):2 PPR tree. Its outputs A (BCD XS-6) and B (BCD-8421), 8d bits each, feed a 2d-digit BCD adder that produces P(BCD), 8d bits.]

[7] A. Vazquez, E. Antelo, and J. Bruguera, “Fast Radix-10 Multiplication Using Redundant BCD codes”, IEEE Transactions on Computers, vol. 63, no. 8, pp. 1902–1914, Aug. 2014.

Page 5

Partial Product Generation

• SD Radix-10 recoding scheme

$$
Yb_i =
\begin{cases}
Y_i, & Y_i < 5 \;\&\&\; Y_{i-1} < 5 \\
Y_i + 1, & Y_i < 5 \;\&\&\; Y_{i-1} \ge 5 \\
-(10 - Y_i), & Y_i \ge 5 \;\&\&\; Y_{i-1} < 5 \\
-(10 - Y_i) + 1, & Y_i \ge 5 \;\&\&\; Y_{i-1} \ge 5
\end{cases}
$$

SD Radix-10 Recoding

                         (ysi-1 = 0)                    (ysi-1 = 1)
yi,3 yi,2 yi,1 yi,0      y5i y4i y3i y2i y1i   ysi      y5i y4i y3i y2i y1i   ysi
0000                     00000                 0        00001                 0
0001                     00001                 0        00010                 0
0010                     00010                 0        00100                 0
0011                     00100                 0        01000                 0
0100                     01000                 0        10000                 0
0101                     10000                 1        01000                 1
0110                     01000                 1        00100                 1
0111                     00100                 1        00010                 1
1000                     00010                 1        00001                 1
1001                     00001                 1        00000                 1
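The recoding rule on this slide can be sketched in software as follows. This is a minimal model, assuming that the sign signal ys_i is 1 exactly when Y_i ≥ 5 and that an extra most-significant digit equal to the last sign signal is appended (which would match the d+1 partial products PP[0] … PP[d]). The function names `sd_radix10_recode` and `select_signals` are hypothetical.

```python
def sd_radix10_recode(y_digits):
    """Recode BCD digits (least-significant first) into signed digits in [-5, 5].

    Returns len(y_digits) + 1 digits; the extra top digit is the last sign signal.
    """
    recoded = []
    prev_sign = 0  # assumed ys_{-1} = 0
    for y in y_digits:
        sign = 1 if y >= 5 else 0                    # ys_i depends on Y_i only
        recoded.append(y + prev_sign - 10 * sign)    # Yb_i
        prev_sign = sign
    recoded.append(prev_sign)
    return recoded


def select_signals(y, prev_sign):
    """Return (one-hot 'y5 y4 y3 y2 y1' string, ys_i) for one BCD digit."""
    sign = 1 if y >= 5 else 0
    magnitude = abs(y + prev_sign - 10 * sign)
    one_hot = format(1 << (magnitude - 1), "05b") if magnitude else "00000"
    return one_hot, sign


def signed_value(digits):
    """Value of a least-significant-first signed-digit radix-10 number."""
    return sum(d * 10**i for i, d in enumerate(digits))


recoded_example = sd_radix10_recode([5, 9, 7, 3])  # the decimal number 3795
print(recoded_example, signed_value(recoded_example))
```

Because each sign signal depends only on its own digit, no carry ripples along the multiplier operand in this model.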

Page 6

Partial Product Generation

Redundant BCD Codes

[Table of redundant BCD codes: content not recovered in the transcript.]

Page 7

Partial Product Generation

XS-3 Recoding (Redundant ODDS [0, 15])

[Figure: generation and selection of the multiplicand multiples. X(BCD), 4d bits, has digits X0 … Xi-1, Xi … Xd-1 in the digit set [0,9].
STEP 1: Digit mappings. Each digit is mapped to N*Xi+3 in [3, 9N+3] and split into a digit Di and a carry-out Ti.
STEP 2: Carry assimilation. Di and Ti-1 are added to give the digit NXi of the multiple NX in [-3,12].
The multiples 5X, 4X, 3X, 2X, 1X have digits in XS-3 [-3,12] (4(d+1) bits each, 4d bits for 1X). For each multiplier digit Yk(BCD), the SD radix-10 digit encoding produces Y5k, Y4k, Y3k, Y2k, Y1k and Ysk, which drive a MUX-5 that selects among 5Xi, 4Xi, 3Xi, 2Xi, 1Xi. The selected digits in XS-3 [-3,12] are then treated as digits in ODDS [0,15].]

Convert the XS-3 digits to ODDS by adding a pre-computed correction term:

fc(16) = 10^32 + 07407407407407417037037037037037

Advantage of XS-3 codes: difficult multiples (such as 3X) can be obtained in a carry-free manner.
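The two steps on this slide can be illustrated with a small software model. It is a sketch under stated assumptions: the split of N*Xi+3 into (Ti, Di) is done with a plain divide-by-10, which may differ from the mapping used in the actual design, and the top carry is kept as a separate unbiased digit. The names `xs3_multiple` and `digits_value` are hypothetical. The sketch also shows why a constant correction is enough when the stored 4-bit digits are reinterpreted as ODDS digits; it does not reproduce how fc(16) itself is built.

```python
def xs3_multiple(x_digits, n):
    """Digits of n*X stored with an excess of 3 (values 0..15), least-significant first.

    Returns (stored_digits, top_carry).
    """
    stored = []
    carry_in = 0  # T_{i-1}
    for x in x_digits:
        t, d = divmod(n * x + 3, 10)   # Step 1: depends on digit x only
        stored.append(d + carry_in)    # Step 2: one small addition, no ripple
        carry_in = t
    return stored, carry_in


def digits_value(stored, top, excess):
    """Value of the digit vector when each stored digit carries the given excess."""
    body = sum((s - excess) * 10**i for i, s in enumerate(stored))
    return body + top * 10**len(stored)


x = 9876543210987654
x_digits = [int(ch) for ch in reversed(str(x))]
ones = (10**len(x_digits) - 1) // 9    # 111...1: one unit of excess per digit

results = {}
for n in range(1, 6):
    stored, top = xs3_multiple(x_digits, n)
    as_xs3 = digits_value(stored, top, excess=3)    # digits read as XS-3
    as_odds = digits_value(stored, top, excess=0)   # same bits read as ODDS
    results[n] = (stored, as_xs3, as_odds)
    print(n, as_xs3 == n * x, as_odds - as_xs3 == 3 * ones)
```

In this model, reading the same bits as ODDS digits overstates each multiple by the fixed amount 3 × 11…1, so the excess of all partial products can be removed with one pre-computed constant.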

Page 8

Decimal PP Compression Using ODDS

Partial Product Compression

The (d+1):2 PP Reduction (PPR):

(1) A regular binary CSA tree.

(2) A binary counter is used to count carries generated between the digit columns in the binary CSA tree.

(3) The ODDS partial products in (1) and (2) are added by the binary CSA tree and the decimal digit 3:2 compressor.

[Figure: multiplier block diagram (PPG block, radix-10 recoder, MUX-5, (d+1):2 PPR tree on ODDS digits in [0,15], 2d-digit BCD adder), repeated from Page 4.]
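The role of the carry counter in step (2) can be illustrated with a column-wise model. This is a sketch, assuming the usual reasoning that a carry out of a 4-bit digit column has binary weight 16 but is worth only 10 in the next decimal digit, so each counted carry leaves a correction of 6 in its own column. The function names are hypothetical and the model ignores the actual CSA tree structure.

```python
def add_odds_rows(rows):
    """Add rows of ODDS digits (0..15, least-significant digit first).

    Each digit column is summed in binary. Returns the 4-bit column results,
    the per-column corrections (6 per counted carry) and the final carry.
    """
    width = len(rows[0])
    low_digits, corrections = [], []
    carry_in = 0
    for i in range(width):
        column_total = sum(row[i] for row in rows) + carry_in
        carry_out, low = divmod(column_total, 16)   # count carries leaving the column
        low_digits.append(low)
        corrections.append(6 * carry_out)           # 16 = 10 + 6
        carry_in = carry_out
    return low_digits, corrections, carry_in


def decimal_value(digits, top=0):
    """Decimal value of least-significant-first digits plus an optional top carry."""
    return sum(d * 10**i for i, d in enumerate(digits)) + top * 10**len(digits)


rows = [[15, 12, 7], [9, 15, 3], [14, 0, 11]]
low, corr, top = add_odds_rows(rows)
expected = sum(decimal_value(r) for r in rows)
print(decimal_value(low, top) + decimal_value(corr), expected)
```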

Page 9

Decimal PP Compression

Partial Product (PP) Compression

• Decimal 3:2 CSA

Based on BCD-4221/5211

[Figure: decimal 3:2 CSA with inputs ai,j, bi,j, ci,j and outputs si,j, hi,j.]
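A bit-level sketch can show why a 4221-weighted code suits a 3:2 carry-save step. Every 4-bit pattern is a valid BCD-4221 digit (the weights sum to 9), so plain full adders applied bit by bit always yield valid sum and carry digits. The code below is only an illustration with hypothetical names; the ×2 applied to the carry digit, which the slide title suggests involves BCD-5211, is not modeled.

```python
from itertools import product

WEIGHTS_4221 = (4, 2, 2, 1)


def value_4221(bits):
    """Decimal value of a 4-bit tuple under the weights 4, 2, 2, 1."""
    return sum(w * b for w, b in zip(WEIGHTS_4221, bits))


def csa_3to2(a, b, c):
    """Bitwise full adders: returns (sum bits s, carry bits h)."""
    s = tuple(x ^ y ^ z for x, y, z in zip(a, b, c))
    h = tuple((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))
    return s, h


# Exhaustive check over all input patterns: a + b + c = s + 2*h,
# with s and h both staying inside the digit range 0..9.
all_patterns = list(product((0, 1), repeat=4))
identity_holds = True
for a, b, c in product(all_patterns, repeat=3):
    s, h = csa_3to2(a, b, c)
    total = value_4221(a) + value_4221(b) + value_4221(c)
    if total != value_4221(s) + 2 * value_4221(h):
        identity_holds = False
    if not (0 <= value_4221(s) <= 9 and 0 <= value_4221(h) <= 9):
        identity_holds = False

print(identity_holds)
```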

Page 10

Proposed Design

[Figure: operands A and B pass through partial product generation, partial product compression and the final decimal adder to produce P. The partial product compression stage is marked "New Design of PPR Tree Block".]

Page 11

Proposed Design: A New PPR Tree

Proposed PPR (reduction) tree:

• (d+1):2 Binary CSA Tree (ODDS to 4221)

• BCD-4221 Sum Correction Block (Decimal Counter)

• A Decimal Digit 3:2 Compressor (BCD-4221)

Page 12

A PPR Tree for a 16×16-digit Multiplier

[Figure: one digit column i of the PPR tree. The partial product digits PPi[0] … PPi[k] … PPi[16] and the incoming carries Ci-1[0] … Ci-1[13] enter a 17:2 binary PPR tree, which produces Ai and Bi and the outgoing carries Ci[0] … Ci[13]. An 8-bit counter and two 3-bit counters in the BCD-4221 sum correction block count the carries and produce ui,3; ui,2; ui,1, vi,1 and zi,1 (with ×6 and +6 corrections). A 6:2 decimal PPR tree combines the 4221, 2*4221 and 4*4221 weighted rows into Hi(4221) and Si(4221). The last stage is built from 4-bit binary 4:2 compressor cells (inputs p1 p2 p3 p4 and cin; outputs cout1, cout2 and sum).]

• The numbers of PP rows in the 1st, 2nd, 3rd and 4th stages are 17, 9, 6 and 4, respectively.

• A 4-bit binary 4:2 compressor is used in the last compression stage.

Page 13

The Proposed PPR Tree

[Figure, left panel: BCD-4221 8-bit and 3-bit counter correction. The carries C[0] … C[7] feed the 8-bit counter and C[8] … C[13] feed the 3-bit counters, built from 3:2 compressors and a half adder (HA); the outputs are the u, v and z bits of the BCD-4221 sum correction block.]

[Figure, right panel: 6:2 decimal PPR tree block. Inputs Ai(4221), Bi(4*4221) and the correction digits pass through 3:2 compressors and ×2 blocks to produce Hi(4221) and Si(4221).]

• The 8-bit BCD-4221 counter is faster than a binary counter (only two 3:2 CSA delays).

• 3-bit counters are used to generate a BCD-4221 decimal correction digit by using only one 3:2 compressor.

• Counter sizes are chosen to balance the paths in the decimal 6:2 PPR tree and reduce the critical path.

Page 14

Using Hybrid (Multiple) BCD Codes

[Figure: codes used along the data path.]

Inputs: BCD-8421
PPG: XS-3
Binary PPR tree: ODDS
BCD-4221 sum correction block: 4221, 2*4221
BCD-4221 PPR tree: 4221, 4*4221
Adder setup: excess-6 and BCD-8421
Decimal adder output: BCD-8421

Page 15

Advantages of the Proposed PPR Tree

1. A BCD-4221 counter is faster than a binary counter (an 8-bit counter has two 3:2 CSA stages, and a 3-bit counter has one 3:2 CSA stage).

2. A non-fixed size BCD-4221 counter correction block is used to balance the paths and reduce the critical path delay of the decimal 6:2 PPR tree.

3. The final two PP rows are generated using a decimal PPR tree based on BCD-4221, which is easy to convert to BCD-8421.

Page 16

Evaluation

Area and Delay (LE-Based Model) for the Proposed 16×16-digit Multiplier.

Block            Delay (#FO4)   Area (#NAND2)
PPG Stage        10.2           14900
PPR Tree         25.3           14306
Adder Setup      3.2            1050
Decimal Adder    11.5           2400
Total            50.2           32656

Page 17

Evaluation

Area and Delay (LE-Based) Comparison for Different BCD Multiplier Designs.

Design                  Delay (#FO4)   Area (#NAND2)
Non-Redundant [13]      58.3           35750
Redundant [7]           51.4           30600
Proposed                50.2           32656
  Compared with [13]    -13.89%        -8.65%
  Compared with [7]     -2.23%         +6.05%

[7] A. Vazquez, E. Antelo, and J. Bruguera, “Fast Radix-10 Multiplication Using Redundant BCD codes”, IEEE Transactions on Computers, vol. 63, no. 8, pp. 1902–1914, Aug. 2014.

[13] A. Vazquez, E. Antelo, and P. Montuschi, “Improved Design of High-Performance Parallel Decimal Multipliers”, IEEE Transactions on Computers, vol. 59, no. 5, pp. 679–693, May 2010.

Page 18

Evaluation

Area and Delay Comparison Using the NanGate 45nm Open Cell Library

Design                Delay (ns)   Ratio   Area (μm2)   Ratio
Proposed              3.21         1       43053.5      1
Non-Redundant [13]    3.66         1.14    48326.1      1.12

• The proposed design reduces the delay by 12.30% and the area by 10.9% compared with [13].

• [7] reduces the delay by 10.75% and the area by 11.1% compared with [13] (no direct comparison, as some parts of the PPR circuit of [7] are not provided in detail).
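The ratios and percentage reductions quoted on this slide follow directly from the delay and area figures. The short check below is only an arithmetic illustration; the dictionary and function names are hypothetical.

```python
proposed = {"delay_ns": 3.21, "area_um2": 43053.5}
baseline = {"delay_ns": 3.66, "area_um2": 48326.1}  # non-redundant design [13]


def ratio(key):
    """Baseline value relative to the proposed design (proposed = 1)."""
    return baseline[key] / proposed[key]


def reduction_percent(key):
    """Reduction achieved by the proposed design, relative to the baseline."""
    return 100 * (baseline[key] - proposed[key]) / baseline[key]


for key in ("delay_ns", "area_um2"):
    print(key, round(ratio(key), 2), round(reduction_percent(key), 1))
```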

Page 19

Conclusion

• The design of parallel decimal multipliers is studied.

• A parallel decimal multiplier based on a new PPR tree is proposed by using:
  - A BCD-4221 sum correction block with non-fixed size counters,
  - A decimal PPR tree based on a BCD-4221 decimal digit 3:2 compressor.

• The proposed parallel decimal multiplier is faster than previous best designs.

Page 20

Thank you!

Questions?