An IEEE 754-2008 Decimal Parallel and Pipelined FPGA Floating-Point Multiplier

Malte Baesler, Sven-Ole Voigt, Thomas Teufel

Institute for Reliable Computing, Hamburg University of Technology

September 1st, 2010


Agenda

1. Introduction
   a) Why Decimal Floating-Point Arithmetic?
   b) What are the Requirements on the Decimal Multiplier?
2. Decimal Fixed-Point Multiplier
3. Decimal Floating-Point Multiplier
4. Post Place & Route Results
   a) Fixed-Point Multiplier
   b) Floating-Point Multiplier

Introduction

Why decimal floating-point arithmetic?

● avoid conversion errors
● human-centric applications
● required for commercial applications, e.g. interest calculation

IEEE Standard 754-2008 for Floating-Point Arithmetic

● published in August 2008
● replaces IEEE 754-1985 and IEEE 854-1987
● binary and decimal floating-point arithmetic
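The conversion errors named in the first bullet can be made visible with a short sketch. It uses Python's standard `decimal` module as a software stand-in for decimal arithmetic; it is not the hardware multiplier presented in these slides, and the values 0.1, 0.2 and 0.3 are chosen only for illustration.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly, so the
# decimal-to-binary conversion error becomes visible in the comparison.
binary_sum = 0.1 + 0.2
binary_ok = (binary_sum == 0.3)

# Decimal floating point stores the decimal digits themselves, so the same
# computation is exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_ok = (decimal_sum == Decimal("0.3"))

print(binary_ok, decimal_ok)
```

The binary comparison fails while the decimal one holds, which is the behaviour that matters for the commercial applications listed above.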

Floating-Point Arithmetic

IEEE 754-2008 Floating-Point Arithmetic

decimal64 data format

● radix $b = 10$
● significand precision $p = 16$
● exponent range $q_{\min} = -398$, $q_{\max} = 369$
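As a cross-check of these parameters, the following sketch configures Python's standard `decimal` module for decimal64. It assumes the usual mapping onto that module: `Emin` and `Emax` there refer to the exponent of the most significant digit (-383 and 384 for decimal64), whereas $q_{\min}$ and $q_{\max}$ above refer to the exponent of the integer significand.

```python
from decimal import Context

P = 16  # significand precision p in digits

# decimal64 expressed in the decimal module's convention
# (exponent of the leading digit).
ctx = Context(prec=P, Emin=-383, Emax=384)

# Exponent range of the integer significand, as given on the slide.
q_min = ctx.Etiny()  # Emin - P + 1
q_max = ctx.Etop()   # Emax - P + 1

print(q_min, q_max)
```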


Requirements on the multiplier

● fast
● low resource usage
● IEEE 754-2008 compliant
● pipelined due to re-use in accurate scalar product
  → fully combinational
● optimized for FPGA architecture (Virtex-5)
  – internal fast carry chain
  – DSP48E slices

Decimal Fixed-Point Multiplier

Fixed-Point Multiplier

How does multiplication work?

School method:

● partial product generation
● accumulation of partial products

$1234 \cdot 5678 = 5000 \cdot 1234 + 600 \cdot 1234 + 70 \cdot 1234 + 8 \cdot 1234$
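The two steps of the school method can be sketched in a few lines of Python. This is a plain software illustration of the slide's example, with the hypothetical helper name `school_multiply`; it says nothing about how the hardware generates or accumulates the partial products.

```python
def school_multiply(a, b):
    """Multiply a by b digit by digit, as done with pencil and paper."""
    partial_products = []
    position = 0
    while b > 0:
        digit = b % 10                       # current multiplier digit
        partial_products.append(digit * a * 10**position)
        b //= 10
        position += 1
    # Accumulation step: add up all partial products.
    return partial_products, sum(partial_products)


partials, product = school_multiply(1234, 5678)
print(partials, product)
```

For the slide's operands the partial products are 8·1234, 70·1234, 600·1234 and 5000·1234, listed from the least significant multiplier digit upwards.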

Fixed-Point Multiplier

● based on concepts of A. Vazquez, E. Antelo, P. Montuschi ¹
● fully combinational
● BCD recoding schemes
● fast partial product generation
● fast BCD-4221 carry save adder reduction tree

¹ "A new family of high-performance parallel decimal multipliers", 18th IEEE Symposium on Computer Arithmetic, June 2007

Fixed-Point Multiplier

[Figure: block diagram of the fixed-point multiplier with the units DRec, PPGen, CSAT and CPA. Inputs A and B (BCD-8421, p digits each); partial products P0, P1, ..., Pp+1 (BCD-4221); intermediate words S_s and S_w (BCD-4221, 2p digits each); output S (BCD-8421, 2p digits).]

CSAT: Carry Save Adder Tree
CPA: Carry Propagation Adder
PPGen: Partial Product Generator
DRec: Decimal Recoding Unit


Decimal Recoding

● transforms the multiplier's digit set $\{0, \dots, 9\}$ into $\{-5, \dots, 5\}$
● reduces the number of multiplicand multiples to $A \times 1, A \times 2, A \times 3, A \times 4, A \times 5$
● very fast operation, no ripple carry
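One common way to obtain such a digit set without a ripple carry is sketched below. The rule used here (a digit of 5 or more emits a carry into the next position and is reduced by 10) is an assumption made for illustration; the slides do not spell out the recoding rule of the DRec unit, and `recode_signed_digits` is a hypothetical name.

```python
def recode_signed_digits(digits):
    """Recode decimal digits (least significant first) into {-5, ..., 5}.

    The carry out of a position depends only on that position's own digit,
    so no carry ripples through the word. The result has one extra digit.
    """
    recoded = []
    carry_in = 0
    for d in digits:
        carry_out = 1 if d >= 5 else 0       # decided by this digit alone
        recoded.append(d + carry_in - 10 * carry_out)
        carry_in = carry_out
    recoded.append(carry_in)                 # additional leading digit
    return recoded


def value(signed_digits):
    """Numeric value of a signed-digit word, least significant digit first."""
    return sum(d * 10**i for i, d in enumerate(signed_digits))


recoded = recode_signed_digits([8, 7, 6, 5])   # the multiplier 5678
print(recoded, value(recoded))
```

Because every recoded digit has magnitude at most 5, only the multiples A×1 to A×5 and their negatives are needed, and a p-digit multiplier yields p+1 recoded digits.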


Partial Product Generator

● calculates the multiples $A \times 1, A \times 2, A \times 3, A \times 4, A \times 5$
  – exploits correlation between shift operation and constant value multiplication
● $X_{5421} \ll 1 = (X \cdot 2)_{8421}$ and $X_{8421} \ll 3 = (X \cdot 5)_{5421}$
  – BCD recoding is fast
  – fixed-value shift operation is for free
  – only $A \times 3$ requires one carry propagate adder
● generates partial products $P_0, \dots, P_{p+1}$ by selection of $A \times 1, \dots, A \times 5$
● 10's complement for $B_k < 0$: $-X_n \dots X_0 = \overline{X_n} \dots \overline{X_0} + 1$
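The two shift identities can be checked in software. The sketch below packs each decimal digit into four bits, using the weights 8-4-2-1 or 5-4-2-1, and then shifts the whole bit vector. The helper names `encode`, `decode`, `times2` and `times5` are hypothetical, and the choice of the 5421 code word for each digit (top bit set for digits 5 to 9) is an assumption.

```python
WEIGHTS = {"8421": (8, 4, 2, 1), "5421": (5, 4, 2, 1)}


def encode(x, n_digits, code):
    """Pack the decimal digits of x into one integer, 4 bits per digit."""
    word = 0
    for i in range(n_digits):
        d = (x // 10**i) % 10
        # 5421: top bit has weight 5, the lower three bits hold d mod 5.
        nibble = d if code == "8421" else ((d // 5) << 3) | (d % 5)
        word |= nibble << (4 * i)
    return word


def decode(word, n_digits, code):
    """Interpret each 4-bit group of word with the weights of the code."""
    value = 0
    for i in range(n_digits):
        nibble = (word >> (4 * i)) & 0xF
        digit = sum(w for bit, w in zip((8, 4, 2, 1), WEIGHTS[code])
                    if nibble & bit)
        value += digit * 10**i
    return value


def times2(x, n_digits):
    # X in BCD-5421, shifted left by one bit, read as BCD-8421.
    return decode(encode(x, n_digits, "5421") << 1, n_digits + 1, "8421")


def times5(x, n_digits):
    # X in BCD-8421, shifted left by three bits, read as BCD-5421.
    return decode(encode(x, n_digits, "8421") << 3, n_digits + 1, "5421")


print(times2(1234, 4), times5(1234, 4))
```

Both shifts are fixed, so in hardware they amount to rewiring; only the recoding between the BCD codes costs logic, which matches the bullets above.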

BCD-4221 Carry Save Adder Tree

[Figure, three builds: (1) carry save adder tree summing up the p+1 partial products P1, P2, P3, ..., Pp+1; (2) CSA tree with respect to decimal recoding, with the additional inputs C1, C2, ..., Cp and a sign extension for the partial products; (3) improved CSA tree with respect to decimal recoding, with improved sign extension.]

Improved Sign Extension

● adding several words composed of leading nines and following zeros always yields a word composed of 0, 8, and 9. For example:

  999999990000 + 999900000000 + 990000000000 = x989899990000

● position of 0, 8, and 9 can be calculated very fast by means of the FPGA's fast carry chain

$$X_k^{NegDC} = \begin{cases} 9 & \text{for } c_k^{in} = 0 \wedge sign_k = 1 \\ 8 & \text{for } c_k^{in} = 1 \wedge sign_k = 1 \\ 0 & \text{else} \end{cases}$$

$$c_k^{out} = c_{k+1}^{in} = \begin{cases} 1 & \text{for } sign_k = 1 \\ c_k^{in} & \text{else} \end{cases}$$
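The property in the first bullet can be checked by brute force, and the example sum suggests why a single-bit chain is enough to predict the digits. The sketch below assumes that all words start their run of nines at distinct digit positions. The case split in `predict_digits` is read off the example sum and is not a transcription of the definition of $X_k^{NegDC}$ above; all function names are hypothetical.

```python
def sign_extension_word(start, width):
    """Word with nines from digit `start` up to digit `width - 1`."""
    return sum(9 * 10**k for k in range(start, width))


def add_words(starts, width):
    """Add one word per start position; drop the overflow digit (x)."""
    total = sum(sign_extension_word(s, width) for s in starts)
    return total % 10**width


def predict_digits(starts, width):
    """Predict the digits of the sum, least significant digit first.

    `seen` plays the role of a one-bit carry chain: it becomes 1 at the
    first start position and stays 1 for all higher positions.
    """
    digits = []
    seen = 0
    for k in range(width):
        begins = 1 if k in starts else 0
        if begins:
            digits.append(8 if seen else 9)
        else:
            digits.append(9 if seen else 0)
        seen = seen or begins
    return digits


example = add_words([4, 8, 10], 12)
print(example, predict_digits([4, 8, 10], 12))
```

With start positions 4, 8 and 10 in a 12-digit word this reproduces the sum shown on the slide.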


Decimal Floating-Point Multiplier

● additional units for rounding, exponent computation and data format encoding/decoding
● based on M. Erle, B. Hickmann, M. Schulte ²
● early estimation of shift left amount
● fully IEEE 754-2008 compliant
● support for gradual underflow and all rounding modes
● adapted to FPGA technology

² "Decimal Floating-Point Multiplication", IEEE Transactions on Computers, vol. 58, no. 7, July 2009

[Figure: block diagram of the decimal floating-point multiplier with inputs X and Y and outputs X•Y and exception signals. Blocks: Densely Packed Decimal (DPD) Decoder, Leading Zeros Count / Shift Left Amount Computation, Decimal Fixed-Point Multiplier, Left Shift Register, Carry Propagate Adder, Overflow / Underflow Correction, Rounding Unit, Round-Up Detection, Exception Unit, Exponent Computation, DPD Encoder.]

Example inputs:

X = 0x03C80000534B9C1E
Y = 0x0250000277CB0D10


X = +0000001234567890 EXP-156
Y = +0000009876543210 EXP-250
X•Y = +12193263111263526900 EXP-406

Z = significand(X•Y)
Z  = 00000000000012193263111263526900
Zs = 66888846846688648888664609006600
Zc = 33111153153323544414446654520300

LZ(X) = 6, LZ(Y) = 6, SLA = min(6+6, p) = 12
Z  = 1219326311126352.690000000000000
Zs = 8864888866460900.660000000000000
Zc = 2354441444665452.030000000000000

Z' = 1219326311126352, G = 6, R = 9, sb = '0'
exponent = -406 + p - SLA = -402

Z'' = 0000121932631112, G = 6, R = 3, sb = '1'
exponent = -398

round up → Z''' = 0000121932631113 EXP-398
Z = 0x000000285BCCC493
exception signals: invalid, inexact, overflow, underflow
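The steps of this example can be retraced in software. The sketch below follows the slide's sequence (leading-zero count, shift left amount, exponent, underflow correction, rounding) with plain integers and compares the outcome with Python's `decimal` module configured for decimal64. It assumes round-half-even, handles neither overflow nor special values, and the function name `multiply_decimal64` is hypothetical. The comparison with `decimal` is made for this example only.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

P = 16        # decimal64 precision in digits
Q_MIN = -398  # smallest exponent of the decimal64 format


def multiply_decimal64(cx, ex, cy, ey):
    """Multiply cx*10**ex by cy*10**ey; return (coefficient, exponent)."""
    product = cx * cy                      # up to 2*P digits
    lz_x = P - len(str(cx))                # leading zeros of X
    lz_y = P - len(str(cy))                # leading zeros of Y
    sla = min(lz_x + lz_y, P)              # shift left amount
    word = product * 10**sla               # product placed in a 2*P digit word
    exponent = ex + ey + P - sla           # exponent of the upper P digits
    drop = P                               # digits below the result
    if exponent < Q_MIN:                   # gradual underflow: shift right
        drop += Q_MIN - exponent
        exponent = Q_MIN
    coefficient, remainder = divmod(word, 10**drop)
    half = 5 * 10**(drop - 1)
    if remainder > half or (remainder == half and coefficient % 2 == 1):
        coefficient += 1                   # round up
    return coefficient, exponent


coefficient, exponent = multiply_decimal64(1234567890, -156, 9876543210, -250)

ctx = Context(prec=P, Emin=-383, Emax=384, rounding=ROUND_HALF_EVEN)
reference = ctx.multiply(Decimal("1234567890E-156"), Decimal("9876543210E-250"))

print(coefficient, exponent, reference)
```

Both paths arrive at the significand 121932631113 with exponent -398, i.e. Z''' from the slide.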

                                 type1                     type2                     type3
fixed-point multiplier output    redundant (delayed CPA)   redundant (delayed CPA)   non-redundant
CPA length (digits)              p+2 = 18                  p+2 = 18                  2·p = 32
shift register                   multiplier based          multiplexer based         multiplexer based

[Figure: two datapaths. Redundant output (delayed CPA): decimal fixed-point multiplier with outputs Ps and Pc, shift registers producing Qsu, Qsl, Qcu and Qcl, CPA (p+2) and CPA (p-2), an OR for the sticky bit, and the outputs product, R, G. Non-redundant output: decimal fixed-point multiplier with outputs Ps and Pc, CPA (2·p), shift register, an OR for the sticky bit, and the outputs product, R, G.]


Shifting through multiplication: $X \ll n \equiv X \cdot 2^n$

● requires two DSP48Es per 32-bit shift
● saves LUTs

[Figure: 32-bit shift built from two DSP48E slices. X(31:16) and X(15:0) are each multiplied (MUL) by the shift factor 2^k, and an adder (ADD) combines the results into Y(31:16) and Y(15:0).]
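The idea of splitting a 32-bit shift over two multipliers can be sketched as follows. The code models only the arithmetic (two 16-bit halves, each multiplied by $2^n$, then added); operand widths, pipelining and other details of the DSP48E slice are not modelled, and the function name is hypothetical.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def shift_left_by_multiplication(x, n):
    """Shift the 32-bit word x left by n (0..31) using two multiplications."""
    factor = 1 << n                            # 2**n encodes the shift amount
    low = (x & MASK16) * factor                # first multiplier:  X(15:0)
    high = ((x >> 16) & MASK16) * factor       # second multiplier: X(31:16)
    # Add the two products with the upper half aligned 16 bits higher and
    # keep the 32 result bits Y(31:0).
    return (low + (high << 16)) & MASK32


print(hex(shift_left_by_multiplication(0x12345678, 8)))
```

Since the shift amount enters as a multiplier operand, no multiplexer stages are needed, which is where the LUT savings of type1 come from.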

Post Place & Route Results

Decimal Fixed-Point Multiplier with CPA output

● Xilinx Virtex-5, speed grade -2
● up to 13 pipeline registers, configurable via VHDL generics
● 5350 – 6500 LUTs, 0 – 4900 FFs
● 5500 – 7600 combined LUTs and FFs (given as 5350 – 7600 in a second build of this slide)


Decimal Floating-Point Multiplier

                 Type1                  Type2                  Type3
                 mul-based shifting,    mux-based shifting,    mux-based shifting,
                 delayed CPA            delayed CPA            no delayed CPA

#LUTs            6300 – 8400            7900 – 9400            7500 – 9400
#FFs             0 – 4100               0 – 4500               0 – 4400
#(LUTs + FFs)    6500 – 8400            8300 – 9300            7600 – 9600
#DSP48E          17                     0                      0

● approx. 70% of the LUTs are used by the fixed-point multiplier (for Type2 and Type3)
● on a medium-sized Virtex-5 (XC5VLX110T), 8000 – 9000 LUTs correspond to approx. 11.5% – 13% of the device
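The utilization figure can be checked with a short calculation. The sketch below assumes a total of 69,120 LUTs for the XC5VLX110T, a value taken from the Xilinx Virtex-5 family documentation rather than from the slides. The function name `lut_utilization` is illustrative.

```python
# Assumption (not stated on the slide): the XC5VLX110T provides 69,120 6-input LUTs.
XC5VLX110T_LUTS = 69_120


def lut_utilization(luts_used: int, luts_available: int = XC5VLX110T_LUTS) -> float:
    """Return the share of device LUTs occupied, in percent."""
    return 100.0 * luts_used / luts_available


# LUT range quoted on the slide for the floating-point multiplier.
for used in (8000, 9000):
    print(f"{used} LUTs -> {lut_utilization(used):.1f}% of the device")
```

Under that assumption the results land at roughly 11.6% and 13.0%, in line with the approximate range given on the slide.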


Comparison to binary floating-point multiplier

● 64-bit binary floating-point multiplier generated with CoreGen
● no DSP48E
● Type2 decimal vs. CoreGen binary multiplier

[Figure: number of LUTs (axis 0 to 10000) vs. number of pipeline registers (0 to 9), decimal and binary]

[Figure: max. frequency in MHz (axis 0 to 400) vs. number of pipeline registers (0 to 9), decimal and binary]

● decimal multiplier: 3.2 – 3.5 times more LUTs
● binary multiplier: 1.6 – 2.2 times faster


Summary

● decimal fixed-point multiplier
  – parallel, fully combinational
  – configurable number of pipeline stages
● decimal floating-point multiplier
  – configurable number of pipeline stages
  – three different implementations
  – tradeoff: area vs. speed
● future work: fully IEEE 754-2008 compliant co-processor


Thank you for your attention!!!