A Combined Decimal and Binary Floating-point Multiplier

A Combined Decimal and Binary Floating-point Multiplier. Charles Tsen, Sonia González-Navarro, Michael Schulte, Brian Hickmann, Katherine Compton. 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors


Presented by: Mehrnoosh Janbakhsh, Feb 2010.


1

A Combined Decimal and Binary Floating-point Multiplier

Charles Tsen, Sonia González-Navarro, Michael Schulte, Brian Hickmann, Katherine Compton

2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors

2

Presented by: Mehrnoosh Janbakhsh

Feb 2010

3

In this presentation, we describe the first hardware design of a combined binary and decimal floating-point multiplier, based on specifications in the IEEE 754-2008 Floating-point Standard. The multiplier design operates on either 64-bit binary encoded decimal floating-point (DFP) numbers or 64-bit binary floating-point (BFP) numbers.

4

IEEE 754-2008 defines two encodings for DFP numbers:

The decimal encoding, in which the significand is encoded as Densely-Packed Decimal (DPD).

The binary encoding, commonly referred to as Binary Integer Decimal (BID) because the significand is encoded as an unsigned binary integer.

5

The multiplier uses the BID encoding for DFP multiplication and shares hardware between BFP and BID multiplication.

6

Outline

i. Describes the BFP and BID data types

ii. Reviews the BFP and BID multiplication algorithms

iii. Introduces the combined BFP and BID algorithm

iv. The synthesis results

v. Future research

7

DFP AND BFP DATATYPES-Representation

The BFP and DFP number formats use three fields to define a number: a sign, an exponent, and a significand.

The value of a normalized BFP number is:

$(-1)^{S} \cdot C \cdot 2^{E - \mathrm{bias}}$

S: sign

C: significand

E: the biased exponent

bias: positive constant

8

In DFP, S is the sign, and the exponent E is biased by a value bias to allow negative exponents. Unlike BFP, the significand C is an unsigned integer with p decimal digits of precision, and this significand is not normalized.

It can be any value in the range $[0, 10^{p} - 1]$.

9

Example

To clarify the floating-point formats, consider how to represent the value 0.125 in both the BFP and BID systems. In 64-bit BFP, it is represented as $(-1)^{0} \cdot (1.00000\ldots0)_2 \cdot 2^{1020-1023}$, where there are 52 binary zeros after the radix point of the significand.

With the 64-bit BID encoding, 0.125 is represented as $(-1)^{0} \cdot 125 \cdot 10^{395-398}$.

In this case, the significand is represented as the binary integer 0…01111101, where there are 47 zeros before the leftmost 1.
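The field values in this example can be checked with a short Python sketch. The helper names `bfp64_fields` and `bid64_fields` are made up for illustration, and the BID helper only derives the sign, biased exponent, and integer significand from a decimal literal; it does not produce the packed 64-bit BID word.

```python
import struct
from decimal import Decimal

def bfp64_fields(x: float):
    """Split an IEEE 754 binary64 value into sign, biased exponent, fraction."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    biased_exp = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)   # the 52 bits after the radix point
    return sign, biased_exp, fraction

def bid64_fields(text: str, bias: int = 398):
    """Sign, biased exponent and integer significand of a decimal literal."""
    sign, digits, exp = Decimal(text).as_tuple()
    significand = int("".join(map(str, digits)))
    return sign, exp + bias, significand

print(bfp64_fields(0.125))     # (0, 1020, 0)
print(bid64_fields("0.125"))   # (0, 395, 125)
print(format(125, "054b"))     # 47 zeros followed by 1111101
```

A fraction field of 0 corresponds to the significand 1.000…0, since the leading 1 of a normalized BFP number is implicit.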

10

- Rounding Modes

The rounding mode, combined with the sign, whether the closest number is odd or even, and the location of the infinitely precise result on the number line determine the direction of rounding.

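IEEE 754-2008 specifies five rounding modes for floating-point numbers: round ties to even (RTE), round ties to away (RTA), round toward zero (RTZ), round toward negative (RTN), and round toward positive (RTP).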

The RTA rounding mode is required only for DFP, but the other four rounding modes apply to both BFP and DFP.
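As a rough illustration of how mode, sign, and parity combine, the sketch below decides whether a truncated significand should be incremented. The function name and the exact meaning given to `r` and `s` (at least half a unit discarded, and anything discarded beyond that half or below it) are assumptions for this example, not the design's signal definitions.

```python
def should_increment(mode: str, sign: int, lsb: int, r: int, s: int) -> bool:
    """Decide whether to add 1 to the truncated significand.

    sign: 1 for a negative result
    lsb:  1 if the truncated significand is odd
    r:    1 if the discarded part is at least half a unit in the last place
    s:    1 if the discarded part is nonzero and not exactly that half
    """
    if mode == "RTE":   # nearest, ties to even
        return bool(r and (s or lsb))
    if mode == "RTA":   # nearest, ties away from zero
        return bool(r)
    if mode == "RTZ":   # toward zero: always truncate
        return False
    if mode == "RTN":   # toward negative: grow magnitude only when negative
        return bool(sign and (r or s))
    if mode == "RTP":   # toward positive: grow magnitude only when positive
        return bool(not sign and (r or s))
    raise ValueError(f"unknown rounding mode: {mode}")

# An exact tie (r=1, s=0) with an even significand:
print(should_increment("RTE", 0, 0, 1, 0))   # False
print(should_increment("RTA", 0, 0, 1, 0))   # True
```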

11

- Special Values and Exceptions

Invalid, divide by zero, underflow, overflow, and inexact are exceptions.

The special values are infinity (INF), signaling Not-a-Number (sNaN), and quiet Not-a-Number (qNaN). The difference between sNaN and qNaN is that an sNaN causes the invalid exception flag to be raised when it is an operand to any operation.
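The sNaN/qNaN distinction can be observed in software with Python's `decimal` module, which follows the General Decimal Arithmetic specification rather than the BID hardware format, so this is only a behavioural analogy. The helper `multiply_and_report` is a made-up name.

```python
from decimal import Decimal, InvalidOperation, localcontext

def multiply_and_report(a: str, b: str):
    """Multiply two decimal literals; return (result, invalid_flag_raised)."""
    with localcontext() as ctx:
        ctx.traps[InvalidOperation] = False   # record the flag instead of raising
        ctx.clear_flags()
        result = Decimal(a) * Decimal(b)
        return result, bool(ctx.flags[InvalidOperation])

q, q_flag = multiply_and_report("NaN", "2")
s, s_flag = multiply_and_report("sNaN", "2")
print(q.is_qnan(), q_flag)   # True False
print(s.is_qnan(), s_flag)   # True True
```

In both cases the result is a quiet NaN; only the signaling operand sets the invalid flag.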

12

FLOATING-POINT MULTIPLICATION ALGORITHMS

Step 1: Decode inputs A and B to obtain (sign A, EA, CA) and (sign B, EB, CB). Also detect special input operands, such as NaN, zero, and INF.

Step 2: Compute the intermediate product, CIP = CA × CB, with a binary multiplier. In parallel, compute the intermediate exponent, EIP = EA + EB - bias, and the final sign, sign Z = sign A XOR sign B.

Step 3: Examine CIP to determine if rounding is needed. Rounding is needed if CIP exceeds p bits (BFP) or p digits (BID). If so, truncate CIP to obtain CTP, along with the round and sticky indicators r* and s*.

Step 4: Create CZ via a conditional increment of CTP based on r* and s*. If rounding causes a carry out for BID, set CZ to 1,000,000,000,000,000 (base 10, i.e. $10^{15}$) and adjust the final exponent, EZ.

Step 5: Encode the output, based on (sign Z, EZ, CZ).
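Steps 2 to 4 can be sketched in software for the BID case. This is a minimal model, not the hardware algorithm: it assumes decimal64 parameters (p = 16, bias = 398), rounds ties to even only, uses ordinary division to truncate, and ignores special values, overflow, and underflow. The name `bid_multiply` and the tuple layout are made up for the example.

```python
P = 16        # decimal64 precision in digits
BIAS = 398    # decimal64 exponent bias

def bid_multiply(a, b, p=P, bias=BIAS):
    """Multiply two (sign, biased_exponent, significand) triples, rounding RTE."""
    sign_a, e_a, c_a = a
    sign_b, e_b, c_b = b

    # Step 2: integer product, intermediate exponent, final sign.
    c_ip = c_a * c_b
    e_ip = e_a + e_b - bias
    sign_z = sign_a ^ sign_b

    # Step 3: rounding is needed if the product has more than p digits.
    d = max(len(str(c_ip)) - p, 0)            # number of digits to discard
    if d == 0:
        return sign_z, e_ip, c_ip

    c_tp, discarded = divmod(c_ip, 10 ** d)   # truncated product
    half = 5 * 10 ** (d - 1)
    r = discarded >= half                     # round indicator
    s = discarded not in (0, half)            # sticky indicator
    e_z = e_ip + d

    # Step 4: conditional increment (round to nearest, ties to even).
    if r and (s or c_tp % 2 == 1):
        c_tp += 1
        if c_tp == 10 ** p:                   # carry out of p digits
            c_tp = 10 ** (p - 1)
            e_z += 1
    return sign_z, e_z, c_tp

# 0.125 x 0.125 = 15625 x 10^(392-398) = 0.015625, no rounding needed.
print(bid_multiply((0, 395, 125), (0, 395, 125)))   # (0, 392, 15625)
```

The carry-out case of Step 4 occurs, for example, when the product is seventeen 9s: the truncated value of sixteen 9s is incremented to $10^{16}$, which is replaced by $10^{15}$ with the exponent raised by one.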

13

COMBINED MULTIPLIER DESIGN- Operand Decoder and Encoder

The exponent and significand widths differ by only one bit between BID and BFP. Thus, each input is decoded into 70 bits: 1 bit for the sign, 11 bits for the exponent, 54 bits for the significand, and 4 bits that indicate a special value using a one-hot encoding.
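A software sketch of the BID side of such a decoder is shown below, following the decimal64 binary encoding of IEEE 754-2008. The function name, the return layout, and the assignment of one-hot positions to zero, INF, qNaN, and sNaN are assumptions for illustration; the slides do not say which bit stands for which special value.

```python
MAX_SIGNIFICAND = 10 ** 16 - 1   # largest canonical decimal64 significand

# Hypothetical one-hot positions for the 4-bit special-value field.
ZERO, INF, QNAN, SNAN = 0b0001, 0b0010, 0b0100, 0b1000

def decode_bid64(bits: int):
    """Return (sign, exponent, significand, special) for a 64-bit BID word."""
    sign = bits >> 63
    top5 = (bits >> 58) & 0x1F
    if top5 == 0b11110:
        return sign, 0, 0, INF
    if top5 == 0b11111:
        signaling = (bits >> 57) & 1
        return sign, 0, 0, SNAN if signaling else QNAN
    if (bits >> 61) & 0b11 == 0b11:
        # Large-significand form: implied leading bits "100", 54 bits in total.
        exponent = (bits >> 51) & 0x3FF
        significand = (0b100 << 51) | (bits & ((1 << 51) - 1))
    else:
        # Common form: 10-bit exponent followed by a 53-bit significand.
        exponent = (bits >> 53) & 0x3FF
        significand = bits & ((1 << 53) - 1)
    if significand > MAX_SIGNIFICAND:
        significand = 0          # non-canonical encodings are treated as zero
    special = ZERO if significand == 0 else 0
    return sign, exponent, significand, special

# 0.125 encoded as biased exponent 395 and significand 125:
print(decode_bid64((395 << 53) | 125))   # (0, 395, 125, 0)
```

The 10-bit BID exponent fits the 11-bit decoded exponent field with one bit to spare, while the BID significand uses all 54 decoded bits.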

14

Block diagram of combined multiplier

15

SHARED HARDWARE IN OPERAND DECODER BLOCK

Shared hardware Unshared hardware

-Significand decoding

-Sign decoding

-Special case detection

-Exponent decoding

-BFP subnormal detection

-BID non-canonical zero detection

16

COMBINED MULTIPLIER DESIGN- Multiply Datapath

17

DATAPATH BLOCK DESCRIPTION

This block multiplies the significands, CA and CB, to obtain an intermediate product, CIP, which has up to 107 significant bits.

It also computes CIP × w_d to truncate d decimal digits, as the first step in rounding BID numbers.

This 107 × 108-bit multiplication uses four 54 × 54-bit multiplies.

In the datapath diagram, the fully shaded portions represent hardware that is completely shared between the BID and BFP datapaths. The unshaded areas are dedicated to only one of the datatypes, and the partially shaded areas contain some shared circuitry and some dedicated circuitry.
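One way to read the truncation step is as multiplication by a scaled reciprocal of $10^{d}$ followed by a right shift. The sketch below assumes that w_d is such a value, namely ceil(2^k / 10^d) with k chosen so the result is exact for any 107-bit product; the slides only name the product CIP × w_d, so the definition of `reciprocal`, the choice of k, and the function names are all hypothetical. The helper `mul_108` shows how a wide product decomposes into four 54 × 54-bit products.

```python
HALF = 54
MASK = (1 << HALF) - 1

def mul_108(a: int, b: int) -> int:
    """Multiply two values of up to 108 bits using four 54 x 54-bit products."""
    a_hi, a_lo = a >> HALF, a & MASK
    b_hi, b_lo = b >> HALF, b & MASK
    return (((a_hi * b_hi) << (2 * HALF))
            + ((a_hi * b_lo + a_lo * b_hi) << HALF)
            + a_lo * b_lo)

def reciprocal(d: int):
    """Hypothetical table entry: ceil(2**k / 10**d), returned with its shift k."""
    k = 107 + (10 ** d).bit_length()
    return -(-(1 << k) // 10 ** d), k

def truncate_digits(c_ip: int, d: int) -> int:
    """Drop the d low decimal digits of c_ip (< 2**107) without dividing."""
    w_d, k = reciprocal(d)
    return mul_108(c_ip, w_d) >> k

largest = (10 ** 16 - 1) ** 2          # largest possible BID product
print(truncate_digits(largest, 16) == largest // 10 ** 16)   # True
```

With this choice of k, the reciprocal for d between 1 and 16 fits in 108 bits, which is consistent with the 107 × 108-bit multiplication named above.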

18

To determine if a BID value must be rounded, it is compared to $10^{16}$.

To avoid a long carry chain, the multiplier examines the lower and upper 54 bits of PS and PC separately: if any bit is set in the upper 54 bits, rounding is needed. If the sum of the lower bits of PS and PC is greater than $10^{16}$, or if the OR'd bit is set, then rounding is needed for BID.
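The check can be modelled as follows, assuming PS and PC are two 108-bit vectors whose sum is the product (for instance the sum and carry outputs of the multiplier) and that this sum does not wrap. The function name is made up. The comparison here uses "greater than or equal to $10^{16}$", on the reading that $10^{16}$ itself already has 17 digits; that boundary choice is an interpretation, not something the slide states.

```python
LIMIT = 10 ** 16
LOW_MASK = (1 << 54) - 1

def bid_needs_rounding(ps: int, pc: int) -> bool:
    """Decide whether ps + pc has more than 16 decimal digits."""
    upper_or = ((ps | pc) >> 54) != 0             # any bit set in either upper half
    low_sum = (ps & LOW_MASK) + (pc & LOW_MASK)   # short 54-bit addition only
    return upper_or or low_sum >= LIMIT

print(bid_needs_rounding(5 * 10 ** 15, 5 * 10 ** 15 - 1))   # False
print(bid_needs_rounding(5 * 10 ** 15, 5 * 10 ** 15))       # True
```

Because $10^{16}$ is below $2^{54}$, any set bit in the upper halves already implies that the product is too wide, so only the short lower addition has to be compared.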

For normalized BFP multiplication, since it is known that CIP is in the range [1.0, 4.0), normalization consists of a conditional right shift by one bit and an OR tree to determine s*.

The design sets a bit called ultimate if CTP is all 1s, indicating that incrementing it will cause a carryout.
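For the BFP case, the normalization shift, the sticky OR, and the ultimate bit can be sketched as below. The function name and return layout are assumptions, and the input is taken to be the 106-bit product of two normalized 53-bit significands.

```python
P_BITS = 53   # binary64 significand width, including the hidden bit

def normalize_bfp(c_ip: int):
    """Normalize a product of two normalized 53-bit significands.

    c_ip lies in [2**104, 2**106), i.e. [1.0, 4.0) with 104 fraction bits.
    Returns (c_tp, exponent_adjust, r, s, ultimate).
    """
    shift_extra = c_ip >> 105                  # 1 when the product is in [2.0, 4.0)
    drop = 52 + shift_extra                    # bits below the kept significand
    c_tp = c_ip >> drop                        # truncated 53-bit significand
    r = (c_ip >> (drop - 1)) & 1               # round bit
    s = int(c_ip & ((1 << (drop - 1)) - 1) != 0)   # sticky: OR of the remaining bits
    ultimate = c_tp == (1 << P_BITS) - 1       # all ones: an increment carries out
    return c_tp, shift_extra, r, s, ultimate

# 1.5 x 1.5 = 2.25 needs the one-bit right shift.
one_point_five = 3 << 51
print(normalize_bfp(one_point_five * one_point_five) == (9 << 49, 1, 0, 0, False))   # True
```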

19

SHARED HARDWARE IN MULTIPLY DATAPATH BLOCK

Shared hardware Unshared hardware

-54x54-bit multiplier

-Right shifter

-Sticky calculation

-Exponent calculation (bias difference requires extra logic)

-Detect if BID rounding needed

-Multiply feedback path for BID rounding

-BID rounding lookup tables

-BID digits counting

-Detect all-1 significand for BFP

-Detect all-9 significand for BID

20

COMBINED MULTIPLIER DESIGN- Rounding Logic

For both BID and BFP, the final result is determined by conditionally incrementing the upper bits of CIP, based on s* and r*, the sign of the result, and the rounding mode.

SHARED HARDWARE IN ROUNDING LOGIC

Shared hardware Unshared hardware

-Incrementer

-Increment decision logic

-Overflow detection

-Underflow detection

21

COMBINED MULTIPLIER DESIGN- Control

If a BID multiply enters the unit while it is idle, the operation begins immediately. Subsequent multiplies wait until the current BID multiply finishes, which takes five or fifteen cycles, depending on whether rounding is needed.

If a BFP multiply enters the unit while it is idle, it also begins immediately, and BFP multiplies are fully pipelined. Since BFP multiplies always take five cycles in this design, the control can keep track of how many cycles remain before the pipeline is empty.

The multiplier is designed with variable latency for BID multiplication (five to fifteen cycles) to exploit a common case: a BID multiply that needs no rounding finishes in five cycles.
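The issue rules above can be turned into a small timing model. It assumes that all operations are ready at cycle 0 and issue in order, that one BFP multiply can enter per cycle, and that a BID multiply waits for the unit to drain completely; these are simplifications for illustration, and `schedule` is a made-up name.

```python
BFP_LATENCY = 5
BID_LATENCY = {False: 5, True: 15}   # keyed by "rounding needed"

def schedule(ops):
    """ops: list of ("BFP",) or ("BID", needs_rounding). Returns (issue, done) pairs."""
    next_issue = 0   # earliest cycle the next BFP multiply may enter
    drain = 0        # cycle at which everything in flight has finished
    timeline = []
    for op in ops:
        if op[0] == "BFP":
            issue = next_issue
            done = issue + BFP_LATENCY
            next_issue = issue + 1           # pipelined: one BFP multiply per cycle
            drain = max(drain, done)
        else:
            issue = drain                    # BID waits for an idle unit
            done = issue + BID_LATENCY[op[1]]
            next_issue = drain = done        # nothing else enters until it finishes
        timeline.append((issue, done))
    return timeline

print(schedule([("BFP",), ("BFP",), ("BID", False), ("BID", True), ("BFP",)]))
# [(0, 5), (1, 6), (6, 11), (11, 26), (26, 31)]
```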

22

Future work

A future design may provide more sophisticated communication with a scheduler to enable more than one BID multiply operation in flight.

The design could also be enhanced to allow BID and BFP operations to be interleaved.

23

Results

The combined BFP and BID multiplier was modeled in RTL Verilog.

For baseline comparisons, a standalone BID multiplier and a standalone BFP multiplier were also modeled.

All three designs were simulated with hundreds of directed test cases and millions of random test cases using Mentor Graphics ModelSim.

Synthesis was performed with Synopsys Design Compiler and TSMC's tcbn65gplus 65 nm CMOS standard cell library.

24

SYNTHESIS RESULTS

Design                                  Area (µm²)   Delay (ns)   Delay (FO4)
Standalone BFP                          43947        0.58         19
Standalone BID                          58482        0.79         26
Total Area of Standalone Multipliers    102429       ------       -----
Combined BID and BFP                    59100        0.81         26

25

Results (continued)

The combined BID and BFP multiplier occupies 58% of the total area of the separate BFP and BID units.

The delay of the combined multiplier is slightly longer than that of the standalone BID multiplier and 37.8% longer than that of the standalone BFP unit.
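The area figure can be reproduced from the synthesis table with a few lines of arithmetic. The dictionary names are made up; the numbers are the table entries.

```python
area = {"BFP": 43947, "BID": 58482, "combined": 59100}   # um^2, from the table
delay = {"BFP": 0.58, "BID": 0.79, "combined": 0.81}     # ns, from the table

total_standalone = area["BFP"] + area["BID"]
area_ratio = area["combined"] / total_standalone
extra_vs_bid = delay["combined"] / delay["BID"] - 1
extra_vs_bfp = delay["combined"] / delay["BFP"] - 1

print(total_standalone)         # 102429
print(f"{area_ratio:.1%}")      # 57.7%
print(f"{extra_vs_bid:.1%}")    # 2.5%
print(f"{extra_vs_bfp:.1%}")    # 39.7%
```

The rounded table delays give roughly 40% for the BFP comparison, so the quoted 37.8% presumably comes from unrounded delay values.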

26

CONCLUSIONS AND FUTURE WORK

The goal of this research was to investigate hardware sharing opportunities for IEEE 754-2008 floating-point multiplication. The work shows that the sharing potential between BFP and BID may be beneficial to chip designers wishing to conserve area. Future work to improve the algorithms and designs for hardware sharing may lend further insights into sharing possibilities.

27

Any Questions?