High Performance FPGA-based Floating Point Adder with Three Inputs

Authors: A. Guntoro and M. Glesner

Institute of Microelectronic System

Conference: Field Programmable Logic and Applications (FPL), 2008

Presenter: Tareq Hasan Khan, ID: 11083577

ECE, U of S

Literature review-2 (EE 800)


Outline

IEEE 754 Standard
Floating point addition algorithm
Proposed three input floating point adder
    Overall architecture
    Brief description of each stage
Results
Conclusion


IEEE 754 Standard

Issued by IEEE in the year 1985
Covers different types of floating point format
    Single
    Double… etc

In radix-2, a normalized floating point number can be written as

$(-1)^{s} \times 1.f \times 2^{e - \mathrm{bias}}$

where s = sign bit, f = mantissa (fraction field), e = biased exponent, and bias is the format's exponent bias (127 for single precision).
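To make the three fields concrete, the sketch below splits a single-precision value into s, e and f and rebuilds the value from them. It assumes the single format (1 sign bit, 8 exponent bits, 23 fraction bits, bias 127) and only handles normalized numbers; the function names are illustrative and do not come from the paper.

```python
import struct

def decode_float32(x):
    """Split a value, rounded to single precision, into (s, e, f)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = bits >> 31             # sign bit
    e = (bits >> 23) & 0xFF    # biased exponent, 8 bits
    f = bits & 0x7FFFFF        # fraction field, 23 bits
    return s, e, f

def value_from_fields(s, e, f):
    """Rebuild a normalized number: (-1)^s * 1.f * 2^(e - 127)."""
    return (-1) ** s * (1 + f / 2**23) * 2.0 ** (e - 127)

s, e, f = decode_float32(-6.5)       # -6.5 = -1.625 * 2^2
print(s, e, f)                       # 1 129 5242880
print(value_from_fields(s, e, f))    # -6.5
```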


Floating point addition algorithm

1. Calculate the exponent difference.
2. Align the mantissa by shifting the mantissa with the lower exponent to the right.
3. Add/sub both mantissas depending on the sign bits.
4. Perform the Leading-One Detection (LOD) to determine the location of the first logic one.
5. Normalize and round the result.
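The five steps can be sketched in software on (sign, exponent, fraction) triples. This is a simplified model, not the hardware from the paper: it assumes a 23-bit fraction, 3 guard bits, an unbiased integer exponent, rounding by truncation, and it ignores zero inputs, infinities, NaNs and overflow. All names are hypothetical.

```python
MBITS = 23   # fraction bits (assumed single-precision-like format)
GUARD = 3    # extra low-order bits kept during alignment (assumption)

def fp_add(a, b):
    """Add two normalized numbers given as (sign, exponent, fraction)."""
    (sa, ea, fa), (sb, eb, fb) = a, b
    # Append the hidden one and the guard bits.
    ma = ((1 << MBITS) | fa) << GUARD
    mb = ((1 << MBITS) | fb) << GUARD
    # Step 1: exponent difference, with the larger operand kept in "a".
    if (ea, ma) < (eb, mb):
        sa, ea, ma, sb, eb, mb = sb, eb, mb, sa, ea, ma
    diff = ea - eb
    # Step 2: align the mantissa that has the lower exponent.
    mb >>= diff
    # Step 3: add or subtract depending on the sign bits.
    m = ma + mb if sa == sb else ma - mb
    if m == 0:
        return None  # exact cancellation; this toy format cannot encode zero
    # Step 4: leading-one detection.
    lead = m.bit_length() - 1
    # Step 5: normalize so the leading one sits at bit MBITS + GUARD,
    # then round toward zero by dropping the guard bits.
    target = MBITS + GUARD
    if lead > target:
        m >>= lead - target
    else:
        m <<= target - lead
    e = ea + (lead - target)
    frac = (m >> GUARD) & ((1 << MBITS) - 1)
    return (sa, e, frac)

def to_float(t):
    """Convert a (sign, exponent, fraction) triple back to a Python float."""
    s, e, f = t
    return (-1) ** s * (1 + f / 2**MBITS) * 2.0 ** e

# 1.5 is (0, 0, 1 << 22) and 2.25 is (0, 1, 1 << 20) in this format.
print(to_float(fp_add((0, 0, 1 << 22), (0, 1, 1 << 20))))  # 3.75
```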


Proposed three input floating point adder architecture

Used in lifting based Discrete Wavelet Transform (DWT)
5 stage pipeline
Unique research


Stage 1

Mantissa comparator: compares the two mantissas Ma and Mb and latches both mantissas.
Zero logic: detects if the corresponding input is zero.
Exponent difference: computes the two differences between Ea and Eb (i.e. Ea − Eb and Eb − Ea).
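A small behavioural sketch of the stage 1 signals follows. The signal names are invented for illustration, and the zero test (exponent and mantissa fields both zero, as in the IEEE 754 encoding of zero) is an assumption about how the zero logic works. Forming both differences at once presumably lets a later stage pick the non-negative one without a separate negation.

```python
def stage1(ea, ma, eb, mb):
    """Return the stage 1 control signals for inputs a and b (illustrative)."""
    return {
        "ma_lt_mb": ma < mb,                # mantissa comparator
        "a_is_zero": ea == 0 and ma == 0,   # zero logic (assumed encoding)
        "b_is_zero": eb == 0 and mb == 0,
        "ea_minus_eb": ea - eb,             # both differences are formed;
        "eb_minus_ea": eb - ea,             # one of them is non-negative
    }

signals = stage1(ea=5, ma=3, eb=7, mb=3)
print(signals["ea_minus_eb"], signals["eb_minus_ea"])  # -2 2
```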


Stage 2

Shift, swap, add guard block
    Shifts the mantissa with the smaller exponent to the right by the amount determined by the exponent selector block.
    Swaps the mantissas when (Ma < Mb and Ea = Eb) or (Ea < Eb) is true.
    The hidden bit and the guard bits are appended, resulting in fractions Fa and Fb.
    If a zero number is detected, the corresponding fraction will be set to zero.
Exponent difference block
    Computes the two differences between Ed and Ec.
Mc is latched in a register.
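The swap condition and the alignment of stage 2 can be modelled as below. This is a sketch under assumptions: a 23-bit mantissa field, 3 guard bits, and a swap performed before the shift (which gives the same result as shifting the smaller operand and then swapping). The function and parameter names are hypothetical.

```python
MBITS = 23   # mantissa field width (assumed)
GUARD = 3    # number of guard bits (assumed)

def stage2(ea, ma, eb, mb, a_is_zero=False, b_is_zero=False):
    """Return (larger exponent, Fa, Fb) with Fb aligned to Fa."""
    # Swap so that operand "a" is the larger one.
    swap = (ea < eb) or (ea == eb and ma < mb)
    if swap:
        ea, ma, eb, mb = eb, mb, ea, ma
        a_is_zero, b_is_zero = b_is_zero, a_is_zero
    shift = ea - eb  # non-negative after the swap
    # Append the hidden bit and the guard bits; force zero inputs to zero.
    fa = 0 if a_is_zero else ((1 << MBITS) | ma) << GUARD
    fb = 0 if b_is_zero else (((1 << MBITS) | mb) << GUARD) >> shift
    return ea, fa, fb

# The operand order does not matter: both calls give the same fractions.
print(stage2(0, 1 << 22, 1, 1 << 20) == stage2(1, 1 << 20, 0, 1 << 22))  # True
```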


Stage 3

Add/sub and shift
    The fractions Fa and Fb are added/subtracted depending on the sign difference (Sa XOR Sb), resulting in the fraction Fab.
    If the exponent Ec is greater than max(Ea, Eb), the result will be shifted to the right.
Shift and add guard
    It prepares the mantissa Mc.
    If Ec is less than max(Ea, Eb), Mc will be shifted right instead.
    The hidden bit and the guard bits are appended to Mc, resulting in fraction Fc.
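A behavioural sketch of stage 3 is given below. It assumes that Fa >= Fb (as arranged by the stage 2 swap), that the shift amount is the absolute difference between Ec and max(Ea, Eb), and that input c is non-zero. The widths and all names are illustrative, not taken from the paper.

```python
MBITS = 23   # mantissa field width (assumed)
GUARD = 3    # number of guard bits (assumed)

def stage3(sa, sb, fa, fb, e_ab, ec, mc):
    """Return (Fab, Fc, common exponent) with both fractions aligned.

    e_ab stands for max(Ea, Eb); fa >= fb is assumed.
    """
    # Add/sub: the operation is chosen by the sign difference Sa XOR Sb.
    fab = fa - fb if (sa ^ sb) else fa + fb
    # Shift and add guard: append hidden bit and guard bits to Mc.
    fc = ((1 << MBITS) | mc) << GUARD
    # Whichever side has the smaller exponent is shifted right.
    if ec > e_ab:
        fab >>= ec - e_ab
    else:
        fc >>= e_ab - ec
    return fab, fc, max(e_ab, ec)

# a = b = 1.0 (exponent 0), c = 1.0 * 2^2: Fab = 2.0 is shifted right by 2.
print(stage3(0, 0, 1 << 26, 1 << 26, 0, 2, 0) == (1 << 25, 1 << 26, 2))  # True
```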


Stage 4

Operand swap and add/sub block
    Swaps the operands Fab and Fc if necessary (notice that both operands have the same exponent).
    It performs the addition or subtraction, which results in Fr.
Leading One Prediction (LOP) block
    Predicts the first occurrence of the "logic one" directly from the operands.
    One-bit inaccuracy might occur, so it gives two values at the output.
Exponent adjustment block
    Prepares the dominant exponent by simply adding two to the largest exponent (i.e. max(Ea, Eb, Ec) + 2), because adding three operands can increase the exponent by up to two.
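The "+ 2" can be checked with a short calculation: each mantissa with its hidden bit is below 2, so the sum of three is below 6, which is below 8 = 2^3, and the leading one can therefore move up by at most two positions. The sketch assumes a 23-bit mantissa field and uses an exact leading-one detector in place of the predictor described above.

```python
MBITS = 23   # mantissa field width (assumed)

def leading_one(x):
    """Bit position of the most significant one (-1 for x == 0)."""
    return x.bit_length() - 1

largest = (1 << (MBITS + 1)) - 1   # 1.11...1 including the hidden bit
total = 3 * largest                # worst case: three maximal operands added
growth = leading_one(total) - leading_one(largest)
print(growth)  # 2
```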


Stage 5

LOP error is corrected from Fr.
Normalization is basically a shift-left block with the amount given by the corrected LOP value.
The overflow and underflow detector verifies if the resulting fraction and exponent lie outside the floating-point range.
The rounding logic implements two rounding mechanisms: rounding to zero and rounding to nearest.
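The normalization and the two rounding modes can be sketched as follows. Assumptions, none of which are confirmed by the slides: the result Fr is an unsigned word whose top bit carries the weight of the dominant exponent, ties in round-to-nearest go to the even mantissa, and the exact leading-zero count stands in for the corrected LOP value. Overflow and underflow checks are left out, and all names are hypothetical.

```python
MBITS = 23                       # mantissa field width (assumed)
GUARD = 3                        # number of guard bits (assumed)
WIDTH = MBITS + 1 + 2 + GUARD    # hidden bit + 2 growth bits + guard bits

def normalize_and_round(fr, e_dom, nearest=True):
    """Normalize fr by a left shift and round it to MBITS fraction bits.

    The top bit of the WIDTH-bit word fr has weight 2**e_dom.
    Returns (exponent, fraction), or None for a zero result.
    """
    if fr == 0:
        return None
    shift = WIDTH - fr.bit_length()   # leading zeros = left-shift amount
    fr <<= shift
    e = e_dom - shift
    drop = WIDTH - (MBITS + 1)        # bits below the kept mantissa
    m = fr >> drop                    # truncation = rounding to zero
    if nearest:
        rem = fr & ((1 << drop) - 1)
        half = 1 << (drop - 1)
        if rem > half or (rem == half and (m & 1)):
            m += 1
            if m >> (MBITS + 1):      # rounding carried into a new bit
                m >>= 1
                e += 1
    return e, m & ((1 << MBITS) - 1)

# 3.0 with dominant exponent 2 normalizes to 1.5 * 2^1.
print(normalize_and_round(3 << 26, 2))  # (1, 4194304)
```

With the dominant exponent set two above the largest input exponent, the leading one can never sit above the top bit, which is consistent with normalization needing only a left shift.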


Result

Xilinx Virtex2 XC2V2000-5

Xilinx Virtex2Pro XC2VP30-7

Config. Format: exponent–mantissa–guard


Result

Slice usage
    Slightly higher compared to Malik, but still lower compared to the IP core.
Operating speeds
    Higher than both the IP core and Malik on most of the target devices.
    About 19% speed gain can be achieved on Virtex2Pro and 22% on Virtex2 compared to Malik.
Addition of three floating-point numbers
    The architectures from IP core and Malik will consume at least twice as many slices and will have a 10-level pipeline stage.


Conclusion

Design of a 3 input floating point adder
5 stage pipeline
Can be operated on Xilinx Virtex2 XC2V2000-5 and Virtex2Pro XC2VP30-7 at 105 MHz and 143 MHz respectively.


Thanks
