DSP Floating Point Formats

By: Mehrnaz Monajati
Instructor: Dr. S.M. Fakhrai

This is a class presentation. All data are the copyright of their respective authors, as listed in the references, and have been used here for educational purposes only.

Fixed vs. Floating Point DSPs
- Cost
- Ease of use
- Accuracy
- Dynamic range

Fixed vs. Floating Point DSPs: Cost
- Today, fixed-point DSPs continue to benefit more from economies of scale in manufacturing, since they are more often used in high-volume applications.
- The same reductions will apply to floating-point DSPs when high-volume demand for the devices appears.
- Cost has increasingly become an issue of SoC integration and volume, rather than a result of the size of the DSP core itself.

Fixed vs. Floating Point DSPs: Ease of use
Early days:
- TI floating-point DSPs supported the C language, while FXP DSPs were programmed at the assembly code level.
- Real arithmetic was coded directly in hardware on FLP DSPs, and indirectly on FXP DSPs through software routines that added development time and extra instructions to the algorithm.
- Programming was easier in FLP.
Today:
- TI fixed-point DSPs have long been supported by efficient C compilers.
- The advantage of implementing real arithmetic directly in floating-point hardware still remains.
- The complexity of FXP programming has been reduced.
- FXP DSPs still have an edge in cost and FLP DSPs in ease of use, but the edge has narrowed.

Fixed vs. Floating Point DSPs: Accuracy and dynamic range
- Accuracy of FLP is greater than that of FXP.
- FLP has greater precision in integer as well as real values.
- Exponentiation vastly increases the dynamic range.
- Internal data representations in FLP DSPs are more exact than in FXP, ensuring greater accuracy in the end result.
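As a rough illustration of the dynamic range point, the sketch below compares the ratio between the largest and smallest nonzero magnitudes of a 16-bit signed fixed-point word and of IEEE-754 single precision (normalized values only). The helper name `dynamic_range_db` is made up for this example, and expressing the ratio in decibels is a convention chosen here, not something taken from the slides.

```python
import math

def dynamic_range_db(largest: float, smallest: float) -> float:
    """Ratio of largest to smallest nonzero magnitude, in decibels."""
    return 20.0 * math.log10(largest / smallest)

# 16-bit signed integer: magnitudes run from 1 up to 2**15 - 1.
fxp16_db = dynamic_range_db(2**15 - 1, 1)

# IEEE-754 single precision, normalized values only:
# largest = (2 - 2**-23) * 2**127, smallest = 2**-126.
flp32_db = dynamic_range_db((2 - 2**-23) * 2.0**127, 2.0**-126)

print(f"16-bit fixed point : {fxp16_db:.1f} dB")
print(f"32-bit float       : {flp32_db:.1f} dB")
```

The fixed-point figure comes out at roughly 90 dB, while the exponent field pushes the floating-point figure above 1500 dB.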


Fixed vs. Floating Point DSPs: FXP DSPs
- TI's TMS320C62x FXP DSPs
  - Two data paths operating in parallel, each with a 16-bit word width
  - Provide signed integer values within a range from -2^15 to 2^15 - 1
- TMS320C64x DSPs double the overall throughput with four 16-bit multipliers.
- TMS320C5x and TMS320C2x DSPs, designed for handheld and control applications respectively, are based on single 16-bit data paths.
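A signed 16-bit two's-complement word covers -2^15 to 2^15 - 1. The sketch below shows one common way to carry real values in such a word, the Q15 fractional format. Q15 is an assumption made for illustration; the slides do not say which scaling a given application uses, and the function names are hypothetical.

```python
WORD_BITS = 16

# Two's-complement limits of a signed 16-bit word.
INT_MIN = -(1 << (WORD_BITS - 1))      # -32768
INT_MAX = (1 << (WORD_BITS - 1)) - 1   #  32767

def to_q15(x: float) -> int:
    """Quantize a real value in [-1, 1) to Q15, saturating at the word limits."""
    scaled = round(x * (1 << 15))
    return max(INT_MIN, min(INT_MAX, scaled))

def from_q15(q: int) -> float:
    """Convert a Q15 integer back to a real value."""
    return q / (1 << 15)

def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values; the wide product is shifted back to Q15."""
    product = (a * b) >> 15
    return max(INT_MIN, min(INT_MAX, product))

half = to_q15(0.5)
print(from_q15(q15_mul(half, half)))
```

The scaling, saturation and shift after each multiply are the kind of bookkeeping that floating-point hardware does for the programmer.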


Fixed vs. Floating Point DSPs: FLP DSPs
- TMS320C67x FLP DSPs divide a 32-bit data path into two parts: a 24-bit mantissa and an 8-bit exponent.
  - The 24-bit mantissa gives a 16M (2^24) range of precision.
  - The exponent supports a vastly greater dynamic range than is available with the FXP format.
- The C67x DSP can also perform calculations using industry-standard double-width precision: 64 bits, including a 53-bit mantissa and an 11-bit exponent.
  - This achieves much greater precision and dynamic range at the expense of speed, since it requires multiple cycles for each operation.
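To make the 32-bit layout concrete, the sketch below unpacks a value using the IEEE-754 single-precision encoding: 1 sign bit, 8 exponent bits and 23 stored fraction bits, which together with the implicit leading one give the 24-bit mantissa. It runs on a host with Python's standard `struct` module rather than on a C67x, and the function names are made up for this example.

```python
import struct

def float32_fields(x: float):
    """Split x (rounded to IEEE-754 single precision) into sign, exponent, fraction."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # 8-bit biased exponent
    fraction = bits & 0x7FFFFF          # 23 stored bits; the 24th is implicit
    return sign, exponent, fraction

def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

print(float32_fields(1.0))

# Integers up to 2**24 (about 16M) are exact; 2**24 + 1 no longer fits in 24 bits.
print(to_float32(2.0**24) == 2.0**24)
print(to_float32(2.0**24 + 1) == 2.0**24 + 1)
```

The last comparison fails because 2^24 + 1 needs 25 significant bits, which is where the 16M limit of precision shows up.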


Standards for FLP Number Formats

FLP Number Formats

Sample Floating Point DSPs
- AMD - Athlon Processor
- Xilinx Virtex-5 APU Floating Point Unit
- Digital Core Design DFPAU ver. 2.05

AMD - Athlon Processor, 2000
- Includes what AMD described as the most powerful floating-point engine for x86 platforms
- Delivers twice the peak x87 floating-point execution rate of the Intel Pentium III processor
- Rivals the FP performance of many RISC processors of that time
- Superscalar and superpipelined
  - Higher clock frequencies
  - Higher overall throughput

Ref. [3]

AMD - Athlon Processor, 2000
Ref. [3]

Xilinx Virtex-5 APU FLP Unit, 2009
- Designed for the PowerPC 440 embedded microprocessor of the Virtex-5 FXT FPGA family
- Supports the IEEE-754 standard in single or double precision
- Optimized for 2:1 and 3:1 APU:CPU clock ratios, allowing the PowerPC processor to operate at maximum frequency
- Applications:
  - Digital signal processing of high-quality audio or video signals, where a very large dynamic range is needed to retain fidelity
  - Matrix inversion in wireless communications and radar
  - Digital signal processing tasks and spectral methods such as the FFT
  - Statistical processing, where floating point is often the simplest way to avoid integer overflow and rounding errors

Xilinx Virtex-5 APU FLP Unit, 2009
- Increased processing capacity
  - Hardware floating-point operations complete faster than the equivalent software emulation routines.
  - The floating-point operators within the FPU are pipelined, so multiple floating-point calculations can proceed in parallel.
  - The FPU is autonomous: the PowerPC processor's internal pipeline can continue to execute integer instructions while floating-point operations are handled by the FPU in parallel.
- IEEE 754-1985 / Book-E standard compatibility
  - The standard represents very small numbers by allowing significands of the form "0.x" (denormalized numbers) in addition to the usual "1.x" used by normalized FLP numbers.
  - In Book-E, the multiply part of a multiply-add operation should not round its result before supplying it to the addition part.
  - The FPU treats all not-a-number (NaN) values as quiet NaNs, which do not cause exceptions. When a floating-point operation results in a NaN because one of the inputs was a NaN, the input NaN is not propagated to the output; the default quiet NaN value is provided instead. This value is 0x7ff8000000000000 in double precision and 0x7fc00000 in single precision.
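The bit patterns discussed above can be checked on any host that uses IEEE-754 encodings. The sketch below uses Python's `struct` module, not the Xilinx FPU itself, and the helper names are made up. Note that in single precision the pattern 0x7f800000 (all-ones exponent, zero fraction) encodes +infinity; the quiet NaN with the same layout as the double-precision default is 0x7fc00000.

```python
import math
import struct

def bits64(x: float) -> int:
    """Raw 64-bit pattern of a double-precision value."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def from_bits64(b: int) -> float:
    """Double-precision value encoded by a 64-bit pattern."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]

def from_bits32(b: int) -> float:
    """Single-precision value encoded by a 32-bit pattern."""
    return struct.unpack(">f", struct.pack(">I", b))[0]

# Quiet NaN patterns: all-ones exponent, top fraction bit set.
QNAN_DOUBLE = 0x7FF8000000000000
QNAN_SINGLE = 0x7FC00000

print(math.isnan(from_bits64(QNAN_DOUBLE)))
print(math.isnan(from_bits32(QNAN_SINGLE)))
print(math.isinf(from_bits32(0x7F800000)))

# Smallest normalized double ("1.x" form) and smallest denormalized double ("0.x" form).
smallest_normal = from_bits64(0x0010000000000000)
smallest_denormal = from_bits64(0x0000000000000001)
print(hex(bits64(smallest_normal)), smallest_normal, smallest_denormal)
```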

Xilinx Virtex-5 APU FLP Unit
Ref. [4]

Digital Core Design DFPAU ver. 2.05, 2010
- An FLP arithmetic coprocessor
- Directly replaces C software functions with equivalent, very fast hardware operations, which significantly accelerates system performance
- Does not require any programming: everything is done automatically during software compilation by the DFPAU C driver
- Supports addition, subtraction, multiplication, division, square root, comparison, and absolute value
- The input number format follows IEEE-754
- Each floating-point function can be turned on or off at the configuration level, which makes the DFPAU module scalable
- Technology-independent design

Digital Core Design DFPAU ver. 2.05, 2010
Ref. [5]

Architectural Modification to Improve FLP Unit in FPGAs, 2008 [1]
- Variable-length shifters account for over 30% of an adder and 25% of a multiplier.
- Coarse-grained approach: embedded shifter
- Fine-grained approach: multiplexer

                        Embedded shifter   4:1 multiplexer
Consumed chip area      1.5%               0.48%
Saved area              14.6%              7.3%
Increased clock rate    3.3%               11.6%
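To see where variable-length shifters enter a floating-point adder, the toy model below adds two positive numbers held as (mantissa, exponent) pairs with a 24-bit mantissa. It is a simplified sketch, not the design from Ref. [1]: signs and rounding are left out, and the names are made up. Subtraction would also need a variable-length left shift to renormalize after cancellation.

```python
MANT_BITS = 24
MANT_MAX = 1 << MANT_BITS   # normalized mantissas lie in [2**23, 2**24)

def fp_add(a, b):
    """Add two positive toy floats given as (mantissa, exponent) pairs.

    value = mantissa * 2**exponent. Shifted-out bits are truncated
    rather than rounded, to keep the sketch short.
    """
    (ma, ea), (mb, eb) = a, b
    # Put the operand with the larger exponent first.
    if ea < eb:
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    # Alignment: variable-length right shift of the smaller operand.
    mb >>= (ea - eb)
    m, e = ma + mb, ea
    # Normalization: shift right by one if the sum overflowed 24 bits.
    if m >= MANT_MAX:
        m >>= 1
        e += 1
    return m, e

def to_real(x):
    """Real value of a (mantissa, exponent) pair."""
    m, e = x
    return m * 2.0**e

one = (1 << 23, -23)    # 1.0
half = (1 << 23, -24)   # 0.5
print(to_real(fp_add(one, half)))
print(to_real(fp_add(one, one)))
```

The alignment shift distance depends on the exponent difference, so it cannot be a fixed wire shift; that is why the shifter takes a noticeable share of the adder.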

Low power FLP Unit, 2009 [2]
- Design for embedded systems applications that need low power consumption and fast processing
- Performs basic operations such as addition, subtraction, multiplication and division
- Idea: the functional units (adder, shifter, registers) are shared between different operations
- Advantage: saves silicon area
- Disadvantage: increases the number of cycles required to perform an operation
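The area-versus-cycles trade-off can be pictured with a shift-and-add multiplier that reuses a single adder and shifter, one step per clock cycle. This is a generic illustration under that assumption, not the architecture of Ref. [2]; the function name and the cycle model are made up for the example.

```python
def shift_add_multiply(a: int, b: int, width: int = 24):
    """Multiply two unsigned `width`-bit integers with one adder and one shifter.

    Returns (product, cycles); each loop iteration models one clock cycle.
    """
    product = 0
    cycles = 0
    for i in range(width):
        if (b >> i) & 1:
            product += a << i   # the single shared adder and shifter
        cycles += 1
    return product, cycles

product, cycles = shift_add_multiply(5, 3)
print(product, cycles)
```

A dedicated array multiplier would produce the same product in far fewer cycles, but at the cost of many more adders on the die.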

Low power FLP Unit, 2009
Ref. [2]

Reconfigurable FLP Unit, 2009 [7]
- Non-numerical applications usually have very few FLP operations, so the FLP unit is idle most of the time.
- In idle mode, the floating-point unit still consumes power and its die area is wasted.
- Idea: a reconfigurable floating-point unit that provides both integer and floating-point operations

Reconfigurable FLP Unit
rAMM Array
Ref. [7]

Reconfigurable FLP Unit
Ref. [7]

Reconfigurable FLP Unit
Ref. [7]

References
[1] M. Beauchamp, et al., "Architectural modifications to enhance the floating-point performance of FPGAs," IEEE Transactions on Very Large Scale Integration Systems, vol. 16, p. 177, 2008.
[2] R. Neves, et al., "A Floating Point Unit Architecture for Low Power Embedded Systems Applications," XXIV SIM - South Symposium on Microelectronics, 2009.
[3] AMD Athlon Floating Point Engine, "AMD Athlon Processor Floating Point Capability, The Most Powerful, Architecturally Advanced Floating Point Engine Ever Delivered in an x86 Microprocessor," white paper, 2000.
[4] Xilinx, "Virtex-5 APU Floating-Point Unit v1.01a," Data Sheet DS693, 2009.
[5] DFPAU floating-point pipelined divider, 2010.
[6] G. Frantz and R. Simar, "Comparing Fixed and Floating Point DSPs," SPRY061, Texas Instruments, 2004.
[7] Y. Lee and J. Jou, "Design of A Reconfigurable Floating-Point Unit," 2009.

Embedded shifter block diagram
Ref. [1]

4:1 Multiplexer
Ref. [1]
