Posted on 07-Apr-2020
EE 3610 Digital Systems Suketu Naik
Floating Point Representation
EE 3610: Digital Systems
Floating boat
Floating numbers
Fixed Point Numbers
Generic binary representation
Example: $110101_2 = 1 \times 2^5 + 1 \times 2^4 + 0 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 32 + 16 + 4 + 1 = 53_{10}$
Example: $1011.1011_2 = 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 + 1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3} + 1 \times 2^{-4} = 11.6875_{10}$
N-bit fixed point, 2's complement integer representation: $X = -b_{N-1} 2^{N-1} + b_{N-2} 2^{N-2} + \dots + b_0 2^0$
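The positional expansions above can be checked with a short Python sketch. The helper names `unsigned_value` and `twos_complement_value` are hypothetical, chosen for illustration. The second helper gives the MSB a weight of $-2^{N-1}$, exactly as in the formula for $X$.

```python
def unsigned_value(bits: str) -> float:
    """Evaluate a binary string such as '110101' or '1011.1011' by positional weights."""
    int_part, _, frac_part = bits.partition(".")
    value = 0.0
    # Integer bits: weights 2^0, 2^1, ... counted from the binary point leftwards
    for i, b in enumerate(reversed(int_part)):
        value += int(b) * 2 ** i
    # Fractional bits: weights 2^-1, 2^-2, ... counted from the binary point rightwards
    for i, b in enumerate(frac_part, start=1):
        value += int(b) * 2 ** -i
    return value


def twos_complement_value(bits: str) -> int:
    """N-bit two's complement integer: the MSB carries weight -2^(N-1)."""
    n = len(bits)
    value = -int(bits[0]) * 2 ** (n - 1)
    for i, b in enumerate(reversed(bits[1:])):
        value += int(b) * 2 ** i
    return value


print(unsigned_value("110101"))       # 53.0
print(unsigned_value("1011.1011"))    # 11.6875
print(twos_complement_value("1110"))  # -2
```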
Fixed Point Numbers
A 'fixed point' number has a fixed number of digits before and after the binary point
Qm.n notation: m bits for the integer portion, n bits for the fractional portion
Total number of bits N = m + n + 1 for signed numbers
Example: 16-bit number (N = 16) in Q2.13 format: 2 bits for the integer portion, 13 bits for the fractional portion, 1 sign bit (MSB)
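Here is a minimal sketch of how a real number might be quantized to a signed Qm.n word. The stored word is the integer $\mathrm{round}(x \cdot 2^n)$. The sketch assumes round-to-nearest (Python's `round`, which breaks ties to even) and saturation on overflow, and both of these are design choices. `to_q` and `from_q` are hypothetical names.

```python
def to_q(x: float, m: int, n: int) -> int:
    """Quantize x to a signed Qm.n word (N = m + n + 1 bits), returned as an integer code.

    Out-of-range values are saturated (clamped) in this sketch; wrapping is another option.
    """
    scaled = round(x * 2 ** n)       # shift the binary point n places right
    lo = -(2 ** (m + n))             # most negative code
    hi = 2 ** (m + n) - 1            # most positive code
    return max(lo, min(hi, scaled))


def from_q(code: int, n: int) -> float:
    """Interpret an integer code as a Qm.n value: code * 2^-n."""
    return code * 2 ** -n


# Q2.13 in a 16-bit word (m = 2, n = 13, plus 1 sign bit)
code = to_q(1.5, 2, 13)
print(code, from_q(code, 13))   # 12288 1.5
print(to_q(10.0, 2, 13))        # 32767 (saturated: Q2.13 tops out just below 4)
```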
Fixed Point Numbers: Qm.n notation
Examples:
1110 in integer representation (Q3.0): $-2^3 + 2^2 + 2^1 = -2$
11.10 in fractional representation (Q1.2): $-2^1 + 2^0 + 2^{-1} = -2 + 1 + 0.5 = -0.5$
1.110 in fractional representation (Q0.3): $-2^0 + 2^{-1} + 2^{-2} = -1 + 0.5 + 0.25 = -0.25$
Fixed point is difficult to use because of possible overflow. In a 16-bit processor (integer format), the dynamic range is -32,768 to 32,767.
Example: 200 × 350 = 70,000, which overflows!
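The three examples and the overflow case can be reproduced in Python. This sketch assumes the processor keeps only the low 16 bits of a result (wrap-around), although many DSPs can saturate instead. `q_value` and `wrap16` are hypothetical helpers.

```python
def q_value(bits: str) -> float:
    """Value of a two's complement bit string with an optional binary point.

    The number of bits after the point is n in Qm.n, and the leftmost bit
    carries negative weight (e.g. '11.10' is a Q1.2 number).
    """
    int_part, _, frac_part = bits.partition(".")
    word = int_part + frac_part
    raw = int(word, 2)
    if word[0] == "1":                  # sign bit set: subtract 2^N
        raw -= 2 ** len(word)
    return raw / 2 ** len(frac_part)    # scale by 2^-n


def wrap16(x: int) -> int:
    """Keep only the low 16 bits of x and read them as two's complement."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


print(q_value("1110"))    # -2.0  (Q3.0)
print(q_value("11.10"))   # -0.5  (Q1.2)
print(q_value("1.110"))   # -0.25 (Q0.3)
print(wrap16(200 * 350))  # 4464, not 70000
```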
Fixed Point Numbers: Qm.n notation
(Table: Dynamic Range and Precision of 16-Bit Numbers for Different Q Formats)
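The entries of such a table follow directly from the Qm.n definition. A signed Qm.n number covers $-2^m$ to $2^m - 2^{-n}$ in steps of $2^{-n}$. The sketch below regenerates these values for every signed 16-bit format ($m + n = 15$). `q_format_stats` is a hypothetical name.

```python
def q_format_stats(m: int, n: int):
    """Range and resolution (precision) of a signed Qm.n number (N = m + n + 1 bits)."""
    resolution = 2.0 ** -n
    most_negative = -(2.0 ** m)
    most_positive = 2.0 ** m - resolution
    return most_negative, most_positive, resolution


# All signed Q formats that fit in a 16-bit word: m + n = 15
for m in range(0, 16):
    n = 15 - m
    lo, hi, res = q_format_stats(m, n)
    print(f"Q{m}.{n:<2}  range [{lo}, {hi}]  precision {res}")
```

The trade-off is visible in the output. Each bit moved from the fraction to the integer portion doubles the range and halves the precision.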
Fixed Point Numbers: Advantages and Disadvantages
Disadvantage of fixed point: small dynamic range
Advantages of fixed point: simpler logic circuits, less memory, and lower required processor speed, giving hardware that is smaller, faster, cheaper, and lower in power consumption
Application of Fixed Point DSP
ADSP Blackfin: multi-format audio, video, voice, and image processing
Example: undersea pipeline monitoring in the Orman Natural Gas Field. Hydro-acoustic and non-contact sensors, together with ADSP Blackfin and the LabVIEW Embedded Module, perform subsea state monitoring and analysis tasks.
(Figures: ADI Blackfin DSP; Orman Natural Gas Field, Norway)
Why Floating Point?
Floating point numbers are used to represent fractions over a much wider dynamic range than fixed point numbers, with roughly constant relative precision
e.g. 35.261, 3.14159, 1.25
Arithmetic units for floating point numbers are more complex than those for fixed point numbers
IEEE 754 Floating-Point Formats
Single precision (32-bit)
Double precision (64-bit)
Quadruple precision (128-bit)
Basics
$N = F \times B^E$
N = a floating point (real) number, F = fraction, B = base, E = exponent
The base can be 2, 10, 16, or another value
Here we consider B = 2
The fraction F and exponent E can be represented in a number of ways
Basics
Binary numbers, like decimal numbers, can be represented in scientific notation
Example: $923.52_{10}$ can be represented as $9.2352 \times 10^2$
Similarly, $101011.101_2 = 43.625_{10}$ can be represented as $1.01011101_2 \times 2^5$
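Normalizing a number to this form can be sketched with Python's standard `math.frexp`. That function returns a significand in $[0.5, 1)$, so shifting it one place gives the $1.xxx \times 2^E$ form used here. `normalize` is a hypothetical helper.

```python
import math


def normalize(x: float):
    """Return (significand, exponent) with x == significand * 2**exponent
    and 1 <= |significand| < 2 (binary scientific notation)."""
    if x == 0:
        raise ValueError("zero has no normalized form")
    m, e = math.frexp(x)   # frexp gives 0.5 <= |m| < 1
    return m * 2, e - 1    # shift one place so that 1 <= |m| < 2


sig, exp = normalize(43.625)
print(sig, exp)   # 1.36328125 5   (1.36328125 is 1.01011101 in binary)
```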
Basics: Sign Field
$43.625_{10} = 101011.101_2 = 1.01011101_2 \times 2^5$
0 xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxx
The sign field is 1 bit long: 0 (positive) or 1 (negative)
$N = (-1)^S \times (1 + F) \times 2^{E - 127}$ (single precision)
S = sign field, E = exponent field (stored with a bias of 127), F = mantissa (fraction) field
Basics: Exponent Field
0 10000100 xxxx xxxx xxxx xxxx xxxx xxx
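The exponent field is 8 bits long: range 0 to 255 (the values 0 and 255 are reserved for zero/subnormal numbers and for infinity/NaN)
Add the bias 127 to the exponent from scientific notation: here 5 + 127 = 132 = $10000100_2$
(If the stored exponent field is less than 127, the actual exponent is negative)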
Basics: Mantissa (Fraction) Field
0 10000100 01011101000000000000000
The mantissa field is 23 bits long: drop the leading 1 to the left of the binary point (it is implied) and pad the remaining bits with zeros at the end: 1.01011101 → 01011101000000000000000
Single Precision Representation
Example: $43.625_{10} = 101011.101_2 = 1.01011101_2 \times 2^5$
Sign Exponent Fraction
0 10000100 01011101000000000000000
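You can check the field layout above by having Python's `struct` module pack 43.625 as an IEEE 754 single (the `'>f'` format uses binary32) and then slicing the 32-bit pattern. `float32_fields` is a hypothetical helper name.

```python
import struct


def float32_fields(x: float):
    """Split the IEEE 754 single-precision encoding of x into (sign, exponent, fraction) bit strings."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))   # raw 32-bit pattern
    bits = f"{word:032b}"
    return bits[0], bits[1:9], bits[9:]                    # 1 + 8 + 23 bits


sign, exponent, fraction = float32_fields(43.625)
print(sign, exponent, fraction)
# 0 10000100 01011101000000000000000
```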
Formats: Single Precision
Example: $13.45_{10} = 1101.01\,1100\,1100\,1100\ldots_2 = 1.10101\,1100\,1100\ldots_2 \times 2^3$
(Note: 0.45 produces a repeating binary fraction)
Sign: positive: 0
Exponent: 3 + 127 = 130: 10000010
Mantissa (fraction): 10101 1100 1100 1100 1100 11 (the repeating pattern is cut off at 23 bits; the next bit is 0, so rounding to nearest gives the same result)
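The same approach shows how 13.45 is stored in single precision. Because 0.45 repeats in binary, the stored value is only the nearest single-precision number to 13.45. `float32_encoding` is a hypothetical helper.

```python
import struct


def float32_encoding(x: float):
    """(sign, exponent, fraction, hex word) of x rounded to IEEE 754 single precision."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{word:032b}"
    return bits[0], bits[1:9], bits[9:], hex(word)


s, e, f, h = float32_encoding(13.45)
print(s, e, f, h)
# 0 10000010 10101110011001100110011 0x41573333

# Decoding the 32-bit word gives the nearest single-precision value, not exactly 13.45
(stored,) = struct.unpack(">f", struct.pack(">f", 13.45))
print(stored)
```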
Formats: Double Precision
Example: $13.45_{10} = 1101.01\,1100\,1100\,1100\ldots_2 = 1.10101\,1100\,1100\ldots_2 \times 2^3$
(Note: 0.45 produces a repeating binary fraction)
Sign: positive: 0
Exponent: 3 + 1023 = 1026: 10000000010
Mantissa (fraction): 10101 1100 1100 1100 1100 1100 1100 1100 1100 1100 1100 1100 110 (52 bits; again the next bit is 0, so the pattern is simply cut off)
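Here is a sketch for double precision using `struct`'s `'>d'` (IEEE 754 binary64) format. The only differences from single precision are the 11-bit exponent field with bias 1023 and the 52-bit fraction. `float64_fields` is a hypothetical helper.

```python
import struct


def float64_fields(x: float):
    """(sign, exponent, fraction, hex word) of x in IEEE 754 double precision."""
    (word,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = f"{word:064b}"
    return bits[0], bits[1:12], bits[12:], f"{word:016X}"   # 1 + 11 + 52 bits


s, e, f, h = float64_fields(13.45)
print(s, e, h)   # 0 10000000010 402AE66666666666
print(f)         # the 52 fraction bits
```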
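IEEE 754 Floating Point: Range
Type     Size     Exponent   Mantissa   Range (normalized magnitudes)
Single   32-bit   8-bit      23-bit     $\approx 1.2 \times 10^{-38}$ to $3.4 \times 10^{38}$
Double   64-bit   11-bit     52-bit     $\approx 2.2 \times 10^{-308}$ to $1.8 \times 10^{308}$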
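The range limits follow from the field widths. For a $k$-bit exponent the bias is $2^{k-1}-1$. With $p$ fraction bits, the largest finite value is $(2 - 2^{-p})\,2^{\text{bias}}$ and the smallest normalized value is $2^{1-\text{bias}}$. Below is a Python sketch (hypothetical `ieee_range`) that checks the double-precision result against `sys.float_info`.

```python
import sys


def ieee_range(exp_bits: int, frac_bits: int):
    """Largest finite value and smallest positive normalized value of an IEEE 754 binary format."""
    bias = 2 ** (exp_bits - 1) - 1
    emax = bias          # largest usable exponent (all-ones field is reserved)
    emin = 1 - bias      # smallest normalized exponent (all-zeros field is reserved)
    largest = (2 - 2.0 ** -frac_bits) * 2.0 ** emax
    smallest_normal = 2.0 ** emin
    return largest, smallest_normal


print(ieee_range(8, 23))    # single: about 3.4e38 and 1.2e-38
print(ieee_range(11, 52))   # double: about 1.8e308 and 2.2e-308
assert ieee_range(11, 52) == (sys.float_info.max, sys.float_info.min)
```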
Floating Point (FP) to Decimal
Example: 0xC0B40000, assuming single precision FP
Step 1: Convert HEX into binary
HEX     C    0    B    4    0    0    0    0
Binary  1100 0000 1011 0100 0000 0000 0000 0000
Floating Point (FP) to Decimal
Step 2: Reorganize into fields of 1, 8, and 23 bits
Sign Exponent Fraction
1 10000001 01101000000000000000000
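Steps 1 and 2 amount to formatting the word as 32 binary digits and slicing it. Here is a sketch with a hypothetical `split_single` helper.

```python
def split_single(word: int):
    """Steps 1-2: write a 32-bit word in binary and cut it into 1 + 8 + 23 bit fields."""
    bits = f"{word:032b}"   # Step 1: each hex digit becomes 4 bits (leading zeros kept)
    return bits[0], bits[1:9], bits[9:]


print(f"{0xC0B40000:032b}")
sign, exponent, fraction = split_single(0xC0B40000)
print(sign, exponent, fraction)   # 1 10000001 01101000000000000000000
```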
Floating Point (FP) to Decimal
Step 3: Conversion
Sign = 1 (negative)
Exponent = $10000001_2 = 1 \times 2^7 + 1 \times 2^0 = 129 \Rightarrow 129 - 127 = 2$
Mantissa = $1.01101_2$
$N = -1.01101_2 \times 2^2 = -101.101_2 = -(1 \times 2^2 + 1 \times 2^0 + 1 \times 2^{-1} + 1 \times 2^{-3}) = -(4 + 1 + 0.5 + 0.125) = -5.625$
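Here is Step 3 as code. This sketch (hypothetical `decode_single`) applies $N = (-1)^S \times (1 + F) \times 2^{E-127}$ to normalized words only and cross-checks the result against Python's `struct` decoder.

```python
import struct


def decode_single(word: int) -> float:
    """Step 3: N = (-1)^S * (1 + F) * 2^(E - 127) for a normalized single-precision word."""
    s = word >> 31              # 1-bit sign field
    e = (word >> 23) & 0xFF     # 8-bit exponent field
    f = word & 0x7FFFFF         # 23-bit fraction field
    if not 0 < e < 255:
        raise ValueError("sketch handles normalized numbers only (no zero, subnormal, inf, NaN)")
    return (-1) ** s * (1 + f / 2 ** 23) * 2.0 ** (e - 127)


print(decode_single(0xC0B40000))   # -5.625
# Cross-check against Python's built-in IEEE 754 decoding
print(struct.unpack(">f", (0xC0B40000).to_bytes(4, "big"))[0])   # -5.625
```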
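Floating Point (FP): VHDL
library ieee;
use ieee.float_pkg.all;    -- VHDL-2008 floating-point package

-- declarations and statements inside a process:
variable x, y, z : float (5 downto -10);   -- 1 sign, 5 exponent, 10 fraction bits
begin
  y := to_float (3.1415, y);   -- Uses "y" for the sizing only.
  z := "0011010101010101";     -- approx. 1/3 (sign 0, exponent 01101, fraction 0101010101)
  x := z + y;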
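A small Python model can decode a 1 + 5 + 10 bit pattern to show what the value assigned to `z` means. The sketch assumes `float (5 downto -10)` uses an IEEE-style exponent bias of $2^{5-1}-1 = 15$. Under that assumption the layout matches IEEE 754 half precision, so the result can be cross-checked with `struct`'s `'e'` format. `decode_float` is a hypothetical helper.

```python
import struct


def decode_float(bits: str, exp_bits: int) -> float:
    """Value of a normalized (1 + exp_bits + frac_bits) floating-point pattern, IEEE-style bias."""
    frac_bits = len(bits) - 1 - exp_bits
    s = int(bits[0])
    e = int(bits[1:1 + exp_bits], 2)
    f = int(bits[1 + exp_bits:], 2)
    bias = 2 ** (exp_bits - 1) - 1
    return (-1) ** s * (1 + f / 2 ** frac_bits) * 2.0 ** (e - bias)


z = decode_float("0011010101010101", exp_bits=5)
print(z)   # 0.333251953125, the closest 16-bit value to 1/3
# Same layout as IEEE 754 half precision, so struct's 'e' format should agree
print(struct.unpack(">e", int("0011010101010101", 2).to_bytes(2, "big"))[0])
```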
Application of Floating Point DSP
ADSP SHARC: for the most computationally intensive, real-time signal-processing applications
Examples: high-definition audio, precision motor/tool control, precision sensors
(Figure: ADI SHARC DSP)