Lecture 09a
Numerical Issues
Lecture 09a, Slide 2
Learning Objectives
- Numerical issues and data formats.
- Fixed point.
- Fractional number.
- Floating point.
- Comparison of formats and dynamic ranges.
Lecture 09a, Slide 3
Numerical Issues and Data Formats
C6000 Numerical Representation
- Fixed point arithmetic:
  - 16-bit (integer or fractional).
  - Signed or unsigned.
- Floating point arithmetic:
  - 32-bit single precision.
  - 64-bit double precision.
Lecture 09a, Slide 4
Fixed Point Arithmetic - Definition
For simplicity a 4-bit representation is used:
Unsigned integer numbers

Bit weight:    2^3  2^2  2^1  2^0
Binary digit:   0    0    0    0

Binary Number   Decimal Equivalent
0000            0
Lecture 09a, Slide 5
Fixed Point Arithmetic - Definition
For simplicity a 4-bit representation is used:
Unsigned integer numbers

Bit weight:    2^3  2^2  2^1  2^0
Binary digit:   0    0    0    1

Binary Number   Decimal Equivalent
0000            0
0001            1
Lecture 09a, Slide 6
Fixed Point Arithmetic - Definition
For simplicity a 4-bit representation is used:
Unsigned integer numbers

Bit weight:    2^3  2^2  2^1  2^0
Binary digit:   0    0    1    0

Binary Number   Decimal Equivalent
0000            0
0001            1
0010            2
Lecture 09a, Slide 7
Fixed Point Arithmetic - Definition
For simplicity a 4-bit representation is used:
Unsigned integer numbers

Bit weight:    2^3  2^2  2^1  2^0
Binary digit:   1    1    1    1

Binary Number   Decimal Equivalent
0000            0
0001            1
0010            2
0011            3
0100            4
0101            5
0110            6
0111            7
1000            8
1001            9
1010            10
1011            11
1100            12
1101            13
1110            14
1111            15
Lecture 09a, Slide 8
Fixed Point Arithmetic - Definition
For simplicity a 4-bit representation is used:
Signed integer numbers

Bit weight:    -2^3  2^2  2^1  2^0
Binary digit:    0    0    0    0

Binary Number   Decimal Equivalent
0000            0
Lecture 09a, Slide 9
Fixed Point Arithmetic - Definition
For simplicity a 4-bit representation is used:
Signed integer numbers

Bit weight:    -2^3  2^2  2^1  2^0
Binary digit:    0    0    0    1

Binary Number   Decimal Equivalent
0000            0
0001            1
Lecture 09a, Slide 10
Fixed Point Arithmetic - Definition
For simplicity a 4-bit representation is used:
Signed integer numbers

Bit weight:    -2^3  2^2  2^1  2^0
Binary digit:    0    0    1    0

Binary Number   Decimal Equivalent
0000            0
0001            1
0010            2
Lecture 09a, Slide 11
Fixed Point Arithmetic - Definition
For simplicity a 4-bit representation is used:
Signed integer numbers

Bit weight:    -2^3  2^2  2^1  2^0
Binary digit:    0    1    1    1

Binary Number   Decimal Equivalent
0000            0
0001            1
0010            2
0011            3
0100            4
0101            5
0110            6
0111            7
Lecture 09a, Slide 12
Fixed Point Arithmetic - Definition
For simplicity a 4-bit representation is used:
Signed integer numbers

Bit weight:    -2^3  2^2  2^1  2^0
Binary digit:    1    0    0    0

Binary Number   Decimal Equivalent
0000            0
0001            1
0010            2
0011            3
0100            4
0101            5
0110            6
0111            7
1000            -8
Lecture 09a, Slide 13
Fixed Point Arithmetic - Definition
For simplicity a 4-bit representation is used:
Signed integer numbers

Bit weight:    -2^3  2^2  2^1  2^0
Binary digit:    1    0    0    1

Binary Number   Decimal Equivalent
0000            0
0001            1
0010            2
0011            3
0100            4
0101            5
0110            6
0111            7
1000            -8
1001            -7
Lecture 09a, Slide 14
Fixed Point Arithmetic - Definition
For simplicity a 4-bit representation is used:
Signed integer numbers

Bit weight:    -2^3  2^2  2^1  2^0
Binary digit:    1    1    1    1

Binary Number   Decimal Equivalent
0000            0
0001            1
0010            2
0011            3
0100            4
0101            5
0110            6
0111            7
1000            -8
1001            -7
1010            -6
1011            -5
1100            -4
1101            -3
1110            -2
1111            -1
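The two 4-bit tables differ only in the weight of the top bit (2^3 for unsigned, -2^3 for signed). A minimal C sketch of that decoding follows; the function names `decode_unsigned4` and `decode_signed4` are illustrative and not part of the lecture.

```c
#include <stdint.h>

/* Interpret the low 4 bits as an unsigned integer (weights 8, 4, 2, 1). */
int decode_unsigned4(uint8_t bits)
{
    return bits & 0x0F;
}

/* Interpret the low 4 bits as two's complement: the top bit has weight -8. */
int decode_signed4(uint8_t bits)
{
    int low  = bits & 0x07;          /* bits with weights 4, 2, 1 */
    int sign = (bits >> 3) & 0x01;   /* bit with weight -8 */
    return low - 8 * sign;
}
```

For example, the pattern 1001 decodes to 9 as an unsigned number and to -7 as a signed number.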
Lecture 09a, Slide 15
Fixed Point Arithmetic - Problems
The following equation is the basis of many DSP algorithms (See Lecture 01):

$y[n] = \sum_{k=0}^{N-1} a_k \, x[n-k]$

Two problems arise when using signed and unsigned integers:
- Multiplication overflow.
- Addition overflow.
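As a rough illustration of where the two overflow risks sit, the sum of products can be written in C as below. This is a sketch only: the name `fir_output`, the treatment of samples before x[0] as zero, and the 64-bit accumulator are assumptions made for the example, not the lecture's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* y[n] = sum over k of a[k] * x[n-k]; samples before x[0] are taken as 0.
 * Each 16-bit x 16-bit product needs 32 bits (multiplication overflow risk),
 * and the running sum needs more still (addition overflow risk), so this
 * sketch accumulates in 64 bits. */
int64_t fir_output(const int16_t *a, size_t num_taps,
                   const int16_t *x, size_t n)
{
    int64_t acc = 0;
    for (size_t k = 0; k < num_taps && k <= n; ++k) {
        acc += (int32_t)a[k] * (int32_t)x[n - k];
    }
    return acc;
}
```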
Lecture 09a, Slide 16
Multiplication Overflow
- 16-bit x 16-bit = 32-bit
- Example: using 4-bit representation

            0 0 1 1        3
    x       1 0 0 0    x   8
    ----------------   -----
    0 0 0 1 1 0 0 0       24

- 24 cannot be represented with 4 bits.
Lecture 09a, Slide 17
Addition Overflow
- 32-bit + 32-bit = 33-bit
- Example: using 4-bit representation

        1 0 0 0        8
    +   1 0 0 0    +   8
    -----------    -----
      1 0 0 0 0       16

- 16 cannot be represented with 4 bits.
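The two 4-bit examples can be reproduced in C by masking results to 4 bits, which is what a 4-bit register would keep. A minimal sketch, with hypothetical helper names `wrap4`, `mul4_wrapped` and `add4_wrapped`:

```c
#include <stdint.h>

/* Keep only the low 4 bits, as a 4-bit register would. */
uint8_t wrap4(unsigned value)
{
    return (uint8_t)(value & 0x0Fu);
}

/* 4-bit unsigned multiply with the upper bits of the product discarded. */
uint8_t mul4_wrapped(uint8_t a, uint8_t b)
{
    return wrap4((unsigned)a * b);
}

/* 4-bit unsigned add with the carry out discarded. */
uint8_t add4_wrapped(uint8_t a, uint8_t b)
{
    return wrap4((unsigned)a + b);
}
```

With these, 3 x 8 yields 8 (the low four bits of 0001 1000) and 8 + 8 yields 0 (the carry is lost).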
Lecture 09a, Slide 18
Fixed Point Arithmetic - Solution
The solutions for reducing the overflow problem are:
- Saturate the result.
- Use double precision result.
- Use fractional arithmetic.
- Use floating point arithmetic.
Lecture 09a, Slide 19
Solution - Saturate the result
Unsigned numbers:
- If A x B <= 15, result = A x B
- If A x B > 15, result = 15

            0 0 1 1        3
    x       1 0 0 0    x   8
    ----------------   -----
    0 0 0 1 1 0 0 0       24
            1 1 1 1       15   Saturated
Lecture 09a, Slide 20
Solution - Saturate the result
Signed numbers:
- If -8 <= A x B <= 7, result = A x B
- If A x B > 7, result = 7
- If A x B < -8, result = -8

            0 0 1 1        3
    x       1 0 0 0    x  -8
    ----------------   -----
    1 1 1 0 1 0 0 0      -24
            1 0 0 0       -8   Saturated
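The saturation rules for the 4-bit case translate directly into clamping functions. The sketch below uses hypothetical names `sat_u4` and `sat_s4`; clamping negative inputs to 0 in the unsigned case is an added assumption, since the slide only covers products of unsigned operands.

```c
/* Clamp a wide result to the unsigned 4-bit range 0..15. */
int sat_u4(int value)
{
    if (value > 15) return 15;
    if (value < 0)  return 0;    /* assumption: not covered by the slide */
    return value;
}

/* Clamp a wide result to the signed 4-bit range -8..7. */
int sat_s4(int value)
{
    if (value > 7)  return 7;
    if (value < -8) return -8;
    return value;
}
```

Applied to the two examples, `sat_u4(3 * 8)` gives 15 and `sat_s4(3 * -8)` gives -8.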
Lecture 09a, Slide 21
Solution - Double precision result
For a 4-bit x 4-bit multiplication hold the result in an 8-bit location.
Problems:
- Uses more memory for storing data.
- If the result is used in another multiplication, the data needs to be converted back into single precision format (e.g. prod = prod x sum).
- Results need to be scaled down if they are to be sent to a D/A converter.
Lecture 09a, Slide 22
Solution - Fractional arithmetic
If A and B are fractional (|A| < 1 and |B| < 1) then:
- |A x B| <= min(|A|, |B|)
- i.e. the magnitude of the result is no larger than that of either operand, hence the product will never overflow.
- Examples:
  - 0.6 x 0.2 = 0.12 (0.12 < 0.6 and 0.12 < 0.2)
  - 0.9 x 0.9 = 0.81 (0.81 < 0.9)
  - 0.1 x 0.1 = 0.01 (0.01 < 0.1)
Lecture 09a, Slide 23
Fractional numbers
Definition:

Bit weight:   -2^0   2^-1   2^-2   ...   2^-(N-1)
              (-1)   (0.5)  (0.25)

What is the largest number?

      0 1 1 ... 1   = MAX
    + 0 0 0 ... 1   = 2^-(N-1)
    -------------
      1 0 0 ... 0   = MAX + 2^-(N-1) = 1

Largest Number: MAX = 1 - 2^-(N-1)
Lecture 09a, Slide 24
Fractional numbers
Definition:

Bit weight:   -2^0   2^-1   2^-2   ...   2^-(N-1)

What is the smallest number?

      1 0 0 ... 0   = MIN = -1

Smallest Number: MIN = -1

For 16-bit representation:
- MAX = 1 - 2^-15 = 0.999969
- MIN = -1
- -1 <= x < 1
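The 16-bit limits can be checked numerically by dividing the raw 16-bit integer by 2^15. A minimal sketch; `q15_to_double` is an illustrative name, not a C6000 library function.

```c
#include <stdint.h>

/* Convert a 16-bit signed fractional (Q15) value to its real value.
 * The bit weights are -2^0, 2^-1, ..., 2^-15, so the value is raw / 2^15. */
double q15_to_double(int16_t raw)
{
    return (double)raw / 32768.0;
}
```

Here 0x7FFF maps to 1 - 2^-15 (about 0.999969) and 0x8000, i.e. -32768, maps to -1.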
Lecture 09a, Slide 25
Fractional numbers - Sign Extension
To keep the same resolution as the operands we need to select these 4 bits:

a = 0 1 1 0 = 0.5 + 0.25 = 0.75
b = 1 1 1 0 = -1 + 0.5 + 0.25 = -0.25

                0 1 1 0      a
    x           1 1 1 0      b
    -------------------
        0 0 0 0 0 0 0 0
        0 0 0 0 1 1 0 .
        0 0 0 1 1 0 . .
        1 1 0 1 0 . . .      sign bit of b: -a, with sign extension
    -------------------
        1 1 1 1 0 1 0 0

Selected bits: 1 [1 1 1 0] 1 0 0, giving 1 1 1 0 = -0.25 (the exact product -0.1875 truncated to 4 bits).
Lecture 09a, Slide 26
Q-Format
IQ-Math
Lecture 09a, Slide 27
Fractional numbers - Sign Extension
The way to do it is to shift left by one bit and store the upper 4 bits, or to shift right by three and store the lower 4 bits:

a = 0 1 1 0 = 0.5 + 0.25 = 0.75
b = 1 1 1 0 = -1 + 0.5 + 0.25 = -0.25

                0 1 1 0      a
    x           1 1 1 0      b
    -------------------
        0 0 0 0 0 0 0 0
        0 0 0 0 1 1 0 .
        0 0 0 1 1 0 . .
        1 1 0 1 0 . . .      sign bit of b: -a, with sign extension
    -------------------
        1 1 1 1 0 1 0 0

The leading bits of each partial product and of the result are sign extension bits.

- Shift left by 1:   1 1 1 0 1 0 0 0, upper 4 bits = 1 1 1 0
- Shift right by 3:  1 1 1 1 1 1 1 0, lower 4 bits = 1 1 1 0
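The 4-bit worked example can be reproduced in C. This is a sketch under stated assumptions: operands are held in `int8_t` but restricted to the 4-bit range -8..7 (value = n/8), the name `q3_mul` is hypothetical, and `>>` on a negative `int` is assumed to be an arithmetic shift (true of common compilers, though implementation-defined in C).

```c
#include <stdint.h>

/* Multiply two 4-bit signed fractions (3 fractional bits each).
 * The 8-bit product has 6 fractional bits and a duplicated sign bit;
 * shifting right by 3 and keeping the low 4 bits restores the operand format. */
int8_t q3_mul(int8_t a, int8_t b)
{
    int product = a * b;             /* e.g. 6 * -2 = -12 = 1111 0100b */
    return (int8_t)(product >> 3);   /* -12 >> 3 = -2 = 1110b          */
}
```

With a = 0110b (0.75) and b = 1110b (-0.25) the function returns -2, i.e. 1110b = -0.25. The case -1 x -1 is not handled here and would overflow the 4-bit range.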
Lecture 09a, Slide 28
15-bit * 15-bit Multiplication

CPU:
    MPY  A3,A4,A6
    NOP

      Q15   s.xxxxxxxxxxxxxxx
    x Q15   s.yyyyyyyyyyyyyyy
    -------------------------
      Q30   ss.zzzzzzzzzzzzzzzzzzzzzzzzzzzzzz

Store to Data Memory:
    SHR  A6,15,A6
    STH  A6,*A7

      Q15   s.zzzzzzzzzzzzzzz
Lecture 09a, Slide 29
'C6000 C Data Types

Type                 Size      Representation
char, signed char    8 bits    ASCII
unsigned char        8 bits    ASCII
short                16 bits   2's complement
unsigned short       16 bits   binary
int, signed int      32 bits   2's complement
unsigned int         32 bits   binary
long, signed long    40 bits   2's complement
unsigned long        40 bits   binary
enum                 32 bits   2's complement
float                32 bits   IEEE 32-bit
double               64 bits   IEEE 64-bit
long double          64 bits   IEEE 64-bit
pointers             32 bits   binary
Lecture 09a, Slide 30
Fractional numbers - Sign Extension
Pseudo assembly language:

    A0 = 0x80000000          ; initial value (destination address)
    A1 = 0.5                 ; initial value (0x4000 in Q15)
    A2 = 0.5                 ; initial value (0x4000 in Q15)
    A3 = 0                   ; initial value

    MPY  A1, A2, A3          ; A3 = 0x10000000
    SHL  A3, 1, A3           ; A3 = 0x20000000
    STH  A3, *A0             ; 0x2000 -> 0x80000000 (upper half of A3)

or

    MPY  A1, A2, A3          ; A3 = 0x10000000
    SHR  A3, 15, A3          ; A3 = 0x00002000
    STH  A3, *A0             ; 0x2000 -> 0x80000000

Pseudo 'C' language:

    short a, b, result;
    int prod;

    prod = a * b;            /* Q15 x Q15 = Q30 */
    prod = prod >> 15;       /* back to Q15 */
    result = (short) prod;
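The pseudo C above can be wrapped into a small function with fixed-width types. This is a sketch: `q15_mul` is a hypothetical name, `>>` on a negative value is assumed to be an arithmetic shift, and the -1 x -1 case is deliberately left unhandled.

```c
#include <stdint.h>

/* Multiply two Q15 fractions and return a Q15 result.
 * The 32-bit product is Q30 (two sign bits); shifting right by 15
 * keeps the same resolution as the operands. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;   /* Q30 */
    return (int16_t)(prod >> 15);             /* Q15 */
}
```

With a = b = 0x4000 (0.5) the result is 0x2000 (0.25), matching the register values in the assembly listing.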
Lecture 09a, Slide 31
Fractional numbers - Problems
There are some problems that need to be resolved when using fractional numbers.
These are:
- Result of -1 x -1 = 1
- Accumulative overflow.
Lecture 09a, Slide 32
Problem of -1 x -1
We have seen that:
- -1 <= x < 1
- -1 x -1 = 1, which cannot be represented.
Solution:
- There are two instructions that saturate the result if you have -1 x -1:
  - SMPY
  - SMPYH
Lecture 09a, Slide 33
Problem of -1 x -1
In one cycle these instructions do the following:
- Multiply.
- Shift left by 1 bit.
- Saturate if the sign bits are 01.
It can be shown that:

                      Positive Result   Negative Result   -1 x -1 Result
Result of MPY(H)      00.xxx-xb         11.xxx-xb         01.xxx-xb
Result of SMPY(H)     0.xxx-xb          1.xxx-xb          0.xxx-xb
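The three steps (multiply, shift left by one, saturate) can be emulated in portable C. This is only a sketch of the behaviour described above, not TI's implementation; `smpy_sketch` is a hypothetical name.

```c
#include <stdint.h>

/* Emulate a saturating fractional multiply of two Q15 values,
 * returning a Q31 result. */
int32_t smpy_sketch(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;   /* Q30, two sign bits */
    if (prod == 0x40000000) {
        /* Only -1 x -1 produces sign bits 01; saturate to the largest Q31. */
        return INT32_MAX;
    }
    /* Shift left by 1, written as a multiply so negative values are well defined. */
    return prod * 2;
}
```

For example, 0.5 x 0.5 (0x4000 x 0x4000) gives 0x20000000, and -1 x -1 gives 0x7FFFFFFF instead of wrapping to a negative value.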
Lecture 09a, Slide 34
Problem of Accumulative Overflow
In this case the overflow is due to the summation:

$y[n] = \sum_{k=0}^{99} a_k \, x[n-k]$

Figure: 16-bit number circle marking 0x0000, 0x7ffe, 0x7fff, 0x8001 and 0xffff.

Examples:
- 0x7fff + 0x0002 = 0x8001 (positive number + positive number = negative number!)
- 0xffff + 0x0002 = 0x0001 (negative number + positive number = positive number; the sum wraps through zero and is correct)
Lecture 09a, Slide 35
Problem of Accumulative Overflow
Solutions:
(1) Saturate the intermediate results by using these add instructions:
    - SADD
    - SSUB
    If saturation occurs the SAT bit in the CSR is set to 1. You must clear it.
(2) Use guard bits:
    e.g. ADD A1, A2, A1:A0
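Both remedies can be sketched in C. The saturating add below works on 16-bit values to match the 0x7fff example (the C6000 SADD instruction itself operates on 32-bit operands), and the guard-bit idea is illustrated with a 64-bit accumulator standing in for the 40-bit A1:A0 register pair. The names `add16_sat` and `accumulate_guarded` are hypothetical.

```c
#include <stdint.h>

/* Saturating 16-bit add: clamp instead of wrapping around. */
int16_t add16_sat(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Guard bits: accumulate 32-bit values in a wider register so that
 * intermediate sums have headroom above bit 31. */
int64_t accumulate_guarded(const int32_t *values, int count)
{
    int64_t acc = 0;
    for (int i = 0; i < count; ++i) {
        acc += values[i];
    }
    return acc;
}
```

With saturation, 0x7fff + 0x0002 stays at 0x7fff rather than becoming 0x8001.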
Lecture 09a, Slide 36
Problem of Accumulative Overflow
Solutions:
(3) Do nothing if the system is Non-Gain:
    With a non-gain system the final result is always less than unity.

Example system:

$y[n] = \sum_{k=0}^{99} a_k \, x[n-k]$

This will be non-gain if:

$\sum_{k=0}^{99} |a_k| \le 1$ and $|x_i| \le 1$
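The non-gain condition on the coefficients can be checked before a filter is deployed. A minimal sketch, assuming the coefficients are stored as Q15 values (so 1.0 corresponds to 32768); `is_non_gain_q15` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if the sum of |a_k| does not exceed 1.0 (32768 in Q15), else 0. */
int is_non_gain_q15(const int16_t *a, size_t num_taps)
{
    int64_t sum = 0;
    for (size_t k = 0; k < num_taps; ++k) {
        int32_t value = a[k];
        sum += (value < 0) ? -value : value;
    }
    return sum <= 32768;
}
```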
Lecture 09a, Slide 37
Floating Point Arithmetic
The C67xx supports both single and double precision floating point formats.
The single precision format is as follows:

Bit:     31   | 30 ... 23 | 22 ... 0
Field:   s    | e  ...  e | m  ... m
Width:   1-bit | 8-bits   | 23-bits

$\text{value} = (-1)^{\text{sign}} \times (1.\text{mantissa}) \times 2^{(\text{exponent} - 127)}$

s = sign bit
e = exponent (8-bit, biased by 127)
m = mantissa (23-bit normalised fraction)
Lecture 09a, Slide 38
Floating Point Arithmetic Example
Example: Conversion between integer and floating point.
Convert 'dd' to the IEEE floating point format:

    int dd = 0x60000000;
    float flot1;

    flot1 = (float) dd;
Lecture 09a, Slide 39
Floating Point Arithmetic Example
To view the value of "flot1" use:
    View: Memory: Address = &flot1
We find:
    flot1 = 0x4EC0 0000
Lecture 09a, Slide 40
Floating Point Arithmetic Example
Let us check to see if we have the same number:

Hex:      4    E    C    0    0    0    0    0
Binary:   0100 1110 1100 0000 0000 0000 0000 0000
Fields:   s = 0 | exponent = 10011101 | mantissa = 100 0000 0000 0000 0000 0000

s = 0
e = 10011101b = 128+16+8+4+1 = 157
m = 0.100b = 0.5

flot1 = (-1)^0 * (1.5) * 2^(157-127) = 1.5 * 2^30
      = 1610612736 decimal
      = 0x6000 0000
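The same check can be done in code by copying the float's bits into an integer and splitting the fields. This sketch assumes `float` is the IEEE 32-bit format (as in the C6000 data type table); the names `float_bits`, `float_fields` and `split_float` are illustrative.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t sign;      /* 1 bit   */
    uint32_t exponent;  /* 8 bits, biased by 127 */
    uint32_t mantissa;  /* 23 bits */
} float_fields;

/* Return the raw 32-bit pattern of a float. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Split a float into its sign, exponent and mantissa fields. */
float_fields split_float(float f)
{
    uint32_t u = float_bits(f);
    float_fields fields = { u >> 31, (u >> 23) & 0xFFu, u & 0x7FFFFFu };
    return fields;
}
```

Converting the integer 0x60000000 to float and passing it to `split_float` gives sign 0, exponent 157 and mantissa 0x400000, i.e. the pattern 0x4EC00000.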
Lecture 09a, Slide 41
Floating Point IEEE Standard
Special values:

s    e            m         Number
0    0            0         0
1    0            0         -0
s    0            not 0     (-1)^s * 0.m * 2^-126
s    0 < e < 255  m         (-1)^s * 1.m * 2^(e-127)
0    255          0         +infinity
1    255          0         -infinity
s    255          not 0     NaN (not a number)
Lecture 09a, Slide 42
Floating Point IEEE Standard
Dynamic range:
- Largest positive number:
  e(max) = 254 (e = 255 is reserved for infinity and NaN), m(max) = 1 - 2^-23
  max = [1 + (1 - 2^-23)] * 2^(254-127) = 3.4 * 10^38
- Smallest positive (normalised) number:
  e(min) = 1 (e = 0 is reserved for zero and denormalised numbers), m(min) = 0
  min = 1.0 * 2^(1-127) = 2^-126 = 1.18 * 10^-38

$\text{value} = (-1)^{\text{sign}} \times (1.\text{mantissa}) \times 2^{(\text{exponent} - 127)}$
Lecture 09a, Slide 43
Floating Point IEEE Standard
Dynamic range:
- Largest negative number (largest magnitude):
  e(max) = 254, m(max) = 1 - 2^-23
  max = -[1 + (1 - 2^-23)] * 2^(254-127) = -3.4 * 10^38
- Smallest negative (normalised) number (smallest magnitude):
  e(min) = 1, m(min) = 0
  min = -1.0 * 2^(1-127) = -2^-126 = -1.18 * 10^-38

$\text{value} = (-1)^{\text{sign}} \times (1.\text{mantissa}) \times 2^{(\text{exponent} - 127)}$
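The range limits can be cross-checked against the constants in `<float.h>` by assembling a float from its fields. This sketch assumes `float` is IEEE 754 single precision, where e = 255 is reserved for infinities and NaN (so the largest finite value has e = 254) and e = 0 is reserved for zero and denormalised numbers (so the smallest normalised value has e = 1). The name `float_from_fields` is hypothetical.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Build a float from a sign bit, an 8-bit biased exponent and a 23-bit mantissa. */
float float_from_fields(uint32_t s, uint32_t e, uint32_t m)
{
    uint32_t u = ((s & 1u) << 31) | ((e & 0xFFu) << 23) | (m & 0x7FFFFFu);
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

Under these assumptions `float_from_fields(0, 254, 0x7FFFFF)` equals `FLT_MAX` (about 3.4 x 10^38) and `float_from_fields(0, 1, 0)` equals `FLT_MIN` (about 1.18 x 10^-38).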
Lecture 09a, Slide 44
Floating/Fixed Point Summary
Floating point single precision:

Bit:     31   | 30 ... 23 | 22 ... 0
Field:   s    | e  ...  e | m  ... m
Width:   1-bit | 8-bits   | 23-bits

$\text{value} = (-1)^{s} \times 1.m \times 2^{e-127}$

Floating point double precision (odd:even registers):

Bit:     63   | 62 ... 52 | 51 ... 0
Field:   s    | e  ...  e | m  ... m
Width:   1-bit | 11-bits  | 52-bits

$\text{value} = (-1)^{s} \times 1.m \times 2^{e-1023}$
Lecture 09a, Slide 45
Floating/Fixed Point Summary (Short: N = 16; Int: N = 32)

Unsigned integer:
Bit weight:   2^(N-1)    ...   2^1    2^0
Bit:          x          ...   x      x

Signed integer:
Bit weight:   -2^(N-1)   ...   2^1    2^0
Bit:          x          ...   x      x

Signed fractional:
Bit weight:   -2^0   2^-1   2^-2   ...   2^-(N-1)
Bit:          x      x      x      ...   x
Lecture 09a, Slide 46
Floating/Fixed Point Dynamic Range

Format                               Largest Number   Smallest Number   Smallest Number
                                     (positive)       (positive)        (negative)
Floating Point Single Precision      3.4 x 10^38      1.2 x 10^-38      -3.4 x 10^38
Fixed Point, Integer, 16-bit         2^15 - 1         1                 -2^15
Fixed Point, Integer, 32-bit         2^31 - 1         1                 -2^31
Fixed Point, Fractional, 16-bit      1 - 2^-15        2^-15             -1
Fixed Point, Fractional, 32-bit      1 - 2^-31        2^-31             -1
Lecture 09a, Slide 47
Numerical Issues - Useful Tips
- Multiply by 2: Use shift left
- Divide by 2: Use shift right
- Log2 N: Use shift
- Sine, Cosine, Log: Use look up tables
- To convert a fractional number to hex:
  - Num x 2^15
  - Then convert to hex
  - e.g. convert 0.5 to hex: 0.5 x 2^15 = 16384, and (16384)dec = (0x4000)hex
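The fractional-to-hex recipe can be written as a small C helper. This is a sketch: `double_to_q15` is a hypothetical name, and the clamping of out-of-range inputs and truncation toward zero are choices made for the example rather than anything stated in the lecture.

```c
#include <stdint.h>

/* Convert a real number in [-1, 1) to 16-bit fractional format: Num x 2^15. */
int16_t double_to_q15(double x)
{
    double scaled = x * 32768.0;
    if (scaled >= 32767.0)  return INT16_MAX;   /* clamp: +1.0 is not representable */
    if (scaled <= -32768.0) return INT16_MIN;
    return (int16_t)scaled;                     /* truncates toward zero */
}
```

Printing the result with `printf("0x%04X", (unsigned)(uint16_t)double_to_q15(0.5))` shows the value as a 16-bit hex pattern.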
Lecture 09a, Slide 48
Numerical Issues - 32-bit Multiplication
It is possible to perform 32-bit multiplication using 16-bit multipliers.
Example: c = a x b (with 32-bit values).

    a = | a_h | a_l |     (32 bits)
    b = | b_h | b_l |     (32 bits)

    a * b = ((a_h << 16) + a_l) * ((b_h << 16) + b_l)
          = [(a_h * b_h) << 32] + [(a_l * b_h) << 16] +
            [(a_h * b_l) << 16] + [a_l * b_l]
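The decomposition can be checked in C for unsigned operands. This is a sketch only: `mul32_from_16` is a hypothetical name, and handling of signed operands (which needs signed and mixed-sign 16-bit multiplies) is left out.

```c
#include <stdint.h>

/* 32-bit x 32-bit unsigned multiply built from four 16-bit x 16-bit products. */
uint64_t mul32_from_16(uint32_t a, uint32_t b)
{
    uint32_t ah = a >> 16, al = a & 0xFFFFu;
    uint32_t bh = b >> 16, bl = b & 0xFFFFu;

    uint64_t hh = (uint64_t)ah * bh;   /* weight 2^32 */
    uint64_t lh = (uint64_t)al * bh;   /* weight 2^16 */
    uint64_t hl = (uint64_t)ah * bl;   /* weight 2^16 */
    uint64_t ll = (uint64_t)al * bl;   /* weight 2^0  */

    return (hh << 32) + (lh << 16) + (hl << 16) + ll;
}
```

The result matches a direct 64-bit multiply of the same operands.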
Lecture 09a, Slide 49
Links
Further reading:
- Understanding TMS320C62xx DSP Single-precision Floating-Point Functions: spra515.pdf
- TMS320C6000 Integer Division: spra707.pdf
Lecture 09a
Numerical Issues
- End -