Floating Point Representations CDA 3101 Discussion Session 02.
Question 1
• Convert the binary number 1010 0100 1001 0010 0100 1001 0010 0100 (base 2) to decimal, if the bit pattern is interpreted as
a. Unsigned?
b. 2's complement?
c. Single-precision floating point?
Question 1.1
• Converting binary (unsigned) to decimal: 1010 0100 1001 0010 0100 1001 0010 0100 (base 2)
$1\cdot2^{31} + 1\cdot2^{29} + \dots + 1\cdot2^{8} + 1\cdot2^{5} + 1\cdot2^{2}$
$= 2761050404$
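Question 1.2
• Converting binary (2's complement) to decimal: 1010 0100 1001 0010 0100 1001 0010 0100 (base 2)
The most significant bit carries weight $-2^{31}$; all other bits keep their positive weights.
$-1\cdot2^{31} + 1\cdot2^{29} + \dots + 1\cdot2^{8} + 1\cdot2^{5} + 1\cdot2^{2}$
$= -1533916892$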
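The two integer interpretations can be checked with a short Python sketch. The variable names are illustrative only; the bit string is the one from Question 1.

```python
# Interpret the same 32-bit pattern as unsigned and as 2's complement.
bits = "1010 0100 1001 0010 0100 1001 0010 0100".replace(" ", "")

# Unsigned: plain positional value of the 32 bits.
unsigned = int(bits, 2)

# 2's complement: the top bit weighs -2^31 instead of +2^31,
# which is the same as subtracting 2^32 when that bit is set.
signed = unsigned - (1 << 32) if bits[0] == "1" else unsigned

print(unsigned)  # 2761050404
print(signed)    # -1533916892
```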
Question 1.3
• Converting binary (single-precision FP) to decimal: 1010 0100 1001 0010 0100 1001 0010 0100 (base 2)
Field layout: S (1 bit) | Biased Exponent (8 bits) | Fraction (23 bits)
Sign bit: 1
Exponent: 01001001 = 73
Fraction: 00100100100100100100100 = $1\cdot2^{-3} + 1\cdot2^{-6} + \dots + 1\cdot2^{-15} + 1\cdot2^{-18} + 1\cdot2^{-21} \approx 0.142857075$
$(-1)^{S} \times (1.\text{Fraction}) \times 2^{\text{Exponent} - 127}$
$= (-1)^{1} \times 1.142857075 \times 2^{73 - 127}$
$= -1.142857075 \times 2^{-54}$
$\approx -6.344131191 \times 10^{-17}$
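As a cross-check, the field extraction and the normalized-number formula can be sketched in Python. The helper name `decode_normalized_single` is hypothetical, and it deliberately handles only the normalized case used in this question; `struct` is used as an independent reference.

```python
import struct

def decode_normalized_single(word):
    """Split a 32-bit word into sign, biased exponent and fraction,
    then apply (-1)^S * (1.Fraction) * 2^(Exponent - 127).
    Assumes a normalized number (biased exponent in 1..254)."""
    sign = word >> 31
    biased_exp = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    value = (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (biased_exp - 127)
    return sign, biased_exp, fraction, value

word = 0b1010_0100_1001_0010_0100_1001_0010_0100
sign, biased_exp, fraction, value = decode_normalized_single(word)

# Independent check: let struct reinterpret the same 32 bits as a float.
reference = struct.unpack(">f", struct.pack(">I", word))[0]

print(sign, biased_exp, format(fraction, "023b"))
print(value, reference)
```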
Question 2
• Show the IEEE 754 binary representation for the floating-point number $0.1_{10}$ in single precision and double precision.
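Question 2.1
• Converting $0.1_{10}$ to single-precision FP
Step 1: Convert the fraction 0.1 to binary by repeatedly multiplying by 2; the integer part of each product is the next bit.
0.1*2 = 0.2, 0.2*2 = 0.4, 0.4*2 = 0.8, 0.8*2 = 1.6, 0.6*2 = 1.2, 0.2*2 = 0.4, 0.4*2 = 0.8, 0.8*2 = 1.6, 0.6*2 = 1.2, …
$0.1_{10} = 0.000110011\ldots_2 = 1.10011\ldots_2 \times 2^{-4}$
Step 2: Express in single-precision format, $(-1)^{S} \times (1.\text{Fraction}) \times 2^{\text{Exponent} - 127}$, where the stored (biased) Exponent is the actual exponent plus 127.
$= (-1)^{0} \times (1.10011001100110011001100)_2 \times 2^{(-4 + 127) - 127}$
Biased exponent: -4 + 127 = 123 = 01111011
0 01111011 10011001100110011001100
The fraction above is truncated to 23 bits. With the default IEEE 754 round-to-nearest mode the last fraction bit rounds up, giving 0 01111011 10011001100110011001101.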
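The repeated-doubling step and the resulting single-precision fields can be sketched in Python. `fraction_bits` is a hypothetical helper; `Fraction` is used so the doubling is exact rather than itself subject to floating-point error. Note that `struct` rounds to nearest when packing, so the last stored fraction bit is 1 rather than the 0 obtained by truncating.

```python
from fractions import Fraction
import struct

def fraction_bits(x, n):
    """Return the first n binary digits of x (0 <= x < 1) by repeated doubling."""
    digits = []
    for _ in range(n):
        x *= 2
        bit = int(x)          # integer part of the product is the next bit
        digits.append(str(bit))
        x -= bit              # keep only the fractional part
    return "".join(digits)

expansion = fraction_bits(Fraction(1, 10), 12)
print("0." + expansion)       # 0.000110011001

# Bit pattern actually stored for 0.1 in single precision (round-to-nearest).
stored = format(struct.unpack(">I", struct.pack(">f", 0.1))[0], "032b")
print(stored[0], stored[1:9], stored[9:])
```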
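Question 2.2
• Converting $0.1_{10}$ to double-precision FP
Step 1: Convert the fraction 0.1 to binary by repeatedly multiplying by 2; the integer part of each product is the next bit.
0.1*2 = 0.2, 0.2*2 = 0.4, 0.4*2 = 0.8, 0.8*2 = 1.6, 0.6*2 = 1.2, 0.2*2 = 0.4, 0.4*2 = 0.8, 0.8*2 = 1.6, 0.6*2 = 1.2, …
$0.1_{10} = 0.000110011\ldots_2 = 1.10011\ldots_2 \times 2^{-4}$
Step 2: Express in double-precision format, $(-1)^{S} \times (1.\text{Fraction}) \times 2^{\text{Exponent} - 1023}$, where the stored (biased) Exponent is the actual exponent plus 1023.
$= (-1)^{0} \times (1.1001100110011001\ldots)_2 \times 2^{(-4 + 1023) - 1023}$
Biased exponent: -4 + 1023 = 1019 = 01111111011
0 01111111011 1001100110011001100110011001100110011001100110011001
The fraction above is truncated to 52 bits. With the default IEEE 754 round-to-nearest mode the final group 1001 rounds up to 1010.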
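The double-precision fields can be inspected the same way. This is a minimal sketch using only the standard `struct` module; the variable names are illustrative. Because packing rounds to nearest, the stored fraction ends in 1010 rather than the truncated 1001.

```python
import struct

# Reinterpret the 64-bit double 0.1 as an unsigned integer, then as a bit string.
word = struct.unpack(">Q", struct.pack(">d", 0.1))[0]
bits = format(word, "064b")

# 1 sign bit, 11 exponent bits, 52 fraction bits.
sign, exponent, fraction = bits[0], bits[1:12], bits[12:]

print(sign, exponent, fraction)
print(int(exponent, 2) - 1023)  # actual exponent: -4
```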
Question 3
• Convert the following single-precision numbers into decimal
a. 0 11111111 00000000000000000000000
b. 0 00000000 00000000000000000000010
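Question 3.1
• Converting binary (single-precision FP) to decimal: 0 11111111 00000000000000000000000 (base 2)
Field layout: S (1 bit) | Biased Exponent (8 bits) | Fraction (23 bits)
Sign bit: 0
Exponent: 11111111 = 255 (all ones, reserved for infinity and NaN)
Fraction: 00000000000000000000000 = 0
An all-ones exponent with a zero fraction and sign 0 encodes +Infinity.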
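Question 3.2
• Converting binary (single-precision FP) to decimal: 0 00000000 00000000000000000000010 (base 2)
Field layout: S (1 bit) | Biased Exponent (8 bits) | Fraction (23 bits)
Sign bit: 0
Exponent: 00000000 = 0 (with a nonzero fraction this is a denormalized number, so there is no implicit leading 1)
Fraction: 00000000000000000000010 = $1\cdot2^{-22} \approx 0.000000238$
$(-1)^{S} \times (0.\text{Fraction}) \times 2^{-126}$
$= (-1)^{0} \times 2^{-22} \times 2^{-126}$
$= 2^{-148} \approx 2.802596929 \times 10^{-45}$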
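Both special cases in Question 3 can be reproduced with a small decoder sketch. `decode_single_any` is a hypothetical helper that adds the all-ones and all-zeros exponent cases to the normalized formula; `struct` serves as an independent check.

```python
import math
import struct

def decode_single_any(word):
    """Decode a 32-bit single-precision pattern, including the special
    exponent values 255 (infinity/NaN) and 0 (zero/denormalized)."""
    sign = -1.0 if word >> 31 else 1.0
    biased_exp = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    if biased_exp == 0xFF:
        return sign * math.inf if fraction == 0 else math.nan
    if biased_exp == 0:
        # Denormalized: no hidden 1, exponent fixed at -126.
        return sign * (fraction / 2**23) * 2.0 ** -126
    return sign * (1 + fraction / 2**23) * 2.0 ** (biased_exp - 127)

a = decode_single_any(0b0_11111111_00000000000000000000000)
b = decode_single_any(0b0_00000000_00000000000000000000010)

# Independent check for case b via struct.
b_reference = struct.unpack(">f", struct.pack(">I", 0b10))[0]

print(a)               # inf
print(b, b_reference)  # both equal 2**-148
```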
Question 4
• Consider the 80-bit extended-precision IEEE 754 floating point standard that uses 1 bit for the sign, 16 bits for the biased exponent and 63 bits for the fraction (f). Then, write (i) the 80-bit extended-precision floating point representation in binary and (ii) the corresponding value in base-10 positional (decimal) system of
a. the third smallest positive normalized number
b. the largest (farthest from zero) negative normalized number
c. the third smallest positive denormalized number
that can be represented.
Question 4.1
• The third smallest positive normalized number
Bias: $2^{15} - 1 = 32767$
Sign: 0
Biased Exponent: 0000 0000 0000 0001
Fraction (f): 61 zeros followed by 10
Decimal Value: $(-1)^{0} \times 2^{1 - 32767} \times (1 + 2^{-62}) = 2^{-32766} + 2^{-32828}$
Question 4.2
• The largest (farthest from zero) negative normalized number
Sign: 1
Biased Exponent: 1111 1111 1111 1110
Fraction: 63 ones
Decimal Value: $(-1)^{1} \times 2^{65534 - 32767} \times (1 + 2^{-1} + 2^{-2} + \dots + 2^{-63}) = -2^{32767} \times (2^{64} - 1) \times 2^{-63} \approx -2^{32768}$
Question 4.3
• The third smallest positive denormalized number
Sign: 0
Biased Exponent: 0000 0000 0000 0000
Fraction: 61 zeros followed by 11
Decimal Value: $(-1)^{0} \times 2^{-32766} \times (2^{-62} + 2^{-63}) = 3 \times 2^{-32829}$
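These values are far outside the range of a hardware double, so a check needs exact rational arithmetic. The sketch below assumes the layout exactly as stated in Question 4 (1 sign bit, 16 exponent bits, 63 fraction bits, implicit leading 1 for normalized numbers); the function name `ext_value` and the constants are illustrative.

```python
from fractions import Fraction

EXP_BITS, FRAC_BITS = 16, 63
BIAS = 2 ** (EXP_BITS - 1) - 1  # 32767

def ext_value(sign, biased_exp, fraction):
    """Exact value of a finite number in the 1/16/63 layout of Question 4."""
    scaled = Fraction(fraction, 2 ** FRAC_BITS)
    if biased_exp == 0:
        # Denormalized: no hidden 1, exponent fixed at 1 - BIAS.
        magnitude = scaled * Fraction(2) ** (1 - BIAS)
    else:
        magnitude = (1 + scaled) * Fraction(2) ** (biased_exp - BIAS)
    return -magnitude if sign else magnitude

third_normal = ext_value(0, 1, 0b10)                              # Question 4.1
largest_negative = ext_value(1, 2**EXP_BITS - 2, 2**FRAC_BITS - 1)  # Question 4.2
third_denormal = ext_value(0, 0, 0b11)                            # Question 4.3

print(third_normal == Fraction(1, 2**32766) + Fraction(1, 2**32828))
print(largest_negative == -(2**64 - 1) * 2**(32767 - 63))
print(third_denormal == Fraction(3, 2**32829))
```

All three comparisons print True, matching the closed forms in Questions 4.1 to 4.3.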