Ellen Spertus MCS 111 October 11, 2001 Floating Point Arithmetic.
Ellen Spertus
MCS 111
October 11, 2001
Floating Point Arithmetic
Decimal addition (1)
• Problem: 9.999×10¹ + 1.610×10⁻¹
• Estimate answer:
Decimal addition (2)
• Problem: 9.999×10¹ + 1.610×10⁻¹
• Calculate answer:
  9.999×10¹
+ 1.610×10⁻¹
Decimal addition (3)
• Problem: 9.999×10¹ + 1.610×10⁻¹
• How should we add them?
Floating point addition
• Adjust numbers to have same exponent
• Add the significands
• Normalize the sum
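The three steps above can be sketched in Python. This is a minimal illustration, not the method from the lecture: the function name `fp_add`, the integer encoding of significands (9999 stands for 9.999), the restriction to positive operands, and the choice to truncate extra digits are all assumptions made for the example.

```python
def fp_add(sig_a, exp_a, sig_b, exp_b, base=10, digits=4):
    """Add two positive floating-point numbers given as (integer significand, exponent).

    With digits=4, a significand of 9999 stands for 9.999, so the value
    represented is sig * base**(exp - (digits - 1)).
    Extra digits are truncated (an assumption of this sketch).
    """
    # Make operand "a" the one with the larger exponent.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a

    # Step 1: adjust to the same exponent. Scaling the larger operand up
    # (instead of shifting the smaller one down) keeps the arithmetic exact.
    sig_a *= base ** (exp_a - exp_b)
    exp = exp_b

    # Step 2: add the significands.
    total = sig_a + sig_b

    # Step 3: normalize back to `digits` digits, dropping low-order digits.
    while total >= base ** digits:
        total //= base
        exp += 1
    return total, exp


# 9.999 x 10^1 + 1.610 x 10^-1 (exact sum is 100.151)
print(fp_add(9999, 1, 1610, -1))  # (1001, 2), i.e. 1.001 x 10^2
```

The same function handles the binary case by passing `base=2`, for example `fp_add(0b1010, 2, 0b1101, -1, base=2)` for 1.010×2² + 1.101×2⁻¹.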
Binary addition (1)
• Problem: 1.01×2² + 1.101×2⁻¹
• Adjust numbers to have same exponent:
• Add the significands
• Normalize the sum
Binary addition (2)
• Problem: 1.11×2¹ + 1.01×2³
• Adjust numbers to have same exponent:
• Add the significands
• Normalize the sum
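One way to check hand-worked answers to the two binary problems is to do the arithmetic exactly and then normalize. The sketch below uses Python's `fractions.Fraction` so that no rounding occurs; the helper names `parse_binary` and `normalize` are made up for this example, and only positive values are handled.

```python
from fractions import Fraction


def parse_binary(sig_text, exp):
    """Exact value of a binary significand such as '1.101' times 2**exp."""
    whole, _, frac = sig_text.partition(".")
    bits = int(whole + frac, 2)
    return Fraction(bits, 2 ** len(frac)) * Fraction(2) ** exp


def normalize(value):
    """Write a positive value as (significand_text, exponent) with 1 <= significand < 2."""
    exp = 0
    while value >= 2:
        value /= 2
        exp += 1
    while value < 1:
        value *= 2
        exp -= 1
    # Emit fraction bits one at a time until nothing is left.
    text, frac = "1.", value - 1
    while frac:
        frac *= 2
        bit = int(frac >= 1)
        text += str(bit)
        frac -= bit
    return text.rstrip("."), exp


# Binary addition (1): 1.01 x 2^2 + 1.101 x 2^-1
print(normalize(parse_binary("1.01", 2) + parse_binary("1.101", -1)))  # ('1.011101', 2)
# Binary addition (2): 1.11 x 2^1 + 1.01 x 2^3
print(normalize(parse_binary("1.11", 1) + parse_binary("1.01", 3)))    # ('1.1011', 3)
```

These are the exact sums. A real format with a fixed number of significand bits would still have to round them.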
8-bit floating-point format (2)
• Exponent (3 bits) is biased by 3
• The leading one of significand is implicit
• Zero is represented by all zeros
| sign (1 bit) | exponent (3 bits) | significand (4 bits) | number (base 2) | number (base 10) |
|---|---|---|---|---|
| 0 | 100 | 0000 | | |
| 0 | 000 | 1000 | | |
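A decoder for this format might look like the sketch below. It applies only the three rules stated on the slide (bias of 3, implicit leading one, all zeros means zero). The function name `decode` is hypothetical. The sketch assumes the format has no infinities, NaNs, or denormalized numbers, since the slide does not mention any, and it treats a set sign bit with all other bits zero as zero too.

```python
from fractions import Fraction

BIAS = 3  # exponent bias stated on the slide


def decode(bits):
    """Decode an 8-bit pattern 'S EEE FFFF' (spaces optional) to an exact value."""
    bits = bits.replace(" ", "")
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 8 binary digits")
    sign = -1 if bits[0] == "1" else 1
    exp_field = int(bits[1:4], 2)
    frac_field = int(bits[4:], 2)
    if exp_field == 0 and frac_field == 0:
        return Fraction(0)  # zero is represented by all zeros
    significand = 1 + Fraction(frac_field, 16)  # implicit leading one
    return sign * significand * Fraction(2) ** (exp_field - BIAS)


print(decode("0 100 0000"))  # 2     (1.0000 x 2^1)
print(decode("0 000 1000"))  # 3/16  (1.1000 x 2^-3 = 0.1875)
```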
Practice
Add two numbers from previous slide
| sign (1 bit) | exponent (3 bits) | significand (4 bits) | number (base 2) | number (base 10) |
|---|---|---|---|---|
| 0 | 100 | 0000 | | |
| 0 | 000 | 1000 | | |
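As a rough check on this exercise, the sketch below adds the two values exactly and counts how many fraction bits the normalized sum needs. It assumes the two patterns decode to 2 and 3/16 under the rules of the previous slide; `fraction_bits_needed` is a name invented for the example and expects a positive value whose denominator is a power of two.

```python
from fractions import Fraction


def fraction_bits_needed(value):
    """Bits after the binary point once value is written as 1.xxx * 2**e."""
    sig = Fraction(value)
    while sig >= 2:
        sig /= 2
    while sig < 1:
        sig *= 2
    # sig is now in [1, 2) with denominator 2**k, so k fraction bits are needed.
    return sig.denominator.bit_length() - 1


a = Fraction(2)      # 0 100 0000 read as 1.0000 x 2^1
b = Fraction(3, 16)  # 0 000 1000 read as 1.1000 x 2^-3
total = a + b        # 35/16, i.e. 1.00011 x 2^1
print(total, fraction_bits_needed(total))  # 35/16 5
```

The exact sum needs five fraction bits, one more than the four the format stores, so it cannot be represented without rounding.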
Problem
Rounding (1)
• Round 1.00011 to have one fewer digit
• Modes
  – Always round up (IRS)
  – Always round down
  – Truncate
  – Round to nearest even
Rounding (2)
• Round -1.00011 to have one fewer digit
• Modes
  – Always round up (IRS)
  – Always round down
  – Truncate
  – Round to nearest even
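The four modes can be compared on both examples with a short sketch. It makes some assumptions: "round up" is read as rounding toward positive infinity and "round down" as rounding toward negative infinity, truncation drops bits (rounds toward zero), and the values are binary, so 1.00011 is 35/32. The function name `round_binary` and the mode strings are invented for this example.

```python
import math
from fractions import Fraction


def round_binary(value, frac_bits, mode):
    """Round an exact value to `frac_bits` bits after the binary point."""
    scaled = Fraction(value) * 2 ** frac_bits
    if mode == "up":              # toward +infinity
        n = math.ceil(scaled)
    elif mode == "down":          # toward -infinity
        n = math.floor(scaled)
    elif mode == "truncate":      # toward zero
        n = math.trunc(scaled)
    elif mode == "nearest_even":  # nearest, ties go to the even neighbor
        n = round(scaled)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return Fraction(n, 2 ** frac_bits)


x = Fraction(35, 32)  # binary 1.00011
for mode in ("up", "down", "truncate", "nearest_even"):
    print(mode, round_binary(x, 4, mode), round_binary(-x, 4, mode))
# up            9/8    -17/16
# down          17/16  -9/8
# truncate      17/16  -17/16
# nearest_even  9/8    -9/8
```

In binary, 9/8 is 1.0010 and 17/16 is 1.0001. Truncation agrees with rounding down for the positive value but with rounding up for the negative one.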
Ensuring accurate results
• Our significands are 4 bits wide.
• We use 6 bits when adding two significands.
  – Guard bit
  – Round bit
• Purpose: Accurate rounding
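The effect of the two extra bits can be seen in a small sketch. Significands are held as integers in units of 1/16, so 1.0111 is 23. The function `add_aligned`, its parameters, and the use of round-to-nearest-even at the end are assumptions for illustration; the slide does not specify the details. Normalization of a carry out of the top bit is left out to keep the example short.

```python
def add_aligned(sig_big, sig_small, shift, extra_bits):
    """Add two integer significands after shifting the smaller operand right.

    `extra_bits` low-order positions are kept during the addition
    (2 for a guard bit plus a round bit, 0 for none), then the sum is
    rounded back to the original width (nearest, ties to even).
    """
    big = sig_big << extra_bits
    # Bits shifted past the extra positions are lost here.
    small = (sig_small << extra_bits) >> shift
    total = big + small
    if extra_bits == 0:
        return total
    quotient, remainder = divmod(total, 1 << extra_bits)
    half = 1 << (extra_bits - 1)
    if remainder > half or (remainder == half and quotient % 2 == 1):
        quotient += 1
    return quotient


# 1.0000 x 2^2 + 1.0111 x 2^0; the exact sum is 5.4375
print(add_aligned(16, 23, 2, extra_bits=0))  # 21 -> 1.0101 x 2^2 = 5.25
print(add_aligned(16, 23, 2, extra_bits=2))  # 22 -> 1.0110 x 2^2 = 5.5
```

With the guard and round bits the result is the representable value closest to the exact sum. Without them, the bits discarded during alignment push the answer one step further away.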
Adding large numbers
• What if we add 1.1111×2⁴ + 1.1111×2⁴?
How can we get underflow?
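Both overflow and underflow come down to an exponent that does not fit in the exponent field. The sketch below assumes the 3-bit, bias-3 exponent from the 8-bit format earlier in the lecture, and assumes every field value from 000 to 111 is usable as an ordinary exponent. The function name `exponent_status` is made up for the example.

```python
BIAS = 3      # exponent bias from the 8-bit format
EXP_BITS = 3  # width of the exponent field


def exponent_status(true_exponent):
    """Return the 3-bit exponent field, or say why the exponent does not fit."""
    field = true_exponent + BIAS
    if field > 2 ** EXP_BITS - 1:
        return "overflow"   # exponent too large for the field
    if field < 0:
        return "underflow"  # exponent too small (too negative) for the field
    return format(field, "03b")


print(exponent_status(4))   # 111        largest exponent that fits
print(exponent_status(5))   # overflow   1.1111 x 2^4 + 1.1111 x 2^4 = 1.1111 x 2^5
print(exponent_status(-3))  # 000        smallest exponent that fits
print(exponent_status(-4))  # underflow  e.g. (1.0 x 2^-2) x (1.0 x 2^-2) = 1.0 x 2^-4
```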
Associativity of arithmetic
• (x+y)+z = x+(y+z)
• When is this true?
Breakdown of associativity
• Values
  – x = 1.0000
  – y = 0.00001
  – z = 0.00001
Assume rounding by truncation.
• (x+y)+z
• x+(y+z)
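The two groupings can be evaluated with a short sketch. It assumes the values are binary (so y = z = 2⁻⁵) and that every result is truncated to a leading one plus four fraction bits; the helper names `truncate` and `fp_add` are invented for the example and handle only non-negative values.

```python
import math
from fractions import Fraction

FRAC_BITS = 4  # fraction bits kept after the leading one (assumed)


def truncate(value):
    """Keep the leading one plus FRAC_BITS fraction bits; drop the rest."""
    if value == 0:
        return Fraction(0)
    exp, v = 0, Fraction(value)
    while v >= 2:
        v /= 2
        exp += 1
    while v < 1:
        v *= 2
        exp -= 1
    scale = Fraction(2) ** (FRAC_BITS - exp)
    return Fraction(math.floor(value * scale)) / scale


def fp_add(a, b):
    """Add exactly, then truncate the result to the format's precision."""
    return truncate(a + b)


x = Fraction(1)      # 1.0000
y = Fraction(1, 32)  # 0.00001 in binary
z = Fraction(1, 32)  # 0.00001 in binary

print(fp_add(fp_add(x, y), z))  # 1      (each small addend is truncated away)
print(fp_add(x, fp_add(y, z)))  # 17/16  (binary 1.0001)
```

Adding the two small values first lets their sum reach the last kept bit position, so the grouping changes the answer.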
MIPS floating point
• 32 floating-point registers (32 bits each)
• Instructions
  – Addition: add.s, add.d
  – Subtraction: sub.s, sub.d
  – Multiplication: mul.s, mul.d
  – Division: div.s, div.d
  – Comparison: c.x.s and c.x.d, where x is eq, neq, lt, le, gt, or ge
  – Conditional branch: bc1t, bc1f
Summary
• Computers aren’t limited to integers
• Floating-point arithmetic is quirky
  – Loss of precision due to rounding
  – Underflow
  – Overflow
• Big picture: Floating point arithmetic can be implemented with enough ______________________.