Floating Point Numbers

7
Floating Point Numbers Muddsar Jamil CS 147

description

Muddsar Jamil CS 147. Floating Point Numbers. Introduction & Representation. Provides the ability to represent very large numbers, as well as very small numbers. Example: 1 Trillion = 40 bits to left of radix pt. Retaining as much precision as needed increase calculation efficiency. - PowerPoint PPT Presentation

Transcript of Floating Point Numbers

Page 1: Floating Point Numbers

Floating Point Numbers

Muddsar JamilCS 147

Page 2: Floating Point Numbers

Introduction & Representation

Provides the ability to represent very large numbers, as well as very small numbers.

Example: 1 Trillion = 40 bits to left of radix pt. Retaining as much precision as needed

increase calculation efficiency.

A great deal of extra hardware is required in order to store/manipulate numbers with 80 bits or more.

Page 3: Floating Point Numbers

Computer Representation of Floating Point Numbers

_ _ _ _ . _ _ _ _ _ _ _ _ _ _ _ _

This format makes for easier comparison: =, =/=, <=, ≥

Example: Convert (358)10

in to the above format to be used as a floating point number.

Java: Float x = new Float(358.0f)

Sign Bit0 = +1 = -

3-bitexponent

Three base 16 digits

Page 4: Floating Point Numbers

Example Continued

First step is to convert 358 from base 10 to 16. Using Horner's method:

358/16 = 22 --- R 6 22/16 = 1 --- R 6 358

10 = 166

16 Next, convert to floating-point and Normalize

(166)10

= (166.)16

x 160

Normalize: ( .166 )16

x 163

The exponent is 3, but we represent it in excess 4: 0 1 1 (+3)

10 Excess 4 + 1 0 0 (+4)

10

= 1 1 1

0 1 1 1 . 0 0 0 1 0 1 1 0 0 1 1 0 + 3 1 6 6

Sign Expon. Fraction

Page 5: Floating Point Numbers

Fractional -> Fixed Point Conversion

Convert (XYZ.375)10

to Binary First, convert XYZ using Horner's method. Next, Convert the .375

10 as following:

.375 x 2 = 0.75 .75 x 2 = 1.5 .5 x 2 = 1.0

So (.375)10

= (.011)2

Most Significant Bit

Least Significant Bit

Page 6: Floating Point Numbers

IEEE 754 Floating Point Standard

Created in 1985 to ensure standard representation among different systems.

Most new architectures support IEEE 754.

Two Formats: Single Precision

1 8 23

Double Precision

=32 bits totalSign Expon. Fraction

=64 bits1 11 52Sign Expon. Fraction

Page 7: Floating Point Numbers

IEEE 754 Representations

Can Represent (among others) Non-zero, normalized numbers

Clean zero All 0s in exponent and fraction Sign bit can be 0, or 1, to represent +0 or -0

Infinity / Overflow / NaN Exponent contains all 1s, Fraction is all 0s Sign bit can be 0, or 1 0 / 0; Sqrt(-1);