Floating Point Numbers CS208. Floating Point Numbers Now you've seen unsigned and signed integers....

Floating Point Numbers

Now you've seen unsigned and signed integers. In real life we also need to be able represent numbers with fractional parts (like: -12.5 & 45.39).

Called Floating Point numbers.

You will learn the IEEE 32-bit floating point representation.

In the decimal system, a decimal point (radix point) separates the whole numbers from the fractional part

Examples:

37.25 ( whole = 37, fraction = 25/100)

123.567

10.12345678

For example, 37.25 can be analyzed as:

101 100 10-1 10-2

Tens Units Tenths Hundredths

3 7 2 5

37.25 = (3 x 10) + (7 x 1) + (2 x 1/10) + (5 x 1/100)

Binary Equivalence

The binary equivalent of a floating point number can be determined by computing the binary representation for each part separately.

1) For the whole part:

Use subtraction or division method previously learned.

2) For the fractional part:

Use the subtraction or multiplication method (to be shown next)

Fractional Part – Multiplication Method

In the binary representation of a floating point number the column values will be as follows:

… 25 24 23 22 21 20 . 2-1 2-2 2-3 2-4 …

… 32 16 8 4 2 1 . 1/2 1/4 1/8 1/16…

… 32 16 8 4 2 1 . .5 .25 .125 .0625…

Ex 1. Find the binary equivalent of 0.25 Step 1: Multiply the fraction by 2 until the fractional part

becomes 0 .25 x 2 0.5

x 2 1.0

Step 2: Collect the whole parts in forward order. Put

them after the radix point

. .5 .25 .125 .0625 . 0 1

Ex 2. Find the binary equivalent of 0.625 Step 1: Multiply the fraction by 2 until the fractional part

becomes 0 .625 x 2 1.25

x 2 0.50

x 21.0

Step 2: Collect the whole parts in forward order. Put them after the radix point

. .5 .25 .125 .0625 . 1 0 1

Fractional Part – Subtraction Method

Start with the column values again, as follows:

… 20 . 2-1 2-2 2-3 2-4 2-5 2-6…

… 1 . 1/2 1/4 1/8 1/16 1/32 1/64…

… 1 . .5 .25 .125 .0625 .03125 .015625…

Fractional Part – Subtraction Method

Starting with 0.5, subtract the column values from left to right. Insert a 0 in the column if the value cannot be subtracted or 1 if it can be. Continue until the fraction becomes .0

.25 .5 .25 .125 .0625 - .25 .0 1 .0

Binary Equivalent of FP number

Ex 2. Convert 37.25, using subtraction method.

64 32 16 8 4 2 1 . .5 .25 .125 .0625

26 25 24 23 22 21 20 . 2-1 2-2 2-3 2-4

1 0 0 1 0 1 37 - 32

. 0 1 .25

- .25 .0

37.2510 = 100101.012

Binary Equivalent of FP number

Ex 3. Convert 18.625, using subtraction method. 64 32 16 8 4 2 1 . .5 .25 .125 .0625 26 25 24 23 22 21 20 . 2-1 2-2 2-3 2-4

1 0 0 1 0 1 0 1

18 .625 - 16 - .5 2 .125

- 2 - .125 0 0

18.62510 = 10010.1012

Try It Yourself

Convert the following decimal numbers to binary:

60.7510

190.562510

(Answers on next page)

Answers

60.7510 = 111100.112

190.562510

= 10111110.10012

Problem storing binary form

We have no way to store the radix point!

Standards committee came up with a way to store floating point numbers (that have a decimal point)

IEEE Floating Point Representation

Floating point numbers can be stored into 32-bits, by dividing the bits into three parts:

the sign, the exponent, and the mantissa.

1 2 9 10 32

The first (leftmost) field of our floating point representation will STILL be the sign bit:

0 for a positive number, 1 for a negative number.

Storing the Binary Form

How do we store a radix point?

- All we have are zeros and ones…

Make sure that the radix point is ALWAYS in the same position within the number.

Use the IEEE 32-bit standard

the leftmost digit must be a 1

Solution is NormalizationEvery binary number, except the one corresponding to the number zero, can be normalized by choosing the exponent so that the radix point falls to the right of the leftmost 1 bit.

37.2510 = 100101.012 = 1.0010101 x 25

7.62510 = 111.1012 = 1.11101 x 22

0.312510 = 0.01012 = 1.01 x 2-2

The second field of the floating point number will be the exponent.

The exponent is stored as an unsigned 8-bit number, RELATIVE to a bias of 127. Exponent 5 is stored as (127 + 5) or 132

132 = 10000100 Exponent -5 is stored as (127 + (-5)) or 122

122 = 01111010

Try It Yourself

How would the following exponents be stored (8-bits, 127-biased):

(Answers on next slide)

Answers

exponent -10 8-bit

bias +127 value

117 01110101

28 exponent 8 8-bit

bias +127 value

135 10000111

The mantissa is the set of 0’s and 1’s to the left of the radix point of the normalized (when the digit to the left of the radix point is 1) binary number.

Ex: 1.00101 X 23

(The mantissa is 00101)

The mantissa is stored in a 23 bit field, so we add zeros to the right side and store:

00101000000000000000000

Decimal Floating Point toIEEE standard Conversion

Ex 1: Find the IEEE FP representation of 40.15625

Step 1.

Compute the binary equivalent of the whole part and the fractional part. (i.e. convert 40 and .15625 to their binary equivalents)

40 .15625- 32 Result: -.12500 Result: 8 101000 .03125 .00101- 8 -.03125 0 .0

So: 40.1562510 = 101000.001012

Step 2. Normalize the number by moving the decimal point to the right of the leftmost one.

101000.00101 = 1.0100000101 x 25

Step 3. Convert the exponent to a biased exponent

127 + 5 = 132

And convert biased exponent to 8-bit unsigned binary:

13210 = 100001002

Step 4. Store the results from steps 1-3:

Sign Exponent Mantissa

(from step 3) (from step 2)

0 10000100 01000001010000000000000

Decimal Floating Point toIEEE standard ConversionEx 2: Find the IEEE FP representation of –24.75 Step 1. Compute the binary equivalent of the whole

part and the fractional part.

24 .75- 16 Result: - .50 Result: 8 11000 .25 .11- 8 - .25 0 .0

So: -24.7510 = -11000.112

Step 2.

Normalize the number by moving the decimal point to the right of the leftmost one.

-11000.11 = -1.100011 x 24

Decimal Floating Point toIEEE standard Conversion.

Step 3. Convert the exponent to a biased exponent

127 + 4 = 131

==> 13110 = 100000112 Step 4. Store the results from steps 1-3

Sign Exponent mantissa1 10000011 1000110..0

Try It Yourself

Convert the following numbers to IEEE

floating point representation:

60.7510

-190.562510

Answers

60.7510 =0 10000100 11100110..0

-190.562510 =1 10000110 011111010010..0

IEEE standard to Decimal Floating Point Conversion.

Do the steps in reverse order

In reversing the normalization step move the radix point the number of digits equal to the exponent: If exponent is positive, move to the right If exponent is negative, move to the left

Ex 1: Convert the following 32-bit binary number to its decimal floating point equivalent:

Sign Exponent Mantissa 1 01111101 010..0

IEEE standard to Decimal Floating Point Conversion..

Step 1: Extract the biased exponent and unbias it

Biased exponent = 011111012 = 12510

Unbiased Exponent: 125 – 127 = -2

Step 2: Write Normalized number in the form:

1 . ____________ x 2 ----

For our number:-1. 01 x 2 –2

Mantissa

Exponent

Step 3: Denormalize the binary number from step 2 (i.e. move the decimal and get rid of (x 2n) part):

-0.01012 (negative exponent – move left)

Step 4: Convert binary number to the FP equivalent

(i.e. Add all column values with 1s in them)

-0.01012 = - ( 0.25 + 0.0625)

= -0.312510

Ex 2: Convert the following 32 bit binary number to its decimal floating point equivalent:

Sign Exponent Mantissa

0 10000011 10011000..0

Step 1: Extract the biased exponent and unbias it

Biased exponent = 10000112 = 13110

Unbiased Exponent: 131 – 127 = 4

Step 2: Write Normalized number in the form:

1 . ____________ x 2 ----

For our number:

1.10011 x 2 4

Mantissa

Exponent

Step 4: Convert binary number to the FP equivalent (i.e. Add all column values with 1s in them)

11001.1 = 16 + 8 + 1 +.5

= 25.510

Step 3: Denormalize the binary number from step 2 (i.e. move the decimal and get rid of (x 2n) part:

11001.12 (positive exponent – move right)

Try It Yourself

Convert the following IEEE floating point numbers to decimal:

0 10000010 1110010..0

0 10000110 01010..0

1 01111101 10100..0

Answers

0 10000010 1110010..0 =15.12510

0 10000110 01010..0 =168.010

1 01111101 10100 =-0.4062510

Floating Point Numbers CS208. Floating Point Numbers Now you've seen unsigned and signed integers....

Documents

Transcript of Floating Point Numbers CS208. Floating Point Numbers Now you've seen unsigned and signed integers....

Lecture03 Floating Point Numbers

Instruction Set Extensions for Computation on Complex Floating Point Numbers

A anguage eference anual - Columbia University6 ‘\t’ Horizontal table character ‘\0’ Null character 3.1.1.3 Floating Numbers As for floating point numbers, MASL supports the

Representation of floating point numbers in IEEE 754 ...

Floating Point Arithmetic. Outline Overflow Fixed Point Numbers Floating Point Numbers Excess Representation Arithmetic with Floating Point.

Lecture 13 Floating Point I (1) Fall 2005 Lecture 13: (Integer Multiplication and Division) FLOATING POINT NUMBERS.

Floating-point numberskbeach/courses/fall2017/... · Floating-point numbers ‣ Floating-point numbers have the form ‣ The adjustable radix point allows for calculation over a wide

Set 16 FLOATING POINT ARITHMETIC. TOPICS Binary representation of floating point Numbers Computer representation of floating point numbers Floating point.

Lecture #1: Floating Point Numbers...4 IEEE Floating Point: Single Precision • 32 bits available: – 1 bit sign; 8 bit exponent; 23 bit significand • Special numbers: (zero and

GRAPHIC ERA UNIVERSITY DEHRADUN … implementation of dice game. 8 3 Floating point arithmetic and programmable gate arrays Representation of floating point numbers, floating point

Floating Point Numbers & Parallel Computing. Outline Fixed-point Numbers Floating Point Numbers Superscalar Processors Multithreading Homogeneous Multiprocessing.

IEEEI 2010 ISE for Computation on Complex Floating Point Numbers Instruction Set Extensions for Computation on Complex Floating Point Numbers Authors:

Floating Point Numbers - eng.auburn.edunelsovp/courses/elec5200_6200/ELEC5200_6200... · Floating Point Numbers. ELEC 5200/6200 - From P-H slides We need a way to represent a wide

Floating Point Numbers - Confex · PDF fileThe decimal digits are in binary-coded decimal ... for the rightmost byte where the sign code replaces the zone code. Floating Point Numbers

Lecture 17 Floating point - University of California, San Diego · 2017-03-07 · Scott B. Baden / CSE 160 /Wi '16 6 Floating point numbers are not real numbers! • Floating-point

Part 4 Binary Floating-point Numbers

Secure Computation on Floating Point Numbers€¦ · Secure Computation on Floating Point Numbers Mehrdad Aliasgari, Marina Blanton, Yihua Zhang, and Aaron Steele University of Notre

Floating point numbers in Scilab · Floating point numbers are at the core of numerical computations (as in Scilab, Matlab and Octave, for example), as opposed to symbolic computations

How to Print Floating-Point Numbers Accuratelykurtstephens.com/files/p372-steele.pdf · RETROSPECTIVE: How to Print Floating-Point Numbers Accurately Guy L. Steele Jr. Sun Microsystems

CSCI 2021: Binary Floating Point Numbers