Floating Point Numbers - University of California, Riverside htseng/classes/ee120a...

Floating Point Numbers Prof. Usagi

Floating point numbers


Let’s revisit 4-bit binary addition

• 7 + 1 = ?

  carry:  1 1 1
          0 1 1 1
        + 0 0 0 1
        ---------
          1 0 0 0 = -8   (the leftmost bit is the sign bit)

• If you add 1 to the largest integer, the result becomes the smallest integer.


“Floating” vs. “Fixed” point

• We want to express both the “integer” and “fraction” parts of a rational number
• Fixed point
  • One bit is used for representing positive or negative
  • A fixed number of bits is used for the integer part
  • A fixed number of bits is used for the fraction part
  • Therefore, the decimal point is fixed
• Floating point
  • One bit is used for representing positive or negative
  • A fixed number of bits is used for the exponent
  • A fixed number of bits is used for the fraction
  • Therefore, the decimal point floats, depending on the value of the exponent

Fixed point:    +/- | Integer . Fraction    (the "." is always here)
Floating point: +/- | Exponent | Fraction   (the "." can be anywhere in the fraction)


IEEE 754 format

32-bit float: +/- | Exponent (8-bit) | Fraction (23-bit)

• Realign the number into 1.F × 2^e
• Exponent stores e + 127
• Fraction only stores F


Floating point adder


Demo: what’s in c?

#include <stdio.h>

int main(int argc, char **argv)
{
    float a, b, c;
    a = 1280.245;   /* rounded to the nearest 32-bit float */
    b = 0.0004;
    c = a + b;      /* the sum is rounded to a float again */
    printf("1280.245 + 0.0004 = %f\n", c);
    return 0;
}


Other floating point formats

• Not all applications require “high precision”
• Deep neural networks are surprisingly error tolerant

32-bit float:                +/- | Exponent (8-bit)  | Fraction (23-bit)
64-bit double:               +/- | Exponent (11-bit) | Fraction (52-bit)
16-bit half (added in 2008): +/- | Exp (5-bit)       | Fraction (10-bit)


Can you tell the difference?


Higher resolution

But we can all tell they are our mascots!


How about this?




Mixed-precision


Google’s Tensor Processing Units


https://cloud.google.com/tpu/docs/system-architecture


EdgeTPU


Announcement

• Reading quiz 5 due 4/28 BEFORE the lecture
  • Under iLearn > reading quizzes
• Lab 3 due 4/30
  • Watch the video and read the instructions BEFORE your session
  • There are links on both the course webpage and the iLearn lab section
  • Submit through iLearn > Labs
• Midterm on 5/7 during the lecture time, accessed through iLearn. No late submission is allowed, so make sure you will be able to take it at that time
• Check your grades in iLearn


To be continued

Electrical Engineering / Computer Science 120A