CSE 246: Computer Arithmetic Algorithms and Hardware Design

Instructor: Prof. Chung-Kuan Cheng

Winter 2004

Lecture 10

Thursday 02/19/02

Topics:

• Rounding F.P. numbers
• Ch. 11 (all)

Rounding the numbers

• Why we need the sticky bit, round bit, and guard bit

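Example 1: 1.00000 × 2^4 - 1.10000 × 2^-3

Normalize according to exponent (shift the smaller operand right by 4 - (-3) = 7 bits):

  1.00000000 × 2^4
- 0.00000011 × 2^4
  0.11111101 × 2^4

Renormalize (shift left by 1 bit):

1.1111101 × 2^3

Take 5 bits after the binary point: 1.11111 | 0 | 1 (round bit = 0, sticky bit = 1)

Result = 1.11111 × 2^3 (round bit = 0 => round down)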

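Example 2: 1.00001 × 2^3 - 1.01011 × 2^-1

Normalize according to exponent (shift the smaller operand right by 3 - (-1) = 4 bits):

  1.000010000 × 2^3
- 0.000101011 × 2^3
  0.111100101 × 2^3

Renormalize (shift left by 1 bit):

1.11100101 × 2^2

Take 5 bits after the binary point: 1.11100 | 1 | 01

Round bit = 1 (the bit on the boundary); sticky bit = OR(0, 1) = 1 (non-zero) => round up

Result = 1.11101 × 2^2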

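Theory behind it

[Figure: below the LSB of the result, keep g (guard bit) and r (round bit); all other bits shifted out are ORed into s (sticky bit)]

When shifting right, we don't need to remember anything more than these 3 bits below the LSB. This is a necessary and sufficient condition.

After a subtraction, we normalize (shift left) by at most 1 bit whenever the alignment shift was 2 or more bits, since both operands are normalized before the operation; the guard bit supplies that bit, and the round and sticky bits remain for rounding. If the alignment shift was 0 or 1 bit, the left shift can be longer, but then no bit was shifted past the guard position, so the difference is exact.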
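A minimal Python sketch of the argument above, not taken from the lecture: subtraction a - b of two normalized operands that keeps only the guard, round, and sticky bits after alignment, compared against the exact difference rounded once. Assumptions: 5 fraction bits (as in Examples 1 and 2), round-to-nearest with ties to even, positive operands with a >= b, and the helper names `shift_right_sticky`, `round_off`, `sub_grs`, and `sub_exact`, all of which are hypothetical.

```python
from fractions import Fraction

P = 5  # fraction bits in the significand, as in Examples 1 and 2 (assumption)


def shift_right_sticky(x: int, d: int) -> int:
    """Shift x right by d bits; if any 1 bit falls off, OR a 1 into the LSB (sticky)."""
    lost = x & ((1 << d) - 1)
    return (x >> d) | (1 if lost else 0)


def round_off(x: int, k: int) -> int:
    """Drop the k low bits of x, rounding to nearest with ties to even."""
    half = 1 << (k - 1)
    low = x & ((1 << k) - 1)
    x >>= k
    if low > half or (low == half and x & 1):
        x += 1
    return x


def sub_grs(a, b):
    """a - b for a >= b > 0, keeping only 3 extra bits (guard, round, sticky).

    Operands are (sig, exp) pairs meaning sig * 2**(exp - P), with
    2**P <= sig < 2**(P + 1), i.e. a normalized 1.xxxxx significand."""
    (sa, ea), (sb, eb) = a, b
    A = sa << 3                                 # append G = R = S = 0
    B = shift_right_sticky(sb << 3, ea - eb)    # align b to a's exponent
    D, e = A - B, ea
    if D == 0:
        return (0, 0)
    while D < (1 << (P + 3)):                   # renormalize by left shifts
        D <<= 1
        e -= 1
    sig = round_off(D, 3)
    if sig >> (P + 1):                          # rounding carried into a new bit
        sig >>= 1
        e += 1
    return (sig, e)


def sub_exact(a, b):
    """Reference: compute a - b exactly, then round once to P fraction bits."""
    v = a[0] * Fraction(2) ** (a[1] - P) - b[0] * Fraction(2) ** (b[1] - P)
    if v == 0:
        return (0, 0)
    e = v.numerator.bit_length() - v.denominator.bit_length()
    if v < Fraction(2) ** e:
        e -= 1                                  # now 2**e <= v < 2**(e + 1)
    sig = round(v / Fraction(2) ** e * 2**P)    # Fraction rounds half to even
    if sig >> (P + 1):
        sig >>= 1
        e += 1
    return (sig, e)


# Example 1 and Example 2 from the slides
print(sub_grs((0b100000, 4), (0b110000, -3)))  # (63, 3): 0b111111 = 1.11111 x 2^3
print(sub_grs((0b100001, 3), (0b101011, -1)))  # (61, 2): 0b111101 = 1.11101 x 2^2
```

Both calls reproduce the slide results. Comparing `sub_grs` with `sub_exact` over many operand pairs is a cheap way to exercise the "3 bits are sufficient" claim for this format.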

Chapter 11: Polynomial Approximation of Functions

Taylor Series

$$f(x) = f(x_0) + \sum_{i=1}^{\infty} \frac{f^{(i)}(x_0)}{i!}(x - x_0)^i$$

Example:

sin(x) = x – x^3/3! + x^5/5! – x^7/7! + …
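As a quick numeric check of the sin(x) example, the Python sketch below (an illustration, not from the lecture) sums the first few nonzero terms of the series around x0 = 0. The function name `sin_taylor` is hypothetical; each term is obtained from the previous one by multiplying by -x^2/((2k+2)(2k+3)), which avoids recomputing factorials.

```python
import math


def sin_taylor(x: float, n_terms: int) -> float:
    """Partial sum x - x^3/3! + x^5/5! - ... with n_terms nonzero terms (x0 = 0)."""
    total, term = 0.0, x
    for k in range(n_terms):
        total += term
        # next term = current term * (-x^2) / ((2k+2)(2k+3))
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total


for n in (1, 2, 3, 4):
    print(n, sin_taylor(0.5, n), math.sin(0.5))
```

For |x| <= 1 the terms shrink monotonically, so the error after n terms is at most the first omitted term, |x|^(2n+1)/(2n+1)!.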

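Taylor Series

Given:

$$P_N(x) = \sum_{i=0}^{N} c_i x^i = c_0 + x(c_1 + x(c_2 + \dots + x(c_{N-1} + x c_N)\dots))$$

How to calculate the value of the function? Group common factors and evaluate recursively:

$$R(N) = c_N, \qquad R(i-1) = c_{i-1} + x\,R(i) \quad (i = N, \dots, 1), \qquad P_N(x) = R(0)$$

=> N multiplies and N adds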
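The recursion above is Horner's rule. A minimal Python sketch follows; `horner` and `SIN7` are hypothetical names, and the coefficients are those of the degree-7 sin(x) series from the previous slide.

```python
import math


def horner(coeffs, x):
    """Evaluate P_N(x) = c_0 + c_1 x + ... + c_N x^N, where coeffs[i] = c_i.

    Uses R(N) = c_N, R(i-1) = c_{i-1} + x R(i): N multiplies and N adds."""
    r = coeffs[-1]                    # R(N) = c_N
    for c in reversed(coeffs[:-1]):
        r = c + x * r                 # R(i-1) = c_{i-1} + x * R(i)
    return r                          # P_N(x) = R(0)


# Degree-7 Taylor polynomial of sin(x) around 0: x - x^3/3! + x^5/5! - x^7/7!
SIN7 = [0, 1, 0, -1 / math.factorial(3), 0, 1 / math.factorial(5), 0, -1 / math.factorial(7)]
print(horner(SIN7, 0.5), math.sin(0.5))
```

Each step depends on the previous one, which is why a single multiplier and adder must work in series.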

Taylor Series

1 adder => do it in series. Given more components => can we go faster?

Take N = 7 as an example:

c7x^7 + c6x^6 + c5x^5 + c4x^4 + c3x^3 + c2x^2 + c1x + c0

How to accelerate?

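Taylor Series

c7x^7 + c6x^6 + c5x^5 + c4x^4 + c3x^3 + c2x^2 + c1x + c0

[Figure: powers generated in 3 multiply stages (x; x^2; x^3, x^4; x^5, x^6, x^7), 7 coefficient multipliers in parallel, then an adder tree of log n levels (4, 2, 1 adders); with carry-save adders each level takes constant time]

• But this is not much better. There is still an overhead of 3 stages to generate x^7.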
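A rough Python model of the scheme in the figure, under an assumed cost model: one stage per two-input multiply or add, with each power x^i formed as x^h · x^(i-h), where h is the largest power of two below i. It returns the value together with the number of stages needed to form the powers and the number of adder-tree levels. `powers_then_tree` is a hypothetical name.

```python
def powers_then_tree(coeffs, x):
    """Evaluate sum(c_i * x^i) as the figure draws it.

    Returns (value, power_stages, adder_levels) under the illustrative
    cost model of one stage per two-input multiply or add."""
    n = len(coeffs) - 1
    power = {0: (1, 0), 1: (x, 0)}              # i -> (x^i, stage at which it is ready)
    for i in range(2, n + 1):
        h = 1 << ((i - 1).bit_length() - 1)     # largest power of two below i
        (a, sa), (b, sb) = power[h], power[i - h]
        power[i] = (a * b, max(sa, sb) + 1)     # x^i = x^h * x^(i-h)
    power_stages = max(stage for _, stage in power.values())
    terms = [c * power[i][0] for i, c in enumerate(coeffs)]   # coefficient multiplies, in parallel
    levels = 0
    while len(terms) > 1:                       # binary adder tree
        terms = [sum(terms[j:j + 2]) for j in range(0, len(terms), 2)]
        levels += 1
    return terms[0], power_stages, levels


print(powers_then_tree([1, 2, 3, 4, 5, 6, 7, 8], 3))
```

For N = 7 the model reports 3 power stages and 3 adder levels, matching the slide's count.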

Taylor Series

c7x + c6    c5x + c4    c3x + c2    c1x + c0

x^2(c7x + c6) + c5x + c4    x^2(c3x + c2) + c1x + c0

x^4[x^2(c7x + c6) + c5x + c4] + x^2(c3x + c2) + c1x + c0

• This is a bit faster. Only 2 stages to form the powers (x → x^2 → x^4).
• But what is the fastest way to produce the result? And the most energy-efficient? => minimize [# of multiplies]
• All this uses +'s and ×'s. Need to get rid of them. => Let's try table look-up
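The grouping on this slide is Estrin's scheme. A minimal Python sketch for any degree, assuming coefficients are listed from c0 upward (`estrin` is a hypothetical name): each pass pairs neighbouring terms with the current power of x and then squares that power, so only x, x^2, x^4, ... are ever formed.

```python
def estrin(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with Estrin's scheme.

    Pass 1 forms (c0 + c1 x), (c2 + c3 x), ...; pass 2 combines neighbours
    with x^2, pass 3 with x^4, and so on. The products inside one pass are
    independent of each other."""
    terms = list(coeffs)
    power = x                                  # x, then x^2, x^4, ...
    while True:
        if len(terms) % 2:
            terms.append(0)                    # pad to an even count
        terms = [terms[i] + power * terms[i + 1] for i in range(0, len(terms), 2)]
        if len(terms) == 1:
            return terms[0]
        power = power * power                  # next pass uses the squared power


print(estrin([1, 2, 3, 4, 5, 6, 7, 8], 3))
```

For N = 7 the three passes produce exactly the groupings shown above: (c1x + c0), then x^2(c3x + c2) + c1x + c0, then the x^4 combination. Because the products within a pass are independent, parallel hardware can compute them at once.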

Taylor Series – Table look-up

• SRAM/DRAM => eat power
• ROM => better option

$$f(x) = \sum_{i=0}^{\infty} \frac{f^{(i)}(x_0)}{i!}(x - x_0)^i$$

• Suppose there is a table organized as a binary tree.
• Let x = xH + xL and x0 = xH

Example: x = 110101, xH = 110000, xL = 000101

$$f(x_H + x_L) = \sum_{i=0}^{\infty} \frac{f^{(i)}(x_H)}{i!}\, x_L^i$$
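A small Python illustration of the split, assuming x is read as a 6-bit fraction (x = 110101 / 2^6) and f = exp, whose derivatives all equal exp, so every f^(i)(xH) is the same table value. Both choices, and the names `split` and `exp_around_xh`, are assumptions for the sketch; the slide fixes neither the function nor the number format.

```python
import math


def split(x: int, low_bits: int) -> tuple[int, int]:
    """Split unsigned x into xH (low field zeroed) and xL (low field only)."""
    mask = (1 << low_bits) - 1
    return x & ~mask, x & mask


def exp_around_xh(x: float, xh: float, order: int) -> float:
    """Taylor expansion of exp around x0 = xH, truncated after the xL**order term.
    Every derivative of exp is exp, so f^(i)(xH) = exp(xH) for all i."""
    xl = x - xh
    return math.exp(xh) * sum(xl**i / math.factorial(i) for i in range(order + 1))


xh_bits, xl_bits = split(0b110101, 3)    # slide example: xH = 110000, xL = 000101
x, xh = 0b110101 / 2**6, xh_bits / 2**6  # read the 6 bits as a fraction (assumption)
for order in range(4):
    print(order, abs(exp_around_xh(x, xh, order) - math.exp(x)))
```

The printed errors fall quickly as the order grows, since each extra term carries another factor of xL = 5/64.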

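Taylor Series – Table look-up

• 1st order

$$f(x_H + x_L) \approx f(x_H) + f'(x_H)\, x_L$$

=> Only 1 multiplication!!!

[Figure: xH indexes Table-1, giving f(xH), and Table-2, giving f'(xH); f'(xH) is multiplied by xL, and an adder forms f(xH) + f'(xH)·xL ≈ f(xH + xL)]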
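A minimal Python sketch of the two-table, one-multiplier datapath, with assumed parameters: x is a 12-bit fraction in [0, 1), the top 6 bits form xH, f = sin (so Table-2 holds cos), and table entries are double-precision floats rather than the finite-width ROM words real hardware would use. `TABLE1`, `TABLE2`, and `sin_first_order` are hypothetical names.

```python
import math

N_BITS = 12   # x is an N_BITS-bit fraction in [0, 1)   (assumption)
K = 6         # high bits of x that form xH             (assumption)

# Table-1 holds f(xH), Table-2 holds f'(xH); here f = sin, so f' = cos.
TABLE1 = [math.sin(h / 2**K) for h in range(2**K)]
TABLE2 = [math.cos(h / 2**K) for h in range(2**K)]


def sin_first_order(x_int: int) -> float:
    """sin(x) for x = x_int / 2**N_BITS: two lookups, 1 multiply, 1 add."""
    h = x_int >> (N_BITS - K)                                   # xH as a table index
    x_low = (x_int & ((1 << (N_BITS - K)) - 1)) / 2**N_BITS     # xL as a value
    return TABLE1[h] + TABLE2[h] * x_low


worst = max(abs(sin_first_order(i) - math.sin(i / 2**N_BITS)) for i in range(2**N_BITS))
print(worst)
```

Since |f''| <= 1 for sin and xL < 2^-6, the Taylor remainder bounds the error by xL^2/2 < 2^-13 for every input.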

Taylor Series

With each extra order => 1 extra table and 1 multiplier.

If you wish to change the function, all you have to do is change the contents of the tables.

Problem? => Now it's the size of the table! (L index bits => 2^L entries)
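To illustrate "1 extra table and 1 multiplier" per order, this Python sketch generalizes the first-order table scheme from the previous slide: table i holds f^(i)(xH)/i!, and the tables are combined by Horner's rule in xL, so order m costs m + 1 lookups and m multiplies. Assumptions: x is a 12-bit fraction in [0, 1), xH is its top 6 bits, f = sin (using sin^(i)(x) = sin(x + iπ/2)), and entries are floats. `build_tables`, `sin_tables`, and `max_error` are hypothetical names.

```python
import math

N_BITS = 12   # x is an N_BITS-bit fraction in [0, 1)   (assumption)
K = 6         # high bits of x used as the table index   (assumption)


def sin_deriv(i: int, x: float) -> float:
    """i-th derivative of sin: sin(x + i*pi/2)."""
    return math.sin(x + i * math.pi / 2)


def build_tables(order: int) -> list:
    """Table i (i = 0..order) holds f^(i)(xH) / i! for every K-bit xH, with f = sin."""
    return [[sin_deriv(i, h / 2**K) / math.factorial(i) for h in range(2**K)]
            for i in range(order + 1)]


def sin_tables(x_int: int, tables: list) -> float:
    """Approximate sin(x_int / 2**N_BITS) as sum_i tables[i][xH] * xL**i,
    evaluated by Horner's rule: one multiply per order above zero."""
    h = x_int >> (N_BITS - K)                                   # xH index
    x_low = (x_int & ((1 << (N_BITS - K)) - 1)) / 2**N_BITS     # xL value
    acc = tables[-1][h]
    for table in reversed(tables[:-1]):
        acc = table[h] + x_low * acc
    return acc


def max_error(order: int) -> float:
    tables = build_tables(order)
    return max(abs(sin_tables(i, tables) - math.sin(i / 2**N_BITS))
               for i in range(2**N_BITS))


for order in (1, 2):
    print(order, max_error(order))
```

Going from order 1 to order 2 shrinks the worst-case error from below 2^-13 to below 2^-20 (the remainder is at most xL^3/6), at the cost of one more 2^6-entry table and one more multiply.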

Taylor Series

Let's split x into 3 sections (instead of the previous 2, High and Low):

$$x = x_1 + x_2 2^{-k} + x_3 2^{-2k}$$

=>

$$f(x) = f(x_1 + x_2 2^{-k}) + x_3 2^{-2k} f'(x_1) + \varepsilon, \qquad \varepsilon \approx 2^{-3k}$$

A direct table for f(x) requires 2^n × v bits (n: # of bits of x; v: # of bits of f(x)).

32-bit x and 32-bit f(x) => 2^32 × 32 = 2^37 bits (16 GiB) -> HUGE!!

-> But do we really need all those numbers in the table??
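A Python sketch of the three-section idea, assuming k = 4 (so x has 12 bits), f = sin, and x1, x2, x3 each a k-bit fraction in [0, 1). Table A is indexed by (x1, x2) and stores f(x1 + x2·2^-k); table B is indexed by (x1, x3) and stores x3·2^-2k·f'(x1). f(x) is then one addition of two lookups, with no multiplier; this is the idea behind bipartite table methods. The table and function names are hypothetical.

```python
import math

K = 4          # bits per section (assumption); x has N = 3K bits
N = 3 * K

# x = x1 + x2*2^-K + x3*2^-2K with x1, x2, x3 K-bit fractions in [0, 1).
# Table A, indexed by (x1, x2): f(x1 + x2*2^-K).  Table B, indexed by (x1, x3): x3*2^-2K * f'(x1).
TABLE_A = [[math.sin(i1 / 2**K + i2 / 2**(2 * K)) for i2 in range(2**K)] for i1 in range(2**K)]
TABLE_B = [[i3 / 2**(3 * K) * math.cos(i1 / 2**K) for i3 in range(2**K)] for i1 in range(2**K)]


def sin_three_sections(x_int: int) -> float:
    """sin(x_int / 2**N) from two lookups and one add; no multiplier."""
    mask = (1 << K) - 1
    i1, i2, i3 = x_int >> (2 * K), (x_int >> K) & mask, x_int & mask
    return TABLE_A[i1][i2] + TABLE_B[i1][i3]


worst = max(abs(sin_three_sections(i) - math.sin(i / 2**N)) for i in range(2**N))
entries_direct = 2**N
entries_split = 2 * 2**(2 * K)
print(worst, entries_direct, entries_split)
```

The two tables hold 2 × 2^(2k) = 512 entries instead of 2^(3k) = 4096. Because |f''| <= 1 for sin, the error is below x3·2^-2k · (x2·2^-k + x3·2^-2k) < 2^-3k + 2^-4k, in line with ε ≈ 2^-3k.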

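Taylor Series

For integers x and y, let E = ε = the LSB of x + y (0 or 1; x + y and x - y have the same parity), and let ⌊ ⌋ = lower limit (floor).

$$xy = \frac{(x+y)^2}{4} - \frac{(x-y)^2}{4} = \left(\left\lfloor \frac{x+y}{2} \right\rfloor + \frac{\varepsilon}{2}\right)^2 - \left(\left\lfloor \frac{x-y}{2} \right\rfloor + \frac{\varepsilon}{2}\right)^2 = \left\lfloor \frac{x+y}{2} \right\rfloor^2 - \left\lfloor \frac{x-y}{2} \right\rfloor^2 + \varepsilon y$$

(The ε^2/4 terms cancel, and the cross terms give ε(⌊(x+y)/2⌋ - ⌊(x-y)/2⌋) = εy.) => a multiply needs only a table of squares plus adds.

[Figure: a table that maps x to x^2]

The content of the lower bits of x determines the lower bits of the result, but not the other bits!!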
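A Python sketch of the quarter-squares identity, assuming unsigned 8-bit operands and a square table with one entry per value of ⌊(x + y)/2⌋; `SQUARES`, `mul_by_squares`, and `low_square_bits_match` are hypothetical names. The second function checks the remark about lower bits for squaring: x^2 mod 2^w depends only on x mod 2^w.

```python
N = 8                                      # operand width in bits (assumption)
SQUARES = [k * k for k in range(1 << N)]   # table of k^2; index (x+y)>>1 < 2^N


def mul_by_squares(x: int, y: int) -> int:
    """x * y for 0 <= x, y < 2^N using two table lookups, shifts, and adds.

    Implements xy = floor((x+y)/2)^2 - floor((x-y)/2)^2 + eps*y,
    where eps is the shared LSB of x+y and x-y."""
    if y > x:
        x, y = y, x                        # keep x - y >= 0 so it indexes the table
    s, d = x + y, x - y
    eps = s & 1
    return SQUARES[s >> 1] - SQUARES[d >> 1] + eps * y


def low_square_bits_match(x: int, w: int) -> bool:
    """Check that x^2 mod 2^w equals (x mod 2^w)^2 mod 2^w."""
    m = 1 << w
    return (x * x) % m == ((x % m) ** 2) % m


print(mul_by_squares(3, 2))    # 6
print(mul_by_squares(13, 11))  # 143
```

Each product costs two lookups into a 2^8-entry table plus a few additions and shifts.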

Taylor Series

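$$2^n v \quad \text{vs.} \quad 2^n (v - w) + 2^L w = 2^n v - (2^n w - 2^L w) = 2^n v - w(2^n - 2^L)$$

Size of table is reduced by w(2^n - 2^L) bits.

[Figure: one table with n input bits and v output bits for f(x), vs. a table with n input bits and v - w output bits plus a table with L input bits and w output bits, together producing f(x)]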
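A Python sketch of the table split for the squaring case from the previous slide, assuming n = 8 input bits, v = 16 output bits, and w = L = 4 (the low 4 bits of x^2 depend only on the low 4 bits of x). `table_bits`, `HIGH`, `LOW`, and `square_split` are hypothetical names.

```python
N, V = 8, 16   # input bits n and output bits v for f(x) = x^2 (assumption)
W = L = 4      # low w output bits come from a table indexed by the low L input bits


def table_bits(n: int, v: int, w: int, l: int) -> tuple[int, int]:
    """Return (bits in one 2^n x v table, bits in the split 2^n x (v-w) + 2^l x w pair)."""
    return (2**n) * v, (2**n) * (v - w) + (2**l) * w


HIGH = [(x * x) >> W for x in range(2**N)]             # 2^n x (v - w): upper bits of x^2
LOW = [(x * x) & ((1 << W) - 1) for x in range(2**L)]  # 2^L x w: lower bits of x^2


def square_split(x: int) -> int:
    """x^2 assembled from the two smaller tables."""
    return (HIGH[x] << W) | LOW[x & ((1 << L) - 1)]


print(table_bits(N, V, W, L))
```

Here the split saves w(2^n - 2^L) = 4 × (256 - 16) = 960 of the original 4096 bits.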

End of Ch. 11

Some parts of Ch. 11 (e.g., log) will be covered as part of the Ch. 12 discussion.