
    School of Computer Science and Engineering

    University of New South Wales

    COMP3121/3821/9101/9801

    A. Ignjatovic

    4/4/2013

    Polynomial Multiplication and The Fast Fourier Transform (FFT)

We now continue elaborating on the methods from the previous lecture on fast multiplication of large integers, focusing our attention on the problem of efficient multiplication of polynomials. Read pages 822-838 of the second edition of the textbook (CLRS) or 776-791 of the first edition. Why are we doing the FFT??? Besides being a great example of the divide-and-conquer design strategy, it is BY FAR the MOST EXECUTED algorithm today; it runs a huge number of times each second in your mobile phone, your modem, your digital camera, your MP3 player ... It is arguably the MOST important algorithm today, without any serious competition!

    Multiplication of Polynomials

Let A(x) = \sum_{j=0}^{n} A_j x^j and B(x) = \sum_{j=0}^{n} B_j x^j be two polynomials of degree n (if one of the polynomials is of lower degree, we can pad it with leading zero coefficients). Let us set C(x) = A(x)B(x); then C(x) is of degree (at most) 2n. Thus, it can be written as C(x) = \sum_{j=0}^{2n} c_j x^j, and if we set A_i and B_i to zero for i > n, we have

(1)    C(x) = \sum_{j=0}^{2n} c_j x^j = A(x)B(x) = \sum_{j=0}^{2n} \left( \sum_{i=0}^{j} A_i B_{j-i} \right) x^j.

Thus, we have to find an efficient algorithm for finding the coefficients

c_j = \sum_{i=0}^{j} A_i B_{j-i}

for j \le 2n, from A_i, B_i, i \le n. Prima facie, finding the coefficients of C(x) directly still involves (n+1)^2 multiplications, because all pairs of the form A_i B_j, 0 \le i, j \le n, appear in the coefficients c_j = \sum_{i=0}^{j} A_i B_{j-i}.


Let A_n, \ldots, A_0 and B_n, \ldots, B_0 be an arbitrary pair of number sequences; let us pad them with zeros from the left to length 2n + 1, i.e., let us set A_i = 0 and B_i = 0 for n < i \le 2n. Then the sequence

\left\{ \sum_{i=0}^{j} A_i B_{j-i} \right\}_{j=0}^{2n}

is called the (linear) convolution of the sequences A and B, and is denoted A * B:

A * B = \{ A_n B_n, \ \ldots, \ A_2 B_0 + A_1 B_1 + A_0 B_2, \ A_1 B_0 + A_0 B_1, \ A_0 B_0 \}.

Thus, we need efficient algorithms for evaluating the linear convolution of two sequences.
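To make the definition concrete, here is a minimal Python sketch (my own illustration, not part of the notes; the function name linear_convolution is hypothetical) that computes the linear convolution directly from the defining sum. It runs in quadratic time, which is precisely the cost we are trying to beat.

def linear_convolution(A, B):
    # Naive O(n^2) linear convolution: c_j = sum_i A_i * B_{j-i},
    # with out-of-range coefficients treated as zero, as in the padding above.
    n = len(A) - 1                      # both sequences assumed to have length n + 1
    c = [0] * (2 * n + 1)
    for j in range(2 * n + 1):
        for i in range(j + 1):
            a = A[i] if i <= n else 0
            b = B[j - i] if j - i <= n else 0
            c[j] += a * b
    return c

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
print(linear_convolution([1, 2], [3, 4]))   # [3, 10, 8]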

Coefficient vs. value representation of polynomials. Every polynomial A(x) of degree n is uniquely determined by its values at n + 1 distinct input values for x:

A(x) \leftrightarrow \{(x_0, A(x_0)), (x_1, A(x_1)), \ldots, (x_n, A(x_n))\}.

If A(x) = A_n x^n + A_{n-1} x^{n-1} + \ldots + A_0, we can write this in matrix form:

(2)
\begin{pmatrix}
1 & x_0 & x_0^2 & \ldots & x_0^n \\
1 & x_1 & x_1^2 & \ldots & x_1^n \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \ldots & x_n^n
\end{pmatrix}
\begin{pmatrix} A_0 \\ A_1 \\ \vdots \\ A_n \end{pmatrix}
=
\begin{pmatrix} A(x_0) \\ A(x_1) \\ \vdots \\ A(x_n) \end{pmatrix}.

The determinant of the above matrix is the Vandermonde determinant, and if all x_i are distinct it can be shown to be non-zero, because

(3)
\det \begin{pmatrix}
1 & x_0 & x_0^2 & \ldots & x_0^n \\
1 & x_1 & x_1^2 & \ldots & x_1^n \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \ldots & x_n^n
\end{pmatrix}
= \prod_{i > j} (x_i - x_j) \neq 0.

Thus, if all x_i are distinct, given any values A(x_0), A(x_1), \ldots, A(x_n), the coefficients A_0, A_1, \ldots, A_n are uniquely determined:


(4)
\begin{pmatrix} A_0 \\ A_1 \\ \vdots \\ A_n \end{pmatrix}
=
\begin{pmatrix}
1 & x_0 & x_0^2 & \ldots & x_0^n \\
1 & x_1 & x_1^2 & \ldots & x_1^n \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \ldots & x_n^n
\end{pmatrix}^{-1}
\begin{pmatrix} A(x_0) \\ A(x_1) \\ \vdots \\ A(x_n) \end{pmatrix}.
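As an illustration of equations (2) and (4), the following Python/NumPy sketch (my own, not part of the notes) evaluates a polynomial at distinct points by building the Vandermonde matrix, and then recovers the coefficients by solving the corresponding linear system. Solving the system directly takes cubic time, which is exactly what the FFT-based approach developed below avoids.

import numpy as np

coeffs = np.array([5.0, -2.0, 1.0])      # A(x) = 5 - 2x + x^2, i.e. (A_0, A_1, A_2)
xs = np.array([0.0, 1.0, 2.0])           # n + 1 = 3 distinct evaluation points

# Evaluation, as in equation (2): V @ coeffs gives the values A(x_i).
V = np.vander(xs, N=len(coeffs), increasing=True)    # rows (1, x_i, x_i^2)
values = V @ coeffs                                   # [5., 4., 5.]

# Interpolation, as in equation (4): solve V a = values to recover the coefficients.
recovered = np.linalg.solve(V, values)
print(values, recovered)                              # [5. 4. 5.] [ 5. -2.  1.]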

Why do we consider the value representation of polynomials? Because polynomials in value representation are easy to multiply: if

A(x) \leftrightarrow \{(x_0, A(x_0)), (x_1, A(x_1)), \ldots, (x_n, A(x_n))\}

and

B(x) \leftrightarrow \{(x_0, B(x_0)), (x_1, B(x_1)), \ldots, (x_n, B(x_n))\},

then their product C(x) = A(x)B(x) can be value represented as

C(x) \leftrightarrow \{(x_0, A(x_0)B(x_0)), (x_1, A(x_1)B(x_1)), \ldots, (x_n, A(x_n)B(x_n))\},

which involves only n + 1 multiplications of the form A(x_i)B(x_i). Thus, unlike polynomials in coefficient form, polynomials in value form are easy to multiply in linear time. For this reason our strategy will be as follows:

We will find a fast algorithm for converting the coefficient representation of polynomials into the value representation, we will multiply the polynomials in their value form in linear time, and then we will find a fast algorithm for converting the value representation back into the standard coefficient representation.

However, if A(x) and B(x) are of degree n, the product polynomial C(x) = A(x)B(x) is of degree 2n, and to uniquely determine it we need 2n + 1 of its values:

C(x) = A(x)B(x) \leftrightarrow \{(x_0, A(x_0)B(x_0)), (x_1, A(x_1)B(x_1)), \ldots, (x_{2n}, A(x_{2n})B(x_{2n}))\}.

Thus, we must overdetermine A(x) and B(x) by starting with 2n + 1 values of these two polynomials:


A(x) \leftrightarrow \{(x_0, A(x_0)), (x_1, A(x_1)), \ldots, (x_{2n}, A(x_{2n}))\}
B(x) \leftrightarrow \{(x_0, B(x_0)), (x_1, B(x_1)), \ldots, (x_{2n}, B(x_{2n}))\}
C(x) = A(x)B(x) \leftrightarrow \{(x_0, A(x_0)B(x_0)), (x_1, A(x_1)B(x_1)), \ldots, (x_{2n}, A(x_{2n})B(x_{2n}))\}

We will then use these 2n + 1 values of C(x) to find the coefficients c_i, 0 \le i \le 2n; this is called interpolation. Finding the values at a certain set of points (knowing the coefficients of the polynomial) is called evaluation.

Thus, to find the coefficients of a polynomial of degree 2n we need only find its values at 2n + 1 points. In the case of large integer multiplication, instead of looking at values for large x like 2^k, we chose small values of x, namely

x_i \in \{-n, -(n-1), \ldots, 0, \ldots, (n-1), n\}.

However, as we saw, this produced gigantic constants in our algorithm, for example multiplications with n^{2n}, which rendered the algorithm useless in practice. Thus we need inputs for our polynomials all of whose powers are of the same size, and to achieve that we must resort to complex numbers. Besides controlling the sizes of the numbers involved, using complex roots of unity will provide another key feature which will make our divide-and-conquer algorithm fast (the cancellation lemma below).

Complex numbers z = a + ib can be represented using their modulus |z| = \sqrt{a^2 + b^2} and their argument, defined as \arg z = \arctan \frac{b}{a}, where the arctan function (pronounced arcus tangens) is defined so that it takes values in (-\pi, \pi]:

z = |z| e^{i \arg z} = |z| (\cos \arg z + i \sin \arg z),

see the figure below.

As you recall, z^n = |z|^n e^{i n \arg z}; thus, if we take the primitive nth root of unity, i.e., \omega_n = e^{\frac{2\pi}{n} i}, since |\omega_n| = 1 we have |\omega_n^m| = |\omega_n|^m = 1 for all m. Note that \omega_n^k = e^{\frac{2\pi k}{n} i}; thus, all powers of \omega_n belong to the unit circle and are equally spaced, having arguments which are integer multiples of \frac{2\pi}{n}.
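The following short Python sketch (an illustration of mine, not from the notes) generates the powers of \omega_n with the cmath module and confirms that they all have modulus 1, with arguments that are integer multiples of 2\pi/n, folded into (-\pi, \pi].

import cmath

n = 8
omega = cmath.exp(2j * cmath.pi / n)     # primitive nth root of unity

for k in range(n):
    z = omega ** k                       # the kth power of omega_n
    # modulus is always 1; phase / (2*pi/n) is an integer (negative for k > n/2,
    # since cmath.phase folds arguments into (-pi, pi])
    print(k, round(abs(z), 10), round(cmath.phase(z) / (2 * cmath.pi / n), 10))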


Besides remaining of constant size (modulus), the roots of unity satisfy the following cancellation property:

\omega_{dn}^{dk} = \omega_n^k.

Thus, taking the primitive root of unity of order d times n to the power d times k is the same as taking the root of unity of order n to the power k. This is demonstrated by the following simple calculation:

\omega_{dn}^{dk} = \left( e^{\frac{2\pi}{dn} i} \right)^{dk} = \left( e^{\frac{2\pi}{n} i} \right)^{k} = \omega_n^k.

This fact has the following simple consequence, crucial for our algorithm.

Lemma 0.1 (Halving Lemma). If n > 0 is an even number, then the squares of the n complex roots of unity of order n are exactly the n/2 complex roots of unity of order n/2.

Proof. By the above cancellation property we have

(\omega_n^k)^2 = (\omega_{2 \cdot (n/2)})^{2k} = \omega_{n/2}^{k}.

Thus, the total number of distinct squares of the roots of unity of order n is n/2. This fact is crucial for our FFT algorithm.
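A quick numeric sanity check of the Halving Lemma (my own sketch, not from the notes): squaring the 8 complex roots of unity of order 8 yields only the 4 distinct roots of unity of order 4.

import cmath

n = 8
roots_n = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]
# Round to suppress floating point noise before comparing as sets.
squares = {complex(round(z.real, 9), round(z.imag, 9)) for z in (w * w for w in roots_n)}
roots_half = {complex(round(z.real, 9), round(z.imag, 9))
              for z in (cmath.exp(2j * cmath.pi * k / (n // 2)) for k in range(n // 2))}
print(len(squares), squares == roots_half)   # 4 True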


    The Discrete Fourier Transform

Let A = (A_0, A_1, \ldots, A_n) be a sequence of n + 1 real or complex numbers. We can then form the corresponding polynomial A(x) = \sum_{j=0}^{n} A_j x^j, and evaluate it at all complex roots of unity of order n + 1, i.e., we can evaluate A(\omega_{n+1}^k) for all 0 \le k \le n. The sequence of values

(A(1), A(\omega_{n+1}), A(\omega_{n+1}^2), \ldots, A(\omega_{n+1}^n))

is called the Discrete Fourier Transform (DFT) of the sequence A = (A_0, A_1, \ldots, A_n).
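Directly from this definition, the DFT can be computed by brute force in O(n^2) operations; the Python sketch below (mine, for illustration; the function name is hypothetical) does exactly that, and is the baseline that the FFT improves upon.

import cmath

def dft_by_definition(A):
    # Evaluate the polynomial with coefficient sequence A at all roots of unity
    # of order len(A); this is the DFT as defined above, computed in O(n^2) time.
    N = len(A)                                  # N = n + 1 in the notation of the notes
    omega = cmath.exp(2j * cmath.pi / N)
    return [sum(A[j] * omega ** (k * j) for j in range(N)) for k in range(N)]

print(dft_by_definition([1, 2, 3, 4]))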

To multiply two polynomials of degree (at most) n we will evaluate them at the roots of unity of order 2n + 1, thus in effect taking the DFT of the (0-padded) sequence of their coefficients (A_0, A_1, \ldots, A_n, 0, \ldots, 0), with n zeros appended; we will then multiply the corresponding values at these roots of unity, and then use the inverse transformation for the DFT, namely the IDFT, to recover the coefficients of the product polynomial from its values at these roots of unity:

A(x) = A_0 + A_1 x + \ldots + A_n x^n  --DFT-->  \{A(1), A(\omega_{2n+1}), A(\omega_{2n+1}^2), \ldots, A(\omega_{2n+1}^{2n})\}

B(x) = B_0 + B_1 x + \ldots + B_n x^n  --DFT-->  \{B(1), B(\omega_{2n+1}), B(\omega_{2n+1}^2), \ldots, B(\omega_{2n+1}^{2n})\}

--multiplication-->  \{A(1)B(1), A(\omega_{2n+1})B(\omega_{2n+1}), \ldots, A(\omega_{2n+1}^{2n})B(\omega_{2n+1}^{2n})\}  --IDFT-->

C(x) = A(x)B(x) = \sum_{j=0}^{2n} \left( \sum_{i=0}^{j} A_i B_{j-i} \right) x^j.

We now have to show that both the DFT and the IDFT can be computed efficiently, rather than in time O(n^2), which brute force polynomial multiplication would require. That is precisely what our FFT algorithm accomplishes.
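To see the whole pipeline in action, here is a short NumPy sketch (my own illustration; it uses numpy.fft rather than the hand-rolled FFT developed below) that multiplies two polynomials by zero-padding the coefficient sequences, taking their FFTs, multiplying pointwise, and applying the inverse FFT.

import numpy as np

def poly_multiply(A, B):
    # Coefficients are given in increasing order of powers;
    # the product has len(A) + len(B) - 1 coefficients (degree 2n needs 2n + 1 values).
    size = len(A) + len(B) - 1
    fa = np.fft.fft(A, n=size)                  # DFT of the zero-padded coefficients
    fb = np.fft.fft(B, n=size)
    return np.fft.ifft(fa * fb).real.round(10)  # pointwise product, then the IDFT

# (1 + 2x + 3x^2)(4 + 5x) = 4 + 13x + 22x^2 + 15x^3
print(poly_multiply([1, 2, 3], [4, 5]))         # [ 4. 13. 22. 15.]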

    FFT

The FFT is thus a fast algorithm which, given a polynomial (or, equivalently, the sequence of its coefficients), produces its values at all the roots of unity of the appropriate order (i.e., the DFT of the sequence of its coefficients). To make our divide-and-conquer algorithm run smoothly, we will assume that we are evaluating a polynomial of degree 2^k at


2^k roots of unity of order 2^k. This adds only an inessential cost, because if the starting polynomial is of degree n, there is a power of two which is at least n and smaller than 2n (how would you find it?). Thus, we pad the original polynomial with 0 coefficients for the leading powers, so that it becomes a polynomial of degree 2^k. We can now proceed with the divide-and-conquer method:

We break the original polynomial into two, separating the even and odd degrees (recall that we are assuming that n is of the form 2^k):

A(x) = (A_0 + A_2 x^2 + A_4 x^4 + \ldots + A_{n-2} x^{2(n/2 - 1)}) + (A_1 x + A_3 x^3 + \ldots + A_{n-1} x^{n-1})
     = (A_0 + A_2 (x^2) + A_4 (x^2)^2 + \ldots + A_{n-2} (x^2)^{n/2 - 1}) + x (A_1 + A_3 x^2 + A_5 (x^2)^2 + \ldots + A_{n-1} (x^2)^{n/2 - 1})
     = A^{[0]}(x^2) + x A^{[1]}(x^2),

where the two polynomials

A^{[0]}(y) = A_0 + A_2 y + A_4 y^2 + \ldots + A_{n-2} y^{n/2 - 1}

and

A^{[1]}(y) = A_1 + A_3 y + A_5 y^2 + \ldots + A_{n-1} y^{n/2 - 1}

have n/2 coefficients each (they are both of degree n/2 - 1), and they have to be evaluated at all values (\omega_n^k)^2, because we got A(x) = A^{[0]}(x^2) + x A^{[1]}(x^2).

In order to use the divide-and-conquer strategy, we have to reduce a problem of size n to two problems of size n/2. But what is a problem of size n?

Evaluate a polynomial given by n coefficients at n input values.

Thus, a problem of size n/2 is:


Evaluate a polynomial given by n/2 coefficients at n/2 input values.

We have reduced the evaluation of a polynomial given by n coefficients to evaluating two polynomials given by n/2 coefficients, but for a successful reduction we also have to make sure that these two polynomials are evaluated at only n/2 values, and this is where our Halving Lemma comes into play: we need the values of A(x) = A^{[0]}(x^2) + x A^{[1]}(x^2) for x_j = \omega_n^k, but this involves evaluating A^{[0]}(x) and A^{[1]}(x) only at x_j^2 = (\omega_n^k)^2. As we saw, by our Halving Lemma, there are only n/2 distinct squares of the roots \omega_n^k. Thus, we have succeeded in reducing our problem to two subproblems of size n/2. To combine the solutions we need to form the sums A(\omega_n^k) = A^{[0]}((\omega_n^k)^2) + \omega_n^k A^{[1]}((\omega_n^k)^2), and this involves n multiplications (of \omega_n^k with A^{[1]}((\omega_n^k)^2)) and n subsequent additions. Thus, to combine the solutions we need O(n) operations, and we get the following recurrence:

T(n) = 2 T(n/2) + O(n).

By the Master Theorem we get that T(n) = \Theta(n \log n).

We can make the above algorithm slightly faster by realizing that \omega_n^{k + n/2} = -\omega_n^k, so we can halve the total number of multiplications by going through only \omega_n^0, \omega_n^1, \ldots, \omega_n^{n/2 - 1}, and just using -\omega_n^0, -\omega_n^1, \ldots, -\omega_n^{n/2 - 1} in place of \omega_n^{n/2}, \omega_n^{n/2 + 1}, \ldots, \omega_n^{n - 1}. Thus we get the following pseudocode for our FFT algorithm:

FFT(A)
(1)  n ← length[A]
(2)  if n = 1
(3)      return A
(4)  A^{[0]} ← (A_0, A_2, \ldots, A_{n-2})
(5)  A^{[1]} ← (A_1, A_3, \ldots, A_{n-1})
(6)  y^{[0]} ← FFT(A^{[0]})
(7)  y^{[1]} ← FFT(A^{[1]})
(8)  \omega_n ← e^{2\pi i / n}


(9)  \omega ← 1
(10) for k = 0 to n/2 - 1 do:
(11)     y_k ← y^{[0]}_k + \omega \, y^{[1]}_k
(12)     y_{k + n/2} ← y^{[0]}_k - \omega \, y^{[1]}_k
(13)     \omega ← \omega \, \omega_n
(14) return y
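Here is a direct Python transcription of this pseudocode (a minimal sketch of mine, not from the notes; it assumes, as the notes do, that the length of A is a power of two).

import cmath

def fft(A):
    # Recursive radix-2 FFT; len(A) must be a power of two.
    n = len(A)
    if n == 1:
        return list(A)
    y_even = fft(A[0::2])                    # A^[0] = (A_0, A_2, ..., A_{n-2})
    y_odd = fft(A[1::2])                     # A^[1] = (A_1, A_3, ..., A_{n-1})
    omega_n = cmath.exp(2j * cmath.pi / n)
    omega = 1
    y = [0] * n
    for k in range(n // 2):
        y[k] = y_even[k] + omega * y_odd[k]          # butterfly: steps (11) and (12)
        y[k + n // 2] = y_even[k] - omega * y_odd[k]
        omega *= omega_n
    return y

# Check against direct evaluation of the polynomial at the 8th roots of unity.
A = [1, 2, 3, 4, 5, 6, 7, 8]
direct = [sum(A[j] * cmath.exp(2j * cmath.pi * k * j / 8) for j in range(8)) for k in range(8)]
print(all(abs(x - y) < 1e-9 for x, y in zip(fft(A), direct)))    # True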

Steps 11 and 12 form the butterfly operation, often implemented in processors with separate hardware for speed; see the diagram below:

Inverse DFT. The above evaluation of a polynomial A(x) = A_0 + A_1 x + \ldots + A_{n-1} x^{n-1} at the roots of unity \omega_n^k of order n can be represented in matrix form as follows:

(5)
\begin{pmatrix}
1 & 1 & 1 & \ldots & 1 \\
1 & \omega_n & \omega_n^2 & \ldots & \omega_n^{n-1} \\
1 & \omega_n^2 & \omega_n^4 & \ldots & \omega_n^{2(n-1)} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \omega_n^{n-1} & \omega_n^{2(n-1)} & \ldots & \omega_n^{(n-1)(n-1)}
\end{pmatrix}
\begin{pmatrix} A_0 \\ A_1 \\ A_2 \\ \vdots \\ A_{n-1} \end{pmatrix}
=
\begin{pmatrix} A(1) \\ A(\omega_n) \\ A(\omega_n^2) \\ \vdots \\ A(\omega_n^{n-1}) \end{pmatrix}.

Thus, if we have the values A(1) = A(\omega_n^0), A(\omega_n), A(\omega_n^2), \ldots, A(\omega_n^{n-1}), we can get the coefficients from

(6)
\begin{pmatrix} A_0 \\ A_1 \\ A_2 \\ \vdots \\ A_{n-1} \end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 1 & \ldots & 1 \\
1 & \omega_n & \omega_n^2 & \ldots & \omega_n^{n-1} \\
1 & \omega_n^2 & \omega_n^4 & \ldots & \omega_n^{2(n-1)} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \omega_n^{n-1} & \omega_n^{2(n-1)} & \ldots & \omega_n^{(n-1)(n-1)}
\end{pmatrix}^{-1}
\begin{pmatrix} A(1) \\ A(\omega_n) \\ A(\omega_n^2) \\ \vdots \\ A(\omega_n^{n-1}) \end{pmatrix}.

    This is another place where something remarkable about the roots of unity is true: to

    obtain the inverse of the above matrix, all we have to do is just change the signs of the

    exponents:


(7)
\begin{pmatrix}
1 & 1 & 1 & \ldots & 1 \\
1 & \omega_n & \omega_n^2 & \ldots & \omega_n^{n-1} \\
1 & \omega_n^2 & \omega_n^4 & \ldots & \omega_n^{2(n-1)} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \omega_n^{n-1} & \omega_n^{2(n-1)} & \ldots & \omega_n^{(n-1)(n-1)}
\end{pmatrix}^{-1}
= \frac{1}{n}
\begin{pmatrix}
1 & 1 & 1 & \ldots & 1 \\
1 & \omega_n^{-1} & \omega_n^{-2} & \ldots & \omega_n^{-(n-1)} \\
1 & \omega_n^{-2} & \omega_n^{-4} & \ldots & \omega_n^{-2(n-1)} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \omega_n^{-(n-1)} & \omega_n^{-2(n-1)} & \ldots & \omega_n^{-(n-1)(n-1)}
\end{pmatrix}.

To see this, note that if we evaluate the product

(8)
\begin{pmatrix}
1 & 1 & 1 & \ldots & 1 \\
1 & \omega_n & \omega_n^2 & \ldots & \omega_n^{n-1} \\
1 & \omega_n^2 & \omega_n^4 & \ldots & \omega_n^{2(n-1)} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \omega_n^{n-1} & \omega_n^{2(n-1)} & \ldots & \omega_n^{(n-1)(n-1)}
\end{pmatrix}
\begin{pmatrix}
1 & 1 & 1 & \ldots & 1 \\
1 & \omega_n^{-1} & \omega_n^{-2} & \ldots & \omega_n^{-(n-1)} \\
1 & \omega_n^{-2} & \omega_n^{-4} & \ldots & \omega_n^{-2(n-1)} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \omega_n^{-(n-1)} & \omega_n^{-2(n-1)} & \ldots & \omega_n^{-(n-1)(n-1)}
\end{pmatrix},

we get that the (i, j) entry in the product matrix is equal to

(9)
(1, \ \omega_n^{i}, \ \omega_n^{2i}, \ \ldots, \ \omega_n^{i(n-1)})
\begin{pmatrix} 1 \\ \omega_n^{-j} \\ \omega_n^{-2j} \\ \vdots \\ \omega_n^{-(n-1)j} \end{pmatrix}
= \sum_{k=0}^{n-1} \omega_n^{ik}\, \omega_n^{-jk} = \sum_{k=0}^{n-1} \omega_n^{(i-j)k}.

We now have two possibilities:

(1) i = j: then \sum_{k=0}^{n-1} \omega_n^{(i-j)k} = \sum_{k=0}^{n-1} \omega_n^{0} = \sum_{k=0}^{n-1} 1 = n;


(2) i \neq j: then \sum_{k=0}^{n-1} \omega_n^{(i-j)k} represents a geometric series with ratio \omega_n^{i-j}, and thus,

(10)
\sum_{k=0}^{n-1} \omega_n^{(i-j)k} = \frac{1 - \omega_n^{(i-j)n}}{1 - \omega_n^{i-j}} = \frac{1 - (\omega_n^{n})^{i-j}}{1 - \omega_n^{i-j}} = \frac{1 - 1}{1 - \omega_n^{i-j}} = 0.

This proves our claim that (7) holds. Thus, (6) implies that

(11)
\begin{pmatrix} A_0 \\ A_1 \\ A_2 \\ \vdots \\ A_{n-1} \end{pmatrix}
= \frac{1}{n}
\begin{pmatrix}
1 & 1 & 1 & \ldots & 1 \\
1 & \omega_n^{-1} & \omega_n^{-2} & \ldots & \omega_n^{-(n-1)} \\
1 & \omega_n^{-2} & \omega_n^{-4} & \ldots & \omega_n^{-2(n-1)} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \omega_n^{-(n-1)} & \omega_n^{-2(n-1)} & \ldots & \omega_n^{-(n-1)(n-1)}
\end{pmatrix}
\begin{pmatrix} A(1) \\ A(\omega_n) \\ A(\omega_n^2) \\ \vdots \\ A(\omega_n^{n-1}) \end{pmatrix}.

But this means that, in order to invert the DFT, all we have to do is apply our FFT algorithm with \omega_n^{-1} in place of \omega_n, and then divide the result by n. Consequently, we can use the same algorithm and the same hardware for computing both the DFT and the IDFT (the Inverse Discrete Fourier Transform), with the minor change mentioned above!
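A minimal sketch of this observation (mine, not from the notes): the same recursion as in the earlier FFT sketch, parametrized by the sign of the exponent, so that sign = -1 runs the algorithm with \omega_n^{-1} in place of \omega_n; dividing by n then gives the IDFT.

import cmath

def fft_with_root(A, sign=+1):
    # Radix-2 recursion; sign = +1 uses omega_n, sign = -1 uses omega_n^{-1}.
    n = len(A)
    if n == 1:
        return list(A)
    y_even = fft_with_root(A[0::2], sign)
    y_odd = fft_with_root(A[1::2], sign)
    omega_n = cmath.exp(sign * 2j * cmath.pi / n)
    omega = 1
    y = [0] * n
    for k in range(n // 2):
        y[k] = y_even[k] + omega * y_odd[k]
        y[k + n // 2] = y_even[k] - omega * y_odd[k]
        omega *= omega_n
    return y

def ifft(values):
    # FFT with omega_n^{-1}, followed by division by n.
    n = len(values)
    return [v / n for v in fft_with_root(values, sign=-1)]

coeffs = [3, 1, 4, 1, 5, 9, 2, 6]
print([round(c.real, 6) for c in ifft(fft_with_root(coeffs))])   # recovers coeffs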

    1. Interpretation of DFT

So far we have followed the textbook (CLRS); however, what Cormen et al. call the DFT, namely the sequence (A(\omega_n^0), A(\omega_n^1), A(\omega_n^2), \ldots, A(\omega_n^{n-1})), is usually considered the inverse transform of the sequence of coefficients (a_0, a_1, a_2, \ldots, a_{n-1}) of the polynomial A(x), while (A(\omega_n^{0}), A(\omega_n^{-1}), A(\omega_n^{-2}), \ldots, A(\omega_n^{-(n-1)})) is considered the forward operation, i.e., the DFT. Clearly, since \frac{1}{n}(DFT \circ IDFT) = I (where I is the identity mapping), both choices are equally legitimate, but taking (A(\omega_n^{0}), A(\omega_n^{-1}), A(\omega_n^{-2}), \ldots, A(\omega_n^{-(n-1)})) as the forward operation has an important conceptual advantage and is used more often than the textbook's choice.

To explain this, recall that the scalar product (also called the dot product) of two vectors with real coordinates, x = (x_0, x_1, \ldots, x_{n-1}) and y = (y_0, y_1, \ldots, y_{n-1}), x, y \in R^n, denoted


by \langle x, y \rangle (or x \cdot y), is defined as

\langle x, y \rangle = \sum_{i=0}^{n-1} x_i y_i.

If the coordinates of our vectors are complex numbers, i.e., if x, y \in C^n, then the scalar product of two such vectors is defined as

\langle x, y \rangle = \sum_{i=0}^{n-1} x_i \overline{y_i},

where \overline{z} denotes the complex conjugate of z, i.e., \overline{a + ib} = a - ib. Since \overline{e^{\frac{2\pi i k}{n}}} = e^{-\frac{2\pi i k}{n}}, we now see that equations (9) and (10) actually show that any two distinct rows (or columns) of the matrix corresponding to the DFT are orthogonal, or, in other words, that for i \neq j the vectors w_i = (1, \omega_n^{i}, \omega_n^{2i}, \ldots, \omega_n^{(n-1)i}) and w_j = (1, \omega_n^{j}, \omega_n^{2j}, \ldots, \omega_n^{(n-1)j}) are mutually orthogonal. Thus, the set \{w_i : 0 \le i < n\} is an orthogonal basis for the space C^n. From the same equations it is also clear that the norm satisfies \|w_i\|^2 = \langle w_i, w_i \rangle = n. Thus, if we set e_i = \frac{1}{\sqrt{n}} w_i, then \|e_i\|^2 = \langle e_i, e_i \rangle = \frac{1}{n} \langle w_i, w_i \rangle = 1, which means that the set of vectors B = \{e_i : 0 \le i < n\} forms an orthonormal basis for the vector space C^n of complex sequences of length n.
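A quick numeric check of this orthogonality (my own sketch, not from the notes): the inner products \langle w_i, w_j \rangle come out as n on the diagonal and (up to floating point error) 0 off it.

import cmath

n = 4
w = [[cmath.exp(2j * cmath.pi * i * k / n) for k in range(n)] for i in range(n)]

def inner(x, y):
    # Complex scalar product <x, y> = sum_i x_i * conjugate(y_i)
    return sum(a * b.conjugate() for a, b in zip(x, y))

for i in range(n):
    print([round(abs(inner(w[i], w[j])), 9) for j in range(n)])   # n on the diagonal, 0 elsewhere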

If we accept that the forward operation involves the negative powers of the root of unity \omega_n, it is easy to see what the DFT of a sequence c = (c_0, c_1, \ldots, c_{n-1}) represents: it is essentially the sequence of coordinates of the vector c in the basis B, because for A(x) = c_0 + c_1 x + \ldots + c_{n-1} x^{n-1} we have

(12) A(\omega_n^{-k}) = \sum_{i=0}^{n-1} c_i (\omega_n^{-k})^i = \sum_{i=0}^{n-1} c_i \omega_n^{-ki} = \langle c, w_k \rangle = \sqrt{n}\, \langle c, e_k \rangle.

Thus, A(\omega_n^{-k}) is (up to the normalizing factor \sqrt{n}) simply the projection of the vector c onto the basis vector e_k. We can now represent c in the basis \{e_i : 0 \le i < n\}:

(13) c = \sum_{i=0}^{n-1} \langle c, e_i \rangle\, e_i = \frac{1}{\sqrt{n}} \sum_{i=0}^{n-1} A(\omega_n^{-i})\, e_i = \frac{1}{n} \sum_{i=0}^{n-1} A(\omega_n^{-i})\, w_i.

The sequence (c_0, c_1, c_2, \ldots, c_{n-1}) can also be represented in the standard basis

\{(1, 0, 0, \ldots, 0), (0, 1, 0, \ldots, 0), \ldots, (0, 0, 0, \ldots, 1)\}



in the obvious way:

(c_0, c_1, c_2, \ldots, c_{n-1}) = c_0 (1, 0, 0, \ldots, 0) + c_1 (0, 1, 0, \ldots, 0) + \ldots + c_{n-1} (0, 0, 0, \ldots, 1).

Thus, taking the Discrete Fourier Transform of a sequence (c_0, c_1, \ldots, c_{n-1}) amounts to representing such a sequence in a different basis, namely the basis B = \{e_i : 0 \le i < n\}.

Both sides of equation (13) represent the same vector c; the mth coordinate of the left side is c_m, while the mth coordinate of the right side is \frac{1}{n} \sum_{k=0}^{n-1} A(\omega_n^{-k})\, \omega_n^{mk}; thus, changing the index of summation from i to k, we get

(14) c_m = \frac{1}{n} \sum_{k=0}^{n-1} A(\omega_n^{-k})\, e^{\frac{2\pi i m k}{n}}.

Note that e^{\frac{2\pi i m (n-k)}{n}} = e^{\frac{2\pi i m n}{n}}\, e^{\frac{2\pi i m (-k)}{n}} = e^{2\pi i m}\, e^{-\frac{2\pi i m k}{n}} = e^{-\frac{2\pi i m k}{n}}. Thus, if we assume for simplicity that the sequence c is of odd length 2n + 1, then from the above equation,

(15) c_m = \frac{1}{2n+1} \sum_{k=-n}^{n} A(\omega_{2n+1}^{-k})\, e^{\frac{2\pi i m k}{2n+1}}.

Assume now that the elements c_m of the sequence corresponding to c are samples of a sound f(t), taken at equidistant (unit) intervals, i.e., c_m = f(m); then (15) states that


(16) f(m) = \frac{1}{2n+1} \sum_{k=-n}^{n} A(\omega_{2n+1}^{-k})\, e^{\frac{2\pi i m k}{2n+1}},

i.e., that the equation

(17) f(t) = \frac{1}{2n+1} \sum_{k=-n}^{n} A(\omega_{2n+1}^{-k})\, e^{\frac{2\pi i t k}{2n+1}}

holds at all integer points -n, \ldots, -1, 0, 1, \ldots, n. The values A(\omega_{2n+1}^{-k}) provided by the DFT are complex numbers, so we can represent them via their absolute value and their argument, i.e., A(\omega_{2n+1}^{-k}) = |A(\omega_{2n+1}^{-k})|\, e^{i \arg(A(\omega_{2n+1}^{-k}))}; thus, the signal has been, in a sense which can be made precise, represented as a sum of complex exponentials, or, equivalently, as a sum of sine waves (cosines are shifted sines):

f(t) \approx \frac{1}{2n+1} \sum_{k=-n}^{n} |A(\omega_{2n+1}^{-k})|\, e^{i \arg(A(\omega_{2n+1}^{-k}))}\, e^{\frac{2\pi i k t}{2n+1}}
    = \frac{1}{2n+1} \sum_{k=-n}^{n} |A(\omega_{2n+1}^{-k})|\, e^{i \left( \frac{2\pi k t}{2n+1} + \arg(A(\omega_{2n+1}^{-k})) \right)}
    = \frac{1}{2n+1} \sum_{k=-n}^{n} |A(\omega_{2n+1}^{-k})| \left[ \cos\!\left( \frac{2\pi k t}{2n+1} + \arg(A(\omega_{2n+1}^{-k})) \right) + i \sin\!\left( \frac{2\pi k t}{2n+1} + \arg(A(\omega_{2n+1}^{-k})) \right) \right].

If the signal is real valued (rather than complex valued), then it is easy to see that A(\omega_{2n+1}^{-k}) = \overline{A(\omega_{2n+1}^{k})}, and the imaginary parts of the above expression cancel out; thus the signal is represented as a sum of cosine waves, shifted by \arg(A(\omega_{2n+1}^{-k})) and with amplitudes \frac{2}{2n+1} |A(\omega_{2n+1}^{-k})|, so that at the integer sample points the values of the signal match the values of such a sum of cosine waves (i.e., f(t) is interpolated by such a sum). So, in a sense, the DFT roughly tells us which frequencies in the range

-\frac{2\pi n}{2n+1}, \ -\frac{2\pi (n-1)}{2n+1}, \ \ldots, \ -\frac{2\pi}{2n+1}, \ 0, \ \frac{2\pi}{2n+1}, \ \ldots, \ \frac{2\pi (n-1)}{2n+1}, \ \frac{2\pi n}{2n+1}

are present in the signal, and with what amplitudes and phase shifts! This (partly) explains why the DFT is so useful in signal processing: it gives an insight into the approximate spectral content of the signal (what frequencies are present) and, on top of that, the DFT can be computed very efficiently via the FFT! But this is where the story only begins; for example, one can show that if you apply a filter to a signal, for example if you want to attenuate or emphasize certain frequencies, then all you have to do is convolve the samples of the signal with a sequence of fixed coefficients corresponding to the filter. Just as in the case of polynomial multiplication, to obtain such a convolution you can simply compute the FFT of the signal


and multiply it with the FFT of the filter coefficients and then take the inverse FFT! But more about that from Dolby specialists next week!
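As a closing illustration (my own sketch, not part of the notes; the signal and filter are made up for the example), here is how such FFT-based filtering of a sampled signal might look with NumPy: the samples are convolved with a short averaging filter by multiplying their FFTs, exactly as in polynomial multiplication.

import numpy as np

# A toy "signal": a slow cosine plus faster wiggles.
t = np.arange(64)
signal = np.cos(2 * np.pi * t / 32) + 0.3 * np.cos(2 * np.pi * t * 14 / 32)

# A short smoothing (moving average) filter; convolving with it attenuates high frequencies.
filt = np.ones(5) / 5

# Linear convolution via FFT: zero-pad both to the length of the full convolution.
size = len(signal) + len(filt) - 1
smoothed = np.fft.ifft(np.fft.fft(signal, n=size) * np.fft.fft(filt, n=size)).real

# Same result (up to floating point error) as direct convolution.
print(np.allclose(smoothed, np.convolve(signal, filt)))   # True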