
    School of Computer Science and Engineering

    University of New South Wales

    COMP3121/3821/9101/9801

    A. Ignjatovic

    4/4/2013

    Polynomial Multiplication and The Fast Fourier Transform (FFT)

We now continue elaborating on the methods from the previous lecture on fast multiplication of large integers, focusing our attention on the problem of efficient multiplication of polynomials. Read pages 822-838 of the second edition of the textbook (CLRS) or 776-791 of the first edition. Why are we doing the FFT??? Besides being a great example of the divide-and-conquer design strategy, it is BY FAR the MOST EXECUTED algorithm today; it runs a huge number of times each second in your mobile phone, your modem, your digital camera, your MP3 player ... It is arguably the MOST important algorithm today, without any serious competition!

    Multiplication of Polynomials

Let A(x) = \sum_{j=0}^{n} A_j x^j and B(x) = \sum_{j=0}^{n} B_j x^j be two polynomials of degree n (if one of the polynomials is of lower degree, we can pad it with leading zero coefficients). Let us set C(x) = A(x)B(x); then C(x) is of degree (at most) 2n. Thus, it can be written as C(x) = \sum_{j=0}^{2n} c_j x^j, and if we set A_i and B_i to zero for i > n, we have

(1)    C(x) = \sum_{j=0}^{2n} c_j x^j = A(x)B(x) = \sum_{j=0}^{2n} \left( \sum_{i=0}^{j} A_i B_{j-i} \right) x^j.

Thus, we have to find an efficient algorithm for finding the coefficients

c_j = \sum_{i=0}^{j} A_i B_{j-i}

for j \le 2n, from A_i, B_i, i \le n. Prima facie, finding the coefficients of C(x) directly still involves (n+1)^2 multiplications, because all pairs of the form A_i B_j, 0 \le i, j \le n, appear in the coefficients c_j = \sum_{i=0}^{j} A_i B_{j-i}.


Let A_n, \ldots, A_0 and B_n, \ldots, B_0 be an arbitrary pair of number sequences; let us pad them with zeros from the left to length 2n + 1, i.e., let us set A_i = 0 and B_i = 0 for n < i \le 2n. Then the sequence

\left\{ \sum_{i=0}^{j} A_i B_{j-i} \right\}_{j=0}^{2n}

is called the (linear) convolution of the sequences A and B, and is denoted A * B:

A * B = \{ A_n B_n, \ \ldots, \ A_2 B_0 + A_1 B_1 + A_0 B_2, \ A_1 B_0 + A_0 B_1, \ A_0 B_0 \}.

Thus, we need efficient algorithms for evaluating the linear convolution of two sequences.
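To make the definition concrete, here is a minimal Python sketch (my own illustration, not part of the notes; the function name linear_convolution is hypothetical) that computes the linear convolution directly from the defining sum. It runs in quadratic time, which is precisely the cost we are trying to beat.

def linear_convolution(A, B):
    # Naive O(n^2) linear convolution: c_j = sum_i A_i * B_{j-i},
    # with out-of-range coefficients treated as zero, as in the padding above.
    n = len(A) - 1                      # both sequences assumed to have length n + 1
    c = [0] * (2 * n + 1)
    for j in range(2 * n + 1):
        for i in range(j + 1):
            a = A[i] if i <= n else 0
            b = B[j - i] if j - i <= n else 0
            c[j] += a * b
    return c

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
print(linear_convolution([1, 2], [3, 4]))   # [3, 10, 8]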

Coefficient vs. value representation of polynomials. Every polynomial A(x) of degree n is uniquely determined by its values at n + 1 distinct input values for x:

A(x) \leftrightarrow \{(x_0, A(x_0)), (x_1, A(x_1)), \ldots, (x_n, A(x_n))\}.

If A(x) = A_n x^n + A_{n-1} x^{n-1} + \ldots + A_0, we can write this in matrix form:

(2)
\begin{pmatrix}
1 & x_0 & x_0^2 & \ldots & x_0^n \\
1 & x_1 & x_1^2 & \ldots & x_1^n \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \ldots & x_n^n
\end{pmatrix}
\begin{pmatrix} A_0 \\ A_1 \\ \vdots \\ A_n \end{pmatrix}
=
\begin{pmatrix} A(x_0) \\ A(x_1) \\ \vdots \\ A(x_n) \end{pmatrix}.

The determinant of the above matrix is the Vandermonde determinant, and if all x_i are distinct it can be shown to be non-zero, because

(3)
\det \begin{pmatrix}
1 & x_0 & x_0^2 & \ldots & x_0^n \\
1 & x_1 & x_1^2 & \ldots & x_1^n \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \ldots & x_n^n
\end{pmatrix}
= \prod_{i > j} (x_i - x_j) \neq 0.

Thus, if all x_i are distinct, given any values A(x_0), A(x_1), \ldots, A(x_n), the coefficients A_0, A_1, \ldots, A_n are uniquely determined:


(4)
\begin{pmatrix} A_0 \\ A_1 \\ \vdots \\ A_n \end{pmatrix}
=
\begin{pmatrix}
1 & x_0 & x_0^2 & \ldots & x_0^n \\
1 & x_1 & x_1^2 & \ldots & x_1^n \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \ldots & x_n^n
\end{pmatrix}^{-1}
\begin{pmatrix} A(x_0) \\ A(x_1) \\ \vdots \\ A(x_n) \end{pmatrix}.
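As an illustration of equations (2) and (4), the following Python/NumPy sketch (my own, not part of the notes) evaluates a polynomial at distinct points by building the Vandermonde matrix, and then recovers the coefficients by solving the corresponding linear system. Solving the system directly takes cubic time, which is exactly what the FFT-based approach developed below avoids.

import numpy as np

coeffs = np.array([5.0, -2.0, 1.0])      # A(x) = 5 - 2x + x^2, i.e. (A_0, A_1, A_2)
xs = np.array([0.0, 1.0, 2.0])           # n + 1 = 3 distinct evaluation points

# Evaluation, as in equation (2): V @ coeffs gives the values A(x_i).
V = np.vander(xs, N=len(coeffs), increasing=True)    # rows (1, x_i, x_i^2)
values = V @ coeffs                                   # [5., 4., 5.]

# Interpolation, as in equation (4): solve V a = values to recover the coefficients.
recovered = np.linalg.solve(V, values)
print(values, recovered)                              # [5. 4. 5.] [ 5. -2.  1.]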

Why do we consider the value representation of polynomials? Because polynomials in value representation are easy to multiply: if

A(x) \leftrightarrow \{(x_0, A(x_0)), (x_1, A(x_1)), \ldots, (x_n, A(x_n))\}

and

B(x) \leftrightarrow \{(x_0, B(x_0)), (x_1, B(x_1)), \ldots, (x_n, B(x_n))\},

then their product C(x) = A(x)B(x) can be value represented as

C(x) \leftrightarrow \{(x_0, A(x_0)B(x_0)), (x_1, A(x_1)B(x_1)), \ldots, (x_n, A(x_n)B(x_n))\},

which involves only n + 1 multiplications of the form A(x_i)B(x_i). Thus, unlike polynomials in coefficient form, polynomials in value form are easy to multiply in linear time. For this reason our strategy will be as follows:

We will find a fast algorithm for converting the coefficient representation of polynomials into the value representation, we will multiply the polynomials in their value form in linear time, and then we will find a fast algorithm for converting the value representation back into the standard coefficient representation.

However, if A(x) and B(x) are of degree n, the product polynomial C(x) = A(x)B(x) is of degree 2n, and to uniquely determine it we need 2n + 1 of its values:

C(x) = A(x)B(x) \leftrightarrow \{(x_0, A(x_0)B(x_0)), (x_1, A(x_1)B(x_1)), \ldots, (x_{2n}, A(x_{2n})B(x_{2n}))\}.

Thus, we must overdetermine A(x) and B(x) by starting with 2n + 1 values of these two polynomials:


A(x) \leftrightarrow \{(x_0, A(x_0)), (x_1, A(x_1)), \ldots, (x_{2n}, A(x_{2n}))\}
B(x) \leftrightarrow \{(x_0, B(x_0)), (x_1, B(x_1)), \ldots, (x_{2n}, B(x_{2n}))\}
C(x) = A(x)B(x) \leftrightarrow \{(x_0, A(x_0)B(x_0)), (x_1, A(x_1)B(x_1)), \ldots, (x_{2n}, A(x_{2n})B(x_{2n}))\}

We will then use these 2n + 1 values of C(x) to find the coefficients c_i, 0 \le i \le 2n; this is called interpolation. Finding the values at a certain set of points (knowing the coefficients of the polynomial) is called evaluation.

Thus, to find the coefficients of a polynomial of degree 2n we need only find its values at 2n + 1 points. In the case of large integer multiplication, instead of looking at values for large x like 2^k, we chose small values of x, namely

x_i \in \{-n, -(n-1), \ldots, 0, \ldots, (n-1), n\}.

However, as we saw, this produced gigantic constants in our algorithm, for example multiplications with n^{2n}, which rendered the algorithm useless in practice. Thus we need inputs for our polynomials all of whose powers are of the same size, and to achieve that we must resort to complex numbers. Besides controlling the sizes of the numbers involved, using complex roots of unity will provide another key feature which will make our divide-and-conquer algorithm fast (the cancellation lemma below).

Complex numbers z = a + ib can be represented using their modulus |z| = \sqrt{a^2 + b^2} and their argument, defined as \arg z = \arctan \frac{b}{a}, where the arctan function (pronounced arcus tangens) is defined so that it takes values in (-\pi, \pi]:

z = |z| e^{i \arg z} = |z| (\cos \arg z + i \sin \arg z),

see the figure below.

As you recall, z^n = |z|^n e^{i n \arg z}; thus, if we take the primitive nth root of unity, i.e., \omega_n = e^{\frac{2\pi}{n} i}, since |\omega_n| = 1 we have |\omega_n^m| = |\omega_n|^m = 1 for all m. Note that \omega_n^k = e^{\frac{2\pi k}{n} i}; thus, all powers of \omega_n belong to the unit circle and are equally spaced, having arguments which are integer multiples of \frac{2\pi}{n}.
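The following short Python sketch (an illustration of mine, not from the notes) generates the powers of \omega_n with the cmath module and confirms that they all have modulus 1, with arguments that are integer multiples of 2\pi/n, folded into (-\pi, \pi].

import cmath

n = 8
omega = cmath.exp(2j * cmath.pi / n)     # primitive nth root of unity

for k in range(n):
    z = omega ** k                       # the kth power of omega_n
    # modulus is always 1; phase / (2*pi/n) is an integer (negative for k > n/2,
    # since cmath.phase folds arguments into (-pi, pi])
    print(k, round(abs(z), 10), round(cmath.phase(z) / (2 * cmath.pi / n), 10))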


Besides remaining of constant size (modulus), the roots of unity satisfy the following cancellation property:

\omega_{dn}^{dk} = \omega_n^k.

Thus, taking the primitive root of unity of order d times n to the power d times k is the same as taking the root of unity of order n to the power k. This is demonstrated by the following simple calculation:

\omega_{dn}^{dk} = \left( e^{\frac{2\pi}{dn} i} \right)^{dk} = \left( e^{\frac{2\pi}{n} i} \right)^{k} = \omega_n^k.

This fact has the following simple consequence, crucial for our algorithm.

Lemma 0.1 (Halving Lemma). If n > 0 is an even number, then the squares of the n complex roots of unity of order n are exactly the n/2 complex roots of unity of order n/2.

Proof. By the above cancellation property we have

(\omega_n^k)^2 = (\omega_{2 \cdot (n/2)})^{2k} = \omega_{n/2}^{k}.

Thus, the total number of distinct squares of the roots of unity of order n is n/2. This fact is crucial for our FFT algorithm.
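A quick numeric sanity check of the Halving Lemma (my own sketch, not from the notes): squaring the 8 complex roots of unity of order 8 yields only the 4 distinct roots of unity of order 4.

import cmath

n = 8
roots_n = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]
# Round to suppress floating point noise before comparing as sets.
squares = {complex(round(z.real, 9), round(z.imag, 9)) for z in (w * w for w in roots_n)}
roots_half = {complex(round(z.real, 9), round(z.imag, 9))
              for z in (cmath.exp(2j * cmath.pi * k / (n // 2)) for k in range(n // 2))}
print(len(squares), squares == roots_half)   # 4 True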


    The Discrete Fourier Transform

Let A = (A_0, A_1, \ldots, A_n) be a sequence of n + 1 real or complex numbers. We can then form the corresponding polynomial A(x) = \sum_{j=0}^{n} A_j x^j, and evaluate it at all complex roots of unity of order n + 1, i.e., we can evaluate A(\omega_{n+1}^k) for all 0 \le k \le n. The sequence of values

(A(1), A(\omega_{n+1}), A(\omega_{n+1}^2), \ldots, A(\omega_{n+1}^n))

is called the Discrete Fourier Transform (DFT) of the sequence A = (A_0, A_1, \ldots, A_n).
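Directly from this definition, the DFT can be computed by brute force in O(n^2) operations; the Python sketch below (mine, for illustration; the function name is hypothetical) does exactly that, and is the baseline that the FFT improves upon.

import cmath

def dft_by_definition(A):
    # Evaluate the polynomial with coefficient sequence A at all roots of unity
    # of order len(A); this is the DFT as defined above, computed in O(n^2) time.
    N = len(A)                                  # N = n + 1 in the notation of the notes
    omega = cmath.exp(2j * cmath.pi / N)
    return [sum(A[j] * omega ** (k * j) for j in range(N)) for k in range(N)]

print(dft_by_definition([1, 2, 3, 4]))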

To multiply two polynomials of degree (at most) n we will evaluate them at the roots of unity of order 2n + 1, thus in effect taking the DFT of the (0-padded) sequence of their coefficients (A_0, A_1, \ldots, A_n, 0, \ldots, 0), with n zeros appended; we will then multiply the corresponding values at these roots of unity, and then use the inverse transformation for the DFT, namely the IDFT, to recover the coefficients of the product polynomial from its values at these roots of unity:

A(x) = A_0 + A_1 x + \ldots + A_n x^n  --DFT-->  \{A(1), A(\omega_{2n+1}), A(\omega_{2n+1}^2), \ldots, A(\omega_{2n+1}^{2n})\}

B(x) = B_0 + B_1 x + \ldots + B_n x^n  --DFT-->  \{B(1), B(\omega_{2n+1}), B(\omega_{2n+1}^2), \ldots, B(\omega_{2n+1}^{2n})\}

--multiplication-->  \{A(1)B(1), A(\omega_{2n+1})B(\omega_{2n+1}), \ldots, A(\omega_{2n+1}^{2n})B(\omega_{2n+1}^{2n})\}  --IDFT-->

C(x) = A(x)B(x) = \sum_{j=0}^{2n} \left( \sum_{i=0}^{j} A_i B_{j-i} \right) x^j.

We now have to show that both the DFT and the IDFT can be computed efficiently, rather than in time O(n^2), which brute force polynomial multiplication would require. That is precisely what our FFT algorithm accomplishes.
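To see the whole pipeline in action, here is a short NumPy sketch (my own illustration; it uses numpy.fft rather than the hand-rolled FFT developed below) that multiplies two polynomials by zero-padding the coefficient sequences, taking their FFTs, multiplying pointwise, and applying the inverse FFT.

import numpy as np

def poly_multiply(A, B):
    # Coefficients are given in increasing order of powers;
    # the product has len(A) + len(B) - 1 coefficients (degree 2n needs 2n + 1 values).
    size = len(A) + len(B) - 1
    fa = np.fft.fft(A, n=size)                  # DFT of the zero-padded coefficients
    fb = np.fft.fft(B, n=size)
    return np.fft.ifft(fa * fb).real.round(10)  # pointwise product, then the IDFT

# (1 + 2x + 3x^2)(4 + 5x) = 4 + 13x + 22x^2 + 15x^3
print(poly_multiply([1, 2, 3], [4, 5]))         # [ 4. 13. 22. 15.]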

    FFT

The FFT is thus a fast algorithm which, given a polynomial (or, equivalently, the sequence of its coefficients), produces its values at all the roots of unity of the appropriate order (i.e., the DFT of the sequence of its coefficients). To make our divide-and-conquer algorithm run smoothly, we will assume that we are evaluating a polynomial of degree 2^k at


2^k roots of unity of order 2^k. This adds only an inessential cost, because if the starting polynomial is of degree n, there is a power of two which is at least n and smaller than 2n (how would you find it?). Thus, we pad the original polynomial with 0 coefficients for the leading powers, so that it becomes a polynomial of degree 2^k. We can now proceed with the divide-and-conquer method:

We break the original polynomial into two, separating the even and odd degrees (recall that we are assuming that n is of the form 2^k):

A(x) = (A_0 + A_2 x^2 + A_4 x^4 + \ldots + A_{n-2} x^{2(n/2 - 1)}) + (A_1 x + A_3 x^3 + \ldots + A_{n-1} x^{n-1})
     = (A_0 + A_2 (x^2) + A_4 (x^2)^2 + \ldots + A_{n-2} (x^2)^{n/2 - 1}) + x (A_1 + A_3 x^2 + A_5 (x^2)^2 + \ldots + A_{n-1} (x^2)^{n/2 - 1})
     = A^{[0]}(x^2) + x A^{[1]}(x^2),

where the two polynomials

A^{[0]}(y) = A_0 + A_2 y + A_4 y^2 + \ldots + A_{n-2} y^{n/2 - 1}

and

A^{[1]}(y) = A_1 + A_3 y + A_5 y^2 + \ldots + A_{n-1} y^{n/2 - 1}

have n/2 coefficients each (they are both of degree n/2 - 1), and they have to be evaluated at all values (\omega_n^k)^2, because we got A(x) = A^{[0]}(x^2) + x A^{[1]}(x^2).

In order to use the divide-and-conquer strategy, we have to reduce a problem of size n to two problems of size n/2. But what is a problem of size n?

Evaluate a polynomial given by n coefficients at n input values.

Thus, a problem of size n/2 is:


Evaluate a polynomial given by n/2 coefficients at n/2 input values.

We have reduced the evaluation of a polynomial given by n coefficients to evaluating two polynomials given by n/2 coefficients, but for a successful reduction we also have to make sure that these two polynomials are evaluated at only n/2 values, and this is where our Halving Lemma comes into play: we need the values of A(x) = A^{[0]}(x^2) + x A^{[1]}(x^2) for x_j = \omega_n^k, but this involves evaluating A^{[0]}(x) and A^{[1]}(x) only at x_j^2 = (\omega_n^k)^2. As we saw, by our Halving Lemma, there are only n/2 distinct squares of the roots \omega_n^k. Thus, we have succeeded in reducing our problem to two subproblems of size n/2. To combine the solutions we need to form the sums A(\omega_n^k) = A^{[0]}((\omega_n^k)^2) + \omega_n^k A^{[1]}((\omega_n^k)^2), and this involves n multiplications (of \omega_n^k with A^{[1]}((\omega_n^k)^2)) and n subsequent additions. Thus, to combine the solutions we need O(n) operations, and we get the following recurrence:

T(n) = 2 T(n/2) + O(n).

By the Master Theorem we get that T(n) = \Theta(n \log n).

We can make the above algorithm slightly faster by realizing that \omega_n^{k + n/2} = -\omega_n^k, so we can halve the total number of multiplications by going through only \omega_n^0, \omega_n^1, \ldots, \omega_n^{n/2 - 1}, and just using -\omega_n^0, -\omega_n^1, \ldots, -\omega_n^{n/2 - 1} in place of \omega_n^{n/2}, \omega_n^{n/2 + 1}, \ldots, \omega_n^{n - 1}. Thus we get the following pseudocode for our FFT algorithm:

FFT(A)
(1)  n ← length[A]
(2)  if n = 1
(3)      return A
(4)  A^{[0]} ← (A_0, A_2, \ldots, A_{n-2})
(5)  A^{[1]} ← (A_1, A_3, \ldots, A_{n-1})
(6)  y^{[0]} ← FFT(A^{[0]})
(7)  y^{[1]} ← FFT(A^{[1]})
(8)  \omega_n ← e^{2\pi i / n}


(9)  \omega ← 1
(10) for k = 0 to n/2 - 1 do:
(11)     y_k ← y^{[0]}_k + \omega \, y^{[1]}_k
(12)     y_{k + n/2} ← y^{[0]}_k - \omega \, y^{[1]}_k
(13)     \omega ← \omega \, \omega_n
(14) return y
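Here is a direct Python transcription of this pseudocode (a minimal sketch of mine, not from the notes; it assumes, as the notes do, that the length of A is a power of two).

import cmath

def fft(A):
    # Recursive radix-2 FFT; len(A) must be a power of two.
    n = len(A)
    if n == 1:
        return list(A)
    y_even = fft(A[0::2])                    # A^[0] = (A_0, A_2, ..., A_{n-2})
    y_odd = fft(A[1::2])                     # A^[1] = (A_1, A_3, ..., A_{n-1})
    omega_n = cmath.exp(2j * cmath.pi / n)
    omega = 1
    y = [0] * n
    for k in range(n // 2):
        y[k] = y_even[k] + omega * y_odd[k]          # butterfly: steps (11) and (12)
        y[k + n // 2] = y_even[k] - omega * y_odd[k]
        omega *= omega_n
    return y

# Check against direct evaluation of the polynomial at the 8th roots of unity.
A = [1, 2, 3, 4, 5, 6, 7, 8]
direct = [sum(A[j] * cmath.exp(2j * cmath.pi * k * j / 8) for j in range(8)) for k in range(8)]
print(all(abs(x - y) < 1e-9 for x, y in zip(fft(A), direct)))    # True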

Steps 11 and 12 form the butterfly operation, often implemented in processors with separate hardware for speed; see the diagram below:

Inverse DFT. The above evaluation of a polynomial A(x) = A_0 + A_1 x + \ldots + A_{n-1} x^{n-1} at the roots of unity \omega_n^k of order n can be represented in matrix form as follows:

(5)
\begin{pmatrix}
1 & 1 & 1 & \ldots & 1 \\
1 & \omega_n & \omega_n^2 & \ldots & \omega_n^{n-1} \\
1 & \omega_n^2 & \omega_n^4 & \ldots & \omega_n^{2(n-1)} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \omega_n^{n-1} & \omega_n^{2(n-1)} & \ldots & \omega_n^{(n-1)(n-1)}
\end{pmatrix}
\begin{pmatrix} A_0 \\ A_1 \\ A_2 \\ \vdots \\ A_{n-1} \end{pmatrix}
=
\begin{pmatrix} A(1) \\ A(\omega_n) \\ A(\omega_n^2) \\ \vdots \\ A(\omega_n^{n-1}) \end{pmatrix}.

Thus, if we have the values A(1) = A(\omega_n^0), A(\omega_n), A(\omega_n^2), \ldots, A(\omega_n^{n-1}), we can get the coefficients from

(6)
\begin{pmatrix} A_0 \\ A_1 \\ A_2 \\ \vdots \\ A_{n-1} \end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 1 & \ldots & 1 \\
1 & \omega_n & \omega_n^2 & \ldots & \omega_n^{n-1} \\
1 & \omega_n^2 & \omega_n^4 & \ldots & \omega_n^{2(n-1)} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \omega_n^{n-1} & \omega_n^{2(n-1)} & \ldots & \omega_n^{(n-1)(n-1)}
\end{pmatrix}^{-1}
\begin{pmatrix} A(1) \\ A(\omega_n) \\ A(\omega_n^2) \\ \vdots \\ A(\omega_n^{n-1}) \end{pmatrix}.

    This is another place where something remarkable about the roots of unity is true: to

    obtain the inverse of the above matrix, all we have to do is just change the signs of the

    exponents:


(7)
\begin{pmatrix}
1 & 1 & 1 & \ldots & 1 \\
1 & \omega_n & \omega_n^2 & \ldots & \omega_n^{n-1} \\
1 & \omega_n^2 & \omega_n^4 & \ldots & \omega_n^{2(n-1)} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \omega_n^{n-1} & \omega_n^{2(n-1)} & \ldots & \omega_n^{(n-1)(n-1)}
\end{pmatrix}^{-1}
= \frac{1}{n}
\begin{pmatrix}
1 & 1 & 1 & \ldots & 1 \\
1 & \omega_n^{-1} & \omega_n^{-2} & \ldots & \omega_n^{-(n-1)} \\
1 & \omega_n^{-2} & \omega_n^{-4} & \ldots & \omega_n^{-2(n-1)} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \omega_n^{-(n-1)} & \omega_n^{-2(n-1)} & \ldots & \omega_n^{-(n-1)(n-1)}
\end{pmatrix}.

To see this, note that if we evaluate the product

(8)
\begin{pmatrix}
1 & 1 & 1 & \ldots & 1 \\
1 & \omega_n & \omega_n^2 & \ldots & \omega_n^{n-1} \\
1 & \omega_n^2 & \omega_n^4 & \ldots & \omega_n^{2(n-1)} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \omega_n^{n-1} & \omega_n^{2(n-1)} & \ldots & \omega_n^{(n-1)(n-1)}
\end{pmatrix}
\begin{pmatrix}
1 & 1 & 1 & \ldots & 1 \\
1 & \omega_n^{-1} & \omega_n^{-2} & \ldots & \omega_n^{-(n-1)} \\
1 & \omega_n^{-2} & \omega_n^{-4} & \ldots & \omega_n^{-2(n-1)} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \omega_n^{-(n-1)} & \omega_n^{-2(n-1)} & \ldots & \omega_n^{-(n-1)(n-1)}
\end{pmatrix},

we get that the (i, j) entry in the product matrix is equal to

(9)
(1, \ \omega_n^{i}, \ \omega_n^{2i}, \ \ldots, \ \omega_n^{i(n-1)})
\begin{pmatrix} 1 \\ \omega_n^{-j} \\ \omega_n^{-2j} \\ \vdots \\ \omega_n^{-(n-1)j} \end{pmatrix}
= \sum_{k=0}^{n-1} \omega_n^{ik}\, \omega_n^{-jk} = \sum_{k=0}^{n-1} \omega_n^{(i-j)k}.

We now have two possibilities:

(1) i = j: then \sum_{k=0}^{n-1} \omega_n^{(i-j)k} = \sum_{k=0}^{n-1} \omega_n^{0} = \sum_{k=0}^{n-1} 1 = n;


(2) i \neq j: then \sum_{k=0}^{n-1} \omega_n^{(i-j)k} represents a geometric series with ratio \omega_n^{i-j}, and thus,

(10)
\sum_{k=0}^{n-1} \omega_n^{(i-j)k} = \frac{1 - \omega_n^{(i-j)n}}{1 - \omega_n^{i-j}} = \frac{1 - (\omega_n^{n})^{i-j}}{1 - \omega_n^{i-j}} = \frac{1 - 1}{1 - \omega_n^{i-j}} = 0.

This proves our claim that (7) holds. Thus, (6) implies that

(11)
\begin{pmatrix} A_0 \\ A_1 \\ A_2 \\ \vdots \\ A_{n-1} \end{pmatrix}
= \frac{1}{n}
\begin{pmatrix}
1 & 1 & 1 & \ldots & 1 \\
1 & \omega_n^{-1} & \omega_n^{-2} & \ldots & \omega_n^{-(n-1)} \\
1 & \omega_n^{-2} & \omega_n^{-4} & \ldots & \omega_n^{-2(n-1)} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \omega_n^{-(n-1)} & \omega_n^{-2(n-1)} & \ldots & \omega_n^{-(n-1)(n-1)}
\end{pmatrix}
\begin{pmatrix} A(1) \\ A(\omega_n) \\ A(\omega_n^2) \\ \vdots \\ A(\omega_n^{n-1}) \end{pmatrix}.

But this means that, in order to invert the DFT, all we have to do is apply our FFT algorithm with \omega_n^{-1} in place of \omega_n, and then divide the result by n. Consequently, we can use the same algorithm and the same hardware for computing both the DFT and the IDFT (the Inverse Discrete Fourier Transform), with the minor change mentioned above!
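A minimal sketch of this observation (mine, not from the notes): the same recursion as in the earlier FFT sketch, parametrized by the sign of the exponent, so that sign = -1 runs the algorithm with \omega_n^{-1} in place of \omega_n; dividing by n then gives the IDFT.

import cmath

def fft_with_root(A, sign=+1):
    # Radix-2 recursion; sign = +1 uses omega_n, sign = -1 uses omega_n^{-1}.
    n = len(A)
    if n == 1:
        return list(A)
    y_even = fft_with_root(A[0::2], sign)
    y_odd = fft_with_root(A[1::2], sign)
    omega_n = cmath.exp(sign * 2j * cmath.pi / n)
    omega = 1
    y = [0] * n
    for k in range(n // 2):
        y[k] = y_even[k] + omega * y_odd[k]
        y[k + n // 2] = y_even[k] - omega * y_odd[k]
        omega *= omega_n
    return y

def ifft(values):
    # FFT with omega_n^{-1}, followed by division by n.
    n = len(values)
    return [v / n for v in fft_with_root(values, sign=-1)]

coeffs = [3, 1, 4, 1, 5, 9, 2, 6]
print([round(c.real, 6) for c in ifft(fft_with_root(coeffs))])   # recovers coeffs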

    1. Interpretation of DFT

So far we have followed the textbook (CLRS); however, what Cormen et al. call the DFT, namely the sequence (A(\omega_n^0), A(\omega_n^1), A(\omega_n^2), \ldots, A(\omega_n^{n-1})), is usually considered the inverse transform of the sequence of coefficients (a_0, a_1, a_2, \ldots, a_{n-1}) of the polynomial A(x), while (A(\omega_n^{0}), A(\omega_n^{-1}), A(\omega_n^{-2}), \ldots, A(\omega_n^{-(n-1)})) is considered the forward operation, i.e., the DFT. Clearly, since \frac{1}{n}(DFT \circ IDFT) = I (where I is the identity mapping), both choices are equally legitimate, but taking (A(\omega_n^{0}), A(\omega_n^{-1}), A(\omega_n^{-2}), \ldots, A(\omega_n^{-(n-1)})) as the forward operation has an important conceptual advantage and is used more often than the textbook's choice.

To explain this, recall that the scalar product (also called the dot product) of two vectors with real coordinates, x = (x_0, x_1, \ldots, x_{n-1}) and y = (y_0, y_1, \ldots, y_{n-1}), x, y \in R^n, denoted


by \langle x, y \rangle (or x \cdot y), is defined as

\langle x, y \rangle = \sum_{i=0}^{n-1} x_i y_i.

If the coordinates of our vectors are complex numbers, i.e., if x, y \in C^n, then the scalar product of two such vectors is defined as

\langle x, y \rangle = \sum_{i=0}^{n-1} x_i \overline{y_i},

where \overline{z} denotes the complex conjugate of z, i.e., \overline{a + ib} = a - ib. Since \overline{e^{\frac{2\pi i k}{n}}} = e^{-\frac{2\pi i k}{n}}, we now see that equations (9) and (10) actually show that any two distinct rows (or columns) of the matrix corresponding to the DFT are orthogonal, or, in other words, that for i \neq j the vectors w_i = (1, \omega_n^{i}, \omega_n^{2i}, \ldots, \omega_n^{(n-1)i}) and w_j = (1, \omega_n^{j}, \omega_n^{2j}, \ldots, \omega_n^{(n-1)j}) are mutually orthogonal. Thus, the set \{w_i : 0 \le i < n\} is an orthogonal basis for the space C^n. From the same equations it is also clear that the norm satisfies \|w_i\|^2 = \langle w_i, w_i \rangle = n. Thus, if we set e_i = \frac{1}{\sqrt{n}} w_i, then \|e_i\|^2 = \langle e_i, e_i \rangle = \frac{1}{n} \langle w_i, w_i \rangle = 1, which means that the set of vectors B = \{e_i : 0 \le i < n\} forms an orthonormal basis for the vector space C^n of complex sequences of length n.
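A quick numeric check of this orthogonality (my own sketch, not from the notes): the inner products \langle w_i, w_j \rangle come out as n on the diagonal and (up to floating point error) 0 off it.

import cmath

n = 4
w = [[cmath.exp(2j * cmath.pi * i * k / n) for k in range(n)] for i in range(n)]

def inner(x, y):
    # Complex scalar product <x, y> = sum_i x_i * conjugate(y_i)
    return sum(a * b.conjugate() for a, b in zip(x, y))

for i in range(n):
    print([round(abs(inner(w[i], w[j])), 9) for j in range(n)])   # n on the diagonal, 0 elsewhere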

If we accept that the forward operation involves the negative powers of the root of unity \omega_n, it is easy to see what the DFT of a sequence c = (c_0, c_1, \ldots, c_{n-1}) represents: it is essentially the sequence of coordinates of the vector c in the basis B, because for A(x) = c_0 + c_1 x + \ldots + c_{n-1} x^{n-1} we have

(12) A(\omega_n^{-k}) = \sum_{i=0}^{n-1} c_i (\omega_n^{-k})^i = \sum_{i=0}^{n-1} c_i \omega_n^{-ki} = \langle c, w_k \rangle = \sqrt{n}\, \langle c, e_k \rangle.

Thus, A(\omega_n^{-k}) is (up to the normalizing factor \sqrt{n}) simply the projection of the vector c onto the basis vector e_k. We can now represent c in the basis \{e_i : 0 \le i < n\}:

(13) c = \sum_{i=0}^{n-1} \langle c, e_i \rangle\, e_i = \frac{1}{\sqrt{n}} \sum_{i=0}^{n-1} A(\omega_n^{-i})\, e_i = \frac{1}{n} \sum_{i=0}^{n-1} A(\omega_n^{-i})\, w_i.

The sequence (c_0, c_1, c_2, \ldots, c_{n-1}) can also be represented in the standard basis

\{(1, 0, 0, \ldots, 0), (0, 1, 0, \ldots, 0), \ldots, (0, 0, 0, \ldots, 1)\}



in the obvious way:

(c_0, c_1, c_2, \ldots, c_{n-1}) = c_0 (1, 0, 0, \ldots, 0) + c_1 (0, 1, 0, \ldots, 0) + \ldots + c_{n-1} (0, 0, 0, \ldots, 1).

Thus, taking the Discrete Fourier Transform of a sequence (c_0, c_1, \ldots, c_{n-1}) amounts to representing such a sequence in a different basis, namely the basis B = \{e_i : 0 \le i < n\}.

Both sides of equation (13) represent the same vector c; the mth coordinate of the left side is c_m, while the mth coordinate of the right side is \frac{1}{n} \sum_{k=0}^{n-1} A(\omega_n^{-k})\, \omega_n^{mk}; thus, changing the index of summation from i to k, we get

(14) c_m = \frac{1}{n} \sum_{k=0}^{n-1} A(\omega_n^{-k})\, e^{\frac{2\pi i m k}{n}}.

Note that e^{\frac{2\pi i m (n-k)}{n}} = e^{\frac{2\pi i m n}{n}}\, e^{\frac{2\pi i m (-k)}{n}} = e^{2\pi i m}\, e^{-\frac{2\pi i m k}{n}} = e^{-\frac{2\pi i m k}{n}}. Thus, if we assume for simplicity that the sequence c is of odd length 2n + 1, then from the above equation,

(15) c_m = \frac{1}{2n+1} \sum_{k=-n}^{n} A(\omega_{2n+1}^{-k})\, e^{\frac{2\pi i m k}{2n+1}}.

Assume now that the elements c_m of the sequence corresponding to c are samples of a sound f(t), taken at equidistant (unit) intervals, i.e., c_m = f(m); then (15) states that


(16) f(m) = \frac{1}{2n+1} \sum_{k=-n}^{n} A(\omega_{2n+1}^{-k})\, e^{\frac{2\pi i m k}{2n+1}},

i.e., that the equation

(17) f(t) = \frac{1}{2n+1} \sum_{k=-n}^{n} A(\omega_{2n+1}^{-k})\, e^{\frac{2\pi i t k}{2n+1}}

holds at all integer points -n, \ldots, -1, 0, 1, \ldots, n. The values A(\omega_{2n+1}^{-k}) provided by the DFT are complex numbers, so we can represent them via their absolute value and their argument, i.e., A(\omega_{2n+1}^{-k}) = |A(\omega_{2n+1}^{-k})|\, e^{i \arg(A(\omega_{2n+1}^{-k}))}; thus, the signal has been, in a sense which can be made precise, represented as a sum of complex exponentials, or, equivalently, as a sum of sine waves (cosines are shifted sines):

f(t) \approx \frac{1}{2n+1} \sum_{k=-n}^{n} |A(\omega_{2n+1}^{-k})|\, e^{i \arg(A(\omega_{2n+1}^{-k}))}\, e^{\frac{2\pi i k t}{2n+1}}
    = \frac{1}{2n+1} \sum_{k=-n}^{n} |A(\omega_{2n+1}^{-k})|\, e^{i \left( \frac{2\pi k t}{2n+1} + \arg(A(\omega_{2n+1}^{-k})) \right)}
    = \frac{1}{2n+1} \sum_{k=-n}^{n} |A(\omega_{2n+1}^{-k})| \left[ \cos\!\left( \frac{2\pi k t}{2n+1} + \arg(A(\omega_{2n+1}^{-k})) \right) + i \sin\!\left( \frac{2\pi k t}{2n+1} + \arg(A(\omega_{2n+1}^{-k})) \right) \right].

If the signal is real valued (rather than complex valued), then it is easy to see that A(\omega_{2n+1}^{-k}) = \overline{A(\omega_{2n+1}^{k})}, and the imaginary parts of the above expression cancel out; thus the signal is represented as a sum of cosine waves, shifted by \arg(A(\omega_{2n+1}^{-k})) and with amplitudes \frac{2}{2n+1} |A(\omega_{2n+1}^{-k})|, so that at the integer sample points the values of the signal match the values of such a sum of cosine waves (i.e., f(t) is interpolated by such a sum). So, in a sense, the DFT roughly tells us which frequencies in the range

-\frac{2\pi n}{2n+1}, \ -\frac{2\pi (n-1)}{2n+1}, \ \ldots, \ -\frac{2\pi}{2n+1}, \ 0, \ \frac{2\pi}{2n+1}, \ \ldots, \ \frac{2\pi (n-1)}{2n+1}, \ \frac{2\pi n}{2n+1}

are present in the signal, and with what amplitudes and phase shifts! This (partly) explains why the DFT is so useful in signal processing: it gives an insight into the approximate spectral content of the signal (what frequencies are present) and, on top of that, the DFT can be computed very efficiently via the FFT! But this is where the story only begins; for example, one can show that if you apply a filter to a signal, for example if you want to attenuate or emphasize certain frequencies, then all you have to do is convolve the samples of the signal with a sequence of fixed coefficients corresponding to the filter. Just as in the case of polynomial multiplication, to obtain such a convolution you can simply compute the FFT of the signal


and multiply it with the FFT of the filter coefficients and then take the inverse FFT! But more about that from Dolby specialists next week!
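As a closing illustration (my own sketch, not part of the notes; the signal and filter are made up for the example), here is how such FFT-based filtering of a sampled signal might look with NumPy: the samples are convolved with a short averaging filter by multiplying their FFTs, exactly as in polynomial multiplication.

import numpy as np

# A toy "signal": a slow cosine plus faster wiggles.
t = np.arange(64)
signal = np.cos(2 * np.pi * t / 32) + 0.3 * np.cos(2 * np.pi * t * 14 / 32)

# A short smoothing (moving average) filter; convolving with it attenuates high frequencies.
filt = np.ones(5) / 5

# Linear convolution via FFT: zero-pad both to the length of the full convolution.
size = len(signal) + len(filt) - 1
smoothed = np.fft.ifft(np.fft.fft(signal, n=size) * np.fft.fft(filt, n=size)).real

# Same result (up to floating point error) as direct convolution.
print(np.allclose(smoothed, np.convolve(signal, filt)))   # True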