1974 Fast Convolution Using Fermat Number Transforms With Applications to Digital Filtering


  • 8/20/2019 1974 Fast Convolution Using Fermat Number Transforms With Applications to Digital Filtering


IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-22, NO. 2, APRIL 1974, p. 87

Fast Convolution Using Fermat Number Transforms with Applications to Digital Filtering

Abstract-The structure of transforms having the convolution property is developed. A particular transform is proposed that is defined on a finite ring of integers with arithmetic carried out modulo Fermat numbers. This Fermat number transform (FNT) is ideally suited to digital computation, requiring on the order of $N \log N$ additions, subtractions and bit shifts, but no multiplications. In addition to being efficient, the Fermat number transform implementation of convolution is exact, i.e., there is no roundoff error. There is a restriction on sequence length imposed by word length but multidimensional techniques are discussed which overcome this limitation. Results of an implementation on the IBM 370/155 are presented and compared with the fast Fourier transform (FFT) showing a substantial improvement in efficiency and accuracy.

I. INTRODUCTION

The convolution property of certain transforms can be used to efficiently compute the cyclic convolution of two long discrete sequences. One such transform is the discrete Fourier transform (DFT) with the fast Fourier transform (FFT) implementation [1] which operates on sequences in the complex number field. To compute the DFT of a sequence of length $N$ requires on the order of $(N/2) \log_2 (N/2)$ complex multiplications, which often take most of the computation time. In addition, the FFT implementation of cyclic convolution introduces significant amounts of roundoff error [2], thus deteriorating the signal-to-noise ratio. When working with digital machines, the data are available only with some finite precision and, therefore, without loss of generality, the data can be considered to be integers with some upper bound. In this digital domain, the complex number field of the continuous domain can be replaced with a finite field or, more generally, a finite ring of integers with addition and multiplication modulo some integer $F$. In this ring, using transforms similar to the DFT, the cyclic convolution can be performed very efficiently and without any roundoff error. One such transform presented here is a number theoretic transform which operates in the ring of integers modulo a Fermat number. Computation of this transform of length $N$, analogous to an FFT, requires on the order of $N \log_2 N$ additions, bit shifts, and subtractions, but no multiplication. This is considerably simpler than the FFT.

Knuth [3] has proposed the use of transforms in finite fields. Pollard [4] discussed transforms having the cyclic convolution property in a finite field and also gives the conditions for having transforms with the cyclic convolution property in a finite ring of integers. Good [5] also mentioned the use of transforms in a finite ring of integers. Schonhage and Strassen [6] defined transforms having the cyclic convolution property modulo a Fermat number and discussed their application to fast multiplication of very large integers. Knuth [7] elaborated on the work of Schonhage and Strassen. Nicholson [8] presented an algebraic theory of finite Fourier transforms in any ring and established fast FFT-type algorithms to compute these transforms. Rader [9], [10] proposed finite transforms in rings of integers modulo both Mersenne and Fermat numbers. He first proposed the application to digital signal processing, showed that the transforms could be calculated using only additions and bit shifting, showed the word-length constraint, and suggested two-dimensional transforms as a possible relaxation of that constraint. Agarwal and Burrus [11] defined Fermat transforms and also proposed their application for fast digital convolution.

Manuscript received July 16, 1973; revised November 19, 1973. This work was supported by NSF Grant GK-23697.
The authors are with the Department of Electrical Engineering, Rice University, Houston, Tex. 77001.

The purpose of this paper is to develop the general structure of transforms with the convolution property and then present the number theoretic transform modulo a Fermat number as a practical scheme for use with digital computation. The properties are then discussed for the implementation of digital filters and the results of a software implementation on an IBM 370/155 are given. It is believed that the presentation here is different from previous works and gives more insight into these transforms. One result of this approach allows the maximum length of sequences that can be convolved to be increased by a factor of 2 over what had previously been reported in [10].

The choice of terminology in a new area is always difficult. In this paper the general name, number-theoretic transform, is used for any transform using number-theoretic concepts in its definition. The two particular transforms with arithmetic modulo Mersenne and Fermat numbers are called Mersenne number transforms (MNT's) and Fermat number transforms (FNT's). The particular MNT's and FNT's with a basis function $\alpha = 2$ are called Rader transforms (RT's).

These transforms are truly digital transforms, taking into account the quantization in amplitude and the finite precision of digital signals. They bear the same relation to digital signals as the DFT does to discrete-time or sampled data signals and the Fourier or Laplace transforms do to continuous-time signals. In the same manner that the relation of discrete-time signals to continuous-time signals through sampling involves a possible folding or aliasing in the frequency domain, the relation of calculations with the DFT to calculations with the number theoretic transforms involves a possible folding of the amplitude that must be taken into account.

In Section II, the structures of transforms having the cyclic and the noncyclic convolution properties are developed. It is shown that what we call the DFT structure is the only structure which supports the cyclic convolution property and any transform having the DFT structure will have the cyclic convolution property. In Section III, the necessary and sufficient conditions for the existence of transforms in a ring of integers modulo an integer $F$ are established. In Section IV, FNT's are defined which, computationally, appear to be the best transforms for implementing convolution. In Section V, reference is given to a two-dimensional transform implementation for long one-dimensional convolution. In Section VI, a hardware implementation to compute the FNT is suggested. Comparison of the FNT with the FFT implementation of the DFT, to implement convolution, is made in Section VII. In Section VIII, a software implementation of the FNT on an IBM 370/155 is presented and timing comparisons are made with the FFT. For cyclic convolution lengths up to 256, FNT implementations are faster by a factor of 3 to 5 over the best FFT implementations which make use of the symmetry of the DFT for real data. For computers with slower multiplication, the advantages look even better. Finally, several generalizations and applications are presented.

II. THE STRUCTURE OF TRANSFORMS HAVING THE CONVOLUTION PROPERTY

In this section the structure of transforms having the cyclic convolution property, i.e., the transform of cyclic convolution of two sequences is equal to the product of their transforms, is established. The class of transforms considered is the set of linear nonsingular (invertible) transforms which map a sequence of length $N$ to another sequence of length $N$. It is shown that what we call the DFT structure is the only structure which supports the cyclic convolution property and, moreover, any transform having the DFT structure will have the cyclic convolution property.

Let $x_n$ and $h_n$, $n = 0,1,\dots,N-1$, be two sequences which are to be cyclically convolved as given by

$$y_n = x_n * h_n = \sum_{k=0}^{N-1} x_k h_{n-k}, \quad n = 0,1,\dots,N-1. \tag{1}$$

This definition assumes the sequences are periodically extended with period $N$ or the indices are evaluated modulo $N$. Here we will describe sequences as vectors of length $N$. Since only linear invertible transforms are considered, they can be represented by an $N \times N$ nonsingular matrix $T$ whose elements are $t_{k,m}$, $k,m = 0,1,\dots,N-1$. Let capital letters denote the transformed sequences, i.e.,

$$X = Tx, \quad H = Th, \quad Y = Ty. \tag{2}$$

Consider the conditions on the $t_{k,m}$'s so that

$$Y = X \otimes H \tag{3}$$

where $\otimes$ denotes term by term multiplication of vectors.
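As a small illustration of definition (1), the following Python sketch evaluates the cyclic convolution directly, with the index of $h$ taken modulo $N$. The function name `cyclic_convolve` is introduced here for illustration only and is not from the paper; the test sequences are those of the example in Section IV.

```python
def cyclic_convolve(x, h):
    """Direct O(N^2) cyclic convolution of two equal-length sequences, as in (1)."""
    if len(x) != len(h):
        raise ValueError("sequences must have the same length N")
    n_len = len(x)
    # The index of h is evaluated modulo N, i.e. h is periodically extended.
    return [
        sum(x[k] * h[(n - k) % n_len] for k in range(n_len))
        for n in range(n_len)
    ]


print(cyclic_convolve([2, -2, 1, 0], [1, 2, 0, 0]))  # [2, 2, -3, 2]
```

This direct form needs $N^2$ multiplications, which is the cost the transform methods developed below are meant to reduce.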

Equations (2) and (1) are combined to give

$$Y_k = \sum_{n=0}^{N-1} t_{k,n} y_n = \sum_{n=0}^{N-1} t_{k,n} \sum_{m=0}^{N-1} x_m h_{n-m} = \sum_{m=0}^{N-1} \sum_{l=0}^{N-1} x_m h_l t_{k,m+l}$$

$$X_k = \sum_{m=0}^{N-1} t_{k,m} x_m$$

$$H_k = \sum_{l=0}^{N-1} t_{k,l} h_l, \quad k = 0,1,\dots,N-1. \tag{4}$$

The individual terms in (3) can be rewritten as

$$Y_k = X_k H_k$$

which, using (4), becomes

$$\sum_{m=0}^{N-1} \sum_{l=0}^{N-1} x_m h_l t_{k,m+l} = \sum_{m=0}^{N-1} \sum_{l=0}^{N-1} x_m h_l t_{k,m} t_{k,l}. \tag{5}$$

Matching the multiples of $x_m h_l$ on both sides of (5) gives

$$t_{k,m+l} = t_{k,m} t_{k,l}, \quad k,l,m = 0,1,\dots,N-1. \tag{6}$$

Repeated applications of (6) gives

$$t_{k,m} = t_{k,1}^{m}, \quad k,m = 0,1,\dots,N-1. \tag{7}$$

And, since the convolution is cyclic, the indices in (6) are added modulo $N$. This gives

$$t_{k,m}^{N} = 1, \quad k,m = 0,1,\dots,N-1. \tag{8}$$

Therefore, the $t_{k,m}$'s are $N$th roots of unity. For $T$ to be nonsingular, all the $t_{k,1}$'s should be distinct and, since there are only $N$ distinct $N$th roots of unity, the $t_{k,1}$'s must be these $N$ distinct roots. Without loss of generality $t_{1,1}$ can be taken as a root of order $N$, i.e., $N$ is the least positive integer such that $t_{1,1}^{N} = 1$. Therefore, all $t_{k,1}$'s can be written as some powers of $t_{1,1}$. Again without loss of generality, the rows of $T$ can be arranged so that

$$t_{k,1} = t_{1,1}^{k}. \tag{9}$$

Combining (7) and (9) and denoting $t_{1,1}$ by $\alpha$ gives the following

$$t_{k,m} = \alpha^{km}, \quad k,m = 0,1,\dots,N-1. \tag{10}$$

The structure thus established causes the transform to be orthogonal. The elements $\bar{t}_{k,m}$ of $T^{-1}$ are given by

$$\bar{t}_{k,m} = N^{-1} \alpha^{-km}, \quad k,m = 0,1,\dots,N-1. \tag{11}$$

AGARWAL AND BURRUS: FAST CONVOLUTION APPLICATIONS TO DIGITAL FILTERING

To prove this, consider

$$T T^{-1} = I$$

or

$$N^{-1} \sum_{n=0}^{N-1} \alpha^{kn} \alpha^{-ln} = \delta_{k,l} \tag{12}$$

where $\delta_{k,l}$ is the Kronecker delta function. Taking $k - l = p$, (12) can be rewritten as

$$N^{-1} \sum_{n=0}^{N-1} \alpha^{pn} = \begin{cases} 1 & \text{if } p \equiv 0 \pmod{N} \\ 0 & \text{otherwise.} \end{cases} \tag{13}$$

To prove the orthogonality, we have to prove (13). If $p \equiv 0 \pmod{N}$, then $\alpha^{p} = 1$ and the first part is thus established. If $p \not\equiv 0 \pmod{N}$, $\alpha^{p} \neq 1$ or $(\alpha^{p} - 1) \neq 0$. Multiplying (13) by $(\alpha^{p} - 1)$, we obtain

$$N^{-1} (\alpha^{p} - 1) \sum_{n=0}^{N-1} \alpha^{pn} = N^{-1} (\alpha^{pN} - 1) = 0.$$

Thus (13) is established.
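The structure in (10) and (11) can be checked numerically in a small ring. The sketch below is an illustration, not the authors' code: it builds $T$ and $T^{-1}$ for the assumed values $\alpha = 4$, $N = 4$, $F = 17$ (the same values used in the example of Section IV) and confirms that their product is the identity modulo $F$. The helper names are hypothetical, and `pow(x, -1, m)` for modular inverses assumes Python 3.8 or later.

```python
def dft_structure_matrices(alpha, n_len, modulus):
    """Return (T, T_inv) with t[k][m] = alpha^(k*m) as in (10) and
    t_inv[k][m] = N^(-1) * alpha^(-k*m) as in (11), reduced mod `modulus`."""
    n_inv = pow(n_len, -1, modulus)      # exists only if gcd(N, F) = 1
    alpha_inv = pow(alpha, -1, modulus)
    t = [[pow(alpha, k * m, modulus) for m in range(n_len)]
         for k in range(n_len)]
    t_inv = [[n_inv * pow(alpha_inv, k * m, modulus) % modulus
              for m in range(n_len)]
             for k in range(n_len)]
    return t, t_inv


def mat_mul_mod(a, b, modulus):
    """Product of two square matrices with entries reduced mod `modulus`."""
    size = len(a)
    return [[sum(a[i][j] * b[j][k] for j in range(size)) % modulus
             for k in range(size)]
            for i in range(size)]


T, T_inv = dft_structure_matrices(alpha=4, n_len=4, modulus=17)
identity = [[int(i == j) for j in range(4)] for i in range(4)]
print(mat_mul_mod(T, T_inv, 17) == identity)  # True
```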

An examination of the preceding development reveals that the existence of an $N \times N$ transform having the cyclic convolution property depends only on the existence of an $\alpha$ that is a root of unity of order $N$, and the existence of $N^{-1}$. The structures of the transform matrix $T$ and its inverse are given by (10) and (11) respectively, and these are the only structures which support the cyclic convolution property. This structure is called the DFT structure. Because of the DFT structure, any transform having the cyclic convolution property also has a fast computational algorithm similar to the FFT if $N$ is highly composite [8]. In the complex number field the DFT, with $\alpha = e^{-j2\pi/N}$, is the only transform having the cyclic convolution property. But, in a finite field or, more generally, a ring we might also have transforms with the cyclic convolution property if there exists a root of unity of order $N$ and if $N^{-1}$ exists. In the next section we establish results for the special case of a finite ring of integers modulo an integer, which seem to be most appropriate for practical digital computation. Appendix A lists some of the properties of these transforms.
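To illustrate the remark that the DFT structure admits an FFT-like algorithm when $N$ is highly composite, here is a minimal recursive radix-2 sketch over the integers modulo $F$. It is one possible decimation-in-time arrangement, not necessarily the algorithm of [8]; the function names are hypothetical, and it assumes $N$ is a power of 2 and that $\alpha$ has order $N$ modulo every prime-power factor of $F$, so that $\alpha^{N/2} \equiv -1$.

```python
def transform_direct(x, alpha, modulus):
    """O(N^2) evaluation of X_k = sum_n x_n * alpha^(n*k) mod F."""
    n_len = len(x)
    return [sum(x[n] * pow(alpha, n * k, modulus) for n in range(n_len)) % modulus
            for k in range(n_len)]


def transform_fast(x, alpha, modulus):
    """Recursive radix-2 decimation-in-time version of transform_direct.

    Assumes len(x) is a power of 2 and alpha has order len(x) modulo each
    prime-power factor of `modulus` (so alpha^(N/2) = -1).
    """
    n_len = len(x)
    if n_len == 1:
        return [x[0] % modulus]
    alpha_sq = alpha * alpha % modulus       # root of order N/2 for the halves
    even = transform_fast(x[0::2], alpha_sq, modulus)
    odd = transform_fast(x[1::2], alpha_sq, modulus)
    half = n_len // 2
    out = [0] * n_len
    twiddle = 1                              # alpha^k, updated in the loop
    for k in range(half):
        t = twiddle * odd[k] % modulus
        out[k] = (even[k] + t) % modulus
        out[k + half] = (even[k] - t) % modulus   # uses alpha^(N/2) = -1
        twiddle = twiddle * alpha % modulus
    return out


x = [2, 15, 1, 0]
print(transform_fast(x, 4, 17))    # [1, 10, 5, 9]
print(transform_direct(x, 4, 17))  # [1, 10, 5, 9]
```

The recursion performs on the order of $N \log_2 N$ multiplications by powers of $\alpha$; when $\alpha$ is a power of 2 these become bit shifts, which is the case of interest later in the paper.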

Thus far we have discussed only the transforms having the cyclic convolution property. Noncyclic convolution can be implemented using cyclic convolution by padding the sequences with zeros [1]. If only noncyclic convolution is desired, a more general class of transforms is possible. Consider two sequences of lengths $L$ and $M$, respectively, that are noncyclically convolved giving an output sequence of length $N = L + M - 1$. First we pad the sequences with zeros to make them of length $N$ each. For the $N \times N$ transform $T$, the desired restrictions on the $t_{k,m}$'s are still given by (6), but now the indices are not added modulo $N$; this modifies (6) to be

$$t_{k,m+l} = t_{k,m} t_{k,l}, \quad k = 0,1,\dots,N-1, \quad m = 0,1,\dots,M-1, \quad l = 0,1,\dots,L-1. \tag{14}$$

Repeated applications of (14) gives

$$t_{k,m} = t_{k,1}^{m}, \quad k,m = 0,1,\dots,N-1. \tag{15}$$

This is the same as (7), but now we do not have (8) which implies cyclic convolution. Because of the absence of (8), the $t_{k,1}$'s are not restricted to $N$th roots of unity. The only restriction is the nonsingularity constraint which requires all $t_{k,1}$'s to be distinct. A similar procedure is discussed by Knuth [3] in connection with the multiplication of large integers which is almost analogous to noncyclic convolution. When the $t_{k,1}$'s are not restricted to $N$th roots of unity, in general there is no FFT-type algorithm to compute the transform or the inverse transform. As a matter of fact, the inverse transform does not have a simple structure. Because of these limitations we restrict ourselves to transforms having the cyclic convolution property.
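The zero-padding route from cyclic to noncyclic convolution mentioned above can be sketched as follows. This is an illustrative Python fragment with hypothetical function names; the cyclic convolution is computed directly here, although in practice it would be computed with a transform.

```python
def cyclic_convolve(x, h):
    """Direct cyclic convolution of two sequences of equal length N."""
    n_len = len(x)
    return [sum(x[k] * h[(n - k) % n_len] for k in range(n_len))
            for n in range(n_len)]


def linear_convolve_via_cyclic(x, h):
    """Noncyclic convolution of sequences of lengths L and M, obtained by
    zero-padding both to N = L + M - 1 and convolving cyclically."""
    n_len = len(x) + len(h) - 1
    x_pad = list(x) + [0] * (n_len - len(x))
    h_pad = list(h) + [0] * (n_len - len(h))
    return cyclic_convolve(x_pad, h_pad)


print(linear_convolve_via_cyclic([1, 2, 3], [1, 1]))  # [1, 3, 5, 3]
```

With the padding in place no product $x_k h_{n-k}$ wraps around, so the cyclic result coincides with the noncyclic one.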

III. TRANSFORMS IN THE RING OF INTEGERS MODULO AN INTEGER

In any practical situation, or when working with digital machines, the data are available only with some finite precision, and therefore, without loss of generality, the data can be considered to be integers with some upper bound. To compute convolution in this digital domain, operations in the complex number field of the continuous domain can be imitated with a finite field or, more generally, a finite ring of integers under additions and multiplications modulo some integer $F$; and an integer $\alpha$ of order $N$ replaces $e^{-j2\pi/N}$ used in a DFT. In this ring, when two integer sequences $x(n)$ and $h(n)$ are convolved, the output integer sequence $y(n)$ is congruent to the convolution of $x(n)$ and $h(n)$ modulo $F$. In the ring of integers modulo $F$, conventional integers can be unambiguously represented if their absolute value is less than $F/2$. If the input integer sequences $x(n)$ and $h(n)$ are so scaled that $|y(n)|$ never exceeds $F/2$, we would get the same results by implementing convolution in the ring of integers modulo $F$ as that obtained with normal arithmetic. This is similar to the overflow constraint in fixed-point digital machines.

In most digital filtering applications, $h(n)$ represents the impulse response and is known a priori; also the maximum magnitude of the input signal is usually known. In this situation, we can bound the peak output magnitude by

$$|y(n)| \le |x(n)|_{\max} \sum_{n=0}^{N-1} |h(n)|.$$

When we relate continuous-time signals to discrete-time signals through sampling, we obtain an aliasing effect in the frequency domain; all the frequencies are obtained modulo the sampling frequency. A similar effect occurs when the amplitude is discretized and convolution is performed in the integer domain modulo an integer $F$. We obtain aliasing in amplitude which can be avoided as noted before.
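The overflow bound and the representation of signed integers by residues modulo $F$ can be made concrete with a short sketch. The helper names below are hypothetical, and the mapping assumes an odd modulus $F$, as is the case for the Fermat numbers used later.

```python
def output_bound(x_max, h):
    """Peak output bound |x|_max * sum |h(n)| from the inequality above."""
    return x_max * sum(abs(c) for c in h)


def to_residue(value, modulus):
    """Represent a signed integer by its residue in 0 .. F-1."""
    return value % modulus


def to_signed(residue, modulus):
    """Map a residue back to the representative of smallest absolute value.

    Assumes an odd modulus F, so the signed range is -(F-1)/2 .. (F-1)/2.
    """
    residue %= modulus
    return residue - modulus if residue > modulus // 2 else residue


# With |x(n)| <= 2 and h = (1, 2, 0, 0) the output magnitude is at most 6,
# which is below 17/2, so arithmetic modulo F = 17 causes no amplitude aliasing.
print(output_bound(2, [1, 2, 0, 0]))  # 6
print(to_residue(-2, 17))             # 15
print(to_signed(14, 17))              # -3
```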

In order to have a computational advantage in implementing cyclic convolution, we will derive a transform having the cyclic convolution property in this ring. In addition, we would like this transform to have a fast computational algorithm. First, we will present results about the existence of such transforms which were shown to depend on the existence of an $\alpha$ of order $N$ and the existence of $N^{-1}$ in the ring. The number theory necessary to understand the derivations is very basic and can be found in any elementary book on number theory [14].

Let

$$F = p_1^{r_1} p_2^{r_2} \cdots p_l^{r_l} \tag{16}$$

be the prime factorization of $F$. Since $\alpha$ is of order $N$, we have

$$\alpha^{N} \equiv 1 \pmod{F} \tag{17}$$

which implies

$$\alpha^{N} \equiv 1 \pmod{p_i^{r_i}}, \quad i = 1,2,\dots,l. \tag{18}$$

If the transform matrix $T$ is to be nonsingular, no two rows of $T$ can be the same, which implies

$$\alpha^{p} \not\equiv \alpha^{q} \pmod{p_i^{r_i}}, \quad i = 1,2,\dots,l, \quad p \neq q, \quad 0 \le p,q \le N-1. \tag{19}$$

Equations (18) and (19) imply that $\alpha$ should be of order $N$ with respect to each prime power factor $p_i^{r_i}$ of $F$, i.e., $N$ is the least positive integer such that

$$\alpha^{N} \equiv 1 \pmod{p_i^{r_i}}, \quad i = 1,2,\dots,l. \tag{20}$$

If $\alpha$ is of order less than $N$ with respect to any modulus $p_i^{r_i}$, then $T$ would be singular with respect to that modulus. Also, for $N^{-1}$ to exist, $N$ should be relatively prime to $F$, i.e., $p_i$ should not be a factor of $N$. These two considerations give Theorem 1, which is proven in Appendix B.

Theorem 1

An $N \times N$ transform $T$ having the cyclic convolution property in the ring of integers modulo an integer $F$ exists if and only if $N$ divides $O(F) = \gcd(p_1 - 1, p_2 - 1, \dots, p_l - 1)$.
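Theorem 1 lends itself to a quick numerical check. The sketch below computes $O(F)$ as the greatest common divisor of $p_i - 1$ over the distinct primes dividing $F$; the trial-division factorization and both function names are assumptions made for illustration and are only practical for moduli with small enough factors.

```python
from functools import reduce
from math import gcd


def prime_factors(n):
    """Distinct prime factors of n > 1 by trial division (small n only)."""
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            factors.append(p)
            while n % p == 0:
                n //= p
        p += 1 if p == 2 else 2
    if n > 1:
        factors.append(n)
    return factors


def max_transform_length(modulus):
    """O(F) = gcd(p_1 - 1, ..., p_l - 1); N must divide this value."""
    return reduce(gcd, (p - 1 for p in prime_factors(modulus)))


print(max_transform_length(17))           # 16
print(max_transform_length(16))           # 1  (prime factor 2)
print(max_transform_length(2**5 + 1))     # 2  (33 = 3 * 11)
print(max_transform_length(2**32 + 1))    # 128 (641 * 6700417)
```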

IV. FERMAT NUMBER TRANSFORMS

For number theoretic transforms to be more attractive in comparison to the FFT for implementing convolution, they should be computationally efficient. There are three requirements that will be considered. First, $N$ should be highly composite (preferably a power of 2) for a fast FFT type algorithm to exist. Second, since complex multiplications take most of the computational effort in calculating the FFT, it is important that the multiplication by powers of $\alpha$ be a simple operation. This is possible if the powers of $\alpha$ have very few bit binary representations; preferably also a power of two, where multiplication by a power of $\alpha$ reduces to a word shift. Third, in order to facilitate arithmetic modulo $F$, $F$ should also have a very few bit binary representation.

Now we shall make a systematic investigation of good choices for $F$, for which the maximum transform length $N$ is not too small. As noted above, we would like $F$ to have a very few bit binary representation. The first possibility is $F = 2^{k}$. It has a prime factor 2 and therefore, according to Theorem 1 of Section III, the maximum possible transform length is 1. For $2^{k} - 1$, let $k$ be a composite $PQ$, where $P$ is prime. Then $2^{P} - 1$ divides $2^{PQ} - 1$ and the maximum possible length of the transform will be governed by the length possible for $2^{P} - 1$. Therefore, only the primes $P$ need to be considered interesting. Numbers of this form are known as Mersenne numbers and Rader [10] has discussed convolution using Mersenne numbers in detail. For MNT's [10], it can be shown that transforms of length at least $2P$ exist and the corresponding $\alpha = -2$. Mersenne number transforms are not of much interest because $2P$ is not highly composite and, therefore, we do not have fast FFT-type computational algorithms to compute the transforms. For $2^{k} + 1$, say $k$ is odd. Then 3 divides $2^{k} + 1$ and the largest possible transform length is 2. Thus $k$ is even. Let $k$ be $s 2^{t}$, where $s$ is an odd integer. Then $2^{2^{t}} + 1$ divides $2^{s 2^{t}} + 1$ and the length of the possible transform will be governed by the length possible for $2^{2^{t}} + 1$. Therefore, integers of the form $2^{2^{t}} + 1$ are of interest. These numbers are known as Fermat numbers and will be discussed in detail in this paper. Fermat numbers seem to be optimum in the sense of having transforms whose length is interesting while the word size is moderate. Numbers of the form $2^{s 2^{t}} + 1$ are also of limited interest and are discussed in Section IX. A systematic investigation of those $F$ which require more than two bit representation is difficult. Our preliminary investigation in that direction does not seem to be very encouraging.
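Two of the claims in this investigation are easy to confirm numerically for small cases: that $\alpha = -2$ has order $2P$ modulo the Mersenne number $2^{P} - 1$, and that 3 divides $2^{k} + 1$ for odd $k$. The sketch below uses a brute-force order computation; the function name `multiplicative_order` is a hypothetical helper, not something defined in the paper.

```python
from math import gcd


def multiplicative_order(a, modulus):
    """Least positive n with a^n = 1 (mod modulus), found by repeated
    multiplication. Raises ValueError if a is not invertible mod modulus."""
    a %= modulus
    if gcd(a, modulus) != 1:
        raise ValueError("a must be relatively prime to the modulus")
    value, n = a, 1
    while value != 1:
        value = value * a % modulus
        n += 1
    return n


mersenne = 2**7 - 1                          # P = 7, F = 127
print(multiplicative_order(2, mersenne))     # 7  (= P)
print(multiplicative_order(-2, mersenne))    # 14 (= 2P)
print(all((2**k + 1) % 3 == 0 for k in range(1, 20, 2)))  # True
```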

In the light of the above discussion, it is proposed to use $F_t = 2^{b} + 1$, $b = 2^{t}$, $t$ being a positive integer, as good choices for $F$. $F_t$ is known as the $t$th Fermat number. Arithmetic modulo $F_t$ can be performed using $b$-bit arithmetic, as will be discussed in Section VI. The number of bits used for the signal and the filter coefficients decides the value of $b$ to be used so as not to cause error in the output due to overflow. The value of $b$ required is slightly more than double the number of bits used for the signal representation. One simple bound on the output was presented in the last section. Computationally, Fermat numbers seem to be the best choices for $F$; therefore, as far as is practicable, they should be used. If the value of $b$ obtained from the overflow consideration is not a power of 2, it should be increased to the next power of 2 in order that Fermat numbers can be used. Some other possibilities are discussed in Section IX. Since the arithmetic is done modulo a Fermat number, we call the associated transforms FNT's. Below we define FNT's and their inverse transforms for sequences of length $N = 2^{m}$:

$$\begin{aligned} X(k) &= \sum_{n=0}^{N-1} x(n)\, \alpha^{nk} \pmod{F_t}, \quad k = 0,1,\dots,N-1 \\ x(n) &= (2^{-m}) \sum_{k=0}^{N-1} X(k)\, \alpha^{-nk} \pmod{F_t}, \quad n = 0,1,\dots,N-1 \end{aligned} \tag{21}$$

where

$$F_t = 2^{b} + 1, \quad b = 2^{t} \tag{22}$$

and $\alpha$ is an integer of order $N$, i.e., $N$ is the least positive integer such that

$$\alpha^{N} \equiv 1 \pmod{F_t}.$$

Note that $\alpha^{-n} \equiv \alpha^{N-n}$ and $2^{-m} \equiv -2^{b-m} \pmod{F_t}$.
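The transform pair (21) can be written down directly from its definition. The following Python sketch is a plain $O(N^2)$ evaluation, not an efficient shift-and-add implementation; the function names `fnt` and `ifnt` are chosen here for illustration. It assumes $N = 2^{m}$ with $m \le b$ and that the caller supplies an $\alpha$ of order $N$ modulo $F_t$, and it uses the identities $2^{-m} \equiv -2^{b-m}$ and $\alpha^{-1} \equiv \alpha^{N-1}$.

```python
def fnt(x, alpha, t):
    """Forward transform of (21): X(k) = sum_n x(n) alpha^(nk) mod F_t."""
    f_t = 2 ** (2 ** t) + 1
    n_len = len(x)
    return [sum(x[n] * pow(alpha, n * k, f_t) for n in range(n_len)) % f_t
            for k in range(n_len)]


def ifnt(big_x, alpha, t):
    """Inverse transform of (21), assuming len(big_x) = 2^m with m <= b."""
    b = 2 ** t
    f_t = 2 ** b + 1
    n_len = len(big_x)
    m = n_len.bit_length() - 1                 # N = 2^m
    n_inv = (-(2 ** (b - m))) % f_t            # 2^(-m) = -2^(b-m) mod F_t
    alpha_inv = pow(alpha, n_len - 1, f_t)     # alpha^(-1) = alpha^(N-1)
    return [n_inv * sum(big_x[k] * pow(alpha_inv, n * k, f_t)
                        for k in range(n_len)) % f_t
            for n in range(n_len)]


# F_3 = 257; alpha = 2 has order 2b = 16 modulo 257.
x = list(range(16))
print(ifnt(fnt(x, 2, 3), 2, 3) == x)  # True
```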

As discussed in Sections II and III, we can perform cyclic convolution of two integer sequences of length $N$, using the FNT, if certain conditions are satisfied. Now we discuss the possible values of $N$ for which an FNT exists, given a choice of $F_t$. As implied by Theorem 1 of Section III, for transforms (mod $F_t$) of length $N$ to exist, $N$ should divide $O(F_t)$.

Fermat numbers up to $F_4$ are all primes, therefore for these cases $O(F_t) = 2^{b}$, and we can have an FNT for any length $N = 2^{m}$, $m \le b$. For these Fermat primes the integer 3 is an $\alpha$ of order $N = 2^{b}$, giving the largest possible transform length. Of course there are $2^{b-1}$ other integers also which are of order $2^{b}$. They can be obtained by taking odd powers of 3. If an integer $\alpha$ is of order $N$, then $\alpha^{p} \pmod{F_t}$ will be of order $N/p$, if $N/p$ is an integer. Therefore, using Fermat primes, an integer $\alpha$ of order $2^{m}$, $m \le b$, is given by $3^{2^{b-m}} \pmod{F_t}$. The integer 2 is of order $2b = 2^{t+1}$. If $\alpha$ is taken as 2 or a power of 2, all the powers of $\alpha$ would be some powers of 2, and, for these cases, as discussed in the beginning of this section, the FNT can be computed very efficiently and is called the RT.
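The orders quoted for the Fermat primes can be confirmed by brute force for small $t$. The sketch below is illustrative only; `order_mod` and `root_of_order` are hypothetical helper names. It checks $F_3 = 257$ ($b = 8$) and $F_2 = 17$ ($b = 4$).

```python
def order_mod(a, modulus):
    """Least positive n with a^n = 1 (mod modulus); a must be a unit."""
    value, n = a % modulus, 1
    while value != 1:
        value = value * a % modulus
        n += 1
    return n


def root_of_order(m, t):
    """For a Fermat prime F_t, return 3^(2^(b-m)) mod F_t, which should
    have order 2^m (requires m <= b = 2^t)."""
    b = 2 ** t
    f_t = 2 ** b + 1
    return pow(3, 2 ** (b - m), f_t)


print(order_mod(3, 257))                    # 256 (= 2^b)
print(order_mod(2, 257))                    # 16  (= 2b)
print(order_mod(root_of_order(4, 3), 257))  # 16  (= 2^m with m = 4)
print(order_mod(2, 17))                     # 8   (= 2b with b = 4)
```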

There are no other known Fermat primes. For digital filtering applications, $F_5$ ($b = 32$) and $F_6$ ($b = 64$) seem to be most practical. Lucas [13] has proven that every prime factor of composite $F_t$ is of the form $K \cdot 2^{t+2} + 1$. Therefore, $2^{t+2}$ divides $O(F_t)$ for $t > 4$. In particular it can be verified that for $F_5$ and $F_6$, $O(F_t) = 2^{t+2}$. Therefore, for these choices of Fermat numbers, the maximum possible transform length is $2^{t+2} = 4b$. Also, we assert that $\alpha$ given by (23) is of order $2^{t+2}$ mod $F_t$, $t \ge 2$.

$$\alpha = 2^{2^{t-2}} \left( 2^{2^{t-1}} - 1 \right) \tag{23}$$

We denote this $\alpha$ as $\sqrt{2}$ because $\alpha^{2} \equiv 2 \pmod{F_t}$. The proof that $\alpha$ given by (23) is of order $2^{t+2}$ with respect to any factor of $F_t$ is given in Appendix C. Any odd power of $\sqrt{2}$ will also be of order $2^{t+2}$. By raising $\sqrt{2}$ to the $2^{t+2-m}$th power, we obtain an integer $\alpha$ of order $2^{m}$, $m \le t + 2$.
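Formula (23) can be evaluated and checked modulo $F_t$ for several values of $t$. The sketch below is a numerical illustration with hypothetical function names; it verifies only the order modulo $F_t$ itself, not modulo each factor of $F_t$ as the proof in Appendix C does.

```python
def sqrt_two(t):
    """alpha from (23): 2^(2^(t-2)) * (2^(2^(t-1)) - 1) mod F_t, for t >= 2."""
    f_t = 2 ** (2 ** t) + 1
    return 2 ** (2 ** (t - 2)) * (2 ** (2 ** (t - 1)) - 1) % f_t


def has_order_two_to(exponent, a, modulus):
    """True if a has order exactly 2^exponent modulo `modulus`, i.e.
    a^(2^exponent) = 1 while a^(2^(exponent-1)) != 1."""
    return (pow(a, 2 ** exponent, modulus) == 1
            and pow(a, 2 ** (exponent - 1), modulus) != 1)


for t in range(2, 7):
    f_t = 2 ** (2 ** t) + 1
    a = sqrt_two(t)
    # The square is 2 and the order is 2^(t+2) = 4b in every case checked.
    print(t, pow(a, 2, f_t) == 2, has_order_two_to(t + 2, a, f_t))
```

For $t = 2$ the formula gives $\alpha = 6$, and indeed $6^{2} = 36 \equiv 2 \pmod{17}$.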

Table I gives values of $N$ for the two most important values of $\alpha$ and also gives the maximum possible $N$ for the most practical values of $b$. The Fermat number transform has also been defined by Rader [10], using $\alpha = 2$, but his development does not indicate the possibility of using $\sqrt{2}$. Using the results of Section III, we have found the maximum possible length $N$ for which an FNT exists, for the most practical values of $b$, and have found the corresponding integer $\alpha$. Using $\alpha = \sqrt{2}$, the maximum possible lengths for these transforms have increased by a factor of 2, which makes them more useful. In the next section we discuss a two-dimensional convolution scheme whereby very long one-dimensional sequences can also be convolved using a two-dimensional FNT.

TABLE I
PARAMETERS FOR SEVERAL POSSIBLE IMPLEMENTATIONS OF THE FNT'S

t    b    F_t         N for α = 2 (a)   N for α = √2   N_max    α for N_max
3    8    2^8 + 1     16                32             256      3
4    16   2^16 + 1    32                64             65536    3
5    32   2^32 + 1    64                128            128      √2
6    64   2^64 + 1    128               256            256      √2

(a) This case corresponds to the RT.

The basis functions for the FNT are integer exponentials mod $F_t$, in comparison to the complex exponentials for the DFT. These integer exponentials fold over after they cross $F_t/2$. Because of the nature of modulo arithmetic, the FNT coefficients do not seem to have any physical meaning. Although the signal for which the FNT is being taken may be very small, its FNT coefficients may lie anywhere between 0 and $F_t - 1$. As a matter of fact, during various stages of the computation each accumulation of the signal "overflows" many times. But still the end result of the convolution would be exact if the input signals are properly bounded. FNT's follow all the properties mentioned in Appendix A.

Example

To make the ideas of this section more clear, we now present an example. This example will illustrate several points: treatment of negative values in the data, the structure of the transform and the inverse transform matrix, negative powers of $\alpha$, frequent "overflow" during computation, meaninglessness of the transform values, and exactness of the final answer. This example will not demonstrate the efficient implementation of the RT using binary arithmetic. That will be illustrated in Section VI on implementation of the RT.

Consider two sequences $x = (2,-2,1,0)$ and $h = (1,2,0,0)$, whose convolution is desired. From the overflow consideration, it is sufficient if we work modulo $F_2 = 17$. We want $N = 4$; for $F_2$ the integer 2 is of order 8, therefore $2^{2} = 4$ is an $\alpha$ of order 4. The transformation matrix $T$ is given by

$$T = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 4 & 4^{2} & 4^{3} \\ 1 & 4^{2} & 4^{4} & 4^{6} \\ 1 & 4^{3} & 4^{6} & 4^{9} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 4 & -1 & -4 \\ 1 & -1 & 1 & -1 \\ 1 & -4 & -1 & 4 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 4 & 16 & 13 \\ 1 & 16 & 1 & 16 \\ 1 & 13 & 16 & 4 \end{pmatrix} \pmod{17}.$$


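    Since $4^{-1} = -4 \pmod{17}$, the inverse transformation matrix $T^{-1}$ is given by

    $$T^{-1} = 4^{-1} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 4^{-1} & 4^{-2} & 4^{-3} \\ 1 & 4^{-2} & 4^{-4} & 4^{-6} \\ 1 & 4^{-3} & 4^{-6} & 4^{-9} \end{bmatrix} = -4 \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -4 & -1 & 4 \\ 1 & -1 & 1 & -1 \\ 1 & 4 & -1 & -4 \end{bmatrix} = \begin{bmatrix} 13 & 13 & 13 & 13 \\ 13 & 16 & 4 & 1 \\ 13 & 4 & 13 & 4 \\ 13 & 1 & 4 & 16 \end{bmatrix} \pmod{17}.$$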

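    The transforms of $x$ and $h$ are given by

    $$X = Tx = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 4 & 16 & 13 \\ 1 & 16 & 1 & 16 \\ 1 & 13 & 16 & 4 \end{bmatrix} \begin{bmatrix} 2 \\ 15 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 10 \\ 5 \\ 9 \end{bmatrix} \pmod{17}.$$

    Note that in $x$, $-2$ was represented by $-2 + 17 = 15$.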

    Similarly, $H = (3, 9, 16, 10)$ and

    $$Y = X \cdot H = (3, 90, 80, 90) = (3, 5, 12, 5) \pmod{17}.$$

    Taking the inverse transform of $Y$,

    $$y = (2, 2, 14, 2) \pmod{17}.$$

    According to our assumption, integers are supposed to lie between $-8$ and $8$. Therefore, 14 must be represented as $14 - 17 = -3$. This gives $y = (2, 2, -3, 2)$, which is the correct answer. Also, note that $y$ is a symmetric sequence, therefore $Y$ is also a symmetric sequence. Other than this, the transform values have no significance.
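    The example above can be replayed numerically. The following is a minimal sketch, not the authors' program: it uses the direct $O(N^2)$ form of the transform rather than the fast algorithm, and the function names (`fnt`, `inverse_fnt`, `to_signed`) are hypothetical helpers introduced only for illustration. It assumes Python 3.8 or later for modular inverses via `pow`.

```python
F = 17          # Fermat number F_2 = 2**4 + 1
N = 4           # transform length
ALPHA = 4       # 4 = 2**2 has order 4 modulo 17


def fnt(seq, alpha=ALPHA, n=N, modulus=F):
    """Direct transform: F(k) = sum_n f(n) * alpha**(n*k) mod F."""
    return [sum(v * pow(alpha, i * k, modulus) for i, v in enumerate(seq)) % modulus
            for k in range(n)]


def inverse_fnt(spec, alpha=ALPHA, n=N, modulus=F):
    """Inverse transform: f(n) = N**-1 * sum_k F(k) * alpha**(-n*k) mod F."""
    n_inv = pow(n, -1, modulus)          # 4**-1 = 13 = -4 (mod 17)
    alpha_inv = pow(alpha, -1, modulus)  # alpha**-1 = 13 (mod 17)
    return [n_inv * sum(v * pow(alpha_inv, i * k, modulus)
                        for k, v in enumerate(spec)) % modulus
            for i in range(n)]


def to_signed(values, modulus=F):
    """Map residues 0..F-1 back to the signed range -(F-1)/2 .. (F-1)/2."""
    return [v - modulus if v > modulus // 2 else v for v in values]


x = [2, -2, 1, 0]
h = [1, 2, 0, 0]
X = fnt(x)                               # -2 is reduced to 15 by the % operator
H = fnt(h)
Y = [(a * b) % F for a, b in zip(X, H)]  # pointwise product of the transforms
y = to_signed(inverse_fnt(Y))
print(X, H, Y, y)
```

    The intermediate products 90 and 80 exceed the modulus, which is the "overflow" the text refers to; the final answer is still the exact cyclic convolution.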

    V. TWO-DIMENSIONAL CONVOLUTION FOR CONVOLVING LONG SEQUENCES

    Arithmetic modulo $F_t$ can be implemented using a $b = 2^t$ bit representation of integers, as will be discussed later in Section VI, with some provision for representing $2^b$. The maximum length of sequences which can be cyclically convolved using the FNT is $4b$, using $\alpha = \sqrt{2}$, as discussed above. Therefore, the length of sequences which can be convolved is proportional to the word length in bits. Thus, for long sequences, word length requirements may be excessive. Rader [10] suggested using a two-dimensional convolution scheme to convolve long one-dimensional sequences. Agarwal and Burrus [11], [12] presented such a two-dimensional convolution scheme. Using this scheme, cyclic convolution of length $N = LM$ is implemented as a two-dimensional cyclic convolution of length $(2L - 1)$ by $M$ (or $2L$ by $M$). This two-dimensional cyclic convolution can be implemented using a two-dimensional FNT [11], [12] defined similar to the one-dimensional transform. Using this two-dimensional scheme, the word length required is proportional to the square root of the length of the sequences to be convolved, which would give for a maximum sequence length, $8b^2$ rather than $4b$. If $M$ is taken as the maximum possible length $4b$, and $2L - 1$ is a small integer, then either direct convolution or another high-speed algorithm could be employed [12], [15] to compute convolution along the short dimension and the one-dimensional FNT could be used along the long dimension. Computationally this combination can be very efficient as will be shown in the implementation in Section VIII. Table II lists the maximum lengths for two-dimensional transforms.

    TABLE II
    MAXIMUM ONE-DIMENSIONAL CYCLIC CONVOLUTION LENGTH USING TWO-DIMENSIONAL FNT OR RT

    Word length b    N for $\alpha = 2$    N for $\alpha = \sqrt{2}$
    16               512                   2048
    32               2048                  8192
    64               8192                  32768
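    The entries of Table II follow a simple pattern, sketched below. The closed forms are an inference from the text and the table, not a formula stated by the authors: it is assumed that the one-dimensional limits are $2b$ for $\alpha = 2$ and $4b$ for $\alpha = \sqrt{2}$, and that a length $N = LM$ sequence occupies a $2L$ by $M$ array whose sides are both at the one-dimensional limit. The function name `max_lengths` is hypothetical.

```python
def max_lengths(b):
    """Largest one-dimensional length N reachable with a two-dimensional transform.

    Assumes both sides of the 2L-by-M array equal the one-dimensional limit,
    so N = L * M = side * side / 2.
    """
    side_alpha2 = 2 * b     # order of alpha = 2 modulo 2**b + 1
    side_sqrt2 = 4 * b      # order of alpha = sqrt(2) modulo 2**b + 1
    return side_alpha2 ** 2 // 2, side_sqrt2 ** 2 // 2


for b in (16, 32, 64):
    print(b, max_lengths(b))
```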

    VI. A SUGGESTED HARDWARE

    IMPLEMENTATION

    Since the structure of the FNT is similar to that of the DFT, and $N$ is a power of 2, a fast implementation of FNT's similar to the FFT with radix 2 exists. As a matter of fact, all we have to do is replace $W = e^{-j2\pi/N}$ by $\alpha$ in any FFT algorithm [1].
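    As an illustration of replacing $W$ by $\alpha$, here is a minimal recursive radix-2 decimation-in-time sketch over the integers modulo $F$. It is one possible arrangement, not the flow chart used by the authors; the names `fast_fnt` and `direct_fnt` are hypothetical. It assumes the length is a power of 2 and that $\alpha^{N/2} \equiv -1$, which holds for the Fermat-number cases discussed here.

```python
def direct_fnt(seq, alpha, modulus):
    """Reference O(N^2) transform used only to check the fast version."""
    n = len(seq)
    return [sum(v * pow(alpha, i * k, modulus) for i, v in enumerate(seq)) % modulus
            for k in range(n)]


def fast_fnt(seq, alpha, modulus):
    """Radix-2 decimation-in-time transform; len(seq) must be a power of 2."""
    n = len(seq)
    if n == 1:
        return [seq[0] % modulus]
    alpha_sq = alpha * alpha % modulus           # root of unity for the half-length problem
    even = fast_fnt(seq[0::2], alpha_sq, modulus)
    odd = fast_fnt(seq[1::2], alpha_sq, modulus)
    out = [0] * n
    twiddle = 1                                  # alpha**k, plays the role of W**k
    for k in range(n // 2):
        t = twiddle * odd[k] % modulus
        out[k] = (even[k] + t) % modulus
        out[k + n // 2] = (even[k] - t) % modulus    # uses alpha**(n/2) = -1
        twiddle = twiddle * alpha % modulus
    return out


print(fast_fnt([2, 15, 1, 0], 4, 17))
```

    When $\alpha$ is a power of 2, every `twiddle * odd[k]` product is a multiplication by a power of 2, which is what allows the butterflies to be built from shifts and subtractions.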

    In computing the FNT, arithmetic is done modulo $F_t = 2^b + 1$. In this arithmetic the only allowed integers are $0, 1, \ldots, 2^b$, and all integers whose absolute values do not exceed $2^{b-1}$ can be represented unambiguously. Negative integers are represented by adding $2^b + 1$ to them; this is similar to twos complement and ones complement representation of negative integers. Using a $b$-bit register, integers from 0 to $2^b - 1$ can be represented. The problem remains to represent $2^b$. If the data are uncorrelated, the probability that this number will appear after an arithmetical operation is approximately $2^{-b}$. For digital filtering applications, $b$ would typically be 32 or 64; in these cases the probability of occurrence of $2^b$ is extremely small. If occasional error in convolution is permissible, for these cases there is probably no need of any extra hardware to represent $2^b$. If the need exists, an extra bit could be used to represent $2^b$ at the expense of more complicated hardware. The following discussion is based on the $b$-bit representation of the integers.

    Now, we discuss how various basic arithmetical operations can be performed modulo $F_t$.

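    A. Negation

    Let

    $$A = \sum_{i=0}^{b-1} a_i 2^i, \qquad a_i = 0 \text{ or } 1.$$

    Then, writing $\bar{a}_i = 1 - a_i$ for the complemented bit,

    $$\begin{aligned} -A &= -\sum_{i=0}^{b-1} a_i 2^i \\ &= \sum_{i=0}^{b-1} \bar{a}_i 2^i - (2^b - 1) \\ &= \sum_{i=0}^{b-1} \bar{a}_i 2^i - (2^b - 1) + 2^b + 1 \bmod F_t \\ &= \sum_{i=0}^{b-1} \bar{a}_i 2^i + 2 \bmod F_t. \end{aligned}$$

    Thus to negate a number, we have to complement each bit and add 2 to the result. For example, (mod 17):

        4 = 0100;  -4 = 1011 + 10 = 1101 = 13;  13 + 4 = 17.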
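    The complement-and-add-2 rule can be checked exhaustively for a small word length. This is a minimal sketch with a hypothetical helper name `negate`; the final reduction with `%` stands in for the special handling of the value $2^b$, which does not fit in a $b$-bit register.

```python
def negate(a, b):
    """Negate a b-bit residue modulo 2**b + 1: complement each bit, then add 2."""
    mask = (1 << b) - 1
    complemented = a ^ mask                  # bitwise complement within b bits
    return (complemented + 2) % ((1 << b) + 1)


print(format(negate(0b0100, 4), "04b"))      # negation of 4 modulo 17
```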

    B. Addition

    When we add two $b$-bit integers, we obtain a $b$-bit integer and possibly a carry bit. The carry bit represents $2^b = -1 \bmod F_t$. To implement arithmetic modulo $2^b + 1$, we simply subtract the carry bit. Thus the hardware should be of the carry subtract type. For example, (mod 17): $10 + 9 = 19 = 2 \pmod{17}$:

          1010
        + 1001
        ------
        1 0011
        -    1     (subtract the carry)
        ------
          0010 = 2
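    A carry-subtract adder of this kind can be modeled in a few lines. The sketch below is illustrative only; `add_mod_fermat` is a hypothetical name, and the closing `%` covers the corner case in which the low part is zero and the carry is one, so that the result is $2^b$.

```python
def add_mod_fermat(a, c, b):
    """Add two b-bit residues modulo 2**b + 1 by subtracting the carry bit."""
    mask = (1 << b) - 1
    total = a + c
    carry = total >> b          # 0 or 1; the carry represents 2**b = -1
    low = total & mask          # the b-bit sum
    return (low - carry) % ((1 << b) + 1)


print(add_mod_fermat(0b1010, 0b1001, 4))
```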

    C. Subtraction

    Subtraction is implemented as an addition by first negating the subtrahend and then adding them. Addition must be done according to B.

    D. General Multiplication

    When we multiply two $b$-bit integers, we get a $2b$-bit product. Let $C_L$ be the $b$-bit low-order part of it and $C_H$ be the $b$-bit high-order part of it, then

    $$A \times B = C_L + C_H 2^b = C_L - C_H \pmod{F_t}.$$

    Thus, all we have to do is subtract the higher order register from the lower order register. The subtraction needs to be done according to C. For example, (mod 17): $13 \times 9 = 117 = 15 \pmod{17}$;

        117 = 0111 0101     (C_H = 0111, C_L = 0101)

           C_L  =   0101
        (-C_H)  = + 1010
                  ------
                    1111 = 15.
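    The low-minus-high reduction can be sketched as follows. This is an illustration under the assumption that both operands fit in $b$ bits; `mul_mod_fermat` is a hypothetical name, and the `%` at the end replaces the negate-and-add hardware steps of Sections A to C.

```python
def mul_mod_fermat(a, c, b):
    """Multiply two b-bit residues modulo 2**b + 1 via C_L - C_H."""
    mask = (1 << b) - 1
    product = a * c             # up to 2b bits
    c_low = product & mask      # low-order register
    c_high = product >> b       # high-order register, weighted by 2**b = -1
    return (c_low - c_high) % ((1 << b) + 1)


print(mul_mod_fermat(13, 9, 4))
```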

    E. Multiplication by a Power of 2

    If $\alpha$ is taken as 2 or a power of 2, RT's are obtained and the only multiplications involved in computing them are those by some powers of 2. These multiplications are particularly simple to implement in arithmetic modulo $F_t$. Suppose we need to multiply the contents of a register by $2^k$, $0 < k < b$; all we need to do is left-shift the contents of the register by $k$ bits and subtract the $k$ overflow bits. A convenient way to do this is to append the data register on the left with a register of zeros, left-shift the double register by $k$ positions, dropping the leading zeros, and then subtract the higher order register from the lower order register, as in D. If $k$ is outside the range $0 < k < b$, we make use of the fact that $2^b = -1 \bmod F_t$. Computation of the inverse transform requires multiplications by negative powers of 2, which can be converted to positive powers by the following relationship,

    $$2^{-k} = -2^{b-k} \bmod F_t.$$

    An alternate method to multiply by $2^{-k}$ is to load the data in the higher order register of a double register, filling the lower order register with zeros, then right-shift the double register by $k$ positions and subtract the low-order register from the high-order register mod $F_t$. For fast implementation of the bit shift operation, the shift amount $k$ should be expressed in a binary form and at a clock pulse shifting should be done by a power of 2. For example, (mod 17): $11 \times 2^3 = 88 = 3 \bmod 17$.

        11 = 0000 1011
        Shift left 3 positions: 0101 1000     (C_H = 0101, C_L = 1000)

           C_L  =   1000
        (-C_H)  = + 1100
                  ------
                  1 0100
                  -    1     (subtract the carry)
                  ------
                    0011 = 3

    $11 \times 2^{-3} = 11 \times (-2^{4-3}) = -22 = 12 \bmod 17$.

        11 = 1011 0000
        Shift right 3 positions: 0001 0110     (C_H = 0001, C_L = 0110)

        C_H - C_L = 0001 - 0110 = 1100 = 12.
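    Both worked examples can be reproduced with a shift-only routine. The sketch below is an assumption-laden illustration: `mul_pow2` is a hypothetical name, negative $k$ is folded into the range $0 \le k < 2b$ using the fact that 2 has order $2b$, and Python integers play the role of the double register.

```python
def mul_pow2(a, k, b):
    """Return a * 2**k mod 2**b + 1 for any integer k, using shifts and a subtraction."""
    modulus = (1 << b) + 1
    k %= 2 * b                  # 2 has order 2b, so exponents wrap; also handles k < 0
    sign = 1
    if k >= b:                  # 2**b = -1: remove one factor of 2**b and flip the sign
        k -= b
        sign = -1
    mask = (1 << b) - 1
    shifted = a << k            # contents of the double register after the shift
    c_low, c_high = shifted & mask, shifted >> b
    return sign * (c_low - c_high) % modulus


print(mul_pow2(11, 3, 4), mul_pow2(11, -3, 4))
```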

    For implementation of the fast RT, unlike the FFT, we do not need to store the powers of $\alpha$. For serial arithmetic, we could have a register which stores the shift amount $k$, and as we go along the fast RT flow chart, we continually update the shift amount. All this is particularly simple to do in binary arithmetic. If $\alpha$ is taken as $\sqrt{2}$, only the odd powers of $\alpha$ require a two-bit representation. In the fast FNT algorithm, only one stage requires multiplications by odd powers of $\alpha$. Multiplication by an odd power of $\sqrt{2}$ would probably be best done by first multiplying by the just lower even power of $\sqrt{2}$, which is a power of 2, then multiplying by $\sqrt{2}$, which is a two-bit number. The resulting increase in computation effort is very small.
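    The two-step multiplication by an odd power of $\sqrt{2}$ can be sketched as below. This assumes the closed form $\sqrt{2} = 2^{b/4}(2^{b/2} - 1) = 2^{3b/4} - 2^{b/4} \bmod (2^b + 1)$, which can be verified by squaring, and a word length $b$ divisible by 4. The helper names `sqrt2` and `mul_odd_power_sqrt2` are hypothetical.

```python
def sqrt2(b):
    """sqrt(2) modulo 2**b + 1, written with two nonzero bits: 2**(3b/4) - 2**(b/4)."""
    return (1 << (3 * b // 4)) - (1 << (b // 4))


def mul_odd_power_sqrt2(a, m, b):
    """a * sqrt(2)**m for odd m > 0: first a power of 2, then the two-bit number."""
    modulus = (1 << b) + 1
    t = a * pow(2, (m - 1) // 2, modulus) % modulus   # a single shift in hardware
    # multiply by sqrt(2): two shifts and one subtraction
    return ((t << (3 * b // 4)) - (t << (b // 4))) % modulus


b = 32
F5 = (1 << b) + 1
print(sqrt2(b) ** 2 % F5)       # squares to 2
```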

    VII. COMPARISON WITH THE FFT

    As noted in the previous section, computing the RT is a very simple operation on a binary machine. Now let us compare the complexity of various basic operations involved in computing the RT vis-a-vis the FFT. If the two sequences $x(n)$ and $h(n)$ have $b_1$ and $b_2$ bit representations, respectively, and are of length $N$, then the output $y(n)$ would need no more than a $(b_1 + b_2 + \log_2 N)$ bit representation. To obtain the correct result, $b \ge b_1 + b_2 + \log_2 N$. In Section III we have given a better bound on the output. Roughly speaking, we need twice the number of bits to carry out the convolution using the RT as compared to the fixed-point FFT implementation of the convolution. But in the DFT, every data point is treated as a complex number and therefore requires two words, one for the real part and one for the imaginary part. Thus, in effect, the hardware requirement for the two transforms is about the same. Although for real data it is possible to make use of the symmetry properties of the DFT's, they require extra computation and for the purpose of comparison this will be ignored, even though we have taken it into account for our IBM 370/155 implementation to be discussed in the next section. Therefore we shall assume that, in the FFT implementation, each data point is represented by a $b/2$ bit real part and a $b/2$ bit imaginary part.

    One $b/2$ bit complex addition is equivalent to two $b/2$ bit real additions, which are comparable to a $b$-bit addition modulo $F_t$. Thus, the complexity of addition/subtraction is the same in both the transforms. Similarly, it can be shown that a $b/2$ bit complex multiplication is comparable to a $b$-bit multiplication modulo $F_t$. Computation of the RT requires multiplications by powers of 2, which, implemented as bit shifts and subtractions, become much simpler operations compared to the complex multiplications required in the FFT implementation.

    To compute a length $N$ fast RT, $N \log_2 N$ additions/subtractions and $(N/2) \log_2 (N/2)$ multiplications by some powers of 2 are required, which are implemented as bit shifts and subtractions. To compute the convolution using the FFT, most of the time is taken in computing the complex multiplications required to compute the transforms. A comparison with the RT reveals that these complex multiplications are replaced by bit shifts and subtractions, which are much faster operations. This results in considerable computational savings in the implementation of convolution. This fact has been verified on a general-purpose machine (IBM 370/155). The computation required to multiply the two transforms is about the same for both the implementations. To convolve long sequences using the two-dimensional FNT or RT, the computational effort increases by, at the most, a factor of 2. Still, the RT implementation of convolution is much faster as compared to the FFT implementation.

    RT's have some additional advantages over the FFT. First, the FFT implementation requires storing all the powers of $W$, requiring a significant amount of storage, which may be an important factor for a small minicomputer or a special purpose hardware implementation. Second, fixed-point FFT implementation introduces a significant amount of roundoff noise at the output, [...] bits depending on the data [2]. This degrades the signal-to-noise ratio during the filtering operations. The FNT or RT implementation is error free; the only source of error is input A/D quantization.

    VIII. IMPLEMENTATION ON THE IBM 370/155

    The word length of the IBM 360-370 series is 32 bits and, therefore, is well suited for the implementation of convolution using the FNT or RT modulo $F_5 = 2^{32} + 1$. It has two's complement representation of negative integers, i.e., $-x$ is represented as $2^{32} - x$. We want negative integers to be represented as the complement modulo $2^{32} + 1$, i.e., $-x$ is to be represented by $2^{32} + 1 - x$, and to accomplish this 1 is added whenever a negative integer is encountered in the data. As noted before, on this machine, we cannot represent $-1$. If a $-1$ is encountered in the data, it is rounded either to 0 or to $-2$. This is equivalent to introducing some initial quantization noise. If the data are uncorrelated, the probability that $-1$ will appear after an arithmetical operation during computation of the transform is roughly $2^{-32} \approx 2.3 \times 10^{-10}$ per operation. This error introduced during computation is rather serious and probably the corresponding output block will be meaningless. Assuming $N = 64$, the probability that a particular block is erroneous is roughly [...]. For most filtering operations, this may be permissible.

    Logical add (AL) and subtract (SL) instructions are used to add and subtract two integers modulo $F_5$. If a carry is detected after add, 1 is subtracted from the result, and if no carry is detected after subtract, 1 is added to the result. Multiplication by $2^k$ is done using the logical left shift operation (SLDL) and multiplication by $2^{-k}$ is done using the logical right shift operation (SRDL), $0 < k < 32$, as explained in Section VI. Similarly, multiplication of two integers modulo $F_5$ can also be done.

    TABLE III
    CYCLIC CONVOLUTION TIMINGS FOR LENGTH N REAL SEQUENCES

    N        FFT (ms)    FNT or RT (ms)
    32       16          3.3
    64       31          7.4
    128      60          16.6 (a)
    256      123         40.0 (b)
    256      123         80.6 (c)
    512      243         168.0 (c)
    1024     530         340.0 (c)
    2048     1260        720.0 (c)

    (a) Using $\alpha = \sqrt{2}$.
    (b) Using 2 by 128 convolution.
    (c) Using two-dimensional RT.
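    The data conversion and the borrow correction described for the 32-bit machine can be modeled as follows. This is a behavioral sketch in Python, not the authors' assembler code: the names `to_fermat_word` and `sub_logical` are hypothetical, Python integers stand in for 32-bit registers, and the value $2^{32}$ (that is, $-1$), which the text says cannot be held in the register, is allowed to appear here as an ordinary integer.

```python
B = 32
MASK = (1 << B) - 1
F5 = (1 << B) + 1


def to_fermat_word(x):
    """Convert a signed integer to its residue mod F5, starting from two's complement."""
    word = x & MASK             # two's complement: -x is stored as 2**32 - x
    if x < 0:
        word += 1               # 2**32 - x + 1 = F5 - x
    return word                 # equals 2**32 when x == -1 (not representable in 32 bits)


def sub_logical(a, c):
    """Logical subtract of 32-bit words; add 1 when there is a borrow (no carry)."""
    diff = (a - c) & MASK       # what the 32-bit register would hold
    if a < c:                   # borrow: the register wrapped by 2**32 = -1 mod F5
        diff += 1
    return diff


print(to_fermat_word(-5) == F5 - 5, sub_logical(3, 10) == (3 - 10) % F5)
```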

    Two assembler subprograms were written to compute the fast RT and its inverse for any sequence length up to $2^6$, taking $\alpha$ as a power of 2. Two more subprograms were written to compute the fast FNT and its inverse for length 128 sequences, using $\alpha = \sqrt{2}$, given by (23). Out of the 7 stages of the fast algorithm, only one stage requires multiplication by odd powers of $\alpha$, which are implemented as suggested in Section VI.

    To cyclically convolve sequences longer than 128, two-dimensional convolution schemes were used [12]. One such scheme was 2 by 128 convolution for length 256 sequences, discussed in Section X of [11]. Another program was written to convolve long one-dimensional sequences using two-dimensional RT's, as discussed in Section III of [12]. Timings for various implementations of convolution and their comparison with the FFT implementations are shown in Table III. To compute these timings it is assumed that transforms of the $h$ sequences have been precalculated. Thus, timings are for computing the transforms, multiplications of the transforms, and their inverse transforms. For the FFT implementation, a very efficient algorithm [17] was used, which employed both radix 4 and radix 2 steps and took into account the fact that the data are real. For sequences up to length 256, improvements of the order of 3 to 5 were obtained over the FFT implementations. Since the hardware of the IBM 360/370 series is not designed to do arithmetic modulo Fermat numbers, extra branch instructions were necessary after add or subtract operations to take into account the carry bit. If the hardware is designed to do arithmetic modulo Fermat numbers, these instructions could be avoided, resulting in a significant reduction in the computation time. Copies of the assembler subprograms are available from the authors.

    IX. GENERALIZATIONS AND OTHER APPLICATIONS

    In this paper we have discussed convolution in the rings of integers modulo Fermat numbers. These numbers seem to be the best choice for implementation on digital computers. Nevertheless, any integer $F$ for which $O(F) > 1$ can be used, as discussed in Section IV. Rader [10] proposed the use of Mersenne numbers, $M_p$ ($2^p - 1$, $p$ a prime). MNT's have $\alpha = -2$ and $N = 2p$, and therefore do not have an FFT-type fast computational algorithm.

    In many situations the quantization and overflow considerations may require arithmetic to be done using 10 to 20-bit arithmetic. For this situation, working in the integer ring modulo $F_4$ ($2^{16} + 1$) would be insufficient and, at the same time, computing the FNT modulo $F_5$ ($2^{32} + 1$) would require an excessive number of bits. In this situation, we may perform convolution modulo $2^{24} + 1$. Also, many computers have 24-bit word length. Transforms similar to FNT's exist for this situation. We could take $F = 2^{24} + 1$, and then $\alpha = 8$ gives $N = 16$, and $\alpha = 8^{1/2} = 2^7(2^{12} - 1)$ gives $N = 32$ as the maximum length of the one-dimensional transform.

    In general one could take $F = 2^{s2^t} + 1$, where $s$ is an odd integer. In that case, $\alpha = 2^s$ would give $N = 2^{t+1}$, and

    $$\alpha = (2^s)^{1/2} = 2^{[(s-1)/2 + s2^{t-2}]}\,(2^{s2^{t-1}} - 1)$$

    would give $N = 2^{t+2}$. In many situations it may be possible to have transforms of greater length than $2^{t+2}$. For example, taking $F = 2^{40} + 1$, the maximum possible transform length is 256, and, taking $F = 2^{80} + 1$, the maximum possible transform length is 1024, but the corresponding $\alpha$'s may not be simple.

    As discussed earlier, the FNT implementation requires roughly twice the number of bits for the arithmetic. Most minicomputers have short word lengths and in that situation it may not be possible to efficiently use FNT's, but there is a solution to this problem. We can split the data bits into two halves and then convolve them:

    $$\begin{aligned} x(n) &= x_2(n) + x_1(n)2^k, & |x_2(n)| &< 2^k \\ h(n) &= h_2(n) + h_1(n)2^k, & |h_2(n)| &< 2^k \end{aligned} \qquad (24)$$

    $$y = x * h = x_1 * h_1 \, 2^{2k} + (x_1 * h_2 + x_2 * h_1)2^k + x_2 * h_2. \qquad (25)$$

    Now, since $x_1$, $h_1$, $x_2$, and $h_2$ have roughly half the number of bits, it should be possible to convolve them using approximately half the number of bits. If necessary, a more precise analysis of the above situation could be easily performed. In (25), the last term, in comparison to the first term, is very small and can be neglected. We need to take two transforms for $x$ and two transforms for $h$; the summation shown within the parentheses can be performed in the transform domain. Finally, we need to take two inverse transforms, one for $x_1 * h_1$ and the other for $(x_1 * h_2 + x_2 * h_1)$.
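    The identity (25) can be checked on a small case. The sketch below uses plain integer cyclic convolution in place of the FNT, purely to keep the illustration short; in the scheme described above each short convolution would instead be computed with a transform of roughly half the word length. The data values, the split point `k`, and the helper names `cyclic_convolve` and `split` are arbitrary choices made for illustration.

```python
def cyclic_convolve(x, h):
    """Exact cyclic convolution of two equal-length integer sequences."""
    n = len(x)
    return [sum(x[m] * h[(i - m) % n] for m in range(n)) for i in range(n)]


def split(seq, k):
    """Return (high, low) with seq(n) = low(n) + high(n) * 2**k and 0 <= low(n) < 2**k."""
    return [v >> k for v in seq], [v & ((1 << k) - 1) for v in seq]


x = [1234, 77, 4095, 300]
h = [25, 4000, 1, 999]
k = 6

x1, x2 = split(x, k)            # x1 = high part, x2 = low part, as in (24)
h1, h2 = split(h, k)

high = cyclic_convolve(x1, h1)
cross = [a + c for a, c in zip(cyclic_convolve(x1, h2), cyclic_convolve(x2, h1))]
low = cyclic_convolve(x2, h2)

rebuilt = [(p << (2 * k)) + (q << k) + r for p, q, r in zip(high, cross, low)]
full = cyclic_convolve(x, h)
print(rebuilt == full)
```

    Dropping `low`, as the text suggests, changes each output by less than $N \cdot 2^{2k}$, which is small relative to the leading term when the high parts are not negligible.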

    FNT's can also be used for convolutions required in block recursive realizations of recursive digital filters [16]. For these realizations, convolutions form the main computational task. Since convolution with the FNT's can be performed with both efficiency and perfect accuracy, it should be very attractive for this application. For very high-order recursive filters this may reduce roundoff noise problems associated with these filters. Other appli-


    APPENDIX A

    PROPERTIES OF THE TRANSFORMS HAVING THE CYCLIC CONVOLUTION PROPERTY

    Below we summarize the definition and some important properties of the transforms having the cyclic convolution property. It is assumed that all arithmetical operations are performed in a ring. Most of these properties are similar to the DFT properties [1].

    A. Definition

    Transform:

    $$F(k) = \sum_{n=0}^{N-1} f(n)\,\alpha^{nk}, \qquad k = 0, 1, \ldots, N-1, \qquad \alpha^N = 1. \qquad (A1)$$

    Inverse Transform:

    $$f(n) = N^{-1} \sum_{k=0}^{N-1} F(k)\,\alpha^{-nk}, \qquad n = 0, 1, \ldots, N-1. \qquad (A2)$$

    C. Orthogonality

    The inner products of two basis functions are given by [see (12) and (13)]

    $$\sum_{k=0}^{N-1} \alpha^{nk}\,\alpha^{-mk} = \begin{cases} N, & \text{if } n = m \bmod N \\ 0, & \text{otherwise.} \end{cases}$$

    This implies the orthogonality of the basis functions.

    D. Periodicity

    $F(k)$ can also be periodically extended, similar to the extension of $f(n)$:

    $$f(n + N) = f(n)$$
    $$F(k + N) = F(k). \qquad (A5)$$

    E. Symmetry Property

    If the signal is symmetric, i.e.,

    $$f(n) = f(-n) = f(N - n),$$

    the transformed sequence is also symmetric, i.e.,

    $$F(k) = F(-k) = F(N - k). \qquad (A6)$$

    If the signal is antisymmetric, i.e.,

    $$f(n) = -f(-n) = -f(N - n),$$

    the transformed sequence is also antisymmetric.

    G. Shift Theorem

    If $T\{f(n)\} = F(k)$, then

    $$T\{f(n + m)\} = F(k)\,\alpha^{-mk}.$$

    H. Fast Computational Algorithm

    If $N$ can be factored as $N = r_1 r_2 \cdots r_m$, then a fast computational algorithm similar to the FFT can be used to compute the transform, which requires on the order of $N(r_1 + r_2 + \cdots + r_m)$ arithmetical operations.

    I. Convolution Property

    The transform of the cyclic convolution of two sequences is the product of their transforms, i.e.,

    $$T\{x(n) * h(n)\} = X(k)\,H(k).$$

    J. Parseval's Theorem

    If

    $$X(k) = T\{x(n)\}, \qquad H(k) = T\{h(n)\},$$

    then

    $$N \sum_{n=0}^{N-1} x(n)h(n) = \sum_{k=0}^{N-1} X(k)H(-k)$$

    and

    $$N \sum_{n=0}^{N-1} x(n)h(-n) = \sum_{k=0}^{N-1} X(k)H(k).$$

    For the particular case of $x(n) = h(n)$,

    $$N \sum_{n=0}^{N-1} x^2(n) = \sum_{k=0}^{N-1} X(k)X(-k)$$

    and

    $$N \sum_{n=0}^{N-1} x(n)x(-n) = \sum_{k=0}^{N-1} X^2(k).$$

    Note that in this case, the conventional Parseval's theorem

    $$N \sum_{n=0}^{N-1} |x(n)|^2 = \sum_{k=0}^{N-1} |X(k)|^2$$

    does not hold, because in a ring magnitude is undefined.

    K. Stretch Property

    The stretch property exists if the transform for the longer length sequence exists in that ring.

    L. Sampling Property

    The sampling property exists.
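    Several of the properties listed in this appendix can be checked numerically in the small ring used in the example of Section IV. The following is a minimal sketch assuming $F = 17$, $N = 4$, $\alpha = 4$ and the same test sequences as that example; `transform` is a hypothetical helper and Python 3.8 or later is assumed for modular inverses via `pow`.

```python
F, N, ALPHA = 17, 4, 4


def transform(f):
    """F(k) = sum_n f(n) * alpha**(n*k) mod F."""
    return [sum(v * pow(ALPHA, n * k, F) for n, v in enumerate(f)) % F
            for k in range(N)]


x = [2, -2, 1, 0]
h = [1, 2, 0, 0]
X, H = transform(x), transform(h)

# Shift theorem: T{f(n + m)} = F(k) * alpha**(-m*k)
m = 1
alpha_inv = pow(ALPHA, -1, F)
shifted = [x[(n + m) % N] for n in range(N)]
shift_ok = transform(shifted) == [X[k] * pow(alpha_inv, m * k, F) % F for k in range(N)]

# Parseval-type identity: N * sum x(n) h(n) = sum X(k) H(-k)
lhs = N * sum(a * c for a, c in zip(x, h)) % F
rhs = sum(X[k] * H[-k % N] for k in range(N)) % F
parseval_ok = lhs == rhs

# Convolution property: T{x * h} = X(k) H(k)
conv = [sum(x[j] * h[(n - j) % N] for j in range(N)) for n in range(N)]
conv_ok = transform(conv) == [a * c % F for a, c in zip(X, H)]

print(shift_ok, parseval_ok, conv_ok)
```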

    APPENDIX B

    PROOF OF THEOREM 1

    As shown before, the existence of such a transform depends on the existence of an $\alpha$ of order $N$ with respect to each of the prime power factor moduli, $p_i^{r_i}$. According to Euler's Theorem [14], $N$, the order of $\alpha$, must divide $\varphi(p_i^{r_i})$, where $\varphi(p_i^{r_i})$ is Euler's $\varphi$ function given by

    $$\varphi(p_i^{r_i}) = p_i^{r_i - 1}(p_i - 1). \qquad (B1)$$

    Thus, $N \mid \varphi(p_i^{r_i})$, and since $p_i$ is not a factor of $N$,

    $$N \mid (p_i - 1), \qquad i = 1, 2, \ldots, l$$

    or

    $$N \mid \gcd(p_1 - 1, p_2 - 1, \ldots, p_l - 1)$$

    or

    $$N \mid O(F). \qquad (B2)$$

    This proves the necessary part of the theorem. To prove sufficiency we have to show the existence of an $\alpha$ of order $N$ if $N \mid O(F)$. Note that (B2) rules out $F$ being an even integer because in that case $O(F) = 1$. If $N \mid O(F)$, then $N \mid (p_i - 1)$ or $N \mid \varphi(p_i^{r_i})$; therefore, one can find integers $\alpha_i$ of order $N$ modulo $p_i^{r_i}$ [14], $i = 1, 2, \ldots, l$. Then, by the Chinese Remainder Theorem, one can find an $\alpha$ of order $N$ modulo $F = p_1^{r_1} p_2^{r_2} \cdots p_l^{r_l}$, such that

    $$\alpha = \alpha_i \pmod{p_i^{r_i}}, \qquad i = 1, 2, \ldots, l. \qquad (B3)$$

    This is our desired $\alpha$ of order $N$. Since $N$ and $p_i$ are relatively prime, $N^{-1}$ exists in this ring. This completes the proof of the theorem.
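    The quantity $O(F) = \gcd(p_1 - 1, \ldots, p_l - 1)$ that bounds the transform length is easy to compute for small moduli. The sketch below is illustrative only: it factors $F$ by trial division, which is adequate for moduli such as $2^{24} + 1$ but not for large ones, and the helper names `prime_factors`, `max_transform_length`, and `order` are hypothetical.

```python
from math import gcd


def prime_factors(f):
    """Distinct prime factors of f by trial division (small moduli only)."""
    factors, p = [], 2
    while p * p <= f:
        if f % p == 0:
            factors.append(p)
            while f % p == 0:
                f //= p
        p += 1
    if f > 1:
        factors.append(f)
    return factors


def max_transform_length(f):
    """O(F): the gcd of p - 1 over the distinct prime factors p of F."""
    result = 0
    for p in prime_factors(f):
        result = gcd(result, p - 1)
    return result


def order(a, f):
    """Multiplicative order of a modulo f; a must be coprime to f."""
    k, v = 1, a % f
    while v != 1:
        v = v * a % f
        k += 1
    return k


F = 2**24 + 1
print(prime_factors(F), max_transform_length(F), order(8, F))
```

    For $F = 2^{24} + 1$ this reproduces the lengths quoted in Section IX: the bound $O(F)$ is 32, $\alpha = 8$ has order 16, and $\alpha = 2^7(2^{12} - 1)$ has order 32.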

    APPENDIX C

    To prove that the $\alpha$ of (23) is of order $2^{t+2}$ mod $F_t$, we have to prove that it is of order $2^{t+2}$ with respect to each prime power factor of $F_t$ (20). Let $p_i^{r_i}$ be a prime power factor of $F_t$; then

    $$\alpha^{2^{t+2}} = 2^{2^{t+1}} = 1 \bmod F_t = 1 \bmod p_i^{r_i}. \qquad (C1)$$

    Let $N_i$ be the order of $\alpha$ with respect to $p_i^{r_i}$; then by Euler's theorem,

    $$N_i \mid 2^{t+2} \qquad (C2)$$

    therefore $N_i$ must be a power of 2. Also,

    $$\alpha^{2^{t+1}} = 2^{2^t} = -1 \bmod F_t = -1 \bmod p_i^{r_i}. \qquad (C3)$$

    From (C1) and (C3), it is obvious that

    $$N_i = 2^{t+2},$$

    completing the proof.

    ACKNOWLEDGMENT

    The authors wish to thank C. M. Rader of Lincoln Laboratories, Cambridge, Mass., for his comments and discussions.

    REFERENCES

    [1] B. Gold and C. M. Rader, Digital Processing of Signals. New York: McGraw-Hill, 1969.
    [2] A. V. Oppenheim and C. Weinstein, "Effects of finite register length in digital filtering and the fast Fourier transform," Proc. IEEE, vol. 60, pp. 957-976, Aug. 1972.
    [3] D. E. Knuth, The Art of Computer Programming, vol. 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969, p. 555 and pp. 259-262.
    [4] J. M. Pollard, "The fast Fourier transform in a finite field," Math. Comput., vol. 25, pp. 365-374, Apr. 1971.
    [5] I. J. Good, "The relation between two fast Fourier transforms," IEEE Trans. Comput., vol. C-20, pp. 310-317, Mar. 1971.
    [6] A. Schonhage and V. Strassen, "Fast multiplication of large numbers," (in German) Comput., vol. 7, pp. 281-292, 1971.
    [7] D. E. Knuth, "The art of computer programming: errata et addenda," Comput. Sci. Dep., Stanford University, Stanford, Calif., Rep. STAN-CS-71-194, pp. 21-26, Jan. 1971.
    [8] P. J. Nicholson, "Algebraic theory of finite Fourier transforms," J. Comput. Syst. Sci., vol. 5, pp. 524-547, 1971.
    [9] C. M. Rader, "The number theoretic DFT and exact discrete convolution," IEEE Arden House Workshop on Digital Signal Processing, Harriman, N.Y., Jan. 11, 1972.
    [10] C. M. Rader, "Discrete convolution via Mersenne transforms," IEEE Trans. Comput., vol. C-21, pp. 1269-1273, Dec. 1972.
    [11] R. C. Agarwal and C. S. Burrus, "Fast digital convolution using Fermat transforms," in Southwest IEEE Conf. Rec., Houston, Tex., Apr. 1973, pp. 538-543.
    [12] R. C. Agarwal and C. S. Burrus, "Fast one-dimensional digital convolution by multi-dimensional techniques," IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-22, pp. 1-10, Feb. 1974.
    [13] L. E. Dickson, History of the Theory of Numbers, vol. 1. Washington, D.C.: Carnegie Institute, 1919, p. 376.
    [14] O. Ore, Number Theory and Its History. New York: McGraw-Hill, 1948.
    [15] D. A. Pitassi, "Fast convolution using Walsh functions," in Proc. Conf. Applications of Walsh Functions, Washington, D.C., April 1971, pp. 130-133.
    [16] C. S. Burrus, "Block realization of digital filters," IEEE Trans. Audio Electroacoust., vol. AU-20, pp. 230-235, Oct. 1972.
    [17] R. C. Singleton, "An algorithm for computing the mixed radix fast Fourier transform," IEEE Trans. Audio Electroacoust., vol. AU-17, pp. 93-103, June 1969.