1974 Fast Convolution Using Fermat Number Transforms With Applications to Digital Filtering

8/20/2019 1974 Fast Convolution Using Fermat Number Transforms With Applications to Digital Filtering

1/11

IEEE

TRANSACTIONSONACOUSTICS, PEECH,AND IGNALPROCESSING, VOL. ASSP-22,NO.

2,

APRIL

1974

87

Fast Convolution Using FermatNumber Transforms with

Applications to Digital Filtering

Absfracf-The tructure

of

transformshaving the convolution

property is developed.

A

particular transform is proposed that is de-

fined on a finite ring of integers with arithmetic carried out modulo

Fermat numbers. This Fermat number transform (FNT) is ideally

suited todigital computation, requiring n the order of

N

ogN addi-

tions, subtractions and bit shifts, but

no

multiplications. In addition

to.being efficient, the Fermat number transform implementation of

convolution

is

exact, i.e., there is no roundoff error. There is a re-

strictionon sequence ength imposedbyword length butmulti-

dimensional techniques are discussed which overcome this limita-

tion. Results of an implementation n he

IBM 370/155

are presented

andcomparedwith the ast Fourier ransform FFT)showing a

substantial improvement in efficiency and accuracy.

I. INTRODUCTION

T””

onvolution proper ty of certain ransformscan

be used to efficiently compute the cyclic convolution

of two long discrete sequences. One such transform is the

discrete Fourier ransform (D FT ) with the fast Fourier

transform (FFT) implementation

[l]

which operates on

sequences in the complex number field. To compute the

DFT of

a

sequence of length

iV

requires on the order of

( N / Z ) logz ( N / 2 ) complex multiplications, which often

take most of the computation time. In addition, the

FFT

implementation of cyclic convolution ntroduces signifi-

cantamounts of roundoff

error [ Z ] ,

thusdeteriorating

the signal-to-noise ratio. When working with digital ma-

chines, the data are available only with some finite pre-

cision and, therefore, without loss of generality, the data

can be considered to be integers with some upper bound.

In this digital domain, th e complex number field of t he

continuous domain can be replaced with a finite field

or,

more generally, a finite ring of integers with addition and

multiplicationmoduloomenteger F . In this ring,

using transforms similar to the DFT, the cyclic canvolu-

tion can be performed very efficiently and without any

roundoff error.Onesuch ransformpresentedhere sa

number theoretic transform which operates in the ring of

integers modulo a Fermat number. Computation of this

transform

of

length

N ,

analogous to an

FFI’,

requires on

the order

of N

log2 AT additions, bit shifts, and subtrac-

tions, but no multiplication. This is considerably simpler

than the FFT.

Knuth [SI has proposed the use of transforms in finite

fields. Pollard [4] discussed transforms having t he cyclic

This work was supported by NSF Grant GK-23697.

Rice University, Houston, Tex. 77001.

ManuscriptreceivedJuly16, 1973; revisedNovember19,1973.

The authors are with the Department of Electrical Engineering,

convolution property n a finite field and also gives th e

conditions for having transforms with the cyclic convolu-

tion property in a finite ring of integers. Good

[5]

also

mentioned the use of transforms in a finiteing of integers.

Schonhage and Strassen

[SI

defined transformshaving

th e cyclic convolution property modulo a Fermat num-

ber and discussed their application to fast multiplication

of very large integers. Knuth

[7]

elaborated on the work

of Schonhage and Strassen. Nicholson [SI presented an

algebraic theory of finite Fourier transfornls in any ring

and established fastFFT-typealgorithms ocompute

these transforms. Rader

[SI,

[lo] proposed finite trans-

forms n ings of integersmodulo both Mersenne and

Fermatnumbers. He firstproposed the application to

digital signal processing, showed th at th e ransforms could

be calculated using only additions nd bit shifting, howed

the ,word-length constraint, nd suggested two-dimensional

transformsas a possible relaxation of thatconstraint.

Agamal and Burrus [l l] defined Fermat transforms and

also proposed their applicationfor fast digital convolution.

The purpose of this paper s t o develop the general

structure of transforms with the convolution property and

then present the number heoretic ransformmoduloa

Fernrat number as a practical scheme for use with digital

computation. The properties are hen discussed for the

implementation of digital filters and the results of a soft-

ware implementation on an

I R M 370/155

are given.

It,

s

believed th at thepresentation here is different rom previ-

ous works and gives more insight into these transforms.

One result

of

this approach allows the maximum length

of sequences th at can be convolved to be increased by a

factor of

2

over what hadpreviously been reported n [IO].

The choice of terminology in a new area is always diffi-

cult. In this paper the generalname,number-theoretic

transform, is usedor anyransform using umber-

theoreticconcepts n tsdefinition. The twoparticular

transforms with arithmetic modulo Mersenne and Fermat

numbers are called klersennenumber transforms

MNT’s)

and Fermat number transforms (FNT’s) . The particular

MNT’s and FNT’s with a basis function

a = 2

are called

Rader transforms (RT’s)

These ransforms are ruly digital ransforms, aking

into account the quantization in amplitude and the finite

precision of digital signals. They bear the same relat ion

to digital ignal as heDFT does to discrete-time or

sampled da ta signals and the Fourier

or

Laplace trans-

forms do to continuous-time signals. In th e same manner

th at th e relation of discrete-time signals to continuous-


2/11

88

IEEE

TRANSACTIONS

O N ACOUSTICS, SPEECH, AND SIGNAL

PROCESSING, APRIL

1

time signals through sampling involves a possible folding

or

aliasing in the frequency domain, the relation of calcu-

lationswith the DFT to calculations with he number

theoretic ransforms involves a possible folding of the

amplitude that must be taken into account.

In Section 11, the structures of t'ransforms having the

cyclic and he noneyclic convolutionpropertiesare de-

veloped.

It

is shown Ohat what we call the DFT str ucture

is t,he only structure which supports the cyclic convolu-

tion property and any transformaving the

DFT

structure

will have the cyclic convolution property. In Section

111,

the necessary and sufficient conditions for the existmce

of transforms in a ring of integers modulo of an integer F

are establishcd. In Section IV,

FNT's

are defined which,

computationally,appear to be t.he best ransforms for

implementing convolution. In Section V, reference is given

to

a

two-dimensional transform implementat'ion for long

one-dimensional convolution. In Section VI, a hardware

implementation to compute the FNT is suggested. Com-

parison of the

FNT

with the

FFT

implementation of the

I>FT, o implement convolution, is made in Sect,ion VII.

In Section VIII,

a

software implementation of the FNT

on an IBM

370/155

is prescnt,ed and timing comparisons

are made with the PFT. For cyclic convolution engths

up to 236, FNT implementations are faster by a factor

of 3 to 5 over the best FFT implementations which make

use of the symmet,ry

of

the

DFT

for real data.

For

com-

puterswith slower multiplication, theadvantages look

even bet'tcr. lGnally, several generalizat,ions and applica-

tions are presented.

11. THE STRUC TURE

O F

TRASSFOILMS HAVING

THE CONVOLUTION PROPERTY

In this section the structure of transforms having the

cyclic convolution property, .e., the transform of cyclic

convolution of two sequences is equal to th e product of

their ransforms, is establishcd. The class of transforms

considered

is the set of lincar nonsingular (invertible)

t,ransforms which map a sequence of length A r to another

sequencc

of

length

N .

It is shown that what lvc call t,he

D I T structure is tho only structure which supports the

cyclic convolut,ion propert,y and, moreover, any transform

having the DPT str uct urc ill have the cyclic convolution

property.

Let xn and h,, n = O , l , .

.

lV

1,

be wo sequences

which are to be cyclically convolved as given by

N- 1

y n = xn*hn= 2 kh n -k n

=

O , l , . .

-

N

1. (1)

k=O

This definition assumes the sequences are periodically

extendedwith period N or the indiccs arc valuated

modulo N. Here we will describe sequences as vectors of

length

N.

Since only linear invertible t,ransforms are con-

sidered, they can be represented by anN

X N

nonsingular

matrix

T

whose clenwnts arc t k , m , k,wz = O , l , -

-

- ,N

1.

Let capital letters denote the transformed sequences, i.e.,

X =

T X

H

=

Th

Y = Tg.

Consider the conditions on

t k , m ' s

so that

Y = X @ H

where @ denotes term by term multiplicat'ion of

vectors.

Equations ( 2 ) and (1) are combined to give

N- l N-1 N-1

y k

= t k , n y n = t k , n

xmhn-m

n=O

n=O

m=O

x 1

N-

1

=

xmhmtk,m+l

m=O Z=O

N- 1

x k = tk ,mxnz

m=O

A - 1

HA

=

tk , zhl

I

=

O , l , .

.,.I:

1.

z=o

The individual terms in ( 3 ) can be rewritten as

Y k

= X k H k

which, using (4 ), becomes

N-

1 A- 1 A- 1 A'-1

&hltk,m+z xm:mhztk,nztk,z.

m=o z=o

m O

z=o

Matching the multiples of xmhl on both sides of (5 ) g

tk++l

=

tk ,&,z

k Z m

= O , l , - .

.,X

1.

Repeated applications of

(6)

gives

t k V m =

t k , P

l c , m =

O , l ,

*

- .,w 1.

And, since the convolution is cyclic, the illdices in

are added nlodulo N . This gives

t k , m A r = 1 16,772 =

0,1,

* . iv 1.

Therefore, t.he

tic,m's

are JVth roots of unity. Fo r T to

nonsingular, all the t k , 1 7 ~houldbedistinct and, si

there are only N distinct Nt h roots of unity, t'he t

must be these i\; distinct roots. Without loss of general

t l .1 can

be

taken as a root

of

order N , i.e., N is t,he le

positive integersuch that tl,l = 1. Therefore,all

t

can be written

as

some powers

of

t l , 1 .

Again without

of generality, the rows

of T

can be arrangod so that

tk.1 t l , l k .

Combining

(7)

and

(9)

and denoting t l .1 by

a

gives

following

t k , m akm

k , 7 ? % = 0,1,'

* ',Av

1. (

The structure thus established causes the transform

beorthogonal. The elements Zk ,m of are given by

Zk,m

= N-la--km k,m

=

O , l , . .,N 1. (

To prove this, consider


3/11

AGARWAL

AND

BURRUS:

FAST

C O N V O L U T I O N APPLICATIONS TO DIGITAL

FILTERING

89

TT-l

=

I

or

N-1

N 1 Ykn Y-ln = 6

k.2

(12)

7L=O

where

6 k , Z

is the Kroneckerdeltafunction.Taking

I

1 =

p , (12 ) can be rewritten as

N-

1

1 if p

=

0 (mod W

Lv-l =

(13)

n=O

0 otherwise.

To prove the orthogonality, we have Do prove (13).

If

p =

O(mod

W),

then

(YP

= 1 and he first part is thus

established.

If p

O(mod

A )

,

( YP

1 or (a" 1) 0.

Multiplying (13) by

( ( Y P

l ) ,we obtain

N- 1

p 1 Y ~

1)

( Y ~ n

l p l

Y ~ ~

1) =

0.

n=O

Thus (13) is established.

An examination of the preceding development reveals

th at he existence of an N X N transformhaving the

cyclic convolution property depends only on the existence

of an Y that is a

root.

of

unity of order N , and the xistence

of

1V-l.

The structures of t'he transform matrix T and its

inverse are given by (10) and

(11)

respectively, and

theseare he only structures which support he cyclic

convolutionproperty.This structure is called t'he DFT

structure . Because of the DFT structure, any transform

having the cyclic convolutionproperty also has a fast

computational algorithm simiiar to theFFT if is highly

composite Kg]. In t'he complex number field th e DFT,

with (Y =

e--jZriN,

is the only transform having the cyclic

convolution property. But, in a finite field

or,

more gen-

erally, a ringwe might also have transforms with theyclic

convolution property if there exists a root of un ity of

order N and if X- exists. In thenext section

we

establish

results or the special case of

a

finite ing of integers

modulo of an integer, which seem to be most appropriate

for practical digital computation. Appendix A lists some

of th e properties of these transforms.

Thus far wi: have discussed only the transforms having

the cyclic convolutionproperty. Noncyclic convolution

can be implemented using cyclic convolution by padding

the sequences with

zeros

[l].

If

only noncyclic convolu-

tion is desired, a more general class of transforms is pos-

sible. Consider two sequences of lengths L and M , respec-

t'ively, t'hat are noncyclically convolved giving an output

sequence of length

N

=

L + M

1. First

we

pad the

sequences with zeros to make hem of length N each.

For

the X N transform T , the desired restrictions on

t k , m 1 8

are still given by

(6),

but now the indices are not

added modulo X ; his modifies

( 6 )

to be

t k ,m + l = t k ,m t k , z k = 0,1," ' , N 1

112 = O , l , . . . , M

1

I

=

0,1,.*.,L

1.

(14)

Repeated applications of (14) gives

t k , m

=

k k , l m lc,m = 0,1,--.,N 1.

(15)

This s the same as (7 ) , but

now we

donothave

(8)

which implies cyclic convolution. Because of the absence

of 8 ) , k 1 7 ~re not restricted t o Nt h roots of uni ty. The

only restriction is the nonsignularityconstraintwhich

requiresall

tk,1 8

to be distinct.

A

similarprocedure s

discussed byKnuth [3] in connection with hemulti-

plication of large ntegers which is almost analogous to

noncyclic convolution. When the

t k , l ' s

are not restricted

to Nth roots of uni ty, ,in general there is no 1WT-type

algorithm to compute the transform

or

the inverse trans-

form. As a matt'er of fact, t,he inverse transform docs not

have a simple structure. Because of these limitations

we

restrict ourselves to transforms having the cyclic convolu-

tion property.

111. TRANSFORMS IN TH E RIN G

011

INTEGERS

MODULO A N

INTEGER

In anypractical situation,

or

when working with digital

machines, thedataare available only with some finite

precision, and therefore,without loss

of

generalit,y, the

da ta can be considered to be integers wit'h some upper

bound. Tocompute corivolution in his digitaldomain,

operations in the complex number field of the continuous

domain can be imitated with a finite field or, rnore gen-

erally, a finitx ring of integers under additions and multi-

plications modulo some integer F ; and an int'eger (Y of

order

M

replaces

e--j2T/i\i

used in a DIW. In thi s ing, when

two nteger sequences

z ( n )

and h ( h ) are convolved, the

output integer sequence

y

( n )

s congruent to the convolu-

tion of

x n)

nd

h ( n )

modulo F . In the ring of integers

modulo F , conventional ntcgcrscanbeunambiguously

represent'ed if their absolute value is loss than

F / 2 .

If the

input integer sequences z ( n ) and h ( n ) are so scaled that

y ( n )

I never exceeds

F / 2 ,

we would get the same results

by implementing convolut'ion in he ring of integers

modulo

F

as tha t obtained with normal arit.hmetic. This

is similar to th e overflow constraint in fixed-point digital

machines.

In most digit,al filt'ering applications, h ( n ) represents

the impulse response and is known a priori; also the max-

imummagnitude of the nput signal isusually known.

I n this situation, we can bound the peak output magni-

tude by,

Ai- 1

1 y ( n )

I 5 I z ( n )

I m x

c h ( n )

n O

When

we

relate ontinuous-time signals to discrete-

time signals through ampling, we obtain an aliasing

effect n he frequencydomain;all the frequencies are

obtainedmodulo the samplingrcquency. A similar

effec t occurs when the amplitude is discrctizod and con-

volution is performed in the integer domain modulo an

integer F . We obtain aliasing in amplitude which can be

avoided

as

noted before.


4/11

90 IEEE TRANSACTIONS

O N

ACOUSTICS,

SPEECH,NDIGNALROCESSING,PRIL 1

In order to have a computational advantage in imple-

menting cyclic convolution, wewill derive a transform

having the cyclic convolutionproperty in this ring. In

addition, we would like this ransfornl ohave a fast

cornput'ationalalgorithm.First, wewill present esults

about the existence of such transforms which mere shown

to depend on the exist.enceof

an

a

of order

N

and existence

N - l in the ring. The number theory necessary to under-

stand the derivations is very basic and can be found in

any elementary book o n number theory [14].

Let

F

= p l r l p 2 r z . .

lPL

(16)

be the prime factorization of F . Since a is of order N ,

we have

a N =

1(mod F ) (3.7)

which implies

all' = I(mod p i r i ) i = 1,2,..

1.

(18)

If

the transform matrix T is to be nonsingular, no two

rows of T can be the same, which implies

a p @(mod p i r i )

i = 1, 1 1 ,

P 4

0

5 p , q

5 A: 1. (19)

Equations

(18)

and

(19)

imply that

a

should be of order

N

with respect to each prime power factor p j r i of F , i.e.,

N is the least positive integer such that

a v = 1 (mod pi' i) i = 1,2,. ,Z .

(20)

If a is of order less than

N

with respect to any moduli

p iT < ,

hen 1 would be singular wit,h respect to that modu-

lus. Also, for 1V-l to exist, N should be relatively prime

t o F , i.e., p i should not be a factor of N . These two con-

siderations give Theorem

I,

which isproven in Appen-

dix B.

Theorem 1

An N X N transform T having the cyclic convolution

property in the ring of integers modulo an integer F exists

if

and only if

X

divides

0

( F )

g cd (p l 1,pZ ,

- a ,

Q L

1) .

IV. ECERRIAT

N U M B E R

TRANSFORMS

For number theoretic transforms to be more at,tractive

i n

comparison to the PE'T for implementing convolution,

they should bo computationally efficient'. There arc three

requirements that will be considered. First, I\; should be

highly composite (prefcrably a power of 2 ) for

a

fast

FE'T

type algorithm to exist. Second, since complex multiplica-

tions take most of the computational effort in calculating

thc

FF'T,

it is important' tha t themultiplication by powers

of

a

be a simple operation. This is possible if t he powers

of a have very few bit binary representations; preferably

also a power of two, where multiplication by a powcr of a

reduces t o a word shift. Third, in order to facilitate arith-

metic modulo P, F should also have a very few bit binary

representation.

Now we shall make a systematic investigation of good

choices for F , for which the maximum transform l.en

A T

is not too small. As noted above, we would like F

have a very few bit binary representation. The first p

sibility is

P

t

has a

prime factor

2

and therefore acco

ing to Theorem 1 of Section 111, the maximum possi

transform ength is 1. For 2k 1, let I be a composi

PQ,

where

1

is prime. Then

2 p

1

divides

2 Q

-

1

a

the maximum possible length of the t,ransform will

governed by he length possible for 2 p -- 1. Therefo

only the primes I need to be considered intcrcsting. Nu

bers

of this form are known

as

Merscnne nunlbers a

Rader [lo] has discussed convolution using Mersen

numbers in detail. For MNT's [lo], it can be hown t

transfornls of length at least 21 exist and the correspon

ing a =

-2.

Mersennc number transforms

are

not of

much nterest because 21' is not highly composito a

therefore, we do not have fast FYI'-type comput,at,ion

algorithms to compute the transforms. For

Zk -

I , sa

is odd. Thcn 3 divides

+

I

and hc largest possi

transform length is

2.

Thus

k

is

wen.

Let

IC

be ~ 2 ~h

s is an odd integer. Then

2 z t

+ I divides

2 8 z t

+

1

and t

length of the possible transform will be governed by t

length possible for

2zt

-

1.

Therefore, int'egers of the fo

2zL

+ 1

are of interest.Thesenumbersare known

Fermatnumbers and will be discussed in detail in t'

paper. E'ennat numbers seem to be optimum in the sen

of having transforms whose length is interesting while t

word size is moderate. Numbers of the form 2.2 +

1

a

also of limited int'erest and are discussed in Section I

A

systematic investigat'ion of t'hose F which require rno

than taw bit reprcsentation is difficult. Our prelimina

investigation in that direction does not seem t o be ve

encouraging.

In the light of the above discussion, it is proposed

use F t b $- 1 b

=

z t , being a positive integer, as go

choices for

I?.

Fr is known as the th Ferrnatnumber

Arithmetic modulo

17,

can be performed using 6-bit ari

metic, as will be discussed in Section VI. The nunher

bit s used for the signal and t'he filter coefficients deci

the value of b to be used so as not to cause error in t

output due tooverflow. The value of

b

required is sligh

more than double th e number of bits used for Dhc sign

representat,ion. One simplc bound on the output was p

sent'cd in t'he lastr section. Computationally, Fcrmat nun

bers s ewn to be the best choices for

F ,

therefore,

as

far

is practicable, they shouldbc used. If thovalue of

obtained from the overflow consideration is not

a

pow

os 2, it should be increased to he next power of 2

order that Fcrrnatnumbers anbe used. Some oth

possibilities are discussed in Section IX. Since thn arit

metic is done modulo a k'crmat number, we call

the

ass

ciated ransforms FNT's. Below we define

F h T s

an

their inverse transforms for sequences of length

lz =

2

X ( k )

=

z(n)anb(mod I ) k =

O , l , . . . , A r

-- 1 (2

A-1

L=O

N 1

z

(n )

=

( 2 -m )

X(k)a-"k(modFt)

k=O


5/11

AGARWAL ANDURRUS:ASTUTION AAPPLICATXONS TOIGITALILTERING

91

p ,

:: 6 :+ 1,

b

= 2t (22)

a is an nteger of order N , Le.,

N

is the east posit'ive

integer such that

ah' =

1

mod F , ) .

Note hat CY-^ = and 2-" = -2b-"(mod

F,).

As discussed in Sections I1 and 111, we canperform

cyclic convolut,ion of two integer sequences of length hr,

using the FNT,

if

certain condit ions,are satisfied. Now

we discuss the possible values of N for which an FNT

exists, given a choice of

F,.

s implied by Theorem

1

of

Section 111, for transforms (mod F , ) of length N to exist,

N should divideO(F,)

Fermat numbers up to

Fd

are all primes, therefore for

these cases O ( F , )

=

2b, and we canhavean

FNT

for

any length

N = 2",

m

5

b.

For

these Fermat primes the

integer

3

is an O( of order N =

2b,

giving the largest pos-

sible ransform ength. Of course thereare

2b-1

other

integers also which are

of

order

2b.

They can be obtained

by taking odd powers

of

3 .

If

an integer a is

of

order

N ,

then a P(mod

F,)

ill

be

of order

N / p ,

if

N / p

is an integer.

Therefore,usingFermatprimes, an integer a of order

2" m 5 b, is given by 32&"(modF,) The integer 2 is of

order 2b =

2t+1,

f

CY

is taken as 2 or a power of 2 , all the

powers of a would be some powers of

2,

and, for hese

cases, as discussed in t'he beginning of this section, he

FN T can be computed veryfficiently and is called the RT.

There are no other known Fernlat primes. For digital

fiheringapplications, F 5 ( b

=

3 2 ) and

F6 ( b =

64) seem

t o

be most practical. Lucas

[13]

has proven th at every

prime factor of composite F , is of the form K-2t+2+ 1.

Therefore, 21+2divides

O(F,)

for t

>

4.

n particular

it

can be verified th at for F5 nd F6, ( F t ) =

2t+2.

There-

fore, for these choices of Fermat numbers, the maximum

possible transform 1engt)h s

2t+2

= 4b.Also, we assert th at

a * given by (23) is of order 2, mod

F,,

2 2.

qj = 2 2 t - 2 ( 2 2 1 - 1 .-

1)

(23)

We denote thisaP s fi ecause

CY;

2 mod P,.

The proof that aP iven by (23) is of order 21+2 with re-

spect to any factor

of F ,

is given i n Appendix C. Any odd

power of

.\/z

will also be of order

2t+2.

By raising

q2

to

21+2-mthower, we obtain an integer

U

of order 2m,

m 5

t + 2.

Table I gives Val-ues of 1V for the two most important

values of

CY

and also gives the maximum possible N for

the most practical values of b. The Fermat number trans-

form has also been defined by Rader

[lo],

using

=

2

but his development does not indicate the possibility of

using Using the esults

of

Section

111,

we have ound

the maximumpossible length N for which an FNTexists,

for he most practical values of b, and have found he

corresponding nteger LY TJsing CY = fi, themaximum

possible engths for hese ransforms have increased by

a

factor of

2 ,

which makes them more useful. In th e next

section we discuss a two-dimensional convolution scheme

TABLE

I

PARAMETERS FOR SEVERAL POSSIBLE IMPLICATIONS FOR THE FNT'S

__._ ___ I__

3

4

8 2 + 1

16

2l'

-+

1 32

16 3256 3

64

655 6

3

5

32 232

4

64

12s 128

6 64 2e4

-+

1

128

256

256 fi

._

...__

. . . ~_ l l l _ _ _ I _ _ _ _

_ _

a This case corresponds to the

RT.

whereby very long one-dimensional sequences can also be

convolved using a two-dimensional FNT.

The basis functions for t,He FNT ar e integer exponen-

tials mod F, , in comparison t o the complex exponentials

for the DFT. These integer exponentials fold over after

they cross F J 2 . Because of the nature of modulo arith-

metic, he FNT coefficients do not seem to haveany

physical meaning. Although the signal for which the FN T

is being taken may

be

very small, ts FN T coefficients

may lie anywhere between 0 and Ft-1. As a matter

of

fact, during various stages of the computation each accu-

mulation of the signal "overflows" many times. But still

the end result of the convolution would be exact if the

input signals are properly bounded. FNT's f o l l o ~ ll the

properties mentioned in AppendixA.

Example

T o make the ideas of this section more clear, we now

present an example. This example will illustrate several

points, reat,ment of negativevalues n hedata, t'he

structure of the ransformand he inverse ransform

matrix, negative powers of

CY,

requent "overflow" during

computation,meaningless of the transformvalues,and

exactnegs

of

the final answer. This example w-ill not demon-

stra te the efhient implementation of the RT using the

binary arithmat ic. That will be illustrated in Section

V I

on implementation of the RT.

Consider two sequences 2 = (2,

,1,0)

and h

=

(1;2,0,0),whose convolution is desired. From he over-

flow consideration, it is sufficient 'if we work modulo

Fz

= 17. We want

N

=

4,

or

F2

the integer 2 is of order

8, therefore i2

4

is an a of order 4. The transformation

matrix T is given by

1 1 4 - 1

--4

1 4

16 13

1

16 1 16

1

I

(mod 17)

1 1 13 16 4 1


6/11

92

IEEE

TRANSACTIONS

O N

ACOUSTICS, SPEECH,

AND

SIGNAL

PROCESSING,

APRIL 1

Since 4-1 = -4(mod 17), the inverse transformation ma-

trix T-’

is

given by

1 4-1

4-2

4-3

T-1 = 4-1

1

4-2

4-4

4-6

1 4-3

4-6

4 9

1 1 1 1 -

= - 4 1 1 -41 4

1 -1 1 -1

1 -1 4

= ‘3 1; 1; l;

1;f3 mod17)*

The transforms of z nd h are given by

X = T x =

1 1 1 1

-1 16 1 1 [ 3

1

16 1 16

1 136 4

Note that in

x , -2

was represented by

+ 17

=

15.

Similarly,

H = (3,9,16,10) and Y =

X - H

= (3,90,80,90)

= (3,5,12,5)(mod 17).

Taking the inverse transform of

Y ,

y

= (2,2,14,2)(mod 17).

According to ourassumption, ntegersaresupposed to

lie between

-8

and

8.

Therefore,

14

must be represented

as 14 - 17 = -3. This gives y

=

(2,2, 3 , 2 ) , which s

the correct answer. Also, note that

y

is a symmetric se-

quence, therefore Y is also a symmetric sequence. Other

than this, the transform values have noignificance.

V. TWO-DIMENSIONALCONVOLUTION FOR

CONVOLVINGLONGSEQUENCES

Arithmeticmodulo F , canbe implementedusing

b

= 2t bit representation of integers, as will be discussed

later in Section VI, with some provision for representing

2b. Themaximum ength of sequenceswhich an be

cyclically convolved using the FN T is 46, using

= f i ,

TABLE

I1

MAXIMUM

ONE-DIMENSIONALYCLICONVOLUTIONENGT

USING WO-DIMEKSIONALNT

OR

RT

___

Word length

b

N for

CY

=

2 -N

for

CY = %

16

512

32

2048

2048192

64

81922768

as discussed above.Therefore, the ength of sequen

which can be convolved is proportional to theword len

in bits. Thus, or long sequences, word length requirem

may be excessive. Rader [lo] suggested using a w

dimensionalconvolution cheme to convolve ong o

dimensionalsequences. Agarwal and

Burrus [ll],

[

presented ucha two-dimensional convolution chem

Using this scheme, cyclic convolution of length AT =

is mplemented as

a

two-dimensional cyclic convolu

of length (2L 1) by M (or 21, by M ) This t

dimensional cyclic convolution can be implemented us

a

two-dimensional PNT [ll], [12] defined similar to

one-dimensional transform. Using this two-dimensio

scheme, the word length required is proportional to

square root of the lengthof the sequences to be convol

which would give for a maximum sequence ength,

rather than 4b. If M is taken as the maximum poss

length 4b, and 2L 1 is a small integer, then either di

convolution or another high-speed algorithm could

employed [la], 15] to computeconvolutionalong

short dimension and the one-dimensional FNT could

used along the long dimension. Computationally this co

binationcanbevery efficient as will be shown in

implementation in Section VIII. Table

I1

lists the m

imum lengths for two-dimensional transforms.

VI. A SUGGESTED HARDWARE

IMPLEMENTATION

Since the structure of the PN T s similar to th at of

DFT, and

N

is

a

power of

2,

a fast implementation of

FNT’s similar

t o

the FFT with radix 2 exists. As a ma

of fact, all we have t o do is replace

W = e--jZ*IN

by

any FFT algorithm [l].

In computing theFNT,arithmetic sdone mod

2b+ 1.

I n this arithmetic the only allowed integers

O,l,

-

and llntegers whose absolutevaluesdonot

exceed 2b-1can be represented unambiguously. Negativ

integers are represented by adding

2b

+

1 to

them; t

is similar o twos complement and ones complement re

sentation of negative integers. Using a b-bit register,

int,egers from0 to

2b -

1 can be represented. The probl

remains to represent 2 b . If the dat,a are uncorrelated,

probability tha t thi s number will appear after an ari

metical operation is approximately

2-b.

For digit’al fil

ing applications, b would typically be 32

or 64;

in the

cases the probability

of

occurrence of 2b is extrem

small. If occasional error in convolution is permissi

for these cases, probabily there is no need of any ex


7/11

AGARWAL AND BURRUS: FAST CONVOLUTION APPLICATIONS TO DIGITAL FILTERING 93

hardware to represent

2b. If

the need exists, an extra bit

could

be

used to represent

2b

at th e expense of a

more

complicated hardware. The following discussion is based

on

the b-bit representation of the integers.

Now, we discuss how various basic arithmetical opera-

tions can be performed modulo , .

A .

Negation

Let

h- 1

A =

ai2i,ai

= 0 or 1.

i O

Then

b-1

- A = - C a22

d = O

h-1

= di2i (26 1)

i=O

=

di2i

(26 1 ) + 2b+

1

mod F ,

h- 1

i =O

= di2; + 2 mod F,.

a- 1

i-0

Thus to negate a number, we have to complement each

bit and add 2 to the result. For example,

( 1011)

(mod 17):

4

= 0100; 4 =

[+11:)]

= 1101 = 13;

13

+ 4 =

17.

13. Addi t i on

When we add wo b-bit integers, we obtaina b-bit

integer and possibly a carry bit. The carry bit represents

2b = -

1

mod F,.

To

implement arithmeticmodulo 2b

+

1,

we

simply ubtract hecarrybit.Thus hehardwarc

should be of the carry subtract type. For example,

1010

(mod 17) : 10

+ 9

= 17

= 2(mod

17) +lo01

10011

1

0010 = 2

C . Subtraction

Subtraction is implemented asn addition byirst negat-

ing the subtrahend and then adding them. Addition must

be done according toB.

L . General Multiplication

When we multiply two 6-bit integers, we get a 2b-bit

product; Let

Ct

be theb-bit low-order part of it and C Hbe

the b-bit high-order part of it., then

A X B

=

C L

+

CW2b= C L C;i(mod F , ) .

Thus, all we have odo is subtract he higherorder

register rom the

lower

order egister. Thesubtraction

needs to be done according to C. For example,

0111 0101

(mod 17):13 X 9 = 117 = 15(mod17); 117

=

L rr

L ' L

CL =

0101

( -cl{) = 1010

1111 = 15.

E. Mult ip l icat ion

by

a

Power

of

2

If CY is taken as

2 or

a power of 2,

RT's

are obtained and

the only multiplications involved in computing them are

those by some powers of

2.

These multiplications are par-

ticularlysimple to mplement n arithmetic modulo

F,.

Suppose we need to multiply the content's of a register

by 2k, 0

<

k < 6, all we need to do is left-shift the con-

tents of the register by lc bits and subtract the c overflow

bits. A convenient way to do this is to append the data

register on the left with a register of zeros, left-shift the

double register by I positions, dropping the leading zeros,

and then subtract the igher order register from the lower

order register, as in

D. If

I is outside the range 0 < k <

b ,

we make use of the fact tha t

2b = 1

mod F,. Computa-

tion of the inverse transform required multiplications by

negative powers of 2 which can be converted to positive

powers by thefollowing relationship,

2-k

= b--lc mod F,.

An

alternate method to multiply

y

2-k

is

to

load

the data

in the higherorder egister of adouble egister, filling

the lower order register with zeros, t,hen right-shift the

double register by

Ic

positions and subtract the low-order

register from he high-order register mod P,. For fast

implementation of the bit shift operation, shift amount

k

should be expressed in a binary form and a t a clock pulse

shifting should be done by a power of

2.

For example,

(mod 17)

:

11 X Z3 = 88

=

3 mod 17

11 = 0000 1011.

0101 1000

Shift left

3

positions

__

c r r

C L

CI,

= 1000

( - C J r ) = +1100

1

0100

- L l

0011 = 3

11 X

2-3

= 11

X (-24-3) = -22 = 12 mod 17

11 = 1011 0000.


8/11

94 IEEE TRANSAI

ZTlONS ON ACOUSTICS, SPEECH, AND S I G N A L PHOCESSINU,APRIL

19

0001 0110

Shift right

3

positions

'H CL

1100 = 12.

For

implementation

of

the fast RT, nlike the

FFT,

we

do notneed to store the powers of

a. For

serial arithmetic,

we could have

a

register which stores the shift amount

k ,

and as we go along th e fast.

R T

flow chart, we continually

update the shift amount.All this is particularly simple to

do in binary arithmetic. If OL is taken as

f i ,

only the odd

powers of

a

require

a

two-bit representation. In th e fast

FNT algorit>hnz,onlyonestage equiresmultiplications

by odd powers of a . Multiplication by an odd power of

fi

would probably be best done by first multiplying by

the just lower even power of f i , which is

a

power of 2,

then multiplying by

f i ,

which is a two-bit number. The

resulting increase in computation effort is very small.

VII.

CObilPARISON W I T H

THE

FFT

As noted in the previous section, computing the K T is

a very simple operation on a binary machine.

Now

lei us

compare 'hecomplexity of variousbasicoperations in-

volved incomputing the RT vis-a-vis the FFT.

If

the

two sequences z ( n ) and

h ( n )

have

61

and

6 2

bit repre-

sentations, espectively, andare of length

N ,

then he

output y ( n )would need no more han

a

(01

+

02 + ogz A )

bit representation. T o obtain the correct result

b

2

bl +

0 2 +

og,

X.

In Section

I11

we have given a.better bound

on the output. Roughly peaking,

w e

need twice the num-

ber of bits to carry out the convolution using th e R T a s

compared to t'he fixcd-point

FFT

implementation of the

convolution. But in the DFT, every data point is treated

as a complex number and therefore requires two words,

one for the real part andone for he imaginary part. Thus,

in effect, the hardware requirement for two ransforms

is about the same. Although for real dat>a

it

is possible

to make use of the symmetry properties of the D I T S

theyrequireextracomputationandfor shepurposc of

comparison it will be ignored, even t'hough we have taken

this nto account forour I B M 370/155 implementation

to be discussed in he next scct'ion. Therefore

we

shall

assume, in the FE'T implementat ion, each data point is

represented by a

b / 2

bit real part and

a

0/2

bit imaginary

part.

One b / 2 bit complex addition is equivalent to two 6 / 2

bit real additions, which are comparable to a 0-bit addi-

tion modulo

F, .

Thus, the complexit'y of addi tionlsubtrac-

tion

is

the same in both the transforms. Similarly,

it

can

be shown that a b / 2 bit complex multiplication is com-

parable to a &bit multiplication modulo

F, .

Computation

of the RT requires multiplications by powers of 2, which

implemented as bit shifts and subtractions become much

simpler operations compared to complex multiplications

required in the FFT implcn~entat~ion.

To compute lengthN fast KT , 1% log, N additions/sub

tract'ions,and ( N / 2 ) og, N / 2 multiplications by som

powers of 2 are requiredwhich are implernentcd as bi

and subtractions. To compute the convolut.ion using th

FFT, -most of the time is talcen in computing the comple

multiplicat,ions equired

t o

compute ,he ransfornls.

comparison with

thc

RT

reveals that these complcx

nu

tiplications are replaced

by

bit

shifts

and

subtractio

which are much faster operatio~~s. Thi s results n co

siderablecomputational avings n the implementatio

of convolution. This fact has been verified on a gcnera

purposemachine

(IBll.2 370/155) The

cornputation r

quired to multiply the two transforms is about, the sam

for both the implementations.

To

convolve long sequenc

using the two-dimensional P N T or BT, the compuhtion

effort ncreases

by, at

t'hc nlost,

a

factor of 2. Still, th

RT

implementation of convolution

is

much faster as

c o m

pared to theFE'T implementation.

RT's have some additional advantages ovcr the

k k T

First, he

FYP

implemc ntation requiresstoringall ,he

powers of

W

requiringa significant, amount' of storag

which maybean mportant actor for a small min

computer or

a

special purpose hardware implcrnentatio

Second, fixed-point, FF'Y implementation introduces a si

nificant amount of the roundoff noise at the output,

i-

bits depending on the data

[a] This

degrades tha signa

to-noise rat'ioduring the filteringoperations. The PN

or RT implementation serrorfree, the onlysoum: o

error is input A / D quanti~at~ion.

VIII.

I~ IPS ,EMl i :NTATrC)NO N THE IB l l 370/15

The word length of t,hc

IBM 360 -370

series is

32

bi

and, therefore, is well suited for t,he rnplementation

convolution using the

P NT

or

1i.T

nloclulo

P 5 = 2 3 2

+

I t has two's compl(:mcnt represcntation of rlegat.ive int

gers, i.c., -- x is epresented as x.

W o

want. rwgativ

integers

to

be reprcscntedas the conlplernent modu

232+ 1,

i.e.,

- -x is t o

be rcprcscnted by

F 2 -

1

-- x,

an

to accomplish this

1

is

a.dded

whenever a nega.tivo int>e

is encountered in the data. As noted before,

on

this m

chine,

we

cannot represorrt

--- l .

If a

-- l

is cmmxntcr

in the data,

it

is roundcd t:ithw

to 0

or to

-2 .

This i

equivalent to introducing some initial quantizat,ion nois

If

the data are uncorrclatt:d, t'hc probability th at w

appear after an arithmetical opcration during cornp1~

tion of the transform

is

roughly

2-32

=

pcropcmtion

This crror int,roduced during computation

s

ratllcr

wriou

andprobably ho corrcsponding output b1oc.k

will

b

meaningless.Assuming

N = 64, thc

probability that

particular block is erroneous is roughly - k'

most filtering operat'ions, his may

be permissible.

Logical add ( A lA ) andsubtract nstructions

SLI

are used t o add and subt'ract

w o

intcgcrs modulo

Fj.

If

carry is detected aftor add,

1

s subtracted frorn the resul

and if 110 carry is dctectlctl aflcr subtract,

I

is added t

the result. Multiplication by 'Lk is done using tho logic

left shift operation

(SLl)I.,)

and m~dltiplicat,ion

y

2 ';


9/11

AGARWAL AND BURRUS: FAST CONVOLUTION APPLICATIONS TO DIGITALILTERING 95

TABLE 111

CYCLICCONVOLUTIONIMINGSOR LENGTHN

REAL

SEQUENCES

N

ms

ms

FFT FNT

or IZT

32 16

64

3 . 3

128

31

7 . 4

60

256

16.68

123

256

40

Ob

123

512

8o.G

243

1024

168.00

530

2048

340.Oc

1260 720.00

a

Using = d.

b

Using 2

by

128 convolution.

Using two-dimensional RT.

doneusing logical rightshiftoperation (SRDL)

, 0

<

k < 32, as explained in Section VI. Similarly, multiplica-

tion of two integers modulo F 5 can also be done.

Two assembler ,subprograms were written to compute

the fast R T and the inverse for any sequence length from

to 26, taking

a

as

a

power of

2 .

Two more subprograms

were written to compute the fast FNT and it s nverse for

length 128 sequences, using a: =

f i ,

given by (23). Out

of the 7 stages of the fast algorithm, 6nly one stage re-

quiresmultiplication byoda powers of ~, which are

implenxnted as suggest'ed in Section VI.

To cyclically convolve sequences longer t han 128, tyo-

dimensionalconvolution schemeswereused [l2].One

suchscheme was 2 by 128 convolution or length 256

sequences discussed in Section X of [li]. Another pro-

gram was wri tten o convolve ong one-dimension61

se-

quences using two-dimensional RT's, as discussed in Sec-

tion

I11

of

[12].

Timings for various implementations of

convolut'ion and heir comparisonwith the FFT imple-

mentationsarc shown inTable 111. To compute hese

timings it is assumed tha t transforms of the

h

sequences

have been precalculated. Thus, timings arc for computing

the transforms, multiplications of the transforms, and

their inverse transforms.

For

the FE'T implementation, a

very efficient algorithm [17] was used, which employed

both radix 4 and radix 2 steps and took into account the

fact, that the dat'a are real.

For

sequences up to length

256, improvements of t,he order of 3 to 5 were obtained

over the FFT implementations.Since thehardware of

IBM 360/370 series isnot designed todoarithmetic

modulo Permat numbers, extra branch instructions

were

necessaryaft,er add or subtract opcrat'ions to ake nto

account the carry bit. If the hardware is designed to do

arithmetic modulo Fermatnumbers, hese nstructions

could be avoided, resulting n a significant reduction in

the computationime. Copies of t'hessemblerub-

programs arc available from the authors .

IX. GENERALIZATIONSAND OTHER

APPLICATIONS

In this paperwe have discussed convolution in the ings

of integersmodulo of Fermat numbers. These numbers

seem to be the best choice for implementation on digit,al

computers. Nevertheless, any integer F , for which

0

( F ) >

1

can be used, was discussed in Section IV. Rader

[ lo]

proposed the use of Mersenne numbers

, M ,

(2"

-

1,

p a

prime). MNT's have Q

=

and N =

2 p ,

and therefore

do not have an FFT-type fast computational algorithm.

In many situations the quantization and overflow con-

siderations may require arithmetic to be done using 10

to 20-bit arithmetic.For hissituation,working n the

integer ring modulo F 4

2 1 6

+

1)

would be insufficient and

at the same tirne, computing the

FiYT

modulo F5(232+

1 )

would require excessive number of bits. In this situation;

we may perform convoiution modulo zz4

+

1. Also, many

computershave 24-bit word ength.Transformssimilar

to FNT's exist for thissituation.We could take F =

224f 1 and hen

=

8 gives N

=

16, and

(Y

= S1/Z

=

27(212

1)

gives N

=

32 as the maximum length of the

one-dimensional transform.

In generalone could take F = 2 sZ t +

1, s

is an odd

integer. In th at case,

a

=

2"

would give

N

=

2t+', and

2t+2.

n many situations it may be possible to have trans-

forms

o f

greater ength han 2t+2 . For example, taking

F

=

240+ 1, t'he maxinium possible transform length is

256,' and, aking F = 280+ 1; the maximum possible

transfotm length is 1024, but the corresponding

a's

may

not besimple.

As discussed earlier, the FNT implementation requires

roughly twice the number of bits for the arithmetic. Most

minicomputers have short word lengths and in that situ-

ation it may not be possible to efficiently use F'KT's, but

there is a solution to this problem. We can split the dat'a

bits int'o two halves and then convolve them.

= p)1/2

= 2 [ ( s - l ) / ~ + s 2 t - ~ 1 ( 2 ~ 2 t - - l 1) would give iv=

4 )

=

zz(n)

+

z1(n)2k, I x z ( n ) I <

h ( n ) =

h2(n) +

h n ) 2 k ,

I

hz(n)

<

2k ( 2 4 )

y = 2*h

=L

Zl*h1.22k + (21*h2+ zz*h1)2k

+

22 h2.

( 2 5 )

NOW,

ince

21,

h ~ ,2 ,and hz have roughly half the number

of bits,

it

should be possible to convolve them using ap-

proximately half the number of bit's. If necessary, a more

precise analysis of the abovesituation could be easily

performed. I n (25), the last erm, n comparison to the

first term, is very small and can be neglected. We need to

take two transforms for z and two transforms for h, the

pummation hownwithin the parent,hescscan be per-

formed in the transform domain. Finally,we need to take

two inverse ransforms, one for zl*hl and t,he other for

(z1*hz+ 52*hl .

FNT's can also be used forconvolutions equired in

block recursive realizations of recursive digital filters[lfi].

For these realizations, convolutions form the main com-

putational 'ask. Sinceconvolution with heIWT'scan

be performed with both efficiency and with perfect accu-

racy, it should be very attract,ivefor this applicat'ion. For

very high-order recursive filters this may reduce roundoff

noise problems associated with these filters. Other appli-


10/11

96 IEEE

TRANSACTIONS

ON ACOUSTICS, SPEECH, ANI)

SIGNAL PROCESSING,

APRIL

1

APPENDIX A

1’ROPERTIES

O F

THE

TRANSFORMSAVING

G.

Th’eore’n

THE CYCLICONVOLUTIOSROPERTY If

Belome summarize the definition and some important

T {

(n) = F ( W

propertics of the transforms having the cyclic convolution

property. It is assumed that allarithmeticaloperations

are performed in a ring. Most of these properties are simi-

H .

Fast ~ ~ ~ ~ ~ ~ ~ t ~ t i ~ ~ ~ l , 4 l

lar to the

DFT

properties [l].

A , Definition

Transform :

Ti f ( n +

m

= F ( K ) o c - m K . (A

If N can be factored as

N

= r 1 r 2 r m

then a fast computational algorithm similar to the

F

can be used to compute the transform which requires

the order of N ( r l

+ r2

+ -

- + r,)

arithmetical ope

tions.

N -

1

F ( k ) = f (n )a”k

k

= O , l , - . . , N 1.

7L=O

CYN

= 1

Inverse Transform:

N- 1 I . Convolut.ion Property

f ( n )

=

N-’ F ( k ) -nk n =

0,1, -,lv

- 1. (A2)

Transform of cyclic convolut,ion of two sequences

k 4 the product

of

their t’ransforms,.e., if

C . 0rth.ogonality

(A

The inner product of two basis functions are given by ? J . Parseval’s Theorem

[see (12 j and13)]

X ( k )

= T ( z ( n ) ]

H ( k ) = T { h ( n ) j .

N if

n a

= n mod N

0 otherwise

This implies t,he orthogonality of the basis functions.

B . Periodicity

(A41

Then

N- 1 N-

N s ( n ) h ( n )=

X ( k ) N ( - - 1 2 )

(A

n=O k=O

and

F ( k ) can also be periodically extended, similar to $he

x

1

N- 1

N

s ( n ) h ( - n ) = X ( k ) H ( k ) . ( h

extension of f ( n )

a=O

k=O

f ( n + N ) = f ( n ) For,heart’icular case of s (n ) = h n )

F ( K

+ N ) =

F ( K ) .

(AS)

A‘- 1 N - 1

N

2 2 ( n ) =

X k ) X

16

( M

E. Symmetry Proper ty

7Z=O k=O

If

the signal is symmetric, i.e.,

and

N-

1

N- 1

f n) f ( - n ) =

f ( N

-

n) J V

z n ) z

n)

=

X ( k ) 2 .

(-4

the transformed sequence is also symmetric, Le.,

n=o

k=O

Note that in his case, the conventional Parswal’s theor

F ( K ) = F ( - K )

=

F ( N

- K ) .

(AB)

N-1

N- 1

If the signal is antisymmetric, i.e.,

f (n ) = -f(-n.)

=

-f(N

n ) ,

does not hold,ecause in aingmagnitude is undefined


11/11

K . Stretch

Property

Thestretchpropertyexists if the ransformfor he

longer length sequence exists in that ing.

L.

Sampl ing Proper ty

The sampling propertyexists.

APPENDIX B

PROOF O F THEOREM

I

A8 shown before, the existence of such a transform de-

pends on the existence of a n a of order N with respect to

each prime power factor moduli, pari. According to Euler’s

Theorem

[14],

N , order of

a,

must divide

cp

( p i 7 ; ) ,where

(o(piT i ) s the Euler’s (o function given by

c p ( p p )

=

p p - y p i 1).

(B1)

Thus, N j ‘p (p i + i ) ) , nd since p i isnota actor of I\-

N

I ( p i , i = 1,2,.-.,1or

N

I

gcd(p1 ,p2

, .* . ,pz

1 )

or

N I

O ( F ) . (B2)

This proves the ecessary part of the theorem. To prove

sufficiency we have to show the existence of an

a

of order

N if

hr

[

0

( F ) Note that (B2) rules out F being an even

integerbecause n th at case 0

8’)

= 1. If N I 0 ( F )

N I (pi

-

1) or

N I

( o ( p i ) , herefore, one can find integers

pi of order N modulo piri [14], i = 1,2, *.,1. Then, by

the Chinese Remainder Theorem, one canfind an

a

of

order N modulo F = p p . - p f t , such tha t

a

=

pi(mod p i r i ) i =

1,2,. . Z. (B3)

This is our desired a of order

N .

Since

N

and p i are rela-

tively prime, N-I exists in this ring. This completes the

proof of the theorem.

APPENDIX C

T o prove that a p of ( 2 3 ) is of order 2t+2mod F t , we

have to prove that it is of order 2t+2with respect to each

prime power factor

of F , 20).

Let piri be a prime power

factor of F t , then

ap2t+2 =

22t+l =

1

mod

F t

=

1mod

p p .

AGARWAL

AND

BURRUS: FAST CONVOLUTION APPLICATIONS TO

DIGITAL

FILTERING

97

Let Ni e the order of apwith respect to p p , then by

EuIer’s theorem,

N i

I 2‘” (C2>

therefore N ; must be

a

power of 2.

Also,

aP

=

22t = -1mod

F ,

t+ l

=

-1

mod

p p .

((333

From (C l) and (C3)

, it

is obvious th at

completing the proof.

Ni= 2t+?

ACKNOWLEDGMENT

Theauthors wish to hank C. M. Rader of Lincoln

Laboratqries,Cambridge,Mass., or

his

comments and

discussions.

REFERENCES

B. Gold and C.

M.

Rader, Digital Processing of Signals. New

York: McGraw-Hill,

1969.

A.

V.

Oppenheim and C. Weinstein, “Effects of finite register

length in digital filtering and the fast Fourier transform,’, Proc.

D.

E.

Knuth, The Art

of

Computer Programming, vol, 2, Semi-

numerical Algorithms. Reading,

Mass.:

Addison-Wesley, 1969,

p.

555

and pp. 259-262.

J.

M.

Pollard, “The fast Fourier transform in

a

finite field,”

Math. Compul., vol.

25,

pp.

365-374,

Apr.

1971:

I.

J.

Good, “The relation between two fast Fourier transforms,”

IEE E Trans. Comput., vol.C-20, pp. 310-317, Mar. 1971.

A. Schonhage and V. Strassen, “Fast multiplication of large

numbers,” (in German) Comput., vol. 7, pp. 281-292, 1971.

D.

E. Knuth, ‘lThe art

of

computing programming-errata et

addenda,” Comput. Sei. Dep., Stanford Vniversity, Stanford,

Calif., Rep. STAN-CS-71-194, pp. 21-26, Jan. 1971.

forms,”

J . Comput.

Syst. Sci., vol. 5 ,

pp. 524-547, 1971.

P. J.

Nicholson, "Algebraic

theory

of

finite Fourier trans-

C. M. Rader,’ “The number theoretic DF T and exact discrete

convolution,” IEE E Arden House Workshop on Digital Signal

Processing, ‘Harriman, N. Y.,. Jan. 11, 1972.

R. C. Agarwal and

C.

S. Burrus, “Fast digital convolution

Trans. Comput., vol. C-21, pp. 1269-1273, Dee. 1972.

using Fermat transforms,” in outhwest IE EE Conf. Rec.,

Houston, Tex., Apr.

1973,

pp.

538-543.

dimensional techniques,” IEEE Trans. Awust., Speech,and

-, ‘‘Fakt one-dimensional digital convolution by multi-

Signal processing, vol.

ASSP-22,

pp.

1-10,

Feb.

1974.

L. E. Dickson, History of the Theory of Numbers, vol. I.

Washington, D. C.: Carnegie Institute,

1919,

p.

376.

Hill, 1948.

0. Ore, Number Theory and

I t s

History. New York: McGraw-

Proc. Conf. Applications of Walsh Functions; Washington,

D.

A. Pitassi, “Fast convolution usingWalsh functions,” in

D.

C., April 1971, pp. 130-133.

C. S Burrus, “Block realization of digital filters,” IE EE Trans.

Audio Electroacoust., vol. AU-20, pp. 230-235, Oct. 1972.

R.

C. Singleton, “An algorithm for computing the mixed

radix

f a s t Fourier transform,” IEEE Trans. Audio Eledroawust.,

vol. AU-17. DD.9.3-103. June 1969.

IEEE, V O ~ .

60,

pp.

957-976,

Aug.

1972.

“Discrete convolution via Mersenne transforms,” IEEE

1974 Fast Convolution Using Fermat Number Transforms With Applications to Digital Filtering

Documents

Transcript of 1974 Fast Convolution Using Fermat Number Transforms With Applications to Digital Filtering