Fast Fourier Transform - MIT OpenCourseWare

Fast Fourier Transform: Practical aspects and Basic Architectures
Lecture 9

6.973 Communication System Design – Spring 2006
Massachusetts Institute of Technology
Vladimir Stojanović

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].


Page 2: Fast Fourier Transform - MIT OpenCourseWare

Multiplication complexity per output point

CTFFT and SRFFT

[Figure: multiplications per output point, M/N, versus n = log2 N (n = 3 to 10, vertical axis 0 to 10), with curves for radix 2, radix 4, split-radix and the lower bound; curve labels Mr and Mc. Figure by MIT OpenCourseWare.]

Page 3: Fast Fourier Transform - MIT OpenCourseWare

Multiplies and adds

Real multiplies

N (2^n)   Radix 2   Radix 4   SRFFT     N (PFA/Winograd)   PFA      Winograd
16        24        20        20        30                 100      68
32        88        -         68        60                 200      136
64        264       208       196       120                460      276
128       712       -         516       240                1100     632
256       1800      1392      1284      504                2524     1572
512       4360      -         3076      1008               5804     3548
1024      10248     7856      7172      2520               17660    9492
2048      23560     -         16388

Real adds

N (2^n)   Radix 2   Radix 4   SRFFT     N (PFA/Winograd)   PFA      Winograd
16        152       148       148       30                 384      384
32        408       -         388       60                 888      888
64        1032      976       964       120                2076     2076
128       2504      -         2308      240                4812     5016
256       5896      5488      5380      504                13388    14540
512       13566     -         12292     1008               29548    34668
1024      30728     28336     27652     2520               84076    99628
2048      68616     -         61444

Figure by MIT OpenCourseWare.

Page 4: Fast Fourier Transform - MIT OpenCourseWare

Structural considerations

- How to compare different FFT algorithms?
- Many metrics to choose from
  - The ease of obtaining the inverse FFT
  - In-place computation
  - Regularity
    - Computation
    - Interconnect
  - Parallelism and pipelining
  - Quantization noise


Page 5: Fast Fourier Transform - MIT OpenCourseWare

Inverse FFT

- FFTs often used for computing FIR filtering
  - Fast convolution (FFT + pointwise multiply + IFFT)
- In some applications (like 802.11a)
  - Can reuse FFT block to do the IFFT (half-duplex scheme)
- Simple trick [Duhamel88]
  - Swap the real and imaginary inputs and outputs
  - If FFT(xR,xI,N) computes the FFT of sequence xR(k)+jxI(k)
  - Then FFT(xI,xR,N), with its real and imaginary outputs also exchanged, computes the IFFT of xR(k)+jxI(k) (up to the 1/N scale factor)
  - Exchange the real and imaginary parts
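The swap trick can be checked numerically. The sketch below is only an illustration: a direct O(N^2) DFT stands in for the FFT block, and the helper names (`dft`, `swap`, `idft_via_swap`) are hypothetical, not from the lecture. It assumes the usual forward kernel exp(-j2πkm/N) and applies the 1/N scale explicitly.

```python
import cmath

def dft(x):
    """Direct O(N^2) forward DFT, standing in for the FFT block."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def swap(z):
    """Exchange the real and imaginary parts of one complex sample."""
    return complex(z.imag, z.real)

def idft_via_swap(X):
    """Inverse DFT using only the forward transform plus re/im swaps."""
    n = len(X)
    y = dft([swap(v) for v in X])      # swap inputs, run the forward block
    return [swap(v) / n for v in y]    # swap outputs, apply the 1/N scale

x = [1 + 2j, -1j, 3 + 0j, 0.5 - 0.5j]
roundtrip = idft_via_swap(dft(x))
max_err = max(abs(a - b) for a, b in zip(x, roundtrip))
```

Here `max_err` should be at the level of floating-point rounding, since swapping the parts of z gives j·conj(z) and conjugation turns the forward kernel into the inverse one.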


Page 6: Fast Fourier Transform - MIT OpenCourseWare

In-place computation

- Most algorithms allow in-place computation
  - Cooley-Tukey, SRFFT, PFA
  - No auxiliary storage of size dependent on N is needed
- WFTA (Winograd Fourier Transform Algorithm) does not allow in-place computation
  - A drawback for large sequences
- Cooley-Tukey and SRFFT are most compatible with longer size FFTs
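As an illustration of what in-place means for a Cooley-Tukey FFT, here is a minimal radix-2 decimation-in-time sketch that overwrites its input list and uses no scratch array of size N. The function names are hypothetical, N is assumed to be a power of two, and the direct `dft` is included only as a reference for checking.

```python
import cmath

def bit_reverse_permute(a):
    """Reorder a in place into bit-reversed index order."""
    n = len(a)
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:          # propagate the carry of a reversed increment
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

def fft_inplace(a):
    """Radix-2 DIT FFT; each butterfly overwrites its own two inputs."""
    n = len(a)
    bit_reverse_permute(a)
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
                w *= w_step
        size *= 2
    return a

def dft(x):
    """Direct O(N^2) DFT used only as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

data = [complex(i, (i * i) % 5) for i in range(8)]
work = list(data)
fft_inplace(work)
max_err = max(abs(p - q) for p, q in zip(work, dft(data)))
```

The only extra storage is a handful of scalars (`u`, `t`, `w`), independent of N, which is the property the slide refers to.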


Page 7: Fast Fourier Transform - MIT OpenCourseWare

Regularity, parallelism

- Regularity
  - Cooley-Tukey FFT very regular
    - Repeat butterflies of same type
      - Sums and twiddle multiplies
  - SRFFT slightly more involved
    - Different butterfly types in parallel
    - e.g. radix-2 and radix-4 used in parallel on even/odd samples
  - PFA even more involved
    - Repetitive use of more complicated modules (like cyclic convolution, for prime length DFTs)
  - WFTA most involved
    - Repetition of parts of the cyclic conv. modules from PFA
- Parallelization
  - Fairly easy for CTFFT and SRFFT
    - Small modules applied on sets of data that are separable and contiguous
  - More difficult for PFA
    - Data required for each module not in contiguous locations


Page 8: Fast Fourier Transform - MIT OpenCourseWare


Quantization noise

- Roundoff noise generated by finite precision of operations inside FFT (adds, multiplies)
- CTFFT (lengths $2^n$)
  - Four error sources per butterfly (variance $2^{-2B}/12$ each)
    - Total variance per butterfly $2^{-2B}/3$
  - Each output node receives signals from a total of N-1 butterflies in the flow graph (N/2 from the first stage, N/4 in the second, …)
    - Total variance for each output ~ $(N/3) \cdot 2^{-2B}$
  - Assuming input power $1/(3N^2)$ ($|x(n)| < 1/N$ to avoid overflow)
    - Output power is $1/(3N)$
    - Error-to-signal ratio is then $N^2 \cdot 2^{-2B}$ (needs 1 additional bit per stage to maintain SER)
  - Since the maximum magnitude increases by less than 2x from stage to stage, we can prevent overflow by requiring that $|x(n)| < 1$ and scaling by ½ from stage to stage
    - The output will be 1/N of the previous case, but the input magnitude can be N times larger, improving the SER
    - Error-to-signal ratio is then $4N \cdot 2^{-2B}$ (needs 1/2 additional bit per stage to maintain SER)
  - Radix-4 and SRFFT generate less roundoff noise than radix-2
- WFTA
  - Fewer multiplications (hence fewer noise sources)
  - More difficult to include proper rescaling in the algorithm
    - Error-to-signal ratio is higher than in CTFFT or SRFFT
    - Two more bits are necessary to represent data in WFTA for same error order as CTFFT
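The unscaled CTFFT figures can be chained together as a short worked derivation. This is a sketch under the slide's assumptions (uncorrelated roundoff sources, B-bit words, white input bounded by 1/N); the symbols $\sigma_b^2$, $\sigma_{out}^2$ and $P_{out}$ are introduced here for illustration only.

```latex
\begin{align*}
\sigma_b^2 &= 4 \cdot \frac{2^{-2B}}{12} = \frac{2^{-2B}}{3}
  && \text{(four sources per butterfly)} \\
\sigma_{out}^2 &= (N-1)\,\sigma_b^2 \approx \frac{N}{3}\, 2^{-2B}
  && \text{($N-1$ butterflies feed each output)} \\
P_{out} &= N \cdot \frac{1}{3N^2} = \frac{1}{3N}
  && \text{(white input of power $1/(3N^2)$)} \\
\frac{\sigma_{out}^2}{P_{out}} &= \frac{(N/3)\, 2^{-2B}}{1/(3N)} = N^2\, 2^{-2B}
\end{align*}
```

Each added stage doubles N, which multiplies $N^2 2^{-2B}$ by 4; one extra bit in B divides $2^{-2B}$ by 4, hence one additional bit per stage. With per-stage scaling the ratio $4N \cdot 2^{-2B}$ only doubles per stage, hence half a bit.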


Page 9: Fast Fourier Transform - MIT OpenCourseWare

Particular cases

- DFT algorithms for real data sequence $x_k$
  - $X_k$ has Hermitian symmetry ($X_{N-k} = X_k^*$)
  - $X_0$ is real, and when N is even, $X_{N/2}$ is real as well
  - N input values map to
    - 2 real and N/2-1 complex conjugate values when N even
    - 1 real and (N-1)/2 complex conjugate values when N odd
- Can exploit the redundancy
  - Reduce complexity and storage by a factor of 2
  - If we take the real DFT of $x_R$ and $x_I$ separately
    - 2N additions are sufficient to obtain the complex DFT
  - Goal: obtain the real DFT with half the multiplies and half the adds
- Example: DIF SRFFT
  - $X_{2k}$ requires a half-length DFT on real data
  - Then, because of Hermitian symmetry, $X_{4k+1} = X^*_{4(N/4-k-1)+3}$
  - Only need to compute one DFT of size N/4 (not two)
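Two of the claims above are easy to check numerically: the Hermitian symmetry of a real-input DFT, and the 2N real additions that combine the real DFTs of xR and xI into the complex DFT. The sketch below uses a direct DFT and made-up test data; all names are hypothetical.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT, adequate for a small numerical check."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

x_r = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5]
x_i = [0.5, -1.0, 2.0, 0.0, 1.0, -0.5, 0.25, 4.0]
n = len(x_r)

A = dft(x_r)   # DFT of the real part
B = dft(x_i)   # DFT of the imaginary part

# Hermitian symmetry of a real-input DFT: A[N-k] == conj(A[k])
hermitian_err = max(abs(A[(n - k) % n] - A[k].conjugate()) for k in range(n))

# X = A + jB costs two real additions per output point, 2N in total:
#   Re X = Re A - Im B,   Im X = Im A + Re B
combined = [complex(a.real - b.imag, a.imag + b.real) for a, b in zip(A, B)]
direct = dft([complex(r, i) for r, i in zip(x_r, x_i)])
combine_err = max(abs(p - q) for p, q in zip(combined, direct))
```

Both error figures should sit at rounding level. A[0] and A[N/2] also come out real, as the slide states for even N.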


Page 10: Fast Fourier Transform - MIT OpenCourseWare


DFT pruning

- In practice, may only need to compute a few tones
  - Or only a few inputs are different from zero
  - Typical cases: spectral analysis, interpolation, fast conv
  - Computing a full FFT can be wasteful
- Goertzel algorithm
  - Can be obtained by simply pruning the FFT flow graph
  - Alternately, looks just like a recursive 1-tap filter for each tone

[Figure: first-order recursive filter with input x(n), a $z^{-1}$ delay in the feedback path multiplied by $W_N^{-k}$, and output X(k).]
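The recursive one-tap view can be sketched as follows. This is the first-order complex form only, assuming $W_N = e^{-j2\pi/N}$ and a zero initial state; the function name `goertzel_first_order` is hypothetical, and practical Goertzel implementations usually use a second-order real-coefficient variant instead.

```python
import cmath

def goertzel_first_order(x, k):
    """Compute the single DFT tone X(k) with y[n] = W_N^{-k} * y[n-1] + x[n]."""
    n = len(x)
    w_inv = cmath.exp(2j * cmath.pi * k / n)   # W_N^{-k}
    y = 0j
    for sample in x:
        y = w_inv * y + sample                 # one multiply-add per input
    # One more pass through the filter with zero input lines up the phase:
    # W^{-k} * y[N-1] = sum_m x[m] * W^{k*m}, because W^{-k*N} = 1.
    return w_inv * y

x = [0.5, -1.0, 2.0, 0.25, 1.0, -0.75, 3.0, 0.0]
k = 3
direct = sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / len(x))
             for m in range(len(x)))
tone_err = abs(goertzel_first_order(x, k) - direct)
```

Each tone costs N multiply-adds, so this only pays off when the number of tones needed is small compared with log N.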


Page 11: Fast Fourier Transform - MIT OpenCourseWare

Related transforms

- Mostly focused on efficient matrix-vector product involving Fourier matrix
  - No assumption made on the input/output vector
- Some assumptions on these lead to related transforms
  - Discrete Hartley Transform (DHT)
  - Discrete Cosine (and Sine) Transform (DCT, DST)


Page 12: Fast Fourier Transform - MIT OpenCourseWare


Related transforms: DHT

- Proposed as an alternative to DFT
- Initial (false) claims of improved arithmetic complexity
  - Real-valued FFT complexity is equivalent
- Self-inverse
  - Provided that $X_0$ is further weighted by 1/sqrt(2)
  - Inverse real DFT on Hermitian data
    - Same complexity as the real DFT so no significant gain from self-inverse property of DHT


Page 13: Fast Fourier Transform - MIT OpenCourseWare

Related transforms: DCT

- Lots of applications in image and video processing
- Scale factor of 1/sqrt(2) for $X_0$ left out
  - Formula above appears as a sub-problem in length-4N real DFT
  - Multiplicative complexity can be related to real DFT
- Practical algorithms depend on the transform length
  - N odd: Permutations and sign changes map to real DFT
  - N even: Map into same length real DFT + N/2 rotations


Page 14: Fast Fourier Transform - MIT OpenCourseWare

Relationship with FFT

- DHT, DCT, DST and related transforms can all be mapped to DFT
- All transforms use split-radix algorithms
  - For minimum number of operations

[Figure: graph relating the length-$2^n$ transforms CDFT (complex DFT), RDFT (real DFT), RSDFT (real symmetric DFT), ODFT (odd DFT), DHT, DCT, PT and the 2-D complex DFT ($2^n \times 2^n$). Edges are numbered 1 to 6 with directions a and b, and the legend gives, for each edge, the smaller transforms and the extra multiplications and additions needed to map one transform onto another. Figure by MIT OpenCourseWare.]

Page 15: Fast Fourier Transform - MIT OpenCourseWare

Implementation issues

- General purpose computers
- Digital signal processors
- Vector/multi processors
- VLSI ASICs


Page 16: Fast Fourier Transform - MIT OpenCourseWare


Implementation on general purpose computers

- FFT algorithms built by repetitive use of basic building blocks
  - CTFFT and SRFFT butterflies are small and easily optimizable
  - PFA/WFTA blocks are larger
- More time is spent on load/store operations
  - Than in actual arithmetic (cache miss and memory access latency problem)
  - Locality is of utmost importance
  - This is the reason why PFA and WFTA do not meet the performance expected from their computation complexity!
  - PFA drawback partially compensated since only a few coefficients have to be stored
- Compilers can optimize the FFT code by loop-unrolling (lots of parallelism) and tailoring to cache size (aspect ratio)

Page 17: Fast Fourier Transform - MIT OpenCourseWare

Digital Signal Processors

- Built for multiply/accumulate based algorithms
  - Not matched by any of the FFT algorithms
    - Sums of products changed to fewer but less regular computations
- Today’s DSPs take into account some FFT requirements
  - Modulo counters (a power of 2 for CTFFT and SRFFT)
  - Bit-reversed addressing


Page 18: Fast Fourier Transform - MIT OpenCourseWare


Vector and multi-processors

- Must deal with two interconnected problems
  - The vector size of the data that can be processed at the maximal rate
    - Has to be full as often as possible
  - Loading of the data should be made from data available inside the cache memory to save time
- In multi-processors performance dependent on interconnection network
  - Since FFT deterministic, resource allocation can be solved off-line
  - Arithmetic units specialized for butterfly operations
  - Arrays with attached shuffle networks
  - Pipelines of arithmetic units with intermediate storage and reordering
- Mostly favor CTFFTs


Page 19: Fast Fourier Transform - MIT OpenCourseWare

ASICs

- Area and throughput are important
  - A: area, T: time between two successive DFT computations
  - Asymptotic lower bound for $AT^2$
    - Achieved by several micro-architectures
      - Shuffle-exchange networks
      - Square grids
    - Outperform the more traditional micro-architectures only for very large N
      - Cascade connection with variable delay
- Dedicated chips often based on traditional micro-architectures efficiently mapped to layout
  - Cost dominated by number of multiplies but also by cost of communication
    - Communication cost very hard to estimate
  - Dedicated arithmetic units
    - Butterfly unit
    - CORDIC unit
- Still, many heuristics and local tricks to reduce complexity and improve communication


Page 20: Fast Fourier Transform - MIT OpenCourseWare

Architectures

- 1, N, $N^2$ cell types: direct transform
- Cascade (pipelined) FFT
- FFT network
- Perfect-shuffle FFT
- CCC network FFT
- The Mesh FFT


Page 21: Fast Fourier Transform - MIT OpenCourseWare

The naive approach

- Compute all terms in the matrix-vector product
  - $N^2$ multiplications required
- Three degrees of parallelism
  - Calculate on one multiply-add cell
  - On N multiply-add cells
  - On $N^2$ multiply-add cells


Page 22: Fast Fourier Transform - MIT OpenCourseWare

1 multiply-add cell

- Performance $O(N^2 \log N)$
- For large FFTs storage of intermediate results is a problem
- N-long FFT requires
  - $(N/r)\log_r N$ radix-r butterfly operations
  - $2N\log_r N$ read or write RAM accesses
    - E.g. to do the 8K FFT in 1 ms, need to access internal RAM every 9 ns, using radix-4
- To speed up
  - Either use higher radix (to reduce the overall number of memory accesses at the price of increase in arithmetic complexity)
  - Or partition the memory into r banks accessed simultaneously (more complex addressing and higher area)
- Need a very high rate clock

[Figure: single-butterfly FFT processor block diagram with DATA IN, INPUT BUFFER, FFT RAM, BUTTERFLY, COEF ROM, CONTROL (with input N) and DATA OUT. Figure by MIT OpenCourseWare.]
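The 9 ns figure follows directly from the access-count formula. The sketch below applies $2N\log_r N$ literally (so the stage count for N = 8192 with radix 4 is the non-integer 6.5); the function names are hypothetical.

```python
import math

def butterfly_count(n_points, radix):
    """(N/r) * log_r(N) radix-r butterflies, as counted on the slide."""
    return n_points / radix * math.log(n_points, radix)

def ram_access_interval_ns(n_points, radix, fft_time_s):
    """Time available per RAM access when 2N*log_r(N) accesses share one port."""
    accesses = 2 * n_points * math.log(n_points, radix)
    return fft_time_s / accesses * 1e9

# 8K-point FFT in 1 ms using radix-4 butterflies
interval_r4 = ram_access_interval_ns(8192, 4, 1e-3)
# Same transform with radix 2: more stages, hence a tighter access budget
interval_r2 = ram_access_interval_ns(8192, 2, 1e-3)
```

With these inputs the radix-4 budget comes out a little above 9 ns per access and the radix-2 budget is half of that, which is the motivation for higher radices or banked memory.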

Page 23: Fast Fourier Transform - MIT OpenCourseWare

Cascade FFT

- Cascade of log N multiply-add cells
  - Nicely suited for decimation in frequency FFT
- E.g. 8pt DIF FFT
  - Produces the output values in bit-reversed order

[Figure: cascade of multiply-add cells with input x and output y. Figure by MIT OpenCourseWare.]

[Figure: 8-point FFT flow graph built from one DFT block of size N = 4 and DFT blocks of size N = 2, with twiddle factors $w_8^1$, $w_8^2$, $w_8^3$; block view with the sequences {$x_{2k}$} and {$x_{2k+1}$}, DFTs of size N/2, multiplication by twiddle factors, and DFTs of size 2. Figure by MIT OpenCourseWare.]
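The bit-reversed output order of a decimation-in-frequency FFT can be seen in a few lines. This is a software sketch of the arithmetic only, not of the cascade hardware; names are hypothetical and N is assumed to be a power of two.

```python
import cmath

def fft_dif_inplace(a):
    """Radix-2 DIF FFT with no reordering; results land in bit-reversed order."""
    n = len(a)
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for m in range(span):
                w = cmath.exp(-2j * cmath.pi * m / (2 * span))
                u = a[start + m]
                v = a[start + m + span]
                a[start + m] = u + v              # feeds the even-indexed outputs
                a[start + m + span] = (u - v) * w  # feeds the odd-indexed outputs
        span //= 2
    return a

def bit_reverse(i, bits):
    """Reverse the low `bits` bits of i."""
    return int(format(i, "0{}b".format(bits))[::-1], 2)

def dft(x):
    """Direct O(N^2) DFT used only as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

x = [complex(i + 1, 2 - i) for i in range(8)]
reference = dft(x)
out = fft_dif_inplace(list(x))
# X[k] is found at position bit_reverse(k) of the unreordered DIF output
order_err = max(abs(out[bit_reverse(k, 3)] - reference[k]) for k in range(8))
output_order = [bit_reverse(p, 3) for p in range(8)]
```

For N = 8 the positions 0..7 hold X0, X4, X2, X6, X1, X5, X3, X7.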

Page 24: Fast Fourier Transform - MIT OpenCourseWare

FFT network

- One of the most obvious implementations
  - Provide a multiply-add cell for each execution statement
  - Each cell also has a register holding a particular value of zj (twiddle factor)
  - How many such cells do we need for length-N (radix-2 DIT)?
- One possible layout
  - log N rows, N/2 cells each row
- Pipelined performance O(log N)
  - A new problem instance can enter the network as soon as the previous one has left the first row
  - Delay limited by cell’s multiply-add and long-wire driver to the next row, O(log N)
- Total network delay is $O(\log^2 N)$

[Figure: FFT network for N = 8 with three rows of four cells. Row 1: (x0,x4) (x1,x5) (x2,x6) (x3,x7). Row 2: (x0,x2) (x1,x3) (x6,x4) (x7,x5). Row 3: (x0,x1) (x3,x2) (x4,x5) (x7,x6). Outputs: y0 y4 y2 y6 y1 y5 y3 y7. Figure by MIT OpenCourseWare.]

Page 25: Fast Fourier Transform - MIT OpenCourseWare

FFT network

- Inputs are in “bit-shuffled” order (decimated)
- Outputs are in “bit-reversed” order
  - Minimizes the amount of interconnects
- General scheme for interconnections
  - Number the cells naturally
    - 0 to N/2-1, from left to right
  - Cell i in the first row is connected to two cells in the second row
    - Cell i and cell (i+N/4) mod N/2
  - Cell i in the second row is connected to two cells in the third row
    - Cell i and cell floor(i/(N/4))·(N/4) + ((i+N/8) mod N/4)
  - Cell i in the k-th row (k=1,…,logN-1) is connected to two cells in the (k+1)-th row
    - Cell i and cell floor(i/(N/2^k))·(N/2^k) + ((i+N/2^(k+1)) mod N/2^k)

[Figure: the N = 8 FFT network, with rows (x0,x4) (x1,x5) (x2,x6) (x3,x7); (x0,x2) (x1,x3) (x6,x4) (x7,x5); (x0,x1) (x3,x2) (x4,x5) (x7,x6) and outputs y0 y4 y2 y6 y1 y5 y3 y7. Figure by MIT OpenCourseWare.]
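The row-to-row wiring rule can be tabulated for a small case. The sketch below assumes the partner cell stays inside the same block of N/2^k cells, i.e. the floor term is scaled by the block size, which is what the N = 8 figure implies; the function name `next_row_cells` is hypothetical.

```python
def next_row_cells(i, k, n):
    """Cells in row k+1 fed by cell i of row k (rows counted from 1).

    Cells are numbered 0 .. N/2 - 1 from left to right in every row.
    """
    block = n >> k        # N / 2^k cells per block in row k
    half = block >> 1     # N / 2^(k+1)
    partner = (i // block) * block + (i + half) % block
    return i, partner

N = 8
wiring = {
    k: [next_row_cells(i, k, N) for i in range(N // 2)]
    for k in range(1, 3)  # rows 1 .. log2(N) - 1
}
```

For N = 8 this gives row 1 → row 2 links (0,2) (1,3) (2,0) (3,1) and row 2 → row 3 links (0,1) (1,0) (2,3) (3,2): each row halves the distance between paired cells, as in the usual butterfly pattern.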

Page 26: Fast Fourier Transform - MIT OpenCourseWare

The perfect-shuffle network

- N/2 element network perfectly suited for FFT, radix-2 DIT
  - Each multiply-add cell associated with $x_k$ and $x_{k+1}$ (k: even number between 0 and N-1)
  - A connection from cell with $x_k$ to cell with $x_j$ when j = 2k mod (N-1) (this mapping is one-to-one)
    - Represents “circular left shift” of the logN-bit binary representation of k
- First the $x_k$ values are loaded into cells
- In each iteration, output values are shuffled among cells
- At the end of logN steps, final data is in cell registers in bit-reversed order

[Figure: perfect-shuffle network for N = 8 with four cells holding (x0,x1) (x2,x3) (x4,x5) (x6,x7). Figure by MIT OpenCourseWare.]
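The equivalence between the mapping j = 2k mod (N-1) and a circular left shift of the index bits can be checked exhaustively for small N. In this sketch the index k = N-1 is treated as a fixed point, which is an assumption made here to keep the map one-to-one; function names are hypothetical.

```python
def shuffle_index(k, n):
    """Perfect-shuffle destination of index k among n = 2^bits positions."""
    if k == n - 1:            # all-ones index maps to itself (assumed convention)
        return k
    return (2 * k) % (n - 1)

def rotate_left(k, bits):
    """Circular left shift of the bits-wide binary representation of k."""
    mask = (1 << bits) - 1
    return ((k << 1) | (k >> (bits - 1))) & mask

def shuffle_times(k, n, times):
    """Apply the shuffle map repeatedly."""
    for _ in range(times):
        k = shuffle_index(k, n)
    return k

bits = 4
n = 1 << bits
same_as_rotation = all(shuffle_index(k, n) == rotate_left(k, bits) for k in range(n))
is_permutation = sorted(shuffle_index(k, n) for k in range(n)) == list(range(n))
# After log2(N) shuffles every index has been rotated all the way round
returns_home = all(shuffle_times(k, n, bits) == k for k in range(n))
```

The last check reflects why log N iterations suffice: each shuffle brings a different index bit into the position that pairs operands inside a cell.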

Page 27: Fast Fourier Transform - MIT OpenCourseWare

Cube-Connected-Cycles (CCC) network

- N cells capable of performing N-element FFT in O(log N) steps
- Closely related to the FFT network
  - Just has circular connections between first and last rows (and uses N instead of (N/2) log N cells)
- Does not exist for all N (only for N = (K/2) log K for integer K)

[Figure: cube-connected-cycles network. Figure by MIT OpenCourseWare.]

Page 28: Fast Fourier Transform - MIT OpenCourseWare

The Mesh implementation

- Approximately sqrt(N) rows and columns
- N-long FFT in log N steps
  - View as time-multiplexed version of the FFT network
    - In each step, N/2 nodes take the role of N/2 cells in FFT network
    - Other half routes the data to other nodes

[Figure: mesh FFT. Figure by MIT OpenCourseWare.]

Page 29: Fast Fourier Transform - MIT OpenCourseWare

Performance summary

Design            Area             Time          Area*Time^2      Delay
1-cell DFT        N log N          N^2 log N     N^5 log^3 N      N^2 log N
N-cell DFT        N log N          N log N       N^3 log^3 N      N^2 log N
N^2-cell DFT      N^2 log N        log N         N^2 log^3 N      N^2 log N
1-proc FFT        N log N          N log^2 N     N^3 log^5 N      N log^2 N
Cascade           N log N          N log N       N^3 log^3 N      N log^2 N
FFT Network       N^2              log N         N^2 log^2 N      log^2 N
Perfect Shuffle   N^2 / log^2 N    log^2 N       N^2 log^2 N      log^2 N
CCC               N^2 / log^2 N    log^2 N       N^2 log^2 N      log^2 N
Mesh              N log^2 N        sqrt(N)       N^2 log^2 N      sqrt(N)

Figure by MIT OpenCourseWare.

- Cascade FFT has the best trade-off
  - Less complicated wiring and N log N delay
- FFT network is as fast as the $N^2$-cell DFT but needs much less area (only (N/2) log N cells)
- Perfect-Shuffle and CCC use fewer cells than the FFT network, but take a bit more time
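The Area*Time^2 column can be cross-checked mechanically against the Area and Time columns. The sketch below encodes each entry N^p log^q N as an exponent pair (p, q); the entries are one reading of the summary table (with the mesh time taken as sqrt(N)), and the dictionary and function names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# design: (area, time, area*time^2), each as (power of N, power of log N)
designs = {
    "1-cell DFT":      ((1, 1),  (2, 1),    (5, 3)),
    "N-cell DFT":      ((1, 1),  (1, 1),    (3, 3)),
    "N^2-cell DFT":    ((2, 1),  (0, 1),    (2, 3)),
    "1-proc FFT":      ((1, 1),  (1, 2),    (3, 5)),
    "Cascade":         ((1, 1),  (1, 1),    (3, 3)),
    "FFT Network":     ((2, 0),  (0, 1),    (2, 2)),
    "Perfect Shuffle": ((2, -2), (0, 2),    (2, 2)),
    "CCC":             ((2, -2), (0, 2),    (2, 2)),
    "Mesh":            ((1, 2),  (HALF, 0), (2, 2)),
}

def area_time_squared(area, time):
    """Exponents of A * T^2: add the area exponents to twice the time exponents."""
    return (area[0] + 2 * time[0], area[1] + 2 * time[1])

mismatches = [name for name, (area, time, at2) in designs.items()
              if area_time_squared(area, time) != at2]

# Designs that reach the N^2 log^2 N figure in this reading of the table
best = [name for name, (_, _, at2) in designs.items() if at2 == (2, 2)]
```

Under this reading every row is self-consistent, and the FFT network, perfect shuffle, CCC and mesh all share the same Area*Time^2 while trading area against time differently.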

Page 30: Fast Fourier Transform - MIT OpenCourseWare

Readings

[1] C.D. Thompson, "Fourier Transforms in VLSI," no. UCB/CSD-82-105, 1982.
    Same, but hard to find publication
[2] C.D. Thompson, "Fourier Transforms in VLSI," IEEE Trans. Computers, vol. 32, no. 11, pp. 1047-1057, 1983.
[3] P. Duhamel and M. Vetterli, "Fast Fourier transforms: a tutorial review and a state of the art," Signal Process., vol. 19, no. 4, pp. 259-299, 1990.