Computation of DFT - VLSI Signal Processing Lab, EE, NCTUtwins.ee.nctu.edu.tw/courses/dsp_16/Class...

14
DSP (2015 Spring) Computation of DFT NCTU EE 1 Computation of DFT Efficient algorithms for computing DFT – Fast Fourier Transform. (a) Compute only a few points out of all N points (b) Compute all N points What are the efficiency criteria? Number of multiplications Number of additions Chip area in VLSI implementation DFT as a Linear Transformation Matrix representation of DFT Definition of DFT: 1 , , 1 , 0 , ) ( 1 ) ( 1 , , 1 , 0 , ) ( ) ( 1 0 1 0 N n W k X N n x N k W n x k X N k kn N N n kn N where Let , ) 1 ( ) 1 ( ) 0 ( , ) 1 ( ) 1 ( ) 0 ( N X X X N x x x N N X x and ) 1 )( 1 ( ) 1 ( 2 ) 1 ( ) 1 ( 2 4 2 1 2 1 1 1 1 1 1 1 N N N N N N N N N N N N N N N N W W W W W W W W W W Thus, N N N N N N N N N N N X W X W x x W X * 1 1 IDFT point - point DFT - Because the matrix (transformation) N W has a specific structure and because k N W has par- ticular values (for some k and n), we can reduce the number of arithmetic operations for computing this transform.

Transcript of Computation of DFT - VLSI Signal Processing Lab, EE, NCTUtwins.ee.nctu.edu.tw/courses/dsp_16/Class...

Page 1: Computation of DFT - VLSI Signal Processing Lab, EE, NCTUtwins.ee.nctu.edu.tw/courses/dsp_16/Class note/dsp9.pdf · 2016-02-15 · DFT as a Linear Transformation Matrix representation

DSP (2015 Spring) Computation of DFT

NCTU EE 1

Computation of DFT

Efficient algorithms for computing DFT – Fast Fourier Transform.

(a) Compute only a few points out of all N points

(b) Compute all N points

What are the efficiency criteria?

Number of multiplications

Number of additions

Chip area in VLSI implementation

DFT as a Linear Transformation Matrix representation of DFT

Definition of DFT:

1,,1,0,)(1

)(

1,,1,0,)()(

1

0

1

0

NnWkXN

nx

NkWnxkX

N

k

knN

N

n

knN

where

Let ,

)1(

)1(

)0(

,

)1(

)1(

)0(

NX

X

X

Nx

x

x

NN

Xx

and

)1)(1()1(2)1(

)1(242

12

1

1

1

1111

NNN

NN

NN

NNNN

NNNN

N

WWW

WWW

WWW

W

Thus,

NN

NNN

NNN

N

N

N

XW

XWx

xWX

*

1

1IDFTpoint -

point DFT-

Because the matrix (transformation) NW has a specific structure and because k

NW has par-

ticular values (for some k and n), we can reduce the number of arithmetic operations for

computing this transform.

Page 2: Computation of DFT - VLSI Signal Processing Lab, EE, NCTUtwins.ee.nctu.edu.tw/courses/dsp_16/Class note/dsp9.pdf · 2016-02-15 · DFT as a Linear Transformation Matrix representation

DSP (2015 Spring) Computation of DFT

NCTU EE 2

Example 3] 2 1 0[][ nx

jj

jj

WWW

WWW

WWW

WWWW

WWWW

WWWW

WWWW

11

1111

11

1111

1

1

1

1111

14

24

34

24

04

24

34

24

14

94

64

34

04

64

44

24

04

34

24

14

04

04

04

04

04

4W

Only additions are needed to compute this specific transform.

(This is a well-known radix-4 FFT)

Thus, the DFT of ][nx is

j

j

22

2

22

6

444 xWX

Fast Fourier Transform -- Highly efficient algorithms for computing DFT

General principle: Divide-and-conquer

Specific properties of kNW

Complex conjugate symmetry: *)( knN

knN WW

Symmetry: kN

Nk

N WW 2

Periodicity: kN

NkN WW

Particular values of k and n: e.g., radix-4 FFT (no multiplications)

Direct computation of DFT

1

0

1

0

Re][ImIm][Re

Im][ImRe][Re

1,,1,0 ,][][

N

nkn

Nkn

N

knN

knN

N

n

knN

WnxWnxj

WnxWnx

NkWnxkX

For each k, we need N complex multiplications and N-1 complex additions. 4N real

multiplications and 4N-2 real additions.

Page 3: Computation of DFT - VLSI Signal Processing Lab, EE, NCTUtwins.ee.nctu.edu.tw/courses/dsp_16/Class note/dsp9.pdf · 2016-02-15 · DFT as a Linear Transformation Matrix representation

DSP (2015 Spring) Computation of DFT

NCTU EE 3

We will show how to use the properties of k

NW to reduce computations.

Radix-2 algorithms: Decimation-in-time; Decimation-in-frequency

Composite N algorithms: Cooley-Tukey; Prime factor

Winograd algorithm

Chirp transform algorithm

Radix-2 Decimation-in-time Algorithms -- Assume N-point DFT and 2N

Idea: N-point DFT 2

N -point DFT 4

N -point DFT

4

N -point DFT

2

N -point DFT 4

N -point DFT

4

N -point DFT

Sequence: ]1[]2[]3[]2[]1[]0[ Nxnxxxxx

Even index: ]2[]2[]0[ Nxxx

Odd index: ]1[]3[]1[ Nxxx

2/2/

22

22

12

0

12

0

)12(2

12

odd

2

even

1

0

]12[]2[

][][

1,,1,0 ,][][

NN

jN

j

N

N

r

N

r

krN

rkN

rn

n

knN

rn

n

knN

N

n

knN

WeeW

WrxWrx

WnxWnx

NkWnxkX

][][

]12[]2[][

point DFT-2

12

02/

point DFT-2

12

02/

kHWkG

WrxWWrxkX

kN

N

N

r

rkN

kN

N

N

r

rkN

Page 4: Computation of DFT - VLSI Signal Processing Lab, EE, NCTUtwins.ee.nctu.edu.tw/courses/dsp_16/Class note/dsp9.pdf · 2016-02-15 · DFT as a Linear Transformation Matrix representation

DSP (2015 Spring) Computation of DFT

NCTU EE 4

Comparison:

(a) Direct computation of N-point DFT (N frequency samples):

~ 2N complex multiplications and 2N complex adds

(b) Direct computation of 2

N -point DFT:

~ 2

2

N complex multiplications and

2

2

N complex adds

+ additional N complex multis and N complex adds

~ (Total:) 22

222

NN

NN

complex multis and adds

(c) N2log -stage FFT

Since 2N , we can further break 2

N -point DFT into two 4

N -point DFT and

so on.

Page 5: Computation of DFT - VLSI Signal Processing Lab, EE, NCTUtwins.ee.nctu.edu.tw/courses/dsp_16/Class note/dsp9.pdf · 2016-02-15 · DFT as a Linear Transformation Matrix representation

DSP (2015 Spring) Computation of DFT

NCTU EE 5

At each stage: ~ N complex multis and adds

Total: ~ NN 2log complex multis and adds (--> NN

2log2

)

Number of points, N

Direct Computation: Complex Multis

FFT: Complex Multis

Speed Im-provement Factor

4 16 4 4.08 64 12 5.3

16 256 32 864 4,096 192 21.3

256 65,536 1,024 64.01024 1,048,576 5,120 204.8

Butterfly: Basic unit in FFT

Two multiplications:

Page 6: Computation of DFT - VLSI Signal Processing Lab, EE, NCTUtwins.ee.nctu.edu.tw/courses/dsp_16/Class note/dsp9.pdf · 2016-02-15 · DFT as a Linear Transformation Matrix representation

DSP (2015 Spring) Computation of DFT

NCTU EE 6

One multiplication:

In-place computations

Only two registers are needed for computing a butterfly unit.

][][][

][][][

11

11

qXWpXqX

qXWpXpX

mr

Nmm

mr

Nmm

Advantage: less storage!

In order to retain the in-place computation property, the input data are accessed in the

bit-reversed order.

Note: The outputs are in the normal order (same as the “position”)

Position Binary equivalent Bit reversed Sequence index

6 110 011 3 2 010 010 2

Page 7: Computation of DFT - VLSI Signal Processing Lab, EE, NCTUtwins.ee.nctu.edu.tw/courses/dsp_16/Class note/dsp9.pdf · 2016-02-15 · DFT as a Linear Transformation Matrix representation

DSP (2015 Spring) Computation of DFT

NCTU EE 7

Remark: Index 3 input data is placed at position 6.

We may also place the inputs in the normal order; then the outputs are in the bit-reversed

order.

If we try to maintain the normal order of both inputs and outputs, then in-place compu-

tation structure is destroyed.

Page 8: Computation of DFT - VLSI Signal Processing Lab, EE, NCTUtwins.ee.nctu.edu.tw/courses/dsp_16/Class note/dsp9.pdf · 2016-02-15 · DFT as a Linear Transformation Matrix representation

DSP (2015 Spring) Computation of DFT

NCTU EE 8

Radix-2 Decimation-in-frequency Algorithms Dividing the output sequence ][kX into smaller pieces.

1,,1,0,)()(1

0

NkWnxkX

N

n

knN

If k is even, rk 2 .

12

02

12

0

2

22]2[2

12

0

12

0

22

2

12

0

1

2

22

1

0

2][

2][

2][

)2(][][

12

,,1,0 ,][]2[

N

n

nrN

N

n

nrN

rnN

rNN

rnN

Nnr

N

N

n

N

n

Nnr

Nnr

N

N

n

N

Nn

nrN

nrN

N

n

knN

WN

nxnx

WN

nxnx

WWWW

WN

nxWnx

NnnWnxWnx

NrWnxrX

Similarly, if k is odd, 12 rk .

12

022

][]12[

N

n

nrN

nN WW

NnxnxrX

Page 9: Computation of DFT - VLSI Signal Processing Lab, EE, NCTUtwins.ee.nctu.edu.tw/courses/dsp_16/Class note/dsp9.pdf · 2016-02-15 · DFT as a Linear Transformation Matrix representation

DSP (2015 Spring) Computation of DFT

NCTU EE 9

12

02

12

02

2][]12[

2][]2[

N

n

nrN

nN

N

n

nrN

WWN

nxnxrX

WN

nxnxrX

Let

2][][

2][][

Nnxnxnh

Nnxnxng

We can further break ]2[ rX into even and odd groups …

Again, we can reduce the two-multiplication butterfly into one multiplication. Hence, the

computational complexity is bout NN

2log2

. The in-place computation property holds if the

outputs are in bit-reversed order (when inputs are in the normal order).

Page 10: Computation of DFT - VLSI Signal Processing Lab, EE, NCTUtwins.ee.nctu.edu.tw/courses/dsp_16/Class note/dsp9.pdf · 2016-02-15 · DFT as a Linear Transformation Matrix representation

DSP (2015 Spring) Computation of DFT

NCTU EE 10

FFT for Composite N -- Cooley-Tukey Algorithm:

21NNN

10

10:index Freq.

10

10:index Time

22

11211

22

11212

Nk

NkkNkk

Nn

NnnnNn

Remark: ),( 21 nnn and ),( 21 kkk

Goal: Decompose N-point DFT into two stages:

1N -point DFT 2N -point DFT

point-

1

0

factortwiddle

point-

1

0212

1

0

1

01

212

1

0

1

0212

211

1

0

2

2

2

22

2

21

1

1

1

11

1

2

2

1

1

1221

222

21121

111

112

2

2

1

1

212211

][

][

][

][

10 ,][][

N

N

n

nkN

nkN

N

N

n

nkN

N

n

N

n

nkNNN

W

nNkN

nkN

W

nkNN

N

n

N

n

nnNkNkN

N

n

knN

WWWnnNx

WWWWnnNx

WnnNx

kNkX

NkWnxkX

nkN

nkN

Procedure

(1) Compute 1N -point DFT: (row transform)

1

021212

1

1

11

1][],[

N

n

nkNWnnNxknG

(2) Multiply twiddle factors:

],[],[~

121221 knGWknG nk

N

(3) Compute 2N -point DFT: (column transform)

1

012211

2

2

22

2],[

~][

N

n

nkNWknGkNkX

Page 11: Computation of DFT - VLSI Signal Processing Lab, EE, NCTUtwins.ee.nctu.edu.tw/courses/dsp_16/Class note/dsp9.pdf · 2016-02-15 · DFT as a Linear Transformation Matrix representation

DSP (2015 Spring) Computation of DFT

NCTU EE 11

(Computation of N=15-point DFT by means of 3-point and 5-point DFTs.)

Page 12: Computation of DFT - VLSI Signal Processing Lab, EE, NCTUtwins.ee.nctu.edu.tw/courses/dsp_16/Class note/dsp9.pdf · 2016-02-15 · DFT as a Linear Transformation Matrix representation

DSP (2015 Spring) Computation of DFT

NCTU EE 12

Extension: NNNN 21

21 If

point DFT-for tionsmultiplica ofnumber )(Let

NNN

NN

)(: transfrmcolumn 3.

:factors twiddle2.

)(:ormrow transf 1.

21

21

12

NN

NNN

NN

1)()(

)()()(

2

2

1

1

2112

N

N

N

NN

NNNNNN

In general,

1

)1()(

)(i i

i

N

NNN

In fact, the term 1 should be 2

1 because rearranging the butterfly structure

would make half of the branches becoming “1”.

Special Case: 221 NNN

Radix-2: 221 NNN and N2log

2

1)( NN multiplications because )2( requires no multiplications.

Radix-4: 421 NNN and N4log

2

1)( NN multiplications because )4( requires no multiplications. This FFT

has fewer stages than Radix-2 ==> fewer multiplications.

jj

jj

WWW

WWW

WWW

WWWW

WWWW

WWWW

WWWW

11

1111

11

1111

1

1

1

1111

14

24

34

24

04

24

34

24

14

94

64

34

04

64

44

24

04

34

24

14

04

04

04

04

04

4W

Page 13: Computation of DFT - VLSI Signal Processing Lab, EE, NCTUtwins.ee.nctu.edu.tw/courses/dsp_16/Class note/dsp9.pdf · 2016-02-15 · DFT as a Linear Transformation Matrix representation

DSP (2015 Spring) Computation of DFT

NCTU EE 13

Inverse FFT IDFT:

1

0

][1

][N

k

knNWkX

Nnx

(*)

DFT:

1

0

][][N

n

nkNWnxkX

Hence, take the conjugate of (*) :

)( DFT1

][1

][1

][1

][

*

1

0

*

1

0

*

*1

0

*

kXN

WkXN

WkXN

WkXN

nx

N

k

knN

N

k

knN

N

k

knN

Take the conjugate of the above equation:

**

**

)(FFT 1

)(DFT 1

][

kXN

kXN

nx

Thus, we can use the FFT algorithm to compute the inverse DFT.

The Goertzel Algorithm

12)

2(

kjNk

Nj

kNN eeW

1

0

)(1

0

][][][N

r

rNkN

N

r

krN

kNN WrxWrxWkX

If we define x[n] = 0 for n < 0 and n N

and ])[(][][][][ )( nuWnxrnuWrxny knN

r

rNkNk

,

Nnk nykX

][][

Page 14: Computation of DFT - VLSI Signal Processing Lab, EE, NCTUtwins.ee.nctu.edu.tw/courses/dsp_16/Class note/dsp9.pdf · 2016-02-15 · DFT as a Linear Transformation Matrix representation

DSP (2015 Spring) Computation of DFT

NCTU EE 14

11

1)(

zWzH

kN

k

If x[n] is complex, we need 4 real multiplications and 4 real additions to compute each

yk[n].

To compute yk[N], we need to compute yk[1], yk[2], …, yk[N-1].

We need 4N real multiplications and 4N real additions to compute X[k].

Remarks:

less efficient than the direct method.

Avoid the computation or storage of the coefficients kn

NW .

To reduce the number of multiplications,

21

1

11

1

)/2cos(21

1

)1)(1(

1)(

zzNk

zW

zWzW

zWzH

kN

kN

kN

kN

k

If x[n] is complex, we only need 2 real multiplications and 4 real additions to implement

the poles of the system. (The complex multiplication by k

NW needs not be performed at every iteration.)

To compute X[k], we need 2N real multiplications and 4N real additions for the poles

and 4 real multiplications and 4 real additions for the zero.

Remarks:

Avoid the computation or storage of the coefficients kn

NW .

Only need to compute and save kNW and )/2cos( Nk .