Meeting 13 - eecs.umich.edu
Meeting 13
Summer 2009 Doing DSP Workshop
Today:
◮ Admin comments.
◮ Decimation in time DFT.
◮ Other fast algorithms.
◮ An old friend.
One graphic from TI materials.
Learn all you can from the mistakes of others. You won’t have time to make them
all yourself. — Alfred Sheinwold
Doing DSP Workshop – Summer 2009 Meeting 13 – Page 1/46 Tuesday – June 16, 2009
Projects
Audio waveform synthesizer –
sine, square wave, triangle, etc.
◮ Darin Rajabian
OFDM.
◮ Yu Wang
Motor speed control lab demonstration.
◮ Zharori Cong
◮ B.K. Kim
Remote camera using ZigBee.
◮ James Kim
◮ Jordan Adams
Digital Filter Study.
◮ Vindhya Reddy
◮ Joanna Widjaja
Ultrasonic Vision Aide.
◮ Ronald Deang
Not cast in concrete.
Suggested Project Phases
◮ Start up.
◮ Basically define the task, locate useful resources, and verbalize
a possible plan of attack.
◮ Initial Start.
◮ Develop the initial proposal. If applicable, do MATLAB
simulation. Identify required parts and other resources needed
to be purchased. Should have a reasonably clear understanding
of what is to be done and how. Set up goals and time line.
◮ Work in earnest.
◮ Program, build, debug. Repeat.
◮ Completion. Sometime in August.
◮ Demonstration to the workshop.
◮ Poster.
Feel free to use ChihWei and myself as resources.
Updated tentative schedule
Week of June 15: Exercise 5, controlSTICK ADC, DAC, transfer measurement.
Tuesday – Fast DFTs.
Thursday – Xilinx 8-bit PicoBlaze microcomputer (VHDL).
Week of June 22: Exercise 6, real-time FFT and waveform evaluation.
Tuesday – TBD. KM away.
Thursday – TBD. KM away.
Weeks following —
Lecture and lab complete, focus on projects.
Lab floor to be done this week
The floors in EECS 4341 are scheduled to be stripped and waxed this
week.
Except for the tables we got everything up off of the floor on Friday.
We have kept the lab functional by moving the computers onto the tables
to retain access.
Lab tables to be replaced next week.
On Friday all of the computers will be shut down and placed in
temporary storage.
The current lab tables will be removed and replaced with
“real” tables with built-in shelving. This hopefully will be done
early in the week.
Once the new lab tables are in place the computers, scopes and
signal generators will be put onto the new benches. Hopefully
the lab will be fully operational by the end of the week.
Today
◮ Fast DFT algorithms.
◮ Some observations on the C28x FFT support.
◮ A useful C version of the FFT.
“The” Fast Fourier Transform (FFT) Algorithm
There are many fast algorithms (FFTs) that can be used to
compute the Discrete Fourier Transform (DFT). The DFT is
defined as
$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N}, \qquad k = 0, 1, \ldots, N-1.$$
The nominal computational cost is $N^2$ complex MACs.
Any algorithm that significantly reduces this number can be
considered as being fast.
There are many fast algorithms. Some algorithms are faster
than others.
The metric by which to judge algorithms is not always clear.
Many ways of computing the DFT
The paper “An Algorithm for the Machine Computation of Complex
Fourier Series” by Cooley and Tukey in 1965 was the first “modern”
(or should we say early computer period?) publication of a fast
algorithm for computing the DFT. This paper triggered the
development of a large number of alternative procedures.
The FFT was first discovered by Gauss in 1805. It was used to
calculate the orbit of an asteroid. It was found in one of his
workbooks written in Latin. But that’s another story.
Some DFT algorithms:
◮ brute force
◮ Singleton’s DFT speed-up procedure
◮ Goertzel algorithm
◮ decimation in time
◮ decimation in frequency
◮ other radix algorithms
  ◮ four
  ◮ eight
  ◮ split radix
◮ Winograd’s short-length convolution algorithm
◮ prime factor method (Good-Thomas)
◮ Winograd Fourier transform algorithm (WFTA)
Need and capability
Everything has its time.
Richard Garwin had a need (nuclear monitoring).
John Tukey had an idea how to solve it.
James Cooley coded it up and made it work.
Computers were just then coming into general use.
And, of course, Gauss did “it” first.
Good/Thomas published the Prime Factor Algorithm earlier.
The Chinese Remainder Theorem is very, very old.
Almost every implementor has a different view and shares it.
There are well over 3000 publications about the FFT.
More appear to be generated almost continuously.
Who knows how many publications use or mention it?
Concepts important to fast DFT algorithms
Roots of unity, powers of $W_N = e^{-j2\pi/N}$.
Symmetry of the sine and cosine.
Index mappings.
Matrix Kronecker products.
Performance characterization constantly changes
Early effort largely minimized multiplication.
This evolved into minimizing the number of arithmetic operations.
Using today’s processors the goal is largely to minimize data movement.
When implementing an FFT on an ASIC, arithmetic becomes important again.
One can almost always trade memory against execution time.
How does one do a gigapoint FFT?
How to exploit parallelism?
Bit serial arithmetic versions exist.
N-specific FFT code generators exist.
Can be pipelined.
Is there a lower bound on computational cost?
The decimation-in-time radix-2 FFT
◮ N is assumed to be an integer power of 2.
◮ Divide the x[n] into two N/2-value sets based on even/odd index values.
◮ Form the DFT of each set and combine the results to form the N-value DFT.
◮ Repeat the procedure on each of the N/2-value DFTs.
◮ And so on.
The resulting nominal complex MAC count is $(N/2) \times \log_2(N)$.
N       log2(N)   (N/2) × log2(N)   N^2
64      6         192               4096
128     7         448               16384
256     8         1024              65536
512     9         2304              262144
1024    10        5120              1048576
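The counts in the table can be reproduced with a few lines of C. This is a minimal sketch; the function names `ilog2`, `fft_macs` and `dft_macs` are illustrative and not part of the workshop materials.

```c
#include <assert.h>

/* Integer log2 for exact powers of two (illustrative helper). */
unsigned ilog2(unsigned long n)
{
    unsigned r = 0;
    assert(n != 0 && (n & (n - 1)) == 0);   /* power of two only */
    while (n > 1) { n >>= 1; r++; }
    return r;
}

/* Nominal complex MACs for the radix-2 FFT: (N/2) * log2(N). */
unsigned long fft_macs(unsigned long n)
{
    return (n / 2) * ilog2(n);
}

/* Nominal complex MACs for the direct DFT: N^2. */
unsigned long dft_macs(unsigned long n)
{
    return n * n;
}
```

Calling `fft_macs(n)` and `dft_macs(n)` for n = 64, 128, ..., 1024 gives the two right-hand columns.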
Separating the even and odd indexed samples
Start with the forward transform equation
$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N}, \qquad k = 0, 1, \ldots, N-1.$$
Even numbers have the form $2p$ and odd numbers have the form $2q+1$,
where $p$ and $q$ go from $0, 1, 2, \ldots, N/2-1$.
$$X[k] = \sum_{p=0}^{N/2-1} x[2p]\, e^{-j2\pi k(2p)/N} + \sum_{q=0}^{N/2-1} x[2q+1]\, e^{-j2\pi k(2q+1)/N}$$
$$\phantom{X[k]} = \sum_{p=0}^{N/2-1} x[2p]\, e^{-j2\pi kp/(N/2)} + e^{-j2\pi k/N} \sum_{q=0}^{N/2-1} x[2q+1]\, e^{-j2\pi kq/(N/2)}.$$
We now have a weighted sum of two N/2-value DFTs. Repeat the process.
The signal flow graph
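[Figure: signal flow graph for N = 8. The even-indexed inputs x[0], x[2], x[4], x[6] produce Xe[0] through Xe[3] and the odd-indexed inputs x[1], x[3], x[5], x[7] produce Xo[0] through Xo[3]. Each output X[0] through X[7] is formed by an adder, with twiddle factors W_N^0 through W_N^7 on the branches.]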
Exploiting symmetry
[Figure: the same N = 8 signal flow graph from Xe[0..3] and Xo[0..3] to X[0] through X[7], now using only the twiddle factors W_N^0 through W_N^3, with four branches entering their adders with a minus sign.]
Repeat until done
[Figure: complete three-layer N = 8 signal flow graph with input labels x[0], x[2], x[4], x[6], x[1], x[3], x[5], x[7] and outputs X[0] through X[7]. The layers use the twiddle factors W_8^0 only, then W_8^0 and W_8^2, then W_8^0 through W_8^3, with subtracting branches in every layer.]
Butterflies and bit reverse addresses
If one can do two butterflies simultaneously then an algorithm exists
that allows in/out normal ordering and in-place computation.
normal   bit reverse
000      000
001      100
010      010
011      110
100      001
101      101
110      011
111      111
From Wikipedia.
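The table above can be generated by reversing the low bits of each index. A minimal sketch follows; the function name `bit_reverse` and the 16-bit limit are illustrative choices.

```c
#include <assert.h>

/* Reverse the low `bits` bits of `value` (for 3 bits: 001 -> 100). */
unsigned bit_reverse(unsigned value, unsigned bits)
{
    unsigned reversed = 0;
    assert(bits <= 16);
    for (unsigned i = 0; i < bits; i++) {
        reversed = (reversed << 1) | (value & 1u);   /* shift in the lowest bit */
        value >>= 1;
    }
    return reversed;
}
```

For a transform of N = 2^bits values, element i is exchanged with element `bit_reverse(i, bits)`.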
Pseudo code
Can organize using three loops, one each for level (or layer), group,
and butterfly.
    nFFTs = N/2; FFTsize = 2;
    for (r = 0; r < R; r++) {                 // R = log2(N) layers
        for (fft = 0; fft < nFFTs; fft++) {
            for (butterfly = 0; butterfly < (FFTsize/2); butterfly++) {
                top_index = fft*FFTsize + butterfly;
                bot_index = top_index + (FFTsize/2);
                w_index = butterfly*nFFTs;
                temp = W[w_index]*data[bot_index];
                data[bot_index] = data[top_index] - temp;  // update bot first!
                data[top_index] = data[top_index] + temp;  // now update top
            }
        }
        nFFTs = (nFFTs/2); FFTsize = (FFTsize*2);
    }
The input values are assumed to have been reordered.
Perhaps speeding up the indexing
    nFFTs = N/2; FFTsize = 2;
    for (r = 0; r < R; r++) {
        FFTstart = 0;
        for (fft = 0; fft < nFFTs; fft++) {
            w_index = 0;
            for (butterfly = 0; butterfly < (FFTsize/2); butterfly++) {
                top_index = FFTstart + butterfly;
                bot_index = top_index + (FFTsize/2);
                temp = W[w_index]*data[bot_index];
                data[bot_index] = data[top_index] - temp;  // update bot first!
                data[top_index] = data[top_index] + temp;  // now update top
                w_index = w_index + nFFTs;
            }
            FFTstart = FFTstart + FFTsize;
        }
        nFFTs = (nFFTs>>1); FFTsize = (FFTsize<<1);  // shifts are easy to do
    }
The input values are assumed to have been reordered.
Reordering the input
[Figure: the three-layer N = 8 signal flow graph with the twiddle factors written per layer as W_2^0, then W_4^0 and W_4^1, then W_8^0 through W_8^3. Intermediate nodes are annotated with the input index pairs they combine: 0+4, 0-4, 2+6, 2-6, 1+5, 1-5, 3+7, 3-7.]
Continuing the reordering
[Figure: the N = 8 signal flow graph with twiddle factors W_2^0, W_4^0, W_4^1 and W_8^0 through W_8^3, and intermediate nodes labeled a through h. A second copy of the labels shows nodes c and d exchanged as the reordering continues.]
Reordered radix-8 DIT
[Figure: reordered three-layer N = 8 decimation-in-time signal flow graph with outputs X[0] through X[7], input labels x[0], x[2], x[4], x[6], x[1], x[3], x[5], x[7], and twiddle factors W_8^0 through W_8^3.]
Can start going the other way
[Figure: N = 8 signal flow graph with outputs X[0] through X[7], twiddle factors W_N^0 through W_N^3, four subtracting branches, and input labels x[0], x[2], x[6], x[4], x[1], x[3], x[5], x[7].]
The radix-8 DIF FFT
[Figure: three-layer N = 8 decimation-in-frequency signal flow graph with outputs X[0] through X[7], input labels x[0], x[2], x[4], x[6], x[1], x[3], x[5], x[7], and twiddle factors W_8^0 through W_8^3.]
The flood gate was opened
Basically we exploited the way we wrote the indices of the values being
transformed and ended up with a fast algorithm.
We also got a “new” algorithm by manipulating the signal flow graph.
There are lots of ways to write indices and lots of ways to reorder the
data flow.
Which is “best”? What is meant by best?
How many ways are there to index?
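Numbers can be written as polynomials. For example we can write
$$1234_{10} = 1 \times 10^3 + 2 \times 10^2 + 3 \times 10^1 + 4 \times 10^0.$$
We refer to 10 as being the radix. In general, for radix $r$ and digits $d_k$,
$$N = \sum_{k=0}^{D-1} d_k r^k.$$
Similarly we can write numbers in binary form as
$$1234_{10} = 1 \times 2^{10} + 0 \times 2^9 + 0 \times 2^8 + 1 \times 2^7 + 1 \times 2^6 + 0 \times 2^5 + 1 \times 2^4 + 0 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 10011010010_2.$$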
A simple factoring of N
Numbers can also be written as the product of their factors.
For example
1234 = 2× 617.
Consider the number N = N1N2 where N1 and N2 are relatively
prime. It can be shown that we can uniquely write the integer
values from 0 through N − 1 as
$$n = n_2 N_1 + n_1, \qquad n_1 = 0, 1, \ldots, N_1 - 1, \quad n_2 = 0, 1, \ldots, N_2 - 1$$
or alternatively as
$$k = k_1 N_2 + k_2, \qquad k_1 = 0, 1, \ldots, N_1 - 1, \quad k_2 = 0, 1, \ldots, N_2 - 1.$$
FFT based only on simple factoring
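$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N},$$
$$X[k_1 N_2 + k_2] = \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} x[n_2 N_1 + n_1]\, e^{-j2\pi (k_1 N_2 + k_2)(n_2 N_1 + n_1)/(N_1 N_2)}$$
$$= \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} x[n_2 N_1 + n_1]\, e^{-j2\pi (k_1 n_1 N_2 + k_2 n_2 N_1 + k_2 n_1)/(N_1 N_2)}$$
$$= \sum_{n_1=0}^{N_1-1} e^{-j2\pi k_1 n_1/N_1}\, e^{-j2\pi k_2 n_1/N} \left[ \sum_{n_2=0}^{N_2-1} x[n_2 N_1 + n_1]\, e^{-j2\pi k_2 n_2/N_2} \right].$$
The $k_1 n_2 N_1 N_2$ term drops out of the exponent because it contributes a whole number of turns.
Procedure: Form $N_1$ $N_2$-point DFTs.
Weight the results using twiddle factors.
Form $N_2$ $N_1$-point DFTs.
$$N_1 N_2^2 + N_1 N_2 + N_2 N_1^2 = N_1 N_2 (N_1 + 1 + N_2)$$
For $N = 15 = 3 \times 5$ compare $N^2 = 225$ to $N_1 N_2 (N_1 + 1 + N_2) = 135$.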
Prime Factor Algorithm index mapping
A more generalized mapping of the indices is
$$n = ((K_1 n_1 + K_2 n_2))_N \quad \text{where } 0 \le n_1 < N_1,\ 0 \le n_2 < N_2$$
and
$$k = ((K_3 k_1 + K_4 k_2))_N \quad \text{where } 0 \le k_1 < N_1,\ 0 \le k_2 < N_2.$$
The notation $((\cdot))_N$ denotes using the quantity contained in the
parentheses modulo $N$.
Prime factor decomposition
$$((kn))_N = ((K_1 K_3 n_1 k_1 + K_1 K_4 n_1 k_2 + K_2 K_3 n_2 k_1 + K_2 K_4 n_2 k_2))_N$$
If values of $K_1$, $K_2$, $K_3$, and $K_4$ can be determined such that
$$((K_1 K_4))_N = ((K_2 K_3))_N = 0$$
then the DFT becomes
$$X[k] = \sum_{n_1=0}^{N_1-1} e^{-j2\pi k_1 n_1 K_1 K_3/N} \sum_{n_2=0}^{N_2-1} x[n_1, n_2]\, e^{-j2\pi k_2 n_2 K_2 K_4/N}.$$
Both the condition for generating one-to-one index maps and the above
modulo relationship can be satisfied. The result is a mapping of a
one-dimensional DFT into a two-dimensional DFT. For this case the
number of complex multiplications is
$$N_2 N_1^2 + N_1 N_2^2.$$
For $N = 15 = 3 \times 5$ this gives 120 complex multiplications.
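The two conditions can be checked by exhaustive search for small N. The sketch below fixes K1 = N2 and K2 = N1 for the input map and takes K3 and K4 as parameters. For N = 15 = 3 × 5 the values K3 = 10 and K4 = 6 (obtained from the Chinese Remainder Theorem) are one choice that works. These constants and the function names are assumptions for illustration and do not appear on the slides.

```c
#include <assert.h>

/* Input map with K1 = N2, K2 = N1: n = ((N2*n1 + N1*n2)) mod N. */
int pfa_n(int n1, int n2, int N1, int N2)
{
    return (N2 * n1 + N1 * n2) % (N1 * N2);
}

/* Output map k = ((K3*k1 + K4*k2)) mod N. */
int pfa_k(int k1, int k2, int K3, int K4, int N)
{
    return (K3 * k1 + K4 * k2) % N;
}

/* Returns 1 if both maps are one-to-one and kn mod N has no cross terms. */
int pfa_maps_ok(int N1, int N2, int K3, int K4)
{
    int N = N1 * N2;
    int seen_n[64] = {0}, seen_k[64] = {0};
    assert(N1 > 0 && N2 > 0 && N <= 64 && K3 >= 0 && K4 >= 0);

    /* Each index value must be produced exactly once. */
    for (int a = 0; a < N1; a++)
        for (int b = 0; b < N2; b++) {
            if (seen_n[pfa_n(a, b, N1, N2)]++) return 0;
            if (seen_k[pfa_k(a, b, K3, K4, N)]++) return 0;
        }

    /* kn mod N must reduce to N2*n1*k1 + N1*n2*k2 (no n1*k2 or n2*k1 terms). */
    for (int n1 = 0; n1 < N1; n1++)
        for (int n2 = 0; n2 < N2; n2++)
            for (int k1 = 0; k1 < N1; k1++)
                for (int k2 = 0; k2 < N2; k2++) {
                    int kn = (pfa_n(n1, n2, N1, N2) * pfa_k(k1, k2, K3, K4, N)) % N;
                    if (kn != (N2 * n1 * k1 + N1 * n2 * k2) % N) return 0;
                }
    return 1;
}
```

`pfa_maps_ok(3, 5, 10, 6)` returns 1, while a naive choice such as K3 = K4 = 1 fails because the output map is no longer one-to-one.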
PFA cost in terms of multiplications
For $N_f$ relatively prime factors of $N$ the number of complex
multiplications becomes
$$N \sum_{i=0}^{N_f-1} N_i.$$
For N = 8184 = 3 × 8 × 11 × 31 the number of complex
multiplications is 53 × 8184 as compared to the unmodified
DFT which uses 8184 × 8184.
The PFA uses a factor of about 154 fewer.
If it were possible to use an 8192-value transform instead, a DIT
FFT would nominally use (8192/2) × 13 complex
multiplications. This is a factor of about 1260 fewer
multiplications than needed by the unmodified DFT definition.
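The figures quoted above follow from simple integer arithmetic. A minimal sketch, with the illustrative function names `pfa_mults` and `dit_mults`:

```c
#include <assert.h>

/* PFA complex multiplies: N times the sum of the factors, N = product. */
unsigned long pfa_mults(const unsigned *factors, int count)
{
    unsigned long n = 1, sum = 0;
    assert(factors && count > 0);
    for (int i = 0; i < count; i++) {
        n *= factors[i];
        sum += factors[i];
    }
    return n * sum;
}

/* Nominal radix-2 DIT complex multiplies: (N/2) * log2(N). */
unsigned long dit_mults(unsigned long n, unsigned log2n)
{
    return (n / 2) * log2n;
}
```

For the factors 3, 8, 11, 31 this gives 53 × 8184 = 433752, and dividing 8184 × 8184 by that count gives the factor of about 154.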
C28x FFT and related functions
Started out with sprc081.zip and eventually ended up in
c:\tidcs\c28\dsp_tbox. Moved to lab 6 directory.
Documented in FFT Library Module User’s Guide, C28x Foundation
Software, contained in fft_mdl.pdf. Essential reading.
Execution cycles by FFT size:
32-bit Real FFT
FFT size   Case 1: TF(Q31)   Case 2: TF(Q30)   Case 3: TF(Q30) & OTP
128        6509              6763              7017
256        14756             15394             16032
512        33081             34615             36149
1024       73422             77004             80536
32-bit Complex FFT
FFT size   Case 1: TF(Q31)   Case 2: TF(Q30)   Case 3: TF(Q30) & OTP
128        11159             11671             12183
256        25901             27181             28461
512        59075             62146             65217
1024       132823            139991            147159
Time and storage
1024 real using 60 MHz clock takes about 1.2 ms.
32-bit 1024 real requires 2048 16-bit words for data.
1024 complex using 60 MHz clock takes about 2.2 ms.
32-bit 1024 complex requires 4096 16-bit words for data.
The total RAM on the C28017 controlSTICK is 6K 16-bit words.
TI functions use DIT with input in bit-reversed addressing form.
The TI functions do not scale as part of the transform.
TI does not provide an inverse FFT for the C28x.
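The time and storage figures can be derived from the Case 1 cycle counts and the 32-bit data width. A small sketch; the function names are illustrative, and the 60 MHz clock is the one quoted above.

```c
#include <assert.h>

/* Execution time in milliseconds for a given cycle count and clock rate. */
double cycles_to_ms(unsigned long cycles, double clock_hz)
{
    assert(clock_hz > 0.0);
    return 1000.0 * (double)cycles / clock_hz;
}

/* 16-bit words needed to hold n_points 32-bit real or complex values. */
unsigned long data_words16(unsigned long n_points, int is_complex)
{
    unsigned long values = is_complex ? 2 * n_points : n_points;
    return 2 * values;    /* each 32-bit value occupies two 16-bit words */
}
```

`cycles_to_ms(73422, 60e6)` is a little over 1.2 ms and `cycles_to_ms(132823, 60e6)` a little over 2.2 ms. A 1024-point complex transform needs 4096 of the 6K (6144) words of RAM for data alone.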
FFT input scaling
Consider a solitary sinusoidal input where B-bit sample values
are placed into the low bits:
$$\cos(2\pi f_c t) = \frac{e^{j2\pi f_c t} + e^{-j2\pi f_c t}}{2}$$
For an N-value DFT the gain at the $f_c$ frequency (assuming it
matches an analysis frequency) is N. If a 1024-point transform
is taken then the result might require 10 + B − 1 bits.
Using the C28017’s 12-bit samples the maximum amplitude FFT
value is 2047 × 1024/2 = 1,048,064. This will fit in 21 bits.
Actually, a maximum-amplitude complex DC input is the worst-case
input waveform. Another “bad” waveform is a complex maximum-amplitude
square wave. The fundamental has amplitude 2/π instead of 1/2.
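The word-length estimate for the single sinusoid can be checked with integer arithmetic. A minimal sketch; `signed_bits_needed` and `sinusoid_peak` are illustrative names.

```c
#include <assert.h>

/* Two's complement bits needed to hold a value of magnitude `peak`. */
int signed_bits_needed(long peak)
{
    int bits = 1;                       /* sign bit */
    assert(peak >= 0);
    while (peak > 0) { peak >>= 1; bits++; }
    return bits;
}

/* Peak DFT bin magnitude for a real sinusoid: amplitude * N / 2. */
long sinusoid_peak(long amplitude, long n_points)
{
    return amplitude * n_points / 2;
}
```

For 12-bit samples (amplitude 2047) and N = 1024, `sinusoid_peak` gives 1048064 and `signed_bits_needed` reports 21 bits.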
Scaling when taking the IDFT
For the 12-bit single sine wave, using the DFT to compute the
IDFT will increase the word size by 10 bits. The result will fit
using a 32-bit word size. Simply transform then shift right by
10 bits.
Do we need 21 bits or 12 bits for the result? We started with 12.
What to do if there aren’t enough bits?
On a fixed point DSP computer floating point is not normally an
option. When simulating floating point, performance takes a big
hit.
One could scale the partial results by a factor of 2 for each FFT
layer. This is commonly done. It is conservative, often scaling
values more than necessary, which costs noise performance.
A hybrid fixed point / floating point technique termed block
floating point is often a viable option.
One can find code examples on the web both for TI and
Motorola DSP devices for both scaling procedures. Block
floating point scaling is well supported in the Motorola DSP devices.
Block floating point
I can write values as $m \times 2^c$ where $m$ is a two’s complement
fraction having magnitude less than one and $c$ is a two’s complement
integer.
0.25 can be written as $0.5 \times 2^{-1}$ or as $0.0625 \times 2^{2}$
16 can be written as $0.5 \times 2^{5}$ or as $0.03125 \times 2^{9}$
If we have an array of values all using the same value of $c$ we have
a set of values referred to as being in block floating point form.
In order to keep values fractional they must be scaled such that
the magnitude of the largest value is less than 1.
FFTs formed using block floating point are generally more accurate than
fixed point FFTs and less accurate than equivalent floating point ones.
The DSP56303 has hardware support that allows block floating
point FFTs to be formed only slightly slower than fixed point ones.
I don’t know how well the C5510 supports block floating point.
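Block normalization can be sketched as finding one shared exponent for a whole array. The routine below is an illustration only: it uses integer mantissas that fit in 16 bits rather than true fractions, the name `block_normalize` is hypothetical, and it does not use any DSP56303 or C5510 hardware feature.

```c
#include <assert.h>

/* Scale an array to 16-bit mantissas sharing one exponent c, so that
   in[i] is approximately mant[i] * 2^c. Returns c. */
int block_normalize(const long *in, short *mant, int n)
{
    long peak = 0;
    int c = 0;
    assert(in && mant && n > 0);

    /* Find the largest magnitude in the block. */
    for (int i = 0; i < n; i++) {
        long m = in[i] < 0 ? -in[i] : in[i];
        if (m > peak) peak = m;
    }

    /* Smallest shift that brings the peak into 16-bit range. */
    while ((peak >> c) > 32767) c++;

    /* Apply the common scale factor (division truncates toward zero). */
    for (int i = 0; i < n; i++)
        mant[i] = (short)(in[i] / (1L << c));
    return c;
}
```

For the block {40000, -100000, 12} the shared exponent is 2 and the mantissas are {10000, -25000, 3}. The small value loses precision because the exponent is set by the largest one.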
Singleton’s DFT speed-up procedure
In 1969 Singleton published a simple algorithm that reduces the
number of multiply operations for DFTs by a factor of four.
$$Z[k] = \sum_{n=0}^{N-1} W_N^{kn} z[n] = \sum_{n=0}^{N-1} \left( z_e[n] \cos(2\pi kn/N) - j\, z_o[n] \sin(2\pi kn/N) \right)$$
where $z_e[n]$ and $z_o[n]$ are the even and odd parts of $z[n]$.
Write $Z[k]$ in terms of even and odd parts as $Z[k] = Z_e[k] + Z_o[k]$.
$$Z_e[k] = \sum_{n=0}^{N-1} z_e[n] \cos(2\pi kn/N)$$
$$Z_o[k] = -j \sum_{n=0}^{N-1} z_o[n] \sin(2\pi kn/N)$$
Singleton’s procedure continued
For N odd we can write
$$Z_e[k] = z[0] + \sum_{n=1}^{(N-1)/2} (z[n] + z[N-n]) \cos(2\pi kn/N),$$
$$Z_o[k] = -j \sum_{n=1}^{(N-1)/2} (z[n] - z[N-n]) \sin(2\pi kn/N).$$
Because of symmetry the values of $Z_e[k]$ and $Z_o[k]$ need only be
computed for $0 \le k \le (N-1)/2$. Note that there were pairs of two’s
that cancelled out.
The above sums can be evaluated using
$$\text{multiplies} = 2\,\frac{N-1}{2}\,\frac{N-1}{2} + 2\,\frac{N-1}{2}\,\frac{N-1}{2} = (N-1)^2.$$
Depending on whether or not there are two ALUs and how they are
arranged, the multiplication of complex values by real values in a
single instruction time may be possible. This would result in the
reduction of the number of multiplies by another factor of two.
Singleton’s procedure completed
The even N case is left as an exercise (not assigned).
There is going to be a pass through the data to compute the
even and odd parts at the start of the procedure and a similar pass
at the end. This will add some additional overhead.
Depending on how the particular DSP architecture we are using
does things, a speed up of perhaps as much as 4 to 8 times may
be possible over brute force.
This works even if N is prime!
Is all sweetness and light?
Of course, not all is sweetness and light. There are many worries
associated with efficiently computing DFTs. Some of these are:
◮ It is not always possible to compute a DFT in-place. Quite
often it is necessary to swing between a pair of working
areas as one moves between layers.
◮ Does there exist code or at least an algorithm for efficiently
computing DFTs for the prime factors? There is always the
possibility of Singleton’s procedure at least dangling the
prospect of a four times speed up. However, better
speedups may be possible.
◮ The transformed values generally need to be reordered.
Permutation arrays are useful but these too consume
memory resources.
Three multiplier complex multiplication
In general (a+ jb)× (c + jd) = (ac − bd)+ j(bc + ad).
This can be written using three multiplications as
(a+jb)×(c+jd) = a(c−d)+(a−b)d+j[b(c+d)+(a−b)d] .
[Figure: signal flow graph of the three-multiplier complex multiplication.]
When multiplying by a constant, c + jd, the values c + d and c − d can be
obtained by table lookup.
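The three-multiplier form translates directly into code. A minimal sketch, with the illustrative name `cmul3`; the product (a − b)d is computed once and used in both the real and imaginary parts.

```c
#include <assert.h>

/* (a + jb) * (c + jd) using three real multiplications. */
void cmul3(double a, double b, double c, double d, double *re, double *im)
{
    double shared;
    assert(re && im);
    shared = (a - b) * d;           /* multiply 1, used twice      */
    *re = a * (c - d) + shared;     /* multiply 2: equals ac - bd  */
    *im = b * (c + d) + shared;     /* multiply 3: equals bc + ad  */
}
```

For example, (1 + j2)(3 + j4) gives −5 + j10, the same as the four-multiplier form. When c + jd is a fixed twiddle factor, c − d and c + d can be stored in place of c and d.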
C FFT — An old friend!
    /* Fast Fourier Transform Function (fft2)

       Adapted from:
          The Fast Fourier Transform and its Applications
          J. W. Cooley, P. A. Lewis, and P. D. Welch
          IEEE Transactions on Education, Vol. 12, No. 1,
          March 1969, pp 27-34.

       28Feb87 Converted to C .. K. Metzger
       06Feb91 High-C conversion .. K. Metzger

       Function forms the discrete Fourier transform of an array
       of double precision complex values. An integer power of
       two number of values is assumed to be contained in a huge
       array.

       void fft2(data, log2n, direction)

       data       huge pointer to double precision complex valued
                  data stored re,im,re,im,...
       log2n      int log base 2 of number of points to
                  transform. Allowed range is 1 thru NLIMIT.
       direction  int which is negative if going from time to
                  frequency (uses -sine and divides values by number
                  of complex values). If >=0 goes from frequency to
                  time.
    */
Bit reverse reorder the input
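    #include <math.h>

    static double pi = 0.0;   /* assumed file-scope variable; not shown on the slide */

    void fft2(double *data, int log2n, int direction)
    {
        unsigned n, i, j, el, le, le_half, to_freq;
        register unsigned val_i, rev_i;
        double *ptr1, *ptr2, temp, dbl_n, arg, t_re, t_im, u_re, u_im, w_re, w_im;

        if (pi == 0.0) pi = 4.0*atan(1.0);
        to_freq = (direction < 0) ? 1 : 0;
        dbl_n = (double)(n = 1 << log2n);

        /* Swap each value with the one at its bit-reversed index. */
        for (i = 1; i < n-1; i++) {
            val_i = i; rev_i = 0;
            for (j = 0; j < (unsigned)log2n; j++) {
                rev_i = (rev_i << 1) | (val_i & 0x0001);
                val_i >>= 1;
            }
            if (rev_i > i) {
                temp = *(ptr1 = data + (i << 1));         /* real parts */
                *ptr1 = *(ptr2 = data + (rev_i << 1));
                *ptr2++ = temp;
                temp = *(++ptr1);                         /* imaginary parts */
                *ptr1 = *ptr2;
                *ptr2 = temp;
            }
        }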
The C5510 has hardware to (hopefully) simplify this task.
Compute the FFT and maybe normalize
        for (el = 0; el < (unsigned)log2n; el++) {
            le = (le_half = 1 << el) << 1;
            u_re = 1.0; u_im = 0.0;
            w_re = cos(arg = pi/le_half);
            w_im = (to_freq) ? -sin(arg) : sin(arg);
            for (j = 0; j < le_half; j++) {
                for (i = j; i < n; i += le) {
                    ptr2 = (ptr1 = data + ((i + le_half) << 1)) + 1;
                    t_re = *ptr1*u_re - *ptr2*u_im;
                    t_im = *ptr1*u_im + *ptr2*u_re;
                    ptr2 = data + (i << 1);
                    *ptr1++ = *ptr2++ - t_re;
                    *ptr1 = *ptr2 - t_im;
                    *ptr2-- += t_im;
                    *ptr2 += t_re;
                }
                t_re = u_re;
                u_re = u_re*w_re - u_im*w_im;
                u_im = t_re*w_im + u_im*w_re;
            }
        }
        if (to_freq) {
            for (i = 0; i < n; i++) {
                *data++ /= dbl_n;
                *data++ /= dbl_n;
            }
        }
        return;
    }