Dept. for Speech, Music and Hearing
Quarterly Progress and Status Report
Spectrum analysis using the fast Fourier transform (FFT)
Liljencrants, J.
journal: STL-QPSR, volume: 10, number: 1, year: 1969, pages: 007-013
http://www.speech.kth.se/qpsr
B. SPECTRUM ANALYSIS USING THE FAST FOURIER TRANSFORM (FFT).
J. Liljencrants
0. Abstract
As part of a general speech analysis computer program a spectrum
analysis package using the FFT has been implemented. The input is
present as a time series of periodic samples from the signal. After
application of a suitable time window the corresponding discrete Fourier
transform (DFT) is computed, and finally the spectral magnitude is
determined. This paper will deal with some practical problems with this
procedure. In particular attention is given to an important limitation
on the time window imposed by the sampling theorem. Thus a dual sampling
theorem may be formulated, the effect of which does not seem to be
generally recognized.
Given a time series $x_k$, $0 \le k \le N-1$, the DFT is in general
defined as

$$A_r = \sum_{k=0}^{N-1} x_k \, e^{-2\pi j rk/N} \qquad (1)$$

The frequency series $A_r$ is elegantly and economically computed using
the FFT algorithm, ref. (1, 2). This series also contains N samples:
$0 \le r \le N-1$. These cover a frequency range from zero up to the
sampling frequency of the input time function. It is well known from the
sampling theorem that this time function must be band limited to half
the sampling frequency, i.e., the Nyquist frequency, prior to the
sampling.
The $x_k$ may be complex, but if we restrict ourselves to the special
case with a real time series then the $A_r$ are redundant in such a way
that $A_r$ is the complex conjugate to $A_{N-r}$. The upper half of the
$A_r$ series is thus a mirrored image of the lower half. In the
continuous Fourier transform the correspondent to the upper half of
$A_r$ is that part which extends towards negative frequencies. The first
step to preserve the computing economy is now to make use of the
imaginary part of $x_k$. This is done by putting

$$x_k = x_{Rk} + j\,x_{Ik}$$

where $x_{Rk}$ and $x_{Ik}$ are two (independent) real time series.
STL-QPSR 1/1969 8.
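After the transformation their respective spectra may be separated
by proper selection of the odd and even parts of $A_r$:

$$A_{Rr} = \tfrac{1}{2}\left(A_r + A^{*}_{N-r}\right), \qquad A_{Ir} = \tfrac{1}{2j}\left(A_r - A^{*}_{N-r}\right)$$

Here $A_{Rr}$ and $A_{Ir}$ are the spectra of the real series carried in
the real and imaginary parts of $x_k$ respectively, and $^{*}$ denotes
the complex conjugate.
In this way the FFT is fully utilized by computing two spectra at a time.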
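The packing of two real series into one complex transform can be tried out with a minimal Python sketch. It is an illustration only, not the original machine-code routine: a naive O(N^2) DFT stands in for the FFT, the sign of the exponent is an assumed convention, and the names `dft` and `two_real_spectra` are hypothetical.

```python
import cmath

def dft(x):
    """Naive DFT, A_r = sum_k x_k exp(-2*pi*j*r*k/N). Stands in for an FFT."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * r * k / n) for k in range(n))
            for r in range(n)]

def two_real_spectra(x_re, x_im):
    """Transform two real series with one complex DFT, then separate them."""
    n = len(x_re)
    # Put the first series in the real part and the second in the imaginary part.
    a = dft([complex(p, q) for p, q in zip(x_re, x_im)])
    spec_re, spec_im = [], []
    for r in range(n):
        mirror = a[(n - r) % n].conjugate()   # conj(A_{N-r}); index N wraps to 0
        spec_re.append((a[r] + mirror) / 2)   # spectrum of x_re
        spec_im.append((a[r] - mirror) / 2j)  # spectrum of x_im
    return spec_re, spec_im

# Two arbitrary real test series (illustrative values only).
x1 = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, -2.0, 1.5]
x2 = [0.0, -1.0, 4.0, 2.0, 1.0, 0.5, -0.5, 2.5]
s1, s2 = two_real_spectra(x1, x2)
```

Each of `s1` and `s2` should agree with the transform of the corresponding series taken on its own, which is what makes the single complex transform worth two real ones.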
Let us now consider the window on the time series. The simplest
thing would be just to pick N consecutive time samples of the input and
use them as $x_k$. This means a rectangular time window of length N. Each
spectral component of the input will give a contribution to the transform.
This contribution has a shape that equals the transform of the time
window. The rectangular time window will thus give the familiar sin x/x
Fourier kernel in the frequency domain. Suppose for ease of
visualization that the $x_k$ are constant. Then the term $A_0$ in the
transform will be non-zero, but all the other $A_r$ will fall on those
frequencies where the sin x/x function goes through zero. Similarly, if
the input is sinusoidal with a frequency corresponding exactly to one of
the $A_r$ terms, only this term will be non-zero in the transform. We
thus seemingly have an ideally high frequency resolution. In real life,
however, there is nothing to prevent the input from containing frequency
components different from the discrete set of N/2 frequencies
represented by $A_r$. With a sinusoidal input the worst case would be an
odd number of half cycles within the window, and the $A_r$ will then
sample the sin x/x function at all its peaks except the essential one at
the center.
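The contrast between a sinusoid that falls exactly on one of the transform frequencies and one with an odd number of half cycles can be checked numerically. The sketch below is illustrative only: N = 32, the 4 and 4.5 cycle inputs, and the 1 % threshold are arbitrary assumptions, and a naive DFT replaces the FFT.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitudes of the naive DFT of x (rectangular window of length len(x))."""
    n = len(x)
    return [abs(sum(x[k] * cmath.exp(-2j * math.pi * r * k / n) for k in range(n)))
            for r in range(n)]

def count_significant(magnitudes, fraction=0.01):
    """Number of terms exceeding the given fraction of the largest term."""
    peak = max(magnitudes)
    return sum(1 for m in magnitudes if m > fraction * peak)

N = 32
on_bin = [math.cos(2 * math.pi * 4.0 * k / N) for k in range(N)]   # exactly 4 cycles
off_bin = [math.cos(2 * math.pi * 4.5 * k / N) for k in range(N)]  # 9 half cycles

mag_on = dft_magnitudes(on_bin)
mag_off = dft_magnitudes(off_bin)

print("significant terms, on-bin input: ", count_significant(mag_on))
print("significant terms, off-bin input:", count_significant(mag_off))
```

For the on-bin input only the term at r = 4 and its mirror image at r = N - 4 stand out, whereas the off-bin input spreads energy over many terms.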
In the analysis of stationary signals this effect can be reduced by
averaging spectra taken at different times. In the case of speech analysis
this method is generally out of the question because the signal is not
stationary. On the contrary, one is mostly interested in keeping the
product of the time and frequency window lengths to a minimum. A natural
choice would then be the Gaussian window often approximated in analog
spectrum analyzers. A time limited approximation to this, suited for
computer applications, is the "Hamming window". This consists of an
inverted one period cosine bell on a pedestal. The pedestal is designed
to give side lobes in the transform that are about 40 dB down from the
main lobe, independent of the frequency distance from the main lobe.
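The description "inverted one period cosine bell on a pedestal" can be written out as a short Python sketch. The pedestal height of 0.08 is the conventional Hamming value and is an assumption here, since the text does not state the figure used; the function name `hamming` is likewise only illustrative.

```python
import math

def hamming(n, pedestal=0.08):
    """Inverted one-period cosine bell of height (1 - pedestal) on a pedestal.

    With pedestal = 0.08 this equals 0.54 - 0.46*cos(2*pi*k/(n-1)).
    """
    bell_height = 1.0 - pedestal
    return [pedestal + bell_height * 0.5 * (1.0 - math.cos(2.0 * math.pi * k / (n - 1)))
            for k in range(n)]

# A 256-sample window, i.e. 12.8 msec at a 20 kHz sampling frequency.
w = hamming(256)
windowed = [wk * 1.0 for wk in w]   # multiply the signal samples (here all 1.0) by w
```

The window starts and ends at the pedestal value and rises to about 1 in the middle, so the edges of the analyzed segment are attenuated but never fully suppressed.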
We can now for a moment consider a practical example. Let the input
signal be a train of short pulses having a slowly rising repetition
frequency, going from, say, 80 Hz up to 240 Hz. After low pass filtering
to 10 kHz we sample the signal with a 20 kHz sampling frequency. Let us
then compute spectra of the signal at 10 msec intervals, the base length
of the time (Hamming) window being 12.8 msec, i.e. $2^8 = 256$ samples.
Using this same number for N in Eq. (1) will give a frequency spacing
between the $A_r$ in the transform of 20,000/256 = 78.1 Hz which may be a
reasonable resolution for speech work. Fig. I-B-1 shows the result of
this experiment in the form of a time-frequency-intensity spectrogram
(the spectrogram is plotted with an ink-jet recorder, see (3)). We
observe that when the pulse repetition frequency is below 160 Hz
something is being resolved in the spectrogram, but it is certainly not
the harmonics of the input since the bars go in the wrong direction with
a rising fundamental. When the repetition frequency exceeds 160 Hz the
picture starts to make sense, however. On second thought one will
realize that the peculiar phenomena are the result of violating the
sampling theorem conditions. In order to resolve the structure of
equidistant spectral lines in the discrete spectrum there must be at
least two samples for each spectral line. In order not to conflict with
this we must impose a limitation on the time series so that it cannot
produce a spectrum with too closely spaced harmonic spectral lines. Thus
the time series must not contain a period longer than N/2 samples where
N is the number of samples transformed. A means to avoid this is to
allow the time window to occupy not more than half the length of the
time series, while the rest of the time series is to be zero.
We can thus formulate a duality to the sampling theorem:
To represent a sampled time signal with an N-point (two-sided)
discrete spectrum, the signal must be time limited (low pass "liftered")
to N/2 samples before the spectrum is computed.
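The arithmetic behind the 160 Hz limit, and the zero filling that the dual theorem calls for, can be sketched in Python as follows. The helper names `harmonic_resolution` and `zero_pad` are hypothetical; the figures are those of the example above (20 kHz sampling, a 256-sample window).

```python
def harmonic_resolution(fs_hz, n_transform):
    """Return (frequency spacing, lowest resolvable fundamental, longest period).

    Two spectral samples per harmonic line means the fundamental must be at
    least twice the spacing, i.e. the period at most n_transform/2 samples.
    """
    spacing = fs_hz / n_transform
    min_fundamental = 2.0 * spacing
    max_period_samples = n_transform / 2
    return spacing, min_fundamental, max_period_samples

def zero_pad(windowed, n_transform):
    """Append zeros so that the window occupies at most half the time series."""
    if len(windowed) > n_transform // 2:
        raise ValueError("window must not exceed half the transform length")
    return list(windowed) + [0.0] * (n_transform - len(windowed))

# N equal to the window length: harmonics resolved only above about 156 Hz.
print(harmonic_resolution(20000.0, 256))
# N twice the window length: the limit drops to about 78 Hz.
print(harmonic_resolution(20000.0, 512))

series = zero_pad([1.0] * 256, 512)   # 256 window samples followed by 256 zeros
```

With N = 256 the limit of 156 Hz is consistent with the 160 Hz boundary seen in the spectrogram; doubling N to 512 brings the limit below the 80 Hz starting frequency of the pulse train.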
As a consequence of this we first may note that the shape of the time
window is rather important. For instance a rectangular window of length
N/2 will have a transform with its zeros placed two frequency samples
apart, and a magnitude falling off only in inverse proportion to the
frequency distance from the main lobe. This "smearing" of the spectrum
is now independent of the frequency.
Secondly we observe that the computing economy of the FFT is adversely
affected since half the number of time samples always have to
Fig. I-B-2. Spectrum sections of a synthetic neutral vowel with $F_0$ = 105 Hz. Time window length is 256 samples. Top spectrum computed with N=256, middle and bottom with N=512. In all cases only the lower half of the computed spectrum samples are shown.
Fig. I-B-3. Spectrograms of a natural (top) and a synthetic (bottom) version of the sentence "Did you extinguish the fire?".
Fig. I-B-4. Same material as in Fig. I-B-3 (only the first three words) plotted as "mountains". Frequency scale goes from right to left and time from front to rear.
The comment will be that the pitch-synchronous spectrum analysis
is a good device to suppress the adverse effects of harmonics disturbing
the spectrum formant envelope. But having taken the trouble to define
the pitch periods it is doubtful that there is more to gain by further
emphasizing the formant peaks.
2. Program particulars
Prior to computation the input is supposed to be present on the disk
storage. A sampling frequency of 20 kHz is assumed. Spectra are computed
at time increments of 200 samples, corresponding to 10 msec. Beginning
and end of subsections of the material to be processed may be marked by
the operator who can monitor the waveform on a display screen. The total
time available is 6 seconds. The operator can view 25 or 8 msec of the
wave and he can move along the time axis by turning rotary knobs on a
control unit. Also individual pitch periods may be marked.
The time window used may be either rectangular with limits determined
by operator marks, or a Hamming window. In the latter case the base
length of the window may be selected to be 12.8, 6.4, or 3.2 msec
(256, 128, or 64 samples).
The transformation with the FFT subprogram is performed with
$N = 2^9 = 512$ and uses the feature of processing two separate spectra
simultaneously.
The spectral magnitude is computed as the sum of the squares of the
real and imaginary parts. This addition has to be done in double
precision to preserve the dynamic range. The result is finally converted
to dB using a log routine. Each of the two computed spectra now occupies
256 points. In order to compress the data they are now reduced to half
the number. Every pair of data points is replaced by a single point, the
value of which is set to the larger one of the pair.
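The magnitude, dB conversion and pairwise reduction steps can be sketched in Python as below. Floating point replaces the original fixed-point arithmetic, and the `floor` guard against taking the log of zero is an added assumption; the function names are illustrative.

```python
import math

def power_db(spectrum, floor=1e-12):
    """Sum of squares of real and imaginary parts, converted to dB.

    `floor` is a hypothetical guard that avoids log(0) for empty bins.
    """
    return [10.0 * math.log10(max(a.real * a.real + a.imag * a.imag, floor))
            for a in spectrum]

def reduce_by_max(values, group=2):
    """Replace every `group` consecutive points by the largest of them."""
    return [max(values[i:i + group]) for i in range(0, len(values), group)]

# Illustrative 8-point half spectrum (arbitrary values).
half_spectrum = [3 + 4j, 1 + 0j, 0 + 2j, 5 + 0j, 0.1 + 0j, 0 + 0.2j, 1 + 1j, 2 - 2j]
levels = power_db(half_spectrum)
reduced = reduce_by_max(levels)      # 8 points reduced to 4
```

Applied to a 256-point spectrum, `reduce_by_max` with the default group of two yields the 128 output points described above.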
This is apparently a waste of computational effort. The primary reason
for it is to minimize aliasing errors due to the beat between the
spectral harmonics and frequency step between data points. Another
reason is that the analysis bandwidth can be controlled keeping the time
integration constant (= time window length). Thus the operator may
select higher analysis bandwidths, and the output data points are then
taken as the larger of the closest 3, 4, 5, or 6 points in the computed
spectrum. The resultant nominal analysis bandwidths are basically 80 Hz,
and optionally 120, 160, 200, and 240 Hz. This way the operator has a
two-dimensional control over bandwidth and integration time. It is
operationally roughly equivalent to the procedure used in the analysis
with band pass filters followed by rectifiers and integrating low pass
filters. But of course the "smearing" is then made in the time dimension
instead of the frequency dimension as in the program.
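The nominal bandwidths quoted above appear to follow from the number of grouped points times the frequency spacing of the 512-point transform. That reading is an interpretation rather than something the text states, and the small Python sketch below only checks the arithmetic under that assumption.

```python
def grouped_bandwidths(fs_hz, n_transform, group_sizes):
    """Bandwidth spanned by each group size, as group size times bin spacing."""
    spacing = fs_hz / n_transform          # 20000/512 = 39.0625 Hz
    return [g * spacing for g in group_sizes]

groups = [2, 3, 4, 5, 6]
nominal = [80, 120, 160, 200, 240]          # values quoted in the text
computed = grouped_bandwidths(20000.0, 512, groups)

for g, bw, nom in zip(groups, computed, nominal):
    print(f"{g} points: {bw:.1f} Hz (nominal {nom} Hz)")
```

Each computed value lies within a few Hz of the corresponding nominal figure.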
3. The FFT subroutine
The FFT algorithm and some associated service routines are set together
in a subroutine. This is called with control parameters contained in the
computer arithmetic registers, Q and A. The Q register contains a mode
number:
(Q) = 0. Initialize mode. (A) = log2(N), and (calling instruction + 1) =
address of array. Internal constants are set up, and an auxiliary table
containing N/4 + 1 points on a cosine arc is computed and put immediately
after the array to be transformed.
(Q) = 1. Direct/inverse select. If (A) = 0 the direct transform is
selected, if (A) ≠ 0 a complement instruction is inserted in the FFT
algorithm to render the inverse transform.
(Q) = 2. Execute transformation. After the computation the data points
are stored in bit-reversed order. Before exit they are "unshuffled" into
order. Both the transformation and the unshuffling are done in place,
i.e., the result replaces the input and no intermediate storage area is
used.
(Q) = 3. Separate odd and even parts of complex spectrum into two
separate complex spectra. This is applicable when two real series are
transformed simultaneously. The separation is done in place.
(Q) = 4. Merge into odd and even parts. The inverse to the previous
operation.
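The in-place "unshuffling" from bit-reversed order mentioned under mode 2 can be illustrated with a minimal Python sketch. It shows the general technique only; the original routine is CDC 1700 machine code and its details are not given in the text. The name `bit_reverse_in_place` is hypothetical.

```python
def bit_reverse_in_place(data):
    """Permute `data` (length a power of 2) between natural and bit-reversed order.

    Elements are swapped pairwise, so no intermediate storage area is needed.
    """
    n = len(data)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    bits = n.bit_length() - 1
    for i in range(n):
        # Reverse the `bits`-bit binary representation of index i.
        j = int(format(i, "b").zfill(bits)[::-1], 2) if bits else 0
        if j > i:                      # swap each pair only once
            data[i], data[j] = data[j], data[i]
    return data

shuffled = bit_reverse_in_place(list(range(8)))
restored = bit_reverse_in_place(list(shuffled))
```

Because the permutation is its own inverse, the same routine serves both to shuffle and to unshuffle.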
The subroutine is written in machine code for the CDC 1700 which has
a 16 bit word length. All computations are done in single-precision
integer arithmetic.
In the FFT algorithm the principle of decimation in frequency is used.
To preserve a reasonably homogeneous numerical magnitude in both the
time and frequency series a rescaling by the factor $1/\sqrt{2}$ is
performed in each complex multiplication. The effect of this is to
introduce the factor $1/\sqrt{N}$ into the right member of expression
(1). Then the direct and inverse transforms will differ only by the sign
in the exponential. A test with N = 512 indicated the useful dynamic
range to be better than 60 dB.
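A radix-2 decimation-in-frequency transform with a $1/\sqrt{2}$ rescaling in every butterfly can be sketched in Python as follows. This is a floating-point illustration under stated assumptions (negative exponent for the direct transform, rescaling applied to both butterfly outputs), not a reconstruction of the 16-bit integer routine; `fft_dif_scaled` is a hypothetical name.

```python
import cmath
import math

def fft_dif_scaled(x, inverse=False):
    """Radix-2 decimation-in-frequency FFT, scaled by 1/sqrt(2) per stage.

    After log2(N) stages the overall factor is 1/sqrt(N), so the direct and
    inverse transforms differ only by the sign of the exponent.
    """
    a = [complex(v) for v in x]
    n = len(a)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    sign = 1.0 if inverse else -1.0
    scale = 1.0 / math.sqrt(2.0)
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(sign * 2j * math.pi * k / (2 * span))
                u, v = a[start + k], a[start + k + span]
                a[start + k] = (u + v) * scale
                a[start + k + span] = (u - v) * w * scale
        span //= 2
    # The result is in bit-reversed order; unshuffle it in place.
    bits = n.bit_length() - 1
    for i in range(n):
        j = int(format(i, "b").zfill(bits)[::-1], 2) if bits else 0
        if j > i:
            a[i], a[j] = a[j], a[i]
    return a

signal = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, -2.0, 1.5]
spectrum = fft_dif_scaled(signal)
recovered = fft_dif_scaled(spectrum, inverse=True)
```

Running the routine a second time with `inverse=True` should return the original series without any further normalization, which is the symmetry the rescaling is meant to provide.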
Also, for N = 512, the execution time of the FFT and the unshuffling
is 560 msec, closely proportional to $N \log_2 N$ for other N being
powers of 2. The initializing takes 27 msec (only needed once) and the
separation and merge routines take 24 msec each. The latter execution
times are proportional to N.
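The stated proportionalities allow a rough extrapolation to other transform lengths. The Python sketch below scales the N = 512 figures accordingly; the resulting numbers are estimates derived from the quoted timings, not measurements reported in the text, and the function name is hypothetical.

```python
def estimated_times_msec(n):
    """Scale the N = 512 timings: FFT as N*log2(N), separation or merge as N."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of 2, at least 2")
    log2_n = n.bit_length() - 1
    fft = 560.0 * (n * log2_n) / (512 * 9)   # FFT plus unshuffling
    separate = 24.0 * n / 512                # odd-even separation (or merge)
    return fft, separate

for n in (128, 256, 512, 1024):
    fft_ms, sep_ms = estimated_times_msec(n)
    print(f"N = {n:4d}: FFT about {fft_ms:6.1f} msec, separation about {sep_ms:4.1f} msec")
```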
Program length (CDC 1700):
FFT algorithm                        80
Bit reversal and unshuffling         80
Initiation                           80
Odd-even separation and merge       100
Administration                       36
Constants and temps.                 34
Total                               410 words
References:
(1) Cooley, J. W., Tukey, J. W.: "An Algorithm for the Machine Calculation of Complex Fourier Series", Math. of Comput., 19 (1965), pp. 297-301.
(2) IEEE, AE Subcommittee on Measurement Concepts, Helms, H. D., chairman: "What is the Fast Fourier Transform?", IEEE Transact., AU-15 (June 1967), special issue on FFT, and Proc. IEEE, 55 (1967), pp. 1664-1674.
(3) Liljencrants, J.: "Use of an Ink-Jet Recorder as a Computer Plotter", STL-QPSR 1/1968, pp. 27-29.