
Dept. for Speech, Music and Hearing

Quarterly Progress and Status Report

Spectrum analysis using the fast Fourier transform (FFT)

Liljencrants, J.

journal: STL-QPSR; volume: 10; number: 1; year: 1969; pages: 007-013

http://www.speech.kth.se/qpsr

http://www.speech.kth.se


B. SPECTRUM ANALYSIS USING THE FAST FOURIER TRANSFORM (FFT).

J. Liljencrants

0. Abstract

As part of a general speech analysis computer program a spectrum analysis package using the FFT has been implemented. The input is present as a time series of periodic samples from the signal. After application of a suitable time window the corresponding discrete Fourier transform (DFT) is computed, and finally the spectral magnitude is determined.

This paper will deal with some practical problems with this procedure. In particular, attention is given to an important limitation on the time window imposed by the sampling theorem. Thus a dual sampling theorem may be formulated, the effect of which does not seem to be generally recognized.

Given a time series $x_k$, $0 \le k \le N-1$, the DFT is in general defined as

$$A_r = \sum_{k=0}^{N-1} x_k \, e^{-2\pi j\, rk/N}, \qquad 0 \le r \le N-1. \tag{1}$$

The frequency series $A_r$ is elegantly and economically computed using the FFT algorithm, ref. (1, 2). This series also contains N samples: $0 \le r \le N-1$. These cover a frequency range from zero up to the sampling frequency of the input time function. It is well known from the sampling theorem that this time function must be band limited to half the sampling frequency, i.e., the Nyquist frequency, prior to the sampling.

The $x_k$ may be complex, but if we restrict ourselves to the special case with a real time series then the $A_r$ are redundant in such a way that $A_r$ is the complex conjugate to $A_{N-r}$. The upper half of the $A_r$ series is thus a mirrored image of the lower half. In the continuous Fourier transform the correspondent to the upper half of $A_r$ is that part which extends towards negative frequencies. The first step to preserve the computing economy is now to make use of the imaginary part of $x_k$. This is done by putting

$$x_k = x_{Rk} + j\, x_{Ik},$$

where $x_{Rk}$ and $x_{Ik}$ are two (independent) real time series.

STL-QPSR 1/1969 8.

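After the transformation their respective spectra may be separated by proper selection of the odd and even parts of $A_r$:

$$A_{Rr} = \tfrac{1}{2}\left(A_r + A^{*}_{N-r}\right), \qquad A_{Ir} = \tfrac{1}{2j}\left(A_r - A^{*}_{N-r}\right),$$

where $A_{Rr}$ and $A_{Ir}$ denote the spectra of the two real series $x_{Rk}$ and $x_{Ik}$, the asterisk denotes the complex conjugate, and the index $N-r$ is taken modulo N. In this way the FFT is fully utilized by computing two spectra at a time.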
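The packing and separation of two real series can be sketched as follows. This is only an illustration, not the author's CDC 1700 code: it assumes the usual negative-exponent DFT convention, uses a direct O(N^2) DFT in place of the FFT, and the names `dft` and `two_real_spectra` are hypothetical.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT, A_r = sum_k x_k exp(-2*pi*j*r*k/N) (assumed sign convention)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * r * k / n) for k in range(n))
            for r in range(n)]

def two_real_spectra(x_re, x_im):
    """Transform two real series with one complex DFT and separate their spectra."""
    n = len(x_re)
    a = dft([complex(p, q) for p, q in zip(x_re, x_im)])
    spec_re, spec_im = [], []
    for r in range(n):
        mirror = a[(n - r) % n].conjugate()   # conjugate of A_{N-r}, index modulo N
        spec_re.append((a[r] + mirror) / 2)   # spectrum of the series in the real part
        spec_im.append((a[r] - mirror) / 2j)  # spectrum of the series in the imaginary part
    return spec_re, spec_im

# Two arbitrary real test series (illustrative values only).
x1 = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, -2.0, 1.5]
x2 = [0.0, -1.0, 4.0, 2.0, 1.0, -0.5, 2.5, -3.0]
s1, s2 = two_real_spectra(x1, x2)
```

Each of `s1` and `s2` should agree with the DFT of the corresponding series computed on its own, so one complex transform does the work of two real ones.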

Let us now consider the window on the time series. The simplest thing would be just to pick N consecutive time samples of the input and use them as $x_k$. This means a rectangular time window of length N. Each spectral component of the input will give a contribution to the transform. This contribution has a shape that equals the transform of the time window. The rectangular time window will thus give the familiar sin x/x Fourier kernel in the frequency domain. Suppose for ease of visualization that the $x_k$ are constant. Then the term $A_0$ in the transform will be non-zero, but all the other $A_r$ will fall on those frequencies where the sin x/x function goes through zero. Similarly, if the input is sinusoidal with a frequency corresponding exactly to one of the $A_r$ terms, only this term will be non-zero in the transform. We thus seemingly have an ideally high frequency resolution. In real life, however, there is nothing to prevent the input from containing frequency components different from the discrete set of N/2 frequencies represented by $A_r$. With a sinusoidal input the worst case would be an odd number of half cycles, and the $A_r$ will then sample the sin x/x function at all its peaks except the essential one at the center.
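The two cases described above can be reproduced with a small numerical sketch. The transform length N = 32 and the test frequencies (4 and 4.5 cycles per window) are illustrative assumptions, and a direct DFT stands in for the FFT.

```python
import cmath
import math

N = 32  # illustrative transform length, not taken from the paper

def dft_magnitudes(x):
    """Magnitudes |A_r| for r = 0..N/2 of a direct DFT (rectangular window)."""
    n = len(x)
    return [abs(sum(x[k] * cmath.exp(-2j * math.pi * r * k / n) for k in range(n)))
            for r in range(n // 2 + 1)]

def sinusoid(cycles):
    """Cosine with the given number of cycles inside the N-sample window."""
    return [math.cos(2 * math.pi * cycles * k / N) for k in range(N)]

on_bin = dft_magnitudes(sinusoid(4.0))   # frequency coincides with A_4
off_bin = dft_magnitudes(sinusoid(4.5))  # odd number of half cycles: worst case
```

With 4 cycles only `on_bin[4]` is non-zero. With 4.5 cycles every term in `off_bin` is non-zero and the energy is split between the two neighbouring terms, which is the leakage the text warns about.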

In the analysis of stationary signals this effect can be reduced by averaging spectra taken at different times. In the case of speech analysis this method is generally out of the question because the signal is not stationary. On the contrary, one is mostly interested in keeping the product of the time and frequency window lengths to a minimum. A natural choice would then be the Gaussian window often approximated in analog spectrum analyzers. A time limited approximation to this, suited for computer applications, is the "Hamming window". This consists of an inverted one period cosine bell on a pedestal. The pedestal is designed to give side lobes in the transform that are about 40 dB down from the main lobe, independent of the frequency distance from the main lobe.
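The side lobe level can be checked numerically. The sketch below assumes the common Hamming coefficients 0.54 and 0.46 (the paper does not state the values it used), a 64-sample window and a 1024-point zero-padded direct DFT; all function names are hypothetical.

```python
import cmath
import math

def hamming(n):
    """Inverted cosine bell on a pedestal: 0.08 at the ends, close to 1.0 at the center."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def spectrum_db(window, nfft):
    """Zero-padded DFT magnitude in dB relative to bin 0, for bins 0..nfft/2."""
    mags = []
    for r in range(nfft // 2 + 1):
        a = sum(w * cmath.exp(-2j * math.pi * r * k / nfft)
                for k, w in enumerate(window))
        mags.append(abs(a))
    return [20 * math.log10(max(m, 1e-12) / mags[0]) for m in mags]

def peak_sidelobe_db(db):
    """Walk down the main lobe to its first minimum, then take the largest value beyond it."""
    r = 0
    while r < len(db) - 1 and db[r + 1] < db[r]:
        r += 1
    return max(db[r + 1:])

rect_level = peak_sidelobe_db(spectrum_db([1.0] * 64, 1024))
hamming_level = peak_sidelobe_db(spectrum_db(hamming(64), 1024))
```

Under these assumptions the rectangular window has its highest side lobe roughly 13 dB below the main lobe, while the Hamming window keeps all side lobes more than 40 dB down, at the price of a wider main lobe.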

We can now for a moment consider a practical example. Let the input signal be a train of short pulses having a slowly rising repetition frequency, going from, say, 80 Hz up to 240 Hz. After low pass filtering to 10 kHz we sample the signal with a 20 kHz sampling frequency. Let us then compute spectra of the signal at 10 msec intervals, the base length of the time (Hamming) window being 12.8 msec, i.e. $2^8 = 256$ samples. Using this same number for N in Eq. (1) will give a frequency spacing between the $A_r$ in the transform of 20,000/256 = 78.1 Hz, which may be a reasonable resolution for speech work. Fig. I-B-1 shows the result of this experiment in the form of a time-frequency-intensity spectrogram (the spectrogram is plotted with an ink-jet recorder, see (3)). We observe that when the pulse repetition frequency is below 160 Hz something is being resolved in the spectrogram, but it is certainly not the harmonics of the input since the bars go in the wrong direction with a rising fundamental. When the repetition frequency exceeds 160 Hz the picture starts to make sense, however.

On second thought one will realize that the peculiar phenomena are the result of violating the sampling theorem conditions. In order to resolve the structure of equidistant spectral lines in the discrete spectrum there must be at least two samples for each spectral line. In order not to conflict with this we must impose a limitation on the time series so that it cannot produce a spectrum with too closely spaced harmonic spectral lines. Thus the time series must not contain a period longer than N/2 samples, where N is the number of samples transformed. A means to avoid this is to allow the time window to occupy not more than half the length of the time series, while the rest of the time series is to be zero. We can thus formulate a duality to the sampling theorem:

To represent a sampled time signal with an N-point (two-sided) discrete spectrum, the signal must be time limited (low pass "liftered") to N/2 samples before the spectrum is computed.
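The numbers in this example follow from simple arithmetic, sketched below. The 20 kHz sampling frequency and N = 256 are from the text; the helper names are hypothetical.

```python
fs = 20000.0  # sampling frequency in Hz (from the text)
N = 256       # number of samples transformed (from the text)

bin_spacing = fs / N        # spacing between the A_r in Hz
min_f0 = 2.0 * fs / N       # lowest fundamental whose period fits in N/2 samples

def period_in_samples(f0):
    """Length of one pitch period in samples at the sampling frequency fs."""
    return fs / f0

def harmonics_resolvable(f0, n_transform):
    """Dual sampling theorem: the period must not exceed n_transform/2 samples."""
    return period_in_samples(f0) <= n_transform / 2
```

Here `bin_spacing` is 78.125 Hz and `min_f0` is 156.25 Hz, which agrees with the roughly 160 Hz limit seen in the spectrogram. An 80 Hz pulse train has a period of 250 samples, so it fails the condition for N = 256 but satisfies it for N = 512.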

As a consequence of this we first may note that the shape of the time window is rather important. For instance a rectangular window of length N/2 will have a transform with its zeros placed two frequency samples apart, and a magnitude falling off only in inverse proportion to the frequency. This "smearing" of the spectrum is now independent of the frequency.
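The behaviour of the half-length rectangular window can be verified directly. The sketch assumes N = 32 and a direct DFT; the variable names are illustrative only.

```python
import cmath
import math

N = 32  # illustrative transform length
# Rectangular window occupying the first N/2 samples, the rest set to zero.
x = [1.0] * (N // 2) + [0.0] * (N // 2)

# Magnitudes |A_r| for r = 0..N/2.
mags = [abs(sum(x[k] * cmath.exp(-2j * math.pi * r * k / N) for k in range(N)))
        for r in range(N // 2 + 1)]

# Closed form for odd r: |A_r| = 1 / sin(pi*r/N), roughly N/(pi*r) for small r.
expected_odd = {r: 1.0 / math.sin(math.pi * r / N) for r in range(1, N // 2, 2)}
```

The even-numbered terms other than `mags[0]` vanish, so the zeros are two frequency samples apart, and the odd-numbered terms decay approximately as 1/r.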

Secondly we observe that the computing economy of the FFT is adversely affected since half the number of time samples always have to

Fig. I-B-2. Spectrum sections of a synthetic neutral vowel with F0 = 105 Hz. Time window length is 256 samples. Top spectrum computed with N = 256, middle and bottom with N = 512. In all cases only the lower half of the computed spectrum samples are shown.

Fig. I-B-3. Spectrograms of a natural (top) and a synthetic (bottom) version of the sentence "Did you extinguish the fire?".

Fig. I-B-4. Same material as in Fig. I-B-3 (only the first three words) plotted as "mountains". Frequency scale goes from right to left and time from front to rear.

The comment will be that the pitch-synchronous spectrum analysis is a good device to suppress the adverse effects of harmonics disturbing the spectrum formant envelope. But having taken the trouble to define the pitch periods it is doubtful that there is more to gain by further emphasizing the formant peaks.

2. Program particulars

Prior to computation the input is supposed to be present on the disk storage. A sampling frequency of 20 kHz is assumed. Spectra are computed at time increments of 200 samples, corresponding to 10 msec. Beginning and end of subsections of the material to be processed may be marked by the operator, who can monitor the waveform on a display screen. The total time available is 6 seconds. The operator can view 25 or 8 msec of the wave and can move along the time axis by turning rotary knobs on a control unit. Also individual pitch periods may be marked.

The time window used may be either rectangular with limits determined by operator marks, or a Hamming window. In the latter case the base length of the window may be selected to be 12.8, 6.4, or 3.2 msec (256, 128, or 64 samples).

The transformation with the FFT subprogram is performed with $N = 2^9 = 512$ and uses the feature of processing two separate spectra simultaneously.

The spectral magnitude is computed as the sum of the squares of the real and imaginary parts. This addition has to be done in double precision to preserve the dynamic range. The result is finally converted to dB using a log routine. Each of the two computed spectra now occupies 256 points. In order to compress the data they are now reduced to half the number. Every pair of data points is replaced by a single point, the value of which is set to the larger one of the pair.
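The magnitude computation and the data reduction can be sketched as below. This is a floating point illustration, not the double-precision integer arithmetic of the original program; the function names and the small floor that guards the logarithm against zero are assumptions.

```python
import math

def power_db(spectrum, floor=1e-12):
    """Sum of squares of real and imaginary parts, converted to dB.

    The floor is an assumed safeguard so that log10 never receives zero.
    """
    return [10.0 * math.log10(max(a.real ** 2 + a.imag ** 2, floor))
            for a in spectrum]

def reduce_by_max(values, group=2):
    """Replace every group of consecutive points by the largest one in the group."""
    return [max(values[i:i + group]) for i in range(0, len(values), group)]

# Small illustrative inputs.
example_db = power_db([3 + 4j, 1 + 0j])
reduced = reduce_by_max([1, 5, 2, 3, 9, 0])
```

Applied to a 256-point spectrum, `reduce_by_max` with `group=2` returns 128 points, matching the halving described in the text.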

This is apparently a waste of computational effort. The primary reason for it is to minimize aliasing errors due to the beat between the spectral harmonics and the frequency step between data points. Another reason is that the analysis bandwidth can be controlled while keeping the time integration constant (= time window length). Thus the operator may select higher analysis bandwidths, and the output data points are then taken as the larger of the closest 3, 4, 5, or 6 points in the computed spectrum. The resultant nominal analysis bandwidths are basically 80 Hz, and optionally 120, 160, 200, and 240 Hz. This way the operator has a two-dimensional control over bandwidth and integration time. It is operationally roughly equivalent to the procedure used in the analysis with band pass filters followed by rectifiers and integrating low pass filters. But of course the "smearing" is then made in the time dimension instead of the frequency dimension as in the program.
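The nominal bandwidths follow from the spacing of the computed spectrum points. The sketch below takes the 20 kHz sampling frequency and N = 512 from the text; treating the nominal step as 40 Hz is an inference from the quoted figures, not something the paper states.

```python
fs = 20000.0  # sampling frequency in Hz (from the text)
N = 512       # transform length (from the text)

step = fs / N  # spacing of computed spectrum points, about 39.1 Hz

# Bandwidth when each output point is the largest of m neighbouring points.
exact_bandwidth = {m: m * step for m in (2, 3, 4, 5, 6)}

# Nominal values, assuming the step is rounded to 40 Hz.
nominal_bandwidth = {m: 40.0 * m for m in (2, 3, 4, 5, 6)}
```

The exact values (78.1, 117.2, 156.3, 195.3 and 234.4 Hz) lie within about 2.5 percent of the nominal 80, 120, 160, 200 and 240 Hz.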

3. The FFT subroutine

The FFT algorithm and some associated service routines are set together in a subroutine. This is called with control parameters contained in the computer arithmetic registers, Q and A. The Q register contains a mode number:

(Q) = 0. Initialize mode. (A) = log2(N), and (calling instruction + 1) = address of array. Internal constants are set up, and an auxiliary table containing N/4 + 1 points on a cosine arc is computed and put immediately after the array to be transformed.

(Q) = 1. Direct/inverse select. If (A) = 0 the direct transform is selected; if (A) ≠ 0 a complement instruction is inserted in the FFT algorithm to render the inverse transform.

(Q) = 2. Execute transformation. After the computation the data points are stored in bit-reversed order. Before exit they are "unshuffled" into order. Both the transformation and the unshuffling are done in place, i.e., the result replaces the input and no intermediate storage area is used.

(Q) = 3. Separate odd and even parts of complex spectrum into two separate complex spectra. This is applicable when two real series are transformed simultaneously. The separation is done in place.

(Q) = 4. Merge into odd and even parts. The inverse to the previous operation.
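The in-place "unshuffling" from bit-reversed order mentioned under mode 2 can be sketched as follows. This is a generic illustration of the technique, not a transcription of the CDC 1700 routine; the function names are hypothetical.

```python
def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def unshuffle_in_place(data):
    """Reorder a list whose length is a power of 2 between natural and bit-reversed order."""
    n = len(data)
    bits = n.bit_length() - 1
    for i in range(n):
        j = bit_reverse(i, bits)
        if j > i:                      # swap each pair only once
            data[i], data[j] = data[j], data[i]

order = list(range(8))
unshuffle_in_place(order)              # natural order -> bit-reversed order
```

Because bit reversal is its own inverse, a second call restores the original order, and only pairwise swaps are needed, so no intermediate storage area is used.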

The subroutine is written in machine code for the CDC 1700, which has a 16 bit word length. All computations are done in single-precision integer arithmetic.

In the FFT algorithm the principle of decimation in frequency is used. To preserve a reasonably homogeneous numerical magnitude in both the time and frequency series a rescaling by the factor $1/\sqrt{2}$ is performed in each complex multiplication. The effect of this is to introduce the factor $1/\sqrt{N}$ into the right member of expression (1). Then the direct and inverse transforms will differ only by the sign in the exponential. A test with N = 512 indicated the useful dynamic range to be better than 60 dB.
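A decimation-in-frequency transform with this per-stage rescaling might look like the sketch below. It uses floating point rather than the 16 bit integer arithmetic of the original, assumes a negative exponent for the direct transform, and computes the twiddle factors on the fly instead of reading them from a cosine table; `fft_dif_scaled` is a hypothetical name.

```python
import cmath
import math

def fft_dif_scaled(x, inverse=False):
    """Radix-2 decimation-in-frequency FFT, scaled by 1/sqrt(2) in every stage.

    The length must be a power of 2. After log2(N) stages the overall factor
    is 1/sqrt(N), so direct and inverse differ only in the exponent sign.
    """
    a = [complex(v) for v in x]
    n = len(a)
    sign = 1.0 if inverse else -1.0
    scale = 1.0 / math.sqrt(2.0)
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(sign * 2j * math.pi * k / (2 * span))
                u, v = a[start + k], a[start + k + span]
                a[start + k] = (u + v) * scale
                a[start + k + span] = (u - v) * w * scale
        span //= 2
    # Results are now in bit-reversed order: unshuffle in place.
    bits = n.bit_length() - 1
    for i in range(n):
        j = int(format(i, "0{}b".format(bits))[::-1], 2) if bits else 0
        if j > i:
            a[i], a[j] = a[j], a[i]
    return a

x = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, -2.0, 1.5]   # arbitrary test data
y = fft_dif_scaled(x)
back = fft_dif_scaled(y, inverse=True)
```

Here `y` should equal the DFT of `x` divided by the square root of 8, and `back` should reproduce `x`, which shows that the same routine with the sign flipped serves as the inverse.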

Also, for N = 512, the execution time of the FFT and the unshuffling is 560 msec, closely proportional to N·log2(N) for other N being powers of 2. The initializing takes 27 msec (only needed once) and the separation and merge routines take 24 msec each. The latter execution times are proportional to N.
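These proportionalities allow rough estimates for other transform lengths. The sketch below merely scales the quoted N = 512 timings; the resulting figures are extrapolations, not measurements reported in the paper, and the function names are hypothetical.

```python
import math

def fft_time_ms(n, ref_n=512, ref_ms=560.0):
    """Estimated FFT plus unshuffling time, assuming proportionality to N*log2(N)."""
    return ref_ms * (n * math.log2(n)) / (ref_n * math.log2(ref_n))

def separation_time_ms(n, ref_n=512, ref_ms=24.0):
    """Estimated separation (or merge) time, assuming proportionality to N."""
    return ref_ms * n / ref_n

estimate_256 = fft_time_ms(256)     # about 249 msec
estimate_1024 = fft_time_ms(1024)   # about 1244 msec
```

Doubling N thus slightly more than doubles the transform time, while the separation and merge times simply double.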

Program length (CDC 1700):

FFT algorithm                        80
Bit reversal and unshuffling         80
Initiation                           80
Odd-even separation and merge       100
Administration                       36
Constants and temps.                 34
Total                               410 words

References:

(1) Cooley, J. W., Tukey, J. W.: "An Algorithm for the Machine Calculation of Complex Fourier Series", Math. of Comput., 19 (1965), pp. 297-301.

(2) IEEE, AE Subcommittee on Measurement Concepts, Helms, H. D., chairman: "What is the Fast Fourier Transform?", IEEE Transact., AU-15 (June 1967), special issue on FFT, and Proc. IEEE, 55 (1967), pp. 1664-1674.

(3) Liljencrants, J.: "Use of an Ink-Jet Recorder as a Computer Plotter", STL-QPSR 1/1968, pp. 27-29.
