Speech signal time frequency representation
-
Upload
nikolay-karpov -
Category
Education
-
view
800 -
download
0
Transcript of Speech signal time frequency representation
Methods and algorithms of speech recognition course
Nikolay V. Karpov
nkarpov(а)hse.ru
Lection 2
Time-Frequency Representation
This lecture is concerned with the spectrogram as a representation of how the frequency components within a signal vary with time.
Define what we mean by normalised time and frequency Define the short-term discrete Fourier transformaton Look at the effect of different window lengths on time and
frequency resolution Derive a quantitative description of the frequency
resolution of the short term DFT and compare the performance of common windows
Derive a quantitative description of the time resolution of the short term DFT
Explain the uncertainty principle and illustrate its effects with some examples
Review some of the properties of the DFT
Time-Frequency Representation
Sampling theorem says that when the analog signal x(t) is band-restricted between 0 and W[Hz] and when x(t) is sampled at every T = 1/2W, the original signal can be completely reproduced by
Sampling in the time domain
For those signals the frequency bandwidths of which are not known, a low-pass filter is used to restrict the bandwidths before sampling. When a signal is sampled contrary to the sampling theorem, aliasing distortion occurs, which distorts the high-frequency components of the signal, as shown
Sampling in the frequency domain
Simple continuous signal
Normalised radian frequency Normalised time
Normalised Frequency
1
)2
()2()(
i w
itg
w
ixtx )()()()
2( ixiTx
F
ix
w
ix
s
)2cos()( ftAtx
)2cos()cos())(2cos()( iN
kAiA
F
ifAtx
s
5.0)1(2)1(20
N
k
Ff
s
wt
wttg
2
)2sin()(
i
Designate
With sampled data systems it is customary to change the time scale and work in units of the sample period. In normalised units the sampling frequency (fs) and
period (T) both equal 1. They can therefore be omitted from equations: any such omissions can be deduced using dimensional consistency arguments.
The Nyquist frequency is ½ Hz = πradians/second To convert back to real units:
◦ any time quantity must be multiplied by the real sample period (or divided by the real sample frequency)
◦ any frequency or angular frequency quantity must be multiplied by the real sample frequency (or divided by the real sample period)
Normalised Time
The spectrogram shows the energy in a signal at each frequency and at each time. We calculate this by evaluating the short-term discrete Fourier transform
Spectrogram
Z- tramsform
c
n
n
n
dzzzXj
zX
znxzX
1)(2
1)(
)()(
The Fourier transform of x(n) can be computed from the z-transform as:
Fourier transform
n
ni
ezenxzXX i
)()()(
The Fourier transform may be viewed as the z-transform evaluated around the unit circle.The Discrete Fourier Transform (DFT) is defined as a sampled version of the Fourier shown above:
1
0
/2)()(N
n
NknienxkX
1
0
/2)()(N
n
NkniekXnx
The Fast Fourier Transform (FFT) is simply an efficient computation of the DFT.
The Discrete Cosine Transform (DCT) is simply a DFT of a signal that is assumed to be real and even (real and even signals have real spectra!)
The Discrete Cosine Transform
1,...,1,0;)/2cos()(2)0()(1
0
NkNknnxxkXN
n
Note that these are not the only transforms used in speech processing Wavelets; Fractals; Wigner distributions; etc.
We often want to estimate the “power spectrum” of a non-stationary signal at a particular instant of time.
Multiply by a finite length window and take the DFT.
For a window of length N ending on sample m we have:
Short-Term Discrete Fourier Transform
1
0
/)(2)()();(N
n
NnmkienmxnwmkX
k-frequency, m-time
gives the power at a frequency of k/N Hz (normalized) for a window centred at m-0.5(N-1).
the (m-i) term in the exponent means that the phase origin remains consistent by cancelling out the linear phase shift introduced by a delay of m samples.
the window samples are numbered backwards in time (for convenience later) hence the summation is performed backwards in time.
The values X(k;m) are based on the N signal values from m–N+1 to m.
the frequency resolution is 1/N Hz (normalised units) the spectrum is periodic and (since w and x are real) conjugate
symmetric:
there are only 0.5N+1independent frequency values
Short-Term Discrete Fourier TransformNotes:
),(),(),( *2mkXmkXmkX
)()()( * kXkNXkX
Sound «is»
Window multiplication
Hamming window of length401 samples centred on 400
Line spectrum from larynx oscillation (at about 0.01 normalised Hz) is superimposed on vocal tract resonances
Vocal sound
Wide window (N=400) Independent frequency
values=N/2+1=201
Short window eliminates the fine detail (N=30)
N/2+1=16
Zero padded window gives more spectrum points and an illusion of more detail (N=480=30×16). This is the normal case for a spectrogram
N/2+1=241
Three different Hamming windows:
By setting r= N–1–i we can rewrite this as:
Frequency resolution
1
0
/)(2exp)()();(N
n
NnmkinmxnwmkX
1
0
/2exp)(*))1(2
exp();(N
rm Njkrry
N
NmjmkX
)1()1()( rNmxrNwrym
This is a standard DFT multiplied by a phase-shift term that is proportional to k: this compensates for the starting time of the window: m–N+1
y(r) is a product of two signals so its DFT is the convolution of the DFT’s of w(N–1–r) and x(m–N+1+r)
Two signals resolution
Time-domen windowing
)/)2
(2cos( NN
nkck Hamming window
Slidelobe -42.5 dBMainlobe width 0.039Linkage factor 0.03%a=1.81, b=4
Rectangle window
Slidelobe -13.3 dBMainlobe width(-3dB) 0.027Leakage factor 9.14%a=1.21, b=2
104654.0)( cnw
1)( nw
Other window functions
Bartlett(L)
Blackman(L)
Rectwin(L)
Chebwin(L,R)
Hamming(L)
Hann(L)
Kaiser(L,beta)
taylorwin(L)
triang(L)
Time-Domain Windowing
Спектр сигнала при использовании окна Блэкмана - Наттала
Спектр сигнала при использовании окна Блэкмана
Time Resolution: Filter-Bank Viewpoin
BW = 45 HzNT = 44 ms
BW = 300 HzNT = 7 ms
/aI/ from “my” with 45Hz and 300Hz bandwidth spectrograms
Time Resolution: Filter-Bank Viewpoin
45Hz, 44ms 300Hz, 7ms
Vertical slice through spectrogram:
mT= 0.1s. 45 Hz gives finer frequency resolution
Horizontal slice through spectrogram:
k/NT= 3.5 kHz 300 Hz gives finer time resolution
45Hz and 300Hz bandwidth spectrograms
You cannot get good time resolution and good frequency resolution from the same spectrogram. Duration of window = N/fs=N*T. Frequency resolution = fs×a/N
◦ Equal amplitude frequency components with this separation will give distinct peaks
◦ Most windows have a≈2 Time resolution = 2N/(a*fs)
◦ Amplitude variations with this period will be attenuated by 6 dB For all window functions, the product of the time and
frequency resolutions is equal to 2. Linguistic analysis typically uses a window length of 10–
20 ms. The transfer function of the vocal tract does not change significantly in this time.
Uncertainty Principle
To keep all the information about time variation of spectral components, you need only sample the spectrum twice as fast as the spectral magnitudes are varying. Using normalised frequencies:
–∞ dB bandwidth for an N-point window = b/N◦ b = 4 for a Hamming window
Significant variation of spectral components occurs at frequencies below 2b/N
we must sample the spectrum at a frequency of b/N◦ the separation between spectral samples is the window
width divided by b
Overlapping Windows
Exact line-spectrum of a periodic signal {xm} Sampled continuous spectrum of zero-extended {xm} Sampled continuous spectrum of infinite {xm} convolved with spectrum of
rectangular window FFT is an algorithm for calculating DFT
Energy Conservation (Parseval’s theorem)
Symmetries {xm} ⇒ {Xk} Discrete⇒Periodic:Xk+N=Xk Real⇒Hermitian: X–k= X*k Periodic:xm+N/r = x m ⇒Discrete:Xk= 0for k ≠ir Skew Periodic:xm+N/2r = –xm ⇒Odd Harmonics: Even:xm= x N–m ⇒Real Odd:xm= –x N–m ⇒Purely Imaginary Real & Even⇒Real & Even Real & Odd⇒Purely Imaginary and Odd
DFT Properties