Formant Tracking Using LPC Root Solving
-
Robust formant tracking using LPC root solving
Team Members
Patel Shabaz Basheer EE09B041
Pinjala Sandeep EE09B025
Indian Institute of Technology Hyderabad
-
Motivation
Formants are useful in applications such as speech recognition, speech
enhancement, noise reduction, and adaptive filtering in hearing aids.
-
Formants
Formants are defined as the spectral peaks of the sound spectrum of the
voice. In speech science and phonetics, the term formant is also used to mean
an acoustic resonance of the human vocal tract.
-
Algorithm
[Block diagram of the tracking algorithm; input speech signal S[n]]
-
Pre-emphasis Filter
To improve the overall SNR across the band of interest, the magnitude of the
higher frequencies is raised relative to that of the lower frequencies, which
usually carry more energy.
This algorithm uses a common pre-emphasis method: filtering the speech signal
with a high-pass filter (HPF).
This removes the contribution of the glottal waveform and the radiation load,
and redistributes the energy more evenly across the frequencies in the band.
A first-order pre-emphasis high-pass filter is given by
y[n] = S[n] - a1*S[n-1]
where S[n] is the input speech, y[n] is the pre-emphasized output, and a1 is
the pre-emphasis coefficient.
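A minimal sketch of first-order pre-emphasis, assuming the usual difference form y[n] = s[n] - a1*s[n-1]. The coefficient value a1 = 0.95 and the function name are illustrative assumptions; the slides do not state a value.

```python
def pre_emphasis(signal, a1=0.95):
    """First-order high-pass filter: y[n] = s[n] - a1 * s[n-1].

    The sample before the start of the signal is taken as 0.
    a1 = 0.95 is an assumed, commonly used value.
    """
    out = []
    prev = 0.0
    for s in signal:
        out.append(s - a1 * prev)
        prev = s
    return out


# A constant (lowest-frequency) input is attenuated to (1 - a1) of its level
# once the filter has seen one previous sample.
flat = pre_emphasis([1.0, 1.0, 1.0, 1.0])

# An alternating (highest-frequency) input is boosted to (1 + a1) of its level.
alternating = pre_emphasis([1.0, -1.0, 1.0, -1.0])
```

The two toy inputs show the intended effect: low frequencies are suppressed and high frequencies are emphasized.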
-
Hilbert Transformer (Conversion into
analytic signal)
Converting a real signal into an analytic signal has several advantages. The
main one for the adaptive filter bank is that the analytic signal is complex,
which is the form the complex-valued filters that follow operate on.
Sc[n] = SR[n] + j*SH[n]
where
Sc[n] is the analytic signal,
SR[n] is the real signal,
SH[n] is the Hilbert transform of SR[n]
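The relation above can be sketched with a DFT-based construction: keep the DC bin, double the positive-frequency bins, and zero the negative-frequency bins. This is one standard way to obtain a discrete analytic signal, not necessarily the Hilbert transformer used in the project; the direct O(N^2) DFT and the function name are assumptions made to keep the example dependency-free.

```python
import cmath
import math


def analytic_signal(x):
    """Return Sc[n] = SR[n] + j*SH[n] for a real sequence x (DFT method)."""
    n = len(x)
    if n == 0:
        return []
    # Forward DFT, written out directly (fine for a short illustrative frame).
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Spectral weights: 1 for DC (and Nyquist when n is even),
    # 2 for positive frequencies, 0 for negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        weights[k] = 2.0
    # Inverse DFT of the one-sided spectrum.
    return [
        sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


# A cosine with two cycles in 16 samples: its analytic signal is cos + j*sin.
real_part = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
sc = analytic_signal(real_part)
```

The real part of the result reproduces the input, and the imaginary part is its Hilbert transform (here, the matching sine).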
-
Adaptive Band-pass Filtering
The adaptive band-pass filter suppresses interference from neighboring formant
frequencies while tracking an individual formant frequency as it varies with
time. Hence, each filter tracks only a single formant frequency.
Each adaptive band-pass filter consists of:
1) All Zero Filter (AZF)
2) Dynamic Tracking Filter (DTF)
-
AZF (All Zero Filters)
The AZF in each formant filter is an adaptive all-zero filter whose three
zeros are always placed at the previous formant frequency estimates from the
other three formant filters.
The filter's transfer function is:
The value of Kk[n] ensures unity gain and zero phase lag at the estimated formant
frequency of the kth component. An additional zero is placed at the location of the
pitch estimate to suppress the effect of the pitch.
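The behavior described for the AZF can be sketched as follows. The transfer function itself did not survive in the transcript, so this assumes a form commonly used for such filters: one zero of radius rz at the angle of each other formant's previous estimate, normalized by a complex gain so that the response at the filter's own formant is exactly 1. The radius rz = 0.98, the function name, and the example frequencies are assumptions, and the extra pitch zero is omitted.

```python
import cmath
import math


def azf_response(freq_hz, k, prev_formants_hz, fs, rz=0.98):
    """Frequency response at freq_hz of the all-zero filter for formant k.

    Zeros sit at radius rz (assumed value) on the angles of the other
    formants' previous estimates. The gain plays the role of Kk[n]: it forces
    unity gain and zero phase at formant k's own previous estimate.
    """
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs)
    own_z_inv = cmath.exp(-2j * math.pi * prev_formants_hz[k] / fs)
    numerator = 1.0 + 0j
    gain_denominator = 1.0 + 0j
    for l, f_l in enumerate(prev_formants_hz):
        if l == k:
            continue
        zero = rz * cmath.exp(2j * math.pi * f_l / fs)
        numerator *= 1 - zero * z_inv
        gain_denominator *= 1 - zero * own_z_inv
    return numerator / gain_denominator


# Hypothetical previous estimates for four formants, 8 kHz sampling.
formants = [500.0, 1500.0, 2500.0, 3500.0]
at_own = azf_response(500.0, 0, formants, 8000.0)        # filter 0 at F1
at_neighbor = azf_response(1500.0, 0, formants, 8000.0)  # filter 0 at F2
```

The response is 1 at the filter's own formant and strongly attenuated at a neighboring formant, which is the interference suppression the slide describes.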
-
DTF (Dynamic Tracking Filters)
The Dynamic Tracking Filter (DTF) in each formant filter is a single-pole
dynamic tracking filter whose pole location is always set to the
previous value of the formant estimate. The transfer function of the kth DTF
at index n is:
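The DTF equation is not preserved in the transcript. A minimal sketch of a single-pole complex resonator with unity gain and zero phase at the tracked frequency is shown below, assuming the form H(z) = (1 - rp) / (1 - rp*exp(j*2*pi*f/fs)*z^-1). The pole radius rp = 0.9 and the function name are assumptions, and the pole is held fixed over the frame for simplicity, whereas the slide describes updating it from the previous formant estimate.

```python
import cmath
import math


def dtf_filter(x, formant_hz, fs, rp=0.9):
    """Single-pole complex filter: y[n] = (1 - rp)*x[n] + pole*y[n-1].

    The pole has radius rp (assumed value) at the angle of formant_hz, so a
    complex exponential at formant_hz passes with gain 1 and no phase shift.
    """
    pole = rp * cmath.exp(2j * math.pi * formant_hz / fs)
    y = []
    prev = 0j
    for sample in x:
        prev = (1 - rp) * sample + pole * prev
        y.append(prev)
    return y


fs = 8000.0
# Complex tone at the tracked formant (500 Hz) and at a distant frequency.
on_target = [cmath.exp(2j * math.pi * 500.0 * n / fs) for n in range(400)]
off_target = [cmath.exp(2j * math.pi * 2000.0 * n / fs) for n in range(400)]
y_on = dtf_filter(on_target, 500.0, fs)
y_off = dtf_filter(off_target, 500.0, fs)
```

After the start-up transient, the on-target tone is passed unchanged while the off-target tone is attenuated to roughly a tenth of its amplitude.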
-
Voiced Speech Detector
This detector checks whether the speech frame (window) under consideration
belongs to a voiced part of the signal.
It does so by estimating the pitch period of the frame from its
autocorrelation.
For voiced speech, the pitch period is expected to lie in the range of 4 ms to
9 ms for male and female speakers.
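A minimal sketch of an autocorrelation-based voicing check, searching only lags that correspond to the 4 ms to 9 ms pitch-period range stated above. The normalized-peak threshold of 0.3 and the function name are assumptions; the slides do not say how the decision is made once the autocorrelation is computed.

```python
import math


def detect_voiced(frame, fs, min_period_s=0.004, max_period_s=0.009,
                  threshold=0.3):
    """Return (is_voiced, pitch_period_s) for one real-valued frame.

    The frame counts as voiced when the autocorrelation, normalized by the
    frame energy, has a peak above `threshold` (assumed value) at a lag
    inside the allowed pitch-period range.
    """
    energy = sum(s * s for s in frame)
    if energy == 0.0:
        return False, None
    lo = int(round(min_period_s * fs))
    hi = min(int(round(max_period_s * fs)), len(frame) - 1)
    best_lag, best_val = None, 0.0
    for lag in range(lo, hi + 1):
        r = sum(frame[n] * frame[n - lag]
                for n in range(lag, len(frame))) / energy
        if r > best_val:
            best_lag, best_val = lag, r
    if best_lag is None or best_val < threshold:
        return False, None
    return True, best_lag / fs


# A 200 Hz tone (5 ms period) sampled at 8 kHz for 40 ms.
tone = [math.sin(2 * math.pi * 200.0 * n / 8000.0) for n in range(320)]
voiced, period = detect_voiced(tone, 8000.0)
silent = detect_voiced([0.0] * 320, 8000.0)
```

The periodic test frame is classified as voiced with a 5 ms period, while a silent frame is rejected.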
-
Energy Detector
After the speech signal is filtered by the adaptive band-pass filter bank, the
energy of the signal in each formant band is calculated for the current frame.
The energy of a formant band must be higher than a specified energy threshold.
LPC root solving is performed only if the minimum-energy criterion is met and
the frame belongs to a voiced part of the speech.
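The gating rule can be sketched in a few lines. Mean squared magnitude is used as the energy measure so that complex (analytic) band signals are handled too; that choice, the function names, and the threshold in the example are assumptions.

```python
def frame_energy(frame):
    """Mean squared magnitude of the samples (works for complex samples)."""
    if not frame:
        return 0.0
    return sum(abs(s) ** 2 for s in frame) / len(frame)


def should_run_lpc(is_voiced, band_frame, energy_threshold):
    """LPC root solving runs only for voiced frames with enough band energy."""
    return is_voiced and frame_energy(band_frame) > energy_threshold


# Hypothetical threshold of 0.5 for illustration.
run_loud_voiced = should_run_lpc(True, [1.0, -1.0, 1.0], 0.5)
run_quiet_voiced = should_run_lpc(True, [0.1, -0.1, 0.1], 0.5)
run_loud_unvoiced = should_run_lpc(False, [1.0, -1.0, 1.0], 0.5)
```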
-
LPC Root Solving
Linear prediction analysis provides a good approximation to the vocal tract
spectral envelope, especially in voiced regions of speech, where the all-pole
model of LPC applies.
During unvoiced and transient regions of speech, the LPC model is less
effective than for voiced regions but still provides acceptable results.
The linear prediction method can be stated as finding the coefficients ak
that give the best prediction, i.e. that minimize the mean-squared
prediction error of the speech sample S[n] in terms of the past samples S[n-k].
The prediction error of the linear predictor of order p is:
E[n] = S[n] - sum_{k=1..p} ak*S[n-k]
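A minimal sketch of LPC root solving on one real-valued frame: the predictor coefficients ak are found with the autocorrelation method (Levinson-Durbin recursion), and the roots of A(z) = 1 - sum ak*z^-k are converted to frequencies. The Durand-Kerner root finder, the function names, and the synthetic test signal are assumptions chosen to keep the example dependency-free; the project itself was demonstrated in Matlab and may have used a different solver or predictor order.

```python
import cmath
import math


def lpc_coefficients(x, order):
    """Return [a1..ap] with s[n] ~ sum ak*s[n-k] (Levinson-Durbin)."""
    r = [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
         for lag in range(order + 1)]
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k_i = (r[i] - sum(a[k] * r[i - k] for k in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k_i
        for k in range(1, i):
            new_a[k] = a[k] - k_i * a[i - k]
        a = new_a
        err *= 1 - k_i * k_i
    return a[1:]


def polynomial_roots(coeffs, iterations=200):
    """Roots of a monic polynomial (highest power first), Durand-Kerner."""
    degree = len(coeffs) - 1
    roots = [(0.4 + 0.9j) ** i for i in range(degree)]
    for _ in range(iterations):
        for i in range(degree):
            value = 0j
            for c in coeffs:  # Horner evaluation at roots[i]
                value = value * roots[i] + c
            denom = 1 + 0j
            for j in range(degree):
                if j != i:
                    denom *= roots[i] - roots[j]
            roots[i] -= value / denom
    return roots


def lpc_formants(frame, fs, order):
    """Frequencies (Hz) of the upper-half-plane roots of A(z), sorted."""
    a = lpc_coefficients(frame, order)
    monic = [1.0] + [-c for c in a]  # z^p - a1*z^(p-1) - ... - ap
    return sorted(cmath.phase(z) * fs / (2 * math.pi)
                  for z in polynomial_roots(monic) if z.imag > 1e-9)


# Synthetic resonance: impulse response of a two-pole filter at 1000 Hz.
fs, radius, theta = 8000.0, 0.9, 2 * math.pi * 1000.0 / 8000.0
h = [1.0, 2 * radius * math.cos(theta)]
for n in range(2, 400):
    h.append(2 * radius * math.cos(theta) * h[-1] - radius ** 2 * h[-2])
formants = lpc_formants(h, fs, order=2)
```

For this synthetic resonance the single recovered frequency lies at the 1000 Hz pole; real speech needs a higher order and a check on each root's bandwidth before a root is accepted as a formant.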
-
Moving Average
This block computes the moving average of each formant frequency and assigns
that moving-average value as the estimate if the segment is unvoiced or the
energy of the formant band is below the threshold value.
In all other cases, when the energy is above the threshold and the
speech is voiced, the formant value estimated by the LPC
analysis is assigned.
The moving average of the formant frequency is given by:
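The moving-average formula is not preserved in the transcript, so the sketch below assumes a plain mean over the last few assigned estimates. The window length, the class name, and the choice to feed fallback values back into the history are all assumptions.

```python
from collections import deque


class FormantSmoother:
    """Tracks one formant and falls back to a moving average when needed."""

    def __init__(self, initial_hz, window=10):
        # History of assigned estimates; `window` is an assumed length.
        self.history = deque([initial_hz], maxlen=window)

    def update(self, lpc_estimate_hz, is_voiced, energy_ok):
        """Return the estimate assigned to the current frame."""
        average = sum(self.history) / len(self.history)
        if is_voiced and energy_ok:
            estimate = lpc_estimate_hz  # trust the LPC root
        else:
            estimate = average          # unvoiced or weak band: fall back
        self.history.append(estimate)
        return estimate


smoother = FormantSmoother(500.0, window=3)
first = smoother.update(600.0, is_voiced=True, energy_ok=True)
second = smoother.update(9999.0, is_voiced=False, energy_ok=True)
third = smoother.update(700.0, is_voiced=True, energy_ok=False)
```

In the usage lines, the first frame keeps its LPC value, while the unvoiced and low-energy frames are replaced by the running average, so an unreliable LPC value such as 9999 Hz never enters the track.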
-
Results
The algorithm described above was applied to speech .wav files.
Formant Tracker performance for the database speech signal
-
Formant Tracker performance on the database speech signal with
background noise at an SNR of 40 dB
-
RMS errors of formant trackers in presence of AWGN of varying SNR values
-
Discussion
Because the adaptive filters start from initial values of the formant
frequencies, the outputs also depend on those initial values. In the few cases
where the actual formant frequency does not lie near the initial formant
frequency given as input, the filters would add a few more poles and zeros
rather than removing the unwanted ones. Such cases were found to be rare.
Difficulty also arises if background noise or a sudden change in the formant
frequencies causes the tracker to wander far from the true formant
values. Hence, it was necessary to place a limit on the frequency range
allowed for each formant.
-
References
[1] I. C. Bruce et al., "Robust formant tracking in noise," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, 2002.
[2] A. Rao and R. Kumaresan, "On decomposing speech into modulated components," IEEE Trans. Speech Audio Processing, vol. 8, no. 3, pp. 240-254, May 2000.
[3] P. Jindal, "Algorithms for tracking formant frequencies of a continuous speech with speaker variability," thesis.
[4] R. C. Snell and F. Milinazzo, "Formant location from LPC analysis data," IEEE Trans. Speech Audio Processing, vol. 1, no. 2, pp. 129-134, 1993.
-
Demo on Matlab!!!
-
Thank you
-
Questions