Formant Tracking Using LPC Root Solving

Robust formant tracking using LPC root solving

Team Members

Patel Shabaz Basheer EE09B041

Pinjala Sandeep EE09B025

Indian Institute of Technology Hyderabad

Motivation

Formant estimates are useful in applications such as speech recognition, speech enhancement, noise reduction, and adaptive filters for hearing aids.

Formants

Formants are the spectral peaks of the sound spectrum of the voice. In speech science and phonetics, the term formant also refers to an acoustic resonance of the human vocal tract.

Algorithm

[Block diagram: processing stages applied to the input speech signal S[n]]

Pre-emphasis Filter

To improve the overall SNR across the band of interest, the magnitude of the higher frequencies is boosted relative to the magnitude of the lower frequencies.

This algorithm uses a common pre-emphasis method: filtering the speech signal with a high-pass filter (HPF).

This removes the contribution of the glottal waveform and the radiation load, and spreads the energy more evenly over the frequencies in the band.

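A first-order pre-emphasis high-pass filter is given by

Y[n] = S[n] - a1*S[n-1]

where S[n] is the input speech signal, Y[n] is the pre-emphasized signal, and a1 is the pre-emphasis coefficient.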
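As a minimal sketch of this step, the filter can be written as a first-order difference, output = input minus a1 times the previous input sample. The function name `pre_emphasis` and the coefficient value a1 = 0.95 are assumptions chosen for illustration; the slides do not state the value that was used.

```python
def pre_emphasis(signal, a1=0.95):
    """First-order high-pass pre-emphasis: y[n] = s[n] - a1 * s[n-1].

    a1 = 0.95 is an assumed, typical value; the slides do not give one.
    """
    out = []
    prev = 0.0  # s[-1] is taken to be zero
    for s in signal:
        out.append(s - a1 * prev)
        prev = s
    return out


# A constant (lowest-frequency) input is almost cancelled, while an
# alternating (highest-frequency) input is almost doubled.
dc = pre_emphasis([1.0] * 5)
alt = pre_emphasis([1.0, -1.0, 1.0, -1.0, 1.0])
```

After the first sample, the constant input is reduced to about 0.05 while the alternating input grows to about 1.95 in magnitude, which is the high-frequency boost described above.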

Hilbert Transformer (Conversion into analytic signal)

Converting a real signal into an analytic signal has several advantages. The main one for the adaptive filter bank is that the analytic signal is complex-valued, which is the form of input that the filters in the bank operate on.

Sc[n] = SR[n] + j*SH[n]

where Sc[n] is the analytic signal, SR[n] is the real signal, and SH[n] is the Hilbert transform of SR[n].
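One common way to obtain the analytic signal is to zero the negative-frequency half of the spectrum and double the positive half. The sketch below does this with a naive DFT so that it needs only the standard library. The frequency-domain method and the function names `dft` and `analytic_signal` are assumptions made for illustration; the slides do not say how the Hilbert transformer was implemented.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (fine for short frames)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for m in range(n):
            acc += x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
        out.append(acc / n if inverse else acc)
    return out


def analytic_signal(real_signal):
    """Return Sc[n] = SR[n] + j*SH[n] by removing negative frequencies."""
    n = len(real_signal)
    spectrum = dft(real_signal)
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue              # keep DC and Nyquist bins unchanged
        if k < (n + 1) // 2:
            spectrum[k] *= 2      # double the positive frequencies
        else:
            spectrum[k] = 0       # discard the negative frequencies
    return dft(spectrum, inverse=True)


# Example: a cosine becomes a complex exponential, so the imaginary part
# (the Hilbert transform) is the matching sine.
N = 32
x = [math.cos(2 * math.pi * 4 * n / N) for n in range(N)]
sc = analytic_signal(x)
```

The real part of `sc` reproduces the input, and the imaginary part is the Hilbert transform SH[n].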


Adaptive Band-pass Filtering

The adaptive band-pass filter suppresses interference from neighboring formant frequencies while tracking an individual formant frequency as it varies with time. Each filter therefore tracks only a single formant frequency.

Each adaptive band-pass filter consists of:

1) All Zero Filter (AZF)

2) Dynamic Tracking Filter (DTF)

AZF (All Zero Filters)

The AZF in each formant filter is an adaptive all-zero filter whose three zeros are always set to the previous formant frequency estimates from the other three formant filters.

The filter's transfer function is: [equation not preserved in the transcript]

The gain Kk[n] ensures unity gain and zero phase lag at the estimated formant frequency of the kth component. An additional zero is placed at the location of the pitch estimate to suppress the effect of the pitch.
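A form consistent with this description is a product of first-order zero terms, one per neighboring formant, scaled by a gain that normalizes the response at the filter's own formant estimate: H_k(z) = K_k * prod over l != k of (1 - rz*exp(j*2*pi*F_l/fs)*z^-1). The sketch below evaluates that assumed form. The exact expression, the zero radius rz = 0.98, the sampling rate and the example formant values are all assumptions for illustration, and the extra pitch zero is left out.

```python
import cmath
import math


def azf_response(freq_hz, k, formants_hz, fs, rz=0.98):
    """Frequency response of the k-th all-zero filter at freq_hz.

    Assumed form: H_k(z) = K_k * prod_{l != k} (1 - rz*e^{j*w_l} * z^-1),
    with K_k chosen so that H_k equals 1 at the k-th formant estimate.
    """
    def product_at(f):
        z_inv = cmath.exp(-2j * math.pi * f / fs)
        prod = 1 + 0j
        for l, f_l in enumerate(formants_hz):
            if l != k:
                zero = rz * cmath.exp(2j * math.pi * f_l / fs)
                prod *= 1 - zero * z_inv
        return prod

    gain = 1 / product_at(formants_hz[k])   # plays the role of Kk[n]
    return gain * product_at(freq_hz)


# Example with illustrative previous estimates for four formants.
previous_estimates = [500.0, 1500.0, 2500.0, 3500.0]
h_own = azf_response(500.0, 0, previous_estimates, fs=8000)
h_neighbor = azf_response(1500.0, 0, previous_estimates, fs=8000)
```

Here `h_own` is 1 (unity gain, zero phase) at the first formant estimate, while `h_neighbor` is close to zero, showing how the neighboring formant is suppressed.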


DTF (Dynamic Tracking Filters)

The Dynamic Tracking Filter (DTF) in each formant filter is a single-pole dynamic tracking filter whose pole location is always set to the previous value of the formant estimate. The transfer function of the kth DTF at index n is: [equation not preserved in the transcript]
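A single-pole filter of this kind can be sketched as the complex recursion y[n] = (1 - rp)*x[n] + rp*exp(j*2*pi*F/fs)*y[n-1], which has unity gain at the pole frequency F. This form, the pole radius rp = 0.9, and the function name `dtf_filter` are assumptions for illustration. For simplicity the pole frequency is held fixed here, whereas the tracker described above moves it to the previous formant estimate as it runs.

```python
import cmath
import math


def dtf_filter(x, formant_hz, fs, rp=0.9):
    """Single complex pole at radius rp and angle 2*pi*formant_hz/fs.

    The (1 - rp) factor gives unity gain at the pole frequency.
    rp = 0.9 is an assumed value.
    """
    pole = rp * cmath.exp(2j * math.pi * formant_hz / fs)
    y, prev = [], 0j
    for sample in x:
        prev = (1 - rp) * sample + pole * prev
        y.append(prev)
    return y


# Example: a complex tone at the pole frequency passes with gain close to 1,
# while a tone 2 kHz away is strongly attenuated.
fs = 8000
n_samples = 400
on_target = [cmath.exp(2j * math.pi * 500 * n / fs) for n in range(n_samples)]
off_target = [cmath.exp(2j * math.pi * 2500 * n / fs) for n in range(n_samples)]
gain_on = abs(dtf_filter(on_target, 500, fs)[-1])
gain_off = abs(dtf_filter(off_target, 500, fs)[-1])
```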


Voiced Speech Detector

This detector checks whether the speech in the window frame under consideration is voiced.

It does so by estimating the pitch period of the windowed signal from its autocorrelation.

For voiced speech, the pitch period is expected to lie in the range of 4 ms to 9 ms for male and female speakers.
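The check can be sketched as a search for the autocorrelation peak among lags between 4 ms and 9 ms. The decision rule used here (the peak must reach a fraction of the zero-lag value), the threshold of 0.5, and the function names are assumptions; the slides give only the lag range.

```python
import math


def autocorr(frame, lag):
    """Autocorrelation of the frame at one lag (no normalization)."""
    return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))


def detect_voicing(frame, fs, min_period=0.004, max_period=0.009,
                   threshold=0.5):
    """Return (is_voiced, pitch_period_seconds) for one frame.

    threshold is an assumed value for the normalized autocorrelation peak.
    """
    r0 = autocorr(frame, 0)
    lo = int(round(min_period * fs))
    hi = min(int(round(max_period * fs)), len(frame) - 1)
    if r0 == 0 or hi < lo:
        return False, None          # silent or too-short frame
    best_lag = max(range(lo, hi + 1), key=lambda lag: autocorr(frame, lag))
    is_voiced = autocorr(frame, best_lag) / r0 >= threshold
    return is_voiced, best_lag / fs


# Example: a 40 ms frame with a 200 Hz fundamental (5 ms pitch period).
fs = 8000
frame = [math.sin(2 * math.pi * 200 * n / fs)
         + 0.5 * math.sin(2 * math.pi * 400 * n / fs) for n in range(320)]
voiced, period = detect_voicing(frame, fs)
```

For this synthetic frame the detector reports a voiced frame with a pitch period of 5 ms.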

Energy Detector

After the speech signal is filtered by the adaptive band-pass filter bank, the energy of the signal in the window frame is calculated. The energy in the formant band must be higher than a specified energy threshold.

LPC root solving is performed only if the frame meets this minimum-energy criterion and belongs to a voiced part of the speech.
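The two conditions can be combined into a simple gate, sketched below. The function names and the threshold value in the example are hypothetical; the slides do not state the threshold that was used.

```python
def frame_energy(frame):
    """Sum of squared magnitudes; works for real or complex samples."""
    return sum(abs(s) ** 2 for s in frame)


def should_run_lpc(filtered_frame, is_voiced, energy_threshold):
    """LPC root solving runs only for voiced frames with enough energy."""
    return is_voiced and frame_energy(filtered_frame) > energy_threshold


# Example with a short complex (analytic) frame whose energy is 0.75.
band_frame = [0.5, -0.5, 0.5j]
run_voiced = should_run_lpc(band_frame, True, energy_threshold=0.5)
run_unvoiced = should_run_lpc(band_frame, False, energy_threshold=0.5)
run_weak = should_run_lpc(band_frame, True, energy_threshold=1.0)
```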

LPC Root Solving

Linear prediction analysis provides a good approximation to the vocal tract spectral envelope, especially in voiced regions of speech, where the all-pole model of LPC applies.

In unvoiced and transient regions of speech, the LPC model is less effective than in voiced regions but still provides acceptable results.

The linear prediction method can be stated as finding the coefficients ak that give the best prediction of the speech sample S[n] from the past samples S[n-k], i.e. that minimize the mean-squared prediction error.

For a linear predictor of order p, the prediction error is:

E[n] = S[n] - sum_{k=1}^{p} ak*S[n-k]
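A minimal sketch of LPC root solving under stated assumptions: the coefficients are found with the autocorrelation method and the Levinson-Durbin recursion, the roots of the prediction polynomial z^p - a1*z^(p-1) - ... - ap are found with the Durand-Kerner iteration, and each root in the upper half plane is converted to a frequency through its angle. These particular algorithms and all function names are choices made for this illustration, not necessarily what the authors' MATLAB code does.

```python
import math


def autocorrelation(signal, max_lag):
    """r[k] = sum_n s[n] * s[n+k] for k = 0..max_lag."""
    return [sum(signal[n] * signal[n + k] for n in range(len(signal) - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns [a_1, ..., a_p]."""
    a = [0.0] * (order + 1)          # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1 - k * k
    return a[1:]


def polynomial_roots(coeffs, iterations=200):
    """Roots of a monic polynomial (highest power first), Durand-Kerner."""
    n = len(coeffs) - 1
    roots = [(0.4 + 0.9j) ** i for i in range(n)]   # distinct start points
    for _ in range(iterations):
        for i in range(n):
            value = 0j
            for c in coeffs:
                value = value * roots[i] + c        # Horner evaluation
            denom = 1 + 0j
            for j in range(n):
                if j != i:
                    denom *= roots[i] - roots[j]
            roots[i] -= value / denom
    return roots


def lpc_formants(signal, fs, order):
    """Frequencies (Hz) of the LPC poles in the upper half plane."""
    a = levinson_durbin(autocorrelation(signal, order), order)
    roots = polynomial_roots([1.0] + [-c for c in a])
    return sorted(math.atan2(z.imag, z.real) * fs / (2 * math.pi)
                  for z in roots if z.imag > 1e-9)


# Example: impulse response of a two-pole resonator at 700 Hz.
fs, pole_radius, true_hz = 8000, 0.95, 700.0
theta = 2 * math.pi * true_hz / fs
a1, a2 = 2 * pole_radius * math.cos(theta), -pole_radius ** 2
s = [1.0, a1]
for n in range(2, 400):
    s.append(a1 * s[-1] + a2 * s[-2])
estimated = lpc_formants(s, fs, order=2)
```

For this synthetic resonance the single returned frequency lies very close to 700 Hz. Real speech needs a higher order and some screening of the roots, for example by bandwidth, which is not shown here.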

Moving Average

The moving average stage keeps a moving average of each formant frequency. If the segment is unvoiced, or the energy in the formant band is below the threshold, the moving-average value is assigned as the formant estimate.

In all other cases, when the energy is above the threshold and the speech is voiced, the formant estimate from the LPC analysis is assigned.

The moving average of the formant frequency is given by: [equation not preserved in the transcript]
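The decision logic can be sketched as follows. Because the averaging formula is not available here, the sketch assumes a plain mean over the most recent assigned values; the window length, the class name `FormantSmoother`, and its interface are hypothetical.

```python
from collections import deque


class FormantSmoother:
    """Chooses between the LPC estimate and a moving-average fallback.

    The average is an assumed fixed-length mean of recent assigned values.
    """

    def __init__(self, initial_hz, window=10):
        self.history = deque([initial_hz], maxlen=window)

    def update(self, lpc_estimate_hz, is_voiced, energy_ok):
        average = sum(self.history) / len(self.history)
        if is_voiced and energy_ok:
            value = lpc_estimate_hz      # trust the LPC analysis
        else:
            value = average              # fall back to the moving average
        self.history.append(value)
        return value


# Example: one accepted frame, then an unvoiced and a low-energy frame.
smoother = FormantSmoother(initial_hz=500.0, window=3)
first = smoother.update(520.0, is_voiced=True, energy_ok=True)
second = smoother.update(900.0, is_voiced=False, energy_ok=True)
third = smoother.update(None, is_voiced=True, energy_ok=False)
```

In the example the outlier of 900 Hz from the unvoiced frame is ignored and the moving average of 510 Hz is assigned instead.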

Results

The algorithm described above was applied to speech .wav files.

[Figure: Formant tracker performance for the database speech signal]

[Figure: Formant tracker performance on the database speech signal with background noise at an SNR of 40 dB]

[Figure: RMS errors of formant trackers in the presence of AWGN at varying SNR values]

Discussion

Because the adaptive filter starts from initial values of the formant frequencies, its output depends on the specific initial values given. In the few cases where the actual formant frequency does not lie near the initial formant frequency supplied as input, the filter ends up adding a few more poles and zeros rather than removing them. Such cases were found to be rare.

Difficulty also arises if background noise or a sudden change in the formant frequencies causes the tracker to wander far from the true formant values. Hence, it was necessary to place a limit on the frequency range allowed for each formant.
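Such a limit amounts to clamping each estimate to a per-formant range, as in the sketch below. The ranges in `FORMANT_LIMITS_HZ` are purely hypothetical placeholders; the slides do not give the limits that were used.

```python
# Hypothetical per-formant limits in Hz (illustrative placeholders only).
FORMANT_LIMITS_HZ = [(150, 1200), (300, 3000), (1500, 4000), (3000, 5000)]


def limit_formant(k, estimate_hz, limits=FORMANT_LIMITS_HZ):
    """Clamp the k-th formant estimate to its allowed frequency range."""
    low, high = limits[k]
    return min(max(estimate_hz, low), high)


# Example: out-of-range estimates are pulled back to the nearest limit.
too_low = limit_formant(0, 50.0)
in_range = limit_formant(1, 1800.0)
too_high = limit_formant(3, 9000.0)
```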

References

[1] I. C. Bruce, et al., "Robust formant tracking in noise," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, 2002.

[2] A. Rao and R. Kumaresan, "On decomposing speech into modulated components," IEEE Trans. Speech Audio Processing, vol. 8, no. 3, pp. 240-254, May 2000.

[3] P. Jindal, "Algorithms for tracking formant frequencies of a continuous speech with speaker variability," thesis.

[4] R. C. Snell and F. Milinazzo, "Formant location from LPC analysis data," IEEE Trans. Speech Audio Processing, vol. 1, no. 2, pp. 129-134, 1993.

Demo on Matlab!!!

Thank you

Questions
