[IEEE 2010 International Conference on Measuring Technology and Mechatronics Automation (ICMTMA...

Speech Denoising and Syllable Segmentation Based on Fractal Dimension

PAN Feng, National Key Laboratory on ISN,

Xidian University, Xi'an, China

[email protected]

DING Na-na, Network and Information Security Key Laboratory,

Electronics Department, Engineering College of the APF, Xi'an, China

[email protected]

Abstract—To improve the effect of existing wavelet denoising and to determine the beginning and ending points of each syllable in continuous speech, this paper improves the relevant algorithms based on fractal theory. First, a dynamic threshold algorithm that combines the fractal dimension with the wavelet transform is used to denoise the speech signal; on this basis, an algorithm based on the fractal dimension trajectory is designed to perform syllable segmentation. Experimental results show that the improved algorithms not only perform speech denoising and syllable segmentation better but are also robust: at low SNR, the algorithm still maintains a high accuracy rate.

Keywords-speech recognition; fractal dimension; speech denoising; syllable segmentation

I. INTRODUCTION

In real-life environments, speech signals are always corrupted by noise from the surrounding environment, transmission media, electrical equipment, and so on. Sometimes the noise is so severe that the speech cannot be distinguished. Poor speech quality seriously degrades the performance of speech processing systems, so preprocessing of the speech signal plays a very important role in improving system performance. This paper mainly discusses the application of fractal methods to speech denoising.

Syllable segmentation is an important means of extracting audio structure and content, and it is a basis for further audio retrieval and analysis.

At present, the primary speech denoising methods are wavelet denoising [1] and spectral subtraction denoising [2]. The dominant syllable segmentation methods are based on the point-differential fractal dimension [3], energy and short-time zero-crossing rate [4], and Mel-frequency features [5]. However, as databases grow, these methods fall increasingly short of requirements such as real-time efficiency and robustness at low SNR.

Reference [1] proposed a wavelet denoising algorithm, which is a common noise reduction method for speech. However, it cannot optimally set the thresholding condition according to the dynamic characteristics of the input signal, and therefore cannot achieve optimal filtering. Reference [3] proposed the point-differential fractal dimension algorithm, but its computation is so cumbersome that it cannot meet the real-time requirements of a large database.

Based on a thorough study of fractal theory, this paper proposes speech denoising and syllable segmentation methods based on the fractal dimension. First, the wavelet denoising method of [1] is improved: a dynamic threshold algorithm that combines the fractal dimension with the wavelet transform is used to denoise speech. On this basis, the algorithm of [3] is modified to use the mean of the fractal dimension trajectory as the reference for syllable segmentation points. Experimental results show that the algorithms not only improve the denoising effect and the segmentation accuracy but also reduce the amount of computation. They therefore lay a good foundation for the complete speech recognition process.

II. THE CALCULATION AND RULES OF FRACTAL DIMENSION

A. The calculation of fractal dimension

In speech signals, the fractal dimension, as a quantitative characterization of chaotic phenomena, can effectively reflect changes in signal characteristics. It can therefore serve as a characteristic parameter for speech denoising and segmentation.

There are many definitions of the fractal dimension; this paper uses the box-counting dimension for analysis.

We cover the digital waveform F, formed by the sampling sequence X(t) (0 < t < T, where T is the number of sampling points), with a square grid of side length s. If N(s) is the number of grid squares that intersect F, the box-counting dimension of F is:

$$D_B(F) = \lim_{s \to 0} \frac{\ln N(s)}{\ln(1/s)} \tag{2-1}$$

The specific steps of the box-counting fractal dimension algorithm [6] are as follows:

2010 International Conference on Measuring Technology and Mechatronics Automation

978-0-7695-3962-1/10 $26.00 © 2010 IEEE

DOI 10.1109/ICMTMA.2010.587


(1) Normalize the original speech signal to the unit square to obtain the normalized signal x(t);

(2) Divide the unit square into a grid of side length s and calculate log N(s) and log(1/s); vary s and calculate the corresponding log N(s) and log(1/s) for each grid size;

(3) Fit a straight line to the points (log(1/s), log N(s)) by least squares; its slope is the box-counting dimension of the speech frame.
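The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the grid sizes s = 1/k with k in (2, 4, 8, 16, 32), the rule that each time column is covered from its lowest to its highest sample (including the next sample so the line joining neighbours is counted), and the helper name `box_counting_dimension` are all hypothetical choices.

```python
import math


def box_counting_dimension(frame, ks=(2, 4, 8, 16, 32)):
    """Estimate the box-counting dimension of one speech frame.

    Hypothetical helper: the grid sizes s = 1/k for k in `ks` are an assumption.
    """
    n = len(frame)
    if n < 2:
        raise ValueError("need at least two samples")
    lo = min(frame)
    span = max(frame) - lo
    # Step (1): map amplitude to [0, 1]; time is implicitly i / (n - 1).
    ys = [(v - lo) / span if span > 0 else 0.0 for v in frame]

    log_inv_s, log_n = [], []
    for k in ks:
        col_lo = [math.inf] * k
        col_hi = [-math.inf] * k
        for i in range(n):
            col = min(i * k // (n - 1), k - 1)  # time column of sample i
            # Include the next sample so the segment joining them is covered.
            nxt = ys[min(i + 1, n - 1)]
            col_lo[col] = min(col_lo[col], ys[i], nxt)
            col_hi[col] = max(col_hi[col], ys[i], nxt)
        count = 0
        for c in range(k):
            if col_hi[c] >= col_lo[c]:  # skip columns that received no samples
                bottom = min(int(col_lo[c] * k), k - 1)
                top = min(int(col_hi[c] * k), k - 1)
                count += top - bottom + 1  # boxes spanned in this column
        # Step (2): record log(1/s) = log k and log N(s).
        log_inv_s.append(math.log(k))
        log_n.append(math.log(count))

    # Step (3): least-squares slope of log N(s) against log(1/s).
    mx = sum(log_inv_s) / len(ks)
    my = sum(log_n) / len(ks)
    sxx = sum((x - mx) ** 2 for x in log_inv_s)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_inv_s, log_n))
    return sxy / sxx
```

For a 160-sample frame, a straight ramp gives a value near 1 (slightly above it because of edge effects on coarse grids), while uniform white noise gives a clearly larger value, in line with the observation that noisier, more jagged waveforms have larger fractal dimensions.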

B. The rules of fractal dimension

We measured the fractal dimension of a number of phonemes sampled at 8 kHz with 16-bit precision and found the following rules:

(1) A speech waveform containing strong noise is relatively jagged, and its fractal dimension is large; conversely, a relatively smooth waveform has a small fractal dimension.

(2) The fractal dimension values are concentrated mainly between 1.2 and 1.8; in other words, the fractal dimension of phonemes has a regular distribution.

(3) The fractal dimension varies as the waveform changes and increases with the frequency content of the signal. The fractal dimension of voiced sounds lies mainly between 1.2 and 1.5, while that of unvoiced sounds lies mainly between 1.5 and 1.8. Values for male speakers are smaller than those for female speakers.

III. THE DYNAMIC THRESHOLD DENOISING ALGORITHM

To extract as clean a signal as possible from the original speech and to optimize the denoising effect, and using the rule that the fractal dimension reflects the amount of noise, a dynamic threshold denoising algorithm based on the fractal dimension and the wavelet transform is proposed. The algorithm adjusts the wavelet threshold dynamically according to the fractal dimension so as to achieve optimal filtering. The whole process is shown in Figure 1.

Figure 1

(1) Collect a speech signal, preprocess it into frames, and calculate the fractal dimension of every frame to obtain dimension1;

(2) Perform wavelet decomposition with the db3 wavelet to obtain the wavelet coefficients at different scales;

(3) Determine the threshold T according to the Neyman-Pearson criterion [8];

(4) Threshold the wavelet coefficients with T, suppressing the noise and keeping the speech coefficients whose values exceed T, and reconstruct; the result is the preliminary denoised signal;

(5) Calculate the fractal dimension of every frame of the preliminary denoised signal to obtain dimension2, and adjust the threshold dynamically:

$$T' = \frac{\mathrm{dimension1}}{\mathrm{dimension2}} \cdot T$$

(6) Reconstruct from the wavelet coefficients again, keeping the speech coefficients whose values exceed T'; the result is the final denoised signal.

Figure 2 shows the experimental results, obtained in MATLAB 7.0, for a randomly chosen sample of a male speaker saying "duan dian jian ce". In each subplot, the horizontal axis is the sample index and the vertical axis is the amplitude.

Figure 2


As Figure 2 shows, with only slight waveform loss, the dynamic threshold denoising algorithm has a definite speech denoising and enhancement effect. In terms of computation, because the box-counting dimension is simple to calculate, the computational complexity of this algorithm is almost the same as that of the traditional wavelet denoising method. Judging from the waveforms, the algorithm also removes more noise than the traditional wavelet denoising method. Therefore, the algorithm can effectively reduce background noise and improve voice quality at the same time.
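A compact sketch of the whole Figure 1 pipeline may help make the order of operations concrete. It is hedged in several ways: an orthonormal Haar transform stands in for the db3 wavelet, the Donoho-Johnstone universal threshold stands in for the Neyman-Pearson threshold of [8], only detail coefficients are hard-thresholded, and dimension1/dimension2 are assumed to be the mean box-counting dimension over 160-sample frames with an 80-sample shift. All function names are hypothetical.

```python
import math


def box_dim(frame, ks=(2, 4, 8, 16, 32)):
    """Compact box-counting dimension of one frame (grid sizes are assumed)."""
    n, lo = len(frame), min(frame)
    span = max(frame) - lo
    ys = [(v - lo) / span if span > 0 else 0.0 for v in frame]
    pts = []
    for k in ks:
        cols = {}  # column index -> (lowest, highest) covered amplitude
        for i in range(n):
            c = min(i * k // (n - 1), k - 1)
            nxt = ys[min(i + 1, n - 1)]
            a, b = cols.get(c, (math.inf, -math.inf))
            cols[c] = (min(a, ys[i], nxt), max(b, ys[i], nxt))
        count = sum(min(int(b * k), k - 1) - min(int(a * k), k - 1) + 1
                    for a, b in cols.values())
        pts.append((math.log(k), math.log(count)))
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    return (sum((x - mx) * (y - my) for x, y in pts)
            / sum((x - mx) ** 2 for x, _ in pts))


def mean_dimension(x, frame_len=160, shift=80):
    """Mean per-frame dimension (assumed summary for dimension1/dimension2)."""
    starts = range(0, len(x) - frame_len + 1, shift)
    return sum(box_dim(x[s:s + frame_len]) for s in starts) / len(starts)


def haar_dwt(x, levels):
    """Orthonormal Haar DWT (stand-in for db3); len(x) % 2**levels == 0."""
    r = 1 / math.sqrt(2)
    approx, details = list(x), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) * r for a, b in pairs])
        approx = [(a + b) * r for a, b in pairs]
    return approx, details  # details[0] is the finest scale


def haar_idwt(approx, details):
    """Inverse of haar_dwt, starting from the coarsest scale."""
    r = 1 / math.sqrt(2)
    x = list(approx)
    for d in reversed(details):
        x = [v for a, b in zip(x, d) for v in ((a + b) * r, (a - b) * r)]
    return x


def universal_threshold(finest, n):
    """Donoho-Johnstone threshold: a substitute for the Neyman-Pearson rule."""
    mags = sorted(abs(c) for c in finest)
    sigma = mags[len(mags) // 2] / 0.6745  # robust noise level from the median
    return sigma * math.sqrt(2 * math.log(n))


def hard(coeffs, t):
    """Keep coefficients whose magnitude exceeds t, zero the rest."""
    return [c if abs(c) > t else 0.0 for c in coeffs]


def dynamic_threshold_denoise(x, levels=4):
    dim1 = mean_dimension(x)                                   # step (1)
    approx, details = haar_dwt(x, levels)                      # step (2)
    t = universal_threshold(details[0], len(x))                # step (3)
    rough = haar_idwt(approx, [hard(d, t) for d in details])   # step (4)
    dim2 = mean_dimension(rough)                               # step (5)
    t_new = t * dim1 / dim2  # T' = (dimension1 / dimension2) * T
    return haar_idwt(approx, [hard(d, t_new) for d in details])  # step (6)
```

The input length must be a multiple of 2**levels and at least one frame long. The threshold update line implements T' = (dimension1/dimension2) · T directly; replacing the Haar pair with a genuine db3 transform would leave the rest of the pipeline unchanged.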

IV. THE SYLLABLE SEGMENTATION ALGORITHM

The fractal dimension trajectory is the sequence of per-frame fractal dimensions, and it fully reflects how the fractal dimension of the speech changes. The box-counting dimension preserves the dimension values of speech segments well, while even the weak fluctuations in silent segments drive the box-counting dimension down sharply, so the trajectory shows jump points at the boundaries between silent and speech segments.

Figure 3

Figure 3 shows that the fractal dimension values of this speech signal lie between 1.1 and 1.5, and that speech segments generally have larger fractal dimension values than silent segments. The abrupt changes in fractal dimension caused by phonation provide reference segmentation points, so a syllable segmentation algorithm based on the fractal dimension trajectory is proposed. The concrete steps are as follows:

(1) Calculate the fractal dimension of each frame of the speech signal, denoted D[i] (i = 1, 2, ..., N, where N is the number of frames), to obtain the fractal dimension trajectory;

(2) Calculate the mean of the fractal dimension values, denoted mid(D);

(3) If the fractal dimensions of the i-th frame and the following four frames are all greater than mid(D), take the first sampling point of the i-th frame as the starting point of a syllable;

(4) If the fractal dimensions of the i-th frame and the following four frames are all smaller than mid(D), take the last sampling point of the i-th frame as the ending point of a syllable;

(5) Move to the next frame; if i is smaller than N-4, return to step (3); otherwise, end the procedure.
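The five steps can be written as a short loop over the trajectory. In this sketch (an interpretation, not the authors' code) a state variable ensures that a starting point is taken only while no syllable is open and an ending point only while one is, so a long voiced run yields one start rather than one per frame. The frame length of 160 and shift of 80 follow Section V, and `segment_syllables` is a hypothetical name. Indices are 0-based, so frame i begins at sample i × shift.

```python
def segment_syllables(D, frame_len=160, shift=80, run=5):
    """Return (start_sample, end_sample) pairs from a fractal dimension trajectory.

    `run` = 5 covers frame i and the four frames after it.
    """
    mid = sum(D) / len(D)  # mid(D): mean of the trajectory
    segments = []
    start = None  # first sample of the open syllable, or None outside a syllable
    # Visit every frame whose 5-frame window still fits inside D.
    for i in range(len(D) - run + 1):
        window = D[i:i + run]
        if start is None and all(d > mid for d in window):
            start = i * shift  # first sample of frame i
        elif start is not None and all(d < mid for d in window):
            segments.append((start, i * shift + frame_len - 1))  # last sample of frame i
            start = None
    return segments
```

A syllable that is still open when the trajectory ends is not closed, matching the stopping rule in the last step.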

Figure 4 shows speech segmentation results obtained with this algorithm.

Figure 4

As the experimental results show, this syllable segmentation algorithm clearly and accurately locates the syllable segmentation points of the speech signal. Besides giving good results, it requires relatively little computation, so it is easy to implement and lays a foundation for subsequent speech recognition work.

V. EXPERIMENTAL RESULTS

A. Test Data Description

Using CoolEdit software, we recorded multiple speech segments to build a database for analysis and experiments. The database consists of recordings by three male and three female speakers, each recording 16 different terms, stored in WAV format. In the experiments, all speech signals use an 8 kHz sampling frequency and 16-bit quantization precision; each frame contains 160 sampling points, and the frame shift is 80 points. All algorithms are tested on the same experimental data, so their relative performance can be compared.
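At 8 kHz, a 160-sample frame spans 20 ms and an 80-sample shift advances 10 ms, so consecutive frames overlap by half. The sketch below makes that arithmetic and the framing explicit; dropping an incomplete final frame is an assumption, since the paper does not say how the tail is handled, and the function names are hypothetical.

```python
SAMPLE_RATE = 8000  # Hz, from the test setup
FRAME_LEN = 160     # samples per frame
SHIFT = 80          # frame shift in samples


def split_frames(x, frame_len=FRAME_LEN, shift=SHIFT):
    """Split a sample sequence into overlapping frames (incomplete tail dropped)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, shift)]


def frame_count(num_samples, frame_len=FRAME_LEN, shift=SHIFT):
    """Number of full frames: floor((L - frame_len) / shift) + 1."""
    return 0 if num_samples < frame_len else (num_samples - frame_len) // shift + 1


frame_ms = 1000 * FRAME_LEN / SAMPLE_RATE  # duration of one frame in ms
shift_ms = 1000 * SHIFT / SAMPLE_RATE      # time between frame starts in ms
```

For one second of speech (8000 samples) this gives 99 full frames.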

B. Test Results

In speech processing and denoising, filtering wideband noise from noisy speech is a practical way to improve the SNR. At several input SNRs, we applied both the wavelet denoising method and the dynamic threshold denoising algorithm in MATLAB 7.0 and calculated the average SNR after denoising. Table 1 shows the experimental results.


Table 1 Experimental results of denoising

The test results show that, compared with the traditional wavelet denoising algorithm, the dynamic threshold denoising method improves the average SNR to a certain extent. The algorithm can effectively reduce background noise and meet the needs of practical applications.
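The SNR values can be computed with the usual definition SNR = 10 log10(Σ s² / Σ (s - ŝ)²), where s is the clean signal and ŝ the noisy or denoised signal. The paper does not spell out its SNR formula, how noise was mixed in at each test SNR, or how the average was taken, so the sketch below assumes this definition, scales a noise sequence to reach a target input SNR, and averages per-utterance values in dB. The function names are hypothetical.

```python
import math


def snr_db(clean, processed):
    """SNR in dB of `processed` measured against the reference `clean` signal."""
    signal = sum(c * c for c in clean)
    error = sum((c - p) ** 2 for c, p in zip(clean, processed))
    return 10 * math.log10(signal / error)


def add_noise_at_snr(clean, noise, target_db):
    """Scale `noise` so that clean + noise has the requested input SNR in dB."""
    ps = sum(c * c for c in clean)
    pn = sum(n * n for n in noise)
    scale = math.sqrt(ps / (pn * 10 ** (target_db / 10)))
    return [c + scale * n for c, n in zip(clean, noise)]


def average_snr(pairs):
    """Mean SNR over (clean, processed) pairs, e.g. one pair per utterance."""
    return sum(snr_db(c, p) for c, p in pairs) / len(pairs)
```

Comparing `average_snr` over the denoised outputs of the two methods at each input SNR reproduces the kind of comparison reported in Table 1.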

In the syllable segmentation experiment, the start time found by the algorithm cannot coincide exactly with the actual start time of the speech, and changes in the algorithm parameters also affect the detected syllable start time. The paper therefore introduces a new parameter, the start-time tolerance, equal to 5% of the voice length: a syllable is considered correctly segmented if the interval between the detected start time and the actual start time is less than this tolerance. This syllable segmentation algorithm is compared with the algorithm based on the point-differential fractal dimension [3]; the experimental results are as follows:

Table 2 The experimental comparison of segmentation

The proposed algorithm greatly improves segmentation at every tested SNR, and when the SNR of the noisy speech is low, its accuracy rate remains high. Within the allowable error range, it splits audio clips quickly and effectively, so it can meet the syllable segmentation requirements of a speech recognition system.
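The start-time tolerance rule can be expressed as a small scoring function. The sketch assumes that accuracy is the fraction of actual syllable start times that have a detected start within 5% of the voice length, and that each detected start may be matched only once; neither detail is stated in the paper, and `segmentation_accuracy` is a hypothetical name.

```python
def segmentation_accuracy(detected, actual, voice_length, ratio=0.05):
    """Fraction of actual syllable start times matched within the tolerance.

    tolerance = ratio * voice_length (5% in the paper). One-to-one matching
    between detected and actual starts is an assumption.
    """
    tol = ratio * voice_length
    unused = list(detected)
    correct = 0
    for a in actual:
        candidates = [d for d in unused if abs(d - a) < tol]
        if candidates:
            best = min(candidates, key=lambda d: abs(d - a))  # closest detection
            unused.remove(best)  # a detection cannot be counted twice
            correct += 1
    return correct / len(actual)
```

For example, with a 2.5 s utterance the tolerance is 0.125 s, so a start detected 0.02 s late counts as correct while one detected 0.2 s late does not.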

VI. CONCLUSION

Based on a thorough study of the fractal dimension, this paper proposes a dynamic threshold algorithm that, after the first wavelet reconstruction, adjusts the threshold dynamically according to the fractal dimension and then reconstructs the speech signal again, extracting as clean a speech signal as possible. On this basis, syllables are segmented according to the fractal dimension trajectory.

Simulation results show that the denoising performance improves on the traditional denoising algorithm. At low SNR, the syllable segmentation algorithm based on the fractal dimension trajectory maintains relatively high accuracy, outperforming the point-differential fractal dimension algorithm it was compared against.

However, the dynamic threshold denoising algorithm still has a defect: some speech waveform is lost during denoising. Further research is needed as the theory matures and practical experience grows.

ACKNOWLEDGMENT

This work has been supported by the National Natural Science Foundation of China (60842006).

REFERENCES

[1] S. Manikandan, "Speech enhancement based on wavelet denoising," Academic Open Internet Journal, 2006.

[2] M. Talbi, H. Belgacem, and A. Cherif, "A wavelet packet speech denoising using spectral entropy," 2007.

[3] D. Que, "Continuous speech real-time segmentation technology based on short-time fractal dimension," Television Technology, pp. 33-37, 2008.

[4] X. Feng, "Status-changed based syllable detection in Chinese continuous speech," Communication University of China, 2006.

[5] S. Jothilakshmi, "Unsupervised speaker segmentation with residual phase and MFCC features," Annamalai University, 2009.

[6] F. Wang, "Speech phonology segmentation based on multi-scale fractal dimension of Chinese," Journal of Tsinghua University (Natural Science Edition), vol. 42, no. 1, pp. 68-71, 2002.

[7] F. Wang, "Speech phonology segmentation based on multi-scale fractal dimension of Chinese," Journal of Tsinghua University (Natural Science Edition), vol. 42, no. 1, pp. 68-71, 2002.

[8] J. Zhu and Y. Li, "Study of speech enhancement based on wavelet transform," Science and Technology Information, pp. 32-33, 2001.

