
Sci.Int.(Lahore),25(3),483-486, 2013 ISSN 1013-5316; CODEN: SINTE 8 483

SPEECH TEXT SYNTHESIS: VARIATIONS OF PITCH AND DURATION ARE INVESTIGATED USING MATLAB

Suhail Kazi², Yasir Saleem¹, Mujtaba H. Jaffery³, Hafiz Muhammad Falak Sher¹, Muhammad Munwar Iqbal¹

¹University of Engineering & Technology, Lahore, Pakistan
²Universiti Teknologi Malaysia (UTM), Johor, Malaysia
³CREPS, Department of Electrical Engineering, COMSATS Institute of Information Technology, Lahore, Pakistan.

*Corresponding Author: [email protected]

ABSTRACT: This paper presents work on speech processing in two parts. The first part is on speech synthesis using the KLATT synthesizer (4), a software cascade/parallel formant synthesizer. Formant values, extracted here with the WAVESURFER software (6), are entered into the KLATT synthesizer, and the resulting speech is compared with the original signal for analysis. The aim is to synthesize speech from text using the KLATT synthesizer. The sounds ba, bi, da, di, ga and gi are recorded in PRAAT (7), then opened in WAVESURFER, where their formants are measured manually. In the second part of the paper, we record a vowel and calculate the total duration of the speech in MATLAB from the length of the signal and the sampling rate, then extract the samples of a 50 ms segment to calculate the pitch period. The pitch periods are marked manually to calculate the pitch of the speech signal. Using MATLAB, we then increase or decrease the pitch and vary the duration of the speech signal, constructing a new signal with the modified pitch and duration. Vowels recorded from people of different ages are analyzed.

Keywords: Speech synthesis, Text-To-Speech, Pitch, Duration, Matlab, Wave Surfer

1. INTRODUCTION

Speech is the natural form of human communication. Speech sounds are produced by air-pressure vibrations, generated by pushing inhaled air from the lungs through the vibrating vocal cords and the vocal tract and out through the lip and nasal airways. Just as the written form of a language is a sequence of elementary alphabet symbols, speech is a sequence of elementary acoustic sounds or symbols, known as phonemes, that convey the spoken form of a language. There are about 40-60 phonemes in the English language, from which a very large number of spoken words can be constructed. In practice, the production of each phoneme is affected by the context of the neighboring phonemes, so diphones and triphones, which capture this context, are selected carefully. In the KLATT synthesizer, diphones and triphones are used to synthesize the speech: they are selected carefully and concatenated to form the speech.

2. LITERATURE REVIEW

Text-to-speech and speech-to-text are very active areas of research. Over the past two to three decades, considerable effort has been devoted to developing simulations of human speech production and perception (Ladefoged 2005). The two modalities have usually been studied separately, although some researchers have looked beyond this division. For English consonants, the vowels provide the most perceptible cues to pronunciation. Some researchers have examined the effect of such consonants in languages other than English, for example French and Arabic; such research makes it possible, for instance, to compare the production of English word-final consonants by advanced and intermediate Lebanese speakers of English.

The main hurdle faced by Speech-to-Text (STT) applications is the new technology required for audio information processing. The high cost of the infrastructure required to conduct speech recognition research prevents many small research groups from evaluating new ideas on large-scale tasks. A Speech-to-Text classification system comprises the following:
1. Open access through the Internet
2. Complete documentation and operating guidelines
3. An advanced system with periodic upgrades
4. An application with an object-oriented design
5. Online technical maintenance

Speech-DM is a knowledge discovery in databases (KDD) application designed to handle speech and discover patterns of pitch variation. It consists of data pre-processing, management, mining and post-processing. Figure 1 shows its architecture.

Figure 1: Speech-DM Architecture.

Speech-DM provides an interface for monitoring and setting constraints during training and testing.



3. SPEECH SYNTHESIS

Speech synthesis is the artificial production of human speech. A typical Text-To-Speech (TTS) system converts normal language text into speech.

Figure 2: Text-to-Speech System

It is composed of two parts: a front-end and a back-end. The front-end assigns a phonetic transcription to each word and divides and marks the text into prosodic units such as phrases, clauses and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion. The phonetic transcriptions and prosody information together make up the symbolic linguistic representation produced by the front-end. The back-end then converts this symbolic linguistic representation into sound. In some systems, this part includes the computation of the target prosody (pitch contour, phoneme durations), which is then imposed on the output speech.

Synthesized speech can be created by concatenating pieces of recorded speech that are stored in advance. Here the concatenated pieces are diphones and triphones.

First, we record the sounds ba, bi, da, di, ga and gi using PRAAT and save them as WAV files, then open them in WAVESURFER to calculate the formants manually. WAVESURFER makes it easy to estimate the fundamental frequency F0 and the formants F1, F2 and F3. Table 1 lists F0 for each sound and the F1, F2 and F3 values measured at three time instants.

Table 1: Formants calculated for different sounds using WAVESURFER

Sr. no.  Sound  F0 (Hz)  Time (ms)  F1 (Hz)  F2 (Hz)  F3 (Hz)
1        Ba     159      675        221      3087     3325
                         725        222      1103     2536
                         1125       662      1213     2626
2        Bi     171      450        221      1323     2205
                         500        222      1544     2426
                         850        441      2426     3197
3        Da     79       630        221      1764     2205
                         680        223      1544     2426
                         880        226      2426     3197
4        Di     153      450        221      1544     2756
                         500        223      1544     2646
                         800        226      2426     3197
5        Ga     136      415        992      1874     3418
                         435        1103     1433     3528
                         835        772      1213     2426
6        Gi     145      500        221      1213     2977
                         550        226      1433     3087
                         850        230      2646     3308
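To make the front-end/back-end split described in this section concrete, here is a toy Python sketch (our illustration, not the system used in this paper). It assumes a hypothetical six-word lexicon covering only the recorded syllables, treats punctuation as the only prosodic-phrase boundary, and stops at the diphone labels that a concatenative back-end would look up; no audio is produced.

```python
import re

# Hypothetical toy lexicon (an assumption, not part of the paper): maps the six
# recorded syllables to phoneme sequences.
LEXICON = {
    "ba": ["b", "a"], "bi": ["b", "i"],
    "da": ["d", "a"], "di": ["d", "i"],
    "ga": ["g", "a"], "gi": ["g", "i"],
}


def front_end(text):
    """Front-end: split text into prosodic units (phrases at punctuation) and
    transcribe each word by dictionary lookup (grapheme-to-phoneme)."""
    phrases = [p.strip() for p in re.split(r"[,.;!?]", text.lower()) if p.strip()]
    return [[LEXICON[word] for word in phrase.split()] for phrase in phrases]


def to_diphones(phrase):
    """Turn one transcribed phrase into the diphone labels a concatenative
    back-end would look up, padding both ends with silence ("sil")."""
    phonemes = ["sil"] + [p for word in phrase for p in word] + ["sil"]
    return [f"{a}-{b}" for a, b in zip(phonemes, phonemes[1:])]


if __name__ == "__main__":
    for phrase in front_end("Ba di, ga."):
        print(to_diphones(phrase))
```

For example, `front_end("Ba di, ga.")` yields two phrases, and the first maps to the diphones `sil-b`, `b-a`, `a-d`, `d-i`, `i-sil`. A real front-end would also normalize numbers and abbreviations and predict prosody.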

4. SYNTHESIS SCRIPT OF KLATT SYNTHESIZER

The formant values of Table 1 are now entered into the synthesis script of the KLATT synthesizer for each sound. Taking the sound ga as an example:

    TIME = 000; F1=992; F2=1874; F3=3418; F0=135.53; AV=72
    TIME + 20; F1=1103; F2=1433; F3=3528; AV=72
    TIME + 20; F1=1323; F2=1433; F3=3638; AV=72
    TIME = 400; F1=772; F2=1213; F3=2426; F0=135.53; AV=72
    TIME + 30; AV=0

The synthesizer produces the speech sound, which we compare with the original recording of the same sound made in PRAAT; the two differ only in the manner of speaking, the synthetic version sounding robotic. Similarly, we enter the formant values of the other sounds into the synthesis script and compare each synthesized sound with its original.
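The KLATT synthesizer itself is external software; the following Python sketch only illustrates the cascade idea behind it, under stated assumptions. An impulse train at F0 is passed through three second-order resonators of the form y[n] = A*x[n] + B*y[n-1] + C*y[n-2], one per formant. The formant values are the first ga frame above; the bandwidths (90, 110 and 170 Hz) and the 10 kHz sampling rate are assumed, since Table 1 gives neither. The sketch holds one frame for the whole duration, so the TIME interpolation and the AV (amplitude of voicing) control of the real script are not modeled.

```python
import math


def resonator(x, freq, bw, fs):
    """Second-order digital resonator in the form used by Klatt-style formant
    synthesizers: y[n] = A*x[n] + B*y[n-1] + C*y[n-2], unity gain at 0 Hz."""
    T = 1.0 / fs
    C = -math.exp(-2.0 * math.pi * bw * T)
    B = 2.0 * math.exp(-math.pi * bw * T) * math.cos(2.0 * math.pi * freq * T)
    A = 1.0 - B - C
    y, y1, y2 = [], 0.0, 0.0
    for xn in x:
        yn = A * xn + B * y1 + C * y2
        y.append(yn)
        y1, y2 = yn, y1
    return y


def synth_steady_vowel(f0, formants, bandwidths, dur, fs=10000):
    """Impulse train at f0 through F1, F2, F3 resonators in cascade, normalized
    to a peak amplitude of 1. One set of formant values is held for the whole
    duration (no TIME interpolation, no AV amplitude control)."""
    n = int(dur * fs)
    period = int(round(fs / f0))
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bw in zip(formants, bandwidths):
        signal = resonator(signal, freq, bw, fs)
    peak = max(abs(v) for v in signal) or 1.0
    return [v / peak for v in signal]


if __name__ == "__main__":
    # First "ga" frame of the script; the bandwidths are assumed values.
    ga = synth_steady_vowel(135.53, [992, 1874, 3418], [90, 110, 170], dur=0.1)
    print(len(ga), max(abs(v) for v in ga))
```

The coefficient choice A = 1 - B - C fixes the gain at 0 Hz to one, so cascading resonators does not change the overall level at low frequencies; the final normalization only guards against clipping.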

5. QUALITY CHECKING OF A SPEECH SYNTHESIZER

The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood.

6. APPLICATIONS OF SPEECH SYNTHESIS

Speech synthesis helps remove environmental barriers for people with a broad range of disabilities.

Speech synthesizers are also commonly used to support people with severe speech impairments, typically through a dedicated voice-output communication aid.

Speech synthesis techniques are also employed in entertainment productions such as games and animations.

7. VARYING DURATION OF SPEECH SIGNAL

To begin the second part of the paper, we first record vowels in PRAAT and save them as WAV files. We then follow the steps below to vary the duration of the speech signal.

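Reading the vowel

The total duration of the speech signal is calculated with the following MATLAB commands:

    close all
    [spch, fs] = audioread('g:/44100.wav');  % path of the WAV file; fs is the sampling rate (Hz); audioread replaces the older wavread
    spch = spch(:,1);                         % keep the first channel
    ts = length(spch);                        % length of the speech in samples
    t = ts/fs;                                % total duration of the speech (s)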
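The same duration calculation can be sketched in Python with the standard `wave` module (an illustrative equivalent of the MATLAB commands, not the authors' code). The helper names `wav_duration` and `make_silent_wav` are hypothetical; the second only builds an in-memory test file so the sketch runs without a recording.

```python
import io
import struct
import wave


def wav_duration(source):
    """Return (ts, fs, t) for a WAV file path or file-like object: number of
    samples per channel, sampling rate in Hz, and duration in seconds (ts/fs)."""
    with wave.open(source, "rb") as w:
        ts = w.getnframes()      # length of the speech, in samples
        fs = w.getframerate()    # sampling rate, in Hz
    return ts, fs, ts / fs


def make_silent_wav(n_samples, fs=44100):
    """Build an in-memory 16-bit mono WAV of silence (test data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(fs)
        w.writeframes(struct.pack("<%dh" % n_samples, *([0] * n_samples)))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(wav_duration(make_silent_wav(22050)))
```

`wav_duration(make_silent_wav(22050))` returns `(22050, 44100, 0.5)`, i.e. t = ts/fs = 0.5 s.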

Extracting the samples for duration of 50 ms

The samples of a 50 ms segment are extracted from the speech signal with the following MATLAB commands:

    time = 0.05;                              % 50 ms
    samp = round(time*fs);                    % samples in 50 ms
    % split into 50 ms frames (1-sample overlap) and keep the 5th frame
    temp = buffer(spch, samp, 1, 'nodelay');
    spch_samp = temp(:,5)';                   % 50 ms segment as a row vector
    pitch_period = 0.007;                     % 7 ms, measured manually
    figure, plot(spch_samp)



    t_pitch_prd = floor(time/pitch_period);   % whole pitch periods in the 50 ms segment
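The framing step can be mirrored in Python as a sketch. `buffer_nodelay` is our rough stand-in for MATLAB's `buffer(x, n, p, 'nodelay')` (frames of n samples, consecutive frames sharing p samples, last frame zero-padded); it is not a complete reimplementation of that function. `fifth_segment` is a hypothetical helper, and the 7 ms pitch period is the manually measured value from the text.

```python
import math


def buffer_nodelay(x, n, p):
    """Rough analogue of MATLAB buffer(x, n, p, 'nodelay'): frames of n samples
    starting at x[0], taken every n - p samples (consecutive frames share p
    samples); the last frame is zero-padded."""
    hop = n - p
    frames, start = [], 0
    while True:
        frame = list(x[start:start + n])
        frames.append(frame + [0.0] * (n - len(frame)))
        if start + n >= len(x):
            return frames
        start += hop


def fifth_segment(spch, fs, seg=0.05, pitch_period=0.007):
    """Mirror the MATLAB steps: 50 ms frames, keep the 5th (temp(:,5)),
    and count the whole pitch periods it holds."""
    samp = round(seg * fs)                          # samples in 50 ms
    spch_samp = buffer_nodelay(spch, samp, 1)[4]    # 5th frame (0-based index 4)
    t_pitch_prd = math.floor(seg / pitch_period)    # whole pitch periods in 50 ms
    return spch_samp, t_pitch_prd
```

With fs = 44100 Hz, each frame holds round(0.05 × 44100) = 2205 samples, frames start every 2204 samples because of the 1-sample overlap, and floor(0.05/0.007) = 7 whole pitch periods fit in the segment.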

Extracting one pitch period from vowel

We now extract single pitch periods from the vowel segment using the following MATLAB commands:

    L = length(spch_samp);                     % total number of samples in 50 ms
    samp_one_pitch = round(L/t_pitch_prd);     % samples in one pitch period
    % split the segment into pitch periods (one per column, 1-sample overlap)
    temp = buffer(spch_samp, samp_one_pitch, 1, 'nodelay');
    one_pitch1 = temp(:,1)';
    one_pitch2 = temp(:,2)';
    one_pitch3 = temp(:,3)';
    one_pitch4 = temp(:,4)';
    one_pitch5 = temp(:,5)';
    one_pitch6 = temp(:,6)';
    plot(one_pitch2)
    l_one = length(one_pitch2);                % samples in one pitch period
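The split into single pitch periods can be sketched the same way in Python. `split_periods` is a hypothetical helper that cuts the 50 ms segment into `t_pitch_prd` chunks of round(L/t_pitch_prd) samples, with consecutive chunks sharing one sample as in the MATLAB `buffer` call.

```python
def split_periods(segment, n_periods, overlap=1):
    """Split a segment into n_periods pitch periods of round(len/n_periods)
    samples each; consecutive periods share `overlap` samples, as with
    buffer(..., 1, 'nodelay'). A short final chunk is zero-padded."""
    n = round(len(segment) / n_periods)
    hop = n - overlap
    periods = []
    for k in range(n_periods):
        chunk = list(segment[k * hop:k * hop + n])
        periods.append(chunk + [0.0] * (n - len(chunk)))
    return periods


if __name__ == "__main__":
    periods = split_periods([float(i) for i in range(2205)], 7)
    one_pitch2 = periods[1]      # second period, like one_pitch2
    print(len(periods), len(one_pitch2))
```

For the 2205-sample segment and 7 periods, each chunk has 315 samples (about 7.14 ms at 44.1 kHz), slightly longer than the 7 ms measured manually because of the rounding.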

Now we can vary the duration of the speech signal simply by adding pitch periods to the original speech signal as required.

Suppose we take a speech signal of 200 ms and want to extend it to 250 ms. The pitch period of the vowel aaa, measured in PRAAT, is 7 ms, so we add 7 pitch periods (7 × 7 ms = 49 ms) to extend the speech signal to approximately 250 ms. In the same way, the duration of a speech signal can be extended by any number of milliseconds. This is the manual way of varying the duration of the speech signal.
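The arithmetic in this example generalizes to a small helper. This is a sketch with hypothetical function names; it rounds to the nearest whole number of periods, and a negative result means periods should be removed rather than added.

```python
def periods_to_add(current_ms, target_ms, period_ms):
    """Whole pitch periods to add (positive) or remove (negative) so that a
    signal of current_ms gets as close as possible to target_ms."""
    return round((target_ms - current_ms) / period_ms)


def new_duration(current_ms, period_ms, k):
    """Duration after adding k pitch periods (k < 0 removes periods)."""
    return current_ms + k * period_ms


if __name__ == "__main__":
    k = periods_to_add(200, 250, 7)
    print(k, new_duration(200, 7, k))
```

For the example in the text, `periods_to_add(200, 250, 7)` gives 7, and `new_duration(200, 7, 7)` gives 249 ms.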

Figure 4. Extracted pitch period

Changing the duration

Now we discuss a second method of varying the duration of the speech signal, which does so computationally in MATLAB:

    disp('enter the duration of vowel you need in sec')
    dur_time = input('enter duration in seconds = ');
    % whole pitch periods to append so that the 50 ms segment reaches dur_time
    leng = floor((round(dur_time*fs) - length(spch_samp))/l_one);
    new_vowel = spch_samp;
    for i = 1:leng
        new_vowel = [new_vowel one_pitch2];   % append one pitch period
    end
    figure, plot(new_vowel)
    audiowrite('vowel1.wav', new_vowel, fs);

Key Points about Varying the Duration
1. To double the duration of the speech signal, simply duplicate the pattern; this does not change the pitch period of the speech signal.
2. To decrease the duration of the speech signal, remove pitch periods at regular intervals until the required duration is reached.
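Both key points can be sketched in Python on a list of extracted pitch periods. This is an illustration under assumptions: the signal is taken to be already cut into whole periods, lengthening repeats the second period (like `one_pitch2` in the MATLAB code), and shortening keeps periods spread evenly over the signal, which is one way to read "remove pitch periods at regular intervals". `change_duration` is a hypothetical name.

```python
def change_duration(periods, target_periods, repeat_index=1):
    """Build a signal of target_periods pitch periods from a list of periods
    (each a list of samples). Lengthening appends copies of
    periods[repeat_index]; shortening drops periods at regular intervals.
    The pitch period itself is never altered."""
    if target_periods < 1:
        raise ValueError("target_periods must be at least 1")
    if target_periods >= len(periods):
        extra = [periods[repeat_index]] * (target_periods - len(periods))
        chosen = periods + extra
    else:
        step = len(periods) / target_periods
        chosen = [periods[int(i * step)] for i in range(target_periods)]
    return [sample for period in chosen for sample in period]


if __name__ == "__main__":
    periods = [[float(k)] * 3 for k in range(6)]   # six toy 3-sample periods
    print(len(change_duration(periods, 12)))       # doubled duration
    print(change_duration(periods, 3))             # halved duration
```

Because only whole periods are repeated or dropped, the spacing between pitch pulses, and hence the pitch, stays the same while the duration changes.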

Precautions
1. Samples should be taken carefully. Do not take samples from the low-frequency components.
2. Always take samples from the high-frequency components.
3. Transitions should not be stretched, because stretching them causes audible problems. Instead, stretch the vowel itself (its middle part).

VARIATION IN THE PITCH

Now we will discuss, step by step, how the pitch of the speech signal can be increased and decreased.

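Pitch period Changing

The extracted pitch period is first multiplied by a Hamming window using the MATLAB commands below; the windowed period aq is used to construct the modified signals.

    win = hamming(l_one);             % Hamming window as long as one pitch period
    figure, plot(one_pitch2)
    figure, plot(win)
    aq = one_pitch2.*win';            % windowed pitch period (row vector)
    figure, plot(aq)
    long = repmat(aq, 1, 14);         % 14 windowed periods in a row
    audiowrite('short.wav', long, fs);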
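The windowing step can be sketched in Python as follows. `hamming` computes the symmetric Hamming window w[k] = 0.54 - 0.46 cos(2πk/(N-1)), the standard definition and, to our understanding, the default form of MATLAB's `hamming(N)`; `window_and_repeat` is a hypothetical helper mirroring `aq = one_pitch2.*win'` followed by 14 repetitions.

```python
import math


def hamming(n):
    """Symmetric Hamming window: w[k] = 0.54 - 0.46*cos(2*pi*k/(n - 1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def window_and_repeat(one_pitch, copies=14):
    """Taper one pitch period with a Hamming window (aq) and repeat it."""
    win = hamming(len(one_pitch))
    aq = [s * w for s, w in zip(one_pitch, win)]
    return aq, aq * copies


if __name__ == "__main__":
    aq, long = window_and_repeat([1.0] * 315)
    print(len(aq), len(long))
```

The window falls to 0.08 at both ends of the period, which softens the jumps where consecutive copies are joined.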

Increasing Pitch

A speech signal contains both high- and low-frequency components. To increase the pitch, we contract the high-frequency components, because contracting the low-frequency components causes listening problems. The following MATLAB commands increase the pitch:

    new_pitch = input('enter the pitch ');            % target pitch (Hz)
    pitch_diff = floor(new_pitch - 1/pitch_period);   % pitch difference (Hz)
    samp_less = 2*pitch_diff;                         % samples to delete to increase the pitch
    new_samp = length(one_pitch1) - samp_less;        % length of the shortened period
    temp = aq(1:new_samp);                            % first new_samp samples of the windowed period
    pitch = repmat(temp, 1, 16);                      % 16 shortened periods
    audiowrite('nn.wav', pitch, fs);
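The MATLAB code above removes 2 samples per hertz of pitch difference, a rule of thumb. A common alternative, sketched here in Python and not the method of this paper, sets the new period length directly to fs/F0_new samples and keeps the start of the windowed period. For instance, with fs = 44100 Hz and a 315-sample period, a 210 Hz target gives 210 samples here, whereas the 2-samples-per-Hz rule removes 2 × floor(210 - 1/0.007) = 134 samples and leaves 181 (about 244 Hz). The function names are hypothetical.

```python
def period_length(fs, f0):
    """Samples in one pitch period of f0 Hz at sampling rate fs (rounded)."""
    return int(fs / f0 + 0.5)


def raise_pitch(aq, fs, new_f0, copies=16):
    """Shorten the windowed period aq to one period of new_f0 by keeping its
    first samples (as the MATLAB code does), then repeat it."""
    n = period_length(fs, new_f0)
    if n > len(aq):
        raise ValueError("new_f0 is below the original pitch; lengthen instead")
    return list(aq[:n]) * copies


if __name__ == "__main__":
    higher = raise_pitch([0.5] * 315, 44100, 210)
    print(len(higher))   # 16 periods of 210 samples
```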



Figure 5. Increased pitch period

Decreasing Pitch

The pitch can be decreased by stretching the high-frequency components. The following MATLAB commands decrease the pitch:

    close all
    new_pitch = input('enter the pitch ');            % target pitch (Hz)
    pitch_diff = floor(1/pitch_period - new_pitch);   % pitch difference (Hz)
    samp_increase = 2*pitch_diff;                     % samples to add to lengthen the period
    rd = random('exp', 1:30);                         % exponential random samples (Statistics Toolbox)
    temp = buffer(rd, samp_increase, 1, 'nodelay');
    temp = temp(:,1)';                                % first samp_increase samples (zero-padded if needed)
    pitch_prd = [one_pitch2 temp];                    % lengthened pitch period
    pitch = repmat(pitch_prd, 1, 7);                  % 7 lengthened periods
    audiowrite('loo.wav', pitch, fs);

Figure 6. Decreased pitch period
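Lowering the pitch can be sketched the same way: the period is lengthened to fs/F0_new samples. As a deliberate substitution for the exponential random samples in the MATLAB code, this sketch pads with zeros, which adds no energy between pitch pulses; `lower_pitch` and `period_length` are hypothetical names. With fs = 44100 Hz and a 315-sample period, a 100 Hz target gives 441 samples, whereas the paper's rule adds 2 × floor(1/0.007 - 100) = 84 samples (399 in total, about 111 Hz).

```python
def period_length(fs, f0):
    """Samples in one pitch period of f0 Hz at sampling rate fs (rounded)."""
    return int(fs / f0 + 0.5)


def lower_pitch(one_pitch, fs, new_f0, copies=7):
    """Lengthen one pitch period to a period of new_f0 by appending zeros
    (not the random samples of the MATLAB code), then repeat it."""
    n = period_length(fs, new_f0)
    if n < len(one_pitch):
        raise ValueError("new_f0 is above the original pitch; shorten instead")
    return (list(one_pitch) + [0.0] * (n - len(one_pitch))) * copies


if __name__ == "__main__":
    lower = lower_pitch([0.5] * 315, 44100, 100)
    print(len(lower))   # 7 periods of 441 samples
```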

Precautions (For Pitch)
To avoid listening problems, stretch or contract only the high-frequency components. Do not take the low-frequency components.

This completes the discussion of increasing and decreasing the pitch of the speech signal.

8. CONCLUSION

In this paper we have discussed how speech can be synthesized from text data. For this we used the KLATT synthesizer, whose output speech sound is almost the same as the original sound. From our observations, however, the output of the KLATT synthesizer is robotic-sounding speech and is not as clear as the original voice. To obtain the formant values for the KLATT synthesizer, we used WAVESURFER instead of PRAAT for formant estimation, because formants are easier to find in WAVESURFER than in PRAAT; the sounds themselves were recorded in PRAAT. We then presented techniques based on frequency selection and windowing for varying the pitch and duration of the speech signal. To increase the pitch, the pitch period must be decreased, and vice versa.

REFERENCES:

1. Rabiner, L. R. and Juang, B. H. (1993). Fundamentals of Speech Recognition. Prentice-Hall, Englewood Cliffs, NJ.

2. Ladefoged, P. and Johnson, K. (2010). A Course in Phonetics. Wadsworth Publishing Company.

3. Dutoit, T. (1997). An Introduction to Text-to-Speech Synthesis, Vol. 3. Springer.

4. http://www.asel.udel.edu/speech/tutorials/synthesis/ [Retrieved from the Internet on January 01, 2013].

5. Matthews, J. SAPI 5.0 Tutorial II: Text-to-Speech. http://www.generation5.org/content/2001/sr01.asp [Retrieved from the Internet on January 01, 2013].

6. http://www.wavesurfer.findmysoft.com/ [Retrieved from the Internet on January 02, 2013].

7. http://www.fon.hum.uva.nl/praat/download_win.html [Retrieved from the Internet on January 01, 2013].

8. Text-to-speech Demo. http://www.research.att.com/projects/tts/ [Retrieved from the Internet on January 01, 2013].

9. Black, A. W. (2002). Perfect synthesis for all of the people all of the time. IEEE TTS Workshop 2002.

10. Furui, S. (1989). Digital Speech Processing, Synthesis and Recognition. Marcel Dekker.

11. McClellan, J. H., Burrus, C. S., Oppenheim, A. V., Parks, T. W., Schafer, R. W. and Schuessler, H. W. (1998). Computer-Based Exercises for Signal Processing Using MATLAB 5 (Vol. 5). Prentice Hall.

12. Schafer, R. and Rabiner, L. (1978). Digital Processing of Speech Signals. Prentice-Hall, Englewood Cliffs, NJ.

13. Pennell, D. and Liu, Y. (2010). Normalization of text messages for text-to-speech. ICASSP 2010.

14. Martin, J. H. and Jurafsky, D. (2000). Speech and Language Processing. Prentice Hall.
