Sci.Int.(Lahore),25(3),483-486, 2013 ISSN 1013-5316; CODEN: SINTE 8 483
SPEECH TEXT SYNTHESIS: VARIATIONS OF PITCH AND DURATION ARE INVESTIGATED USING MATLAB
2Suhail Kazi, 1Yasir Saleem, 3Mujtaba H. Jaffery, 1Hafiz Muhammad Falak Sher, 1Muhammad Munwar Iqbal
1University of Engineering & Technology, Lahore, Pakistan
2Universiti Teknologi Malaysia (UTM), Johor, Malaysia
3CREPS, Department of Electrical Engineering, COMSATS Institute of Information Technology, Lahore, Pakistan
*Corresponding Author: [email protected]
ABSTRACT: This paper presents work on speech processing in two parts. The first part covers speech synthesis using the KLATT synthesizer (4), a software cascade/parallel formant synthesizer. The sounds ba, bi, da, di, ga and gi are recorded in PRAAT (7) and then opened in WAVESURFER (6), where the formants are measured manually. These formant values are entered into the KLATT synthesizer to produce speech, which is then compared with the original signal for analysis; the aim is to synthesize speech from text using the KLATT synthesizer. In the second part, we record a vowel, calculate the total duration of the speech in MATLAB from its length and sampling rate, and extract the samples of a 50 ms segment to calculate the pitch period. The pitch periods can be marked manually to calculate the pitch of the speech signal. Using MATLAB, we then increase or decrease the pitch, vary the duration, and construct a new signal with the modified pitch and duration. Vowels recorded from people of different ages are analyzed.
Keywords: Speech synthesis, Text-To-Speech, Pitch, Duration, Matlab, Wave Surfer
1. INTRODUCTION
Speech is the natural form of human communication. Speech sounds are produced by air pressure vibrations, generated by pushing inhaled air from the lungs through the vibrating vocal cords and vocal tract and out through the lips and nose.
Just as the written form of a language is a sequence of elementary alphabetic symbols, speech is a sequence of elementary acoustic sounds or symbols known as phonemes, which convey the spoken form of a language. There are about 40-60 phonemes in the English language, from which a very large number of spoken words can be constructed. In practice, the production of each phoneme is affected by the context of the neighboring phonemes, so diphones and triphones that capture this context are selected carefully. In the KLATT synthesizer, the selected diphones and triphones are concatenated to form the speech.
2. LITERATURE REVIEW
Text-to-speech and speech-to-text conversion are very active areas of research. Considerable effort has been devoted over the last two to three decades to developing models of human speech production and perception (Ladefoged 2005). The two modalities have commonly been studied separately, although a number of investigators have questioned this separation.
In English, vowels provide the most perceptible cues for pronouncing consonants. Some researchers have examined this effect in languages other than English, for example French and Arabic. Such research allows the word-final English consonants of advanced and intermediate Lebanese speakers of English to be compared.
Speech-to-Text (STT) applications face the main hurdle of the new technology required for audio information processing. The high cost of the infrastructure required to conduct speech recognition research prevents many small research groups from evaluating new ideas on large-scale tasks. A Speech-to-Text system comprises the following:
1. Open access through the Internet
2. Complete documentation and operating guidelines
3. An advanced system with periodic upgrades
4. An application with an object-oriented design
5. On-line technical support
Speech-DM is a KDD application designed to handle speech and discover patterns of pitch variation. It comprises data pre-processing, management, mining, and post-processing. Figure 1 gives its architecture.
Figure 1: Speech-DM Architecture.
Speech-DM provides an interface for monitoring and setting parameters throughout training and testing.
3. SPEECH SYNTHESIS
Speech synthesis is the artificial production of human speech. A typical Text-To-Speech (TTS) system converts normal language text into speech.
Figure 2: Text-to-Speech System
It is composed of two parts: a front-end and a back-end. The front-end assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units such as phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end. The back-end then converts the symbolic linguistic representation into sound. In certain systems, this part includes the computation of the target prosody (pitch contour, phoneme durations), which is then imposed on the output speech.
Synthesized speech can be created by concatenating pieces of recorded speech that are already stored. Here the concatenated pieces are diphones and triphones.
First we record the sounds ba, bi, da, di, ga and gi using PRAAT and save them as wav files, then open them in WAVESURFER to measure the formants manually. WAVESURFER makes it easy to read off the fundamental frequency F0 and the formants F1, F2 and F3.
Table 1: Formants calculated for different sounds using WAVESURFER
Sr. no.  Sound  F0 (Hz)  Time (ms)  F1 (Hz)  F2 (Hz)  F3 (Hz)
1        Ba     159      675        221      3087     3325
                         725        222      1103     2536
                         1125       662      1213     2626
2        Bi     171      450        221      1323     2205
                         500        222      1544     2426
                         850        441      2426     3197
3        Da     79       630        221      1764     2205
                         680        223      1544     2426
                         880        226      2426     3197
4        Di     153      450        221      1544     2756
                         500        223      1544     2646
                         800        226      2426     3197
5        Ga     136      415        992      1874     3418
                         435        1103     1433     3528
                         835        772      1213     2426
6        Gi     145      500        221      1213     2977
                         550        226      1433     3087
                         850        230      2646     3308
4. SYNTHESIS SCRIPT OF KLATT SYNTHESIZER
Now put the formant values given in Table 1 into the KLATT synthesis script for each sound. Take the sound ga as an example:
TIME = 000; F1=992; F2=1874; F3=3418; F0=135.53; AV=72
TIME + 20; F1=1103; F2=1433; F3=3528; AV=72
TIME + 20; F1=1323; F2=1433; F3=3638; AV=72
TIME = 400; F1=772; F2=1213; F3=2426; F0=135.53; AV=72
TIME + 30; AV=0
The resulting speech sound is compared with the original recorded in PRAAT. It is the same sound and differs only in the manner of speaking (it sounds robotic).
Similarly, we put the formant values of the other sounds into the synthesis script and compare each synthesized sound with its original.
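To give an idea of what a cascade formant synthesizer does with such values, the following is a minimal sketch, not the KLATT software itself. It passes an impulse train at F0 through three second-order digital resonators of the kind used in Klatt-style synthesizers, using the first frame of the ga script. The formant bandwidths BW and the variable names are assumptions made for illustration; they are not given in this paper.

```matlab
% Minimal cascade formant synthesis sketch (assumed bandwidths, steady formants).
fs  = 44100;              % sampling rate in Hz
F0  = 135.53;             % fundamental frequency of "ga" from the script (Hz)
F   = [992 1874 3418];    % F1, F2, F3 at TIME = 0 (Hz)
BW  = [60 90 150];        % ASSUMED formant bandwidths (Hz), not from the paper
dur = 0.4;                % duration in seconds

n      = round(dur*fs);   % number of output samples
period = round(fs/F0);    % pitch period in samples
src    = zeros(1, n);
src(1:period:n) = 1;      % impulse train as a crude voicing source

y = src;
T = 1/fs;
for k = 1:numel(F)
    % Second-order resonator: y(n) = A*x(n) + B*y(n-1) + C*y(n-2)
    C = -exp(-2*pi*BW(k)*T);
    B = 2*exp(-pi*BW(k)*T)*cos(2*pi*F(k)*T);
    A = 1 - B - C;
    y = filter(A, [1 -B -C], y);
end
y = y / max(abs(y));      % normalize to the range [-1, 1]
```

The result y can be played with sound(y, fs). Because the formants are held constant here, the output is a steady vowel-like buzz; the actual script moves the formants over time.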
5. QUALITY CHECKING OF A SPEECH SYNTHESIZER
The quality of a speech synthesizer is assessed by its resemblance to the human voice and by how easily its output can be understood.
6. APPLICATIONS OF SPEECH SYNTHESIS
Speech synthesis removes environmental barriers for people with a wide range of disabilities.
Speech synthesizers are also commonly used to support people with severe speech impairment, typically through a dedicated voice output communication aid.
Speech synthesis techniques are also employed in entertainment productions such as games and animations.
7. VARYING DURATION OF SPEECH SIGNAL
Before turning to the second part of the paper, first record vowels in PRAAT and save them as a wave file. The steps below are then followed to vary the duration of the speech signal.
Reading the vowel
Calculate the total duration of the speech signal using the following MATLAB commands.
close all
[spch, fs] = audioread('g:/44100.wav'); % path of the wave file; fs is its sampling rate (44100 Hz here)
ts = length(spch); % length of speech in samples
t = ts/fs; % total duration of speech in seconds
Extracting the samples for duration of 50 ms
Extract the samples of a 50 ms segment from the speech signal using the following MATLAB commands.
time = 0.05; % 50 msec
samp = round(time*fs); % samples in 50 msec
temp = buffer(spch, samp, 1, 'nodelay'); % split into 50 msec frames (1 sample overlap)
spch_samp = temp(:,5); % take the fifth frame
spch_samp = spch_samp';
pitch_period = 0.007; % 7 msec, manually calculated
figure, plot(spch_samp)
t_pitch_prd = floor(time/pitch_period); % number of whole pitch periods in the 50 msec vowel segment
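The pitch period above is marked manually. As a cross-check, it can also be estimated automatically. The sketch below uses normalized autocorrelation, which is a different technique from the manual marking used in this paper. It runs on a synthetic 50 ms frame so that it is self-contained; the test signal, the 80-400 Hz search range and all variable names are assumptions for illustration.

```matlab
% Pitch period estimate by normalized autocorrelation on a synthetic frame.
fs      = 44100;
n       = 0:round(0.05*fs)-1;          % 50 msec frame (2205 samples)
f0_true = 140;                         % ASSUMED test pitch (period of 315 samples)
frame   = sin(2*pi*f0_true*n/fs) + 0.5*sin(2*pi*2*f0_true*n/fs);

min_lag = floor(fs/400);               % shortest period searched (400 Hz)
max_lag = ceil(fs/80);                 % longest period searched (80 Hz)
best_lag   = min_lag;
best_score = -Inf;
for lag = min_lag:max_lag
    a = frame(1:end-lag);              % frame
    b = frame(1+lag:end);              % frame shifted by lag samples
    score = sum(a.*b) / sqrt(sum(a.^2)*sum(b.^2));  % correlation coefficient
    if score > best_score
        best_score = score;
        best_lag   = lag;
    end
end
pitch_period_est = best_lag/fs;        % estimated pitch period in seconds
pitch_est        = fs/best_lag;        % estimated pitch in Hz
```

For a recorded vowel, frame would be replaced by the 50 ms segment spch_samp, and the estimate compared with the manually marked value.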
Extracting one pitch period from vowel
We want to isolate one pitch period. The following MATLAB commands extract single pitch periods from the vowel.
L = length(spch_samp); % total number of samples in 50 msec
samp_one_pitch = floor(L/t_pitch_prd); % samples per pitch period
temp = buffer(spch_samp, samp_one_pitch, 1, 'nodelay'); % one pitch period per column
one_pitch1 = temp(:,1)';
one_pitch2 = temp(:,2)';
one_pitch3 = temp(:,3)';
one_pitch4 = temp(:,4)';
one_pitch5 = temp(:,5)';
one_pitch6 = temp(:,6)';
plot(one_pitch2)
l_one = length(one_pitch2); % length of one pitch period in samples
Now we can vary the duration of the speech signal by simply adding pitch periods to the original speech signal as required.
Suppose we take a speech signal of 200 ms and want to extend its duration to 250 ms. The pitch period of the vowel aaa, calculated using PRAAT, is 7 ms. We therefore have to add 7 pitch periods to extend the speech signal to approximately 250 ms. Similarly, we can vary the duration of speech signals by several milliseconds.
This was the manual way to vary the duration of the speech signal.
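The arithmetic of this example can be checked with a few lines of MATLAB. The variable names are chosen here for illustration only.

```matlab
% Number of whole pitch periods needed to extend 200 ms to about 250 ms.
orig_dur     = 0.200;    % original duration in seconds
target_dur   = 0.250;    % desired duration in seconds
pitch_period = 0.007;    % pitch period of the vowel in seconds

n_add   = floor((target_dur - orig_dur)/pitch_period);  % whole periods that fit
new_dur = orig_dur + n_add*pitch_period;                % resulting duration
fprintf('add %d pitch periods, new duration %.3f s\n', n_add, new_dur);
```

This gives 7 pitch periods and a new duration of 0.249 s, which is why the result is only approximately 250 ms.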
Figure 4. Extracted pitch period
Changing the duration
Now we discuss another method, which varies the duration of the speech signal mathematically using MATLAB.
disp('enter the duration of vowel you need in sec')
dur_time = input('enter duration in seconds = ');
leng = floor(dur_time/pitch_period); % number of pitch periods to append
new_vowel = spch_samp; % start from the 50 msec segment
for i = 1:leng
new_vowel = [new_vowel one_pitch2]; % append one pitch period
end
figure, plot(new_vowel)
audiowrite('vowel1.wav', new_vowel, 44100)
Key Points about Varying the Duration
1. To double the duration of the speech signal, just duplicate the pattern; this does not change the pitch period of the speech signal.
2. To decrease the duration of the speech signal, remove pitch periods at regular intervals until the requirement is met.
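Key point 2 can be sketched as follows. This is only one possible way of removing pitch periods at regular intervals, shown on a synthetic vowel built from identical periods; the signal, the period length of 315 samples and the variable names are assumptions for illustration.

```matlab
% Shorten a vowel by keeping only a regularly spaced subset of its pitch periods.
fs         = 44100;
period_len = 315;                        % samples per pitch period (assumed)
n_periods  = 28;                         % 28 periods = 200 msec
one_period = sin(2*pi*(0:period_len-1)/period_len);
vowel      = repmat(one_period, 1, n_periods);

target_dur = 0.150;                      % desired duration in seconds
n_keep   = floor(round(target_dur*fs)/period_len);        % periods to keep
keep_idx = unique(round(linspace(1, n_periods, n_keep))); % evenly spread indices

periods     = reshape(vowel, period_len, n_periods);      % one period per column
short_vowel = reshape(periods(:, keep_idx), 1, []);       % rejoin the kept periods
new_dur     = numel(short_vowel)/fs;                      % resulting duration
```

Because whole periods are removed, the pitch period of the shortened signal stays the same; only the number of repetitions changes.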
Precautions
1. Samples should be taken carefully. Do not take samples from low frequency components.
2. Always take samples from high frequency components.
3. Transitions should not be stretched, because this causes problems in hearing. Instead, stretch the vowel (for example, its middle part).
VARIATION IN THE PITCH
Now we discuss step by step how the pitch of the speech signal can be increased and decreased.
Pitch period Changing
The pitch period can be changed with the help of a Hamming window, using the MATLAB commands below.
win = hamming(l_one); % Hamming window of one pitch period length
figure, plot(one_pitch2)
figure, plot(win)
aq = one_pitch2.*win'; % windowed pitch period
figure, plot(aq)
long = repmat(aq, 1, 14); % 14 repetitions of the windowed pitch period
audiowrite('short.wav', long, 44100);
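The windowing step can also be written out explicitly. The sketch below builds the symmetric Hamming window from its standard formula, so it does not need the hamming function, and applies it to a synthetic stand-in for one pitch period; the stand-in signal and the length of 315 samples are assumptions for illustration.

```matlab
% Hamming window from its formula, applied to one (synthetic) pitch period.
N   = 315;                                   % samples in one pitch period (assumed)
n   = (0:N-1)';
win = 0.54 - 0.46*cos(2*pi*n/(N-1));         % symmetric Hamming window, column vector

one_period = sin(2*pi*(0:N-1)/N);            % synthetic stand-in for one_pitch2
tapered    = one_period .* win';             % taper both ends of the period
```

The window equals 0.08 at both ends and 1 in the middle, so the edges of the pitch period are attenuated before the periods are joined.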
Increasing Pitch
A speech signal has high and low frequency components. To increase the pitch we contract the high frequency components, because contracting the low frequency components creates listening problems.
The following MATLAB commands are used to increase the pitch.
new_pitch = input('enter the pitch ');
pitch_diff = floor(new_pitch - (1/0.007)); % difference of pitch in Hz
samp_less = 2*pitch_diff; % samples to be deleted to increase the pitch
new_samp = length(one_pitch1) - samp_less; % length of the shortened pitch period
temp = buffer(aq, new_samp, 1, 'nodelay');
temp = temp(:,1)'; % first new_samp samples of the windowed pitch period
pitch = temp;
for i = 1:15
pitch = [pitch temp]; % 16 repetitions in total
end
audiowrite('nn.wav', pitch, 44100);
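The rule of deleting two samples per Hz is a linear approximation. A minimal sketch of the exact relation, period length = fs/pitch, is given below for comparison. The target pitch of 160 Hz, the decaying sinusoid used as a stand-in pitch period, and the variable names are assumptions for illustration.

```matlab
% Raise the pitch by shortening each pitch period to round(fs/new_pitch) samples.
fs        = 44100;
old_pitch = 1/0.007;                         % about 142.9 Hz
new_pitch = 160;                             % ASSUMED target pitch in Hz

old_len    = round(fs/old_pitch);            % 309 samples
exact_len  = round(fs/new_pitch);            % exact period length for 160 Hz
approx_len = old_len - 2*floor(new_pitch - old_pitch);  % two-samples-per-Hz rule

k = 0:old_len-1;
one_period = exp(-k/60).*sin(2*pi*700*k/fs); % synthetic decaying pitch period
shortened  = one_period(1:exact_len);        % drop samples from the end
raised     = repmat(shortened, 1, 16);       % 16 repetitions
actual_pitch = fs/exact_len;                 % pitch of the new signal in Hz
```

Here the exact length is 276 samples and the approximate rule gives 275, so the two agree closely near 143 Hz; the difference grows for larger pitch changes.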
Figure 5. Increased pitch period
Decreasing Pitch
Pitch can be decreased by stretching the high frequency components.
The following MATLAB commands are used to decrease the pitch.
close all
new_pitch = input('enter the pitch short ');
pitch_diff = floor((1/0.007) - new_pitch); % difference of pitch in Hz
samp_increase = 2*pitch_diff; % samples to be added to decrease the pitch
rd = random('exp', (1:30)); % 30 exponentially distributed random padding samples
temp = buffer(rd, samp_increase, 1, 'nodelay'); % cut or zero-pad the padding to samp_increase samples
temp = temp(:,1)';
pitch_prd = [one_pitch2 temp]; % lengthened pitch period
pitch = []; % start from an empty signal
for i = 1:7
pitch = [pitch pitch_prd]; % 7 repetitions of the lengthened period
end
audiowrite('loo.wav', pitch, 44100)
Figure 6. Decreased pitch period
Precautions (For Pitch)
To avoid listening problems, stretch or contract only the high frequency components. Do not take low frequency components.
That was all about increasing and decreasing the pitch of the speech signal.
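As a simple variation on the pitch-lowering step, each pitch period can be lengthened with zeros instead of random samples. This is a substitute technique shown only as a sketch, not the method used in this paper. The target pitch of 125 Hz, the synthetic decaying pitch period, and the variable names are assumptions for illustration.

```matlab
% Lower the pitch by zero-padding each pitch period to round(fs/new_pitch) samples.
fs        = 44100;
old_len   = 309;                             % about 7 msec at 44100 Hz
new_pitch = 125;                             % ASSUMED target pitch in Hz
new_len   = round(fs/new_pitch);             % period length for the lower pitch

k = 0:old_len-1;
one_period = exp(-k/60).*sin(2*pi*700*k/fs); % synthetic decaying pitch period
pad        = zeros(1, new_len - old_len);    % silence added after each period
lowered    = repmat([one_period pad], 1, 8); % 8 lengthened periods
actual_pitch = fs/new_len;                   % pitch of the new signal in Hz
```

Zero padding keeps the samples within the original amplitude range, and the period repeats every new_len samples, which corresponds to a pitch of about 124.9 Hz.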
8. CONCLUSION
In this paper we have discussed how speech can be synthesized from text data. For this we use the KLATT synthesizer, whose output speech sound is almost the same as the original sound. From our observations, the output of the KLATT synthesizer is a robotic-sounding speech that is not as clear as the original voice. To obtain the formants for the KLATT synthesizer, we used WAVESURFER instead of PRAAT because formants are easier to estimate in WAVESURFER, although the sounds were recorded in PRAAT.
We then saw different techniques relating to frequency selection and windowing for varying the pitch and duration of the speech signal. To increase the pitch we have to decrease the pitch period, and vice versa.
REFERENCES:
1. Rabiner L.R. and Juang B.H. (1993) Fundamentals of Speech Recognition. Prentice-Hall, Englewood Cliffs, NJ.
2. Ladefoged, Peter, and Keith Johnson. A course in phonetics. Wadsworth Publishing Company, 2010.
3. Dutoit, Thierry. An introduction to text-to-speech synthesis. Vol. 3. Springer, 1997.
4. http://www.asel.udel.edu/speech/tutorials/synthesis/ [Retrieved from the Internet on Saturday, January 01, 2013]
5. James Matthews, SAPI 5.0 Tutorial II: Text-to-Speech. http://www.generation5.org/content/2001/sr01.asp. [Retrieved from the Internet on Saturday, January 01, 2013]
6. http://www.wavesurfer.findmysoft.com/ [Retrieved from the Internet on Saturday, January 02, 2013]
7. http://www.fon.hum.uva.nl/praat/download_win.html [Retrieved from the Internet on Saturday, January 01, 2013]
8. Text-to-speech Demo. http://www.research.att.com/projects/tts/. [Retrieved from the Internet on Saturday, January 01, 2013]
9. Alan W. Black, Perfect synthesis for all of the people all of the time. IEEE TTS Workshop 2002.
10. Furui S. (1989), Digital Speech Processing, Synthesis and Recognition, Marcel Dekker.
11. McClellan, J. H., Burrus, C. S., Oppenheim, A. V., Parks, T. W., Schafer, R. W., & Schuessler, H. W. (1998). Computer-based exercises for signal processing using MATLAB 5 (Vol. 5). Prentice Hall.
12. Schafer, R. and L. Rabiner (1978). Digital processing of speech signals, Englewood Cliffs, NJ: Prentice-Hall.
13. D. Pennell and Y. Liu, "Normalization of text messages for text-to-speech," ICASSP, 2010.
14. Martin, J. H. and D. Jurafsky (2000). Speech and language processing, Prentice Hall.