first_seminar
Introduction Objective Motivation Literature Review Story Corpus Story Synthesis Framework Predicting the Position of Pause Prediction of Pause Duration Future Work Acknowledgments References
Pause Modeling for Storytelling Style Speech Synthesis
First Seminar by
Parakrant Sarkar [Roll No: 12IT72P08]
Under the Supervision of Dr. K. Sreenivasa Rao
School of Information Technology, Indian Institute of Technology Kharagpur
October 7, 2015
OUTLINE
Introduction
Objective
Motivation
Literature Review
Story Corpus
Story Synthesis Framework
Predicting the Position of Pause
Prediction of Pause Duration
Future Work
Acknowledgments
References
INTRODUCTION
What is Storytelling?
“Storytelling is the art of using language, vocalization and/or physical movement and gesture to reveal the elements of a story to a specific, live audience.” – National Storytelling Association
Source: http://www.eldrbarry.net/roos/st_defn.htm
OBJECTIVE
[Figure: block diagram with the labels "Story Text", "Neutral TTS" and "Incorporating the Story Specific Information"]
- Development of a story Text-to-Speech (TTS) system using the neutral TTS for Hindi.
- Modeling the pauses for storytelling style speech.
MOTIVATION
- Pauses play a vital role in the synthesis of storytelling style speech.
- Variation in the duration of the pauses enhances the quality of the story.
- Pauses are used for separating phrases and emphasizing keywords to introduce suspense and climax in the story.
LITERATURE REVIEW
| Authors | Year | Model | Database | Contribution |
|---|---|---|---|---|
| Paul Taylor et al. | 1998 | Hidden Markov Models | Spoken English Corpus | Phrase breaks are assigned based on the sequence of part-of-speech (POS) tags. |
| Kyuchul Yoon | 2006 | Decision Tree | Korean Newswire Corpus | Korean ToBI labels are used as features. |
| Seungwon Kim et al. | 2006 | Maximum Entropy | SynthFemale01 Corpus | POS tags, a lexicon and the lengths of eojeols are used as features. |
| P. Zervas et al. | 2003 | Bayesian Induction | Modern Greek Corpus | A similar set of features is used. |
| K. Ghosh et al. [4] | 2012 | Neural Network | Neutral Bengali Corpus | Morphological, positional and structural features are used. |
| A. Parlikar et al. [2] | 2011 | Grammar-based approach | ARCTIC-A, Europarl, F2B, Obama and Emma corpus | A grammar is derived to learn the phrase breaks. |
| A. Vadapalli et al. [1] | 2013 | CART | IIIT-H Indic, IIIT-MCIT | Explored the use of word-terminal syllables to model phrase breaks. |
STORY SPEECH CORPUS
SPEECH CORPUS
- 100 story texts are collected: Panchatantra and Akbar-Birbal.
- Sentences per story: 20-25
- Total words: 24400
- Duration of the speech corpus: 3 hours (approx.)
STORY SYNTHESIS FRAMEWORK
- Story-specific emotion detection (SSED) module
- Story-specific prosody generation (SSPG) module
- Story-specific prosody incorporation (SSPI) module
STORY SYNTHESIS INTEGRATED WITH SSPP MODULE
[Figure: block diagram of the story synthesis framework. Block and signal labels: Story text, SSED module, Story-specific emotion tagged text, SSPP module, SSPG module, Prosody Rule-set, Neutral TTS, Neutral Speech, SSPI module, Storytelling style speech]
- Story-specific emotion detection (SSED) module
- Story-specific prosody generation (SSPG) module
- Story-specific prosody incorporation (SSPI) module
- Story-specific pause prediction (SSPP) module
PREDICTING THE POSITION OF PAUSE
METHODS FOLLOWED
- Rule-based
  - Simple rule-based model
  - Grammar-based model
- Data-driven model
SIMPLE RULE-BASED MODEL
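Rule: A word whose terminal syllable has a high conditional probability of being followed by a pause should have a pause after it.

$$P(\text{pause} \mid \text{terminal syllable}) = \frac{N(\text{pause}, \text{terminal syllable})}{N(\text{terminal syllable})} \tag{1}$$

for all terminal syllables with $N(\text{terminal syllable}) > 50$.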
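As an illustration of Equation (1), the counting could be sketched as below. The input format (a list of `(terminal_syllable, followed_by_pause)` pairs) and the function name `pause_probabilities` are assumptions made for this example; the slides do not describe how the corpus annotations are stored.

```python
from collections import Counter


def pause_probabilities(observations, min_count=50):
    """Estimate P(pause | terminal syllable) from labelled word tokens.

    observations: iterable of (terminal_syllable, followed_by_pause) pairs,
    a hypothetical representation of the annotated corpus.
    Only syllables seen more than min_count times are kept, mirroring the
    N(terminal syllable) > 50 condition.
    """
    total = Counter()
    with_pause = Counter()
    for syllable, followed_by_pause in observations:
        total[syllable] += 1
        if followed_by_pause:
            with_pause[syllable] += 1
    return {s: with_pause[s] / n for s, n in total.items() if n > min_count}


# Toy data: "ho" occurs 60 times (36 with a pause), "xa" only 5 times.
toy = [("ho", True)] * 36 + [("ho", False)] * 24 + [("xa", True)] * 5
probs = pause_probabilities(toy)
print(probs)
```

In this toy run only `ho` survives the frequency filter, because `xa` occurs too rarely for its estimate to be trusted.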
Table: Terminal syllables with conditional probabilities

| Terminal Syllable | Conditional Probability |
|---|---|
| juur | 0.875 |
| chii | 0.75 |
| chaat | 0.77 |
| kho | 0.69 |
| ho | 0.60 |
| ta | 0.56 |
| yo | 0.53 |
| me | 0.51 |
| trii | 0.40 |
| kar | 0.39 |
RESULTS: SIMPLE RULE-BASED MODEL
Table: F1 measure for the simple rule-based model

| Class | Recall | Precision | F1 Score |
|---|---|---|---|
| NP (non-pause) | 0.62 | 0.68 | 0.65 |
| P (pause) | 0.28 | 0.33 | 0.30 |
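The F1 score reported here is the harmonic mean of precision and recall. A short check, using the standard definition (the helper name `f1_score` is chosen for this example), reproduces the two values for the simple rule-based model:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for the simple rule-based model.
print(round(f1_score(0.68, 0.62), 2))  # non-pause class: 0.65
print(round(f1_score(0.33, 0.28), 2))  # pause class: 0.3
```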
GRAMMAR-BASED APPROACH
Rule: A chunk/phrase with a high conditional probability of being followed by a pause should have a pause after it.

$$P(\text{pause} \mid \text{chunk}) = \frac{N(\text{pause}, \text{chunk})}{N(\text{chunk})} \tag{2}$$
Table: Production rules of the grammar based on POS tags

| Production Rule | Conditional Probability |
|---|---|
| NP → QC RP NN PSP | 0.32 |
| NP → QC NN PSP | 0.18 |
| VGF → VM VAUX | 0.25 |
| VGNF → VM VAUX | 0.12 |
| JJP → JJ | 0.44 |
| CCP → CC | 0.14 |
| BLK → VAUX | 0.2 |
| RBP → RB PSP | 0.49 |
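One way the production-rule probabilities could drive pause placement is sketched below. The probabilities are copied from the table, but the threshold of 0.4, the `<pause>` marker, the chunk representation and the function name `insert_pauses` are all assumptions for illustration; the slides do not state how "high" probability is decided.

```python
# Conditional probabilities copied from the production-rule table.
RULE_PAUSE_PROB = {
    ("NP", ("QC", "RP", "NN", "PSP")): 0.32,
    ("NP", ("QC", "NN", "PSP")): 0.18,
    ("VGF", ("VM", "VAUX")): 0.25,
    ("VGNF", ("VM", "VAUX")): 0.12,
    ("JJP", ("JJ",)): 0.44,
    ("CCP", ("CC",)): 0.14,
    ("BLK", ("VAUX",)): 0.2,
    ("RBP", ("RB", "PSP")): 0.49,
}


def insert_pauses(chunks, threshold=0.4):
    """Append a pause marker after chunks whose rule probability is high.

    chunks: list of (chunk_label, pos_tags, words) tuples, a hypothetical
    output format of a chunker. threshold is an assumed cut-off.
    """
    output = []
    for label, pos_tags, words in chunks:
        output.extend(words)
        if RULE_PAUSE_PROB.get((label, tuple(pos_tags)), 0.0) >= threshold:
            output.append("<pause>")
    return output


# Placeholder words w1..w6 stand in for a chunked Hindi sentence.
sentence = [
    ("JJP", ["JJ"], ["w1"]),
    ("NP", ["QC", "NN", "PSP"], ["w2", "w3", "w4"]),
    ("RBP", ["RB", "PSP"], ["w5", "w6"]),
]
marked = insert_pauses(sentence)
print(marked)
```

Chunks whose rule is not in the table get probability 0.0 and therefore never trigger a pause in this sketch.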
RESULTS: GRAMMAR-BASED APPROACH
Table: F1 measure for the grammar-based approach

| Class | Recall | Precision | F1 Score |
|---|---|---|---|
| NP (non-pause) | 0.69 | 0.82 | 0.74 |
| P (pause) | 0.67 | 0.49 | 0.57 |
DATA-DRIVEN MODEL
Features used for training (CART):
- A five-gram window is used.
- Features are extracted at the word level.

1. Positional:
   - Position of the current word.
   - Total number of words.
2. Structural:
   - Total number of phones in the word.
   - Total number of syllables in the word.
   - Total number of phones in the utterance.
3. Morphological:
   - Part of speech of the word.
   - Phonetic strength of the current word.
4. Emotion:
   - Emotion of the current word.
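A word-level feature vector over a five-gram window could be assembled roughly as follows. This is a sketch under assumptions: the window is taken to be centred on the current word (two words on each side), only a subset of the listed features is included, and the names `FEATURE_NAMES` and `window_features` are invented for the example. The resulting rows could then be fed to any decision-tree learner.

```python
# Hypothetical per-word feature names, a subset of the features listed above.
FEATURE_NAMES = ("pos", "n_phones", "n_syllables", "emotion")


def window_features(words, size=5):
    """Build one flat feature dict per word over a centred window.

    words: list of dicts holding the per-word features in FEATURE_NAMES.
    Positions that fall outside the utterance are padded with None.
    """
    half = size // 2
    rows = []
    for i in range(len(words)):
        # Positional features of the current word.
        row = {"position": i + 1, "total_words": len(words)}
        for offset in range(-half, half + 1):
            j = i + offset
            feats = words[j] if 0 <= j < len(words) else {}
            for name in FEATURE_NAMES:
                # Keys look like "pos[-2]", "pos[+0]", "pos[+1]".
                row[f"{name}[{offset:+d}]"] = feats.get(name)
        rows.append(row)
    return rows


utterance = [
    {"pos": "NN", "n_phones": 4, "n_syllables": 2, "emotion": "neutral"},
    {"pos": "VM", "n_phones": 3, "n_syllables": 1, "emotion": "neutral"},
    {"pos": "VAUX", "n_phones": 2, "n_syllables": 1, "emotion": "happy"},
]
rows = window_features(utterance)
print(rows[0]["pos[+1]"], rows[0]["pos[-1]"])
```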
RESULTS: DATA-DRIVEN MODEL
Table: F1 measure for the data-driven model

| Class | Recall | Precision | F1 Score |
|---|---|---|---|
| Non-pause | 0.89 | 0.94 | 0.91 |
| Pause | 0.68 | 0.81 | 0.74 |
ANALYSIS OF RESULTS FOR PREDICTING THE POSITION OF PAUSE
1. Among the rule-based models, the grammar-based approach predicts pauses better than the simple rule-based approach.
2. Limitations of the rule-based models:
   - It is time consuming to derive the rules manually.
   - The applicability of the derived rules is limited to a particular corpus.
   - The rules may therefore not be general enough for unseen text.
3. These shortcomings are mitigated by machine learning approaches.
PREDICTION OF PAUSE DURATION
FEATURES USED FOR TRAINING: CART
1. Morphological:
   - Terminal syllable of the current, previous and following two words.
2. Structural:
   - Position of the vowel in the terminal syllable.
   - Number of segments (i.e. consonants) before and after the nucleus (i.e. vowel) in the terminal syllable.
3. Positional:
   - Total number of phones in the terminal syllable of the current, previous and following two words.
   - Position of the current word from the beginning and end of the utterance.
RESULTS: SINGLE CART MODEL
Table: Performance of the single CART model based on µ, σ and $\gamma_{x,y}$

| Model | x (ms) | y (ms) | µ (ms) | σ (ms) | $\gamma_{x,y}$ |
|---|---|---|---|---|---|
| Single CART | 241.60 | 247.21 | 107.12 | 155.87 | 0.67 |
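The slides do not spell out how these objective measures are computed. The sketch below assumes a common convention: x and y are the mean actual and mean predicted durations, µ is the mean absolute prediction error, σ is the standard deviation of the absolute errors, and $\gamma_{x,y}$ is the Pearson correlation between actual and predicted durations. The function name `objective_measures` and the toy durations are illustrative only.

```python
import math


def objective_measures(actual, predicted):
    """Return (mean_actual, mean_predicted, mu, sigma, gamma).

    Assumed definitions: mu and sigma are the mean and (population)
    standard deviation of the absolute errors; gamma is the Pearson
    correlation coefficient between the two duration sequences.
    """
    n = len(actual)
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    mu = sum(errors) / n
    sigma = math.sqrt(sum((e - mu) ** 2 for e in errors) / n)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    gamma = cov / math.sqrt(var_a * var_p)
    return mean_a, mean_p, mu, sigma, gamma


# Toy pause durations in milliseconds (not taken from the corpus).
actual_ms = [100.0, 200.0, 300.0]
predicted_ms = [110.0, 190.0, 330.0]
mean_a, mean_p, mu, sigma, gamma = objective_measures(actual_ms, predicted_ms)
print(mean_a, mean_p, round(mu, 2), round(sigma, 2), round(gamma, 2))
```

Under these definitions a lower µ and σ and a higher $\gamma_{x,y}$ indicate better duration prediction.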
Is it possible to reduce the average prediction error µ?
PROPOSED PAUSE PREDICTION MODEL
Figure: Three-stage pause prediction model
[Block diagram. Inputs: story text and the story-speech corpus. First stage: break / non-break. Second stage: short / medium / long pause. Third stage: short pause duration predictor, medium pause duration predictor and long pause duration predictor.]
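The control flow of the three-stage model can be sketched as a simple cascade. The three stages are passed in as callables; the stand-in models used in the example (`toy_is_break`, `toy_pause_class`, `toy_duration_models`) are hypothetical placeholders and do not reflect the trained CART models.

```python
def predict_pauses(words, is_break, pause_class, duration_models):
    """Run the three-stage cascade over a list of word tokens.

    is_break(word) -> bool                  first stage
    pause_class(word) -> "short"/"medium"/"long"   second stage
    duration_models[label](word) -> ms      third stage
    Returns (word_index, pause_class, duration_ms) for each predicted break.
    """
    results = []
    for index, word in enumerate(words):
        if not is_break(word):
            continue  # non-break: no pause after this word
        label = pause_class(word)
        duration_ms = duration_models[label](word)
        results.append((index, label, duration_ms))
    return results


# Hypothetical stand-ins: break after commas, long pause after full stops.
def toy_is_break(word):
    return word.endswith((",", "."))


def toy_pause_class(word):
    return "long" if word.endswith(".") else "short"


toy_duration_models = {
    "short": lambda word: 90.0,
    "medium": lambda word: 210.0,
    "long": lambda word: 450.0,
}

tokens = ["w1,", "w2", "w3."]
pauses = predict_pauses(tokens, toy_is_break, toy_pause_class, toy_duration_models)
print(pauses)
```

Splitting the duration predictor by pause class lets each third-stage model fit a much narrower range of durations than a single model covering all pauses.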
ANALYSIS OF PAUSES BASED ON DURATION
Pauses are classified as:
- Long pause (>250 ms)
- Medium pause (150-250 ms)
- Short pause (<150 ms)

Table: Pause pattern in the story-speech corpus

| Pause Type | Mean (ms) | Std Dev (ms) | % in original |
|---|---|---|---|
| Long pause | 455.07 | 125.99 | 17 |
| Medium pause | 210.12 | 32.33 | 25 |
| Short pause | 92.97 | 29.81 | 38 |
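The duration thresholds translate directly into a labelling function, for example when preparing training labels for the second stage. The function name `pause_type` is an assumption, as is the choice to treat the boundary values 150 ms and 250 ms as medium.

```python
def pause_type(duration_ms):
    """Label a pause by duration: short (<150 ms), medium (150-250 ms), long (>250 ms)."""
    if duration_ms < 150:
        return "short"
    if duration_ms <= 250:
        return "medium"
    return "long"


# The mean durations reported in the table fall into their own classes.
for mean_ms in (92.97, 210.12, 455.07):
    print(mean_ms, pause_type(mean_ms))
```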
RESULTS: SECOND STAGE
Table: F1 measure for short, medium and long pause prediction

| Pause Type | Recall | Precision | F1 Score |
|---|---|---|---|
| Long pause | 0.73 | 0.62 | 0.67 |
| Medium pause | 0.50 | 0.52 | 0.51 |
| Short pause | 0.53 | 0.63 | 0.58 |
RESULTS: THIRD STAGE
Table: Performance of the CART trees for long, medium and short pauses using objective measures (µ, σ and $\gamma_{x,y}$)

| Model | x (ms) | y (ms) | µ (ms) | σ (ms) | $\gamma_{x,y}$ |
|---|---|---|---|---|---|
| CART long | 347.96 | 368.71 | 76.13 | 55.05 | 0.65 |
| CART medium | 208.43 | 199.30 | 26 | 17.03 | 0.66 |
| CART short | 87.33 | 91.01 | 34.05 | 22.19 | 0.77 |
| Overall | 144.44 | 147.08 | 32.38 | 22.04 | 0.57 |
SUMMARY AND CONCLUSION
1. A story synthesis framework is developed using the neutral Hindi TTS system for synthesizing storytelling style speech.
2. The pause pattern is modeled for the Story TTS.
FUTURE WORK
1. Subjective listening tests need to be carried out.
2. Non-linear classifiers like neural networks and support vector machines can be used.
3. Using unsupervised word-level features for modeling the pauses.
4. Analysis and modeling of the pause pattern based on discourse modes.
5. Analysis and modeling of the prosodic parameters (i.e. pitch, duration, intensity) for various modes of discourse in storytelling.
6. Prosody modeling of the TTS system built on the story speech corpus.
Acknowledgments
The authors would like to thank the Department of Information Technology, Government of India, for funding the project, Development of Text-to-Speech synthesis for Indian Languages Phase II, Ref. no. 11(7)/2011HCC(TDIL). The authors would also like to thank all the participants for the listening tests.
DISSEMINATION OF RESEARCH
Conference:
1. P. Sarkar, A. Haque, A. Kr. Dutta, G. Reddy M, Harikrishna DM, P. Dhara, R. Verma, Narendra NP, Sunil SB, J. Yadav, K. S. Rao, “Designing prosody rule-set for converting neutral TTS speech to storytelling style speech for Indian languages: Bengali, Hindi and Telugu,” in Seventh International Conference on Contemporary Computing (IC3), pp. 473-477, 7-9 Aug. 2014, JIIT Noida.
2. R. Verma, P. Sarkar, K. S. Rao, “Conversion of neutral speech to storytelling style speech,” in Eighth International Conference on Advances in Pattern Recognition (ICAPR), pp. 1-6, 4-7 Jan. 2015, ISI Kolkata.
3. P. Sarkar, K. S. Rao, “Data-driven pause prediction for speech synthesis in storytelling style speech,” in Twenty First National Conference on Communications (NCC), pp. 1-5, 27 Feb. - 1 Mar. 2015, IIT Bombay.
REFERENCES
[1] A. Vadapalli, P. Bhaskararao, and K. Prahallad, “Significance of word-terminal syllables for prediction of phrase breaks in Text-to-Speech systems for Indian Languages,” in 8th ISCA Speech Synthesis Workshop. Barcelona, Spain: ISCA, August 31 - September 2, 2013, pp. 189-194.
[2] A. Parlikar and A. W. Black, “A grammar based approach to style specific phrase prediction,” in Interspeech, 2011, pp. 2149-2152.
[3] N. S. Krishna and H. A. Murthy, “A New Prosodic Phrasing Model for Indian Language Telugu,” in INTERSPEECH. ISCA, 2004.
[4] K. Ghosh and K. Sreenivasa Rao, “Data-Driven Phrase Break Prediction for Bengali Text-to-Speech System,” in Contemporary Computing - 5th International Conference, IC3 2012, Noida, India, August 6-8, 2012. Proceedings, ser. Communications in Computer and Information Science. Springer Berlin Heidelberg, 2012, vol. 306, pp. 118-129.