Progress on Bangla Text-To-Speech System by Dr. M. Shahidur Rahman
Progress on Bangla Text-To-Speech System
Presented By: Dr. M. Shahidur Rahman
Professor, Dept. of Computer Science & Engg.
Shahjalal University of Science & Technology
2
Outline
• Introduction to TTS
• How TTS works
• Present Bangla TTS systems
• Problems of the present Bangla TTS
• Directions to improve the performance of Bangla TTS
• Discussion…
3
What is a TTS?
• The goal of text-to-speech (TTS) synthesis is to convert an arbitrary input text into intelligible and natural sounding speech
– TTS is not a "cut-and-paste" approach that strings together isolated words
– Instead, TTS employs linguistic analysis to infer correct pronunciation and prosody (i.e., NLP) and acoustic representations of speech to generate waveforms (i.e., DSP)
4
TTS Applications
Applications:
• Services for the visually impaired community
• Services for illiterate people and people with difficulties in reading
• Enabling use of computers and IT services
– Reading email aloud
– Using a word processor
– Using the Internet
Commercial TTS Systems:
• Festival
• Bell Labs TTS
6
Different TTS Systems
Phoneme-Based TTS System
• Phonemes are:
– The minimal distinctive phonetic units
– Relatively small in number (39 phonemes in English)
• Disadvantage
– Phonemes ignore transitional sounds
7
Different TTS Systems (cont’d)
Diphone-Based TTS System
• Diphones are:
– Made up of 2 phonemes
– Incorporate transitional sound
– Produce better sounding speech
– Ex. কক = ক + কঅ + অক + ক
• Disadvantage
– Over 1500 diphones in the English language
8
Text Pre-Processing
• Convert raw text, which may include numbers, abbreviations, etc., into the equivalent of written-out words
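The pre-processing step can be sketched roughly as follows. This is only an illustration, not the method used by any of the systems discussed here: it works on English tokens for readability, the abbreviation table is a hypothetical hand-made one, and numbers are handled only in the range 0 to 999. A Bangla front end would need its own number and abbreviation tables.

```python
import re

# Hypothetical abbreviation table (assumption); a real system needs a far larger one.
ABBREVIATIONS = {"Dr.": "Doctor", "Dept.": "Department", "etc.": "et cetera"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]
    hundreds, rest = divmod(n, 100)
    head = ONES[hundreds] + " hundred"
    return head if rest == 0 else head + " " + number_to_words(rest)


def normalize(text: str) -> str:
    """Replace known abbreviations and short numbers with written-out words."""
    words = []
    for token in text.split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"[0-9]{1,3}", token):
            words.append(number_to_words(int(token)))
        else:
            words.append(token)  # leave everything else untouched
    return " ".join(words)


print(normalize("Dr. Rahman recorded 527 diphones"))
```

Splitting on whitespace keeps the sketch short; punctuation attached to a token (for example "527,") is deliberately left unhandled.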
9
Word to Diphone Converter (Phonetization)
Purpose
– Translate words to their diphone representations (Ex. রাজা -> Diphones: { র + রআ + আজ + জআ })
– Mark the text into prosodic units such as phrases, clauses and sentences
Resource
– Dictionary of words and their diphones
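A minimal sketch of the word-to-diphone lookup is given below. It assumes a romanised phoneme notation and a tiny hypothetical lexicon; the names LEXICON, SILENCE and both functions are illustrative and not taken from any of the systems discussed. Each diphone is formed from two adjacent phonemes, with a silence marker added at the word boundaries, so the exact unit count may differ from the slide's own example.

```python
SILENCE = "_"  # assumed marker for the silence at a word boundary

# Hypothetical pronunciation lexicon: word -> phoneme sequence (romanised).
LEXICON = {
    "raja": ["r", "a", "j", "a"],
    "kok": ["k", "o", "k"],
}


def phonemes_to_diphones(phonemes):
    """Pair every phoneme with its right-hand neighbour, including boundaries."""
    padded = [SILENCE] + list(phonemes) + [SILENCE]
    return [left + right for left, right in zip(padded, padded[1:])]


def word_to_diphones(word):
    """Look the word up in the lexicon and return its diphone sequence."""
    if word not in LEXICON:
        raise KeyError(f"no pronunciation for {word!r} in the lexicon")
    return phonemes_to_diphones(LEXICON[word])


print(word_to_diphones("raja"))
```

A dictionary lookup like this fails on unknown words, so a practical system would also need letter-to-sound rules as a fallback.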
12
Altering Pitch/Duration/Amplitude
• For smooth concatenation, altering pitch, duration and amplitude at the concatenation point is very important.
15
Altering Duration
• Increase number of PSOLA iterations (overlaps) to increase duration
• Decrease number of PSOLA iterations (overlaps) to decrease duration
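The idea of adding or removing overlaps can be sketched as below. This is a heavily simplified, PSOLA-style illustration, not a full implementation: it assumes the pitch period is constant and already known (in samples), whereas a real system has to detect pitch marks first. The function name and parameters are hypothetical.

```python
import math


def psola_stretch(signal, period, factor):
    """Change duration by `factor`, assuming a constant pitch period in samples."""
    if len(signal) < 2 * period:
        raise ValueError("signal must contain at least two pitch periods")
    # Analysis: cut two-period, Hann-windowed grains centred on each pitch mark.
    size = 2 * period
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / size) for n in range(size)]
    marks = range(period, len(signal) - period + 1, period)
    grains = [[signal[m - period + n] * window[n] for n in range(size)]
              for m in marks]
    # Synthesis: more overlaps than grains repeats some grains (longer output),
    # fewer overlaps skips some grains (shorter output).
    out_count = max(1, round(len(grains) * factor))
    output = [0.0] * ((out_count + 1) * period)
    for k in range(out_count):
        grain = grains[min(len(grains) - 1, int(k / factor))]
        for n, value in enumerate(grain):
            output[k * period + n] += value  # overlap-add at one-period spacing
    return output


tone = [math.sin(2 * math.pi * i / 4) for i in range(44)]  # period of 4 samples
print(len(tone), len(psola_stretch(tone, 4, 2.0)), len(psola_stretch(tone, 4, 0.5)))
```

Because the grains are re-spaced at the original period, the pitch stays the same while the number of periods, and therefore the duration, changes.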
16
Altering Amplitude
• Multiply the signal by a constant
– If constant > 1, amplitude increases
– If constant < 1, amplitude decreases
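As a minimal sketch, amplitude scaling is a per-sample multiplication. The clipping to the range [-1.0, 1.0] is an added assumption (samples stored as normalised floats) and is not part of the slide.

```python
def scale_amplitude(samples, gain):
    """Multiply every sample by `gain`, clipping to [-1.0, 1.0] (assumed range)."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


print(scale_amplitude([0.1, -0.2, 0.4], 2.0))   # louder
print(scale_amplitude([0.5, -1.0], 0.5))        # quieter
```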
17
Concatenation
Diphones → Word
• Using PSOLA at the joining ends
• Ensures smooth transition
Words → Sentence
• Straight joining at the end points due to presence of pauses
19
Types of Concatenative speech synthesis
• Concatenative synthesis with a fixed inventory
– Contains one sample for each unit and performs prosodic modification to match the required prosody
• Unit-selection-based synthesis
– Stores several instances of each unit, thus improving the chances of finding a well-matched unit
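Unit selection can be illustrated with a small dynamic-programming search. The sketch below is an assumption-laden toy: the two features (mean pitch and duration), both cost functions and all names are hypothetical, and real systems use much richer acoustic and linguistic features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str        # diphone label, e.g. "ra"
    pitch: float     # mean pitch in Hz (illustrative feature)
    duration: float  # length in ms (illustrative feature)


def target_cost(unit, want_pitch, want_duration):
    """How far a stored unit is from the prosody we want."""
    return abs(unit.pitch - want_pitch) + abs(unit.duration - want_duration)


def join_cost(prev, cur):
    """How badly two neighbouring units fit together (pitch jump only)."""
    return abs(prev.pitch - cur.pitch)


def select_units(inventory, targets):
    """Pick one stored instance per target so that the total cost is minimal."""
    paths = []  # one (total cost, chosen units) entry per candidate
    for name, pitch, duration in targets:
        candidates = inventory.get(name, [])
        if not candidates:
            raise KeyError(f"no stored instance of unit {name!r}")
        new_paths = []
        for unit in candidates:
            cost = target_cost(unit, pitch, duration)
            if paths:
                best_cost, best_units = min(
                    ((c + join_cost(units[-1], unit), units) for c, units in paths),
                    key=lambda item: item[0],
                )
                new_paths.append((best_cost + cost, best_units + [unit]))
            else:
                new_paths.append((cost, [unit]))
        paths = new_paths
    return min(paths, key=lambda item: item[0])[1] if paths else []


inventory = {
    "ra": [Unit("ra", 100, 80), Unit("ra", 140, 90)],
    "aj": [Unit("aj", 105, 70), Unit("aj", 150, 70)],
}
chosen = select_units(inventory, [("ra", 140, 90), ("aj", 150, 70)])
print([(u.name, u.pitch) for u in chosen])
```

A fixed-inventory system corresponds to the special case of one candidate per unit, where all the work shifts to prosodic modification after selection.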
20
Progress of Bangla TTS
• KATHA
– Developed at BRAC University
– Unit-based system using the Festival framework
– 4355 diphones
– Takes 2 sec to generate a 10 sec utterance
• BANGLA VAANI
– Syllable-based synthesis system
– Developed in Kolkata
• SUBACHAN
– Developed at SUST
– Diphone-based synthesis system
– 527 diphones
– Takes 45 ms to generate a 10 sec utterance
Speech Signal From Kotha and Subachan
• (Voice of Kotha) তিনি প্রধানত কবি হলেও বেশ কিছু প্রবন্ধ-নিবন্ধ রচনা ও প্রকাশ করেছেন ("Although he was primarily a poet, he also wrote and published a number of essays and articles")
• (Voice of Subachan) তিনি প্রধানত কবি হলেও বেশ কিছু প্রবন্ধ-নিবন্ধ রচনা ও প্রকাশ করেছেন
• (Voice of Kotha) জীবনানন্দ দাশ বিংশ শতাব্দীর অন্যতম প্রধান আধুনিক বাংলা কবি ("Jibanananda Das, one of the foremost modern Bangla poets of the twentieth century")
• (Voice of Subachan) জীবনানন্দ দাশ বিংশ শতাব্দীর অন্যতম প্রধান আধুনিক বাংলা কবি
21
22
Problems: Homograph Ambiguity
• Homographs are words that share the same spelling but differ in meaning and pronunciation
23
Solution: Homograph Disambiguation
• Collect all possible homograph words
• Determine the POS tag of the homograph words
– Ex. ছেলেরা মাঠে বল (bol) খেলছে। ("The boys are playing ball in the field.")
– তুমি যাবে কি না বল (bolo)। ("Tell me whether or not you will go.")
• Bayes' theorem can also be applied to determine the most likely reading of a word.
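The Bayes-based idea can be sketched with a tiny naive Bayes classifier that picks between the two readings bol and bolo from the surrounding words. Everything here is an illustrative assumption: the romanised context words, the four hand-made training examples and the function names are hypothetical, and a real system would train on a tagged corpus.

```python
import math
from collections import Counter

# Hypothetical training data: (context words, pronunciation of the homograph).
TRAINING = [
    (["chelera", "mathe", "khelche"], "bol"),
    (["mathe", "khelche"], "bol"),
    (["tumi", "jabe", "ki", "na"], "bolo"),
    (["tumi", "ki"], "bolo"),
]


def train(examples):
    """Count how often each reading and each context word occurs."""
    priors = Counter()
    word_counts = {}
    vocab = set()
    for context, label in examples:
        priors[label] += 1
        word_counts.setdefault(label, Counter()).update(context)
        vocab.update(context)
    return priors, word_counts, vocab


def classify(context, model):
    """Return the reading with the highest posterior (add-one smoothing)."""
    priors, word_counts, vocab = model
    total = sum(priors.values())
    scores = {}
    for label, count in priors.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(count / total)          # log P(reading)
        for word in context:
            score += math.log((counts[word] + 1) / denom)  # log P(word | reading)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING)
print(classify(["chelera", "mathe", "khelche"], model))
print(classify(["tumi", "jabe"], model))
```

POS tags could be added to the context as extra features, which would combine the tagging approach and the Bayesian approach in one model.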
24
Problems: Improper Concatenation
Not concatenated properly
Signal from the utterance of রাশেদ
25
Solution: Improper Concatenation
• PSOLA
• Reducing the number of concatenation points
– Ex 1. Sentence -> কামাল ভাল ছেলে। Diphones -> কা + আমা + আল, ভা + আলো, ছে + এলে instead of ক + কআ + আম + মআ + আল + …
– Ex 2. পৃথিবী -> পৃ + ইথি + ইবী
• Vowel sounds are periodic and thus suitable for concatenation
• Use the 1000 most frequently spoken words