Spotting Multilingual Consonant-Vowel Units of Speech using Neural Network Models Suryakanth...

Spotting Multilingual Consonant- Vowel Units of Speech using Neural Network Models Suryakanth V.Gangashetty, C. Chandra Sekhar, and B.Yegnanarayana Speech and Vision Laboratory Department of Computer Science and Engineering Indian Institute of Technology Madras, Chennai – India


Page 1: Spotting Multilingual Consonant-Vowel Units of Speech using Neural Network Models Suryakanth V.Gangashetty, C. Chandra Sekhar, and B.Yegnanarayana Speech.

Spotting Multilingual Consonant-Vowel Units of Speech using Neural Network Models

Suryakanth V.Gangashetty, C. Chandra Sekhar, and B.Yegnanarayana

Speech and Vision Laboratory

Department of Computer Science and Engineering

Indian Institute of Technology Madras, Chennai – India

Email: {svg, chandra, yegna}@cs.iitm.ernet.in


is bu le Tin ki mu khya sa mA chAr

mu nnAL mu da la mei ccar sel vi jey la li ta

I rO ju vAr ta lo lu mu khyam sa lu

Speech Signal-to-Symbol Transformation

Phonetic engine: Capable of speech signal-to-symbol transformation independent of vocabulary and language


Approaches to Speech Signal-to-Symbol Transformation

• Based on segmentation and labeling

– Segmentation of continuous speech signal into regions of subword units

– Assignment of labels to the segmented regions using a subword unit classifier

• Based on spotting subword units in continuous speech

– Detection of anchor points in continuous speech

– Assignment of labels to the segments around the anchor points using a subword unit classifier
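The spotting approach can be pictured as cutting a short window of frames around each anchor point and handing that window to a classifier. The sketch below shows only the windowing step. The function name `segments_around_anchors` and the window sizes `left` and `right` are assumptions made for illustration; the slides do not state how many frames are taken on either side of an anchor.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def segments_around_anchors(
    frames: Sequence[T],
    anchors: Sequence[int],
    left: int = 2,   # assumed number of frames kept before the anchor
    right: int = 2,  # assumed number of frames kept after the anchor
) -> List[List[T]]:
    """Return one window of frames per anchor, clipped at the signal edges."""
    segments: List[List[T]] = []
    for a in anchors:
        if not 0 <= a < len(frames):
            raise ValueError(f"anchor index {a} is outside the frame sequence")
        start = max(0, a - left)
        end = min(len(frames), a + right + 1)
        segments.append(list(frames[start:end]))
    return segments


# Toy example: integers stand in for per-frame feature vectors.
frames = list(range(10))
segments = segments_around_anchors(frames, anchors=[1, 5])
```

Each returned segment would then be passed to the subword unit classifier; the first window here is shorter because it is clipped at the start of the signal.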


Spotting CV Units in Continuous Speech

• CV type units have the highest frequency of occurrence in speech in Indian languages

• Subword units of CCV, CCCV and CVC types also contain CV segments

• Vowel onset point (VOP) can be used as an anchor point for recognition of CV units

• Detection of VOPs using distributions of feature vectors of C and V regions

• Models for classification of CV segments


Significant Events in a CV Unit


VOP Detection using AANN Models

• AANN models for capturing the distribution of data

• One AANN for the consonant region of a CV unit

• Another AANN for the vowel region of a CV unit
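One plausible way to combine the two models is to label each frame by whichever model reconstructs it with the smaller error, and to hypothesise a VOP wherever the label changes from consonant to vowel. The sketch below illustrates that idea only. The per-frame error lists, the function names, and the 10 ms frame shift are assumptions for illustration, and the decision logic of the actual system may differ.

```python
from typing import List


def label_frames(err_c: List[float], err_v: List[float]) -> List[str]:
    """Label each frame 'C' or 'V' by the model with the lower reconstruction error."""
    if len(err_c) != len(err_v):
        raise ValueError("error sequences must have the same length")
    return ["C" if ec <= ev else "V" for ec, ev in zip(err_c, err_v)]


def hypothesise_vops(labels: List[str], frame_shift_ms: float = 10.0) -> List[float]:
    """Return the times (in ms) of frames where the label changes from 'C' to 'V'."""
    return [
        i * frame_shift_ms
        for i in range(1, len(labels))
        if labels[i - 1] == "C" and labels[i] == "V"
    ]


# Hypothetical reconstruction errors from the consonant and vowel models.
err_c = [0.1, 0.2, 0.9, 0.8, 0.2, 0.7]
err_v = [0.8, 0.7, 0.1, 0.2, 0.9, 0.3]

labels = label_frames(err_c, err_v)
vops_ms = hypothesise_vops(labels)
```

With these toy errors the frames are labelled C, C, V, V, C, V, so two C-to-V transitions are hypothesised as VOPs.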


System for Detection of VOPs using AANNs


Illustration of Detection of VOPs

(a) Waveform, (b) Hypothesised region labels for each frame, (c) Hypothesised VOPs, and (d) Manually marked (actual) VOPs for the Tamil language sentence /kArgil pahudiyilirundu UDuruvalkArarhaL/


Broadcast News Corpus of Indian Languages

Description (number of)    Tamil     Telugu    Hindi     Multilingual

Bulletins                  33        20        19        72

Training bulletins         27        16        16        59

Testing bulletins          6         4         3         13

CV classes considered      123       138       103       196

Training CV segments       43,541    41,725    20,236    1,05,502

Sentences for testing      1,416     1,348     630       3,094


Performance for Detection of VOPs

• Matching hypothesis: A hypothesis with a deviation of up to 25 ms from an actual VOP

• Missing hypothesis: No hypothesis lies within 25 ms of an actual VOP

• Spurious hypothesis:

– Multiple hypotheses with a deviation of up to 25 ms from the same actual VOP

– A hypothesis with a deviation greater than 25 ms

VOP hypotheses (in %)

Matching    Missing    Spurious

68.62       31.38      6.21
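The three categories can be counted with a simple matching procedure. The sketch below is one possible reading of the definitions: each actual VOP is paired with its nearest unused hypothesis within the tolerance, and every hypothesis left unpaired counts as spurious. The greedy nearest-hypothesis pairing and the function name `score_vops` are assumptions; the slides do not describe the scoring procedure or how the spurious percentage is normalised, so only counts are returned.

```python
from typing import Dict, List


def score_vops(
    actual_ms: List[float],
    hyp_ms: List[float],
    tol_ms: float = 25.0,
) -> Dict[str, int]:
    """Count matching, missing and spurious VOP hypotheses (greedy pairing)."""
    used = set()  # indices of hypotheses already paired with an actual VOP
    matching = 0
    for a in actual_ms:
        # Unused hypotheses that lie within the tolerance of this actual VOP.
        candidates = [
            j for j, h in enumerate(hyp_ms)
            if j not in used and abs(h - a) <= tol_ms
        ]
        if candidates:
            best = min(candidates, key=lambda j: abs(hyp_ms[j] - a))
            used.add(best)
            matching += 1
    return {
        "matching": matching,
        "missing": len(actual_ms) - matching,
        # Extra hypotheses near a matched VOP, plus hypotheses far from any VOP.
        "spurious": len(hyp_ms) - len(used),
    }


# Toy example with times in milliseconds.
counts = score_vops(actual_ms=[100, 300, 500], hyp_ms=[110, 120, 290, 700])
```

In the toy example, 110 and 290 match, the VOP at 500 is missed, and both 120 (a second hypothesis near 100) and 700 (far from every actual VOP) are spurious.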


Classification of CV Segments using SVMs


System for Spotting CV Units

• The system gives a 5-best performance of about 74.63% for spotting CV units in 300 test sentences containing 3,924 syllable-like units
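An N-best figure of this kind is usually read as the percentage of units whose correct class appears among the top N hypotheses. The sketch below computes that percentage under this reading. The function name `n_best_accuracy` and the toy data are illustrative assumptions, and how units with a missed VOP enter the count in the actual evaluation is not stated in the slides.

```python
from typing import Sequence


def n_best_accuracy(
    lattice: Sequence[Sequence[str]],
    reference: Sequence[str],
    n: int = 5,
) -> float:
    """Percentage of units whose reference label is among the top-n hypotheses."""
    if len(lattice) != len(reference):
        raise ValueError("lattice and reference must have the same length")
    if not reference:
        return 0.0
    hits = sum(1 for hyps, ref in zip(lattice, reference) if ref in hyps[:n])
    return 100.0 * hits / len(reference)


# Toy lattice: one ranked list of CV hypotheses per spotted unit.
lattice = [
    ["pA", "kA", "vA", "ha", "shu"],
    ["gi", "yE", "hi", "ya", "yai"],
    ["hu", "gu", "mu", "vu", "pu"],
    ["du", "Ru", "ja", "dE", "rA"],
]
reference = ["kA", "gi", "hu", "Di"]

five_best = n_best_accuracy(lattice, reference, n=5)
one_best = n_best_accuracy(lattice, reference, n=1)
```

Comparing the two values shows why a 5-best figure is higher than a 1-best figure: "kA" is ranked second in the first list, so it counts only when more than one hypothesis is considered.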


Illustration of Spotting CV Units

VOP locations (sample numbers)    Lattice of 5-best hypothesised CVs    Actual syllable

Actual    Hypothesised    1     2     3     4     5      Actual syllable

280       320             pA    kA    vA    ha    shu    kAr

------    720             kA    pA    hA    nA    pa     ------

2360      2440            gi    yE    hi    ya    yai    gil

3800      3760            hA    pA    pa    sA    sa     pa

4920      4800            hu    gu    mu    vu    pu     hu

5480      5560            bI    vi    Ti    Ni    dI     di

6320      6200            yi    lA    li    zi    tI     yi

7400      7480            li    ni    ru    ja    lai    li

8200      ------          VOP Missed                     run

9440      9480            du    Ru    ja    dE    rA     du

11160     11120           mu    kU    va    pO    vA     U

12080     12080           Du    da    dA    nA    tu     Du

12520     ------          VOP Missed                     ru

13200     13240           va    da    kai   hi    vA     val

14520     14560           kA    ka    ga    cha   zA     kA

15840     ------          VOP Missed                     rar

16960     16960           ha    kA    ka    ga    sa     haL


Summary and Conclusions

• Spotting multilingual CV units in continuous speech

• AANN models for detecting VOPs

• SVM classifier for recognition of CV units around the VOPs

• Need to reduce the number of missing VOPs

• Further processing of hypothesised CV lattice


Thank You