


SN Computer Science (2020) 1:84 https://doi.org/10.1007/s42979-020-0088-7

SN Computer Science

ORIGINAL RESEARCH

Performance Analysis of Kannada Phonetics: Vowels, Fricatives and Stop Consonants Using LP Spectrum

M. Latha1 · M. Shivakumar2 · R. Manjula3

Published online: 14 March 2020 © Springer Nature Singapore Pte Ltd 2020

Abstract

Kannada is an alpha syllabary language in which each akshara (alphabet), having its own phonetic structure, can be divided into a sequence of consonants and vowels. In our analysis, we concentrate on the vowels, fricatives and stop consonants in prerecorded samples from various subjects. The analysis is carried out with respect to basic acoustic features and the spectrogram in order to extract formant frequencies, and formant analysis using the linear prediction method is applied to identify the complex natural frequencies of the vocal tract.

Keywords Alpha syllabary language · Formant frequency · Fricatives · Linear prediction analysis

Introduction

Kannada belongs to the Dravidian languages, which comprise four major languages: Kannada, Tamil, Malayalam and Telugu. Kannada is the official language of Karnataka state, and on average 50–60 million people speak it [1]. Kannada is an agglutinative language with a moderately complex alphasyllabic script. It has both phonemic and syllabic characteristics, which are encoded in the orthography. The akshara is the basic unit of orthographic representation and stands for a syllable. Each syllable may represent a vowel, a consonant, an amalgamated consonant, or a fusion of multiple consonants and a vowel; the complexity of the syllable block increases phonologically and visually with the number of consonants and the vowel involved. Sounds are broadly classified into vowels and consonants. Vowels allow unrestricted airflow in the vocal tract. Consonants restrict airflow at some point and have weaker intensity than vowels. Generally, aksharamala means the collection of aksharas, each of which can be separated into smaller units inside the syllable. In Kannada, the aksharamala comprises 13 vowels and 34 consonants. Kannada has three types of words, namely Saralakshara, Gunithakshara and Ottakshara [2].

The phonological density and visuospatial complexity increase from Saralakshara to Gunithakshara and from Gunithakshara to Ottakshara. An akshara is typically formed from one of three distinct types of unicodes. Unicodes of the first type combine a consonant with an implicit vowel (/Ca/), those of the second type combine a consonant with an explicit vowel (/CV/), and those of the third type contain two or more consonants within the cluster (/CCV/) [3].

Acoustics is the study of sound waves produced by the human vocal organs. Acoustic studies describe the physical properties of sound and help to distinguish one sound from another in terms of quality and quantity [4]. A study of the acoustic characteristics of any language begins with the phonetic analysis of that language. Any language can be described in terms of a set of distinctive sounds called phonemes. Phonetics is the study of speech sounds, and phonemes are the symbols that show how a word is pronounced. There are three types of phonetics, namely articulatory, acoustic and auditory phonetics [2]. These deal extensively with the characterization, classification and recognition of speech sounds in speech production systems. A thorough knowledge of how vowels and consonants are generated is essential for the successful assessment of phonological disorders.

This article is part of the topical collection “Advances in Computational Intelligence, Paradigms and Applications” guest edited by Young Lee and S. Meenakshi Sundaram.

* M. Latha [email protected]

M. Shivakumar [email protected]

R. Manjula [email protected]

1 Department of Electronics and Communication Engineering, GSSS Institute of Engineering and Technology for Women, Mysuru, Affiliated to VTU, Belagavi, Karnataka, India

2 GSSS Institute of Engineering and Technology for Women, Mysuru, Affiliated to VTU, Belagavi, Karnataka, India

3 JSS Institute of Speech and Hearing, Mysuru, Karnataka, India

Kannada language uses 49 phonemic letters classified into three groups, namely:

1. Swaragalu/vowels Vowels, called Swaras, are letters that exist independently. There are 13 vowels in the Kannada language, of two types: Hrasva Swara and Deerga Swara. A Hrasva Swara is an independent vowel pronounced in a single matra time. A Deerga Swara is an independent vowel pronounced in two matra time. Vowels are always voiced and syllabic, as shown in Table 1.

2. Vyanjanagalu/consonants Consonants are dependent on vowels and can be divided into Vargeeya and Avargeeya consonants. There are 34 consonants in the Kannada language. The Vargeeya and Avargeeya consonants are shown in Tables 2, 3 and 4.

3. Yogavaahakagalu There are two types of yogavaahakagalu in the Kannada language: the Anuswara and the Visarga.

Basic Principle to Form Different Aksharas in Kannada Language

To form an individual akshara/alphabet, a consonant is combined with a vowel.

Following this basic principle, the kagunitha of the Kannada language are formed by combining all the consonants (Vyanjanas) with the existing vowels.

The current research work concentrates on the analysis of speech samples in the Kannada language using linear prediction analysis. We considered speech samples containing only vowels or only consonants of the Kannada language, and not the combination of both. For the various data samples, the formant frequencies are measured and tabulated as the outcome.

Table 1 Kannada vowels

Table 2 List of structured consonants

Table 3 List of unstructured consonants

Related Work

Many authors have studied speech at the phonetic level, since phonetics is essential in speech analysis. Extensive research has been carried out on the creation of speech databases, the segmentation of speech into syllables, and feature extraction from speech data for speech recognition. Many works have focused on identifying syllables such as vowels, stop consonants and fricatives in the speech data and on extracting the best features to categorize the speech into different alphabets (Aksharas) [3].

From the survey, it is observed that much innovative work has been done on Indian languages, namely Hindi, Telugu, Bengali, Punjabi, Marathi, Tamil and Malayalam [4]. Most of this research concerns feature extraction techniques and classification, which improve the speech recognition task [5].

Recently, many research works have focused on speech feature extraction techniques that extract more useful or dominant information hidden in the speech signal, in order to improve the quality of life of people with disabilities [7]. According to the survey, the existing feature extraction techniques are linear predictive coding (LPC), discrete wavelet transform (DWT), mel-frequency cepstral coefficients (MFCCs), principal component analysis, RASTA filtering and probabilistic linear discriminant analysis (PLDA) [8]. Of these techniques, LPC is the most commonly used in speech signal processing [9].

Many works report that applications related to speech analysis are challenging because of the unstructured nature of speech signals. Various signal processing methods, neuroscience-based methods, and supervised and unsupervised machine learning techniques have been explored to address this. Speech varies considerably with the environment, dialect, accent, mood and tone of the speaker. It is therefore essential to understand the importance of language specifications in speech recognition [6].

Methodology

Block Diagram

See Fig. 1.

Construction of Database

A speech database was created by recording speech from a normal subject in a recording room in the absence of background noise. Each recorded speech syllable was a vowel, a stop consonant or a fricative, as shown in Table 5, and was stored in .WAV format. The recorded speech samples were analyzed to obtain the FFT and LPC spectra.

Sampling Process

The recorded speech samples are analog in nature, i.e., their amplitude varies continuously with time. To process these signals, it is essential to convert them into digital form. A signal is converted from continuous time to discrete time by a process called sampling. During sampling, the signal is considered only at certain instants of time determined by the sampling frequency, which is chosen so as to satisfy the Nyquist criterion (a sampling frequency of at least twice the highest frequency present in the signal).

Table 4 Glides, sibilants, fricatives, laterals and continuants

Fig. 1 Block diagram for the estimation of formant frequency using FFT/LPC spectrum

The requirements of many statistical signal processing applications change with the sampling frequency. In the current work, the sampling frequencies used are Fs = 10 kHz and 16 kHz. The sampled signals are further divided into different frequency bands, and each frequency band is down-converted to baseband. Resampling is performed at a lower rate for comparison with the original signal. Each frequency band is coded uniquely using a quantization process based on the signal amplitude.
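The effect of the two sampling frequencies on the usable frequency range can be illustrated with a minimal sketch. It assumes NumPy; the helper names and the 6 kHz test tone are hypothetical choices for illustration and are not taken from the recorded database.

```python
import numpy as np

def nyquist_limit(fs_hz):
    """Highest frequency representable without aliasing at sampling rate fs_hz."""
    return fs_hz / 2.0

def sample_tone(freq_hz, fs_hz, duration_s):
    """Sample a unit-amplitude sine tone at the instants n / fs_hz."""
    n = np.arange(int(round(duration_s * fs_hz)))
    return np.sin(2.0 * np.pi * freq_hz * n / fs_hz)

def dominant_frequency(x, fs_hz):
    """Frequency (Hz) of the largest-magnitude bin of the real FFT of x."""
    magnitude = np.abs(np.fft.rfft(x))
    freqs_hz = np.fft.rfftfreq(len(x), d=1.0 / fs_hz)
    return float(freqs_hz[np.argmax(magnitude)])

# A 6 kHz tone lies above the Nyquist limit at Fs = 10 kHz (5 kHz) and is
# therefore aliased, but it lies below the limit at Fs = 16 kHz (8 kHz).
observed_hz = {}
for fs in (10_000, 16_000):
    tone = sample_tone(6_000.0, fs, duration_s=0.1)
    observed_hz[fs] = dominant_frequency(tone, fs)
    print(fs, nyquist_limit(fs), observed_hz[fs])
```

In this sketch the tone is observed at its true frequency only for the 16 kHz rate, which is one way to see why a higher sampling frequency helps when the higher-frequency content of speech is of interest.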

Windowing Technique

A window function is used to extract a portion of the speech signal by returning nonzero values inside an interval and zero outside it. Many window functions, including Kaiser, Hamming, Bartlett, Hann and Blackman, improve on the basic rectangular window. Each window function has its own characteristics and suitability for different applications. A window function can be selected based on the window size and the frequency range of the signal. A small window size gives better performance in low-density noise, whereas a large window size suits high-density noise. Hence, a suitable window size must be chosen to minimize the noise in the speech signal. In this work, LP analysis was performed using the Hamming window, which balances main-lobe width against side-lobe attenuation. The Hamming window function w(n) is given by Eq. (1).

In Eq. (1), N is the number of samples in a frame and n is the sample index within the frame (0 ≤ n ≤ N − 1).
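As a minimal sketch of Eq. (1), the window can be evaluated directly with the coefficients 0.53836 and 0.46164 and applied to one frame. NumPy is assumed; the function names, the 800 Hz test tone and the 200-sample frame length (20 ms at 10 kHz) are illustrative assumptions.

```python
import numpy as np

def hamming_window(num_samples, a0=0.53836):
    """Evaluate w(n) = a0 - (1 - a0) * cos(2*pi*n / (N - 1)) for n = 0..N-1.

    With a0 = 0.53836 the second coefficient is 0.46164, as in Eq. (1).
    Requires num_samples >= 2.
    """
    n = np.arange(num_samples)
    return a0 - (1.0 - a0) * np.cos(2.0 * np.pi * n / (num_samples - 1))

def window_frame(x, start, num_samples):
    """Cut one frame out of x and taper it with the Hamming window."""
    frame = np.asarray(x[start:start + num_samples], dtype=float)
    return frame * hamming_window(len(frame))

# Illustrative input: an 800 Hz tone sampled at 10 kHz, framed to 20 ms.
fs = 10_000
signal = np.sin(2.0 * np.pi * 800.0 * np.arange(fs) / fs)
frame = window_frame(signal, start=0, num_samples=200)
print(len(frame), hamming_window(201)[[0, 100, 200]])
```

The window is symmetric, equals 0.53836 − 0.46164 at both ends of the frame and reaches 1 at the centre of an odd-length frame, so samples near the frame boundaries are attenuated rather than cut off abruptly.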

Autocorrelation Function

The most extensively used method in linear predictive analysis is the autocorrelation method. The autocorrelation function is computed for the individual phonetic units. It is a mathematical tool for comparing a signal with a delayed copy of itself over successive time intervals. Delay time and amplitude are the important parameters measured using this function. Low-frequency pitch components were identified and quantified using the autocorrelation function. For high-pitch samples, short frames of around 5–20 ms are used for analysis, whereas for low-pitch samples, long frames of around 20–50 ms are used. The resulting autocorrelation function of the vocal tract impulse response significantly improves the measurement of formant frequencies.
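The use of the autocorrelation function to locate a low-frequency pitch component can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used for the recordings: NumPy is assumed, the function names and the 50–500 Hz search range are hypothetical, and the input is a synthetic 200 Hz signal in a 40 ms frame.

```python
import numpy as np

def short_time_autocorrelation(frame):
    """r[k] = sum_n frame[n] * frame[n + k] for lags k = 0..len(frame)-1."""
    frame = np.asarray(frame, dtype=float)
    full = np.correlate(frame, frame, mode="full")
    return full[len(frame) - 1:]  # keep the non-negative lags only

def estimate_pitch(frame, fs_hz, fmin_hz=50.0, fmax_hz=500.0):
    """Pitch estimate from the strongest autocorrelation peak in the lag range
    that corresponds to fmin_hz..fmax_hz (assumed search range)."""
    r = short_time_autocorrelation(frame)
    lag_min = int(fs_hz / fmax_hz)
    lag_max = min(int(fs_hz / fmin_hz), len(r) - 1)
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs_hz / lag

# Synthetic voiced-like frame: 200 Hz fundamental plus one harmonic,
# 40 ms long at Fs = 16 kHz (640 samples).
fs = 16_000
n = np.arange(640)
frame = (np.sin(2.0 * np.pi * 200.0 * n / fs)
         + 0.5 * np.sin(2.0 * np.pi * 400.0 * n / fs))
pitch_hz = estimate_pitch(frame, fs)
print(pitch_hz)
```

The frame has to span at least one or two pitch periods for the peak to be visible, which is consistent with using longer frames for low-pitch samples.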

$$w(n) = 0.53836 - 0.46164 \cos\left(\frac{2\pi n}{N - 1}\right) \tag{1}$$

Table 5 Speech database

Sl. no.   Recorded speech samples
Vowels
1.        KaKa
2.        Kiki
3.        KuKu
Stop consonants
1.        Pata
2.        Tata
3.        Kasa
Fricatives
1.        SaSa
2.        Shasha

Table 6 Estimation of formant frequency

Vowels        Formant frequency (Hz)
              F1     F2     F3     F4
/a/-/kaka/    800    2400   3600   5700
/i/-/Kiki/    1600   2000   4000   6200
/u/-/kuku/    1800   1600   3800   6800

Fig. 2 Speech samples for the utterances of /kaka/, /kiki/ and /kuku/


FFT Spectrum/LPC Spectrum

A time-domain signal can be converted to a frequency-domain signal with the help of the fast Fourier transform (FFT). The FFT is an algorithm for computing the discrete Fourier transform (DFT), which is defined over a set of N samples {x_k} as given in Eq. (2).
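The relationship between the DFT sum of Eq. (2) and the FFT can be checked numerically. The sketch below assumes NumPy and the standard DFT kernel exp(−j2πkn/N); the function name and the 800 Hz test frame are illustrative assumptions.

```python
import numpy as np

def dft_direct(x):
    """Direct O(N^2) evaluation of X_n = sum_k x_k * exp(-2j*pi*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N).reshape(-1, 1)  # output (frequency) index, one per row
    k = np.arange(N).reshape(1, -1)  # input (time) index, one per column
    return (x * np.exp(-2j * np.pi * k * n / N)).sum(axis=1)

# The FFT returns the same values as the direct sum, only faster.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
same = np.allclose(dft_direct(x), np.fft.fft(x))

# Magnitude spectrum of a 20 ms frame (200 samples at Fs = 10 kHz)
# containing an 800 Hz tone; the bin spacing is Fs / N = 50 Hz.
fs = 10_000
frame = np.sin(2.0 * np.pi * 800.0 * np.arange(200) / fs)
magnitude = np.abs(np.fft.rfft(frame))
freqs_hz = np.fft.rfftfreq(len(frame), d=1.0 / fs)
peak_hz = float(freqs_hz[np.argmax(magnitude)])
print(same, peak_hz)
```

The frequency resolution of the spectrum is Fs / N, so for a fixed frame duration the bin spacing is the same at 10 kHz and 16 kHz while the covered range grows with the sampling frequency.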

Linear predictive coding (LPC) is an analytical tool employed in statistical signal processing. It converts segments of a real-time signal into frames of 20 ms for storage and transmission.

In the present work, linear predictive coding is used to measure the basic speech parameters, namely pitch, intensity, amplitude, formants, and the temporal and spectral envelope, in compressed form using a linear predictive model. The speech signal can be interpreted readily using LP analysis. The predicted value of the current sample is a linear combination of the past p samples, as described by Eqs. (3) and (4).

$$X_n = \sum_{k=0}^{N-1} x_k \, e^{-j 2\pi k n / N}, \qquad n = 0, 1, \ldots, N - 1 \tag{2}$$

In Eqs. (3) and (4), p is the order of prediction, a_k are the linear prediction coefficients, x(n) is the input speech sequence, s(n) is the windowed speech sequence and w(n) is the windowing sequence.
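One common way to obtain the coefficients a_k of Eq. (3) is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch below assumes that method and NumPy; the function names are hypothetical, and the test signal is a synthetic second-order impulse response rather than recorded speech, so no window is applied here.

```python
import numpy as np

def autocorrelation(s, max_lag):
    """r[k] = sum_n s[n] * s[n + k] for k = 0..max_lag."""
    s = np.asarray(s, dtype=float)
    return np.array([np.dot(s[:len(s) - k], s[k:]) for k in range(max_lag + 1)])

def levinson_durbin(r, order):
    """Solve sum_k a_k r(|i - k|) = r(i), i = 1..p, for the predictor a_1..a_p.

    Returns (a, err) where err is the final prediction error energy.
    """
    a = np.zeros(order + 1)  # a[0] is unused; a[k] holds a_k
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - np.dot(a[1:i], r[i - 1:0:-1])
        k = acc / err  # reflection coefficient of stage i
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = a_new
        err *= 1.0 - k * k
    return a[1:], err

def lpc(s, order):
    """Predictor coefficients of the given order for the sequence s."""
    return levinson_durbin(autocorrelation(s, order), order)[0]

def predict(s, a):
    """Eq. (3): s_hat[n] = sum_{k=1..p} a_k * s[n - k], with s[n] = 0 for n < 0."""
    s = np.asarray(s, dtype=float)
    s_hat = np.zeros_like(s)
    for k, a_k in enumerate(a, start=1):
        s_hat[k:] += a_k * s[:-k]
    return s_hat

# Synthetic check: s[n] = 1.3 s[n-1] - 0.8 s[n-2] + impulse at n = 0.
true_a = np.array([1.3, -0.8])
s = np.zeros(400)
for n in range(len(s)):
    s[n] = 1.0 if n == 0 else 0.0
    for k, a_k in enumerate(true_a, start=1):
        if n - k >= 0:
            s[n] += a_k * s[n - k]

a_est = lpc(s, order=2)
residual = s - predict(s, a_est)
print(a_est, np.abs(residual[1:]).max())
```

For a real recording, s would be the windowed frame of Eq. (4), and the residual would contain the excitation rather than a single impulse.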

Formant Frequency Estimation

In order to estimate the pitch period/frequency, the input speech signal should be low-pass filtered, since low-pass filtering removes the intrusive high-frequency components and out-of-band noise, which leads to more precise results. The fundamental frequency lies in the low-frequency region, below 500 Hz.

Formant frequencies represent the characteristics of the vocal tract during speech production. Formants are specific to individual phonemes and are useful in speech recognition. Formants are measured using the spectrogram, where F1 indicates the size of the oral cavity in terms of width and tongue position, F2 denotes the size of the oral cavity in terms of length and tongue position, F3 represents the movement of the speaker's lips and F4 illustrates the significant resonant peak. The numerical representation of the formant frequencies makes it easier to analyze the individual phonetic units effectively.

$$\hat{s}(n) = \sum_{k=1}^{p} a_k \, s(n - k) \tag{3}$$

$$s(n) = x(n) \cdot w(n) \tag{4}$$

Fig. 3 Corresponding short-time autocorrelation function, FFT and LPC spectrum for (Fs = 10 kHz)
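Formant candidates can be read off the LP model as the angles of the complex roots of the prediction polynomial. The sketch below illustrates this root-finding approach, which is an assumption and may differ from the peak reading used for the tables; it assumes NumPy, the function names and the 400 Hz bandwidth threshold are hypothetical, and the input is a synthetic signal with resonances placed at 800 Hz and 2400 Hz.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC: solve R a = r for the predictor a_1..a_p."""
    frame = np.asarray(frame, dtype=float)
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:])
                  for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

def formants_from_lpc(a, fs_hz, max_bandwidth_hz=400.0):
    """Resonance frequencies (Hz) from the roots of z^p - a_1 z^(p-1) - ... - a_p.

    Only roots in the upper half plane with a bandwidth below the (assumed)
    threshold are kept; the result is sorted in ascending order.
    """
    roots = np.roots(np.concatenate(([1.0], -np.asarray(a, dtype=float))))
    roots = roots[np.imag(roots) > 0]
    freqs_hz = np.angle(roots) * fs_hz / (2.0 * np.pi)
    bandwidths_hz = -np.log(np.abs(roots)) * fs_hz / np.pi
    return np.sort(freqs_hz[bandwidths_hz < max_bandwidth_hz])

def synth_resonances(freqs_hz, bandwidth_hz, fs_hz, length):
    """Impulse response of an all-pole filter with the given resonances."""
    poles = []
    for f in freqs_hz:
        pole = np.exp(-np.pi * bandwidth_hz / fs_hz + 2j * np.pi * f / fs_hz)
        poles += [pole, np.conj(pole)]
    denom = np.real(np.poly(poles))  # [1, c_1, ..., c_p]
    h = np.zeros(length)
    for n in range(length):
        h[n] = 1.0 if n == 0 else 0.0
        for k in range(1, len(denom)):
            if n - k >= 0:
                h[n] -= denom[k] * h[n - k]
    return h

fs = 10_000
test_signal = synth_resonances([800.0, 2400.0], bandwidth_hz=100.0,
                               fs_hz=fs, length=2000)
formants_hz = formants_from_lpc(lpc_coefficients(test_signal, order=4), fs)
print(np.round(formants_hz))
```

For recorded speech the frame would first be windowed as in Eq. (4), and a higher prediction order would be needed to resolve four formants.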

Implementation

The procedure for implementing the LPC analysis is explained in the following steps:

1. A speech database is created by recording speech syllables that involve vowels, stop consonants and fricatives.

2. Data are read from the file named filename.wav.

3. The sampling frequencies used in the analysis are 10 kHz and 16 kHz. Since speech is a mixture of low and high frequencies, changing the sampling frequency helps in analyzing higher frequencies.

4. A Hamming window is applied to remove the abnormalities at the beginning and end of the samples.

5. An FFT is performed to extract small abnormalities in the recorded syllables.

6. F1, F2, F3 and F4 (in Hz) are the formant frequencies estimated for the vowels, fricatives and stop consonants.

Fig. 4 Speech samples for the utterances of /SaSa/ and /shasha/

Fig. 5 Corresponding short-time autocorrelation function, FFT and LPC spectrum for (Fs = 16 kHz)

Table 7 Estimation of formant frequency in fricatives

Fricatives    Formant frequency (Hz)
              F1     F2     F3     F4
/SaSa/        1800   4200   6000   7800
/shasha/      1700   3800   6000   7400
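Steps 2 to 5 of the procedure can be sketched end to end as follows. This is an illustrative sketch, not the original implementation: it assumes NumPy and Python's standard `wave` module, handles only 16-bit mono PCM files, uses hypothetical function names, and writes a synthetic 1 kHz tone to a temporary filename.wav because the recorded database is not available here.

```python
import os
import tempfile
import wave

import numpy as np

def write_wav_mono16(path, samples, fs_hz):
    """Write floats in [-1, 1] as 16-bit mono PCM (only used to make a test file)."""
    pcm = np.round(np.clip(samples, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(fs_hz)
        wf.writeframes(pcm.tobytes())

def read_wav_mono16(path):
    """Step 2: read a 16-bit mono WAV file; returns (fs_hz, samples in [-1, 1))."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono WAV files")
        fs_hz = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    return fs_hz, np.frombuffer(raw, dtype="<i2").astype(float) / 32768.0

def frame_spectrum(samples, fs_hz, start, frame_ms=20.0):
    """Steps 4 and 5: Hamming-window one frame, return (freqs_hz, magnitude)."""
    num = int(round(frame_ms * 1e-3 * fs_hz))
    frame = samples[start:start + num]
    n = np.arange(len(frame))
    window = 0.53836 - 0.46164 * np.cos(2.0 * np.pi * n / (len(frame) - 1))
    magnitude = np.abs(np.fft.rfft(frame * window))
    return np.fft.rfftfreq(len(frame), d=1.0 / fs_hz), magnitude

# Step 3 (here: Fs = 16 kHz) with a synthetic 1 kHz tone standing in for speech.
fs = 16_000
tone = 0.5 * np.sin(2.0 * np.pi * 1000.0 * np.arange(fs) / fs)
path = os.path.join(tempfile.mkdtemp(), "filename.wav")
write_wav_mono16(path, tone, fs)

fs_read, x = read_wav_mono16(path)
freqs_hz, magnitude = frame_spectrum(x, fs_read, start=0)
peak_hz = float(freqs_hz[np.argmax(magnitude)])
print(fs_read, len(x), peak_hz)
```

The spectrum returned by `frame_spectrum` is the FFT counterpart of the LPC spectrum; the formant values in step 6 would be read from the smoother LPC envelope of the same windowed frame.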

Results and Conclusions

Vowel Analysis

A vowel is a speech sound produced without blocking the airflow from the respiratory system. Vowels are paired with a consonant to make a syllable. This work helps in understanding the acoustic characteristics of the speech samples, and different parameters such as formant frequencies in Hz and period in milliseconds are estimated. Table 6 lists the formant frequencies obtained during the analysis (Figs. 2, 3).

Fig. 6 Voiced speech segment for the utterance of stop consonants /pata/, /tata/ and /kasa/

Fig. 7 Corresponding short-time autocorrelation function, FFT and LPC spectrum for (Fs = 16 kHz)

Fricative Analysis

A fricative is a speech sound produced by partial obstruction of the breath, which introduces audible interference (friction) into the utterance of the sound. Fricatives are grouped into five types based on the manner of articulation: labiodental, dental, alveolar, post-alveolar and glottal fricatives (Figs. 4, 5).

The fundamental frequency associated with the utterance of /S/ is around 3500 Hz, and that of /s/ is between 4000 and 4500 Hz. The spectrogram shows that fricatives occurring in the specified frequency range have a high energy level above 4000 Hz. Hence, the spectrum has some formant peaks, which may be due to prolongation of the sound during the aspiration phase; these are tabulated for analysis in Table 7.

Stop Consonant Analysis

In phonetics, stop consonants are sounds produced by completely blocking the airflow. There are two kinds of stop consonants, namely voiced and unvoiced stops. The voiceless stops are the sounds [p], [t] and [k], which are also called plosives. The voiced stops are the sounds [b], [d] and [g]. Voiced stops are pronounced with vibration of the vocal cords, and voiceless stops are pronounced without vibration of the vocal cords (Figs. 6, 7).

As a result, the spectrum obtained for these stop consonants is quite flat. The ratio of high-frequency to low-frequency energy drops significantly toward the end of the stop consonant, which marks the end of the syllable. The corresponding values are tabulated in Table 8.
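The ratio of high-frequency to low-frequency energy mentioned above can be computed from the FFT of a frame. The sketch below assumes NumPy; the function names are hypothetical, the 4000 Hz split follows the energy level quoted for fricatives, and the input is a synthetic two-tone signal rather than a recorded syllable.

```python
import numpy as np

def band_energy(x, fs_hz, f_lo_hz, f_hi_hz):
    """Sum of squared FFT magnitudes for bins with f_lo_hz <= f < f_hi_hz."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs_hz = np.fft.rfftfreq(len(x), d=1.0 / fs_hz)
    mask = (freqs_hz >= f_lo_hz) & (freqs_hz < f_hi_hz)
    return float(power[mask].sum())

def high_to_low_energy_ratio(x, fs_hz, split_hz=4000.0):
    """Energy at or above split_hz divided by energy below it.

    The low band must contain some energy, otherwise the ratio is undefined.
    """
    low = band_energy(x, fs_hz, 0.0, split_hz)
    high = band_energy(x, fs_hz, split_hz, fs_hz / 2.0 + 1.0)
    return high / low

# Synthetic frame at Fs = 16 kHz: a 1 kHz tone of amplitude 1 and a
# 6 kHz tone of amplitude 2, i.e. four times more energy above 4 kHz.
fs = 16_000
n = np.arange(1600)
x = (np.sin(2.0 * np.pi * 1000.0 * n / fs)
     + 2.0 * np.sin(2.0 * np.pi * 6000.0 * n / fs))
ratio = high_to_low_energy_ratio(x, fs)
print(ratio)
```

Evaluating this ratio frame by frame along a syllable gives a simple numerical trace of where high-frequency energy dominates and where it falls away.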

Conclusion

Speech analysis using LPC is applied to a speech database that contains fricatives, stop consonants and vowels in the Kannada language. The resulting features are analyzed to distinguish the different syllables by identifying the formant frequencies effectively. LPC analysis produces a smooth spectrum in which much of the influence of the excitation is removed. The results clearly show the frequency variations observed during the analysis of the individual Kannada phonetic units. The resulting spectrum of the vowel analysis shows high amplitude at the first formant frequency F1. Fricatives show high amplitude at the last formant frequency F4, and the spectrum of the stop consonants is quite flat between the formant frequencies F2 and F3. This work thus demonstrates and estimates the frequency variation with respect to vowels, stops and fricatives.

References

1. Reddy MV, Hanumanthappa M. Kannada phonemes to speech dictionary: statistical approach. Int J Eng Res Appl. 2017;7(1):77–80.

2. Sarika Hegde KK, Achary KK, Shetty S. Statistical analysis of features and classification of alphasyllabary sounds in Kannada language. New York: Springer; 2014. https://doi.org/10.1007/s10772-014-9250-8.

3. Nag S, Treiman R, Snowling MJ. Learning to spell in an alphasyllabary: the case of Kannada. Writ Syst Res. 2010;2:41–52. https://doi.org/10.1093/wsr/wsq001.

4. Hemakumar G. Acoustic phonetic characteristics of Kannada language. IJCSI Int J Comput Sci Issues. 2011;8(6):1694-0814.

5. Anil Kumar C, Shiva Prasad KM, Manjunatha MB, KodandaRamaiah GN. Basic acoustic features analysis of vowels and C-V-C of Indian English Language. ITSI-TEEE. 2015;3(1):20–3.

6. Shiva Prasad KM, Anil Kumar C, Manjunatha MB, KodandaRamaiah GN. Gender based acoustic features and spectrogram analysis for Kannada phonetics. ITSI-TEEE. 2015;3(1):16–9.

7. Shaughnessy DO. Speech communication human and machine. 2nd ed. Hyderabad: University Press, (INDIA) Ltd; 2009.

8. Atal BS, Hanauer SL. Speech analysis and synthesis by linear prediction of the speech wave. J Acoust Soc Am. 1971;50:637–55.

9. Singer E, Torres-Carrasquillo PA, Gleason TP, Campbell WM, Reynolds DA. Acoustic, phonetic, and discriminative approaches to automatic language identification. In: Eurospeech03; 2003.

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Table 8 Estimation of formant frequency in stop consonants

Stop consonants   Formant frequency (Hz)
                  F1     F2     F3     F4
/pata/            10     2000   4000   6000
/tata/            400    1800   3400   5000
/kasa/            1800   2200   4000   6000