

CHAPTER 5

RAGA IDENTIFICATION

Chapter 4 described how the feature extraction algorithms were designed by
exploiting the characteristics of Carnatic music, so that the extracted
features could be used to identify content specific to Carnatic music. One
important element of such content is the Raga, which typically conveys the
emotion and character of a particular song. In this chapter, we discuss the
algorithms proposed for Raga identification in Carnatic music.

5.1 MELODY AND RAGA

Melody is central to all types of music and distinguishes music from speech.
It is fundamental in Western music, and several attempts have been made to
identify melody (Deutsch 1978, Schulkind et al 2003, Madsen and Widmer 2007).
While melody is a typical characteristic of Western music, in Indian music it
has an organized representation called the Raga. As discussed earlier, the
concept of Raga is specific to Indian music; it is defined as the arrangement
of notes in a pre-defined order, and each Raga is given a name based on its
characteristics (Sambamurthy 1983). Moreover, the role of the Raga and its
interpretation vary between Carnatic music and Hindustani music.

A parent Raga, called a Melakarta Raga, is one in which all seven swaras S, R,
G, M, P, D, and N are present. A child Raga, called a Janya Raga, is derived
from a parent Raga and contains only five or six of the seven swaras. Because
a child Raga is derived by omitting one or two of the seven swaras, the same
child Raga can be regarded as derived from more than one parent Raga.

Carnatic music establishes a hierarchical relationship to classify the

Raga in terms of a Parent-Child relationship. In our work, we have proposed

three different approaches for the identification of both parent and child

Ragas of Carnatic music, based on identifying the swaras in a given musical

piece. Mapping frequencies to swaras is challenging because of the narrow
frequency range of each note, the presence of Gamakas, and the use of
microtones. Hence, Raga identification should also consider other
characteristics of the Raga, in addition to the swaras that make up its
Arohana and Avarohana.

5.2 CHARACTERISTICS OF RAGA

A Raga is characterized by its Arohana and Avarohana. In addition, a Raga is
characterized by the Raga lakshana, which conveys semantic information about
the Raga. As described in the literature, the Raga lakshana has 13 essential
features (Sambamurthy 1983), comprising the following components:

Graha: Note at which a Raga commences

Amsa: The note that reveals the melodic entity of the Raga - or svarupa - or jiva swara

Nyasa: The note on which the Raga can be concluded

Mandra: The lowest note that can be played in the Raga

Tara: The highest note that can be played in the Raga

Alpatva: The note used sparingly in the Raga

Bahutva: The note used frequently in the Raga

Apanyasa: The same sangati is sung in Tara and Madhya sthayi

Vinayasa: Raga Sancharas are stopped at a swara, then elaborated in Mandra and Tara sthayi; a characteristic that defines the pattern of singing a particular Raga

Sanyasa: The Raga is sung and elaborated and finally closed at the Adhara Shadja Swara

Shadava: 6 note sancharas

Audava: 5 note sancharas

Antara Marga: Introduction of the note or chayya of another Raga

The identification of this Raga lakshana requires the identification

of the swaras that occur in the input music signal. In this thesis, the swaras are

identified to determine the Arohana, Avarohana and the Raga lakshana, which

are later used to identify the Raga of the input music.

5.3 AROHANA AVAROHANA APPROACH

In this thesis, we propose three approaches to Raga identification:

the Arohana Avarohana approach, LDA approach and the Raga model based

approach. In the first approach, the use of the Arohana and Avarohana is

explored, as the basis for Raga identification.

5.3.1 Algorithm

The primary component of a Raga is the set of swaras comprising its Arohana
and Avarohana. If these swaras can be identified, identifying the Raga becomes
straightforward. The Arohana Avarohana algorithm was evaluated in two
scenarios: one with a known, fixed frequency for the shadja ‘S’, and the other
with the tonic estimated by our algorithm to give the frequency of the shadja
‘S’.

Songs were chosen from singers whose tonic was assumed to be known. The input
signal is segmented using the segmentation algorithm discussed in Chapter 4.
As already mentioned, the assumption at the end of the segmentation phase is
that every segment probably corresponds to one swara. Therefore, the dominant
frequency of every segment is identified. The dominant frequency is chosen
using the spectral energy and the spectral centroid features: the spectral
centroid is computed, and the frequency at which 95% of the energy is present
is taken as the dominant frequency. After the frequency components of every
segment have been identified, these frequencies are converted to swara
notation by determining the ratio between each frequency and the known tonic,
using Table 1.4 of Chapter 1 (Sambamurthy 1983).
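The ratio-based conversion from frequency to swara can be sketched as follows. This is only an illustrative sketch: the ratio table is a commonly cited just-intonation set for twelve swara positions and is an assumption standing in for Table 1.4 of Chapter 1, which is not reproduced here, and the names `SWARA_RATIOS` and `freq_to_swara` are hypothetical.

```python
import math

# Assumed just-intonation ratios for twelve swara positions. This table is an
# illustrative stand-in for Table 1.4 of Chapter 1, not a copy of it.
SWARA_RATIOS = {
    "S": 1.0, "R1": 16 / 15, "R2": 9 / 8, "G2": 6 / 5, "G3": 5 / 4,
    "M1": 4 / 3, "M2": 45 / 32, "P": 3 / 2, "D1": 8 / 5, "D2": 5 / 3,
    "N2": 9 / 5, "N3": 15 / 8,
}


def freq_to_swara(freq_hz, tonic_hz):
    """Return the swara whose ratio to the tonic is nearest on a log scale."""
    if freq_hz <= 0 or tonic_hz <= 0:
        raise ValueError("frequencies must be positive")
    # Fold the ratio into a single octave, i.e. into the interval [1, 2).
    octaves = math.log2(freq_hz / tonic_hz)
    folded = 2 ** (octaves - math.floor(octaves))
    # Include the upper S (ratio 2) so values just below the octave map to S.
    candidates = list(SWARA_RATIOS.items()) + [("S", 2.0)]
    name, _ = min(candidates, key=lambda kv: abs(math.log2(folded / kv[1])))
    return name
```

Choosing the nearest ratio always returns some swara, which is one reason why Gamakas and narrow frequency ranges can produce a wrong but plausible label, as the analysis of this approach notes.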

Using the swaras that are identified, the Arohana and Avarohana are determined
by choosing one swara from all the 7 swaras. The Raga look-up table is then
used to determine the Raga of the input song: a simple string matching
approach compares the identified Arohana Avarohana pattern with the Raga
database. The Raga look-up table consists of the name of the Raga and its
Arohana and Avarohana in the form of swara components, as shown in Figure 5.1.
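A minimal sketch of such a look-up is shown here, assuming the table stores the Arohana and Avarohana as space-separated swara strings. The three entries use the standard scales of the parent Ragas tested in this chapter; the names `RAGA_TABLE` and `lookup_raga` are hypothetical.

```python
# Illustrative look-up table: Raga name -> (Arohana, Avarohana).
RAGA_TABLE = {
    "Sankarabharanam": ("S R2 G3 M1 P D2 N3 S", "S N3 D2 P M1 G3 R2 S"),
    "Kanakangi": ("S R1 G1 M1 P D1 N1 S", "S N1 D1 P M1 G1 R1 S"),
    "Karaharapriya": ("S R2 G2 M1 P D2 N2 S", "S N2 D2 P M1 G2 R2 S"),
}


def lookup_raga(arohana, avarohana, table=RAGA_TABLE):
    """Return the names of all Ragas whose stored strings match exactly."""
    # Normalise whitespace so that only the swara sequence is compared.
    key = (" ".join(arohana.split()), " ".join(avarohana.split()))
    return [name for name, entry in table.items() if entry == key]
```

Because the comparison is exact, a single misidentified swara yields no match at all, which illustrates why this approach is sensitive to swara errors.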

Raga Name | Arohana | Avarohana

Figure 5.1 Arohana Avarohana model


This algorithm had limited accuracy because it used a known, fixed frequency
for the ‘S’ of the singer; as discussed in Chapter 1, however, this frequency
varies with the song. Hence, the same Raga identification algorithm was also
run with the variable tonic estimated by our algorithm, as explained in
Chapter 4.

5.3.2 Analysis of the Arohana Avarohana Approach

For Raga identification, we considered only songs in parent Ragas, sung by
musicians such as Nithyasree, M.S.Subalakshmi and Balamuralikrishna. To test
this algorithm, we considered songs in the Ragas Sankarabharanam, Kanakangi,
and Karaharapriya, set to Adi Tala or Rupaka Tala. The identified Ragas, by
singer and with the percentage of correct identification, are shown in
Figure 5.2.

Figure 5.2 Raga Identification - Arohana Avarohana Approach


The three Ragas were compared across three singers, with the fixed and the
estimated tonic giving average identification rates of 61% and 66%
respectively. Figure 5.2 also shows that the Raga identification algorithm
that incorporated the tonic identification algorithm performed better in many
situations than the algorithm that assumed a fixed frequency for ‘S’.

This algorithm was tested only for parent Ragas, in which all seven swaras are
present. However, in parent Ragas with microtones, the so-called “vivadhi”
swaras were identified incorrectly, leading to wrong identification of the
Raga. In addition, because of the narrow frequency range of the swaras and the
Gamakas (pitch inflexions given to swaras), identifying the swaras from the
dominant frequency alone was difficult, which limited accuracy. Furthermore,
the tonic was estimated using known frequency mutating signals corresponding
to the frequencies of ‘S’, ‘R1’, ..., ‘N’. This may not always hold, since the
singer can choose a tonic between these intervals, such as a value between ‘S’
and ‘R1’ or between ‘M2’ and ‘P’. In such situations the algorithm was not
able to determine the Raga because of the difficulty in tonic estimation.

Another limitation of this algorithm is that it cannot handle missing swaras
(Child Ragas), differing swaras (Child/Vakra Ragas) or jumbled swara patterns
(Vakra Ragas) in the Arohana and Avarohana. In the case of missing swaras, the
narrow range of frequencies between notes leads to one swara being
misinterpreted as another. In the case of differing swaras, the large number
of possible combinations made the interpretation of the Arohana and Avarohana
difficult, while for a jumbled swara pattern it was difficult to determine the
sequence of swaras.

Considering these limitations of the Arohana Avarohana approach, we propose
another approach for Raga identification, which is based not only on the
Arohana and Avarohana but also on the Raga lakshana characteristics. The Raga
lakshana characteristics, which convey semantic information about the Raga,
can be mapped to the contextual information used in text processing.

Therefore, we analyzed the musical piece to identify the Raga in a manner
similar to a text-based approach, using the ‘Amsa’ characteristic of the Raga
lakshana to convey semantic information.

5.4 LDA APPROACH

To overcome the limitations of the Arohana and Avarohana

approach for Raga identification, we propose the use of a probabilistic Latent

Dirichlet Allocation (LDA) model (Hu 2009), which incorporates additional

Raga lakshana parameters to determine the Raga. LDA is an unsupervised
statistical model used in document classification to determine the underlying
topics of a document, under the assumption that a document contains a random
mixture of topics. In this

work, we have constructed the LDA for identifying the Raga(s) available in a

given input music signal, based on the assumption that the musical piece is a

random mixture of notes, and hence, notes map to the words in a topic, and

the topics in a document map to the Raga.

5.4.1 LDA for Music

In the work proposed by Hu (2009), the author analysed the performance of the
LDA for text, images and music. As explained in Chapter 2, the Dirichlet
parameters α and θ indicate the distribution of the topics in a given
document. In the work for images, the Dirichlet parameters are identified for
each image segment to determine the topics available in a given image. In the
work for music, the author used the LDA to determine the harmonic structure of
a given Western musical piece, using the Dirichlet parameters, and established
the note patterns corresponding to the major and minor scales. This inspired
us to try this approach for Carnatic music analysis, which has a pre-defined
arrangement of notes, called the Raga. To examine this possibility, we
explored the semantic information conveyed by the Raga by means of the ‘Amsa’
Raga lakshana.

5.4.2 LDA based Algorithm

For Raga identification, we explore the characteristic phrase of notes, which
is unique to every Raga. The sequence of notes that occurs contiguously the
maximum number of times is the characteristic phrase, which defines the ‘Amsa’
of a given Raga. This characteristic phrase helps in distinguishing a Child
Raga from its Parent, even when the swaras in the Arohana and Avarohana are
the same for the Parent and the Child Raga. The characteristic swara phrase of
the Raga is used to increase the weight that the LDA gives to the swaras when
identifying the Raga. The LDA parameters α and θ are derived using this
characteristic phrase. The initial value of α needs to be computed from a
random mixture of songs. This value is initially assumed to be uniform for all
Ragas; it is computed over all possible combinations of swaras, with equal
weight given to all the phrases. The parameter θ estimates the weight
associated with a sequence of notes for a given Raga, and is computed from the
assumed initial value of α using Bayes' equation.

The Dirichlet parameters α and θ of the LDA are computed for all Ragas during
the training phase. These parameters require the swaras of the song. The
swaras are determined by first segmenting the signal with our segmentation
algorithm to identify the frequency components. Then, the ratio between the
tonic frequency given by our tonic estimation algorithm and each of the other
identified frequency components is computed, and the ratio is mapped to a
swara. After the swaras are determined, sequences of four swaras are
considered to compute the Dirichlet parameters. In our work, we take α to be
the generic distribution of swara patterns over all Carnatic songs, since this
parameter is estimated from a random mixture. The distribution of swara
patterns of an individual Raga is given by the θ vector. In our algorithm,
during the training phase, we set the initial probability value of α by
assuming all combinations of swaras of length 4. This value is refined by
studying the frequency of occurrence of the swara patterns in the songs of the
training set. Similarly, we initialize θ and recompute its value based on
songs belonging to a particular Raga.

Using the computed θ value, the value of α is modified using Bayes' theorem.
This computation is performed for all Ragas, including Parent, Child and Vakra
Ragas, to determine the corresponding probability vectors.

In our algorithm, the estimated value of α is maintained per pattern length.
If the pattern length is 4, all songs with an identifying pattern length of 4
share one common α, and every Raga has its own θ vector. Using a permutation
algorithm, we generate all possible 4-length patterns over the 7 swaras, and
assign these patterns equal initial probabilities of 1/(7*7*7*7). To compute
the values of α and θ, we considered the 7 swaras S, R, G, M, P, D, N without
representing them as microtones.
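The uniform initialisation over all 4-length patterns can be sketched as follows. The function name `uniform_pattern_weights` is hypothetical; patterns are generated with repetition allowed, which is what gives the 7*7*7*7 = 2401 patterns implied by the initial probability above.

```python
from itertools import product

SWARAS = ("S", "R", "G", "M", "P", "D", "N")


def uniform_pattern_weights(length=4):
    """Map every swara pattern of the given length to an equal weight."""
    # product(..., repeat=length) enumerates patterns with repeated swaras,
    # so there are 7 ** length patterns in total.
    patterns = ["".join(p) for p in product(SWARAS, repeat=length)]
    weight = 1.0 / len(patterns)
    return {pattern: weight for pattern in patterns}
```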

After initialising the probability value of α, we need to train α with songs
belonging to all Ragas, so that it represents the generic information about
Ragas. In the training process, every 4-length pattern encountered in a song
increases the weight of that pattern in α. After α has been re-computed in
this way using a training corpus, the value of θ is determined from α for each
Raga.


To construct θ, the Raga-specific parameter, the system needs to be trained
with songs of a particular Raga. Given a song, for every four-length pattern
encountered, the weight of that pattern is retrieved from the α vector and
increased, and the pattern's probability is stored in the θ vector. The
initial value of α is 1/7^4, which is incremented in steps of 0.004 to compute
θ using Bayes' theorem. Then, during the next computation of α, the weight is
incremented in steps of 0.04 to compute the value of θ. The initial value of θ
is computed and is increased by 0.05 when a new characteristic phrase pattern
is encountered. Using this procedure, the top 20 patterns for a given Raga are
found from the training set and stored as the θ vector. This vector is then
used to refine the α vector, by adding a small weight to those patterns of the
α vector that are encountered in the θ vector. After α is re-computed, the
same procedure is repeated to determine the θ vector for all the Ragas. The
pseudocode of the LDA construction algorithm is given below.

LDAConstruct ()
{
    Determine all 4-length pattern combinations and assign equal probability
    α = 1 / (7*7*7*7)
    For every Raga
    {
        Compute α by choosing songs belonging to all Ragas, assigning a little
        weight if the 4-length pattern is encountered in the song
        Compute θ by choosing the songs belonging to one Raga; if the 4-length
        pattern occurs, add a little weight to the value chosen from α
        Re-compute α using the computed θ vector
    }
}
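One possible reading of this construction procedure is sketched here as runnable code. It is an interpretation under stated assumptions, not the implementation used in this work: a single hypothetical `step` increment replaces the several step sizes mentioned in the text, the weights are not normalised, songs are assumed to be strings over the seven swara letters, and the names `four_grams`, `lda_construct`, `generic` and `raga_specific` are illustrative.

```python
from itertools import product

SWARAS = "SRGMPDN"


def four_grams(song):
    """Yield every contiguous 4-swara pattern of a song given as a string."""
    for i in range(len(song) - 3):
        yield song[i:i + 4]


def lda_construct(corpus, step=0.004, top_k=20):
    """corpus maps a Raga name to a list of songs (strings over SWARAS)."""
    # Equal initial weight for all 7**4 patterns.
    generic = {"".join(p): 1 / 7 ** 4 for p in product(SWARAS, repeat=4)}
    # Generic weights: reward patterns seen in songs of any Raga.
    for songs in corpus.values():
        for song in songs:
            for pattern in four_grams(song):
                generic[pattern] += step
    raga_specific = {}
    for raga, songs in corpus.items():
        # Raga weights start from the generic weight of each seen pattern.
        weights = {}
        for song in songs:
            for pattern in four_grams(song):
                weights[pattern] = weights.get(pattern, generic[pattern]) + step
        ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
        raga_specific[raga] = dict(ranked[:top_k])
        # Feed the top patterns of this Raga back into the generic weights.
        for pattern in raga_specific[raga]:
            generic[pattern] += step
    return generic, raga_specific
```

Keeping only the top patterns per Raga mirrors the description of storing the top 20 patterns; whether the weights should be renormalised after each update is left open here.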


In the testing phase, the input song is given, and the θ of the song is
determined using the α vector. This is compared with the θs of all the Ragas
to find the Raga whose θ is closest to the computed θ. The closeness is
determined by the relative positions of the top 10 patterns in the input θ and
the available θ values.
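The comparison by relative positions of the top patterns could be realised in several ways; the sketch below uses a simple rank-displacement measure as one assumption. The text does not specify the exact measure, so the functions `top_patterns`, `rank_distance` and `closest_raga`, and the fixed penalty for a missing pattern, are all hypothetical.

```python
def top_patterns(weights, k=10):
    """Return the k highest-weighted patterns, best first."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pattern for pattern, _ in ranked[:k]]


def rank_distance(song_top, raga_top, k=10):
    """Sum of rank displacements; a pattern absent from the Raga costs k."""
    position = {p: i for i, p in enumerate(raga_top)}
    return sum(abs(i - position[p]) if p in position else k
               for i, p in enumerate(song_top))


def closest_raga(song_weights, raga_weights, k=10):
    """Return the Raga whose top-k pattern ranking is nearest to the song's."""
    song_top = top_patterns(song_weights, k)
    return min(
        raga_weights,
        key=lambda name: rank_distance(
            song_top, top_patterns(raga_weights[name], k), k),
    )
```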

5.4.3 Analysis of the LDA Approach

In our approach, we used characteristic swara phrases of length four to
determine the initial value of α. The challenge was in determining the unique
phrase of notes for each Raga. In general, a Raga is characterized by more
than one frequently occurring characteristic note phrase, and this is
represented using the Dirichlet parameters α and θ. A Raga can have multiple
characteristic phrases of length 3, 4, or 5. We started with characteristic
phrases of length 3, but found that this did not yield correct Raga
identification. Hence, we increased the length to 4 to compute the Dirichlet
parameters α and θ. We then increased the length to 5, but found that the LDA
process did not yield better Raga identification accuracy. For instance, if a
swara in the characteristic phrase is held for more than one duration, like
the P,DP in Sankarabharanam, the phrase was identified incorrectly. We have
not been able to handle such phrases, where a note is sustained for a longer
duration, because our segmentation algorithm merges the sustained note and
treats it as one swara. Therefore, the length of the characteristic phrase was
restricted to four. In this work, we identified the characteristic phrases
from the literature on Carnatic music (Sambamurthy 1983), and also manually,
by observing the swara representation of Carnatic songs for each Raga. The
results of the algorithm are given in Figure 5.3.


Figure 5.3 Raga identification using LDA

From the figure it is observed that the performance for Melakarta Ragas like
Sankarabharanam, Thodi and Kalyani is higher than that for the Child Ragas
Madhyamavathi, Mohanam, Sindhubairavi and Bilahari. In addition, Ragas whose
characteristic swara phrase is of length four are identified better than Ragas
whose characteristic phrase is of a different length. Since we have considered
the swara representation without the microtones, the computation of the
Dirichlet parameters is also affected, reducing accuracy. This is because the
swaras R3 and G1, and similarly D3 and N1, share the same frequency values,
leading to misinterpretation if they are not considered separately. In
addition, wrong swaras, arising from an error in the tonic or from wrong
frequency identification due to the presence of Gamakas, also lead to
incorrect identification of the characteristic phrase and hence to an error in
Raga identification. Therefore, to tackle cases where a characteristic phrase
length of 4 for computing the Dirichlet parameters did not yield correct
results, we designed the supervised Raga model based approach to Raga
identification.


5.5 RAGA MODEL APPROACH

Existing algorithms for Raga identification in Hindustani or Carnatic music
extract raw signal level features (temporal, spectral and Cepstral), construct
a Classifier using these features, and determine the Raga either with or
without the help of the swara components. Chordia and Rae (2007) used the
Pitch Class Distribution (PCD) and Pitch Class Dyad Distribution (PCDD) for
Hindustani Raga identification, an approach that was later tried for Carnatic
music. The drawback of this system is the need to convert the input to a MIDI
representation (Chordia et al 2009), which loses the Gamakas present in the
signal.

Among the approaches that we have proposed, the Arohana Avarohana approach has
difficulty in determining the swara from the frequency components, and is not
successful for Child Raga determination. The LDA based approach, which was
adapted from text classification, has difficulty in computing the
probabilistic Dirichlet parameters. In addition, since the characteristic
swara phrase is predefined, the use of the LDA is not fully justified.

Therefore, we combine our approaches to address the drawbacks of the
individual approaches, and conclude that Raga identification as performed by
these algorithms cannot be a one-step process but should rather be a
multi-step process. Motivated by the idea of constructing a swara model
(Chordia and Rae 2008) and by our Arohana Avarohana approach, and taking the
need for a multi-step algorithm as the basis, we have constructed a Raga
model. This model for determining the Raga is based on three major aspects:
raw signal level features, the Arohana-Avarohana pattern, and other Raga
lakshana characteristics.


5.5.1 Components of the Raga Model

The Raga model comprises musical features and signal level

features, as shown in Figure 5.4. The musical features are represented in

terms of the Arohana, Avarohana, and the Raga lakshana characteristics.

Name: Raga

Arohana: S R1 R2 R3 … N1 N2 N3

Avarohana: N3 N2 N1 D3 … R2 R1 S

Musical Parameters - Raga Lakshana: Raga Phrase, Start Swara, End Swara, Swara Frequently Used, Swaras that can take Gamaka

Signal Level Parameters: CICC, MFCC, Flux, Centroid

Figure 5.4 Raga Model

The signal level features are represented, using the spectral and

Cepstral features consisting of the Spectral Centroid, Spectral flux, MFCC,

and CICC. The construction of the Raga model is described below.

5.5.2 Construction of the Raga Model

For implementing the Raga model, we need to specify how this

model is to be represented, so as to help in the later stages of Raga

identification.

5.5.2.1 Arohana, Avarohana and Raga Lakshana

The Arohana and Avarohana are the primary components of the Raga; they are the
first constituents of the Raga model and are important in distinguishing
Ragas. This information is available in the literature, and in our work we
indicate each swara of the Arohana and Avarohana by a Boolean value, 1 or 0,
denoting the presence or absence of that swara. For this purpose, the Arohana
and Avarohana are represented at the microtone level, with three variations
each of R, G, D and N and two variations of M. The Raga model is organized so
that the first 72 rows correspond to the Parent Ragas, and the rows from 73
onwards correspond to the Child Ragas. In some Child Ragas the Arohana and
Avarohana are different, and hence the Raga model represents them as two
separate components. For the Vakra Ragas, the presence or absence of swaras in
the Arohana and Avarohana is indicated, but this cannot convey the sequence of
swaras that make up the Arohana and Avarohana.
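A minimal sketch of this Boolean representation follows. The column order, the inclusion of S and P as single columns, and the names `SWARA_COLUMNS` and `presence_vector` are assumptions for illustration; the swara sequences in the usage note are hypothetical and are not taken from the Raga model.

```python
# Assumed column layout: S, three variants each of R, G, D, N, two of M, and P.
SWARA_COLUMNS = [
    "S", "R1", "R2", "R3", "G1", "G2", "G3", "M1", "M2",
    "P", "D1", "D2", "D3", "N1", "N2", "N3",
]


def presence_vector(swara_sequence):
    """Return 1 for every column whose swara occurs in the sequence, else 0."""
    present = set(swara_sequence)
    return [1 if swara in present else 0 for swara in SWARA_COLUMNS]
```

For example, the zig-zag sequence S G2 R2 G2 M1 P and the straight sequence S R2 G2 M1 P produce the same vector, which shows why this representation cannot convey the order of swaras in a Vakra Raga.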

The next component of the Raga model is the set of musical parameters
conveying other Raga lakshana characteristics. In this work, we have
considered features such as the Graha, Amsa, Nyasa, and Bahutva, since these
characteristics can be determined directly from the raw signal level features
and are more pertinent to the identification of the Raga. Based on the Raga
lakshana, the musical parameters that we have considered are the starting
swara, the ending swara, the sequence of swaras forming the characteristic
phrase of a Raga, the most frequently used swara, and the swaras that can take
the Gamaka. The starting and ending swaras are represented as strings such as
“S”, “R1” and “R2”. The Raga phrase, a string of three to four swaras, is
represented as a sequence; a Raga may have more than one such characteristic
pattern, and these patterns are unique to that Raga. The pattern as a string,
along with its representation as a prefix function suitable for use by
string-matching algorithms, is part of the Raga model. As explained earlier,
since the frequency span of a swara is very narrow, one swara can be mistaken
for an adjacent swara. Hence, this characteristic phrase component helps to
identify the Raga correctly even if there is an error in identifying a swara.


In addition to the above, another important component of the Raga, the Gamaka,
is also used. Gamakas can be defined as pitch fluctuations. In Carnatic music,
not all swaras take discrete frequencies; a swara can take a range of
frequencies. For example, any frequency between 240 Hz and 256.4 Hz is
regarded as R1, rather than only the exact value of 256.4 Hz. As discussed in
Chapter 1, this small range of frequencies can be covered in multiple ways: as
a valley, as a continuously changing value, as a mountain, or as a simple
jump. Which swaras take which type of Gamaka varies from Raga to Raga, and
hence this component is also represented in the Raga model as a Raga lakshana.

The last component of the musical parameters list is again represented as a
string, in which each character is a probable candidate for taking the Gamaka
in a particular Raga. This component is very useful for identifying pitch
fluctuations, and is used to disambiguate a swara. If a particular swara can
take the Gamaka in a Raga, and it has been identified as an adjacent swara, it
is corrected to the actual swara, because the fluctuation in frequency caused
by the Gamaka resulted in the wrong swara. The musical parameters are
populated in the Raga model by referring to the music literature (Sambamurthy
1983) and by interviewing musicologists. In addition, patterns are extracted
from the music literature to determine the Raga lakshana characteristics, and
are validated against those obtained from musicologists before populating the
Raga model.

5.5.2.2 Signal level parameters

The last components of the Raga model are the raw signal level features. As
discussed earlier, in cases where the swara identification is not accurate,
the signal level features can help in Raga identification. The construction of
a Gaussian mixture model based on signal level features, and its use for Raga
determination, has been discussed (Sudha 2009). We have considered both
spectral and Cepstral features: the Spectral flux, Spectral centroid, MFCC,
and CICC. These are extracted from songs belonging to the same Raga, and the
range of values that each Raga can take is represented in the Raga model.

The spectral flux and Spectral centroid give information about the

dominant frequency in a typical segment, which is again mandatory for

conveying the swara. This is represented as a vector consisting of the spectral

centroid values for the Arohana or Avarohana pattern. This value is

determined by the method of training and extracting values for each Raga

using different Singers. The coefficients of the MFCC and the CICC are

determined for the Raga, and a feature vector consisting of the first 9

coefficients is determined by means of training and stored in the model. After

creating the Raga model with all these parameters, it is available for use in

identifying the Raga.

5.5.3 Raga Identification Algorithm

Most of the algorithms available for Hindustani and Carnatic Raga
identification, including the ones that we have proposed, use a one-step
procedure for Raga identification (Chordia et al 2009, Pandey et al 2003). In
our work, Raga identification is a three-pronged process based on a
multi-faceted model, as shown in Figure 5.5.


Figure 5.5 Raga identification using Raga model

During the Raga identification phase, the frequency components are

extracted from the input music signal, followed by extracting features and the

tonic, which refers to the frequency of ‘S’ as explained in Chapter 4. Using

the extracted frequency components and the tonic, the ratio between the

frequency components and this frequency is determined to identify the swaras

constituting the input. From these swara components the Raga lakshana is

identified by observing the swaras available in the consecutive segments.

These swara sequences along with the signal level features are used for the

process of Raga identification.

In the first step of the three-pronged approach the swara pattern is

determined from the input musical piece to identify the Arohana and

Avarohana, which is compared with the Raga model to determine the Raga.

This is the first level of identification. Due to the design of the Raga
Model, in cases where all the seven swaras are present only the first 72 rows
need to be compared.


The second step in the three-pronged model of Raga identification

is the use of other Raga lakshana components of the Raga model. From the

swara pattern that has already been identified, the following components are

determined by considering the swaras available in the consecutive segments:

Starting swara: corresponding to the swara of the first segment, which is the swara at the onset

Ending swara: corresponding to the swara of the last segment, which is the swara at the offset

Most commonly occurring swara: the swara that is used the maximum number of times

Most commonly occurring phrase of swaras: the sequential occurrence of a phrase of swaras that has a maximum count

From the input song, the most frequently repeated phrase of swaras is
determined using a string-matching algorithm. The string-matching algorithm
used here is the reverse of KMP string matching: it determines the most
frequently occurring pattern, rather than detecting the presence of a given
pattern. This pattern is the ‘Amsa’ characteristic of the Raga lakshana. The
characteristic phrase consists of swaras, and is typically of length six.
Therefore, we observe the swara sequence using a window of length seven (a
maximum of seven swaras) to determine the pattern that occurs the maximum
number of times. Where more than one pattern occurs the maximum number of
times, all such patterns are identified.
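The search for the most frequent phrase can be illustrated with a simple counting sketch. This is not the reverse-KMP procedure itself; it swaps in plain n-gram counting, and the minimum phrase length of three and the name `most_frequent_phrases` are assumptions.

```python
from collections import Counter


def most_frequent_phrases(swaras, min_len=3, max_len=7):
    """Return every contiguous phrase that attains the highest count."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(swaras) - n + 1):
            counts[tuple(swaras[i:i + n])] += 1
    if not counts:
        return []
    best = max(counts.values())
    # All phrases tied for the maximum count are returned, in sorted order.
    return sorted(p for p, c in counts.items() if c == best)
```

A shorter phrase occurs at least as often as any longer phrase containing it, so ties between a phrase and its sub-phrases are expected; a practical system would need a rule for preferring one length over another.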

After determining this pattern, all the Raga lakshana characteristics

that have been identified are compared with the Raga model. The first

comparison is performed using the characteristic phrase, to identify the Raga

following which, other components of Raga lakshana, like the starting swara,

ending swara, and frequently occurring swaras are compared.


During the second step, the Raga lakshana component also considers notes that
can take the Gamaka. In this case, a swara could correspond to the frequency
of the next swara. Hence, during Raga identification, the Raga model is
checked to identify swaras that could take the Gamaka. From the identified set
of swaras the commonly occurring phrase is determined; in the case of a
mismatch in one swara, this swara is identified, replaced with the adjacent
swara, and validated again. The error that this Raga model is not able to
correct is the one caused by wrong computation of the tonic, which leads to
incorrect swara mapping.
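The single-swara correction can be sketched as below, assuming that "adjacent" means neighbouring positions in an ordered list of swaras. The ordering in `SCALE` and the function name `correct_single_mismatch` are hypothetical, and the example phrases in the tests are made up for illustration.

```python
# Assumed ordering of swara positions from low to high within one octave.
SCALE = ["S", "R1", "R2", "G2", "G3", "M1", "M2", "P", "D1", "D2", "N2", "N3"]


def correct_single_mismatch(observed, expected, gamaka_swaras, scale=SCALE):
    """Return the corrected phrase, or None if no correction is justified."""
    if len(observed) != len(expected):
        return None
    diffs = [i for i, (o, e) in enumerate(zip(observed, expected)) if o != e]
    if not diffs:
        return list(observed)  # already matches the model phrase
    if len(diffs) != 1:
        return None  # only a mismatch in exactly one swara is handled
    i = diffs[0]
    adjacent = abs(scale.index(observed[i]) - scale.index(expected[i])) == 1
    # Correct only when the expected swara can take the Gamaka in this Raga.
    if adjacent and expected[i] in gamaka_swaras:
        return list(expected)
    return None
```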

A score is assigned to determine the number of Raga lakshana

components that match. If a majority of these Raga lakshana components

match with the pre-determined Raga lakshana components of a particular

Raga, this is the identified Raga, according to the second step of the three-

pronged process of Raga identification.
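A minimal sketch of this scoring step is given below. The component names (`phrase`, `start`, `end`, `frequent`), the Raga labels `RagaA` and `RagaB`, and all the values are placeholders chosen for illustration; they are assumptions and do not describe the contents of the actual Raga model.

```python
def lakshana_score(observed, model):
    """Count the lakshana components of the model that the observation matches."""
    return sum(1 for key, value in model.items() if observed.get(key) == value)


def identify_by_lakshana(observed, raga_models):
    """Return the best-scoring Raga if a strict majority of its lakshana
    components match; otherwise return None."""
    best_raga, best_score = None, 0
    for raga, model in raga_models.items():
        score = lakshana_score(observed, model)
        if score > best_score:
            best_raga, best_score = raga, score
    if best_raga is not None and 2 * best_score > len(raga_models[best_raga]):
        return best_raga
    return None


# Placeholder model entries, for illustration only.
raga_models = {
    "RagaA": {"phrase": "SGMP", "start": "S", "end": "S", "frequent": "G"},
    "RagaB": {"phrase": "DPMG", "start": "G", "end": "S", "frequent": "D"},
}
observed = {"phrase": "DPMG", "start": "G", "end": "P", "frequent": "D"}
result = identify_by_lakshana(observed, raga_models)
```

In this example three of the four components of `RagaB` match, so it is returned as the Raga identified by the second step.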

After identifying the Raga in the first and second steps, signal level

parameters like the MFCC, CICC, Spectral flux and Spectral centroid, which

were extracted from the input song are compared with the features in the Raga

model, by computing the Euclidean distance between the pre-defined set and

the newly computed set to find the similarity. This step is performed to isolate

errors that are possible due to the presence of the Gamakas, while identifying

the tonic and mapping it to the swaras. This step also isolates errors that may

occur while identifying the Arohana and/or Avarohana either due to incorrect

swaras or due to incorrect extraction of the Arohana Avarohana string from

the correct swaras.
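The distance-based comparison of the third step can be sketched as follows, assuming that each Raga in the model is summarised by one fixed-length feature vector. The function names and the numeric values are hypothetical; in practice the vectors would hold the MFCC, CICC, Spectral flux and Spectral centroid values.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_raga(features, model_features):
    """Return the Raga whose stored feature vector is closest to `features`."""
    return min(
        model_features,
        key=lambda raga: euclidean_distance(features, model_features[raga]),
    )


# Placeholder feature vectors, for illustration only.
model_features = {"RagaA": [0.0, 0.0], "RagaB": [3.0, 4.0]}
closest = nearest_raga([2.5, 3.5], model_features)
```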

As discussed earlier, Ragas can be classified into Parent, Child and

Vakra. An ambiguous Arohana Avarohana is typical of Vakra Ragas. In the

Vakra Raga, the Arohana or Avarohana can contain more than 7 swaras or


can contain swaras in a non-sequential fashion in either the Arohana,

Avarohana or both. Hence there is a repetition of some swaras in terms of its

variation, which is present in the Arohana or Avarohana, or both. We have

represented the Arohana and Avarohana using binary values. From the

extracted swaras, determining a unique Arohana/Avarohana is simple for

Parent and Child Ragas, while for Vakra Ragas, the jumbled pattern

constituting the Arohana and Avarohana is difficult to identify. For example, the Raga Anandha Bairavi has its Arohana as SGRGMPDPS and its Avarohana as SNDPMGRS. In the Raga model, only the presence of the swaras S, R, G, M, P and D is checked for the Arohana of this Raga; the order in which they occur is not captured, so extracting the Arohana pattern from the swaras is difficult, leading to ambiguity in the Raga. Hence, we use the characteristic phrases of this Raga, namely SGGM, SP and SGMP, together with its signal level features, for Raga identification.
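The loss of ordering information can be illustrated with a small sketch. It assumes that the binary representation stores one presence bit for each of the seven swaras; the exact encoding used in the Raga model is not specified here, so the function `presence_bits` is hypothetical.

```python
SWARAS = "SRGMPDN"


def presence_bits(sequence):
    """Return one bit per swara (S, R, G, M, P, D, N): 1 if it occurs."""
    return [1 if swara in sequence else 0 for swara in SWARAS]


# Vakra Arohana of Anandha Bairavi, as given in the text.
vakra_bits = presence_bits("SGRGMPDPS")
# A straight ascent over the same six swaras yields the same bits.
straight_bits = presence_bits("SRGMPDS")
```

Both sequences produce identical bits under this assumed encoding, which is why the characteristic phrases and the signal level features are needed to resolve a Vakra Raga.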

After the third step of the identification process, the final Raga is chosen as the one that is determined by more than one step of the three-pronged process of Raga identification. In the event that all three steps give different Ragas, the one given by the Signal Parameters is chosen, since this value is the result of averaging over all songs belonging to one Raga, on the assumption that the error lies in the swara determination.
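The final decision rule can be sketched as a simple vote with a fallback. The function name `final_raga` and its argument names are hypothetical; a step that fails to identify any Raga is represented here by `None`, which is an assumption of this sketch.

```python
from collections import Counter


def final_raga(step1_result, step2_result, signal_result):
    """Choose the Raga returned by at least two of the three steps;
    otherwise fall back to the result of the signal parameter step."""
    votes = Counter(
        r for r in (step1_result, step2_result, signal_result) if r is not None
    )
    if votes:
        raga, count = votes.most_common(1)[0]
        if count >= 2:
            return raga
    return signal_result


agreed = final_raga("RagaA", "RagaA", "RagaB")    # two steps agree
fallback = final_raga("RagaA", "RagaB", "RagaC")  # all three disagree
```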

5.5.4 Results and Analysis

This section describes the data used, analyses the Raga model’s approach to Raga identification, and compares it with the other algorithms.

5.5.4.1 Data Used

In this work, Carnatic music songs sung by different Singers were sampled at the rate of 44.1 kHz, and used for processing. For training and testing, we have used songs sung by Dr. M.S. Subbulakshmi, Dr. M. Balamuralikrishna, Ms. Sowmya, Mr. Sikkil Gurucharan, Ms. Sudha Raghunathan, Ms. Nithyasree Mahadevan, and Mr. Ilayaraja. In addition, for the purpose of training we have also used the data set called ‘AAlapana’, which is available from “SaReGaMa” and consists of songs sung by different Singers pertaining to a single Raga. In both the training and testing phases, the input song is polyphonic music consisting of the Instrument and the voice signal.

During the model construction phase nearly 20 songs for each Raga

are used, and the signal features are determined to be used as a parameter in

the Raga model. In every song, five arbitrary segments are chosen to determine the features already indicated, and the Raga model is populated with these feature values, which are used for comparison during the Raga identification phase.

In all, 27 Ragas, viz., 14 Parent Ragas and 13 Child or Vakra Ragas, were considered, and a total of 1200 songs belonging to all the Ragas, sung by both male and female Singers, were used for testing.

5.5.4.2 Analysis of the Raga Model for all types of Ragas

Figure 5.6 shows the performance of the three-pronged Raga model

identified for all three types of Ragas, namely, Parent, Child, and Vakra

Ragas.


Figure 5.6 Performance of Raga model for Parent, Child and Vakra Ragas

In general, the identification of Parent Ragas is simpler. However, the use of a three-pronged approach shows (Figure 5.6) that the accuracies of identification for Parent (81.5%), Child (77.7%) and Vakra (75.7%) Ragas are comparable. This comparable performance was possible due to the use of three types of evidence in determining the Raga.

5.5.4.3 Comparison and Analysis of Raga Identification Algorithms

We performed a comparative analysis of the Raga model based identification with three of our other algorithms: the Arohana Avarohana algorithm, the LDA model based algorithm, and the signal level based comparison algorithm, in which our specially designed CICC coefficient was an important parameter. In addition, the PCD algorithm suggested for Carnatic music (Chordia et al 2009) was also used for comparison. The average performance of the five algorithms for each of the three types of Ragas is given in Table 5.1.


Table 5.1 Comparison of average performance of Raga identification algorithms

Raga type (number of Ragas; average songs per Raga)   Signal Parameters   Arohana Avarohana   PCD     LDA     Raga Model
Parent Ragas (14 Ragas; 45 songs per Raga)            64.2%               76.5%               74.7%   72%     81.5%
Child Ragas (9 Ragas; 42 songs per Raga)              50.4%               45.9%               53.5%   72%     77.7%
Vakra Ragas (4 Ragas; 38 songs per Raga)              39.7%               20%                 28.2%   54%     75.7%
Average for all Ragas                                 56%                 57.9%               60.7%   69.3%   79.4%

Signal level parameters conveyed mostly the timbral features, and

hence, were not able to perform well independently. The Arohana Avarohana

algorithm was not able to determine the exact swara pattern, and the

determination of the swaras was especially difficult for Child Ragas. The

PCD algorithm had the problem of converting the input to the MIDI

representation, and hence, lost information, and therefore, had an average

identification rate of 60.7%. The Raga model based three-pronged process

had an average identification of 79.4%. This was due to the multiple levels of

check in the algorithm, which uses the Raga model.
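As a check on the arithmetic, the overall averages in Table 5.1 are consistent with a mean of the per-type averages weighted by the number of Ragas of each type (14, 9 and 4). This weighting is an observation made from the reported figures rather than a procedure stated in the text; the names below are hypothetical.

```python
RAGA_COUNTS = {"Parent": 14, "Child": 9, "Vakra": 4}

# Per-type average accuracies (%) copied from Table 5.1.
ACCURACY = {
    "Signal Parameters": {"Parent": 64.2, "Child": 50.4, "Vakra": 39.7},
    "Arohana Avarohana": {"Parent": 76.5, "Child": 45.9, "Vakra": 20.0},
    "PCD": {"Parent": 74.7, "Child": 53.5, "Vakra": 28.2},
    "LDA": {"Parent": 72.0, "Child": 72.0, "Vakra": 54.0},
    "Raga Model": {"Parent": 81.5, "Child": 77.7, "Vakra": 75.7},
}


def weighted_average(per_type):
    """Mean of the per-type accuracies, weighted by the number of Ragas."""
    total = sum(RAGA_COUNTS.values())
    return sum(per_type[t] * RAGA_COUNTS[t] for t in RAGA_COUNTS) / total


overall = {name: round(weighted_average(acc), 1) for name, acc in ACCURACY.items()}
```

Rounded to one decimal place, the values in `overall` agree with the last row of Table 5.1 for all five algorithms.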


The table shows that the performance of all the algorithms is

comparable, and relatively good for the well structured Parent Ragas. The

Arohana-Avarohana algorithm gives a good performance (76.5%), because

there is regularity in the swara pattern. However, this algorithm is based on

the correct identification of swaras. The table shows that the three-pronged

Raga model gives an even better performance (81.5%) for Parent Ragas,

because three types of characteristics – the signal level, swara level and Raga

lakshana level are used.

For the Child Ragas, the regularity lies in the characteristic phrases, which uniquely determine the Ragas, rather than in the complete Arohana-Avarohana pattern. The LDA algorithm, which classifies the Ragas of songs based on swara patterns of length four, gave good results (72%). Again, the multi-pronged approach of the Raga model gave even better results (77.7%).

For the Vakra Ragas, the Arohana Avarohana is jumbled, and hence the algorithm based on the Arohana Avarohana pattern performed poorly (20%). Moreover, although the LDA performed moderately (54%), the jumbled nature of these swaras meant that the pattern of length four used by the LDA was not very effective. Again, the multi-pronged approach, especially the use of the Raga lakshanas, improved the performance (75.7%).

The Raga identification error rates of the various algorithms are given in Table 5.2, where the error rate is defined as the ratio of the number of songs whose Raga was not identified to the number of songs tested for identification. Here, we only consider songs in the specified 27 Ragas. The table shows that the Raga model based algorithm has the lowest error rate of the five algorithms.


Table 5.2 Raga Identification Error Rate

Algorithm                              Error Rate (%)
Using only Arohana and Avarohana       42.1
Using only Signal level parameters     44
Using PCD                              39.2
Using LDA                              30.6
Using Raga model (three-pronged)       20.5

5.5.4.4 Analysis of Parent Raga Identification

In this section we discuss in detail, the performance of the

algorithms for different Parent Ragas, which have been identified. The results

are shown in Figure 5.7, which shows that the performance of both the

Arohana Avarohana algorithm and the Raga Model algorithm was much

better than that of the other algorithms, including the PCD, for Karaharapriya

and Sarasangi, because the swaras are comparatively distinct.

Figure 5.7 also shows that the performance for the Ragas Mecha Kalyani (65.8%) and Dheera Sankarabharanam (69.6%), which differ only in the swara ‘M’ (Mecha Kalyani uses M2 and Sankarabharanam uses M1), was lower than the average of all the Parent Ragas for all the algorithms (73.8%). This is because, if even one swara was identified incorrectly, an input song in Mecha Kalyani was identified as Sankarabharanam. The Arohana Avarohana algorithm identified the swaras correctly when there was a clear separation between the adjacent swaras and little influence of the Gamakas. The Raga model algorithm tackled this issue to a limited extent by utilizing a three-pronged identification process; its performance for Mecha Kalyani (77%) and for Sankarabharanam (73%) was better, but still lower than its average for Parent Ragas (81.5%). The performance for Thodi was better than the average for all the algorithms, and in fact PCD performed the best, because the distinctness between the adjacent swaras is even more marked and is amenable to the histogram tracking used by PCD.

Figure 5.7 Comparison of performance for Parent Raga identification

5.5.4.5 Analysis of Child and Vakra Raga Identification

From Figure 5.8 it can be seen that the performance of the Raga model is good (87%) even for a Child Raga like Malahari, since this Raga is uniquely specified by the phrase “DPMGRS”, which is recorded in the Raga model as one of its lakshanas. The Raga model outperforms the LDA (77%), which uses only a phrase of length four for identification. In addition to the common phrase, the starting and ending swaras are the other components used by the Raga Model. The starting swara of many Ragas is ‘S’, but there are Ragas whose starting swara is different. The performance for the Ragas Mohanam (85% for the Raga Model) and Vasantha (76% for the Raga Model) has improved because of the inclusion of their starting swara ‘G’ in the Raga Model (Figure 5.8). The performance of the PCD algorithm was good for Hamsadhwani, a Child Raga which has distinct swara components. This is because this Raga has very little or no Gamaka for any of its swaras in the sample data considered.

As already discussed, there is ambiguity between the Ragas Anandha Bairavi and Reetigowlai, which is resolved using the frequency of usage of the swara ‘N’. The Raga Anandha Bairavi has a limited usage of ‘N’ when compared to Reetigowlai; hence, distinguishing the two Ragas requires the frequently used swara specified by the Raga Model. This is evident from the fact that the performance of the Raga Model (Anandha Bairavi 78%, Reetigowlai 78%) is better than that of all the other algorithms (Figure 5.8).

Figure 5.8 Comparison of performance for Child and Vakra Raga

identification


5.5.4.6 Statistical analysis of results

The analysis carried out so far, and shown in Figures 5.7 and 5.8, reports the True Positive (TP) rate, which is defined as the ratio of the number of songs for which a Raga is correctly identified to the total number of songs tested for that Raga.

In addition, we have carried out a statistical analysis of the results of the Raga identification procedure. This is performed by determining the False Positive and the True Negative values for all the identified Ragas, and by plotting the results for the various algorithms. The True Negative (TN) rate is defined as the ratio of the number of songs of a Raga that are left unidentified to the total number of songs tested for that Raga. A False Positive (FP) occurs when a Raga is identified as another Raga. The FP rate is computed as the ratio of the number of songs of a Raga (X) that are identified as another Raga (Y) to the number of songs tested for the Raga (X). The False Positive and True Negative values are given in Table 5.3.
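The three rates, as defined in this section, can be computed from the per-song outcomes for one Raga as in the sketch below. The function name `rates` and the use of `None` for an unidentified song are assumptions made for illustration. Note that TN is used here in the sense defined above (songs left unidentified), which differs from the usual confusion-matrix meaning of the term.

```python
def rates(true_raga, predictions):
    """Return the TP, TN and FP rates (%) for the songs of one Raga.

    `predictions` holds one entry per tested song: the identified Raga
    name, or None when no Raga was identified.
    """
    tested = len(predictions)
    if tested == 0:
        raise ValueError("at least one tested song is required")
    tp = sum(1 for p in predictions if p == true_raga)
    tn = sum(1 for p in predictions if p is None)
    fp = tested - tp - tn  # identified, but as some other Raga
    return {
        "TP": 100 * tp / tested,
        "TN": 100 * tn / tested,
        "FP": 100 * fp / tested,
    }


# Ten hypothetical songs of Raga X: eight correct, one unidentified,
# one identified as Raga Y.
example = rates("X", ["X"] * 8 + [None] + ["Y"])
```

With these definitions the three rates for a Raga always sum to 100%.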

Table 5.3 True Negative (TN) rate and False Positive (FP) rate comparison of the Algorithms for Raga identification

(SP = Signal Parameters, ARAV = Arohana Avarohana, RM = Raga Model)

Raga type                   SP TN%   SP FP%   ARAV TN%   ARAV FP%   PCD TN%   PCD FP%   LDA TN%   LDA FP%   RM TN%   RM FP%
For Parent Ragas            18.6     17.3     12.6       11.1       16        9.35      18.14     10.1      13.5     4.85
For Child and Vakra Ragas   21       32       18.7       43.3       30.5      23.8      21.8      11.8      18       5.1
Overall Ragas               19.8     24.6     15.7       27.2       23.2      16.6      19.9      10.9      15.8     4.9

When the signal level parameters alone are extracted for testing, the signal values could correspond to the Singer, song, Instrument, Genre or any other musical component. As a result, the TN for the Signal Parameters algorithm (TN-SP) is high for the Parent Ragas and higher still for the Child and Vakra Ragas, where it is above the average, as shown in Table 5.3.

A high value of the True Negative (TN) is better than a high value of the False Positive (FP). This is the situation for all the algorithms, as shown in Table 5.3. However, the FP and TN of the Raga Model are much lower than those of the other two algorithms. For the Parent Ragas, the TN rate of the Arohana Avarohana algorithm (12.6%) is comparable with the TN rate of the Raga model (13.5%). On the other hand, for the other algorithms, the True Negative is very high and much above the average. Thus, these two algorithms had a good precision compared to the other algorithms for the Parent Ragas.

On the other hand, for the Child and Vakra Ragas, the TN rate of the Arohana Avarohana algorithm is comparable to that of the Raga Model (18.7% and 18%). However, the False Positive rate of the Arohana Avarohana algorithm is very high (43.3%), because Child Ragas were identified as their Parent Ragas owing to an incorrect Arohana Avarohana. For the Parent Ragas, the TN of the PCD algorithm was comparable with that of the LDA based Raga identification algorithm (16% and 18.14%); however, the incorporation of the characteristic phrase in the LDA algorithm improved its TN rate for the Child Ragas (21.8%), which is better than that of the PCD algorithm (30.5%).

The reason for the high value of the TN and FP for the signal

parameters based algorithm, is that the signal values could be used for

multiple analyses, in terms of the Genre, Emotion, Singer, etc. by proper

training, since they conveyed the timbral feature rather than the melodic

feature. Hence, the determination of the Raga, using the signal parameter

tends to yield very poor results. For example, the Ragas Mechakalyani,


Karaharapriya and Keeravani had the same TN rate (17%) but showed a

difference in their FP rates (7%, 4% and 3% respectively). This difference in

the FP is due to the presence of the Gamakas in many swaras, in the input

song being considered.

The algorithms are also analyzed for their statistical significance by performing a linear analytical comparison between the TP and FP values of each algorithm. This procedure uses ordinary least squares linear regression, and so does not account for the error in the reference/comparative method. For each algorithm, a scatter plot between the True Positive and False Positive rates is generated using a standard tool available in a spreadsheet package, and the Standard Error and the regression value R² are estimated from it. The values of these components are given in Table 5.4.

Table 5.4 Estimate of Regression coefficient for Raga identification

Type of Algorithm              R²
Arohana Avarohana              0.01
Signal Parameters              0.02
PCD                            0.12
LDA                            0.16
Raga Model (three-pronged)     0.66

The goodness of the linear fit is estimated using the R² value, which is the coefficient of determination of the regression. The value of R² varies from 0 to 1, with a higher value indicating a closer fit, and a value of 1 indicating the best fit. As shown in Table 5.4, the Raga model (three-pronged) algorithm had a much higher value than the Arohana Avarohana algorithm, the Pitch class distribution (PCD), the LDA or the signal parameters based algorithm.
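For reference, the R² of an ordinary least squares line can be computed as in the following sketch. It is a generic implementation, not the spreadsheet tool used in this work, and the sample points are invented for illustration; they are not the TP and FP rates of any algorithm.

```python
def r_squared(x, y):
    """R^2 of the ordinary least squares line fitted to the points (x, y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares about the fitted line and the mean.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


perfect = r_squared([1, 2, 3, 4], [3, 5, 7, 9])   # points on a straight line
scattered = r_squared([1, 2, 3, 4], [1, 3, 2, 4])  # points off the line
```

Points lying exactly on a line give a value of 1, while scatter about the line lowers the value towards 0.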


5.5.4.7 Complexity Analysis of Raga model algorithm

As far as the algorithm analysis is concerned, the comparison with respect to the Arohana and Avarohana takes O(n) time, where ‘n’ is the number of Ragas in the Raga model. The next level of comparison, which is based on the signal parameters, is Θ(n), since four signal parameters have to be compared for each Raga. The third level of comparison is Θ(n), where each of the ‘n’ Ragas takes a further Θ(k) character comparisons, thus accounting for Θ(nk) comparisons, where ‘k’ is a constant that refers to the number of character comparisons made during the comparison with the Raga lakshana. Thus, the overall running time for the comparison with the Raga model is O(n) + Θ(n) + Θ(nk), resulting in Θ(nk) time, which is higher than the O(n) of the Arohana and Avarohana algorithm and the Θ(n) of the algorithm that uses the signal parameters only. The LDA algorithm is time consuming in computing the probability values, and hence has a higher time complexity than the other algorithms for Raga identification.

After identifying the Raga from the input signal, the other

non-music components, namely the Singer, Instrument, Emotion and Genre

need to be identified. For this purpose, the features that were designed for

Carnatic music, namely, the tonic, CICC and other extracted Signal level

features are used. In addition to these features, the Raga that has been

identified is also used to construct an appropriate model, leading to non-music

component identification, which is discussed in the next chapter of this thesis.
