Computational Approaches for Melodic Description in Indian Art Music Corpora
Transcript of Computational Approaches for Melodic Description in Indian Art Music Corpora
Computational Approaches for Melodic Description in Indian Art Music Corpora
Doctoral Thesis Presentation
Music Technology Group, Department of Information and Communication Technologies
Universitat Pompeu Fabra, Barcelona, Spain
Thesis director: … Thesis committee members: …
17 November 2016
Outline
Introduction
Music Corpora and Datasets
Melody Descriptors and Representations
Melodic Pattern Processing
Rāga Recognition
Applications
Summary and Future Perspectives
Chapter 1: Introduction
Information technology (IT) is constantly shaping the world that we live in. Its advancements have changed human behavior and influenced the way we connect with our environment. Music, being a universal socio-cultural phenomenon, has been deeply influenced by these advancements. The way music is created, stored, disseminated, listened to, and even learned has changed drastically over the last few decades. A massive amount of audio music content is now available on demand. Thus, it becomes necessary to develop computational techniques that can process and automatically describe large volumes of digital music content to facilitate novel ways of interaction with it. There are different information sources, such as editorial metadata, social data, and audio recordings, that can be exploited to generate a description of music. Melody, along with harmony and rhythm, is a fundamental facet of music and, therefore, an essential component in its description. In this thesis, we focus on describing melodic aspects of music through an automated analysis of audio content.
1.1 Motivation
“It is melody that enables us to distinguish one work from another. It is melody that human beings are innately able to reproduce by singing, humming, and whistling. It is melody that makes music memorable: we are likely to recall a tune long after we have forgotten its text.”
(Selfridge-Field, 1998)
The importance of melody in our musical experiences makes its analysis and description a crucial component in music content processing. It becomes even more important for melody-dominant music traditions such as Indian art music (IAM), where the concept of harmony (functional harmony as understood in common practice) does not exist, and the complex melodic structure takes the central role in music aesthetics.
Chapter 3: Music Corpora and Datasets
3.1 Introduction
A research corpus is a collection of data compiled to study a research problem. A well designed research corpus is representative of the domain under study. It is practically infeasible to work with the entire universe of data. Therefore, to ensure scalability of information retrieval technologies to real-world scenarios, it is important to develop and test computational approaches using a representative data corpus. Moreover, an easily accessible data corpus provides a common ground for researchers to evaluate their methods, and thus accelerates knowledge sharing and advancement.
Not every computational task requires the entire research corpus for development and evaluation of approaches. Typically, a subset of the corpus is used in a specific research task. We call this subset a test corpus or test dataset. The models built over a test dataset can later be extended to the entire research corpus. A test dataset is a static collection of data specific to an experiment, as opposed to a research corpus, which can evolve over time. Therefore, different versions of the test dataset used in a specific experiment should be retained to ensure reproducibility of the research results. Note that a test dataset should not be confused with the training and testing splits of a dataset, which are terms used in the context of a cross-validation experimental setup.
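The distinction between a versioned test dataset and a cross-validation split can be made concrete with a small sketch. This is illustrative only; the identifier scheme and helper names are hypothetical, not part of the thesis infrastructure:

```python
import hashlib
import json
import random

def dataset_fingerprint(recording_ids):
    """Checksum of the sorted ID list: a compact version tag for a test dataset."""
    payload = json.dumps(sorted(recording_ids)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]

def cv_folds(recording_ids, k=5, seed=42):
    """k-fold train/test splits made *within* one fixed test dataset."""
    ids = sorted(recording_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]
    return [(sorted(set(ids) - set(fold)), sorted(fold)) for fold in folds]

recordings = [f"rec_{i:03d}" for i in range(20)]
version = dataset_fingerprint(recordings)  # retain this alongside published results
train_ids, test_ids = cv_folds(recordings)[0]
```

Retaining the fingerprint with each experiment pins the exact dataset version for reproducibility, while the folds remain a transient artifact of one cross-validation run.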
In MIR, a considerable number of computational approaches follow a data-driven methodology, and hence a well curated research corpus becomes a key factor in determining the success of these approaches. Due to the importance of a good data corpus in research, building a corpus is in itself a fundamental research task (MacMullen, 2003). MIR can be regarded as a relatively new research area within information retrieval, which has primarily gained popularity in the last two decades. Even today, a significant number of studies in MIR use ad-hoc procedures to build a collection of data to be used in experiments. Quite often the audio recordings used in the experiments are taken from a researcher's personal music collection. Availability of a good representative research corpus has been a challenge in MIR (Serra, 2014). This
Chapter 4: Melody Descriptors and Representations
4.1 Introduction
In this chapter, we describe methods for extracting relevant melodic descriptors and a low-level melody representation from raw audio signals. These melodic features are used as inputs to higher-level melodic analyses by the methods described in the subsequent chapters. Since these features are used in a number of computational tasks, we consolidate and present these common processing blocks in the current chapter. This chapter is largely based on our published work presented in Gulati et al. (2014a,b,c).
There are four sections in this chapter, and in each section, we describe the extraction of a different melodic descriptor or representation.
In Section 4.2, we focus on the task of automatically identifying the tonic pitch in an audio recording. Our main objective is to perform a comparative evaluation of different existing methods and select the best method to be used in our work.
In Section 4.3, we present the method used to extract the predominant pitch from audio signals, and describe the post-processing steps to reduce frequently occurring errors.
In Section 4.4, we describe the process of segmenting the solo percussion regions (Tani sections) in the audio recordings of Carnatic music.
In Section 4.5, we describe our nyas-based approach for segmenting melodies in Hindustani music.
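As an illustration of the kind of pitch-contour post-processing mentioned above, the sketch below removes octave errors by comparing each frame against a local median. This is a generic cleaning heuristic sketched under simple assumptions, not the exact procedure used in the thesis:

```python
from statistics import median

def correct_octave_errors(cents, window=5, threshold=600.0):
    """Shift frames that sit roughly an octave (1200 cents) away from the
    local median back into line; a crude octave-error correction."""
    out = list(cents)
    half = window // 2
    for i in range(len(out)):
        local = median(out[max(0, i - half): i + half + 1])
        while out[i] - local > threshold:   # frame jumped up an octave or more
            out[i] -= 1200.0
        while local - out[i] > threshold:   # frame dropped down an octave or more
            out[i] += 1200.0
    return out

contour = [100.0, 110.0, 1310.0, 105.0, 100.0]  # third frame is an octave error
cleaned = correct_octave_errors(contour)
```

A real predominant-pitch post-processor would also handle unvoiced frames and short spurious segments; the median-reference trick shown here is only the core idea.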
Chapter 5: Melodic Pattern Processing: Similarity, Discovery and Characterization
5.1 Introduction
In this chapter, we present our methodology for discovering musically relevant melodic patterns in sizable audio collections of IAM. We address three main computational tasks involved in this process: melodic similarity, pattern discovery, and characterization of the discovered melodic patterns. We refer to these different tasks jointly as melodic pattern processing.
“Only by repetition can a series of tones be characterized as something definite. Only repetition can demarcate a series of tones and its purpose. Repetition thus is the basis of music as an art.”
(Schenker et al., 1980)
Repeating patterns are at the core of music. Consequently, analysis of patterns is fundamental in music analysis. In IAM, recurring melodic patterns are the building blocks of melodic structures. They provide a base for improvisation and composition, and thus are crucial to the analysis and description of ragas, compositions, and artists in this music tradition. A detailed account of the importance of melodic patterns in IAM is provided in Section 2.3.2.
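As a minimal illustration of what a melodic similarity measure must handle (timing flexibility between two renditions of the same phrase), here is a textbook dynamic-time-warping distance over pitch sequences in cents. The thesis evaluates more refined similarity variants, so treat this as a baseline sketch only:

```python
def dtw_distance(a, b):
    """Dynamic-time-warping cost between two pitch sequences (values in cents)."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # step both
    return cost[n][m]

# A phrase and a time-stretched rendition of it align at zero cost.
phrase = [0.0, 200.0, 300.0, 200.0]
stretched = [0.0, 0.0, 200.0, 300.0, 300.0, 200.0]
```

Because the warping path may repeat frames of either sequence, two renditions that differ only in local timing score as highly similar, which a frame-by-frame Euclidean comparison would not.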
To recapitulate, from the literature review presented in Section 2.4.2 and Section 2.5.2, we see that the approaches for pattern processing in music can be broadly put into two categories (Figure 5.1). The first type of approach performs pattern detection (or matching) and follows a supervised methodology. In these approaches, the system knows a priori the pattern to be extracted. Typically, such a system is fed with exemplar patterns or queries and is expected to extract all of their occurrences in a piece of
Chapter 6: Automatic Raga Recognition
6.1 Introduction
In this chapter, we address the task of automatically recognizing ragas in audio recordings of IAM. We describe two novel approaches for raga recognition that jointly capture the tonal and the temporal aspects of melody. The contents of this chapter are largely based on our published work in Gulati et al. (2016a,b).
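The idea of jointly capturing tonal and temporal aspects can be sketched as two complementary feature sets computed from a tonic-normalized pitch contour. This is a simplified caricature with hypothetical function names, not the approaches developed in this chapter:

```python
from collections import Counter

BINS = 12  # one pitch-class bin per semitone relative to the tonic

def pitch_class_distribution(cents):
    """Tonal facet: how much time the melody spends on each pitch class."""
    classes = [round(c / 100.0) % BINS for c in cents]
    counts = Counter(classes)
    total = len(classes)
    return [counts.get(b, 0) / total for b in range(BINS)]

def pitch_class_transitions(cents):
    """Temporal facet: counts of successive pitch-class transitions (bigrams)."""
    classes = [round(c / 100.0) % BINS for c in cents]
    return Counter(zip(classes, classes[1:]))

contour = [0.0, 0.0, 200.0, 300.0, 200.0, 0.0]  # cents above the tonic
tonal = pitch_class_distribution(contour)
temporal = pitch_class_transitions(contour)
```

Two ragas can share a svara set (similar distributions) yet differ in their characteristic progressions, which is why a purely tonal feature is insufficient on its own.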
Raga is a core musical concept used in compositions, performances, music organization, and pedagogy of IAM. Even beyond the art music, numerous compositions in Indian folk and film music are also based on ragas (Ganti, 2013). Raga is therefore one of the most desired melodic descriptions of a recorded performance of IAM, and an important criterion used by listeners to browse its audio music collections. Despite its significance, there exists a large volume of audio content whose raga is incorrectly labeled or simply unlabeled. A computational approach to raga recognition will allow us to automatically annotate large collections of audio music recordings. It will enable raga-based music retrieval in large audio archives, semantically meaningful music discovery, and musicologically informed navigation. Furthermore, a deeper understanding of the raga framework from a computational perspective will pave the way for building applications for music pedagogy in IAM.
Raga recognition is the most studied research topic in MIR of IAM. There exist a considerable number of approaches utilizing different characteristic aspects of ragas, such as the svara set, svara salience, and arohana-avrohana. A critical in-depth review of the existing approaches for raga recognition is presented in Section 2.4.3, wherein we identify several shortcomings in these approaches and possible avenues for scientific contribution to take this task to the next level. Here we provide a short summary of this analysis:
Nearly half of the existing approaches for raga recognition do not utilize the temporal aspects of melody at all (Table 2.3), which are crucial
Chapter 7: Applications
7.1 Introduction
The research work presented in this thesis has several applications. A number of these applications have already been described in Section 1.1. While some applications, such as raga-based music retrieval and music discovery, are relevant mainly in the context of large audio collections, there are several applications that can be developed on the scale of the music collection compiled in the CompMusic project. In this chapter, we present some concrete examples of such applications that have already incorporated parts of our work. We provide a brief overview of Dunya, a collection of the music corpora and software tools developed during the CompMusic project, and of Saraga and Riyaz, the mobile applications developed within the technology transfer project Culture Aware MUsic Technologies (CAMUT). In addition, we present three web demos that showcase parts of the outcomes of our computational methods. We also briefly present one of our recent studies that performs musicologically motivated exploration of melodic structures in IAM. It serves as an example of how our methods can facilitate investigations in computational musicology.
7.2 Dunya
Dunya comprises the music corpora and the software tools that have been developed as part of the CompMusic project. It includes data for five music traditions: Hindustani, Carnatic, Turkish Makam, Jingju, and Andalusian music. By providing access to the gathered data (such as audio recordings and metadata) and the generated data (such as audio descriptors) in the project, Dunya aims to facilitate the study and exploration of relevant musical aspects of different music repertoires. The CompMusic corpora mainly comprise audio recordings and complementary information that describes those recordings. This complementary information can be in the form of the
http://dunya.compmusic.upf.edu/
Chapter 8: Summary and Future Perspectives
8.1 Introduction
In this thesis, we have presented a number of computational approaches for analyzing melodic elements at different levels of melodic organization in IAM. In tandem, these approaches generate a high-level melodic description of the audio collections in this music tradition. They build upon each other to finally allow us to achieve the goals that we set at the beginning of this thesis, which are:
To curate and structure representative music corpora of IAM that comprise audio recordings and the associated metadata, and use that to compile sizable and well annotated test datasets for melodic analyses.
To develop data-driven computational approaches for the discovery and characterization of musically relevant melodic patterns in sizable audio collections of IAM.
To devise computational approaches for automatically recognizing ragas in recorded performances of IAM.
Based on the results presented in this thesis, we can now say that our goals are successfully met. We started this thesis by presenting our motivation behind the analysis and description of melodies in IAM, highlighting the opportunities and challenges that this music tradition offers within the context of music information research and the CompMusic project (Chapter 1). We provided a brief introduction to IAM and its melodic organization, and critically reviewed the existing literature on related topics within the context of MIR and IAM (Chapter 2).
In Chapter 3, we provided a comprehensive overview of the music corpora and datasets that were compiled and structured in our work as a part of the CompMusic project. To the best of our knowledge, these corpora comprise the largest audio collections of IAM along with curated metadata and automatically extracted music
Chapter 2: Background
2.1 Introduction
In this chapter, we present our review of the existing literature related to the work presented in this thesis. In addition, we also present a brief overview of the relevant musical and mathematical background. We start with a brief discussion of the terminology used in this thesis (Section 2.2). Subsequently, we provide an overview of selected music concepts to better understand the computational tasks addressed in our work (Section 2.3). We then present a review of the relevant literature, which we divide into two parts. We first present a review of the work done in computational analysis of IAM (Section 2.4). This includes approaches for tonic identification, melodic pattern processing, and automatic raga recognition. In the second part, we present relevant work done in MIR in general, including topics related to pattern processing (detection and discovery) and key estimation (Section 2.5). Finally, we provide a brief overview of selected scientific concepts (Section 2.6).
2.2 Terminology
In this section, we provide our working definition of selected terms that we have used throughout the thesis. To begin with, it is important to understand the meaning of melody in the context of this thesis. As we see from the literature, defining melody in itself has been a challenge, with no consensus on a single definition of melody (Gómez et al., 2003; Salamon, 2013). We do not aim here to formally define melody in the universal sense, but present its working definition within the scope of this thesis. Before we proceed, it is important to understand the setup of a concert of IAM. Every performance of IAM has a lead artist, who plays the central role (also literally positioned in the center of the stage), and all other instruments are considered as accompaniments. There is a main melody line by the lead performer, and typically also a melodic accompaniment that follows the main melody. A more detailed
Introduction
Motivation
Automatic music description
Applications: similarity, discovery, search, radio, recommendation, pedagogy, education, entertainment, enhanced listening, computational musicology
Melodic description: representation, segmentation, similarity, motif discovery, pattern characterization
CompMusic Project
Description: data driven, culture aware, open access, reproducibility
• Serra, X. (2011). A multicultural approach to music information research. In Proc. of ISMIR, pp. 151–156.
Indian Art Music
[Performance photos: Hindustani (Kaustuv K. Ganguli); Carnatic (Vignesh Ishwar)]
Terminology
§ Tonic: Base pitch; varies across artists
§ Melody: Pitch of the predominant melodic source
§ Rāga: Melodic framework
§ Svara: Analogous to a note
§ Ārohana-avrohana: Ascending and descending svara progressions
§ Chalan: Abstraction of the melodic progression
§ Motifs: Characteristic melodic patterns
§ Gamakas: Gliding melodic gestures in Carnatic music
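Because the tonic varies across artists, melodic analysis typically normalizes pitch to cents relative to the performer's tonic. The conversion below is the standard formula, shown here only as a reminder of the convention, not as thesis-specific code:

```python
import math

def hz_to_cents(freq_hz, tonic_hz):
    """Frequency expressed in cents above the tonic (1200 cents = one octave)."""
    return 1200.0 * math.log2(freq_hz / tonic_hz)
```

For example, a frequency one octave above the tonic maps to 1200 cents, and the tonic itself maps to 0, so contours from artists with different tonics become directly comparable.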
Opportunities and Challenges
§ Opportunities
§ Well studied music tradition
§ Homophonic; acoustical characteristics
• http://www.music-ir.org/mirex/wiki/MIREX_HOME
• Salamon, J. & Gómez, E. (2012). Melody extraction from polyphonic music signals using pitch contour characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
§ Melodic patterns → building blocks
§ Rāga framework:
• Martinez, J. L. (2001). Semiosis in Hindustani Music. Motilal Banarsidass Publishers.
“The rāga is more fixed than a mode, and less fixed than the melody; beyond the mode and short of melody; and richer both than a given mode or a given melody.”
(Martinez, 2001)
Mode < Rāga < Melody
§ Challenges
§ Improvisatory music tradition
[Figure: predominant pitch contour of an excerpt; frequency (cents) vs. time (s)]
§ Melodic transcription: changing and ill-defined
[Figure: pitch contour with svara labels (S, g, m) and a characteristic movement]
• http://www.freesound.org/people/sankalp/sounds/360428/
§ No standard reference frequency; tonic pitch
§ Long audio recordings (~3 hours)
Broad Objectives
§ Compilation and curation of corpora of IAM
§ Discovery and characterization of melodic patterns
§ Automatic rāga recognition
Computational Tasks
Melodic Description: Tonic Identification (& Evaluation), Nyās Segmentation, Melodic Similarity, Pattern Discovery, Pattern Characterization, Automatic Rāga Recognition
Music Corpora and Datasets
CompMusic Corpora
§ Team effort
§ Procedure and design criteria
• Serra, X. (2014). Creating research corpora for the computational study of music: the case of the CompMusic project. In Proc. of the 53rd AES Int. Conf. on Semantic Audio. London.
• Srinivasamurthy, A., Koduri, G. K., Gulati, S., Ishwar, V., & Serra, X. (2014). Corpora for music information research in Indian art music. In Int. Computer Music Conf./Sound and Music Computing Conf., pp. 1029–1036. Athens, Greece.
Audio (CDs); editorial metadata (MusicBrainz)
Carnatic and Hindustani Music Corpora
[Slide: corpus statistics]
Open Access Music Corpora
Annotations: sections, melodic phrases
Test Datasets
§ Tonic identification
§ Nyās segmentation
§ Melodic similarity
§ Rāga recognition
Melody Descriptors and Representations
Melody Descriptors and Representations
Tonic Identification (& Evaluation), Nyās Segmentation, Predominant Melody Estimation and post-processing, Tani Segmentation
Tonic Identification: Approaches
Method | Features | Feature Distribution | Tonic Selection
MRS (Sengupta et al., 2005) | Pitch (Datta, 1996) | NA | Error minimization
MRH1/MRH2 (Ranjani et al., 2011) | Pitch (Boersma & Weenink, 2001) | Parzen-window-based PDE | GMM fitting
MJS (Salamon et al., 2012) | Multi-pitch salience (Salamon et al., 2011) | Multi-pitch histogram | Decision tree
MSG (Gulati et al., 2012) | Multi-pitch salience (Salamon et al., 2011) | Multi-pitch histogram | Decision tree
MCS (Chordia & Sentürk, 2013) | Predominant melody (Salamon & Gómez, 2012) | Pitch histogram | Decision tree
MAB1 (Bellur et al., 2012) | Pitch (De Cheveigné & Kawahara, 2002) | GD histogram | Highest peak
MAB2 (Bellur et al., 2012) | Pitch (De Cheveigné & Kawahara, 2002) | GD histogram | Template matching
MAB3 (Bellur et al., 2012) | Pitch (De Cheveigné & Kawahara, 2002) | GD histogram | Highest peak
Abbreviations: NA = not applicable; GD = group delay; PDE = probability density estimate.
Table 2.1: Summary of the existing tonic identification approaches.
• Salamon, J., Gulati, S., & Serra, X. (2012). A multipitch approach to tonic identification in Indian classical music. In Proc. of Int. Soc. for Music Information Retrieval Conf. (ISMIR), pp. 499–504. Porto, Portugal.
• Gulati, S., Salamon, J., & Serra, X. (2012). A two-stage approach for tonic identification in indian art music. In Proc. of the 2nd CompMusic Workshop, pp. 119–127. Istanbul, Turkey: Universitat Pompeu Fabra.
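The "highest peak" tonic-selection strategy in Table 2.1 can be sketched as follows: histogram the pitch track on a cent grid and read off the fullest bin. This is an illustrative reduction; the cited methods add salience weighting, multi-pitch information, or templates on top of this core idea:

```python
import math
from collections import Counter

def highest_peak_tonic(pitch_hz, bin_cents=10, ref_hz=55.0):
    """Return the centre frequency of the most populated pitch-histogram bin."""
    histogram = Counter()
    for f in pitch_hz:
        if f > 0:  # skip unvoiced frames (pitch trackers often emit 0 there)
            cents = 1200.0 * math.log2(f / ref_hz)
            histogram[int(cents // bin_cents)] += 1
    peak_bin, _ = histogram.most_common(1)[0]
    centre_cents = peak_bin * bin_cents + bin_cents / 2.0
    return ref_hz * 2.0 ** (centre_cents / 1200.0)

# Frames hovering around a 132 Hz tonic, with stray octave errors and silence.
frames = [132.0] * 50 + [131.5] * 20 + [264.0] * 10 + [0.0] * 5
estimate = highest_peak_tonic(frames)
```

The weakness this exposes is exactly the one discussed later: when a drone, percussion, or octave errors pile mass into a bin an octave away, the highest peak is no longer the tonic, which motivates template matching and learned selection rules.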
Multi-pitch + classification; predominant pitch + template/peak picking
Comparative Evaluation
§ Seven methods
§ Six diverse datasets: TIDCM1, TIDCM2, TIDCM3, TIDIITM1, TIDIITM2, TIDIISc
[Table: number of excerpts and mean duration (mins) per dataset]
Results
Methods | TIDCM1 TP/TPC | TIDCM2 TP/TPC | TIDCM3 TP/TPC | TIDIISc TP/TPC | TIDIITM1 TP/TPC | TIDIITM2 TP/TPC
MRH1 | -/81.4 | 69.6/84.9 | 73.2/90.8 | 81.8/83.6 | 92.1/97.4 | 80.2/86.9
MRH2 | -/63.2 | 65.7/78.2 | 68.5/83.5 | 83.6/83.6 | 94.7/97.4 | 83.8/88.8
MAB1 | -/- | -/- | -/- | -/- | 89.5/89.5 | -/-
MAB2 | -/88.9 | 74.5/82.9 | 78.5/83.4 | 72.7/76.4 | 92.1/92.1 | 86.6/89.1
MAB3 | -/86 | 61.1/80.5 | 67.8/79.9 | 72.7/72.7 | 94.7/94.7 | 85/86.6
MJS | -/88.9 | 87.4/90.1 | 88.4/91 | 75.6/77.5 | 89.5/97.4 | 90.8/94.1
MSG | -/92.2 | 87.8/90.9 | 87.7/90.5 | 79.8/85.3 | 97.4/97.4 | 93.6/93.6
Table 4.1: Accuracies (%) for tonic pitch (TP) and tonic pitch-class (TPC) identification by seven methods on six different datasets using only audio data. The best accuracy obtained for each dataset is highlighted using bold text. The dashed horizontal line divides the methods based on supervised learning (MJS and MSG) and those based on expert knowledge (MRH1, MRH2, MAB1, MAB2 and MAB3). The TP column for TIDCM1 is marked as '-' because it consists of only instrumental excerpts, for which we do not evaluate tonic pitch accuracy. MAB1 is only evaluated on TIDIITM1 since it works on the whole concert recording.
Methods | TIDCM1 | TIDCM2 | TIDCM3 | TIDIISc | TIDIITM1 | TIDIITM2
MRH1 | 87.7 | 83.5 | 88.9 | 87.3 | 97.4 | 91.7
MRH2 | 79.55 | 76.3 | 82 | 85.5 | 97.4 | 91.5
MAB1 | - | - | - | - | 97.4 | -
MAB2 | 92.3 | 91.5 | 94.2 | 81.8 | 97.4 | 91.1
MAB3 | 87.5 | 86.7 | 90.9 | 81.8 | 94.7 | 89.9
MJS | 88.9 | 93.6 | 92.4 | 80.9 | 97.4 | 92.3
MSG | 92.2 | 90.9 | 90.5 | 85.3 | 97.4 | 93.6
Table 4.2: Accuracies (tonic pitch-class, %) when additional information regarding the gender of the lead singer (male/female) and performance type (vocal/instrumental) is used. The dashed horizontal line divides the methods based on supervised learning (MJS and MSG) and those based on expert knowledge (MRH1, MRH2, MAB1, MAB2 and MAB3).
the presence of the drone, the tonal sounds produced by percussive instruments, and the octave errors produced by the pitch tracker all contribute to the appearance of a high peak one octave below the tonic of the female singers. This is especially the case for three-minute excerpts, wherein a limited amount of vocal pitch information is available. In the case of the approaches based on multipitch analysis and classification (MJS and MSG), a probable reason for obtaining better performance for male singers is the larger number of excerpts from male singers in the database. As a result, it is possible that the rules learned by the classifier are slightly biased towards the performances of male singers.
4.2.2.2 Results Obtained Using Metadata Together with the Audio
One of the ways to reduce the amount of octave errors in tonic identification is to restrict the frequency range of the allowed tonic pitches. The frequency range can be optimized based on the additional information regarding the gender of the singer (when available) to guide the method. In this section, we analyze the effect of including information regarding the gender of the singer and the performance type (vocal or instrumental) on the identification accuracy obtained by the methods.
In Table 4.2, we present the identification accuracies obtained when associated metadata is available to the methods in addition to the audio data. Note that for this evaluation we only report the tonic pitch accuracy for vocal excerpts (and not pitch-class accuracy), since when this metadata is available the pitch range of the tonic is known and limited to a single octave, meaning the TP and TPC accuracies will be the same.
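The two scores reported in these tables differ only in whether octave errors are forgiven. A compact way to state the tonic pitch (TP) vs. tonic pitch-class (TPC) criterion is sketched below; the 50-cent tolerance is a common choice but is assumed here, not taken from the thesis:

```python
import math

def tonic_correctness(estimate_hz, reference_hz, tolerance_cents=50.0):
    """Return (TP, TPC): exact correctness and octave-folded correctness."""
    diff = 1200.0 * math.log2(estimate_hz / reference_hz)
    tp = abs(diff) <= tolerance_cents
    folded = ((diff + 600.0) % 1200.0) - 600.0  # wrap deviation into [-600, 600)
    tpc = abs(folded) <= tolerance_cents
    return tp, tpc
```

An estimate exactly one octave above the reference therefore counts as correct for TPC but not for TP, which is why TPC accuracies in Table 4.1 are always greater than or equal to the TP accuracies.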
Comparing the identification accuracies summarized in Table 4.2 with the ones in Table 4.1, we see that the accuracies for all methods are higher when gender and performance-type metadata are available.
[Slides: tonic identification results. Bar charts show accuracy (%) for tonic pitch and tonic pitch-class for the methods JS, SG, RH1, RH2, AB2 and AB3, per dataset (CM1, CM2, CM3, IISCB1, IITM1, IITM2) and per category (Hindustani, Carnatic, male, female), for the audio-only and audio + metadata setups. Companion charts break down the errors (%) into Pa errors, Ma errors and other errors. Slide titles: Audio; Audio + metadata; Results; Duration of excerpt; Category-wise analysis; Detailed error analysis; Category-wise error analysis.]
Results: Summary
§ Multi-pitch analysis + classification: audio only; invariant to excerpt duration
§ Pitch histogram peak picking: audio + metadata; sensitive to excerpt duration
§ Consistency across tradition, gender and instrument
• Gulati, S., Bellur, A., Salamon, J., Ranjani, H. G., Ishwar, V., Murthy, H. A., & Serra, X. (2014). Automatic tonic identification in Indian art music: approaches and evaluation. JNMR, 43(1), 53–71.
Predominant Pitch Estimation
§ Melodia
• Salamon, J. & Gómez, E. (2012). Melody extraction from polyphonic music signals using pitch contour characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
• Bogdanov, D., Wack, N., Gómez, E., Gulati, S., Herrera, P., Mayor, O., Roma, G., Salamon, J., Zapata, J., & Serra, X. (2013). Essentia: an audio analysis library for music information retrieval. In ISMIR, pp. 493–498.
Nyās Svara Segmentation
§ Melodic segmentation
§ Delimit melodic phrases
[Figure: predominant-F0 contour (cents) vs. time (s), with nyās segments N1–N5 marked.]
Nyās Svara Segmentation (processing pipeline, built up over several slides):
Audio → predominant pitch estimation and representation (with tonic identification and histogram computation) → segmentation → feature extraction per candidate segment (local, contextual, and local + contextual features) → segment classification and fusion → nyās segments. Each candidate segment is represented by a feature vector (e.g. 0.12, 0.34, 0.59, 0.23, 0.54) and classified as nyās or non-nyās.
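The final classify-and-fuse step of the pipeline can be sketched as follows. The single-feature threshold classifier and the feature values are illustrative stand-ins for the trained classifiers used in the thesis; only the merge logic reflects the described fusion step.

```python
# Minimal sketch of the classify-and-fuse step: each candidate segment
# gets a feature vector, a (hypothetical) classifier labels it nyas or
# non-nyas, and adjacent segments with the same label are merged.

def fuse(segments):
    """segments: list of (start, end, label); merge adjacent equal labels."""
    fused = []
    for seg in segments:
        if fused and fused[-1][2] == seg[2] and fused[-1][1] == seg[0]:
            fused[-1] = (fused[-1][0], seg[1], seg[2])  # extend previous
        else:
            fused.append(seg)
    return fused

def classify(features):
    # Toy stand-in for a trained classifier (threshold on one feature).
    return "nyas" if features[0] > 0.5 else "non-nyas"

bounds = [(0.0, 1.2), (1.2, 2.0), (2.0, 3.5)]
feats = [[0.9], [0.8], [0.1]]         # one hypothetical feature per segment
labeled = [(s, e, classify(f)) for (s, e), f in zip(bounds, feats)]
print(fuse(labeled))
# -> [(0.0, 2.0, 'nyas'), (2.0, 3.5, 'non-nyas')]
```

The two adjacent segments labeled nyās are fused into one, which is the behavior the pipeline's "segment fusion" block describes.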
Dataset
§ Performances of Hindustani music
§ Annotations by a musician with more than 15 years of training
§ Annotated nyās segments
[Table: Recordings | Duration | Artists | Rāgas]
Results

MELODY DESCRIPTORS AND REPRESENTATIONS
Segmentation | Feat. | DTW | Tree | k-NN | NB | LR | SVM
PLS | FL | 0.356 | 0.407 | 0.447 | 0.248 | 0.449 | 0.453
PLS | FC | 0.284 | 0.394 | 0.387 | 0.383 | 0.389 | 0.406
PLS | FL+FC | 0.289 | 0.414 | 0.426 | 0.409 | 0.432 | 0.437
Proposed | FL | 0.524 | 0.672 | 0.719 | 0.491 | 0.736 | 0.749
Proposed | FC | 0.436 | 0.629 | 0.615 | 0.641 | 0.621 | 0.673
Proposed | FL+FC | 0.446 | 0.682 | 0.708 | 0.591 | 0.725 | 0.735

Table 4.3: F-scores for nyās boundary detection using the PLS method and the proposed segmentation method. Results are shown for different classifiers (Tree, k-NN, NB, LR, SVM) and for local (FL), contextual (FC), and combined (FL+FC) features. DTW is the baseline method used for comparison. The F-score for the random baseline obtained using BR2 is 0.184.
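The boundary-detection F-score reported in Table 4.3 can be sketched as below: a predicted boundary counts as a hit if it falls within a time tolerance of a not-yet-matched ground-truth boundary. The tolerance value and function name are assumptions for illustration, not the thesis' exact evaluation settings.

```python
# Hedged sketch of F-score for boundary detection with a tolerance window.
def boundary_f_score(predicted, reference, tol=0.1):
    """predicted, reference: boundary times in seconds; tol is illustrative."""
    matched, used = 0, set()
    for p in predicted:
        for i, r in enumerate(reference):
            if i not in used and abs(p - r) <= tol:
                matched += 1
                used.add(i)     # each reference boundary matches at most once
                break
    if not predicted or not reference:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(reference)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Two of three predictions fall within tolerance of a reference boundary.
print(round(boundary_f_score([1.02, 2.50, 4.00], [1.00, 2.45, 3.10]), 3))
# -> 0.667
```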
Segmentation | Feat. | DTW | Tree | k-NN | NB | LR | SVM
PLS | FL | 0.553 | 0.685 | 0.723 | 0.621 | 0.727 | 0.722
PLS | FC | 0.251 | 0.639 | 0.631 | 0.690 | 0.688 | 0.674
PLS | FL+FC | 0.389 | 0.694 | 0.693 | 0.708 | 0.722 | 0.706
Proposed | FL | 0.546 | 0.708 | 0.754 | 0.714 | 0.749 | 0.758
Proposed | FC | 0.281 | 0.671 | 0.611 | 0.697 | 0.689 | 0.697
Proposed | FL+FC | 0.332 | 0.672 | 0.710 | 0.730 | 0.743 | 0.731

Table 4.4: F-scores for the nyās and non-nyās label annotation task using the PLS method and the proposed segmentation method. Results are shown for different classifiers (Tree, k-NN, NB, LR, SVM) and for local (FL), contextual (FC), and combined (FL+FC) features. DTW is the baseline method used for comparison. The best random baseline F-score is 0.153, obtained using BR2.
significant for all the proposed method variants. Furthermore, we also see that the proposed segmentation method consistently performs better than PLS, although the differences are not always statistically significant.

In addition, we investigate per-class accuracies for the label annotations. We find that the performance for the nyās segments is considerably better than for the non-nyās segments. This could be attributed to the fact that, even though the segment classification accuracy is balanced across classes, the difference in segment length between nyās and non-nyās segments (nyās segments being considerably longer) can result in a larger number of matched pairs for nyās segments.

In general, we see that the proposed segmentation method improves the performance over the PLS method in both tasks, wherein the differences are statistically significant
Boundary
Label
Results: Summary
§ Proposed segmentation
§ Feature extraction
§ Local features
§ Piece-wise linear segmentation
§ Pattern matching (DTW)
§ Local + contextual features
• Gulati, S., Serrà, J., Ganguli, K. K., & Serra, X. (2014). Landmark detection in Hindustani music melodies. In ICMC-SMC, pp. 1062–1068.
Melodic Pattern Processing
Chapter 5
Melodic Pattern Processing: Similarity, Discovery and Characterization

5.1 Introduction

In this chapter, we present our methodology for discovering musically relevant melodic patterns in sizable audio collections of IAM. We address three main computational tasks involved in this process: melodic similarity, pattern discovery, and characterization of the discovered melodic patterns. We refer to these tasks jointly as melodic pattern processing.
“Only by repetition can a series of tones be characterized as something definite. Only repetition can demarcate a series of tones and its purpose. Repetition thus is the basis of music as an art”

(Schenker et al., 1980)
Repeating patterns are at the core of music; consequently, the analysis of patterns is fundamental in music analysis. In IAM, recurring melodic patterns are the building blocks of melodic structures. They provide a base for improvisation and composition, and are thus crucial to the analysis and description of rāgas, compositions, and artists in this music tradition. A detailed account of the importance of melodic patterns in IAM is provided in Section 2.3.2.
To recapitulate, from the literature review presented in Section 2.4.2 and Section 2.5.2, we see that the approaches for pattern processing in music can be broadly put into two categories (Figure 5.1). The first type of approach performs pattern detection (or matching) and follows a supervised methodology. In these approaches, the system knows a priori the pattern to be extracted. Typically, such a system is fed with exemplar patterns or queries and is expected to extract all of their occurrences in a piece of
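The supervised detection setting described above can be sketched as a sliding-window search over a pitch track. The mean absolute difference used as a distance, the threshold, and all values here are illustrative assumptions; the methods surveyed below use more elaborate measures such as DTW or RLCS.

```python
# Hedged sketch of supervised pattern detection: slide a query melodic
# pattern over a longer pitch track and report windows whose mean
# absolute deviation falls below a threshold (values are illustrative).
def detect_occurrences(query, pitch_track, threshold=1.0):
    n = len(query)
    hits = []
    for start in range(len(pitch_track) - n + 1):
        window = pitch_track[start:start + n]
        dist = sum(abs(q - w) for q, w in zip(query, window)) / n
        if dist <= threshold:
            hits.append((start, dist))
    return hits

track = [0, 0, 2, 4, 5, 4, 2, 0, 2, 4, 5, 4]
print(detect_occurrences([2, 4, 5, 4], track, threshold=0.5))
# -> [(2, 0.0), (8, 0.0)]
```

Both occurrences of the exemplar are found; in the unsupervised (discovery) setting, by contrast, no such query is given and repeated segments must be found by comparing all pairs.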
Existing Approaches for IAM

Method | Task | Melody representation | Segmentation | Similarity measure | Speed-up | #Ragas | #Rec | #Patt | #Occ
Ishwar et al. (2012) | Distinction | Continuous | GT annotations | HMM | - | 5c | NA | 6 | 431
Ross & Rao (2012) | Detection | Continuous | Pa nyās | DTW | Segmentation | 1h | 2 | 2 | 107
Ross et al. (2012) | Detection | Continuous, SAX-12,1200 | Sama location | Euc., DTW | Segmentation | 3h | 4 | 3 | 107
Ishwar et al. (2013) | Detection | Stationary point, Continuous | Brute-force | RLCS | Two-stage method | 2c | 47 | 4 | 173
Rao et al. (2013) | Distinction | Continuous | GT annotations | DTW | - | 1h | 8 | 3 | 268
Dutta & Murthy (2014a) | Discovery | Stationary point, Continuous | Brute-force | RLCS | - | 5c | 59 | - | -
Dutta & Murthy (2014b) | Detection | Stationary point, Continuous | NA | Modified-RLCS | Two-stage method | 1c | 16 | NA | 59
Rao et al. (2014) | Distinction | Continuous | GT annotations | DTW | - | 1h | 8 | 3 | 268
Rao et al. (2014) | Distinction | Continuous | GT annotations | HMM | - | 5c | NA | 10 | 652
Ganguli et al. (2015) | Detection | BSS, Transcription | - | Smith-Waterman | Discretization | 34h | 50 | NA | 1075

h: Hindustani music collection; c: Carnatic music collection.
ABBREVIATIONS: #Rec = number of recordings; #Patt = number of unique patterns; #Occ = total number of annotated occurrences of the patterns; GT = ground truth; NA = not available; "-" = not applicable; Euc. = Euclidean distance.

Table 2.2: Summary of the methods proposed in the literature for melodic pattern processing in IAM. Note that all of these studies were published during the course of our work. (Tasks: Detection 5, Distinction 4, Discovery 1.)
Melodic Pattern Processing
Supervised Approach vs. Unsupervised Approach
§ Dataset size
§ Knowledge bias
§ Human annotation limitations
Melodic Pattern Processing
Melodic Similarity | Pattern Discovery | Pattern Characterization
Supervised | Unsupervised | Unsupervised
Melodic Similarity
[Figure: two melodic excerpts, F0 frequency (cents) vs. time (s); the challenge is to judge the melodic similarity between them.]
Melodic Similarity (processing chain):
Audio1, Audio2 → Predominant Pitch Estimation → Pitch Normalization → Uniform Time-scaling → Distance Computation → melodic similarity score
Melodic Similarity — design choices at each step:
§ Predominant pitch estimation: sampling rate?
§ Pitch normalization: yes / no; what kind?
§ Uniform time-scaling: yes / no; how much?
§ Distance computation: Euclidean or dynamic time warping (DTW)? Parameters?
Melodic Similarity
[Slide: the predominant pitch of each recording is represented as a sampled sequence of values for the distance computation.]
Melodic Similarity — pitch normalization options:
§ No normalization
§ Tonic (continuous, or quantized to 24 or 12 bins)
§ Mean
§ Median
§ Z-norm
§ Median absolute deviation
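A minimal sketch of a few of these normalization options, assuming the melody is a sequence of pitch values in cents; the function name, modes, and values are illustrative, not the thesis' implementation.

```python
# Sketch of candidate pitch normalizations applied to a sequence in cents.
import statistics

def normalize(cents, mode="tonic", tonic_cents=0.0):
    if mode == "tonic":     # express pitch relative to the tonic
        ref = tonic_cents
    elif mode == "mean":    # zero-mean: robust to transposed repetitions
        ref = statistics.mean(cents)
    elif mode == "median":  # robust alternative to the mean
        ref = statistics.median(cents)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [c - ref for c in cents]

seq = [1200.0, 1300.0, 1250.0]
print(normalize(seq, "mean"))   # -> [-50.0, 50.0, 0.0]
```

Zero-mean normalization makes two pitch-transposed instances of the same pattern compare equal, which is one rationale for including it among the candidates.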
Melodic Similarity — uniform time-scaling options:
§ Off
§ On, with scaling factors {0.9, 0.95, 1.0, 1.05, 1.1}
Melodic Similarity — distance measures:
§ Euclidean
§ Dynamic time warping (DTW), with global and local constraints
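A generic sketch of DTW with an L1 local cost and a Sakoe-Chiba global band, in the spirit of the constrained variants (e.g. DDTW_L1_G90) compared later; the band width and sequences are illustrative, and this is not the thesis' exact implementation.

```python
# DTW with a Sakoe-Chiba global band and an L1 local cost.
import math

def dtw(a, b, band_frac=0.1):
    n, m = len(a), len(b)
    band = max(1, int(band_frac * max(n, m)))   # global constraint width
    INF = math.inf
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band around the diagonal are filled.
        for j in range(max(1, i - band), min(m, i + band) + 1):
            cost = abs(a[i - 1] - b[j - 1])     # L1 local cost
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# The shorter sequence aligns to the longer one despite the extra sample.
print(dtw([0, 1, 2, 3], [0, 1, 1, 2, 3], band_frac=0.5))  # -> 0.0
```

The global band limits how far the warping path may stray from the diagonal, which both speeds up the computation and rules out the pathological warpings discussed in the results.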
Evaluation: Setup
§ N annotated patterns plus randomly sampled segments
§ Top nearest neighbours retrieved per query
§ Mean average precision (MAP)
Dataset
[Tables: Carnatic music dataset and Hindustani music dataset — recordings, duration, rāgas, patterns, occurrences.]
Results

5.2 MELODIC SIMILARITY: APPROACHES AND EVALUATION

Dataset | MAP | Srate | Norm | TScale | Dist
MSDcmd_iitm | 0.413 | w67 | Zmean | Woff | DDTW_L1_G90
MSDcmd_iitm | 0.412 | w67 | Zmean | Won | DDTW_L1_G10
MSDcmd_iitm | 0.411 | w100 | Zmean | Woff | DDTW_L1_G90
MSDhmd_iitb | 0.552 | w100 | Ztonic | Woff | DDTW_L0_G90
MSDhmd_iitb | 0.551 | w67 | Ztonic | Woff | DDTW_L0_G90
MSDhmd_iitb | 0.547 | w50 | Ztonic | Woff | DDTW_L0_G90

Table 5.1: MAP score and details of the parameter settings for the three best performing variants for the MSDcmd_iitm and MSDhmd_iitb datasets. Srate: sampling rate of the melody representation; Norm: normalization technique; TScale: uniform time-scaling; Dist: distance measure.
we use mean average precision (MAP), a typical evaluation measure in information retrieval (Manning et al., 2008). MAP is computed by taking the mean of the average precision values of each query in the dataset. This way, we have a single number with which to evaluate and compare the performance of a variant. To assess whether the difference in the performance of any two variants is statistically significant, we use the Wilcoxon signed-rank test (Wilcoxon, 1945) with p < 0.01. To compensate for multiple comparisons, we apply the Holm-Bonferroni method (Holm, 1979). Thus, considering that we compare 560 different variants, we effectively use a much more stringent criterion than p < 0.01 for measuring statistical significance.
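The MAP computation described above can be sketched as follows, assuming each query yields a ranked list of booleans marking which retrieved items are relevant.

```python
# Sketch of mean average precision (MAP) over ranked retrieval lists.
def average_precision(ranked_relevance):
    """ranked_relevance: list of bools, in retrieval rank order."""
    hits, precisions = 0, []
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)   # precision at each relevant rank
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(queries):
    """queries: one ranked relevance list per query."""
    return sum(average_precision(q) for q in queries) / len(queries)

# Two toy queries: AP = (1/1 + 2/3)/2 and AP = 1/2.
print(round(mean_average_precision([[True, False, True], [False, True]]), 4))
# -> 0.6667
```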
5.2.3 Results and Discussion

In this section, we present the results of our evaluation of the 560 variants on each of the datasets, MSDcmd_iitm and MSDhmd_iitb. We order the variants in decreasing order of their MAP scores and present only the three best performing variants in Table 5.1. For complete results, see the companion page for this study (Appendix A).

In Table 5.1 (top half), we show the MAP scores and details of the parameter settings for the MSDcmd_iitm dataset. We see that the best performing variant has a MAP score of 0.413. Looking at the whole ranked list of the 560 variants for MSDcmd_iitm, the top performing variants consistently use higher sampling rates (either w100 or w67). This can be attributed to the rapid oscillatory melodic movements present in Carnatic music, whose preservation requires a higher sampling rate. The top variants invariably use zero-mean normalization (Zmean), suggesting that several repeated instances of the melodic patterns are pitch transposed within the same recording. In addition, the top variants also use a DTW-based distance with a local constraint (either DDTW_L1_G10 or DDTW_L1_G90), indicating that melodic fragments are prone to large pathological warping that can significantly degrade the performance. We do not observe any consistency in the usage of uniform time-scaling.
MELODIC PATTERN PROCESSING

Dataset | MAP | Srate | Norm | TScale | Dist
MSDcmd_iitm | 0.279 | w67 | Zmean | Won | DDTW_L1_G10
MSDcmd_iitm | 0.277 | w67 | ZtonicQ12 | Won | DDTW_L1_G10
MSDcmd_iitm | 0.275 | w100 | ZtonicQ12 | Won | DDTW_L1_G10
MSDhmd_iitb | 0.259 | w40 | ZtonicQ12 | Won | DDTW_L1_G90
MSDhmd_iitb | 0.259 | w100 | ZtonicQ12 | Won | DDTW_L1_G90
MSDhmd_iitb | 0.259 | w67 | ZtonicQ12 | Won | DDTW_L1_G90

Table 5.2: MAP score and details of the parameter settings for the three best performing variants for the MSDcmd_iitm and MSDhmd_iitb datasets. These results correspond to the experimental setup where the target patterns' length is not based on the ground-truth annotations, but is taken to be the same as the query pattern length. Srate: sampling rate of the melody representation; Norm: normalization technique; TScale: uniform time-scaling; Dist: distance measure.
are segmented using the ground-truth annotations. We now present the results corresponding to the other experimental setup described in Section 5.2.2, where the target pattern lengths are taken to be the same as that of the query pattern. We evaluate all 560 variants of the method, as done above, on both datasets under this experimental setup. In Table 5.2 we show the MAP scores of the top performing variants for both datasets, MSDcmd_iitm and MSDhmd_iitb. We find that the MAP score for the best performing variant decreases from 0.41 to 0.28 and from 0.55 to 0.26 for MSDcmd_iitm and MSDhmd_iitb, respectively. This indicates that the melodic similarity computation task becomes much more challenging in the absence of an accurate melodic segmentation method. For this experimental setup, the trend in the sampling rate for Carnatic and Hindustani music remains the same as in Table 5.1: a higher sampling rate is desired for representing melodic patterns in Carnatic music. In terms of normalization, we see that, surprisingly, ZtonicQ12 is used by the top performing variants for both datasets. Note that in other variants, whose performance is not statistically significantly different from the ones reported in Table 5.2, Zmean and Ztonic normalization are also used for the MSDcmd_iitm and MSDhmd_iitb datasets, respectively. An interesting observation is that in this experimental setup uniform time-scaling is consistently used by all the top performing variants. This indicates that such a time-scaling operation is immensely advantageous in retrieval scenarios where the length of the target melodic patterns is taken to be the same as that of the query pattern (i.e. pattern segmentation is not known). We also see that DDTW_L1_G10 and DDTW_L1_G90 are always used for the MSDcmd_iitm and MSDhmd_iitb datasets, respectively, indicating that applying a local constraint in DTW is critical for this experimental setup.
C _a _5.2 MELODIC SIMILARITY: APPROACHES AND EVALUATION 131
Dataset MAP Srate Norm TScale Dist
MSDcmdiitm
0.413 w67 Zmean Woff DDTW_L1_G90
0.412 w67 Zmean Won DDTW_L1_G10
0.411 w100 Zmean Woff DDTW_L1_G90
MSDhmdiitb
0.552 w100 Ztonic Woff DDTW_L0_G90
0.551 w67 Ztonic Woff DDTW_L0_G90
0.547 w50 Ztonic Woff DDTW_L0_G90
Table 5.1: MAP score and details of the parameter settings for the three best performingvariants for MSDcmd
iitm and MSDhmdiitb dataset. Srate: sampling rate of the melody representation,
Norm: normalization technique, TScale: uniform time-scaling and Dist: distance measure.
we use mean average precision (MAP), a typical evaluation measure in informationretrieval (Manning et al., 2008). Mean average precision (MAP) is computed bytaking the mean of the average precision values of each query in the dataset. This way,we have a single number to evaluate and compare the performance of a variant. Inorder to assess if the difference in the performance of any two variants is statisticallysignificant, we use the Wilcoxon signed rank-test (Wilcoxon, 1945) with p < 0.01. Tocompensate for multiple comparisons, we apply the Holm-Bonferroni method (Holm,1979). Thus, considering that we compare 560 different variants, we effectively use amuch more stringent criterion than p < 0.01 for measuring statistical significance.
5.2.3 Results and Discussion
In this section, we present the results of our evaluation of the 560 variants for eachof the datasets, MSDcmd
iitm and MSDhmdiitb . We order the variants in the decreasing order
of their MAP scores and present only the three best performing variants in Table 5.1.For complete results see the companion page for this study (Appendix A).
In Table 5.1 (top half), we show the MAP scores and details of the parameter settingsfor MSDcmd
iitm dataset. We see that the best performing variant has a MAP score of0.413. Having a look at the whole list, we observe that for MSDcmd
iitm , in the rankedlist of the 560 variants, top performing variants consistently use higher sampling rates(either w100, or w67). This can be attributed to the rapid oscillatory melodic move-ments present in Carnatic music, whose preservation requires a higher sampling rate.The top variants invariably use the zero mean normalization (Zmean), suggesting thatthere are several repeated instances of the melodic patterns that are pitch transposedwithin the same recording. In addition, top variants also use DTW-based distancewith local constraint (either DDTW_L1_G10 or DDTW_L1_G90), indicating that melodicfragments are prone to large pathological warping that can significantly degrade theperformance. We do not observe any consistency in the usage of uniform time-scaling.
CMD HMD
134 MELODIC PATTERN PROCESSING
Dataset MAP Srate Norm TScale Dist
MSDcmdiitm
0.279 w67 Zmean Won DDTW_L1_G10
0.277 w67 ZtonicQ12 Won DDTW_L1_G10
0.275 w100 ZtonicQ12 Won DDTW_L1_G10
MSDhmdiitb
0.259 w40 ZtonicQ12 Won DDTW_L1_G90
0.259 w100 ZtonicQ12 Won DDTW_L1_G90
0.259 w67 ZtonicQ12 Won DDTW_L1_G90
Table 5.2: MAP score and details of the parameter settings for the three best performing vari-ants for MSDcmd
iitm and MSDhmdiitb dataset. These results are corresponding to the experimental
setup where the target patterns’ length is not based on the ground-truth annotations, but istaken to be the same as the query pattern length. Srate: sampling rate of the melody repres-entation, Norm: normalization technique, TScale: uniform time-scaling and Dist: distancemeasure.
are segmented using the ground-truth annotations. We now present the results corresponding to the other experimental setup described in Section 5.2.2, where the target pattern lengths are considered to be the same as that of the query pattern. We evaluate all 560 variants of the method, as done above, on both datasets under this experimental setup. In Table 5.2 we show the MAP scores of the top performing variants for both datasets, MSDcmd_iitm and MSDhmd_iitb. We find that the MAP score for the best performing variant decreases from 0.41 to 0.28 for MSDcmd_iitm and from 0.55 to 0.26 for MSDhmd_iitb. This indicates that the melodic similarity computation task becomes much more challenging in the absence of an accurate melodic segmentation method. For this experimental setup, the trend in the sampling rate for Carnatic and Hindustani music remains the same as in Table 5.1, which is that a higher sampling rate is desired for representing melodic patterns in Carnatic music. In terms of normalization, we see that, surprisingly, ZtonicQ12 is used by the top performing variants for both datasets. Note that among the other variants whose performance is not statistically significantly different from the ones reported in Table 5.2, Zmean and Ztonic normalization are also used for the MSDcmd_iitm and MSDhmd_iitb datasets, respectively. An interesting observation is that in this experimental setup uniform time-scaling is consistently used by all the top performing variants. This indicates that such a time-scaling operation is immensely advantageous in retrieval scenarios where the length of the target melodic patterns is taken to be the same as that of the query pattern (i.e. pattern segmentation is not known). We also see that DDTW_L1_G10 and DDTW_L1_G90 are always used for the MSDcmd_iitm and MSDhmd_iitb datasets, respectively, indicating that applying a local constraint in DTW is critical for this experimental setup.
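The two ingredients that recur in the top variants, per-pattern normalization followed by an L1-cost DTW under a warping constraint, can be sketched as follows. This is a minimal illustration, not the thesis implementation: the `band` parameter stands in for the G10/G90 constraint named above, and the function names are ours.

```python
def zmean(p):
    """Zero-mean (Zmean) normalization: subtract the pattern's own mean pitch,
    so pitch-transposed repetitions of a pattern become directly comparable."""
    m = sum(p) / len(p)
    return [v - m for v in p]

def dtw_l1(x, y, band=0.1):
    """DTW with an L1 local cost and a warping band whose half-width is a
    fraction `band` of the longer sequence (a stand-in for the G10/G90
    constraint in the text; the exact constraint used in the thesis may
    differ)."""
    n, m = len(x), len(y)
    w = max(1, int(band * max(n, m)))
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(x[i - 1] - y[j - 1])  # L1 local cost
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

With Zmean applied first, two transposed copies of the same contour obtain a DTW distance of zero, which is the behaviour the discussion above attributes to the best variants.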
[Slides: Results: Summary. Parameter trends per collection (sampling rate, normalization, time-scaling, distance, local and global constraints; mean-based normalization and DTW with a local constraint for Carnatic, tonic-based normalization for Hindustani; best MAP 0.41 for Carnatic and 0.55 for Hindustani), and the known vs. unknown phrase-segmentation setups for Carnatic and Hindustani music.]
• Gulati, S., Serrà, J., & Serra, X. (2015). An evaluation of methodologies for melodic similarity in audio recordings of Indian art music. In ICASSP, pp. 678–682.
[Slide: Improving Melodic Similarity. Pipeline: Audio1, Audio2 → Predominant Pitch Estimation → Pitch Normalization → Uniform Time-scaling → Distance Computation → Melodic Similarity; annotated with culture specificities.]
[Slide: Improving Melodic Similarity. Example: Hindustani music.]
[Slide: Improving Melodic Similarity. Complexity-invariant distance measure:]
• G. E. Batista, X. Wang, and E. J. Keogh. A complexity-invariant distance measure for time series. In SDM, volume 11, pp. 699–710, 2011.
[Figure residue: pitch contours (frequency in cents vs. time in seconds) of occurrences of the same phrase in Carnatic music, with three marked segments differing in melodic complexity.]
based distance (DDTW) in order to compute the final similarity score Df = γ · DDTW. We compute γ as:

γ = max(zi, zj) / min(zi, zj),  (5.1)

zi = √( Σ_{i=1}^{N−1} (pi − pi+1)² ),  (5.2)
where zi is the complexity estimate of a melodic pattern of length N samples and pi is the pitch value of the ith sample. We explore two variants of this complexity estimate. One of these variants was already proposed in Batista et al. (2011) and is described in Eq. 5.2. We denote this method variant by MCW1. We propose another variant that utilizes melodic characteristics of Carnatic music. This variant takes the number of saddle points in the melodic phrase as the complexity estimate (Ishwar et al., 2013). This method variant is denoted by MCW2. As saddle points we consider all the local minima and maxima in the pitch contour that have at least one minimum-to-maximum distance of half a semitone. Since such melodic characteristics are predominantly present in Carnatic music, the complexity weighting is not applicable for computing melodic similarity in Hindustani music.
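The two complexity estimates and the weighting of Eq. 5.1 can be sketched as follows. This is a minimal illustration under our own reading of the text: the saddle-point detection in MCW2 is simplified, the 50-cent threshold assumes a pitch contour expressed in cents, and all names are ours.

```python
import math

def complexity_mcw1(p):
    """MCW1 (Batista et al., 2011; Eq. 5.2): square root of the summed
    squared first differences of the pitch contour."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, p[1:])))

def complexity_mcw2(p, min_excursion=50.0):
    """MCW2, sketched: count local extrema whose excursion to the adjacent
    extremum is at least half a semitone (50 cents); a simplified
    saddle-point count, not the thesis' exact procedure."""
    d = [(1 if b > a else -1 if b < a else 0) for a, b in zip(p, p[1:])]
    ext = [0]
    for i in range(len(d) - 1):
        if d[i] != 0 and d[i + 1] != 0 and d[i] != d[i + 1]:
            ext.append(i + 1)  # slope changes sign -> local extremum
    ext.append(len(p) - 1)
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(p[b] - p[a]) >= min_excursion)

def weighted_distance(d_dtw, zi, zj):
    """Complexity weighting (Eq. 5.1): Df = gamma * D_DTW,
    gamma = max(zi, zj) / min(zi, zj)."""
    return (max(zi, zj) / min(zi, zj)) * d_dtw
```

The weighting penalizes pairs whose contours differ strongly in complexity, since gamma grows with the ratio of the two estimates.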
5.3.2 Evaluation
5.3.2.1 Dataset and Annotations
For evaluation we use the same music collection as used in Section 5.2, which is described in Section 3.3.3. This collection enables a better comparison of the results with other studies, since it has been used in several other studies for a similar task (Rao et al., 2014; Ross et al., 2012). However, we found a number of issues in the annotations of the melodic phrases, which we corrected; we also extended the dataset by adding 25% more melodic phrase annotations, as explained in Section 3.3.3. We denote this new dataset by MSD_CM and the constituent Carnatic and Hindustani sub-datasets by MSDcmd_CM and MSDhmd_CM, respectively. Similar to the way we perform evaluations in Section 5.2, we evaluate our approach separately on both the Carnatic and Hindustani datasets. In Table 3.6 we summarize the relevant details of the dataset in terms of the number of artists, ragas, audio recordings and total duration. In Table 3.8 we summarize the details of the annotated phrases in terms of their number of instances and basic statistics of the length of the phrases.
5.3.2.2 Setup, Measures and Statistical Significance
The experimental setup and evaluation measures used in this study are exactly thesame as used for the comparative evaluation described in Section 5.2. We consider
[Slide: Improving Melodic Similarity. Pipeline: Audio1, Audio2 → Predominant Pitch Estimation + Post Processing → Pitch Normalization → Svara Duration Truncation → Distance Computation + Complexity Weighting → Melodic Similarity; partial transcription (nyās segmentation); truncation durations {0.1, 0.3, 0.5, 0.75, 1.0, 1.5, 2.0} s.]
[Slide: Datasets. Carnatic and Hindustani music datasets, with per-dataset counts of recordings, duration, rāgas and pattern occurrences.]
MSDhmd_CM
Norm     MB           MDT          MCW1   MCW2
Ztonic   0.45 (0.25)  0.52 (0.24)  -      -
Zmean    0.25 (0.20)  0.31 (0.23)  -      -
Ztetra   0.40 (0.23)  0.47 (0.23)  -      -

MSDcmd_CM
Norm     MB           MDT          MCW1         MCW2
Ztonic   0.39 (0.29)  0.42 (0.29)  0.41 (0.28)  0.41 (0.29)
Zmean    0.39 (0.26)  0.45 (0.28)  0.43 (0.27)  0.45 (0.27)
Ztetra   0.45 (0.26)  0.50 (0.27)  0.49 (0.28)  0.51 (0.27)

Table 5.3: MAP scores for the two datasets MSDhmd_CM and MSDcmd_CM for the four method variants MB, MDT, MCW1 and MCW2 and for different normalization techniques. Standard deviation of average precision is reported within round brackets.
We first analyse the results for the MSDhmd_CM dataset. From Table 5.3 (upper half), we see that the proposed method variant that applies a duration truncation performs better than the baseline method for all the normalization techniques. Moreover, this difference is found to be statistically significant in each case. The results for MSDhmd_CM in this table correspond to F = 500 ms, for which we obtain the highest accuracy compared to the other F values, as shown in Figure 5.10. Furthermore, we see that Ztonic results in the best accuracy for MSDhmd_CM for all the method variants, and the difference is found to be statistically significant in each case. From Table 5.3, we notice a high standard deviation of the average precision values. This is because some occurrences of melodic phrases possess a large amount of melodic variation (acting as outliers), and therefore the average precision of the results retrieved using them as a query is much smaller compared to the other occurrences. In Figure 5.11, we show a boxplot of average precision values for each phrase category and for both MB and MDT to get a better understanding of the results. We observe that, with the exception of phrase category H2, MDT consistently performs better than MB for all the other phrase categories. A close examination of this exception reveals that the error is often in the segmentation of the steady svara regions of the melodic phrases corresponding to H2. This can be attributed to a specific subtle melodic movement in H2 that is confused by the segmentation method as a melodic ornament instead of a svara transition, leading to a segmentation error.

We now analyse the results for the MSDcmd_CM dataset. From Table 5.3 (lower half), we see that using the method variants MDT, MCW1 and MCW2 we obtain reasonably higher
[Figure residue: boxplots of average precision per phrase category (H1–H5 and C1–C5) for the variants MB, MDT and MCW2; MAP (roughly 0.44–0.52) vs. truncation duration F in seconds for the HMD and CMD datasets.]
[Slide: duration truncation (what is the optimal duration?) and complexity weighting.]
[Slide: Results: Summary. Duration truncation (Carnatic and Hindustani); complexity weighting (Carnatic).]

• Gulati, S., Serrà, J., & Serra, X. (2015). Improving melodic similarity in Indian art music using culture-specific melodic characteristics. In ISMIR, pp. 680–686.
[Slide: Melodic Pattern Processing. Melodic similarity (supervised), pattern discovery (unsupervised), pattern characterization (unsupervised).]
[Slide: Pattern Discovery. Intra-recording pattern discovery, inter-recording pattern search, rank refinement (R1, R2, R3).]
[Slide: Intra-recording Pattern Discovery. Dynamic time warping (DTW); global constraint on pattern length; no local constraint; step set {(0,1), (1,1), (1,0)}.]
[Slide: Rank Refinement. Local constraint; step set {(2,1), (1,1), (1,2)}; local cost function variants (V1, V2, V3, V4).]
[Slide: Computational Complexity. O(n) samples and O(n) candidate patterns yield tens of millions of pattern pairs; motivates distance lower bounds.]
[Slide: Lower Bounds for DTW. DTrue ≥ DLB; candidates are pruned against a top-N queue of the best distances found so far.]
[Slide: Lower Bounds for DTW]
• Cascaded lower bounds [Rakthanmanon et al. (2013)]
• First-Last bound [Kim et al. (2001)]
• LB_Keogh [Keogh & Ratanamahatana (2004)]
• Early abandoning
• Kim, S. W., Park, S., & Chu, W. W. (2001). An index-based approach for similarity search supporting time warping in large sequence databases. In 17th International Conference on Data Engineering, pp. 607–614.
• Keogh, E. & Ratanamahatana, C. A. (2004). Exact indexing of dynamic time warping. Knowledge and Information Systems, 7(3), 358–386.
• Rakthanmanon, T., Campana, B., Mueen, A., Batista, G., Westover, B., Zhu, Q., Zakaria, J., & Keogh, E. (2013). Addressing big data time series: mining trillions of time series subsequences under dynamic time warping. ACM Transactions on Knowledge Discovery from Data (TKDD), 7(3), 10:1–10:31.
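Of these, LB_Keogh is simple to sketch: build an envelope around the query within the allowed warping window and charge each candidate sample only for leaving that envelope. The version below is our own minimal illustration (an L1 accumulation to match an L1 DTW cost; it assumes equal-length sequences). The bound never exceeds the true DTW distance, so a candidate whose bound already exceeds the worst distance in the top-N queue can be discarded without running DTW.

```python
def lb_keogh(query, cand, w):
    """LB_Keogh lower bound for DTW (Keogh & Ratanamahatana, 2004), sketched:
    sum of how far each candidate sample falls outside the query's
    [min, max] envelope within warping window `w` (in samples)."""
    n = len(query)
    lb = 0.0
    for i in range(n):
        window = query[max(0, i - w): i + w + 1]
        u, l = max(window), min(window)  # query envelope at position i
        if cand[i] > u:
            lb += cand[i] - u
        elif cand[i] < l:
            lb += l - cand[i]
    return lb
```

Because the envelope widens with `w`, the bound is tightest for small warping windows, which is where pruning pays off most.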
[Slide: Dataset. Seed patterns: 78,000; search patterns: ~15,000,000; Carnatic music (recordings, duration).]
[Slides: Results. Number of DTW distance computations (on the order of trillions; 1 T = 10^12) for intra- and inter-recording search, and the percentage of computations pruned by each cascaded lower bound (First-Last, LB_Keogh variants).]
Seed Category   V1    V2    V3    V4
S1              0.92  0.92  0.91  0.89
S2              0.68  0.73  0.73  0.66
S3              0.35  0.34  0.35  0.35

Table 5.5: MAP scores for four variants of the rank refinement method (Vi) for each seed category (S1, S2 and S3).
Next, we evaluate the performance of the inter-recording pattern detection task and assess the effect of the four DTW cost variants explained in Section 5.4.3 (denoted by V1...V4). To investigate the dependence of the performance on the category of the seed pair, we perform the evaluation within each seed category. In Table 5.5 we show the MAP scores obtained for the different variants of the rank refinement method (V1, V2, V3 and V4) for each seed category (S1, S2 and S3). In addition, we also present a box plot of the corresponding average precision values in Figure 5.20. In general, we observe that every method performs well for category S1, with a MAP score around 0.9 and no statistically significant difference between the variants. For category S2, V2 and V3 perform better than the rest, and the difference is found to be statistically significant. The performance is poor for the third category S3 for every variant. The difference in performance between any two methods across seed categories is statistically significant. We observe that MAP scores across different seed categories correlate strongly with the fraction of melodically similar seed pairs in that category (discussed above). This means that the seed pattern pairs that have a high distance D between them obtain low average precision when used as query patterns, and vice versa. This might be because across repetitions of such patterns there could be a high degree of melodic variation, which the current similarity measure appears inadequate to model. In addition, closely repeating seed pattern pairs (i.e. with low distance D between them) might have more repetitions with a low degree of melodic variation, for which the current similarity measure performs well.

Finally, we analyse the distance distributions of melodically similar and dissimilar search patterns. For this we use the best performing rank-refinement variant V2. To examine the separation in the distance distribution, we compute the ROC curve as shown in Figure 5.19 (dashed red line). We observe that the separability between melodically similar and dissimilar subsequences in this case is poorer than the one obtained for the seed pairs (solid blue line). This suggests that it is much harder to differentiate melodically similar from dissimilar patterns when the search is performed across recordings. This can be attributed to the total number of melodically dissimilar subsequences (irrelevant documents) for every query subsequence, which is orders of magnitude higher in this task compared to the task of pattern discovery within a recording. In addition, one of the reasons can also be that the melodic phrases of two allied ragas (Section 2.3.2) are differentiated based on subtle melodic nuances (Viswanathan &
[Figure residue: boxplots of average precision per seed category (S1, S2, S3).]
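The separability that the ROC analysis above summarizes can be quantified as the area under the curve, i.e. the probability that a randomly chosen melodically similar pair receives a smaller distance than a dissimilar one. A pure-Python sketch (O(n·m) pairwise comparison; the function name is ours):

```python
def auc_separability(d_similar, d_dissimilar):
    """AUC of the ROC over two distance distributions: the probability that
    a melodically similar pair gets a smaller distance than a dissimilar
    one (ties count half). 1.0 = perfect separation, 0.5 = chance."""
    wins = 0.0
    for ds in d_similar:
        for dd in d_dissimilar:
            if ds < dd:
                wins += 1.0
            elif ds == dd:
                wins += 0.5
    return wins / (len(d_similar) * len(d_dissimilar))
```

A value near 0.5 corresponds to the overlapping distributions observed for the inter-recording search, while values near 1.0 correspond to the well-separated seed-pair case.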
[Slide: Results. Annotated patterns; local cost function; distance separability; intra- vs. inter-recording pattern detection.]
[Slide: Results. Intra- vs. inter-recording ROC operating points (a markedly higher true positive rate for intra-recording search at the same false positive rate); best local cost function: city block.]

• Gulati, S., Serrà, J., Ishwar, V., & Serra, X. (2014). Mining melodic patterns in large audio collections of Indian art music. In SITIS-MIRA, pp. 264–271.
[Slide: Melodic Pattern Processing. Melodic similarity (supervised), pattern discovery (unsupervised), pattern characterization (unsupervised).]
[Slide: Melodic Pattern Characterization. Gamakas and alankāras, composition-specific patterns, rāga motifs.]
[Slide: Melodic Pattern Characterization. Pipeline: Carnatic music collection → Data Processing → Melodic Pattern Discovery (intra-recording pattern discovery, inter-recording pattern detection) → Pattern Characterization (pattern network generation, network filtering, community detection and characterization) → rāga motifs.]
[Slide: Melodic Pattern Characterization. Pattern network generation (undirected network).]

• M. E. J. Newman, "The structure and function of complex networks," Society for Industrial and Applied Mathematics (SIAM) Review, vol. 45, no. 2, pp. 167–256, 2003.
[Slide: Melodic Pattern Characterization. Network filtering.]

• S. Maslov and K. Sneppen, "Specificity and stability in topology of protein networks," Science, vol. 296, no. 5569, pp. 910–913, 2002.
[Slide: Melodic Pattern Characterization. Community detection and characterization.]

• V. D. Blondel, J. L. Guillaume, R. Lambiotte, and E. Lefebvre, "Fast unfolding of communities in large networks," Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, P10008, 2008.
• S. Fortunato, "Community detection in graphs," Physics Reports, vol. 486, no. 3, pp. 75–174, 2010.
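The last two stages, network filtering and community detection, can be sketched as follows. This is a toy illustration with hypothetical pattern nodes and distances; connected components of the filtered network stand in here for proper community detection (the thesis uses the Louvain method of Blondel et al., 2008).

```python
from collections import defaultdict

def filter_network(edges, threshold):
    """Network filtering: keep only edges whose distance is at most
    `threshold`. `edges` is a list of (node_a, node_b, distance) tuples."""
    return [(a, b) for a, b, d in edges if d <= threshold]

def communities(edges):
    """Group nodes of the filtered, undirected pattern network. Connected
    components are a stand-in for community detection proper."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comms = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # depth-first traversal of one component
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comms.append(comp)
    return comms

# Toy pattern network: nodes are melodic patterns, edge weight = melodic
# distance (lower = more similar); the values are illustrative only.
toy = [("p1", "p2", 0.1), ("p2", "p3", 0.2), ("p4", "p5", 0.1),
       ("p3", "p4", 0.9)]
groups = communities(filter_network(toy, threshold=0.5))
```

Filtering out the weak p3-p4 link splits the toy network into the two groups {p1, p2, p3} and {p4, p5}, which is the effect the threshold D has in Figure 5.24.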
[Slide: Melodic Pattern Characterization. Expected properties of the community types:]

Community type        Nodes   Recordings  Rāgas
Gamakas               High    High        High
Composition phrases   Medium  Less        Less
Rāga phrases          Medium  Medium      Less
[Slide: Melodic Pattern Characterization. Rāga distribution within a community.]
5.5 CHARACTERIZATION OF MELODIC PATTERNS 169
Figure 5.24: Graphical representation of the melodic pattern network after filtering by threshold D. The detected communities in the network are indicated by different colors. A few examples of these communities are shown on the right.
rank the communities, we empirically devise a goodness measure G, which denotes the likelihood that a community Cq represents a raga motif. We propose to use

G = N L⁴ c,  (5.6)

where L is an estimate of the likelihood of raga a′q in Cq,

L = Aq,1 / N,  (5.7)

and c indicates how uniformly the nodes of the community are distributed over audio recordings,

c = ( Σ_{l=1}^{Lb} l · Bq,l ) / N.  (5.8)
Higher c implies a more uniform distribution. Since a community that represents a raga motif is expected to contain nodes from a single raga (high value of L) and nodes that belong to many different recordings (high value of c), the goodness measure G is high for such a community. In general we prefer large communities, but, to avoid
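Equations 5.6 and 5.7 can be sketched directly. Here the uniformity term c of Eq. 5.8 is taken as a precomputed input, since it depends on the node-to-recording distribution Bq defined earlier in the thesis; the function names and label format are ours.

```python
from collections import Counter

def raga_likelihood(node_ragas):
    """L = Aq,1 / N (Eq. 5.7): the fraction of the community's N nodes that
    carry its most common raga label."""
    counts = Counter(node_ragas)
    return counts.most_common(1)[0][1] / len(node_ragas)

def goodness(node_ragas, c):
    """G = N * L**4 * c (Eq. 5.6). `c` is the recording-uniformity term of
    Eq. 5.8, passed in precomputed."""
    n = len(node_ragas)
    return n * raga_likelihood(node_ragas) ** 4 * c
```

The fourth power on L makes G drop quickly for communities that mix ragas, while the leading N still favours large single-raga communities.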
[Slide: worked example of the rāga-likelihood term for a community with N = 5 nodes (majority-rāga count 3 → L = 3/5).]
[Slide: worked example of the recording-distribution term c for a community with N = 5 nodes, using the centroid of the nodes-per-recording counts.]
[Slide: Melodic Pattern Characterization. Community detection and characterization: the goodness measure.]
[Slide: Evaluation. Dataset details (recordings, duration, rāgas, compositions); 10 rāgas × 10 communities = 100 phrases; listening test with musicians.]
Raga          µr    σr     Raga      µr    σr
Hamsadhvani   0.84  0.23   Begaḍa    0.88  0.11
Kamavardani   0.78  0.17   Kapi      0.75  0.10
Darbar        0.81  0.23   Bhairavi  0.91  0.15
Kalyāṇi       0.90  0.10   Behag     0.84  0.16
Kambhoji      0.87  0.12   Toḍi      0.92  0.07

Table 5.7: Mean (µr) and standard deviation (σr) of µψ for each raga. Ragas with µr ≥ 0.85 are highlighted.
Figure 5.26: Histogram of µψ for all of the 100 melodic patterns selected for the listening test.
several recordings in the dataset. As described in Section 5.5.1.2, in such a scenario the goodness measure G can be made more robust by computing c using the distribution of nodes Bq over unique compositions rather than over audio recordings.

The results show that the proposed method successfully discovers raga motifs with high accuracy. We see that, even for the allied ragas present in the dataset, such as Kambhoji and Begaḍa (Section 5.5.2.1), the method is able to discover distinct characteristic raga motifs. As mentioned, allied ragas are challenging because they have a substantial overlap in the set of svaras that they comprise (see also Table 5.6). Finally, on a more informal note, it is worth mentioning that the musicians were impressed when, after the listening test, they came to know that the melodic patterns had been discovered by a machine following an unsupervised approach.

We consider this work a preliminary study with a lot of scope for improvement. In particular, the approach can be extended to identify patterns belonging to other categories (gamaka or composition-specific patterns), the definition of the goodness measure G can be improved as suggested in Section 5.5.3, and the listening test could be done more
[Slide: Results. Per-phrase performance, per-rāga performance, musicians' agreement.]
[Slide: Results: Summary. Mean per-rāga and per-phrase ratings.]

• Gulati, S., Serrà, J., Ishwar, V., & Serra, X. (2016). Discovering rāga motifs by characterizing communities in networks of melodic patterns. In ICASSP, pp. 286–290.
[Figure residue: histogram of ratings.]

[Slide: Automatic Rāga Recognition]
Chapter 6
Automatic Raga Recognition
6.1 Introduction
In this chapter, we address the task of automatically recognizing ragas in audio recordings of IAM. We describe two novel approaches for raga recognition that jointly capture the tonal and the temporal aspects of melody. The contents of this chapter are largely based on our published work in Gulati et al. (2016a,b).
Raga is a core musical concept used in compositions, performances, music organization, and pedagogy of IAM. Even beyond art music, numerous compositions in Indian folk and film music are also based on ragas (Ganti, 2013). Raga is therefore one of the most desired melodic descriptions of a recorded performance of IAM, and an important criterion used by listeners to browse audio music collections. Despite its significance, there exists a large volume of audio content whose raga is incorrectly labeled or, simply, unlabeled. A computational approach to raga recognition will allow us to automatically annotate large collections of audio music recordings. It will enable raga-based music retrieval in large audio archives, semantically meaningful music discovery, and musicologically informed navigation. Furthermore, a deeper understanding of the raga framework from a computational perspective will pave the way for building applications for music pedagogy in IAM.
Raga recognition is the most studied research topic in MIR of IAM. There exist a considerable number of approaches utilizing different characteristic aspects of ragas such as the svara set, svara salience, and arohana-avrohana. A critical in-depth review of the existing approaches for raga recognition is presented in Section 2.4.3, wherein we identify several shortcomings in these approaches and possible avenues for scientific contribution to take this task to the next level. Here we provide a short summary of this analysis:
• Nearly half of the existing approaches for raga recognition do not utilize the temporal aspects of melody at all (Table 2.3), which are crucial for characterizing a raga.
Methods | Melodic characteristics used¹ | Svara discretization | Temporal aspects
--- | --- | --- | ---
Pandey et al. (2003) | • • | Yes | •
Chordia & Rae (2007) | • • • | Yes | •
Belle et al. (2009) | • • • | No |
Shetty & Achary (2009) | • • | Yes | •
Sridhar & Geetha (2009) | • • | Yes | •
Koduri et al. (2011) | • • | Yes |
Ranjani et al. (2011) | • | No |
Chakraborty & De (2012) | • | Yes |
Koduri et al. (2012) | • • • | Both |
Chordia & Sentürk (2013) | • • • | No |
Dighe et al. (2013a) | • • • | Yes | •
Dighe et al. (2013b) | • • | Yes |
Koduri et al. (2014) | • • • | No |
Kumar et al. (2014) | • • • • | Yes | •
Dutta et al. (2015)ᵃ | • | No | •

¹ Each • marks one of: svara set, svara salience, svara intonation, arohana-avrohana, melodic phrases.
ᵃ This method performs raga verification and not recognition.

Table 2.3: Raga recognition methods proposed in the literature along with the melodic characteristics they utilize to perform the task. We also indicate if a method uses a discrete svara representation of melody. The methods are arranged in chronological order.
Method | Tonal feature | Tonic identification | Feature | Recognition method | #Ragas | Dataset (Dur./Num.)ᵃ | Audio type
--- | --- | --- | --- | --- | --- | --- | ---
Pandey et al. (2003) | Pitch (Boersma & Weenink, 2001) | NA | Svara sequence | HMM and n-gram | 2 | - / 31 | MP
Chordia & Rae (2007) | Pitch (Sun, 2000) | Manual | PCD, PCDD | SVM classifier | 31 | 20 / 127 | MP
Belle et al. (2009) | Pitch (Rao & Rao, 2009) | Manual | PCD (parameterized) | k-NN classifier | 4 | 0.6 / 10 | PP
Shetty & Achary (2009) | Pitch (Sridhar & Geetha, 2006) | NA | #Svaras, vakra svaras | Neural network classifier | 20 | - / 90 | MP
Sridhar & Geetha (2009) | Pitch (Lee, 2006) | Singer identification | Svara set, its sequence | String matching | 3 | - / 30 | PP
Koduri et al. (2011) | Pitch (Rao & Rao, 2010) | Brute force | PCD | k-NN classifier | 10 | 2.82 / 170 | PP
Ranjani et al. (2011) | Pitch (Boersma & Weenink, 2001) | GMM fitting | PDE | SC-GMM and set matching | 7 | - / 48 | PP
Chakraborty & De (2012) | Pitch (Sengupta, 1990) | Error minimization | Svara set | Set matching | - | - / - | -
Koduri et al. (2012) | Predominant pitch (Salamon & Gómez, 2012) | Multipitch-based | PCD variants | k-NN classifier | 43 | - / 215 | PP
Chordia & Sentürk (2013) | Pitch (Camacho, 2007) | Brute force | PCD variants | k-NN and statistical classifiers | 31 | 20 / 127 | MP
Dighe et al. (2013a) | Chroma (Lartillot et al., 2008) | Brute force (vadi-based) | Chroma, timbre features | HMM | 4 | 9.33 / 56 | PP
Dighe et al. (2013b) | Chroma (Lartillot et al., 2008) | Brute force (vadi-based) | PCD variant | RF classifier | 8 | 16.8 / 117 | PP
Koduri et al. (2014) | Predominant pitch (Salamon & Gómez, 2012) | Multipitch-based | PCD (parameterized) | Different classifiers | 45? | 93 / 424 | PP
Kumar et al. (2014) | Predominant pitch (Salamon & Gómez, 2012) | Brute force | PCD + n-gram distribution | SVM classifier | 10 | 2.82 / 170 | PP
Dutta et al. (2015)* | Predominant pitch (Salamon & Gómez, 2012) | Cepstrum-based | Pitch contours | LCS with k-NN | 30† | 3 / 254 | PP

ᵃ In the case of multiple datasets we list the larger one. * This method performs raga verification and not recognition. ? Authors do not use all 45 ragas at once in a single experiment, but consider groups of 3 ragas per experiment. † Authors finally use only 17 ragas in their experiments.
ABBREVIATIONS: Dur.: duration of the dataset; Num.: number of recordings; NA: not applicable; '-': not available; SC-GMM: semi-continuous GMM; MP: monophonic; PP: polyphonic.

Table 2.4: Summary of the raga recognition methods proposed in the literature. The methods are arranged in chronological order.
[Slide: Proposed Approaches — phrase-based rāga recognition; TDM-based rāga recognition (tonal and temporal discretization).]
[Slide: Phrase-based Rāga Recognition — block diagram: Music collection → Data Processing → Melodic Pattern Discovery (Intra-recording Pattern Discovery, Inter-recording Pattern Detection) → Melodic Pattern Clustering (Pattern Network Generation, Similarity Thresholding, Community Detection) → TF-IDF Feature Extraction (Vocabulary Extraction, Term-frequency Feature Extraction, Feature Normalization) → Feature Matrix.]
182 AUTOMATIC RAGA RECOGNITION
In the next step of our method we group together similar melodic patterns (Figure 6.1). To do so, we detect non-overlapping communities in the network of melodic patterns using the method proposed by Blondel et al. (2008). This community detection method is based on optimizing the modularity of the network and is parameter-free from the user's point of view. This method is capable of handling very large networks and has been extensively used in various applications (Fortunato, 2010). We use its implementation available in networkX (Hagberg et al., 2008), a Python language package for exploration and analysis of networks and network algorithms. Note that, from now on, the melodic patterns grouped within a community are regarded as the occurrences of a single melodic phrase. Thus, a community essentially represents a melodic phrase or motif.
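The community-detection step can be sketched as follows. This is an illustrative stand-in, not the thesis pipeline: it uses networkx's greedy modularity optimizer (Clauset–Newman–Moore) rather than the exact Louvain implementation of Blondel et al. (2008), and the toy graph (two cliques joined by a bridge) is a hypothetical pattern network.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy network of melodic patterns: nodes are pattern occurrences, edges link
# pairs whose melodic distance fell below the similarity threshold.
G = nx.Graph()
G.add_edges_from([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),  # clique 1
                  (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),  # clique 2
                  (3, 4)])                                          # weak bridge

# Non-overlapping communities by modularity optimization; each community is
# then treated as the set of occurrences of one melodic phrase.
communities = greedy_modularity_communities(G)
phrases = [set(c) for c in communities]
```

On this toy network the optimizer recovers the two cliques as two communities, i.e. two melodic phrases.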
6.2.1.3 TF-IDF Feature Extraction
As mentioned above, we draw an analogy between raga rendition and textual description of a topic. Using this analogy we represent each audio recording using a vector space model, wherein melodic patterns are considered as words (or terms). This process is divided into three blocks (Figure 6.1).
We start by building our vocabulary V, which translates to selecting relevant pattern communities for characterizing ragas. For this, we include all the detected communities except the ones that comprise patterns extracted from only a single audio recording. Such communities are analogous to the words that only occur within a document and, hence, are irrelevant for modeling a topic.
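The vocabulary-pruning rule above amounts to a document-frequency filter. The sketch below assumes a hypothetical intermediate, a mapping from community id to the set of recordings its patterns came from:

```python
def build_vocabulary(pattern_recordings):
    """Keep only communities whose patterns span more than one recording.

    pattern_recordings: dict mapping community id -> set of recording ids.
    Communities confined to a single recording are dropped, analogous to
    words that occur in only one document.
    """
    return {comm for comm, recs in pattern_recordings.items() if len(recs) > 1}

vocab = build_vocabulary({
    "c1": {"rec1", "rec2"},   # appears in two recordings: kept
    "c2": {"rec3"},           # single recording: dropped
    "c3": {"rec1", "rec3"},   # kept
})
```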
We experiment with three different sets of features f1, f2 and f3, which are similar to the TF-IDF features typically used in text information retrieval. We denote our corpus by R, comprising N_R = |R| recordings. A melodic phrase and a recording are denoted by ψ and r, respectively.
$$
f_1(\psi, r) =
\begin{cases}
1, & \text{if } f(\psi, r) > 0 \\
0, & \text{otherwise}
\end{cases}
\qquad (6.1)
$$
where f(ψ,r) denotes the raw frequency of occurrence of pattern ψ in recording r. f1 only considers the presence or absence of a pattern in a recording. In order to investigate if the frequency of occurrence of melodic patterns is relevant for characterizing ragas, we take f2(ψ,r) = f(ψ,r). As mentioned, the melodic patterns that occur across different ragas and in several recordings do not aid raga recognition. Therefore, to reduce their effect in the feature vector we employ a weighting scheme similar to the inverse document frequency (idf) weighting in text retrieval.
$$
f_3(\psi, r) = f(\psi, r)\, I(\psi, R) \qquad (6.2)
$$
$$
I(\psi, R) = \log\!\left( \frac{N_R}{\lvert \{ r \in R : \psi \in r \} \rvert} \right) \qquad (6.3)
$$
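Equations (6.1)–(6.3) translate directly into code. The sketch below uses a toy corpus whose dictionary layout is illustrative, not the thesis data format, and it omits the pipeline's separate feature-normalization block:

```python
import math

# Toy corpus: recording id -> {pattern id: raw occurrence count f(psi, r)}
corpus = {
    "rec1": {"p1": 3, "p2": 1},
    "rec2": {"p1": 1, "p3": 2},
    "rec3": {"p3": 4},
}
N_R = len(corpus)  # number of recordings |R|

def doc_freq(psi):
    """Number of recordings in which pattern psi occurs: |{r in R : psi in r}|."""
    return sum(1 for counts in corpus.values() if psi in counts)

def f1(psi, r):
    """Presence/absence of a pattern in a recording, Eq. (6.1)."""
    return 1 if corpus[r].get(psi, 0) > 0 else 0

def f2(psi, r):
    """Raw frequency of occurrence f(psi, r)."""
    return corpus[r].get(psi, 0)

def f3(psi, r):
    """Idf-weighted frequency, Eqs. (6.2)-(6.3)."""
    df = doc_freq(psi)
    return corpus[r].get(psi, 0) * math.log(N_R / df) if df else 0.0
```

For instance, f3("p1", "rec1") evaluates to 3 · log(3/2): the pattern occurs three times in rec1 and in two of the three recordings.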
6.2 PATTERN-BASED RAGA RECOGNITION
Method | Feature | SVML | SGD | NBM | NBG | RF | LR | 1-NN
--- | --- | --- | --- | --- | --- | --- | --- | ---
MVSM | f1 | 51.04 | 55 | 37.5 | 54.37 | 25.41 | 55.83 | -
MVSM | f2 | 45.83 | 50.41 | 35.62 | 47.5 | 26.87 | 51.87 | -
MVSM | f3 | 45.83 | 51.66 | **67.29** | 44.79 | 23.75 | 51.87 | -
MPC | PCDfull | - | - | - | - | - | - | **73.12**
MGK | PDparam | 30.41 | 22.29 | 27.29 | 28.12 | 42.91 | 30.83 | 25.62
MGK | PDcontext | 54.16 | 43.75 | 5.2 | 33.12 | 49.37 | **54.79** | 26.25

Table 6.1: Accuracy (%) of raga recognition on the RRDCMD dataset by MVSM and other methods using different features and classifiers. Bold text signifies the best accuracy by a method among all its variants.
Hindustani music recordings. We therefore do not consider this method for comparing results on Hindustani music.

The authors of MPC courteously ran the experiments on our dataset using the original implementation of the method. For MGK, the authors kindly extracted the features (PDparam and PDcontext) using the original implementation of their method, and the experiments using different classification strategies were done by us.
6.2.3 Results and Discussion
Before we proceed to present our results, we notify readers that the accuracies reported in this section for different methods vary slightly from the ones reported in Gulati et al. (2016b). As mentioned, the experimental setup used here is different compared to the paper (Section 6.2.2.2). In addition, the parameters used for discovering melodic patterns are also different (Section 6.2.1.1). Furthermore, we do not consider the dataset that comprises only 10 ragas, as done in Gulati et al. (2016b), for which our method was shown to outperform the state of the art.
In Table 6.1 and Table 6.2, we present the results of our proposed method MVSM and the two state-of-the-art methods MPC and MGK for the two datasets RRDCMD and RRDHMD, respectively. The highest accuracy for every method is highlighted in bold for both datasets. Due to the poor performance of the SVML and NBB classifiers in the experiments, we do not consider them in our further analysis.
We start by analyzing the results of the variants of MVSM for both the datasets. From Table 6.1 and Table 6.2, we see that the highest accuracy obtained by MVSM for RRDCMD is 67.29%, and for RRDHMD is 82.66%. The difference in the performance across the two datasets can be attributed to several factors. One of them is the difference in the number of ragas across the two datasets: 40 in RRDCMD and 30 in RRDHMD. The performance difference can also be because of the difference in the
Method | Feature | SVML | SGD | NBM | NBG | RF | LR | 1-NN
--- | --- | --- | --- | --- | --- | --- | --- | ---
MVSM | f1 | 71 | 72.33 | 69.33 | 79.33 | 38.66 | 74.33 | -
MVSM | f2 | 65.33 | 64.33 | 67.66 | 72.66 | 40.33 | 68 | -
MVSM | f3 | 65.33 | 62.66 | **82.66** | 72 | 41.33 | 67.66 | -
MPC | PCDfull | - | - | - | - | - | - | **91.66**

Table 6.2: Accuracy (%) of raga recognition on the RRDHMD dataset by MVSM and other methods using different features and classifiers. Bold text signifies the best accuracy by a method among all its variants.
overall length of the audio recordings. Recordings of the music pieces considered in our datasets are significantly longer in Hindustani music compared to Carnatic music (Section 3.3.4). However, since the melodic characteristics and complexity vary considerably across the two music traditions, a definite reason can only be identified by performing the experiments with an equal number of ragas in the dataset and equal duration of the music pieces (Section 6.4).
From Table 6.1 and Table 6.2, we see that for both datasets, the accuracy obtained by MVSM across the feature sets differs substantially. We also see that their performance is very sensitive to the choice of the classifier. With the exceptions of the NBM and LR classifiers, feature f1 in general performs better than the other two features, and the difference is found to be statistically significant in each case. This suggests that considering just the presence or the absence of a melodic pattern, irrespective of its frequency of occurrence, might be sufficient for raga recognition. Interestingly, this finding is consistent with the fact that characteristic melodic patterns are unique to a raga and a single occurrence of such patterns is sufficient to identify the raga (Krishna & Ishwar, 2012). However, the best performance is obtained by the combination of f3 and the NBM classifier, and the difference in its performance compared to any other variant is statistically significant. The NBM classifier outperforming other classifiers using appropriate features is also well recognized in the text classification community (McCallum & Nigam, 1998). We, therefore, only consider the combination of f3 and the NBM classifier for comparing MVSM with the other methods. Overall, the accuracies across different features indicate that normalizing the frequency of occurrence of melodic patterns (as done for f1 and f3) is beneficial for raga recognition. A plausible reason can be that a normalization procedure reduces the weight of the type of melodic patterns that occur more frequently, such as gamaka-type patterns (Section 2.3.2), and are not the characteristic patterns of any particular raga. It is worth noting that the feature weights assigned by a classifier can be used to identify the relevant melodic patterns for raga recognition. These patterns can serve as a dictionary of semantically-meaningful melodic units for many computational tasks in IAM.
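The winning combination, frequency-based pattern features classified with multinomial naive Bayes, can be illustrated as below. This is a sketch in spirit only, not the thesis experiment: it uses scikit-learn's MultinomialNB on a hypothetical 4-recording, 4-pattern feature matrix with made-up counts and raga labels.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Rows: recordings; columns: (hypothetical) pattern-community frequencies,
# analogous to the tf-idf-style features f1-f3. Labels: raga of each recording.
X_train = np.array([[5, 0, 1, 0],
                    [4, 1, 0, 0],
                    [0, 0, 3, 2],
                    [0, 1, 4, 1]])
y_train = ["raga_A", "raga_A", "raga_B", "raga_B"]

# Multinomial naive Bayes models per-class feature counts, which suits
# frequency-of-occurrence features (McCallum & Nigam, 1998).
clf = MultinomialNB().fit(X_train, y_train)
predictions = clf.predict([[3, 0, 0, 0], [0, 0, 2, 2]])
```

The first test recording is dominated by the first pattern (frequent in raga_A renditions of the toy data) and the second by the last two patterns, so the classifier assigns them to raga_A and raga_B respectively.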
3 4 5 6 7 8 9 10 11 12 13 14
� (bin index)
0.06
0.08
0.10
0.12
0.14
0.16
0.18
0.20
0.22
Clu
ster
ing
Coe
ffici
ent(
C)
0
10
20
30
40
50
60
70
Acc
urac
y(%
)
C(G) � C(Gr) Accuracy of MVSM
3 4 5 6 7 8 9 10 11 12 13 14
� (bin index)
0.06
0.08
0.10
0.12
0.14
0.16
0.18
0.20
0.22
0.24
Clu
ster
ing
Coe
ffici
ent(
C)
30
40
50
60
70
80
90
Acc
urac
y(%
)
C(G) � C(Gr) Accuracy of MVSM
149
6.2 PATTERN-BASED RAGA RECOGNITION
Method  Feature    SVML   SGD    NBM    NBG    RF     LR     1-NN
MVSM    f1         51.04  55     37.5   54.37  25.41  55.83  -
        f2         45.83  50.41  35.62  47.5   26.87  51.87  -
        f3         45.83  51.66  67.29  44.79  23.75  51.87  -
MPC     PCDfull    -      -      -      -      -      -      73.12
MGK     PDparam    30.41  22.29  27.29  28.12  42.91  30.83  25.62
        PDcontext  54.16  43.75  5.2    33.12  49.37  54.79  26.25

Table 6.1: Accuracy (%) of raga recognition on the RRDCMD dataset by MVSM and the other methods using different features and classifiers. Bold text signifies the best accuracy by a method among all its variants.
Hindustani music recordings. We therefore do not consider this method for comparing results on Hindustani music.
The authors of MPC courteously ran the experiments on our dataset using the original implementation of the method. For MGK, the authors kindly extracted the features (PDparam and PDcontext) using the original implementation of their method, and the experiments using the different classification strategies were done by us.
6.2.3 Results and Discussion
Before we proceed to present our results, we notify readers that the accuracies reported in this section for the different methods vary slightly from the ones reported in Gulati et al. (2016b). As mentioned, the experimental setup used here is different compared to the paper (Section 6.2.2.2). In addition, the parameters used for discovering melodic patterns are also different (Section 6.2.1.1). Furthermore, we do not consider the dataset that comprises only 10 ragas, as done in Gulati et al. (2016b), for which our method was shown to outperform the state of the art.
In Table 6.1 and Table 6.2, we present the results of our proposed method MVSM and the two state-of-the-art methods MPC and MGK on the two datasets RRDCMD and RRDHMD, respectively. The highest accuracy for every method is highlighted in bold for both datasets. Due to the poor performance of the SVML and NBB classifiers in the experiments, we do not consider them in our further analysis.
We start by analyzing the results of the variants of MVSM for both datasets. From Table 6.1 and Table 6.2, we see that the highest accuracy obtained by MVSM for RRDCMD is 67.29%, and for RRDHMD is 82.66%. The difference in performance across the two datasets can be attributed to several factors. One of them is the difference in the number of ragas across the two datasets: 40 in RRDCMD and 30 in RRDHMD. The performance difference can also be because of the difference in the
186 AUTOMATIC RAGA RECOGNITION
Method  Feature  SVML   SGD    NBM    NBG    RF     LR     1-NN
MVSM    f1       71     72.33  69.33  79.33  38.66  74.33  -
        f2       65.33  64.33  67.66  72.66  40.33  68     -
        f3       65.33  62.66  82.66  72     41.33  67.66  -
MPC     PCDfull  -      -      -      -      -      -      91.66

Table 6.2: Accuracy (%) of raga recognition on the RRDHMD dataset by MVSM and the other methods using different features and classifiers. Bold text signifies the best accuracy by a method among all its variants.
overall length of the audio recordings. Recordings of the music pieces considered in our datasets are significantly longer in Hindustani music compared to Carnatic music (Section 3.3.4). However, since the melodic characteristics and complexity vary considerably across the two music traditions, a definite reason can only be identified by performing the experiments with an equal number of ragas in the dataset and an equal duration of the music pieces (Section 6.4).

From Table 6.1 and Table 6.2, we see that for both datasets, the accuracy obtained by MVSM across the feature sets differs substantially. We also see that their performance is very sensitive to the choice of the classifier. With the exception of the NBM and LR classifiers, feature f1 in general performs better than the other two features, and the difference is found to be statistically significant in each case. This suggests that considering just the presence or absence of a melodic pattern, irrespective of its frequency of occurrence, might be sufficient for raga recognition. Interestingly, this finding is consistent with the fact that characteristic melodic patterns are unique to a raga, and a single occurrence of such a pattern is sufficient to identify the raga (Krishna & Ishwar, 2012). However, the best performance is obtained by the combination of f3 and the NBM classifier, and the difference in its performance compared to any other variant is statistically significant. The NBM classifier outperforming other classifiers given appropriate features is also well recognized in the text classification community (McCallum & Nigam, 1998). We therefore only consider the combination of f3 and the NBM classifier for comparing MVSM with the other methods. Overall, the accuracies across the different features indicate that normalizing the frequency of occurrence of melodic patterns (as done for f1 and f3) is beneficial for raga recognition. A plausible reason can be that a normalization procedure reduces the weight of the types of melodic patterns that occur more frequently, such as gamaka-type patterns (Section 2.3.2), and are not the characteristic patterns of any particular raga. It is worth noting that the feature weights assigned by a classifier can be used to identify the relevant melodic patterns for raga recognition. These patterns can serve as a dictionary of semantically meaningful melodic units for many computational tasks in IAM.
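The role of feature weighting discussed above can be sketched in a few lines. This is an illustrative sketch, not the thesis's exact formulation: here f1 is assumed to be a binary presence/absence feature, f2 the raw occurrence counts, and f3 a tf-idf-style weighting that, like the normalization discussed above, down-weights patterns occurring in many recordings (e.g. gamaka-type patterns):

```python
import math

def pattern_features(recordings):
    """recordings: list of dicts mapping pattern id -> occurrence count.
    Returns three per-recording feature variants (illustrative definitions):
      f1 -- binary presence/absence of a pattern,
      f2 -- raw frequency of occurrence,
      f3 -- tf-idf style weight: count damped by the number of
            recordings that contain the pattern."""
    n = len(recordings)
    # Document frequency: in how many recordings each pattern occurs.
    df = {}
    for rec in recordings:
        for pat in rec:
            df[pat] = df.get(pat, 0) + 1
    f1 = [{p: 1.0 for p in rec} for rec in recordings]
    f2 = [dict(rec) for rec in recordings]
    # A pattern present in every recording gets an idf weight of log(1) = 0.
    f3 = [{p: c * math.log(n / df[p]) for p, c in rec.items()}
          for rec in recordings]
    return f1, f2, f3
```

Under these assumed definitions, a pattern shared by every recording receives an f3 weight of exactly zero, mirroring the observation above that frequent, non-characteristic patterns should not dominate the classifier.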
[Figure: clustering coefficients C(G) and C(Gr), and the accuracy (%) of MVSM, as a function of the bin index, shown for both datasets.]
• Gulati, S., Serrà, J., Ishwar, V., Şentürk, S., & Serra, X. (2016). Phrase-based rāga recognition using vector space modeling. In ICASSP, pp. 66–70.
[Figure: raga recognition confusion matrix over the 40 ragas of RRDCMD (R1 Śanmukhapriya … R40 Karaharapriya).]
[Slide: confusion matrix over the 40 ragas of RRDCMD — phrase-based rāgas.]
[Slide: confusion matrix over the 40 ragas of RRDCMD — allied rāgas.]
[Slide: confusion matrices over the 40 ragas of RRDCMD — phrase-based vs. PCD-based.]
[Slide: proposed approaches — phrase-based rāga recognition and TDMS-based rāga recognition (tonal and temporal discretization).]
[Slide: pitch contours (Frequency in Cents vs. Time in seconds) with svaras (S, m, g) and a characteristic melodic movement marked.]
Time Delayed Melodic Surface (TDMS)

[Block diagram: audio signal → pre-processing (predominant pitch estimation, tonic normalization) → surface generation → post-processing (power compression, Gaussian smoothing) → TDMS.]
Surface Generation
given recording, we generate a surface S of size h × h by computing

s_{ij} = \sum_{t=\tau}^{N-1} \mathbb{I}\left(B(p_t),\, i\right)\, \mathbb{I}\left(B(p_{t-\tau}),\, j\right)    (6.4)

for 0 ≤ i, j < h, where s_{ij} is the (i, j)-th element of the two-dimensional matrix S, p_t is the t-th sample (in the Cent scale) of the pitch sequence of length N, and \mathbb{I} is an indicator function such that

\mathbb{I}(x, y) = \begin{cases} 1, & \text{iff } x = y \\ 0, & \text{otherwise} \end{cases}    (6.5)

B is an octave-wrapping integer binning operator defined by

B(x) = \left\lfloor \left( \frac{h\,x}{1200} \right) \bmod h \right\rfloor,    (6.6)

and τ is a time delay index (in frames) that is left as a parameter. Note that the frames where a predominant pitch could not be obtained (unvoiced segments) are excluded from any calculation. For the size of S we use h = 120. This value corresponds to 10 Cents per bin, an optimal pitch resolution reported by Chordia & Şentürk (2013). In that study, the authors show that raga recognition using PCDs with a bin width of 10 Cents outperforms PCDs with a bin width of 100 Cents, and obtains comparable results to PCDs with a bin width of 5 Cents.

An example of the generated surface S for a music piece⁵⁴ in raga Yaman is shown in Figure 6.7 (a). We see that the prominent peaks in the surface correspond to the svaras of raga Yaman. We notice that these peaks are steep and that the dynamic range of the surface is high. This can be attributed to the nature of the melodies in these music traditions, particularly in Hindustani music, where the melodies often contain long held svaras. In addition, the dynamic range is high also because the pitches in the stable svara regions lie within a small frequency range around the mean svara frequency, compared to the pitches in the transitory melodic regions. Because of this, most of the pitch values are mapped to a small number of bins, making the prominent peaks steeper.
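Equations 6.4–6.6 can be sketched directly. The following is a minimal illustration, not the original implementation; it assumes the pitch sequence is already tonic-normalized, expressed in Cents, and stripped of unvoiced frames (which the thesis excludes from the computation):

```python
import math

def wrap_bin(cents, h=120):
    """Eq. 6.6: octave-wrapping integer binning operator
    B(x) = floor((h * x / 1200) mod h)."""
    return int(math.floor((h * cents / 1200.0) % h))

def surface(pitch_cents, tau, h=120):
    """Eq. 6.4: s_ij counts how often bin i at frame t co-occurs with
    bin j at frame t - tau; the indicator functions of Eq. 6.5 reduce
    to a single increment per frame pair."""
    bins = [wrap_bin(p, h) for p in pitch_cents]
    s = [[0] * h for _ in range(h)]
    for t in range(tau, len(bins)):
        s[bins[t]][bins[t - tau]] += 1
    return s
```

With h = 120, a pitch value of 600 Cents falls in bin 60, i.e. a 10-Cent resolution per bin, as used in the thesis.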
6.3.1.3 Post-processing

Power Compression: In order to accentuate the values corresponding to the transitory regions in the melody and reduce the dynamic range of the surface, we apply an element-wise power compression

S = S^{\alpha},    (6.7)

where α is an exponent that is left as a parameter. Once a more compact (in terms of the dynamic range) surface is obtained, we apply Gaussian smoothing. With that, we attempt to attenuate the subtle differences in S corresponding to the different melodies within the same raga, while retaining the attributes that characterize that raga.

⁵⁴ http://musicbrainz.org/recording/e59642ca-72bc-466b-bf4b-d82bfbc7b4af
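The post-processing stage can likewise be sketched. Power compression follows Eq. 6.7 directly; for the Gaussian smoothing, this sketch assumes a separable kernel with wrap-around at the borders — a choice that suits the octave-wrapped bins, though the thesis does not specify the border handling:

```python
import math

def compress(surface, alpha):
    """Eq. 6.7: element-wise power compression S = S^alpha, which
    reduces the dynamic range of the surface."""
    return [[v ** alpha for v in row] for row in surface]

def gaussian_smooth(surface, sigma=1.0):
    """Separable 2-D Gaussian smoothing with wrap-around borders."""
    h = len(surface)
    r = max(1, int(3 * sigma))
    kern = [math.exp(-(k * k) / (2.0 * sigma * sigma))
            for k in range(-r, r + 1)]
    norm = sum(kern)
    kern = [v / norm for v in kern]  # kernel sums to 1
    # Horizontal pass, then vertical pass over the intermediate result.
    tmp = [[sum(kern[k + r] * surface[i][(j + k) % h] for k in range(-r, r + 1))
            for j in range(h)] for i in range(h)]
    return [[sum(kern[k + r] * tmp[(i + k) % h][j] for k in range(-r, r + 1))
             for j in range(h)] for i in range(h)]
```

Because the kernel is normalized, a constant surface is left unchanged by the smoothing; only the fine, melody-specific structure of S is attenuated.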
194 AUTOMATIC RAGA RECOGNITION
given recording, we generate a surface S of size h ⇥h by computing
si j =N�1
Ât=t
I (B(pt) , i) I (B(pt�t) , j) (6.4)
for 0 i, j < h , where si j is the (i, j)th element of the two-dimensional matrix S, ptis the t th sample (in Cent-scale) of the pitch sequence of length N, I is an indicatorfunction such that
I(x,y) =
(1, iff x = y0, otherwise
(6.5)
B is an octave-wrapping integer binning operator defined by
B(x) =j ⇣ hx
1200
⌘mod h
k, (6.6)
and t is a time delay index (in frames) that is left as a parameter. Note that, the frameswhere a predominant pitch could not be obtained (unvoiced segments) are excludedfrom any calculation. For the size of S we use h = 120. This value corresponds to10 Cents per bin, an optimal pitch resolution reported by Chordia & Sentürk (2013).In that study, the authors show that raga recognition using PCDs with a bin width of10 Cents outperforms the PCDs with a bin width of 100 Cents, and obtains a compar-able results to PCDs with a bin width of 5 Cents.
An example of the generated surface S for a music piece54 in raga Yaman is shownin Figure 6.7 (a). We see that the prominent peaks in the surface correspond to thesvaras of raga Yaman. We notice that these peaks are steep and the dynamic rangeof the surface is high. This can be attributed to the nature of the melodies in thesemusic traditions, particularly in Hindustani music, where the melodies often containlong held svaras. In addition, the dynamic range is high also because the pitches inthe stable svara regions lie within a small frequency range around the mean svarafrequency compared to the pitches in the transitory melodic regions. Because of this,most of the pitch values are mapped to a small number of bins, making the prominentpeaks more steep.
6.3.1.3 Post-processing
Power Compression: In order to accentuate the values corresponding to the trans-itory regions in the melody and reduce the dynamic range of the surface, we apply anelement-wise power compression
S = Sa , (6.7)
where a is an exponent that is left as a parameter. Once a more compact (in terms ofthe dynamic range) surface is obtained, we apply Gaussian smoothing. With that, weattempt to attenuate the subtle differences in S corresponding to the different melodieswithin the same raga, while retaining the attributes that characterize that raga.
54http://musicbrainz.org/recording/e59642ca-72bc-466b-bf4b-d82bfbc7b4af
194 AUTOMATIC RAGA RECOGNITION
given recording, we generate a surface S of size h ⇥h by computing
si j =N�1
Ât=t
I (B(pt) , i) I (B(pt�t) , j) (6.4)
for 0 i, j < h , where si j is the (i, j)th element of the two-dimensional matrix S, ptis the t th sample (in Cent-scale) of the pitch sequence of length N, I is an indicatorfunction such that
I(x,y) =
(1, iff x = y0, otherwise
(6.5)
B is an octave-wrapping integer binning operator defined by
B(x) =j ⇣ hx
1200
⌘mod h
k, (6.6)
and t is a time delay index (in frames) that is left as a parameter. Note that, the frameswhere a predominant pitch could not be obtained (unvoiced segments) are excludedfrom any calculation. For the size of S we use h = 120. This value corresponds to10 Cents per bin, an optimal pitch resolution reported by Chordia & Sentürk (2013).In that study, the authors show that raga recognition using PCDs with a bin width of10 Cents outperforms the PCDs with a bin width of 100 Cents, and obtains a compar-able results to PCDs with a bin width of 5 Cents.
An example of the generated surface S for a music piece54 in raga Yaman is shownin Figure 6.7 (a). We see that the prominent peaks in the surface correspond to thesvaras of raga Yaman. We notice that these peaks are steep and the dynamic rangeof the surface is high. This can be attributed to the nature of the melodies in thesemusic traditions, particularly in Hindustani music, where the melodies often containlong held svaras. In addition, the dynamic range is high also because the pitches inthe stable svara regions lie within a small frequency range around the mean svarafrequency compared to the pitches in the transitory melodic regions. Because of this,most of the pitch values are mapped to a small number of bins, making the prominentpeaks more steep.
6.3.1.3 Post-processing
Power Compression: In order to accentuate the values corresponding to the trans-itory regions in the melody and reduce the dynamic range of the surface, we apply anelement-wise power compression
S = Sa , (6.7)
where a is an exponent that is left as a parameter. Once a more compact (in terms ofthe dynamic range) surface is obtained, we apply Gaussian smoothing. With that, weattempt to attenuate the subtle differences in S corresponding to the different melodieswithin the same raga, while retaining the attributes that characterize that raga.
54http://musicbrainz.org/recording/e59642ca-72bc-466b-bf4b-d82bfbc7b4af
160
EUY P Me P [PUO Da RMO
§ Da RMO 9 M U[
194 AUTOMATIC RAGA RECOGNITION
given recording, we generate a surface S of size h ⇥h by computing
si j =N�1
Ât=t
I (B(pt) , i) I (B(pt�t) , j) (6.4)
for 0 i, j < h , where si j is the (i, j)th element of the two-dimensional matrix S, ptis the t th sample (in Cent-scale) of the pitch sequence of length N, I is an indicatorfunction such that
I(x,y) =
(1, iff x = y0, otherwise
(6.5)
B is an octave-wrapping integer binning operator defined by
B(x) =j ⇣ hx
1200
⌘mod h
k, (6.6)
and t is a time delay index (in frames) that is left as a parameter. Note that, the frameswhere a predominant pitch could not be obtained (unvoiced segments) are excludedfrom any calculation. For the size of S we use h = 120. This value corresponds to10 Cents per bin, an optimal pitch resolution reported by Chordia & Sentürk (2013).In that study, the authors show that raga recognition using PCDs with a bin width of10 Cents outperforms the PCDs with a bin width of 100 Cents, and obtains a compar-able results to PCDs with a bin width of 5 Cents.
An example of the generated surface S for a music piece54 in raga Yaman is shownin Figure 6.7 (a). We see that the prominent peaks in the surface correspond to thesvaras of raga Yaman. We notice that these peaks are steep and the dynamic rangeof the surface is high. This can be attributed to the nature of the melodies in thesemusic traditions, particularly in Hindustani music, where the melodies often containlong held svaras. In addition, the dynamic range is high also because the pitches inthe stable svara regions lie within a small frequency range around the mean svarafrequency compared to the pitches in the transitory melodic regions. Because of this,most of the pitch values are mapped to a small number of bins, making the prominentpeaks more steep.
6.3.1.3 Post-processing
Power Compression: In order to accentuate the values corresponding to the trans-itory regions in the melody and reduce the dynamic range of the surface, we apply anelement-wise power compression
S = Sa , (6.7)
where a is an exponent that is left as a parameter. Once a more compact (in terms ofthe dynamic range) surface is obtained, we apply Gaussian smoothing. With that, weattempt to attenuate the subtle differences in S corresponding to the different melodieswithin the same raga, while retaining the attributes that characterize that raga.
54http://musicbrainz.org/recording/e59642ca-72bc-466b-bf4b-d82bfbc7b4af
G]gW
#WYag
K] Y
G+
G,
194 AUTOMATIC RAGA RECOGNITION
given recording, we generate a surface S of size h ⇥h by computing
si j =N�1
Ât=t
I (B(pt) , i) I (B(pt�t) , j) (6.4)
for 0 i, j < h , where si j is the (i, j)th element of the two-dimensional matrix S, ptis the t th sample (in Cent-scale) of the pitch sequence of length N, I is an indicatorfunction such that
I(x,y) =
(1, iff x = y0, otherwise
(6.5)
B is an octave-wrapping integer binning operator defined by
B(x) =j ⇣ hx
1200
⌘mod h
k, (6.6)
and t is a time delay index (in frames) that is left as a parameter. Note that, the frameswhere a predominant pitch could not be obtained (unvoiced segments) are excludedfrom any calculation. For the size of S we use h = 120. This value corresponds to10 Cents per bin, an optimal pitch resolution reported by Chordia & Sentürk (2013).In that study, the authors show that raga recognition using PCDs with a bin width of10 Cents outperforms the PCDs with a bin width of 100 Cents, and obtains a compar-able results to PCDs with a bin width of 5 Cents.
An example of the generated surface S for a music piece54 in raga Yaman is shownin Figure 6.7 (a). We see that the prominent peaks in the surface correspond to thesvaras of raga Yaman. We notice that these peaks are steep and the dynamic rangeof the surface is high. This can be attributed to the nature of the melodies in thesemusic traditions, particularly in Hindustani music, where the melodies often containlong held svaras. In addition, the dynamic range is high also because the pitches inthe stable svara regions lie within a small frequency range around the mean svarafrequency compared to the pitches in the transitory melodic regions. Because of this,most of the pitch values are mapped to a small number of bins, making the prominentpeaks more steep.
6.3.1.3 Post-processing
Power Compression: In order to accentuate the values corresponding to the trans-itory regions in the melody and reduce the dynamic range of the surface, we apply anelement-wise power compression
S = Sa , (6.7)
where a is an exponent that is left as a parameter. Once a more compact (in terms ofthe dynamic range) surface is obtained, we apply Gaussian smoothing. With that, weattempt to attenuate the subtle differences in S corresponding to the different melodieswithin the same raga, while retaining the attributes that characterize that raga.
54http://musicbrainz.org/recording/e59642ca-72bc-466b-bf4b-d82bfbc7b4af
G+G,
194 AUTOMATIC RAGA RECOGNITION
given recording, we generate a surface S of size h ⇥h by computing
si j =N�1
Ât=t
I (B(pt) , i) I (B(pt�t) , j) (6.4)
for 0 i, j < h , where si j is the (i, j)th element of the two-dimensional matrix S, ptis the t th sample (in Cent-scale) of the pitch sequence of length N, I is an indicatorfunction such that
I(x,y) =
(1, iff x = y0, otherwise
(6.5)
B is an octave-wrapping integer binning operator defined by
B(x) =j ⇣ hx
1200
⌘mod h
k, (6.6)
and t is a time delay index (in frames) that is left as a parameter. Note that, the frameswhere a predominant pitch could not be obtained (unvoiced segments) are excludedfrom any calculation. For the size of S we use h = 120. This value corresponds to10 Cents per bin, an optimal pitch resolution reported by Chordia & Sentürk (2013).In that study, the authors show that raga recognition using PCDs with a bin width of10 Cents outperforms the PCDs with a bin width of 100 Cents, and obtains a compar-able results to PCDs with a bin width of 5 Cents.
An example of the generated surface S for a music piece54 in raga Yaman is shownin Figure 6.7 (a). We see that the prominent peaks in the surface correspond to thesvaras of raga Yaman. We notice that these peaks are steep and the dynamic rangeof the surface is high. This can be attributed to the nature of the melodies in thesemusic traditions, particularly in Hindustani music, where the melodies often containlong held svaras. In addition, the dynamic range is high also because the pitches inthe stable svara regions lie within a small frequency range around the mean svarafrequency compared to the pitches in the transitory melodic regions. Because of this,most of the pitch values are mapped to a small number of bins, making the prominentpeaks more steep.
6.3.1.3 Post-processing
Power Compression: In order to accentuate the values corresponding to the trans-itory regions in the melody and reduce the dynamic range of the surface, we apply anelement-wise power compression
S = Sa , (6.7)
where a is an exponent that is left as a parameter. Once a more compact (in terms ofthe dynamic range) surface is obtained, we apply Gaussian smoothing. With that, weattempt to attenuate the subtle differences in S corresponding to the different melodieswithin the same raga, while retaining the attributes that characterize that raga.
54http://musicbrainz.org/recording/e59642ca-72bc-466b-bf4b-d82bfbc7b4af
G+’
G,’
#G+’& G,’
# &
161
EUY P Me P [PUO Da RMO
§ Da RMO 9 M U[
194 AUTOMATIC RAGA RECOGNITION
given recording, we generate a surface S of size h ⇥h by computing
si j =N�1
Ât=t
I (B(pt) , i) I (B(pt�t) , j) (6.4)
for 0 i, j < h , where si j is the (i, j)th element of the two-dimensional matrix S, ptis the t th sample (in Cent-scale) of the pitch sequence of length N, I is an indicatorfunction such that
I(x,y) =
(1, iff x = y0, otherwise
(6.5)
B is an octave-wrapping integer binning operator defined by
B(x) =j ⇣ hx
1200
⌘mod h
k, (6.6)
and t is a time delay index (in frames) that is left as a parameter. Note that, the frameswhere a predominant pitch could not be obtained (unvoiced segments) are excludedfrom any calculation. For the size of S we use h = 120. This value corresponds to10 Cents per bin, an optimal pitch resolution reported by Chordia & Sentürk (2013).In that study, the authors show that raga recognition using PCDs with a bin width of10 Cents outperforms the PCDs with a bin width of 100 Cents, and obtains a compar-able results to PCDs with a bin width of 5 Cents.
An example of the generated surface S for a music piece54 in raga Yaman is shownin Figure 6.7 (a). We see that the prominent peaks in the surface correspond to thesvaras of raga Yaman. We notice that these peaks are steep and the dynamic rangeof the surface is high. This can be attributed to the nature of the melodies in thesemusic traditions, particularly in Hindustani music, where the melodies often containlong held svaras. In addition, the dynamic range is high also because the pitches inthe stable svara regions lie within a small frequency range around the mean svarafrequency compared to the pitches in the transitory melodic regions. Because of this,most of the pitch values are mapped to a small number of bins, making the prominentpeaks more steep.
6.3.1.3 Post-processing
Power Compression: In order to accentuate the values corresponding to the trans-itory regions in the melody and reduce the dynamic range of the surface, we apply anelement-wise power compression
S = Sa , (6.7)
where a is an exponent that is left as a parameter. Once a more compact (in terms ofthe dynamic range) surface is obtained, we apply Gaussian smoothing. With that, weattempt to attenuate the subtle differences in S corresponding to the different melodieswithin the same raga, while retaining the attributes that characterize that raga.
54http://musicbrainz.org/recording/e59642ca-72bc-466b-bf4b-d82bfbc7b4af
G]gW
#WYag
K] Y
G+
G,
194 AUTOMATIC RAGA RECOGNITION
given recording, we generate a surface S of size h ⇥h by computing
si j =N�1
Ât=t
I (B(pt) , i) I (B(pt�t) , j) (6.4)
for 0 i, j < h , where si j is the (i, j)th element of the two-dimensional matrix S, ptis the t th sample (in Cent-scale) of the pitch sequence of length N, I is an indicatorfunction such that
I(x,y) =
(1, iff x = y0, otherwise
(6.5)
B is an octave-wrapping integer binning operator defined by
B(x) =j ⇣ hx
1200
⌘mod h
k, (6.6)
and t is a time delay index (in frames) that is left as a parameter. Note that, the frameswhere a predominant pitch could not be obtained (unvoiced segments) are excludedfrom any calculation. For the size of S we use h = 120. This value corresponds to10 Cents per bin, an optimal pitch resolution reported by Chordia & Sentürk (2013).In that study, the authors show that raga recognition using PCDs with a bin width of10 Cents outperforms the PCDs with a bin width of 100 Cents, and obtains a compar-able results to PCDs with a bin width of 5 Cents.
An example of the generated surface S for a music piece54 in raga Yaman is shownin Figure 6.7 (a). We see that the prominent peaks in the surface correspond to thesvaras of raga Yaman. We notice that these peaks are steep and the dynamic rangeof the surface is high. This can be attributed to the nature of the melodies in thesemusic traditions, particularly in Hindustani music, where the melodies often containlong held svaras. In addition, the dynamic range is high also because the pitches inthe stable svara regions lie within a small frequency range around the mean svarafrequency compared to the pitches in the transitory melodic regions. Because of this,most of the pitch values are mapped to a small number of bins, making the prominentpeaks more steep.
6.3.1.3 Post-processing

Power Compression: In order to accentuate the values corresponding to the transitory regions in the melody and to reduce the dynamic range of the surface, we apply an element-wise power compression

S = S^{\alpha}, \qquad (6.7)

where α is an exponent that is left as a parameter. Once a more compact (in terms of dynamic range) surface is obtained, we apply Gaussian smoothing. With that, we attempt to attenuate the subtle differences in S corresponding to the different melodies within the same raga, while retaining the attributes that characterize that raga.

54 http://musicbrainz.org/recording/e59642ca-72bc-466b-bf4b-d82bfbc7b4af
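The post-processing step (power compression of Eq. 6.7 followed by Gaussian smoothing) can be sketched as below. The exponent alpha and smoothing width sigma are illustrative placeholders, since the thesis leaves both as parameters; the smoothing here is a plain separable Gaussian convolution in NumPy rather than any particular library routine.

```python
import numpy as np

def postprocess(S, alpha=0.5, sigma=1.0):
    """Sketch of the post-processing: element-wise power compression
    (Eq. 6.7) followed by separable Gaussian smoothing.
    alpha and sigma are illustrative, not values from the thesis."""
    # Eq. 6.7: compress the dynamic range element-wise
    S = np.power(S, alpha)
    # Build a truncated, normalized 1-D Gaussian kernel
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    # Smooth rows, then columns ('same' keeps the surface size)
    S = np.apply_along_axis(np.convolve, 1, S, kernel, mode="same")
    S = np.apply_along_axis(np.convolve, 0, S, kernel, mode="same")
    return S
```

With alpha < 1 the compression boosts the small values contributed by transitory melodic regions relative to the steep peaks at the stable svaras, and the smoothing then blurs out melody-specific detail while preserving the coarse raga-characteristic shape.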
[Figure 6.7: Generated melody surfaces for the example piece, panels (a) and (b); both axes show the bin index (0–120), with a color scale from 0.0 to 1.0.]
194 AUTOMATIC RAGA RECOGNITION
given recording, we generate a surface S of size h ⇥h by computing
si j =N�1
Ât=t
I (B(pt) , i) I (B(pt�t) , j) (6.4)
for 0 i, j < h , where si j is the (i, j)th element of the two-dimensional matrix S, ptis the t th sample (in Cent-scale) of the pitch sequence of length N, I is an indicatorfunction such that
I(x,y) =
(1, iff x = y0, otherwise
(6.5)
B is an octave-wrapping integer binning operator defined by
B(x) =j ⇣ hx
1200
⌘mod h
k, (6.6)
and t is a time delay index (in frames) that is left as a parameter. Note that, the frameswhere a predominant pitch could not be obtained (unvoiced segments) are excludedfrom any calculation. For the size of S we use h = 120. This value corresponds to10 Cents per bin, an optimal pitch resolution reported by Chordia & Sentürk (2013).In that study, the authors show that raga recognition using PCDs with a bin width of10 Cents outperforms the PCDs with a bin width of 100 Cents, and obtains a compar-able results to PCDs with a bin width of 5 Cents.
An example of the generated surface S for a music piece54 in raga Yaman is shownin Figure 6.7 (a). We see that the prominent peaks in the surface correspond to thesvaras of raga Yaman. We notice that these peaks are steep and the dynamic rangeof the surface is high. This can be attributed to the nature of the melodies in thesemusic traditions, particularly in Hindustani music, where the melodies often containlong held svaras. In addition, the dynamic range is high also because the pitches inthe stable svara regions lie within a small frequency range around the mean svarafrequency compared to the pitches in the transitory melodic regions. Because of this,most of the pitch values are mapped to a small number of bins, making the prominentpeaks more steep.
6.3.1.3 Post-processing
Power Compression: In order to accentuate the values corresponding to the transitory regions in the melody and reduce the dynamic range of the surface, we apply an element-wise power compression

$$S = S^{\alpha}, \qquad (6.7)$$

where α is an exponent that is left as a parameter. Once a more compact (in terms of dynamic range) surface is obtained, we apply Gaussian smoothing. With that, we attempt to attenuate the subtle differences in S corresponding to the different melodies within the same rāga, while retaining the attributes that characterize that rāga.
54http://musicbrainz.org/recording/e59642ca-72bc-466b-bf4b-d82bfbc7b4af
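The post-processing described above can be sketched as below. The default α, the smoothing width σ, and the final sum-to-one normalization are illustrative assumptions; the thesis tunes these values experimentally.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def postprocess_surface(S, alpha=0.75, sigma=2.0):
    """Sketch of the post-processing of a time-delayed melody surface.

    alpha: exponent of the element-wise power compression (Eq. 6.7);
           values below 1 reduce the dynamic range and accentuate the
           values in the transitory melodic regions.
    sigma: width (in bins) of the Gaussian smoothing, which attenuates
           melody-specific detail while retaining raga-level structure.
    """
    S = np.power(S, alpha)                       # Eq. 6.7: S <- S^alpha
    S = gaussian_filter(S, sigma, mode='wrap')   # wrap: both axes are octave-wrapped
    return S / S.sum() if S.sum() > 0 else S     # normalize to unit sum (assumption)
```

The `mode='wrap'` choice reflects that both axes of S index octave-wrapped pitch bins, so smoothing should carry over the bin boundaries.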
[Slide 166: Time-Delayed Melody Surface. Post-processing: power compression, Gaussian smoothing, normalization. Figure panels (a) and (b), axes Index 0 to 120, color scale 0.0 to 1.0]
[Slides 167 to 172: Time-Delayed Melodic Surface, repeated figure panels (a) and (b), axes Index 0 to 120, color scale 0.0 to 1.0; audio example: http://musicbrainz.org/recording/e59642ca-72bc-466b-bf4b-d82bfbc7b4af]
Results: Summary

[Slide: bar chart comparing accuracies on the Carnatic and Hindustani datasets of the proposed TDMS method against PCD-based approaches (Chordia & Şentürk, 2013; Koduri et al.) and a phrase-based method with TF-IDF weighting]
• Gulati, S., Serrà, J., Ganguli, K. K., Şentürk, S., & Serra, X. (2016). Time-delayed melody surfaces for rāga recognition. In ISMIR, pp. 751–757.
Effect of Datasets

[Slides: accuracy (%) as a function of the number of rāgas, from 5 to 40 for Carnatic and from 5 to 30 for Hindustani; y-axis range 0.80 to 1.00]
Applications
Chapter 7 Applications
7.1 Introduction
The research work presented in this thesis has several applications. A number of these applications have already been described in Section 1.1. While some applications such as rāga-based music retrieval and music discovery are relevant mainly in the context of large audio collections, several applications can be developed at the scale of the music collection compiled in the CompMusic project. In this chapter, we present some concrete examples of such applications that have already incorporated parts of our work. We provide a brief overview of Dunya, a collection of the music corpora and software tools developed during the CompMusic project, and of Saraga and Riyaz, the mobile applications developed within the technology transfer project Culture Aware MUsic Technologies (CAMUT). In addition, we present three web demos that showcase parts of the outcomes of our computational methods. We also briefly present one of our recent studies that performs a musicologically motivated exploration of melodic structures in IAM. It serves as an example of how our methods can facilitate investigations in computational musicology.
7.2 Dunya
Dunya⁵⁵ comprises the music corpora and the software tools that have been developed as part of the CompMusic project. It includes data for five music traditions: Hindustani, Carnatic, Turkish Makam, Jingju, and Andalusian music. By providing access to the data gathered in the project (such as audio recordings and metadata) and the data generated in it (such as audio descriptors), Dunya aims to facilitate the study and exploration of relevant musical aspects of different music repertoires. The CompMusic corpora mainly comprise audio recordings and complementary information that describes those recordings. This complementary information can be in the form of the
55http://dunya.compmusic.upf.edu/
[Slide: Applications]
608 Proceedings of the 17th ISMIR Conference, New York City, USA, August 7-11, 2016
[Slides: Dunya; Dunya Web Interface (http://dunya.compmusic.upf.edu); Computational Musicology]
Ganguli, K. K., Gulati, S., Serra, X., & Rao, P. (2016). Data-driven exploration of melodic structures in Hindustani music. In ISMIR, pp. 605–611.
[Slides: Demo, Melodic Patterns (http://dunya.compmusic.upf.edu/motifdiscovery/ and http://dunya.compmusic.upf.edu/pattern_network/); Rāgawise (https://dunya.compmusic.upf.edu/ragawise/)]
[Slides: Sarāga and Riyāz mobile applications, with screenshots and user/session statistics]
Summary and Future Perspectives
Chapter 8 Summary and Future Perspectives
8.1 Introduction

In this thesis, we have presented a number of computational approaches for analyzing melodic elements at different levels of melodic organization in IAM. In tandem, these approaches generate a high-level melodic description of the audio collections in this music tradition. They build upon each other to finally allow us to achieve the goals that we set at the beginning of this thesis, which are:
To curate and structure representative music corpora of IAM that comprise audio recordings and the associated metadata, and to use them to compile sizable and well-annotated test datasets for melodic analyses.

To develop data-driven computational approaches for the discovery and characterization of musically relevant melodic patterns in sizable audio collections of IAM.

To devise computational approaches for automatically recognizing rāgas in recorded performances of IAM.
Based on the results presented in this thesis, we can now say that our goals are successfully met. We started this thesis by presenting our motivation behind the analysis and description of melodies in IAM, highlighting the opportunities and challenges that this music tradition offers within the context of music information research and the CompMusic project (Chapter 1). We provided a brief introduction to IAM and its melodic organization, and critically reviewed the existing literature on related topics within the context of MIR and IAM (Chapter 2).

In Chapter 3, we provided a comprehensive overview of the music corpora and datasets that were compiled and structured in our work as a part of the CompMusic project. To the best of our knowledge, these corpora comprise the largest audio collections of IAM along with curated metadata and automatically extracted music
Contributions: Corpora and Datasets
• Indian art music corpora
• Datasets:
  • Tonic identification datasets
  • Nyās dataset
  • Improving melodic similarity dataset
  • Rāga recognition datasets (Hindustani and Carnatic)
http://compmusic.upf.edu/node/…
Contributions: Scientific
• Review of existing approaches
• Tonic identification: comprehensive evaluation
• Nyās detection
• Melodic similarity: comprehensive evaluation
• Improving melodic similarity
• Melodic pattern discovery
• Melodic pattern characterization
• Rāga recognition:
  • Phrase-based method
  • Time-delayed melodic surface based method
http://compmusic.upf.edu/node/…
Contributions: Technical
• Tonic identification integration
• Web demos for sharing discovered patterns
• Rāgawise (pitch analysis in JavaScript)
• Sarāga and Riyāz
Future Perspectives
• Combining supervised and unsupervised approaches
• Studying melodic segmentation
• Refining phrase boundaries
• Combining multiple features for rāga recognition
• Study of the evolution of music
• Ontological framework for music description systems
MusicMuni Labs
Publications by the Author
• Peer-reviewed journal articles:
• Gulati, S., Bellur, A., Salamon, J., Ranjani, H. G., Ishwar, V., Murthy, H. A., & Serra, X. (2014). Automatic tonic identification in Indian art music: approaches and evaluation. JNMR, 43(1), 53–71.
• Koduri, G. K., Gulati, S., Rao, P., & Serra, X. (2012). Rāga recognition based on pitch distribution methods. JNMR, 41(4), 337–350.
• Full articles in peer-reviewed conferences:
• Gulati, S., Serrà, J., Ganguli, K. K., Şentürk, S., & Serra, X. (2016). Time-delayed melody surfaces for rāga recognition. In ISMIR, pp. 751–757.
• Ganguli, K. K., Gulati, S., Serra, X., & Rao, P. (2016). Data-driven exploration of melodic structures in Hindustani music. In ISMIR, pp. 605–611.
• Gulati, S., Serrà, J., Ishwar, V., Şentürk, S., & Serra, X. (2016). Phrase-based rāga recognition using vector space modeling. In ICASSP, pp. 66–70.
• Gulati, S., Serrà, J., Ishwar, V., & Serra, X. (2016). Discovering rāga motifs by characterizing communities in networks of melodic patterns. In ICASSP, pp. 286–290.
• Gulati, S., Serrà, J., & Serra, X. (2015). Improving melodic similarity in Indian art music using culture-specific melodic characteristics. In ISMIR, pp. 680–686.
• Gulati, S., Serrà, J., & Serra, X. (2015). An evaluation of methodologies for melodic similarity in audio recordings of Indian art music. In ICASSP, pp. 678–682.
• Gulati, S., Serrà, J., Ishwar, V., & Serra, X. (2014). Mining melodic patterns in large audio collections of Indian art music. In SITIS-MIRA, pp. 264–271.
• Gulati, S., Serrà, J., Ganguli, K. K., & Serra, X. (2014). Landmark detection in Hindustani music melodies. In ICMC-SMC, pp. 1062–1068.
• Srinivasamurthy, A., Koduri, G. K., Gulati, S., Ishwar, V., & Serra, X. (2014). Corpora for music information research in Indian art music. In ICMC-SMC, pp. 1029–1036.
• Şentürk, S., Gulati, S., & Serra, X. (2014). Towards alignment of score and audio recordings of Ottoman-Turkish makam music. In FMA.
• Bogdanov, D., Wack, N., Gómez, E., Gulati, S., Herrera, P., Mayor, O., Roma, G., Salamon, J., Zapata, J., & Serra, X. (2013). Essentia: an audio analysis library for music information retrieval. In ISMIR, pp. 493–498.
• Bogdanov, D., Wack, N., Gómez, E., Gulati, S., Herrera, P., Mayor, O., Roma, G., Salamon, J., Zapata, J., & Serra, X. (2013). ESSENTIA: an open-source library for sound and music analysis. In ACM Multimedia, pp. 855–858.
• Şentürk, S., Gulati, S., & Serra, X. (2013). Score informed tonic identification for makam music of Turkey. In ISMIR, pp. 175–180.
• Sordo, M., Koduri, G. K., Şentürk, S., Gulati, S., & Serra, X. (2012). A musically aware system for browsing and interacting with audio music collections. In CompMusic Workshop.
• Koduri, G. K., Gulati, S., & Rao, P. (2011). A survey of rāga recognition techniques and improvements to the state-of-the-art. In SMC.
• Other contributions to conferences:
• Gulati, S., Serrà, J., Ishwar, V., & Serra, X. (2014). Melodic Pattern Extraction in Large Collections of Music Recordings Using Time Series Mining Techniques. In Late-Breaking Demo Session of ISMIR.
• Gulati, S., Ganguli, K. K., Gupta, S., Srinivasamurthy, A., & Serra, X. (2015). Rāgawise: A Lightweight Real-time Raga Recognition System for Indian Art Music. In Late-Breaking Demo Session of ISMIR.
• Caro, R., Srinivasamurthy, A., Gulati, S., & Serra, X. (2014). Jingju Music: Concepts and Computational Tools for its Analysis. A tutorial at ISMIR.