July 6, 2012 - HOME | CompMusic€¦ · Spectral flux of unnormalised spectra Music Segment...

Applause Identification and its relevance to Archival of Carnatic Music

Padi Sarala¹, Vignesh Ishwar¹, Ashwin Bellur¹ and Hema A. Murthy¹

¹Computer Science Dept, IIT Madras, India.

July 6, 2012

Padi Sarala, Vignesh Ishwar, Ashwin Bellur and Hema A. Murthy, 2nd CompMusic Workshop

Outline of the presentation

Introduction to the Carnatic music concert

Problem definition

Feature Extraction

Spectral flux

Spectral Entropy

Characterising applause using the cumulative sum (CUSUM)

Highlights detection using CUSUM

Results


Carnatic music concert (1)

A Carnatic music concert can be 2 to 3 hours long.

A concert consists of several pieces.

The pieces are compositions interlaced with improvisational segments such as Raga Alapana, Nereval, Kalpanaswara, Thanam, Sloka and Thani Avarthanam.


Carnatic music concert (2)

In a concert, the audience applauds the artist at the end of each piece.

Sometimes the audience also applauds in between improvisational segments, such as the raga (vocal), the raga (violin), after the song, Kalpana swara, Thanam and Thani Avarthanam.

Most Carnatic music recordings archived today are either manually segmented into pieces or stored in their entirety as a single recording.


Applications of Applause Identification

Existing work on Applause identification

Manoj et al. (2011) detect applause in continuous speech recordings of meetings and use it as a key indicator of highlights in the meeting.

Lu et al. (2001) discuss techniques for classifying and segmenting an audio signal into speech, music, silence and environmental sounds such as applause and laughter; these segments can serve as an index for audio retrieval.

Xiong et al. (2003) detect applause in order to extract the highlights of sports games (baseball, golf and soccer).


Problem Definition

Identify the applauses in a given Carnatic music concert using spectral-domain features.

Use the applause locations to segment the concert automatically into individual pieces for archival purposes.

Find the duration and strength of each applause using the CUSUM technique, and use them to determine the highlights of the concert.


Characteristics of Applause and Music

Figure: Typical sequence of applause and music segments (time domain); each panel plots amplitude against time in seconds.

In the time domain, an applause segment may be rhythmic but has no clear structure, whereas a music segment is more structured.


Characteristics of Applause and Music

Figure: Typical sequence of applause and music segments (spectral domain); each panel plots log magnitude (dB) against frequency in Hz.

The power spectrum of applause is flat, whereas the spectrum of music is structured.


Feature Extraction

Selecting a good feature is a crucial task for classification or segmentation.

The spectral properties of most audio signals change slowly with time.

To discriminate between music and applause, the following features are used:

Spectral flux

Spectral entropy


Spectral flux (1)

Spectral flux (SF), also called spectral variation, characterises the change in the spectrum between two adjacent frames of an audio signal.

It measures how quickly the power spectrum changes.

$$SF[n] = \int_{\omega} \left( |X_n(\omega)| - |X_{n+1}(\omega)| \right)^2 \, d\omega \quad (1)$$

where $X_n(\omega)$ is the magnitude spectrum of the $n$th frame of the audio signal.
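To make Eq. (1) concrete, here is a minimal sketch (not the authors' implementation) that replaces the integral with a sum over FFT bins. The Hamming window, the helper names `frame_signal`, `magnitude_spectra` and `spectral_flux`, and the toy demo parameters (8 kHz audio, 256-sample frames, 128-sample hop) are assumptions; the experiments described later use 0.25 s frames at 44.1 kHz.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def magnitude_spectra(frames):
    """|X_n(w)| for every frame, using a Hamming window (assumed) and the real FFT."""
    window = np.hamming(frames.shape[1])
    return np.abs(np.fft.rfft(frames * window, axis=1))

def spectral_flux(mag):
    """Eq. (1) with the integral replaced by a sum over FFT bins: one value per frame pair."""
    return np.sum((mag[:-1] - mag[1:]) ** 2, axis=1)

sr = 8000                                      # toy sampling rate for the demo
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)             # steady tone: spectrum barely changes
noise = np.random.default_rng(0).standard_normal(sr)
flux_tone = spectral_flux(magnitude_spectra(frame_signal(tone, 256, 128)))
flux_noise = spectral_flux(magnitude_spectra(frame_signal(noise, 256, 128)))
print(flux_tone.mean() < flux_noise.mean())    # True
```

A steady tone keeps almost the same magnitude spectrum from frame to frame, so its flux stays small, while the per-bin magnitudes of white noise fluctuate randomly, giving a much larger flux.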


Spectral flux (2)

Spectral flux is computed with three different normalisations of the spectra:

1. Spectral flux with no normalisation.

2. Power spectral density normalisation: $X^{Norm}_n(\omega)$ is defined as

$$X^{Norm}_n(\omega) = \frac{X_n(\omega)}{\int_{\omega} X_n(\omega) \, d\omega} \quad (2)$$

3. Peak normalisation: $X^{Norm}_n(\omega)$ is defined as

$$X^{Norm}_n(\omega) = \frac{X_n(\omega)}{\max_{\omega} X_n(\omega)} \quad (3)$$
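A minimal sketch of the two normalisations, operating on a matrix of magnitude spectra with one frame per row. The row layout, the helper names and the small `EPS` guard against silent frames are assumptions, not part of the original method. Because both normalisations divide out the overall level of each frame, a frame that is simply louder than its neighbour but has the same spectral shape contributes no flux.

```python
import numpy as np

EPS = 1e-12  # assumed guard against division by zero on silent frames

def psd_normalise(mag):
    """Eq. (2): divide each frame's spectrum by its integral (sum) over frequency."""
    return mag / (np.sum(mag, axis=1, keepdims=True) + EPS)

def peak_normalise(mag):
    """Eq. (3): divide each frame's spectrum by its maximum over frequency."""
    return mag / (np.max(mag, axis=1, keepdims=True) + EPS)

def spectral_flux(mag):
    """Eq. (1) with the integral replaced by a sum over frequency bins."""
    return np.sum((mag[:-1] - mag[1:]) ** 2, axis=1)

# Two frames with the same spectral shape; the second is twice as loud.
mag = np.array([[1.0, 2.0, 1.0],
                [2.0, 4.0, 2.0]])
print(spectral_flux(mag))                   # [6.]  (level change shows up as flux)
print(spectral_flux(psd_normalise(mag)))    # close to 0: only shape changes count
print(spectral_flux(peak_normalise(mag)))   # close to 0
```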


Spectral flux (3)

Figure: Different normalisations of spectral flux. Panels, against time in seconds: spectral flux of unnormalised spectra, spectral flux with power spectral density normalisation, and spectral flux of peak-normalised spectra; music and applause segments are marked in each panel.


Spectral Entropy (1)

Spectral entropy (SE) measures the randomness of the spectrum. It is based on the Shannon entropy of a discrete random variable $X$ with probability mass function $p(x_i)$, which is given by

$$H(X) = -\sum_{i=1}^{N} p(x_i) \log_2 p(x_i) \quad (4)$$

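Treating the normalised power spectrum of frame $n$ as a probability distribution over frequency,

$$PSD_n(\omega) = \frac{|X_n(\omega)|^2}{\int_{\omega} |X_n(\omega)|^2 \, d\omega}$$

the spectral entropy of frame $n$ is

$$SE[n] = -\int_{\omega} PSD_n(\omega) \log PSD_n(\omega) \, d\omega \quad (5)$$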
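The same discretisation gives a sketch of Eq. (5): the squared magnitude spectrum of each frame is normalised to sum to one and its Shannon entropy is taken as in Eq. (4). The base-2 logarithm, the Hamming window and the `eps` guard are assumptions (the slide leaves the base of the logarithm unspecified), and `spectral_entropy` is a hypothetical helper name. A flat, noise-like spectrum gives an entropy close to the maximum, $\log_2$ of the number of bins, while a spectrum concentrated in a few peaks gives a low value.

```python
import numpy as np

def spectral_entropy(frames, eps=1e-12):
    """Spectral entropy SE[n] in bits for each frame; frames has one frame per row."""
    window = np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2     # |X_n(w)|^2
    psd = power / (np.sum(power, axis=1, keepdims=True) + eps)     # PSD_n(w), sums to ~1
    return -np.sum(psd * np.log2(psd + eps), axis=1)              # discrete Eq. (5)

n = 1024
tone = np.sin(2 * np.pi * 64 * np.arange(n) / n)       # energy in a few bins
noise = np.random.default_rng(0).standard_normal(n)    # energy spread over all bins
impulse = np.zeros(n)
impulse[0] = 1.0                                        # perfectly flat spectrum
se = spectral_entropy(np.vstack([tone, noise, impulse]))
print(se)  # tone lowest; impulse at the maximum, log2(n // 2 + 1)
```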


Spectral Entropy (2)

Figure: Spectral entropy of a music signal (spectral entropy against time in seconds; applause and music segments marked).


Database Used

Nineteen concerts by male and female singers were used in the experiments.

All concerts are vocal concerts, i.e. the lead musician is a singer.

Each concert has 15 to 20 applauses, giving a total of 343 applauses.


Experimental analysis

For all 19 concerts, spectral flux and spectral entropy features are extracted using frames of 0.25 s duration with an overlap of 0.01 s, at a sampling frequency of 44.1 kHz.

The extracted features are smoothed with a rectangular moving-average filter of length 15.
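The smoothing step can be sketched as a convolution with a normalised boxcar kernel. Only the filter type (rectangular) and its length (15) come from the slide; `moving_average` is a hypothetical helper, and `mode="same"`, which keeps one output per input frame and treats values beyond the ends as zero, is an assumed edge treatment.

```python
import numpy as np

def moving_average(feature, length=15):
    """Smooth a per-frame feature track with a rectangular moving-average filter."""
    kernel = np.ones(length) / length           # boxcar weights summing to 1
    # mode="same" keeps one output per input frame; values beyond the ends count as 0
    return np.convolve(feature, kernel, mode="same")

flux = np.zeros(50)
flux[25] = 15.0                 # an isolated one-frame spike
smoothed = moving_average(flux)
print(smoothed[18:33])          # the spike is spread evenly over 15 frames
```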

For all concerts, the applause locations and applause types were marked manually by a musician.

Using this ground truth, Detection Error Tradeoff (DET) curves and Equal Error Rates (EER) are computed for all the extracted features.
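The EER is the operating point on the DET curve where the miss probability equals the false-alarm probability. A minimal frame-level sketch follows, assuming per-frame scores (for example smoothed spectral entropy, with higher values meaning more applause-like) and binary ground-truth labels. The function name and the frame-level evaluation are assumptions, since the slides do not say whether errors are counted per frame or per applause.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Approximate EER by sweeping the threshold over all observed scores.

    scores: detection score per frame (higher = more applause-like, assumed)
    labels: 1 for applause frames, 0 for music frames (ground truth)
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_gap, eer = np.inf, None
    for th in np.unique(scores):
        detected = scores >= th
        p_miss = np.mean(~detected[labels])    # applause frames that were missed
        p_fa = np.mean(detected[~labels])      # music frames flagged as applause
        gap = abs(p_miss - p_fa)
        if gap < best_gap:                     # keep the threshold where the two rates meet
            best_gap, eer = gap, (p_miss + p_fa) / 2
    return eer

print(equal_error_rate([0.1, 0.2, 0.3, 0.8, 0.9], [0, 0, 0, 1, 1]))  # 0.0
```

Sweeping every observed score and recording the (false alarm, miss) pair at each threshold is also how the points of a DET curve are obtained; the EER is simply the point where the two coordinates are closest.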


Experimental Analysis

The DET curve for applause detection is plotted by varying the detection threshold. The equal error rates (EER) are given in the table below.

Figure: DET curve for applause detection (miss probability against false alarm probability, both in %; curves for spectral entropy, spectral flux without normalisation and spectral flux with normalisation, with the EER points marked).

Method                              EER
Spectral flux (no normalisation)    44.55%
Spectral flux (normalised)          23.33%
Spectral entropy                    17.33%

Table: EER for applause detection


Introduction to the cumulative sum (CUSUM) method

With spectral flux and spectral entropy alone, applause locations are identified by thresholding the feature.

Thresholding alone may not be sufficient to determine the duration and strength of an applause.

CUSUM is a non-parametric approach that can be used to identify statistical inhomogeneity (change points) in a given signal.

The CUSUM is computed as follows. Let $X[n]$ be the value of the feature extracted at time $n$. Then

$$Y[n] = X[n] - a$$

$$Cusum[n] = \begin{cases} Cusum[n-1] + Y[n], & Y[n] > 0 \\ 0, & \text{otherwise} \end{cases}$$

If $Cusum[n] > \Theta$, this suggests a significant structural shift in the series. The values of $a$ and $\Theta$ must be estimated empirically and may vary across data sets.
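A direct transcription of the recursion above, as a sketch: `cusum` and `structural_shift` are hypothetical helper names, and $a$ and $\Theta$ still have to be tuned empirically. The reset follows the rule stated here (reset whenever $Y[n] \le 0$); Page's classical CUSUM instead resets only when the running sum itself falls below zero, i.e. $\max(0, Cusum[n-1] + Y[n])$.

```python
import numpy as np

def cusum(x, a):
    """CUSUM of a feature track x with reference level a, following the recursion above."""
    out = np.zeros(len(x))
    running = 0.0
    for n, value in enumerate(x):
        y = value - a                               # Y[n] = X[n] - a
        running = running + y if y > 0 else 0.0     # accumulate while Y[n] > 0, else reset
        out[n] = running
    return out

def structural_shift(c, theta):
    """True where Cusum[n] > theta, i.e. where a significant structural shift is suggested."""
    return np.asarray(c) > theta

c = cusum([0, 0, 2, 3, 2, 0, 0], a=1)
print(c)                        # [0. 0. 1. 3. 4. 0. 0.]
print(structural_shift(c, 2))   # True only at the frames where the CUSUM exceeds 2
```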


Characterising the applauses using CUSUM (1)

Figure: Spectral flux and spectral entropy and their CUSUM values. Feature panels, against time in seconds: (a) spectral flux of unnormalised spectra, (b) spectral flux of peak-normalised spectra, (c) spectral entropy. CUSUM panels: (a) CUSUM of spectral flux of unnormalised spectra, (b) CUSUM of spectral flux of peak-normalised spectra, (c) CUSUM of spectral entropy.

The CUSUM was computed for both spectral flux and spectral entropy.

The start and end of each triangle indicate the location and duration of an applause.
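Reading off the triangles can be sketched as scanning the CUSUM track for runs of positive values. In this sketch the start and end of each run give the applause location and duration (`end_s - start_s`), and the peak height is taken as its strength. Using the peak as the strength measure, testing the peak against $\Theta$, and the `frame_shift` argument (seconds between consecutive frames) are all assumptions for illustration; `cusum_triangles` is a hypothetical helper.

```python
import numpy as np

def cusum_triangles(c, theta, frame_shift):
    """Find CUSUM 'triangles': runs of positive values whose peak exceeds theta.

    Returns a list of (start_s, end_s, peak) tuples, where start_s and end_s are in
    seconds (frame index * frame_shift) and end_s is the first frame after the run.
    """
    c = np.asarray(c, dtype=float)
    triangles = []
    n = 0
    while n < len(c):
        if c[n] <= 0:
            n += 1
            continue
        start = n
        while n < len(c) and c[n] > 0:   # walk to the end of the positive run
            n += 1
        peak = float(c[start:n].max())
        if peak > theta:                 # ignore small bumps below the threshold
            triangles.append((start * frame_shift, n * frame_shift, peak))
    return triangles

tri = cusum_triangles([0, 0, 1, 3, 4, 0, 0, 0.5, 0], theta=2, frame_shift=0.5)
print(tri)  # [(1.0, 2.5, 4.0)]
```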


Characterising the applauses using CUSUM (2)

Figure: Detecting the applause locations based on threshold and CUSUM values (Abhishek-Meyundi concert). Panels: spectral flux with peak normalisation, spectral entropy, CUSUM of spectral flux with peak normalisation, and CUSUM of spectral entropy, each plotted against time in seconds.


CUSUM values for whole concert

Figure: CUSUM values for the whole concert (applause detection for the Abhishek-Meyundi concert). Panels: CUSUM of spectral entropy, and the CUSUM triangles for spectral entropy, each plotted against time in seconds.

The figure shows a sequence of CUSUM triangles (for a carefully chosen value of $a$) over the entire concert.

About 22 triangles can be seen, of which 20 correspond to applauses.


Highlights of a Carnatic music concert using CUSUM values

The CUSUM values of spectral flux and spectral entropy determine the duration and strength of an applause.

Based on the duration and strength of the applauses, the top 3 highlights were selected for each of the 19 concerts.
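The slides do not say exactly how duration and strength are combined. One possible sketch ranks applauses by the triangle peak: with the CUSUM recursion used here, the peak is the sum of $X[n] - a$ over the applause, so it grows both with how long the feature stays above $a$ and with how far above it stays. The ranking rule, the `(start_s, end_s, peak)` tuple layout and the helper name `top_highlights` are assumptions, and the applause list below is toy data, not taken from the concerts.

```python
def top_highlights(triangles, k=3):
    """Return the k strongest applauses (largest CUSUM peak), in chronological order.

    triangles: list of (start_s, end_s, peak) tuples from a triangle detector.
    """
    strongest = sorted(triangles, key=lambda t: t[2], reverse=True)[:k]
    return sorted(strongest, key=lambda t: t[0])

applauses = [(10.0, 20.0, 5.0), (100.0, 130.0, 50.0),
             (300.0, 305.0, 2.0), (400.0, 460.0, 80.0)]
print(top_highlights(applauses))
# [(10.0, 20.0, 5.0), (100.0, 130.0, 50.0), (400.0, 460.0, 80.0)]
```

The segment that precedes each selected applause (for example a Thani Avarthanam or a Raga Alapana) is then the candidate highlight.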

The table below lists the highlights found across all 19 concerts using the above features.

S.No.  Highlight
1      Thani Avarthanam
2      Raga Alapana of the main song
3      Within Thani Avarthanam
4      Thanam
5      Swaram of the main song
6      Raga Alapana (vocal)
7      Raga Alapana (violin)
8      Alapana of the RTP
9      Nereval
10     Main song
11     Varnam
12     Pallavi

Table: Highlights of concerts using CUSUM


Conclusion

Most Carnatic music recordings archived today are either manually segmented into pieces or stored in their entirety as a single recording.

By locating the applauses in a concert, the concert can be automatically segmented into pieces for archival purposes. The duration and strength of the applauses can be used to determine the highlights of the concert.


References

1. Manoj C, Magesh S, Sankaran M S, and Manikandan M S. “A novel approach for detecting applause in continuous meeting”. In IEEE International Conference on Electronics and Computer Technology, pages 182–186, India, April 2011.

2. Lie Lu, Hao Jiang, and HongJiang Zhang. “A robust audio classification and segmentation method”. In International ACM Multimedia Conference, pages 203–211, Canada, September 2001.

3. M. J. Carey, E. S. Parris, and H. L. Loyd-Thomas. “A comparison of features for speech and music discrimination”. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 1, pages 149–152, March 1999.

4. J. O. Roman Jarina. “A discriminative feature selection for applause sounds detection”. In Proceedings of the 8th International Workshop on Image Analysis for Multimedia Interactive Services, 2007.

5. A. Martin, G. Doddington, T. Kamm, M. Ordowski, and M. Przybocki. “The DET curve in assessment of detection task performance”. Pages 1895–1898, 1997.

6. Z. Xiong, R. Radhakrishnan, A. Divakaran, and T. S. Huang. “Audio events detection based highlights extraction from baseball, golf and soccer games in a unified framework”. In ICASSP 2003, pages 401–404.


References

7. B. E. Brodsky and B. S. Darkhovsky. “Non-parametric Methods in Change-point Problems”. Kluwer Academic Publishers, New York, 1993.

8. T. M. Krishna. “Kalpita sangita, Kalpana sangita and Manodharma”. Private communication, 2007–2011.

9. H. Liu and M. S. Kim. “Real-time detection of stealthy DDoS attacks using time-series decomposition”. In ICC, pages 1–6, Bangalore, India, July 2010.

10. Lawrence R. Rabiner and Ronald W. Schafer. “Theory and Applications of Digital Speech Processing”. Pearson International, Upper Saddle River, New Jersey, 2011.

11. H. Wang, D. Zhang, and K. Shin. “Syn-dog: Sniffing SYN flooding sources”. In ICDCS, pages 421–428, Bangalore, India, July 2002.

12. Tong Zhang. “Automatic singer identification”. In Proceedings of the 2003 International Conference on Multimedia and Expo (ICME ’03), volume 1, pages I-33–36, July 2003.

13. Jouni Paulus. “Improving Markov model based music piece structure labelling with acoustic information”. In International Society for Music Information Retrieval Conference, pages 303–308, August 2010.


THANK YOU

