  • Applause Identification and its relevance to Archival of Carnatic Music

    Padi Sarala 1 Vignesh Ishwar 1 Ashwin Bellur 1 Hema A.Murthy 1

    1Computer Science Dept, IIT Madras, India.

    July 6, 2012

    Padi Sarala, Vignesh Ishwar, Ashwin Bellur and Hema A.Murthy 2nd CompMusic Workshop

  • Outline of the presentation

    Introduction to Carnatic music concert

    Problem definition

    Feature Extraction

    Spectral flux

    Spectral Entropy

    Characterising the applause using Cumulative sum

    Highlights detection using CUSUM

    Results

  • Carnatic music concert (1)

    A Carnatic music concert can be 2 to 3 hours long.

    A concert consists of various pieces.

    The pieces are compositions interlaced with improvisational aspects such as

    Raga Alapana, Nereval, Kalpanaswara, Thanam, Sloka and Thani Avarthanam.

  • Carnatic music concert (2)

    In a concert, the audience applauds the artist at the end of each piece.

    Sometimes the audience also applauds within a piece, for example after the Raga vocal, Raga violin, the song, Kalpana swara, Thanam or Thani Avarthanam.

    Most Carnatic music recordings archived today are either

    manually segmented into pieces, or

    stored whole as a single recording.

  • Applications of Applause Identification

    Existing work on applause identification

    Manoj et al. (2011) discuss how applause is detected in continuous meeting speech and how it can serve as a key indicator of highlights in a meeting.

    Lie Lu et al. (2001) discuss techniques for classifying and segmenting an audio signal into speech, music, silence and environmental sounds such as applause and laughter; these segments can be used as an index for audio retrieval.

    Z. Xiong et al. (2003) discuss how applause is detected to determine the highlights of a game.

  • Problem Definition

    Identify the applauses in a given Carnatic music concert using spectral domain features.

    The concert can then be automatically segmented into individual pieces for archival purposes.

    Find the duration and strength of each applause using the CUSUM technique, from which the highlights of the concert can be determined.

  • Characteristics of Applause and Music

    Figure: Typical sequence of applause and music segments (time domain). Panels plot amplitude against time in seconds.

    In the time domain, the applause waveform has little rhythmic structure, whereas the corresponding music waveform is more structured.

  • Characteristics of Applause and Music

    Figure: Typical sequence of applause and music segments (spectral domain). Panels plot log magnitude (dB) against frequency in Hz.

    The power spectrum of applause is flat, whereas the spectrum of music is structured.

  • Feature Extraction

    Selecting a good feature for classification or segmentation is a crucial task.

    The spectral properties of most audio signals change slowly over time.

    The following features are used to discriminate between music and applause:

    Spectral flux

    Spectral entropy

  • Spectral flux (1)

    Spectral flux (SF), also called spectral variation, characterises the change in spectra between two adjacent frames of an audio signal.

    It measures how quickly the power spectrum changes.

    $$SF[n] = \int_{\omega} \left( |X_n(\omega)| - |X_{n+1}(\omega)| \right)^2 \, d\omega \qquad (1)$$

    where $|X_n(\omega)|$ is the magnitude spectrum of the $n$-th frame of the audio signal.
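
    Equation (1) can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: it replaces the integral over frequency with a sum over DFT bins, uses a direct (slow) DFT from the standard library so that it runs without extra packages, and the names magnitude_spectrum and spectral_flux are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitude of the DFT of one frame (bins 0..N/2), via a direct O(N^2) DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def spectral_flux(frames):
    """Discrete form of equation (1): one flux value per pair of adjacent frames."""
    spectra = [magnitude_spectrum(f) for f in frames]
    return [
        sum((a - b) ** 2 for a, b in zip(cur, nxt))
        for cur, nxt in zip(spectra, spectra[1:])
    ]
```

    Two identical frames give a flux of zero, while a change from a steady tone to a click gives a large value. For real recordings an FFT routine would be the practical choice.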

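  • Spectral flux (2)

    The different normalisations of spectral flux are:

    1 Spectral flux with no normalisation.

    2 Power spectral density normalisation: in this approach $X^{Norm}_n(\omega)$ is defined as

    $$X^{Norm}_n(\omega) = \frac{X_n(\omega)}{\int_{\omega} X_n(\omega) \, d\omega} \qquad (2)$$

    3 Peak normalisation: in this approach $X^{Norm}_n(\omega)$ is defined as

    $$X^{Norm}_n(\omega) = \frac{X_n(\omega)}{\max_{\omega} \left( X_n(\omega) \right)} \qquad (3)$$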
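
    The two normalisations in equations (2) and (3) can be sketched as below, with the integral replaced by a sum over bins. The function names are hypothetical and the spectra are made-up numbers. One property that is easy to check with this sketch is that both normalisations make the flux insensitive to an overall change in level between frames, whereas the unnormalised flux is not.

```python
def psd_normalise(spectrum):
    """Discrete form of equation (2): divide by the sum over frequency bins."""
    total = sum(spectrum)
    return [v / total for v in spectrum] if total > 0 else list(spectrum)


def peak_normalise(spectrum):
    """Equation (3): divide by the largest value in the spectrum."""
    peak = max(spectrum)
    return [v / peak for v in spectrum] if peak > 0 else list(spectrum)


def flux(spec_a, spec_b):
    """Squared difference between two spectra, summed over bins."""
    return sum((a - b) ** 2 for a, b in zip(spec_a, spec_b))


# Illustrative spectra only: the same spectral shape at two different levels.
quiet = [1.0, 4.0, 2.0, 1.0]
loud = [10.0 * v for v in quiet]

flux_raw = flux(quiet, loud)
flux_psd = flux(psd_normalise(quiet), psd_normalise(loud))
flux_peak = flux(peak_normalise(quiet), peak_normalise(loud))
```

    Here flux_raw is large, while flux_psd and flux_peak are zero because the spectral shape is unchanged.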

  • Spectral flux (3)

    Figure: Different normalisations of spectral flux. Three panels plot, against time in seconds, the spectral flux of unnormalised spectra, of power spectral density normalised spectra and of peak normalised spectra, with the music and applause segments marked.

  • Spectral Entropy (1)

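    Spectral entropy (SE) is a measure of the randomness of a system. Shannon's entropy of a discrete stochastic variable X with probability mass function p(x) is given by

    $$H(X) = -\sum_{i=1}^{N} p(x_i) \log_2 \left[ p(x_i) \right] \qquad (4)$$

    The power spectrum of frame n is normalised so that it can be treated as a probability distribution:

    $$PSD_n(\omega) = \frac{|X_n(\omega)|^2}{\int_{\omega} |X_n(\omega)|^2 \, d\omega}$$

    $$SE[n] = -\int_{\omega} PSD_n(\omega) \log PSD_n(\omega) \, d\omega \qquad (5)$$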
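
    A small sketch of equation (5), again with the integral replaced by a sum over bins. The base of the logarithm is not stated in equation (5); base 2 is assumed here to match equation (4). The function name and the two test spectra are hypothetical.

```python
import math


def spectral_entropy(magnitudes):
    """Entropy (base 2, an assumption) of the power spectrum normalised to sum to one."""
    power = [m * m for m in magnitudes]
    total = sum(power)
    if total == 0:
        return 0.0  # silent frame: no distribution to measure
    psd = [p / total for p in power]
    # Bins with zero power contribute nothing (p * log p tends to 0 as p tends to 0).
    return -sum(p * math.log2(p) for p in psd if p > 0)


flat_spectrum = [1.0] * 8            # applause-like: energy spread evenly
peaked_spectrum = [1.0] + [0.0] * 7  # tone-like: energy in a single bin
```

    The flat spectrum reaches the maximum entropy for 8 bins (3 bits) and the single-peak spectrum gives zero, which is consistent with applause, whose spectrum is flat, scoring higher than structured music.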

  • Spectral Entropy (2)

    Figure: Spectral entropy of music signal. Spectral entropy is plotted against time in seconds, with the applause and music segments marked.

  • Database Used

    Nineteen concerts by male and female singers are used for the experiments.

    All concerts are vocal, that is, the lead musician is a singer.

    Each concert has 15-20 applauses, giving a total of 343 applauses.

  • Experimental analysis

    For the 19 concerts, spectral flux and spectral entropy features are extracted for frames of 0.25 s duration with an overlap of 0.01 s, at a sampling frequency of 44.1 kHz.

    The extracted features are smoothed by a rectangular moving average filter of length 15.

    For all concerts, the applause locations and types of applause are marked manually by a musician.

    Based on this ground truth, DET (detection error tradeoff) curves and equal error rates (EER) are calculated for all of the above features.
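
    The smoothing step can be sketched as below. The slide gives only the filter shape (rectangular) and length (15); the centred window and the truncation at the edges of the sequence are assumptions made for this illustration, and the names are hypothetical. The frame length in samples follows directly from the stated 0.25 s frame and 44.1 kHz sampling frequency.

```python
SAMPLE_RATE_HZ = 44100
FRAME_DURATION_S = 0.25
# 0.25 s at 44.1 kHz corresponds to 11025 samples per frame.
FRAME_LENGTH = round(SAMPLE_RATE_HZ * FRAME_DURATION_S)


def moving_average(values, length=15):
    """Rectangular moving average, centred on each point (window truncated at the edges)."""
    half = length // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

    A single-frame spike of height 15 in an otherwise zero feature track is reduced to height 1 at its centre, which shows how the filter suppresses isolated outliers before thresholding.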

  • Experimental Analysis

    The DET curve for applause detection is plotted over a range of thresholds. The equal error rates (EER) are given in the table.

    Figure: DET curve for applause detection. Miss probability (in %) is plotted against false alarm probability (in %) for spectral entropy, spectral flux without normalisation and spectral flux with normalisation, with the EER values marked.

    Method                     EER
    Spectral Flux (no norm)    44.55%
    Spectral Flux              23.33%
    Spectral Entropy           17.33%

    Table: EER for applause detection
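
    For readers unfamiliar with the EER, the sketch below shows one simple way to estimate it from frame-level feature values and ground-truth labels. It is an illustration under stated assumptions, not the procedure used for the table: a frame is called applause when its score is at or above the threshold, candidate thresholds are the observed scores, and the function name and example data are hypothetical.

```python
def equal_error_rate(scores, labels):
    """Sweep thresholds and return (eer, threshold) where miss and false-alarm rates are closest."""
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one applause frame and one non-applause frame")
    best = None
    for thr in sorted(set(scores)):
        # Miss: an applause frame scored below the threshold.
        miss = sum(1 for s, l in zip(scores, labels) if l and s < thr) / positives
        # False alarm: a non-applause frame scored at or above the threshold.
        false_alarm = sum(1 for s, l in zip(scores, labels) if not l and s >= thr) / negatives
        gap = abs(miss - false_alarm)
        if best is None or gap < best[0]:
            best = (gap, (miss + false_alarm) / 2, thr)
    return best[1], best[2]
```

    With perfectly separated scores the EER is zero; when the two classes overlap, the EER grows towards 50%.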

  • Introduction to cumulative sum(CUSUM) method

    With spectral flux and spectral entropy, applause locations are identified by thresholding the feature.

    A threshold alone may not be sufficient to determine the duration and strength of an applause.

    CUSUM is a non-parametric approach that can be used to identify statistical inhomogeneity in a given signal.

    CUSUM is estimated as follows. Let X[n] be the value of the feature extracted at time n. Then

    $$Y[n] = X[n] - a$$

    $$\mathrm{Cusum}[n] = \begin{cases} \mathrm{Cusum}[n-1] + Y[n], & Y[n] > 0 \\ 0, & \text{otherwise} \end{cases}$$

    If Cusum[n] > Θ, this suggests a significant structural shift in the series. The values of a and Θ have to be estimated empirically and may vary across data sets.
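
    The recursion above translates directly into a short loop. The sketch below follows the definition on this slide; the function name, the initial value Cusum[-1] = 0 and the example feature values are assumptions for illustration.

```python
def cusum(feature, a):
    """Cusum[n] = Cusum[n-1] + Y[n] if Y[n] > 0, else 0, with Y[n] = X[n] - a."""
    out = []
    prev = 0.0  # assumed starting value before the first frame
    for x in feature:
        y = x - a
        prev = prev + y if y > 0 else 0.0
        out.append(prev)
    return out


# Made-up feature track: low during music, high during a short applause.
feature = [1, 1, 5, 6, 5, 1, 1]
values = cusum(feature, a=2)          # [0.0, 0.0, 3.0, 7.0, 10.0, 0.0, 0.0]
theta = 5
shifted = [n for n, c in enumerate(values) if c > theta]  # frames where Cusum[n] > theta
```

    The accumulated value rises for as long as the feature stays above a and resets to zero as soon as it drops below, which produces the triangular shapes discussed on the following slides.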

  • Characterising the applauses using CUSUM (1)

    Figure: Spectral flux and spectral entropy and their CUSUM values. Panels show (a) spectral flux of unnormalised spectra, (b) spectral flux of peak normalised spectra and (c) spectral entropy, each paired with a panel showing its CUSUM, all against time in seconds.

    The CUSUM was computed for both spectral flux and spectral entropy.

    The start and end of each CUSUM triangle indicate the location and duration of an applause.
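
    Reading triangles off a CUSUM track can be sketched as below: each run of non-zero values is treated as one triangle. This is an illustration only. The time between successive feature values (frame_shift_s) must be supplied by the caller, and taking the peak of the triangle as the "strength" of the applause is an assumption, since the slides do not define strength explicitly.

```python
def cusum_triangles(cusum_values, frame_shift_s):
    """Return one dict per run of non-zero CUSUM values (one 'triangle' per run)."""
    triangles = []
    start = None
    # A trailing 0.0 sentinel closes a run that lasts until the end of the track.
    for i, value in enumerate(list(cusum_values) + [0.0]):
        if value > 0 and start is None:
            start = i
        elif value <= 0 and start is not None:
            run = cusum_values[start:i]
            triangles.append({
                "start_s": start * frame_shift_s,
                "end_s": i * frame_shift_s,
                "duration_s": (i - start) * frame_shift_s,
                "peak": max(run),  # assumed proxy for applause strength
            })
            start = None
    return triangles
```

    For example, a track with two non-zero runs yields two triangles, each with its own start time, end time, duration and peak.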

  • Characterising the applauses using CUSUM (2)

    Figure: Detecting the applause locations based on threshold and CUSUM values. Four panels for the Abhishek-Meyundi concert, each against time in seconds: spectral flux with peak normalisation, spectral entropy values, CUSUM of spectral flux with peak normalisation, and CUSUM of spectral entropy.

  • CUSUM values for whole concert

    Figure: CUSUM values for whole concert. Two panels for the Abhishek-Meyundi concert, against time in seconds: CUSUM of spectral entropy, and CUSUM triangles for spectral entropy.

    The figure shows the sequence of CUSUM triangles (for a carefully chosen value of a) over the entire concert.

    Around 22 triangles are visible, of which 20 correspond to applauses.

  • Highlights of carnatic music concert using CUSUM values

    The CUSUM values of spectral flux and spectral entropy determine the duration and strength of an applause.

    Based on the duration and strength of the applauses, the top 3 highlights are taken for each of the 19 concerts.

    The table lists the highlights found across all 19 concerts using the above features.

    SNO Highlights

    1 Taniavarthanam

    2 Raga Alapana of main song

    3 Within Taniavarthanam

    4 Tanam

    5 Swaram of main song

    6 Raga alapana vocal

    7 Raga alapana violin

    8 Alapana of RTP

    9 Nereval

    10 Main song

    11 Varnam

    12 Pallavi

    Table: Highlights of concerts using CUSUM
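
    Selecting the top 3 highlights of a concert can be sketched as a simple ranking of its applauses. The slides do not say how duration and strength are combined, so the ordering used here (strength first, duration as a tie-breaker) is purely an assumption, as are the function name, the dictionary keys and the example values.

```python
def top_highlights(applauses, count=3):
    """Rank applauses by peak CUSUM value, then by duration, and keep the first `count`."""
    ranked = sorted(
        applauses,
        key=lambda item: (item["peak"], item["duration_s"]),
        reverse=True,
    )
    return ranked[:count]


# Hypothetical applauses from one concert (labels and numbers are made up).
applauses = [
    {"label": "A", "peak": 5.0, "duration_s": 2.0},
    {"label": "B", "peak": 9.0, "duration_s": 1.0},
    {"label": "C", "peak": 5.0, "duration_s": 4.0},
    {"label": "D", "peak": 1.0, "duration_s": 8.0},
]
highlights = [item["label"] for item in top_highlights(applauses)]
```

    With these made-up values the selected labels are B, C and A, in that order. Other ways of combining the two quantities, such as their product, would be equally easy to substitute.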

  • Conclusion

    Most Carnatic music recordings archived today are either

    manually segmented into pieces, or

    stored whole as a single recording.

    By locating the applauses in a concert,

    we can automatically segment the concert into pieces for archival purposes;

    the duration and strength of each applause can be used to determine the highlights of the concert.

  • References

    1. Manoj C, Magesh S, Sankaran M S, and Manikandan M S. "A novel approach for detecting applause in continuous meeting". In IEEE International Conference on Electronics and Computer Technology, pages 182–186, India, April 2011.

    2. Lie Lu, Hao Jiang, and HongJiang Zhang. "A robust audio classification and segmentation method". In International ACM Multimedia Conference, pages 203–211, Canada, September 2001.

    3. M. J. Carey, E. S. Parris, and H. L. Lloyd-Thomas. "A comparison of features for speech and music discrimination". In Proceedings of IEEE Int. Conf. Acoust., Speech, and Signal Processing, vol. 1, pages 149–152, March 1999.

    4. J. O. Roman Jarina. "A discriminative feature selection for applause sounds detection". In Proc. 8th Int. Workshop on Image Analysis for Multimedia Interactive Service, 2007.

    5. A. Martin, G. Doddington, T. Kamm, M. Ordowski, and M. Przybocki. "The DET curve in assessment of detection task performance". Pages 1895–1898, 1997.

    6. Z. Xiong, R. Radhakrishnan, A. Divakaran, and T. S. Huang. "Audio events detection based highlights extraction from baseball, golf and soccer games in a unified framework". In ICASSP 2003, pages 401–404.

  • References

    7. B E Brodsky and B S Darkhovsky. "Non-parametric Methods in change-point problems". Kluwer Academic Publishers, New York, 1993.

    8. T M Krishna. "Kalpita sangita, Kalpana sangita and Manodharma". Private Communication, 2007-2011.

    9. H Liu and M S Kim. "Real-time detection of stealthy DDoS attacks using time-series decomposition". In ICC, pages 1–6, Bangalore, India, July 2010.

    10. Lawrence R Rabiner and Ronald W Schafer. "Theory and applications of digital speech processing". Pearson International, Upper Saddle River, New Jersey, 2011.

    11. H Wang, D Zhang, and K Shin. "Syn-dog: Sniffing syn flooding sources". In ICDCS, pages 421–428, Bangalore, India, July 2002.

    12. Tong Zhang. "Automatic singer identification". In Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on, volume 1, pages I-33–6, July 2003.

    13. Jouni Paulus. "Improving markov model based music piece structure labelling with acoustic information". In International Society for Music Information Retrieval Conference, pages 303–308, August 2010.

  • THANK YOU
