Applause Identification and its relevance to Archival of Carnatic Music
Padi Sarala, Vignesh Ishwar, Ashwin Bellur, Hema A. Murthy
Computer Science Dept, IIT Madras, India.
July 6, 2012
Padi Sarala, Vignesh Ishwar, Ashwin Bellur and Hema A.Murthy 2nd CompMusic Workshop
-
Outline of the presentation
Introduction to Carnatic music concert
Problem definition
Feature Extraction
Spectral flux
Spectral Entropy
Characterising the applause using Cumulative sum
Highlights detection using CUSUM
Results
-
Carnatic music concert (1)
A Carnatic music concert can be 2 to 3 hours long.
A concert consists of various pieces.
A concert consists of compositions interlaced with improvisational aspects such as
Raga Alapana, Nereval, Kalpanaswara, Thanam, Sloka and Thani Avarthanam.
-
Carnatic music concert (2)
In a concert, the audience applauds the artist at the end of a piece.
Sometimes the audience also applauds the artist in between, during or after improvisational aspects
such as Raga vocal, Raga violin, after the song, Kalpana swara, Thanam and Thani
Avarthanam.
Most of the Carnatic music recordings archived today are either
manually segmented into pieces, or
stored whole as a single recording.
-
Applications of Applause Identification
Existing work on Applause identification
Manoj et al. (2011) discuss how applause is detected in continuous
meeting speech and how it can be used as a key indicator of highlights
in a meeting.
Lie Lu et al. (2001) discuss techniques for classifying audio and
segmenting the audio signal into speech, music, silence and environmental
sounds such as applause and laughter; these segments can be used as
an index for audio retrieval.
Z. Xiong et al. (2003) discuss how applause is detected to
determine the highlights of a game.
-
Problem Definition
Identify the applauses in a given Carnatic music concert using spectral-domain features.
The concert can then be automatically segmented into individual pieces for archival purposes.
Find the duration and strength of each applause using the CUSUM technique. From these we can determine the highlights of the concert.
-
Characteristics of Applause and Music
Figure: Typical sequence of applause and music segments (time domain). Each panel plots amplitude against time in seconds.
In the time domain, the applause segment shows no rhythmic structure,
whereas the music segment is more structured.
-
Characteristics of Applause and Music
Figure: Typical sequence of applause and music segments (spectral domain). Each panel plots log magnitude (dB) against frequency in Hz.
The power spectrum of applause is flat, whereas the spectrum of music is structured.
-
Feature Extraction
Selecting a good feature for classification or segmentation is a crucial task.
The spectral properties of most audio signals change slowly with respect to
time.
To discriminate between music and applause, the following features are used.
Spectral flux
Spectral entropy
-
Spectral flux (1)
Spectral flux (SF), also called spectral variation, characterises the change in spectra between two adjacent frames of an audio signal.
It measures how quickly the power spectrum changes.
\[ SF[n] = \int_{\omega} \left( |X_n(\omega)| - |X_{n+1}(\omega)| \right)^2 \, d\omega \tag{1} \]
where $X_n(\omega)$ is the magnitude spectrum of the nth frame of an audio signal.
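As a rough illustration of Eq. (1), the sketch below replaces the integral over frequency with a sum over spectral bins. It assumes the magnitude spectra have already been computed; the function names and the toy values are hypothetical and are not taken from the presentation.

```python
def spectral_flux(prev_mag, next_mag):
    """Discrete form of Eq. (1): sum of squared differences between
    the magnitude spectra of two adjacent frames."""
    if len(prev_mag) != len(next_mag):
        raise ValueError("spectra must have the same number of bins")
    return sum((a - b) ** 2 for a, b in zip(prev_mag, next_mag))


def flux_track(mag_frames):
    """Spectral flux SF[n] for every pair of adjacent frames n, n+1."""
    return [spectral_flux(mag_frames[n], mag_frames[n + 1])
            for n in range(len(mag_frames) - 1)]


# Toy magnitude spectra (hypothetical): two identical frames, then a change.
frames = [[1.0, 0.0, 2.0], [1.0, 0.0, 2.0], [0.0, 1.0, 2.0]]
print(flux_track(frames))  # [0.0, 2.0]
```

The flux is zero while the spectrum stays the same and rises when energy moves between bins.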
-
Spectral flux (2)
The different normalisations of spectral flux are:
1 Spectral flux with no normalisation.
2 Power spectral density normalisation: in this approach $X^{Norm}_n(\omega)$ is defined as:
\[ X^{Norm}_n(\omega) = \frac{X_n(\omega)}{\int_{\omega} X_n(\omega) \, d\omega} \tag{2} \]
3 Peak normalisation: in this approach $X^{Norm}_n(\omega)$ is defined as:
\[ X^{Norm}_n(\omega) = \frac{X_n(\omega)}{\max_{\omega} X_n(\omega)} \tag{3} \]
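The two normalisations in Eqs. (2) and (3) can be sketched in discrete form as follows. The function names are hypothetical, and returning an all-zero frame unchanged is an assumption made here to avoid division by zero; the slides do not say how silent frames are handled.

```python
def psd_normalise(mag):
    """Eq. (2), discrete form: divide each bin by the sum over all bins."""
    total = sum(mag)
    # Assumption: a silent (all-zero) frame is returned unchanged.
    return [m / total for m in mag] if total > 0 else list(mag)


def peak_normalise(mag):
    """Eq. (3): divide each bin by the largest bin."""
    peak = max(mag)
    # Assumption: a silent (all-zero) frame is returned unchanged.
    return [m / peak for m in mag] if peak > 0 else list(mag)


# Hypothetical magnitude spectrum and a copy recorded four times louder.
quiet = [1.0, 1.0, 2.0]
loud = [4.0 * m for m in quiet]

print(psd_normalise(quiet), psd_normalise(loud))
print(peak_normalise(quiet), peak_normalise(loud))
```

Both normalisations map the quiet and the loud frame to the same spectrum, so a flux computed on normalised spectra does not depend on the overall recording level.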
-
Spectral flux (3)
Figure: Different Normalisations of Spectral flux. Three panels plot, against time in seconds, the spectral flux of unnormalised spectra, of power spectral density normalised spectra and of peak normalised spectra; music and applause segments are marked in each panel.
-
Spectral Entropy (1)
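Spectral entropy (SE) is a measure of the randomness of a system. Shannon's entropy of a discrete stochastic variable $X$ with probability mass function $p$ is given by
\[ H(X) = -\sum_{i=1}^{N} p(x_i) \log_2 [p(x_i)] \tag{4} \]
For the nth frame, the power spectrum is normalised so that it can be treated as a probability mass function:
\[ PSD_n(\omega) = \frac{|X_n(\omega)|^2}{\int_{\omega} |X_n(\omega)|^2 \, d\omega} \]
The spectral entropy of the frame is then
\[ SE[n] = -\int_{\omega} PSD_n(\omega) \log PSD_n(\omega) \, d\omega \tag{5} \]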
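A minimal sketch of Eq. (5) in discrete form is given below. It assumes a base-2 logarithm, as in Eq. (4), since Eq. (5) does not state the base, and it skips empty bins using the usual convention that 0 log 0 = 0. The function name and the toy spectra are hypothetical.

```python
import math


def spectral_entropy(mag):
    """Discrete form of Eq. (5) for one frame's magnitude spectrum."""
    power = [m * m for m in mag]
    total = sum(power)
    if total == 0:
        return 0.0  # assumption: a silent frame is given zero entropy
    psd = [p / total for p in power]  # normalised power spectrum
    # Base-2 logarithm is an assumption; empty bins contribute nothing.
    return -sum(p * math.log2(p) for p in psd if p > 0)


flat = [1.0, 1.0, 1.0, 1.0]    # hypothetical flat, applause-like spectrum
peaky = [0.0, 3.0, 0.0, 0.0]   # hypothetical single-peak, music-like spectrum
print(spectral_entropy(flat) > spectral_entropy(peaky))  # True
```

A flat spectrum reaches the maximum entropy for its number of bins, while a spectrum concentrated in a single bin has zero entropy.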
-
Spectral Entropy (2)
Figure: Spectral entropy of music signal. The panel plots spectral entropy against time in seconds, with the applause and music segments marked.
-
Database Used
19 concerts by male and female singers are used for the experiments.
All concerts are vocal, in that the lead musician is a singer.
Each concert has 15-20 applauses, giving a total of 343 applauses.
-
Experimental analysis
For the 19 concerts, spectral flux and spectral entropy features are extracted for frames of 0.25 s duration with an overlap of 0.01 s, at a sampling frequency of 44.1 kHz.
The extracted features are smoothed by a rectangular moving average filter of length 15.
For all concerts, the applause locations and types of applause are marked manually by a musician.
Based on this ground truth, DET curves and Equal Error Rates (EER) are calculated for all the features extracted above.
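The smoothing step can be sketched as a centred rectangular moving average. The slides give only the filter length (15); the centring and the treatment of the edges, where the window is shortened to the samples available, are assumptions made for this illustration.

```python
def moving_average(values, length=15):
    """Rectangular moving average over a centred window of `length` samples.
    Assumption: near the edges the window shrinks to the available samples."""
    half = length // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical feature track with a single spike, smoothed with a short window.
print(moving_average([0.0, 0.0, 3.0, 0.0, 0.0], length=3))
# [0.0, 1.0, 1.0, 1.0, 0.0]
```

Smoothing spreads an isolated spike over its neighbours, which reduces the chance that a single noisy frame crosses a detection threshold.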
-
Experimental Analysis
A DET curve is plotted for applause detection at various thresholds. The Equal Error Rates (EER) are given in the table.
Figure: DET Curve for applause detection. The plot, titled "Applause Detection Performance", shows miss probability (in %) against false alarm probability (in %) for entropy, flux without normalisation and flux with normalisation, with the EER values marked.
Method                     EER
Spectral Flux (no norm)    44.55 %
Spectral Flux              23.33 %
Spectral Entropy           17.33 %
Table: EER for applause detection
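To make the EER concrete, the sketch below sweeps a threshold over frame-level scores and reports the operating point where the miss rate and the false alarm rate are closest. It assumes that a higher score means "more applause-like" and that each frame carries a ground-truth label; the function names and the scores are hypothetical and do not reproduce the evaluation behind the table.

```python
def error_rates(applause_scores, music_scores, threshold):
    """Miss and false alarm rates when frames scoring above `threshold`
    are labelled applause (assumed score direction)."""
    misses = sum(1 for s in applause_scores if s <= threshold)
    false_alarms = sum(1 for s in music_scores if s > threshold)
    return misses / len(applause_scores), false_alarms / len(music_scores)


def equal_error_rate(applause_scores, music_scores):
    """Return (EER estimate, threshold) at the point where the two
    error rates are closest to each other."""
    best = None
    for t in sorted(set(applause_scores) | set(music_scores)):
        miss, fa = error_rates(applause_scores, music_scores, t)
        gap = abs(miss - fa)
        if best is None or gap < best[0]:
            best = (gap, (miss + fa) / 2, t)
    return best[1], best[2]


applause = [0.9, 0.8, 0.7, 0.4]   # hypothetical scores of applause frames
music = [0.1, 0.2, 0.3, 0.5]      # hypothetical scores of music frames
print(equal_error_rate(applause, music))  # (0.25, 0.4)
```

Each threshold in the sweep corresponds to one point on a DET curve; the EER is the point where that curve crosses the diagonal.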
-
Introduction to cumulative sum(CUSUM) method
With spectral flux and spectral entropy, applause locations are identified based on a threshold.
A threshold alone may not be sufficient to determine the duration and strength of an applause.
CUSUM is a non-parametric approach that can be used to identify the statistical inhomogeneity of a given signal.
CUSUM is estimated as follows. Let $X[n]$ be the value of the feature extracted at time $n$. Then
\[ Y[n] = X[n] - a \]
\[ Cusum[n] = \begin{cases} Cusum[n-1] + Y[n], & Y[n] > 0 \\ 0, & \text{otherwise} \end{cases} \]
If $Cusum[n] > \Theta$, it suggests that there is a significant structural shift in the series. The values of $a$ and $\Theta$ have to be estimated empirically and may vary across different data sets.
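The recursion above can be sketched directly. The starting value Cusum[-1] = 0 is an assumption, since the slides do not state an initial condition, and the feature values, `a` and `theta` below are hypothetical.

```python
def cusum(feature, a):
    """CUSUM as defined above: accumulate Y[n] = X[n] - a while Y[n] > 0,
    and reset to zero otherwise. Assumes Cusum[-1] = 0."""
    values = []
    previous = 0.0
    for x in feature:
        y = x - a
        previous = previous + y if y > 0 else 0.0
        values.append(previous)
    return values


# Hypothetical smoothed feature track with one burst above a = 2.0.
feature = [1.0, 1.0, 3.0, 4.0, 3.0, 1.0, 1.0]
track = cusum(feature, a=2.0)
print(track)  # [0.0, 0.0, 1.0, 3.0, 4.0, 0.0, 0.0]

theta = 2.5  # hypothetical threshold
print([n for n, c in enumerate(track) if c > theta])  # [3, 4]
```

The sum grows for as long as the feature stays above `a` and drops back to zero as soon as it falls below, so each sustained burst appears as a ramp that resets.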
-
Characterising the applauses using CUSUM (1)
Figure: Spectral flux and spectral entropy and their CUSUM values. The first set of panels plots, against time in seconds, (a) spectral flux of unnormalised spectra, (b) spectral flux of peak normalised spectra and (c) spectral entropy; the second set plots the CUSUM of each of these three features.
The CUSUM was computed for both spectral flux and spectral entropy.
The start and end of each triangle indicate the location and duration of an applause.
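One possible way to read the triangles off a CUSUM track is sketched below: each run of non-zero values is taken as one triangle, its first and last frames give the location and duration, and its peak is used as a stand-in for strength. Using the peak as the strength measure is an assumption, as the slides do not define strength, and the names and values are hypothetical.

```python
def cusum_triangles(cusum_values, theta):
    """Find runs of non-zero CUSUM values whose peak exceeds theta.
    Assumption: the peak of a run is used as its 'strength'."""
    triangles = []
    start = None
    # A trailing zero acts as a sentinel that closes a run ending at the last frame.
    for n, c in enumerate(list(cusum_values) + [0.0]):
        if c > 0 and start is None:
            start = n
        elif c <= 0 and start is not None:
            peak = max(cusum_values[start:n])
            if peak > theta:
                triangles.append({"start": start, "end": n - 1,
                                  "duration": n - start, "strength": peak})
            start = None
    return triangles


# Hypothetical CUSUM track: two clear triangles and one small bump.
track = [0.0, 0.0, 1.0, 3.0, 4.0, 0.0, 0.0, 0.5, 0.0, 2.0, 5.0, 9.0]
for triangle in cusum_triangles(track, theta=2.0):
    print(triangle)
```

The small bump at frame 7 never exceeds `theta` and is discarded, while the two larger runs are reported with their frame indices; durations here are in frames and would need the frame rate to convert to seconds.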
-
Characterising the applauses using CUSUM (2)
Figure: Detecting the applause locations based on threshold and CUSUM values. Four panels for the Abhishek-Meyundi concert plot, against time, the peak normalised spectral flux, the spectral entropy, the CUSUM of the peak normalised spectral flux and the CUSUM of the spectral entropy.
-
CUSUM values for whole concert
Figure: CUSUM values for whole concert. Two panels, both titled "Applause Detection for Abhishek-Meyundi Concert", plot the CUSUM of spectral entropy and the CUSUM triangles for spectral entropy against time.
The figure shows the sequence of CUSUM triangles (for a carefully chosen
value of a) for the entire concert.
About 22 triangles are visible, of which 20 correspond to applauses.
-
Highlights of carnatic music concert using CUSUM values
The CUSUM values of spectral flux and spectral entropy determine the duration
and strength of an applause.
Based on the duration and strength of the applauses, we have taken the top 3
highlights for all 19 concerts.
The table shows the highlights of all 19 concerts using the above features.
SNO   Highlights
1     Taniavarthanam
2     Raga Alapana of main song
3     Within Taniavarthanam
4     Tanam
5     Swaram of main song
6     Raga alapana vocal
7     Raga alapana violin
8     Alapana of RTP
9     Nereval
10    Main song
11    Varnam
12    Pallavi
Table: Highlights of concerts using CUSUM
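Selecting the top 3 highlights of a concert could be sketched as a simple ranking of the detected applauses. The slides say only that duration and strength are used; ordering by strength first and duration second is an assumption, and the labels and numbers below are hypothetical.

```python
def top_highlights(applauses, k=3):
    """Return the k applauses ranked highest.
    Assumption: rank by strength, with duration breaking ties."""
    ranked = sorted(applauses,
                    key=lambda a: (a["strength"], a["duration"]),
                    reverse=True)
    return ranked[:k]


# Hypothetical applauses detected in one concert.
applauses = [
    {"label": "A", "duration": 12, "strength": 40.0},
    {"label": "B", "duration": 30, "strength": 95.0},
    {"label": "C", "duration": 8, "strength": 15.0},
    {"label": "D", "duration": 25, "strength": 95.0},
    {"label": "E", "duration": 20, "strength": 60.0},
]
print([a["label"] for a in top_highlights(applauses)])  # ['B', 'D', 'E']
```

The item preceding each selected applause would then be looked up to name the highlight, as in the table above.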
-
Conclusion
Most of the Carnatic music recordings archived today are either manually segmented into pieces or stored whole as a single recording.
By locating the applauses in a concert:
We can automatically segment the concert into pieces for archival purposes.
The duration and strength of each applause can be used to determine the highlights of the concert.
-
References
1. Manoj C, Magesh S, Sankaran M S, and Manikandan M S. “A novel approach for detecting applause in continuous meeting”. In IEEE International Conference on Electronics and Computer Technology, pages 182–186, India, April 2011.
2. Lie Lu, Hao Jiang, and HongJiang Zhang. “A robust audio classification and segmentation method”. In International ACM Multimedia Conference, pages 203–211, Canada, September 2001.
3. M. J. Carey, E. S. Parris, and H. L. Loyd-Thomas. “A comparison of features for speech and music discrimination”. In Proceedings of IEEE Int. Conf. Acoust., Speech, and Signal Processing, vol. 1, pages 149–152, March 1999.
4. J. O. Roman Jarina. “A discriminative feature selection for applause sounds detection”. In Proc. 8th Int. Workshop on Image Analysis for Multimedia Interactive Service, 2007.
5. A. Martin, G. Doddington, T. Kamm, M. Ordowski, and M. Przybocki. “The DET curve in assessment of detection task performance”. Pages 1895–1898, 1997.
6. Z. Xiong, R. Radhakrishnan, A. Divakaran, and T. S. Huang. “Audio events detection based highlights extraction from baseball, golf and soccer games in a unified framework”. In ICASSP 2003, pages 401–404.
-
References
7. B E Brodsky and B S Darkhovsky. “Non-parametric Methods in change-point problems”. Kluwer Academic Publishers, New York, 1993.
8. T M Krishna. “Kalpita sangita, Kalpana sangita and Manodharma”. Private Communication, 2007-2011.
9. H Liu and M S Kim. “Real-time detection of stealthy ddos attacks using time-series decomposition”. In ICC, pages 1–6, Bangalore, India, July 2010.
10. Lawrence R Rabiner and Ronald W Schafer. “Theory and applications of digital speech processing”. Pearson International, Upper Saddle River, New Jersey, 2011.
11. H Wang, D Zhang, and K Shin. “Syn-dog: Sniffing syn flooding sources”. In ICDCS, pages 421–428, Bangalore, India, July 2002.
12. Tong Zhang. “Automatic singer identification”. In Multimedia and Expo, 2003. ICME ’03. Proceedings. 2003 International Conference on, volume 1, pages I-33–6, July 2003.
13. Jouni Paulus. “Improving markov model based music piece structure labelling with acoustic information”. In International Society for Music Information Retrieval Conference, pages 303–308, August 2010.
-
THANK YOU