Audio classification: Discriminating speech, music and environmental audio
Rajas A. Sambhare, ECE 539
Objective
Discrimination between speech, music and environmental audio (special effects) using short 3-second samples
• To extract a relevant set of feature vectors from the audio samples
• To develop a pattern classifier that can successfully discriminate the three different classes based on the extracted vectors
Feature extraction
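Frequency centroid:
$$\omega_c = \frac{\int_0^{\omega_0} \omega\,|F(\omega)|^2\,d\omega}{\int_0^{\omega_0} |F(\omega)|^2\,d\omega}$$
Bandwidth:
$$B = \sqrt{\frac{\int_0^{\omega_0} (\omega - \omega_c)^2\,|F(\omega)|^2\,d\omega}{\int_0^{\omega_0} |F(\omega)|^2\,d\omega}}$$
where $F(\omega)$ is the spectrum of the frame and $\omega_0$ is half the sampling frequency.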
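For a sampled frame, both integrals become sums over the FFT bins from DC to Nyquist. The plain-Python sketch below illustrates this; `centroid_and_bandwidth` is a hypothetical helper, and returning zeros for an all-zero spectrum is an assumption (both features are undefined there). With two equal spectral lines at 1000 Hz and 3000 Hz, $\omega_c$ falls midway at 2000 Hz and $B$ equals the 1000 Hz distance of each line from it.

```python
import math

def centroid_and_bandwidth(power, freqs):
    """Discrete form of the centroid and bandwidth integrals.

    power[k] plays the role of |F(w_k)|^2 and freqs[k] of w_k, so each
    integral over [0, w_0] becomes a sum over the FFT bins up to Nyquist.
    """
    total = sum(power)
    if total == 0.0:
        return 0.0, 0.0  # silent frame: both features undefined, report zeros (assumption)
    centroid = sum(f * p for f, p in zip(freqs, power)) / total
    spread = sum((f - centroid) ** 2 * p for f, p in zip(freqs, power)) / total
    return centroid, math.sqrt(spread)

# Two equal spectral lines at 1000 Hz and 3000 Hz
print(centroid_and_bandwidth([0.0, 1.0, 0.0, 1.0], [0.0, 1000.0, 2000.0, 3000.0]))  # -> (2000.0, 1000.0)
```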
Feature extraction
• 3-second audio sample (22050 Hz)
• Split into 512-sample frames (23.22 ms, 25% overlap, Hanning window)
• 512-point FFT of each frame
• Extract the centroid, the energy in 22 critical bands, and the bandwidth
• Calculate the log power ratio in each band
• Calculate the mean and SD of the centroid, log power ratios and bandwidth across all frames
• Calculate the silence ratio (SR)
• Concatenate the mean and SD of the centroid, log power ratios and bandwidth with the silence ratio
• Save the 49-dimension feature vector
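The processing chain above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's MATLAB code. Zwicker's critical-band edges are assumed (the 23 edges from 0 to 9500 Hz give 22 bands below the 11025 Hz Nyquist frequency). The "log power ratio" is taken to be log10 of band energy over total frame energy. `SILENCE_RMS` is a hypothetical threshold on frame RMS, and a symmetric Hann window is used. A small radix-2 FFT is included so the sketch needs no third-party packages. The vector length follows from the slide: a mean and an SD for each of 24 per-frame features (1 centroid + 22 ratios + 1 bandwidth), plus the silence ratio, giving 2 × 24 + 1 = 49.

```python
import cmath
import math
import statistics

FS = 22050          # sampling rate (Hz), from the slide
N = 512             # frame length and FFT size, from the slide
HOP = N * 3 // 4    # 25% overlap -> hop of 384 samples
# Assumption: Zwicker critical-band edges (Hz); 23 edges give 22 bands below 11025 Hz
BAND_EDGES = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
              1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500]
WINDOW = [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]  # symmetric Hann
SILENCE_RMS = 0.01  # hypothetical threshold on frame RMS for samples in [-1, 1]
EPS = 1e-12         # guards against division by zero and log(0)


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + [even[k] - tw[k] for k in range(n // 2)]


def frame_features(frame):
    """Centroid, 22 log band-power ratios and bandwidth of one windowed frame."""
    spec = fft([s * w for s, w in zip(frame, WINDOW)])
    power = [abs(c) ** 2 for c in spec[: N // 2 + 1]]  # bins from DC to Nyquist
    freqs = [k * FS / N for k in range(N // 2 + 1)]
    total = sum(power) + EPS
    centroid = sum(f * p for f, p in zip(freqs, power)) / total
    bandwidth = math.sqrt(sum((f - centroid) ** 2 * p for f, p in zip(freqs, power)) / total)
    ratios = []
    for lo, hi in zip(BAND_EDGES[:-1], BAND_EDGES[1:]):
        band = sum(p for f, p in zip(freqs, power) if lo <= f < hi)
        ratios.append(math.log10(band / total + EPS))  # assumption: band energy / frame energy
    return [centroid] + ratios + [bandwidth]


def feature_vector(signal):
    """Means and SDs of the 24 per-frame features, then the silence ratio (49 values)."""
    frames = [signal[i:i + N] for i in range(0, len(signal) - N + 1, HOP)]
    if not frames:
        raise ValueError("signal is shorter than one frame")
    rows = [frame_features(f) for f in frames]
    silent = sum(1 for f in frames if math.sqrt(sum(s * s for s in f) / N) < SILENCE_RMS)
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.pstdev(col) for col in columns]  # population SD (assumption)
    return means + sds + [silent / len(frames)]


# Usage: a 3-second, 1 kHz test tone
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / FS) for n in range(3 * FS)]
print(len(feature_vector(tone)))  # -> 49
```

In practice `numpy.fft.rfft` would replace the hand-written FFT. It is inlined here only to keep the sketch dependency-free.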
Neural network development
• Create a database of 135 training and 45 testing samples
• Develop neural network using MATLAB
• Dynamically partition the training samples, holding out 25% for tuning
• Decide on network architecture (No. of hidden layers and neurons)
• Decide on network parameters like and
• Attempt classification using various combinations of feature vectors
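One plausible reading of "dynamically partition" is that a fresh random 25% of the 135 training samples is held out for tuning on each training run. The sketch below implements that reading. The helper name `split_for_tuning` and its `seed` parameter are hypothetical. Rounding 135 × 0.25 = 33.75 gives 34 tuning samples and leaves 101 for training.

```python
import random

def split_for_tuning(samples, tune_fraction=0.25, seed=None):
    """Randomly hold out tune_fraction of the training samples for tuning.

    Calling it again with a different seed (or none) draws a new partition.
    """
    rng = random.Random(seed)
    order = list(range(len(samples)))
    rng.shuffle(order)
    n_tune = round(len(samples) * tune_fraction)
    tune = [samples[i] for i in order[:n_tune]]
    train = [samples[i] for i in order[n_tune:]]
    return train, tune

train, tune = split_for_tuning(list(range(135)), seed=1)
print(len(train), len(tune))  # -> 101 34
```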
Feedforward multi-layer perceptron with back-propagation training
Designed network: 49-20-3 (49 inputs, 20 hidden neurons, 3 outputs)
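As a rough illustration of the 49-20-3 architecture, the sketch below trains a feedforward MLP by on-line back-propagation with momentum. The project used MATLAB, and its parameter values are not given here. The sigmoid units, squared-error cost, weight initialisation, learning rate `eta=0.1` and momentum `mom=0.8` are all assumptions. The two toy input patterns in the usage lines are made up and are not real audio features.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class MLP:
    """Feedforward MLP (default 49-20-3) with sigmoid units, trained by
    on-line back-propagation with momentum on a squared-error cost."""

    def __init__(self, sizes=(49, 20, 3), seed=0):
        rng = random.Random(seed)
        # W[l][j] holds the weights into unit j of layer l+1; the last entry is the bias
        self.W = [[[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]
                  for n_in, n_out in zip(sizes[:-1], sizes[1:])]
        # previous weight changes, reused by the momentum term
        self.dW = [[[0.0] * len(row) for row in layer] for layer in self.W]

    def forward(self, x):
        """Return the activations of every layer, input layer first."""
        acts = [list(x)]
        for layer in self.W:
            inp = acts[-1] + [1.0]  # constant bias input
            acts.append([sigmoid(sum(w * a for w, a in zip(row, inp))) for row in layer])
        return acts

    def train_step(self, x, target, eta=0.1, mom=0.8):
        """One back-propagation update; eta and mom are hypothetical defaults."""
        acts = self.forward(x)
        out = acts[-1]
        delta = [(t - o) * o * (1.0 - o) for t, o in zip(target, out)]
        for l in range(len(self.W) - 1, -1, -1):
            inp = acts[l] + [1.0]
            # back-propagate the deltas before this layer's weights are changed
            prev = [a * (1.0 - a) * sum(self.W[l][j][i] * d for j, d in enumerate(delta))
                    for i, a in enumerate(acts[l])] if l > 0 else []
            for j, row in enumerate(self.W[l]):
                for i in range(len(row)):
                    self.dW[l][j][i] = eta * delta[j] * inp[i] + mom * self.dW[l][j][i]
                    row[i] += self.dW[l][j][i]
            delta = prev
        return sum((t - o) ** 2 for t, o in zip(target, out))  # error before the update

    def predict(self, x):
        """Index of the largest output (class order is hypothetical)."""
        out = self.forward(x)[-1]
        return out.index(max(out))

# Usage on two toy 49-dimensional patterns (not real features)
net = MLP()
pattern_a = [1.0] * 25 + [0.0] * 24
pattern_b = [0.0] * 25 + [1.0] * 24
for _ in range(500):
    net.train_step(pattern_a, [1, 0, 0])
    net.train_step(pattern_b, [0, 0, 1])
print(net.predict(pattern_a), net.predict(pattern_b))
```

Momentum reuses part of the previous weight change. This smooths on-line updates, which is why the MLP stores `dW` alongside the weights.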
Results
• Classification rate of 82.37% using critical sub-band ratios, frequency centroid, bandwidth and silence ratio
• Classification rate of 79.78% using only critical sub-band ratios
• Classification rate of 84.44% using only frequency centroid, bandwidth and silence ratio, but with extremely slow training and variable results (2.34% standard deviation in classification rate)
• Baseline: Zhang and Kuo [1] reported a classification rate of ~90% using a rule-based heuristic. However, better results are expected with a larger database.
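Assuming the rates are measured on the 45 test samples, a rate is simply the share of correctly classified samples. For example, 38 correct out of 45 gives 84.44%. The sketch below computes a rate and summarises it over repeated training runs. The helper names are hypothetical. The slide does not say whether the 2.34% figure is a sample or a population standard deviation, so using `statistics.stdev` (sample) is an assumption.

```python
import statistics

def classification_rate(correct, total):
    """Percentage of test samples assigned to their true class."""
    return 100.0 * correct / total

def rate_summary(correct_per_run, total=45):
    """Mean and sample SD of the classification rate over repeated runs
    (at least two runs are needed for the SD)."""
    rates = [classification_rate(c, total) for c in correct_per_run]
    return statistics.mean(rates), statistics.stdev(rates)

print(f"{classification_rate(38, 45):.2f}%")  # -> 84.44%
```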
References
[1] T. Zhang and C.-C. J. Kuo, "Hierarchical system for content-based audio classification and retrieval," Proc. SPIE, vol. 3527, Multimedia Storage and Archiving Systems III, pp. 398-409, 1998.