Singer Similarity
Doug Van Nort
MUMT 611
Goal
Determine Singer / Vocalist based on extracted features of audio signal
Classify audio files based on singer
Storage and retrieval
Introduction
Identification of singer fairly easy task for humans regardless of musical context
Not so easy to find parameters for automatic identification
More file sharing and databases leads to increased demand
Introduction
Much work done in speech recognition, but it performs poorly for singer ID
Systems trained on speech data, with no background noise
The vocal problem has some fundamental differences
Vocals exist in variety of background noise
Voiced/unvoiced content
Singer recognition similar problem to solo instrument identification
The Players
Kim and Whitman 2002
Liu and Huang 2002
Kim and Whitman
From MIT Media Lab
Singer identification which:
Assumes strong harmonicity from vocals
Assumes pop music
Instrumentation/levels within critical frequency range
Two step process
Untrained algorithm for automatic segmentation
Classification with training based on vocal segments
Detection of Vocal Regions
Filter frequencies outside of vocal range of 200-2,000 Hz
Chebyshev IIR digital filter
Detect harmonicity
Filter Frequency Response
Filtering alone not enough
Bass and cymbals gone, but other instruments fall within range
Need to extract features within vocal range to find voice
Harmonic detection
Band-limited output sent through bank of inverse comb filters
Delay varied
Most attenuated signal represents strongest harmonic content
Harmonicity measure calculated by ratio of signal energy to maximally attenuated signal
Allows for establishment of threshold
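The harmonicity measure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the filter form y[n] = x[n] - x[n - d], the delay range, and the function names are all assumptions made for the example.

```python
import math

def inverse_comb_energy(x, delay):
    """Energy of y[n] = x[n] - x[n - delay] (assumed inverse comb form)."""
    return sum((x[n] - x[n - delay]) ** 2 for n in range(delay, len(x)))

def harmonicity(x, min_delay, max_delay):
    """Return (ratio, best_delay): mean signal energy divided by the mean
    energy of the most attenuated comb output over the tested delays."""
    best_delay, best_energy = min_delay, float("inf")
    for d in range(min_delay, max_delay + 1):
        e = inverse_comb_energy(x, d) / (len(x) - d)  # mean energy per sample
        if e < best_energy:
            best_delay, best_energy = d, e
    signal_energy = sum(v * v for v in x) / len(x)
    # Small constant guards against division by zero for perfectly periodic input.
    return signal_energy / (best_energy + 1e-12), best_delay

# A harmonic test tone with a period of 50 samples (fundamental plus one overtone).
tone = [math.sin(2 * math.pi * n / 50) + 0.5 * math.sin(4 * math.pi * n / 50)
        for n in range(2000)]
ratio, delay = harmonicity(tone, 20, 80)
```

A strongly periodic input is almost cancelled when the delay matches its period, so the ratio becomes very large. A noise-like input is not cancelled at any delay and the ratio stays small, which is what makes a threshold usable.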
Singer Identification
Linear Predictive Coding (LPC) used to extract location and magnitude of formants
One of two classifiers used to identify singer based on formant information
Feature Extraction
A 12-pole linear predictor used to find formants using autocorrelation method
Standard LPC treats frequencies linearly, but human sensitivity is more logarithmic
Warp function maps frequencies to approximation of Bark scale
Further beneficial in finding fundamental
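As a rough sketch of the autocorrelation method, the predictor coefficients can be obtained with the Levinson-Durbin recursion. The code below covers only the linear-frequency case (no Bark warping, no formant extraction from the polynomial roots). The function names and the synthetic test signal are assumptions for illustration.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations.
    Returns (a, err) with a[0] = 1, so that
    x[n] is predicted by -sum(a[k] * x[n - k] for k in 1..order)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)                # remaining prediction error
    return a, err

# Synthetic two-pole signal: x[n] = 1.3 x[n-1] - 0.6 x[n-2] + impulse.
x = [0.0] * 400
for n in range(400):
    x[n] = (1.0 if n == 0 else 0.0)
    if n >= 1:
        x[n] += 1.3 * x[n - 1]
    if n >= 2:
        x[n] -= 0.6 * x[n - 2]

coeffs, residual = levinson_durbin(autocorrelation(x, 12), 12)  # 12-pole predictor
```

For this synthetic input the first two coefficients come out close to -1.3 and 0.6 and the remaining ones close to zero, since the signal has only two poles.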
Classification Techniques
Two established pattern recognition algorithms used:
Gaussian Mixture Model (GMM)
Support Vector Machine (SVM)
GMM
Uses multiple weighted Gaussians to capture behavior of each class
Each vector assumed to arise from mixture of Gaussian distributions
Parameters for Gaussians found via Expectation Maximization (EM)
Mean and variance
Prior to EM, Principal Component Analysis (PCA) taken of data
Normalizes variances, avoids highly irregular scalings which EM can produce
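For reference, the mixture density that EM fits for each singer can be written as follows. The symbols (M components, weights w_i, singer model λ_s) are notation chosen here. The decision rule shown is the usual maximum-likelihood one and is an assumption, since the slides do not state it.

```latex
p(\mathbf{x} \mid \lambda_s) = \sum_{i=1}^{M} w_i \,
  \mathcal{N}(\mathbf{x};\, \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i),
\qquad \sum_{i=1}^{M} w_i = 1, \quad w_i \ge 0

\hat{s} = \arg\max_{s} \sum_{t} \log p(\mathbf{x}_t \mid \lambda_s)
```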
SVM
Computes optimal hyperplane to linearly separate two classes of data
Does not depend on probability estimation
Determined by a small number of data points (support vectors)
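In its standard hard-margin form, the optimal hyperplane is the solution of the problem below, with labels y_i in {-1, +1}. This is the textbook formulation, given as background rather than as the exact variant used in the experiments.

```latex
\min_{\mathbf{w},\, b} \; \tfrac{1}{2} \lVert \mathbf{w} \rVert^2
\quad \text{subject to} \quad
y_i \left( \mathbf{w}^\top \mathbf{x}_i + b \right) \ge 1,
\quad i = 1, \dots, n
```

The support vectors are the training points for which the constraint holds with equality, so only they determine the hyperplane.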
Experiments & Results
Testbed of 200 songs by 17 different artists/vocalists
Tracks downsampled to 11.025 kHz
Vocal range still well below Nyquist (about 5.5 kHz)
Half of database used for training, half for testing
Two experiments:
LPC features taken from entire song
LPC features taken from vocal segments
1024 frame analysis with hop size of 2
LP analysis used both linear and warped freq scales
Results
Results
Results better than chance (~6%) but fall short of expected human performance
Linear freq alone outperforms warped freq
Oddly, using only vocal segments decreases performance for SVM
Liu and Huang
Based on MP3 database
Particularly high demand for such an approach, given widespread use of MPEG-1 Layer 3
Algorithm works directly within the MP3 decoding process
Process
Coefficients of polyphase filter taken from MP3 decoding process
File segmented into phonemes based on said coefficients
Feature vector constructed for each phoneme, and stored along with artist name in database
Classifier trained on database, used to identify unknown MP3 files
Flowchart for singer similarity system of Liu/Huang
Phoneme Segmentation
MP3 decoding provides polyphase coefficients
Energy intensity of each subband is sum of squares of subband coefficients
Frame energy calculated from polyphase coefficients
Energy gap exists between two phonemes
Segmentation looks to automatically identify this gap
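The energy-based segmentation can be illustrated roughly as follows. The slides give the subband energy as a sum of squares but do not give the frame energy formula or the gap-detection rule. Summing over subbands and using a fixed threshold are assumptions made for this sketch, and the function names are hypothetical.

```python
def subband_energies(frame):
    """frame: list of subbands, each a list of polyphase coefficients.
    Energy of a subband is the sum of squares of its coefficients."""
    return [sum(c * c for c in band) for band in frame]

def frame_energy(frame):
    # Assumption: frame energy is the total over all subbands.
    return sum(subband_energies(frame))

def segment_by_energy_gap(energies, threshold):
    """Return (start, end) frame-index pairs, end exclusive, for runs whose
    energy stays above the threshold; low-energy frames are the gaps."""
    segments, start = [], None
    for i, e in enumerate(energies):
        if e > threshold and start is None:
            start = i                      # a segment begins
        elif e <= threshold and start is not None:
            segments.append((start, i))    # a gap closes the segment
            start = None
    if start is not None:
        segments.append((start, len(energies)))
    return segments

example_frame = [[1.0, 2.0], [0.0, 3.0]]
energies = [0.1, 4.0, 5.0, 3.0, 0.2, 0.1, 6.0, 7.0, 0.3]
segments = segment_by_energy_gap(energies, threshold=1.0)
```

In this toy sequence the two high-energy runs are separated by a low-energy gap, so two segments are returned.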
Waveform of two phonemes
Frame energy of two phonemes
Phoneme Feature Extraction
Phoneme features computed directly from MDCT coefficients
576-dimensional feature vector for each frame
Phoneme feature vector of n frames
Classification: setup
Create database of phoneme feature vectors
Becomes training set
Discriminating Radius: measure of uniqueness by min Euclidean distance between dissimilar vectors
Good vs. Bad discriminators
Number of similar phonemes within discriminating radius also considered
Number of phonemes within radius = w_f = frequency of phoneme f
Discriminating ability of each phoneme depends on frequency and distance
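The two quantities behind the discriminating ability can be sketched as below. The slides do not say how frequency and distance are combined into one weight, so that step is left out. Excluding the phoneme itself from the count is an assumption, and the function names are hypothetical.

```python
import math

def discriminating_radius(index, vectors, singers):
    """Minimum Euclidean distance from vectors[index] to any phoneme
    vector belonging to a different singer."""
    return min(math.dist(vectors[index], v)
               for v, s in zip(vectors, singers) if s != singers[index])

def phoneme_frequency(index, vectors, singers):
    """Number of same-singer phonemes (excluding the phoneme itself)
    that lie inside the discriminating radius."""
    r = discriminating_radius(index, vectors, singers)
    return sum(1 for j, (v, s) in enumerate(zip(vectors, singers))
               if j != index and s == singers[index]
               and math.dist(vectors[index], v) < r)

# Toy 2-D vectors standing in for the high-dimensional phoneme features.
vectors = [(0, 0), (0, 1), (1, 0), (5, 5), (3, 0)]
singers = ["A", "A", "A", "A", "B"]
radius = discriminating_radius(0, vectors, singers)
frequency = phoneme_frequency(0, vectors, singers)
```

Here the nearest phoneme from another singer is 3 units away, and two same-singer phonemes fall inside that radius.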
Classification: in action
Unknown MP3 segmented into phonemes
Only first N used for efficiency
kNN used as classifier
K neighbors compared to N phonemes and weighted by discriminating function
K*N weighted “votes” clustered by singer, and the winner is one with largest score
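A minimal sketch of the weighted voting step follows. The per-phoneme weights stand in for the output of the discriminating function, whose exact form is not given in the slides. The database layout and the function name are assumptions for illustration.

```python
import math
from collections import defaultdict

def identify_singer(query_phonemes, database, k):
    """database: list of (vector, singer, weight) entries.
    Each of the N query phonemes casts k weighted votes, one per nearest
    database phoneme; the singer with the largest total score wins."""
    scores = defaultdict(float)
    for q in query_phonemes:
        nearest = sorted(database, key=lambda e: math.dist(q, e[0]))[:k]
        for _, singer, weight in nearest:
            scores[singer] += weight
    return max(scores, key=scores.get), dict(scores)

# Toy database with hypothetical weights.
database = [
    ((0, 0), "A", 1.0),
    ((0, 1), "A", 0.5),
    ((5, 5), "B", 2.0),
    ((5, 6), "B", 0.5),
    ((7, 5), "B", 0.2),
]
query = [(0.1, 0.1), (0.2, 0.9), (5.1, 5.1)]   # first N = 3 phonemes
winner, scores = identify_singer(query, database, k=2)
```

With N = 3 query phonemes and k = 2, six weighted votes are cast. Two of the three query phonemes sit near singer A's entries, which is enough to outweigh the single heavily weighted match for singer B.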
Experiments/Results
10 Male, 10 Female singers
30 songs apiece
10 for phoneme database
10 for training (discriminator weights)
10 for test set
Free parameters
User-defined parameters:
k value
Discrimination threshold
Number of singers in a class
Varying threshold
Varying k
Varying number of singers
Results for all Singers
Conclusion
Not much work yet strictly on singer identification
Tough because of time and background variances
Quite useful as many people identify artists with singer
Initial results promising, but short of human performance
See also: Minnowmatch [Whitman, Flake, Lawrence]