Timbre Similarity
Work by Aucouturier & Pachet
Rebecca Fiebrink
MUMT 611
3 March 2005

Presentation Overview
- Pachet & Aucouturier; why timbre similarity?
- Basic approach to quantifying timbre and timbre similarity
- “Finding songs that sound the same,” 2002
- The CUIDADO project
- P & A’s work in context
- Practical and theoretical improvements, 2004
- Remaining problems and future work

Who are they?
- Sony Computer Science Laboratory (CSL), Paris
- François Pachet: Music access and interaction, “interestingness”
- Jean-Julien Aucouturier: PhD student
- A host of papers on music browsing, genre, metadata, segmentation, …

Why timbre similarity?
- Electronic Music Distribution (EMD) systems:
  - Move from mass-market to individualized distribution
  - Collaborative filtering isn’t sufficient
  - High-level, perceptually relevant descriptors play a complementary / competing role; allow for more interesting music browsing
- Makes more sense than “melodic similarity”
- Tied to genre, but not too tightly

How to quantify timbre?
- High-level descriptor for an entire song or piece
- Mel Frequency Cepstral Coefficients (MFCCs) are building blocks
  - Related to spectral envelope
  - First few coefficients account for timbre envelope; later ones describe pitch
- Derive a compact representation of a piece’s MFCC “space” and a way to compare representations for two pieces
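
To make the MFCC building block concrete, here is a minimal sketch of computing the coefficients for a single audio frame using only the Python standard library. It is not A & P's code. The mel-scale formula, the Hamming window, the 20 triangular filters and the function names are all assumptions made for illustration. Only the idea of keeping the first few coefficients comes from the slides.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window one frame; return |DFT|^2 for bins 0..N/2 (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))) ** 2
            for k in range(n // 2 + 1)]


def mel_energies(spec, sample_rate, n_filters):
    """Sum the power spectrum under triangular filters spaced evenly in mel."""
    n_bins = len(spec)
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # Filter edges, expressed as fractional DFT bin positions.
    edges = [mel_to_hz(top * i / (n_filters + 1)) / nyquist * (n_bins - 1)
             for i in range(n_filters + 2)]
    energies = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        e = 0.0
        for b in range(n_bins):
            if lo < b <= mid:
                e += spec[b] * (b - lo) / (mid - lo)
            elif mid < b < hi:
                e += spec[b] * (hi - b) / (hi - mid)
        energies.append(max(e, 1e-12))  # floor keeps the log finite
    return energies


def mfcc_frame(frame, sample_rate, n_coeffs=8, n_filters=20):
    """First n_coeffs MFCCs of one frame: DCT-II of the log mel energies."""
    log_e = [math.log(e)
             for e in mel_energies(power_spectrum(frame), sample_rate, n_filters)]
    return [sum(v * math.cos(math.pi * k * (j + 0.5) / n_filters)
                for j, v in enumerate(log_e))
            for k in range(n_coeffs)]
```

In this sketch, doubling the amplitude of a frame changes only coefficient 0, because a uniform gain shifts every log mel energy by the same constant. A real system would use an FFT rather than the naive DFT shown here, and would run this over successive frames of a song.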

A & P’s implementation (2002)
- Find first 8 MFCCs every 50 ms
- Model song as mixture of 3 Gaussian densities over all possible MFCCs of length 8 (GMM = “Gaussian mixture model”)
- Calculate “distance” between GMMs by sampling
  - Sample from one GMM, compute likelihood of the samples given the other GMM
  - Force symmetry and normalize
  - Use 1000 samples
- Store GMM information for each song and calculate similarity matrix
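
The sampling step can be sketched as follows. This is an illustration under stated assumptions rather than A & P's implementation: the GMMs use diagonal covariance, each model is a plain list of (weight, means, variances) tuples, and symmetry is forced by adding the two cross terms and subtracting the two self terms. Averaging over the samples stands in for the normalization step. All function names are hypothetical.

```python
import math
import random

# A GMM here is a list of (weight, means, variances) tuples.
# Diagonal covariance is a simplifying assumption of this sketch.


def log_density(gmm, x):
    """Log of the mixture density at x (log-sum-exp for numerical stability)."""
    terms = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            log_p -= 0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
        terms.append(log_p)
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def draw_samples(gmm, n, rng):
    """Draw n points: pick a component by weight, then sample each dimension."""
    weights = [w for w, _, _ in gmm]
    samples = []
    for _ in range(n):
        _, means, variances = rng.choices(gmm, weights=weights)[0]
        samples.append([rng.gauss(mu, math.sqrt(var))
                        for mu, var in zip(means, variances)])
    return samples


def mean_log_likelihood(gmm, samples):
    return sum(log_density(gmm, x) for x in samples) / len(samples)


def sampling_distance(gmm_a, gmm_b, n_samples=1000, seed=0):
    """Monte Carlo distance: self-likelihoods minus cross-likelihoods."""
    rng = random.Random(seed)
    s_a = draw_samples(gmm_a, n_samples, rng)
    s_b = draw_samples(gmm_b, n_samples, rng)
    return (mean_log_likelihood(gmm_a, s_a) + mean_log_likelihood(gmm_b, s_b)
            - mean_log_likelihood(gmm_b, s_a) - mean_log_likelihood(gmm_a, s_b))
```

Comparing a model with itself gives a distance of essentially zero, and the value grows as the two mixtures move apart. Because the samples are random, swapping the arguments gives the same result only up to sampling noise. Evaluating this function for every pair of stored models would fill the similarity matrix.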

Results of 2002 version
- Same artist
  - Harpsichord pieces: Bach - Wohltemperierte Clavier Fuga II in C minor and Bach - Wohltemperierte Clavier - Praeludium IV in C sharp minor
  - Trip Hop: Portishead - Mysterons (live) and Portishead - Sour Times
- Different artists, same genre
  - Harpsichord pieces: Bach - Das Wohltemperierte Clavier - Praeludium IV in C sharp minor BWV849 and Couperin - Gavotte
  - “Woman Rock Singer”: Leah Andreone - It's OK and Meredith Brooks - Bitch
- “Interesting” results
  - “Classical” and “Pop”: Beethoven - Romanze fur Violine und Orchester Nr. 2 F-dur op.50 and Beatles - Eleanor Rigby
  - “Trip Hop” and “Celtic Folk”: Portishead - Mysterons and Alan Stivell - Arvor You (same kind of harpy theremin-like ambiance)

Evaluating results
- No ground truth exists
  - Similarity is subjective
  - People don’t hear timbre alone
- Survey of 10 people: Is A more like B or C?
  - Algorithm matches people 80% of time
- One view: Divergence from expectation makes it useful
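
The “Is A more like B or C?” survey can be scored with a few lines of code. The sketch below is an assumption about how such an agreement rate might be computed. The slides do not describe the actual procedure, and the judgment format, names and toy data are hypothetical.

```python
def agreement_rate(distance, judgments):
    """Fraction of triplets where the algorithm picks the same item as the listener.

    judgments: list of (a, b, c, human_choice), with human_choice 'b' or 'c'.
    distance:  any function returning a timbre distance between two items.
    """
    matches = 0
    for a, b, c, human_choice in judgments:
        algo_choice = "b" if distance(a, b) <= distance(a, c) else "c"
        if algo_choice == human_choice:
            matches += 1
    return matches / len(judgments)


# Toy stand-in for a timbre distance: items are numbers, distance is their gap.
def toy_distance(x, y):
    return abs(x - y)


# Hypothetical listener answers; the last one disagrees with the toy distance.
toy_judgments = [
    (0, 1, 5, "b"),
    (0, 4, 2, "c"),
    (3, 3, 9, "b"),
    (7, 1, 8, "c"),
    (2, 3, 9, "c"),
]
rate = agreement_rate(toy_distance, toy_judgments)
```

With a real system, `distance` would be the model-to-model timbre distance, and the judgments would come from the listeners surveyed.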

Generating “aha!”
- Produce interesting matches: when genre and timbre are not correlated
- Allow user control over size of “Aha!” exploration

Using the measure: CUIDADO
- Content-based Unified Interfaces and Descriptors for Audio and Music Databases available Online
- 2001-2003 European research project
- “aims at developing a new chain of applications through the use of audio/music content descriptors, in the spirit of the MPEG-7 standard”
  - design of appropriate description structures
  - development of extractors for deriving high-level information from audio signals
  - design and implementation of two applications: the Sound Palette and the Music Browser
(From the CUIDADO website)

CUIDADO Music Browser
- Client/server architecture for music browsing
- Target audience: casual music lover
- 17,075 popular music titles with metadata
Picture from “The CUIDADO project”

Music Browser Query Panel
Picture from “Popular music access”

Using Timbre in the Music Browser
- Nearest-neighbor search
  - “Find me something that sounds like this song”
  - Allow user control over size of exploration: “Aha” slider
    - Same artist … Same genre … “interesting”
- Playlist generation
  - Example:
    1- Timbre continuity throughout the sequence
    2- Genre Cardinality: 30% Rock, 30% Folk, 30% Pop
    3- Genre Distribution: the titles of the same genre should be as separated as possible
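
One way to read the three playlist constraints is as terms of a cost that a search procedure tries to minimize. The sketch below is an assumption made for illustration. The slides do not say how the Music Browser solves these constraints, and the unweighted sum, the 1/gap penalty and the function name are hypothetical.

```python
def playlist_cost(tracks, distance, genre_targets):
    """Score a candidate sequence; lower is better.

    tracks:        list of (title, genre) in playing order.
    distance:      function giving a timbre distance between two titles.
    genre_targets: dict mapping genre to its desired share of the playlist.
    """
    n = len(tracks)
    # 1. Timbre continuity: consecutive titles should be close in timbre.
    continuity = sum(distance(tracks[i][0], tracks[i + 1][0])
                     for i in range(n - 1))
    # 2. Genre cardinality: each genre's share should match its target.
    cardinality = 0.0
    for genre, target in genre_targets.items():
        share = sum(1 for _, g in tracks if g == genre) / n
        cardinality += abs(share - target)
    # 3. Genre distribution: penalize same-genre titles that sit close together.
    clustering = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if tracks[i][1] == tracks[j][1]:
                clustering += 1.0 / (j - i)
    return continuity + cardinality + clustering
```

With equal timbre distances everywhere, a sequence that alternates Rock and Folk scores lower than one that groups the two Rock titles together, which is the behavior the third constraint asks for. In practice the three terms would need weights, since they are on different scales.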

Sample playlist
- Arlo Guthrie - City Of New Orleans (Folk/Rock)
- Belle & Sebastian - The boy done wrong again (Rock/Alternative)
- Ben Harper - Pleasure & Pain (Pop/Blues)
- Joni Mitchell - Borderline (Folk/Pop)
- Badly Drawn Boy - Camping Next to Water (Rock/Alternative)
- Rolling Stones - You Can’t always get what you want (Pop/Blues)
- Nick Drake - One of these things first (Folk/Pop)
- Radiohead - Motion Picture Soundtrack (Rock/Brit)
- The Beatles - Mother Nature's Son (Pop/Brit)
- Tracy Chapman - Talkin' about a Revolution (Rock/Folk)

Work in Context
- Several other researchers also use MFCCs with reasonable results: Baumann 2003, Berenzweig et al. 2002, Foote 1997, Kulesh 2003, Logan and Salomon 2001, …
- Pampalk, Dixon, and Widmer 2003
  - P & A’s work is relatively accurate
  - Implementation is relatively slow
  - Incorporating use of 1st MFCC integrates average dynamic level into results
- Hard to compare one group’s work with another’s
- Hard to propose future research directions beyond parameter tweaking

Practical & Theoretical Improvements, 2004
- A & P conducted extensive tests varying algorithms and parameters of 2002 system
  - Can optimal parameter settings be found?
  - What is the limit on improvement?
- Evaluate in the context of CUIDADO Music Browser

Optimal parameter values
- Signal sample rate: higher is better
- Distance sample rate (used to compare GMMs): higher is better, but little improvement over 1000
- Sampling can perform as well as Earth Mover’s distance (EMD)
- The number of MFCCs and the number of components in the GMM jointly affect the outcome:
  - 50 components and 20 MFCCs is optimal
  - # components can be reduced without hurting performance much
- 30 ms is optimum window size
- Adhering to above guidelines leads to absolute improvement of 16% to precision
- Precision is underestimated: considers same-genre only
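
The last point is easier to see with a small example. The sketch below assumes a precision-at-k measure in which a retrieved title counts as relevant only if it shares the query's genre. The exact measure used in the 2004 study may differ, and the function name and toy data are hypothetical.

```python
def same_genre_precision(query, ranked_neighbors, genre_of, k):
    """Share of the top-k neighbors that have the same genre as the query."""
    top = ranked_neighbors[:k]
    if not top:
        return 0.0
    hits = sum(1 for title in top if genre_of[title] == genre_of[query])
    return hits / len(top)


# Toy catalogue: two of the four nearest neighbors share the query's genre.
toy_genres = {
    "query": "Trip Hop",
    "n1": "Trip Hop",
    "n2": "Celtic Folk",   # may sound alike, but still counts as a miss
    "n3": "Trip Hop",
    "n4": "Classical",
}
toy_precision = same_genre_precision("query", ["n1", "n2", "n3", "n4"], toy_genres, 4)
```

A cross-genre neighbor that listeners would accept as similar, like the “interesting” matches from the 2002 results, is scored as an error here. That is why a same-genre measure underestimates precision.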

Alternative algorithms
- Several speech-processing algorithms were tried
  - Mixed results
  - No drastic improvements: 2% additional precision at most
- HMM instead of GMM offers no improvement

Conclusions of 2004 study
- “Ceiling” of 65% precision (conservative estimate)
- False positives remain a problem
  - Jimi Hendrix != Joni Mitchell
  - Due to “hubs” in nearest-neighbor space
- Problems are inherent in approach itself?
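
A simple way to look for hubs is to count how often each song appears in other songs' nearest-neighbor lists. The sketch below assumes that a hub shows up as an unusually high count. This is one common diagnostic, not necessarily the analysis A & P performed, and the function name and toy data are hypothetical.

```python
import math
from collections import Counter


def k_occurrences(distance_matrix, k):
    """Count how often each item appears in other items' k-nearest-neighbor lists."""
    n = len(distance_matrix)
    counts = Counter({i: 0 for i in range(n)})
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: distance_matrix[i][j])
        counts.update(others[:k])
    return counts


# Toy data: item 0 sits in the middle of four items that are far from each other.
points = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
matrix = [[math.dist(p, q) for q in points] for p in points]
hub_counts = k_occurrences(matrix, k=1)
```

In the toy data, item 0 is the nearest neighbor of every other item, so its count is far above the average of k. A song with that profile would appear as a false positive for many unrelated queries.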

Proposals for future work
- Address perception of timbre
  - Some frames are more important than others
  - Some timbres more salient than others
  - People assess similarity by choosing “This sounds like X” or “This doesn’t sound like X”

Conclusions
- High-level, perceptually based similarity has a place in electronic music distribution
- Current systems for timbre similarity have some use
- There is still room for new, innovative, and cross-disciplinary work