Thursday, November 13, 2008
ASA 156: Statistical Approaches for Analysis of Music and Speech Audio Signals
AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing
Michael A. Casey
Digital Musics
Dartmouth College, Hanover, NH
Scalable Similarity
• 8M tracks in commercial collection
• PByte of multimedia data
• Require passage-level retrieval (~ 2 bars)
• Require scalable nearest-neighbor methods
Specificity
• Partial track retrieval
• Alternate versions: remix, cover, live, album
• Task is mid-high specificity
Example: remixing
• Original Track
• Remix 1
• Remix 2
• Remix 3
Audio Shingles
A shingle is defined as the concatenation of l frames of m-dimensional features.
• Shingles provide contextual information about features
• Originally used for Internet search engines:
  • Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, Geoffrey Zweig: “Syntactic Clustering of the Web”. Computer Networks 29(8-13): 1157-1166 (1997)
• Related to N-grams, overlapping sequences of features
• Applied to audio domain by Casey and Slaney:
  • Casey, M., Slaney, M. “The Importance of Sequences in Musical Similarity”, in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2006
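The shingle definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `make_shingles` and `normalize`, the hop of one frame between consecutive shingles, and the toy feature values are all hypothetical and are not taken from the talk or from audioDB.

```python
import math

def make_shingles(frames, l):
    """Concatenate every run of l consecutive frames into one shingle.

    frames: list of m-dimensional feature vectors (lists of floats).
    Returns a list of shingles, each of dimension M = l * m.
    A hop of one frame between shingles is an assumption of this sketch.
    """
    if l <= 0:
        raise ValueError("l must be positive")
    shingles = []
    for start in range(len(frames) - l + 1):
        shingle = []
        for frame in frames[start:start + l]:
            shingle.extend(frame)
        shingles.append(shingle)
    return shingles

def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    return [x / norm for x in v]

# Toy input: 4 frames with m = 2 features each, shingled with l = 3.
frames = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.0, 4.0]]
shingles = make_shingles(frames, l=3)
unit_shingles = [normalize(s) for s in shingles]
```

With 4 frames and l = 3 this yields 2 overlapping shingles of dimension M = 6, which is the N-gram-like overlap the slide refers to.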
Audio Shingle Similarity
A query shingle drawn from a query track {Q}
A database of audio tracks indexed by (n)
A database shingle from track n
Shingles are normalized to unit vectors, so the squared Euclidean distance between two shingles depends only on their inner product.
For shingles with M dimensions (M = l·m): m = 12, 20; l = 30, 40
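For unit vectors q and x, the squared Euclidean distance satisfies ||q - x||² = 2 - 2 q·x, a standard identity that follows from expanding the norm. The short check below illustrates it numerically; the function names and the toy vectors are hypothetical.

```python
import math

def unit(v):
    """Return v scaled to unit Euclidean length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

q = unit([3.0, 4.0, 0.0])   # toy query shingle
x = unit([0.0, 4.0, 3.0])   # toy database shingle

lhs = sq_dist(q, x)          # direct squared distance
rhs = 2.0 - 2.0 * dot(q, x)  # same quantity via the inner product
```

Because of this identity, a distance threshold on normalized shingles is equivalent to a threshold on their inner product.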
AudioDB: Shingle Nearest Neighbor Search
• Open source: google: “audioDB”
• Management of tracks, sequences, salience
• Automatic indexing parameters
• OMRAS2, Yahoo!, AWAL, CHARM, more…
• Web-services interface (SOAP / JSON)
• Implementation of LSH for large N ~ 1B
• 1-10 ms whole-track retrieval from 1B vectors
Whole-track similarity
• Often want to know which tracks are similar
• Similarity depends on specificity of task
  • Distortion / filtering / re-encoding (high)
  • Remix with new audio material (mid)
  • Cover song: same song, different artist (mid)
Whole-track resemblance: radius-bounded search
Compute the number of shingle collisions between two tracks
• Requires a threshold for considering shingles to be related
• Need a way to estimate relatedness (threshold) for data set
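A brute-force version of collision counting might look like the sketch below. It assumes a collision is any (query shingle, track shingle) pair whose squared distance is at most the threshold; the slide's own formula is not reproduced here, so this pairwise definition, the name `count_collisions`, and the toy data are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def count_collisions(query_shingles, track_shingles, x_thresh):
    """Count shingle pairs whose squared distance is within x_thresh.

    Exhaustive O(len(query) * len(track)) comparison, for illustration only.
    """
    return sum(
        1
        for q in query_shingles
        for x in track_shingles
        if sq_dist(q, x) <= x_thresh
    )

# Toy unit-length "shingles" in 2 dimensions.
query = [[1.0, 0.0], [0.0, 1.0]]
related_track = [[1.0, 0.0], [0.6, 0.8]]
unrelated_track = [[-1.0, 0.0], [0.0, -1.0]]

related_score = count_collisions(query, related_track, x_thresh=0.5)
unrelated_score = count_collisions(query, unrelated_track, x_thresh=0.5)
```

Ranking database tracks by this count gives a whole-track resemblance score, and the result depends directly on the chosen threshold.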
Statistical approaches to modeling distance distributions
Distribution of minimum distances
Database: 1.4 million shingles. The left bump is the distribution of minimum distances between 1000 randomly selected query shingles and this database. The right bump is a small sampling (1/98 000 000) of the full histogram of all distances.
Radius-bounded retrieval performance: cover song (opus task)
• Performance depends critically on xthresh, the collision threshold
• Want to estimate xthresh automatically from unlabelled data
Order Statistics
• Minimum-value distribution is analytic
• Estimate the distribution parameters
• Substitute into minimum value distribution
• Define a threshold in terms of FP rate
• This gives an estimate of xthresh
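The steps above can be sketched with the standard order-statistics result for the minimum of n independent draws with CDF F: P(min ≤ x) = 1 - (1 - F(x))^n. Setting this equal to a target false-positive rate and inverting F gives a threshold. The code below is a generic illustration, not the talk's implementation: the function names are hypothetical, and the uniform `toy_cdf` merely stands in for a fitted background distribution.

```python
import math

def per_comparison_tail(fp_rate, n_comparisons):
    """Background CDF value p such that 1 - (1 - p)**n_comparisons == fp_rate."""
    return -math.expm1(math.log1p(-fp_rate) / n_comparisons)

def min_value_cdf(cdf, x, n_comparisons):
    """CDF of the minimum of n_comparisons i.i.d. draws with the given CDF."""
    return 1.0 - (1.0 - cdf(x)) ** n_comparisons

def invert_cdf(cdf, target, lo, hi, iterations=200):
    """Bisection for x in [lo, hi] with cdf(x) == target (cdf non-decreasing)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def toy_cdf(x):
    """Placeholder background CDF (uniform on [0, 1]); an assumption for the demo."""
    return min(max(x, 0.0), 1.0)

n = 1000            # number of unrelated shingles each query is compared with
fp_rate = 0.01      # accepted probability that the minimum falls below x_thresh under H0
p = per_comparison_tail(fp_rate, n)
x_thresh = invert_cdf(toy_cdf, p, 0.0, 1.0)
```

In practice `toy_cdf` would be replaced by the CDF of the estimated background distance distribution, and `n` by the number of database shingles searched per query.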
Estimating xthresh from unlabelled data
• Use theoretical statistics
• Null Hypothesis:
  • H0: shingles are drawn from unrelated tracks
• Assume elements i.i.d., normally distributed
• M dimensional shingles, d effective degrees of freedom
Squared distance distribution for H0
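One standard form consistent with the i.i.d. normal assumption is a scaled chi-squared distribution. The block below is a reconstruction under that assumption and is not necessarily the equation shown in the talk; σ² is a hypothetical scale parameter introduced here, and y denotes the squared distance between a query shingle and a database shingle.

```latex
% If y behaves like a sum of d independent squared N(0, \sigma^2) terms:
\[
  \frac{y}{\sigma^{2}} \sim \chi^{2}_{d},
  \qquad
  p(y \mid d, \sigma^{2})
    = \frac{y^{d/2 - 1}\, e^{-y / (2\sigma^{2})}}
           {(2\sigma^{2})^{d/2}\, \Gamma(d/2)},
  \quad y \ge 0 .
\]
% Mean and variance under this model:
\[
  \mathbb{E}[y] = d\,\sigma^{2},
  \qquad
  \operatorname{Var}[y] = 2\,d\,\sigma^{4}.
\]
```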
ML for background distribution
• Likelihood for N data points (distances squared)
• d = effective degrees of freedom
• M = shingle dimensionality
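As a lightweight stand-in for the maximum-likelihood fit, the sketch below estimates the effective degrees of freedom by moment matching. It assumes the squared distances follow a chi-squared distribution with d degrees of freedom multiplied by a scale factor, so that mean = d·scale and variance = 2·d·scale². Moment matching is a swapped-in technique, not the ML procedure named on the slide, and `fit_scaled_chi2`, the scale parameter, and the synthetic data are all hypothetical.

```python
import random
import statistics

def fit_scaled_chi2(sq_distances):
    """Moment-matching estimate of (d, scale) for scale * chi-squared(d) data.

    Uses mean = d * scale and variance = 2 * d * scale**2.
    """
    mean = statistics.fmean(sq_distances)
    var = statistics.pvariance(sq_distances, mu=mean)
    d = 2.0 * mean * mean / var
    scale = var / (2.0 * mean)
    return d, scale

# Synthetic "background" squared distances with known parameters.
# scale * chi-squared(d) is a gamma distribution with shape d/2 and scale 2*scale.
rng = random.Random(0)
true_d, true_scale = 8.0, 0.05
samples = [rng.gammavariate(true_d / 2.0, 2.0 * true_scale) for _ in range(20000)]

d_hat, scale_hat = fit_scaled_chi2(samples)
```

The moment estimates could also serve as a starting point for a numerical maximization of the likelihood over the N squared distances.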
Background distribution parameters
Unlabelled data experiment
• Unlabelled data set
• Known to contain:
  • Cover songs (same work, different performer)
  • Near duplicate recordings (misattribution, encoding)
• Estimate background distance distribution
• Estimate minimum value distribution
• Set xthresh so FP rate is <= 1%
• Whole-track retrieval based on shingle collisions
Scaling
• Locality sensitive hashing
• Trade-off approximate NN for time complexity
• 3 to 4 orders of magnitude speed-up
• No noticeable degradation in performance
  • For optimal radius threshold
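To make the trade-off concrete, here is a minimal sketch of a generic random-projection LSH index for radius-bounded Euclidean search. It is not audioDB's implementation: the class `EuclideanLSH`, its parameters (`num_tables`, `hashes_per_table`, `bucket_width`), and the toy points are all assumptions chosen for illustration.

```python
import math
import random
from collections import defaultdict

class EuclideanLSH:
    """Toy LSH index: each hash is floor((a . v + b) / w) with Gaussian a."""

    def __init__(self, dim, num_tables=8, hashes_per_table=4,
                 bucket_width=1.0, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        self.vectors = []
        self.tables = []
        for _ in range(num_tables):
            projections = [
                ([rng.gauss(0.0, 1.0) for _ in range(dim)],
                 rng.uniform(0.0, bucket_width))
                for _ in range(hashes_per_table)
            ]
            self.tables.append((projections, defaultdict(list)))

    def _key(self, projections, v):
        """Concatenate the quantized projections into one bucket key."""
        return tuple(
            math.floor((sum(a_i * v_i for a_i, v_i in zip(a, v)) + b) / self.w)
            for a, b in projections
        )

    def add(self, v):
        """Store v in one bucket per table and return its index."""
        idx = len(self.vectors)
        self.vectors.append(v)
        for projections, buckets in self.tables:
            buckets[self._key(projections, v)].append(idx)
        return idx

    def query(self, q, radius):
        """Return indices of stored vectors that share a bucket with q
        and lie within the given Euclidean radius (approximate: may miss some)."""
        candidates = set()
        for projections, buckets in self.tables:
            candidates.update(buckets.get(self._key(projections, q), ()))
        return sorted(i for i in candidates
                      if math.dist(q, self.vectors[i]) <= radius)

index = EuclideanLSH(dim=3)
points = [[1.0, 0.0, 0.0], [0.99, 0.1, 0.0], [-1.0, 0.0, 0.0]]
for p in points:
    index.add(p)
hits = index.query([1.0, 0.0, 0.0], radius=0.2)
```

Only vectors that land in the same bucket as the query are compared exactly, which is where the speed-up comes from. The final distance filter means results never fall outside the radius, although true neighbors can occasionally be missed. The bucket width is tied to the search radius, which is why a radius bound has to be chosen before indexing.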
Current deployment
• Large commercial collections
  • AWAL ~ 100,000 tracks
  • Yahoo! 2M+ tracks, related song classifier
• AudioDB: open-source, international consortium of developers
  • Google: “audioDB”
Conclusions
• Radius-bounded retrieval model for tracks
• Shingles preserve temporal information, high d
• Implements mid-to-high specificity search
• Optimal radius threshold from order statistics
  • Null hypothesis: shingles are drawn from unrelated tracks
• LSH requires radius bound, automatic estimate
• Scales to 1B+ shingles using LSH
Thanks
• Malcolm Slaney, Yahoo! Research Inc.
• Christophe Rhodes, Goldsmiths, U. of London
• Michela Magas, Goldsmiths, U. of London
• Funding: EPSRC: EP/E02274X/1