Song-level Multi-pitch Tracking by Heavily Constrained Clustering
Zhiyao Duan, Jinyu Han and Bryan Pardo
EECS Dept., Northwestern Univ.
Interactive Audio Lab, http://music.cs.northwestern.edu
For presentation in ICASSP 2010, Dallas, Texas, USA.
Multi-pitch Estimation & Tracking Task
• Given polyphonic music played by several monophonic harmonic instruments (number known)
• Estimate a pitch trajectory for each instrument
Potential Applications
• Automatic music transcription
• Harmonic source separation
• Other applications
  – Melody-based music search
  – Chord recognition
  – Source localization
  – Music education
  – …
The 2-stage Standard Approach
• Stage 1: Multi-pitch Estimation (MPE): estimate pitches in each single time frame
  – Z. Duan, B. Pardo and C. Zhang, “Multiple Fundamental Frequency Estimation by Modeling Spectral Peaks and Non-peak Regions”, IEEE Trans. Audio Speech Language Process., in press.
• Stage 2: Multi-pitch Tracking (MPT): connect pitch estimates across frames into pitch trajectories
[Figure; axes: Time, Frequency]
State of the Art of MPT
• What existing MPT methods do
  – Form short pitch trajectories within a note (note-level), according to local time-frequency proximity of pitch estimates
• Our contribution
  – Form long pitch trajectories through multiple notes (song-level) using a new constrained clustering algorithm
Try Clustering by Timbre
• Each trajectory is a cluster of pitch estimates
• One cluster per instrument
• Clustering principle: maintain timbre consistency in each cluster
Timbre Feature of Pitch Estimates
• Harmonic structure: relative amplitudes of first 50 harmonics
[Figure; axes: Time, Frequency]
[Figure: Harmonic Structure; x-axis: Harmonic number (0 to 50); y-axis: Amplitude (dB) (0 to 100)]
Minimize This Objective Function
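$$f(\Pi) = \sum_{k=1}^{K} \sum_{i \in T_k} \lVert x_i - c_k \rVert^2$$
• $\Pi$: a partition into $K$ clusters
• $x_i$: the 50-d harmonic structure of the $i$-th pitch estimate
• $K$: number of clusters
• $c_k$: center of the $k$-th cluster
• $i \in T_k$: for all pitch estimates in the $k$-th cluster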
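To make the objective concrete, the following minimal Python sketch evaluates a sum of squared distances from each feature vector to its cluster center. It assumes that the center of a cluster is the mean of its members, which the slide does not state explicitly; the function name `clustering_objective` and the 2-d toy vectors are hypothetical and stand in for the 50-d harmonic structures.

```python
from typing import Dict, List, Sequence


def clustering_objective(features: Sequence[Sequence[float]],
                         labels: Sequence[int]) -> float:
    """Sum of squared Euclidean distances from each feature vector to the
    mean of the cluster it is assigned to (assumed cluster center)."""
    # Group point indices by cluster label.
    members: Dict[int, List[int]] = {}
    for i, k in enumerate(labels):
        members.setdefault(k, []).append(i)

    total = 0.0
    for idx in members.values():
        dim = len(features[idx[0]])
        # Assumed center: per-dimension mean of the cluster members.
        center = [sum(features[i][d] for i in idx) / len(idx)
                  for d in range(dim)]
        for i in idx:
            total += sum((features[i][d] - center[d]) ** 2
                         for d in range(dim))
    return total


# Toy 2-d vectors standing in for 50-d harmonic structures.
toy_features = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]]
good = clustering_objective(toy_features, [0, 0, 1, 1])  # similar points together
bad = clustering_objective(toy_features, [0, 1, 0, 1])   # dissimilar points together
print(good, bad)
```

In this toy case the grouping that keeps similar vectors together yields the smaller objective value, which is the sense in which minimizing the objective maintains timbre consistency.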
Objective Function Is Not Enough
Add Pitch-locality Constraints
• Must-link: pitch estimates close in both time and frequency should be in the same cluster
• Cannot-link: simultaneous pitches should not be in the same cluster (only for monophonic instruments)
[Figure; axes: Time, Frequency]
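As a rough illustration of how the two constraint types could be derived from pitch estimates, here is a small Python sketch. The representation of an estimate as a (frame index, MIDI pitch) pair, the function name `build_constraints`, and the thresholds `max_frame_gap` and `max_pitch_diff` are all assumptions made for this example; the slides do not specify how "close" is defined.

```python
from itertools import combinations
from typing import List, Set, Tuple

Estimate = Tuple[int, float]  # (frame index, pitch as MIDI number); assumed format
Pair = Tuple[int, int]


def build_constraints(estimates: List[Estimate],
                      max_frame_gap: int = 1,
                      max_pitch_diff: float = 0.5) -> Tuple[Set[Pair], Set[Pair]]:
    """Return (must_link, cannot_link) sets of index pairs (i, j) with i < j."""
    must_link: Set[Pair] = set()
    cannot_link: Set[Pair] = set()
    for (i, (ti, pi)), (j, (tj, pj)) in combinations(enumerate(estimates), 2):
        if ti == tj:
            # Simultaneous pitches: one monophonic instrument cannot play both.
            cannot_link.add((i, j))
        elif abs(ti - tj) <= max_frame_gap and abs(pi - pj) <= max_pitch_diff:
            # Close in both time and frequency (hypothetical thresholds).
            must_link.add((i, j))
    return must_link, cannot_link


toy = [(0, 60.0), (0, 67.0), (1, 60.1), (1, 67.2), (2, 64.0)]
ml, cl = build_constraints(toy)
print(sorted(ml), sorted(cl))
```

The pairwise loop is quadratic in the number of estimates, which is acceptable for a toy example; a real implementation would likely only compare estimates in neighboring frames.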
Properties of Our Problem
• Objective: timbre consistency
• Constraints: pitch locality
• Previous constrained clustering algorithms do not apply due to the following properties:
  – Inconsistent constraints: pitch estimates are sometimes erroneous, which may make the constraints unsatisfiable
  – Heavily constrained: nearly every pitch estimate is involved in at least one constraint
The Proposed Clustering Algorithm
$\Pi_n$: clustering in the $n$-th iteration; $C_n$: {all constraints satisfied by $\Pi_n$}.
1. Start from an initial clustering $\Pi_0$, which satisfies $C_0$, a subset of all constraints; set $n = 1$.
2. Find a new clustering $\Pi_n$ that decreases the objective and also satisfies $C_{n-1}$.
3. Set $C_n$ = {all constraints satisfied by $\Pi_n$}; set $n = n + 1$.
4. Repeat steps 2-3 until the objective (nearly) cannot be decreased.
Properties: $C_0 \subseteq C_1 \subseteq \cdots \subseteq C_n$ and $f(\Pi_0) > f(\Pi_1) > \cdots > f(\Pi_n)$.
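The iteration can be sketched in Python as follows. This is an illustrative skeleton under stated assumptions, not the authors' implementation: the search step `find_new` is a simple stand-in that relabels one point at a time, points are 1-d numbers instead of harmonic structures, cluster centers are assumed to be means, and all function names are hypothetical.

```python
from typing import FrozenSet, List, Optional, Sequence, Tuple

Labels = Tuple[int, ...]
Constraint = Tuple[str, int, int]  # ("must" or "cannot", i, j)


def satisfied_by(labels: Labels,
                 constraints: Sequence[Constraint]) -> FrozenSet[Constraint]:
    """All constraints that the given clustering satisfies."""
    out = set()
    for kind, i, j in constraints:
        same = labels[i] == labels[j]
        if (kind == "must" and same) or (kind == "cannot" and not same):
            out.add((kind, i, j))
    return frozenset(out)


def objective(points: Sequence[float], labels: Labels) -> float:
    """Sum of squared distances to cluster means (assumed centers)."""
    total = 0.0
    for k in set(labels):
        vals = [p for p, lab in zip(points, labels) if lab == k]
        center = sum(vals) / len(vals)
        total += sum((v - center) ** 2 for v in vals)
    return total


def find_new(points, labels, active, constraints, n_clusters) -> Optional[Labels]:
    """Stand-in search: relabel a single point if that keeps every active
    constraint satisfied and strictly decreases the objective."""
    base = objective(points, labels)
    for i in range(len(labels)):
        for k in range(n_clusters):
            if k == labels[i]:
                continue
            cand = labels[:i] + (k,) + labels[i + 1:]
            if (active <= satisfied_by(cand, constraints)
                    and objective(points, cand) < base - 1e-12):
                return cand
    return None


def run(points, labels, constraints, n_clusters, max_iter=100):
    active = satisfied_by(labels, constraints)        # constraints of the start
    history: List[float] = [objective(points, labels)]
    for _ in range(max_iter):
        cand = find_new(points, labels, active, constraints, n_clusters)
        if cand is None:                              # objective cannot decrease
            break
        labels = cand
        active = satisfied_by(labels, constraints)    # superset of the old set
        history.append(objective(points, labels))
    return labels, active, history


toy_points = [0.0, 1.0, 10.0, 11.0]
toy_constraints = [("must", 0, 1), ("must", 2, 3), ("cannot", 1, 2)]
final_labels, final_constraints, history = run(
    toy_points, (0, 1, 0, 1), toy_constraints, n_clusters=2)
print(final_labels, len(final_constraints), history)
```

In this toy run the set of satisfied constraints only grows and the objective only shrinks from one iteration to the next, because each accepted clustering must keep every previously satisfied constraint.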
Initial Clustering
• Trivial one
  – $\Pi_0$: a random partition
  – $C_0$: constraints satisfied by $\Pi_0$, may be empty
• A more informative one for MPT
  – $\Pi_0$: label pitches according to pitch order in each frame: highest, second-highest, third, fourth, …
  – $C_0$: will contain all cannot-links
[Figures: two panels; axes: Time, Frequency]
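The pitch-order initialization can be sketched as below. The (frame index, MIDI pitch) representation and the function name `label_by_pitch_order` are assumptions for illustration; the sketch also assumes that no frame holds more estimates than there are instruments, so that labels stay within the number of clusters.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Estimate = Tuple[int, float]  # (frame index, pitch as MIDI number); assumed format


def label_by_pitch_order(estimates: List[Estimate]) -> List[int]:
    """Label 0 for the highest pitch in a frame, 1 for the second-highest, etc."""
    by_frame: Dict[int, List[Tuple[float, int]]] = defaultdict(list)
    for idx, (frame, pitch) in enumerate(estimates):
        by_frame[frame].append((pitch, idx))

    labels = [0] * len(estimates)
    for items in by_frame.values():
        # Sort the frame's estimates from highest to lowest pitch.
        for rank, (_, idx) in enumerate(sorted(items, reverse=True)):
            labels[idx] = rank
    return labels


toy = [(0, 60.0), (0, 67.0), (0, 55.0), (1, 66.5), (1, 60.2)]
labels = label_by_pitch_order(toy)
print(labels)
```

Because estimates in the same frame always receive distinct labels, every cannot-link between simultaneous pitches is satisfied by this starting point.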
Find A New Clustering
• 1. Satisfy current constraints
• 2. Decrease the objective function
• Swap set: a connected subgraph between two clusters.
• Traverse all swap sets until finding a new clustering that decreases the objective function
[Figure: graph with numbered nodes linked by constraints. Legend (symbols lost in extraction): satisfied cannot-link, unsatisfied cannot-link, satisfied must-link, unsatisfied must-link]
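One possible reading of a swap set is sketched below in Python. It assumes that the edges of the graph are the currently satisfied constraints and that a swap set is a maximal connected component among the points of two clusters, so that exchanging the labels of a whole component never cuts a satisfied constraint. These are interpretive assumptions, and the names `swap_sets` and `apply_swap` are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def swap_sets(labels: List[int], a: int, b: int,
              satisfied: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Connected components of the satisfied-constraint graph, restricted
    to the points currently assigned to cluster a or cluster b."""
    nodes = [i for i, k in enumerate(labels) if k in (a, b)]
    adjacency: Dict[int, Set[int]] = {i: set() for i in nodes}
    for i, j in satisfied:
        if i in adjacency and j in adjacency:
            adjacency[i].add(j)
            adjacency[j].add(i)

    seen: Set[int] = set()
    components: List[List[int]] = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            component.append(u)
            for v in adjacency[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(sorted(component))
    return components


def apply_swap(labels: List[int], swap_set: List[int], a: int, b: int) -> List[int]:
    """Exchange labels a and b for every point in the swap set."""
    new_labels = list(labels)
    for i in swap_set:
        new_labels[i] = b if labels[i] == a else a
    return new_labels


toy_labels = [0, 0, 1, 1, 0, 1]
# Satisfied constraints as index pairs: must-links (0,1), (2,3); cannot-links (1,2), (4,5).
satisfied_pairs = [(0, 1), (1, 2), (2, 3), (4, 5)]
sets = swap_sets(toy_labels, 0, 1, satisfied_pairs)
swapped = apply_swap(toy_labels, sets[0], 0, 1)
print(sets, swapped)
```

After the swap, points joined by a must-link still share a label and points joined by a cannot-link still differ, so the satisfied constraints are preserved while the cluster contents change.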
Algorithm Review
• $\Pi$: partition of points into clusters
• $S_k$: feasible solution space under constraints $C_k$
• $C_0 \subseteq C_1 \subseteq \cdots$ and $f(\Pi_0) > f(\Pi_1) > \cdots$
Experiments
• Data set
  – 10 J.S. Bach chorales (quartets, played by violin, clarinet, saxophone and bassoon)
  – Each instrument is recorded individually, then mixed
• Ground-truth pitch trajectories
  – Use YIN on monophonic tracks before mixing
• Input pitch estimates
  – Our previous work in [1]
  – Input accuracy: 70.0 ± 3.1%
[1] Zhiyao Duan, Bryan Pardo and Changshui Zhang, “Multiple Fundamental Frequency Estimation by Modeling Spectral Peaks and Non-peak Regions”, IEEE Trans. Audio Speech Language Process., in press.
Overall Multi-pitch Tracking Results
[Figure; axis label: Mean % of correct pitch estimates]
Among Correctly Estimated Pitches
An Example
[Figure: Ground-truth Pitch Trajectories; x-axis: Time (second), 0 to 25; y-axis: Pitch (MIDI number), 40 to 90]
An Example
[Figure: Our Results; x-axis: Time (second), 0 to 25; y-axis: Pitch (MIDI number), 40 to 90]
Conclusion
• Formulate the song-level Multi-pitch Tracking problem as a constrained clustering problem
  – Objective: timbre consistency
  – Constraints: pitch locality
• Existing constrained clustering algorithms do not apply due to problem properties
• Propose a new constrained clustering algorithm
• Experimental results are promising
Thank you!