AMAC Corporate Jet AG Aircraft Management and Charter Services
Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops ›...
Transcript of Advances in Models for Acoustic Processingweb4.cs.ucl.ac.uk › ... › D.Barber › workshops ›...
Advances in Models for Acoustic Processing
David Barber and Taylan Cemgil
Signal Processing and Communications Lab.
9 Dec 2006
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada
Outline
• Acoustic Modeling and applications
• Parameter estimation and Inference
– Subspace methods, Variational, Monte Carlo
• Issues
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 1
Acoustic Modeling
t/sec
f/Hz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
t/sec
f/Hz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 2
Probabilistic Models
• Once a realistic model is constructed many related task can be cast to posteriorinference problems
p(Structure|Observations) ∝ p(Observations|Structure)p(Structure)
– analysis,– localisation,– restoration,– transcription,– source separation,– identification,– coding,– resynthesis, cross synthesis
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 3
Source Separation
s1t
s2t
. . . sn
t
x1t
. . . xm
t
t = 1 . . . T
a1 r1 . . . a
m rm
• Joint estimation Sources, Channel noise and mixing system
• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 4
Source Separation
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 5
Source Separation
t/sec
f/Hz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
(Speech + Piano + Guitar)
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 6
Audio Interpolation
• Estimate missing samples given observed ones
• Restoration, concatenative expressive speech synthesis, ...
0 50 100 150 200 250 300 350 400 450 500
0
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 7
Audio Interpolation
p(x¬κ|xκ) ∝
∫
dHp(x¬κ|H)p(xκ|H)p(H)
H ≡ (parameters, hidden states)
H
x¬κ xκ
Missing y
0 50 100 150 200 250 300 350 400 450 500
0
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 8
Application: Analysis of Polyphonic Audio
νfr
eque
ncy
k
x k
• Each latent process ν = 1 . . . W corresponds to a “voice”. Indicators r1:W,1:K
encode a latent “piano roll”
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 9
Tempo, Rhythm, Meter analysis
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1:K)
50 100 150 200 250 300 350 400 450
800
600
400
200 −10
−5
0Q
uart
er n
otes
per
min
. log p(nk|y
1:K)
50 100 150 200 250 300 350 400 450
180
120
60−10
−5
0
p(rk|y
1:K)
Frame Index, k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 0.20.40.60.8
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 10
Hierarchical ModelingS ore Expression
Ä äÅ ä
øø
tt tt tt tIt tt tYtc d tY t t
øø
tYt tt tIt tIt tt tY"tt d tY t t
0 500 1000 1500 2000 2500
−3.5
−3
−2.5
ω
t
True EstimatedPiano Roll
0 500 1000 1500 2000 2500
θ
Audio SignalCemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 11
Hierarchical Modeling
M
1 2 : : : tv1 v2 : : : vtk1 k2 : : : kth1 h2 : : : ht1 2 : : : tm1 m2 : : : mt
gj;1 gj;2 : : : gj;trj;1 rj;2 : : : rj;tnj;1 nj;2 : : : nj;txj;1 xj;2 : : : xj;tyj;1 yj;2 : : : yj;ty1 y2 : : : ytCemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 12
Time Series Modeling
• Sound is primarily about oscillations and resonance
• Cascade of second order sytems
• Audio signals can often be compactly represented by sinusoidals
(real) yn =
p∑
k=1
αke−γkn cos(ωkn + φk)
(complex) yn =
p∑
k=1
ck(e−γk+jωk)n
y = F (γ1:p, ω1:p)c
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 13
State space Parametrisation
xn+1 =
e−γ1+jω1
. . .e−γp+jωp
︸ ︷︷ ︸
A
xn x0 =
c1
c2...cp
yn =(
1 1 . . . 1 1)
︸ ︷︷ ︸
C
xn
x0 x1 . . . xk−1 xk . . . xK
y1 . . . yk−1 yk . . . yK
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 14
State Space Parametrisation
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 15
Classical System identification approach
• The state space representation implies
xn+1 = Axn
yn = Cxn⇒ yn = CAnx0
• Therefore we can write for arbitrary L and M the Hankel matrix
y0 y1 . . . yM
y1 y2 . . . yM+1... ... . . . ...
yL yL+1 . . . yL+M
︸ ︷︷ ︸
Y
=
C
CA...
CAL
︸ ︷︷ ︸
ΓL+1
(x0 Ax0 . . . AMx0
)
︸ ︷︷ ︸
ΩM+1
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 16
Identification via matrix factorisation
1. Given the “impulse response” Hankel matrix Y (Ho and Kalman 1966, Rao and Arun1992, Viberg 1995), compute a matrix factorisation (typically via SVD)
Y = ΓL+1ΩM+1 =
C
CA...
CAL
(x0 Ax0 . . . AMx0
)
︸ ︷︷ ︸
2. Read off C and x0 from factors ΓL+1 and ΩM+1
3. Compute transition matrix by exploiting shift invariance
CA
CA2
...CAL
=
C
CA...
CAL−1
A ⇒ A = Γ†1:nΓ2:n+1
Matrix factorisation ideas have lead to useful methods (N4SID, NMF, MMMF...)
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 17
Pros and Cons
• Uses well understood algorithms from numerical linear algebra ⇒ often quitefast and numerically stable
• Model selection can be based on numerical rank analysis; inspection ofsingular values e.t.c.
• Handling of uncertainty and nonstationarity is not very transparent
• Prior knowledge is hard to incorporate
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 18
Hierarchical Factorial Models
• Each component models a latent process
• The observations are projections
rν0 · · · rν
k · · · rνK
θν0 · · · θ
νk · · · θ
νK
ν = 1 . . . W
yk yK
• Generalises Source-filter models
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 19
Harmonic model with changepoints
rk|rk−1 ∼ p(rk|rk−1)
θk|θk−1, rk ∼ [rk = 0]N (Aθk−1, Q)︸ ︷︷ ︸
reg
+ [rk = 1]N (0, S)︸ ︷︷ ︸
new
yk|θk ∼ N (Cθk, R)
A =
Gω
G2ω
. . .GH
ω
N
Gω = ρk
(cos(ω) − sin(ω)sin(ω) cos(ω)
)
damping factor 0 < ρk < 1, framelength N and damped sinusoidal basis matrix C of sizeN × 2H
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 20
Harmonic model with changepoints
r kfr
eque
ncy
k
k
x k
• Each changepoint denotes the onset of a new audio event
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 21
Monophonic transcription
• Detecting onsets, offsets and pitch (Cemgil et. al. 2006, IEEE TSALP)
500 1000 1500 2000 2500 3000 3500
Exact inference is possible
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 22
Factorial Changepoint model
r0,ν ∼ C(r0,ν; π0,ν)
θ0,ν ∼ N (θ0,ν; µν, Pν)
rk,ν|rk−1,ν ∼ C(rk,ν; πν(rt−1,ν)) Changepoint indicator
θk,ν|θk−1,ν ∼ N (θk,ν; Aν(rk)θk−1,ν, Qν(rk)) Latent state
yk|θk,1:W ∼ N (yk; Ckθk,1:W , R) Observation
rν0 · · · rν
k · · · rνK
θν0 · · · θ
νk · · · θ
νK
ν = 1 . . . W
yk yK
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 23
Application: Analysis of Polyphonic Audio
νfr
eque
ncy
k
x k
• Each latent changepoint process ν = 1 . . . W corresponds to a “piano key”.Indicators r1:W,1:K encode a latent “piano roll”
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 24
Single time slice - Bayesian Variable Selection
ri ∼ C(ri; πon, πoff)
si|ri ∼ [ri = on]N (si; 0, Σ) + [ri 6= on]δ(si)
x|s1:W ∼ N (x; Cs1:W , R)
C ≡ [ C1 . . . Ci . . . CW ]
r1 . . . rW
s1 . . . sW
x
• Generalized Linear Model – Column’s of C are the basis vectors• The exact posterior is a mixture of 2W Gaussians• When W is large, computation of posterior features becomes intractable.• Sparsity by construction (Olshausen and Millman, Attias, ...)
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 25
Chord detection example
0 50 100 150 200 250 300 350 400−20
−10
0
10
20
0 π/4 π/2 3π/40
100
200
300
400
500
600
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 26
Inference : Iterative Improvement
r∗1:W = arg maxr1:W
∫
ds1:Wp(y|s1:W )p(s1:W |r1:W )p(r1:W )
iteration r1 rM log p(y1:T , r1:M )
1 • −1220638254
2 • • −665073975
3 • • • −311983860
4 • • • • −162334351
5 • • • • • −43419569
6 • • • • • • −1633593
7 • • • • • • • −14336
8 • • • • • • • • −5766
9 • • • • • • • −5210
10 • • • • • • −4664
True • • • • • • −4664
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 27
Inference : MCMC/Gibbs sampler
• MCMC: Construct a markov chain with stationary distribution as the desiredposterior P
• Gibbs sampler: We cycle through all variables rν = 1 . . . W and sample fromfull conditionals
rν ∼ p(rν|r(t+1)1 , r
(t+1)2 , . . . , r
(t+1)ν−1 , r
(t)ν+1, . . . , r
(t)W )
• Rao-Blackwellisation: Conditioned on r1:W , the latent variables s1:W can beintegrated over analytically.
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 28
Variational Bayes – Structured mean field
• VB: Approximate a complicated distribution P with a simpler, tractable one Qin the sense of
Q∗ = argminQ
KL(Q||P)
• KL is the Kullback-Leibler divergence
KL(Q||P) ≡ 〈logQ〉Q − 〈logP〉Q ≥ 0
• If Q obeys the factorisation as Q =∏
ν Qν the solution is given by the fixedpoint
Qν ∝ exp(〈logP〉Q¬ν
)
• Leads to powerful generalisations of the Expectation Maximisation (EM)algorithm (Hinton and Neal 1998, Attias 2000)
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 29
MCMC versus Variational Bayes (VB)
• Each configuration of r1:W corresponds to a corner of a W dimensionalhypercube
b
b b
bb
b
b
b
• MCMC moves along the edges stochastically
• VB moves inside the hypercube deterministically
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 30
Sequential Inference
• Filtering: Mixture Kalman Filter (Rao-Blackwellized PF) (Chen and Liu 2001)
• MMAP: Breadth-first search algorithm with greedy or randomised pruning,multi-hypothesis tracker (MHT)
• For each hypothesis, there are 2W possible branches at each timeslice
⇒ Need a fast proposal to find promising branches without exhaustiveevaluation
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 31
Music Processing challenges
• Computational modeling of human listening and music performance abilities
– complex and nonstationary temporal structure, both on physical-signal andcognitive-symbolic level
– Applications: Interactive Music performance, Musicology, Music InformationRetrieval, Education
• Analysis
– identification of individual sound events - notes, kicks– invariant characteristics - timbre– extraction of higher structure information - tempo, harmony, rhythm– not well defined attributes - expression, mood, genre
• Synthesis
– design of soud synthesis models - abstract or physical– performance rendering: generation of a physically, perceptually or artistically
feasible control policy
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 32
Issues
• What types of modelling approaches are useful for acoustic processing (e.g.hierarchical, generative, discriminative) ?
• What classes of inference algorithms are suitable for these potentially largeand hybrid models of sound ?
• How can we improve the quality and speed of inference ?
• Can efficient online algorithms be developed?
• How can we learn efficient auditory codes based on independenceassumptions about the generating processes?
• What can biology and cognitive science can tell us about acousticrepresentations and processing? (and vice versa)
Cemgil NIPS Workshop on Advances in Models for Acoustic Processing. 9 Dec 2006, Whistler, Canada 33