Superresolution-based stereo signal separation via supervised nonnegative matrix factorization
Daichi Kitamura, Hiroshi Saruwatari, Yusuke Iwao, Kiyohiro Shikano (Nara Institute of Science and Technology, Nara, Japan)
Kazunobu Kondo, Yu Takahashi (Yamaha Corporation Research & Development Center, Shizuoka, Japan)
18th International Conference on Digital Signal Processing 2013
2
Outline
• 1. Research background
• 2. Conventional method
– Nonnegative matrix factorization
– Penalized supervised nonnegative matrix factorization
– Directional clustering
– Multichannel NMF
– Hybrid method
• 3. Proposed method
– Regularized superresolution-based nonnegative matrix factorization
• 4. Experiments
• 5. Conclusions
4
Background
• Music signal separation technologies have received much attention.
– Applications: automatic music transcription, 3D audio systems, etc.
• Music signal separation based on nonnegative matrix factorization (NMF) has been a very active area of research.
• The extraction performance of NMF degrades markedly when the mixture contains many sources.
We propose a new method for multichannel signal separation with NMF that utilizes both the spectral and the spatial cues contained in mixtures of multiple instruments.
6
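NMF
• NMF is a type of sparse representation algorithm that decomposes a nonnegative matrix into two nonnegative matrices. [D. D. Lee, et al., 2001]
[Figure: the observed matrix (spectrogram, frequency × time) is approximated by the product of the basis matrix (spectral bases, frequency × amplitude) and the activation matrix (time-varying gains, amplitude × time)]
$\mathbf{Y} \simeq \mathbf{F}\mathbf{G}$, where $\mathbf{Y} \in \mathbb{R}_{\ge 0}^{\Omega \times T}$, $\mathbf{F} \in \mathbb{R}_{\ge 0}^{\Omega \times K}$, $\mathbf{G} \in \mathbb{R}_{\ge 0}^{K \times T}$
$\Omega$: number of frequency bins, $T$: number of frames, $K$: number of bases
$\mathbf{Y}$: observed matrix, $\mathbf{F}$: basis matrix, $\mathbf{G}$: activation matrix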
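To make the factorization concrete, here is a minimal NumPy sketch of NMF with the multiplicative update rules for the generalized Kullback-Leibler (KL) divergence, in the style of Lee and Seung. It assumes the spectrogram Y is a nonnegative (frequency × time) array; the function names `nmf_kl` and `kl_divergence`, the random initialization, and the small `eps` guard are illustrative choices, not part of the original slides.

```python
import numpy as np


def kl_divergence(Y, X, eps=1e-12):
    """Generalized KL divergence D(Y | X), summed over all entries."""
    return float(np.sum(Y * np.log((Y + eps) / (X + eps)) - Y + X))


def nmf_kl(Y, K, n_iter=200, seed=0, eps=1e-12):
    """Factorize a nonnegative matrix Y (Omega x T) as F @ G.

    F (Omega x K) holds the spectral bases and G (K x T) their time-varying
    gains; both are updated with KL-divergence multiplicative rules.
    Returns F, G and the cost after every iteration.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    F = rng.random((n_freq, K)) + eps
    G = rng.random((K, n_frames)) + eps
    ones = np.ones_like(Y, dtype=float)
    costs = []
    for _ in range(n_iter):
        # Update the bases with the activations fixed.
        F *= ((Y / (F @ G + eps)) @ G.T) / (ones @ G.T + eps)
        # Update the activations with the bases fixed.
        G *= (F.T @ (Y / (F @ G + eps))) / (F.T @ ones + eps)
        costs.append(kl_divergence(Y, F @ G))
    return F, G, costs
```

Multiplicative updates keep every entry nonnegative as long as the initialization is positive, which is why no explicit projection step is needed.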
7
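Penalized Supervised NMF (PSNMF)
• In PSNMF, the decomposition $\mathbf{Y} \simeq \mathbf{F}\mathbf{G} + \mathbf{H}\mathbf{U}$ is addressed under the condition that the supervised basis matrix $\mathbf{F}$ is known in advance; $\mathbf{H}$ and $\mathbf{U}$ model the other sources. [Yagi, et al., 2012]
Training process: the supervised bases $\mathbf{F}$ of the target sound are trained from a supervision sound.
Separation process: the trained bases $\mathbf{F}$ are fixed, and $\mathbf{G}$, $\mathbf{H}$, and $\mathbf{U}$ are updated; $\mathbf{H}$ is forced to become uncorrelated with $\mathbf{F}$.
Problem of PSNMF: When the signal includes many sources, the extraction performance degrades markedly.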
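The PSNMF separation step can be sketched under stated assumptions: the Euclidean distance is used as the data term, the penalty is taken to be μ‖FᵀH‖²_F (one common way to push H away from the trained F), F is a given nonnegative basis matrix, and the function name `psnmf_euclidean` and the parameters `n_other` and `mu` are hypothetical. G holds the gains of F, H and U model the remaining sources, and the target spectrogram would be estimated as F @ G.

```python
import numpy as np


def psnmf_euclidean(Y, F, n_other, mu=1.0, n_iter=200, seed=0, eps=1e-12):
    """Sketch of penalized supervised NMF: Y ~ F @ G + H @ U with F fixed.

    Assumed cost: ||Y - F G - H U||_F^2 + mu * ||F^T H||_F^2, where the
    penalty pushes the free bases H to be uncorrelated with the trained F.
    Returns G, H, U and the cost after every iteration.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    G = rng.random((F.shape[1], n_frames)) + eps   # gains of the target bases
    H = rng.random((n_freq, n_other)) + eps        # bases of the other sources
    U = rng.random((n_other, n_frames)) + eps      # gains of H
    FFt = F @ F.T
    costs = []
    for _ in range(n_iter):
        G *= (F.T @ Y) / (F.T @ (F @ G + H @ U) + eps)
        # The penalty gradient 2*mu*F F^T H enters the denominator (factor 2 cancels).
        H *= (Y @ U.T) / ((F @ G + H @ U) @ U.T + mu * FFt @ H + eps)
        U *= (H.T @ Y) / (H.T @ (F @ G + H @ U) + eps)
        resid = Y - F @ G - H @ U
        costs.append(float(np.sum(resid ** 2) + mu * np.sum((F.T @ H) ** 2)))
    return G, H, U, costs
```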
8
Directional Clustering
• Directional clustering estimates the sources and their directions in a multichannel signal. [Araki, et al., 2007] [Miyabe, et al., 2009]
[Figure: source components plotted by L-ch input signal versus R-ch input signal, grouped into a left cluster, a center cluster, and a right cluster around their centroid vectors]
Problem of directional clustering: This method cannot separate sources located in the same direction.
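As an illustration of directional clustering, the sketch below groups each time-frequency bin by the direction of its stereo amplitude vector and returns one binary mask per cluster. It assumes two equally shaped magnitude (or complex STFT) arrays, uses spherical k-means with cosine similarity as a stand-in for the cited clustering methods, and the function name and initial angles are hypothetical.

```python
import numpy as np


def directional_clustering(XL, XR, init_angles, n_iter=50):
    """Cluster the time-frequency bins of a stereo spectrogram by direction.

    Each bin is described by the unit vector of its channel magnitudes
    (|XL|, |XR|), so angle 0 means fully left and pi/2 fully right.
    Spherical k-means (cosine similarity) assigns each bin to the nearest
    centroid vector. Returns binary masks of shape (n_clusters, Omega, T)
    and the final centroid vectors.
    """
    A = np.stack([np.abs(XL), np.abs(XR)], axis=-1)          # (Omega, T, 2)
    V = A / np.maximum(np.linalg.norm(A, axis=-1, keepdims=True), 1e-12)
    angles = np.asarray(init_angles, dtype=float)
    C = np.stack([np.cos(angles), np.sin(angles)], axis=-1)  # (n_clusters, 2)
    for _ in range(n_iter):
        # Nearest centroid by cosine similarity; all-zero bins fall into cluster 0.
        labels = np.argmax(V @ C.T, axis=-1)
        for k in range(len(C)):
            members = V[labels == k]
            if len(members) > 0:                             # keep empty clusters as-is
                c = members.mean(axis=0)
                C[k] = c / np.linalg.norm(c)
    masks = np.stack([(labels == k).astype(float) for k in range(len(C))])
    return masks, C
```

Multiplying an input spectrogram by one of these masks keeps only the bins of that direction, so two sources panned to the same direction end up in the same mask, which is the limitation stated on this slide.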
9
Multichannel NMF
• Multichannel NMF has also been proposed. [Ozerov, et al., 2010] [Sawada, et al., 2012]
• It is a natural extension of NMF to multichannel signals.
• This method uses spectral and spatial cues to achieve unsupervised separation.
Problem of multichannel NMF: This unified method poses a mathematically difficult optimization problem, because many variables must be optimized with only one cost function. Multichannel NMF also depends strongly on initial values and lacks robustness.
10
Hybrid method
• The conventional hybrid method applies PSNMF after directional clustering. [Iwao, et al., 2012]
• This method consists of two techniques:
– Directional clustering
– PSNMF
[Figure: conventional hybrid method: L/R input → directional clustering (spatial separation) → PSNMF (source separation)]
11
Problem of hybrid method
• The signal extracted by the hybrid method has considerable distortion.
• There are many spectral chasms in the spectrogram obtained by directional clustering.
• The resolution of the spectrogram is degraded.
[Figure: directional clustering as input spectrogram ⊙ binary mask = separated cluster (frequency × time), where ⊙ is the Hadamard product (element-wise product); mask entries are 1 for the target direction and 0 for other directions, and the zeroed entries leave chasms in the separated cluster]
Binary mask (rows: frequency, columns: time):
1 0 0 0 0 0 0
0 1 1 0 0 1 1
1 0 0 0 0 0 0
0 1 0 1 1 0 1
1 0 0 0 0 0 0
1 1 1 0 1 1 0
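The masking step on this slide can be reproduced with an element-wise product. The sketch below uses the binary mask shown above; the helper names are illustrative, and any spectrogram passed in (such as the constant one in a quick check) is hypothetical.

```python
import numpy as np

# Binary mask from the slide (rows: frequency bins, columns: time frames);
# 1 = bin assigned to the target direction, 0 = bin assigned to another direction.
mask = np.array([
    [1, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0, 1],
    [1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 0],
])


def apply_binary_mask(spectrogram, mask):
    """Hadamard (element-wise) product of an input spectrogram and a binary mask."""
    return spectrogram * mask


def count_chasms(mask):
    """Number of zeroed (chasm) bins in each frame and in total."""
    per_frame = (mask == 0).sum(axis=0)
    return per_frame, int(per_frame.sum())
```

For this mask, 26 of the 42 grid points are chasms, and the fourth frame keeps only one bin, so most of the spectrogram's resolution is lost before any NMF is applied.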
13
Proposed hybrid method
[Figure: block diagrams of the conventional and proposed hybrid methods]
Conventional hybrid method: input stereo signal (L-ch, R-ch) → STFT → directional clustering → center component (L-ch, R-ch) → PSNMF (per channel) → ISTFT (per channel) → mixing → extracted signal
Proposed hybrid method: input stereo signal (L-ch, R-ch) → STFT → directional clustering → center component (L-ch, R-ch) and index of the center cluster → superresolution-based SNMF (per channel) → ISTFT (per channel) → mixing → extracted signal
We employ a new supervised NMF algorithm in place of the conventional PSNMF in the hybrid method.
14
Superresolution-based supervised NMF
• In the proposed supervised NMF, the spectral chasms are treated as unseen observations by using an index matrix.
[Figure: separated cluster (frequency × time) with chasms; the chasms are treated as unseen observations]
Index matrix (rows: frequency, columns: time):
1 0 0 0 0 0 0
0 1 1 0 0 1 1
1 0 0 0 0 0 0
0 1 0 1 1 0 1
1 0 0 0 0 0 0
1 1 1 0 1 1 0
1: grid point of a separated component
0: grid point of a chasm (hole)
15
Superresolution-based supervised NMF
• The components of the target sound that are lost after directional clustering can be extrapolated using the supervised bases.
[Figure: superresolution using supervised bases; the separated cluster (frequency × time) with chasms becomes a reconstructed spectrogram in which the chasms are filled]
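The extrapolation idea can be sketched under simplifying assumptions: only the supervised bases F are used (no H, U, penalty, or regularization), the divergence is the KL divergence, and the activations are fitted only on grid points whose index is 1. Once G is estimated, F @ G also predicts values in the chasms. The function name is hypothetical, and this is not the authors' full algorithm.

```python
import numpy as np


def extrapolate_with_supervised_bases(Y, I, F, n_iter=300, seed=0, eps=1e-12):
    """Fill spectral chasms using fixed supervised bases F.

    Y: separated cluster (Omega x T), zero in the chasms.
    I: index matrix (1 = observed grid point, 0 = chasm).
    The activations G are fitted with a KL multiplicative update that only
    sees grid points with I == 1; F @ G then also predicts the chasms.
    """
    rng = np.random.default_rng(seed)
    G = rng.random((F.shape[1], Y.shape[1])) + eps
    denom = F.T @ I + eps                      # sum of F over observed bins only
    for _ in range(n_iter):
        ratio = I * Y / (F @ G + eps)          # zero wherever I == 0
        G *= (F.T @ ratio) / denom
    return G, F @ G
```

When the bases describe the target well, the values predicted in the chasms approach those of the unmasked spectrogram. In a frame with very few observed bins, G is poorly constrained, which is the problem the regularization on the following slides addresses.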
16
Superresolution-based supervised NMF
• Signal flow of the proposed hybrid method
[Figure: frequency of the source components versus direction (left, center, right); the target source lies in the target (center) direction.
(a) Observed spectra.
(b) After directional clustering: the center sources lose some of their components.
(c) After superresolution-based SNMF: the target source is extrapolated.]
20
Superresolution-based supervised NMF
• The basis extrapolation involves an underlying problem.
• If the time-frequency spectra of a frame are almost unseen in the spectrogram, i.e., almost all of its index entries are zero, a large extrapolation error may occur.
• The extrapolation therefore needs to be regularized.
[Figure: separated cluster (frequency 0–4 kHz, time 0–4 s) containing an almost unseen frame, and the resulting extrapolation error caused by incorrectly modified activations]
21
Superresolution-based supervised NMF
• We propose introducing a regularization term into the cost function.
• The intensity of these regularizations is proportional to the number of chasms in each frame.
Regularization by norm minimization
$\mathbf{I}$: index matrix, $\bar{\mathbf{I}}$: binary complement of $\mathbf{I}$, $i_{\omega,t}$: entry of the index matrix, $g_{k,t}$: entry of matrix $\mathbf{G}$, $u_{l,t}$: entry of matrix $\mathbf{U}$
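The per-frame weighting can be computed directly from the index matrix. This minimal sketch only derives the regularization intensity of each frame as a hypothetical global weight `lam` times the number of chasms (zeros of the index matrix) in that frame; the norm that is actually minimized is not reproduced here, and the function names are illustrative.

```python
import numpy as np


def chasm_counts(I):
    """Number of chasms (zeros of the index matrix) in each frame."""
    I_bar = 1 - np.asarray(I)      # binary complement of the index matrix
    return I_bar.sum(axis=0)


def frame_regularization_weights(I, lam=1.0):
    """Regularization intensity per frame, proportional to its chasm count.

    lam is a hypothetical global weight; frames that are almost unseen
    receive the strongest regularization.
    """
    return lam * chasm_counts(I)
```

Applied to the index matrix from the earlier slide, the chasm counts per frame are 2, 3, 4, 5, 4, 4, 4, so the fourth frame (only one observed bin) is regularized most strongly.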
22
Superresolution-based supervised NMF
• The cost function of the regularized superresolution-based NMF is defined using the index matrix as follows:
• Since the divergence is evaluated only at grid points whose index is one, the chasms in the spectrogram are ignored.
The cost consists of a divergence term, a regularization term, and a penalty term:
– divergence term: an arbitrary divergence function, summed over the grid points whose index is one
– penalty term: forces $\mathbf{F}$ and $\mathbf{H}$ to become uncorrelated with each other
– weighting parameter
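The masked data term can be sketched as follows, assuming the KL divergence as the arbitrary divergence, the model X = FG + HU, and a penalty μ‖FᵀH‖²_F with a hypothetical weight `mu`. The regularization term is omitted because its exact form is not stated here, and the function names are illustrative.

```python
import numpy as np


def masked_kl(Y, X, I, eps=1e-12):
    """Generalized KL divergence summed only over grid points with index 1."""
    D = Y * np.log((Y + eps) / (X + eps)) - Y + X
    return float(np.sum(I * D))


def superresolution_cost(Y, F, G, H, U, I, mu=1.0):
    """Masked divergence term plus mu * ||F^T H||_F^2 (regularization omitted)."""
    X = F @ G + H @ U
    return masked_kl(Y, X, I) + mu * float(np.sum((F.T @ H) ** 2))
```

Changing Y only where I is zero leaves this cost unchanged, which is the sense in which the chasms are ignored.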
23
Superresolution-based supervised NMF
• The update rules that minimize the cost function based on the KL divergence are obtained as follows:
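Because the update rules themselves are not reproduced in the text, the following is only a derived sketch: multiplicative updates for the masked KL cost with the penalty μ‖FᵀH‖²_F, obtained by the usual split of the gradient into positive and negative parts. The frame-wise regularization term is left out, the names are hypothetical, and the published rules may differ.

```python
import numpy as np


def masked_kl(Y, X, I, eps=1e-12):
    """KL divergence summed over grid points with index 1 only."""
    return float(np.sum(I * (Y * np.log((Y + eps) / (X + eps)) - Y + X)))


def superresolution_snmf_kl(Y, I, F, n_other, mu=1.0, n_iter=200, seed=0, eps=1e-12):
    """Sketch: masked KL multiplicative updates for Y ~ F G + H U, F fixed.

    Assumed cost: sum(I * D_KL(Y | FG + HU)) + mu * ||F^T H||_F^2.
    The chasm-dependent regularization term is not included.
    Returns G, H, U and the cost after every iteration.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    G = rng.random((F.shape[1], n_frames)) + eps
    H = rng.random((n_freq, n_other)) + eps
    U = rng.random((n_other, n_frames)) + eps
    FFt = F @ F.T
    costs = []
    for _ in range(n_iter):
        R = I * Y / (F @ G + H @ U + eps)   # masked ratio, zero on chasms
        G *= (F.T @ R) / (F.T @ I + eps)
        R = I * Y / (F @ G + H @ U + eps)
        # KL gradient has no factor 2, the penalty gradient is 2*mu*F F^T H.
        H *= (R @ U.T) / (I @ U.T + 2 * mu * FFt @ H + eps)
        R = I * Y / (F @ G + H @ U + eps)
        U *= (H.T @ R) / (H.T @ I + eps)
        costs.append(masked_kl(Y, F @ G + H @ U, I) + mu * float(np.sum((F.T @ H) ** 2)))
    return G, H, U, costs
```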
24
Superresolution-based supervised NMF
• The update rules that minimize the cost function based on the Euclidean distance are obtained as follows:
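A matching sketch for the Euclidean distance, again derived here rather than taken from the paper: the masked cost ‖I ⊙ (Y − FG − HU)‖²_F + μ‖FᵀH‖²_F leads to weighted-NMF style multiplicative updates. It assumes a binary index matrix I (so I ⊙ I = I), omits the frame-wise regularization term, and uses hypothetical names.

```python
import numpy as np


def superresolution_snmf_euclidean(Y, I, F, n_other, mu=1.0, n_iter=200, seed=0, eps=1e-12):
    """Sketch: masked Euclidean multiplicative updates for Y ~ F G + H U, F fixed.

    Assumed cost: ||I * (Y - F G - H U)||_F^2 + mu * ||F^T H||_F^2;
    the chasm-dependent regularization term is omitted.
    Returns G, H, U and the cost after every iteration.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    G = rng.random((F.shape[1], n_frames)) + eps
    H = rng.random((n_freq, n_other)) + eps
    U = rng.random((n_other, n_frames)) + eps
    IY = I * Y                                   # observed data only
    FFt = F @ F.T
    costs = []
    for _ in range(n_iter):
        G *= (F.T @ IY) / (F.T @ (I * (F @ G + H @ U)) + eps)
        H *= (IY @ U.T) / ((I * (F @ G + H @ U)) @ U.T + mu * FFt @ H + eps)
        U *= (H.T @ IY) / (H.T @ (I * (F @ G + H @ U)) + eps)
        resid = I * (Y - F @ G - H @ U)
        costs.append(float(np.sum(resid ** 2) + mu * np.sum((F.T @ H) ** 2)))
    return G, H, U, costs
```

Compared with the unmasked PSNMF rules, the only change is that both the data and the current model are multiplied by I before entering the numerators and denominators, so chasms contribute nothing to the updates.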
26
Evaluation experiment
• We compared five methods:
– Simple directional clustering
– Simple PSNMF
– Multichannel NMF based on the IS divergence
– Conventional hybrid method using PSNMF
– Proposed hybrid method using superresolution-based SNMF
[Figure: block diagrams of the conventional and proposed hybrid methods]
27
Evaluation experiment
• We used stereo-panning signals ( , ).
• The mixture consists of four instruments (Ob., Fl., Tb., and Pf.) generated by a MIDI synthesizer.
• We used MIDI sounds of the same type as the target instruments as supervision for the training process.
[Figure: arrangement of sources 1–4 between left and right, with the target source at the center]
Supervision sound: notes spanning two octaves that cover all notes of the target signal
28
Experimental results ( )
• Average SDR, SIR, and SAR scores for each method over 12 combinations in which the four instruments are shuffled.
SDR: quality of the separated target sound
SIR: degree of separation between the target and the other sounds
SAR: absence of artificial distortion
[Figure: SDR, SIR, and SAR scores for each method (higher is better)]
29
Experimental results ( )
• Average SDR, SIR, and SAR scores for each method over 12 combinations in which the four instruments are shuffled.
SDR: quality of the separated target sound
SIR: degree of separation between the target and the other sounds
SAR: absence of artificial distortion
[Figure: SDR, SIR, and SAR scores for each method (higher is better)]
30
Conclusions
• We propose a new supervised NMF algorithm for the hybrid method to separate stereo or multichannel signals.
• Using supervised basis extrapolation, the proposed supervised method recovers the resolution of the spectrogram that is degraded by the binary masking in directional clustering.
• The proposed hybrid method separates the target signal with higher performance than the conventional methods.
Thank you for your attention!