Superresolution-based stereo signal separation via supervised nonnegative matrix factorization

Daichi Kitamura, Hiroshi Saruwatari, Yusuke Iwao, Kiyohiro Shikano (Nara Institute of Science and Technology, Nara, Japan)
Kazunobu Kondo, Yu Takahashi (Yamaha Corporation Research & Development Center, Shizuoka, Japan)
18th International Conference on Digital Signal Processing 2013


Outline
• 1. Research background
• 2. Conventional method
  – Nonnegative matrix factorization
  – Penalized supervised nonnegative matrix factorization
  – Directional clustering
  – Multichannel NMF
  – Hybrid method
• 3. Proposed method
  – Regularized superresolution-based nonnegative matrix factorization
• 4. Experiments
• 5. Conclusions


Background
• Music signal separation technologies have received much attention.
  Applications: automatic music transcription, 3D audio systems, etc.
• Music signal separation based on nonnegative matrix factorization (NMF) has been a very active area of research.
• The extraction performance of NMF degrades markedly when the mixture contains many sources.

We propose a new method for multichannel signal separation with NMF that utilizes both the spectral and spatial cues contained in mixtures of multiple instruments.


NMF
• NMF is a type of sparse representation algorithm that decomposes a nonnegative matrix into two nonnegative matrices. [D. D. Lee, et al., 2001]

[Figure: the observed matrix (spectrogram; frequency versus time) is approximated by the product of the basis matrix (spectral bases; amplitude versus frequency) and the activation matrix (time-varying gains; amplitude versus time).]

Dimensions: the observed matrix is (number of frequency bins) × (number of frames), the basis matrix is (number of frequency bins) × (number of bases), and the activation matrix is (number of bases) × (number of frames).
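The decomposition on this slide can be tried out with the classic multiplicative updates of Lee and Seung. The following is a minimal sketch, assuming the generalized KL divergence as the cost; the names `Y`, `F`, and `G` (observed, basis, and activation matrices) and the function names are illustrative choices, not taken from the slides.

```python
import numpy as np

def nmf_kl(Y, n_bases, n_iter=200, eps=1e-12, seed=0):
    """Factorize a nonnegative matrix Y (freq x time) as F @ G."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    F = rng.random((n_freq, n_bases)) + eps    # basis matrix (spectral bases)
    G = rng.random((n_bases, n_frames)) + eps  # activation matrix (time-varying gains)
    ones = np.ones_like(Y)
    for _ in range(n_iter):
        X = F @ G + eps
        F *= ((Y / X) @ G.T) / (ones @ G.T + eps)   # multiplicative update of the bases
        X = F @ G + eps
        G *= (F.T @ (Y / X)) / (F.T @ ones + eps)   # multiplicative update of the gains
    return F, G

def kl_divergence(Y, X, eps=1e-12):
    """Generalized KL divergence summed over all entries."""
    return float(np.sum(Y * np.log((Y + eps) / (X + eps)) - Y + X))

# Toy "spectrogram" built from two known nonnegative spectral patterns.
rng = np.random.default_rng(1)
Y = rng.random((64, 2)) @ rng.random((2, 100))
F, G = nmf_kl(Y, n_bases=2)
```

Because the updates only multiply by nonnegative ratios, `F` and `G` stay nonnegative as long as they are initialized that way.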

Penalized Supervised NMF (PSNMF)
• In PSNMF, the decomposition is carried out under the condition that the bases of the target sound are known in advance. [Yagi, et al., 2012]
• Training process: the supervised bases of the target sound are trained from a supervision sound.
• Separation process: the trained bases are fixed and the other matrices are updated. The bases of the other sounds are forced to become uncorrelated with the supervised bases.

Problem of PSNMF: When the signal includes many sources, the extraction performance degrades markedly.
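One way to picture the "uncorrelated" constraint is as a penalty on the inner products between the supervised bases and the other bases. The sketch below assumes the penalty is the squared Frobenius norm of `F.T @ H`; this specific form, and the names `F` (supervised bases) and `H` (other bases), are assumptions made for illustration rather than something stated on the slide.

```python
import numpy as np

def basis_correlation_penalty(F, H):
    """Squared Frobenius norm of F.T @ H (assumed penalty form)."""
    return float(np.sum((F.T @ H) ** 2))

def penalty_gradient_wrt_H(F, H):
    """Gradient of the assumed penalty with respect to H: 2 F F^T H."""
    return 2.0 * F @ (F.T @ H)

# Supervised bases occupy frequency bins 0 and 1; two candidate "other" bases.
F = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0],
              [0.0, 0.0]])
H_disjoint = np.array([[0.0], [0.0], [1.0], [1.0]])  # no overlap with F
H_overlap = np.array([[1.0], [0.0], [1.0], [0.0]])   # shares bin 0 with F

print(basis_correlation_penalty(F, H_disjoint))  # 0.0
print(basis_correlation_penalty(F, H_overlap))   # 1.0
```

A penalty of this kind is zero when the other bases do not overlap the supervised ones, so minimizing it discourages the other bases from also explaining the target sound.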

Directional Clustering
• Directional clustering can estimate the sources and their directions in a multichannel signal. [Araki, et al., 2007] [Miyabe, et al., 2009]

[Figure: scatter plot of source components in the plane of L-ch input signal versus R-ch input signal, with centroid vectors marking the left, center, and right clusters.]

Problem of directional clustering: This method cannot separate sources that lie in the same direction.
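As a rough illustration of the idea, each time-frequency bin can be represented by its (L-ch, R-ch) magnitude pair and assigned to the nearest centroid direction. The sketch below is a generic k-means-style clustering with cosine similarity and three fixed initial directions; it is an assumed simplification, not necessarily the algorithm of the cited papers, and all names and panning gains are hypothetical.

```python
import numpy as np

def directional_clustering(L_mag, R_mag, n_iter=20, eps=1e-12):
    """Cluster time-frequency bins by direction.

    Returns labels (0 = left, 1 = center, 2 = right) and the centroid vectors.
    Silent bins (both channels zero) fall into cluster 0 by default.
    """
    # One unit-length 2-D vector per time-frequency bin.
    V = np.stack([L_mag.ravel(), R_mag.ravel()], axis=1)
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + eps)
    # Initial centroids pointing left, center, and right.
    C = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
    C /= np.linalg.norm(C, axis=1, keepdims=True)
    for _ in range(n_iter):
        labels = np.argmax(V @ C.T, axis=1)   # nearest centroid by cosine similarity
        for k in range(3):
            members = V[labels == k]
            if len(members) > 0:
                m = members.sum(axis=0)
                C[k] = m / (np.linalg.norm(m) + eps)
    labels = np.argmax(V @ C.T, axis=1)
    return labels.reshape(L_mag.shape), C

# Toy data: every bin is dominated by one source panned left, center, or right.
rng = np.random.default_rng(0)
gains = np.array([[1.0, 0.1], [0.7, 0.7], [0.1, 1.0]])  # hypothetical (L, R) gains
true_labels = rng.integers(0, 3, size=(16, 20))
amplitude = rng.random((16, 20)) + 0.1
L_mag = amplitude * gains[true_labels, 0]
R_mag = amplitude * gains[true_labels, 1]
labels, centroids = directional_clustering(L_mag, R_mag)
center_mask = (labels == 1)   # binary mask selecting the center cluster
```

Two sources panned to the same direction produce the same unit vector, so no assignment rule of this kind can tell them apart, which is the limitation stated on the slide.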

Multichannel NMF
• Multichannel NMF has also been proposed. [Ozerov, et al., 2010] [Sawada, et al., 2012]
• It is a natural extension of NMF to multichannel signals.
• This method uses spectral and spatial cues to achieve the unsupervised separation task.

Problem of multichannel NMF: This unified method poses a mathematically very difficult optimization problem, because many variables must be optimized using only one cost function. Multichannel NMF depends strongly on the initial values and lacks robustness.

Hybrid method
• The conventional hybrid method applies PSNMF after directional clustering. [Iwao, et al., 2012]
• This method consists of two techniques.
  – Directional clustering (spatial separation)
  – PSNMF (source separation)

[Figure: conventional hybrid method. The L and R channels pass through directional clustering (spatial separation) and then PSNMF (source separation).]

Problem of hybrid method
• The signal extracted by the hybrid method has considerable distortion.
• There are many spectral chasms in the spectrogram obtained by directional clustering.
• The resolution of the spectrogram is degraded.

[Figure: directional clustering as binary masking. The input spectrogram is multiplied by a binary mask using the Hadamard product (product of each element) to give the separated cluster. Mask entries are 1 for the target direction and 0 for other directions; the zeroed grids are the chasms. Axes: time and frequency.]
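The masking step in the figure can be reproduced in a few lines. This is a toy sketch with a random, hypothetical mask; in the actual method the mask would come from directional clustering.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.random((6, 7)) + 0.1         # toy input spectrogram (frequency x time)
mask = rng.random((6, 7)) > 0.5      # hypothetical binary mask: True = target direction
separated = Y * mask                 # Hadamard (element-wise) product
chasms = ~mask                       # grids zeroed by the mask
chasms_per_frame = chasms.sum(axis=0)
```

Every grid where the mask is zero loses its value entirely, even if the target source had energy there, which is why the separated cluster is distorted.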


Proposed hybrid method
• Employ a new supervised NMF algorithm as an alternative to the conventional PSNMF in the hybrid method.

Conventional hybrid method: input stereo signal (L-ch, R-ch) → STFT → directional clustering → center component (L-ch, R-ch) → PSNMF on each channel → ISTFT on each channel → mixing → extracted signal.

Proposed hybrid method: input stereo signal (L-ch, R-ch) → STFT → directional clustering → center component (L-ch, R-ch) together with the index of the center cluster → superresolution-based SNMF on each channel → ISTFT on each channel → mixing → extracted signal.

Superresolution-based supervised NMF
• In the proposed supervised NMF, the spectral chasms are treated as unseen observations by using an index matrix.

[Figure: separated cluster with chasms and the corresponding index matrix (axes: time and frequency).]

Index matrix entries:
1 : Grid of separated component
0 : Grid of chasm (hole)

Superresolution-based supervised NMF
• The components of the target sound lost after directional clustering can be extrapolated using the supervised bases.

[Figure: the separated cluster with chasms is turned into a reconstructed spectrogram by superresolution using the supervised bases (axes: time and frequency).]

Superresolution-based supervised NMF
• Signal flow of the proposed hybrid method

[Figure: frequency of source component versus direction (left, center, right), shown in three stages.
(a) Observed spectra: the target source lies in the target direction (center).
(b) After directional clustering: center sources lose some of their components.
(c) After superresolution-based SNMF: the target source is extrapolated.]

Superresolution-based supervised NMF
• The basis extrapolation includes an underlying problem.
• If the time-frequency spectra are almost unseen in the spectrogram, which means that the indexes are almost all zero, a large extrapolation error may occur.
• It is necessary to regularize the extrapolation.

[Figure: spectrogram (time 0 to 4 s, frequency 0 to 4 kHz) showing an extrapolation error (incorrectly modified activation) at an almost unseen frame of the separated cluster.]

Superresolution-based supervised NMF
• We propose introducing a regularization term into the cost function.
• The intensity of this regularization is proportional to the number of chasms in each frame.

Regularization of norm minimization
Symbols in the regularization term: the index matrix, its binary complement, an entry of the index matrix, and entries of the factor matrices.
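To make the "proportional to the number of chasms" behavior concrete, the sketch below assumes the regularizer is a squared-magnitude penalty on the extrapolated model `F @ G` inside the chasms, weighted by a hypothetical parameter `lam`. The exact form used by the authors is not recoverable from this slide, so the formula and all names here are assumptions.

```python
import numpy as np

def chasm_regularizer(I, F, G, lam=1.0):
    """lam * sum over chasm grids of (F @ G)**2, per frame and in total."""
    I_bar = 1.0 - I                        # binary complement of the index matrix
    per_frame = lam * np.sum(I_bar * (F @ G) ** 2, axis=0)
    return per_frame, float(per_frame.sum())

# Index matrix (frequency x time): frame 0 has no chasm, frame 2 is all chasm.
I = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 1, 0],
              [1, 0, 0]], dtype=float)
F = np.ones((4, 1))                        # one flat hypothetical basis
G = np.ones((1, 3))                        # equal activation in every frame
per_frame, total = chasm_regularizer(I, F, G)
print(per_frame)  # [0. 2. 4.]
```

With equal activations, the penalty per frame grows with the number of chasm grids in that frame, so an almost unseen frame is pushed hardest toward a small activation.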

Superresolution-based supervised NMF
• The cost function in regularized superresolution-based NMF is defined using the index matrix. It consists of a divergence term, the regularization term, and the penalty term.
• Since the divergence is only defined in grids whose index is one, the chasms in the spectrogram are ignored.

Legend:
– Divergence: an arbitrary divergence function
– Penalty term: forces the supervised bases and the other bases to become uncorrelated with each other
– Weighting parameter
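For orientation, a cost function with the three parts named on this slide could be written as follows. Every symbol is an assumption introduced here for illustration: $y_{\omega,t}$ for the observed spectrogram, $i_{\omega,t}$ and $\bar{i}_{\omega,t}$ for the index matrix and its binary complement, $F$, $G$ for the supervised bases and their activations, $H$, $U$ for the other bases and their activations, and $\lambda$, $\mu$ for weights. The squared form of the regularizer and the Frobenius form of the penalty are likewise assumed, not quoted from the authors.

```latex
\mathcal{J} = \sum_{\omega,t} i_{\omega,t}\,
  \mathcal{D}\!\left( y_{\omega,t} \,\middle\|\,
    \sum_{k} f_{\omega,k}\, g_{k,t} + \sum_{l} h_{\omega,l}\, u_{l,t} \right)
  + \lambda \sum_{\omega,t} \bar{i}_{\omega,t}
    \Bigl( \sum_{k} f_{\omega,k}\, g_{k,t} \Bigr)^{2}
  + \mu \,\bigl\lVert F^{\mathsf{T}} H \bigr\rVert_{F}^{2}
```

In this reading, the first term fits the model only where the index is one, the second term restrains the extrapolated target in the chasms, and the third term keeps the other bases away from the supervised ones.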

Superresolution-based supervised NMF
• Update rules that minimize the cost function are obtained for two choices of divergence: the KL divergence and the Euclidean distance.
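As a rough illustration of what multiplicative updates for a masked, regularized Euclidean cost can look like, the sketch below derives them from an assumed cost: squared error on observed grids, plus `lam` times the squared extrapolated target in the chasms, plus `mu` times the squared Frobenius norm of `F.T @ H`. The updates follow from splitting each gradient into positive and negative parts; they are not claimed to reproduce the authors' exact rules, and all names are hypothetical.

```python
import numpy as np

def cost(Y, I, F, G, H, U, lam, mu):
    """Assumed cost: masked squared error + chasm regularizer + basis penalty."""
    X = F @ G + H @ U
    fit = np.sum(I * (Y - X) ** 2)
    reg = lam * np.sum((1.0 - I) * (F @ G) ** 2)
    pen = mu * np.sum((F.T @ H) ** 2)
    return float(fit + reg + pen)

def superres_snmf_eu(Y, I, F, n_other=4, lam=0.1, mu=1.0,
                     n_iter=100, eps=1e-12, seed=0):
    """Keep the supervised bases F fixed; update G, H, U multiplicatively."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    G = rng.random((F.shape[1], n_frames)) + eps
    H = rng.random((n_freq, n_other)) + eps
    U = rng.random((n_other, n_frames)) + eps
    I_bar = 1.0 - I
    IY = I * Y                                   # only observed grids drive the fit
    history = [cost(Y, I, F, G, H, U, lam, mu)]
    for _ in range(n_iter):
        FG = F @ G
        X = FG + H @ U
        G *= (F.T @ IY) / (F.T @ (I * X) + lam * (F.T @ (I_bar * FG)) + eps)
        X = F @ G + H @ U
        H *= (IY @ U.T) / ((I * X) @ U.T + mu * (F @ (F.T @ H)) + eps)
        X = F @ G + H @ U
        U *= (H.T @ IY) / (H.T @ (I * X) + eps)
        history.append(cost(Y, I, F, G, H, U, lam, mu))
    return G, H, U, history

# Toy problem: a target built from known bases plus an interfering source.
rng = np.random.default_rng(1)
F = rng.random((32, 3))                          # stands in for trained bases
Y = F @ rng.random((3, 40)) + rng.random((32, 2)) @ rng.random((2, 40))
I = (rng.random((32, 40)) > 0.4).astype(float)   # hypothetical index matrix
G, H, U, history = superres_snmf_eu(Y, I, F)
target_estimate = F @ G                          # defined in every grid, chasms included
```

Because `F @ G` is evaluated on the full time-frequency grid, the target estimate also has values inside the chasms, which is the extrapolation described on the earlier slides.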


Evaluation experiment
• We compared five methods.
  – Simple directional clustering
  – Simple PSNMF
  – Multichannel NMF based on IS-divergence
  – Conventional hybrid method using PSNMF
  – Proposed hybrid method using superresolution-based SNMF

[Figure: block diagrams of the conventional hybrid method (PSNMF after directional clustering) and the proposed hybrid method (superresolution-based SNMF using the index of the center cluster).]

Evaluation experiment
• We used stereo-panning signals ( , ).
• The mixture consists of four instruments (Ob., Fl., Tb., and Pf.) generated by a MIDI synthesizer.
• We used the same type of MIDI sounds of the target instruments as supervision for the training process.

[Figure: layout of the sources (numbered 1, 2, 3) across the left, center, and right directions, with the target source marked.]

Supervision sound: two octaves of notes that cover all notes of the target signal.

Experimental results ( )
• Average SDR, SIR, and SAR scores for each method, where the four instruments are shuffled with 12 combinations.

SDR : quality of the separated target sound
SIR : degree of separation between the target and other sounds
SAR : absence of artificial distortion

[Figure, repeated on two result slides: bar charts of SDR, SIR, and SAR for each method, on an axis running from bad to good.]
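The three scores can be illustrated with a simplified BSS Eval style decomposition in which the estimate is split into a target part, an interference part, and an artifact part by projecting onto the true sources. The sketch below allows only a time-invariant gain per source; the evaluation toolkit actually used is not stated on the slide, so this is an assumed, simplified version with hypothetical names.

```python
import numpy as np

def bss_eval_gains(estimate, sources, target_index):
    """Return (SDR, SIR, SAR) in dB, allowing one gain per true source.

    sources: array of shape (n_sources, n_samples) holding the true sources.
    """
    s = sources[target_index]
    s_target = (estimate @ s) / (s @ s) * s          # projection onto the target
    # Least-squares projection of the estimate onto the span of all sources.
    coeffs, *_ = np.linalg.lstsq(sources.T, estimate, rcond=None)
    projection = sources.T @ coeffs
    e_interf = projection - s_target                 # leakage from other sources
    e_artif = estimate - projection                  # everything else

    def db(num, den):
        return 10.0 * np.log10(np.sum(num ** 2) / np.sum(den ** 2))

    sdr = db(s_target, e_interf + e_artif)
    sir = db(s_target, e_interf)
    sar = db(s_target + e_interf, e_artif)
    return sdr, sir, sar

# Toy check: target plus 10 % leakage of another source plus weak noise.
n = 8000
t = np.arange(n) / 8000.0
sources = np.stack([np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 660 * t)])
rng = np.random.default_rng(0)
estimate = sources[0] + 0.1 * sources[1] + 0.01 * rng.standard_normal(n)
sdr, sir, sar = bss_eval_gains(estimate, sources, target_index=0)
```

In this toy case the leakage is one tenth of the target amplitude, so the SIR comes out close to 20 dB, while the weak noise keeps the SAR considerably higher.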

Conclusions
• We propose a new supervised NMF algorithm for the hybrid method to separate stereo or multichannel signals.
• The proposed supervised method recovers the resolution of the spectrogram obtained by the binary masking in directional clustering, using supervised basis extrapolation.
• The proposed hybrid method separates the target signal with higher performance than the conventional methods.

Thank you for your attention!