A Neural Network Alternative to Non-negative Audio Models

Transcript of "A Neural Network Alternative to Non-negative Audio Models" (ICASSP 2017)

Page 1: Title

UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

A Neural Network Alternative to Non-negative Audio Models

Paris Smaragdis #* and Shrikant Venkataramani *
* University of Illinois at Urbana-Champaign    # Adobe Research

ICASSP 2017

Page 2: Motivation

• Supervised single-channel source separation
  • Using models trained from clean sounds
• Two dominant approaches
  • Non-negative Matrix Factorization (NMF): reusable and interpretable models
  • Deep learning: state-of-the-art results, but non-transferable models
• Neural network formulation of NMF models
  • Maintaining reusability while taking advantage of deep-learning structures

Page 3: Transferable Models

• Being able to plug in trained models

[Figure: plugging a trained model (H) into a separation pipeline — NMF vs. DNN]

Page 4: Learning an NMF model

• Learning spectral bases from spectrograms

X = W · H,   with X, W, H ∈ ℝ+   (a fitting sketch follows below)

[Figure: magnitude spectrogram X decomposed into spectral bases W and activations H]
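A minimal NumPy sketch of how such a factorization can be fit with the standard KL-divergence multiplicative updates. The rank K, iteration count, and variable names here are illustrative choices, not details taken from the slides:

    import numpy as np

    def nmf_kl(X, K=20, n_iter=200, eps=1e-9):
        """Fit X ~ W @ H (all non-negative) with KL-divergence multiplicative updates."""
        F, T = X.shape                      # frequency bins x time frames
        rng = np.random.default_rng(0)
        W = rng.random((F, K)) + eps        # spectral bases
        H = rng.random((K, T)) + eps        # activations
        for _ in range(n_iter):
            # Update activations: H <- H * (W^T (X / WH)) / (W^T 1)
            WH = W @ H + eps
            H *= (W.T @ (X / WH)) / (W.T @ np.ones_like(X) + eps)
            # Update bases: W <- W * ((X / WH) H^T) / (1 H^T)
            WH = W @ H + eps
            W *= ((X / WH) @ H.T) / (np.ones_like(X) @ H.T + eps)
        return W, H

Applied to a magnitude spectrogram, the columns of W tend to pick up recurring spectral shapes and the rows of H show when they are active.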

Page 5: NMF in action

• Analyzing piano notes

Page 6: NMF as an Auto-encoder

[Figure: auto-encoder with encoder weights W* mapping X to H and decoder weights W mapping H to X̂]

NMF:   X = W · H

Non-negative Auto-encoder (NAE):
  H  = W* · X   such that H ≥ 0
  X̂ = W · H    such that W ≥ 0

Page 7: Removing Non-negativity constraints

• Enforcing non-negativity on the weights is cumbersome.
• Non-negative layer outputs
  • Result in a magnitude spectrogram at the output
  • Result in non-negative activations at the hidden layer
  • Can be enforced with an activation function

g(x) = max(x, 0)   or   |x|   or   ln(1 + eˣ)

H  = g(W* · X)
X̂ = g(W · H)    (a training sketch follows below)
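To make this concrete, here is a minimal PyTorch sketch (not from the paper) of a single-layer NAE trained on a magnitude spectrogram. The Softplus non-linearity, Adam optimizer, and generalized-KL loss are one reasonable set of choices among those listed above:

    import torch

    class NAE(torch.nn.Module):
        """Single-layer non-negative auto-encoder: H = g(W* X), X̂ = g(W H)."""
        def __init__(self, n_freq, n_bases):
            super().__init__()
            self.encoder = torch.nn.Linear(n_freq, n_bases, bias=False)   # W*
            self.decoder = torch.nn.Linear(n_bases, n_freq, bias=False)   # W
            self.g = torch.nn.Softplus()                                   # g(x) = ln(1 + e^x)

        def forward(self, x):
            h = self.g(self.encoder(x))        # non-negative activations
            return self.g(self.decoder(h)), h  # non-negative reconstruction

    def kl_loss(x, x_hat, eps=1e-8):
        """Generalized KL divergence KL(X || X̂), as used for NMF-style fits."""
        return (x * torch.log((x + eps) / (x_hat + eps)) - x + x_hat).sum()

    def train_nae(X, n_bases=20, n_iter=500, lr=1e-3):
        """X: (time frames, frequency bins) magnitude spectrogram as a torch tensor."""
        model = NAE(X.shape[1], n_bases)
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(n_iter):
            opt.zero_grad()
            X_hat, H = model(X)
            loss = kl_loss(X, X_hat)
            loss.backward()
            opt.step()
        return model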

Page 8: NAE in action

• Bases are not guaranteed to be non-negative
• The plain cost, KL(X ‖ g(W · H)), results in dense activations
• Adding sparsity to the hidden-layer output, KL(X ‖ g(W · H)) + λ‖H‖₁
  • Results in sparse activations
  • Intuitive bases and activations
  (see the loss sketch below)
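Under the same assumptions as the sketch above (and reusing its kl_loss helper), the sparsity-regularized objective simply adds an L1 term on the activations; lam stands in for λ and is a weight to be tuned:

    def sparse_kl_loss(x, x_hat, h, lam=0.1):
        # KL(X || X̂) + λ‖H‖₁ — the L1 term encourages sparse activations
        return kl_loss(x, x_hat) + lam * h.abs().sum()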

Page 9: Advantages of NAE

• NMF models are difficult to extend
• Easy to do with neural nets: LSTMs, CNNs, multi-layer networks, etc. (a two-layer sketch follows below)

[Figure: Recurrent NAE (X → H → X̂ with weights W*, W) and Multi-layer NAE (X → H1 → H2 → X̂ with encoder weights W*1, W*2 and decoder weights W2, W1)]
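For illustration, a two-layer NAE can be written in the same PyTorch style as the earlier sketch; the layer sizes and the reuse of Softplus are assumptions for this example, not specifics from the slides:

    class MultiLayerNAE(torch.nn.Module):
        """Two-layer NAE: H2 = g(W*2 · g(W*1 · X)), X̂ = g(W1 · g(W2 · H2))."""
        def __init__(self, n_freq, n_hidden, n_bases):
            super().__init__()
            self.enc1 = torch.nn.Linear(n_freq, n_hidden, bias=False)   # W*1
            self.enc2 = torch.nn.Linear(n_hidden, n_bases, bias=False)  # W*2
            self.dec2 = torch.nn.Linear(n_bases, n_hidden, bias=False)  # W2
            self.dec1 = torch.nn.Linear(n_hidden, n_freq, bias=False)   # W1
            self.g = torch.nn.Softplus()                                 # keeps layer outputs non-negative

        def forward(self, x):
            h2 = self.g(self.enc2(self.g(self.enc1(x))))     # deepest non-negative code
            x_hat = self.g(self.dec1(self.g(self.dec2(h2))))
            return x_hat, h2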

Page 10: NMF source separation

• The spectrogram of the mixture is the sum of the source spectrograms.
• Estimate a model (bases W1, W2) for each source from its training data.
• Explain the contribution of each source in the unknown mixture using the trained models (a separation sketch follows below):

Estimate Hx1, Hx2 such that   X = W1 · Hx1 + W2 · Hx2

Separated spectrograms:   X1 = W1 · Hx1,   X2 = W2 · Hx2
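An illustrative NumPy sketch of this step, assuming bases W1 and W2 were learned per source as in the earlier nmf_kl sketch: the trained bases stay fixed and only the activations of the stacked dictionary [W1 W2] are updated on the mixture.

    def nmf_separate(X, W1, W2, n_iter=200, eps=1e-9):
        """Fix trained bases and estimate activations explaining the mixture X."""
        W = np.concatenate([W1, W2], axis=1)            # stacked dictionary [W1 W2]
        K1 = W1.shape[1]
        rng = np.random.default_rng(0)
        H = rng.random((W.shape[1], X.shape[1])) + eps
        for _ in range(n_iter):
            WH = W @ H + eps
            H *= (W.T @ (X / WH)) / (W.T @ np.ones_like(X) + eps)   # KL update, W fixed
        X1 = W1 @ H[:K1]          # contribution of source 1
        X2 = W2 @ H[K1:]          # contribution of source 2
        return X1, X2

In practice the separated magnitudes are often turned into Wiener-style masks before resynthesizing the time-domain signals.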

Page 11: NAE source separation

• Estimate an NAE model for each source from its training data.
• Explain the contribution of each source in the unknown mixture using the trained models.

[Figure: same training/separation pipeline as the NMF case, with NAEs as the source models]

Page 12: NAE source separation

• Goal: estimate the network inputs (activations) instead of the weights
• Use gradient descent / back-propagation to fit the mixture (a sketch follows below):

X = g(W1 · Hx1) + g(W2 · Hx2)

• Separated spectrograms:

X1 = g(W1 · Hx1),   X2 = g(W2 · Hx2)
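A hedged PyTorch sketch of this idea, assuming nae1 and nae2 are trained single-layer NAEs and kl_loss is the helper from the earlier sketch: the trained decoder weights stay frozen and only the two activation matrices are optimized so that the decoded sum explains the mixture, matching the formula above.

    def nae_separate(X, nae1, nae2, n_bases=20, n_iter=1000, lr=1e-2):
        """X: (time frames, frequency bins) mixture magnitude; optimizes activations, not weights."""
        H1 = torch.rand(X.shape[0], n_bases, requires_grad=True)
        H2 = torch.rand(X.shape[0], n_bases, requires_grad=True)
        opt = torch.optim.Adam([H1, H2], lr=lr)
        for p in list(nae1.parameters()) + list(nae2.parameters()):
            p.requires_grad_(False)                      # keep the trained models fixed
        for _ in range(n_iter):
            opt.zero_grad()
            X1 = nae1.g(nae1.decoder(H1))                # g(W1 · Hx1)
            X2 = nae2.g(nae2.decoder(H2))                # g(W2 · Hx2)
            loss = kl_loss(X, X1 + X2)                   # fit the mixture spectrogram
            loss.backward()
            opt.step()
        with torch.no_grad():                            # final separated spectrograms
            return nae1.g(nae1.decoder(H1)), nae2.g(nae2.decoder(H2))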

Page 13: Evaluation

• Two-speaker mixtures
  • Training data: ~20-25 seconds
  • Test data: a single sentence from the known speakers
• Evaluation metrics
  • BSS_eval metrics (SDR, SIR, SAR)
  • STOI (intelligibility measure)
• Compared multi-layer and shallow versions
  • With multiple ranks (numbers of hidden units)

Page 14: NMF vs NAE

• Number of NAE layers = 2
• Number of hidden units = 20

Page 15: NMF vs NAE

• Number of NAE layers = 4
• Number of hidden units = 20

Page 16: NMF vs NAE

• Number of NAE layers = 2
• Number of hidden units = 100

Page 17: NMF vs NAE

• Number of NAE layers = 4
• Number of hidden units = 100

Page 18: Shallow vs Multi-layer NAE

• Shallow NAEs give comparable performance over all ranks
• Multi-layer NAEs require higher ranks

Page 19: Conclusions

• NAE models can replace NMF
  • This allows us to generalize to more complex structures
• NAE models are superior to NMF models
  • Shallow NAEs are comparable to NMF
  • Multi-layer NAEs outperform NMF significantly
• Future directions
  • Incorporating more exotic neural models (LSTMs, CNNs, etc.)

Page 20: Thank You

THANK YOU
