A Neural Network Alternative to Non-negative Audio Models
Paris Smaragdis#*, Shrikant Venkataramani*
*University of Illinois at Urbana-Champaign, #Adobe Research
ICASSP 2017
Motivation
• Supervised single-channel source separation
  • Using models trained from clean sounds
• Two dominant approaches
  • Non-negative Matrix Factorization (NMF): reusable and interpretable models
  • Deep learning: state-of-the-art results, but non-transferable models
• Neural network formulation of NMF models
  • Maintaining reusability while taking advantage of deep-learning structures
Transferable Models
• Being able to plug in trained models
[Figure: comparing model reuse in NMF and a DNN (labels: H, NMF, DNN)]
Learning an NMF model
• Learning spectral bases from spectrograms
[Figure: magnitude spectrogram X factored into spectral bases W and activations H]

X = W · H,  where X, W, H ∈ ℝ₊
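The factorization above is typically learned with multiplicative updates. A minimal NumPy sketch of the standard Lee-Seung updates for the generalized KL divergence (not the authors' code; `nmf_kl` and its parameters are illustrative):

```python
import numpy as np

def nmf_kl(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates minimizing KL(X || W·H).

    X: non-negative matrix (freq x time), e.g. a magnitude spectrogram.
    Returns non-negative factors W (freq x rank) and H (rank x time).
    """
    rng = np.random.default_rng(seed)
    F, T = X.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    ones = np.ones_like(X)
    for _ in range(n_iter):
        V = W @ H + eps                       # current reconstruction
        W *= ((X / V) @ H.T) / (ones @ H.T + eps)
        V = W @ H + eps
        H *= (W.T @ (X / V)) / (W.T @ ones + eps)
    return W, H
```

Because the updates are multiplicative and the initialization is positive, W and H stay non-negative throughout.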
NMF in action
• Analyzing piano notes
NMF as an Auto-encoder
[Figure: auto-encoder with analysis weights W∗ mapping X to H, and synthesis weights W mapping H back to X̂]

NMF:  X = W · H
NAE:  H = W∗ · X  such that H ≥ 0
      X̂ = W · H  such that W ≥ 0
Removing Non-negativity constraints
• Enforcing non-negativity is cumbersome
• Non-negative layer outputs
  • Result in a magnitude spectrogram at the output
  • Result in a non-negative activation at the hidden layer
  • Can be enforced with an activation function

g(x) = max(x, 0)  or  |x|  or  ln(1 + eˣ)

H = g(W∗ · X),  X̂ = g(W · H)
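The forward pass above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation; `nae_forward`, the weight names, and the choice of softplus for g are assumptions:

```python
import numpy as np

def softplus(x):
    # smooth non-negative activation: ln(1 + e^x)
    return np.log1p(np.exp(x))

def nae_forward(X, W_enc, W_dec):
    """One-layer non-negative auto-encoder forward pass.

    W_enc plays the role of W* (analysis), W_dec of W (synthesis).
    The softplus keeps H and X_hat non-negative even though the
    weights themselves are unconstrained.
    """
    H = softplus(W_enc @ X)        # non-negative activations
    X_hat = softplus(W_dec @ H)    # non-negative reconstruction
    return H, X_hat
```

Unlike NMF, nothing constrains the weights here; only the layer outputs are forced non-negative by the activation function.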
NAE in action
• Bases are not guaranteed to be non-negative
  • Results in dense activations
• Adding sparsity to the hidden-layer output
  • Results in sparse activations
  • Intuitive bases and activations

Without sparsity: KL(X ‖ g(W · H))
With sparsity:    KL(X ‖ g(W · H)) + λ‖H‖₁
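The two training costs above (generalized KL divergence, optionally with an L1 sparsity penalty on H) can be written directly. A sketch under the assumption that `X_hat` is the network output g(W·H); `kl_cost`, `lam`, and `eps` are illustrative names:

```python
import numpy as np

def kl_cost(X, X_hat, H=None, lam=0.0, eps=1e-9):
    """Generalized KL divergence KL(X || X_hat), optionally adding an
    L1 sparsity penalty lam * ||H||_1 on the hidden activations."""
    kl = np.sum(X * np.log((X + eps) / (X_hat + eps)) - X + X_hat)
    if H is not None:
        kl += lam * np.abs(H).sum()
    return kl
```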
Advantages of NAE
• Difficult to extend NMF models
• Easy to do with neural nets
  • LSTMs, CNNs, multi-layer networks, etc.
[Figures: a Recurrent NAE (X, W∗, H, W, X̂) and a Multi-layer NAE (X, W∗1, W∗2, H1, H2, W2, W1, X̂)]
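A multi-layer NAE is just a stack of these non-negative layers. A minimal sketch (illustrative; `multilayer_nae` and the softplus choice are assumptions, not the authors' code):

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def multilayer_nae(X, enc_weights, dec_weights):
    """Multi-layer NAE: a stack of softplus layers for the encoder
    (W*_1, W*_2, ...) and decoder (W_2, W_1, ...).  Returns the deepest
    hidden activation and the reconstruction, both non-negative."""
    H = X
    for W in enc_weights:           # encoder: H_k = g(W*_k · H_{k-1})
        H = softplus(W @ H)
    X_hat = H
    for W in dec_weights:           # decoder mirrors the encoder
        X_hat = softplus(W @ X_hat)
    return H, X_hat
```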
NMF source separation
• Spectrogram of the mixture is the sum of the source spectrograms

Estimate models for each source (W1 from training data for source 1, W2 from training data for source 2), then explain the contribution of each source in an unknown mixture using the trained models:

Estimate Hx1, Hx2 such that
  X = W1 · Hx1 + W2 · Hx2
  X1 = W1 · Hx1
  X2 = W2 · Hx2
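The mixture-fitting step can be sketched by reusing the KL multiplicative update for H alone, with the trained bases held fixed. An illustrative NumPy sketch (`fit_activations` and `separate` are assumed names, not the authors' code):

```python
import numpy as np

def fit_activations(X, W, n_iter=200, eps=1e-9, seed=0):
    """With bases W fixed, estimate non-negative activations H that
    explain X under KL(X || W·H) (multiplicative H-updates only)."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], X.shape[1])) + eps
    ones = np.ones_like(X)
    for _ in range(n_iter):
        V = W @ H + eps
        H *= (W.T @ (X / V)) / (W.T @ ones + eps)
    return H

def separate(X_mix, W1, W2, **kw):
    """Supervised NMF separation: stack per-source bases, fit
    activations on the mixture, then split the reconstruction."""
    W = np.hstack([W1, W2])
    H = fit_activations(X_mix, W, **kw)
    H1, H2 = H[:W1.shape[1]], H[W1.shape[1]:]
    return W1 @ H1, W2 @ H2        # estimates of X1 and X2
```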
NAE source separation
• Estimate models for each source (from training data for source 1 and source 2), then explain the contribution of each source in an unknown mixture using the trained models
NAE source separation
• Goal: estimating network inputs instead of the weights
• Gradient descent / back-propagation to train the network
• Separated spectrograms:

X = g(W1 · Hx1) + g(W2 · Hx2)
X1 = g(W1 · Hx1)
X2 = g(W2 · Hx2)
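The "estimate the inputs" idea can be sketched with plain gradient descent on the hidden activations, holding the trained decoder weights fixed. An illustrative sketch only: it uses squared error instead of the paper's KL cost to keep the hand-written gradients short, and `separate_nae`, `lr`, and the softplus choice are assumptions:

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def sigmoid(x):
    # derivative of softplus
    return 1.0 / (1.0 + np.exp(-x))

def separate_nae(X, W1, W2, n_iter=1000, lr=0.005, seed=0):
    """Separate a mixture X by gradient descent on the hidden
    activations H1, H2 of two trained NAE decoders (W1, W2 fixed),
    fitting X ≈ g(W1·H1) + g(W2·H2)."""
    rng = np.random.default_rng(seed)
    H1 = rng.standard_normal((W1.shape[1], X.shape[1])) * 0.1
    H2 = rng.standard_normal((W2.shape[1], X.shape[1])) * 0.1
    for _ in range(n_iter):
        P1, P2 = W1 @ H1, W2 @ H2
        X_hat = softplus(P1) + softplus(P2)
        G = 2.0 * (X_hat - X)                 # d loss / d X_hat
        H1 -= lr * (W1.T @ (G * sigmoid(P1))) # back-prop through g
        H2 -= lr * (W2.T @ (G * sigmoid(P2)))
    return softplus(W1 @ H1), softplus(W2 @ H2)  # X1, X2 estimates
```

Note that only H1 and H2 are updated; the decoders learned from clean training data are reused untouched, which is exactly the transferability argument of the talk.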
Evaluation
• Two-speaker mixtures
  • Training data: ~20-25 seconds
  • Test data: a single sentence from known speakers
• Evaluation metrics
  • BSS_eval metrics (SDR, SIR, SAR)
  • STOI (intelligibility measure)
• Compared multi-layer and shallow versions
  • With multiple ranks (numbers of hidden units)
NMF vs NAE
• Number of NAE layers = 2
• Number of hidden units = 20
NMF vs NAE
• Number of NAE layers = 4
• Number of hidden units = 20
NMF vs NAE
• Number of NAE layers = 2
• Number of hidden units = 100
NMF vs NAE
• Number of NAE layers = 4
• Number of hidden units = 100
Shallow vs Multi-layer NAE
• Shallow NAEs give comparable performance over all ranks
• Multi-layer NAEs require higher ranks
Conclusions
• NAE models can replace NMF
  • This allows us to generalize to complex structures
• NAE models are superior to NMF models
  • Shallow NAEs are comparable to NMF
  • Multi-layer NAEs outperform NMF significantly
• Future directions
  • Incorporating exotic neural models (LSTMs, CNNs, etc.)
THANK YOU