
Noise Compensation for Subspace Gaussian Mixture Models

Liang Lu
University of Edinburgh

Joint work with KK Chin, A. Ghoshal and S. Renals

Liang Lu, Interspeech, September, 2012


Outline

- Motivation
  - Subspace GMMs (SGMMs) work well in matched speech conditions [Povey et al., 2011]
  - In mismatched conditions (i.e. noise), the gain disappears
- Goal
  - Noise compensation for SGMMs
- Method
  - Model-space compensation
  - Joint uncertainty decoding (JUD) [Liao and Gales, 2005]


HMM-GMM acoustic model

(Figure: left-to-right HMM with states j − 1, j, j + 1)


Subspace Gaussian Mixture Models [Povey et al., 2011]

(Figure: HMM states j − 1, j, j + 1; each state is associated with vectors v_jk, tied to the global parameters M_i, w_i, Σ_i, i = 1, ..., I)

- Global parameters
  - M_i is the basis for the means
  - w_i is the basis for the weights
  - Σ_i is the covariance matrix
- State-dependent parameters
  - v_jk are low-dimensional vectors (e.g. 40-dim)
  - Gaussian means: μ_jki = M_i v_jk
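The parameter structure above can be put together in a toy NumPy sketch. Dimensions are illustrative, and the softmax form of the mixture weights follows [Povey et al., 2011]:

```python
import numpy as np

rng = np.random.default_rng(0)

D, S, I = 39, 40, 4          # feature dim, subspace dim, number of regions (toy sizes)

# Global parameters, shared across all HMM states
M = rng.standard_normal((I, D, S))   # M_i: bases for the means
w = rng.standard_normal((I, S))      # w_i: bases for the weights

# A state-dependent sub-state vector
v_jk = rng.standard_normal(S)

# Gaussian means: mu_jki = M_i v_jk, one Gaussian per region i
mu_jk = np.einsum('ids,s->id', M, v_jk)          # shape (I, D)

# Mixture weights via a softmax over regions:
# w_jki = exp(w_i . v_jk) / sum_i' exp(w_i' . v_jk)
logits = w @ v_jk
w_jk = np.exp(logits - logits.max())
w_jk /= w_jk.sum()
```

So a state stores only a 40-dimensional vector per sub-state, while the millions of component means are derived from the shared bases.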


Subspace Gaussian Mixture Models

- More intuitively, suppose we have an acoustic space like this

(Figure: illustration of an acoustic feature space)


Subspace Gaussian Mixture Models

- We then partition the whole acoustic space into I regions
- This can be done by learning a GMM on the training data

(Figure: acoustic space partitioned into regions 1, 2, 3, ..., I)
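The partitioning step can be sketched with a small EM routine for a diagonal-covariance GMM. This is a toy NumPy implementation for illustration, not the actual training recipe used in the paper:

```python
import numpy as np

def fit_diag_gmm(X, I, iters=25, seed=0):
    """EM for an I-component diagonal-covariance GMM (toy sketch)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Farthest-point initialization of the means
    idx = [int(rng.integers(N))]
    for _ in range(I - 1):
        d2 = np.min(((X[:, None, :] - X[idx]) ** 2).sum(-1), axis=1)
        idx.append(int(np.argmax(d2)))
    mu = X[idx].astype(float)
    var = np.tile(X.var(axis=0), (I, 1))
    pi = np.full(I, 1.0 / I)
    for _ in range(iters):
        # E-step: responsibilities gamma[t, i] = p(region i | x_t)
        logp = (np.log(pi)
                - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)
                - 0.5 * np.sum((X[:, None, :] - mu) ** 2 / var, axis=2))
        logp -= logp.max(axis=1, keepdims=True)
        gamma = np.exp(logp)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances
        Nk = gamma.sum(axis=0)
        pi = Nk / N
        mu = (gamma.T @ X) / Nk[:, None]
        var = (gamma.T @ X ** 2) / Nk[:, None] - mu ** 2 + 1e-6
    return pi, mu, var
```

Each training frame is then softly assigned to the I regions via the responsibilities.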


Subspace Gaussian Mixture Models

- We then introduce parameters to structure each region:
  - Σ_i models the covariance of the region
  - M_i spans the basis for the Gaussian means
  - w_i spans the basis for the Gaussian weights

(Figure: regions 1, 2, 3, ..., I, each equipped with M_i, w_i, Σ_i)


Subspace Gaussian Mixture Models

Given a class with some data, such as an HMM state

(Figure: HMM states j − 1, j, j + 1; the data points of state j are scattered over the regions 1, 2, 3, ..., I and summarized by the vector v_jk)


Subspace Gaussian Mixture Models

Then we learn a GMM for this class

(Figure: as before, with a GMM fitted to the data of the state; the state is represented by the vector v_jk)


Noise compensation

- Larger modelling power → higher recognition accuracy
- In our systems on Aurora 4, the number of Gaussians is 6.4M (SGMM) vs. 50k (GMM)
  - SGMM vs. GMM → 5.2% vs. 7.7% WER in the clean condition
  - SGMM vs. GMM → 59.9% vs. 59.3% WER in the noisy condition
- Can we do noise compensation for SGMMs?

(Figure: bar chart of WER, 0-60%, for GMM clean, SGMM clean, GMM noisy and SGMM noisy)


Noise compensation

There is a large body of work on noise compensation for robust ASR [Deng, 2011]:

- Feature domain
  - Spectral subtraction, CMN/CVN
  - Cepstral minimum mean square error estimation
  - Algonquin
  - SPLICE
  - Feature-space vector Taylor series (VTS)
- Model domain
  - MLLR, noise-constrained MLLR
  - PMC, data-driven PMC (DPMC), iterative DPMC
  - VTS, joint uncertainty decoding (JUD)
  - Linear spline interpolation (LSI)
  - Unscented transform (UT)
- Hybrid
  - Noise adaptive training


Noise compensation for SGMM

- Model-space compensation for SGMMs
- Not data-driven, but based on heuristic knowledge of how noise corrupts speech
- Mismatch function y = f(x, h, n, α) [Acero, 1990]
- α denotes the phase term between noise and speech [Deng et al., 2004]

(Figure: clean speech x passes through channel noise h and is corrupted by additive noise n, yielding noisy speech y)


Noise compensation for SGMM

The mismatch function is

  y = f(x, h, n, α)
    = x + h + C log[ 1 + exp(C⁻¹(n − x − h)) + 2α • exp(C⁻¹(n − x − h)/2) ]    (1)

where C is the DCT matrix, • denotes element-wise multiplication, and 2α • exp(C⁻¹(n − x − h)/2) is the phase term.
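Equation (1) can be evaluated directly. The following NumPy sketch assumes, for simplicity, a full square orthonormal DCT so that C⁻¹ is just the inverse DCT; in a real MFCC front end C is truncated and a pseudo-inverse is used:

```python
import numpy as np
from scipy.fft import dct, idct

def mismatch(x, h, n, alpha=0.0):
    """Cepstral-domain mismatch function y = f(x, h, n, alpha), eq. (1).

    x, h, n are cepstral vectors of clean speech, channel noise and
    additive noise; a full orthonormal DCT is assumed, so C^{-1} is the
    inverse DCT.
    """
    d = idct(n - x - h, norm='ortho')                # C^{-1}(n - x - h)
    inner = np.exp(d) + 2.0 * alpha * np.exp(d / 2)  # includes the phase term
    return x + h + dct(np.log1p(inner), norm='ortho')
```

As a sanity check, when the noise is far below the speech level the exponentials vanish and y reduces to x + h, as expected from eq. (1).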


Noise compensation

- Aim: estimate μ_y and Σ_y for each Gaussian component
- Difficulty: y = f(x, h, n, α) is highly nonlinear; there is no analytic solution!
- Solution: vector Taylor series (VTS) approximation [Moreno et al., 1996]
- Cost: real-time factor > 100, memory > 10 GB for a (medium-sized) SGMM with 6.4M Gaussians
- Inelegant: directly applying VTS destroys the compact structure of SGMMs
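The VTS approximation linearizes f around the expansion point (μ_x, μ_h, μ_n). A minimal sketch, using numerical Jacobians instead of the closed-form ones and treating the channel as fixed (function and variable names are illustrative):

```python
import numpy as np

def vts_compensate(mu_x, Sigma_x, mu_n, Sigma_n, mu_h, f, eps=1e-5):
    """First-order VTS compensation of one Gaussian (sketch).

    f(x, h, n) is the mismatch function; the expansion point is
    (mu_x, mu_h, mu_n). Jacobians are computed by central differences
    for clarity rather than from the analytic derivatives.
    """
    D = mu_x.size
    mu_y = f(mu_x, mu_h, mu_n)
    # Numerical Jacobians G = df/dx and F = df/dn at the expansion point
    G = np.empty((D, D))
    F = np.empty((D, D))
    for i in range(D):
        e = np.zeros(D); e[i] = eps
        G[:, i] = (f(mu_x + e, mu_h, mu_n) - f(mu_x - e, mu_h, mu_n)) / (2 * eps)
        F[:, i] = (f(mu_x, mu_h, mu_n + e) - f(mu_x, mu_h, mu_n - e)) / (2 * eps)
    Sigma_y = G @ Sigma_x @ G.T + F @ Sigma_n @ F.T
    return mu_y, Sigma_y
```

The cost quoted above comes from repeating this per Gaussian: with 6.4M components the Jacobians and full-covariance updates dominate both time and memory.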


Noise compensation

- Solution: joint uncertainty decoding (JUD)

(Figure: schematic comparison of VTS and JUD compensation)


Noise compensation

- Applying JUD to SGMMs: compensation is shared across the regions 1, 2, 3, ..., I

(Figure: acoustic space regions 1, 2, 3, ..., I)

- Cost: real-time factor ∼ 10 for an SGMM with 6.4M Gaussians


Experiments

- Database
  - Aurora 4 dataset
  - Clean speech and noisy speech with SNR 5-15 dB
  - Close-talking and desk-mounted microphones
  - ∼15 hours of training data
  - 330 test utterances
- System configuration
  - 39-dim MFCC features
  - #triphone states: 3.1k (GMM) vs. 3.9k (SGMM)
  - #Gaussians: 50k (GMM) vs. 6.4M (SGMM)
  - #regression classes: 112 (GMM) vs. 400 (SGMM)


Noise compensation experiments

(Figure: bar chart of WER, 0-70%, for the baseline, JUD and VTS systems, comparing GMM and SGMM)


Experiments

Results obtained by tuning the value of the phase factor:

(Figure: word error rate (%) against the value of the phase factor, from −0.5 to 4.0, for the VTS/GMM, JUD/GMM and JUD/SGMM systems)

- The JUD/SGMM system achieved 16.8% WER on the Aurora 4 database


Remarks

- The phase term is very effective for noise compensation
- Similar improvements were also observed in other studies, e.g. [Li et al., 2009]
- A possible reason is that it compensates for the linearization bias and performs domain compensation [Li et al., 2009]
- Our insight is that it may also help to avoid over-estimation of the noise model


Conclusion

- The SGMM is a promising alternative for acoustic modelling
- Noise compensation using JUD works well for SGMMs
- The phase term is particularly effective for noise compensation
- Future work will cover noise adaptive training and compensation in the log-spectral domain


Noise compensation

- With JUD, the marginal likelihood can be obtained as

  p(y | m) ≈ |A⁽ʳ⁾| N(A⁽ʳ⁾y + b⁽ʳ⁾; μ_m, Σ_m + Σ_b⁽ʳ⁾)    (2)

- The transformation is applied in the feature space, to each frame
- Computation is saved since #frames ≪ #Gaussians
- The bias covariance Σ_b⁽ʳ⁾ must be diagonalized in GMM systems, but not in the SGMM system, since we use full covariance matrices
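As a sketch of how eq. (2) is scored per frame, assuming the transform A⁽ʳ⁾, bias b⁽ʳ⁾ and uncertainty bias Σ_b⁽ʳ⁾ for regression class r have already been estimated (all names illustrative):

```python
import numpy as np

def jud_loglik(y, A_r, b_r, mu_m, Sigma_m, Sigma_b_r):
    """log p(y|m) under JUD, eq. (2): the frame y is transformed once per
    regression class r, and the uncertainty bias Sigma_b^(r) is added to
    the model covariance (full covariances, as in the SGMM system)."""
    z = A_r @ y + b_r                       # feature-space transform
    S = Sigma_m + Sigma_b_r                 # inflated model covariance
    D = y.size
    diff = z - mu_m
    _, logdet_S = np.linalg.slogdet(S)
    log_gauss = -0.5 * (D * np.log(2 * np.pi) + logdet_S
                        + diff @ np.linalg.solve(S, diff))
    _, logdet_A = np.linalg.slogdet(A_r)    # log |A^(r)| (Jacobian term)
    return logdet_A + log_gauss
```

Since the transform is applied once per frame and region rather than once per Gaussian, the per-component work reduces to a covariance bias, which is where the saving over VTS comes from.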


Experiments

Table: GMM systems with α = 0 (WER, %).

  Methods        Clean   Avg
  Clean model    7.7     59.3
  MTR model      12.7    26.9
  VTS            7.3     18.3
  JUD            7.0     21.1

Table: SGMM systems with α = 0 (WER, %).

  Methods        Clean   Avg
  Clean model    5.2     59.9
  MTR model      6.8     22.2
  JUD            5.3     20.3


References

Acero, A. (1990). Acoustic and Environmental Robustness in Automatic Speech Recognition. PhD thesis, Carnegie Mellon University.

Deng, L., Droppo, J., and Acero, A. (2004). Enhancement of log mel power spectra of speech using a phase-sensitive model of the acoustic environment and sequential estimation of the corrupting noise. IEEE Transactions on Speech and Audio Processing, 12(2):133–143.

Droppo, J., Acero, A., and Deng, L. (2002). Uncertainty decoding with SPLICE for noise robust speech recognition. In Proc. ICASSP. IEEE.

Gales, M. (1995). Model-based techniques for noise robust speech recognition. PhD thesis, Cambridge University.


Hu, Y. and Huo, Q. (2006). An HMM compensation approach using unscented transformation for noisy speech recognition. Chinese Spoken Language Processing, pages 346–357.

Li, J., Deng, L., Yu, D., Gong, Y., and Acero, A. (2009). A unified framework of HMM adaptation with joint compensation of additive and convolutive distortions. Computer Speech & Language, 23(3):389–405.

Liao, H. and Gales, M. (2005). Joint uncertainty decoding for noise robust speech recognition. In Proc. INTERSPEECH.

Moreno, P., Raj, B., and Stern, R. (1996). A vector Taylor series approach for environment-independent speech recognition. In Proc. ICASSP, volume 2, pages 733–736. IEEE.


Povey, D., Burget, L., Agarwal, M., Akyazi, P., Kai, F., Ghoshal, A., Glembek, O., Goel, N., Karafiat, M., Rastrow, A., Rose, R., Schwarz, P., and Thomas, S. (2011). The subspace Gaussian mixture model—A structured model for speech recognition. Computer Speech & Language, 25(2):404–439.
