Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

Page 1: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis

for Speech Recognition

Bing Zhang and Spyros Matsoukas, BBN Technologies

Presented by Shih-Hung Liu, 2006/05/16

Page 2: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

Outline

• Review PCA, LDA, HLDA
• Introduction
• MPE-HLDA
• Experimental results
• Conclusions

Page 3: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

Review - PCA

$$T = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar X)(x_i - \bar X)^T$$

$$\bar X = \frac{1}{N}\sum_{i=1}^{N} x_i$$

$$y_i = \theta_p^T x_i$$
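The PCA review above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `pca_projection` and the toy data are hypothetical, and the projection directions are taken to be the leading eigenvectors of the total covariance $T$.

```python
import numpy as np

def pca_projection(x, p):
    """Return (theta_p, y): the top-p principal directions and projected data.

    x is an (N, n) array with one sample per row.
    """
    x_mean = x.mean(axis=0)                       # sample mean, X-bar
    centered = x - x_mean
    total_cov = centered.T @ centered / len(x)    # total covariance T
    eigvals, eigvecs = np.linalg.eigh(total_cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:p]         # indices of the p largest
    theta_p = eigvecs[:, order]                   # (n, p) projection matrix
    y = x @ theta_p                               # y_i = theta_p^T x_i, row-wise
    return theta_p, y

# Hypothetical toy data: most of the variance lies along the first axis.
rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.1])
theta_p, y = pca_projection(samples, p=2)
```

PCA ignores class labels, which is why the following slides move on to LDA and HLDA.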

Page 4: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

Review - LDA

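$$\hat\theta_p = \arg\max_{\theta_p} \frac{|\theta_p^T B\,\theta_p|}{|\theta_p^T W\,\theta_p|}$$

$$W = \frac{1}{N}\sum_{j=1}^{J} N_j W_j$$

$$B = \frac{1}{N}\sum_{j=1}^{J} N_j (\bar X_j - \bar X)(\bar X_j - \bar X)^T$$

$$W_j = \frac{1}{N_j}\sum_{g(i)=j}(x_i - \bar X_j)(x_i - \bar X_j)^T,\qquad j = 1,\dots,J$$

$$y_i = \theta_p^T x_i$$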
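A minimal NumPy sketch of the LDA review above, assuming the standard solution in which the projection is built from the leading eigenvectors of $W^{-1}B$. The function name `lda_projection` and the two-class toy data are hypothetical.

```python
import numpy as np

def lda_projection(x, labels, p):
    """Return (theta_p, W, B) for samples x (N, n) and integer class labels (N,)."""
    n_samples, dim = x.shape
    global_mean = x.mean(axis=0)
    within = np.zeros((dim, dim))    # W: pooled within-class covariance
    between = np.zeros((dim, dim))   # B: between-class covariance
    for j in np.unique(labels):
        x_j = x[labels == j]
        mean_j = x_j.mean(axis=0)
        centered = x_j - mean_j
        within += centered.T @ centered            # equals N_j * W_j
        diff = (mean_j - global_mean)[:, None]
        between += len(x_j) * (diff @ diff.T)
    within /= n_samples
    between /= n_samples
    # Maximizing |theta^T B theta| / |theta^T W theta| leads to the leading
    # eigenvectors of W^{-1} B.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(within, between))
    order = np.argsort(eigvals.real)[::-1][:p]
    theta_p = eigvecs[:, order].real
    return theta_p, within, between

# Hypothetical toy data: the classes differ only along the low-variance axis 1.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 400)
x = rng.normal(size=(800, 2)) * np.array([5.0, 1.0])
x[labels == 1, 1] += 3.0
theta_p, within, between = lda_projection(x, labels, p=1)
```

On this data PCA would keep axis 0 (largest variance), while LDA keeps axis 1, where the classes actually separate.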

Page 5: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

Review - HLDA

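$$P(x_i) = \frac{|\theta|}{\sqrt{(2\pi)^n\,|\Sigma_{g(i)}|}}\; e^{-\frac{1}{2}(y_i - \mu_{g(i)})^T \Sigma_{g(i)}^{-1}(y_i - \mu_{g(i)})} = |\theta|\,P(y_i)$$

$$\mu_j = [\mu_{j,1},\dots,\mu_{j,p},\,\mu_{0,p+1},\dots,\mu_{0,n}]^T = \begin{bmatrix}\mu_j^{p} \\ \mu_0\end{bmatrix}$$

$$\Sigma_j = \begin{bmatrix}\Sigma_j^{p} & 0 \\ 0 & \Sigma_0^{(n-p)}\end{bmatrix}$$

$$y_i = \theta^T x_i$$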

Page 6: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

Review - HLDA

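$$\log L(\mu_j, \Sigma_j, \theta; \{x_i\}) = \sum_{i=1}^{N}\log P(x_i) = -\frac{1}{2}\sum_{i=1}^{N}\left[(\theta^T x_i - \mu_{g(i)})^T \Sigma_{g(i)}^{-1}(\theta^T x_i - \mu_{g(i)}) + \log\left((2\pi)^n |\Sigma_{g(i)}|\right)\right] + N\log|\theta|$$

$$\hat\mu_j^{p} = \theta_p^T \bar X_j,\qquad j = 1,\dots,J$$

$$\hat\mu_0 = \theta_{n-p}^T \bar X$$

$$\hat\Sigma_j^{p} = \theta_p^T W_j\,\theta_p,\qquad j = 1,\dots,J$$

$$\hat\Sigma^{(n-p)} = \theta_{n-p}^T\, T\,\theta_{n-p}$$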
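Substituting the maximum likelihood means and covariances back into the log-likelihood gives a function of $\theta$ alone, which is what HLDA optimizes. The sketch below evaluates that function with NumPy. It assumes full covariances in the retained subspace, as in the review equations; the name `hlda_log_likelihood` and the toy data are hypothetical.

```python
import numpy as np

def hlda_log_likelihood(theta, x, labels, p):
    """Log-likelihood of the HLDA model at the ML means and covariances.

    theta is a full-rank (n, n) matrix whose first p columns are theta_p.
    """
    n_samples, dim = x.shape
    theta_p, theta_rest = theta[:, :p], theta[:, p:]
    centered = x - x.mean(axis=0)
    total_cov = centered.T @ centered / n_samples            # T
    _, logdet_theta = np.linalg.slogdet(theta)                # log |det theta|
    log_lik = n_samples * logdet_theta
    log_lik -= 0.5 * n_samples * dim * (1.0 + np.log(2.0 * np.pi))
    if p < dim:
        # Rejected dimensions share one Gaussian: theta_{n-p}^T T theta_{n-p}
        _, logdet_rest = np.linalg.slogdet(theta_rest.T @ total_cov @ theta_rest)
        log_lik -= 0.5 * n_samples * logdet_rest
    for j in np.unique(labels):
        x_j = x[labels == j]
        c_j = x_j - x_j.mean(axis=0)
        w_j = c_j.T @ c_j / len(x_j)                          # W_j
        _, logdet_j = np.linalg.slogdet(theta_p.T @ w_j @ theta_p)
        log_lik -= 0.5 * len(x_j) * logdet_j
    return log_lik

# Hypothetical toy data: the two classes differ only along axis 1.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 200)
x = rng.normal(size=(400, 2))
x[labels == 1, 1] += 3.0
keep_axis1 = np.array([[0.0, 1.0], [1.0, 0.0]])   # theta_p = e_1
keep_axis0 = np.eye(2)                            # theta_p = e_0
```

With `p = 1`, retaining the discriminative axis (`keep_axis1`) scores a higher likelihood than retaining axis 0, and with `p = n` the value does not depend on `theta` at all.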

Page 7: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

Introduction for MPE-HLDA

• In speech recognition systems, feature analysis is usually employed for better classification accuracy and complexity control.

• In recent years, extensions to the classical LDA have been widely adopted.

• Among them, HDA seeks to remove the equal-variance constraint imposed by LDA.

• ML-HLDA takes the HMM structure (e.g., diagonal-covariance Gaussian mixture state distributions) into consideration.

Page 8: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

Introduction for MPE-HLDA

• Despite the differences between the above techniques, they have some common limitations.

• First, none of them assumes any prior knowledge of confusable hypotheses, so their feature choices are bound to be suboptimal for recognition.

• Second, their objective functions are not directly related to the word error rate (WER).

• For example, we found that HLDA could select totally non-discriminant features while improving its objective function by mapping all training samples to a single point in space along some dimensions

Page 9: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

Introduction

• LDA and HLDA
  – Better classification accuracy
  – Some common limitations
    • None of them assumes any prior knowledge of confusable hypotheses
    • Their objective functions do not directly relate to the word error rate (WER)
  – As a result, it is often unknown whether the selected features will do well in testing by just looking at the values of the objective functions

• Minimum Phoneme Error
  – Minimize phoneme errors in lattice-based training frameworks
  – Since this criterion is closely related to WER, MPE-HLDA tends to be more robust than other projection methods, which makes it potentially better suited for a wider variety of features

Page 10: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

MPE-HLDA

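• MPE-HLDA model

$$\hat\mu_m = A\mu_m,\qquad \hat C_m = \mathrm{diag}(A\Sigma_m A^T),\qquad \hat o_t = A o_t$$

where $A$ is the global $p \times n$ feature projection matrix.

• MPE-HLDA aims at minimizing the expected number of phoneme errors introduced by the MPE-HLDA model $(\hat\mu_m, \hat C_m)$ in a given hypothesis lattice, or equivalently at maximizing the function

$$F_{MPE}(O) = \sum_{r=1}^{R}\sum_{w_r} P(w_r \mid O_r)\,\epsilon(w_r) \qquad (4)$$

where $R$ is the total number of training utterances, $O_r$ is the sequence of p-dimensional observation vectors in utterance $r$, and $\epsilon(w_r)$ is the "raw accuracy" score of word hypothesis $w_r$.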
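As a rough illustration of the MPE-HLDA model, the sketch below projects one full-covariance Gaussian and one observation with a $p \times n$ matrix and evaluates the resulting diagonal-covariance density. The function names and the numbers are hypothetical, chosen only to make the projection easy to check by hand.

```python
import numpy as np

def project_gaussian(a, mu, sigma):
    """Project one full-covariance Gaussian (mu, sigma) with a (p, n) matrix a."""
    mu_hat = a @ mu                       # mu_hat_m = A mu_m
    c_hat = np.diag(a @ sigma @ a.T)      # diagonal of A Sigma_m A^T, as a vector
    return mu_hat, c_hat

def project_observation(a, o_t):
    """o_hat_t = A o_t."""
    return a @ o_t

def diag_gaussian_log_prob(o_hat, mu_hat, c_hat):
    """log N(o_hat; mu_hat, diag(c_hat))."""
    diff = o_hat - mu_hat
    return -0.5 * (len(diff) * np.log(2.0 * np.pi)
                   + np.sum(np.log(c_hat)) + np.sum(diff * diff / c_hat))

# Hypothetical example: keep the first two of three dimensions.
a = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
mu = np.array([1.0, 2.0, 3.0])
sigma = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 4.0]])
mu_hat, c_hat = project_gaussian(a, mu, sigma)
```

Only the diagonal of $A\Sigma_m A^T$ is kept, so the off-diagonal 0.5 in `sigma` has no effect on the projected model here.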

Page 11: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

MPE-HLDA

$P(w_r \mid O_r)$ is the posterior probability of hypothesis $w_r$ in the lattice:

$$P(w_r \mid O_r) = \frac{P(O_r \mid w_r)^{k}\,P(w_r)}{\sum_{w'_r} P(O_r \mid w'_r)^{k}\,P(w'_r)}$$

$P(w_r)$ is the language model probability of hypothesis $w_r$.

$k$ is a scaling factor, used in order to reduce the dynamic range of the acoustic scores, thereby avoiding the concentration of all posterior mass in the top-1 hypothesis of the lattice.
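The effect of the scaling factor $k$ can be seen on a small list of competing hypotheses. The sketch below is an assumption-laden stand-in for a lattice: it normalizes over three explicitly listed hypotheses, and all scores and accuracies are hypothetical. It also evaluates the per-utterance term of the objective (4).

```python
import math

def hypothesis_posteriors(acoustic_logp, lm_logp, k):
    """P(w|O) proportional to P(O|w)^k P(w), normalized over the listed hypotheses."""
    scores = [k * a + l for a, l in zip(acoustic_logp, lm_logp)]
    top = max(scores)                                  # subtract max for stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def mpe_objective(posteriors, raw_accuracy):
    """Expected raw accuracy of one utterance: sum_w P(w|O) * eps(w)."""
    return sum(p * e for p, e in zip(posteriors, raw_accuracy))

acoustic = [-1000.0, -1010.0, -1025.0]   # hypothetical log P(O|w) values
lm = [-12.0, -10.0, -11.0]               # hypothetical log P(w) values
accuracy = [3.0, 4.0, 2.0]               # hypothetical raw accuracies eps(w)
sharp = hypothesis_posteriors(acoustic, lm, k=1.0)
smooth = hypothesis_posteriors(acoustic, lm, k=0.1)
```

With `k=1.0` nearly all posterior mass sits on the first hypothesis, while `k=0.1` spreads it across the list, so the competing hypotheses can actually influence the objective.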

Page 12: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

MPE-HLDA

• It can be shown that the derivative of (4) with respect to A is

$$\frac{\partial F_{MPE}(O)}{\partial A} = k\sum_{r=1}^{R}\sum_{q_r} D(q_r, r)\,\frac{\partial \log P(O_r \mid q_r)}{\partial A} \qquad (6)$$

where

$$D(q_r, r) = P(q_r \mid O_r)\,[\epsilon(q_r) - \epsilon(r)]$$

$\epsilon(r)$ is the MPE score of utterance $r$ (average accuracy over all hypotheses), and $\epsilon(q_r)$ is the average accuracy over all hypotheses that contain arc $q_r$.

Page 13: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

MPE-HLDA

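$$\frac{\partial \log P(O_r \mid q_r)}{\partial A} = \sum_{t=S_{q_r}}^{E_{q_r}}\sum_{m}\gamma_{q_r}^{m}(t)\,\frac{\partial \log P(\hat o_t \mid m)}{\partial A}$$

$S_{q_r}$ and $E_{q_r}$ are the begin and end times of arc $q_r$, and $\gamma_{q_r}^{m}(t)$ denotes the posterior probability of Gaussian $m$ in arc $q_r$ at time $t$.

$$\frac{\partial \log P(\hat o_t \mid m)}{\partial A} = \hat C_m^{-1}\left(\hat C_m^{-1} P_{mt} - I_p\right) A\Sigma_m - \hat C_m^{-1} R_{mt}$$

$$= \hat C_m^{-1}\hat C_m^{-1}\,\mathrm{diag}\left[(\hat o_t - \hat\mu_m)(\hat o_t - \hat\mu_m)^T\right] A\Sigma_m - \hat C_m^{-1} I_p A\Sigma_m - \hat C_m^{-1}(\hat o_t - \hat\mu_m)(o_t - \mu_m)^T$$

where

$$P_{mt} = \mathrm{diag}\left[(\hat o_t - \hat\mu_m)(\hat o_t - \hat\mu_m)^T\right]$$

$$R_{mt} = (\hat o_t - \hat\mu_m)(o_t - \mu_m)^T$$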
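The closed-form derivative of the per-frame log-likelihood can be checked numerically. The sketch below assumes the model $\hat\mu_m = A\mu_m$, $\hat C_m = \mathrm{diag}(A\Sigma_m A^T)$, $\hat o_t = A o_t$ and compares the expression built from $P_{mt}$ and $R_{mt}$ against central finite differences. All function names and the random test values are hypothetical.

```python
import numpy as np

def log_prob(a, o_t, mu, sigma):
    """log N(A o_t; A mu, diag(A Sigma A^T))."""
    diff = a @ (o_t - mu)
    c_hat = np.diag(a @ sigma @ a.T)
    return -0.5 * (len(diff) * np.log(2.0 * np.pi)
                   + np.sum(np.log(c_hat)) + np.sum(diff ** 2 / c_hat))

def log_prob_gradient(a, o_t, mu, sigma):
    """Closed-form derivative of log_prob with respect to A, shape (p, n)."""
    p = a.shape[0]
    diff_hat = a @ (o_t - mu)                         # o_hat_t - mu_hat_m
    c_inv = np.diag(1.0 / np.diag(a @ sigma @ a.T))   # C_hat_m^{-1}
    p_mt = np.diag(diff_hat ** 2)                     # diag[(..)(..)^T]
    r_mt = np.outer(diff_hat, o_t - mu)               # (p, n)
    return c_inv @ (c_inv @ p_mt - np.eye(p)) @ a @ sigma - c_inv @ r_mt

def numerical_gradient(a, o_t, mu, sigma, step=1e-6):
    """Central finite differences, one entry of A at a time."""
    grad = np.zeros_like(a)
    for idx in np.ndindex(*a.shape):
        bump = np.zeros_like(a)
        bump[idx] = step
        grad[idx] = (log_prob(a + bump, o_t, mu, sigma)
                     - log_prob(a - bump, o_t, mu, sigma)) / (2.0 * step)
    return grad

# Hypothetical random test case with p = 2 and n = 4.
rng = np.random.default_rng(1)
a = rng.normal(size=(2, 4))
b = rng.normal(size=(4, 4))
sigma = b @ b.T + np.eye(4)                           # symmetric positive definite
mu, o_t = rng.normal(size=4), rng.normal(size=4)
analytic = log_prob_gradient(a, o_t, mu, sigma)
numeric = numerical_gradient(a, o_t, mu, sigma)
```

Agreement between `analytic` and `numeric` is a quick sanity check on the signs and on which covariance (projected diagonal or original full) appears in each factor.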

Page 14: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

MPE-HLDA

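• Therefore, Eq.(6) can be rewritten as

$$\frac{\partial F_{MPE}(O)}{\partial A} = k\sum_{m}\hat C_m^{-1}\left(\hat C_m^{-1} g_m - \gamma_m I_p\right) A\Sigma_m - kJ$$

where

$$\gamma_m = \sum_{r}\sum_{q_r}\sum_{t=S_{q_r}}^{E_{q_r}} D(q_r, r)\,\gamma_{q_r}^{m}(t)$$

$$g_m = \sum_{r}\sum_{q_r}\sum_{t=S_{q_r}}^{E_{q_r}} D(q_r, r)\,\gamma_{q_r}^{m}(t)\,P_{mt}$$

$$J = \sum_{m}\hat C_m^{-1}\sum_{r}\sum_{q_r}\sum_{t=S_{q_r}}^{E_{q_r}} D(q_r, r)\,\gamma_{q_r}^{m}(t)\,R_{mt}$$

Sizes noted on the slide: $g_m$ is 39*39 and $J$ is 39*162.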
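The rewritten derivative separates what must be accumulated per frame ($\gamma_m$, $g_m$, $J$) from what can be combined afterwards with the full covariances. A minimal sketch of that split follows, assuming the per-frame weights $D(q_r, r)\,\gamma_{q_r}^{m}(t)$ are already available as a plain array; in a real system they would come from the lattice forward-backward pass. All names and the random data are hypothetical.

```python
import numpy as np

def accumulate_statistics(a, frames, weights, mus, sigmas):
    """Accumulate gamma_m, g_m and J; weights[t][m] stands for D * gamma at frame t."""
    p, n = a.shape
    gamma = np.zeros(len(mus))
    g = np.zeros((len(mus), p, p))
    j_stat = np.zeros((p, n))
    for t, o_t in enumerate(frames):
        for m in range(len(mus)):
            w = weights[t][m]
            diff = o_t - mus[m]
            diff_hat = a @ diff
            c_inv = np.diag(1.0 / np.diag(a @ sigmas[m] @ a.T))
            gamma[m] += w
            g[m] += w * np.diag(diff_hat ** 2)                 # w * P_mt
            j_stat += c_inv @ (w * np.outer(diff_hat, diff))   # C^-1 * w * R_mt
    return gamma, g, j_stat

def mpe_derivative(a, gamma, g, j_stat, sigmas, k):
    """k * sum_m C^-1 (C^-1 g_m - gamma_m I_p) A Sigma_m - k * J."""
    p = a.shape[0]
    total = np.zeros_like(a)
    for m, sigma in enumerate(sigmas):
        c_inv = np.diag(1.0 / np.diag(a @ sigma @ a.T))
        total += c_inv @ (c_inv @ g[m] - gamma[m] * np.eye(p)) @ a @ sigma
    return k * (total - j_stat)

def weighted_log_prob(a, frames, weights, mus, sigmas):
    """sum_t sum_m weights[t][m] * log N(A o_t; A mu_m, diag(A Sigma_m A^T))."""
    total = 0.0
    for t, o_t in enumerate(frames):
        for m in range(len(mus)):
            diff = a @ (o_t - mus[m])
            c_hat = np.diag(a @ sigmas[m] @ a.T)
            total += weights[t][m] * -0.5 * (len(diff) * np.log(2.0 * np.pi)
                                             + np.sum(np.log(c_hat))
                                             + np.sum(diff ** 2 / c_hat))
    return total

# Hypothetical toy setup: p = 2, n = 3, two Gaussians, five frames.
rng = np.random.default_rng(2)
a0 = rng.normal(size=(2, 3))
mus = [rng.normal(size=3) for _ in range(2)]
sigmas = []
for _ in range(2):
    b = rng.normal(size=(3, 3))
    sigmas.append(b @ b.T + np.eye(3))
frames = rng.normal(size=(5, 3))
weights = rng.normal(size=(5, 2))     # stand-in D * gamma values, may be negative
gamma, g, j_stat = accumulate_statistics(a0, frames, weights, mus, sigmas)
derivative = mpe_derivative(a0, gamma, g, j_stat, sigmas, k=1.0)
```

`weighted_log_prob` is included only so the result can be checked by finite differences with the weights held fixed; it is not part of the accumulation itself.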

Page 15: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

MPE-HLDA Implementation

• In theory, the derivative of the MPE-HLDA objective function can be computed based on Eq.(12), via a single forward-backward pass over the training lattices. In practice, however, it is not possible to fit all the full covariance matrices in memory.

• Two steps
  – First, run a forward-backward pass over the training lattices to accumulate the statistics $\gamma_m$, $g_m$ and $J$
  – Second, use these statistics together with the full covariance matrices to synthesize the derivative

• The paper uses gradient descent to update the projection matrix (the step moves along the derivative, since $F_{MPE}$ is maximized):

$$A^{(i)}_{j+1} = A^{(i)}_{j} + \alpha\,\left.\frac{\partial F_{MPE}(O)}{\partial A}\right|_{A = A^{(i)}_{j}}$$

where $\alpha$ is the step size.

Page 16: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

MPE-HLDA Overview

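Global feature projection matrix: $A$, of size $p \times n$

$$\hat\mu_m = A\mu_m,\qquad \hat C_m = \mathrm{diag}(A\Sigma_m A^T),\qquad \hat o_t = A o_t$$

We will refer to $(\hat\mu_m, \hat C_m)$ as the MPE-HLDA model.

$$F_{MPE}(O) = \sum_{r=1}^{R}\sum_{w_r} P(w_r \mid O_r)\,\epsilon(w_r)$$

$$\frac{\partial F_{MPE}(O)}{\partial A} = k\sum_{r=1}^{R}\sum_{q_r} D(q_r, r)\,\frac{\partial \log P(O_r \mid q_r)}{\partial A}$$

$$D(q_r, r) = P(q_r \mid O_r)\,[\epsilon(q_r) - \epsilon(r)]$$

$$\frac{\partial \log P(O_r \mid q_r)}{\partial A} = \sum_{t=S_{q_r}}^{E_{q_r}}\sum_{m}\gamma_{q_r}^{m}(t)\,\frac{\partial \log P(\hat o_t \mid m)}{\partial A}$$

$$\frac{\partial \log P(\hat o_t \mid m)}{\partial A} = \hat C_m^{-1}\left(\hat C_m^{-1} P_{mt} - I_p\right) A\Sigma_m - \hat C_m^{-1} R_{mt}$$

$$P_{mt} = \mathrm{diag}\left[(\hat o_t - \hat\mu_m)(\hat o_t - \hat\mu_m)^T\right],\qquad R_{mt} = (\hat o_t - \hat\mu_m)(o_t - \mu_m)^T$$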

Page 17: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

MPE-HLDA Overview

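$$\frac{\partial F_{MPE}(O)}{\partial A} = k\sum_{m}\hat C_m^{-1}\left(\hat C_m^{-1} g_m - \gamma_m I_p\right) A\Sigma_m - kJ$$

$$\gamma_m = \sum_{r}\sum_{q_r}\sum_{t=S_{q_r}}^{E_{q_r}} D(q_r, r)\,\gamma_{q_r}^{m}(t)$$

$$g_m = \sum_{r}\sum_{q_r}\sum_{t=S_{q_r}}^{E_{q_r}} D(q_r, r)\,\gamma_{q_r}^{m}(t)\,P_{mt}$$

$$J = \sum_{m}\hat C_m^{-1}\sum_{r}\sum_{q_r}\sum_{t=S_{q_r}}^{E_{q_r}} D(q_r, r)\,\gamma_{q_r}^{m}(t)\,R_{mt}$$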

Page 18: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

Implementation

• 1. Initialize the feature projection matrix $A^{(0)}$ by LDA or HLDA, and the MPE-HLDA model $\hat M^{(0)}$

• 2. Set $i = 1$

• 3. Compute covariance statistics in the original feature space
  – (a) Do a maximum likelihood update of the MPE-HLDA model in the feature space defined by $A^{(i-1)}$
  – (b) Do single pass retraining using $\hat M^{(i-1)}$ to generate $\mu_m^{(i)}$ and $\Sigma_m^{(i)}$ in the original feature space

• 4. Optimize the feature projection matrix:
  – (a) Set $j = 0$, $A^{(i)}_j = A^{(i-1)}$
  – (b) Project $\mu_m^{(i)}$ and $\Sigma_m^{(i)}$ using $A^{(i)}_j$ to get the model $\hat M^{(i)}$ in the reduced subspace
  – (c) Run an F-B pass on the lattices using $\hat M^{(i)}$ to compute $\gamma_m$, $g_m$ and $J$
  – (d) Use $\mu_m^{(i)}$, $\Sigma_m^{(i)}$ and the statistics from 4(c) to compute the MPE derivative
  – (e) Update $A^{(i)}_j$ to $A^{(i)}_{j+1}$ using gradient descent
  – (f) Set $j = j + 1$, go to 4(b) unless convergence

• 5. Optionally, set $A^{(i)} = A^{(i)}_{j+1}$, $\hat M^{(i)} = \hat M^{(i)}_{j+1}$, $i = i + 1$ and go to 3

Update used in 4(e), with step size $\alpha$:

$$A^{(i)}_{j+1} = A^{(i)}_{j} + \alpha\,\left.\frac{\partial F_{MPE}(O)}{\partial A}\right|_{A = A^{(i)}_{j}}$$
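The inner loop of step 4 can be sketched as a plain gradient iteration. The code below is not the paper's implementation: `optimize_projection` and its convergence test are hypothetical, and a simple quadratic objective stands in for the lattice-based MPE derivative so that the loop can be run on its own.

```python
import numpy as np

def optimize_projection(a_init, derivative_fn, step_size, max_iters=100, tol=1e-8):
    """Inner loop of step 4: A_{j+1} = A_j + step_size * dF/dA evaluated at A_j."""
    a = a_init.copy()
    for _ in range(max_iters):
        update = step_size * derivative_fn(a)
        a = a + update
        if np.linalg.norm(update) < tol:      # simple convergence check
            break
    return a

# Stand-in objective F(A) = -||A - target||^2, so dF/dA = -2 (A - target).
target = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]])
a_final = optimize_projection(np.zeros((2, 3)), lambda a: -2.0 * (a - target),
                              step_size=0.25)
```

In the actual procedure `derivative_fn` would re-project the model, rerun the forward-backward pass and synthesize the derivative from $\gamma_m$, $g_m$ and $J$ at every iteration, which is why each step is expensive.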

Page 19: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

Experiments

• DARPA EARS research project

• CTS, 800/2300 hrs for ML training, 370 hrs of held-out data for MPE-HLDA training

• BN, 600 hrs from Hub4 and TDT for ML training, 330 hrs of held-out data for MPE-HLDA training

• PLP (15 dim) plus 1st, 2nd and 3rd derivative coefficients (60 dim in total)

• EARS 2003 Evaluation test set

Page 20: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

Experiments

Page 21: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

Experiments

Page 22: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition

Conclusions

• We have taken a first look at a new feature analysis method, MPE-HLDA.

• The experiments show that it is effective in reducing recognition errors, and that it is more robust than other commonly used analysis methods such as LDA and HLDA.